Handling Metacat datasets submission failures #2167
Comments
@helbashandy Thanks for the report. Could you provide some examples of problematic documents that pass on the client side but fail on insert to Metacat? Metacat uses the EML project's EML validator, which is definitely more robust than what we do on the client side. On the client side, we do not have a full EML validator written in JavaScript, so simpler field validation is used, IIRC. With your examples, we might be able to catch a few more of these, but I would say that character encoding problems in browser editing fields are the worst and very hard to fix automatically (because there is often no correct solution).
@mbjones @rushirajnenuji We had another instance this week where a user received an error when they were saving a dataset and the resource map was not created. The confluence of issues
This particular error is of no concern to the user and should not have affected the save experience (since the metadata was updated anyway). We already receive system emails for these errors. Additionally, in other cases where there is an issue with the EML and the dataset metadata is NOT saved, we also end up with a missing resource map.
Additional dataset draft that caused a missing resource map: ess-dive-47c069f4c82c6b3-20231115T191156677_draft.txt. This turned out to be an issue with the lat/lon coordinates, IIRC.
Also note that any upload errors that are encountered will cause a missing resource map as well.
@rushirajnenuji @robyn This seems related to issues #1318 and #1586 as well. In both cases, it seems like a failure to save or validate a document leads to a corrupted or missing resource map. The character encoding problems are still tough to deal with, as discussed above, but we should be able to do better at detecting the validation error on the save to Metacat and not lose content when that happens. Our error handling pipeline in MetacatUI seems to miss that Metacat produces an error and silently moves on. This has been a common thread and involves data loss, so I am going to label this as critical.
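A minimal sketch of the behavior asked for above, written in Python for brevity rather than in MetacatUI's own JavaScript: every step of a package save is checked, and the first failure stops the sequence and is raised to the caller instead of being skipped. `StepResult`, `SaveError`, and `save_package` are hypothetical names, not MetacatUI or Metacat APIs, and the step order (data files, then metadata, then resource map) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    """Outcome of one save step (hypothetical type for this sketch)."""
    ok: bool
    detail: str = ""


class SaveError(Exception):
    """Raised when any step of the package save fails."""

    def __init__(self, step: str, detail: str):
        super().__init__(f"{step} failed: {detail}")
        self.step = step
        self.detail = detail


def save_package(steps: List[Tuple[str, Callable[[], StepResult]]]) -> List[str]:
    """Run the save steps in order and stop at the first failure.

    Later steps (for example writing the resource map) never run after
    a failed step, and the caller always learns which step failed.
    """
    completed = []
    for name, action in steps:
        try:
            result = action()
        except Exception as exc:  # e.g. a timeout or dropped connection
            raise SaveError(name, str(exc)) from exc
        if not result.ok:  # the server answered, but with an error
            raise SaveError(name, result.detail)
        completed.append(name)
    return completed
```

The caller can catch `SaveError`, keep the user's unsaved entries in the editor, and show `step` and `detail` so the user can correct the entry and retry.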
#2134 is a similar issue, but equally important in that it can result in a missing resourceMap.
We encountered a data corruption use case on February 6th, 2024, around 9:35am PST. This is another example related to #2134 where a failed file upload prevented the dataset metadata from submitting. In this case, the file upload timed out and crashed rather than failing with an error message. The user was creating a new dataset, and since the upload crashed before the submit button was hit, we don't even have a draft EML file. The user had to start over from scratch without support. More details on the steps the user took here:
Another use case from 06/24/2024. The resource map broke and the dataset could not be indexed because special characters were used in the dataset metadata. This user requested to publish a dataset, and the ESS-DIVE team requested they make changes to meet our quality checks. When the user revised their dataset, they put some kind of special character in "Step 7" of the methods. The user did not report this to us. We noticed this issue on Jun 28th because our Jira automation (which creates new publication request notifications and updates existing publication requests with new PID versions) was broken; we were not being notified of new publication requests. Solution: create a new dataset version. We converted the special characters to the UTF-8 character set, but we can't know what should be written there. The user will have to review and correct the characters. This solution also caused some duplication of requests on Jira that we had to correct. @vchendrix can provide more details on this use case. Broken dataset version: https://data.ess-dive.lbl.gov/view/ess-dive-3619bd077a60b7c-20240624T120319367
Logging another issue on wfsi-data.org that we encountered, which may have been due to the user saving in quick succession.
Error scenario
Message from user on 9/10/2024
Error message on save
From the look of their history (see User's edit history below), they were not editing metadata but just trying to upload files. It seems like they were saving in between.
Metadata
User's edit history
<!-- save error 4 -->
<doc>
<str name="id">wfsi-20240910T004725372-cf9f57a473147b7</str>
<str name="fileName">eml_draft_Flanary.txt</str>
<str name="formatId">text/plain</str>
<long name="size">7953</long>
<date name="dateUploaded">2024-09-10T00:47:25.541Z</date>
</doc>
<!-- Upload and save error 3 -->
<doc>
<str name="id">wfsi-20240910T004724697-dfdbf8be3846281</str>
<str name="fileName">eml_draft_Flanary.txt</str>
<str name="formatId">text/plain</str>
<long name="size">7953</long>
<date name="dateUploaded">2024-09-10T00:47:24.902Z</date>
</doc>
<doc>
<str name="id">wfsi-20240910T004701546-f0ac034951449dd</str>
<str name="fileName">GCP_locations.csv</str>
<str name="formatId">text/csv</str>
<long name="size">428</long>
<date name="dateUploaded">2024-09-10T00:47:03.072Z</date>
</doc>
<!-- save error 2 -->
<doc>
<str name="id">wfsi-20240910T004203134-3d871fdcd433e9b</str>
<str name="fileName">eml_draft_Flanary.txt</str>
<str name="formatId">text/plain</str>
<long name="size">7806</long>
<date name="dateUploaded">2024-09-10T00:42:03.300Z</date>
</doc>
<!-- Upload and save error 1 -->
<doc>
<str name="id">wfsi-20240910T004202614-38133400041596b</str>
<str name="fileName">eml_draft_Flanary.txt</str>
<str name="formatId">text/plain</str>
<long name="size">7806</long>
<date name="dateUploaded">2024-09-10T00:42:02.789Z</date>
</doc>
<doc>
<str name="id">wfsi-20240910T004131325-abdd4d5227c40eb</str>
<str name="fileName">fuel_cover_and_loadings.csv</str>
<str name="formatId">text/csv</str>
<long name="size">9588</long>
<date name="dateUploaded">2024-09-10T00:41:32.442Z</date>
</doc>
<!-- Missing resource map -->
<doc>
<str name="id">wfsi-20240910T004121322-ff8fc344276c691</str>
<str name="fileName">Fuels_data_for_2019_Closing_Gaps_Sycan_Nature.xml</str>
<str name="formatId">https://eml.ecoinformatics.org/eml-2.2.0</str>
<long name="size">7349</long>
<date name="dateUploaded">2024-09-10T00:41:21.857Z</date>
</doc>
<!-- Last successful save -->
<doc>
<str name="id">wfsi-20240910T004051165-e2680b6f2b52db2</str>
<str name="fileName">wfsi_20240910T004051165_e2680b6f2b52db2.rdf.xml</str>
<str name="formatId">http://www.openarchives.org/ore/terms</str>
<long name="size">4849</long>
<date name="dateUploaded">2024-09-10T00:40:52.887Z</date>
</doc>
<doc>
<str name="id">wfsi-20240910T004051181-dea693a41e54b3f</str>
<str name="fileName">Fuels_data_for_2019_Closing_Gaps_Sycan_Nature.xml</str>
<str name="formatId">https://eml.ecoinformatics.org/eml-2.2.0</str>
<long name="size">7349</long>
<date name="dateUploaded">2024-09-10T00:40:51.756Z</date>
</doc>
<doc>
<str name="id">wfsi-20240910T004039869-bf24e8d6cfd5be1</str>
<str name="fileName">wfsi_20240910T004039869_bf24e8d6cfd5be1.rdf.xml</str>
<str name="formatId">http://www.openarchives.org/ore/terms</str>
<long name="size">4849</long>
<date name="dateUploaded">2024-09-10T00:40:42.105Z</date>
</doc>
<doc>
<str name="id">wfsi-20240910T004039885-bd5beaee9ce8563</str>
<str name="fileName">Fuels_data_for_2019_Closing_Gaps_Sycan_Nature.xml</str>
<str name="formatId">https://eml.ecoinformatics.org/eml-2.2.0</str>
<long name="size">7349</long>
<date name="dateUploaded">2024-09-10T00:40:40.938Z</date>
</doc>
<doc>
<str name="id">wfsi-20240910T000146916-b1624fc4efee242</str>
<str name="fileName">outer_E_postburn.e57</str>
<str name="formatId">application/octet-stream</str>
<long name="size">246889472</long>
<date name="dateUploaded">2024-09-10T00:29:53.078Z</date>
</doc>
<doc>
<str name="id">wfsi-20240910T000147049-d2085315f3f422f</str>
<str name="fileName">outer_W_postburn.e57</str>
<str name="formatId">application/octet-stream</str>
<long name="size">245432320</long>
<date name="dateUploaded">2024-09-10T00:29:50.933Z</date>
</doc>
<doc>
<str name="id">wfsi-20240910T000147044-6ea30ade6b69ee5</str>
<str name="fileName">outer_S_postburn.e57</str>
<str name="formatId">application/octet-stream</str>
<long name="size">244446208</long>
<date name="dateUploaded">2024-09-10T00:29:47.040Z</date>
</doc>
<doc>
<str name="id">wfsi-20240910T000146982-d04e1f0045e41b8</str>
<str name="fileName">outer_N_postburn.e57</str>
<str name="formatId">application/octet-stream</str>
<long name="size">243701760</long>
<date name="dateUploaded">2024-09-10T00:29:37.369Z</date>
</doc>
Thoughts on potential measures to prevent this from happening
Is it possible to prevent a dataset from being saved again until the previous save has been fully indexed? Maybe you can determine whether the resource map is there? If not, prevent uploading and saving?
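One rough way to test "is the resource map there" against a listing like the edit history above is sketched below in Python. It is a heuristic that uses only the `formatId` and `dateUploaded` fields shown in the listing: the newest EML document counts as covered if some resource map was uploaded at or after it. The function names and the wrapping `<result>` element are assumptions made for this sketch, and a production check would more likely ask the index which resource map actually references the metadata PID.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

EML_PREFIX = "https://eml.ecoinformatics.org/eml"
ORE_FORMAT = "http://www.openarchives.org/ore/terms"


def parse_docs(xml_text):
    """Parse a run of <doc> elements into dicts keyed by field name."""
    # The listing has no single root element, so wrap it in one.
    root = ET.fromstring(f"<result>{xml_text}</result>")
    docs = []
    for doc in root.iter("doc"):
        fields = {el.get("name"): el.text for el in doc}
        fields["dateUploaded"] = datetime.strptime(
            fields["dateUploaded"], "%Y-%m-%dT%H:%M:%S.%fZ"
        )
        docs.append(fields)
    return docs


def latest_save_has_resource_map(docs):
    """Heuristic: was a resource map uploaded at or after the newest EML?"""
    eml = [d for d in docs if d["formatId"].startswith(EML_PREFIX)]
    if not eml:
        return True  # no metadata saved yet, so nothing is missing
    newest = max(eml, key=lambda d: d["dateUploaded"])
    return any(
        d["formatId"] == ORE_FORMAT
        and d["dateUploaded"] >= newest["dateUploaded"]
        for d in docs
    )


# Three records copied from the edit history above (some fields omitted).
SAMPLE = """
<!-- Missing resource map -->
<doc>
<str name="formatId">https://eml.ecoinformatics.org/eml-2.2.0</str>
<date name="dateUploaded">2024-09-10T00:41:21.857Z</date>
</doc>
<!-- Last successful save -->
<doc>
<str name="formatId">http://www.openarchives.org/ore/terms</str>
<date name="dateUploaded">2024-09-10T00:40:52.887Z</date>
</doc>
<doc>
<str name="formatId">https://eml.ecoinformatics.org/eml-2.2.0</str>
<date name="dateUploaded">2024-09-10T00:40:51.756Z</date>
</doc>
"""

if __name__ == "__main__":
    print(latest_save_has_resource_map(parse_docs(SAMPLE)))
```

An editor could refuse a new upload or save while this check is false and instead offer to retry writing the resource map.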
Describe the Issue
On the ESS-DIVE project, we found a couple of use cases where a dataset submission's validation passes on MetacatUI but fails on Metacat. One example is entering special symbol characters in the abstract. Another is when a data upload fails due to network issues.
Currently, if an uncaught issue happens on submission, the user could potentially lose their entries, and they would then have to contact the team to recover the data. We were wondering if there is a way to implement general exception handling for those Metacat errors when they arise, so that the user can correct their entry based on the error.
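For the special-character case, the thread does not say which characters were involved. Assuming at least some failures come from characters that XML 1.0 does not allow (most control characters, which can arrive through copy and paste), a pre-submit check could point the user at them instead of guessing a replacement. The sketch below is in Python for brevity; `find_illegal_xml_chars` is a hypothetical helper, not part of MetacatUI or Metacat, and it does not detect text that is valid XML but was mis-encoded earlier.

```python
import re

# Complement of the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (index, code point) pairs for characters XML 1.0 forbids."""
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in _ILLEGAL_XML_CHARS.finditer(text)
    ]
```

Reporting the position and code point lets the form highlight the exact spot in the field, which fits the goal of letting the user correct the entry themselves.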