Data Audit
These are some of the issues that arose during the most recent data audit.
Document not found in DocumentCloud, no match for hyperlink/source or MatterAttachmentId
Search for the document in DocumentCloud.
It is possible that it exists but was not returned in the search results because it is "Processing" or "Failed Import".
If the document does not actually exist, run pull_pdfs for the associated Matter id.
Document not found in DocumentCloud, no match for hyperlink/source but MatterAttachmentId is associated with different hyperlink/source
We are not sure why this happens. If the MatterAttachment.hyperlink changed, then the MatterAttachment.last_modified should have changed, and pull_pdfs would have seen that last_modified >= link_obtained_at. Spot checks indicated that the database matches the most recent information from Legistar.
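The comparison described above can be sketched in isolation. This is a minimal illustration, not the actual pull_pdfs query: the AttachmentState class and needs_refetch function are hypothetical names, and treating a missing link_obtained_at as "needs fetching" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AttachmentState:
    """Hypothetical stand-in for the two timestamps the check reads."""
    last_modified: datetime               # last change reported by Legistar
    link_obtained_at: Optional[datetime]  # when the hyperlink was last fetched, None if never


def needs_refetch(state: AttachmentState) -> bool:
    """Return True when the attachment changed at or after the last fetch."""
    if state.link_obtained_at is None:
        # Assumption: an attachment that was never fetched always qualifies.
        return True
    return state.last_modified >= state.link_obtained_at
```

If a hyperlink changes without last_modified moving forward, this check returns False and the new document is never pulled, which is one way the issue above could arise.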
Manual fix: Run pull_pdfs for the associated Matter id and manually change the old document to Access: Private on DocumentCloud.
Multiple documents in DocumentCloud with the same MatterAttachmentId but a different "source"
The document was updated in Legistar with a new hyperlink but the same MatterAttachmentId. The MatterAttachment record matched the latest information from Legistar and the most recently uploaded document.
This should be fixed by updates to the pull_attachments command, which now looks for changes to MatterAttachment.hyperlink and will privatize the old document on DocumentCloud.
Manual fix: Manually change the old document to Access: Private on DocumentCloud.
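To find which documents need the manual fix, one option is to compare each document's "source" with the current hyperlink for its MatterAttachmentId. The sketch below assumes the documents have already been exported into plain records. CloudDocument and find_stale_documents are hypothetical names, and no DocumentCloud API call is made.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class CloudDocument:
    """Hypothetical, simplified view of a DocumentCloud document."""
    document_id: str
    matter_attachment_id: int
    source: str  # the hyperlink the document was uploaded from


def find_stale_documents(
    documents: Iterable[CloudDocument],
    current_hyperlinks: Dict[int, str],
) -> List[CloudDocument]:
    """Return documents whose source differs from the attachment's current hyperlink."""
    stale = []
    for doc in documents:
        current = current_hyperlinks.get(doc.matter_attachment_id)
        # Attachments with no known hyperlink are skipped; they belong to a different audit case.
        if current is not None and doc.source != current:
            stale.append(doc)
    return stale
```

The returned documents are the candidates to set to Access: Private. The document whose source matches the current hyperlink is left alone.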
Data Mismatch
The data associated with a Matter does not match the document data associated with a related document in DocumentCloud. The pull_pdfs query did not take changes to Matter-related data into account.
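A mismatch of this kind can be detected by comparing the two sets of values key by key. The following is a rough sketch that assumes both sides are available as flat dictionaries. The function name and the example keys are hypothetical, not the actual audit code.

```python
from typing import Any, Dict, Tuple


def find_mismatches(
    matter_data: Dict[str, Any],
    document_data: Dict[str, Any],
) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing key to a (matter value, document value) pair."""
    mismatches = {}
    for key, expected in matter_data.items():
        # A key missing from the document data is read as None.
        actual = document_data.get(key)
        if actual != expected:
            mismatches[key] = (expected, actual)
    return mismatches
```

For example, find_mismatches({"title": "A", "status": "Passed"}, {"title": "A", "status": "Draft"}) returns {"status": ("Passed", "Draft")}.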
Some mismatches will self-correct once the appropriate cron job runs. Others should be prevented by updates to the pull_pdfs command, which now also takes the Matter.last_modified timestamp into account.
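One way to picture the updated check is to use the later of the two last_modified timestamps. This is a sketch under that assumption. The is_stale function is a hypothetical name, and the real command may combine the timestamps differently.

```python
from datetime import datetime


def is_stale(
    matter_last_modified: datetime,
    attachment_last_modified: datetime,
    link_obtained_at: datetime,
) -> bool:
    """Return True if either record changed at or after the last fetch."""
    latest_change = max(matter_last_modified, attachment_last_modified)
    return latest_change >= link_obtained_at
```

With this form, an edit to the Matter alone is enough to trigger a refresh, even when the attachment itself is unchanged.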
Manual fix: Run pull_pdfs for the associated Matter id.