Data Audit

J Wilson edited this page Oct 22, 2015 · 5 revisions

These are some of the issues that arose during the most recent data audit.

Document not found in DocumentCloud, no match for hyperlink/source or MatterAttachmentId

Search for the document in DocumentCloud.

The document may exist but be missing from the search results because its status is "Processing" or "Failed Import".

If the document does not actually exist, run pull_pdfs for the associated Matter id.

Document not found in DocumentCloud, no match for hyperlink/source but MatterAttachmentId is associated with different hyperlink/source

We are not sure why this happens. If the MatterAttachment.hyperlink changed, then the MatterAttachment.last_modified should have changed, and pull_pdfs would have seen that last_modified >= link_obtained_at. Spot checks indicated that the database matches the most recent information from Legistar.
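The timestamp comparison described above can be sketched roughly as follows. This is an illustration only, not the actual pull_pdfs code: the function name needs_new_pull is hypothetical, and the sketch assumes both timestamps are available as comparable datetime values.

```python
from datetime import datetime


def needs_new_pull(last_modified: datetime, link_obtained_at: datetime) -> bool:
    """Return True when the attachment changed at or after the time its link was obtained.

    Hypothetical helper: mirrors the `last_modified >= link_obtained_at`
    comparison mentioned in the text, nothing more.
    """
    return last_modified >= link_obtained_at


# Made-up timestamps for illustration.
obtained = datetime(2015, 10, 1, 12, 0)
print(needs_new_pull(datetime(2015, 10, 20, 9, 30), obtained))  # True
print(needs_new_pull(datetime(2015, 9, 15, 8, 0), obtained))    # False
```

If a changed hyperlink did not also bump last_modified, a check of this shape would return False and the attachment would be skipped, which is one possible (unconfirmed) way the situation above could arise.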

Manual fix: Run pull_pdfs for the associated Matter id, then change the old document to Access: Private on DocumentCloud.

Multiple documents in DocumentCloud with the same MatterAttachmentId but a different "source"

The document was updated in Legistar with a new hyperlink but the same MatterAttachmentId. The MatterAttachment record matched the latest information from Legistar and the most recently uploaded document.

This should be fixed by updates to the pull_attachments command, which now looks for changes to MatterAttachment.hyperlink and will privatize the old document on DocumentCloud.
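As a rough sketch of the idea, the documents that should be privatized are those that share the MatterAttachmentId but whose "source" no longer matches the current hyperlink. The function name, the dictionary keys, and the example URLs below are assumptions made for illustration; they are not taken from the command itself or from the DocumentCloud API.

```python
from typing import Dict, List


def find_stale_documents(
    documents: List[Dict[str, str]],
    attachment_id: str,
    current_hyperlink: str,
) -> List[Dict[str, str]]:
    """Return documents for this attachment whose source differs from the current hyperlink.

    Hypothetical helper: each document is a plain dict with assumed keys
    "id", "matter_attachment_id", and "source".
    """
    return [
        doc
        for doc in documents
        if doc["matter_attachment_id"] == attachment_id
        and doc["source"] != current_hyperlink
    ]


# Made-up records: two uploads for attachment 101, one for attachment 202.
docs = [
    {"id": "a", "matter_attachment_id": "101", "source": "http://example.com/old.pdf"},
    {"id": "b", "matter_attachment_id": "101", "source": "http://example.com/new.pdf"},
    {"id": "c", "matter_attachment_id": "202", "source": "http://example.com/other.pdf"},
]

stale = find_stale_documents(docs, "101", "http://example.com/new.pdf")
print([doc["id"] for doc in stale])  # ['a']
```

Each document returned by a check like this is a candidate for Access: Private.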

Manual fix: Change the old document to Access: Private on DocumentCloud.

Data Mismatch

The data associated with a Matter does not match the document data associated with a related document in DocumentCloud. The pull_pdfs query did not take changes to Matter-related data into account.

Some mismatches will self-correct once the appropriate cron job runs. Others should be prevented by updates to the pull_pdfs command, which now also takes the Matter.last_modified timestamp into account.
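One way to picture the updated behavior is a staleness check that considers the most recent change to either the Matter or its attachment. This is a sketch under assumptions, not the real query: the function name is hypothetical, and reusing link_obtained_at as the comparison point is an assumption, since the text does not say which timestamp Matter.last_modified is compared against.

```python
from datetime import datetime


def is_document_stale(
    matter_last_modified: datetime,
    attachment_last_modified: datetime,
    link_obtained_at: datetime,
) -> bool:
    """Return True when the Matter or the attachment changed at or after the link was obtained.

    Hypothetical helper: a change to Matter-related data alone is enough
    to flag the document for a fresh pull.
    """
    latest_change = max(matter_last_modified, attachment_last_modified)
    return latest_change >= link_obtained_at


# Made-up timestamps: only the Matter changed after the link was obtained.
print(is_document_stale(
    matter_last_modified=datetime(2015, 10, 15),
    attachment_last_modified=datetime(2015, 9, 1),
    link_obtained_at=datetime(2015, 10, 1),
))  # True
```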

Manual fix: Run pull_pdfs for the associated Matter id.
