Structured Data API for Harvesting Crowdsourced Contributions

Ben W. Brumfield edited this page Feb 19, 2023 · 16 revisions

The following API supports harvesting crowdsourced contributions of structured data from FromThePage for integration with library, archive, museum and publishing systems.

Background

Although crowdsourcing efforts in cultural heritage have proven successful, integrating crowd contributions back into institutional systems (archival finding aids, library catalog systems, museum collection management databases, or digital edition publishing platforms) remains a challenge. With the support of the IIIF Consortium, a IIIF-based API was developed in 2016/2017 to support harvesting free-form transcription and translation data from FromThePage. However, this API is of limited use for institutions running field-based, spreadsheet-based, or metadata creation projects, since transcripts must be scraped for the structured data rendered within them.

In our experience, institutions using FromThePage's existing spreadsheet-based exports need the following:

  • What the user-entered data is
  • What kind of object the data represents (an individual page vs. metadata for an entire work)
  • Which fields in the user-entered data correspond with which fields in their institutional systems
  • Who created the data
  • Any contextual information about the data's reliability, like user-created notes or items flagged for review

Exposing this data helps institutions evaluate quality and make the data usable, even after it has been exported.

Structured Data Endpoint Response

Usage: {protocol}://{domain}/iiif/{work id}/structured/{page id}
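
As a minimal sketch, the usage pattern above can be filled in programmatically. The helper name `structured_data_url` and the `base_url` parameter (standing in for `{protocol}://{domain}`) are illustrative assumptions, not part of FromThePage.

```python
from urllib.parse import quote


def structured_data_url(base_url: str, work_id, page_id) -> str:
    """Fill in {protocol}://{domain}/iiif/{work id}/structured/{page id}."""
    # base_url stands for "{protocol}://{domain}", e.g. "http://localhost:3000".
    return "{}/iiif/{}/structured/{}".format(
        base_url.rstrip("/"),
        quote(str(work_id), safe=""),
        quote(str(page_id), safe=""),
    )


# IDs taken from the example response in this document.
print(structured_data_url("http://localhost:3000", 52679, 1742382))
```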

The response contains the following elements:

Contributors

The contributors stanza contains an unordered array of users who have made substantial contributions to the data (edits and transcriptions, but not approvals or notes). Each user element contains a user_name element (spelled userName in the example response below) holding the pseudonym displayed on the system. User elements may also contain real_name and orcid elements for contributor credit, if the user has provided them.
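
A harvesting client might turn each contributor element into a credit string along the following lines. This is only a sketch: `credit_line` is a hypothetical helper, and because the description here names the key user_name while the example response in this document spells it userName, the code accepts either spelling. The real_name and orcid key names are taken from the description and may differ in practice.

```python
def credit_line(contributor: dict) -> str:
    """Build a display string for one contributor element (hypothetical helper)."""
    # Accept both spellings of the pseudonym key (assumption, see lead-in).
    pseudonym = contributor.get("userName") or contributor.get("user_name") or "unknown"
    # Prefer the real name when the user has supplied one.
    name = contributor.get("real_name") or pseudonym
    orcid = contributor.get("orcid")
    return f"{name} (ORCID {orcid})" if orcid else name


contributors = [{"userName": "geni"}]
print([credit_line(c) for c in contributors])
```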

Configuration

The config element of the top-level structured data response contains a URI which will fetch the project configuration used to create this data. This includes the types of data fields, their labels, any controlled vocabularies, and layout information.

Data

The data stanza of the response contains the actual data contributed by the users who have edited this page.

Each element of the data array contains:

  • label: The human-readable label presented to the person who transcribed the field
  • value: The string value of the data
  • config: A URI representing the configuration for this particular field. (This can be used as an ID to map fields in a target system to fields in the FromThePage structured data response.)
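
The mapping role of the config URI can be sketched as follows. The function `map_fields` and the target field names (`family_name`, `given_name`) are hypothetical; only the label, value and config keys come from the description above.

```python
def map_fields(data: list, mapping: dict) -> dict:
    """Translate data elements into a record keyed by target-system field names."""
    record = {}
    for element in data:
        target = mapping.get(element.get("config"))
        # Skip unmapped fields and elements that carry no plain value.
        if target is not None and "value" in element:
            record[target] = element["value"]
    return record


# Hypothetical mapping from FromThePage field URIs to institutional field names.
mapping = {
    "http://localhost:3000/iiif/structured/config/field/460": "family_name",
    "http://localhost:3000/iiif/structured/config/field/461": "given_name",
}
data = [
    {"label": "Last Name", "value": "Gilbert",
     "config": "http://localhost:3000/iiif/structured/config/field/460"},
    {"label": "First Name", "value": "Clifford",
     "config": "http://localhost:3000/iiif/structured/config/field/461"},
    {"label": "Middle Name", "value": "O",
     "config": "http://localhost:3000/iiif/structured/config/field/462"},
]
print(map_fields(data, mapping))
```

Keying on the config URI rather than the label means the mapping keeps working if a project owner later rewords a label.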

Notes

The notes stanza carries any comments left by users creating the data; in the example below its value is a URI for the page's notes. This element should not appear if no notes exist.

On

The on stanza indicates the canvas or manifest corresponding to the page or work the data was created from. Canvas stanzas will contain a within element with the @id of the manifest containing the canvas.
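
A client can use the on stanza to decide whether a response describes a page or a whole work. The sketch below assumes that page-level responses carry an @type of sc:Canvas (as in the example in this document) and treats anything else as work-level; `describe_target` is a hypothetical helper.

```python
def describe_target(on: dict) -> dict:
    """Summarise the on stanza of a structured data response (hypothetical helper)."""
    is_canvas = on.get("@type") == "sc:Canvas"
    return {
        "level": "page" if is_canvas else "work",
        "target": on.get("@id"),
        # Canvas stanzas name their manifest in within; otherwise
        # the stanza is assumed to identify the manifest itself.
        "manifest": on.get("within") if is_canvas else on.get("@id"),
    }


on = {
    "@type": "sc:Canvas",
    "@id": "http://localhost:3000/iiif/52679/canvas/1742382",
    "within": "http://localhost:3000/iiif/52679/manifest",
}
print(describe_target(on))
```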

Status

The pageStatus and workStatus elements reflect the status of the work and page being fetched.
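
The status elements, together with the presence of notes, give a harvester some basis for judging reliability before import. The following sketch relies only on the status values that appear in the examples in this document (needsReview, hasTranscript, pctComplete); `review_flags` is a hypothetical helper and other status values may exist.

```python
def review_flags(response: dict) -> dict:
    """Collect simple reliability hints from a structured data response."""
    page_flags = response.get("pageStatus", {}).get("pageStatus", [])
    work = response.get("workStatus", {})
    return {
        "page_needs_review": "needsReview" in page_flags,
        "page_has_transcript": "hasTranscript" in page_flags,
        "work_pct_complete": work.get("pctComplete"),
        # The notes element is expected to be absent when no notes exist.
        "has_notes": "notes" in response,
    }


response = {
    "pageStatus": {"pageStatus": ["needsReview", "hasTranscript"]},
    "workStatus": {"pctComplete": 0, "pctNeedsReview": 100.0},
}
print(review_flags(response))
```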

Other Elements

The context, profile, label and @id elements work as normal in IIIF-based APIs.

Example

[Example image]

The above image was transcribed as part of the Indiana WWI Service Cards project, producing the following data:

{
   "contributors":[
      {
         "userName":"geni"
      }
   ],
   "data":[
      {
         "label":"Last Name",
         "value":"Gilbert",
         "config":"http://localhost:3000/iiif/structured/config/field/460"
      },
      {
         "label":"First Name",
         "value":"Clifford",
         "config":"http://localhost:3000/iiif/structured/config/field/461"
      },
      {
         "label":"Middle Name",
         "value":"O",
         "config":"http://localhost:3000/iiif/structured/config/field/462"
      },
      {
         "label":"Serial Number",
         "value":"782884",
         "config":"http://localhost:3000/iiif/structured/config/field/463"
      },
      {
         "label":"Race",
         "value":"Caucasian",
         "config":"http://localhost:3000/iiif/structured/config/field/466"
      },
      {
         "label":"Branch",
         "value":"Army or Marines",
         "config":"http://localhost:3000/iiif/structured/config/field/471"
      },
      {
         "label":"Town or City of Residence",
         "value":"Peru, Indiana",
         "config":"http://localhost:3000/iiif/structured/config/field/467"
      },
      {
         "label":"County of Residence",
         "value":"",
         "config":"http://localhost:3000/iiif/structured/config/field/473"
      },
      {
         "label":"Place of Birth",
         "value":"Peru, Indiana",
         "config":"http://localhost:3000/iiif/structured/config/field/468"
      },
      {
         "label":"Date of Birth",
         "value":"",
         "config":"http://localhost:3000/iiif/structured/config/field/469"
      },
      {
         "label":"Age",
         "value":"23 8/12",
         "config":"http://localhost:3000/iiif/structured/config/field/470"
      },
      {
         "label":"Is this card a reverse side? (Indicated by \"-B\")",
         "value":"no",
         "config":"http://localhost:3000/iiif/structured/config/field/472"
      }
   ],
   "config":"http://localhost:3000/iiif/246/structured/config/page",
   "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-endpoint-response",
   "on":{
      "@type":"sc:Canvas",
      "@id":"http://localhost:3000/iiif/52679/canvas/1742382",
      "within":"http://localhost:3000/iiif/52679/manifest"
   },
   "@id":"http://localhost:3000/iiif/52679/structured/1742382",
   "label":"Structured data (field-based or spreadsheet transcriptions) for canvas",
   "notes":"http://localhost:3000/iiif/1742382/list/notes",
   "pageStatus":{
      "@context":"http://www.fromthepage.org/jsonld/1/context.json",
      "@id":"http://localhost:3000/iiif/52679/1742382/status",
      "label":"Page Status",
      "profile":"https://github.com/benwbrum/fromthepage/wiki/FromThePage-Support-for-the-IIIF-Presentation-API-and-Web-Annotations#service-1",
      "pageStatus":[
         "hasTranscript"
      ]
   },
   "workStatus":{
      "@context":"http://www.fromthepage.org/jsonld/1/context.json",
      "@id":"http://localhost:3000/iiif/52679/status",
      "label":"Work Status",
      "profile":"https://github.com/benwbrum/fromthepage/wiki/FromThePage-Support-for-the-IIIF-Presentation-API-and-Web-Annotations#service",
      "pctComplete":100.0,
      "pctTranscribed":100.0,
      "pctOcrCorrected":0.0,
      "pctIndexed":0,
      "pctMarkedBlank":0,
      "pctNeedsReview":0,
      "pctTranslationComplete":0,
      "pctTranslated":0,
      "pctTranslationNeedsReview":0,
      "pctTranslationIndexed":0,
      "pctTranslationMarkedBlank":0,
      "metadataStatus":"undescribed"
   }
}

Structured Data Configuration Response

Usage: {protocol}://{domain}/iiif/{collection id}/structured/config/{level}

Configuration

The configuration response returned from the config URI of the structured data response represents the project configuration as an array of field configurations. Each field configuration element contains:

  • @id: URI identifying the field. This URI is dereferenceable, and will fetch the configuration for the particular field.
  • label: The label for the field presented to contributors
  • row: The row on the data entry form on which this field should appear (the example below names this element line)
  • position: The position within the row on which this field should appear
  • page (optional): For multi-page forms, the page on which this row/field should appear
  • input_type: The type of the field input; input types supported (as of 2022-01-17) include "text", "select", "date", "textarea", "description", "instruction", "spreadsheet", "multiselect"
    • Fields of input type select or multiselect may contain an element options containing an array of possible options for user selection. (Note that users may override the option list in some cases.)
    • Fields configured as spreadsheets will contain an additional stanza spreadsheet_columns, an array of label, input_type, position and optional options elements, defining how each spreadsheet column is configured.

{
   "@id":"http://localhost:3000/iiif/246/structured/config/page",
   "label":"Transcription field configuration for Indiana World War I Service Record Cards",
   "config":[
      {
         "label":"Last Name",
         "input_type":"text",
         "position":1,
         "line":1,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/460"
      },
      {
         "label":"First Name",
         "input_type":"text",
         "position":2,
         "line":1,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/461"
      },
      {
         "label":"Middle Name",
         "input_type":"text",
         "position":3,
         "line":1,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/462"
      },
      {
         "label":"Serial Number",
         "input_type":"text",
         "position":4,
         "line":2,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/463"
      },
      {
         "label":"Race",
         "input_type":"select",
         "position":5,
         "line":2,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/466",
         "options":[
            "Caucasian",
            "African American",
            "Other",
            "Not Given"
         ]
      },
      {
         "label":"Town or City of Residence",
         "input_type":"text",
         "position":7,
         "line":3,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/467"
      },
      {
         "label":"Place of Birth",
         "input_type":"text",
         "position":9,
         "line":4,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/468"
      },
      {
         "label":"Date of Birth",
         "input_type":"text",
         "position":10,
         "line":4,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/469"
      },
      {
         "label":"Age",
         "input_type":"text",
         "position":11,
         "line":4,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/470"
      },
      {
         "label":"Branch",
         "input_type":"select",
         "position":6,
         "line":2,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/471",
         "options":[
            "Army or Marines",
            "Navy",
            "Coast Guard",
            "Nurse"
         ]
      },
      {
         "label":"Is this card a reverse side? (Indicated by \"-B\")",
         "input_type":"select",
         "position":12,
         "line":5,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/472",
         "options":[
            "no",
            "yes"
         ]
      },
      {
         "label":"County of Residence",
         "input_type":"text",
         "position":8,
         "line":3,
         "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
         "@id":"http://localhost:3000/iiif/structured/config/field/473"
      }
   ],
   "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-configuration-response"
}
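
The layout information in a configuration response can be used to rebuild the form order in a target system. The sketch below is an illustration under assumptions: `form_layout` is a hypothetical helper, it reads the row from line (as in the example above) and falls back to row (as in the element list), and it treats a missing page as page 1.

```python
from itertools import groupby


def form_layout(fields: list) -> list:
    """Return field labels grouped by form row, in display order."""
    def row_key(field):
        # The example response uses "line"; the element list calls it "row".
        row = field.get("line", field.get("row", 0))
        return (field.get("page", 1), row)

    ordered = sorted(fields, key=lambda f: (row_key(f), f.get("position", 0)))
    return [[f["label"] for f in group] for _, group in groupby(ordered, key=row_key)]


fields = [
    {"label": "Branch", "position": 6, "line": 2},
    {"label": "Last Name", "position": 1, "line": 1},
    {"label": "Race", "position": 5, "line": 2},
    {"label": "First Name", "position": 2, "line": 1},
    {"label": "Serial Number", "position": 4, "line": 2},
]
for row in form_layout(fields):
    print(" | ".join(row))
```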

Structured Data Field Configuration Response

Dereferencing an individual field configuration will fetch an object identical to the object appearing in the project-wide configuration response, with the addition of a within element containing the project configuration URI.

Usage: {protocol}://{domain}/iiif/structured/config/field/{field id}

{
   "label":"Branch",
   "input_type":"select",
   "position":6,
   "line":2,
   "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
   "@id":"http://localhost:3000/iiif/structured/config/field/471",
   "options":[
      "Army or Marines",
      "Navy",
      "Coast Guard",
      "Nurse"
   ],
   "within":"http://localhost:3000/iiif/246/structured/config/page"
}
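
Because users may override the option list in some cases, a harvester may want to flag values that fall outside the configured options rather than reject them. The following is a sketch only; `off_list_values` is a hypothetical helper that takes the data array of a structured data response and a list of field configuration objects like the one above.

```python
def off_list_values(data: list, field_configs: list) -> list:
    """Return (label, value) pairs whose value is not among the field's options."""
    options_by_field = {
        f["@id"]: f["options"] for f in field_configs if "options" in f
    }
    flagged = []
    for element in data:
        options = options_by_field.get(element.get("config"))
        value = element.get("value", "")
        # Empty values and fields without an option list are not flagged.
        if options is not None and value and value not in options:
            flagged.append((element.get("label"), value))
    return flagged


branch = {
    "@id": "http://localhost:3000/iiif/structured/config/field/471",
    "options": ["Army or Marines", "Navy", "Coast Guard", "Nurse"],
}
data = [
    {"label": "Branch", "value": "Army or Marines", "config": branch["@id"]},
    {"label": "Branch", "value": "Air Service", "config": branch["@id"]},
]
print(off_list_values(data, [branch]))
```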

Structured Data Spreadsheet Column Configuration Response

{
   "label":"Persons Names (LN,FN)",
   "input_type":"text",
   "position":1,
   "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-spreadsheet-column-configuration-response",
   "@id":"http://localhost:3000/iiif/structured/config/column/6",
   "within":"http://localhost:3000/iiif/structured/config/field/3060"
}

Structured Data Endpoint Reference

References to the structured data response are embedded within a manifest in a seeAlso block in the canvas (for field-based/spreadsheet transcription projects) or in the manifest itself (for item metadata creation projects).

Example

{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "http://localhost:3000/iiif/52679/manifest",
  "@type": "sc:Manifest",
  "label": "IN WWI Service Record Cards Army and Marine GIL-GOF",
...
  "sequences": [
    {
      "@id": "http://localhost:3000/iiif/52679/sequence/default",
      "@type": "sc:Sequence",
...
      "canvases": [
...
        {
          "@id": "http://localhost:3000/iiif/52679/canvas/1742382",
          "@type": "sc:Canvas",
          "label": "WWI0000932-A",
...
          "seeAlso": [
...
            {
              "@id": "http://localhost:3000/iiif/52679/structured/1742382",
              "label": "Structured data (field-based or spreadsheet transcriptions) for canvas",
              "format": "application/ld+json",
              "@context": "http://www.fromthepage.org/jsonld/structured/1/context.json",
              "profile": "https://github.com/benwbrum/fromthepage/wiki/FromThePage-Support-for-the-IIIF-Presentation-API-and-Web-Annotations#structured-data-service"
            }
          ],
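
A client that has already parsed a manifest could locate these references roughly as follows. This is a sketch under assumptions: it identifies structured data entries by the @context value shown in the example above, accepts seeAlso as either a single object or an array, and uses the hypothetical helper name `structured_data_refs`.

```python
STRUCTURED_CONTEXT = "http://www.fromthepage.org/jsonld/structured/1/context.json"


def _as_list(value):
    """Normalise a seeAlso value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def structured_data_refs(manifest: dict) -> list:
    """Return (owner @id, structured data URL) pairs found in a parsed manifest."""
    refs = []

    def collect(owner):
        for entry in _as_list(owner.get("seeAlso")):
            if isinstance(entry, dict) and entry.get("@context") == STRUCTURED_CONTEXT:
                refs.append((owner.get("@id"), entry.get("@id")))

    collect(manifest)  # item metadata creation projects
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            collect(canvas)  # field-based/spreadsheet transcription projects
    return refs


manifest = {
    "@id": "http://localhost:3000/iiif/52679/manifest",
    "sequences": [{"canvases": [{
        "@id": "http://localhost:3000/iiif/52679/canvas/1742382",
        "seeAlso": [{
            "@id": "http://localhost:3000/iiif/52679/structured/1742382",
            "@context": STRUCTURED_CONTEXT,
        }],
    }]}],
}
print(structured_data_refs(manifest))
```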

Spreadsheet example

Spreadsheet-based transcription projects are a subset of structured data projects. Their data response is slightly more complex than that of field-based projects, because each spreadsheet field carries a nested data array of rows.

Structured Data Endpoint Response (subset)

{
   "contributors":[
      {
         "userName":"heidimarie"
      }
   ],
   "data":[
      {
         "label":"County",
         "value":"Pasquotank County",
         "config":"http://localhost:3000/iiif/structured/config/field/3056"
      },
      {
         "label":"Day",
         "value":"",
         "config":"http://localhost:3000/iiif/structured/config/field/3057"
      },
      {
         "label":"Month",
         "value":"",
         "config":"http://localhost:3000/iiif/structured/config/field/3058"
      },
      {
         "label":"Year",
         "value":"1769",
         "config":"http://localhost:3000/iiif/structured/config/field/3059"
      },
      {
         "data":[
            [
               {
                  "label":"Persons Names (LN,FN)",
                  "value":"Brought Forward",
                  "config":"http://localhost:3000/iiif/structured/config/column/6"
               },
               {
                  "label":"Whites",
                  "value":"940",
                  "config":"http://localhost:3000/iiif/structured/config/column/7"
               },
               {
                  "label":"Black Males",
                  "value":"506",
                  "config":"http://localhost:3000/iiif/structured/config/column/8"
               },
               {
                  "label":"Black Females",
                  "value":"249",
                  "config":"http://localhost:3000/iiif/structured/config/column/9"
               },
               {
                  "label":"Total",
                  "value":"1695",
                  "config":"http://localhost:3000/iiif/structured/config/column/10"
               }
            ],
            [
               {
                  "label":"Persons Names (LN,FN)",
                  "value":"Williams, Williss",
                  "config":"http://localhost:3000/iiif/structured/config/column/6"
               },
               {
                  "label":"Whites",
                  "value":"3",
                  "config":"http://localhost:3000/iiif/structured/config/column/7"
               },
               {
                  "label":"Total",
                  "value":"3",
                  "config":"http://localhost:3000/iiif/structured/config/column/10"
               }
            ],
            [
               {
                  "label":"Persons Names (LN,FN)",
                  "value":"Williams, [Elhmey?]",
                  "config":"http://localhost:3000/iiif/structured/config/column/6"
               },
               {
                  "label":"Whites",
                  "value":"1",
                  "config":"http://localhost:3000/iiif/structured/config/column/7"
               },
               {
                  "label":"Total",
                  "value":"1",
                  "config":"http://localhost:3000/iiif/structured/config/column/10"
               }
            ],
            [
               {
                  "label":"Persons Names (LN,FN)",
                  "value":"[Wooldudge?], John",
                  "config":"http://localhost:3000/iiif/structured/config/column/6"
               },
               {
                  "label":"Whites",
                  "value":"1",
                  "config":"http://localhost:3000/iiif/structured/config/column/7"
               },
               {
                  "label":"Total",
                  "value":"1",
                  "config":"http://localhost:3000/iiif/structured/config/column/10"
               }
            ]
         ],
         "config":"http://localhost:3000/iiif/structured/config/field/3060"
      }
   ],
   "config":"http://localhost:3000/iiif/1195/structured/config/page",
   "profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-endpoint-response",
   "on":{
      "@type":"sc:Canvas",
      "@id":"http://localhost:3000/iiif/57901/canvas/1832940",
      "within":"http://localhost:3000/iiif/57901/manifest"
   },
   "@id":"http://localhost:3000/iiif/57901/structured/1832940",
   "label":"Structured data (field-based or spreadsheet transcriptions) for canvas",
   "pageStatus":{
      "@context":"http://www.fromthepage.org/jsonld/1/context.json",
      "@id":"http://localhost:3000/iiif/57901/1832940/status",
      "label":"Page Status",
      "profile":"https://github.com/benwbrum/fromthepage/wiki/FromThePage-Support-for-the-IIIF-Presentation-API-and-Web-Annotations#service-1",
      "pageStatus":[
         "needsReview",
         "hasTranscript"
      ]
   },
   "workStatus":{
      "@context":"http://www.fromthepage.org/jsonld/1/context.json",
      "@id":"http://localhost:3000/iiif/57901/status",
      "label":"Work Status",
      "profile":"https://github.com/benwbrum/fromthepage/wiki/FromThePage-Support-for-the-IIIF-Presentation-API-and-Web-Annotations#service",
      "pctComplete":0,
      "pctTranscribed":0,
      "pctOcrCorrected":0.0,
      "pctIndexed":0,
      "pctMarkedBlank":0,
      "pctNeedsReview":100.0,
      "pctTranslationComplete":0,
      "pctTranslated":0,
      "pctTranslationNeedsReview":0,
      "pctTranslationIndexed":0,
      "pctTranslationMarkedBlank":0,
      "metadataStatus":"undescribed"
   }
}
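
The nested rows in a spreadsheet response can be flattened into one record per row. The sketch below assumes that a spreadsheet field is any data element carrying its own nested data array, as in the example above; `spreadsheet_rows` is a hypothetical helper. Cells missing from a row are simply absent from that row's record.

```python
def spreadsheet_rows(data: list) -> dict:
    """Map each spreadsheet field's config URI to a list of {label: value} rows."""
    tables = {}
    for element in data:
        rows = element.get("data")
        if isinstance(rows, list):  # only spreadsheet fields nest a data array
            tables[element.get("config")] = [
                {cell["label"]: cell.get("value", "") for cell in row}
                for row in rows
            ]
    return tables


data = [
    {"label": "Year", "value": "1769",
     "config": "http://localhost:3000/iiif/structured/config/field/3059"},
    {"data": [
        [{"label": "Persons Names (LN,FN)", "value": "Brought Forward"},
         {"label": "Total", "value": "1695"}],
        [{"label": "Persons Names (LN,FN)", "value": "Williams, Williss"},
         {"label": "Whites", "value": "3"},
         {"label": "Total", "value": "3"}],
     ],
     "config": "http://localhost:3000/iiif/structured/config/field/3060"},
]
for row in spreadsheet_rows(data)["http://localhost:3000/iiif/structured/config/field/3060"]:
    print(row)
```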

Credits

This work would not have been possible without the collaboration of Nicholas ver Steegh (Ohio University).