Understanding the pipeline for publishing data to the Books & Media Catalog
A note of caution about this page - library staff are actively working to improve the pipeline, so parts of the workflow may change before this documentation can be updated. When in doubt about something you see in Books & Media, Summon, or elsewhere, please submit a ticket to the Alma group at https://support.lib.duke.edu.
Part 1: Alma publishes three sets of data
We use Alma's publishing profile functionality to include specific MARC and other fields in published data, allowing us to use that information further on in the pipeline.
In the OAI-PMH publishing profile (step 2a), Alma tells us what type of resource a record is, which determines how the record is enriched as the pipeline proceeds.
In addition to the MARC fields, Alma publishes:
Physical record information in one or more 940 fields
Electronic record information in one or more 943 fields
Collection information in a 944 field (note - currently published collection records don't have all the needed information, and the integrations team is working to fix this)
That means the pipeline can tell what type of resource a record contains based on which local 9xx fields are present. Library staff don't see those fields in records - they're added specifically to support the publishing pipeline.
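As a rough illustration of that logic (the function and file handling are hypothetical; only the 940/943/944 tags come from the list above), a script can report which kinds of inventory a published MARCXML record carries by checking which local fields are present:

```python
from xml.etree import ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

def record_types(record: ET.Element) -> set:
    """Report which kinds of inventory a published record carries, based on its local 9xx fields."""
    tags = {df.get("tag") for df in record.findall(f"{MARC_NS}datafield")}
    types = set()
    if "940" in tags:
        types.add("physical")
    if "943" in tags:
        types.add("electronic")
    if "944" in tags:
        types.add("collection")
    return types  # an empty set means the record has no inventory attached
```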
We also run a job using the Primo publishing profile (step 2b). We run this job because the OAI-PMH publishing profile does not include item availability information, which we need for the Books & Media Catalog. The Primo publishing profile gives us specific record IDs that we can use later in the pipeline.
And finally, Alma publishes records to Summon (step 2c). Though we primarily direct patrons to Summon for articles, all of our electronic and print resources are published to Summon and can be viewed there when the appropriate content types are selected. Because Summon is offered by Ex Libris, we use Alma's integrated publishing profile to send the data, and no further pipeline work is needed.
Specific field mappings in the OAI-PMH publishing profile can be found in the Digital Strategies & Technology wiki here: https://duldev.atlassian.net/wiki/spaces/DSTP/pages/3870195734/TRLN+Discovery+Publishing+Profile+Enrichment. Note that the wiki has limited access - if you can't view the page, submit a ticket to the Alma queue at https://support.lib.duke.edu and a staff member will send you the information.
Part 2: Custom Duke scripts begin enriching the Alma data with additional information
First, we retrieve the records published by the Primo publishing job, extract the record IDs, and keep the information in dedicated storage (labeled in the diagram as 3). This ID information is needed so we can retrieve item availability, which we do not get from the OAI-PMH publishing profile (2a, above). This process must complete before the pipeline can continue.
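A minimal sketch of that ID extraction step, assuming the Primo job output is MARCXML on disk, that the record ID lives in control field 001, and that the dedicated storage is a SQLite table (all three are assumptions; the production paths and storage differ):

```python
import glob
import sqlite3
from xml.etree import ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

db = sqlite3.connect("primo_ids.sqlite")  # hypothetical storage location
db.execute("CREATE TABLE IF NOT EXISTS record_ids (record_id TEXT PRIMARY KEY)")

for path in glob.glob("/data/primo_export/*.xml"):  # hypothetical export directory
    for record in ET.parse(path).iter(f"{MARC_NS}record"):
        record_id = record.findtext(f"{MARC_NS}controlfield[@tag='001']", default="").strip()
        if record_id:
            db.execute("INSERT OR IGNORE INTO record_ids VALUES (?)", (record_id,))

db.commit()
db.close()
```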
Then we retrieve the records published by the OAI-PMH publishing profile and store them on a local server (4a). These files are in MARCXML format.
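Because OAI-PMH is a standard protocol, the retrieval can be pictured as a ListRecords loop that follows resumption tokens until the full set has been harvested; the base URL, set name, and output directory below are placeholders rather than the production values:

```python
import requests
from xml.etree import ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://example.alma.exlibrisgroup.com/view/oai/EXAMPLE_INST/request"  # placeholder

params = {"verb": "ListRecords", "metadataPrefix": "marc21", "set": "example_set"}  # placeholder set
page = 0
while True:
    resp = requests.get(BASE_URL, params=params, timeout=300)
    resp.raise_for_status()
    with open(f"/data/oai_export/page_{page:05d}.xml", "wb") as fh:  # placeholder directory
        fh.write(resp.content)
    token = ET.fromstring(resp.content).find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        break  # no resumption token means the harvest is complete
    params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
    page += 1
```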
We then run a script that processes the XML files from OAI-PMH (labeled in the diagram as steps 4b through 5f); a simplified sketch of this branching logic follows the list below.
If a record is marked suppressed or deleted, we log the record's MMS ID. We will send that ID to TRLN Discovery for deletion later in the pipeline.
If a record contains physical inventory, we look up the record ID in the dedicated database of IDs from Primo and use that ID to query Summon for item availability. We add the availability information as enrichment data to the MARCXML file (step 5d).
If the record does not have any inventory (step 5e), then we send the record for deletion later in the pipeline.
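Pulling those branches together, the per-record logic can be sketched roughly as follows (the location of the suppressed/deleted flag and the availability call are simplified assumptions; the production script differs in detail):

```python
from xml.etree import ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

def route_record(record: ET.Element, primo_ids: set, deletions: list):
    """Mirror the branching in steps 4b-5f for one published MARCXML record (simplified)."""
    record_id = record.findtext(f"{MARC_NS}controlfield[@tag='001']", default="").strip()
    tags = {df.get("tag") for df in record.findall(f"{MARC_NS}datafield")}

    if record.get("status") in ("deleted", "suppressed"):  # assumption: status flag on the record element
        deletions.append(record_id)          # ID sent to TRLN Discovery for deletion later
        return None
    if "940" in tags:                        # physical inventory published as 940 fields
        if record_id in primo_ids:           # dedicated ID store built in the earlier step
            pass                             # query Summon for availability and append it as enrichment (step 5d)
        return record
    if "943" not in tags:                    # neither physical nor electronic inventory (step 5e)
        deletions.append(record_id)
        return None
    return record                            # electronic-only records continue through the pipeline
```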
Part 3: Duke scripts complete the data transformation and send the records to the Books & Media Catalog
We copy all of the enriched XML files that we created into a dedicated directory where they are staged for the Books & Media Catalog (step 6).
We then run a program that transforms the enriched XML to JSON following Argot, TRLN Discovery's shared ingest format. You may hear developers refer to this program as MARC-to-Argot. (If you'd like to learn more about Argot, see argot/README.adoc in the trln/data-documentation repository.)
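For orientation, an Argot record is a JSON document with one key per catalog concept; the handful of fields below is only a rough approximation to show the shape of the output, so treat the Argot README above as the authoritative reference for field names and structure:

```python
# Rough approximation of the JSON that MARC-to-Argot produces for one record.
# Field names and values here are illustrative; see the Argot documentation for the real schema.
example_argot_record = {
    "id": "DUKE012345678",
    "owner": "duke",
    "title_main": [{"value": "An Example Title"}],
    "resource_type": ["Book"],
}
```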
MARC-to-Argot is where we do the majority of record transformation, mapping each MARCXML field value into the Argot field that drives the correct display and faceting in the Books & Media Catalog interface. For example, MARC-to-Argot is where we (a simplified sketch of one such mapping follows this list):
Map the record call number into the right call number facet - for example, ensuring that a record with call number BF411 .I343 appears nested within the BF309 - BF499 call number facet
Create the URLs for electronic resources, based on 943 enrichment data
Determine what names to display under "Authors, etc." on a Books & Media Catalog bibliographic display page
Map edition information for display when present in MARC fields 250, 251, or 254
Map variant title information when the MARCXML has 210, 222, 246, or 247 fields
Map information like the library, location, and barcode for physical holdings and items
Normalize call numbers and store them on the record to support searching
Add donor information to the local note for display
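As a concrete illustration of the first mapping in the list above, the call number facet assignment can be sketched like this (the range table and function are simplified stand-ins for the production lookup, which covers the full LC classification outline):

```python
import re

# One entry shown; the real table enumerates the full set of call number facet ranges.
CALL_NUMBER_RANGES = [
    ("BF", 309, 499, "BF309 - BF499"),
]

def call_number_facet(call_number: str):
    """Return the facet label whose class letters and numeric range contain the call number."""
    match = re.match(r"([A-Z]{1,3})\s*(\d+)", call_number)
    if not match:
        return None
    letters, number = match.group(1), int(match.group(2))
    for cls, low, high, label in CALL_NUMBER_RANGES:
        if letters == cls and low <= number <= high:
            return label
    return None

print(call_number_facet("BF411 .I343"))  # -> BF309 - BF499
```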
And finally, to finish the pipeline in step 8, we send the processed records in batches to TRLN Discovery's shared index, which the Books & Media Catalog uses as its backend to provide records to patrons.
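A minimal sketch of that final send, assuming the Argot records are POSTed as JSON arrays to an ingest endpoint in fixed-size batches (the URL and batch size are placeholders; TRLN Discovery's actual ingest interface may differ):

```python
import requests

INGEST_URL = "https://discovery.example.trln.org/ingest"  # placeholder, not the real endpoint

def send_in_batches(argot_records: list, batch_size: int = 500) -> None:
    """POST Argot records to the shared-index ingest service in fixed-size batches."""
    for start in range(0, len(argot_records), batch_size):
        batch = argot_records[start:start + batch_size]
        resp = requests.post(INGEST_URL, json=batch, timeout=300)
        resp.raise_for_status()  # stop the run if the shared index rejects a batch
```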