...

  1. Set up a publishing profile in Alma to determine which metadata elements are included when we harvest records

  2. Kick off a publishing job in Alma

  3. Harvest all of the published records from Alma

  4. Enrich the records with additional information not available from the harvest (e.g., availability information)

  5. Transform the final records into the correct format for TRLN Discovery

  6. When needed, test a subset of records in a Duke sandbox to see how they will look in the catalog before we push them into the live application

  7. Publish the records into the TRLN Discovery data stream

A note about the stale data in the catalog: When we have to run the full pipeline on all ~8 million records that go into the catalog, the process can take almost a month to complete. Since the Alma cutover in July, we’ve had to make several changes to the pipeline, and each change meant reprocessing all of the records. As the pipeline nears its final state, we are working to transition to “intermittent updates.” Instead of reprocessing millions of records on every run, we will ask Alma only for the records that have been updated since our last run. Switching to intermittent updates requires extra coding to automate each part of the pipeline, and that work is currently underway. Once the switch is made, we hope to process updated records every hour. In the meantime, changes in Alma will still take several weeks to appear in the catalog.
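As a rough illustration of what "ask Alma for the records that have been updated since our last run" could look like, here is a minimal Python sketch. It assumes the published records are harvested over OAI-PMH, which this page does not confirm. The endpoint URL, the function name, and the default metadata prefix are hypothetical placeholders rather than details of the actual pipeline. Only the `verb`, `metadataPrefix`, `set`, and `from` request parameters come from the OAI-PMH standard.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only; the real harvest URL is not given on this page.
BASE_URL = "https://example.org/oai/request"


def incremental_harvest_url(last_run, base_url=BASE_URL,
                            metadata_prefix="marc21", set_spec=None):
    """Build an OAI-PMH ListRecords URL limited to records changed since last_run.

    Assumes an OAI-PMH harvest; metadata_prefix and set_spec are placeholders.
    """
    if last_run.tzinfo is None:
        raise ValueError("last_run must be timezone-aware")
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        # OAI-PMH expects UTC timestamps in the form YYYY-MM-DDThh:mm:ssZ.
        "from": last_run.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if set_spec:
        # Optional: restrict the harvest to one published set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Example: request everything changed since the previous run.
    previous_run = datetime(2024, 9, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(incremental_harvest_url(previous_run))
```

An hourly job along these lines would store the start time of each successful run and pass it as `last_run` on the next one, so only the changed records flow through the enrichment, transformation, and publishing steps.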

Recent updates to the data pipeline

...