TRLN Discovery/POD Integration Specifications
This document defines the requirements to change the data source for current integrations to TRLN Discovery and POD from Aleph to Alma. TRLN is the Triangle Research Libraries Network and POD is the Platform for Open Discovery through IvyPlus. POD data supports both BorrowDirect and TRLN borrowing with ReShare.
Alma to TRLN/POD Data Pipeline Diagram
Diagrams describing data storage to support this and other Alma integrations
(CI Pipeline = Continuous Integration Pipeline)
TRLN Architecture Overview Diagram
This diagram was provided by TRLN staff and describes the data architecture from a TRLN perspective.
Customizations
- Duke-specific: https://github.com/trln/marc-to-argot/tree/main/lib/marc_to_argot/macros/duke
- General (shared): https://github.com/trln/marc-to-argot/tree/main/lib/marc_to_argot/macros/shared
- Questions on microform: https://github.com/trln/marc-to-argot/blob/main/lib/marc_to_argot/macros/shared/physical_media.rb
- More customizations: https://github.com/trln/marc-to-argot/tree/main/lib/data/duke
POD code in GitHub
- Extract instance, holding, and item records from Alma that aren't suppressed from discovery and aren't marked with a "delete" tag on the record.
- Extract records that are newly suppressed and newly deleted so that they can be removed from TRLN Discovery.
See the Alma Configuration Dependencies section below for detailed integration profile documentation.
- The Aleph update-only extract ran every 30 minutes. We expect the Alma updates-only extract to run every hour; this is a configuration controlled by Ex Libris and we cannot make it run more often.
- If no records are identified during a harvest, the current Aleph process generates a zero-record file; that practice should be continued for Alma.
- Historically, a set of 5,000 records was extracted every hour at the 45th minute so that the entire set of inventory records would eventually be sent to downstream systems. We do not yet know whether this is needed for Alma; we have not attempted to replicate this workflow and won't do so until we know that it's needed.
- The full extract is approximately 7.15 million records.
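As an illustration of the hourly updates-only harvest described above, the following is a minimal sketch of pulling a one-hour window of records from the Alma OAI-PMH endpoint. It assumes the set spec (trln_discovery_spec) and metadata prefix (marc21) configured in the integration profile documented below; the endpoint URL and institution code are hypothetical placeholders. Because the profile publishes suppressed records as deleted, newly suppressed or deleted records are expected to arrive with an OAI header status of "deleted".

```python
# Minimal sketch of harvesting one hourly window of TRLN Discovery updates from
# the Alma OAI-PMH endpoint. ASSUMPTIONS: the BASE_URL below is a hypothetical
# placeholder; the set spec and metadata prefix come from the integration
# profile described later in this document.
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://example.alma.exlibrisgroup.com/view/oai/01DUKE_INST/request"  # hypothetical

def harvest(from_ts, until_ts):
    """Yield (oai_identifier, is_deleted, record_element) for each harvested record."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": "marc21",
        "set": "trln_discovery_spec",
        "from": from_ts,
        "until": until_ts,
    }
    while True:
        resp = requests.get(BASE_URL, params=params, timeout=60)
        resp.raise_for_status()
        root = ET.fromstring(resp.content)
        for rec in root.iter(f"{OAI}record"):
            header = rec.find(f"{OAI}header")
            identifier = header.findtext(f"{OAI}identifier")
            # Suppressed/deleted records are published with status="deleted"
            is_deleted = header.get("status") == "deleted"
            yield identifier, is_deleted, rec
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Example: harvest one hourly window
# for oai_id, deleted, record in harvest("2024-11-01T00:00:00Z", "2024-11-01T01:00:00Z"):
#     ...
```

If a window yields no records, the downstream process can still write a zero-record file, continuing the Aleph practice noted above.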
This section describes the Alma configurations required to support this integration.
OAI-PMH
An Alma integration profile is used to define the specifications for publishing records via OAI-PMH.
Documentation is in progress.
Profile details
- Name: OAI-PMH for TRLN Discovery
- Profile description: OAI-PMH for TRLN Discovery
- Publishing Parameters:
- Status: Active
- Scheduling: Every 1 hour(s)
- Email notifications: no address
- Content:
- Set name: Entire repository default set
- Additional set name: (blank)
- Publish the entire repository: checked
- Filter out the data using: (blank)
- Publish on: Bibliographic level
- Output format: MARC21 Bibliographic
- Publishing protocol
- FTP: (not checked)
- OAI: checked
- Set Spec: trln_discovery_spec
- Set Name: trln_discovery_name
- Metadata prefix: marc21
- Z39.50: (not checked)
Data Enrichment
- Bibliographic normalization
Correct the data using normalization processes: Delete tags for TRLN extract (needs documentation)
- Linked data enrichment: unchecked
- Bibliographic Enrichment
- Add management information: checked
- Repeatable field: 942
- Publish suppressed records as deleted: checked
- all other fields are empty
- Related Records Enrichment
- Add related records information: checked
- Relation type field: TYPE
- Relation record MMS ID subfield: a
- Relation type subfield: 8
- Related fields enrichment:
- Related tag: 245, related subfield: a, Bib tag: RELATED, Bib subfield: a, relation type All
- Add holdings/items of related records: checked
- Authority enrichment
- Add authority information: unchecked
- Physical Holdings Enrichment
- Add holdings information: checked
Holdings tag | Holdings subfield | Bib tag | Bib subfield |
---|---|---|---|
852 | h | 852 | h |
852 | i | 852 | i |
852 | j | 852 | j |
852 | k | 852 | k |
852 | l | 852 | i |
852 | m | 852 | m |
866 | a | 866 | a |
866 | z | 866 | z |
867 | a | 867 | a |
868 | a | 868 | a |
852 | b | 852 | b |
852 | c | 852 | c |
852 | q | 852 | q |
- Exclude suppressed records: (checked)
- Physical items enrichment
- Add items information: checked
- Repeatable field: 940
Enrichment element | Subfield |
---|---|
Item PID subfield | k |
Barcode subfield | p |
Item Policy subfield | o |
Description subfield | n |
Current library subfield | b |
Current location subfield | c |
Call number type subfield | d |
Call number subfield | h |
Public note subfield | z |
Create date subfield | s |
Update date subfield | u |
Holdings ID subfield | e |
- Electronic Inventory Enrichment
- Planned implementation as of August 20, 2024, in the 943
Enrichment element | Subfield |
---|---|
Portfolio PID | 8 |
URL Type Subfield | a |
Access URL subfield | u |
Link Resolver Base URL | e |
Static URL | h |
Electronic Material Type subfield | q |
Proxy Select subfield | b |
Proxy Enabled subfield | c |
Authentication Note subfield | x |
Public Note subfield | y |
Direct Link subfield | d |
Service ID | g |
- Digital Inventory Enrichment
- Add Digital Representation Information: Unchecked
- Add File Information: Unchecked
- Add Remote Representation Information: Unchecked
- Collection Enrichment
- Add Collection Information: Unchecked
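To make the enrichment tables above concrete, here is a minimal sketch of reading the item enrichment (the repeatable 940 field) out of a published MARCXML record, using the subfield mapping from the Physical items enrichment table. The MARC21/slim namespace and the dictionary key names are illustrative assumptions; only the tag and subfield codes come from this document.

```python
# Minimal sketch of reading the item enrichment (repeatable 940 field) out of the
# published MARCXML, per the subfield mapping configured in the profile above.
# ASSUMPTION: records are standard MARCXML in the http://www.loc.gov/MARC21/slim
# namespace; the dict key names are illustrative, not a documented schema.
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"

ITEM_940_SUBFIELDS = {
    "k": "item_pid",
    "p": "barcode",
    "o": "item_policy",
    "n": "description",
    "b": "current_library",
    "c": "current_location",
    "d": "call_number_type",
    "h": "call_number",
    "z": "public_note",
    "s": "create_date",
    "u": "update_date",
    "e": "holdings_id",
}

def items_from_marcxml(record_xml: str):
    """Return one dict per 940 field (one per item) on a MARCXML record."""
    record = ET.fromstring(record_xml)
    items = []
    for field in record.iter(f"{MARC}datafield"):
        if field.get("tag") != "940":
            continue
        item = {}
        for sub in field.iter(f"{MARC}subfield"):
            name = ITEM_940_SUBFIELDS.get(sub.get("code"))
            if name:
                item[name] = (sub.text or "").strip()
        items.append(item)
    return items
```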
This section describes any data mapping requirements for this integration.
Note: this section needs to be updated for Alma
Alma publishing creates MARCXML records, and the downstream ingest processes for TRLN and POD map the data to the discovery layer and to POD. These are links to the data mappings already in use in production with Aleph. It is assumed that the new integration with Alma will continue to use these mappings:
Mapping Tables
- Duke-specific: https://github.com/trln/marc-to-argot/tree/main/lib/translation_maps/duke
- General TRLN marc-to-argot: https://github.com/trln/marc-to-argot/tree/main/lib/translation_maps/shared
- POD Aggregator/Datalake: https://github.com/pod4lib/aggregator
Additionally, this is a working spreadsheet to track any mapping changes and example records for validation: https://duke.box.com/s/3rrlpx2soe54ir587ij3j1st3wh9b3st
Mapping spreadsheet created 3/29/23 to track MARC to TRLN mapping from Aleph and FOLIO: https://duke.box.com/s/uyebv605j1lma0ex90csrpg30qlg6gys
Argot Configuration
Defaults https://github.com/trln/marc-to-argot/tree/main/lib/data/argot
List of Duke-specific Overrides
These are attributes that Duke does not want to be parsed by the default argot configuration: https://github.com/trln/marc-to-argot/blob/main/lib/data/duke/overrides.yml
- id
- local_id
- rollup_id
- oclc_number
- institution
- items
- holdings
- names
- url
- access_type
- date_cataloged
- primary_oclc
- physical_media
Mapping Holdings Summaries from Alma to TRLN Discovery
- Formerly, Aleph output the holdings summary to the 852$a; now, Alma outputs the institution code to the 852$a.
- For Alma, we map the 866$a and 866$z to the holding summary for display, with handling for repeatable fields.
- We do not show the 866$x if present.
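A minimal sketch of the holdings-summary rule above, assuming standard MARCXML input: concatenate the repeatable 866 $a and $z values for display and skip $x. Joining the parts with "; " is an illustrative choice, not a documented display rule.

```python
# Minimal sketch of building the holdings summary for display from the published
# holdings enrichment: use repeatable 866 $a and $z, ignore 866 $x.
# ASSUMPTION: input is a MARCXML record in the standard MARC21/slim namespace.
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"

def holdings_summary(record_xml: str) -> str:
    record = ET.fromstring(record_xml)
    parts = []
    for field in record.iter(f"{MARC}datafield"):
        if field.get("tag") != "866":
            continue
        for sub in field.iter(f"{MARC}subfield"):
            # Only $a (textual holdings) and $z (public note) are displayed; $x is skipped.
            if sub.get("code") in ("a", "z") and sub.text:
                parts.append(sub.text.strip())
    return "; ".join(parts)
```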
Inventory of MARC fields with an indication of where/how each value is set (e.g., YAML file, macro)
To see full list, open this link to the file in Box: https://duke.box.com/s/a98wrly961091bxnj5bj3ysnslrvqo15
Record identifiers in Books & Media - Aleph-born records and Alma-born records
When migrating records from Aleph to Alma, the Aleph bib ID is transformed to become the Alma MMS ID. See Ex Libris record number documentation: link
The format of the migrated record MMS ID is 99 + Aleph bib ID + 010 + 8501
- 99 indicates the type of record (bibliographic)
- 010 is a bibliographic library identifier added by the Ex Libris migration process
- 8501 is our Alma institutional ID
For an Alma-born record, the format of the MMS ID is 99 + record identifier + 8501
- 99 indicates the type of record (bibliographic)
- 8501 is our Alma institutional ID
- The record identifier is a unique identifier for the record.
Books & Media URLs (find.library.duke.edu...)
- For Aleph-born records, we strip '99' and '0108501' so that the URL does not change at Alma cutover
- E.g., for a record with Aleph bib "006288172", the Alma MMS ID is "990062881720108501", and the Books & Media URL is https://find.library.duke.edu/catalog/DUKE006288172
- For Alma-born records, the MMS ID is generated when the record is created in Alma, and it becomes the identifier in Books & Media
- E.g., a new title is ordered in Alma and it is assigned an MMS ID of "99112812262508501". The Books & Media URL is https://find.library.duke.edu/catalog/DUKE99112812262508501
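A minimal sketch of the identifier rules above, deriving the Books & Media identifier from an MMS ID. The heuristic (strip the leading "99" and trailing "0108501" only when both are present) is implied by the examples in this section; it is a sketch, not the production transformation.

```python
# Minimal sketch of deriving the Books & Media (find.library.duke.edu) identifier
# from an Alma MMS ID per the rules above. Aleph-migrated IDs (99 + Aleph bib ID +
# 010 + 8501) are reduced to "DUKE" + the original Aleph bib ID; Alma-born IDs are
# kept whole behind the "DUKE" prefix. The "99"/"0108501" values come from this document.
def books_and_media_id(mms_id: str) -> str:
    if mms_id.startswith("99") and mms_id.endswith("0108501"):
        # Aleph-migrated record: strip the leading "99" and the trailing "0108501"
        return "DUKE" + mms_id[2:-7]
    # Alma-born record: keep the full MMS ID
    return "DUKE" + mms_id

# Checks against the examples in this section
assert books_and_media_id("990062881720108501") == "DUKE006288172"
assert books_and_media_id("99112812262508501") == "DUKE99112812262508501"
```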
Default availability value for Rubenstein and University Archives
Because of the process we are using with Alma to obtain availability data with Summon, we have run into issues disambiguating some Rubenstein and University Archives holdings; it is not always clear which holding record in the OAI output corresponds to which Summon record.
Because we do not have a straightforward way to disambiguate (the Summon APIs do not give us collection codes), we will default to having Rubenstein and University Archives holdings show as available in the underlying data sent to Books & Media. When they appear in search results, the circ status API will check their availability in real time and update the display to a different value if there is one.
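A minimal sketch of that default, applied when building the underlying availability data for Books & Media. The library codes used here to identify Rubenstein and University Archives holdings are hypothetical placeholders, not confirmed Alma codes; the real-time circ status API remains the source of truth at display time.

```python
# Minimal sketch of the default-availability rule described above.
# ASSUMPTION: "RUBENSTEIN" and "UARCHIVES" are hypothetical library codes, and
# "current_library"/"availability" are illustrative key names, not a documented schema.
SPECIAL_COLLECTIONS_LIBRARIES = {"RUBENSTEIN", "UARCHIVES"}  # hypothetical codes

def apply_default_availability(item: dict) -> dict:
    # Default these special-collections items to "Available" in the underlying data;
    # the circ status API corrects the display in real time when needed.
    if item.get("current_library") in SPECIAL_COLLECTIONS_LIBRARIES:
        item["availability"] = "Available"
    return item
```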
The purpose of this section is to describe the validation steps needed to confirm that this integration is working successfully. Since this integration replaces the data source from Aleph to Alma, we expect the end of the data pipeline to mirror what is currently flowing from Aleph. We want to confirm that the switch from Aleph to Alma doesn't impact how data is displayed in the TRLN catalog, POD, or Summon.
- Is the expected volume of records moving from Alma to TRLN and POD?
- For various record types, is the data from Alma displaying in the TRLN catalog as expected?
- For various record types, is the data from Alma mapping to POD as expected?
The following types of records should be reviewed during testing. Links to specific records of each type are maintained here: https://duke.box.com/s/3rrlpx2soe54ir587ij3j1st3wh9b3st
Type of Record | Fields of Interest | General Fields of Interest |
---|---|---|
book | 650, 901, 904, 905, 951, 952, 998 | |
bound-withs | 987, 943 | |
e-books | 914 | |
"funky" serials record | ||
Lok Sabha debates | 952 (145 times) | |
map | ||
microfiche | ||
microfilm | ||
musical score | 348, 655, 914 | |
physical media (record) | 700 | |
Reader's Digest record | 952 (205 times) | |
Rubenstein | | |
The following library staff should be included in validation efforts once a testing infrastructure is ready to help confirm that data sourced from Alma displays in the TRLN catalog as expected:
Name | Title | Library |
---|---|---|
Andy Armacost | Head of Collection Development and Curator of Collections | Rubenstein |
Sean Chen | Head of Cataloging and Metadata Services | Goodson Law Library |
Bethany Costello | Resource Access Librarian | Ford Library |
Neal Fricks | Content and Discovery Specialist | Medical Center Library |
Jessica Janecki | Team Lead for Original Cataloging | DUL Collections Services |
Ryan Johnson (for music cataloging) | Special Formats Description Librarian | DUL Collections Services |
Meghan Lyon | Head of Technical Services | Rubenstein |
Erin Nettifee | IT Business Analyst | DST - LSIS |
Lauren Reno (for single item cataloging) | Section Head, Rare Materials Cataloging | Rubenstein |
Jacquie Samples | Head, Metadata & Discovery Strategy | DUL Collections Services |
add_transfer_url="https://trln-discovery-ingest.cloud.duke.edu/add_update_ingests"
delete_transfer_url="https://trln-discovery-ingest.cloud.duke.edu/delete_ingests"
pod_transfer_url="https://pod.stanford.edu/organizations/duke/uploads"
Source: https://gitlab.oit.duke.edu/aleph/home/-/blob/23_prod/home-slash-aleph/opac/call_extract_scripts.sh
The above locations would need to be changed for testing against a test server.
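One way to handle that is to keep the endpoints configurable, as in this minimal sketch. Only the production URLs come from this document; the environment-variable names and any test URLs are assumptions.

```python
# Minimal sketch of environment-driven transfer endpoints so the pipeline can point
# at a test server instead of production. Only the production URLs come from this
# document; the override variable names (ADD_TRANSFER_URL, etc.) are hypothetical.
import os

PROD_ENDPOINTS = {
    "add_transfer_url": "https://trln-discovery-ingest.cloud.duke.edu/add_update_ingests",
    "delete_transfer_url": "https://trln-discovery-ingest.cloud.duke.edu/delete_ingests",
    "pod_transfer_url": "https://pod.stanford.edu/organizations/duke/uploads",
}

def transfer_endpoints() -> dict:
    """Return the production endpoints unless overridden by environment variables."""
    return {name: os.environ.get(name.upper(), url) for name, url in PROD_ENDPOINTS.items()}
```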
This section includes historical information about the data pipeline from Aleph to TRLN Discovery for reference.
Historical documentation - Aleph and TRLN Discovery OPAC
This wiki page documents the Aleph to TRLN Discovery integration: Historical documentation - Aleph and TRLN Discovery OPAC
Aleph to TRLN Transformations
In the data pipeline diagram for Aleph, the gray oval representing "PERL bib data extract scripts" performs several data processes:
- Identify which bib records should be extracted from Aleph.
- Extract MARC items and holdings records for those bib records.
- Extract additional item data from Aleph that isn't included in MARC for each bib ("enhanced item data").
- Create a file containing records in MARCXML format.
Aleph-based Pipeline
add_transfer_url="https://trln-discovery-ingest.cloud.duke.edu/add_update_ingests"
delete_transfer_url="https://trln-discovery-ingest.cloud.duke.edu/delete_ingests"
pod_transfer_url="https://pod.stanford.edu/organizations/duke/uploads"
Source: https://gitlab.oit.duke.edu/aleph/home/-/blob/23_prod/home-slash-aleph/opac/call_extract_scripts.sh
Metadata Management Imp Team documents
- Aleph alphabetic and 9xx fields: https://duke.box.com/s/2x7h54k7gctbwmf7h9egi9vdn6rvpsma
- Extract criteria used for OCLC: /wiki/spaces/LIDS/pages/34768401
- Extract criteria used for authorities: Criteria for ACVS Bibliographic Extracts
The following table describes where each data processing task occurs in the current Aleph data pipeline so that the team can validate that each task is covered in the new Alma-sourced data pipeline to TRLN.
- The rows contain the list of data processing tasks that are occurring against Aleph data (grey column).
- The blue columns identify where each process will be handled in the new Alma data pipeline.
# | Data processing in the Aleph to TRLN data pipeline | New Alma data pipeline - where will each process be handled? | |||||
---|---|---|---|---|---|---|---|
No action required | Migration of Aleph records to Alma | Alma Integration Profile | Alma to TRLN scripts | TRLN Discovery App | Other/notes | ||
1 | Non-unicode character cleanup | ✓ | |||||
2 | Aleph records are excluded from the current TRLN/POD extract based on various criteria. NEED TO REVIEW THIS (Alma doesn't support suppressing items) | ✓ | These records are loaded to Alma and tagged with the "suppressed from discovery" flag so that they'll be excluded from Alma publishing. Note from Julie: confirm that this is true for Alma | | | |
3 | Drop specified local use fields such as the 029 - see notes for link to full list | ✓ | The data elements listed in the Aleph Alphabetic and 9xx Fields.xlsx spreadsheet as "Do not migrate" are not migrated on records loaded to Alma. Confirm that this is true for Alma. To do: Review the current Aleph to TRLN scripts to confirm that everything that is currently dropped is covered on this spreadsheet, and confirm how that was handled in the Alma migration | | | |
4 | Identify which bib records should be extracted: Updates-only during 30 min window, skip "suppressed records" | ✓ | Follow-up with Matt/Jeff/Ayse to see if TRLN is using the same criteria as current MADS criteria for determining which records are pulled. | ||||
5 | Extract MARC items and holdings records for each bib record that meets the extract criteria | ✓ | |||||
6 | Extract additional item data that isn't included in MARC for each bib, such as location and availability | ✓ | |||||
7 | Extract an additional 5000 records every hour so that the entire dataset is eventually refreshed in downstream systems incrementally. | ✓ | This step enables the entire dataset to be refreshed incrementally. This was originally done as a compensatory maintenance measure for Endeca. We're not sure if it will need to be done for Alma, but are tracking it here for verification either way. | ||||
8 | Convert extracted data to MARCXML format | ✓ | |||||
"New" data processing for Alma data (these processes weren't needed for Aleph data) | No action required | Migration of Aleph records to Alma | Alma Integration Profile | Alma to TRLN scripts | TRLN Discovery App | Other/notes | |
9 | Declare namespace at the beginning of MARC XML output, needed for Marc-to-Argot | ✓ | |||||
10 | Specify record type as Bibliographic in XML | ✓ | |||||
11 | Remove marc namespace prefixes - this was needed for FOLIO, is it needed for Alma? | ✓ | |||||
12 | Remove unnecessary metadata node from XML | ✓ | | | | |
13 | Add collection node to XML | ✓ | |||||
End of "New" data processing for Alma data | No action required | Migration of Aleph records to Alma | Alma Integration Profile | Alma to TRLN scripts | TRLN Discovery App | Other/notes | |
14 | Run Marc-to-Argot based off of Duke-specific overrides | ✓ | Does this need to be updated for Alma? The MARC 001 field control number needs to be transformed with the string 'DUKE' as a prefix and the first 4 characters stripped, so a FOLIO 001 containing 'in00009142400' needs transformation to 'DUKE009142400'. Currently handled in the Marc-to-Argot scripts (added 3/29/23) | | | |
15 | Move data via Spofford app | ✓ |
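A minimal sketch of two of the steps tracked above: emitting MARCXML with the namespace declared and no prefix while wrapping records in a collection node (rows 9, 11, and 13), and the local-ID transformation quoted in row 14. The 001 rule shown is the FOLIO-era rule from the table; as the row notes, it may need updating for Alma MMS IDs.

```python
# Minimal sketch of selected post-processing steps from the table above.
# ASSUMPTION: the helper names are illustrative; only the MARCXML namespace and the
# 001 example ('in00009142400' -> 'DUKE009142400') come from this document.
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # serialize MARCXML without a namespace prefix (rows 9, 11)

def wrap_in_collection(records):
    """Wrap individual <record> elements in a single namespaced <collection> node (row 13)."""
    collection = ET.Element(f"{{{MARC_NS}}}collection")
    collection.extend(records)
    return collection

def duke_local_id(field_001: str) -> str:
    """Row 14, FOLIO-era rule: prefix 'DUKE' and strip the first four characters."""
    return "DUKE" + field_001[4:]

assert duke_local_id("in00009142400") == "DUKE009142400"
```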
Historical Enhanced item data
The following data elements were extracted from Aleph to enhance what is included in MARC
Aleph Oracle Table | Data element | Description (from Aleph documentation) |
---|---|---|
Z30 | sublibrary | Code of the sublibrary that “owns” the item. |
Z30 | collection | Collection code of the item |
Z30 | call_no_type | This defines the type of location assigned in the Z30-CALL-NO field |
Z30 | call_no | Shelving location of the item |
Z30 | description | Description of the item (for multi-volume monographs or serial items) to help the user identify the item they are interested in. For items created through the Serials function, this field is automatically set to the enumeration and chronology description of the issue. |
Z30 | note_opac | Note field. This note is displayed in the OPAC |
Z30 | item_status | Status of the copy for loan purposes. |
Z30 | item_process_status | Defines the item's processing status (e.g. on order, cancelled, binding, etc.). |
Z30 | material | Material type. This can be VIDEO, BOOK, ISSUE, etc. "ISSUE" has special functionality within the system, all other types are used in an equal manner. |
Z30 | barcode | Unique identifier of the item. |
Z30 | open_date | Creation date of the copy. |
Z30 | update_date | Date copy was last updated |
Z30 | date_last_return | Date copy was last returned
Z30 | no_loans | Number of times the item was loaned. |
Z30 | hol_doc_number_x | System number of the holdings record to which the item is attached. |
Z30 | temp_location | This is a toggle field used to control the overriding of the location fields by the holdings record. Values are Y and N. |
Z36 | due_date | Active due date (see also ORIGINAL-DUE-DATE and RECALL-DUE-DATE). This field changes according to loan transactions (e.g. renew loan).
Z103 | lkr_type | Defines the type of link. ADM = Link from ADM record to BIB record. Link is built from BIB to ADM. ITM = Link between a BIB record and the items of another BIB record.
Z37 | status | Hold request status. |
Z37 | end_request_date | Last date of interest for the hold request. |
Z36 | id | Patron's ID
Z36 | recall_due_date | Due date computed as a result of the recall transaction. The date might actually be later than the due date, if the recall was generated close to the end of the loan period. It is retained for computing fine owing for late return of recalled item. |
Z30 | rec_key 1,9 | System number of the administrative record (ADM) associated to the item. |
For documents that have been approved and are in a finalized state (and locked for edits), include this change log to keep track of future changes.
Date | Description of Changes | Updated By |
---|---|---|
1/30/2024 | Page created. | J. Brannon |
4/8/2024 | Replaced the FOLIO data pipeline diagram with the proposed Alma to TRLN/POD diagram | J. Brannon |
4/8/2024 | Removed "Summon" references since that feed will be handled in Alma through a separate process, not as part of this data pipeline. | J. Brannon |
6/4/24 | Updated the new TRLN/POD data pipeline diagram | J. Brannon |
6/11/24 | Updated data transformation rule #7 to indicate it might not be needed (incremental extract of 5,000 records for background refresh.) Added section "Record identifiers in Books & Media - Aleph-born records and Alma-born records" to document decision on Books & Media identifier formats. | E. Nettifee |
6/28/24 | Add information about mapping holdings summaries as this changes from Aleph to Alma. | E. Nettifee |
7/26/24 | Added diagrams about data storage to the overview section | J. Brannon |
8/20/2024 | Copied in planned imp of 943 e-resource enrichment from Teams chat (into the Configuration Dependencies section) | E. Nettifee |
10/4/2024 | Moved data pipeline diagram out of expand section and made minor updates to Overview and Reference Link section | J. Brannon |
11/5/2024 | Removed questions section and empty assumptions section, FOLIO references. Moved Aleph diagram and enhanced item list of fields from Aleph to a "historical" section. | J. Brannon |
11/13/2024 | Moved more Aleph info to the Aleph expand section | J. Brannon |