Where MTA is looking for URLs

MARC 941 Field - Alma Metadata

Purpose: Contains Alma system identifiers
Tag: 941
Subfield $e: Entity ID
Usage: Primary identifier for SOA URL construction when present and not empty
Processing: Takes precedence over portfolio IDs extracted from 943$d URLs


MARC 943 Field - Electronic Resource Access (Primary)

Purpose: Alma-generated electronic resource access information
Tag: 943
Priority: Highest for URL creation

Subfields:

  • $d: Direct access URL/href to the electronic resource

  • $q: Resource type (JOURNAL, NEWSPAPER, BOOK, DATABASE, etc.)

  • $s: Resource status/availability label

Processing Logic:

  • Multiple 943 fields: Creates a single URL from the SOA base URL plus an identifier

  • Single 943, journal type: Uses the SOA base URL plus an identifier

  • Single 943, non-journal type: Uses the direct URL from $d, with proxy handling applied

  • Status filtering: Skips any 943 field where $s = "Not Available"
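
The rules above can be sketched as a small decision function. This is an illustrative sketch, not the MTA implementation: the names Field943 and plan_943 are hypothetical, and it assumes that status filtering happens before the fields are counted, that comparisons are case-insensitive, and that only $q = JOURNAL counts as a journal type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field943:
    """Hypothetical container for the 943 subfields described above."""
    d: str = ""  # direct access URL
    q: str = ""  # resource type (JOURNAL, BOOK, ...)
    s: str = ""  # status/availability label


def plan_943(fields: List[Field943]) -> Optional[dict]:
    """Decide how a URL would be built from a record's 943 fields."""
    # Assumption: "Not Available" fields are dropped before counting.
    usable = [f for f in fields if f.s.strip().lower() != "not available"]
    if not usable:
        return None
    # Multiple 943 fields collapse into one SOA URL.
    if len(usable) > 1:
        return {"strategy": "soa"}
    only = usable[0]
    # A single journal-type field also goes through the SOA URL.
    if only.q.strip().upper() == "JOURNAL":
        return {"strategy": "soa"}
    # Any other single field uses its direct URL (proxy handling not shown).
    return {"strategy": "direct", "url": only.d}
```

For example, a record with one BOOK field yields the "direct" strategy with that field's $d URL, while a record with two usable fields yields "soa".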

Identifier Fallback Strategy:

When constructing SOA URLs, the system uses this priority order:

  1. Primary: Entity ID from 941$e (if present and not empty)

  2. Fallback: Portfolio ID extracted from 943$d URL

    • Extracts portfolio_pid parameter from Alma resolver URLs

    • Example URL: <https://na05-psb.alma.exlibrisgroup.com/view/uresolver/01DUKE_INST/openurl?portfolio_pid=53896265270008501&Force_direct=true>

    • Extracted ID: 53896265270008501

  3. With multiple 943 fields: Uses the first portfolio ID found in any 943$d URL
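
A minimal sketch of this fallback order, assuming the Entity ID and the 943$d URLs have already been read from the record. The function name pick_identifier and its return shape are hypothetical; the regular expression is the portfolio_pid pattern this document describes.

```python
import re
from typing import Iterable, Optional, Tuple

# Captures the portfolio_pid query parameter up to the next "&".
PORTFOLIO_PID = re.compile(r"portfolio_pid=([^&]+)")


def pick_identifier(
    entity_id: Optional[str], urls_943d: Iterable[str]
) -> Optional[Tuple[str, str]]:
    """Return (kind, value) for the identifier used in the SOA URL."""
    # 1. Primary: Entity ID from 941$e, if present and not empty.
    if entity_id and entity_id.strip():
        return ("entity_id", entity_id.strip())
    # 2./3. Fallback: first portfolio ID found in any 943$d URL.
    for url in urls_943d:
        match = PORTFOLIO_PID.search(url)
        if match:
            return ("portfolio_id", match.group(1))
    return None


example = (
    "https://na05-psb.alma.exlibrisgroup.com/view/uresolver/01DUKE_INST/"
    "openurl?portfolio_pid=53896265270008501&Force_direct=true"
)
print(pick_identifier(None, [example]))
# → ('portfolio_id', '53896265270008501')
```

Returning the identifier kind alongside the value makes it straightforward to add the portfolio_id field to the output only when the fallback was used.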


MARC 944 Field - Collection Access

Purpose: Collection-level electronic access
Tag: 944
Priority: Secondary (only if no 943 fields exist)

Subfields:

  • $b: Collection identifier for SOA URL construction

Processing:

  • Creates URL: SOA_BASE_URL + collection_id

  • Only processed when no 943 fields are present


MARC 856 Field - Electronic Location and Access (Legacy)

Purpose: Standard MARC field for electronic resources
Tag: 856
Priority: Lowest (only if no 943 or 944 fields exist)

Subfields:

  • $u: Primary URL/URI

  • $a: Host name (fallback URL source)

  • $y: Link text/description

  • $3: Materials specified/notes


URL Type Logic (Shared Processing):

The system assigns each URL a type based on the URL itself, its link text, and its $3 note:

fulltext type:

  • Default for most electronic resources

  • Triggers proxy application for restricted content

  • Used for journal articles, e-books, databases

findingaid type:

  • URLs containing Duke finding aid patterns:

    • library.duke.edu/rubenstein/findingaids

    • scriptorium.lib.duke.edu/dynaweb/findaids

    • library.duke.edu/digitalcollections/rbmscl

  • Link text containing "finding aid" or "collection guide"

  • Subfield $3 containing "finding aid" or "collection guide"

supplement type:

  • Supplementary materials and related resources

  • Additional content beyond primary resource

toc (Table of Contents) type:

  • Links to table of contents

  • Preview or summary content
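
The finding aid checks can be sketched as follows. This is an illustration only: the function name url_type is hypothetical, matching is assumed to be case-insensitive, and the rules that select the supplement and toc types are not specified here, so the sketch only distinguishes findingaid from the fulltext default.

```python
# URL fragments that mark a Duke finding aid, as listed above.
FINDING_AID_URL_PATTERNS = (
    "library.duke.edu/rubenstein/findingaids",
    "scriptorium.lib.duke.edu/dynaweb/findaids",
    "library.duke.edu/digitalcollections/rbmscl",
)
# Phrases checked in the link text ($y) and the note ($3).
FINDING_AID_PHRASES = ("finding aid", "collection guide")


def url_type(url: str, link_text: str = "", note: str = "") -> str:
    """Return "findingaid" when any finding aid rule matches, else "fulltext"."""
    lowered_url = url.lower()
    if any(pattern in lowered_url for pattern in FINDING_AID_URL_PATTERNS):
        return "findingaid"
    for text in (link_text, note):
        lowered = text.lower()
        if any(phrase in lowered for phrase in FINDING_AID_PHRASES):
            return "findingaid"
    # Assumption: supplement and toc detection would be checked before this.
    return "fulltext"
```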

Text Processing:

  • Link text: Assembled from $y subfields

  • Filtering: Removes "get it@duke" text variants

  • Notes: Extracted from $3 subfields
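
A rough sketch of the link text assembly, with several assumptions stated up front: the exact "get it@duke" variants are not listed here, so the pattern below (case-insensitive, optional whitespace around "it" and "@") is a guess; joining $y values with a single space is also assumed; and the function name build_link_text is hypothetical.

```python
import re
from typing import Iterable

# Assumed pattern for "get it@duke" variants such as "Get it @ Duke".
GET_IT_AT_DUKE = re.compile(r"get\s*it\s*@\s*duke", re.IGNORECASE)


def build_link_text(y_subfields: Iterable[str]) -> str:
    """Join $y values after removing "get it@duke" text and empty leftovers."""
    cleaned = []
    for value in y_subfields:
        text = GET_IT_AT_DUKE.sub("", value).strip()
        if text:
            cleaned.append(text)
    return " ".join(cleaned)
```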


Processing Hierarchy

  1. MARC 943 (highest priority) - Alma electronic resources

  2. MARC 944 (if no 943) - Collections

  3. MARC 856 (if no 943/944) - Legacy/manual entries
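
The hierarchy amounts to a first-match lookup over the three tags. The sketch below assumes a record has already been grouped into a dictionary of tag to list of fields, and that the mere presence of a field is what counts; both the data shape and the name select_url_source are hypothetical.

```python
from typing import Dict, List, Optional

# Tags in the priority order given above.
SOURCE_PRIORITY = ("943", "944", "856")


def select_url_source(fields_by_tag: Dict[str, List[dict]]) -> Optional[str]:
    """Return the highest-priority tag that has at least one field, else None."""
    for tag in SOURCE_PRIORITY:
        if fields_by_tag.get(tag):
            return tag
    return None
```

A record carrying both 944 and 856 fields would therefore be processed from its 944 fields only.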

Special Processing Features

Portfolio ID Extraction:

When Entity ID (941$e) is missing, the system extracts portfolio IDs from 943$d URLs:

  • Pattern: Searches for portfolio_pid=([^&]+) in Alma resolver URLs

  • Usage: Creates SOA URLs using extracted portfolio ID

  • Metadata: Adds portfolio_id field to URL output (not added when using Entity ID)

Proxy Handling:

  • Restricted resources: Automatic Duke proxy application

  • Shared records: Uses proxy placeholder {+proxyPrefix}

  • Legacy proxy URLs: Updates old Duke proxy prefixes

SOA URL Construction:

  • Base URL from soa_url_conf.yml

  • Identifier priority: Entity ID > Portfolio ID > Collection ID

  • Portfolio ID extraction from Alma resolver URLs

Restriction Detection:

  • Proxied URLs: Containing proxy.lib.duke.edu

  • SSO Resources: Domains from unproxied_restricted.yml

  • Default: A URL is treated as unrestricted unless one of the checks above matches
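
A sketch of these checks under stated assumptions: the SSO domains are passed in as a plain set because the layout of unproxied_restricted.yml is not described here, a domain is assumed to match its subdomains as well, and the function name is_restricted is hypothetical.

```python
from typing import Iterable
from urllib.parse import urlparse

PROXY_HOST = "proxy.lib.duke.edu"


def is_restricted(url: str, sso_domains: Iterable[str]) -> bool:
    """Return True when the URL looks proxied or belongs to an SSO domain."""
    # Proxied URLs: anything containing the Duke proxy host.
    if PROXY_HOST in url:
        return True
    # SSO resources: exact host match or a subdomain of a listed domain.
    host = urlparse(url).hostname or ""
    for domain in sso_domains:
        if host == domain or host.endswith("." + domain):
            return True
    # Default: unrestricted unless a check above matched.
    return False
```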