Finding records in Alma with repeating MARC fields - example process
This is for reference for future explorations like this; I wrote this up after having to identify a set of records and get the field data out of Alma, as an example of how this could be done. Not saying at all that this is the best way.
Task: identify the set of records in Alma that have multiple 856 fields, and then identify subsets of those records for testing scenarios.
The best way to identify records with multiple fields is to start with a big set of records that you can filter down using normalization rules on an Alma set. That’s because normalization rules have two defined functions for rules written in drools called recordHasDuplicateFieldsand recordHasDuplicateSubfields. See ExLibris documentation: Working with Normalization Rules
Get the list of all the records with multiple fields
This could be done multiple ways - in the example of looking for 856s, I could do it from Analytics to avoid running a long task on Alma production. However, not all fields will be in Analytics / usable in Analytics, so your mileage may vary.
To generate the source set of records that had any value in the 856, I actually used Analytics, to avoid running a very long task on Alma production. In Analytics, in the fields grouping “Bibliographic Details”, there is a field called “Electronic location and access” that contains all of the subfields of the 856. (Unfortunately, further analysis of duplicate fields is hard in Analytics because all of the field content is shoved into the one Analytics field, and with the wide variations in cataloging and text over the years, it’s not really possible to even determine which of them are actually duplciates.)
So the source set came from an Analytics report, subject area Titles, include fields MMS ID and Electronic Location, with filters
Suppressed from Discovery = No
Lifecycle = In Repository (note, this field is named differently in some subject areas, some areas it will be called bibliographic lifecycle)
Electronic Location and access is not null
Run the report and save the output - you’ll need the MMS IDs to keep going in Step 2, and then you’ll need to return to this spreadsheet in step 6.
Create the source set in Alma
From the Analytics report, I exported the list of MMS IDs, which was ~650,000, and created a set in Alma using the “From file” option. Because of the size of the file, you do need to split it into two. (You could also try doing this straight from the Analytics analysis.)
If you have two sets, use the combine set function in Alma to combine them into one big set with the ~650,000 “All Titles” records.
Create the normalization rule
You then need to create the normalization rule to identify the records with duplicate 856s. There’s documentation on normalization rules in Ex Libris’s KB and on plenty of other library sites.
rule "Find records with more than one 856"
when
recordHasDuplicateFields "856"
then
set indication."true"
endSave it as a shared rule because it will need to be shared to use in filtering a set.
Filter the source set
Go back to Manage Sets in Alma, and from your combined set created in Step 2, choose Filter Set. In the popup, choose the normalization rule you created in Step 3. Make sure the description will help people know what the set is. Then let the filter job run.
Review set after filtering is complete to make sure it’s what you expected.
The set after filtering is complete is ~20,000 records, which is much easier to manage. Export the set from the Manage Sets screen to get the set data. It will include columns for Electronic Availability; Physical Availability; Digital Availability. You can use these options to filter for records that would have a 943 (electronic availability) and/or records that would have a 941 (physical availability.) We don’t use Digital Availability currently.
At this point, you have the list of IDs with multiple 856s, but to find specific testing examples, you need the MMS ID plus the content of the 856 field. Unfortunately, the export in step 5 does not give you the Electronic availability information, so you have to go back to Analytics for that.
Copy the subset of records back into your big report from Step 1 and use Excel to match the MMS ID to the electronic access value.
Open the report you have in Step 1, where you have the big list of MMS IDs (~650,000) with their associated electronic access values.
In a second tab, copy and paste in the MMS IDs from the subset. (Make sure to use Format Cells to make the column text so Excel doesn’t screw with the numbers, before you do the copy/paste).
Then, on the second tab, use either VLOOKUP or XLOOKUP to pull in the electronic access values for the smaller ~20,000 records list.