Components of the Library Data Clean Up Project

Scope: Bibliographic data (i.e., bib, holding, item) cleanup is an essential activity to ensure data integrity and operational efficiency. This page lists all of the components of the FY22 project. Links to project component action plans lead to the Box folder for the project. This page will be updated to reflect new or updated information as the timeline advances.

Contact: Jacquie Samples, Natalie Sommerville

Units: Metadata and Discovery Strategy Department and Resource Description Department

Date created:  07/29/2021

Date finalized: 07/15/2022


Priority Definitions

Level | Priority | Priority description
P1

Major Blocker/no work around
Must be completed before or during OCLC DataSync and Postwork period

Probable deadlines prior to May 27, 2022

Broken data that means FOLIO will not work correctly if it is not cleaned up.
Project need for transition to RapidILL
Patrons may not be able to find/access materials.
Example: bad Unicode as found in error logs from Nov. 2020 Honeysuckle load
P2

Huge Hurdle/work around painful
Can wait until after OCLC DataSync, but before FOLIO implementation


Probable deadlines prior to June 30, 2022 (adjusted since FOLIO timeline shifted)

Errors in data that will be difficult or impossible to remediate in FOLIO on migration or immediately after migration. 
Example: Item material type is BK instead of SER.
Example: Batch change IPS for withdrawing items, missing items, etc.
Example: Fragile data like the LKR, like multiple bibs for 1 title.
Example: Addressing invalid (legacy) location codes when the code exists in Aleph records but not in Aleph tables (PENAP, ECCO, etc.)

P3

Data Enhancement/work around not sustainable

Probable deadline time-frame to be determined

Data change not possible in FOLIO on migration or immediately after migration, but we could limp along. Fine for now, clean up would add value.
P4

Nice to do

Probably can't do in FOLIO, but won't really mess any process up.
P5

none known

Just don't forget this type of problem exists


Project Components and Priorities

Priority | Status (green means completed) | Title/Name | Cleanup Need Description/Notes | Impacts/Outcomes | Size (S/M/L) | Stakeholder Group, or Who reported this? | Project Component Folder*

*Contact Jacquie or Natalie for access to this folder
P1

Completed

as of May 27, 2022, all local work is completed.

The only task left is to request that OCLC clear Staging. That decision needs to be made.

P1-A Full Scale OCLC Data Sync -- (previously called reclamation)
Remediate NDD holding and title data between Aleph and OCLC
Leila records (and others) do not automatically get set with our normal process
budget implications?
Then start OCLC update service (to get vernacular, serial updates, etc.)

OCLC Weekly Update Script is being redone with new logic by LSIS.
Benefits holdings data integrity; enables transitions to RapidILL and migration to FOLIO
L
Ros Raeford, Dracine
https://duke.app.box.com/folder/144608512372
P1

Completed

in conjunction with P1-A.

Manual post-work pending

P1-E 040 by language clean-up project

Most of these (15,246) had record matches in OCLC, Aleph records should be overlaid.  Field protections/merge routine is being developed (as of 2/15/2022)

Remaining unresolved titles will be cataloged in RDD or RLTS

There are full-level bibs in the system which were created for a language of cataloging other than eng (040 $b) and which should be reviewed and replaced (may require significant manual review/cataloging).
 
Matt's help is needed to create/execute separate extract of these records

Related work: Presence of 040 $b (not eng) needs to be updated in cataloging documentation and processes.

Benefits bibliographic data integrity and migration to FOLIO
L (10,179 per WMS; 18,567 per Aleph)
RDD
https://duke.app.box.com/folder/144606964413
P1

Postponed as of 6/30/2022

In-Process post-work pending


P1-B Clean up discrepancy between Gov Doc shipping list records and actual physical holdings, some titles never held

Delete Gov Doc records for print records without items.

Other "extra" records exist, suppress as needed.  (i.e.: malformed holding codes)

Benefits holdings data integrity; enables transitions to RapidILL and migration to FOLIO
L
Jacquie, Natalie, Jianying
https://duke.app.box.com/folder/144607421409
P1

Completed

as of Aug 30, 2021


P1-C Clean up WorldShare Ebsco ebooks on OCLC
Before Full Scale OCLC Data Sync,
Holdings were "set" based on WorldShare configurations, but we do not use those records, so all are erroneous
Scope needs to be reviewed.

Benefits holdings data integrity; enables transitions to RapidILL and migration to FOLIO
L
Jacquie, Leeda
https://duke.app.box.com/folder/144607704100
P1

Completed

as of May 27, 2022

P1-D Aleph 935 - reunite related titles (935)


As of June 30, decision made to not migrate remaining 935 data

Lok Sabha debates is an example
Delete extraneous 935s, then cause titles to be reunited

Have data-set

Benefits holdings data integrity, shared print retention and migration to FOLIO
L
MMIT
https://duke.app.box.com/folder/144606799982
P1

Completed

as of May 27, 2022

In-Process in conjunction with P1-A, post-work pending


On-the-fly cleanup portion of the component is complete.

P1-F Delete On the Fly and/or Circ-Created records with no "real" barcode

Can't delete records in FOLIO
Most have STA=circ-created
Some have "Temporary On-the-Fly" in 245 or 500


Matt's help is needed to ID the rest

Benefits holdings data integrity; enables transitions to RapidILL and migration to FOLIO
S (1190)
MMIT
https://duke.app.box.com/folder/144607402184
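The two markers noted for P1-F above (STA=circ-created, and "Temporary On-the-Fly" in a 245 or 500) could be combined into a simple screening test. This is a minimal sketch only: it assumes each record has already been exported into a plain Python dictionary mapping a field tag to a list of string values, which is a hypothetical layout and not an actual Aleph export format. The function name is illustrative, and the check for a "real" barcode is deliberately left out because the page does not define it.

```python
def looks_on_the_fly(record):
    """Return True if a record carries either marker named in the P1-F notes.

    `record` is a hypothetical dict of tag -> list of field strings,
    e.g. {"STA": ["circ-created"], "245": ["Temporary On-the-Fly record"]}.
    """
    # Marker 1: an STA field equal to "circ-created"
    # (case-insensitive comparison is an assumption).
    for value in record.get("STA", []):
        if value.strip().lower() == "circ-created":
            return True
    # Marker 2: the phrase "Temporary On-the-Fly" anywhere in a 245 or 500.
    for tag in ("245", "500"):
        for value in record.get(tag, []):
            if "temporary on-the-fly" in value.lower():
                return True
    return False
```

Records that match neither marker would still need to be identified some other way, as the note about needing help to ID the rest indicates.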
P1

Completed

as of Oct. 12, 2021

P1-G Gov Docs Resizing Batch withdrawals, Schedules A and D completed as of 7 Oct. 2021.

review needed in RDD due to data inconsistencies

Have data-set, needs manual refinement

Benefits holdings data integrity; enables transitions to RapidILL and migration to FOLIO
Ros Raeford
https://duke.app.box.com/folder/144607450317
P1

Completed, provisional*

Data cleanup complete as of 24 Jan 2022


P1-H Items with sublibrary CHEM or VESIC (AskTech #5167)

Follow up on all BES holdings (including Herbaria), these may come back to the libraries, but decision was delayed in 2020.

*Decision on remaining collection is needed before transfer and final data cleanup possible, but planning suggests this will be done by January 2023, so cleanup aspect is done for now.

Ghost holdings in bound-withs.  Are these old reserves?  How do we ID and remove them? 

wsl=VESIC (17 records)
wsl=CHEM (0 records)

Leeda has data and notes to share

Benefits holdings data integrity; enables transitions to RapidILL and migration to FOLIO
Erin Nettifee
https://duke.app.box.com/folder/144608791942
P1

Completed

as of Oct. 12, 2021

P1-I Lost materials

Several sets of holdings need to be set to Lost. This includes IPS=LM, holding codes for "Off-Site Stacks," etc. When the only holding we had is Lost, holdings need to be removed from OCLC.

LM originally meant "missing from LC Reclass", but DUL Circ and Law put it to use for other reasons; LM now means the same across DUL and professional school libraries.

DUL 'LM's need to be changed to "LO" when material is not currently circulating.
Postwork: Document new workflow in TSPD, link to ADS page for additional info on their part of the workflow

Benefits holdings data integrity; enables transitions to RapidILL and migration to FOLIO
M
Ros, Jacquie, Andrea Loigman
https://duke.app.box.com/folder/144609024877
P1

Completed

as of Sept. 1, 2021

P1-J Clean up holdings with PK3 "locations and call numbers vary"
wcl=PK3*
Suppress holdings records? Delete if no connected order? What happens with any "real" items?
RLTS has some similar issues with some locations.
Ditch OCLC holdings for these titles.
Benefits holdings data integrity; enables transitions to RapidILL and migration to FOLIO
M (19,473)
Natalie
https://duke.app.box.com/folder/144608066317
P1

Completed

as of July 15, 2022

P1-K Microfiche described on print holding records (need to transfer to microfiche collection)

cross check item material type (microformat) and "general" stacks in DOC sublibrary. (~34,300 Suppressed on 10/12)

consider maps with same inconsistency

Benefits bibliographic data integrity and migration to FOLIO
L
Holly Chang, CRA
https://duke.app.box.com/folder/144607882593
P1

Completed

as of Feb. 22, 2022

 – no bearing on Data Sync

P1-L HOL 863 summary statement in multi-part mono holdings: LDR Pos 06=v

move to 866 statements

Make sure that this is a retired practice

Benefits holdings data integrity; enables transitions to RapidILL and migration to FOLIO
L
Jacquie
https://duke.app.box.com/folder/144608390521
P1

Completed

as of May 2021.

P1-M Lilly Locked Stacks materials without "D" barcodes
Review each tab and ensure that IPS and other data are correct.
Corrects data errors and simplifies migration to FOLIO
S
Ros Raeford
https://duke.app.box.com/folder/144607710251
P1

In-Process

as of Oct. 1, 2021

P1-N Update bibliographic records for Gale electronic resources


Records provided by Gale; not available from 360. Original record loads described print, not electronic resources. New file has been supplied. We need to check to see if OCLC holdings are set. Attempt to match on 901 NCCO number.

Approximately 300,000 records were suppressed as of Oct. 11, 2021

Make sure to think about RL records where Gale URLs may be added in error.

Benefits holdings data integrity; enables transitions to RapidILL and migration to FOLIO
L (approx. 270,000)
CRA, RDD
https://duke.app.box.com/folder/144605595544
P2

Completed

as of January 2022

P2-A Statistical Analysis needed to enable logical Gov Docs resizing projects
Natalie will set up meeting with Jacquie & Jianying to tease out what strategy should be pursued
Benefits holdings data integrity
Natalie
https://duke.app.box.com/folder/145243891245
P2

Completed

as of July 8, 2022

P2-B DDC to LC long-tail

see planning request from Ros
both IPS LC and GW mean "In-Process LC reclass project" 
 -- Flip IPS LC to "blank" when 05099 exists
 -- Flip IPS GW to "blank" when 05099 exists
-- Manual intervention to apply LC classification when no 05099

Benefits item data integrity and access to these materials to Duke community.
S
Ros Raeford
https://duke.app.box.com/folder/145243896752
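The three P2-B rules above (flip IPS LC or GW to blank when 05099 exists, otherwise route to manual classification) can be written as a small decision function. This is a sketch under stated assumptions: the function name is hypothetical, a blank IPS is represented as an empty string, and how "05099 exists" is determined for an item is left to the caller, since the page does not spell that out.

```python
def next_ips(ips, has_05099):
    """Apply the P2-B rules to one item.

    Returns a tuple (new_ips, needs_manual_review).
    `has_05099` is supplied by the caller; detecting it is out of scope here.
    """
    if ips not in ("LC", "GW"):
        # Other item process statuses are not part of this cleanup.
        return ips, False
    if has_05099:
        # Rule 1 and 2: flip IPS LC / GW to blank when 05099 exists.
        return "", False
    # Rule 3: no 05099, so LC classification must be applied manually.
    return ips, True
```

For example, `next_ips("GW", False)` leaves the status unchanged and flags the item for manual review.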
P2

Completed

as of March 11, 2022

P2-C Crash records delete after analysis
Removing extraneous records simplifies FOLIO migration.
M (5)
MMIT
https://duke.app.box.com/folder/145244118929
P2

Completed

as of June 3, 2022

P2-D Bib 999 tags
Some Aleph 999 tags will migrate to FOLIO 996, some need to be deleted for data integrity
Facilitates migration to FOLIO
MMIT
https://duke.app.box.com/folder/145240459334
P2

Completed

as of June 4, 2021

P2-E Aleph to FOLIO error logs
There are many sub-projects included here, which will be delegated to the Data WG,  MMIT, and LSIS when fix details are known

Many lines in the log files. Need to be broken out into workable projects

Dennis has created a Postgres DB to enable error log review and planning.

Corrects data errors and simplifies migration to FOLIO
MMIT, Data WG, LSIS
https://duke.app.box.com/folder/145243517027
P2

Completed

as of February 28, 2022

P2-F Update MFHD summary statements for multi-part monographs

These data are not valid according to the ANSI/NISO Z39.71 standard and should be updated both to confirm holding accuracy and to comply with this standard.

Benefits holdings data integrity; enables transitions to RapidILL and migration to FOLIO

https://duke.app.box.com/folder/145243021725
P2

Completed

as of June 3, 2022

Now considered normal workflow

P2-G Clean up unused/unnecessary locations
For example, DOSS holding codes, (and AskTech #4181)

Reduce the number of locations to create in FOLIO to facilitate migration
Make sure to update/review the LSC Accession list

example: Remove Quarto locations:

We no longer use/apply locations that include a Q because we haven't used Quarto since at least 2012. RL may still use Quarto accurately (20,550)
Benefits holdings data integrity, shared print retention and migration to FOLIO
L
Erin Nettifee, Natalie
https://duke.app.box.com/folder/145240601644
P2

In-Process
December 2020;

Postponed

Change in collection decision at DKU means that this component is on long-term hiatus.


P2-H DKU Gift Loads (AskTech #5398)
Wuhan University gifted DKU library 50,000 titles with CNMARC records; June asked for help in transforming them into MARC21
-Work with Terry Reese to update MarcEdit (done)
-Set up scripts to embed holdings and item data into MARC (done)
-load records into KUN51 library, creating bib, hol, and item data
-hand back to June the clean-up for multi-part monographs (summary statements and additional items)
Benefits DKU and MarcEdit library community; priority set based on timeline of request
L
June Ma
https://duke.app.box.com/folder/145244368335
P2

Completed

As of 3/7/2022 all item records are deleted

P2-I Material Type ITNET (AskTech #5983)
It looks like we can delete all of these items as well as the linked holdings records. Which means that we "should" be able to retire that location as well.
Removing extraneous records simplifies FOLIO migration.
S
Erin Nettifee
https://duke.app.box.com/folder/145242877388
P2

Completed

as of June 3, 2022

P2-J CHART material type review (AskTech #5981)
Mapping CHART to Visual Materials during FOLIO Migration. Leeda and I worked on collapsing material types in 2020 and I want to make sure that consistent decisions are made. I have already seen that some of these materials are not posters or charts, but microfilm/microfiche. So, some analysis and cleanup is needed.
Corrects data errors and simplifies migration to FOLIO
M
Erin Nettifee
https://duke.app.box.com/folder/145243525717
P2

Completed

as of July 28, 2021

P2-K Move items in item status 23 (Math bound periodicals) to item status 06 (AskTech #6258)
There was never any reason to have a separate bound periodicals item status for Math materials.
Corrects data errors and simplifies migration to FOLIO
S
Erin Nettifee
https://duke.app.box.com/folder/145240526069
P2

Completed

as of July 31, 2020

P2-L Items with blank item statuses (AskTech #5108)
"This is from the material type work. I’ve attached SQL output for items with no item status field to be reviewed and corrected."
Corrects data errors and simplifies migration to FOLIO
S
Erin Nettifee
https://duke.app.box.com/folder/145244191937
P2

Completed

as of 3/15/2022

P2-M National Academy Press print records with eholdings PENAP
Delete 856s from records and delete PENAP holdings
Corrects mixed print/e records and error log (FOLIO)
M (1,351)
Pat, MADS
https://duke.app.box.com/folder/145996912499
P2

Postponed

P2-N IPS cleanup discovered by /wiki/spaces/LIB/pages/34898762 working group meetings (AskTech #6782, #6783, #6785, #6787) as of 7/23/2021.

Several IPS codes were identified for deprecation and need to be removed from the Aleph tables after data has been cleaned up

Corrects data errors and simplifies migration to FOLIO
M
Erin Nettifee
https://duke.app.box.com/folder/151780828202
P3

Postponed

P3-A YBP MARC to 360MARC
Moving the source for ebook MARC records from YBP to 360MARC will both reduce our expenditure for those records and improve the process for staff.
Facilitates workflow efficiency and supports a single e-resource metadata flow
M
CRA, RDD, MADS
https://duke.app.box.com/folder/145243616075
P3

Completed

as of March 16, 2022

P3-B Item Status 04 (AskTech #4772)

some overlap with P2-I

Virtual Materials -- need to update IPS or remove items or entire bib, depending. Spreadsheet in MADS Dept. L: drive
Corrects data errors and simplifies migration to FOLIO
L (15,000)
Erin Nettifee
https://duke.app.box.com/folder/145244230655
P3

Completed

as of June 3, 2022

P3-C Batch change remaining serials with BK in item type to SER
Based on ARC reports
Corrects data errors and simplifies migration to FOLIO
L
Jacquie
https://duke.app.box.com/folder/145244473682
P3

Completed 

as of October 14, 2021

P3-D Bib 940 tags


Get sysIDs, run "no book report"; those without real barcodes should be suppressed from discovery.
Corrects data errors and simplifies migration to FOLIO
MMIT
https://duke.app.box.com/folder/145243584959
P3

Completed

as of June 29, 2022

P3-E Bib 952 tags
"Duke Items Gap" -- needs metadata evaluation
Run "no book report" and analyze to ensure that each has HOL and ITM data
Corrects data errors and simplifies migration to FOLIO
M (15,329)
MMIT
https://duke.app.box.com/folder/145244535243
P3

Postponed

Rerun extract and determine how many are left post datasync.

P3-F 069, 016, & 040 $aNLM system number weirdness
Identify records and overlay with new OCLC records where data is already fixed
Corrects data errors and simplifies migration to FOLIO
S
MMIT
https://duke.app.box.com/folder/145244429107
P3

Postponed

P3-G 035 number begins with T

Action plan written, cleanup needed

Review "035 starts with T" to find out if these records are needed. Many also have 'real' OCLC numbers in a second 035; those without real OCLC numbers may need to be deleted, as they were created pre-DRA as temporary records.
Corrects data errors and simplifies migration to FOLIO
S (1,656)
Jacquie, Natalie
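The P3-G review described above amounts to a three-way triage of each record's 035 values. The following is a rough sketch, not the action plan itself. It assumes the 035 values have already been pulled out as plain strings, that a temporary number is any value starting with "T", and that a "real" OCLC number has the common `(OCoLC)` prefix followed by digits; the function name and the result labels are hypothetical.

```python
import re

# Assumption: a "real" OCLC number looks like "(OCoLC)" plus digits,
# optionally with an ocm/ocn/on prefix before the digits.
OCLC_RE = re.compile(r"^\(OCoLC\)\s*(?:ocm|ocn|on)?\d+$")


def triage_t_record(values_035):
    """Classify one record by its list of 035 values.

    Returns "not-in-scope", "has-oclc-number", or "review-for-deletion".
    """
    cleaned = [v.strip() for v in values_035]
    # Only records with an 035 beginning with T belong to this component.
    if not any(v.startswith("T") for v in cleaned):
        return "not-in-scope"
    # A second 035 with a real OCLC number means the record can be kept.
    if any(OCLC_RE.match(v) for v in cleaned):
        return "has-oclc-number"
    # No real OCLC number: candidate temporary record, needs human review.
    return "review-for-deletion"
```

The last category is named for review rather than deletion because the page only says these records "may" need to be deleted.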
P4

Postponed

P4-A Delete HOL 866|z notes "not currently received"
Batch updates not possible in FOLIO
Benefits holdings data integrity, and simplifies data for FOLIO migration
M (1,269)
MMIT
https://duke.app.box.com/folder/145244440118
P4

Completed

as of December, 2021

P4-B Clean up copy numbers ( c.1 ) in "Item Description"

Send requirements to Jeff F. so that this can be done during migration

Copy numbers are also held in copy number field; confirm how spine label printing might work in FOLIO. The Description field does not have a natural home in the migration, so may be put in notes or left behind in migration.


RL wants to retain information in the Item description field, including copy numbers.

Corrects data repetition and simplifies migration to FOLIO
MADS, LSIS
https://duke.app.box.com/folder/145244660327
P4

Postponed

P4-C Remove MARC 590 *cataloger initials

Simply delete these fields (but not all 590s).  Consider that this set will be smaller post-OCLC reclamation sync.

RLTS has no concerns on this.

Corrects data errors and simplifies migration to FOLIO
M (34,628)
MMIT
https://duke.app.box.com/folder/145244113387
P4

Completed

as of October 2021

P4-D Delete LEILA test record set

Deprecated after analysis revealed not actual data set

(982 plus 910=LEILA)

will not impact data sync due to lack of OCLC number

Corrects data errors and simplifies migration to FOLIO
S (146)
MMIT
https://duke.app.box.com/folder/145243773408
P5

Postponed

P5-A Loading/Updating MeSH into DUK12
Fix match so that it hits 016 properly. Fix indexing table to undo previous adjustment
Storage of all types of purchased Authority Data is a best practice
L
RDD/MCL
https://duke.app.box.com/folder/145240641200
P5

Completed

as of June 24, 2022

P5-B Remove Duke Press duplicate bib records (DUKIR DUKPR)
M (2690)