File-Tracker

DRAFT – DRAFT – DRAFT – DRAFT – DRAFT

Introduction

File-Tracker is a home-grown system that watches the files in the Preservation Storage Environment (/wiki/spaces/DPPPD/pages/27197704) and reports any changes it finds.  It runs periodic fixity checks (checksums) on all files in the preservation storage volumes.

Governance:

This service was created in response to the loss of similar functionality inside a former repository tool.  

Although fixity checking is considered a core requirement for preservation storage, the current repository platforms do not perform periodic fixity checks, so we must continue to develop and maintain File-Tracker.

Mission and Scope:

File-Tracker watches the files in the Preservation Storage Environment (PSE) and reports any changes it finds.  The assumption is that such file changes (checksum changes) are likely to indicate damaged files or "bit-rot".  By catching those errors sooner, we have a better chance of creating a replacement copy from the still-good backup copies in the PSE.
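
As a rough illustration of the fixity idea described above, the following Python sketch records a checksum per file and later flags any file whose checksum differs or which has gone missing.  This is not File-Tracker's actual code (the application itself is written in Ruby on Rails).  The function names, the in-memory dictionary of recorded checksums, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import os
import tempfile


def file_checksum(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def find_changed_files(recorded):
    """Compare previously recorded checksums with freshly computed ones.

    `recorded` maps file path -> checksum captured at an earlier check
    (a hypothetical stand-in for whatever store the real system uses).
    Returns a dict of path -> "missing" or "changed" for problem files only.
    """
    problems = {}
    for path, old_sum in recorded.items():
        if not os.path.exists(path):
            problems[path] = "missing"
        elif file_checksum(path) != old_sum:
            problems[path] = "changed"
    return problems


def demo():
    """Record checksums for two files, corrupt one, and report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.tif")
        bad = os.path.join(tmp, "bad.tif")
        for path in (good, bad):
            with open(path, "wb") as handle:
                handle.write(b"original bytes")
        recorded = {path: file_checksum(path) for path in (good, bad)}

        # Simulate bit-rot by altering a single byte in one file.
        with open(bad, "wb") as handle:
            handle.write(b"original bytez")

        problems = find_changed_files(recorded)
        return {os.path.basename(p): status for p, status in problems.items()}
```

In this sketch, a file reported as "changed" or "missing" would be a candidate for restoration from a backup copy.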

Stakeholders:
  • John Pormann, Core Services
  • Giao Luong Baker, DPC
  • Matthew Farrell, RL/University Archives
  • DP3 Committee, including DPWG
Owners:
  • Functional Owner:  DPWG
  • Technical Owner:  David Chandek-Stark wrote the initial File-Tracker application and is the most knowledgeable in its operation
  • OIT owns and operates the primary and secondary storage volumes
Users:
  • DPC and RL staff directly read and write files onto primary storage, and would be the first ones to notice if a file was missing or "bad" and needed to be retrieved from secondary storage
  • DDR, RDR, and DSpace applications read and write files onto primary storage


Maintenance and Sustainability:

file-tracker-01.lib.duke.edu

Description:  This VM runs a custom Ruby-on-Rails application that presents a straightforward web UI to users.  In the background, it runs checksum jobs on all files by walking each file system (a lengthy process).  It also mounts a shared (backend) log-storage area that the /wiki/spaces/LIB/pages/34907272 write to.  This data is not integrated with other File-Tracker data, but users can at least use it to track down where and when a file error occurred.
Support:  IT-Core Services developed the app and supports and maintains it.  Source code:  https://gitlab.oit.duke.edu/dul-its/file-tracker
Community:  There is no "community" for this particular service.
Strategy:  The technology team is responsible for monitoring security patches, etc.
Sunsetting:  Given the time horizon for preservation of assets deposited with the libraries, any sunsetting effort must occur concurrently with a plan for data migration to another preservation and publication system.  Future repository platforms could implement fixity checking internally, reducing the need for File-Tracker, but all repository systems would need to handle their own fixity checking before File-Tracker could be sunset, including the Dark Archives (which are currently just a file system).
Risks:  The current File-Tracker implementation relies on periodic "walks" of the entire file tree for each of the storage volumes.  As the repository grows, the time to perform those walks increases, and we may no longer be able to walk the file system in a reasonable time frame.  Similarly, large files take considerable time to checksum, leading to apparent "hangs" in the system.
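
To make the two risks above concrete, here is a hedged Python sketch of a file-tree walk that checksums each file in fixed-size chunks and reports progress as it goes.  It is not taken from the File-Tracker code base; the function names, the 1 MiB chunk size, the `on_progress` callback, and the use of SHA-256 are assumptions for illustration only.  Chunked reading does not make a large file faster to hash, but it keeps memory use flat, and the progress callback gives a way to tell a slow checksum from a genuinely hung one.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def streamed_checksum(path, on_progress=None, chunk_size=CHUNK_SIZE):
    """Hash a file chunk by chunk so memory use stays flat for large files.

    `on_progress`, if given, is called with the total bytes hashed so far,
    letting a caller show that a long-running checksum is still advancing.
    """
    digest = hashlib.sha256()
    bytes_done = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            bytes_done += len(chunk)
            if on_progress is not None:
                on_progress(bytes_done)
    return digest.hexdigest()


def walk_checksums(root):
    """Yield (relative_path, checksum) for every file under `root`.

    The walk visits the whole tree, so its run time grows with the
    number and size of files, which is the scaling risk described above.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            yield os.path.relpath(full_path, root), streamed_checksum(full_path)
```

Because `walk_checksums` is a generator, a caller could process results as they arrive rather than waiting for the full walk to finish.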
Issue Submission and Escalation:


Review:

Annual:

The stakeholder team will meet annually with the technology team to:

  • Review the existing service description, stakeholder list, and support structure,
  • Close out any issues that may no longer be relevant,
  • Estimate any project efforts or needs for the coming year,
  • Review and adjust the list of users with access to the system, including both privileged and non-privileged access control groups, in Grouper/Group Manager as well as programmatic access.

This activity falls in the November-December time frame.  Any actionable work will be documented and reviewed by DST-LT and DP3.


Servers/Service Details:

file-tracker-01.oit.duke.edu    main VM running the file-tracker code
file-tracker.oit.duke.edu       DNS entry