Verifying and Validation
Verify HathiTrust Package Completeness
This workflow takes a directory of HathiTrust packages as its input. It evaluates each subfolder as a HathiTrust package and verifies three things: that the package is structurally complete (it contains correctly named marc.xml, meta.yml, and checksum.md5 files); that its page files (image files, OCR, and optional OCR XML) are formatted as required (named according to HathiTrust’s convention, with an equal number of each); and that its XML, YML, and TIFF or JP2 files are well-formed and valid. This workflow provides console feedback but does not write new files as output.
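The structural part of this check can be sketched in a few lines of Python. This is only an illustration, not the workflow's actual implementation: the function name `find_package_problems` is hypothetical, and the sketch assumes that page images end in .tif or .jp2 and that OCR text files end in .txt. It does not check HathiTrust's naming convention or whether any file is well-formed.

```python
from collections import Counter
from pathlib import Path

# Required files named in the description above.
REQUIRED_FILES = ("marc.xml", "meta.yml", "checksum.md5")
# Assumption: page images use one of these extensions.
IMAGE_EXTENSIONS = {".tif", ".jp2"}


def find_package_problems(package_dir: Path) -> list[str]:
    """Return human-readable problems found in one package folder."""
    problems = []

    # Structural completeness: each required file must be present.
    for name in REQUIRED_FILES:
        if not (package_dir / name).is_file():
            problems.append(f"missing required file: {name}")

    # Page files: count images and OCR text files (assumed to be .txt).
    counts = Counter()
    for path in package_dir.iterdir():
        suffix = path.suffix.lower()
        if suffix in IMAGE_EXTENSIONS:
            counts["image"] += 1
        elif suffix == ".txt":
            counts["ocr"] += 1

    if counts["image"] != counts["ocr"]:
        problems.append(
            f"{counts['image']} image files but {counts['ocr']} OCR text files"
        )
    return problems
```

Calling `find_package_problems(Path("some_package"))` returns an empty list when nothing is wrong, which makes it easy to loop over every subfolder of the input directory and print only the packages that need attention.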
Validate Tiff Image Metadata for HathiTrust
Validates the metadata embedded in TIFF files. The tool checks that the technical metadata for the images in a directory includes x and y resolution, bit depth, and color space. It also verifies that values exist for address, city, state, zip code, country, and phone number, ensuring the provenance of the file.
The input is a path to a directory that contains a subdirectory holding a series of TIFF files.
Uses Exiv2 to read the image metadata.
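Once the metadata has been read, the presence check described above amounts to looking for missing or blank fields. The sketch below assumes the metadata has already been loaded into a plain dictionary; the field names and the function `missing_metadata_fields` are hypothetical labels chosen for illustration, not the keys Exiv2 or the tool actually uses.

```python
# Hypothetical field labels; real Exiv2 keys are named differently.
REQUIRED_TECHNICAL = ("x_resolution", "y_resolution", "bit_depth", "color_space")
REQUIRED_CONTACT = (
    "address", "city", "state", "zip_code", "country", "phone_number",
)


def missing_metadata_fields(metadata: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank in `metadata`."""
    missing = []
    for field in REQUIRED_TECHNICAL + REQUIRED_CONTACT:
        value = metadata.get(field)
        # Treat a missing key and a whitespace-only value the same way.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

An empty result means every required value is present. The sketch only tests for presence; it makes no claim about which resolution, bit depth, or color space values are acceptable.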
Validate Metadata
Validates the metadata embedded in JPEG 2000 files. The tool checks that the technical metadata for the images in a directory includes x and y resolution, bit depth, and color space. It also verifies that values exist for address, city, state, zip code, country, and phone number, ensuring the provenance of the file.
The input is a path to a directory that contains a subdirectory holding a series of JP2 files.
Uses Exiv2 to read the image metadata.
Verify Checksum Batch [Multiple]
Verifies the checksum values listed in checksum batch files and reports errors. The tool checks that every entry in each checksum.md5 file matches the hash value of the actual file, and reports discrepancies in the Speedwagon console.
The input is a path whose subdirectories each contain a text file listing multiple files and their MD5 values. The listed files are expected to be siblings of the checksum file.
Verify Checksum Batch [Single]
Verifies the checksum values listed in a checksum batch file and reports errors. The tool checks that every entry in the checksum.md5 file matches the hash value of the actual file, and reports discrepancies in the Speedwagon console.
The input is a text file listing multiple files and their MD5 values. The listed files are expected to be siblings of the checksum file.
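As a rough illustration of what verifying one checksum file involves, the sketch below uses Python's standard hashlib module. It assumes each line holds an MD5 hex digest followed by whitespace and a file name, as md5sum-style tools produce; the actual layout of a given checksum.md5 file may differ, and the function names here are hypothetical rather than part of Speedwagon.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images are not loaded whole."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum_file(checksum_file: Path) -> list[str]:
    """Return one message per entry that is malformed, missing, or mismatched."""
    errors = []
    for line in checksum_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            errors.append(f"malformed line: {line!r}")
            continue
        expected, name = parts
        # md5sum marks binary mode with a leading '*'; drop it if present.
        name = name.lstrip("*").strip()
        # Listed files are expected to sit next to the checksum file.
        target = checksum_file.parent / name
        if not target.is_file():
            errors.append(f"{name}: file not found")
        elif md5_of_file(target) != expected.lower():
            errors.append(f"{name}: checksum mismatch")
    return errors
```

For the batch variant with multiple checksum files, the same function could be applied to each checksum file found under the input path, collecting the returned messages for the console report.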