Tools

To help you produce good data and to make it easy to check or quantify what you produce, we offer a set of tools, with tutorials on using each one either standalone or as part of a continuous integration workflow.

HUMGenerator

HUMGenerator (HTR-United Metadata Generator) generates metadata for your corpora, specifically metrics. It is mainly used to generate the volume key of the catalog schema. It can also produce a character count table, which helps users choose new kinds of ground truth that cover more characters.
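
To give an idea of what a character count table contains, here is a minimal Python sketch. It is not HUMGenerator's implementation: the function names `character_table` and `format_table` and the sample lines are hypothetical, and skipping whitespace is an assumption made for the example.

```python
from collections import Counter
import unicodedata


def character_table(lines):
    """Count every non-whitespace character found in the transcription lines."""
    counts = Counter()
    for line in lines:
        # Whitespace is skipped here; this is an assumption of the sketch.
        counts.update(ch for ch in line if not ch.isspace())
    return counts


def format_table(counts):
    """Render the counts as tab-separated rows: code point, character, name, count."""
    rows = []
    for ch, n in counts.most_common():
        name = unicodedata.name(ch, "UNKNOWN")
        rows.append(f"U+{ord(ch):04X}\t{ch}\t{name}\t{n}")
    return "\n".join(rows)


# Hypothetical transcription lines, for illustration only.
sample = ["ſanctus dominus", "dominus deus"]
counts = character_table(sample)
print(format_table(counts))
```

A table like this makes rare characters (here the long s) easy to spot, which is the kind of information that can guide the choice of new ground truth.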

See demo


HTRUC

HTRUC checks that an HTR-United catalog file is parsable and HTR-United compliant. It also provides tools to parse, augment and compile statistics about one or more catalog files. This is what allows the HTR-United central repository to build a general catalog for everyone's use (see the main catalog). HTRUC can also be used together with HUMGenerator to update a catalog file with new volumes.
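
The idea of updating a catalog record with new volumes can be sketched in plain Python. This is only an illustration, not HTRUC's API: the function `update_volume` is hypothetical, and the record layout (a `volume` list of metric/count pairs) is an assumption made for the example, so check the catalog schema for the actual structure.

```python
def update_volume(record, new_metrics):
    """Return a copy of the record whose 'volume' list is merged with new_metrics.

    Assumed layout (not confirmed by the schema): 'volume' is a list of
    {"metric": <name>, "count": <int>} entries.
    """
    merged = {entry["metric"]: entry["count"] for entry in record.get("volume", [])}
    merged.update(new_metrics)  # newly computed metrics replace older values
    updated = dict(record)      # shallow copy, the original record is left untouched
    updated["volume"] = [
        {"metric": metric, "count": count} for metric, count in sorted(merged.items())
    ]
    return updated


# Hypothetical catalog record and freshly computed metrics.
record = {"title": "Example corpus", "volume": [{"metric": "lines", "count": 100}]}
updated = update_volume(record, {"lines": 250, "characters": 9000})
print(updated["volume"])
```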


HTRVX

HTRVX focuses on controlling the quality of the XML. It provides several quality-control options: schema validation (checking that your ALTO or PAGE files are valid), detection of empty lines and of regions without children, and Segmonto compatibility for segmentation (see the documentation on the Segmonto controlled vocabulary).
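
The empty-line and childless-region checks can be illustrated with a short Python sketch based on the standard library. It is not HTRVX's code and it does not perform schema validation: the function `check_alto` is hypothetical, and the sample document is a simplified ALTO fragment written for this example.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so the check works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def check_alto(xml_text):
    """Return the IDs of lines without text and of regions without lines."""
    root = ET.fromstring(xml_text)
    empty_lines, empty_regions = [], []
    for el in root.iter():
        name = local(el.tag)
        if name == "TextBlock":
            # A region is reported when it holds no TextLine child at all.
            if not any(local(child.tag) == "TextLine" for child in el):
                empty_regions.append(el.get("ID"))
        elif name == "TextLine":
            # A line is reported when its String children carry no content.
            text = "".join(
                child.get("CONTENT", "") for child in el if local(child.tag) == "String"
            )
            if not text.strip():
                empty_lines.append(el.get("ID"))
    return empty_lines, empty_regions


# Simplified ALTO fragment, for illustration only.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="r1">
      <TextLine ID="l1"><String CONTENT="Incipit liber"/></TextLine>
      <TextLine ID="l2"><String CONTENT=""/></TextLine>
    </TextBlock>
    <TextBlock ID="r2"/>
  </PrintSpace></Page></Layout>
</alto>"""

empty_lines, empty_regions = check_alto(SAMPLE)
print(empty_lines, empty_regions)
```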

See demo

ChocoMufin

ChocoMufin focuses on the characters used in the ground truth. It can give you an overview of the characters used, control which ones are allowed (to avoid having two characters for the same purpose, which is common with historical documents), or, using a conversion table, normalize a whole set of ground truth.
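
As a rough illustration of the control and normalization steps, the following Python sketch applies a conversion table to a transcription and lists the characters that are not covered. It is not ChocoMufin's implementation or table format: the functions `normalize` and `unknown_characters`, the in-memory table, and the sample text are all hypothetical.

```python
def normalize(text, table):
    """Replace each character found in the conversion table, keep the others."""
    return "".join(table.get(ch, ch) for ch in text)


def unknown_characters(text, allowed):
    """List the non-whitespace characters that are not in the allowed set."""
    return sorted({ch for ch in text if ch not in allowed and not ch.isspace()})


# Hypothetical conversion table: long s and r rotunda mapped to modern letters.
table = {"ſ": "s", "\ua75b": "r"}
# Assumption of the sketch: lowercase ASCII letters plus the table keys are allowed.
allowed = set("abcdefghijklmnopqrstuvwxyz") | set(table)

text = "ſanctus ꝑ"
print(normalize(text, table))
print(unknown_characters(text, allowed))
```

Reporting unknown characters before normalizing makes it possible to decide, case by case, whether a character should be added to the table or corrected in the ground truth.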