HTR-United + GitHub Actions

GitHub Actions are a way to run specific code every time you change your data. The following form generates a configuration file that runs our four tools automatically, ensuring both full compatibility with the HTR-United initiative and high-quality data. The options below let you check your data, check your cataloging file, or simply update that file automatically with the metrics of your choice!

General information about the folder structure

The repository address starts with https://github.com/ and should look like https://github.com/htr-united/cremma-medieval or https://github.com/demo-user/demo-repository.

The cataloging file should have been generated with the form on this website: it describes your whole repository with metadata, so make sure you have done that first! This file is generally named htr-united.yml and should be at the root of your GitHub repository.

This is the path to your data inside your repository. Use * as a filename wildcard and ** as a folder wildcard. You can give a single path, such as ./data/*.xml, or several, such as ./french/data/*.xml ./english/data/*.xml. If you have a structure such as ./data/a_book/some.xml and ./data/b_book/some.xml, the UNIX path is ./data/**/*.xml

In the example on the right, if TNAH-2021-Project-Correspondance-Berlioz is the root of your data repository, the path to get all XML files would be donnees/**/*.xml

[Figure: folder structure of the example repository]
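To see which files a pattern would select, the wildcard rules above can be tried locally. The following is a minimal sketch, assuming Python's standard glob module with recursive matching behaves like the UNIX expansion used by the workflow; the helper name expand_patterns and the temporary folder layout are hypothetical and only mirror the ./data/a_book/some.xml example. Note that in Python, ** also matches zero folders, so a file placed directly in ./data would be selected as well.

```python
import glob
import os
import tempfile
from pathlib import Path


def expand_patterns(root: str, patterns: str) -> list[str]:
    """Expand space-separated glob patterns relative to root.

    Hypothetical helper: returns sorted, de-duplicated POSIX-style
    paths relative to root.
    """
    matches = set()
    for pattern in patterns.split():
        # recursive=True lets "**" span any number of folders
        for hit in glob.glob(os.path.join(root, pattern), recursive=True):
            matches.add(Path(os.path.relpath(hit, root)).as_posix())
    return sorted(matches)


# Build a throwaway tree that mirrors the example in the text.
with tempfile.TemporaryDirectory() as root:
    for rel in ("data/a_book/some.xml",
                "data/b_book/some.xml",
                "data/a_book/notes.txt"):
        target = Path(root, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    found = expand_patterns(root, "./data/**/*.xml")
    flat_only = expand_patterns(root, "./data/*.xml")

print(found)
print(flat_only)
```

Here ./data/*.xml selects nothing because every XML file sits one folder deeper, which is why the ** form is needed for this layout.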

The branch name: generally main or master.


Testing the cataloging file

This uses HTRUC.

This runs a test on your htr-united.yml and makes sure it complies with the schemas. If the schema evolves, this lets you confirm that you are still compatible with it.


Report generation

This uses HTR United Metadata Generator.

This produces an in-depth overview of how many characters, zones, lines and pages you have, and also gives the number of zones or lines per type if you typed your zones and lines.
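As an illustration of what such an overview contains, here is a minimal sketch that counts files, zones, lines and characters in ALTO XML, with per-type counts. It is not the HTR United Metadata Generator's code: the function name compute_metrics, the key names, the sample document and the choice to count spaces as characters are all assumptions. It also assumes types are attached through TAGREFS pointing to OtherTag labels.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily trimmed ALTO document used only for this sketch.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Tags>
    <OtherTag ID="BT1" LABEL="MainZone"/>
    <OtherTag ID="LT1" LABEL="DefaultLine"/>
  </Tags>
  <Layout><Page><PrintSpace>
    <TextBlock ID="b1" TAGREFS="BT1">
      <TextLine ID="l1" TAGREFS="LT1"><String CONTENT="Cher ami"/></TextLine>
      <TextLine ID="l2" TAGREFS="LT1"><String CONTENT="Paris"/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip the XML namespace so any ALTO version is handled alike."""
    return tag.rsplit("}", 1)[-1]


def compute_metrics(alto_documents: list[str]) -> Counter:
    """Count files, zones, lines and characters, plus per-type totals."""
    metrics = Counter()
    for xml_text in alto_documents:
        root = ET.fromstring(xml_text)
        metrics["files"] += 1
        # Map tag identifiers to their human-readable type labels.
        labels = {el.get("ID"): el.get("LABEL")
                  for el in root.iter() if local_name(el.tag) == "OtherTag"}
        for el in root.iter():
            name = local_name(el.tag)
            if name == "TextBlock":
                metrics["zones"] += 1
                metrics["zone:" + labels.get(el.get("TAGREFS"), "untyped")] += 1
            elif name == "TextLine":
                metrics["lines"] += 1
                metrics["line:" + labels.get(el.get("TAGREFS"), "untyped")] += 1
            elif name == "String":
                # Assumption: spaces count as characters.
                metrics["characters"] += len(el.get("CONTENT", ""))
    return metrics


metrics = compute_metrics([SAMPLE_ALTO])
for key in sorted(metrics):
    print(key, metrics[key])
```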

Using the computed metrics (lines, characters, files, zones), this automatically updates the volume information of the cataloging file.
This automatically pushes new files to your repository. This does not damage your data.

With this option, badges are generated, which you can include in your README file to show how cool you are.
This automatically pushes new files to your repository. This does not damage your data.

Creates a GitHub release every time master is changed. Coupled with Zenodo, this gives you long-term preservation and a DOI (Digital Object Identifier) for your repository!


Testing the XML files (structure)

This uses HTRVX.

This will help you make sure that your XML is valid and follows some other specifications.

Segmonto is an ontology for typing Zones (sometimes called Regions) and Lines. This ensures you are compatible with it. See the publication or the Segmonto guidelines.

This will warn you about empty lines or regions without lines, without raising an error: this won't make the test fail!

This is used on top of the previous check. This will raise an error if a line has no text or if a region has no lines.
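The difference between warning and failing can be sketched as follows. This is an illustration of the idea on ALTO XML, not HTRVX's implementation: the function names find_empty and check, the raise_on_empty parameter and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: l2 has no text and b2 has no lines.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="b1">
      <TextLine ID="l1"><String CONTENT="Cher ami"/></TextLine>
      <TextLine ID="l2"><String CONTENT=""/></TextLine>
    </TextBlock>
    <TextBlock ID="b2"/>
  </PrintSpace></Page></Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def find_empty(xml_text: str) -> tuple[list[str], list[str]]:
    """Return the IDs of regions without lines and of lines without text."""
    root = ET.fromstring(xml_text)
    empty_regions, empty_lines = [], []
    for el in root.iter():
        name = local_name(el.tag)
        if name == "TextBlock":
            if not any(local_name(child.tag) == "TextLine" for child in el):
                empty_regions.append(el.get("ID"))
        elif name == "TextLine":
            text = "".join(child.get("CONTENT", "") for child in el
                           if local_name(child.tag) == "String")
            if not text.strip():
                empty_lines.append(el.get("ID"))
    return empty_regions, empty_lines


def check(xml_text: str, raise_on_empty: bool = False) -> list[str]:
    """Collect warnings; raise instead when raise_on_empty is set."""
    regions, lines = find_empty(xml_text)
    problems = [f"region {r} has no lines" for r in regions]
    problems += [f"line {l} has no text" for l in lines]
    if problems and raise_on_empty:
        raise ValueError("; ".join(problems))
    return problems


warnings_found = check(SAMPLE_ALTO)  # warn only: the test would still pass
print(warnings_found)

try:
    check(SAMPLE_ALTO, raise_on_empty=True)  # strict: the test would fail
    strict_failed = False
except ValueError:
    strict_failed = True
print(strict_failed)
```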

This just makes sure you are compatible with the XML schema. This should always be the case if you use a Graphical User Interface such as Transkribus or eScriptorium.

Character control

This uses ChocoMufin.

This will read the text of each line and summarize the characters found.

In Generate mode, the Action won't control your characters but will instead list all the characters used. In Control mode, you need to provide a file named table.csv to help ChocoMufin control your data. See the tutorial for more details.
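The two modes can be pictured with a small sketch. It does not use ChocoMufin or its table.csv format, which is not described here: the function names, the sample lines and the known set standing in for the table are all hypothetical.

```python
import unicodedata
from collections import Counter


def summarize_characters(lines: list[str]) -> Counter:
    """Generate-style pass: count every character found in the lines."""
    counts = Counter()
    for line in lines:
        counts.update(line)
    return counts


def unknown_characters(lines: list[str], known: set[str]) -> list[str]:
    """Control-style pass: list characters absent from the known set."""
    return sorted(set(summarize_characters(lines)) - set(known))


# Hypothetical transcription lines.
lines = ["Cher ami,", "je vous \u00e9cris"]

summary = summarize_characters(lines)
for char, count in sorted(summary.items()):
    print(repr(char), unicodedata.name(char, "UNKNOWN"), count)

# Stand-in for the characters a table would declare as accepted.
known = set("abcdefghijklmnopqrstuvwxyzC ,")
unknown = unknown_characters(lines, known)
print(unknown)
```

Running the summary first and reviewing its output is a reasonable way to decide which characters belong in the accepted set before switching to the stricter mode.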


Setup steps

(1) Retrieve the content, whichever way you prefer:

(2) Add it to your repository by placing it in the .github/workflows folder

(3) Add the badges to the README.MD file

![characters badge](badges/characters.svg) ![regions badge](badges/regions.svg) ![lines badge](badges/lines.svg) ![files badge](badges/files.svg) 
  1. Copy
  2. Edit the README on GitHub.