Introduction

Standardized code systems such as NACE, ISCO, or COICOP are often designed hierarchically: their codes form a tree-like structure of parent and child nodes. This allows analysis at different levels of detail. These hierarchies can potentially be exploited when classifying text into such code systems.
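
To make the tree structure concrete, the following minimal sketch derives the chain of parent codes for a single code. It assumes a NACE-like dotted numeric format in which each deeper level extends its parent's code by one digit (for example "01" > "01.1" > "01.11"); other code systems encode their levels differently. The function names `code_path` and `truncate` are hypothetical and are not taken from the project's code.

```python
def code_path(code: str) -> list[str]:
    """Return the codes from the top numeric level down to `code`.

    Assumes a NACE-like format: a two-digit top level, then one extra
    digit per level, with a dot after the first two digits.
    """
    digits = code.replace(".", "")
    path = []
    for depth in range(2, len(digits) + 1):
        prefix = digits[:depth]
        # Re-insert the dot after the two-digit top level, as in "01.1".
        path.append(prefix if depth == 2 else f"{prefix[:2]}.{prefix[2:]}")
    return path


def truncate(code: str, level: int) -> str:
    """Map a code to its ancestor at the given level (1 = coarsest)."""
    return code_path(code)[level - 1]


print(code_path("01.11"))    # ['01', '01.1', '01.11']
print(truncate("01.11", 1))  # 01
```

Truncating labels in this way is what makes analysis at a coarser level possible: every fine-grained code maps to exactly one ancestor at each higher level.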

Several attempts have been made to integrate these hierarchical structures into the modelling process. A review of the existing literature is available at: https://github.com/AIML4OS/WP10/blob/main/LiteratureReviews/literature_review_hierarchical_models.pdf

Although a fair amount of research has been conducted on the topic, the resulting hierarchical models do not seem to improve classification performance significantly.

As part of the AIML4OS project (WP 10 “Text classification”, Cluster 3), we developed our own hierarchical text classification models. This report describes the methods we introduced, presents their evaluation and comparison, and draws conclusions on hierarchical text classification.