The Catta project (CATegorisation Tools Archive) aims to create tools and collect data for use in text classification. Specifically, our goal is to make the study of text classification more transparent and systematic, so that research results are easier to compare and it becomes clearer how classification works and what affects its results.

The main page for the tool is at, and the files can be downloaded from


  • Research infrastructure project
  • Internally funded

