TermeX is a tool for automatic collocation extraction and terminology lexicon construction. Extraction is based on fourteen
different association measures applicable to n-grams up to length four. Built-in lemmatization and POS filtering enable
TermeX to better cope with the morphological complexity of natural languages.
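This README does not list the fourteen measures, so the sketch below illustrates the general idea with pointwise mutual information (PMI), one widely used association measure, on bigrams only. It is a minimal illustration under that assumption, not TermeX's implementation. The function name `bigram_pmi` is hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Score each adjacent word pair by pointwise mutual information.

    Illustrative sketch only (hypothetical helper, not TermeX code):
    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)       # number of unigram positions
    n_bi = len(tokens) - 1    # number of bigram positions
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    text = "the red wine and the red car and the white wine"
    scores = bigram_pmi(text.split())
    # Rank candidate collocations from most to least strongly associated.
    for pair in sorted(scores, key=scores.get, reverse=True):
        print(pair, round(scores[pair], 3))
```

Pairs that co-occur more often than their individual frequencies would predict receive higher scores. Extending a measure like this to trigrams and four-grams requires a generalized formula, which is not shown here.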
The main features of TermeX are:
- Extraction of collocations from UTF-8 encoded text files
- Determining lists of possible collocations using any of 14 association measures
- Processing of n-grams up to length four
- Manual selection of candidate n-grams for terminology lexica
- Viewing of concordances for extracted candidates
- Exporting lists of collocations
- Processing of multiple documents
- Support for Windows and Linux operating systems
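Concordance viewing is typically presented as keyword-in-context (KWIC) lines: each occurrence of a candidate n-gram is shown with a few tokens of left and right context. The sketch below assumes pre-tokenized text and a fixed context width; the function `concordance` and its `width` parameter are hypothetical and do not describe TermeX's interface.

```python
def concordance(tokens, phrase, width=3):
    """Return (left, match, right) context triples for each occurrence of phrase.

    Hypothetical KWIC helper for illustration; phrase is a list of tokens.
    """
    n = len(phrase)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            left = " ".join(tokens[max(0, i - width):i])
            match = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, match, right))
    return lines


if __name__ == "__main__":
    tokens = "the red wine and the red car".split()
    for left, match, right in concordance(tokens, ["the", "red"], width=2):
        print(f"{left:>20} [{match}] {right}")
```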
In addition, TermeX is designed for fast, memory-efficient processing of large corpora.
Frane Šarić, dipl. ing.
This work has been jointly supported by the Ministry of Science, Education
and Sports of the Republic of Croatia and the Government of Flanders under grants
036-1300646-1986 and KRO/009/06 (CADIAL).
Developed by TakeLab, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia, 2008.