About TermitUp

TermitUp is a tool for terminology enrichment. Given a domain-specific corpus, TermitUp performs statistical terminology extraction and post-processes the resulting term list with a series of linguistic processes and external tools, such as the Añotador, which is used to clean temporal expressions. It then queries several language resources (some of them part of the Linguistic Linked Open Data cloud) for candidate terms matching those in the term list.
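The post-processing of the extracted term list can be pictured with a minimal sketch. It assumes a toy regular expression for English temporal expressions as a stand-in for the Añotador (which TermitUp actually relies on), plus simple normalisation and de-duplication; the function name `clean_term_list` and the sample terms are hypothetical.

```python
import re

# Hypothetical stand-in for the Añotador: a toy pattern for a few English
# temporal expressions (four-digit years, month names, relative days).
TEMPORAL = re.compile(
    r"\b(\d{4}|january|february|march|april|may|june|july|august|"
    r"september|october|november|december|today|yesterday|tomorrow)\b",
    re.IGNORECASE,
)


def clean_term_list(terms):
    """Normalise, filter and de-duplicate a raw list of candidate terms."""
    cleaned = []
    seen = set()
    for term in terms:
        # Collapse internal whitespace and lowercase for comparison.
        norm = " ".join(term.split()).lower()
        # Drop empty strings, single characters and purely numeric entries.
        if len(norm) < 2 or norm.replace(" ", "").isdigit():
            continue
        # Drop candidates that contain a temporal expression.
        if TEMPORAL.search(norm):
            continue
        # Keep only the first occurrence of each normalised term.
        if norm not in seen:
            seen.add(norm)
            cleaned.append(norm)
    return cleaned


raw = ["Labour  Law", "labour law", "January 2020", "employment contract", "1998", "x"]
print(clean_term_list(raw))
```

A regex like this is only a placeholder: a dedicated temporal tagger such as the Añotador also handles ambiguous words (for example "may") and multi-word expressions that a fixed pattern cannot.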

TermitUp builds sense indicators for both the source terms and the candidate terms, and performs a Word Sense Disambiguation process (using Semantic Web Company's service) that selects the candidate concepts closest to the domain of the source terms. From the concepts matched in the external resources, TermitUp retrieves every piece of information available (translations, synonyms, definitions, usage notes and terminological relations), already disambiguated, and uses it to enrich the source term lists, creating links amongst the resources in the LLOD.
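The disambiguation step compares a sense indicator of the source term (words describing its domain) with a sense indicator of each candidate concept (words from its definition, broader terms or domain labels). TermitUp delegates this comparison to Semantic Web Company's service; the sketch below instead uses a simple word-overlap (Jaccard) score in the spirit of the Lesk algorithm, and all function names and sample data are hypothetical.

```python
import re

# Minimal stopword list for the toy example.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "or", "to", "for", "by", "with", "is"}


def sense_indicator(*texts):
    """Build a bag-of-words sense indicator from free-text fields."""
    words = set()
    for text in texts:
        for token in re.findall(r"[a-záéíóúñü]+", text.lower()):
            if token not in STOPWORDS:
                words.add(token)
    return words


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(source_indicator, candidates):
    """Return (candidate_id, score) for the candidate closest to the source domain.

    `candidates` maps a candidate identifier to the texts describing it.
    Returns (None, 0.0) when no candidate shares any word with the source.
    """
    best_id, best_score = None, 0.0
    for cid, texts in candidates.items():
        score = jaccard(source_indicator, sense_indicator(*texts))
        if score > best_score:
            best_id, best_score = cid, score
    return best_id, best_score


# Hypothetical example: the term "contract" in a labour-law corpus.
source = sense_indicator("labour law employment worker employer agreement")
candidates = {
    "contract#law": ["An agreement between an employer and a worker", "law"],
    "contract#bridge": ["The highest bid in the card game bridge", "games"],
}
result = disambiguate(source, candidates)
print(result)
```

Returning `(None, 0.0)` when nothing overlaps is one simple way of leaving a term unlinked instead of forcing a match onto an unrelated sense.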

Afterwards, TermitUp can create hierarchical relations amongst the terms in the source list and validate the synonymy relations retrieved from the external resources, by applying linguistic patterns and additional language resources. The results are then published in separate JSON-LD files, modeled in SKOS or OntoLex (at the user's choice).
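The text does not detail which linguistic patterns are applied. One common pattern for multi-word terms is head matching: a term whose head words equal a shorter term in the list is taken as its narrower term. The sketch below assumes that pattern, space-separated terms and a hypothetical base URI `http://example.org/termitup/`, and serialises the result as SKOS in JSON-LD (the OntoLex model is not shown).

```python
import json
from urllib.parse import quote

# Hypothetical namespace for the generated concepts.
BASE = "http://example.org/termitup/"


def find_broader(terms, head_final=True):
    """Map each multi-word term to the longest other term that is its head.

    head_final=True suits English ('employment contract' -> 'contract');
    head_final=False suits Spanish ('contrato de trabajo' -> 'contrato').
    """
    broader = {}
    for term in terms:
        tokens = term.split()
        best = None
        for other in terms:
            other_tokens = other.split()
            n = len(other_tokens)
            # Only strictly shorter, non-empty terms can be heads.
            if n == 0 or n >= len(tokens):
                continue
            part = tokens[-n:] if head_final else tokens[:n]
            if part == other_tokens and (best is None or n > len(best.split())):
                best = other
        if best is not None:
            broader[term] = best
    return broader


def concept_uri(term):
    """Mint a (hypothetical) concept URI from a term label."""
    return BASE + quote(term.replace(" ", "_"))


def to_skos_jsonld(terms, broader, lang="en"):
    """Serialise terms as SKOS concepts, with skos:broader links, in JSON-LD."""
    graph = []
    for term in terms:
        concept = {
            "@id": concept_uri(term),
            "@type": "skos:Concept",
            "skos:prefLabel": {"@value": term, "@language": lang},
        }
        if term in broader:
            concept["skos:broader"] = {"@id": concept_uri(broader[term])}
        graph.append(concept)
    return {
        "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
        "@graph": graph,
    }


terms = ["contract", "employment contract", "fixed-term employment contract", "worker"]
links = find_broader(terms)
print(json.dumps(to_skos_jsonld(terms, links), indent=2, ensure_ascii=False))
```

Picking the longest matching head links "fixed-term employment contract" to "employment contract" rather than directly to "contract", so the hierarchy keeps its intermediate levels.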

Finally, the TermitUp API publishes the generated enriched terminologies in a Virtuoso SPARQL endpoint, where they can be freely queried.

See the TermitUp architecture below.

API

Test this service through the TermitUp Swagger API.

TermitUp SPARQL Endpoint

Access the generated enriched terminologies through the TermitUp SPARQL Endpoint.
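A minimal sketch of querying the endpoint over the standard SPARQL 1.1 Protocol (query sent as a GET parameter, JSON results requested through the `Accept` header), using only the Python standard library. The endpoint URL is a hypothetical placeholder, and the query assumes a terminology modeled in SKOS (`skos:Concept`, `skos:prefLabel`); an OntoLex-modeled terminology would need a different graph pattern.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical placeholder: replace with the URL of the TermitUp SPARQL Endpoint.
ENDPOINT = "https://example.org/sparql"

# Assumes SKOS-modeled data: list ten concepts with their preferred labels.
QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
}
LIMIT 10"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def fetch_json(endpoint, query, timeout=30):
    """Send the query over HTTP and decode the SPARQL JSON results document."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return json.load(response)


def parse_bindings(results):
    """Turn SPARQL JSON results into (concept URI, label) tuples."""
    return [
        (row["concept"]["value"], row["label"]["value"])
        for row in results["results"]["bindings"]
    ]
```

Once `ENDPOINT` points at the real endpoint, `parse_bindings(fetch_json(ENDPOINT, QUERY))` returns up to ten concept URIs with their labels. Keeping the network call separate from request building and result parsing makes the latter two easy to check offline.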

Contact

This work has been developed by the Ontology Engineering Group (Universidad Politécnica de Madrid) as part of the PhD thesis of Patricia Martín-Chozas. For more information, send an email to pmchozas@fi.upm.es.

This service has been supported by the EU-funded Prêt-à-Llod H2020 project (grant agreement No. 825182) and by the Spanish project "Knowledge Spaces" (ref. PID2020-118274RB-I00, Técnicas y herramientas para la gestión de grafos de conocimientos para dar soporte a espacios de datos, i.e. techniques and tools for managing knowledge graphs to support data spaces).

Code is available at the Prêt-à-Llod GitHub repository.