Searcher Online CINTIL-Treebank

Developed at the University of Lisbon, Dept. of Informatics, by the NLX-Natural Language and Speech Group.



Introduction



CINTIL-Treebank Online Searcher

CINTIL-Treebank Online Searcher (beta version) is a freely available online service for searching and viewing the constituency and dependency trees of the CINTIL-Treebank. The service was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics.

The searcher lets users write generic structural patterns over syntactic trees and retrieves the trees in the treebank that match them, which makes it possible to find linguistic structures of considerable complexity.
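To make the idea of a structural pattern concrete, the following is a minimal Python sketch of one such pattern: "a node labelled X dominates a node labelled Y" (the relation Tregex writes as `X << Y`). It is not the searcher's implementation and does not use Tregex; the `Node` class, the function names, and the three toy trees with their labels are all invented for illustration and are not taken from the CINTIL-Treebank.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """A tree node: a category label (or a word, for leaves) plus children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def parse_tree(text: str) -> Node:
    """Parse a bracketed tree such as "(S (NP (N Maria)) (VP (V dorme)))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # skip "("
            node = Node(tokens[pos])      # the label follows the bracket
            pos += 1
            while tokens[pos] != ")":
                node.children.append(read())
            pos += 1                      # skip ")"
            return node
        leaf = Node(tokens[pos])          # a bare token is a word (leaf)
        pos += 1
        return leaf

    return read()


def subtrees(node: Node) -> Iterator[Node]:
    """Yield the node itself and every node below it, depth first."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def dominates(tree: Node, upper: str, lower: str) -> bool:
    """True if some node labelled `upper` has a descendant labelled `lower`."""
    for node in subtrees(tree):
        if node.label == upper:
            for child in node.children:
                if any(d.label == lower for d in subtrees(child)):
                    return True
    return False


def search(treebank: List[str], upper: str, lower: str) -> List[str]:
    """Return the bracketed trees in which `upper` dominates `lower`."""
    return [t for t in treebank if dominates(parse_tree(t), upper, lower)]


# Hypothetical toy treebank; labels and sentences are illustrative only.
TOY_TREEBANK = [
    "(S (NP (N Maria)) (VP (V dorme)))",
    "(S (NP (N Maria)) (VP (V lê) (NP (ART o) (N livro))))",
    "(S (NP (ART o) (N livro)) (VP (V caiu)))",
]

vp_over_np = search(TOY_TREEBANK, "VP", "NP")    # verb phrases containing an NP
np_over_art = search(TOY_TREEBANK, "NP", "ART")  # noun phrases containing an article
```

In this toy data only the second tree has a VP that dominates an NP, while the second and third trees both have an NP that dominates an article. A real pattern language such as Tregex adds many more relations (immediate dominance, precedence, sisterhood) and lets them be combined in one query.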

For instructions on using the CINTIL-Treebank Online Searcher, see the how-to page.

CINTIL-Treebank

The CINTIL-Treebank is a corpus of syntactic constituency trees, composed of sentences taken from the CINTIL-International Corpus of Portuguese. This treebank is being developed and maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics.

The CINTIL-Treebank is annotated following the method reported in the literature as yielding the most reliable results: multiple independent annotation followed by adjudication. Each sentence is first analysed automatically by LXGram, a grammar for the computational processing of Portuguese. Once the grammatical analyses are obtained, two annotators each independently choose the analysis they consider correct. If the annotators disagree, an adjudicator reviews their decisions and makes the final choice. The annotators and adjudicators are specialists with postgraduate degrees in Linguistics.
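The selection step of this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_analysis`, the representation of each annotator's decision as an index into a list of candidate analyses, and the example bracketings are all hypothetical, and the sketch assumes the adjudicator may pick any candidate, which the description above does not specify.

```python
from typing import Callable, Sequence

# Hypothetical type: the adjudicator sees the candidates and both choices
# and returns the index of the analysis to keep.
Adjudicator = Callable[[Sequence[str], int, int], int]


def select_analysis(
    candidates: Sequence[str],
    choice_a: int,
    choice_b: int,
    adjudicate: Adjudicator,
) -> str:
    """Return the analysis kept for one sentence.

    If both annotators picked the same candidate, it is accepted as is;
    otherwise the adjudicator's decision is final.
    """
    if choice_a == choice_b:
        return candidates[choice_a]
    return candidates[adjudicate(candidates, choice_a, choice_b)]


# Illustrative candidates for an attachment ambiguity (invented bracketings).
candidates = [
    "(S (NP Maria) (VP (V viu) (NP (NP o homem) (PP com o telescópio))))",
    "(S (NP Maria) (VP (V viu) (NP o homem) (PP com o telescópio)))",
]

calls = []  # records each time the adjudicator is consulted


def prefer_second_annotator(cands: Sequence[str], a: int, b: int) -> int:
    """Toy adjudicator that always sides with the second annotator."""
    calls.append((a, b))
    return b


agreed = select_analysis(candidates, 1, 1, prefer_second_annotator)
disputed = select_analysis(candidates, 0, 1, prefer_second_annotator)
```

The adjudicator is passed in as a function so that it is consulted only when the two independent choices diverge, mirroring the order of steps in the paragraph above.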

The CINTIL-Treebank is currently under development. At present it contains 35,499 sentences, taken from the CINTIL-International Corpus of Portuguese (newspaper articles) and from the regression corpus of the LXGram grammar.

Acquiring CINTIL-Treebank

The CINTIL-Treebank is released through ELDA-Evaluation and Language Resources Distribution Agency. Information on how to acquire it will be available here.

Authorship

CINTIL-Treebank Online Searcher is being developed by Patrícia Gonçalves and managed by António Branco at the NLX-Natural Language and Speech Group, partly within the scope of the SemanticShare Project, funded by FCT-Fundação para a Ciência e a Tecnologia.

Reference

The CINTIL-Treebank is described in the following publication:

Branco, António, Francisco Costa, João Silva, Sara Silveira, Sérgio Castro, Mariana Avelãs, Clara Pinto and João Graça, 2010, "Developing a Deep Linguistic Databank Supporting a Collection of Treebanks: the CINTIL DeepGramBank", In Proceedings, LREC2010 - The 7th International Conference on Language Resources and Evaluation, Valletta, Malta, May 19-21, 2010.

When referring to the CINTIL-Treebank or to this online service, the CINTIL-Treebank Online Searcher, please cite the reference above.

Contact us

Contact us using the following email address: 'nlx' concatenated with 'at' concatenated with 'di.fc.ul.pt'.

Acknowledgments

This work was partly supported by FCT-Foundation for Science and Technology under the grant FCT/PTDC/PLP/81157/2006 for the project SemanticShare.
The system uses the Tregex library, available from The Stanford Natural Language Processing Group.