Searcher Online CINTIL-Treebank

Developed at the University of Lisbon, Dept. of Informatics, by the NLX-Natural Language and Speech Group.



What's in?



Treebank composition

The CINTIL-Treebank is a corpus of syntactic trees, annotated for both constituency and grammatical dependencies, whose sentences are taken from the CINTIL-International Corpus of Portuguese, among other sources. The treebank is being developed and maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics.

The CINTIL-Treebank is still under development and currently contains 35,499 sentences, drawn from two sources: the CINTIL-International Corpus of Portuguese (newspaper articles) and the regression corpus of the LXGram grammar.

Grammar for the computational processing of Portuguese

The CINTIL-Treebank is annotated with the method that the literature reports as yielding the most reliable results: multiple independent annotation, followed by adjudication. The parse trees chosen during annotation are produced by LXGram, a grammar for the computational processing of Portuguese, which is being developed according to the following major design features:

Annotation guidelines

The treebank was designed along the principles described in the following handbooks:

Branco, António, João Silva, Francisco Costa and Sérgio Castro. 2011. CINTIL TreeBank Handbook: Design options for the representation of syntactic constituency. Technical Reports series, No. di-fcul-tr-11-02, Department of Informatics, University of Lisbon.

Branco, António, Sérgio Castro, João Silva and Francisco Costa. 2011. CINTIL DepBank Handbook: Design options for the representation of grammatical dependencies. Technical Reports series, No. di-fcul-tr-11-03, Department of Informatics, University of Lisbon.

Tagset

Lexical and phrasal category tags

Tag     Meaning
A       Adjective
AP      Adjective Phrase
ADV     Adverb
ADVP    Adverb Phrase
C       Complementizer
CP      Complementizer Phrase
CARD    Cardinal
CONJ    Conjunction
CONJP   Conjunction Phrase
D       Determiner
DEM     Demonstrative
N       Noun
NP      Noun Phrase
P       Preposition
PP      Prepositional Phrase
POSS    Possessive
QNT     Predeterminer
S       Sentence
V       Verb
VP      Verb Phrase
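To illustrate how the category labels above combine into trees, the following Python sketch reads a tree written in a Penn-style bracketed notation and lists its node labels. The bracketed format and the example tree for "o gato dorme" ("the cat sleeps") are assumptions made for illustration, not an actual CINTIL-Treebank entry, and the helper names (`tokenize`, `parse`, `labels`, `phrasal_labels`) are hypothetical.

```python
import re
from collections import Counter, deque

# Phrasal categories from the table above; the remaining tags label single words.
PHRASAL = {"AP", "ADVP", "CP", "CONJP", "NP", "PP", "S", "VP"}


def tokenize(tree: str) -> deque:
    """Split a bracketed tree into "(", ")" and label/word tokens."""
    return deque(re.findall(r"\(|\)|[^\s()]+", tree))


def parse(tokens: deque):
    """Build nested (label, children) tuples; leaves are plain word strings."""
    tok = tokens.popleft()
    if tok != "(":
        return tok  # a word
    label = tokens.popleft()
    children = []
    while tokens[0] != ")":
        children.append(parse(tokens))
    tokens.popleft()  # consume the closing ")"
    return (label, children)


def labels(node):
    """Yield every node label, depth-first, left to right."""
    if isinstance(node, tuple):
        label, children = node
        yield label
        for child in children:
            yield from labels(child)


def phrasal_labels(node) -> list:
    """Return the distinct phrasal labels found in the tree, sorted."""
    return sorted({label for label in labels(node) if label in PHRASAL})


# Hypothetical example tree for "o gato dorme" ("the cat sleeps").
tree = parse(tokenize("(S (NP (D o) (N gato)) (VP (V dorme)))"))
print(Counter(labels(tree)))
print(phrasal_labels(tree))
```

In this sketch, lexical labels (D, N, V) dominate words directly, while phrasal labels (NP, VP, S) dominate other labelled nodes; the `PHRASAL` set is the only thing that separates the two.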

Part-of-speech tags

Tag            Category                                     Examples
ADJ            Adjectives                                   bom, brilhante, eficaz, …
ADV            Adverbs                                      hoje, já, sim, felizmente, …
CARD           Cardinals                                    zero, dez, cem, mil, …
CJ             Conjunctions                                 e, ou, tal como, …
CL             Clitics                                      o, lhe, se, …
CN             Common Nouns                                 computador, cidade, ideia, …
DA             Definite Articles                            o, os, …
DEM            Demonstratives                               este, esses, aquele, …
DFR            Denominators of Fractions                    meio, terço, décimo, %, …
DGTR           Roman Numerals                               VI, LX, MMIII, MCMXCIX, …
DGT            Arabic Numerals                              0, 1, 42, 12345, 67890, …
DM             Discourse Markers                            olá, …
EADR           Electronic Addresses                         http://www.di.fc.ul.pt, …
EOE            End of Enumeration                           etc
EXC            Exclamations                                 ah, ei, …
GER            Gerunds                                      sendo, afirmando, vivendo, …
GERAUX         Gerund "ter"/"haver" in compound tenses      tendo, havendo
IA             Indefinite Articles                          uns, umas, …
IND            Indefinites                                  tudo, alguém, ninguém, …
INF            Infinitives                                  ser, afirmar, viver, …
INFAUX         Infinitive "ter"/"haver" in compound tenses  ter, haver, …
INT            Interrogatives                               quem, como, quando, …
ITJ            Interjections                                bolas, caramba, …
LTR            Letters                                      a, b, c, …
MGT            Magnitude Classes                            unidade, dezena, dúzia, resma, …
MTH            Months                                       Janeiro, Dezembro, …
NP             Noun Phrases                                 idem, …
ORD            Ordinals                                     primeiro, centésimo, penúltimo, …
PADR           Part of Address                              Rua, av., rot., …
PNM            Part of Name                                 Lisboa, António, João, …
PNT            Punctuation Marks                            ., ?, (, …
POSS           Possessives                                  meu, teu, seu, …
PPA            Past Participles not in compound tenses      sido, afirmados, vivida, …
PP             Prepositional Phrases                        algures, …
PPT            Past Participles in compound tenses          sido, afirmado, vivido, …
PREP           Prepositions                                 de, para, em redor de, …
PRS            Personals                                    eu, tu, ele, …
QNT            Quantifiers                                  todos, muitos, nenhum, …
REL            Relatives                                    que, cujo, tal que, …
STT            Social Titles                                Presidente, drª., prof., …
SYB            Symbols                                      @, #, &, …
TERMN          Optional Terminations                        (s), (as), …
UM             "um" or "uma"                                um, uma
UNIT           Abbreviated Measurement Units                kg., km., …
VAUX           Finite "ter" or "haver" in compound tenses   temos, haveriam, …
V              Verbs (other than PPA, PPT, INF or GER)      falou, falaria, …
WD             Week Days                                    segunda, terça-feira, sábado, …
Tags for multi-word expressions
LADV1…LADVn    Multi-Word Adverbs                           de facto, em suma, um pouco, …
LCJ1…LCJn      Multi-Word Conjunctions                      assim como, já que, …
LDEM1…LDEMn    Multi-Word Demonstratives                    o mesmo, …
LDFR1…LDFRn    Multi-Word Denominators of Fractions         por cento
LDM1…LDMn      Multi-Word Discourse Markers                 pois não, até logo, …
LITJ1…LITJn    Multi-Word Interjections                     meu Deus
LPRS1…LPRSn    Multi-Word Personals                         a gente, si mesmo, V. Exa., …
LPREP1…LPREPn  Multi-Word Prepositions                      através de, a partir de, …
LQD1…LQDn      Multi-Word Quantifiers                       uns quantos, …
LREL1…LRELn    Multi-Word Relatives                         tal como, …
Tags specific to the spoken corpus
EMP            Emphasis
EL             Extra-linguistic
PL             Para-linguistic
FRG            Fragment
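The numbered multi-word tags can be read as follows, assuming that each word of an n-word expression receives the base tag plus its position, so that "de facto" would be tagged de/LADV1 facto/LADV2. The Python sketch below also assumes a simple `word/TAG` token format (the actual CINTIL distribution may additionally carry lemmas and inflection features) and shows how such tags could be produced and merged back into expressions; all function names are hypothetical.

```python
import re

# Matches numbered multi-word tags such as LADV1 or LPREP2 (but not LTR).
MWE_TAG = re.compile(r"^(L[A-Z]+)(\d+)$")


def tag_mwe(expression: str, base: str) -> list:
    """Tag each word of a multi-word expression as word/BASE1 ... word/BASEn."""
    return [f"{word}/{base}{i}" for i, word in enumerate(expression.split(), start=1)]


def split_token(token: str) -> tuple:
    """Split 'word/TAG' at the last slash, since words (e.g. URLs) may contain '/'."""
    word, _, tag = token.rpartition("/")
    return word, tag


def group_mwes(tokens: list) -> list:
    """Merge runs such as de/LADV1 facto/LADV2 into ('de facto', 'LADV')."""
    grouped = []
    for token in tokens:
        word, tag = split_token(token)
        match = MWE_TAG.match(tag)
        if match and match.group(2) != "1" and grouped and grouped[-1][1] == match.group(1):
            # Continuation of the expression opened by an earlier token.
            grouped[-1] = (grouped[-1][0] + " " + word, match.group(1))
        else:
            # Single-word token, or the first word (index 1) of a new expression.
            grouped.append((word, match.group(1) if match else tag))
    return grouped


print(tag_mwe("a partir de", "LPREP"))
print(group_mwes(["ele/PRS", "de/LADV1", "facto/LADV2", "chegou/V"]))
```

Splitting at the last slash matters for tags such as EADR, whose words (e.g. http://www.di.fc.ul.pt) contain slashes themselves.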

Inflection tags

Tag     Description
Tags for nominal categories
m       Masculine
f       Feminine
s       Singular
p       Plural
dim     Diminutive
sup     Superlative
comp    Comparative
Tags for verbs
1       First Person
2       Second Person
3       Third Person
pi      Present Indicative (Presente do Indicativo)
ppi     Preterite Perfect Indicative (Pretérito Perfeito do Indicativo)
ii      Imperfect Indicative (Pretérito Imperfeito do Indicativo)
mpi     Pluperfect Indicative (Pretérito Mais-que-Perfeito do Indicativo)
fi      Future Indicative (Futuro do Indicativo)
c       Conditional (Condicional)
pc      Present Subjunctive (Presente do Conjuntivo)
ic      Imperfect Subjunctive (Pretérito Imperfeito do Conjuntivo)
fc      Future Subjunctive (Futuro do Conjuntivo)
imp     Imperative (Imperativo)
Tags for infinitive verbs
ifl     Inflected
nifl    Not Inflected
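As an illustration of how these inflection tags combine, the Python sketch below decodes feature strings into English glosses. It assumes two hypothetical formats: gender followed by number for nominal categories (e.g. `mp`), and tense/mood, a hyphen, then person and number for finite verbs (e.g. `ppi-3s`). The degree tags (dim, sup, comp) and the infinitive tags are left out for brevity, and the function names are illustrative only.

```python
# English glosses for the inflection tags listed above.
GENDER_NUMBER = {"m": "Masculine", "f": "Feminine", "s": "Singular", "p": "Plural"}
PERSON = {"1": "First Person", "2": "Second Person", "3": "Third Person"}
TENSE_MOOD = {
    "pi": "Present Indicative",
    "ppi": "Preterite Perfect Indicative",
    "ii": "Imperfect Indicative",
    "mpi": "Pluperfect Indicative",
    "fi": "Future Indicative",
    "c": "Conditional",
    "pc": "Present Subjunctive",
    "ic": "Imperfect Subjunctive",
    "fc": "Future Subjunctive",
    "imp": "Imperative",
}


def decode_nominal(features: str) -> list:
    """Decode an assumed gender+number string such as 'mp' (masculine plural)."""
    return [GENDER_NUMBER[tag] for tag in features]


def decode_verbal(features: str) -> list:
    """Decode an assumed 'TENSE-PersonNumber' string such as 'ppi-3s'."""
    tense_mood, _, person_number = features.partition("-")
    person, number = person_number  # e.g. '3s' -> '3', 's'
    return [TENSE_MOOD[tense_mood], PERSON[person], GENDER_NUMBER[number]]


print(decode_nominal("fs"))     # e.g. for a form such as "cidade"
print(decode_verbal("ppi-3s"))  # e.g. for a form such as "falou"
```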

Grammatical Function Tagset

Tag     Meaning
C       Complement
DO      Direct Object
IO      Indirect Object
M       Modifier
N       Relationship between words and named entities
OBL     Oblique Complement
PRD     Predicate
SJ      Subject
SP      Specifier
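Grammatical function tags are attached to constituents rather than to words. Assuming node labels of the hypothetical form CATEGORY-FUNCTION (for instance NP-SJ for a subject noun phrase), the Python sketch below splits such a label and validates both parts against the category and function tagsets; `split_label` is an illustrative name, not part of any CINTIL tool.

```python
from typing import Optional, Tuple

# Category tags (lexical and phrasal) and grammatical function tags.
CATEGORIES = {"A", "AP", "ADV", "ADVP", "C", "CP", "CARD", "CONJ", "CONJP",
              "D", "DEM", "N", "NP", "P", "PP", "POSS", "QNT", "S", "V", "VP"}
FUNCTIONS = {"C", "DO", "IO", "M", "N", "OBL", "PRD", "SJ", "SP"}


def split_label(label: str) -> Tuple[str, Optional[str]]:
    """Split 'NP-SJ' into ('NP', 'SJ'); a bare 'NP' gives ('NP', None)."""
    category, sep, function = label.partition("-")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category tag: {category!r}")
    if not sep:
        return category, None
    if function not in FUNCTIONS:
        raise ValueError(f"unknown function tag: {function!r}")
    return category, function


for label in ["NP-SJ", "NP-DO", "PP-OBL", "VP"]:
    print(label, "->", split_label(label))
```

Because C and N occur both as category tags (Complementizer, Noun) and as function tags, the sketch relies on position: the part before the hyphen is always checked as a category and the part after it as a function.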

Semantic Role Tagset

Tag     Meaning
ADV     Adverbial
ARG1    First Argument
ARG2    Second Argument
ARGA    Causative agent of a verb with causative alternation
CAU     Cause
DIR     Direction
EXT     Extent
LOC     Location
MNR     Manner
NULL    Null
PNC     Purpose
POV     Viewpoint
PRD     Secondary predication
TMP     Time
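Semantic roles can likewise be read off the tree. The Python sketch below assumes a hypothetical three-part node label CATEGORY-FUNCTION-ROLE (for example NP-SJ-ARG1) and collects each role together with the words of the constituent that bears it. The bracketed example for "o gato dorme em casa" ("the cat sleeps at home") is invented for illustration and is not taken from the treebank; the helper names are likewise hypothetical.

```python
import re
from collections import deque

# Semantic role tags from the table above.
ROLES = {"ADV", "ARG1", "ARG2", "ARGA", "CAU", "DIR", "EXT", "LOC", "MNR",
         "NULL", "PNC", "POV", "PRD", "TMP"}


def parse(tokens: deque):
    """Build (label, children) tuples from a bracketed tree; leaves are words."""
    tok = tokens.popleft()
    if tok != "(":
        return tok
    label = tokens.popleft()
    children = []
    while tokens[0] != ")":
        children.append(parse(tokens))
    tokens.popleft()  # drop the closing ")"
    return (label, children)


def words(node) -> list:
    """Return the words dominated by a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]


def roles(node) -> list:
    """Collect (role, phrase) pairs for nodes labelled CATEGORY-FUNCTION-ROLE."""
    if isinstance(node, str):
        return []
    label, children = node
    found = []
    parts = label.split("-")
    if len(parts) == 3 and parts[2] in ROLES:
        found.append((parts[2], " ".join(words(node))))
    for child in children:
        found.extend(roles(child))
    return found


text = "(S (NP-SJ-ARG1 (D o) (N gato)) (VP (V dorme) (PP-M-LOC (P em) (NP (N casa)))))"
tree = parse(deque(re.findall(r"\(|\)|[^\s()]+", text)))
print(roles(tree))
```

PRD appears in both the grammatical function and the semantic role tagsets; reading the role only from the third position of the label keeps the two apart in this sketch.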