The CINTIL-Treebank is currently under development.
At present it is composed of 35499 sentences.
The sentences are taken from the CINTIL-International Corpus of Portuguese (newspaper articles) and from the regression corpus of the LXGram grammar.
CINTIL-Treebank is annotated following the method that the literature presents as ensuring the most reliable results: multiple independent annotation, followed by adjudication.
The parse tree chosen in the annotation was produced by LXGram, a grammar for the computational processing of Portuguese.
It is being developed under the following major design features:
Tag | Category | Examples |
ADJ | Adjectives | bom, brilhante, eficaz, … |
ADV | Adverbs | hoje, já, sim, felizmente, … |
CARD | Cardinals | zero, dez, cem, mil, … |
CJ | Conjunctions | e, ou, tal como, … |
CL | Clitics | o, lhe, se, … |
CN | Common Nouns | computador, cidade, ideia, … |
DA | Definite Articles | o, os, … |
DEM | Demonstratives | este, esses, aquele, … |
DFR | Denominators of Fractions | meio, terço, décimo, %, … |
DGTR | Roman Numerals | VI, LX, MMIII, MCMXCIX, … |
DGT | Arabic Numerals | 0, 1, 42, 12345, 67890, … |
DM | Discourse Markers | olá, … |
EADR | Electronic Addresses | http://www.di.fc.ul.pt, … |
EOE | End of Enumeration | etc |
EXC | Exclamation | ah, ei, … |
GER | Gerunds | sendo, afirmando, vivendo, … |
GERAUX | Gerund "ter"/"haver" in compound tenses | tendo, havendo |
IA | Indefinite Articles | uns, umas, … |
IND | Indefinites | tudo, alguém, ninguém, … |
INF | Infinitives | ser, afirmar, viver, … |
INFAUX | Infinitive "ter"/"haver" in compound tenses | ter, haver, … |
INT | Interrogatives | quem, como, quando, … |
ITJ | Interjection | bolas, caramba, … |
LTR | Letters | a, b, c, … |
MGT | Magnitude Classes | unidade, dezena, dúzia, resma, … |
MTH | Months | Janeiro, Dezembro, … |
NP | Noun Phrases | idem, … |
ORD | Ordinals | primeiro, centésimo, penúltimo, … |
PADR | Part of Address | Rua, av., rot., … |
PNM | Part of Name | Lisboa, António, João, … |
PNT | Punctuation Marks | ., ?, (, … |
POSS | Possessives | meu, teu, seu, … |
PPA | Past Participles not in compound tenses | sido, afirmados, vivida, … |
PP | Prepositional Phrases | algures, … |
PPT | Past Participle in compound tenses | sido, afirmado, vivido, … |
PREP | Prepositions | de, para, em redor de, … |
PRS | Personals | eu, tu, ele, … |
QNT | Quantifiers | todos, muitos, nenhum, … |
REL | Relatives | que, cujo, tal que, … |
STT | Social Titles | Presidente, drª., prof., … |
SYB | Symbols | @, #, &, … |
TERMN | Optional Terminations | (s), (as), … |
UM | "um" or "uma" | um, uma |
UNIT | Abbreviated Measurement Unit | kg., km., … |
VAUX | Finite "ter" or "haver" in compound tenses | temos, haveriam, … |
V | Verbs (other than PPA, PPT, INF or GER) | falou, falaria, … |
WD | Week Days | segunda, terça-feira, sábado, … |
Tags for multi-word expressions |
LADV1…LADVn | Multi-Word Adverbs | de facto, em suma, um pouco, … |
LCJ1…LCJn | Multi-Word Conjunctions | assim como, já que, … |
LDEM1…LDEMn | Multi-Word Demonstratives | o mesmo, … |
LDFR1…LDFRn | Multi-Word Denominators of Fractions | por cento |
LDM1…LDMn | Multi-Word Discourse Markers | pois não, até logo, … |
LITJ1…LITJn | Multi-Word Interjections | meu Deus |
LPRS1…LPRSn | Multi-Word Personals | a gente, si mesmo, V. Exa., … |
LPREP1…LPREPn | Multi-Word Prepositions | através de, a partir de, … |
LQD1…LQDn | Multi-Word Quantifiers | uns quantos, … |
LREL1…LRELn | Multi-Word Relatives | tal como, … |
Tags specific to the spoken corpus |
EMP | Emphasis | |
EL | Extra-linguistic | |
PL | Para-linguistic | |
FRG | Fragment | |
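The multi-word tags in the table above carry a position index (LADV1…LADVn, LPREP1…LPREPn, and so on), so the tokens of one expression can be regrouped by reading that index. The following is a minimal sketch of that idea, not part of the CINTIL tooling: it assumes the tagged text is already available as a list of (token, tag) pairs, and the function name `group_multiword` and the regular expression are illustrative choices.

```python
import re

# Assumption: a multi-word tag is an "L"-prefixed category followed by a
# 1-based position index, e.g. LADV1, LADV2, LPREP1.
MWE_TAG = re.compile(r"^(L[A-Z]+)(\d+)$")


def group_multiword(tagged):
    """Merge consecutive tokens tagged CAT1, CAT2, ... into one (text, CAT) unit.

    `tagged` is a list of (token, tag) pairs; tokens with ordinary tags are
    passed through unchanged.
    """
    units = []
    last_index = 0  # index of the previous token inside an open expression
    for token, tag in tagged:
        match = MWE_TAG.match(tag)
        if match is None:
            units.append((token, tag))
            last_index = 0
            continue
        category, index = match.group(1), int(match.group(2))
        continues = (
            index > 1
            and index == last_index + 1
            and bool(units)
            and units[-1][1] == category
        )
        if continues:
            # Extend the expression opened by the previous token.
            text, _ = units.pop()
            units.append((text + " " + token, category))
        else:
            # Index 1 (or an out-of-sequence index) opens a new unit.
            units.append((token, category))
        last_index = index
    return units


example = [
    ("de", "LADV1"), ("facto", "LADV2"), (",", "PNT"),
    ("a", "LPRS1"), ("gente", "LPRS2"), ("falou", "V"),
]
print(group_multiword(example))
```

Note that the merged unit is labelled with the bare category (for example LADV), which is a convenience of this sketch rather than a tag defined in the table.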