Searcher Online CINTIL-Treebank

Developed at the University of Lisbon, Dept. of Informatics, by the NLX-Natural Language and Speech Group.



How to?



Interface

The interface of CINTIL-Treebank Online Searcher is simple.

(1) To help you get started, there are three examples at different levels of difficulty: simple, complex and advanced.

(2) There is a text box where you type the syntactic pattern you want to search for.

(3) You can select the option to show the POS tags in the trees.

(4) You can choose the number of results returned (between 1 and 20 sentences).

(5) Once the search results are returned, use the navigation buttons and arrows to move to the next results.

(6) To view the tree just place the cursor on the sentence you want and click.

(7) The syntactic tree corresponding to the sentence will appear below.

(8) The dependency tree corresponding to the sentence will appear below the syntactic tree.




Searching by linguistic tags

To start the search by linguistic tags, you must know the tags and syntax for searching.
Information about tags used in the annotation of CINTIL-Treebank is available here.
The table below presents the syntax and symbols used for searching in the CINTIL-Treebank. When searching by linguistic tags, the tags must always be written in upper case.

Symbol     Meaning                                    Example
A << B     A dominates B                              NP << N
A >> B     A is dominated by B                        V >> VP
A < B      A immediately dominates B                  PP < P
A > B      A is immediately dominated by B            CONJ > NP
A $ B      A is a sister of B                         NP $ CONJ
A .. B     A precedes B                               P .. POSS-M
A . B      A immediately precedes B                   CONJ . VP
A ,, B     A follows B                                CARD ,, VP
A , B      A immediately follows B                    D-SP , NP-C
A <<, B    B is a leftmost descendant of A            VP <<, P
A <<- B    B is a rightmost descendant of A           PP <<- N
A >>, B    A is a leftmost descendant of B            ADV >>, S
A >>- B    A is a rightmost descendant of B           S >>- VP
A <, B     B is the first child of A                  PP <, P
A >, B     A is the first child of B                  V >, VP
A <- B     B is the last child of A                   PP <- NP-C
A >- B     A is the last child of B                   CARD >- D-SP
A <i B     B is the ith child of A                    NP-C <1 D-SP
A >i B     A is the ith child of B                    ADV >1 ADVP
A <: B     B is the only child of A                   NP-C <: N
A >: B     A is the only child of B                   N >: NP
A <<# B    B is a head of phrase A                    D-SP <<# CARD
A <# B     B is the immediate head of phrase A        NP <# N
@A         All tags that contain the string A         @NP
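
To make a few of the relations in the table concrete, here is a minimal Python sketch that checks "immediately dominates" (<), "dominates" (<<) and "is a sister of" ($) on a small hand-made tree. The tree, its tags and all function names are assumptions made for illustration only; this is not how the Searcher itself is implemented.

```python
# Toy parse tree as nested tuples: (tag, child, child, ...); leaves are plain strings.
# The sentence, the tags and the helper names are illustrative assumptions.
TREE = ("S",
        ("NP", ("D", "o"), ("N", "gato")),
        ("VP", ("V", "dorme")))


def label(node):
    """Return the tag of a node, or the word itself for a leaf."""
    return node[0] if isinstance(node, tuple) else node


def children(node):
    """Return the child nodes (an empty list for a leaf)."""
    return list(node[1:]) if isinstance(node, tuple) else []


def descendants(node):
    """Yield every node strictly below `node`."""
    for child in children(node):
        yield child
        yield from descendants(child)


def nodes(tree):
    """Yield the root and every node below it."""
    yield tree
    yield from descendants(tree)


def immediately_dominates(tree, a, b):
    """A < B: some node tagged `a` has a child tagged `b`."""
    return any(label(n) == a and any(label(c) == b for c in children(n))
               for n in nodes(tree))


def dominates(tree, a, b):
    """A << B: some node tagged `a` has a descendant tagged `b`."""
    return any(label(n) == a and any(label(d) == b for d in descendants(n))
               for n in nodes(tree))


def sisters(tree, a, b):
    """A $ B: two distinct nodes tagged `a` and `b` share the same parent."""
    for n in nodes(tree):
        kids = children(n)
        for i, x in enumerate(kids):
            for j, y in enumerate(kids):
                if i != j and label(x) == a and label(y) == b:
                    return True
    return False
```

In this toy tree, S dominates N but does not immediately dominate it, because NP sits between them; NP and VP are sisters because both are children of S.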



Searching by regular expressions

It is possible to search with regular expressions. The usual notational conventions are followed:

Alternation
Alternatives are separated by the | (vertical bar) character:
  • NP|VP matches all parse trees with a noun phrase and all parse trees with a verb phrase.

Iteration
There are three forms of expressing iteration.
The sequence .* (period + star) matches any sequence of zero or more characters: the period stands for any character, and the star allows the preceding character or expression to be matched zero or more times. The expression must be enclosed in slashes /:
  • /NP.*/ matches any parse tree with an NP tag, for example: NP, NP-C, NP-M and NP-SJ.

Delimiters
To delimit the beginning and end of a tag, you can use the special characters ^ and $. This type of search is useful for finding parse trees whose tags combine grammatical tags and semantic roles. The expression must be enclosed in slashes /:
  • /^NP.*.ARG1$/ matches any parse tree with a tag that begins with NP, has any material in the middle, and ends with ARG1, which indicates the semantic role of first argument, for example: NP-DO-ARG1 and NP-SJ-ARG1.
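
The three conventions above can be tried out with Python's standard re module. This is only a sketch: the tag list is made up for the example, the helper name matching_tags is hypothetical, and the Searcher's own regular-expression engine may differ in detail (for instance in whether a pattern must match the whole tag). The enclosing slashes are part of the Searcher's syntax and are dropped in Python.

```python
import re

# Illustrative tag inventory, assembled by hand for this sketch.
TAGS = ["NP", "NP-C", "NP-M", "NP-SJ", "NP-DO-ARG1", "NP-SJ-ARG1", "VP", "PP"]


def matching_tags(pattern, tags=TAGS):
    """Return the tags in which `pattern` is found somewhere."""
    regex = re.compile(pattern)
    return [tag for tag in tags if regex.search(tag)]


# Alternation: the anchors emulate matching the whole tag, NP or VP.
alternation = matching_tags(r"^(NP|VP)$")

# Iteration: NP followed by zero or more arbitrary characters.
iteration = matching_tags(r"NP.*")

# Delimiters: starts with NP, ends with ARG1, anything in between.
delimited = matching_tags(r"^NP.*.ARG1$")
```

Printing the three lists shows which of the sample tags each pattern selects, which is a quick way to check a pattern before typing it into the search box.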

Searching by words

You can also search the leaves of the trees, where the words occur.
To find any word, type it in the text box. For example:

Click the "Search" button and all sentences containing the word will be shown below:
Word search depends on how the words are spelled in the treebank. A word may be written in either upper or lower case.
To improve the search, try the word with different spellings. See the following image:



Searching by sentence identifier

Every sentence in the CINTIL-Treebank has a unique numeric identifier. The identifier is shown when the sentence is returned on the screen.
You can use this number to find a sentence in the CINTIL-Treebank directly. To do so, first note the identifier returned with the sentence. The search uses the pattern "ID:". The example below shows how to use this search:

The sentence with identifier 9 is thus selected in the CINTIL-Treebank. To view its parse tree, just click on the sentence.

Reversed search

The Searcher Online CINTIL-Treebank provides an option to find parse trees that do not contain a given pattern.
To use this option, type the word "INV" followed by a colon ":". The parse trees in which the pattern is not found are then returned as the result, as in the example below:

All sentences that do not have verb phrases will appear on the screen as a result. To view a parse tree, just click on the sentence.
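
Conceptually, a reversed search keeps exactly the trees that an ordinary search would discard. The Python sketch below illustrates this idea on three hand-written bracketed trees. The trees, their identifiers and the function names are hypothetical, and the sketch does not reproduce the Searcher's query syntax or implementation.

```python
import re

# Hypothetical bracketed trees keyed by identifier (not actual treebank data).
TREES = {
    1: "(S (NP (N Maria)) (VP (V dorme)))",
    2: "(NP (D o) (N gato))",
    3: "(S (NP (N chuva)) (VP (V cai)))",
}


def tags_of(tree):
    """Collect every tag, i.e. every token that directly follows an opening bracket."""
    return set(re.findall(r"\(([^\s()]+)", tree))


def reversed_search(trees, tag):
    """Return the identifiers of the trees in which `tag` does not occur."""
    return [ident for ident, tree in trees.items() if tag not in tags_of(tree)]
```

With these sample trees, a reversed search for VP keeps only tree 2, the one tree without a verb phrase.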