Informal Persian Universal Dependency Treebank (iPerUDT)
The Informal Persian Universal Dependency Treebank (iPerUDT) is an open-source collection of colloquial, informal texts from Persian blogs, consisting of 3,000 sentences and 54,904 tokens. The corpus is annotated in CoNLL-U format following the Universal Dependencies scheme (Nivre et al., 2020).
The following coarse-grained Universal Dependencies part-of-speech tags (UPOS) and fine-grained, language-specific part-of-speech tags (XPOS) are used in this treebank.
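Because the corpus is distributed in CoNLL-U format, it can be read with a few lines of standard-library Python. The sketch below is a minimal reader, not an official loader for this treebank. The function name `read_conllu` and the sample sentence (transliterated and invented for illustration, not taken from the corpus) are assumptions.

```python
from typing import Dict, List

# The ten tab-separated columns of a CoNLL-U token line, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts.

    Comment lines (starting with '#') are skipped. Multiword-token ranges
    and empty nodes, if present, are kept as ordinary rows.
    """
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample sentence, for illustration only.
SAMPLE = (
    "# text = man ketab khundam .\n"
    "1\tman\tman\tPRON\tPRO\t_\t3\tnsubj\t_\t_\n"
    "2\tketab\tketab\tNOUN\tN_SING\t_\t3\tobj\t_\t_\n"
    "3\tkhundam\tkhundan\tVERB\tV_PA\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\tDELM\t_\t3\tpunct\t_\t_\n"
    "\n"
)

sentences = read_conllu(SAMPLE)
for token in sentences[0]:
    print(token["form"], token["upos"], token["xpos"], token["head"], token["deprel"])
```

To read an actual file, pass its contents (opened with `encoding="utf-8"`) to `read_conllu`.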
UPOS | XPOS | Description |
ADJ | ADJ | Adjective |
ADJ | ADJ_CMPR | Comparative adjective |
ADJ | ADJ_SUP | Superlative adjective |
ADV | ADV | Adverb |
ADV | ADV_I | Adverb of interrogation |
ADV | ADV_LOC | Adverb of location |
ADV | ADV_NEG | Adverb of negation |
ADV | ADV_TIME | Adverb of time |
ADP | P | Preposition |
AUX | V_AUX | Auxiliary/copula verb |
CCONJ | CON | Coordinating conjunction |
DET | DET | Determiner |
INTJ | INTJ | Interjection |
NOUN | N_PL | Plural noun |
NOUN | N_SING | Singular noun |
NUM | NUM | Numeral |
PART | PART | Differential object marker, focus marker, negative particle, question particle |
PRON | PRO | Pronoun |
PROPN | PROPN | Proper noun (persons, locations, months, organizations, geopolitical entities) |
PUNCT | DELM | Punctuation/delimiter |
SCONJ | CON | Subordinating conjunction |
VERB | V_IMP | Imperative verb |
VERB | V_PA | Past tense verb |
VERB | V_PP | Past participle |
VERB | V_PRS | Present tense verb |
VERB | V_SUB | Subjunctive verb |
X | FW | Foreign word |
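The tag table above can double as a consistency check on annotated data. The following sketch encodes the UPOS/XPOS pairs exactly as listed and reports any pair that falls outside them. The names `ALLOWED_PAIRS` and `unexpected_pairs` are illustrative assumptions, not part of the treebank's tooling. Note that the XPOS tag `CON` is shared by `CCONJ` and `SCONJ`, so the check works on pairs rather than on a one-to-one mapping.

```python
from typing import Iterable, Set, Tuple

# (UPOS, XPOS) pairs copied from the table above.
ALLOWED_PAIRS: Set[Tuple[str, str]] = {
    ("ADJ", "ADJ"), ("ADJ", "ADJ_CMPR"), ("ADJ", "ADJ_SUP"),
    ("ADV", "ADV"), ("ADV", "ADV_I"), ("ADV", "ADV_LOC"),
    ("ADV", "ADV_NEG"), ("ADV", "ADV_TIME"),
    ("ADP", "P"), ("AUX", "V_AUX"), ("CCONJ", "CON"),
    ("DET", "DET"), ("INTJ", "INTJ"),
    ("NOUN", "N_PL"), ("NOUN", "N_SING"),
    ("NUM", "NUM"), ("PART", "PART"), ("PRON", "PRO"),
    ("PROPN", "PROPN"), ("PUNCT", "DELM"), ("SCONJ", "CON"),
    ("VERB", "V_IMP"), ("VERB", "V_PA"), ("VERB", "V_PP"),
    ("VERB", "V_PRS"), ("VERB", "V_SUB"),
    ("X", "FW"),
}


def unexpected_pairs(tokens: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return the (UPOS, XPOS) pairs that do not appear in the table."""
    return {pair for pair in tokens if pair not in ALLOWED_PAIRS}


# Hypothetical input: the last pair is deliberately invalid.
observed = [("PRON", "PRO"), ("SCONJ", "CON"), ("CCONJ", "CON"), ("NOUN", "N")]
print(unexpected_pairs(observed))
```

An empty result means every observed pair matches the documented tag set.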
The syntactic annotation follows the Universal Dependencies scheme, which analyzes each sentence as a dependency structure: every token is attached to a head by a labeled relation. The annotation uses 42 dependency relations, of which 32 are universal and 10 are language-specific (marked by *).
Dependency relation | Description |
acl | Clausal modifier of noun |
acl:relcl* | Relative clause modifier |
advcl | Adverbial clause modifier |
advmod | Adverbial modifier |
amod | Adjectival modifier |
appos | Appositional modifier |
aux | Auxiliary |
aux:pass | Passive auxiliary |
case | Accusative marker/case marking |
cc | Coordination |
cc:preconj* | Preconjunction |
ccomp | Clausal complement |
compound | Compound |
compound:lvc* | Nominal/adjectival non-verbal element (NVE) in complex predicates |
compound:prt* | Particle non-verbal element (NVE) in complex predicates |
compound:redup* | Reduplicative words |
compound:svc* | Serial verb constructions |
conj | Conjunct |
cop | Copula |
det | Determiner |
det:predet* | Predeterminer |
discourse | Discourse element |
discourse:top/foc* | Topic/focus marker |
dislocated | Dislocated elements |
fixed | Fixed multiword expressions |
flat | Flat multiword expressions |
goeswith | Goes with for poorly-edited words |
nmod | Nominal modifier |
nmod:poss* | Possessive/genitive modifier |
nsubj | Nominal subject |
nsubj:pass | Passive nominal subject |
nummod | Numeric modifier |
mark | Complementizer/marker |
obj | Object |
obl | Oblique |
obl:arg* | Oblique core argument |
orphan | Ellipsis constructions |
parataxis | Parataxis |
punct | Punctuation |
root | Root |
vocative | Vocative |
xcomp | Open clausal complement |
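The split between universal and language-specific relations in the table above can be expressed as two sets, which is convenient when counting or filtering labels. This is a sketch under assumptions: the names `UNIVERSAL`, `LANGUAGE_SPECIFIC`, and `classify` are illustrative, and the label `discourse:top/foc` is copied as written in the table, so the exact spelling used in the data files may differ.

```python
from collections import Counter
from typing import Iterable

# Relations listed without an asterisk in the table (32).
UNIVERSAL = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "aux:pass", "case",
    "cc", "ccomp", "compound", "conj", "cop", "det", "discourse",
    "dislocated", "fixed", "flat", "goeswith", "nmod", "nsubj",
    "nsubj:pass", "nummod", "mark", "obj", "obl", "orphan", "parataxis",
    "punct", "root", "vocative", "xcomp",
}

# Relations marked with * in the table (10).
LANGUAGE_SPECIFIC = {
    "acl:relcl", "cc:preconj", "compound:lvc", "compound:prt",
    "compound:redup", "compound:svc", "det:predet", "discourse:top/foc",
    "nmod:poss", "obl:arg",
}


def classify(deprel: str) -> str:
    """Label a relation as 'universal', 'language-specific', or 'unknown'."""
    if deprel in LANGUAGE_SPECIFIC:
        return "language-specific"
    if deprel in UNIVERSAL:
        return "universal"
    return "unknown"


def count_by_class(deprels: Iterable[str]) -> Counter:
    """Count how many labels fall into each class."""
    return Counter(classify(d) for d in deprels)


# Hypothetical list of labels, for illustration only.
labels = ["nsubj", "obj", "compound:lvc", "root", "punct", "nmod:poss"]
print(count_by_class(labels))
```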
Nivre, Joakim, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis M. Tyers, and Dan Zeman. (2020). Universal Dependencies v2: An evergrowing multilingual treebank collection. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC), 4027–4036.