-
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020)
Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ...
... (2013). Dijkstra-WSA: A graph-based approach to word sense alignment. Transactions of the Association for Computational Linguistics, 1:151–164. Matuschek, M. and Gurevych, I. (2014). High performance word sense alignment by joint modeling of sense distance and gloss similarity. In Proceedings of ...
... monolingual word sense alignment. Different dictionaries and related resources such as wordnets and encyclopedias have significant differences in structure and heterogeneity in content, which makes aligning information across resources and languages a challenging task. Word sense alignment (WSA) is a ...
... to increase domain coverage, enrich sense representations and decrease sense granularity (Miller, 2016). Miller and Gurevych (2014) describe a technique for constructing an n-way alignment of LSRs and apply it to the production of a three-way alignment of the English WordNet, Wikipedia and ...
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
-
Bilingual lexical extraction based on word alignment for improving corpus search
Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
-
Resource-based WordNet Augmentation and Enrichment
In this paper we present an approach to support the production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited where needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system presented in this paper, whereas experts added the remaining 1746. Approximately one half of ...
... the two wordnets, such as the ILI. Automatic alignment of synsets belonging to different languages is closely related to the task of pairing their word senses. This approach was followed by Matuschek and Gurevych (2013), who solved the word sense alignment (WSA) task by pairing senses with the same ...
... Malta, May. European Language Resources Association (ELRA). Matuschek, M. and Gurevych, I. (2013). Dijkstra-WSA: A graph-based approach to word sense alignment. TACL, 1:151–164. Mladenović, M. and Mitrović, J. (2014). Natural Language Processing for Serbian – Resources and Application, chapter Semantic ...
... correlated with the comprehensiveness of the resource used in the alignment process (Hristea, 2007). Different methods and resources can be used for alignment. A common approach is to take PWN as the source for alignment, together with a bilingual dictionary of English and the target language. There ...
Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
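The candidate-generation step described in the abstract above (obtaining Serbian literals for a PWN synset from a list of English-Serbian translational equivalents) can be illustrated with a minimal Python sketch. The dictionary `EQUIVALENTS`, the function `candidate_literals` and the voting scheme (ranking a candidate by how many English literals of the synset it translates) are hypothetical assumptions, not the authors' system. The English literals are those of the pear synset ENG20-11902751-n quoted in the WS4LR entry below.

```python
from collections import Counter

# Hypothetical English -> Serbian translational equivalents, standing in for a
# list compiled from several bilingual resources (illustrative data only).
EQUIVALENTS: dict[str, list[str]] = {
    "pear": ["kruška"],
    "pear tree": ["kruška", "stablo kruške"],
    "Pyrus communis": ["kruška"],
}


def candidate_literals(english_literals: list[str],
                       equivalents: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Propose Serbian literals for one PWN synset, best-supported first.

    A candidate's score is the number of English literals of the synset
    that list it as a translation equivalent.
    """
    votes: Counter = Counter()
    for literal in english_literals:
        # dict.fromkeys drops duplicates while keeping order, so one English
        # literal votes at most once per candidate
        for serbian in dict.fromkeys(equivalents.get(literal, [])):
            votes[serbian] += 1
    return votes.most_common()


if __name__ == "__main__":
    pear_synset = ["pear", "pear tree", "Pyrus communis"]  # ENG20-11902751-n
    for literal, support in candidate_literals(pear_synset, EQUIVALENTS):
        print(f"{literal}\t{support}")
```

An expert would then post-edit such a ranked list, matching the workflow above in which experts added the literals the system did not offer.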
-
Towards translation of educational resources using GIZA++
... terminology look-up, display and insertion of the search results into the text being translated.
4. ENVIRONMENT FOR TEXT ALIGNMENT
The preliminary phase of text alignment (parallelization) consists of preparing an XML (eXtensible Markup Language) document according to TEI (Text Encoding Initiative) ...
... translation variants in large parallel corpora [17]. Volk et al. argue that automatic word alignment allows for major innovations in searching parallel corpora. Some online query systems already employ word alignment for sorting translation variants, but they describe the system for efficiently searching ...
... texts: English on the left and Serbian on the right. The document with the extension _fs contains the information about paired segments. The method used for alignment is based on the number of characters (the length of the segment). This approach is very successful (on average, as many as 96% correctly paired ...
Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević. "Towards translation of educational resources using GIZA++" in The Seventh International Conference on e-Learning (eLearning-2016), September 2016, Belgrade : Metropolitan University (2016)
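The idea behind length-based alignment is that a sentence and its translation tend to have proportional lengths in characters. The sketch below is a simplified dynamic-programming aligner in that spirit; it is not the implementation used by the authors or by XAlign. It allows 1:1, 1:0, 0:1, 2:1 and 1:2 beads, scores each bead by the absolute difference of character counts plus a fixed penalty for non-1:1 beads, and backtracks along the cheapest path. The penalty values and the function name `align_by_length` are assumptions chosen for illustration.

```python
from typing import Optional

# Bead types: (source sentences, target sentences, extra penalty).
# Non-1:1 beads pay a hand-picked penalty (assumed values, not tuned).
SKIP_PENALTY = 20
MERGE_PENALTY = 10
MOVES = [(1, 1, 0), (1, 0, SKIP_PENALTY), (0, 1, SKIP_PENALTY),
         (2, 1, MERGE_PENALTY), (1, 2, MERGE_PENALTY)]


def align_by_length(src: list[str], tgt: list[str]) -> list[tuple[list[int], list[int]]]:
    """Align two sentence lists using only character lengths.

    Each returned bead pairs source indices with target indices,
    e.g. ([0], [0]) for 1:1 or ([1, 2], [1]) for 2:1.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: list[list[Optional[tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    # Row-by-row order guarantees every predecessor cell is already final.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in src[i:ni])
                tgt_len = sum(len(t) for t in tgt[j:nj])
                new_cost = cost[i][j] + abs(src_len - tgt_len) + penalty
                if new_cost < cost[ni][nj]:
                    cost[ni][nj] = new_cost
                    back[ni][nj] = (i, j)
    # Follow back-pointers from (n, m) to (0, 0) to recover the beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


if __name__ == "__main__":
    english = ["Hello world.", "How are you?"]
    serbian = ["Zdravo svete.", "Kako si?"]
    print(align_by_length(english, serbian))  # [([0], [0]), ([1], [1])]
```

Established length-based aligners, such as Gale and Church's, replace the absolute difference with a cost derived from a statistical model of length differences; the simpler cost here only keeps the dynamic-programming structure visible.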
-
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if domain terminology exists for a source language, as well as a domain-aligned corpus for the source and a target language, then it is possible to extract the terminology for the target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for the source language, a terminology extractor for the target language, and a ... and chunk alignment. In this first experiment the source language is English, the target language is Serbian, and the domain is Library and Information Science, for which a bilingual terminological dictionary exists. Our term extractor is based on e-dictionaries and shallow parsing, and for word alignment we use GIZA++ ...
... idiomatic expressions using automatic word-alignment. In Proceedings of the EACL 2006 Workshop on Multi-word expressions in a multilingual context, pages 33–40. Och, F. J. and Ney, H. (2003). A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1):19–51 ...
... different Serbian domain phrases, containing 515 Serbian phrases that were not present in the existing domain terminology.
Keywords: aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection
1. Motivation
Terminology is rapidly developing in many research and ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
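The core of the "bait" idea, reading off the target-language counterpart of a known source-language term from word-alignment links, can be sketched as below. This is a minimal, hypothetical illustration rather than the authors' procedure. It assumes alignment links in the widely used `i-j` text format (source position, target position), as produced for example by symmetrizing GIZA++ output, and applies the standard consistency check from phrase extraction. The function names and the example sentence pair are assumptions.

```python
from typing import Optional


def parse_alignment(line: str) -> set[tuple[int, int]]:
    """Parse links such as '0-0 1-2 2-1' into (source, target) index pairs."""
    links = set()
    for pair in line.split():
        s, t = pair.split("-")
        links.add((int(s), int(t)))
    return links


def project_term(start: int, end: int,
                 links: set[tuple[int, int]]) -> Optional[tuple[int, int]]:
    """Return the target span aligned to the source span [start, end] (inclusive).

    The span is rejected (None) if the source words are unaligned or if a
    target word inside the span is linked to a source word outside the term.
    """
    targets = [t for s, t in links if start <= s <= end]
    if not targets:
        return None
    t_start, t_end = min(targets), max(targets)
    for s, t in links:
        if t_start <= t <= t_end and not (start <= s <= end):
            return None
    return t_start, t_end


if __name__ == "__main__":
    english = "the digital library is open".split()
    serbian = "digitalna biblioteka je otvorena".split()
    links = parse_alignment("1-0 2-1 3-2 4-3")
    span = project_term(1, 2, links)  # English term "digital library"
    if span is not None:
        print(" ".join(serbian[span[0]:span[1] + 1]))  # digitalna biblioteka
```

In the setting described above, such a projected Serbian span would only be a candidate: it would still have to be matched against the output of the Serbian term extractor and normalized for inflection.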
-
The shear strength evaluation of rough and infilled joints and its indications for stability of rock cutting in schist rock mass
Construction of the E75 highway section through the Grdelica gorge was one of the most demanding projects realized in recent Serbian history. The approximately 25 km long alignment consists of several dozen bridges, two tunnels, three galleries, and cuts with a total length of 6 km. The alignment passes through a highly anisotropic Palaeozoic schist rock formation of different weathering grades. This study focuses on the shear strength properties of discontinuities, which are found to be the critical feature contributing to the occurrence ...
Dušan Berisavljević, Zoran Berisavljević, Svetlana Melentijević. "The shear strength evaluation of rough and infilled joints and its indications for stability of rock cutting in schist rock mass" in Bulletin of Engineering Geology and the Environment, Springer (2022). https://doi.org/10.1007/s10064-022-02580-8
-
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020)
In science, industry and many research fields, terminology is developing rapidly. Most often, the language that serves as the "lingua franca" for most of these fields is English. As a consequence, in many fields domain terms are coined in English and later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language, and a chunk alignment tool. We examine the performance of the method on the domain ...
... interface of this module is displayed in Figure 3.
[Figure 3. Input module of the BiLTe Web application]
Alignment and Post-Processing Module
Aligning with GIZA++ yields a so-called "phrase table". The alignment works in the following way. GIZA++ reads the two input texts in parallel. Whenever two bilingual ...
... English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for a target language and a tool for chunk alignment. We examine the performance of the method on a Library and Information Science domain. The obtained results, as well as the application that ...
... Section 7.
2 Related Work
Over the past years, in order to compile bilingual lexica, researchers have used various techniques for MWT extraction and alignment that differ in methodology, resources used, languages involved and the purpose for which they were built. Bilingual lexica were compiled for different ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
-
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
-
Integrisano okruženje za pripremu paralelizovanog korpusa [An integrated environment for the preparation of an aligned corpus]
The development of aligned (parallelized) corpora requires the preparation of parallel texts for their integration into an aligned corpus. This is a complex task that can be solved in different ways and that has to be carried out in several steps. This paper first presents the procedure for preparing parallel texts for the aligned corpus used by the Language Technology Group of the University of Belgrade. It then gives a brief overview of the programs (XAlign, Concordancier, WS4LR), that is, the software tools used in this process. The lack of a comfortable environment ...
... files in TMX format into files of individual languages
• text verticalization
All of these functions are available through the Alignment, Tools and TMX menus. The Alignment menu provides a GUI for the alignment software packages of the Loria laboratory. Individual items in the menu enable the use of each ...
... Encoding Initiative) consortium recommendations, and their alignment is performed at the level of paragraphs and sentences. We then give an overview of the software, namely programs (XAlign, Concordancier, WS4LR) that are used for alignment. The absence of a comfortable environment with a graphical ...
... construction of this environment we chose the C# programming language. Among other things, ACIDE provides a graphical user interface (GUI) for alignment and visualization of aligned texts, their control and correction, as well as generation of files in TMX format. ACIDE also enables the decomposition ...
Ivan Obradović, Ranka Stanković, Miloš Utvić. "Integrisano okruženje za pripremu paralelizovanog korpusa" in Zbornik radova međunarodnog simpozijuma Razlike između bosanskog/bošnjačkog, hrvatskog i srpskog jezika, Graz, Austria, April 2007 (2007)
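One of the functions listed for this environment, decomposing a TMX file into separate files for individual languages, can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustrative sketch, not ACIDE's C# code. It assumes a TMX document whose translation units (`<tu>`) contain one `<tuv>` per language, identified by `xml:lang` (TMX 1.4) or by the older `lang` attribute, and it returns one list of segments per language; the function name `split_tmx` is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_tmx(tmx_text: str) -> dict[str, list[str]]:
    """Split a TMX document into one list of segments per language.

    Language codes are lower-cased. If every <tu> has one <tuv> per language,
    item k of each list comes from the same translation unit.
    """
    root = ET.fromstring(tmx_text)
    by_lang: dict[str, list[str]] = {}
    for tu in root.iter("tu"):
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older TMX versions use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang is None or seg is None:
                continue
            # itertext() also collects text inside inline markup such as <ph>.
            text = "".join(seg.itertext()).strip()
            by_lang.setdefault(lang.lower(), []).append(text)
    return by_lang


if __name__ == "__main__":
    sample = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Digital library</seg></tuv>
<tuv xml:lang="sr"><seg>Digitalna biblioteka</seg></tuv></tu>
</body></tmx>"""
    for lang, segments in split_tmx(sample).items():
        print(lang, segments)
```

Writing each list to its own file, one segment per line, would then give the per-language files; keeping the segment order is what lets those files be re-paired later.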
-
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities
This paper presents activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, consisting of semantic annotations and a word sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can be used as a multilingual benchmark for the evaluation of European languages with low and medium levels of resource development. The focus of this paper is on multiword expressions and named entities: their recognition in the ELEXIS-sr sentence set and comparison with annotations in other languages. It discusses the first steps ...
Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
-
Keyword-Based Search on Bilingual Digital Libraries
This paper outlines the main features of Bibliša, a tool that offers various possibilities for enhancing queries submitted to large collections of aligned parallel texts residing in a bilingual digital library. Bibliša supports keyword queries as an intuitive way of specifying information needs. Keyword queries, initiated in Serbian or English, can be expanded semantically, morphologically, and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...
Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
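The three kinds of query expansion described above (semantic, morphological and cross-lingual) can be illustrated with a minimal sketch. The tiny dictionaries below are hypothetical stand-ins for the wordnets, electronic dictionaries and bilingual resources the tool actually consults, and the OR/AND string is an assumed notation, not Bibliša's query syntax. The sketch only shows how one keyword turns into a group of alternative search terms.

```python
# Hypothetical toy resources (illustrative data only).
SYNONYMS: dict[str, list[str]] = {"biblioteka": ["knjižnica"]}
TRANSLATIONS: dict[str, list[str]] = {"biblioteka": ["library"]}
INFLECTED_FORMS: dict[str, list[str]] = {
    "biblioteka": ["biblioteke", "biblioteci", "biblioteku", "bibliotekom"],
    "knjižnica": ["knjižnice", "knjižnici", "knjižnicu", "knjižnicom"],
    "library": ["libraries"],
}


def expand_keyword(keyword: str) -> list[str]:
    """Expand one keyword semantically, into the other language, and morphologically."""
    lemmas = [keyword]
    lemmas += SYNONYMS.get(keyword, [])       # semantic expansion
    lemmas += TRANSLATIONS.get(keyword, [])   # cross-lingual expansion
    terms: list[str] = []
    for lemma in lemmas:
        # morphological expansion: the lemma plus its inflected forms
        for form in [lemma, *INFLECTED_FORMS.get(lemma, [])]:
            if form not in terms:
                terms.append(form)
    return terms


def to_query(keywords: list[str]) -> str:
    """Join keywords with AND; each keyword becomes an OR-group of its expansions."""
    return " AND ".join("(" + " OR ".join(expand_keyword(k)) + ")" for k in keywords)


if __name__ == "__main__":
    print(to_query(["biblioteka"]))
```

Grouping the expansions of each keyword with OR is what raises recall; how the groups are combined (AND or OR) is a separate choice left to the user.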
-
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities for enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...
... two alignment tools developed by LORIA (Laboratoire lorrain de recherche en informatique et ses applications), one for automatic sentence alignment of texts (Xalign, http://led.loria.fr/outils/ALIGN/align.html), and another for alignment visualization and manual correction of alignment errors ...
... improving recall in the case when the user opts for the "AND" search (Section 5).
7. Acknowledgements
Preprocessing of texts and correction of alignment were done by Jelena Andonovski, PhD student at the Faculty of Philology. This research was supported by the CESAR (Central and South-East European ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
-
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age, digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia, 18 of the 308 regularly published scientific journals are bilingual, with papers simultaneously being in English ...
... Concordancier, developed at the Loria laboratory in France (Laboratoire Lorrain de Recherche en Informatique et ses Applications), are used for alignment. The alignment method is based on the number of characters (length of the segment). Utvić reports that this approach is very successful (as much as 96% ...
... native XML DBMS database to enterprise NoSQL. In one platform, it combines a database, search engine and application services. The preliminary alignment phase consists of preparing an XML (eXtensible Markup Language) document according to TEI (Text Encoding Initiative) guidelines. Practically, ...
... well-formedness checking and validation according to a DTD (Document Type Definition) or XML Schema can be used for that purpose. The next key step is the alignment itself: the task is to establish relations between translation equivalents in both texts. In this case, the segments that are paired usually represent ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceedings of the 10th International Forum on Knowledge Asset Dynamics, IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
-
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year-long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach ...
... Similarly, comparison and partial alignment of the DSA tag set was done with OntoLex (https://www.w3.org/community/ontolex/wiki/Final_Model_Specification) and LexInfo (http://www.lexinfo.net/ontology/2.0/lexinfo), but a more precise and detailed alignment is envisaged. The dictionary article ...
... particular sense of a lexical entry, and to link a lexical entry with a set of senses. Each sense can be related ...
... guidelines for dictionary writing were used to define the rules for the segmentation of the dictionary articles, the pattern recognition, and the alignment of the recognized markers with the predefined categories, as described in the previous section. The dictionary article units that were recognized ...
Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021)
The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both born-digital and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, the data used for compiling the terminological database, the procedure by which it has ...
Keywords: raw materials, mining, terminology, dictionary, terminology application, mobile application, digitization, lexical data, corpora, linked open data
... Each individual sense can be related to one or more other terms in the dictionary, and it can be followed by its bibliographic source. Digitization of DMMRT yielded 28,757 terms with a total of 37,188 sense definitions, where 24,115 terms have only one sense, 2,942 have 2, 890 have 3, 641 ...
... related to semantics, such as finding definitions or identifying senses in two distinct processes: word-sense disambiguation (attributing the correct sense from a predefined set of senses) and word-sense induction (clustering of senses based on word context). Also, integration of results into linked open ...
... terms from different resources, one of the reasons for aligning terms from multiple dictionaries (paper and electronic) was to assess term usage, which determines its importance for raw material terminology. On the other hand, alignment of terms with SrpMD was necessary, since these dictionaries ...
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
-
WS4LR - a Workstation for Lexical Resources
... ENG20-11902751-n n pear<SENSE>2</SENSE> pear tree<SENSE>1</SENSE> Pyrus communis<SENSE>1</SENSE> ENG20-11902961-n hyp ...
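... ENG20-11902751-n krusxka<SENSE>1</SENSE> Pyrus communis<SENSE>1</SENSE> Vocxka s glatkim listom i belim cvetovima, plodovi su slatki i ... (ASCII-transliterated Serbian gloss: "a fruit tree with smooth leaves and white flowers; the fruits are sweet and ...")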
... them forming a semantic network (Fellbaum, 1998). Each synset word or "literal" is denoted by a "literal string" followed by a "sense tag" which represents the specific sense of the literal string in that synset, much as in any explanatory dictionary, where an entry corresponding to a word is ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović. "WS4LR - a Workstation for Lexical Resources" in Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, May 2006, ELRA - European Language Resources Association (2006)
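The literal-plus-sense-tag encoding described above can be read into (literal, sense) pairs with a small regular-expression parser. This sketch assumes that each literal is written as `literal<SENSE>n</SENSE>`, optionally wrapped in other tags such as `<LITERAL>`, and that nothing but whitespace or tags separates consecutive literals; the pattern and the function name `parse_literals` are illustrative, not WS4LR code.

```python
import re

# A literal string followed by its sense tag, e.g. "pear tree<SENSE>1</SENSE>".
# The literal may contain spaces (multi-word literals) but not '<' or '>'.
LITERAL_RE = re.compile(r"\s*([^<>]+?)\s*<SENSE>\s*(\d+)\s*</SENSE>")


def parse_literals(text: str) -> list[tuple[str, int]]:
    """Return (literal string, sense number) pairs in document order."""
    return [(m.group(1), int(m.group(2))) for m in LITERAL_RE.finditer(text)]


if __name__ == "__main__":
    synset = "pear<SENSE>2</SENSE> pear tree<SENSE>1</SENSE> Pyrus communis <SENSE>1</SENSE>"
    for literal, sense in parse_literals(synset):
        print(f"{literal} (sense {sense})")
```

Keeping the sense number next to the literal string is what lets the same string, such as "pear", appear in several synsets with different senses.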
-
Softverski alati za korišćenje resursa za srpski jezik [Software tools for using Serbian language resources]
Ivan Obradović, Ranka Stanković (2008)
... smaller, such as words. The second step is the alignment of segmented parallel texts by means of one of the available alignment methods. The goal is to connect equivalent segments in two or more parallel texts. The method usually used for alignment at the sentence level, which is the most common ...
... a representation of each concept by a set of synonymous word-sense pairs that represent the basis for the central element of this base, namely the synset. The use of the word-sense pair is based on the approach used in standard dictionaries of spoken languages, where ...
... and its hypernyms
3.4 Aligned texts
WS4LR contains a module for processing parallel texts that have previously been aligned using the text alignment tool XAlign (Bonhomme et al., 2001). The module enables the transformation of texts aligned by XAlign into different formats: textual, XML, tabular ...
Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
-
E-Connecting Balkan Languages
In this paper we present a versatile language processing tool that can be successfully used for many Balkan languages. This tool relies for its work on several sophisticated textual and lexical resources that were developed for most Balkan languages. These resources are based on several de facto standards in natural language processing. ...
... text logical layout. At the beginning of the alignment process, all segments coincided with sentences automatically tagged by Unitex. The XAlign system [1] was used for the alignment process. Starting from the French version, the goal of the alignment was to establish 1:1 relations on the segment ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva. "E-Connecting Balkan Languages" in Proceedings of the Workshop on Multilingual resources, technologies and evaluation for Central and Eastern European Languages, 17 September 2009, eds. C. Vertan, S. Piperidis, E. Paskaleva and Milena Slavcheva, Borovets, Bulgaria : Association for Computational Linguistics Stroudsburg, PA, USA (2009)
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were then used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with the rule- and lexicon-based system, were evaluated on ...
... the same) or weighted, where partial overlap is taken into account, but with some weighted value to measure the overlapping segment. To indicate the alignment type, one can choose between two options: the first option is greedyMatching, where the matching of annotations in the first and second files ...
... 2 × 3 × 4 evaluation rounds: two test sets, three NERs and four models for each. All trials were run with the strict matching type and the maxMatching alignment type. To indicate the chosen score type to evaluate the correspondence between one annotation from the first file and one annotation from the second ...
... entities, classes, attributes per document and collection; the Gemini tool allows comparison of two text annotation files and provides different alignment scores. It is possible to compare a pair of XML files, a pair of files in BRAT format, and one XML file against a file in BRAT format. The first ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
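Under strict matching, a predicted annotation counts as correct only if it coincides exactly with a gold annotation. A minimal sketch of the resulting precision, recall and F1 follows. It represents annotations as hypothetical (start, end, type) tuples, with "PERS" as an assumed label for personal names, and it does not implement Gemini's weighted score or its greedyMatching and maxMatching alignment options.

```python
Span = tuple[int, int, str]  # (start offset, end offset, entity type)


def strict_prf(gold: set[Span], predicted: set[Span]) -> tuple[float, float, float]:
    """Precision, recall and F1 counting only exact (start, end, type) matches."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {(0, 12, "PERS"), (30, 41, "PERS")}
    predicted = {(0, 12, "PERS"), (30, 38, "PERS"), (50, 55, "PERS")}
    # (30, 38) overlaps a gold name only partially, so strict matching rejects it.
    p, r, f = strict_prf(gold, predicted)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.33 R=0.50 F1=0.40
```

A weighted score would instead give the partially overlapping span (30, 38) some credit, which is why strict figures are the more conservative of the two.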
-
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on the TreeTagger and spaCy taggers, and on the annotation schema alignment between Serbian morphological dictionaries, MULTEXT-East and the Universal Part-of-Speech tagset. The trained models will be used to publish the new ...
... preparation of training sets to be used for different taggers and tagsets in the future. The research was focused on annotation schemata alignment between the Serbian morphological dictionaries tagset (presented briefly in Subsection 2.1.), the MULTEXT-East tagset (Erjavec, 2012), and the Universal ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
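At its coarsest level, the tagset alignment described above is a mapping from one tag inventory onto another. The sketch below maps the part-of-speech letter of a MULTEXT-East morphosyntactic description (MSD) to a Universal POS tag. It is a simplified, partial mapping assumed for illustration only (for example, it treats every `Va` verb as AUX and every `Np` noun as PROPN). It is not the mapping used by the authors, and it ignores the Serbian morphological dictionaries tagset and all feature-level correspondences.

```python
# Simplified, partial MULTEXT-East category -> UPOS mapping (illustrative only).
CATEGORY_TO_UPOS: dict[str, str] = {
    "N": "NOUN", "V": "VERB", "A": "ADJ", "P": "PRON", "R": "ADV",
    "S": "ADP", "C": "CCONJ", "M": "NUM", "Q": "PART", "I": "INTJ",
    "Y": "X", "X": "X", "Z": "PUNCT",
}


def msd_to_upos(msd: str) -> str:
    """Map a MULTEXT-East MSD such as 'Ncmsn' or 'Var3s' to a UPOS tag."""
    if not msd:
        raise ValueError("empty MSD")
    category = msd[0]
    subtype = msd[1] if len(msd) > 1 else ""
    # A few refinements that depend on the second MSD position (the type).
    if category == "N" and subtype == "p":
        return "PROPN"  # proper noun
    if category == "V" and subtype == "a":
        return "AUX"    # auxiliary verb
    if category == "C" and subtype == "s":
        return "SCONJ"  # subordinating conjunction
    return CATEGORY_TO_UPOS.get(category, "X")  # unknown categories fall back to X


if __name__ == "__main__":
    for tag in ["Ncmsn", "Npfsn", "Var3s", "Cs", "Z"]:
        print(tag, msd_to_upos(tag))
```

A real harmonization would also need the reverse direction and the feature layer (gender, case, number), which is where most of the work in aligning annotation schemata lies.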