Search
88 items
-
The SASA Dictionary as a Basis for Terminological Dictionaries (the Example of a Culinary Dictionary)
... extraction of multi-word lexical units, selected on the basis of terms that are significantly more frequent in culinary texts than in the corpus of contemporary Serbian. Using this approach, we were able to identify an extremely rich collection of terms for the culinary lexicon contained ... Рада Стијовић, Олга Сабо, Ранка Станковић. "The SASA Dictionary as a Basis for Terminological Dictionaries (the Example of a Culinary Dictionary)" in Slavic Terminology Today, Belgrade: Serbian Academy of Sciences and Arts (2017)
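The abstract selects terms that are significantly more frequent in culinary texts than in a reference corpus of contemporary Serbian. It does not name the statistic used. The sketch below is for illustration only and assumes Dunning's log-likelihood (G²) keyness score. The function names, the example phrases and counts, and the 10.83 cut-off (roughly p < 0.001 at one degree of freedom) are all hypothetical.

```python
import math
from collections.abc import Mapping


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G2 for a term seen a times in a domain corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the domain corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def key_terms(domain: Mapping[str, int], domain_size: int,
              reference: Mapping[str, int], reference_size: int,
              threshold: float = 10.83) -> list[tuple[str, float]]:
    """Return (term, G2) pairs for terms over-represented in the domain corpus,
    ranked from most to least key. The threshold is an assumed cut-off."""
    ranked = []
    for term, a in domain.items():
        b = reference.get(term, 0)
        # Skip terms that are not relatively more frequent in the domain text.
        if a / domain_size <= b / reference_size:
            continue
        g2 = log_likelihood(a, b, domain_size, reference_size)
        if g2 >= threshold:
            ranked.append((term, g2))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts, for illustration only.
domain_counts = {"pržiti luk": 30, "dodati so": 25, "u to vreme": 2}
reference_counts = {"pržiti luk": 5, "dodati so": 10, "u to vreme": 500}
for term, score in key_terms(domain_counts, 10_000, reference_counts, 1_000_000):
    print(f"{term}\t{score:.1f}")
```

The first check keeps only terms whose relative frequency is higher in the domain corpus. This matters because G² alone also flags terms that are significantly *less* frequent there.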
-
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task, the procedure relies on data in the e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ... Keywords: electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion ... in C# and operates on the .NET platform. It supports the development, maintenance and exploitation of various resources: e-dictionaries, wordnets, and aligned texts. A user of this tool need not use all of these resources, or even possess them, but those that exist are visible in all modules and can be exploited ...
... The calculation is performed on the basis of the dictionaries described in [10] and [11], which are part of the standard distribution of Unitex [12], a corpus processing system based on finite-state technology. Table 1. Initial content of the Serbian m ... Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
-
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007) In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module, which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ... ... Management of aligned parallel texts. Parallel texts, which usually originate from a text in one language and its translation into another, are often aligned at a certain level (paragraph, sentence, etc.) by matching the corresponding segments of the original and its translation. Aligned parallel texts ...
... WS4LR module for management of aligned parallel texts uses texts which have previously been aligned using Xalign as an alignment tool [3]. The module converts these texts to the Translation Memory eXchange (TMX) format, which is becoming the standard format for aligned texts. Figure 4 depicts the ...
... similar manner. Figure 8. Aligned texts with highlighted words. Another, more complex option is to use aligned texts. If PWN is used for the source synset, then the language of one of the parallel texts must be English. Namely, WS4LR allows the user to search aligned texts using words from both ... Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007 (2007)
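The WS4LR module converts previously aligned texts to the Translation Memory eXchange (TMX) format. The sketch below is not the WS4LR code, which is written in C#. It is a minimal Python illustration of how sentence pairs that are already aligned could be serialized as a TMX 1.4 document using only the standard `xml.etree.ElementTree` module. The `to_tmx` function, the header attribute values and the example sentences are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the reserved "xml:lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tmx(pairs, src_lang="en", tgt_lang="sr"):
    """Serialize aligned (source, target) segment pairs as a TMX 1.4 string."""
    tmx = ET.Element("tmx", version="1.4")
    # TMX 1.4 requires these header attributes; the values here are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "aligned-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


print(to_tmx([("The mine is closed.", "Рудник је затворен.")]))
```

Each aligned pair becomes one `<tu>` with one `<tuv>` per language. This is the unit that translation-memory tools exchange, whatever aligner (Xalign, in the original setup) produced the pairs.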
-
Named Entity Recognition for Distant Reading in ELTeC
Francesca Frontini, Carmen Brando, Joanna Byszuk, Ioana Galleron, Diana Santos, Ranka Stanković (2020) The COST Action "Distant Reading for European Literary History", which started in 2017, has among its main goals the creation of an open-source multilingual collection of European literary texts (ELTeC). In this paper we present the work carried out on the manual annotation of a selection of the ELTeC collection for named entities. We also evaluate existing named entity recognition tools for their ability to produce such annotations automatically. The final paragraph discusses the common ground between this initiative and CLARIN. ... IN services and tools, with some ideas for possible collaboration. 2 Developing the NE layer of the ELTeC corpus 2.1 Desiderata and annotation set NER is a well-known task in NLP, and there are several sets of guidelin ...
... Table 1: Data on the manually NE-annotated corpus. 2.2 Current state of the corpus The NE annotation of the corpus is part of the plan for the so-called level 2 annotation, w ... Francesca Frontini, Carmen Brando, Joanna Byszuk, Ioana Galleron, Diana Santos, Ranka Stanković. "Named Entity Recognition for Distant Reading in ELTeC" in CLARIN Annual Conference 2020, Oct 2020, Virtual Event, France, CLARIN (2020)
-
Whose Example Is It? An Analysis of Lexical Features in Examples from the SASA Dictionary
This paper asks whether the author of a text can be identified by analyzing its lexical features alone. To try to answer this question, we examined the examples within the dictionary entries of individual lexemes in the SASA Dictionary, recorded in five volumes (namely I, II, XVIII, XIX and XX). Each example is taken from a source, indicated by the abbreviations given in parentheses. Of the more than 5,000 available sources, we chose ... ... 1959 and (supplemented) 2017. Utvić 2014: Miloš Utvić, The construction of reference corpus of contemporary Serbian [Izgradnja referentnog korpusa savremenog srpskog jezika] (Doctoral dissertation, University of Belgrade). Fekete 1993: ... Бранислава Б. Шандрих, Ранка М. Станковић, Мирјана С. Гочанин. "Whose Example Is It? An Analysis of Lexical Features in Examples from the SASA Dictionary" in Serbian Language and Its Resources, International Slavic Center, Faculty of Philology, University of Belgrade (2019). https://doi.org/10.18485/msc.2019.48.3.ch13
-
Electronic Dictionaries - from File System to lemon Based Lexical Database
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate, on Serbian, the problems encountered. We have identified four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of support for some types of categories. At the same time, various descriptions often describe exactly the same ... ... existing SMD to LexInfo, as a catalog of data categories (e.g., to denote gender, number, part of speech, etc.). Footnote 3: Unitex is a lexically-based corpus processing suite that offers strong support for finite-state processing using morphological dictionaries (http://unitexgramlab.org/). Figure 1: ... Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics : Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020) Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ... ... represents the results of our evaluations on the aligned senses. The degree indicates the distribution of the alignments with respect to the senses. For instance, a degree of 1.182 (k1) in the case of Russian shows that every sense is aligned with at least one other sense. On the other hand, a low degree ...
... present a manually annotated dataset for WSA between the English WordNet and Wiktionary. On the other hand, there are fewer manually aligned monolingual resources in other languages. For instance, there have been considerable efforts in aligning lexical semantic resources (LSRs) in German ...
... WordNet synsets. This method yields a sense inventory of higher coverage in comparison to taxonomy mapping techniques where Wikipedia categories are aligned to WordNet synsets (Ponzetto and Navigli, 2009). Matuschek and Gurevych present the Dijkstra-WSA algorithm as a graph-based approach (Matuschek ... Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
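The snippet reports a "degree" for the aligned senses but does not define it. This sketch assumes one reading: the average number of alignment links per sense, counted over the senses that take part in at least one alignment. It treats the alignment as a bipartite graph between the senses of two resources. The function name and the data are hypothetical, and the paper's exact definition may differ.

```python
from collections import defaultdict


def alignment_degree(links):
    """Average degree of senses in a bipartite sense-alignment graph.

    links: iterable of (sense_in_resource_A, sense_in_resource_B) pairs.
    Assumed definition: total link endpoints divided by the number of
    distinct senses that appear in at least one link.
    """
    degree = defaultdict(int)
    for a, b in set(links):  # ignore duplicate links
        # Tag each sense with its resource so identical IDs do not collide.
        degree[("A", a)] += 1
        degree[("B", b)] += 1
    if not degree:
        return 0.0
    return sum(degree.values()) / len(degree)


# Hypothetical alignment: sense a1 maps to two senses, a2 to one.
print(alignment_degree([("a1", "b1"), ("a1", "b2"), ("a2", "b3")]))
```

Under this definition the value is 1.0 when every aligned sense has exactly one counterpart. Values above 1 indicate that some senses are aligned to several senses in the other resource.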
-
Developing Termbases for Expert Terminology under the TBX Standard
... envisages integration with cascades for the recognition of named entities such as mining equipment, specific minerals and the like. Building an aligned Serbian-English corpus of texts in the area of mining and geology from sources like the bilingual journal “Underground Mining” is underway. The possibility ...
... statistical machine translation (SMT), an approach developed at IBM in the late 1980s and now the state-of-the-art paradigm in MT. The exponential growth of aligned multilingual corpora greatly improved the efficiency and accuracy of SMT in general, and many tools based on this approach, such as Google Translate ...
... than are still bound to maintain their importance in the case of expert terminology in domains where aligned corpora are sparse [10], such as mining engineering or geology. In order to secure terminological consistency in one or more termbases ... Ranka Stanković, Ivan Obradović, and Miloš Utvić. "Developing Termbases for Expert Terminology under the TBX Standard" in Natural Language Processing for Serbian - Resources and Applications, Belgrade: University of Belgrade, Faculty of Mathematics (2014)