Integrisano okruženje za pripremu paralelizovanog korpusa
The development of aligned corpora requires the preparation of parallel texts for their integration into an aligned corpus. This is a complex task that can be solved in different ways and that has to be carried out in several steps. This paper first presents the procedure for preparing parallel texts for the aligned corpus used by the Human Language Technology Group at the University of Belgrade. It then gives a brief overview of the programs (XAlign, Concordancier, WS4LR), that is, the software tools used in the process. The lack of a convenient environment ...... Varga. The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC'06, ELRA, Paris, 2006. [6] Tomaž Erjavec: Compiling and Using the IJS-ELAN Parallel Corpus. Informatica, 26(3), pp. 299-307 ...
... 299-307, 2002. SUMMARY The development of aligned corpora requires a preparation of parallel texts for their integration into aligned corpora. This is a very complex task, which can be solved in different ways, and which has to be realized in several steps. At the beginning of this paper we ...
... environment for the preparation of aligned corpora, under the name of ACIDE. For the construction of this environment we chose the C# programming language. Among other things, ACIDE provides a graphical user interface (GUI) for alignment and visualization of aligned texts, their control and correction ... Ivan Obradović, Ranka Stanković, Miloš Utvić. "Integrisano okruženje za pripremu paralelizovanog korpusa" in Zbornik radova međunarodnog simpozijuma Razlike između bosanskog/bošnjačkog, hrvatskog i srpskog jezika, Graz, Austria, April 2007 (2007)
Softverski alati za korišćenje resursa za srpski jezik
Ivan Obradović, Ranka Stanković (2008)... other resources, such as the e-corpus of Serbian, as well as parallel multilingual corpora composed of parallel texts or bi-texts, usually comprising two texts of which one is original, and the other its translation. The majority of these parallel texts are aligned, which means that relations are ...
... research related to paraphrasing (Barzilay and McKeown, 2001). The Human Language Technology Group developed several aligned corpora, among them the largest one being the French-Serbian corpus which contains more than a million words (Vitas and Krstev, 2005). 3 WS4LR – a tool for maintenance and integrated ...
... Paris, Masson. Steinberger, R., Pouliquen B., Widiger A., Ignat C., Erjavec T., Tufiş D., Varga D. (2006) “The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages”, Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC’2006). Genoa, Italy, ... Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
Keyword-Based Search on Bilingual Digital Libraries
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Bibliša supports keyword queries as an intuitive way of specifying information needs. The keyword queries initiated, in Serbian or English, can be expanded semantically, morphologically and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ... Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...... for testing our tool. The interest in collections of aligned texts and tools tailored for their search is increasing substantially, primarily due to the growing needs of statistical machine translation. Thus, for example, the OPUS corpus offers freely available parallel corpora in many languages ...
... multilingual proper name databases, which enables, among other things, versatile handling of both monolingual and aligned or comparable texts. LeXimir provides for enhanced querying of aligned texts by using available lexical resources to perform semantic and morphological expansion of queries. The ...
... aimed for search of document collections consisting of aligned parallel texts converted in TMX (Translation Memory eXchange) format. TMX is an open XML-based standard intended for easier exchange of translation memory data, that is, aligned parallel texts, between tools and translation vendors ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
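The snippet above describes aligned parallel texts stored as TMX documents. As a rough illustration only (not Bibliša's implementation), the following Python sketch reads a tiny TMX fragment and searches the aligned pairs for a keyword. The element names `tu`, `tuv`, `seg` and the `xml:lang` attribute come from the TMX standard; the sample sentences, the header attribute values and the function names `read_pairs` and `search` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-unit TMX document (invented sentences, not real library data).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The parallel corpus is aligned.</seg></tuv>
      <tuv xml:lang="sr"><seg>Paralelni korpus je poravnat.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>The library is bilingual.</seg></tuv>
      <tuv xml:lang="sr"><seg>Biblioteka je dvojezična.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_pairs(tmx_text, src="en", tgt="sr"):
    """Return (source, target) segment pairs from every translation unit."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


def search(pairs, keyword):
    """Keep the pairs in which the keyword occurs on either side (case-insensitive)."""
    kw = keyword.lower()
    return [p for p in pairs if kw in p[0].lower() or kw in p[1].lower()]


pairs = read_pairs(SAMPLE_TMX)
for en, sr in search(pairs, "korpus"):
    print(en, "|", sr)
```

A plain substring test stands in here for the semantic and morphological query expansion the paper describes; an expanded query would simply pass several keyword variants to `search`.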
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the data obtained from SrpKor and local grammars assists in making a decision between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while the precision (average 98.93%) is always higher than the recall (average 94.94%). Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
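The abstract above says that dictionaries propose restoration candidates and that corpus data helps choose among them. A minimal Python sketch of that general idea follows. It is a simplification under stated assumptions, not the authors' procedure: the toy `LEXICON` with invented frequency counts stands in for the morphological dictionaries and SrpKor, local grammars are not modelled at all, and only lowercase input is handled.

```python
from itertools import product

# Degraded letters and the characters they may stand for.
OPTIONS = {"c": ["c", "č", "ć"], "s": ["s", "š"], "z": ["z", "ž"]}

# Hypothetical lexicon with invented frequencies (illustration only).
LEXICON = {"kuća": 120, "kuca": 15, "među": 50, "šuma": 40, "suma": 30}


def candidates(word):
    """Generate every spelling obtainable by restoring diacritics in `word`."""
    slots = []
    i = 0
    while i < len(word):
        if word.startswith("dj", i):
            # The digraph "dj" may be a degraded "đ" or a genuine "dj".
            slots.append(["dj", "đ"])
            i += 2
        else:
            slots.append(OPTIONS.get(word[i], [word[i]]))
            i += 1
    return {"".join(combo) for combo in product(*slots)}


def restore(word, lexicon=LEXICON):
    """Pick the most frequent candidate found in the lexicon, else keep the word."""
    known = [c for c in candidates(word) if c in lexicon]
    if not known:
        return word
    return max(known, key=lambda c: lexicon[c])


print(restore("kuca"), restore("medju"), restore("xyz"))
```

Choosing by frequency alone ignores context, which is where the paper's local grammars come in; the sketch only shows the candidate-generation and lookup step.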
Terminological and lexical resources used to provide open multilingual educational resources
Open educational resources (OER) within BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of Western Balkans, Russian and English. University of Belgrade (UB) hosts a central repository based on: BAEKTEL Metadata Portal (BMP), terminological web application for management, browse and search of terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources and OER repository on local edX ...... Internet. This component consists of morphological dictionaries, WordNet, domain specific terminological resources such as GeolISSterm, RudOnto, aligned texts in TMX format, corpora etc. Special attention will be given to Termi, newly developed application for terminology management. Keywords: Open ...
... was recognized 14 most productive patterns that represent structure of MWU terms. They are represented in form of transducers applied on domain corpus to extract terminology. Examples of patterns are presented in [15]. After applying these transducers on domain text extracted potential terms were ...
... , methods and applications,” Knowledge engineering review, vol. 11, No. 2, pp. 93–136, June. 1996. [12] P. Pantel and L. Dekang, “A statistical corpus-based term extractor,” in Advances in Artificial Intelligence, 1st ed., E. Stroulia and S. Matwin, Ed. Springer Berlin Heidelberg, 2001, pp. 36-46 ...Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia 18 of 308 scientific journals regularly published are bilingual, with papers simultaneously being in English ...... documents). XAlign is now integrated into Unitex, a corpus processing system, based on automata-oriented technology (Utvić et al., 2007). Text preparation, alignment and generation of TMX documents are done within a special-purpose tool ACIDE (Aligned Corpora Integrated Development Environment) (Utvić ...
... She has developed the Serbian morphological e-dictionary, and was one of the key contributors to the development of Serbian wordnet, aligned multilingual corpus, and many other language resources. She also developed the first Serbian Named Entity Recognition system. She participated in a number ...
... lexical resources, access to aligned resources, etc.) 4 System components In designing Bibliša special attention is given to its language support component. It supports various aspects of multilingual libraries: its content is not only multilingual, but also aligned and it can be searched in any ...Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceeding of 10th International Forum on Knowledge Asset Dynamics — IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ...... present briefly this system and how it was used to produce the corpus of newspaper texts annotated with personal names – the gold standard. Section 3 describes NER systems based on Machine Learning methods that were trained on the corpus derived from the gold standard, while the evaluation and discussion ...
... NER methods and tools with an ultimate goal to produce a successful hybrid system. The important next step is the enhancement of our newspaper corpus with other types of text (Wikipedia articles, domain texts, literary texts). The literary texts would be particularly important for improving the ... Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encoding was used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text of and transform it into structured data stored in relational lexical database. This approach ...... n about provenance and use. The class References contains metadata about the bibliographic information from the dictionary corpus. The set of markers is partially aligned with the TEI elements (and attributes) and LexInfo in order to relate the lexical data to other resources and provide automatic ...
... References Ahačič, K., Ledinek, N., & Perdih, A. (2015). Fran: The Next Generation Slovenian Dictionary Portal. In Natural Language Processing, Corpus Linguistics, Lexicography. Eight International Conference Bratislava, Slovakia, pp. 21-22. Berg, D. L., Gonnet, G. H., & Tompa, F. W. (1988). The ...Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
Белешка о дигитализацији речника
The paper analyzes the limitations arising from the linear process of traditional dictionary production, using the SASA Dictionary as an example. The way to overcome these limitations is to build an electronic lexicographic database that is not merely a digital transcription of the paper edition of the dictionary. It is particularly pointed out that the text of a dictionary can itself serve as a corpus, and selected examples are given of the analysis of such a corpus compiled from the texts of the 1st and 19th volumes of the SASA Dictionary. ... dictionary edition. It is additionally stressed that a dictionary text represents itself a valuable corpus for various research purposes which is illustrated by a few examples of the analysis of such corpus compiled from the first and 19th volume of the SASA Dictionary. ... Душко М. Витас, Цветана Ј. Крстев, Ранка М. Станковић. "Белешка о дигитализацији речника" in Српски језик и његови ресурси, Међународни славистички центар, Филолошки факултет, Универзитет у Београду (2019). https://doi.org/10.18485/msc.2019.48.3.ch3
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...... lemmatization for which TreeTagger trained for Serbian is available, already used for lemmatization of the Corpus of Contemporary Serbian [25]. However, this lemmatizer was trained on a general corpus that differs significantly from domain corpora, such as our textual database of geological projects, and ...
... much as possible [13]. These local grammars were organized in cascades that further resolve ambiguities [16]. The NER system was evaluated on a newspaper corpus and results reported in [13] showed that F-measure of recognition was 0.96 for types and 0.92 for tokens. Footnote 3: Tokens are all occurrences (in this ...
... B., Nikolić, V.: The development of the geolissterm terminological dictionary. INFOtheca 12(1), 49a–63a (2011) 25. Utvić, M.: Annotating the corpus of contemporary Serbian. INFOtheca - J. Inform. Librariansh. 12(2), 36a–47a (2011) 26. Vitas, D., Popović, L., Krstev, C., Obradović, I., Pavlo ... Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
Creation of a Training Dataset for Question-Answering Models in Serbian
The development and application of artificial intelligence in language technologies have advanced significantly in recent years, especially in the task of question answering (QA). While existing resources for QA tasks have been developed for the major world languages, Serbian has been relatively neglected in this area. This paper presents an initiative to create a large and diverse dataset for training question-answering models in Serbian, which will contribute to the advancement of language technologies for Serbian. In addition to numerous studies on language models ... artificial intelligence, natural language processing, language resources, annotated datasets, information extraction, question answering. Ranka Stanković, Jovana Rađenović, Maja Ristić, Dragan Stankov. "Creation of a Training Dataset for Question-Answering Models in Serbian" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012)... as for example SrpNet. A reference corpus of contemporary Serbian in Ekavian dialect is available, as well as several parallel aligned corpora, all of which are available to researchers of Serbian. Current research is focused on upgrading the reference corpus and expanding it with the Ijekavian ...
... versity of Novi Sad. The AlfaNum company has a considerable number of users among Serbian companies. The first corpus of contemporary Serbian, an electronic morphological dictionary of Serbian, aligned French-Serbian and English-Serbian corpora of literary texts, as well as different software tools were developed ...
... developed within the scope of bilateral cooperation with France, whereas a one-million aligned English-Serbian project, lemmatised and morphologically annotated, was developed within the scope of the Intera project. This corpus was used for tagger training, as well as for experiments in alignment at the word ... Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
Глаголи у кухињи и за столом
Цветана Крстев, Биљана Лазић (2015) The paper presents a study of the Serbian lexicon of the culinary domain, based on the use of a domain corpus, electronic lexical resources, primarily WordNet and morphological dictionaries, and local grammars. The domain-specific features of these resources are presented, along with how they are used and how they complement one another. In particular, it is shown how verbs specific to the culinary domain can be extracted using the domain corpus and how their usage can be described. A list of verbs with basic data, obtained by applying the presented methods, is given. automatic processing, finite-state transducers, electronic dictionaries, semantic networks, local grammars, cooking ... and at a Table Summary In this paper we present a research of the lexica of the culinary domain in Serbian based on the use of the domain corpus, electronic lexical resources – WordNet and morphological dictionaries – and local grammars. We presented the domain characteristics of these resources ...
... can be used for research and for mutual enrichment. In more details we showed how verbs used in the culinary domain can be extracted from the domain corpus and how the extracted information can be used to describe these verbs. At the end we gave the list of all verbs extracted using the described approach ... Цветана Крстев, Биљана Лазић. "Глаголи у кухињи и за столом" in Научни састанак слависта у Вукове дане - Српски језик и његови ресурси: теорија, опис и примене, Вол. 44/3, Београд : Међународни славистички центар (2015)
Ontološki model upravljanja rizikom u rudarstvu
Olivera Kitanović (2021) Mining production involves complex technological systems, which creates the need to establish and improve risk management systems. The heterogeneity and volume of the data required for risk management call for a system that integrates them flexibly and enables their optimal use. The main goal of this dissertation is to develop an ontology for the mining domain and a risk management model based on it. Its realization also involves implementing information extraction algorithms for populating the ontology, as well as a corresponding software solution. The development of the model also includes a significant expansion of the mining corpus, as ... mining, risk, risk management, risk assessment, ontology, semantic network, information extraction, knowledge management, computational linguistics ... solution. The development of the model includes a significant expansion of the mining corpus, as well as the creation of a terminological database, realized using methods of computational linguistics and a corpus of documents from the mining domain (plans, reports, laws, textbooks and monographs) ...
... natural language (NLP): the finite-state automata method (Gross 1987) and the query language CQL (Corpus Query Language), based on pattern matching, in CQP (Corpus Query Processor), a system for managing large amounts of textual data (Evert 2005). Natural language processing techniques were used to extract ...
... ][lemma="jesam"] 70 https://www.sketchengine.eu/ 71 https://www.sketchengine.eu/documentation/corpus-querying/ Here „[word!="\."]“ allows some words to occur in the text, but not a full stop. ... Olivera Kitanović. Ontološki model upravljanja rizikom u rudarstvu, Beograd : [O. Kitanović], 2021
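The two CQL fragments quoted in the snippet above, `[word!="\."]` and `[lemma="jesam"]`, are token-level constraints: each bracket tests one attribute of one token against a regular expression. The Python sketch below emulates only that behaviour for a fixed sequence of constraints; it is not CQP or Sketch Engine, it supports no quantifiers or Boolean operators, and the sample tokens and the names `token_matches` and `find_matches` are assumptions made for illustration. The full query from the dissertation is truncated in the snippet, so only these two fragments are used.

```python
import re

# Hypothetical annotated tokens: surface form plus lemma.
TOKENS = [
    {"word": "Rizik", "lemma": "rizik"},
    {"word": "je", "lemma": "jesam"},
    {"word": "visok", "lemma": "visok"},
    {"word": ".", "lemma": "."},
    {"word": "Jesu", "lemma": "jesam"},
    {"word": "li", "lemma": "li"},
    {"word": "mere", "lemma": "mera"},
    {"word": "dovoljne", "lemma": "dovoljan"},
    {"word": "?", "lemma": "?"},
]

# [word!="\."][lemma="jesam"] written as (attribute, operator, regex) triples.
QUERY = [("word", "!=", r"\."), ("lemma", "=", "jesam")]


def token_matches(token, constraint):
    """Test one token against one constraint; the regex must match the whole value."""
    attr, op, pattern = constraint
    hit = re.fullmatch(pattern, token[attr]) is not None
    return hit if op == "=" else not hit


def find_matches(tokens, query):
    """Return start positions where consecutive tokens satisfy the query in order."""
    n = len(query)
    return [
        i
        for i in range(len(tokens) - n + 1)
        if all(token_matches(tokens[i + j], query[j]) for j in range(n))
    ]


for start in find_matches(TOKENS, QUERY):
    print(" ".join(t["word"] for t in TOKENS[start:start + len(QUERY)]))
```

On this sample the pair "Rizik je" is found, while ". Jesu" is rejected because the first constraint excludes a full stop, which is the effect the quoted explanation describes.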
An Integrated Environment for Management and Exploitation of Linguistic Resources
Ranka Stanković, Ivan Obradović (2009)... different types of dictionaries, the Group is engaged in developing other resources, such as the e-corpus of Serbian, as well as parallel multilingual corpora, with the majority of parallel texts aligned. The resources have been developed during several decades in the framework of various ...
... sentences were aligned by means of one of the available alignment methods. The tool used for the majority of alignments was XAlign, developed within LORIA, Laboratoire Lorrain de Recherche en Informatique et ses Applications [4]. Parallel and aligned texts were grouped in aligned corpora. ...
... possibility of adding hypernym literals. D. Aligned texts WS4LR contains a module for processing of parallel texts which have previously been aligned using the text alignment tool XAlign. The module enables the transformation of texts aligned by XAlign into different formats: textual ... Ranka Stanković, Ivan Obradović. "An Integrated Environment for Management and Exploitation of Linguistic Resources" in Proceedings of the International Multiconference on Computer Science and Information Technology, Computational Linguistics – Applications Workshop (CLA09), Mrągowo, Poland, October 2009, Piscataway : IEEE (2009)
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion... can run on any personal computer under Windows and supports simultaneous manipulation of various language resources: e-dictionaries, wordnets, and aligned texts. Implementation of LeXimir followed a modular approach. Namely, there exists a common core of the system, which is coupled with several modules ...
... morphological information to wordnet synsets, whereas both morphological dictionaries and the wordnet can be used in production of concordances for aligned 1LeXimir is available under CC NC BY licence. For more information see http://korpus.matf.bg.ac.rs/soft/LeXimir.html Fig. 3. LeXimir’s editor for ...
... dictionary of forms containing all necessary grammatical information (DELAF) can be generated from it. The dictionary of forms is used in NLP tasks. Two corpus processing systems that support work with this dictionary format were developed, Unitex [2] and Nooj [3], both of which are based on the use of ...Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
Увођење доменских и семантичких маркера за област рударства у српске електронске речнике
... information retrieval and extraction, and proposes an expansion of the set of these markers for the field of mining. A brief description of the developed corpus of texts from the field of mining is also given, for the search of which the proposed markers are extremely important. ... Иван Обрадовић, Александра Томашевић, Ранка Станковић, Биљана Лазић. "Увођење доменских и семантичких маркера за област рударства у српске електронске речнике" in Научни састанак слависта у Вукове дане - Српски језик и његови ресурси: теорија, опис и примене, Београд : Међународни славистички центар на Филолошком факултету, Филолошки факултет (2017). https://doi.org/10.18485/msc.2017.46.3.ch10
A WordNet Ontology in Improving Searches of Digital Dialect Dictionary
In this paper, we present a method for automatic generation of a digital resource, which connects all indirect synonyms of a dialect term to all indirect synonyms of a corresponding term in the standard language, aiming to improve the search of a digital dialect dictionary. The method uses SWRL rules defined in the Serbian WordNet ontology to identify sets of synonymous words. It also uses e-dictionaries to produce correct lemmas in standard language that users usually employ in searches. ...... attach lemma for all inflected forms in the dialect dictionary that match a form in morphological e-dictionary. After separation of all synonyms aligned with a dialect lexeme (from the standard language or dialect), infinitive forms were attached to the original form. Among 4,152 filtered entries ...
... by a verb in the standard language representing a set of synonyms obtained from SWN ontology. Next example represents two joined sets of synonyms aligned by the verb upropastiti. upropastiti unerediti, unistiti, uprskati, zabrljati, zajebati, zakrmasiti, zasvinjiti | isabim batiSem dokrajisem istrovim ...
... 0.874, F1 = 2PR/(P + R) = 0.933, accuracy = 0.897. Table 1. The confusion matrix of the process deciding whether dictionary entries are correctly aligned with standard language entries. Columns: System yes | System no. Expert yes: tp = 3022, fn = 436. Expert no: fp = 0, tn = 784. We can notice that the proposed method ... Miljana Mladenović, Ranka Stanković, Cvetana Krstev. "A WordNet Ontology in Improving Searches of Digital Dialect Dictionary" in New Trends in Databases and Information Systems: ADBIS 2017 Short Papers and Workshops - SW4CH (Semantic Web for Cultural Heritage) 767, Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-67162-8_37
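The figures in the snippet above can be recomputed from the four confusion-matrix counts it reports (tp = 3022, fn = 436, fp = 0, tn = 784). The short Python check below uses the standard definitions of precision, recall, F1 and accuracy; the function name `binary_metrics` is an assumption for this example, and reading the leading 0.874 as the recall is an inference from the truncated snippet rather than something it states.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, f1, accuracy


# Counts as reported in Table 1 of the snippet.
p, r, f1, acc = binary_metrics(tp=3022, fn=436, fp=0, tn=784)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} accuracy={acc:.3f}")
# prints: P=1.000 R=0.874 F1=0.933 accuracy=0.897
```

With no false positives the precision is 1, so the F1 score depends on the recall alone, which is why it sits well above the accuracy.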
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionaries is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...... run on any personal computer under Windows and supports simultaneous manipulation of various language resources: e-dictionaries, wordnets, and aligned texts. Implementation of LeXimir followed a modular approach. Namely, there exists a common core of the system, which is coupled with several modules ...
... morphological information to wordnet synsets, whereas both morphological dictionaries and the wordnet can be used in production of concordances for aligned texts. On the other hand, it enables the use of LeXimir Core in different scenarios: as a standalone Windows application LeXimir.exe or as a web ...
... dictionary of forms containing all necessary grammatical information (DELAF) can be generated from it, and subsequently used in various NLP tasks. Two corpus processing systems that support work with this dictionary format were developed, Unitex [13] and Nooj [20], both of which use finite-state technology ...Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458 no. 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6