391 items
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities
This paper presents work on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a word sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can be used as a multilingual benchmark for evaluating European languages with low and medium resources. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and comparison with annotations in other languages. The paper discusses the first steps ... Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
Keyword Extraction from Parallel Abstracts of Scientific Publications
... Organ. Sci. 39(1), 1–20 (2015) 2. Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. In: Proceedings of Empirical Methods in Natural Language Processing - EMNLP 2004, pp. 404–411. ACL, Barcelona (2004) 3. Marujo, L., Viveiros, M., Neto, J.P.: Keyphrase cloud generation of broadcast news. ...
... LNCS, vol. 10151, pp. 124–135. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-53640-8_11 11. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O’Reilly Media, Inc., Sebastopol (2009) 12. Balakrishnan, V., Ethel, L.-Y.: Stemming and lemmatization: a comparison of retrieval ...
... particular language. 2.2 Text Preprocessing Tools Serbian is a highly inflectional Slavic language. Although we use the keyword extraction method designed with light or no linguistic knowledge, some text preprocessing is needed ... Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić. "Keyword Extraction from Parallel Abstracts of Scientific Publications" in Semantic Keyword-Based Search on Structured Data Sources - Third International KEYSTONE Conference, IKC 2017, Gdańsk, Poland, September 11–12, 2017, Revised Selected Papers and COST Action IC1302 Reports, Springer (2017)
Resource-based WordNet Augmentation and Enrichment
In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ... ... wordnet with wn-toolkit and cro-deriv. In Proceedings of the International Conference Recent Advances in Natural Language Processing, pages 480–487. Simões, A., Gómez, X. G., and Almeida, J. J. (2016). Enriching a portuguese wordnet using synonyms from a monolingual dictionary. In Chair), N. C. C. ...
... International Language Resources and Evaluation (LREC’10), Valletta, Malta, May. European Language Resources Association (ELRA). Matuschek, M. and Gurevych, I. (2013). Dijkstra-WSA: A graph-based approach to word sense alignment. TACL, 1:151–164. Mladenović, M. and Mitrović, J. (2014). Natural Language ...
... Microsoft language portal has published Microsoft Terminology Collection data in the form of a .tbx (ISO 30042:2008) file containing: Concept ID, Definition, Source term, Source language identifier, Target term, Target language identifier. The number of terms differs from language to language, due to ... Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
A Tel Platform Blending Academic And Entrepreneurial Knowledge
... Obradović, I., Vitas, D., & Utvić, M. (2010). Automatic construction of a morphological dictionary of multi-word units. In Advances in Natural Language Processing (pp. 226-237). Springer Berlin Heidelberg. [9] Daconta, M. C., Obrst, L. J., & Smith, K. T. (2003). The Semantic Web: a guide to the ...
... mentioned that due to the complex Serbian grammar the language support system also features grammars implemented through finite state automata, finite state transducers and compound inflection rules. The language resources in the BAEKTEL language support system are managed by a web application ...
... FMG. Since it pertains to language resources, its function will be outlined in more detail in the following section. The learning content comprises academic and entrepreneurial learning resources as well as reference resources, where the most important are language resources, which will ... Ivan Obradović, Ranka Stanković, Jelena Prodanović, Olivera Kitanović. "A Tel Platform Blending Academic And Entrepreneurial Knowledge" in Proceedings of the Fourth International Conference on e-Learning (eLearning-2013), September 2013, Belgrade, Serbia, Belgrade, Serbia : Belgrade Metropolitan University (2013)
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019) In this paper we present a model for selecting good examples for a dictionary of Serbian and the development of the model's initial components. The method is based on a detailed analysis of various lexical and syntactic features in a corpus compiled from examples from five digitized volumes of the SASA dictionary. The initial set of features was inspired by similar approaches for other languages. The feature distribution of examples from this corpus is compared with the feature distribution of sentence samples excerpted from corpora containing various texts. The analysis showed that ... Serbian, good dictionary examples, automation of dictionary making, feature extraction, machine learning ... Rundell, M. (2008). The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press. Bird, S., Loper, E. & Klein E. (2009). Natural Language Processing with Python. O’Reilly Media Inc. Eibe, F., Hall, M. A. & Witten, I. (2016). The Weka Workbench. Online Appendix for "Data Mining: Practical ...
... Blueprint for the computerized dictionary of the Serbian language [Nacrt za informatizovani rečnik srpskog jezika]. Naučni sastanak slavista u Vukove dane, 44(3), pp. 105–116. (In Serbian, Cyrillic.) Vitas, D. & Krstev, C. (2012). Processing of Corpora of Serbian Using Electronic Dictionaries. Prace ...
... used to be the SC language territory. According to the Style Guide, lexicographers have to choose two to six examples for each LU, taking into account the following facts: a) each example should clearly show the meaning of the LU; b) they have to be from different parts of SC language territory; c) ... Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, Lexical Computing CZ, s.r.o. (2019)
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ...... (2009) Taggers Applied On Texts On Serbian Language, Language Tools And Machine Learning. In Infotheca, Vol. X, No. 2, (to appear). Przepiórkowski, A. and Woliński, M. (2003) A Flexemic Tagset For Polish. In Proc. of the Workshop on Morphological Processing of Slavic Languages : 10th Conference EACL ...
... 07 Language resource management - Feature Structures – Part 2: Feature System Declaration, ISO/TC 37/SC 4. ISO. (2009) ISO 12620 Terminology and other language and content resources – Data Categories – Specification of data categories and management of a data category registry for language resources ...
... cases from the natural gender (or sex) which affects some agreement conditions. For instance, in the next example petorica ‘five men’ has the grammatical gender feminine and natural gender masculine, and at the same time grammatical number singular and natural number plural. Natural gender and number ... Cvetana Krstev, Ranka Stanković, Duško Vitas. "A Description of Morphological Features of Serbian: a Revision using Feature System Declaration" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta : European Language Resources Association (2010)
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ... electronic dictionary, Serbian, morphology, inflection, multiword units, noun phrases, query expansion ... In: 6th LREC, Marrakech, Morocco (2008) 7. Jacquemin, C.: Spotting and Discovering Terms through Natural Language Processing. MIT Press (2001) 8. Laporte, E.: Lexicons and Grammars for Language Processing: Industrial or Handcrafted Products? In Rezende, L.M., da Silva, B.C.D., Barbosa, J.B., eds.: ...
... word units, noun phrases, query expansion 1 Introduction We have been developing morphological electronic dictionaries of Serbian for natural language processing for many years now. Our e-dictionaries follow the methodology and format known as DELAS/DELAF, which is presented for French in [1]. Serbian ... Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
E-Dictionaries and Finite-State Automata for the Recognition of Named Entities
Cvetana Krstev, Duško Vitas, Ivan Obradović, Miloš Utvić. "E-Dictionaries and Finite-State Automata for the Recognition of Named Entities" in Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing, FSMNLP 2011, July 2011, Blois, France, A. Maletti and M. Constant (eds.), Association for Computational Linguistics (2011): 48-56
A Lexical Approach to Acronyms and their Definitions
In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions, which are usually Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization, and production of entries in morphological electronic dictionaries, both for MWUs and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables a representation that reflects the complex relations between acronyms and their definitions. ... Serbia, ∗(cvetana|vitas)@matf.bg.ac.rs, †ranka@rgf.bg.ac.rs ...
... while a natural number can be plural, e.g. UN je organizovala ‘UN organized’ (natural feminine, singular), UN je odobrio ‘UN sanctioned’ (grammatical masculine, singular), UN su optuživale ‘UN were accusing’ (natural feminine, natural plural). It should be noted that the natural number is ...
Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015). http://dr.rgf.bg.ac.rs/s/repo/item/0001760
Increasing the Local Road Network Resilience from Natural Hazards in Municipalities in Serbia
Biljana Abolmasov, Miloš Marjanović, Ranka Stanković, Uroš Đurić, Nikola Vulović. "Increasing the Local Road Network Resilience from Natural Hazards in Municipalities in Serbia" in Progress in Landslide Research and Technology, Volume 3, Issue 1, Springer Cham. (2024). https://doi.org/10.1007/978-3-031-55120-8_22
Electronic Dictionaries - from File System to lemon Based Lexical Database
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ... ... Web, 6:363–369. Vitas, D., Pavlović-Lažetić, G., and Krstev, C. (1993). Electronic dictionary and text processing in Serbo-Croatian. Sprache–Kommunikation–Informatik, 1:225. 10. Language Resource References Krstev, Cvetana and Vitas, Duško. (2015). Serbian Morphological Dictionary - SMD. ...
... developed and database exploitation starts. Given that language resources for more than 22 languages, currently distributed with Unitex/GramLab, were developed in the same DELA format and that the presented migration approach is language independent, it is safe to say that it will prove useful ...
... catalog of data categories (e.g., to denote gender, number, part of speech, etc.). Unitex is a lexically-based corpus processing suite that offers strong support for finite-state processing using morphological dictionaries – http://unitexgramlab.org/ Figure 1: Data categories (markers) dictionary. ... Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics : Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
On the compatibility of lexical resources for NooJ
Lexical resources for many languages are provided for the NooJ linguistic development environment. Meta-data descriptions of morphosyntactic and semantic properties of these languages and their resources are a mandatory part of each language module. In this paper we analyze how well the meta-data actually describe resources for a chosen subset of languages and to what extent they are compatible across languages to support multilingual processing. We show that there is room for improvement in both directions.
... on Language Resources and Evaluation, LREC'10, Paris: ELRA. (http://nl.ijs.si/ME/V4/) Erjavec, T., Krstev, C., Petkevič, V., Simov, K., Tadić, M., Vitas, D., 2003, “The MULTEXT-East Morphosyntactic Specifications for Slavic Languages”, in Proc. of the Workshop on Morphological Processing of Slavic ...
... Paris-Est, Institut Gaspard-Monge. ISO. (2009) ISO 12620 Terminology and other language and content resources – Data Categories – Specification of data categories and management of a data category registry for language resources. http://nl.ijs.si/ME/V4/ ... Ranka Stanković, Miloš Utvić, Duško Vitas, Cvetana Krstev, Ivan Obradović. "On the compatibility of lexical resources for NooJ" in Automatic Processing of Various Levels of Linguistic Phenomena: Selected Papers from the 2011 International Nooj Conference, Cambridge Scholars Publishing (2012): 96-108
Keyword-Based Search on Bilingual Digital Libraries
This paper outlines the main features of Biblisha, a tool that offers various possibilities for enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. The keyword queries, initiated in Serbian or English, can be expanded semantically, morphologically, and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ... Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
English for Geology Students. 2
Lidija Beko (2023) ... previous textbook with this one, putting their own principles of clarity and coherence as a way in which they wish to teach the subject of English language and geology. Six thematic units: 1. Landslides 2. Metamorphic rocks 3. Mineral deposits 4. Hydrological cycle and groundwater 5. Surface ...
... aking between registers while at the same time referring to active learning within the given context. Teaching vocabulary, which is the base of language knowledge, can be continued by creating paper vocabulary cards, and later even electronic cards, which would ensure continuity in vocabulary learning ...
... textbook to life. I would also like to express my gratitude to Ana Stojanović, who was a valuable collaborator on a large number of tasks, processing and conceptual solutions. Without her, this textbook would not have its structure. I had valuable help from my colleagues Marija Đorđević and ... Lidija Beko. English for Geology Students. 2, Belgrade : The Faculty of Mining and Geology, 2023
Bridging Computational Lexicography and Corpus Linguistics: A Query Extension for OntoLex-FrAC
OntoLex, the dominant community standard for machine-readable lexical resources in the context of RDF, Linked Data and Semantic Web technologies, is currently being extended with a dedicated module for Frequency, Attestations and Corpus-based Information (OntoLex-FrAC). We propose a new component for OntoLex-FrAC that addresses the incorporation of corpus queries for (a) linking dictionaries with corpus engines, (b) enabling RDF-based web services to exchange corpus queries and response data dynamically, and (c) using conventional query languages to formalize the internal structure of collocations, word sketches and ... standardization, digital lexicography, OntoLex, corpus queries, linked data, Linguistic Linked Open Data. Christian Chiarcos, Ranka Stanković, Maxim Ionov, Gilles Sérasset. "Bridging Computational Lexicography and Corpus Linguistics: A Query Extension for OntoLex-FrAC" in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, 20-25 May 2024, LREC (2024)
Using technology for knowledge transfer between academia and enterprises
Ivan Obradović, Ranka Stanković (2014) ... this processing the system can perform a morphological expansion of the query to improve recall, which is especially important for morphologically rich languages such as Serbian. In order to support the multilinguality of the TEL platform, LSS can also expand the query in one language to another ...
... TEL platform consists of tools and resources: learning, language and implementation resources. Among the tools some are available open source and commercial tools, some are in-house tools developed by the University of Belgrade Human Language Technology Group. Learning resources are both academic: ...
... 4 The language support system The need for multilinguality of OER is a combined effect of globalization and European integration, favoring a holistic approach that takes into account all the languages a learner may use, as opposed to the more traditional approach looking at one language at a time ... Ivan Obradović, Ranka Stanković. "Using technology for knowledge transfer between academia and enterprises" in Knowledge and Management Models for Sustainable Growth, Proc. of IFKAD 2014, 9th International Forum on Knowledge Asset Dynamics, 11-13 June 2014, Matera, Italy, Bari : IFKAD (2014)
Infotheca (Q25460443) in Wikidata
Ranka Stanković, Lazar Davidović (2021) Wikidata is a knowledge base of the Wikimedia Foundation that serves as a common source of various kinds of data used not only by other Wikimedia projects but, increasingly, by numerous Semantic Web applications. In this paper we present an example of integrating Wikidata with digital libraries and external systems, as well as the possibility of speeding up data preparation and entry, illustrated on papers from Infotheca, a journal for digital humanities. ... ns in the domain of natural language processing. Let’s mention some of them: text classification, indexing, text analysis, summarizing, normalizing, linking, etc. Another important feature is multilingual support, making it possible to link each item to a label in any language registered in Wikimedia ...
... semantic web. The concept of the semantic web and open linked data technologies expand the traditional web by using a standard markup language and similar processing tools, where RDF (Resource Description Framework) plays a significant role and makes more efficient information retrieval solutions possible ...
... can be described by a string of statements, each of which provides a fact or a piece of data about the item. Table 1 shows several examples of natural language sentences and the encoding of this information in Wikipedia, represented as triples of subject, predicate and object (left), and in shortened ... Ranka Stanković, Lazar Davidović. "Infotheca (Q25460443) in Wikidata" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.5
Language Models, What Are They? [Језички модели, шта је то?]
Михаило Шкорић (2023) Михаило Шкорић. "Language Models, What Are They? [Језички модели, шта је то?]" in Језик данас, Нови Сад : Матица српска (2023)
Building learning capacity by blending different sources of knowledge
... available resources. During this processing the system can perform a morphological expansion of the query to improve recall, which is especially important for morphologically rich languages such as Serbian. In order to support the multilinguality of the BMP, the language support system can also expand ...
... textual resource that BMP language support system makes use of. The language support system handles various types of requests issued by users, usually in the form of a query. The requests are handled by WSDL (Web Services Description Language) described Language Web Service, basically composed ...
... terminology is concerned, a language support system is developed within the BAEKTEL metadata portal. Besides sustaining expert terminology in a multilingual environment this language support system will also improve the search and browse functions of BMP. The BMP language support system, whose structure ... Ivan Obradović, Ranka Stanković, Olivera Kitanović, Dalibor Vorkapić. "Building learning capacity by blending different sources of knowledge" in International Journal of Learning and Intellectual Capital (2016). https://doi.org/10.1504/IJLIC.2016.075698
Development of terminological resources for expert knowledge: a case study in mining
Ljiljana Kolonja, Ranka Stanković, Ivan Obradović, Olivera Kitanović, Aleksandar Cvjetić. "Development of terminological resources for expert knowledge: a case study in mining" in Knowledge Management Research & Practice, Palgrave Macmillan (2015). https://doi.org/10.1057/kmrp.2015.10