92 items
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language-dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ...... One of the first challenges one encounters while trying to solve tasks of automatic recognition of verbal irony is selection of the collection of texts and marking ironic statements in it. For that purpose, online resources, such as Twitter, are used very frequently, where the hashtag #irony can be used ...
... lexical resources. Although resources we are using were developed for Serbian primarily, their development was based on traditional resources and texts covering to a certain extent other related languages as well, making them suitable for this task. A language classifier was built and assessed in the ...
... Serbian that combines three NLP tasks: PoS tagging, compound and named-entity recognition [10] (step 5 in Fig. 1) that was trained on various annotated texts – literary, newspaper and textbooks. Tagging results are represented by two previously given sentences (double-underlined are incorrectly tagged words ...Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA : ACM (2017). https://doi.org/
Preparation of Multimedia Document “YU Rock Scene”
SUMMARY: This study will present the preparation process of a multimedia document entitled YU ROCK SCENE in which participants were senior students of undergraduate studies of the Department of Library and Information Science at the University of Belgrade Faculty of Philology during the academic year 2014/2015, as a part of the subject Multimedia Documents. This study gives an overview of the historical development of rock and roll in the territory of the former Yugoslavia, rock scene in Yugoslav republics, ...... sound and rock style) and the western, mostly Anglo-American rock. Unlike the world rock scene whose texts were a powerful means of propaganda against wars and class conflicts, original domestic rock texts had visual and musical identity related to youth fantasies, dreams about success, as well as to the ...
... music had much greater presence. Due to the conflict with the Soviet Union during the Cold War, Yugoslavia, one of the founding countries of the Non-Aligned Movement, was more open to the West and all products of pop-culture, especially American pop-culture. Yugoslavia was thus the only Communist country ...
... system of Yugoslavia. Until the emergence of punk and the New Wave the main topic of lyrics was love. With the development of punk and the New Wave the texts gained new breadth and complexity which was at variance with the social, cultural and political norms of the time. Although the New Wave was equated ...Milena Obradović, Aleksandra Arsenijević, Mihailo Škorić. "Preparation of Multimedia Document “YU Rock Scene”" in Infotheca - Journal for Digital Humanities, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2016.16.1_2.6
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanities, derogatory speech and hate speech, has reached the level of a pandemic. A system able to detect such texts could help make the Internet and social media a better and more respectful virtual space. Research and commercial application in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...... in social media, including profanities, derogatory and hate speech, has reached the level of a pandemic. A system that would be able to detect such texts could help in making the Internet and social media a better and more respectful virtual space. Research and commercial application in this area were ...
... written in Serbian. The resulting data set had 6,436 tweets and this set was used for annotation. Twitter data differs significantly from other types of texts, e.g. books or newspaper articles, meaning that there are specific issues that have to be considered when processing such data. Some of them are: 1 ...
... and current circumstances to understand and annotate the message. In the next phase, we plan to extend the AbCoSER corpus with new tweets and with texts from other sources e.g. online news comments. Meanwhile, we started developing models for the automatic classification of abusive tweets and the first ...Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion... run on any personal computer under Windows and supports simultaneous manipulation of various language resources: e-dictionaries, wordnets, and aligned texts. Implementation of LeXimir followed a modular approach. Namely, there exists a common core of the system, which is coupled with several modules ...
... and the wordnet can be used in production of concordances for aligned texts. On the other hand, it enables the use of LeXimir Core in different ... [Footnote 1: LeXimir is available under CC NC BY licence. For more information see http://korpus.matf.bg.ac.rs/soft/LeXimir.html] [Fig. 3. LeXimir’s editor for MWU dictionaries]
... ” in Proceedings of HLT/EMNLP on Interactive Demonstrations, ser. HLT-Demo ’05, 2005, pp. 10–11. [4] C. Krstev, Processing of Serbian — Automata, Texts and Electronic Dictionaries. Belgrade: Faculty of Philology, University of Belgrade, 2008. [5] A. Savary, “Computational Inflection of Multi-Word Units ...Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
Building Terminological Resources in an e-Learning Environment
... The importance of terminological resources for specific domains in electronic format is growing with the rapidly expanding availability of various texts on the web. First and foremost, they are indispensable in information and document retrieval systems. In addition to monolingual resources, machine ...
... When blended learning is implemented, where e-learning is an important part of the learning process, then the ever expanding number of available texts in electronic form on the web makes this issue even more critical. Bearing this in mind, we have recognized the necessity of developing electronic ...Ranka Stanković, Ivan Obradović, Olivera Kitanović, Ljiljana Kolonja. "Building Terminological Resources in an e-Learning Environment" in Proceedings of the Third International Conference on e-Learning, eLearning-2012, September 2012, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2012)
Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology
Mihailo Škorić, Mauro Dragoni (2019) This paper is a result of a task that was presented to attendants of Keyword Search in Big Linked Data summer school, that was organized by Vienna University of Technology, under the Keystone COST action in the summer of 2017. It presents a specific approach to the classification via creation of minimal document surrogates based on the US National medical library’s MeSH ontology, which is derived from the Medical Subject Headings thesaurus. In a series of previously classified medically ...... library’s MeSH ontology, which is derived from the Medical Subject Headings thesaurus. In a series of previously classified medically related texts, which are the bases for the task, all of the significant terms are located and replaced with taxonomical references from the MeSH ontology. Extracted ...
... similar to the document that is the subject of the classification. The problem that arises in calculating the coefficient of similarity between texts is high computer cost, which must be paid either in processing power or high execution time. For this reason, the first step in classifying (and indexing) ...Mihailo Škorić, Mauro Dragoni. "Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology" in Infotheca, Faculty of Philology, University of Belgrade (2019). https://doi.org/10.18485/infotheca.2019.19.1.3
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The design of this network is presented, as well as the possibilities for its application. The lexical analysis applied in the FrameNet project is also presented, and the differences between frame-based analysis and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use ...... FrameNet is a lexical database of English based on annotated examples of how a lexical unit (hereinafter abbreviated as LU) is used in actual texts. The basic premise comes down to the fact that most LUs are best defined through semantic frames, a conceptual structure that provides a description ...
... ion technologies (Tomašević et al. 2018, 996). Back then, the corpus contained texts from the domain of mining and similar research areas with a total of 172 documents (in Serbian) and 2.7 million words in the first iteration (997). ...
... 2021. “A Data Driven Approach for Raw Material Terminology.” Applied Sciences 11 (7): 2892. Krstev, Cvetana. 2008. Processing of Serbian. Automata, texts and electronic dictionaries. Faculty of Philology of the University of Belgrade. Krstev, Cvetana, and Duško Vitas. 2005. “Corpus and Lexicon-Mutual ...Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1
Developing Termbases for Expert Terminology under the TBX Standard
... processing texts containing expert terminology, such as the textbook “Introduction to Mining”. The approach also envisages integration with cascades for named entity recognition such as mining equipment, specific minerals and the like. Building of an aligned Serbian-English corpus of texts in the area ...
... word forms. This information is essential for proper processing of all texts, such as lemmatization, morphological analysis, named entity recognition and the like. This is especially important in the case of domain specific texts as in the fields of geology or mining. Thus, appropriate electronic mor ...
... lezisSte mineralnih sirovina.N:w4qn Domain specific e-dictionaries are especially important in recognition of compound words in texts featuring expert terminology, as such texts usually abound with compounds having a meaning often very different from the meaning of each of their components. Thus if such ...Ranka Stanković, Ivan Obradović, and Miloš Utvić. "Developing Termbases for Expert Terminology under the TBX Standard" in Natural Language Processing for Serbian - Resources and Applications, Belgrade : University of Belgrade, Faculty of Mathematics (2014)
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multiword units, noun phrases, query expansion... C# and operates on the .NET platform. It supports development, maintenance and exploitation of various resources: e-dictionaries, wordnets, and aligned texts. A user of this tool need not use all of these resources, or even possess them, but those that exist are visible in all modules and can be exploited ...
... Courtois, B., Silberztein, M.: Dictionnaires électroniques du français. Larousse, Paris (1990) 2. Krstev, C.: Processing of Serbian - Automata, Texts and Electronic Dictionaries. Faculty of Philology, University of Belgrade, Belgrade (2008) 3. Savary, A.: Computational Inflection of Multi-Word Units ...Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionaries is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...... on any personal computer under Windows and supports simultaneous manipulation of various language resources: e-dictionaries, wordnets, and aligned texts. Implementation of LeXimir followed a modular approach. Namely, there exists a common core of the system, which is coupled with several modules ...
... morphological information to wordnet synsets, whereas both morphological dictionaries and the wordnet can be used in production of concordances for aligned texts. On the other hand, it enables the use of LeXimir Core in different scenarios: as a standalone Windows application LeXimir.exe or as a web application ...
... computational linguistics, Lecture Notes in Computer Science, vol. 377, pp. 34–50. Springer (1989) 6. Krstev, C.: Processing of Serbian — Automata, Texts and Electronic Dictionaries. Faculty of Philology, University of Belgrade, Belgrade (2008) 7. Krstev, C., Stanković, R., Obradović, I., Vitas, D ...Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458 no. 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
Advantages and challenges in presenting mathematical content using EDX platform
... to use and reuse for teaching, learning and research [1]. OERs have been used for different topics and they can be in various forms from simple texts, pictures and videos to entire courses. In this paper we discuss OERs related to Mathematics in Serbian. All over the world there are a lot of ...Marija Radojičić, Ivan Obradović, Ranka Stanković, Olivera Kitanović, Roberto Linzalone. "Advantages and challenges in presenting mathematical content using EDX platform" in The Seventh International Conference on e-Learning (eLearning-2016), Belgrade : Metropolitan University (2016)
Proširivanje upita zasnovano na leksičkim resursima
The paper describes how lexical resources for Serbian and software tools, developed within the Human Language Technology Group at the University of Belgrade, can be used to improve query formulation. Search results can be significantly improved by using various lexical resources, such as morphological dictionaries and semantic networks. The presented approach can also be used in the System of Scientific, Technological and Business Information, because efficient searching of this valuable resource, given its heterogeneity and size, as well as its predominantly textual content, ...... of use within SNTPI. REFERENCES [1] Vitas D., Pavlović-Lažetić G., Krstev C., Popović Lj., Obradović I. (2003): „Processing Serbian Written Texts: An Overview of Resources and Basic Tools“, Proc. of the International Workshop on Balkan Language Resources and Tools, Thessaloniki, Greece, ...Ranka Stanković, Ivan Obradović, Cvetana Krstev. "Proširivanje upita zasnovano na leksičkim resursima" in SNTPI 09 - Naučno-stručni skup Sistem naučnih, tehnoloških i poslovnih informacija, Beograd 19. i 20. jun 2009, Beograd : Fakultet informacionih tehnologija (2009)
GIS Application Improvement with Multilingual Lexical and Terminological Resources
... geološko društvo, Beograd, pp. 37-44. Vitas D., G. Pavlović-Lažetić, C. Krstev, Lj. Popović, I. Obradović (2003): „Processing Serbian Written Texts: An Overview of Resources and Basic Tools“, Proceedings of the International Workshop on Balkan Language Resources and Tools, Thessaloniki, Greece ...Ranka Stanković, Ivan Obradović, Olivera Kitanović. "GIS Application Improvement with Multilingual Lexical and Terminological Resources" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valetta, Malta, May 2010, Valetta, Malta : European Language Resources Association (2010)
Bridging Computational Lexicography and Corpus Linguistics: A Query Extension for OntoLex-FrAC
OntoLex, the dominant community standard for machine-readable lexical resources in the context of RDF, Linked Data and Semantic Web technologies, is currently being extended with a dedicated module for Frequency, Attestation and Corpus-based Information (OntoLex-FrAC). We propose a new component for OntoLex-FrAC that addresses the incorporation of corpus queries for (a) linking dictionaries with corpus engines, (b) enabling RDF-based web services to dynamically exchange corpus queries and response data, and (c) using conventional query languages to formalize the internal structure of collocations, word sketches and ...standardization, digital lexicography, OntoLex, corpus queries, linked data, Linguistic Linked Open Data ...Christian Chiarcos, Ranka Stanković, Maxim Ionov, Gilles Sérasset. "Bridging Computational Lexicography and Corpus Linguistics: A Query Extension for OntoLex-FrAC" in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, 20-25 May 2024, LREC (2024)
Integrisanje heterogenih leksičkih resursa
The main activity of the Natural Language Processing Group at the Faculty of Mathematics, University of Belgrade, is directed toward the development of various resources for the processing of Serbian. Particularly significant among them are the system of morphological dictionaries of Serbian developed within the RELEX network [1] and the semantic network (of the wordnet type) for Serbian developed within the international project Balkanet. These are two heterogeneous lexical resources, developed on the basis of entirely different models, which therefore also contain different kinds of lexical information. By integrating these resources, the information ...... 1st International Wordnet Conference, Mysore, India. [4] Vitas, D. et al. (2003). Resources and Basic Tools for the Processing of Serbian Written Texts. Proc. of the Workshop on Balkan Language Resources, 1st Balkan Conference in Informatics. [5] Vossen, P. (ed.) (1998). EuroWordNet: A Multilingual ...Ranka Stanković, Cvetana Krstev, Duško Vitas, Ivan Obradović, Gordana Pavlović-Lažetić. "Integrisanje heterogenih leksičkih resursa" in Festivalski katalog 11. Festivala informatičkih dostignuća INFOFEST 2004, 26th September - 2nd October, 2004, Budva, Montenegro, INFOFEST (2004)
Combining Heterogeneous Lexical Resources
... International Wordnet Conference, Mysore, India. - Vitas, D. et al. (2003). Resources and Basic Tools for the Processing of Serbian Written Texts. Proc. of the Workshop on Balkan Language Resources, 1st Balkan Conference in Informatics. - Vossen, P. (ed.) (1998). EuroWordNet: A Multilingual ...Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, ELRA - European Language Resources Association (2004)
Part of Speech Tagging for Serbian language using Natural Language Toolkit
Ranka Stanković, Boro Milovanović (2020) While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We try to use this library to create a PoS (part-of-speech) tagger for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are transformed with the Brill tagger to improve accuracy. We trained the models on an annotated ...... extremely valuable because they are expensive to produce. Dataset used in this paper is composed of different annotated text collections (Table I). All texts are either originally written in Serbian or translated to it. 1984 is a novel by George Orwell, part of MULTEXT-East resources [9]. INTERA (Integrated ...Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)
An approach to Implementation of blended learning in a university setting
... the problem of terminology, namely of adequate translations of terms into Serbian. In order to help students in reading scientific articles and texts in English, and finding appropriate translations in Serbian for specific terms, we have developed an electronic dictionary of basic GIS terms ...Ivan Obradović, Ranka Stanković, Olivera Kitanović, Jelena Prodanović. "An approach to Implementation of blended learning in a university setting" in Proceedings of the Second International Conference on e-Learning, eLearning 2011, September 2011, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2011)
The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines
In this paper we present how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for tuning queries before submitting them to a web search engine. We argue that the selection of words chosen for a query, which are of paramount importance for the quality of results obtained by the query, can be substantially improved by using various lexical resources, such as morphological dictionaries and wordnets. These dictionaries enable semantic ...LR web services, MultiWord Expressions & Collocations, Information Extraction, Information Retrieval... menu with the functions offered and the right side the login part. Besides query expansion, WS4QE also offers functions for manipulation of aligned texts and wordnet management, as listed in the menu, but we will leave here these functions aside and concentrate on query expansion. Figure 1 ...Krstev Cvetana, Stanković Ranka, Vitas Duško, Obradović Ivan. "The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines" in LREC 2008: Conference on Language Resources and Evaluation, Marrakesh, Morocco, May 2008, European Language Resources Association (ELRA) (2008)