Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of the extracted multi-word terms, which is indispensable for highly inflected languages if the extracted data are to be passed to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...
... Croatian texts (Tadić & Šojat, 2003). Although the statistical approach has been steadily pursued by a number of researchers, development of lexical resources and local grammars has given impetus to an alternative approach, namely multi-word extraction based on linguistic rules. Recently, a rule-based ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23–28 May 2016, European Language Resources Association (2016)
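The abstract describes term extraction by finite-state transducers that match syntactic patterns over e-dictionary analyses, followed by lemmatization of each extracted term. As a rough illustration only, the Python sketch below hard-codes one hypothetical pattern (adjective + noun agreeing in gender, number and case) over a tiny hand-made lexicon. The lexicon entries, the `ADJ_NOM_SG` table and the function names are assumptions for illustration; this is not the authors' Unitex implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One morphological reading of a word form (toy stand-in for an e-dictionary entry)."""
    lemma: str
    pos: str     # "A" adjective, "N" noun, "PREP" preposition
    gender: str  # "m", "f", "n" ("-" if not applicable)
    number: str  # "s", "p" ("-" if not applicable)
    case: str    # "nom", "gen", "loc", ... ("-" if not applicable)


# Hypothetical mini e-dictionary: inflected form -> all of its possible readings.
LEXICON = {
    "bezbednost": [Analysis("bezbednost", "N", "f", "s", "nom")],
    "radne": [Analysis("radni", "A", "f", "s", "gen"), Analysis("radni", "A", "f", "p", "nom")],
    "sredine": [Analysis("sredina", "N", "f", "s", "gen"), Analysis("sredina", "N", "f", "p", "nom")],
    "na": [Analysis("na", "PREP", "-", "-", "-")],
    "površinskom": [Analysis("površinski", "A", "m", "s", "loc"), Analysis("površinski", "A", "n", "s", "loc")],
    "kopu": [Analysis("kop", "N", "m", "s", "loc")],
}

# Hypothetical table: nominative singular adjective form for a given noun gender,
# used to build the lemma of the whole term.
ADJ_NOM_SG = {("radni", "f"): "radna", ("površinski", "m"): "površinski"}


def agree(adj: Analysis, noun: Analysis) -> bool:
    """Adjective and noun must agree in gender, number and case."""
    return (adj.gender, adj.number, adj.case) == (noun.gender, noun.number, noun.case)


def extract_terms(tokens: list[str]) -> list[tuple[str, str]]:
    """Find adjective+noun candidates; return (surface form, lemmatized form) pairs."""
    found: list[tuple[str, str]] = []
    for left, right in zip(tokens, tokens[1:]):
        for adj in LEXICON.get(left, []):
            if adj.pos != "A":
                continue
            for noun in LEXICON.get(right, []):
                if noun.pos == "N" and agree(adj, noun):
                    adj_lemma = ADJ_NOM_SG.get((adj.lemma, noun.gender), adj.lemma)
                    pair = (f"{left} {right}", f"{adj_lemma} {noun.lemma}")
                    if pair not in found:  # several agreeing readings can yield the same term
                        found.append(pair)
    return found


print(extract_terms("bezbednost radne sredine na površinskom kopu".split()))
# [('radne sredine', 'radna sredina'), ('površinskom kopu', 'površinski kop')]
```

Two readings of `radne sredine` agree (genitive singular and nominative plural), yet both lemmatize to `radna sredina`, so the candidate is kept once; this normalization is what makes extracted terms from a highly inflected language usable by evaluators.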
Integracija heterogenih tekstualnih resursa [Integration of Heterogeneous Textual Resources]
Ranka Stanković, Ivan Obradović (2007)
The paper describes an approach to integrating heterogeneous textual resources for Serbian by means of a complex software tool developed specifically for this purpose. The structure and main components of the developed system are described. The paper also presents the possibilities for improving the resources through mutual exchange of information that the integrated environment provides. Finally, it describes how the integrated heterogeneous resources can be applied to query expansion and to text retrieval in general, and outlines some directions for further development.
... Unicode. To resolve these problems of heterogeneity, an integrated and adaptable software solution called WS4LR (Work Station for Lexical Resources) was developed, which enables the management of and work with individual resources, as well as their integration (Krstev et al. 2006). From the perspective of functionality ...
... integration. In this paper we outline the structure and main components of the system we developed under the name of WS4LR (WorkStation for Lexical Resources), which synchronously handles corpora of Serbian, multilingual aligned corpora, a system of morphological dictionaries for Serbian, the Serbian ...
Ranka Stanković, Ivan Obradović. "Integracija heterogenih tekstualnih resursa" [Integration of Heterogeneous Textual Resources] in Zbornik radova međunarodnog simpozijuma Razlike između bosanskog/bošnjačkog, hrvatskog i srpskog jezika [Proceedings of the International Symposium on the Differences between Bosnian/Bosniak, Croatian and Serbian], Graz, Austria, April 2007 (2007)
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a
aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection
... differ both in the resources and techniques used and in the purpose for which they are compiled. In several cases the bilingual MWE lists are produced in order to improve statistical machine translation (Bouamor et al., 2012; Tsvetkov and Wintner, 2010) or to help develop certain lexical resources in the tar- ...
... different ways of utilizing existing lexical resources to improve the quality of statistical machine alignment. In order to do that we have augmented the set of aligned sentences with inflected forms (English/Serbian). We have used two bilingual lexical resources. (a) Serbian Wordnet (SWN) (Cvetana ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7–12, 2018, European Language Resources Association (ELRA) (2018)
Ontološki model upravljanja rizikom u rudarstvu [An Ontological Model of Risk Management in Mining]
Olivera Kitanović (2021)
Mining production involves complex technological systems, which creates the need to establish and improve risk management systems. The heterogeneity and volume of the data required for risk management call for a system that integrates them flexibly and enables their optimal use. The main goal of this dissertation is to develop an ontology for the mining domain and a risk management model based on it. Its realization also involves implementing information extraction algorithms for populating the ontology, as well as a corresponding software solution. The development of the model also includes a significant expansion of the mining corpus, as ...
mining, risk, risk management, risk assessment, ontology, semantic network, information extraction, knowledge management, computational linguistics
... Belgrade, Serbia. Krstev, Cvetana, Ranka Stanković, Ivan Obradović, and Biljana Lazić. 2015. “Terminology Acquisition and Description Using Lexical Resources and Local Grammars.” In Proceedings of the 11th International Conference on Terminology and Artificial Intelligence, edited by Thierry Poibeau ...
... Writing Dictionaries to Weaving Lexical Networks.” International Journal of Lexicography 27 (4): 396–418. Radojičić, Marija, Ivan Obradović, Ranka Stanković, Miloš Utvić, and Sebastijan Kaplar. 2018. “A Mathematical Learning Environment Based on Serbian Language Resources.” In Proceedings of the 7th ...
Olivera Kitanović. Ontološki model upravljanja rizikom u rudarstvu, Beograd: [O. Kitanović], 2021
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model
This study presents a sentiment analysis of old Serbian novels from the period 1840–1920, using the Mistral large language model (LLM) with zero-shot and few-shot learning techniques. The main approach innovates by designing research prompts that include instruction text for classification without examples and with a few examples, enabling the language model to classify sentiment into positive, negative, or objective categories. This methodology aims to simplify sentiment analysis by constraining the responses, thereby increasing precision ...
Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević. "Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model" in Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024), BAS (2024)
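The study's prompts ask the model to assign one of three labels (positive, negative, objective), either with no examples (zero-shot) or with a few (few-shot), and constrain the answers. The sketch below only shows, under stated assumptions, how such prompts might be assembled and how a free-text answer could be mapped to exactly one label. The prompt wording, the example sentences and the helper names are hypothetical, and no call to the Mistral model or any API is made.

```python
from typing import List, Optional, Tuple

LABELS = ("positive", "negative", "objective")


def build_prompt(sentence: str, examples: Optional[List[Tuple[str, str]]] = None) -> str:
    """Zero-shot prompt if `examples` is empty or None, few-shot prompt otherwise."""
    parts = [
        "Classify the sentiment of the sentence as one of: " + ", ".join(LABELS) + ".",
        "Answer with a single word.",
    ]
    for text, label in examples or []:
        parts.append(f"Sentence: {text}\nSentiment: {label}")
    parts.append(f"Sentence: {sentence}\nSentiment:")
    return "\n\n".join(parts)


def parse_label(answer: str) -> Optional[str]:
    """Constrain a free-text answer to exactly one allowed label; otherwise return None."""
    tokens = {word.strip(".,:;!?\"'").lower() for word in answer.split()}
    hits = [label for label in LABELS if label in tokens]
    return hits[0] if len(hits) == 1 else None


# Hypothetical few-shot examples; in practice they would come from the novels.
demo = [("The day was bright and full of joy.", "positive"),
        ("He wept over the ruins of his house.", "negative")]
print(build_prompt("The train left at seven.", demo))
print(parse_label("Objective."))  # -> objective
```

Returning `None` for answers that name no label, or more than one, is one simple way to constrain responses; how the study itself handled such cases is not stated in this record.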
SrpELTeC: A Serbian Literary Corpus for Distant Reading
The article presents SrpELTeC, a corpus developed within the COST Action Distant Reading for European Literary History (CA16204). All novels in SrpELTeC were selected, prepared, and annotated using common principles established for all language collections in the European Literary Text Collection (ELTeC). The challenges encountered in preparing SrpELTeC from scratch, and their solutions, are described. All novels were manually encoded in TEI with rich metadata and structural annotations. Automatic annotation included POS tagging, lemmatization, and named entities, relying on resources for processing ...
digital humanities, Serbian literature, text corpora, distant reading, linked data, named entity recognition, text analytics
Ranka Stanković, Cvetana Krstev, Duško Vitas. "SrpELTeC: A Serbian Literary Corpus for Distant Reading" in Primerjalna književnost, Research Centre of the Slovenian Academy of Sciences and Arts (2024). https://doi.org/10.3986/pkn.v47.i2.03
Old or New, We Repair, Adjust and Alter (Texts)
Cvetana Krstev, Ranka Stanković (2020)
In this paper we show how e-dictionaries and cascades of finite-state transducers implemented in the Unitex tool can be used to solve three text transformation problems: correcting texts after OCR, restoring diacritics, and switching between different language variants.
text correction, OCR errors, diacritic restoration, language variants, electronic dictionary, finite-state transducers
... text”, ACM Computing Surveys (CSUR) Vol. 24, no. 4 (1992): 377–439. Lazić, Biljana and Mihailo Škorić. “From DELA based Dictionary to Leximirka Lexical DataBase”. Infotheca – Journal for Digital Humanities Vol. 19, no. 2 (2019): 00–00, https://infoteka.bg.ac.rs/ojs/index.php/Infoteka/article/view/2019 ...
... word form is saved as the value of a new marker +CR=;
– All information that is not needed for this procedure is deleted (lemma, POS, syntactic and semantic markers, etc.);
– Information about the frequency of the original word form in SrpKor is added;
– Information for identical word forms is merged. This ...
Cvetana Krstev, Ranka Stanković. "Old or New, We Repair, Adjust and Alter (Texts)" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.3
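One of the three problems, diacritic restoration, can be illustrated with a much simpler dictionary lookup than the Unitex transducer cascades used in the paper: strip the diacritics from every form in a frequency list, index the forms by that stripped key, and replace each input word with its most frequent candidate. The sketch below assumes a tiny hypothetical frequency list (the counts are invented) and maps đ to plain d; it is a context-free simplification, not the authors' method.

```python
# Map Serbian Latin letters with diacritics to their plain ASCII counterparts.
STRIP = str.maketrans("čćšžđČĆŠŽĐ", "ccszdCCSZD")


def strip_diacritics(word: str) -> str:
    return word.translate(STRIP)


def build_index(freqs: dict[str, int]) -> dict[str, str]:
    """Map each diacritic-free key to its most frequent dictionary form (first wins on ties)."""
    best: dict[str, str] = {}
    for form, freq in freqs.items():
        key = strip_diacritics(form)
        if key not in best or freq > freqs[best[key]]:
            best[key] = form
    return best


def restore(text: str, index: dict[str, str]) -> str:
    """Replace each whitespace-separated token by its best candidate, keeping initial capitals."""
    out = []
    for token in text.split():
        candidate = index.get(strip_diacritics(token.lower()))
        if candidate is None:
            out.append(token)  # unknown word: leave unchanged
        elif token[0].isupper():
            out.append(candidate[0].upper() + candidate[1:])
        else:
            out.append(candidate)
    return " ".join(out)


# Hypothetical word-form frequencies (e.g. as if counted in a reference corpus).
FREQS = {"žena": 200, "voli": 90, "kuću": 60, "u": 500, "šumi": 25}
print(restore("Zena voli kucu u sumi", build_index(FREQS)))  # Žena voli kuću u šumi
```

Because the lookup ignores context and punctuation, an ambiguous stripped form such as `kuca` (both "kuća" and "kuca" exist) always becomes whichever candidate is more frequent; resolving such cases is where context-sensitive transducers are needed.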
Serbian NER&Beyond: The Archaic and the Modern Intertwinned
In this paper we present a Serbian literary corpus being developed under the COST Action "Distant Reading for European Literary History" CA16204. Using this corpus of novels written more than a century ago, we developed and made publicly available a named entity recognition (NER) system trained to recognize 7 different types of named entities, based on a convolutional neural network (CNN), which achieves an F1 score of ≈91% on the test dataset. The model was further evaluated on a separate evaluation dataset. We conclude with a comparison ...
... given in Section 5. Finally, conclusions and plans for future work are stated in Section 6.
2 Related Work
The existence of large-scale lexical resources for Serbian, e-dictionaries in particular (Krstev, 2008), coupled with local grammars in the form of finite-state transducers (Vitas and Krstev ...
... Named Entity Recognition and Relation Extraction of Financial News. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 2293–2299, Marseille, France. European Language Resources Association. Ridong Jiang, Rafael E. Banchs, and Haizhou Li. 2016. Evaluating and Combining Name ...
Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić. "Serbian NER&Beyond: The Archaic and the Modern Intertwinned" in Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd. Shoumen, BULGARIA (2021). https://doi.org/10.26615/978-954-452-072-4_141
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019)
In this paper we present a model for selecting good examples for a dictionary of the Serbian language, and the development of the model's initial components. The method is based on a detailed analysis of various lexical and syntactic features in a corpus composed of examples from five digitized volumes of the SASA Dictionary. The initial feature set was inspired by similar approaches for other languages. The distribution of features in examples from this corpus is compared with the feature distribution of sample sentences excerpted from corpora containing various texts. The analysis showed that ...
Serbian, good dictionary examples, automation of dictionary production, feature extraction, machine learning
... This analysis enabled the recognition of the entry structure: headword group, grammatical data, etymology, lexical units (senses), multiword expressions and proverbs (if any). Each lexical unit may contain linguistic labels (domain, style, time etc.), syntax patterns, definitions, related words, ...
... dictionary into various standard structured formats and a lexical database was implemented using a custom software solution, with the primary goal to speed up the linear production process of the dictionary. This enabled the use of the lexical database for research purposes. After successful import ...
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, Lexical Computing CZ, s.r.o. (2019)
An intelligent hybrid system for surface coal mine safety analysis
Nikola Lilić, Ivan Obradović, Aleksandar Cvjetić. "An intelligent hybrid system for surface coal mine safety analysis" in Engineering Applications of Artificial Intelligence (2010)
Претрага корпуса заснована на употреби екстерних лексичких ресурса путем веб-сервиса [Corpus Search Based on the Use of External Lexical Resources via a Web Service]
The paper discusses a hybrid approach to corpus search, illustrated with the OCWB and NoSketch Engine tools applied to a specialized mining corpus (RudKor) and the Corpus of Contemporary Serbian (SrpKor). The approach combines the existing capabilities of OCWB and NoSketch Engine, which base their search on the linguistic annotation of the corpus, with new search capabilities in the form of consulting external language resources (the morphological electronic dictionaries of Serbian and the lexical database Serbian WordNet). The hybrid approach was implemented by upgrading the web interfaces used by these tools ...
... net/files/CQP_Tutorial.pdf. Krstev et al. 2004: Cvetana Krstev, Gordana Pavlović-Lažetić, Duško Vitas and Ivan Obradović, “Using Textual and Lexical Resources in Developing Serbian Wordnet”, Romanian Journal of Information Science and Technology, 7/1–2, 147–161. Krstev 2008: Cvetana Krstev, Processing ...
... Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović, “Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources”, In: N. Nguyen, R. Kowalczyk, A. Pinto, J. Cardoso (eds) Transactions on Computational Collective Intelligence XXVI, Lecture Notes in Computer ...
Милош Утвић, Ранка Станковић, Александра Томашевић, Михаило Шкорић, Биљана Лазић. "Претрага корпуса заснована на употреби екстерних лексичких ресурса путем веб-сервиса" in Научни састанак слависта у Вукове дане [Scientific Meeting of Slavists in Vuk's Days], Vol. 48/3 Српски језик и његови ресурси [The Serbian Language and Its Resources], Међународни славистички центар, Филолошки факултет, Универзитет у Београду [International Slavic Center, Faculty of Philology, University of Belgrade] (2019). https://doi.org/10.18485/msc.2019.48.3.ch12
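Речник САНУ као база терминолошких речника (на примеру речника кулинарства) [The SASA Dictionary as a Basis for Terminological Dictionaries (the Example of a Culinary Dictionary)]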
... to extract a list of the lemma frequencies in the cookbook, which are then filtered and classified by applying semantic markers. The obtained list was compared with the list of lexical entries of the SASA Dictionary, and the entries from the Dictionary that have no information about the culinary ...
... and words that do not exist in the Dictionary (and should be entered). This research also used syntactic graphs to extract multiword lexical units, from which terms significantly more frequent in culinary texts than in the corpus of contemporary Serbian were selected on the basis of frequency ...
... српскохрватског књижевног и народног језика. Belgrade, 1959–2014, I–XIX. References: 1. Krstev, Cvetana, Duško Vitas and Gordana Pavlović-Lažetić. „Resources and methods in the morphosyntactic processing of Serbo-Croatian.” In Gerhild Zybatow et al. (eds.) Formal Description of Slavic Languages: The Fifth ...
Рада Стијовић, Олга Сабо, Ранка Станковић. "Речник САНУ као база терминолошких речника (на примеру речника кулинарства)" in Словенска терминологија данас [Slavic Terminology Today], Belgrade: Serbian Academy of Sciences and Arts (2017)
Using Metadata For Content Indexing Within An OER Network
Ranka Stanković, Olivera Kitanović, Ivan Obradović, Roberto Linzalone, Giovanni Schiuma, Daniela Carlucci (2014)
by scoring resources against keywords on the basis of user search activity
– Preselected groups of resources
– Resource access level permissions by user group
– Multilingual, allowing the user to change the language, with most major languages supported
– Automatic thumbnail creation for resources
– Minimal hosting
... for a metadata portal indexing open educational resources within a network of institutions. The network is aimed at blending academic and entrepreneurial knowledge by enabling higher education institutions to publish various academic learning resources, e.g. video lectures, course planning materials ...
... corresponding metadata portal described in this paper is to provide structured access to information on open educational resources within the network. Keywords: OER, Open educational resources, metadata, TEL, Technology enhanced learning. 1. INTRODUCTION Due to intense technological development there is a ...
Ranka Stanković, Olivera Kitanović, Ivan Obradović, Roberto Linzalone, Giovanni Schiuma, Daniela Carlucci. "Using Metadata For Content Indexing Within An OER Network" in Proceedings of the Fifth International Conference on e-Learning, eLearning 2014, September 2014, Belgrade, Serbia, Belgrade: Belgrade Metropolitan University (2014)
Towards Better Valorisation of Industrial Minerals and Rocks in Serbia—Case Study of Industrial Clays
CRIRSCO reporting standard: Indicated resources | Measured resources | Proven reserves | Proven reserves
Drill spacing (m): 120 | 60 | 30 | 15
Drilling technique: Core drilling using double tube core barrels, core yield almost 100% | Pneumatic air drilling
... well as the employees' publications. The Repository is available at: www.dr.rgf.bg.ac.rs
Resources 2021, 10, 63. https://doi.org/10.3390/resources10060063, www.mdpi.com/journal/resources. Review: Towards Better Valorisation of Industrial Minerals and Rocks in Serbia—Case Study of Industrial ...
... field of geological exploration and evaluation of mineral resources/reserves and their classification, which is still causing some inconveniences, especially for the mineral industry sector. The best possible valorisation of mineral resources, particularly of industrial minerals and rocks (IMR), depends ...
Vladimir Simić, Dragana Životić, Zoran Miladinović. "Towards Better Valorisation of Industrial Minerals and Rocks in Serbia—Case Study of Industrial Clays" in Resources, MDPI AG (2021). https://doi.org/10.3390/resources10060063
Development Of The Serbian Geological Resources Portal
sharing of geologic information and can help avoid duplicated efforts, inconsistencies, delays, confusion, and a waste of resources. Development of the Serbian Geological Resources Portal. Ranka Stanković, Jelena Prodanović, Olivera Kitanović & Velizar Nikolić. Abstract. The Geological information ...
... exploitation permits and works in the mineral resources field, and represents a basis for archiving and efficient handling of vector, raster and related thematic alphanumeric content in one place, as well as efficient management and usage of mineral resources.[8] Google Maps API Web Services are also used ...
Ranka Stanković, Jelena Prodanović, Olivera Kitanović, Velizar Nikolić. "Development Of The Serbian Geological Resources Portal" in Proceedings of the 17th Meeting of the Association of European Geological Societies, Belgrade, Serbia: The Serbian Geological Society (2011)
Geological Exploration of Mineral Resources in Serbia and Reporting
Jelenković Rade (2013)
Jelenković Rade. "Geological Exploration of Mineral Resources in Serbia and Reporting" in 3rd International Conference Mineral resources in the Republic of Serbia: A Driving Force for Economic Development, Beograd: TGI Executive Meetings (2013)
Mineral resources in the Republic of Serbia: A Driving Force for Economic Development – Pro et Contra
Jelenković Rade (2014)
Jelenković Rade. "Mineral resources in the Republic of Serbia: A Driving Force for Economic Development – Pro et Contra" in 4th International Conference Mineral resources in the Republic of Serbia: A Driving Force for Economic Development, Beograd: TGI Executive Meetings (2014)
Developing Termbases for Expert Terminology under the TBX Standard
... Ranka Stanković, Duško Vitas, and Ivan Obradović. WS4LR: A Workstation for Lexical Resources. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06). European Language Resources Association (ELRA), 2006. Ch. Lieske, S. McCormick, and G. Thurmair. The Open ...
process that is as automated as possible. With term extraction as its cornerstone, it requires a post-processing strategy that repurposes existing lexical resources to maximize efficiency. Terms extracted from corpora and subsequently translated should be channeled into the company termbase, so that they can
... concepts from termbases as central resources in a custom in-house scheme to standard formats such as TBX has been provided by a wizard integrated in the terminological information system supporting the termbases. Keywords: Termbases, TBX standard, Language Resources, Terminology Integration and Portability ...
Ranka Stanković, Ivan Obradović, and Miloš Utvić. "Developing Termbases for Expert Terminology under the TBX Standard" in Natural Language Processing for Serbian - Resources and Applications, Belgrade: University of Belgrade, Faculty of Mathematics (2014)
Prognozna ocena resursa kaolinitskih glina u sedimentnim basenima Srbije [Prognostic Assessment of Kaolinitic Clay Resources in the Sedimentary Basins of Serbia]
Vladimir M. Simić (2003)
Kaolinitic clays are among the economically most important non-metallic mineral raw materials in Serbia. Deposits of these clays have long been explored and exploited in the Aranđelovac and Kolubara basins and in the basins of the Vlašić anticlinorium. In addition, many localities have been explored to a greater or lesser extent, and some of them have been subject to occasional exploitation. Previous research on kaolinitic clays, however, had a predominantly narrow exploration goal, namely determining the clay reserves of a particular deposit, while research of a regional character was limited in scope and tied to parts of individual ...
prognostic assessment, kaolinitic clay resources, sedimentary basins, Serbia, geology, mineralogy, quality, potentiality
Vladimir M. Simić. Prognozna ocena resursa kaolinitskih glina u sedimentnim basenima Srbije, Beograd: V. Simić, 2003