A Lexical Approach to Acronyms and their Definitions
In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions, which are usually Multi-Word Units (MWUs); shallow parsing of MWUs, which enables MWU lemmatization; and production of entries in morphological electronic dictionaries, for both MWUs and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables a representation that reflects the complex relations between acronyms and their definitions.
... are looking locally for acronyms, their definitions and their variations, with a final goal to incorporate collected information into lexical resources for Serbian. In order to achieve these goals we have to deal with complex inflection of both Serbian MWUs and acronyms. We have followed these ...
KFOR-u,Medjunarodne mirovne snage na Kosovu.N+NProp+Org+DOM=Mil+ACR=KFOR:ms3:ms7
KFOR-u,KFOR.ABB+NProp+Org+DOM=Mil+ACR=KFOR:ms3:ms7
3. Used Resources and Tools Corpus: As a corpus we have used an excerpt
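The two dictionary entries quoted in this excerpt can be read mechanically. The following is a minimal sketch, assuming only the layout visible in the excerpt (inflected form, comma, lemma, period, part of speech, `+`-separated markers, `:`-separated grammatical codes). The names `DelafEntry` and `parse_entry` are hypothetical and not taken from the paper, and the codes such as `ms3` are stored without being interpreted.

```python
import re
from typing import NamedTuple


class DelafEntry(NamedTuple):
    form: str      # inflected form, e.g. "KFOR-u"
    lemma: str     # lemma, either the acronym or its full definition
    pos: str       # first tag after the period, e.g. "N" or "ABB"
    markers: dict  # markers such as NProp, Org, DOM=Mil, ACR=KFOR
    codes: list    # grammatical codes, kept as opaque strings


def parse_entry(line: str) -> DelafEntry:
    # Drop whitespace around '+' that line wrapping may have introduced.
    line = re.sub(r"\s*\+\s*", "+", line.strip())
    # The first comma separates the inflected form from the rest.
    form, rest = line.split(",", 1)
    # The last period separates the lemma from the tag string.
    lemma, info = rest.rsplit(".", 1)
    # Colons separate the tag string from the grammatical codes.
    tags, *codes = info.split(":")
    pos, *raw_markers = tags.split("+")
    markers = {}
    for marker in raw_markers:
        key, _, value = marker.partition("=")
        # Markers without '=' are treated as boolean flags.
        markers[key] = value if value else True
    return DelafEntry(form.strip(), lemma.strip(), pos.strip(), markers, codes)


entry = parse_entry("KFOR-u,KFOR.ABB+NProp+Org+DOM=Mil+ACR=KFOR:ms3:ms7")
print(entry.lemma, entry.markers["ACR"], entry.codes)
```

Because both entries carry the same `ACR=KFOR` marker, a lookup keyed on that marker would link the acronym entry to the entry for its full definition.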
Query Expansion Based on Lexical Resources
The paper describes how lexical resources for Serbian and software tools, developed within the Language Technology Group of the University of Belgrade, can be used to improve query formulation. Search results can be significantly improved by using various lexical resources, such as morphological dictionaries and semantic networks. The presented approach can also be applied in the System of Scientific, Technological and Business Information, since efficient searching of this valuable resource, given its heterogeneity and size as well as its predominantly textual content,
named WS4QE, accompanied by several web services, that enables the solution of various tasks via the web. Besides a short description of the lexical resources for Serbian involved, we shall also describe how the functions of the WS4LR tool can be used for their maintenance and development, as well
Data from the Digital Repository of the Faculty of Mining and Geology in eScience (eNauka)
Biljana Rujević, Mihailo Škorić (2024)
The paper describes linking the Digital Repository of the University of Belgrade, Faculty of Mining and Geology, with the eScience system in terms of transferring metadata about the results of researchers' scientific work. The steps taken to ensure smooth harvesting of metadata are outlined. Further improvements to the OAI system are also presented, aiming to contribute to the automatic linking of authors with their results in the eScience system.
Biljana Rujević, Mihailo Škorić. "Data from the Digital Repository of the Faculty of Mining and Geology in eScience (eNauka)" in Infotheca, Faculty of Philology, University of Belgrade (2024). https://doi.org/10.18485/infotheca.2023.23.2.4
Creation of a Training Dataset for Question-Answering Models in Serbian
The development and application of artificial intelligence in language technologies have advanced significantly in recent years, particularly in the task of question answering (QA). While existing resources for QA tasks have been developed for the major world languages, Serbian has been relatively neglected in this area. This paper presents an initiative to create a large and diverse dataset for training question-answering models for Serbian, which will contribute to the advancement of language technologies for Serbian. In addition to numerous studies on language models ...
artificial intelligence, natural language processing, language resources, annotated datasets, information extraction, question answering
Ranka Stanković, Jovana Rađenović, Maja Ristić, Dragan Stankov. "Creation of a Training Dataset for Question-Answering Models in Serbian" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named
... morphological electronic dictionaries and finite-state transducers for Serbian [12]. 3.1 Used Resources Lexical Resources. The resources for natural language processing of Serbian consisting of lexical resources and local grammars are being developed using the finite-state methodology as described in [3 ...
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović
Development of A Business Intelligence Tool For Accident Analysis in Mines
all terms used within a domain need to be standardized, with a clear and unambiguous definition, accompanied by lexical and semantic relations with other terms. An example of a lexical relation is the one established between general and more specific terms, such as "coal mine" and "open pit", which is
of the first terminological resources in the field of mining was developed at the University of Belgrade Faculty of Mining and Geology (FMG) within the Technological coal mine information system (Kolonja et al., 2006). Further growth and variety of terminological resources for specific domains developed
intelligence offers some novel approaches to presentation and analysis of business information. The field is expected to benefit from application of semantic technology, especially ontologies. The tool developed for accident analysis in mines offers to the users an insight into large quantities of
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012)
... coverage of existing lexical resources (e.g., WordNet) and grammars. Resources: quality and size of existing text corpora, speech corpora and parallel corpora, quality and coverage of existing lexical resources and grammars. The relevant tables show that the tools and resources available for Serbian ...
... essential to integrate deeper linguistic knowledge to facilitate semantical analysis. Experiments using lexical resources such as machine-readable thesauri or ontological language resources (e.g., WordNet for English or SrpNet for Serbian) have demonstrated improvements in finding pages using ...
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
Open Educational Resources in Serbia
e-learning, open education, semantic web, information systems, database modelling, geoinformation management and artificial intelligence. Her current research is focused on building custom components that incorporate knowledge from various language and lexical resources. She is head of Computer
OPEN EDUCATIONAL RESOURCES IN SERBIA AUTHOR(s) - Ivan Obradović, Ranka Stanković, Marija Blagojević, Danijela Milošević Abstract: This chapter provides a review of open educational resources in Serbia. It covers different aspects of open educational resources: policy, resources, licenses,
current state of open educational resources development and implementation in Serbia. Analysis of the results shows a positive direction for the implementation of open educational resources in Serbia, as well as future possibilities. Key words: Open educational resources, BAEKTEL, metadata portal
Integrating Heterogeneous Lexical Resources
The main activity of the Natural Language Processing Group at the Faculty of Mathematics, University of Belgrade, is directed at developing various resources for processing Serbian. Particularly significant among them are the system of morphological dictionaries of Serbian developed within the RELEX network [1] and the semantic network (of the wordnet type) for Serbian developed within the international project Balkanet. These are two heterogeneous lexical resources, developed on the basis of entirely different models, which consequently contain different kinds of lexical information. By integrating these resources, the information
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment
morphological dictionaries Serbian morphological dictionaries represent a rich lexical resource, which can be used in various NLP tasks (Krstev, 2008). It is being continually developed and maintained in the lexical database LeXimirka (Stanković et al., 2018), which supports different export functions
Dictionaries in the Digital Age: IT Support for the Serbian Language
Biljana Rujević (2022)
Morphological dictionaries of Serbian are an electronic language resource with a significant history of development and use in natural language processing. Because they were stored as files whose number kept growing, which made managing the dictionaries difficult, a need arose to store the dictionary information in a lexicographic database. To enable several users to work on dictionary development simultaneously, a need also arose for a web application based on the lexicographic database. In order to consider ...
Биљана Рујевић. Речници у дигиталном добу - информатичка подршка за српски језик, Београд : [Б. Рујевић], 2022
Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology
Mihailo Škorić, Mauro Dragoni (2019)
This paper is the result of a task presented to participants of the Keyword Search in Big Linked Data summer school, organized by the Vienna University of Technology under the Keystone COST action in the summer of 2017. It presents a specific approach to classification via the creation of minimal document surrogates based on the US National Library of Medicine's MeSH ontology, which is derived from the Medical Subject Headings thesaurus. In a series of previously classified medically
language morphology should be taken into account [Infotheca Vol. 19, No. 1, September 2019, Scientific paper, p. 68] and preparation of additional lexical resources specific to the field of medicine would be required in order to normalize text before classification or indexing, which would help to identify
on the resources available, specifically the ontology or taxonomy used for the classification (Rakesh et al., 2001). Once established, the system may find wider application. When it comes to the classification of (medical) documents for the Serbian language, it is necessary to prepare resources first.
Measuring semantic relevance of words in synsets
Obradović Ivan, Krstev Cvetana, Vitas Duško. "Measuring semantic relevance of words in synsets" in Text and Language, Structures · Functions · Interrelations. Quantitative Perspectives, P. Grzybek, E. Kelih, J. Mačutek (eds.), Wien: Praesens Verlag (2010): 133-144
Introducing Domain and Semantic Markers for the Field of Mining into Serbian Electronic Dictionaries
... sophisticated approaches to lexical diversity assessment, Behavior Research Methods, 42(2), pp. 381–392. Ivan Obradović, Aleksandra Tomašević, Ranka Stanković, Biljana Lazić INTRODUCING DOMAIN AND SEMANTIC MARKERS FOR THE FIELD ...
117–136. Krstev et al. 2015: Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić "Terminology Acquisition and Description Using Lexical Resources and Local Grammars", In: Proc. of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, eds. Thierry Poibeau and
sponse. This paper discusses the importance of lexical markers in the processes of information retrieval and extraction, and proposes an expansion of the set of these markers for the field of mining. A brief description of the developed corpus of texts from the field of mining is also given, for
Part of Speech Tagging for Serbian language using Natural Language Toolkit
Ranka Stanković, Boro Milovanović (2020)
While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We try to use this library to create a PoS (part of speech) tagger for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are transformed with the Brill tagger to improve accuracy. We trained the models on an annotated
typology," Proc. Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 2014 [14] C. Krstev and D. Vitas, "Serbian Morphological Dictionary – SMD," University of Belgrade, HLT Group and Jerteh, Lexical resource, 2.0, 2015
An Italian-Serbian Sentence Aligned Parallel Literary Corpus
This article presents the construction and relevance of an Italian-Serbian sentence-aligned parallel corpus, delving into the aligned sentences in order to facilitate effective translation between the two languages. The parallel corpus serves as a valuable resource for language experts, researchers, and language enthusiasts, fostering a deeper understanding of linguistic nuances and cultural expressions. By bridging the gap between Serbian and Italian, this corpus opens new avenues for cross-cultural communication and collaboration, and ultimately contributes to the improvement of language-related ...
Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić. "An Italian-Serbian Sentence Aligned Parallel Literary Corpus" in Review of the National Center for Digitization, Belgrade : Faculty of Mathematics, University of Belgrade (2023). https://doi.org/10.5281/zenodo.11203388
The Use of the Omeka Semantic Platform for the Development of the University of Belgrade, Faculty of Mining and Geology Digital Repository
Under the regulations of the Ministry of Education, Science and Technological Development, a digital repository based on the Omeka S data storage platform has been developed for the Faculty of Mining and Geology. The platform has been upgraded with the required modular extensions, Solr index and automatic OCR. Furthermore, document indexing and search have been fine-tuned with the aid of e-dictionaries of the Serbian language, which has brought about outstanding results in terms of usage facilitation and overall ...
Petar Popović, Mihailo Škorić, Biljana Rujević. "The Use of the Omeka Semantic Platform for the Development of the University of Belgrade, Faculty of Mining and Geology Digital Repository" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2020.20.1_2.9
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation
electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion
Query Expansion) was developed on the basis of LeXimir, and it enables expansion of queries submitted to the Google search engine [6]. Integrated lexical resources enable modifications of user queries for both monolingual and multilingual search. The main feature of WS4QE is that it enables inflection of
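The general idea of expanding a query with inflected forms can be illustrated with a short sketch. This is not the WS4QE implementation or its API: the table `TOY_FORMS` is a hypothetical stand-in for a morphological dictionary, the function name `expand_query` is invented for illustration, and joining the variants with `OR` is an assumption about the target search syntax.

```python
# Hypothetical stand-in for a morphological e-dictionary:
# lemma -> list of inflected forms (only one toy noun is listed).
TOY_FORMS = {
    "rudnik": [
        "rudnik", "rudnika", "rudniku", "rudnikom",
        "rudnici", "rudnike", "rudnicima",
    ],
}


def expand_query(query: str, forms: dict = TOY_FORMS) -> str:
    """Replace each known query word with an OR-group of its inflected forms."""
    parts = []
    for word in query.split():
        # Unknown words are passed through unchanged.
        variants = forms.get(word.lower(), [word])
        if len(variants) > 1:
            parts.append("(" + " OR ".join(variants) + ")")
        else:
            parts.append(variants[0])
    return " ".join(parts)


print(expand_query("rudnik uglja"))
```

In a morphologically rich language such as Serbian, a query on the nominative form alone would miss documents in which the word appears only in other cases, which is the motivation for this kind of expansion.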
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation
electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion
non-compositionality and have constant references can be described using a similar approach. The NLP community offered various approaches to lexical treatment of multi-word units (MWUs) that were analyzed in detail by Savary [5]. Productive classes of MWUs, like numerals and various named entities
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for morphologically complex languages such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for
Terminology Acquisition and Description Using Lexical Resources and Local Grammars Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić
terms and integrating them with other resources for linguistic text processing; 5.3. Linguistic pre-processing with expanded dictionaries for verification of recognition of new MWU lemmas. Figure 1: Diagram of terminology acquisition using lexical resources and local grammars The newly acquired
as well as the employees' publications. - The Repository is available at: www.dr.rgf.bg.ac.rs
Terminology acquisition and description using lexical resources and local grammars
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić
University of ...