Search
109 items
-
The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines
In this paper we present how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for tuning queries before submitting them to a web search engine. We argue that the selection of words chosen for a query, which are of paramount importance for the quality of results obtained by the query, can be substantially improved by using various lexical resources, such as morphological dictionaries and wordnets. These dictionaries enable semantic ...
LR web services, MultiWord Expressions & Collocations, Information Extraction, Information Retrieval
Krstev Cvetana, Stanković Ranka, Vitas Duško, Obradović Ivan. "The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines" in LREC 2008: Conference on Language Resources and Evaluation, Marrakesh, Morocco, May 2008, European Language Resources Association (ELRA) (2008)
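The query-tuning idea above can be illustrated with a minimal Python sketch. It assumes a toy in-memory morphological dictionary (lemma to inflected forms) and a toy synonym list standing in for a wordnet. The names `FORMS`, `SYNONYMS`, `expand_term` and `expand_query` are hypothetical. The OR/AND syntax is a generic Boolean form, not the syntax of any particular search engine or the authors' tool.

```python
# Hypothetical toy resources standing in for a morphological dictionary and a wordnet.
FORMS = {
    "auto": ["auto", "auta", "autu", "autom"],
    "automobil": ["automobil", "automobila", "automobilu", "automobilom"],
}
SYNONYMS = {"auto": ["automobil"]}  # illustrative synonym link, not real wordnet data

def expand_term(lemma, use_synonyms=True):
    """Return the sorted word forms of a lemma and, optionally, of its synonyms."""
    lemmas = [lemma] + (SYNONYMS.get(lemma, []) if use_synonyms else [])
    forms = set()
    for lem in lemmas:
        forms.update(FORMS.get(lem, [lem]))  # unknown lemma: use it as is
    return sorted(forms)

def expand_query(lemmas, use_synonyms=True):
    """OR together the forms of each term, then AND the terms together."""
    return " AND ".join(
        "(" + " OR ".join(expand_term(lem, use_synonyms)) + ")" for lem in lemmas
    )

print(expand_query(["auto"], use_synonyms=False))
# (auta OR auto OR autom OR autu)
```

Expanding every inflected form matters for a highly inflected language like Serbian. A query containing only the lemma would otherwise miss documents that use other case forms.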
-
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...
... Savary, A., Zaborowski, B., Krawczyk-Wieczorek A., and Makowiecki F. (2012). SEJFEK - a Lexicon and a Shallow Grammar of Polish Economic Multi-Word Units. In Proc. of the 3rd Workshop on Cognitive Aspects of the Lexicon (CogALex-III), COLING 2012, Mumbai: COLING, pp. 195--214. Schone, P., Jurafsky, ...
... linguistic knowledge, presumably because linguistic resources and tools they used were underdeveloped. In (Małyszko et al., 2015) authors lemmatize multiword entity names (organization names and similar named entities found in a corpus of legislative acts) by using rules generated on the basis of ...
... Applications Conference, October 17-19, 2011, Jachranka: Polskie Towarzystwo Informatyczne, pp. 77--84. Tadić, M., Šojat, K. (2003). Finding multiword term candidates in Croatian. In Proc. of IESL2003 Workshop, Borovets: Context, pp. 102-107. Vintar, Š. (2010). Bilingual term recognition revisited ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
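Here is a rough, hedged illustration of pattern-based multi-word term candidate extraction. The sketch swaps the finite-state transducers described above for a plain sequence match over pre-tagged tokens. The tag set (`A`, `N`, `C`), the `PATTERNS` list and the toy tokens are assumptions, not the paper's grammars. Lemmatizing the whole term is deliberately not attempted here.

```python
from collections import Counter

# Hypothetical tag patterns for candidate terms: A = adjective, N = noun.
PATTERNS = [("A", "N"), ("A", "A", "N"), ("N", "N")]

def extract_candidates(tokens, patterns=PATTERNS):
    """Count spans whose tag sequence matches one of the patterns.

    tokens: list of (form, tag, lemma) triples produced by some upstream tagger.
    Candidates are keyed by their lemma sequence so that inflected variants of
    the same term are counted together.
    """
    tags = [tag for _, tag, _ in tokens]
    counts = Counter()
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tokens) - n + 1):
            if tuple(tags[i:i + n]) == pattern:
                counts[tuple(lemma for _, _, lemma in tokens[i:i + n])] += 1
    return counts

# Toy input: "rudno ležište i rudnog ležišta" ('ore deposit' in two cases).
tokens = [("rudno", "A", "rudni"), ("ležište", "N", "ležište"), ("i", "C", "i"),
          ("rudnog", "A", "rudni"), ("ležišta", "N", "ležište")]
print(extract_candidates(tokens))
```

The resulting key ("rudni", "ležište") groups both occurrences, but it is not a valid Serbian term: the adjective does not agree with the neuter noun. Producing the correct dictionary form ("rudno ležište") requires agreement-aware multi-word lemmatization, which is the problem the abstract says the transducers also address.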
-
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020)
Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ...
... linked to the Princeton Wordnet. The latter resource, SIMPLE, constitutes the semantic level of a quadripartite Italian lexicon. Its structure is inspired by Generative Lexicon theory (Pustejovsky, 1995) and in particular the notion of qualia structure which is used to organise the Semantic Units ...
... dictionary made it possible to combine verb groups and dictionary valency information, used as input for the compilation of the Danish FrameNet Lexicon (Nimb, 2018). Furthermore, they constitute the basis for the automatically integrated information on related words in DDO, on the fly for each ...
... Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, pages 915–923. Nimb, S. (2018). The Danish FrameNet lexicon: method and lexical coverage. In Proceedings of the International FrameNet Workshop at LREC 2018: Multilingual FrameNets and Constructions, pages ...
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
-
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model
This study presents a sentiment analysis of older Serbian novels from the period 1840-1920, using the large language model (LLM) Mistral with learning techniques based on so-called zero-shot and few-shot prompting. The main approach innovates by designing research prompts that include instruction text for classification without examples and with a few examples, enabling the language model to classify sentiment into positive, negative or objective categories. This methodology aims to simplify sentiment analysis by constraining the responses, thereby increasing precision ...
Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević. "Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model" in Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024), BAS (2024)
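The zero-shot versus few-shot distinction and the idea of constraining the model's answers can be sketched as follows. This is an assumption-laden illustration, not the authors' prompts. The template wording and the helper names `build_prompt` and `parse_label` are hypothetical. The call to Mistral itself is not shown, because no API details are given in the abstract.

```python
LABELS = ("positive", "negative", "objective")

def build_prompt(sentence, examples=()):
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples).

    examples: iterable of (sentence, label) pairs; the wording is a hypothetical template.
    """
    parts = [
        "Classify the sentiment of the sentence as one of: "
        + ", ".join(LABELS) + ". Answer with the label only."
    ]
    for text, label in examples:  # empty for zero-shot
        parts.append(f"Sentence: {text}\nLabel: {label}")
    parts.append(f"Sentence: {sentence}\nLabel:")
    return "\n\n".join(parts)

def parse_label(response):
    """Map a model response onto one allowed label, or None if it does not start with one."""
    words = response.strip().lower().split()
    first = words[0].strip(".,:;!\"'") if words else ""
    return first if first in LABELS else None

print(build_prompt("Bio je to divan dan.", examples=[("Kiša je padala.", "objective")]))
```

Rejecting answers that do not start with one of the three labels is one simple way to enforce the "constrained response" idea. Such rejected items could then be re-prompted or counted as errors.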
-
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian (SrpKor) and local grammars. Dictionaries are used to identify possible candidates for restoration, while the data obtained from SrpKor and local grammars assist in deciding between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while precision (average 98.93%) is always higher than recall (average 94.94%).
... that is, the transfer of word forms to dictionary forms (lemmas); • Matching with the thesaurus based on the lemma representation of the document. Multiword terms from a thesaurus are matched with the text using lemma sequences. Fig. 3 shows the term coverage of news text “Kudrin's experts named the ...
... of thesauri russnet and yarn. In Proceedings of Conference ”Internet and Modern Society”, pages 7–13. Azarowa, I. (2008). Russnet as a computer lexicon for russian. Proceedings of the Intelligent Information systems IIS-2008, pages 341–350. Balkova, V., Suhonogov, A., and Yablonsky, S. (2008). Some ...
... Musical art^Y, where Y means full expansion to lower levels of the hierarchy, including hyponyms, parts, and dependent concepts. The full Boolean expression for this category looks like a disjunction of more than 400 concepts, including musical styles, musical instruments ...
Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
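The candidate-generation step described in the diacritic-restoration abstract can be sketched as follows. Dictionary forms are indexed by their degraded spelling, and each degraded token is looked up in that index. The disambiguation here is an assumption: a simple frequency table replaces the SrpKor statistics and local grammars the paper actually uses. The frequency counts in the usage example are hypothetical. Writing đ as "dj" is also an assumed convention.

```python
from collections import defaultdict

# Degraded-Latin mapping: diacritic letters lose their marks; đ -> "dj" is an assumption.
STRIP = str.maketrans({"č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj"})

def degrade(word):
    return word.lower().translate(STRIP)

def build_index(dictionary_forms):
    """Map each degraded spelling to the set of dictionary forms it could stand for."""
    index = defaultdict(set)
    for form in dictionary_forms:
        index[degrade(form)].add(form)
    return index

def restore(token, index, frequency):
    """Return the restored (lower-cased) form of a degraded token.

    Unknown tokens are returned unchanged. Among several candidates, the most
    frequent one wins; this stands in for corpus and local-grammar disambiguation.
    """
    candidates = index.get(token.lower())
    if not candidates:
        return token
    return max(sorted(candidates), key=lambda c: frequency.get(c, 0))

index = build_index(["kuća", "kuca", "čovek", "đak"])
freq = {"kuća": 120, "kuca": 15}  # hypothetical counts, not SrpKor data
print([restore(t, index, freq) for t in ["kuca", "covek", "djak"]])
```

The degraded form "kuca" maps to both "kuća" (house) and "kuca" (knocks). This is exactly the kind of ambiguity for which the paper relies on corpus data and local grammars rather than a single frequency count.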
-
OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian
This paper presents a new language resource for searching and exploring verbal aspectual pairs in BCS (Bosnian, Croatian and Serbian), created using the principles of Linguistic Linked Open Data (LLOD). Since no resource existed to help learners of Bosnian, Croatian and Serbian as foreign languages recognize the aspect of a verb or its pairs, we created a new resource that provides users with information about aspect, as well as a link to the verb's aspectual pairs. The resource also contains external links to monolingual dictionaries, Wordnet and BabelNet. ...
Ranka Stanković, Maxim Ionov, Medina Bajtarević, Lorena Ninčević. "OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
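To make the LLOD idea concrete, the sketch below serializes two verbs and their aspectual link as Turtle. It uses the OntoLex core vocabulary (`ontolex:LexicalEntry`, `ontolex:canonicalForm`, `ontolex:writtenRep`) and, as an assumption, LexInfo's `aspect` property with the `perfective` and `imperfective` values. The `ex:aspectualPair` property, the `ex:` namespace and the `@sr` language tag are hypothetical placeholders. The abstract does not say how the published dataset models the pair link.

```python
# Namespace prefixes: ontolex and lexinfo are real vocabularies, ex: is a placeholder.
PREFIXES = """@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/3.0/lexinfo#> .
@prefix ex: <http://example.org/bcs/> .
"""
ASPECTS = {"perfective", "imperfective"}

def entry_ttl(entry_id, lemma, aspect):
    """Serialize one verb as an OntoLex lexical entry with a LexInfo aspect value."""
    if aspect not in ASPECTS:
        raise ValueError(f"unknown aspect: {aspect}")
    return (
        f"ex:{entry_id} a ontolex:LexicalEntry ;\n"
        f"    lexinfo:partOfSpeech lexinfo:verb ;\n"
        f"    lexinfo:aspect lexinfo:{aspect} ;\n"
        f"    ontolex:canonicalForm [ ontolex:writtenRep \"{lemma}\"@sr ] .\n"
    )

def pair_ttl(imperfective_id, perfective_id):
    """Link an imperfective verb to its perfective partner (hypothetical property)."""
    return f"ex:{imperfective_id} ex:aspectualPair ex:{perfective_id} .\n"

ttl = (PREFIXES
       + entry_ttl("pisati", "pisati", "imperfective")
       + entry_ttl("napisati", "napisati", "perfective")
       + pair_ttl("pisati", "napisati"))
print(ttl)
```

A real dataset would more likely reuse an existing relation vocabulary (for example OntoLex's vartrans module) rather than a custom property. The custom property here only keeps the sketch short.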
-
Determining the Impact of Cutting Elements State on the Bucket–Wheel Excavator Vibration and Energy Consumption
Background: There are two types of measurement of specific quantities. The first is the measurement of energy consumption in a working technological process, and the second is the measurement of vibration, expressed as vibration velocity amplitudes. There is an interplay between these two processes: as power consumption increases, the vibration velocity amplitudes increase as well. Purpose: The process of determining machining parameters was based on minimizing the power consumption and vibration of the cutting tool, ...
Filip Miletić, Predrag Jovančić, Miloš Milovančević, Miloš Tanasijević, Stevan Đenadić. "Determining the Impact of Cutting Elements State on the Bucket–Wheel Excavator Vibration and Energy Consumption" in Journal of Vibration Engineering & Technologies, Springer (2022). https://doi.org/10.1007/s42417-022-00482-3
-
The Fuzzy–AHP Synthesis Model for Energy Security Assessment of the Serbian Natural Gas Sector
Natural gas is used for the production of almost 20% of total energy today. The natural gas security of the Republic of Serbia is an urgent strategic, political and security issue. Serbia is one of the most vulnerable countries in Southeast Europe because it has only one supply route. This study contributes to efforts to better understand the factors affecting energy security by implementing a new methodology based on the fuzzy–AHP synthesis model for measuring ...
... energy security terms. If Mᵢ belongs entirely to the ith expression, then βᵢ = 1 and the others are 0. Normalized βᵢ can be seen as the degree of confidence that Mᵢ belongs to that linguistic expression of energy security. The final expression of identified energy security performance is: Mᵢ = {(βᵢ ...
... Using expression 17, the assessment was obtained depending on the affiliation function and class. Using one of the fuzzification and identification methods, the assessment of energy security (M) can be expressed depending on the linguistic variables A, B, C, D, E according to Figure 5 and expression 2. ...
... high and exceptional} is defined with expressions 1 and 2. The ‘best-fit’ method uses the distance dᵢ between the energy safety assessments M of the observed system, defined through the asymmetric max–min composition of expression 17, and the linguistic variables A, B ...
Aleksandar R. Madžarević, Dejan D. Ivezić, Miloš L. Tanasijević, Marija A. Živković. "The Fuzzy–AHP Synthesis Model for Energy Security Assessment of the Serbian Natural Gas Sector" in Symmetry, MDPI AG (2020). https://doi.org/10.3390/sym12060908
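The best-fit step in the excerpts above can be sketched as follows: compute distances dᵢ between the assessment M and each linguistic grade, then normalize them into confidence degrees βᵢ. This is a hedged sketch of the common best-fit formulation. It assumes Euclidean distance between membership vectors and normalized relative inverse distances. The paper's own expressions (such as expression 17) are not reproduced in the excerpt, and the grade vectors in the usage example are hypothetical.

```python
import math

def best_fit(m, grades):
    """Return confidence degrees (beta_i) that assessment m matches each grade.

    m: membership vector of the assessment M; grades: membership vectors of the
    linguistic expressions (e.g. A..E), all of the same length as m.
    """
    d = [math.dist(m, g) for g in grades]  # Euclidean distance d_i (assumed metric)
    d_min = min(d)
    if d_min == 0:
        # M coincides with one expression: beta = 1 for it, 0 for the others.
        hit = d.index(d_min)
        return [1.0 if i == hit else 0.0 for i in range(len(d))]
    alpha = [d_min / di for di in d]  # relative closeness, 1 for the nearest grade
    total = sum(alpha)
    return [a / total for a in alpha]  # normalize so the betas sum to 1

grades = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # hypothetical three-grade scale
print(best_fit([0.5, 0.5, 0.0], grades))
```

The zero-distance branch reproduces the rule quoted above: when Mᵢ belongs entirely to one expression, that βᵢ is 1 and the others are 0. Otherwise, closer grades receive proportionally larger βᵢ.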
-
Geologic Information System of Serbia
The Geologic Information System of Serbia (GeolISS) is a repository for the digital archiving, querying, retrieval, analysis and visualization of geologic data. GeolISS is implemented using ESRI ArcGIS technology and is designed to operate both as a personal geodatabase (MS Jet 4.0 Engine) and as an SDE enterprise geodatabase in MS SQL Server. The objective of the GeolISS implementation is the integration of existing geologic archives, data from published maps at different scales and newly acquired field data, as well as the Web publishing of geologic information. Physical implementation ...
... that is implemented as a compilation of geologic vocabularies such as petrologic and mineralogic classification, geologic time scale, stratigraphic lexicon etc. The terms in the vocabularies are used to classify observations/interpretations, or to specify attribute values. Observations implement field ...
Branislav Blagojević, Branislav Trivić, Ranka Stanković, Nenad Banjac, Olivera Kitanović. "Geologic Information System of Serbia" in Proceedings of the 17th Meeting of the Association of European Geological Societies, 14-18 September 2011, Beograd: Srpsko geološko društvo (2011)
-
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
-
A Mathematical Learning Environment Based on Serbian Language Resources
In recent years, with the ever-growing use of information technology, learning environments have been changing. The amount of available learning material in various forms has increased. These new environments demand comprehensive learning systems that enable management of the learning corpus, with special attention paid to relevant lexical resources. In this paper we present the concept of a Mathematical Learning Environment in Serbian (MLES), which is based on a corpus of mathematical materials and various lexical resources, enabling ...
... problem of two alphabets, the entire corpus is transliterated into the Latin alphabet. As for the various ways of expressing formulas, mathematical content is converted to LaTeX, which allows mathematical formulae to be expressed in text-only format. The second component handles user queries, semantic ...
... ekstremumom,ekstremum.N:ms6q A large number of terms in mathematics, as in other domains, are multiword expressions (MWE). Thus a procedure described in [12] has been used for semi-automatic extraction of MWEs on the basis of lexical resources and ...
... exist different expressions for the same mathematical content with the same meaning, such as $\frac{1}{x} = 1/x = x^{-1}$. On the other hand, an expression can represent different content depending on the context, such as the number π (Pi), which can represent the transcendental number π = 3.14159 ...
Radojičić Marija, Obradović Ivan, Stanković Ranka, Utvić Miloš, Kaplar Sebastijan. "A Mathematical Learning Environment Based on Serbian Language Resources" in Proceedings of the 7th International Scientific Conference Technics and Informatics in Education, Faculty of Technical Sciences, Čačak (2018)
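The transliteration step mentioned above is deterministic for Serbian, because each Cyrillic letter has a fixed Latin (gajica) counterpart. The sketch below uses those standard letter correspondences; it is not the MLES implementation, which the excerpt does not describe. Upper-case digraphs are handled in title case (Љ becomes Lj), a simplification that is wrong for all-caps words.

```python
# Serbian Cyrillic to Latin letter mapping; three letters map to Latin digraphs.
CYR2LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}
# Upper case: capitalize only the first Latin letter (Љ -> Lj). This is a simplification
# that gives "LjUBAV" instead of "LJUBAV" for an all-caps word.
CYR2LAT.update({k.upper(): v.capitalize() for k, v in CYR2LAT.items()})
TABLE = str.maketrans(CYR2LAT)

def to_latin(text):
    """Transliterate Serbian Cyrillic to Latin; other characters (digits, LaTeX) pass through."""
    return text.translate(TABLE)

print(to_latin("Математика"))  # Matematika
```

Because non-Cyrillic characters are left untouched, formulas already converted to LaTeX survive the transliteration unchanged.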
-
Combining Heterogeneous Lexical Resources
... relationship. For example, the way elements relate to one another, the way attributes relate to elements, and so on. For instance, the following XPath expression (10) //SYNSET[POS='n' and not(ILR/TYPE='hypernym')] retrieves, from an XML document representing a wordnet that uses the XSD from Fig. 1, all the ...
... bilingual list, by following the hypernym/hyponym or some other relation from the current working synset, or even by typing one’s own XPath expression. An edit form is provided for the working synset in which the content of all elements can be filled and updated. More than one edit form can ...
... general. The semantic marks in the DELAS entries enable the formulation of complex queries in the Intex environment. The use of the Intex regular expression (noun marked as a musical instrument in any form) enables, for instance, the retrieval from a text of all phrases of the type ...
Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, ELRA - European Language Resources Association (2004)
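The query //SYNSET[POS='n' and not(ILR/TYPE='hypernym')] quoted in this entry selects noun synsets that have no hypernym link. Python's standard `xml.etree.ElementTree` supports only a subset of XPath and has no `not()`, so the sketch below evaluates the negated condition in Python. The sample XML is a hypothetical fragment: only the element names come from the quoted expression, and the XSD from Fig. 1 is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical wordnet fragment; element names follow the quoted XPath expression.
SAMPLE = """<WN>
  <SYNSET><ID>s1</ID><POS>n</POS><SYNONYM><LITERAL>entitet</LITERAL></SYNONYM></SYNSET>
  <SYNSET><ID>s2</ID><POS>n</POS><SYNONYM><LITERAL>stena</LITERAL></SYNONYM>
    <ILR>s1<TYPE>hypernym</TYPE></ILR></SYNSET>
  <SYNSET><ID>s3</ID><POS>v</POS><SYNONYM><LITERAL>pisati</LITERAL></SYNONYM></SYNSET>
</WN>"""

def nouns_without_hypernym(root):
    """Emulate //SYNSET[POS='n' and not(ILR/TYPE='hypernym')] and return synset IDs.

    ElementTree's XPath subset has no not(), so the negated test is done in Python.
    """
    result = []
    for synset in root.iter("SYNSET"):
        is_noun = synset.findtext("POS") == "n"
        has_hypernym = any(t.text == "hypernym" for t in synset.findall("ILR/TYPE"))
        if is_noun and not has_hypernym:
            result.append(synset.findtext("ID"))
    return result

root = ET.fromstring(SAMPLE)
print(nouns_without_hypernym(root))  # ['s1']
```

A full XPath 1.0 engine (for example the one in the third-party lxml package) could evaluate the original expression directly. The standard library route above avoids that extra dependency.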
-
Unraveling Innerworkings of Magmatic System Beneath the East Pacific Rise 9º50’N
Milena Marjanović, Suzanne M. Carbotte, Alexandre Stopin, Felix Waldhauser, Satish C. Singh, René-Édouard Plessix, Miloš Marjanović, Malden R. Nedimović, Juan Pablo Canales, Hélène D. Carton, Javier Escartin, John C. Mutter (2021)
... tectonic forces leading to the eruption. In addition to the depression, based on the close correlation between the topography of the magma body and expression of the faults in the seafloor, we infer that the inward facing near-axis faults penetrate deep into the crust at steep angles, ~75º, shaping magma ...
Milena Marjanović, Suzanne M. Carbotte, Alexandre Stopin, Felix Waldhauser, Satish C. Singh, René-Édouard Plessix, Miloš Marjanović, Malden R. Nedimović, Juan Pablo Canales, Hélène D. Carton, Javier Escartin, John C. Mutter. "Unraveling Innerworkings of Magmatic System Beneath the East Pacific Rise 9º50’N" in AGU Fall Meeting 2021, American Geophysical Union (2021)
-
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of support for some types of categories. At the same time, various descriptions often describe exactly the same ...
Cvetana Krstev, Ranka Stanković, Vitas Duško. "A Description of Morphological Features of Serbian: a Revision using Feature System Declaration" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta : European Language Resources Association (2010)
-
On the compatibility of lexical resources for NooJ
Lexical resources for many languages are provided for the NooJ linguistic development environment. Meta-data descriptions of morphosyntactic and semantic properties of these languages and their resources are a mandatory part of each language module. In this paper we analyze how well the meta-data actually describe resources for a chosen subset of languages and to what extent they are compatible across languages to support multilingual processing. We show that there is room for improvement in both directions.
... miliona ljudi ‘million people’. However, a more detailed analysis of aligned text revealed examples where a measure expression was recognized in some but not in all languages. For instance, a measure expression “four hundred horse power” was recognized in English and Croatian, but not in Serbian, because the ...
... colloquial “horses” has been used instead of “horsepower”. The same example shows that the expression “seventeen hundred and seventy tons” was not recognized in Croatian because of the insertion brutto registarskih ‘gross register’. En: …built of iron, weighing about seventeen hundred and seventy ...
Ranka Stanković, Miloš Utvić, Duško Vitas, Cvetana Krstev, Ivan Obradović. "On the compatibility of lexical resources for NooJ" in Automatic Processing of Various Levels of Linguistic Phenomena: Selected Papers from the 2011 International Nooj Conference, Cambridge Scholars Publishing (2012): 96-108
-
Application of the Fuzzy Model in the Evaluation and Selection of Hydraulic Excavators on Open-Pit Lignite Mine
Stevan Đenadić, Miloš Tanasijević, Vladimir Milisavljević, Dragan Ignjatović, Predrag Jovančić (2021)
The production of lignite in large open-pit mines is mainly performed with continuously operating equipment, where bucket-wheel excavators, bucket-chain excavators, belt conveyors, and spreaders are the basic machines. Smaller machines, usually of discontinuous operating nature, are commonly categorized as auxiliary machines. This paper presents the research related to the analysis of auxiliary machine parameters with a case study for a hydraulic excavator. The purpose of the analysis was to develop a model for rating the quality of service of the ...
Stevan Đenadić, Miloš Tanasijević, Vladimir Milisavljević, Dragan Ignjatović, Predrag Jovančić. "Application of the Fuzzy Model in the Evaluation and Selection of Hydraulic Excavators on Open-Pit Lignite Mine" in SSRN Electronic Journal, Elsevier BV (2021). https://doi.org/10.2139/ssrn.3945617
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021)
The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both born-digital and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, the data used for compiling the terminological database, the procedure by which it has ...
raw materials, mining, terminology, dictionary, terminology application, mobile application, digitization, lexical data, corpora, linked open data
... dictionaries should be considered as reuse of data. One of the related issues is the processing and representation of terminological phrases, or multiword expressions (MWEs), ranging from compound nouns (e.g., nickname) to complex phrasal verbs (e.g., give up) and idiomatic expressions (e.g., break the ...
... material terminology. On the other hand, alignment of terms with SrpMD was necessary, since these dictionaries are a base resource for lemmatization and multiword term extraction. Since SrpMD are already in the lexical database Leximirka [32], developed and managed by the same research team, this type of alignment ...
... Infrastructure. Available online: https://elex.is/ (accessed on 12 February 2020). 3. Smolka, E.; Schulte im Walde, S. The Role of Constituents in Multiword Expressions: An Interdisciplinary, Cross-Lingual Perspective; Language Science Press: Berlin, Germany, 2020; Volume 4. [CrossRef] 4. Kallas, J.; ...
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
-
Critical disorder and critical magnetic field of the nonequilibrium athermal random-field Ising model in thin systems
Svetislav Mijatović, Dragutin Jovković, Sanja Janićević, Đorđe Spasojević. "Critical disorder and critical magnetic field of the nonequilibrium athermal random-field Ising model in thin systems" in Physical Review E, American Physical Society (APS) (2019). https://doi.org/10.1103/PhysRevE.100.032113
-
WS4LR - a Workstation for Lexical Resources
... a search key can only be a simple word lemma. We would like to enable a multi-word search as well, and to that end we plan to incorporate the multiword inflection module into WS4LR. • Inflection for the target language in aligned texts is not yet supported. Namely, the translation equivalence ...
... new entries. A new entry can be generated from scratch or by copying an existing lemma, which in some cases facilitates the work. The regular expression or FSA graph describing the inflectional properties of the selected lemma can be inspected and corrected if found inadequate. The handling of ...
... methods, from simple string matching to complex XPath expressions, either predefined or specified by the user. For instance, by means of the XPath expression “//SYNSET[DOMAIN='geology']” the user can retrieve all synsets from the working wordnet that belong to the domain of geology, or more precisely ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović. "WS4LR - a Workstation for Lexical Resources" in Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, May 2006, ELRA - European Language Resources Association (2006)
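Unlike a query with `not()`, the domain query quoted in the WS4LR entry uses only the `[tag='text']` predicate, which Python's standard `xml.etree.ElementTree` supports directly. The sketch below runs the equivalent relative path `.//SYNSET[DOMAIN='geology']` over a hypothetical wordnet fragment. Only the SYNSET and DOMAIN element names come from the text. `SYNONYM/LITERAL` and the helper name `synsets_in_domain` are assumptions, and WS4LR's own query engine is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical wordnet fragment; only SYNSET and DOMAIN come from the quoted expression.
WORDNET = """<WN>
  <SYNSET><ID>g1</ID><DOMAIN>geology</DOMAIN><SYNONYM><LITERAL>stena</LITERAL></SYNONYM></SYNSET>
  <SYNSET><ID>m1</ID><DOMAIN>music</DOMAIN><SYNONYM><LITERAL>violina</LITERAL></SYNONYM></SYNSET>
  <SYNSET><ID>g2</ID><DOMAIN>geology</DOMAIN><SYNONYM><LITERAL>ležište</LITERAL></SYNONYM></SYNSET>
</WN>"""

def synsets_in_domain(xml_text, domain):
    """Return the first LITERAL of every synset whose DOMAIN child equals `domain`.

    ElementTree supports the [tag='text'] predicate, so //SYNSET[DOMAIN='geology']
    becomes .//SYNSET[DOMAIN='geology'] relative to the root. The domain value is not
    escaped, so it must not contain a single quote.
    """
    root = ET.fromstring(xml_text)
    path = f".//SYNSET[DOMAIN='{domain}']"
    return [s.findtext("SYNONYM/LITERAL") for s in root.findall(path)]

print(synsets_in_domain(WORDNET, "geology"))  # ['stena', 'ležište']
```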
-
SrpELTeC: A Serbian Literary Corpus for Distant Reading
The article presents SrpELTeC, a corpus developed within the COST Action Distant Reading for European Literary History (CA16204). All novels in SrpELTeC were selected, prepared and annotated using the common principles established for all language collections in the European Literary Text Collection (ELTeC). The challenges and solutions in preparing SrpELTeC from scratch are described. All novels were manually encoded in TEI with rich metadata and structural annotation. Automatic annotation included POS tagging, lemmatization and named entities, relying on resources for processing ...
digital humanities, Serbian literature, text corpora, distant reading, linked data, named entity recognition, text analytics
Ranka Stanković, Cvetana Krstev, Duško Vitas. "SrpELTeC: A Serbian Literary Corpus for Distant Reading" in Primerjalna književnost, Research Centre of the Slovenian Academy of Sciences and Arts (2024). https://doi.org/10.3986/pkn.v47.i2.03