109 items
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionary is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...
... adjective (POS="A"), in the nominative case (Case="1"), in the singular (Num="s"), and in the masculine gender (Gen="m"). • Do values of a grammatical category agree for two or more components? The rules use unification variables in a similar way as inflectional transducers for MWUs (described in ...
... since the gender of stepen is masculine. Additionally, grammatical-feature equations can contain not only concrete values but also unification variables. A unification variable instantiates to all values of the corresponding grammatical feature. For Serbian, a pattern <$3:Case=$c> means that forms ...
... following way: in a dictionary of lemmas (DELAS) every lemma is described in full detail so that a dictionary of forms containing all necessary grammatical information (DELAF) can be generated from it, and subsequently used in various NLP tasks. Two corpus processing systems that support work with ...
Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458 no. 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
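The agreement question raised in the snippets of this entry (do values of a grammatical category agree for two or more components?) can be illustrated with a minimal Python sketch. It is not the rule formalism of LeXimir or of the inflectional transducers mentioned above. The function name `components_agree`, the dictionary representation, and the noun's feature values are assumptions made for illustration; only the adjective's feature codes (POS, Case, Num, Gen) come from the snippet.

```python
from typing import Dict, List

Features = Dict[str, str]


def components_agree(components: List[Features], category: str) -> bool:
    """Return True when every component carries the same value for `category`.

    A component that lacks the category counts as a disagreement.
    """
    values = [component.get(category) for component in components]
    if not values or None in values:
        return False
    return len(set(values)) == 1


# Adjective features as quoted in the snippet; the noun entry is hypothetical.
adjective = {"POS": "A", "Case": "1", "Num": "s", "Gen": "m"}
noun = {"POS": "N", "Case": "1", "Num": "s", "Gen": "m"}

print(components_agree([adjective, noun], "Gen"))
```

A unification variable such as `$c` in the quoted pattern plays a similar role: one shared value must hold for all components that mention it. The sketch only checks agreement after the fact and does not enumerate the possible values.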
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of support for some types of categories. At the same time, various descriptions often describe exactly the same ...
... binary category that states whether grammatical category is fixed or not for nouns in general. It should be noted that this is the property of a category as related to the part of speech and not for the category in general. A part of the feature structure declaration for nouns is:
... years. They can be grouped in several categories. (a) The lack of a value for an existing category. This problem occurred, for instance, with the grammatical category number. The values for this category for nouns are singular and plural (shared by all involved languages), and besides that dual ...
... adding the missing values in the new MULTEXT-East versions. (b) The lack of a category. In Serbian, gender of nouns is a grammatical category, and it has values: masculine, feminine and neuter. The value of this category is different in certain cases from the natural gender (or sex) which affects ...
Cvetana Krstev, Ranka Stanković, Vitas Duško. "A Description of Morphological Features of Serbian: a Revision using Feature System Declaration" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta : European Language Resources Association (2010)
Towards Better Valorisation of Industrial Minerals and Rocks in Serbia—Case Study of Industrial Clays
... after the analysis of all modifying factors) but not necessarily within final pit design ... Almost 60 years ago, the Russian school provided a simple classification of deposits by groups, without unnecessary ...
... exploration level of B category is 4 times less than A category, and in the case of C1 category the exploration level is 16 times lower compared to A category (based on the usual maximum distances between exploration works for majority of IMR). That is the reason why C1 category is correlated to indi- ...
... porting system. The Serbian C1 category can be a nightmare in some cases for experts not familiar with Serbian practice in this field. Namely, until 2006 [76] this category was not exploitable, as it was mandatory to increase the exploration level at least to the B category. It is always important to ...
Vladimir Simić, Dragana Životić, Zoran Miladinović. "Towards Better Valorisation of Industrial Minerals and Rocks in Serbia—Case Study of Industrial Clays" in Resources, MDPI AG (2021). https://doi.org/10.3390/resources10060063
Damage quantification of built stone on Dark Gate (Belgrade, Serbia): sample of damage index application for decay rate evaluation
The Dark Gate is a cultural monument, part of the cultural and historical complex of the Belgrade Fortress. It is constructed of limestone blocks that, after 270 years of exposure to environmental conditions and different anthropogenic influences, show a wide range of decay forms. During 2007, detailed registration of all built limestone microfacies and weathering forms was done using tools of monument mapping. A correlation scheme “intensity–damage category” was made according to the type, intensity, and distribution of ...
... forms and damage category in relation to orientation of gate parts reveal very interesting data. Looking at the facades oriented to south and north and their results of mapping damage category, no significant difference can be noted. The distribution of damage category is almost random on ...
... stone blocks in the Dark Gate in correlation to damage category Environ Earth Sci (2015) 73:6181–6193 Group Loss of stone material Discoloration/deposits Detachment Fissures/deformation Main weathering forms Damage category || 2 3 4 5 Surface recession Intensity of decay Depth ...
... blocks where loss of material is superficial (Fig. 4c, e damage category 3 in Table 1). On the blocks with very slight and slight damage, the original stone surface is preserved but discoloration or crusts are present (Fig. 4a, category 1 and 2). Stone blocks without visible damage were not observed ...
Maja Franković, Nevenka Novaković, Vesna Matović. "Damage quantification of built stone on Dark Gate (Belgrade, Serbia): sample of damage index application for decay rate evaluation" in Environmental Earth Sciences, Springer Science and Business Media LLC (2015). https://doi.org/10.1007/s12665-014-3843-z
On the compatibility of lexical resources for NooJ
Lexical resources for many languages are provided for the NooJ linguistic development environment. Meta-data descriptions of morphosyntactic and semantic properties of these languages and their resources are a mandatory part of each language module. In this paper we analyze how well the meta-data actually describe resources for a chosen subset of languages and to what extent they are compatible across languages to support multilingual processing. We show that there is room for improvement in both directions.
... def files. The general form of these metadata is “category_property=value”. For example, N_Nb=s+p means that the property grammatical number (Nb), in the case of category nouns (N), can have two values: s (singular) and p (plural). In addition to that, the *.def files offer the information on ...
... are marked as determiners, which represent a wider category, as illustrated by the following example. En: Squarely in his armchair, his feet close together like those... Standardization A starting point for the standardization of grammatical attributes, their values and codes can be found in ...
... representation of these phenomena in grammars. For example, the number category does not exist in English and French, which use the determiner category instead, whereas other languages do not recognize the determiner category at all. As to properties, there are, for example, three types of masculine ...
Ranka Stanković, Miloš Utvić, Duško Vitas, Cvetana Krstev, Ivan Obradović. "On the compatibility of lexical resources for NooJ" in Automatic Processing of Various Levels of Linguistic Phenomena: Selected Papers from the 2011 International Nooj Conference, Cambridge Scholars Publishing (2012): 96-108
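The “category_property=value” form quoted in this entry's snippets can be unpacked with a few lines of Python. This is a minimal sketch that handles only the single-line shape shown in the example (N_Nb=s+p). It is not NooJ's own parser, and the function name `parse_property_definition` is hypothetical.

```python
from typing import List, Tuple


def parse_property_definition(line: str) -> Tuple[str, str, List[str]]:
    """Split a definition such as 'N_Nb=s+p' into (category, property, values)."""
    left, separator, right = line.strip().partition("=")
    if not separator:
        raise ValueError(f"missing '=' in definition: {line!r}")
    category, underscore, prop = left.partition("_")
    if not underscore:
        raise ValueError(f"missing '_' between category and property: {line!r}")
    # Values are separated by '+', as in the example from the snippet.
    values = [value.strip() for value in right.split("+") if value.strip()]
    return category.strip(), prop.strip(), values


print(parse_property_definition("N_Nb=s+p"))
```

Comparing the parsed tuples across language modules is one simple way to see where two modules use different codes for the same property, which is the kind of compatibility question the paper raises.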
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020)
In this paper, we will present an approach in transforming Serbian language morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use the category labels with which lexical entries are marked can be used to link lexical entries. ...
... lexicographic database that stores various categories of data, that is, grammatical, general, derivational, pronunciation, variational, syntactic, domain, and semantic markers. Figure 2. A lexicographic database model for data category information. The DataCategories table stores information about marker ...
... combinations are stored in FormGramCats table. Each separated grammatical category is in FormGramCatProperties table. This example confirms that the same database can be used for information from different morphological dictionaries in DELA format. The only difference compared to the Serbian example is that ...
... part of speech the category applies to. If the part of speech is not significant for a particular data category, the record in the DatCatSets table has the value “MOT”. The marker value is written to the DatCatValues table. Multiple marker values from the same category form one category that is a record ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
Part of Speech Tagging for Serbian language using Natural Language Toolkit
Ranka Stanković, Boro Milovanović (2020)
While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We try to use this library to create a PoS (part of speech) tagger for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are transformed with the Brill tagger to improve accuracy. We trained the models on an annotated ...
... word category is Noun, followed by a Punctuation and a Verb. Cumulative distribution of PoS categories is shown in Figure 1. First five categories account for 57% of all tokens. These numbers help us in creating the taggers and interpreting their performance. Fig. 1. Word PoS category cumulative ...
... to the tasks which are performed later in the pipeline. One basic task is PoS (Part of Speech) tagging, a process of assigning a part of speech category to each token in the text. The program that performs tagging is called tagger. The taggers can be created in multiple ways. In this paper, we will ...
... collection of tags. UD_POS is a Universal Dependency tagset [13]. N_POS is a tagset used in Serbian Morphology Dictionary [14] expanded with a gender category. From the given data we extracted token, N_POS and UD_POS tag. We stripped gender from the N_POS and got a third tagset which we called SMD_POS ...
Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)
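PoS tagging as described in this entry (assigning a part of speech category to each token) can be sketched without any external library. The example below is a plain-Python stand-in, not the NLTK API and not one of the eleven models from the paper: a unigram tagger that remembers the most frequent tag per word and falls back to the overall most frequent tag for unseen words. The function name, the toy sentences, and the tag labels are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

TaggedSentence = Sequence[Tuple[str, str]]


def train_unigram_tagger(
    tagged_sentences: Sequence[TaggedSentence],
) -> Callable[[Sequence[str]], List[Tuple[str, str]]]:
    """Build a tagger that picks the most frequent tag seen for each word."""
    per_word = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for token, tag in sentence:
            per_word[token.lower()][tag] += 1
            all_tags[tag] += 1
    if not all_tags:
        raise ValueError("training data is empty")
    # Backoff: unseen words receive the most frequent tag in the training data.
    default_tag = all_tags.most_common(1)[0][0]
    lexicon = {word: tags.most_common(1)[0][0] for word, tags in per_word.items()}

    def tag(tokens: Sequence[str]) -> List[Tuple[str, str]]:
        return [(token, lexicon.get(token.lower(), default_tag)) for token in tokens]

    return tag


# Hypothetical toy training data, far smaller than any real annotated corpus.
training = [
    [("pas", "NOUN"), ("laje", "VERB"), (".", "PUNCT")],
    [("mačka", "NOUN"), ("vidi", "VERB"), ("psa", "NOUN"), (".", "PUNCT")],
]
tagger = train_unigram_tagger(training)
print(tagger(["pas", "vidi", "ribu", "."]))
```

Falling back to the most frequent tag mirrors the observation in the snippet that a handful of categories cover most tokens, which is why even such a simple baseline is a useful reference point before trying stronger models.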
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the data obtained from SrpKor and local grammars assists in making a decision between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while the precision (average 98.93%) is always higher than the recall (average 94.94%).
... in the category profile: without an extension (denoted by the symbol ”N”), the full extension in the hierarchy tree ( ”Y”), extension only by subclass relations (”L”), etc. In every step, automatically derived concepts in the category description can be edited. For example, the music category can be ...
... the main topic node ”demographic situation”, its weight is considerably higher; • Calculation of category weights in dependence of concepts included into the rules of the inference for this category. Fig. 5 (upper part) shows the categories found in the mentioned document, including ”Depopulation” ...
... sufficient training collection for learning the algorithms. However, many organizations have a need in automatic text categorization, when even a category system (system Proceedings of CLIB 2018 97 Figure 3: Security thesaurus terms found in a text. Brown and blue boxes show ambiguous terms, which ...
Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents research on refining and improving the Serbian version of the HurtLex dictionary, a multilingual lexicon of offensive words. We pay special attention to adding multi-word expressions (multi-word units) that can be considered offensive, since such lexical entries are very important for achieving good results in a multitude of offensive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and dictionary creation. The connection with other lexical and semantic resources in Serbian is highlighted, and the construction of a system for ...
... 17 subcategories. The creators of HurtLex opted for a detailed categorization in order to have the possibility to search for a specific category or group of category types. This makes HurtLex amenable to automatic usage for tasks in many languages. Koufakou et al. (2020) used HurtLex in the TRAC-2 task ...
... a osoba ‘uneducated person’ cannot be in the category animals. Instead of zmija u travi ‘snake in the grass’ one would use in Serbian just zmija ‘snake’. Table 2 shows the number of MWEs that were rejected (no) and confirmed (yes) per each HurtLex category. 4 MWE - dictionary construction 4.1 Selection ...
... 3.1 srHurtLex lexical cleaning The initial version of HurtLex for Serbian has been analysed, first from a lexical point of view, then from a grammatical point of view. The errors in srHurtLex were introduced due to the automatically generated translation. In the retrieved data set, consisting of 2518 ...
Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
The ICL Adriatic-Balkan Network: Scientific background, opportunities and challenges for regional cooperation
Mihalić Snježana, Arbanas Željko, Mikoš Matjaž, Abolmasov Biljana. "The ICL Adriatic-Balkan Network: Scientific background, opportunities and challenges for regional cooperation" in Proceedings of IPL Symposium, Kyoto, 20th January 2012. International Consortium on Landslides 1 no. 1, Kyoto University, Japan:International Consortium on Landslides (2012): 27-39
Building Terminological Resources in an e-Learning Environment
... element which defines to which category from a predefined set of categories the dictionary belongs. In the case of export from RudOnto the user can select from several options: no category, the category of a dictionary is the root of the sub-tree, or the category the concept is an ancestor of the ...
... 01 Geostatistics <CATEGORY>Geostatistički rečnik</CATEGORY> 5. CONCLUSION The terminological resource features offered by Moodle in the form of its glossaries can ...
Ranka Stanković, Ivan Obradović, Olivera Kitanović, Ljiljana Kolonja. "Building Terminological Resources in an e-Learning Environment" in Proceedings of the Third International Conference on e-Learning, eLearning-2012, September 2012, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2012)
Applicability of the risk ranking methodology designed for water reservoirs to tailings storage facilities
The risks associated with operating water reservoirs and tailings storage facilities (TSFs) are different because of their different purposes, methods of construction and operation, and characteristics of the materials impounded and their flow behaviour. Regardless of the differences, these two types of structures are often put in the same category when it comes to risk assessment and the application of relevant methodologies, which may result in unrealistic outcomes. In this paper we investigate whether it is possible to apply ...
Dragana Nišić, Dinko Knežević, Aleksandar Cvjetić, Neda Nišić, Vladimir Jovanović. "Applicability of the risk ranking methodology designed for water reservoirs to tailings storage facilities" in Journal of the Southern African Institute of Mining and Metallurgy (2022). https://doi.org/10.17159/2411-9717/1492/2022
Characterization and Environmental Evaluation of Recycled Aggregates from Construction and Demolition Waste in Belgrade City Area (Serbia)
Filip Abramović, Miroslav Popović, Vladimir Simić, Vesna Matović, Radmila Šerović. "Characterization and Environmental Evaluation of Recycled Aggregates from Construction and Demolition Waste in Belgrade City Area (Serbia)" in Materials, MDPI AG (2024). https://doi.org/10.3390/ma17040820
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of a gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...
... were: • Some texts were fully annotated, with lemmas and all grammatical categories (1984, Verne), some were only lemmatized with assigned PoS (Intera, Švejk, Floods, History), while in one text (Novels) values of the grammatical category gender were added. ELTeC is a corpus prepared in the scope ...
... numbers) into tagger models required an update of the training corpus and addition of respective category values to texts where they were missing. The biggest challenge was to add new grammatical categories to the Intera corpus. Having in mind its size, it had to be done automatically using ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
CC-PESTO: a novel GIS-based method for assessing the vulnerability of karst groundwater resources to the effects of climate change
The new GIS-based CC-PESTO method is shown to successfully assess and map the vulnerability/resilience of karst aquifers to effects of climate change. Karst aquifers were chosen due to their importance at the global level and widespread utilisation in potable water supply and irrigation, but also because of their hydrogeological complexity. The method was developed to assess the intrinsic vulnerability of aquifers, without considering the direct impact of variable climate factors, but considering the adaptive capacity of aquifers in response ...
Zoran Stevanović, Veljko Marinović, Jelena Krstajić. "CC-PESTO: a novel GIS-based method for assessing the vulnerability of karst groundwater resources to the effects of climate change" in Hydrogeology Journal, Springer Science and Business Media LLC (2020). https://doi.org/10.1007/s10040-020-02251-6
Developing Termbases for Expert Terminology under the TBX Standard
... four general types of TBX data-categories: 1. A core-structure module data-category is any data-category that is defined in the core-structure module DTD as an XML element. 2. A meta data-category is a general data-category used to group similar data-categories together. It is implemented as a cor ...
... an XCS file. 3. A terminological data-category is an instance of the meta data-category with a particular value of the type attribute. A value of the type attribute represents the name of the corresponding terminological data-category. 4. A simple data-category is one value of a closed set of values ...
... permissible content of an XML element (meta data-category) having a specific type attribute value. For example, XML element with an open tag represents terminological data-category definition as an instance of the meta data-category descrip, while ...
Ranka Stanković, Ivan Obradović, and Miloš Utvić. "Developing Termbases for Expert Terminology under the TBX Standard" in Natural Language Processing for Serbian - Resources and Applications, Belgrade : University of Belgrade, Faculty of Mathematics (2014)
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanity, derogatory speech and hate speech, has reached pandemic levels. A system able to detect such texts could help make the Internet and social media a better and more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...
... irregular forms. Since Serbian is a highly inflective and morphologically rich language that uses a lot of different word suffixes to express different grammatical, syntactic, or semantic features, we also established the relation with the Serbian electronic dictionaries and the management platform Leximirka ...
... 50, 49, 36]). At the first level annotators marked a tweet as abusive (TRUE) or non-abusive (FALSE). At the second level annotators determined the category of abusive speech in tweets marked as abusive: 1. Profanity (PROF), the tweet contains simplicity and vulgarity (e.g. “lakše se kenja i preti iz ...
... An abusive tweet belongs to at least one of the categories from the second annotation level. An example of a tweet that belongs to both PROF and DR category is “@USER NAME je govno, bilo gde da radi, čak i u mediju vlasti, ostaće bezlično govno” (eng. “NAME is shit, wherever he/she works, even in the ...
Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
The Need for Compiling a Balance and Assessing the Available Groundwater Reserves of the Republic of Serbia
The natural characteristics of Serbia, the spatial distribution of water resources and their users, and the interplay of water use, water protection and protection against water mean that groundwater across the entire territory must be considered in an integrated, unified, comprehensive and rational manner. However, at present in the Republic of Serbia it is only possible to define the characteristics of individual aquifers and/or water sources, owing to the lack of sufficient data for a comprehensive analysis of the state of the resource. For assessing the pressure on the quantity of groundwater resources, which is an obligation under the EU Water Framework Directive (WFD) ...
Зоран Стевановић, Вељко Мариновић, Бранислав Петровић. "Потреба за израдом биланса и оценом расположивих резерви подземних вода Републике Србије" in Записници Српског геолошког друштва, Српско геолошко друштво (2021)
Fuzzy Model for Risk Assessment of Machinery Failures
The main goal of this research was the development of an algorithm for the implementation of negative risk parameters in a synthesis model for a risk level assessment for a specific machine used in the mining industry. Fuzzy sets and fuzzy logic theory, in combination with statistical methods, were applied to analyze the time picture state of the observed machine. Fuzzy logic is presented through fuzzy proposition and a fuzzy composition module. Using these tools, the symmetric position of the ...
... and very high. The membership degree β, which classifies the obtained risk assessment into a risk level category, was determined for each category. The membership degree defines the risk category and, therefore, it was found that the risk was “high” for the crushing machine in question with a membership ...
... of fuzzy sets (for the risk of failure of a single element). The results were in the “moderate” category, except for the subsystems “hydraulics” and “electrical system”, which were in the “high” category. For the subsystem “engine”, the first lower value for the membership degree was “high”, Symmetry ...
... value for the membership degree was for the risk category “moderate”, and the difference was less than 1%. According to the results of the assessment, the risk of failure for the crushing machine was “high”, but it tended toward the “moderate” category because this is where the first lower value for ...
Dejan V. Petrović, Miloš Tanasijević, Saša Stojadinović, Jelena Ivaz, Pavle Stojković. "Fuzzy Model for Risk Assessment of Machinery Failures" in Symmetry, MDPI AG (2020). https://doi.org/10.3390/sym12040525
3D modeling and monitoring of karst system as a base for its evaluation and utilization: a case study from eastern Serbia
Earth-Surface Processes, Geology, Pollution, Soil Science, Water Science and Technology, Environmental Chemistry, Global and Planetary Change
Saša Milanović, Zoran Stevanović, Ljiljana Vasić, Vesna Ristić-Vakanjac. "3D modeling and monitoring of karst system as a base for its evaluation and utilization: a case study from eastern Serbia" in Environmental Earth Sciences, Springer Science and Business Media LLC (2013). https://doi.org/10.1007/s12665-013-2591-9