Building Terminological Resources in an e-Learning Environment
... these concepts, and a simple reasoning mechanism related to a specific domain. Ontology: a formal representation of knowledge which includes the vocabulary containing a set of concepts, semantic relationships between those concepts and simple reasoning about a certain domain. RudOnto ...
... some other authors, a terminological resource qualifies for an ontology only if new knowledge can be derived from segments of knowledge already existing in the resource. According to this view, an ontology is a formal representation of knowledge, which includes a vocabulary with a set of concepts, ...
... semantic description; weak semantic description. Thesaurus: a taxonomy expanded to include additional semantic relationships. Index: a list of terms specific to a certain field. Glossary: an index containing definitions of terms. Taxonomy: a hierarchical organization of terms. RudOnto. Figure ...Ranka Stanković, Ivan Obradović, Olivera Kitanović, Ljiljana Kolonja. "Building Terminological Resources in an e-Learning Environment" in Proceedings of the Third International Conference on e-Learning, eLearning-2012, September 2012, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2012)
Advantages and Disadvantages of a Parallel and Zigzag Method of Acquisition in Walking Mode in Magnetometric Archeological Research
magnetometric surveys, zigzag and parallel acquisition in walking mode, linear anomalies, archaeology... gradiometer, with and without the Global Positioning System. To obtain a regular data grid, the sampling was conducted at a 1 s interval, and with a distance between the traverses of 1 m. The traverses for this process were oriented in a north-south and east-west direction. The best results or, more accurately ...
... Fig. 3 Contour map of a vertical gradient of TMI in N-S orientation. Key as for Fig. 1. Archaeology and Science 10 (2014), Petković et al., Advantages and Disadvantages of a Parallel... (93-110). Fig. 3a Contour map of a vertical gradient of TMI ...
... Fig. 5 Contour map of a vertical gradient of TMI in E-W orientation. Key as for Fig. 1. Fig. 5a Contour map of a vertical gradient of TMI ...Mirko Petković, Vesna Cvetkov, Branislav Sretenović. "Advantages and Disadvantages of a Parallel and Zigzag Method of Acquisition in Walking Mode in Magnetometric Archeological Research" in Arheologija i prirodne nauke (2014)
OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian
This paper presents a new language resource for searching and researching verbal aspectual pairs in BCS (Bosnian, Croatian and Serbian), created using the principles of Linguistic Linked Open Data (LLOD). Since there is no resource that would help learners of Bosnian, Croatian and Serbian as foreign languages to recognize the aspect of a verb or its pairs, we created a new resource that provides users with information about aspect, as well as a link to the aspectual pairs of a verb. This resource also contains external links to monolingual dictionaries, Wordnet and BabelNet. ...Ranka Stanković, Maxim Ionov, Medina Bajtarević, Lorena Ninčević. "OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons
Mihailo Škorić (2017) The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations in the existing framework of research and tests their effectiveness. It also presents a software tool especially made for that purpose, explains how it builds a database with sentimental value of terms and offers the user manual. Finally, it presents a software tool that tests the new database and gives some examples ...... Hyperlink beginnings http:// and https:// are replaced with a neutral character string yy, so that they will not be mistaken for :/ emoticon. Finally, regular expressions [a|h|A|H][a|h|A|H][h|H][a|A][h|H][a|h|A|H] and [h|H][a|A][h|H][a|A] are used to find as many different examples of expression ...
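The preprocessing described in the snippet above can be sketched roughly as follows. This is a minimal illustration, not the paper's tool: the function names are hypothetical, and only the replacement string yy and the two regular expressions are taken from the snippet. Note that inside a character class the | is a literal character, so the quoted patterns also accept a pipe in those positions; they are reproduced verbatim here.

```python
import re

# Link prefixes are neutralized so that "://" is not read as the ":/" emoticon.
LINK_PREFIX = re.compile(r"https?://")

# Patterns copied verbatim from the snippet (the "|" inside [...] is literal).
LAUGHTER_LONG = re.compile(r"[a|h|A|H][a|h|A|H][h|H][a|A][h|H][a|h|A|H]")
LAUGHTER_SHORT = re.compile(r"[h|H][a|A][h|H][a|A]")


def neutralize_links(text):
    """Replace http:// and https:// with the neutral string 'yy'."""
    return LINK_PREFIX.sub("yy", text)


def has_laughter(text):
    """Return True if either laughter pattern occurs anywhere in the text."""
    return bool(LAUGHTER_LONG.search(text) or LAUGHTER_SHORT.search(text))
```

For example, neutralize_links("http://a.b") yields a string that no longer contains ":/", so a later emoticon scan would not count the link as an emoticon.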
... Serbian can use it independently, create a new database or use it for a new research. 3 A total of 143 users actually suggested a determiner, and the necessary number of proposers of a new determiner was 48, or more than one-third. If enough users suggested a determiner its value was calculated as the ...
... and that the machine can decide between multiple choices using a single parameter, task is reduced to determination of what is positive and what is negative. With a goal of making a more cost-effective intelligent system, it would not be a good idea to ignore any available resources and that could po- ...Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
Formative evaluation of e-learning projects with the logical framework approach
... Learning is the project. The project is a temporary endeavor undertaken to create a unique product or service [6]. From a managerial perspective it is a unique set of activities designed to produce a definite result, with a clear start and end date, and a clear allocation of resources [7] (Bowen ...
... graphical-textual model that takes the form of a Matrix (Logical Framework Matrix). According to the Logical Framework a program or a project is seen as a causal sequence of events. Actions to implement it are, in sequence: a. identification of project objectives; b. identification of causal r ...
... Evaluation of E-learning projects is a topic of great interest and growing importance. The evaluation of a project is the construction of the overall judgement, based on a quali-quantitative determination of the benefits and costs associated, with scientific criterion, of a project (evaluand). The purposes ...Roberto Linzalone, Giovani Schiuma, Ivan Obradović, Ranka Stanković. "Formative evaluation of e-learning projects with the logical framework approach" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade Metropolitan University (2015)
A Knowledge-Based Approach to Mine Ventilation Planning in Yugoslav Mining Practice
Ventilation system analysis is a complex process based on the calculation and analysis of numerous parameters. These problems can be successfully solved by the SimVent numerical package, but a full understanding and use of the obtained results require the involvement of an experienced specialist in the ventilation field. The solution was found in the creation of a hybrid system INVENTS, whose knowledge base represents a formalization of the expert knowledge in the mine ventilation field. In this paper, we ...... either a negative value of resistance (requiring the introduction of a fan) in a branch with predetermined air flow or a recirculation through a loss branch is detected after the calculation of air flow distribution in the ventilation network, the fitness of that particular solution gets a very high ...
... - exploitation workplace - intake air - return air - stopping with door - main fan. Fig. 14. The air distribution obtained by standard method. ...
... introduction of a knowledge base, thus upgrading existing mathematical models with ...Nikola Lilić, Ivan Obradović, Ranka Stanković. "A Knowledge-Based Approach to Mine Ventilation Planning in Yugoslav Mining Practice" in Mineral Resources Engineering 11 no. 4, Imperial College Press (2002): 361-382. https://doi.org/10.1142/S0950609802001014
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...... evaluation of results is to follow and if the approved MWT is to be entered into some kind of a dictionary or a terminological database. In this case we need a lemmatized MWT, that is, a MWT in the form of a dictionary head-word. The problem of lemmatization of special kind of MWUs, ...
... number, case and animateness, belong to the AXN class. X stands for a component that does not inflect when the MWU inflects or a separator, usually a space or a hyphen. Nominal MWUs in Serbian belong to one of several tens of different general classes, but 14 of these classes account for ...
... lemmas. For instance, in the case of 2-component MWUs the order of precedence of graphs is: AXN, 2XN (a noun preceded by a word that does not inflect in the MWU), N2X (a noun followed by a word that does not inflect in the MWU), NXN. Thus, two MWU forms mašine taložnice and korita trake ‘belt ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
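The order of precedence quoted above for 2-component MWUs (AXN, 2XN, N2X, NXN) can be illustrated with a small sketch. It only shows how an ordered list resolves a form that is recognized by more than one class; the function name and the input set are hypothetical, and the actual system works with finite-state graphs rather than Python sets.

```python
# Precedence of structural classes for 2-component MWUs, as listed in the snippet.
PRECEDENCE = ["AXN", "2XN", "N2X", "NXN"]


def pick_class(matching_classes):
    """Return the highest-precedence class among those that matched a form.

    matching_classes is an assumed input: the set of class names whose
    graph recognized the given MWU form. Returns None if nothing matched.
    """
    for cls in PRECEDENCE:
        if cls in matching_classes:
            return cls
    return None
```

With this ordering, a form recognized as both N2X and NXN would be lemmatized as N2X.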
Old or New, We Repair, Adjust and Alter (Texts)
Cvetana Krstev, Ranka Stanković (2020) In this paper we present how e-dictionaries and cascades of finite-state transducers implemented in the Unitex tool can be used to solve three text transformation problems: correcting texts after OCR, restoring diacritics, and switching between different language variants. text correction, OCR errors, diacritic restoration, language variants, electronic dictionary, finite-state transducers... ‘beer’ and ниво ‘level’ would be accepted. A few examples of the application of this procedure are given in Table 1. Special attention is paid to hyphenated words. A hyphen in a Serbian OCR text can signify a word hyphenated at the end of the line or a hyphen in a multi-word. Our procedure first eliminates ...
... performed. Table 2. A FST that initiates a correction for a specific replacement pair (top). A generic FST that performs a certain type of replacement (bottom). two words: the original, input word ($1$Y) and the corrected, output word ($1$X). The application of these FSTs always leaves in a processed text ...
... multiple candidates for sto (sto ‘table/hundred’ and što, a functional word that can be an adverb, a pronoun or a conjunction); – A list of 50 most frequent bigrams obtained from the SrpKor in which at least one word would contain letters c, s, z or a digraph dj if diacritics were removed; for instance ...Cvetana Krstev, Ranka Stanković. "Old or New, We Repair, Adjust and Alter (Texts)" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.3
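The ambiguity mentioned above (sto versus što) can be made concrete with a rough sketch of candidate generation for diacritic restoration. This is not the authors' Unitex cascade: it is a plain Python illustration that assumes a lowercase input word and a hypothetical lexicon set, and it only covers the letters c, s, z and the digraph dj named in the snippet.

```python
from itertools import product

# Possible originals for letters that may have lost a diacritic.
ALTERNATIVES = {
    "c": ["c", "č", "ć"],
    "s": ["s", "š"],
    "z": ["z", "ž"],
    "dj": ["dj", "đ"],
}


def candidates(word, lexicon):
    """List lexicon words that reduce to `word` when diacritics are removed."""
    options = []
    i = 0
    while i < len(word):
        if word[i:i + 2] == "dj":
            options.append(ALTERNATIVES["dj"])
            i += 2
        else:
            options.append(ALTERNATIVES.get(word[i], [word[i]]))
            i += 1
    # Build every combination and keep only attested words.
    generated = ("".join(parts) for parts in product(*options))
    return sorted(c for c in generated if c in lexicon)
```

When more than one candidate survives, as for sto, some further evidence is needed to choose between them; the snippet mentions a list of frequent bigrams as one such resource.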
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection... language, a terminology extractor for a target language, and a tool for word and chunk alignment. In this first experiment a source language is English, a target language is Serbian, a domain is Library and Information Science for which a bilingual terminological dictionary exists. Our term extractor ...
... which a source part of a chunk matches a term from a list of domain terms in a source language: S(align.chunk) ∼ S(term.list), where symbol ∼ denotes the relation “match” (that is for our experiment defined in Subsection 4.5.); • Filtering once more previously filtered chunks to those in which a target ...Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
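The first filtering step quoted above, S(align.chunk) ∼ S(term.list), might look like the sketch below. The definition of the "match" relation is given in a subsection that is not part of this snippet, so the case-insensitive equality used here is only a placeholder assumption, and all names are hypothetical.

```python
def matches(chunk_source, term):
    """Placeholder for the paper's 'match' relation (assumed: case-insensitive equality)."""
    return chunk_source.strip().lower() == term.strip().lower()


def filter_by_source(aligned_chunks, source_terms):
    """Keep aligned (source, target) chunks whose source part matches a domain term."""
    return [
        (source, target)
        for source, target in aligned_chunks
        if any(matches(source, term) for term in source_terms)
    ]
```

A second pass over the surviving pairs, applied to the target side, would follow the same shape with a different list and relation.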
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019) In this paper we present a model for selecting good examples for a dictionary of the Serbian language and the development of the initial components of the model. The method used is based on a detailed analysis of various lexical and syntactic features in a corpus composed of examples from five digitized volumes of the SASA dictionary. The initial set of features was inspired by a similar approach for other languages. The distribution of features of examples from this corpus is compared with the feature distribution of sample sentences excerpted from corpora containing various texts. The analysis showed that ... Serbian, good dictionary examples, automation of dictionary production, feature extraction, machine learning... example. If a noun has a complement that affects its meaning, the complement should be represented in the example: Tada se javila u njega velika ljubav i velika podobnost za slikarstvo (paraphrase: ‘In that moment he felt a great affection and a great talent for painting’). If a keyword is a verb that ...
... only in paper form. At the same time, a formal description of dictionary entry was produced, and a lexical database model was developed (Stanković et al., 2018). The conversion of the SASA dictionary from unstructured text into a lexical database consisted of a thorough analysis of formatting conventions ...
... syntactic whole – with a subject and a predicate. It is even possible to add a missing sentence constituent, but it has to be in square brackets, as a mark of this kind of editorial intervention. (Though the excerpts are in the form of full sentences, the context they provide is sometimes insufficient, and ...Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference , Lexical Computing CZ, s.r.o. (2019)
2D geoelectrical resistivity tomography application at the former city waste dump "Ada Huja": Eco-geological problem
... profile line I has a length of 124 m (Figure 4). A high value resistivity zone with a length of 56 m (252 m to 308 m of the profile line) can be noticed and it represents most likely sand, gravel and other construction materials. Decomposed waste has a lens-like geometry and retains a low resistivity ...
... exhibited data, and looking at the 2017 methane stimulated fire at the landfill “Vinča“, which posed a threat to a million and a half inhabitants of Belgrade, such a mini-plant would not be an enormous investment. Daily production of waste for the city of Belgrade reaches 1700 tonnes (Mitrović, 2014) ...
... reflection is noticed that could indicate a horizontal interface, that was not detected with the geoelectrical scanning method. Acquired borehole data indicated a presence of subsurface water at a depth of around 5 m (Figure 5, middle) which makes a good reflective surface that divides two di ...Branislav Sretenović, Filip Arnaut, Ivana Vasiljević, Vesna Cvetkov. "2D geoelectrical resistivity tomography application at the former city waste dump "Ada Huja": Eco-geological problem" in Podzemni radovi, Centre for Evaluation in Education and Science (CEON/CEES) (2019). https://doi.org/10.5937/PodRad1934059S
An approach to Implementation of blended learning in a university setting
... contains a short definition of the term in both languages, but without any relations between equivalents. An example of a dictionary entry, in English, and then in Serbian, is: geodatabase (GDB), A collection of geographic datasets of various types held in a common file system folder, a Microsoft ...
... support system. Hence, a decision was made to introduce Moodle to the already existent blended learning system at FMG. Moodle is a free and open source (published under the GNU Public License) LMS (Learning Management System) platform. From a technical point of view, it is a web application ...
... motivation for introducing a blended learning system came from the fact that in traditional face-to-face learning, the students were prone to taking a basically passive role. Thus the new learning system was aimed at turning the student role in the learning process to a more active one, and enhancing ...Ivan Obradović, Ranka Stanković, Olivera Kitanović, Jelena Prodanović. "An approach to Implementation of blended learning in a university setting" in Proceedings of the Second International Conference on e-Learning, eLearning 2011, September 2011, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2011)
Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology
Mihailo Škorić, Mauro Dragoni (2019) This paper is a result of a task that was presented to attendants of Keyword Search in Big Linked Data summer school, that was organized by Vienna University of Technology, under the Keystone COST action in the summer of 2017. It presents a specific approach to the classification via creation of minimal document surrogates based on the US National medical library’s MeSH ontology, which is derived from the Medical Subject Headings thesaurus. In a series of previously classified medically ...... set of terms from document Di, and Sj is a set of terms from document Dj, then this index can be defined as double the number of common terms divided by the total number of terms in both documents (if S is a set, |S| is the number of terms in that set). If documents do not have any common terms then ...
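The index described above, twice the number of common terms divided by the total number of terms in both documents, can be written as 2|Si ∩ Sj| / (|Si| + |Sj|). A minimal sketch follows; the function name is hypothetical, and returning 0.0 when both sets are empty is an assumption made here to avoid division by zero, not something stated in the snippet.

```python
def dice_index(terms_i, terms_j):
    """Similarity of two term sets: 2 * |common| / (|Si| + |Sj|)."""
    si, sj = set(terms_i), set(terms_j)
    total = len(si) + len(sj)
    if total == 0:
        # Assumption: two empty documents are treated as having no similarity.
        return 0.0
    return 2 * len(si & sj) / total
```

Two documents with identical term sets score 1.0, and documents with no common terms score 0.0.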
... identified with a maximum of four repetitions (if they have more than eight blocks) of a term denoting a class. After applying these steps, only class identifiers now appear in document surrogates, which should be easily counted. After the test sets have been successfully created, a simple program is ...
Mihailo Škorić, Mauro Dragoni. "Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology" in Infotheca, Faculty of Philology, University of Belgrade (2019). https://doi.org/10.18485/infotheca.2019.19.1.3
Asbestos-Based Pottery from Corsica: The First Fiber-Reinforced Ceramic Matrix Composite
Asbestos-containing pottery shards collected in the northeast of Corsica (Cap Corse) and dating from the 19th century, or earlier, have been analyzed by SEM-EDS, XRPD, FTIR and Raman microspectroscopy. Blue (crocidolite) and white (chrysotile) asbestos fiber bundles are observed in cross-sections. Most of the asbestos is partly or totally dehydroxylated, and some transformation to forsterite is observed to occur, indicative of a firing above 800 °C. Examination of freshly fractured pieces shows a nonbrittle fracture with fiber pull-out, consistent with ...... of a firing above 800 °C. Examination of freshly fractured pieces shows a nonbrittle fracture with fiber pull-out, consistent with a composite material behavior, which makes these ceramics the oldest fiber-reinforced ceramic matrix composite. Residues indicate the use of this pottery as a crucible ...
... fracture of a ceramic gives rise to a brittle edge, and fracture initiation depends upon the defects and their distribution in the matrix. It is thus difficult to design the shape and mechanical strength of a ceramic object. However, the association of two brittle materials, i.e., a composite material ...
... Coarse grains (up to 5 mm) are mixed in a rather homogeneous matrix of a high open porosity (~30%). Table 1. Oxide composition measured by SEM-EDS on spots indicated in Figure 1 for samples a to c. Characteristic values in bold. ...Philippe Colomban, Aleksandar Kremenović. "Asbestos-Based Pottery from Corsica: The First Fiber-Reinforced Ceramic Matrix Composite" in Materials, MDPI AG (2020). https://doi.org/10.3390/ma13163597
Threshold-induced correlations in the Random Field Ising Model
Softverski alati za korišćenje resursa za srpski jezik
Ivan Obradović, Ranka Stanković (2008) ... of a transformation of resources such as dictionaries, graphs and regular expressions from a format used by Intex to a format used by NooJ. Figure 9 depicts a panel for conversion of a morphological dictionary into Unicode using a C# procedure ...
... words “lice” and “lik” are used in Serbian. Thus a semantic network of concepts in a particular language becomes a corresponding semantic network of words, which is further materialized as a lexical data base of a specific structure. Development of wordnets started in 1985 in the research team ...
... whether it is a simple word or a compound, using SMD, and connects them by logical “or” relations. In the same way, by a simple choice of appropriate fields, the user selects whether he/ she wants to make a query in Cyrillic or Latin alphabet, or in both. WS4QE obtains semantic expansion of a query by ...Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
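The morphological expansion described in the snippet above, where the forms of a query word are connected by logical "or" relations, can be sketched as follows. This is not the WS4QE implementation: the dictionary argument is a hypothetical stand-in for a lookup in SMD, and the textual " OR " separator is an assumption, since the target query syntax is not given here.

```python
def expand_query(word, inflected_forms):
    """Join all known forms of a word with OR; fall back to the word itself.

    inflected_forms is an assumed mapping from a lemma to its forms,
    standing in for a morphological dictionary lookup.
    """
    forms = inflected_forms.get(word, [word])
    unique = list(dict.fromkeys(forms))  # drop repeats, keep first-seen order
    return " OR ".join(unique)
```

Homographic forms within a paradigm are collapsed so that each surface form appears once in the expanded query.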
Vebran Web Services for Corpus Query Expansion
Ranka Stanković, Miloš Utvić (2020) This paper discusses the development of the Vebran web services and their application in improving corpus search. The Vebran web services are used to consult external lexical resources for Serbian (mainly electronic morphological dictionaries and the Serbian WordNet) and to expand user queries in order to obtain more relevant results from Serbian corpora.... only be used over a https connection, since passing it over a non-encrypted channel would make it trivial for third parties to intercept. The token endpoint is where apps make a request to get an access token for a user. After successful authorization, clients are allowed to send a request specifying ...
... attribute word (it does not inflect as a part of MWU). 2) If SMD does not contain an MWU as a compound, each component of MWU is analysed separately. A component of MWU which has been found to be a lemma is associated with marker _L (e.g. ptica pevačica in Table 2). If a component has been found to be an ...
... migrated from textual e-dictionaries to a lexical database. After years of development, SMD, developed as a system of textual files, has become a large and complex lexical resource. An on-line application for dictionary development and management, based on a central lexical data repository (lexical ...Ranka Stanković, Miloš Utvić. "Vebran Web Services for Corpus Query Expansion" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.5
WS4LR - a Workstation for Lexical Resources
... describes the syntactic, semantic, derivational, or other properties of a lemma. A part of speech code and an inflectional class code uniquely determine the finite transducer that generates all the forms in a lemma paradigm. A finite transducer, being capable of producing the output, adds to all ...
... these dictionaries are given in Appendix A. 2.2 Wordnets Roughly speaking, a wordnet, such as the Princeton WordNet (PWN) is composed of synsets, or sets of synonymous words representing a concept, with basic semantic relations between them forming a semantic network (Fellbaum, 1998). Each synset ...
... and runs on a personal computer under Windows 2000/XP/2003 operating system with at least 256MB of internal memory. 1 Introduction The Human Language Technology group at the Faculty of Mathematics has been developing various lexical resources over quite a long period, reaching a considerable ...Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović. "WS4LR - a Workstation for Lexical Resources" in Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, May 2006, ELRA - European Language Resources Association (2006)
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020) In science, industry and many research fields, terminology develops rapidly. Most often, the language that is the “lingua franca” for most of these fields is English. As a consequence, for many fields domain terms are coined in English and later translated into other languages. In this paper we present an approach to automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language and a tool for chunk alignment. We examine the performance of the method on the domain ...... ‘statistical yearbook of the republican institute’ where A stands for an adjective, N for a noun and PREP for a preposition. Each of these components can be a single word or a MWU. Our system was used in a mode in which all possible MWTs in a word sequence are recognised, and not only the longest one ...
... ekrana (namely, a photo of a current state of the screen) or as a “skrinšot” (i.e., the word is transcribed). It is not uncommon that even experts from a certain field have difficulties while translating texts that contain domain terminology. As in the example with a “debugger”, the transcribed ...
... on the existence of a bilingual dictionary with no parallel texts and the second one requiring only the existence of a small amount of parallel data. In order to compile a bilingual lexicon for a specific domain, we combined and compared several settings. Besides using only a parallel sentence-aligned ...Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents research on refining and improving the Serbian version of the Hurtlex dictionary, a multilingual lexicon of offensive words. We pay special attention to adding multi-word expressions (polylexemic units) that can be considered offensive, since such lexical entries are very important for achieving good results in a variety of abusive language detection tasks. Serbian morphological dictionaries are used as a basis for data cleaning and dictionary creation. The connection with other lexical and semantic resources in Serbian is highlighted, and the building of a system for ...... muzara ‘dairy cow’; N PREP N (25), a noun followed by a prepositional phrase, e.g. govno od čoveka ‘shit of a man’, roba s greškom ‘damaged goods’; A ADV N (11), adjectives as simile figures, e.g. glup kao noć ‘stupid as night’, N CONJ N (9), two nouns connected with a conjunction, e.g. bruka i sramota ...
... lexicon. They took words with negative polarity as a baseline for creating a basic lexicon of 551 words, which was further enriched via machine learning into a lexicon of 2898 abusive words. Several authors used the Wiegand lexicon as a blacklist in their hate speech and abusive language detection systems ...
... crowdsourcing. In an abusive content detection system, a lexicon could be used in one of the following ways: (i) As a classification feature, either as a binary indicator of the abusive word occurrence in the examined text (Pamungkas and Patti, 2019), or a numerical value corresponding to the number of abusive ...Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
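Option (i) in the snippet above, using the lexicon as a classification feature, can be sketched as two simple features: a binary indicator and a count. The function and feature names are hypothetical, the tokenization is a deliberately simple assumption, and the sketch handles single-word entries only; multi-word expressions, which are the focus of the paper, would need phrase-level matching on top of this.

```python
import re


def lexicon_features(text, lexicon):
    """Compute a binary indicator and a count of lexicon words in a text.

    lexicon is an assumed set of lowercase single-word entries.
    """
    tokens = re.findall(r"\w+", text.lower())
    count = sum(1 for token in tokens if token in lexicon)
    return {"has_abusive": int(count > 0), "abusive_count": count}
```

Either value can then be appended to the feature vector passed to a classifier.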