126 items
Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface
Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos (2024)
We present ongoing work on defining a lexicon-corpus interface that will serve as a reference for representing multiword expressions (of various types: nominal, verbal, etc.) in dedicated lexicons and for linking these entries to their occurrences in corpora. The ultimate goal is to use such resources for the automatic identification of multiword expressions in text. Including several natural languages is aimed at a universal solution that is not centered on a particular language, while also accommodating idiosyncrasies. Challenges in the lexicographic description of multiword ...
Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos. "Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020)
In this paper, we present an approach to transforming Serbian morphological dictionaries from the DELA text format into a lexical database dubbed Leximirka. Considering the benefits of storing data in a database rather than in text documents, we outline some of the functionality that the database has made possible. We also show how hand-made rules that use the category labels assigned to lexical entries can be used to link those entries. ...
... Guidelines for Electronic Text Encoding and Interchange, Text Encoding Initiative (TEI), Lexical Markup Framework (LMF) and the Lemon model. Although Chapter 9 of the TEI Guidelines addresses the issue of dictionary encoding, they only recently address the specificities of ontologies and web resources ...
... syntax, NLP Semantic Extension and NLP Multilingual Notations. LMF is suitable for encoding morphological, semantic and grammatical aspects of a lexical entry. Lemon was modeled after LMF, but with the idea of compensating for LMF's shortcomings in dealing with externally standardized vocabularies ...
... Scientific paper. References. Bański, Piotr, Jack Bowers and Tomaž Erjavec. “TEI-Lex0 Guidelines for the Encoding of Dictionary Information on Written and Spoken Forms”. In Proceedings of eLex 2017 conference: Electronic lexicography in the 21st century, 485–94. Brno: Lexical Computing CZ s.r.o. ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...
... Intera (see Subsection 2.2.) and the full lexicon of 1.3+ million tokens (including punctuation) previously derived from the then-latest version of SMD (Utvić, 2011). The TT11 tagset consists of 16 tags, most of them acquired from SMD as labels for major Parts-of-Speech (Table 3 column T11). TT11 was used ...
... SrpKor2013, current version of SrpKor – Corpus of Contemporary Serbian. As pointed out in (Utvić, 2011) “TreeTagger isn’t a ‘true’ lemmatizer”, it assigns “the most likely Part-of-Speech tag” and “simply concatenates lemma from a full lexicon, which corresponds to the chosen Part-of-Speech. Hence, word ...
... the same Part-of-Speech, but different lemma cannot coexist in the full lexicon.” A new TreeTagger was produced for this research – TT19, based on the same technology as TT11, the only difference being the set of resources used for training. Both the training corpus and the lexicon were expanded ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
Electronic Dictionaries - from File System to lemon Based Lexical Database
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ...
... dictionary. The main class of the core of the lexicon model is the class LexicalEntry, representing a unit of analysis of the lexicon, which encompasses a set of inflected forms that are grammatically related, and a set of base meanings that are associated with all of these forms (Figure 2). A lexical ...
... considered. The first was the TEI (Text Encoding Initiative) Guidelines for dictionary description. TEI is a widely accepted standard for text encoding that proposes solutions for many text types, one of them being dictionaries. However, it seems that TEI is more often used for traditional human-oriented ...
... Ireland. Bański, P., Bowers, J., and Erjavec, T. (2017). TEI-Lex0 guidelines for the encoding of dictionary information on written and spoken forms. In Electronic lexicography in the 21st century. Proceedings of eLex 2017 conference. Leiden, the Netherlands, 19-21 September 2017, pages 485–494 ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics : Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
Vebran Web Services for Corpus Query Expansion
Ranka Stanković, Miloš Utvić (2020)
This paper discusses the development of the Vebran web services and their application in improving corpus search. The Vebran web services are used to consult external lexical resources for Serbian (mainly electronic morphological dictionaries and the Serbian WordNet) and to expand user queries in order to obtain more relevant results from Serbian corpora.
... full-form lexicon. Each entry of the TreeTagger full-form lexicon contains one word form and a sequence of tag-lemma pairs that could correspond to that word form (Schmid, 1997). The TreeTagger full-form lexicon does not allow the possibility of a lexicon entry with two or more tag-lemma pairs corresponding ...
... 1. The inflection of multiword units is additionally supported by the rule-based system. The system supports different alphabets and character encodings (the aurora alphabet and ISO-8859-1 character encoding for the SrpKor2013 corpus, Serbian Latin alphabet and UTF-8 character encoding for the RudKor corpus) ...
... have homograph word forms (tati, tatom, tate, tatu, tata), so lexicon entries with these forms cannot contain both tag-lemma pairs (N, tat) and (N, tata), where N is the PoS tag denoting a noun. Thus, the creator of a full-form lexicon has to choose which tag-lemma pair to keep, and the choice is commonly ...
Ranka Stanković, Miloš Utvić. "Vebran Web Services for Corpus Query Expansion" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.5
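The homograph constraint quoted in the snippets above can be made concrete with a short sketch. It assumes the lexicon is available as (word form, tag, lemma) triples; that layout, the function name `find_lemma_conflicts` and the extra entry `tatama` are illustrative assumptions, not TreeTagger's file format or API.

```python
from collections import defaultdict


def find_lemma_conflicts(entries):
    """Return {(form, tag): [lemmas]} for forms where one tag has several lemmas.

    Such forms are the ones for which a lexicon that allows only one lemma
    per (form, tag) pair forces the lexicon creator to pick a single lemma.
    """
    lemmas_by_form_tag = defaultdict(set)
    for form, tag, lemma in entries:
        lemmas_by_form_tag[(form, tag)].add(lemma)
    return {
        key: sorted(lemmas)
        for key, lemmas in lemmas_by_form_tag.items()
        if len(lemmas) > 1
    }


# Homograph forms named in the snippet, each analysable as lemma "tat" or "tata".
entries = [
    (form, "N", lemma)
    for form in ("tati", "tatom", "tate", "tatu", "tata")
    for lemma in ("tat", "tata")
]
# Hypothetical non-conflicting entry, added only for contrast.
entries.append(("tatama", "N", "tata"))

conflicts = find_lemma_conflicts(entries)
for (form, tag), lemmas in sorted(conflicts.items()):
    print(form, tag, lemmas)
```

A report like this only lists the forms where a choice is needed; how the choice is made (for example by lemma frequency) is left open here, as in the truncated snippet.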
SrpELTeC: A Serbian Literary Corpus for Distant Reading
The article presents SrpELTeC, a corpus developed within the COST Action Distant Reading for European Literary History (CA16204). All novels in SrpELTeC were selected, prepared and annotated following the common principles established for all language collections in the European Literary Text Collection (ELTeC). The challenges and solutions involved in preparing SrpELTeC from scratch are described. All novels were manually encoded in TEI with rich metadata and structural annotation. Automatic annotation included POS tagging, lemmatization and named entities, relying on resources for processing ...
Keywords: digital humanities, Serbian literature, text corpora, distant reading, linked data, named entity recognition, text analytics
Ranka Stanković, Cvetana Krstev, Duško Vitas. "SrpELTeC: A Serbian Literary Corpus for Distant Reading" in Primerjalna književnost, Research Centre of the Slovenian Academy of Sciences and Arts (2024). https://doi.org/10.3986/pkn.v47.i2.03
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ...
... Slovakia, 15-16 April, 2009. Metalanguage and encoding scheme design for digital lexicography : innovative solutions for lexical entry design in Slavic lexicography: proceedings. Bratislava: Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, 2009, pp. 59-70. ISO. (2007) ISO 24 ...
Cvetana Krstev, Ranka Stanković, Duško Vitas. "A Description of Morphological Features of Serbian: a Revision using Feature System Declaration" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta : European Language Resources Association (2010)
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007)
In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...
... support the processing of texts in Unicode, and as the usage of this encoding became more and more frequent, the development of a new tool that could handle text in Unicode became inevitable. Building on the functionalities of Intex, but allowing the processing of texts in Unicode, such a new ...
... pairs to represent the concept of the synset. With a particular concept in mind, one approach of the lexicographer to the solution of the first problem would be an inspection of the part of the wordnet where he/she believes a hypernym of the new synset might be found, and if an appropriate ...
... d by a synset, a set of synonymous English word-sense pairs accompanied by a definition of the concept. Concepts are interconnected by various semantic relations, such as hypernym/hyponym (kind of, e.g. animal/dog) or holonym/meronym (part of, e.g. hand/finger). As of 2006, this database contains ...
Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007 (2007)
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents research on refining and improving the Serbian version of Hurtlex, a multilingual lexicon of offensive words. We pay particular attention to adding multi-word expressions that can be considered offensive, since such lexical entries are very important for achieving good results in many abusive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and for building the lexicon. The connection with other lexical and semantic resources for Serbian is outlined, and plans are made to build a system for ...
... presented at the end of the paper. 2 Related work. The use of offensive and hateful language has been a concern since the early days of the Internet. It has been estimated that the number of MWEs in the lexicon of a native speaker has the same order of magnitude as the number of single words (Moreno-Ortiz ...
... multilingual online lexicon of hate speech available at hatebase.org in their research (Wiegand et al., 2018; Silva et al., 2016; Nobata et al., 2016). Wiegand et al. (2018) built a lexicon of abusive words using the subjectivity lexicon of Therese Wilson that is in essence a sentiment lexicon. They took words ...
... belong to one of these two levels: 1) conservative: obtained by translating offensive senses of the words in the original lexicon and 2) inclusive: obtained by translating all the potentially relevant senses of the words in the original lexicon. The basis for HurtLex was a lexicon of offensive terms ...
Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
Sentiment Analysis of Serbian Old Novels
In this paper we present the first study of Sentiment Analysis (SA) of Serbian novels from the 1840-1920 period. The sentiment lexicon was prepared from three existing lexicons, NRC, AFINN and Bing, with additional extensive corrections. The first phase of dataset refinement consisted of filtering out words that are not found in the Serbian morphological dictionary; in the second, the automatically assigned POS tags and lemmas were manually corrected. The polarity lexicon was extracted and transformed into ontolex-lemon and published as initial ...
Ranka Stanković, Miloš Košprdić, Milica Ikonić Nešić, Tijana Radović. "Sentiment Analysis of Serbian Old Novels" in Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data, June 2022, Marseille, France, European Language Resources Association (2022)
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanity, derogatory speech and hate speech, has reached pandemic proportions. A system able to detect such texts could help make the Internet and social media a better and more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...
... annotation of corpus data are given in Section 2.2 while the extraction of abusive triggers is explained in Section 2.3. Section 3 presents the results of our research: Twitter data analysis (Subsection 3.1), the outcome of the annotation (Subsection 3.2) and the structure of the lexicon of abusive words ...
... 3.3 The lexicon of abusive speech. The lexicon of abusive speech, consisting of words that could be used as triggers for the recognition of abusive language, is being built, with the idea that the Serbian system for the recognition and normalization of abusive expressions will also ...
... improved version of Hurtlex [2], resources that can be useful for the creation of a lexicon of offensive words are lists of swear words, curses, abusive expressions, existing general dictionaries, slang dictionaries, surveys and contributions through crowdsourcing, translation of dictionaries and lexicons ...
Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
Novi koncept izrade Osnovne hidrogeološke karte Srbije (A New Concept for Producing the Basic Hydrogeological Map of Serbia)
Igor Jemcov, Zoran Stevanović, Vladimir Živanović, Saša Milanović, Dušan Polomčić, Veselin Dragišić (2022)
The Basic Hydrogeological Map (OHGK) is a fundamental document in hydrogeology; it aims to identify the basic aquifer types, which in turn allows the groundwater resources of the mapped area to be assessed. Applying the existing Guidelines for producing the Basic Hydrogeological Map of the SFRY at 1:100,000 scale (from 1984 and 1988) involves numerous difficulties, which is why several initiatives to draw up new Guidelines have been launched over the past 30 years. Considering the current situation together with contemporary trends in the development of hydrogeological maps in ...
Igor Jemcov, Zoran Stevanović, Vladimir Živanović, Saša Milanović, Dušan Polomčić, Veselin Dragišić. "Novi koncept izrade Osnovne hidrogeološke karte Srbije" in Zbornik radova XVI srpskog Simpozijum o hidrogeologiji sa međunarodnim učešćem, Univerzitet u Beograd, Rudarsko-geološki fakultet (2022)
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ...
... complete set of irony markers in lexicon form (resource B in Fig. 1) is a part of the architecture of the suggested model. Ironic tweet classifier (Fig. 1) for the purpose of feature construction uses: (1) a set of antonymous pairs (a, z) obtained from the SWN ontology (resource D) a lexicon of irony markers ...
... the set of five features (PPR, PSP, POS, OSA, M) gave the best results of this classifier (acc = 86.1%), for values tp = 144, fp = 66, fn = 175, tn = 1,347. Downsides of this type of classification, in a general case, lie in the limited nature of the resources (sentiment lexicon, set of rules used ...
... other resource used to detect the occurrence of irony is a lexicon of sentiment words and phrases in Serbian (resource C, Fig. 1). Keeping in mind the nature of the rhetorical figure verbal irony which is used to portray a negative statement in the form of a positive one, using the sentiment lexicon ...
Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA: ACM (2017). https://doi.org/
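The accuracy reported in the snippet above can be checked directly from the four confusion-matrix counts it quotes. The sketch below is only a worked calculation; the precision and recall figures are derived values that do not appear in the snippet, and they assume that "positive" means the ironic class.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all classified items that were labelled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of items predicted positive that really are positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of truly positive items that were found."""
    return tp / (tp + fn)


# Counts quoted in the snippet for the feature set (PPR, PSP, POS, OSA, M).
tp, fp, fn, tn = 144, 66, 175, 1347

acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
rec = recall(tp, fn)

print(f"accuracy  = {acc:.1%}")   # matches the reported acc = 86.1%
print(f"precision = {prec:.1%}")  # derived here, not reported in the snippet
print(f"recall    = {rec:.1%}")   # derived here, not reported in the snippet
```

With 1,413 of the 1,732 items in the negative class, accuracy is dominated by true negatives, which is why the derived precision and recall are noticeably lower than the accuracy.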
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities
This paper presents work on developing the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a word-sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can serve as a multilingual benchmark for evaluating low- and medium-resourced European languages. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and comparison with annotations in other languages. The paper discusses the first steps ...
Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ...
... levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on two sample texts: a part of the gold standard and an independent newspaper text of approx- ...
... For Serbian, a rule-based and lexicon-based NER system has been developed thus far – SRPNER (Krstev et al., 2014). Its development started with the recognition of a NE class present in all NE schemes, personal names (Krstev et al., 2005), while the recognition of other main NE classes was subsequently added ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School
The first training school organized by the COST Action NexusLinguarum was held from 8 to 12 February 2021, with the aim of teaching students, researchers and practitioners the basics of linguistic data science. During the training, participants were introduced to a broad range of topics: from the Semantic Web, RDF and ontologies to modelling and querying language data with state-of-the-art ontological models and tools. The school was held as part of the EUROLAN summer school series and was organized virtually (online) by several institutes; ...
Keywords: linguistic data science, linked data in linguistics, language data, EUROLAN, NexusLinguarum, COST Action, training school
... sentation and Semantic web. The Lemon-OntoLex Frac module was used for representation of the entries from the lexicon used for abusive speech detection with attestations from the Twitter corpus with annotation of abusive spans (Jokić et al. 2021). 3 Organization. Due to the COVID-19 pandemic and current ...
... 98–113. Springer. Jokić, Danka, Ranka Stanković, Cvetana Krstev, and Branislava Šandrih. 2021. “A Twitter Corpus and lexicon for abusive speech detection in Serbian.” In Proceedings of the 2021 Language, Data and Knowledge (LDK), 1-3 September in Zaragoza, Spain. McCrae, John P, Julia Bosque-Gil, Jorge ...
... 2014)), lexicog – lexicography module (Bosque-Gil, Gracia, and Montiel- ... Footnotes: 6. Data Catalog Vocabulary (DCAT) - Version 2; 7. Lemon - Lexicon Model for Ontologies; Lexicon Model for Ontologies: Community Report, 10 May 2016; 8. SKOS Simple Knowledge Organization System - home page; 9. Protégé; 10. VocBench: ...
Milan Dojchinovski, Julia Bosque Gil, Jorge Gracia, Ranka Stanković. "EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.7
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...
... the domain of economy is presented for Polish. It has two modules: a grammatical lexicon of terminological MWEs and a fully lexicalized shallow grammar, obtained by an automatic conversion of the lexicon. Przepiorkowski and associates (2007) present results of automatic extraction of term definitions ...
... and Application of Automata (Vol. 5642, pp. 237-240): Springer Berlin Heidelberg. Savary, A., Zaborowski, B., Krawczyk-Wieczorek, A. & Makowiecki, F (2012). SEJFEK—a Lexicon and a Shallow Grammar of Polish Economic Multi-Word Units. Proc. of Cognitive Aspects of the Lexicon (COGALEX-III). (pp ...
... , & Wójtowicz, B. (2007). On the evaluation of Polish definition extraction grammars. Proc. of the 3rd Language & Technology Conference. Quochi, V., Frontini, F., & Rubino, F. (2012). A MWE Acquisition and Lexicon Builder Web Service. Proc. of COLING 2012 (pp. 2291-2306). Ramisch, C., De ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
The Nooj System as Module within an Integrated Language Processing Environment
... l dictionaries of lemmas for simple and compound words but also of bilingual and multilingual dictionaries - development and refinement of wordnets, with simultaneous usage of wordnets for different languages - conversions from different formats such as one character encoding set to another ...
... crucial since it adds to the flexibility of resources exploitation. Conversion from one character encoding set to another is very important for languages such as Serbian, where two alphabets, Cyrillic and Latin are equally used. WS4LR enables the exploitation of language resources both in Cyrillic ...
... alphabet, as well as in a special encoding, that uses the ASCII character set and that can be unambiguously transformed into Serbian Latin or Serbian Cyrillic alphabet. In that special encoding, for example, “sx” is used as a code for “š”. With the emergence of NooJ the available resources have ...
Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
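The ASCII-based encoding mentioned in the snippet above can be sketched as a digraph substitution. Only the pair "sx" for "š" is stated in the source; the function name `decode_ascii`, the case handling and the left-to-right scan are assumptions made for illustration, and the remaining digraphs of the encoding are deliberately left out because the snippet does not list them.

```python
# Digraph table: "sx" -> "š" is the only mapping given in the snippet.
# Further digraphs would be added here once confirmed against the source.
ASCII_TO_LATIN = {"sx": "š"}


def decode_ascii(text, table=ASCII_TO_LATIN):
    """Replace known two-letter ASCII codes with Serbian Latin letters.

    Scans left to right; a capitalised digraph (e.g. "Sx") yields a
    capital letter. Characters that start no known digraph are copied as-is.
    """
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        letter = table.get(pair.lower())
        if letter is not None:
            out.append(letter.upper() if pair[0].isupper() else letter)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(decode_ascii("sxuma"))   # šuma
print(decode_ascii("Sxuma"))   # Šuma
print(decode_ascii("voda"))    # voda (no digraph, unchanged)
```

Because the codes are plain ASCII, a table-driven scan like this can target either alphabet by swapping the table, which is the property the snippet highlights.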
Употреба веб платформе Омека за дигиталне библиотеке из домена рударства (Using the Omeka Web Platform for Digital Libraries in the Mining Domain)
This paper presents Omeka, a web platform for publishing digital collections and a system for managing their content. Its application in the technical sciences, specifically in mining, is illustrated with the digital library ROmeka@RGF. We chose Omeka primarily because it is easy to use, has extensive documentation, and does not require highly specialized IT skills, which makes it accessible to most users, and especially to mining engineers, for whom this digital library is primarily intended. The documents ...
... Virginia. It belongs ... Footnotes: 2 http://jerteh.rs/biblisha/ListaDokumenata.aspx?JCID=2&lng=en; 3 http://bibliotheque.clermont-universite.fr/glangeaud/; 4 TEI: Text Encoding Initiative, http://www.tei-c.org/index.xml; 5 Content Management System (CMS); 6 Roy Rosenzweig Center for History and New Media (RRCHNM), https://rrchnm ...
... Repository prepares objects for exchange, acting as the inverse of the previous plugin. It supports Dublin Core, MODS and METS. Footnotes: 10 The Metadata Encoding and Transmission Standard, https://www.loc.gov/standards/mets/METSOverview.v2.html; 11 Digital Library Federation, https://www.diglib.org/?s=mets ...
... 17, No. 2, 2017. Scientific paper. Figure 7. Search results for single-word and multiword terms in the digital library ROmeka@RGF. 6. TEI (Text Encoding Initiative). In order to extract information from the digital objects that are part of the digital library ROmeka@RGF, we decided that the objects which ...
Александра Томашевић, Биљана Лазић, Далибор Воркапић, Михаило Шкорић, Љиљана Колоња. "Употреба веб платформе Омека за дигиталне библиотеке из домена рударства" in Инфотека, Филолошки факултет, Универзитет у Београду; Универзитетска библиотека „Светозар Марковић“; Заједница библиотека универзитета у Србији (2017)
Football terminology: compilation and transformation into OntoLex-Lemon resource
This paper presents a project under development: the creation of the first digital football dictionary in Serbian, together with a demonstration of the OntoLex model and its modules in use. The OntoLex-FrAC module includes information on frequency and usage examples extracted from a corpus. In this case, a domain-specific corpus named SrFudKo (СрФудКо) was created, containing football news articles in Serbian. Multiword terms were automatically extracted from the Serbian corpus, then manually evaluated and classified as sports-related or ...
Jelena Lazarević, Ranka Stanković, Mihailo Škorić, Biljana Rujević. "Football terminology: compilation and transformation into OntoLex-Lemon resource" in LDK 2023 – 4th Conference on Language, Data and Knowledge, 12-15 September in Vienna, Austria, Lisabon : NOVA FCSH - CLUNL (2023). https://doi.org/10.34619/srmk-injj