Претрага ⚒ Радови ⚒ Др РГФ - Репозиторијум РГФ

Претрага

Per page

Sort by

88 items

Annotation of the Serbian ELTeC Collection

Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Mihailo Škorić (2021)

Ovaj rad predstavlja takozvano izdanje nivoa 2 kolekcije tekstova SrpELTeC razvijene u okviru aktivnosti Radne grupe 2 – Metode i alati COST akcije CA 16204 (Distant Reading for European Literary History) i njene specifikacije šeme. Izdanje nivoa 2 je nastavak izdanja nivoa 1, koje se koristi kao ulaz za morfosintaksičke i NER anotacije romana. Srpska obrada nivoa-2 je navedena kroz potrebne korake, uključujući metode i alate koji se koriste u tom procesu. Neki statistički podaci iz srpske kolekcije nivoa ...

udaljeno čitanje, literarni korpus, tagiranje, prepoznavanje imenovanih entiteta, lematizacija, ELTeC

Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Mihailo Škorić. "Annotation of the Serbian ELTeC Collection" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.2.3
Development of Open Educational Resources (OER) for Natural Language Processing

Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević (2015)

In this paper we present the development of an online course at the edX BAEKTEL platform named “Lexical Recognition in the Natural Language Processing (NLP)”. It is based on the course of the same name for PhD studies at the University of Belgrade, Faculty of Philology. There are not many courses in Computational Linguistics (CL) on OER platforms, and there is none in Serbian either for CL or NLP. We have developed this course in order to improve this ...

E-Learning, Open Educational Resources, Computational Linguistics, Lexical Resources, edX

... improve this situation as it can prove useful both for linguists working in corpus linguistics and computer scientists developing NLP applications. The participant will become familiar with the use of Unitex, the corpus processing system for which many valuable resources for Serbian were already ...
... oriented processing of texts in human languages. As an illustration, a string oriented web interface to the Corpus of Contemporary Serbian 13 is presented. [13][14] 2. Unitex corpus processing system is presented from the practical point of view: how to install it and start working with it ...
... within BAEKTEL project of its OER version within the edX BAEKTEL platform. 2 The main features of Unitex, an open access and open source corpus processing system, are presented in Section 4. Section 5 presents course content with didactic criteria and specific formats used in the OER course ...
Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević. "Development of Open Educational Resources (OER) for Natural Language Processing" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade : Belgrade Metropolitan Univesity (2015)
Old or New, We Repair, Adjust and Alter (Texts)

Cvetana Krstev, Ranka Stanković (2020)

U ovom radu predstavljamo kako se e-rečnici i kaskade transduktora konačnih stanja implementirani u alatu Unitex mogu koristiti za rešavanje tri problema transformacije teksta: ispravljanje tekstova nakon OCR-a, vraćanje dijakritičkih znakova i prebacivanje između različitih jezičkih varijanti.

ispravka teksta, OCR greške, restauracija dijakritika , jezičke varijante, elektronski rečnik, transduktori konačnih stanja

... present the solution for detecting and correcting OCR errors developed for the compilation of the corpus of Serbian novels written and published in the period 1840–1920.2 The novels selected for this corpus were mainly printed in Cyrillic script (only a few of them were in Latin script). When scanning ...
... The text obtained by OCR is processed using Serbian morphological electronic dictionaries (SMD) (Krstev, 2008); 2 This corpus is a part of the European Literary Text Collection corpus (ElTEC) developed in the scope of the COST action 16204 Distant Reading for European Literary History (d-reading). 64 ...
... performed by finite-state transducers (FST) implemented in Unitex.3 A separate FST is written for each replacement 3 UnitexGramLab, a lexicon-based corpus processing suite. Infotheca Vol. 19, No. 2, December 2019 65 Krstev C., Stanković R., “Old or new, we repair . . . ”, pp. 61–80 a non-valid ...
Cvetana Krstev, Ranka Stanković. "Old or New, We Repair, Adjust and Alter (Texts)" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.3
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons

Mihailo Škorić (2017)

The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations in the existing framework of research and tests their effectiveness. It also presents a software tool especially made for that purpose, explains how it builds a database with sentimental value of terms and offers the user manual. Finally, it presents a software tool that tests the new database and gives some examples ...

data mining, information extraction, emotions, text on the web

... databases for this study were created, from the collection of the corpus to the export of completed database, which can then be used in several ways. 2.1 Collecting textual corpus The basic idea was for the database to be based on a corpus of texts containing determiners which express positive or negative ...
... pendent, the system would be language-independent as well. If it turns out to be valid, this method could allow machine learning the usage of huge corpus of texts that are pre-labeled with determiners. 1.1 Review of their former similar studies In 2005 a series of experiments with the classification ...
... and language-neutral determiner strings will be used. Goal is to create a fully language-independent system that would greatly broaden the possible corpus. 1 Users of Twitter platform have an option to additionally mark their posts with tags so that posts that talk about a certain topic can be found ...
Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
Using English Baits to Catch Serbian Multi-Word Terminology

Cvetana Krstev, Branislava Šandrih, Ranka Stanković (2018)

In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...

aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inﬂection

... contained 491,990 translation pair candidates. We decided to enrich corpus with additional parallel lists (described in Subsection 4.4.) since we observed certain improvement in evaluations of translation quality. First we splitted corpus of aligned sentences into three disjoint parts: training (80%), ...
... word forms and MWE pairs derived from bilingual dictionaries and morphological (inflected) dictionaries for Serbian and English; 4.1. Aligned/parallel corpus The English/Serbian textual resource was derived from the journal for Digital Humanities Infotheca3 that is published biannually in Open Access ...
... 551 English/Serbian entries, parallel list from SWN and PWN containing 75,766 aligned English/Serbian literals and aligned Serbian and English inflected word forms hav- ing 372,432 entries (all described in previous subsections). 10Unitex/GramLab, a lexical-based corpus processing suite http://unitexgramlab ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
Terminology Acquisition and Description Using Lexical Resources and Local Grammars

Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić (2015)

Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...

... transducers using CasSys tool incorporated in Unitex1 corpus processing platform, as well as the use of TMF standard for the representation of terms is proposed in (Ammar et al., 2015) and applied on Arabic scientific and technical corpus. In (Savary et al., 2012) terminology extraction in the ...
... ported that modern statistical Natural Language Processing (NLP) is in great need of better lan- guage models and linguistic tools must come to 1 Corpus processing System Unitex: http://www-igm.univ- mlv.fr/~unitex/ Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada ...
... extraction In order to evaluate our approach, we applied it to a collection of 74 papers in Serbian from the journal Infotheca. 6 The size of the corpus is 6 Infotheca - Journal for Digital Humanities (http://infoteka.bg.ac.rs/index.php/en/infoteca) Proceedings of the conference Terminology and ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
Multi-word Expressions for Abusive Speech Detection in Serbian

Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev (2020)

Ovaj rad predstavlja istraživanja na usavršavanju i unapređenju srpske verzije rečnika Hurtlex, višejezičnog leksikona uvredljivih reči. Posebnu pažnju posvećujemo dodavanju izraza sa više reči (polileksemskih jedinica) koji se mogu smatrati uvredljivim, jer su takvi leksički zapisi veoma važni za postizanje dobrih rezultata u mnoštvu zadataka otkrivanja uvredljivog jezika. Srpski morfološki rečnici se koriste kao osnova za čišćenje podataka i stvaranje rečnika. Istaknuta je veza sa drugim leksičkim i semantičkim resursima na srpskom jeziku i predviđena je izgradnja sistema za ...

uvredljiv govor, govor mržnje, leksički izvori, višejezični leksikon, izrazi sa više reči

... the domain corpus of hateful content and Subjectivity lexicon of Therese Wilson in combination with the SentiWordNet (Esuli and Sebastiani, 2006).For clas- sification, they leveraged rules and achieved a result of F1 = 0.783 for strongly hateful sentences on a manually annotated domain corpus. Razavi ...
... hyperbole, litotes etc. Initial work on detecting some of these figures has been presented in (Mladenović et al., 2017; Krstev et al., 2020). Using a corpus of newspaper articles from 2006, Krstev et al. (2007) presented the results of an infor- mation search experiment in search of attacks which are the ...
... speech (1260), MAYBE – could lead to abusive content (462), NO – not abusive (2902). The manual classification was supported by search over a Twitter corpus collected specifically for his research, Web 79 A ADV N PRO V (blank) Total maybe 93 12 152 0 168 37 462 no 432 142 978 17 1333 2902 yes 213 39 ...
Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School

Milan Dojchinovski, Julia Bosque Gil, Jorge Gracia, Ranka Stanković (2021)

Prva škola za obuku polaznika koju je organizovala COST akcija NexusLinguarum održana je od 8. do 12. februara 2021. godine sa ciljem da studenti, istraživači i stručnjaci nauče osnove lingvističke nauke o podacima. Tokom obuke polaznici su se upoznali sa širokim spektrom tema: od semantičkog veba, RDF -a i ontologija, do modeliranja i pretraživanja jezičkih podataka pomoću najsavremenijih ontoloških modela i alata. Škola je održana u okviru serije letnjih škola EUROLAN-a i organizovalo ju je virtuelno (onlajn) nekoliko instituta; ...

nauka o lingvističkim podacima, povezani podaci u lingvistici, jezički podaci, EUROLAN, NexusLinguarum, COST akcija, škola za obuku

... September 2021 115 Dojchinovski M. et al., eurolan 2021: . . . Linked Data. . . , pp. 113–120 Ponsoda 2017), FrAC 12 – frequency, attestation and corpus Informa- tion (Chiarcos et al. 2020). Finally, the training school ended with a closing session where an ontology of participants, lecturers and ...
... and building on to present more specific topics in a detailed fashion on the last day, the participants had 12. FrAC – Frequency, Attestation and Corpus Information - Ontology-Lexica Community Group 116 Infotheca Vol. 21, No. 1, September 2021 Professional paper a chance to acquire a solid foundation ...
... Lex Frac module was used for representation of the entries from the lexicon used for abusive speech detec- tion with attestations from the Twitter corpus with annotation of abusive spans (Jokić et al. 2021). 3 Organization Due to the COVID-19 pandemic and current travel restrictions in Europe and beyond ...
Milan Dojchinovski, Julia Bosque Gil, Jorge Gracia, Ranka Stanković. "EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.7
From DELA Based Dictionary to Leximirka Lexical Database

Biljana Lazić, Mihailo Škorić (2020)

In this paper, we will present an approach in transforming Serbian language Morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use category labels lexical entries are marked with can be used to link lexical entries. ...

Morfološki rečnici, jezički resursi, Leksimirka

... 000 most frequent words in the Serbian Corpus of the Serbian Language SrbCorp (version of 122 million words by Duško Vitas and Miloš Utvić)6. Information about the Corpus is stored in the KorpusMeta table. The LexicalRelation table stores information 6 Corpus of the Serbian Language – SrbCorp 86 ...
... that match the specified search criteria appear as rows in the table. The registered user has access to multiple corpus searches (in the MatKorp and SrpKorpRGF corpora). The Mining Corpus (RudKorp) (Tomašević et al., 2018) that can be searched by some predefined queries that retrieve a word searched ...
... their main importance is their reusability. They were used for the basic tasks of word processing, automatic recognition 1 Unitex is cross-platform Corpus Processing Suite to retrieve data. Infotheca Vol. 19, No. 2, December 2019 81 Lazić B., Škorić M., “From DELA based dictionary to . . . ”, pp ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
Infotheca (Q25460443) in Wikidata

Ranka Stanković, Lazar Davidović (2021)

Vikipodaci su baza znanja Zadužbine Vikimedija koja predstavlja zajednički izvor različitih vrsta podataka koje koriste ne samo drugi Vikipedijini projekti, već sve više i brojne aplikacije semantičkog veba. U ovom radu ćemo prezentovati primer integracije Vikipodataka sa digitalnim bibliotekama i eksternim sistemima, kao i mogućnost ubrzanja pripreme i unosa podataka na primeru radova iz časopisa za digitalnu humanistiku Infoteka.

semantički veb,otvoreni povezani podaci, vikpodaci,Infoteka, metapodaci časopisa

... gual lexical extraction based on word alignment for improving corpus search.” The Electronic Library. Krstev, Cvetana, Jelena Jaćimović, Branislava Šandrih, and Ranka Stanković. 2019. “Analysis of the first Serbian Literature Corpus of the Late 19th and Early 20th century with the TXM platform.” ...
... data network was used by Andonovski (Андоновски 2020) to describe lan- guage resources, namely, novels forming part of the Serbian-German literary corpus (Andonovski, Šandrih, and Kitanović 2019). For a number of years now, students at the Faculty of Mining and Geology have been undergoing training ...
... уносу метаподатака о српским романима из корпуса srpELTeC 13 COST Action CA16204 (2017-2021) metadata about Serbian novels included in the srpELTEC corpus is being entered into the knowledge base (Krstev et al. 2019) and Wikidata linked to various applications, one of which is Au- rora.14 Members of JeRTeh ...
Ranka Stanković, Lazar Davidović. "Infotheca (Q25460443) in Wikidata" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.5
Transformer-Based Composite Language Models for Text Evaluation and Classification

Mihailo Škorić, Miloš Utvić, Ranka Stanković (2023)

Parallel natural language processing systems were previously successfully tested on the tasks of part-of-speech tagging and authorship attribution through mini-language modeling, for which they achieved significantly better results than independent methods in the cases of seven European languages. The aim of this paper is to present the advantages of using composite language models in the processing and evaluation of texts written in arbitrary highly inflective and morphology-rich natural language, particularly Serbian. A perplexity-based dataset, the main asset for the ...

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Mihailo Škorić, Miloš Utvić, Ranka Stanković. "Transformer-Based Composite Language Models for Text Evaluation and Classification" in Mathematics, MDPI AG (2023). https://doi.org/10.3390/math11224660
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model

Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević (2024)

Ova studija predstavlja analizu sentimenta srpskih starih romana iz perioda 1840-1920, koristeći veliki jezički model (LLM) Mistral za tehniku učenja sa zasnovani na takozvanim "zero" i "few-shot" pokušajima. Glavni pristup uvodi inovacije osmišljavanjem istraživačkih upita (promptova) uključuju tekst sa uputstvom za klasifikaciju bez primera i na osnovu nekoliko primera, omogućavajući jezičkom modelu da klasifikuje osećanja u pozitivne, negativne ili objektivne kategorije. Ova metodologija ima za cilj da pojednostavi analizu osećanja ograničavanjem odgovora, čime se povećava preciznost ...

zero-shot, few-shot, sentiment, Serbian, Mistral model

Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević. "Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model" in In Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024), BAS (2024)
The Nooj System as Module within an Integrated Language Processing Environment

Ranka Stanković, Duško Vitas, Cvetana Krstev (2008)

NooJ, electronic dictionary, lexical resources

... lemma cyelo 4. Textual resources management 4.1. Parallel Text Management The WS4LR module for management of aligned parallel texts uses texts which have previously been aligned using Xalign as an alignment tool (Bonhomme 2001). Parallel texts which usually originate from a text in one language ...
... lex-resources to texts”) then syntactic resources should not be chosen, and if the last option is on (“Apply query to corpus”), then the user selects only a query and a corpus. Figure 12 presents results in the form of concordances for the query: kompjuter, which was automatically expanded with ...
... retrieval and related areas. If query is further combined with ILI, a multilingual wordnet pivot, the possibility of searching text resources (web, corpus, text) in different languages with a single query is opened. NooJ supports morphological query expansion and expansion of queries by graphs and ...
Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
Frequency and Length of Syllables in Serbian

Marija Radojičić, Biljana Lazić, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Ján Mačutek, Lívia Leššová (2019)

Basic analyses of several properties of syllables (the rank-frequency distribution, the distribution of length, and the relation between length and frequency) in Serbian is presented. The syllabification algorithm used combines the maximum onset principle and the sonority hierarchy. Results indicate that syllables behave similarly to words as far as mathematical models are concerned, but values of parameters in models for syllables are quite different from those for words.

frekvencije slogova, dužina slogova, srpski jezik

... Russian socialist realist novel “Kak zakalyalas’ stal’” (How the Steel Was Tempered) by N. Ostrovsky. The choice is motivated by the fact that a parallel corpus consisting of the first ten chapters of the novel and their translations to all standard Slavic languages (except for Lower Sorbian) is available ...
... so far performed on one language only. In future, other Slavic languages and other aspects of syllables will be investigated. As there is a parallel corpus of Slavic languages available, properties of syllables can be used to construct a data-based typology of Slavic languages and to compare it with ...
... onsets and codas. If one follows his modification, a large enough corpus is needed to perform statistical tests, based on which a decision on the (non-) marginality of a particular consonant cluster is made. Finding or creating such a corpus can be problematic for minor languages (such as e.g. Lower and ...
Marija Radojičić, Biljana Lazić, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Ján Mačutek, Lívia Leššová. "Frequency and Length of Syllables in Serbian" in Glottometrics (2019)
Part of Speech Tagging for Serbian language using Natural Language Toolkit

Ranka Stanković, Boro Milovanović (2020)

Dok se razvijaju složeni algoritmi za NLP (obrada prirodnog jezika), osnovni zadaci kao što je označavanje ostaju veoma važni i još uvek izazovni. NLTK (Natural Language Toolkit) je moćna Python biblioteka za razvoj programa zasnovanih na NLP-u. Pokušavamo da iskoristimo ovu biblioteku za kreiranje PoS (vrsta reči) oznake za savremeni srpski jezik. Jedanaest različitih modela je kreirano korišćenjem NLTK API-ja za označavanje. Najbolji modeli se transformišu sa Brill tagerom da bi se poboljšala tačnost. Obučili smo modele na označenom ...

obrada prirodnog jezika, mašinsko učenje, neuronske mreže

... George Orwell, part of MULTEXT-East resources [9]. INTERA (Integrated European language data Repository Area) is a project that produced multilingual corpus on law, health and education [10]. Around the world in 80 days is a novel by Jules Verne annotated during SEE-ERA.net project [11]. ELTeC (European ...
... International Conference on Computational intelligence, man-machine systems and cybernetics, Tenerife, Spain, Dec. 2009 [6] M. Utvić, “Annotating the Corpus of Contemporary Serbian,” INFOtheca, vol. 12 no. 2 pp 36a-47a, Dec. 2011 [7] M. Constant, C. Krstev, and D. Vitas “Lexical Analysis of Serbian ...
... Piperidis, V. Giouli, N. Calzolari, M. Monachini, C. Soria, and K. Choukri, “Language Resources Production Models: the Case of the INTERA Multilingual Corpus and Terminology,” Proc. Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy, May 2006 [11] D. l. Tufis ...
Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)
E-Connecting Balkan Languages

Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva (2009)

In this paper we present a versatile language processing tool that can be successfully used for many Balkan languages. This tool relies for its work on several sophisticated textual and lexical resources that were developed for most of Balkan languages. These resources are based on several de facto standards in natural language processing.

Query expansion, e-dictionary, wordnet, proper name, aligned text

... 14-20, 2008. [18] R. Steinberger, B. Pouliquen, A. Widiger, C. Ignat, T. Erjavec, D. Tufiş. 2006. The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In Proceedings of the 5th LREC Conference, Genoa, Italy, 22-28 May, 2006, pp.2142-2147, 2006. [19] M. Tran, D. Maurel ...
... 2005-02 de l’Institut Gaspard- Monge, CNRS, 2005. [4] T. Erjavec and N. Ide. The MULTEXT-East Corpus. In LREC’98, Granada, pp. 971-974, 1998. [5] A. Gelbukh, G. Sidorov, J.-A. Vera-Félix. A Bilingual Corpus of Novels Aligned at Paragraph Level. In proc. FinTAL-2006. Lecture Notes in Artificial ...
... Morphologie et syntaxe. Le cas du grec moderne, Proceedings AILA 1990, Chalcidique, 1990. [13] E. Laporte, T. Nakamura, S. Voyatzi. A French Corpus Annotated for Multiword Nouns, in: Towards a Shared Task for Multiword Expressions (MWE 2008), in scope of the Sixth Interantional Conference ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva. "E-Connecting Balkan Languages" in Proceedings of the Workshop Workshop on Multilingual resources, technologies and evaluation for Central and Eastern European Languages, 17 September 2009, eds. C. Vertan, S. Piperidis, E. Paskaleva and Milena Slavcheva, Borovets, Bulgaria : Association for Computational Linguistics Stroudsburg, PA, USA (2009)
Indexing of textual databases based on lexical resources: A case study for Serbian

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović (2015)

In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...

... for which we could have used the TreeTagger trained for Serbian that was used for the lemmatization of the Corpus of Contemporary Serbian [16]. However, this lemmatizer was trained on a corpus that differs significantly from our collection, and additionally it does not take into account MWUs. The approach ...
... much as possible [7]. These local grammars were organized in cascades that further resolve ambiguities [10]. NER system was evaluated on a newspaper corpus and results reported in [7] showed that F -measure of recognition was 0.96 for types and 0.92 fot tokens. For the purpose of indexing, we applied ...
... Nikolić, V.: The Develop- ment of the GeolISSTerm Terminological Dictionary. INFOtheca 12(1), 49a–63a (August 2011) 16. Utvić, M.: Annotating the Corpus of contemporary Serbian. INFOtheca – Journal of Informatics & Librarianship 12(2), 36a–47a (2011) 17. Vossen, P.: EuroWordNet: a multilingual database ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian

Biljana Rujević, Marija Kaplar, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Jan Mačutek (2021)

slogovi, distribucija rang-preciznost, slovenski jezici

Biljana Rujević, Marija Kaplar, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Jan Mačutek. "Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian" in Language and Text: Data, models, information and applications, John Benjamins Publishing Company (2021). https://doi.org/10.1075/cilt.356.04ruj
Речници у дигиталном добу - информатичка подршка за српски језик

Биљана Рујевић (2022)

Морфолошки речници српског језика представљају електронски језички ресурс који има значајну историју развоја и коришћења за потребе обраде природних језика. С обзиром на то да су чувани у облику датотека чији је број нарастао па је самим тим управљање речницима постало отежано јавила се потреба за смештањем информација из речника у облик лексикографске базе. Како би се омогућио симултани рад на развоју речника за више корисника јавила се потреба за веб-апликацијом заснованој на лексикографској бази. Како би се размотриле ...

електронски речници, лексикографска база података, лексички ресурси, српски језик

Биљана Рујевић. Речници у дигиталном добу - информатичка подршка за српски језик, Београд : [Б. Рујевић], 2022
Keyword Extraction from Parallel Abstracts of Scientific Publications

Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić (2017)

... 03:24:53 Keyword Extraction from Parallel Abstracts of Scientific Publications Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić Дигитални репозиторијум Рударско-геолошког факултета Универзитета у Београду [ДР РГФ] Keyword Extraction from Parallel Abstracts of Scientific Publications ...
... from parallel abstracts of scientific publication in the Serbian and English languages. The keywords are extracted by a selectivity-based keyword extraction method. The method is based on the structural and statistical properties of text represented as a complex network. The constructed parallel corpus ...
... SBKE method on parallel texts from the Serbian and English languages2. 2 Bilingual Serbian-English KE dataset is publicly available from http://langnet.uniri. hr/resources.html. http://langnet.uniri.hr/resources.html http://langnet.uniri.hr/resources.html Keyword Extraction from Parallel Abstracts of ...
Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić . "Keyword Extraction from Parallel Abstracts of Scientific Publications" in Sematic Keyword-Based Search on Structured Data Sources - Third International KEYSTONE Conference, IKC 2017 Gdańsk, Poland, September 11–12, 2017 Revised Selected Papers and COST Action IC1302 Reports, Springer (2017)

Претрага

88 items

Annotation of the Serbian ELTeC Collection cite

Development of Open Educational Resources (OER) for Natural Language Processing cite

Old or New, We Repair, Adjust and Alter (Texts) cite

Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons cite

Using English Baits to Catch Serbian Multi-Word Terminology cite

Terminology Acquisition and Description Using Lexical Resources and Local Grammars cite

Multi-word Expressions for Abusive Speech Detection in Serbian cite

EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School cite

From DELA Based Dictionary to Leximirka Lexical Database cite

Infotheca (Q25460443) in Wikidata cite

Transformer-Based Composite Language Models for Text Evaluation and Classification cite

Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model cite

The Nooj System as Module within an Integrated Language Processing Environment cite

Frequency and Length of Syllables in Serbian cite

Part of Speech Tagging for Serbian language using Natural Language Toolkit cite

E-Connecting Balkan Languages cite

Indexing of textual databases based on lexical resources: A case study for Serbian cite

Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian cite

Речници у дигиталном добу - информатичка подршка за српски језик cite

Keyword Extraction from Parallel Abstracts of Scientific Publications cite

Annotation of the Serbian ELTeC Collection

Development of Open Educational Resources (OER) for Natural Language Processing

Old or New, We Repair, Adjust and Alter (Texts)

Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons

Using English Baits to Catch Serbian Multi-Word Terminology

Terminology Acquisition and Description Using Lexical Resources and Local Grammars

Multi-word Expressions for Abusive Speech Detection in Serbian

EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School

From DELA Based Dictionary to Leximirka Lexical Database

Infotheca (Q25460443) in Wikidata

Transformer-Based Composite Language Models for Text Evaluation and Classification

Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model

The Nooj System as Module within an Integrated Language Processing Environment

Frequency and Length of Syllables in Serbian

Part of Speech Tagging for Serbian language using Natural Language Toolkit

E-Connecting Balkan Languages

Indexing of textual databases based on lexical resources: A case study for Serbian

Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian

Речници у дигиталном добу - информатичка подршка за српски језик

Keyword Extraction from Parallel Abstracts of Scientific Publications