Search ⚒ Papers ⚒ Dr RGF - RGF Repository

Search

92 items

Improvement of geodatabase queries within GeolISS

Ranka Stanković (2008)

... handles aligned texts. A pair of semantically equivalent texts in different languages, such as an original text and its translation, that are aligned on a structural level (paragraph, sentence, phrase, etc.) is known as an aligned text or bitext. The standard format for representing aligned texts ...
... is the Translation Memory eXchange format (TMX), which is XML-compliant [13]. An expanded query can be applied to TMX documents in order to retrieve aligned segments that correspond to search criteria in the source and target language. A filtered TMX document is transformed into XML, TXT and HTML output ...
... Developer network (http://edn.esri.com) [8] Vitas D., G. Pavlović-Lažetić, C. Krstev, Lj. Popović, I. Obradović (2003): „Processing Serbian Written Texts: An Overview of Resources and Basic Tools“, Proceedings of the International Workshop on Balkan Language Resources and Tools, Thessaloniki, Greece ...
Ranka Stanković. "Improvement of geodatabase queries within GeolISS" in Review of the National Center for Digitization, Beograd : Faculty of Mathematics, Belgrade (2008)
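The snippet above describes applying an expanded query to TMX documents to retrieve aligned segments that match search criteria in both languages. A minimal sketch of that filtering step with Python's standard `xml.etree.ElementTree` follows; the two-unit TMX document, the language codes and the hand-written expansion of the Serbian term are hypothetical illustrations, not GeolISS data or its actual query mechanism.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-unit TMX document (English source, Serbian target).
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" segtype="sentence" datatype="plaintext"
          adminlang="en" creationtool="demo" creationtoolversion="0" o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The gold deposit was explored in detail.</seg></tuv>
      <tuv xml:lang="sr"><seg>Ležište zlata je detaljno istraženo.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Copper ore is processed on site.</seg></tuv>
      <tuv xml:lang="sr"><seg>Ruda bakra se prerađuje na licu mesta.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def aligned_pairs(tmx_text, src="en", tgt="sr"):
    """Yield (source, target) segment pairs from every translation unit (TU)."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src in segs and tgt in segs:
            yield segs[src], segs[tgt]


def words(text):
    """Lower-cased word set; \\w is Unicode-aware, so Serbian diacritics are kept."""
    return set(re.findall(r"\w+", text.lower()))


def filter_tmx(tmx_text, src_terms, tgt_terms, src="en", tgt="sr"):
    """Keep pairs whose source side matches a source term AND target side a target term."""
    return [
        (s, t)
        for s, t in aligned_pairs(tmx_text, src, tgt)
        if words(s) & src_terms and words(t) & tgt_terms
    ]


# Hypothetical expansion of the lemma "zlato" (gold) into some of its inflected forms.
expanded_sr = {"zlato", "zlata", "zlatu", "zlatom"}
hits = filter_tmx(SAMPLE_TMX, {"gold"}, expanded_sr)
for source, target in hits:
    print(f"{source}\t{target}")
```

Requiring a hit on both sides mirrors "search criteria in the source and target language"; the surviving pairs could then be serialized to XML, TXT or HTML.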
Transformer-Based Composite Language Models for Text Evaluation and Classification

Mihailo Škorić, Miloš Utvić, Ranka Stanković (2023)

Parallel natural language processing systems were previously successfully tested on the tasks of part-of-speech tagging and authorship attribution through mini-language modeling, for which they achieved significantly better results than independent methods in the cases of seven European languages. The aim of this paper is to present the advantages of using composite language models in the processing and evaluation of texts written in arbitrary highly inflective and morphology-rich natural language, particularly Serbian. A perplexity-based dataset, the main asset for the ...

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Mihailo Škorić, Miloš Utvić, Ranka Stanković. "Transformer-Based Composite Language Models for Text Evaluation and Classification" in Mathematics, MDPI AG (2023). https://doi.org/10.3390/math11224660
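The abstract mentions a perplexity-based dataset without giving details. For reference, the perplexity of a language model on a text of N tokens is PP = exp(-(1/N) Σ_i log p(w_i | w_<i)), so lower values mean the text is less surprising to the model. The sketch below assumes per-token log-probabilities have already been obtained from each model (the model names are hypothetical) and only shows how they become one score per model; it is not the authors' composite model.

```python
import math


def perplexity(token_log_probs):
    """Perplexity of one text: exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def perplexity_features(log_probs_by_model):
    """Hypothetical helper: one perplexity per language model, usable as a feature vector."""
    return {name: perplexity(lps) for name, lps in sorted(log_probs_by_model.items())}


# Toy input: two models assign probabilities to the same four tokens.
scores = perplexity_features({
    "model_a": [math.log(0.25)] * 4,  # uniform 1/4 per token -> perplexity 4
    "model_b": [math.log(0.5)] * 4,   # uniform 1/2 per token -> perplexity 2
})
print(scores)
```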
Parallel Stylometric Document Embeddings with Deep Learning Based Language Models in Literary Authorship Attribution

Mihailo Škorić, Ranka Stanković, Milica Ikonić Nešić, Joanna Byszuk, Maciej Eder (2022)

This paper explores the effectiveness of parallel stylometric document embeddings in solving the authorship attribution task by testing a novel approach on literary texts in 7 different languages, totaling 7051 unique 10,000-token chunks from 700 PoS- and lemma-annotated documents. We used these documents to produce four document embedding models using the Stylo R package (word-based, lemma-based, PoS-trigrams-based, and PoS-mask-based) and one document embedding model using mBERT for each of the seven languages. We created further derivations of these ...

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Mihailo Škorić, Ranka Stanković, Milica Ikonić Nešić, Joanna Byszuk, Maciej Eder. "Parallel Stylometric Document Embeddings with Deep Learning Based Language Models in Literary Authorship Attribution" in Mathematics, MDPI AG (2022). https://doi.org/10.3390/math10050838
Towards translation of educational resources using GIZA++

Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević (2016)

... of semantically related word pairs, ideally translational equivalents, is presented, from aligned texts in SELFEH, a Serbian-English corpus of texts related to education, finance, health and law, aligned at the sentence level within the Intera project. The corpus was lemmatized and the method applied ...
... attached to each aligned sentence (element ) in order to establish a direct relation to metadata and the original (pdf, edX, docx,…) form of resource document, article, course or other resource. Image 2 presents one part from the TMX document with ID: 1.2010.1.4. From aligned TMX documents ...
... needs several reviews before publishing or preparation for voice recording. [10] A Computer Aided Translation (CAT) tool is based on a collection of aligned sentence pairs in the form of a Translation Memory, which facilitates and speeds up the translator's work. The main functions of a CAT tool that speed ...
Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević. "Towards translation of educational resources using GIZA++" in The Seventh International Conference on e-Learning (eLearning-2016), September 2016, Belgrade : Metropolitan University (2016)
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++

Branislava Šandrih, Ranka Stanković (2020)

In science, industry and many research fields, terminology develops rapidly. Most often, the language that serves as the "lingua franca" for most of these fields is English. As a consequence, in many fields domain terms are coined in English and later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language, and a chunk alignment tool. We examine the performance of the method on the domain ...

terminology extraction, terminology validation, GIZA++, graphs, Unitex, text classification

... technique used in (Naguib Sabtan, 2016), groups of aligned sentences (verses) were used. In (Irvine and Callison-Burch, 2016) authors performed two experiments, the first one relying on the existence of a bilingual dictionary with no parallel texts and the second one requiring only the existence of ...
... options, thus obtaining 8 different experimental settings: 1. The input domain aligned corpus (Input i) consists of: (a) the aligned corpus LIS-corpus; (b) the aligned corpus LIS-corpus extended with the bilingual aligned pairs bi-list (LIS-corpus+); 2. The list of domain terms for the source language ...
... sentence-aligned domain-specific corpus involving a source and a target language, denoted as S(text.align) ↔ T (text.align). In this paper we refer to this tool as LIS-corpus. As a textual resource, twelve issues with a total of 84 papers were aligned at the sentence level resulting in 14,710 aligned segments ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
A Data Driven Approach for Raw Material Terminology

Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021)

The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ...

raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data

... (32%). The bilingual corpus of texts aligned on the sentence level was produced from the bilingual digital library Bibliša. The initial set of 55 documents containing 4831 aligned Serbian-English sentences [29] was enlarged with 44 new documents containing 12,657 aligned sentences from the raw material ...
... Underground Mining, published both in Serbian and English, stored in the bilingual digital library Bibliša, as one of the collections of aligned English-Serbian bi-texts [29,30], were also used in our approach. A monolingual corpus from the mining domain was developed as part of a project related to managing ...
... characteristics of the distribution of the sample sentences extracted from the corpus that contains different texts. The approach was adapted to work also for English and to be applied for bilingual aligned sentences. For ranking, we have used a weighted score derived from lexical features (e.g., sentence length ...
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
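The snippet mentions ranking candidate sentences by a weighted score derived from lexical features such as sentence length. A minimal sketch of such a ranking follows; the three features, their weights and the length range are hypothetical choices for illustration, and matching the headword by substring is a crude stand-in for lemma-based matching in an inflected language like Serbian.

```python
import re

# Hypothetical feature weights (they sum to 1, so scores fall in [0, 1]).
WEIGHTS = {"length": 0.5, "complete": 0.3, "no_digits": 0.2}


def features(sentence, min_tokens=6, max_tokens=20):
    """Binary lexical features of a candidate example sentence (all hypothetical)."""
    n_tokens = len(re.findall(r"\w+", sentence))
    stripped = sentence.strip()
    return {
        "length": 1.0 if min_tokens <= n_tokens <= max_tokens else 0.0,
        "complete": 1.0 if stripped[:1].isupper() and stripped.endswith((".", "!", "?")) else 0.0,
        "no_digits": 0.0 if re.search(r"\d", sentence) else 1.0,
    }


def score(sentence):
    """Weighted sum of the feature values."""
    return sum(WEIGHTS[name] * value for name, value in features(sentence).items())


def rank(sentences, headword):
    """Keep sentences containing the headword, best-scoring first."""
    candidates = [s for s in sentences if headword.lower() in s.lower()]
    return sorted(candidates, key=score, reverse=True)


ranked = rank(
    [
        "rudnik 2 km",
        "Ugalj je važno gorivo.",
        "Ruda se svakog dana vadi iz podzemnog rudnika.",
    ],
    headword="rudnik",
)
print(ranked)
```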
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model

Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević (2024)

This study presents a sentiment analysis of old Serbian novels from the period 1840-1920, using the large language model (LLM) Mistral with zero-shot and few-shot learning techniques. The main approach introduces innovations by designing research prompts that include instruction text for classification without examples and with a few examples, enabling the language model to classify sentiment into positive, negative or objective categories. This methodology aims to simplify sentiment analysis by constraining the responses, thereby increasing precision ...

zero-shot, few-shot, sentiment, Serbian, Mistral model

Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević. "Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model" in Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024), BAS (2024)
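The abstract describes prompts with classification instructions, either without examples (zero-shot) or with a few examples (few-shot), and constrained answers limited to positive, negative or objective. A minimal sketch of building such prompts and mapping a free-form reply back onto the closed label set is shown below; the prompt wording and the Serbian example sentences are hypothetical, not the authors' prompts, and sending the prompt to Mistral is deliberately left out.

```python
import re

LABELS = ("positive", "negative", "objective")


def build_prompt(sentence, examples=()):
    """Zero-shot when `examples` is empty; few-shot when it holds (text, label) pairs."""
    parts = [
        "Classify the sentiment of the sentence as exactly one of: "
        + ", ".join(LABELS) + ". Answer with that single word only."
    ]
    for text, label in examples:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        parts.append(f"Sentence: {text}\nSentiment: {label}")
    parts.append(f"Sentence: {sentence}\nSentiment:")
    return "\n\n".join(parts)


def parse_label(response):
    """Map free-form model output onto the closed label set; None if no label appears."""
    for word in re.findall(r"[a-z]+", response.lower()):
        if word in LABELS:
            return word
    return None


zero_shot = build_prompt("Kiša je padala celu noć.")
few_shot = build_prompt(
    "Srce joj je igralo od radosti.",
    examples=[("Tuga ga je savladala.", "negative")],
)
print(few_shot)
print(parse_label(" Positive."))
```

Rejecting replies that contain none of the three labels (returning `None`) is one simple way to enforce the constrained output the abstract describes.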
A bilingual digital library for academic and entrepreneurial knowledge management

Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić (2015)

A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age, digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia, 18 of the 308 regularly published scientific journals are bilingual, with papers simultaneously being in English ...

Knowledge management, Digital library, Multilingualism, Language Resources, Terminology

... Architecture and Urbanism presently has just 10 papers. Project reports included originate from the BAEKTEL Tempus project. All bilingual texts are aligned at the sentence level, represented in the TMX (Translation Memory eXchange) format, and stored in the MarkLogic NoSQL database. Text collections ...
... lexical resources, access to aligned resources, etc.) 4 System components In designing Bibliša special attention is given to its language support component. It supports various aspects of multilingual libraries: its content is not only multilingual, but also aligned and it can be searched in any ...
... tool ACIDE (Aligned Corpora Integrated Development Environment) (Utvić et al., 2007). The TMX document consists of TU (Translation Unit) and TUV (Translation Unit Variant) elements, where each TUV is a segment in one of the languages. The following example illustrates a single aligned segment (TU) ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceeding of 10th International Forum on Knowledge Asset Dynamics — IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons

Mihailo Škorić (2017)

The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations in the existing framework of research and tests their effectiveness. It also presents a software tool especially made for that purpose, explains how it builds a database with sentimental value of terms and offers the user manual. Finally, it presents a software tool that tests the new database and gives some examples ...

data mining, information extraction, emotions, text on the web

... the system would be language-independent as well. If it turns out to be valid, this method could allow machine learning to make use of huge corpora of texts that are pre-labeled with determiners. 1.1 Review of former similar studies. In 2005 a series of experiments with the classification of mood ...
... The main idea of this experiment is to prove that it is possible to: – build an inverted index of terms in a language-neutral way using a corpus of texts that contain known determiners. – automatically assign values to terms on positive-negative scale using those determiners, so that specific values ...
... successful and its final outcome satisfactory, three prerequisites should be met: – collected corpus must be organized in a certain way; – collected texts and messages must contain determiners that would help assign a value to a nearby term; – determiners must have a predetermined value. In the following ...
Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
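The snippets list the prerequisites: texts must contain determiners (here, emoticons) with predetermined values, and those values are used to assign a score to nearby terms. A minimal language-neutral sketch follows; the emoticon values are hypothetical, and "nearby" is simplified to "anywhere in the same message", so this illustrates the idea rather than reproducing the tool described in the paper.

```python
import re
from collections import defaultdict

# Hypothetical determiners (emoticons) with predetermined values.
EMOTICON_VALUES = {":)": 1.0, ":-)": 1.0, ":D": 1.0, ":(": -1.0, ":-(": -1.0}

# Emoticon-like tokens first, then ordinary words; \w is Unicode-aware,
# so no language-specific processing is involved.
TOKEN = re.compile(r":-?[()D]|\w+")


def term_polarity(messages):
    """Average, per term, the emoticon score of the messages the term occurs in."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for message in messages:
        tokens = TOKEN.findall(message)
        values = [EMOTICON_VALUES[t] for t in tokens if t in EMOTICON_VALUES]
        if not values:
            continue  # no determiner: the message cannot label its terms
        score = sum(values) / len(values)
        terms = {t.lower() for t in tokens if not t.startswith(":")}
        for term in terms:
            totals[term] += score
            counts[term] += 1
    return {term: totals[term] / counts[term] for term in totals}


polarity = term_polarity([
    "great movie :)",
    "boring movie :(",
    "great day :D",
    "no emoticon here",
])
print(sorted(polarity.items()))
```

A term that appears with both positive and negative emoticons ("movie" here) drifts toward 0, which is the intended behaviour of averaging over a large pre-labeled corpus.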
Terminological and lexical resources used to provide open multilingual educational resources

Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić (2016)

Open educational resources (OER) within the BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of the Western Balkans, Russian and English. The University of Belgrade (UB) hosts a central repository based on: the BAEKTEL Metadata Portal (BMP), a terminological web application for managing, browsing and searching terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources, and an OER repository on a local edX ...

open educational resources, lexical resources, natural language processing, terminology

... Internet. This component consists of morphological dictionaries, WordNet, domain specific terminological resources such as GeolISSterm, RudOnto, aligned texts in TMX format, corpora etc. Special attention will be given to Termi, newly developed application for terminology management. Keywords: Open ...
... development in scientific research, which constantly produces new terms that need to be translated into other languages. There is a huge amount of text available on the Internet, growing daily, that needs to be translated for different purposes while paying attention to terminology ...
Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian

Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019)

In this paper we present a model for selecting good examples for a dictionary of the Serbian language and the development of the initial components of the model. The method used is based on a detailed analysis of various lexical and syntactic features in a corpus composed of examples from five digitized volumes of the SASA Dictionary. The initial feature set was inspired by a similar approach for other languages. The distribution of features of examples from this corpus is compared with the feature distribution of sentence samples extracted from corpora comprising various texts. The analysis showed that ...

Serbian, good dictionary examples, automation of dictionary production, feature extraction, machine learning

... distribution of examples from this corpus is compared with the feature distribution of sentence samples extracted from corpora comprising various texts. The analysis showed that there is a group of features which are strong indicators that a sentence should not be used as an example. The remaining ...
... dictionary is conceived as a thesaurus, meant primarily for native speakers. Its primary goal is to help in understanding words from different kinds of texts (receptive use of the dictionary). It covers a large portion of the vocabulary of the Serbian language, standard and vernacular, for the last 200 years ...
... from the beginning of the 19th century to the present day, as well as about 300 word collections (for details see Stanković et al., 2018). Written texts, as well as word collections, come from what used to be the SC language territory. According to the Style Guide, lexicographers have to choose two ...
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, Lexical Computing CZ, s.r.o. (2019)
Multi-word Expressions for Abusive Speech Detection in Serbian

Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev (2020)

This paper presents research on refining and improving the Serbian version of Hurtlex, a multilingual lexicon of offensive words. We pay particular attention to adding multi-word expressions (polylexemic units) that can be considered offensive, since such lexical entries are very important for achieving good results in many abusive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and lexicon creation. The connection with other lexical and semantic resources for Serbian is highlighted, and the construction of a system for ...

abusive speech, hate speech, lexical resources, multilingual lexicon, multi-word expressions

... some resources that will facilitate abusive language detection already exist. Serbian Morphological Dictionaries are certainly a staple in processing texts in Serbian (Krstev, 2008). In order to process implicitly abusive language, we need to take into account the usage of non-literal language, the rhetorical ...
... words in each category. 4.2 Lexical Representation of Multi-Word Abusive Expressions In order to enable the detection of abusive language in Serbian texts it is necessary to represent in a lexicon both simple- and multi-word abusive expressions. Lexical representation should address various aspects of ...
... complemented with finite-state automata (FSA) that deal with word order, model complements, etc. and that are used to retrieve verbal expressions in texts. So far three classes of V N were modelled, covering 68 verbal MWEs. This approach enables formulation of elaborate retrieval queries, similar to ...
Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
Terminology Acquisition and Description Using Lexical Resources and Local Grammars

Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić (2015)

Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...

... conversion of the lexicon. Przepiórkowski and associates (2007) present results of automatic extraction of term definitions from unstructured texts in Bulgarian, Czech and Polish by use of regular grammars. There are also combinations of the two approaches (Rodríguez et al., 2007). Sag et al ...
... the corresponding DELAS word is assigned to the lemma. 4. For thresholds of 80 and less, only steps 1 and 2 are repeated. From a sample of domain texts and dictionaries we manually filtered 623 new terms from domains of mining, geology and e-learning and applied the described procedure for FST class ...
... transporting device measured from the vertical excavator rotation axis to the front edge of the caterpillar”. 4.2 Extraction of MWUs from domain texts The extraction of MWUs from a text is preceded by the retrieval of new simple word terms from it and their incorporation in the existing system ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities

Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov (2024)

This paper presents activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a word sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can be used as a multilingual benchmark for evaluating European languages with less developed and moderately developed resources. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and comparison with the annotations in other languages. It discusses the first steps ...

multiword units, named entity, word sense ambiguity, sense repository, LLOD

Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović (2017)

Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...

... normalization for logarithm of tflog (the log-number of times the given word appears in a document) for calculating semantic similarity of short texts. Graovac [6] applies lexical resources for ...
... retrieved documents. As we have already pointed out, a Serbian keyword in a search query is almost always entered in the nominative singular, while in the texts that are searched it can occur in different inflectional forms. Thus, for languages such as Serbian, some kind of normalization of morphological forms ...
... in [13] showed that the F-measure of recognition was 0.96 for types and 0.92 for tokens. (Tokens are all occurrences, in this case of NEs, in a given text; types are distinct occurrences.) Table 2 ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
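The snippet reports recognition F-measure separately for types and tokens. A minimal sketch of the difference follows, using invented annotations and an exact-match criterion (both assumptions; the reported 0.96 and 0.92 come from the cited evaluation, not from this code): token scores count every occurrence, type scores count each distinct named entity once.

```python
def f_measure(n_correct, n_predicted, n_gold):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def token_f(gold, predicted):
    """Every occurrence counts: items are (offset, text) pairs."""
    gold, predicted = set(gold), set(predicted)
    return f_measure(len(gold & predicted), len(predicted), len(gold))


def type_f(gold, predicted):
    """Only distinct surface strings count, however often they occur."""
    gold_types = {text for _, text in gold}
    pred_types = {text for _, text in predicted}
    return f_measure(len(gold_types & pred_types), len(pred_types), len(gold_types))


# Hypothetical annotations: (character offset, named entity).
gold = [(0, "Beograd"), (12, "Srbija"), (30, "Beograd"), (45, "Bor")]
predicted = [(0, "Beograd"), (30, "Beograd"), (45, "Bor"), (60, "Sava")]
print(round(token_f(gold, predicted), 3), round(type_f(gold, predicted), 3))  # prints: 0.75 0.667
```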
Indexing of textual databases based on lexical resources: A case study for Serbian

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović (2015)

In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...

... singular). However, a large number of other forms cannot be found by scanning the text; for example, the form zlata (genitive singular) cannot be matched with the query keyword zlato (nominative singular). The disadvantage of the system based on text scanning which affects the precision is especially ...
... problems of full-text search in Serbian is its rich morphology: the search keyword is always entered in the nominative singular, while in the texts that are searched it can occur in different inflectional forms. For languages such as Serbian, some kind of normalization of morphological forms has ...
... text from several records and fields in the database related to a particular document or project; 2. Lemmatizing and Part-Of-Speech tagging of all texts Di, where i = 1, . . . N and N is the size of text collection; 3. Recognizing NEs and assigning the chosen types to documents; 4. Selecting ungrammatical ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
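The snippets explain why plain text scanning misses inflected forms (zlata vs. zlato) and list lemmatization as a step before indexing. A minimal sketch of a lemma-based inverted index follows; the tiny form-to-lemma table is a hypothetical stand-in for the Serbian morphological dictionaries, and the sketch omits the named entity recognition step, so it is not the authors' implementation.

```python
import re
from collections import defaultdict

# Hypothetical fragment of a morphological dictionary: inflected form -> lemma.
FORM_TO_LEMMA = {
    "zlato": "zlato", "zlata": "zlato", "zlatu": "zlato", "zlatom": "zlato",
    "bakar": "bakar", "bakra": "bakar", "bakru": "bakar",
    "ležište": "ležište", "ležišta": "ležište",
}


def lemmatize(word):
    word = word.lower()
    return FORM_TO_LEMMA.get(word, word)  # unknown forms fall back to themselves


def build_index(documents):
    """Bag-of-lemmas inverted index: lemma -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text):
            index[lemmatize(word)].add(doc_id)
    return index


def search(index, keyword):
    # The query keyword is normalized exactly like the indexed text.
    return sorted(index.get(lemmatize(keyword), set()))


documents = {
    "D1": "Istraživanje ležišta zlata kod Bora.",
    "D2": "Prerada bakra i zlata.",
    "D3": "Geološka karta Srbije.",
}
index = build_index(documents)
print(search(index, "zlato"))                             # ['D1', 'D2']
print([d for d, t in documents.items() if "zlato" in t])  # [] with plain scanning
```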
Integracija heterogenih tekstualnih resursa (Integration of Heterogeneous Textual Resources)

Ranka Stanković, Ivan Obradović (2007)

The paper describes an approach to integrating heterogeneous textual resources for the Serbian language with the help of a complex software tool developed specifically for this purpose. The structure and basic components of the developed system are described. The possibilities for improving the resources through the mutual exchange of information offered by the developed integrated environment are also presented. Finally, the paper describes how integrated heterogeneous resources can be applied to query expansion, as well as to text search in general, and outlines some directions for further development.

... processing of texts, namely resource combining, in particular the combining of morphological information from the dictionaries and semantic information from the wordnet. Finally, we explain how integrated heterogeneous resources can be used for query expansion, as well as for searching texts in general ...
... the system we developed under the name of WS4LR (WorkStation for Lexical Resources), which synchronously handles corpora of Serbian, multilingual aligned corpora, a system of morphological dictionaries for Serbian, the Serbian wordnet and the multilingual ontology of proper names Prolex. We describe ...
... Journal on Information Science and Technology. Bucureşti: … of the Romanian Academy. Vitas et al. 2003 – Vitas, D. et al. (2003): Processing Serbian Written Texts: An Overview of Resources and Basic Tools. In: … (eds.): Proceedings of the International Workshop on Balkan Language Resources and Tools. Thessaloniki, November 2003 ...
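Ranka Stanković, Ivan Obradović. "Integracija heterogenih tekstualnih resursa" [Integration of Heterogeneous Textual Resources] in Zbornik radova međunarodnog simpozijuma Razlike između bosanskog/bošnjačkog, hrvatskog i srpskog jezika [Proceedings of the International Symposium on the Differences between Bosnian/Bosniak, Croatian and Serbian], Graz, Austria, April 2007, - (2007)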
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration

Cvetana Krstev, Ranka Stanković, Duško Vitas (2010)

In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of a support for some types of categories. At the same time, various descriptions often describe exactly the same ...

Morphology, Lexicon, lexical database, Standards for LRs

... Serbian. In Informatica, No. 28, pp. 431-436, The Slovene Society Informatika, Ljubljana. Krstev, C. (2008). Processing of Serbian – Automata, Texts and Electronic dictionaries. Faculty of Philology, University of Belgrade, Belgrade. Krstev, C. and Vitas, D. (2009) An Effective Method for Developing ...
... 373-376. Paumier, S. (2008). Unitex 2.1 User Manual, http://www-igm.univ-mlv.fr/~unitex/UnitexManual2.1.pdf. Popović, Z. (2009) Taggers Applied On Texts On Serbian Language, Language Tools And Machine Learning. In Infotheca, Vol. X, No. 2, (to appear). Przepiórkowski, A. and Woliński, M. (2003) A ...
Cvetana Krstev, Ranka Stanković, Duško Vitas. "A Description of Morphological Features of Serbian: a Revision using Feature System Declaration" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valetta, Malta : European Language Resources Association (2010)
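Увођење доменских и семантичких маркера за област рударства у српске електронске речнике (Introducing Domain and Semantic Markers for the Field of Mining into Serbian Electronic Dictionaries)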

Иван Обрадовић, Александра Томашевић, Ранка Станковић, Биљана Лазић (2017)

... Jurafsky & James H. Martin, Speech and Language Processing, Draft of November 7, 2016. Krstev 2008: Cvetana Krstev, Processing of Serbian – Automata, Texts and Electronic dictionaries, Faculty of Philology, University of Belgrade, Belgrade. Krstev et al., 2008: Cvetana Krstev, Duško Vitas, Gordana Pav ...
... retrieval and extraction, and proposes an expansion of the set of these markers for the field of mining. A brief description of the developed corpus of texts from the field of mining is also given; the proposed markers are particularly important for searching this corpus. ...
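Иван Обрадовић, Александра Томашевић, Ранка Станковић, Биљана Лазић. "Увођење доменских и семантичких маркера за област рударства у српске електронске речнике" [Introducing Domain and Semantic Markers for the Field of Mining into Serbian Electronic Dictionaries] in Научни састанак слависта у Вукове дане - Српски језик и његови ресурси: теорија, опис и примене [Scientific Meeting of Slavists in Vuk's Days - The Serbian Language and Its Resources: Theory, Description and Applications], Belgrade : International Slavic Center at the Faculty of Philology, Faculty of Philology (2017). https://doi.org/10.18485/msc.2017.46.3.ch10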
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age

Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012)

Serbian language

... among Serbian companies. e first corpus of contemporary Serbian, an electronic morphological dictionary of Serbian, aligned French- Serbian and English-Serbian corpora of literary texts, as well as different soware tools were developed in the scope of joint projects of the Faculty of Mathematics and ...
... the electronic dictionary of simple words was finalised, the development of a dictionary of compounds was initiated. Aligned French-Serbian andEnglish-Serbian corpora of literary texts were devel- oped, as well as local grammars for certain segments of Serbian (especially for named entities). Different ...
... countries aswell, but also due to the fact that in harmonisation with the European Union the source texts used are texts in English. ‚ euse of the Latin alphabet is increasing (except in official texts). ‚ Texts in Serbian are increasingly realised in digital form (use of computers, electronic publishing ...
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)

Improvement of geodatabase queries within GeolISS

Transformer-Based Composite Language Models for Text Evaluation and Classification

Parallel Stylometric Document Embeddings with Deep Learning Based Language Models in Literary Authorship Attribution

Towards translation of educational resources using GIZA++

Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++

A Data Driven Approach for Raw Material Terminology

Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model

A bilingual digital library for academic and entrepreneurial knowledge management

Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons

Terminological and lexical resources used to provide open multilingual educational resources

SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian

Multi-word Expressions for Abusive Speech Detection in Serbian

Terminology Acquisition and Description Using Lexical Resources and Local Grammars

Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities

Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources

Indexing of textual databases based on lexical resources: A case study for Serbian

Integracija heterogenih tekstualnih resursa [Integration of Heterogeneous Textual Resources]

A Description of Morphological Features of Serbian: a Revision using Feature System Declaration

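Увођење доменских и семантичких маркера за област рударства у српске електронске речнике [Introducing Domain and Semantic Markers for the Field of Mining into Serbian Electronic Dictionaries]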

Српски језик у дигиталном добу -- The Serbian Language in the Digital Age