Search
108 items
-
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian (SrpKor) and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the data obtained from SrpKor and local grammars assists in making a decision between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while the precision (average 98.93%) is always higher than the recall (average 94.94%)....
Digital Repository of the Faculty of Mining and Geology, University of Belgrade [DR RGF]
Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
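The two-stage idea in the abstract above (a dictionary proposes candidates, corpus data chooses among them) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `VARIANTS` table, the function names and the toy lexicon and frequencies are all invented for illustration, and a plain frequency count stands in for the SrpKor data and local grammars that the paper relies on.

```python
from itertools import product

# Hypothetical restoration table: each degraded letter maps to the letters it may stand for.
# Digraphs such as "dj" for "đ" are left out to keep the sketch short.
VARIANTS = {"c": "cčć", "s": "sš", "z": "zž"}


def candidates(word, lexicon):
    """Return every diacritic variant of `word` that is attested in `lexicon`."""
    options = [VARIANTS.get(ch, ch) for ch in word]
    return [w for w in map("".join, product(*options)) if w in lexicon]


def restore(word, lexicon, freq):
    """Pick the most frequent attested variant; leave unknown words unchanged."""
    found = candidates(word, lexicon)
    if not found:
        return word
    return max(found, key=lambda w: freq.get(w, 0))


# Toy data (invented for illustration, not taken from the paper's resources).
LEXICON = {"kuca", "kuća", "šta"}
FREQ = {"kuća": 100, "kuca": 10, "šta": 50}

print(restore("kuca", LEXICON, FREQ))
print(restore("sta", LEXICON, FREQ))
```

With this toy data the ambiguous form "kuca" has two attested candidates and the more frequent one is chosen, while "sta" has a single candidate. The sketch handles lowercase input only.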
-
Resource-based WordNet Augmentation and Enrichment
In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...
... bg.ac.rs Miljana Mladenović College for Preschool Teachers Bujanovac, Serbia ml.miljana@gmail.com Cvetana Krstev and Marko Vitas Faculty of Philology University of Belgrade, Serbia cvetana@matf.bg.ac.rs vitas.marko@gmail.com Abstract In this paper we present an approach to support production ...
... engleski. Narodna biblioteka Srbije. Krstev, C., Stanković, R., Vitas, D., and Obradović, I. (2006). WS4LR: A Workstation for Lexical Resources. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC 2006, pages 1692–1697. Krstev, C., Stanković, R., and Vitas, ...
Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...
... acquisition and description using lexical resources and local grammars Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić (all University of Belgrade) cvetana@matf.bg.ac.rs ranka@rgf.bg.ac.rs ivan.obradovic@rgf.bg ...
... a considerable size: they have about 135,000 lemmas generating more than 5 million forms and 13,000 compound lemmas, that is, multi-word units (Krstev, 2008). The number of simple lemmas by Part-Of-Speech (POS) is depicted in Figure 2 (left). Figure 2: Statistics of lemmas and inflectional ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...
... of multilingual digital libraries of e-journals Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić University of Belgrade, Serbia Studentski trg 1, 11000 Belgrade E-mail: ranka@rgf.bg.ac.rs, cvetana@matf.bg.ac.rs, ivano@rgf.bg.ac.rs, aleksandra@unilib.bg.ac.rs, misko@matf ...
... submitted to our collection of documents. The most important resources are Serbian morphological dictionaries of simple words and multi-word units [Krstev, 2008]. These comprehensive resources were developed and are being mainly used within two corpus processing systems: Unitex and Nooj. However ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
-
Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection
Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Duško Vitas, Mihailo Škorić, Milica Ikonić Nešić (2022)
In this paper we present the Serbian part of the ELTeC multilingual corpus of novels written in the time period 1840-1920. The corpus is being built in order to test various distant reading methods and tools with the aim of re-thinking the European literary history. We present the various steps that led to the production of the Serbian sub-collection: the novel selection and retrieval, text preparation, structural annotation, POS-tagging, lemmatization and named entity recognition. The Serbian sub-collection was published ...
Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Duško Vitas, Mihailo Škorić, Milica Ikonić Nešić. "Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection" in Proceedings of the Language Resources and Evaluation Conference, June 2022, Marseille, France, European Language Resources Association (2022)
-
On the compatibility of lexical resources for NooJ
Lexical resources for many languages are provided for the NooJ linguistic development environment. Meta-data descriptions of morphosyntactic and semantic properties of these languages and their resources are a mandatory part of each language module. In this paper we analyze how well the meta-data actually describe resources for a chosen subset of languages and to what extent they are compatible across languages to support multilingual processing. We show that there is room for improvement in both directions....
... the 2011 International NooJ Conference 1 ON THE COMPATIBILITY OF LEXICAL RESOURCES FOR NOOJ RANKA STANKOVIĆ, MILOŠ UTVIĆ, DUŠKO VITAS, CVETANA KRSTEV AND IVAN OBRADOVIĆ Abstract Lexical resources for many languages are provided for the NooJ linguistic development environment. Meta-data ...
... Proceedings of the Fourth International Conference on Language Resources and Evaluation, LREC'10, Paris: ELRA. (http://nl.ijs.si/ME/V4/) Erjavec, T., Krstev, C., Petkevič, V., Simov, K., Tadić, M., Vitas, D., 2003, “The MULTEXT-East Morphosyntactic Specifications for Slavic Languages”, in Proc. of the ...
Ranka Stanković, Miloš Utvić, Duško Vitas, Cvetana Krstev, Ivan Obradović. "On the compatibility of lexical resources for NooJ" in Automatic Processing of Various Levels of Linguistic Phenomena: Selected Papers from the 2011 International NooJ Conference, Cambridge Scholars Publishing (2012): 96-108
-
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach ...
... Stanković1, Rada Stijović2, Duško Vitas1, Cvetana Krstev1, Olga Sabo2 1University of Belgrade, 2Institute for Serbian Language, Serbian Academy of Sciences and Arts E-mail: ranka.stankovic@rgf.bg.ac.rs, rada.stijovic@isj.sanu.ac.rs, vitas@matf.bg.ac.rs, cvetana@matf.bg.ac.rs, olga011@yahoo.com Abstract ...
... in recent years were these ideas revitalized, and various possibilities of updating the work on this vocabulary have since been considered (Vitas & Krstev, 2015; Ivanović et al. 2016). The digitization (which is also the topic of the present paper) of the published volumes and raw materials (lexicographic ...
Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
-
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language-dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ...
... and Sarcasm Classification Full Paper Miljana Mladenović Milenijum III Vranje, Serbia ml.miljana@gmail.com Cvetana Krstev University of Belgrade, Faculty of Philology Belgrade, Serbia cvetana@matf.bg.ac.rs Jelena Mitrović University of Passau, Faculty of Computer Science and Mathematics Passau, Germany ...
... processing; Lexical semantics; KEYWORDS Computational irony, Verbal irony, Verbal Sarcasm, WordNet ACM Reference format: Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, and Ranka Stanković. 2017. Using Lexical Resources for Irony and Sarcasm Classification. In Proceedings of BCI ’17, Skopje, Macedonia ...
... of the ACL: Human Language Technologies: short papers – Volume 2. Association for Computational Linguistics, 564–568. [10] Matthieu Constant, Cvetana Krstev, and Duško Vitas. 2015. Hybrid Lexical Tagging in Serbian. In Proc. of 7th Language & Technology Conference. Fundacja Uniwersytetu im. A. Mickiewicza ...
Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA : ACM (2017). https://doi.org/
-
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...
electronic dictionary, Serbian, morphology, inflection, multiword units, noun phrases, query expansion
... publications. - The Repository is available at: www.dr.rgf.bg.ac.rs Automatic Construction of a Morphological Dictionary of Multi-Word Units Cvetana Krstev1, Ranka Stanković2, Ivan Obradović2, Duško Vitas3, and Miloš Utvić1 1 Faculty of Philology, University of Belgrade 2 Faculty of Mining ...
... produced need not be revised. References 1. Courtois, B., Silberztein, M.: Dictionnaires électroniques du français. Larousse, Paris (1990) 2. Krstev, C.: Processing of Serbian - Automata, Texts and Electronic Dictionaries. Faculty of Philology, University of Belgrade, Belgrade (2008) 3. Savary ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
-
Integrisanje heterogenih leksičkih resursa [Integrating Heterogeneous Lexical Resources]
The main activity of the Natural Language Processing Group at the Faculty of Mathematics, University of Belgrade, is directed at developing various resources for processing Serbian. Particularly important among them are the system of morphological dictionaries of Serbian developed within the RELEX network [1] and the semantic network (of the wordnet type) for Serbian developed within the international project Balkanet. These are two heterogeneous lexical resources, developed on the basis of entirely different models, which therefore contain different kinds of lexical information. By integrating these resources, the information ...
... available at: www.dr.rgf.bg.ac.rs Integrisanje heterogenih leksičkih resursa Ranka Stanković, Faculty of Mining and Geology, Belgrade Cvetana Krstev, Faculty of Philology, Belgrade Duško Vitas, Faculty of Mathematics, Belgrade Ivan Obradović, Faculty of Mining and Geology, Belgrade Gordana Pavl ...
... [5] Vossen, P. (ed.) (1998). EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Dordrecht: Kluwer Academic Publishers. [6] Krstev C., et al. (2004) Combining Heterogeneous Lexical Resources, Proceedings of LREC2004, 4th International Conference On Language Resources And Evaluation ...
Ranka Stanković, Cvetana Krstev, Duško Vitas, Ivan Obradović, Gordana Pavlović-Lažetić. "Integrisanje heterogenih leksičkih resursa" in Festivalski katalog 11. Festivala informatičkih dostignuća INFOFEST 2004, 26th September - 2nd October, 2004, Budva, Montenegro, INFOFEST (2004)
-
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...
... Address Belgrade, Serbia Email ranka@rgf.bg.ac.rs Author Family Name Krstev Particle Given Name Cvetana Prefix Suffix Division Faculty of Philology Organization University of Belgrade Address Belgrade, Serbia Email cvetana@matf.bg.ac.rs Author Family Name Obradović Particle Given Name Ivan Prefix ...
... Ranka Stanković1(B), Cvetana Krstev2, Ivan Obradović1, and Olivera Kitanović1 1 Faculty of Mining and Geology, University of Belgrade, Belgrade, Serbia {ranka,ivan.obradovic,olivera.kitanovic}@rgf.bg.ac.rs 2 Faculty of Philology, University of Belgrade, Belgrade, Serbia cvetana@matf.bg.ac.rs Abstract ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
-
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019)
In this paper we present a model for selecting good examples for a dictionary of Serbian and the development of the model's initial components. The method used is based on a detailed analysis of various lexical and syntactic features in a corpus composed of examples from five digitized volumes of the SASA Dictionary. The initial set of features was inspired by similar approaches for other languages. The distribution of features of the examples from this corpus is compared with the feature distribution of sample sentences excerpted from corpora containing various texts. The analysis showed that ...
Serbian, good dictionary examples, automation of dictionary production, feature extraction, machine learning
... Stijović2, Cvetana Krstev1, Duško Vitas1, Aleksandra Marković2 1 University of Belgrade, Studentski trg 1, Belgrade, Serbia 2 Institute for Serbian Language, SASA, Knez Mihailova 36, Belgrade, Serbia E-mail: ranka@rgf.rs, branislava.sandrih@fil.bg.ac.rs, rada.stijovic@isj.sanu.ac.rs, cvetana@matf.bg ...
... University of Belgrade). Vitas D. & Krstev C. (2015). Blueprint for the computerized dictionary of the Serbian language [Nacrt za informatizovani rečnik srpskog jezika]. Naučni sastanak slavista u Vukove dane, 44(3), pp. 105–116. (In Serbian, Cyrillic.) Vitas, D. & Krstev, C. (2012). Processing of Corpora ...
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, Lexical Computing CZ, s.r.o. (2019)
-
Combining Heterogeneous Lexical Resources
... publications. - The Repository is available at: www.dr.rgf.bg.ac.rs Combining Heterogeneous Lexical Resources Cvetana Krstev, professor, Faculty of Philology, Belgrade, cvetana@matf.bg.ac.yu Duško Vitas, professor, Faculty of Mathematics, Belgrade, vitas@matf.bg.ac.yu Ranka Stanković, assistant ...
Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, ELRA - European Language Resources Association (2004)
-
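Увођење доменских и семантичких маркера за област рударства у српске електронске речнике [Introducing Domain and Semantic Markers for the Mining Domain into Serbian Electronic Dictionaries]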
... and Language Processing, Draft of November 7, 2016. Krstev 2008: Cvetana Krstev, Processing of Serbian – Automata, Texts and Electronic dictionaries Faculty of Philology, University of Belgrade, Belgrade. Krstev et al., 2008: Cvetana Krstev, Duško Vitas, Gordana Pavlović-Lažetić, “Resources and ...
... Peter Lang: Frankfurt am Main, pp. 3–17. Krstev et al., 2013: Cvetana Krstev, Ivan Obradović, Miloš Utvić, Duško Vitas, “A system for named entity recognition based on local grammars”, In: J Logic Computation 24 (2), pp. 473–489. Krstev/Lazić, 2015: Cvetana Krstev, Biljana Lazić, „Глаголи у кухињи и за ...
... за столом”, Научни састанак слависта у Вукове дане, 44/3, 117–136. Krstev et al. 2015: Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić “Terminology Acquisition and Description Using Lexical Resources and Local Grammars”, In: Proc. of the 11th Conference on Terminology and Artificial ...
Ivan Obradović, Aleksandra Tomašević, Ranka Stanković, Biljana Lazić. "Увођење доменских и семантичких маркера за област рударства у српске електронске речнике" in Научни састанак слависта у Вукове дане - Српски језик и његови ресурси: теорија, опис и примене, Belgrade : International Slavic Centre at the Faculty of Philology, Faculty of Philology (2017). https://doi.org/10.18485/msc.2017.46.3.ch10
-
Bilingual lexical extraction based on word alignment for improving corpus search
Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
-
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis
This paper presents a model that enables the collection, preparation, metadata description, management and exploitation, including full-text search, of documents from the domain of criminalistics written in Serbian. The proposed approach is applied in a web portal that gathers various texts originating from the journals of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, the “Tara” and “Reiss” conferences, as well as from some doctoral dissertations related to this field of research. After text processing, a corpus containing over 5500 pages of plain text was created and ...
... Cardiff 9 Cvetana Krstev, Duško Vitas, “Corpus and Lexicon - Mutual Incompletness”, in Proceedings of the Corpus Linguistics Conference, 14-17 July 2005, Birmingham, eds. Pernilla Danielsson and Martijn Wagenmakers, ISSN 1747-9398, http://www.corpus.bham.ac.uk/PCLC/, 2005 10 Cvetana Krstev, Ranka Stanković ...
... resources and from the other with Omeka KPA digital library. 15 Cvetana Krstev. Processing of Serbian – Automata, Text and Electronic Dictionaries, Faculty of philology, Belgrade, 2008 16 Duško Vitas, Cvetana Krstev, Ivan Obradović, Ljubomir Popović, Gordana Pavlović-Lažetić”, An Processing ...
... Analysis as a next step in a study of forensic texts. REFERENCES 1. Cvetana Krstev. Processing of Serbian – Automata, Text and Electronic Dictionaries, Faculty of philology, Belgrade, 2008. 2. Cvetana Krstev, Duško Vitas, “Corpus and Lexicon - Mutual Incompletness”, in Proceedings of the ...
Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
-
Managing mining project documentation using human language technology
Purpose: This paper aims to develop a system, which would enable efficient management and exploitation of documentation in electronic form, related to mining projects, with information retrieval and information extraction (IE) features, using various language resources and natural language processing. Design/methodology/approach: The system is designed to integrate textual, lexical, semantic and terminological resources, enabling advanced document search and extraction of information. These resources are integrated with a set of Web services and applications, for different user profiles and use-cases. Findings: The ...
Digital libraries, Information retrieval, Data mining, Human language technologies, Project documentation
Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja. "Managing mining project documentation using human language technology" in The Electronic Library (2018). https://doi.org/10.1108/EL-11-2017-0239
-
Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking
The paper presents the results of research related to the preparation of parallel corpora, focusing on their transformation into RDF graphs using the NLP Interchange Format (NIF) for linguistic annotation. We give an overview of the parallel corpus used in this case study, as well as the process of part-of-speech tagging, lemmatization and named entity recognition (NER). We then describe named entity linking (NEL), the conversion of the data to RDF, and the inclusion of NIF annotations. The produced NIF files were evaluated by exploring the triplestore using SPARQL queries. Finally, the linking of Linked ...
parallel corpora, named entity linking, named entity recognition, NER, NEL, linked data, NIF, Wikidata
Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović. "Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
-
Претрага корпуса заснована на употреби екстерних лексичких ресурса путем веб-сервиса [Corpus Search Based on the Use of External Lexical Resources via Web Services]
The paper discusses a hybrid approach to corpus search, illustrated on the tools OCWB and NoSketch Engine, applied to a special corpus from the mining domain (RudKor) and the Corpus of Contemporary Serbian (SrpKor). The approach combines the existing capabilities of OCWB and NoSketch Engine, which base their search on the linguistic annotation of the corpus, with new search capabilities in the form of consulting external language resources (morphological electronic dictionaries of Serbian and the lexical database Serbian WordNet). The hybrid approach is implemented by upgrading the web interfaces that these tools use ...
... CQP_ Tutorial. pdf Krstev et al. 2004: Cvetana Krstev, Gordana Pavlović-Lažetić, Duško Vitas and Ivan Obradović, “Using Textual and Lexical Resources in Developing Serbian Wordnet”, Romanian Journal of Information Science and Technology, 7/1–2, 147–161. Krstev 2008: Cvetana Krstev, Processing of Serbian ...
... for annotation, was created as a derivative of the system of morphological electronic dictionaries of Serbian (hereinafter: SMR), authored by Cvetana Krstev and Duško Vitas (Krstev 2008). Partial morphological annotation in the SrpKor2013 corpus is implemented through the positional attributes pos (part-of-speech tag) and lemma (lemma) ...
... Serbian – Automata, Texts and Electronic Dictionaries. Belgrade: Faculty of Philology, University of Belgrade. Krstev et al. 2018: Cvetana Krstev, Ranka Stanković, Duško Vitas, “Knowledge and Rule-Based Diacritic Restoration in Serbian”, In: Proceedings of the Third International Conference Computational ...
Miloš Utvić, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Biljana Lazić. "Претрага корпуса заснована на употреби екстерних лексичких ресурса путем веб-сервиса" in Научни састанак слависта у Вукове дане - Vol. 48/3 Српски језик и његови ресурси, International Slavic Centre, Faculty of Philology, University of Belgrade (2019). https://doi.org/10.18485/msc.2019.48.3.ch12
-
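Речник САНУ као база терминолошких речника (на примеру речника кулинарства) [The SASA Dictionary as a Basis for Terminological Dictionaries (on the Example of a Culinary Dictionary)]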
... Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić and Aleksandra Trtovac. „Rule based automatic multi-word term extraction and lemmatization.” In: 10th edition of the Language Resources and Evaluation Conference (LREC), 23-28 May 2016, Portorož. 7. Ranka Stanković, Ivan Obradović, Cvetana Krstev ...
... while electronic dictionaries are used in language research and in the creation of language tools. The morphological dictionaries of Serbian were developed by Prof. Dr. Cvetana Krstev and Prof. Dr. Duško Vitas with the help of the Language Technologies Group of the University of Belgrade. The analysis of the processed corpus included the extraction of words ...
... Катарина, Велики српски кувар. Нови Сад, 1904. Речник српскохрватског књижевног и народног језика. Београд, 1959–2014, I–XIX. References 1. Krstev, Cvetana, Duško Vitas and Gordana Pavlović-Lažetić. „Resources and methods in the morphosyntactic processing of Serbo-Croatian.” In Gerhild Zybatow et ...
Rada Stijović, Olga Sabo, Ranka Stanković. "Речник САНУ као база терминолошких речника (на примеру речника кулинарства)" in Словенска терминологија данас, Belgrade : Serbian Academy of Sciences and Arts (2017)