Search
2205 items
-
Softverski alati za korišćenje resursa za srpski jezik
Ivan Obradović, Ranka Stanković (2008)... manipulation of lexical resources (for the time being only the update and search of SWN is envisaged), and also offer information of the Human Language Technology Group and the developed software for lexical resources, namely WS4LR and WS4QE. Figure 10. The main menu of the WS4QE web application The ...
... ticular language systematized and organized in a specific manner, are developed in various formats. Thus, for example, several different types of e-dictionaries, along with other lexical and textual resources, are being developed within the Human Language Technology Group, which SOFTWARE TOOLS ...
... Human Language Technology Group, can be further developed and used with the help of the WS4LR software tool and the WS4QE web application. In Section Two we will give a brief description of lexical resources for Serbian, in Section Three the main functionalities of the WS4LR tool, and in Section ...Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021)The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ...raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data... Terminology and Artificial Intelligence, Granada, Spain, 4–6 November 2015; Volume 1495, pp. 81–89. 35. Stankovic, R.; Šandrih, B.; Krstev, C.; Utvić, M.; Škorić, M. Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian. In Proceedings of The 12th Language Resources ...
... but the same approach and developed software solutions can be used for other areas, which is certainly one of the further directions of activity. It should also be noted that the approach can be applied to other languages, depending on the available data and not on the language itself. The vast amount ...
... Utvić, M. Developing Termbases for Expert Terminology under the TBX Standard. In Natural Language Processing for Serbian-Resources and Applications, Proceedings of the 35th Anniversary of Computational Linguistics in Serbia, Belgrade, Serbia, 12 November 2013; Pavlović Lažetić, G., Vitas, D., Krstev ...Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
-
Frequency and Length of Syllables in Serbian
Marija Radojičić, Biljana Lazić, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Ján Mačutek, Lívia Leššová (2019)Basic analyses of several properties of syllables (the rank-frequency distribution, the distribution of length, and the relation between length and frequency) in Serbian is presented. The syllabification algorithm used combines the maximum onset principle and the sonority hierarchy. Results indicate that syllables behave similarly to words as far as mathematical models are concerned, but values of parameters in models for syllables are quite different from those for words.... (1996). A comparison of lexeme and speech syllables in Dutch. Journal of Quantitative Linguistics 3, 8-28. Schiller, N.O., Meyer, A.S., Levelt, W.J.M. (1997). The syllabic structure of spoken words: Evidence from the syllabification of intervocalic consonants. Language and Speech 40, 103-140. Stanojčić ...
... framework of quantitative linguistics in several other papers. However, borders between syllables were determined either using language-specific rules (Obradović et al., 2010, for Serbian; Meštrović et al, 2015, for Croatian), or using the approach suggested by Pulgram (1970) and modified by Lehfeldt (1971) ...
... word length. In: Grzybek, P. (ed.), Contributions to the Science of Text and Language: 117-156. Dordrecht: Springer. Best, K.-H. (2005). Wortlänge. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 260-273. Berlin, New York: de Gruyter. ...Marija Radojičić, Biljana Lazić, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Ján Mačutek, Lívia Leššová. "Frequency and Length of Syllables in Serbian" in Glottometrics (2019)
-
An Approach to Development of Bilingual Lexical Resources
... Ranka, Obradović Ivan, Trtovac Aleksandra | Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012 | 2012 | | http://dr.rgf.bg.ac.rs/s/repo/item/0001462 Digital Repository ...
... Conference on Language Resources and Evaluation (LREC), (Istanbul, Turkey, May 23-25, 2012). [7] Stanković, R., Obradović, I., Krstev, C., Vitas, D. 2011. Production of Morphological Dictionaries of Multi-Word Units Using a Multipurpose Tool. In: Proceedings of the Computational Linguistics-Applications ...
... collocations from documents and generates multiword keywords. The paper also outlines linguistic criteria used for building language resources for French, Italian, and German, and the use of multi-term descriptors as a means to better identify the content. The Human Language Technology group at the ...Stanković Ranka, Obradović Ivan, Trtovac Aleksandra. "An Approach to Development of Bilingual Lexical Resources" in Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012, Novi Sad : BCI (2012)
-
Using technology for knowledge transfer between academia and enterprises
Ivan Obradović, Ranka Stanković (2014)... thematic content, and entrepreneurial: case studies, best practice examples, expert presentations and software demonstrations. Language resources supporting the multilinguality of the platform, terminology and its search and browse functions are lexical and textual resources and grammars. Impl ...
... materials, video lectures, thematic content and the like, supported by evaluation tools, and entrepreneurial, such as case studies, best practice examples, expert presentations and software demonstrations; • Language resources – lexical and textual resources and grammars to support the multilinguality ...
... the University of Belgrade, Faculty of Mining and Geology (FMG), the Learning management system encompassing several specific content and learning management software tools, and Development tools, software to support the development, use, reuse and delivery of learning content; • Learning resources ...Ivan Obradović, Ranka Stanković. "Using technology for knowledge transfer between academia and enterprises" in Knowledge and Management Models for Sustainable Growth, Proc. of IFKAD 2014, 9th International Forum on Knowledge Asset Dynamics, 11-13 June 2013, Matera, Italy, Bari : IFKAD (2014)
-
Open Educational Resources in Serbia
... published four books and more than 100 scientific papers related to applications of artificial intelligence methods and tools, development of language resources and tools, open educational resources and e-learning application in HE. He participated in numerous domestic and international projects ...
... degree in 2000 and Ph.D. 2009 at the Department for Computer science, Faculty of Mathematics University of Belgrade. Professor Stanković is interested in e-learning, open education, semantic web, information systems, database modelling, geoinformation management and artificial intelligence. Her current ...
... incorporate knowledge from various language and lexical resources. She is head of Computer Centre for the Mining department, Chairman of Technical Committee A037 Terminology in Institute for Standardisation of Serbia and vice president of Language Resources and Technologies Society (JERTEH). She actively ...Ivan Obradović, Ranka Stanković, Marija Blagojević, Danijela Milošević. "Open Educational Resources in Serbia" in Current State of Open Educational Resources in the “Belt and Road” Countries, Springer Singapore (2020). https://doi.org/10.1007/978-981-15-3040-1_10
-
An Integrated Environment for Management and Exploitation of Linguistic Resources
Ranka Stanković, Ivan Obradović (2009)... ly used for management and exploitation of linguistic resources. Both the tools and the resources were developed within the University of Belgrade Human Language Technology Group. The tools we describe are WS4LR, a software tool that has been developed and used for solving different ...
... their volume and content, their maintenance and development became more and more complex. In addition to that, there was also a growing need to use several different resources for solving a particular task. Thus the development of a software tool for maintenance and integration of multi- ...
... “Improvement of Queries using a Rule Based Procedure for Inflection of Compounds and Phrases”, Polibits, Special section: Natural Language Processing, Journal of Research and Development in Computer Science and Engineering, ed. G. Sidorov (ed.), Centro Innovacion y Desarrollo Tecnologico ...Ranka Stanković, Ivan Obradović. "An Integrated Environment for Management and Exploitation of Linguistic Resources" in Proceedings of the International Multiconference on Computer Science and Information Technology, Computational Linguistics – Applications Workshop (CLA09), Mrągowo, Poland, October 2009, Piscataway : IEEE (2009)
-
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012)... Informatics, Language Technology Group, Univ. of Oslo: Stephan Oepen Пољска Poland Institute of Computer Science, Polish Academy of Sciences: Adam Przepiórkowski, Maciej Ogrodniczuk Univ. of Łódź: Barbara Lewandowska-Tomaszczyk, Piotr Pęzik Dept. of Computer Linguistics and Artificial Intelligence, Adam ...
... Sciences de l’Ingénieur and Institute for Multilingual and Multimedia Information: Joseph Mariani Evaluations and Language Resources Distribution Agency: Khalid Choukri Холандија Netherlands Utrecht Institute of Linguistics, Utrecht Univ.: Jan Odijk Computational Linguistics, Univ. of Groningen: Gertjan ...
... from the field of computational linguistics are present within computer science, electronics, library science, linguistics and psychology studies at the Universities of Belgrade and Novi Sad. Courses offered to students cover the basic concepts of natural language processing, but they aim to educate ...Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
-
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of a gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...... Constant, M., Krstev, C., and Vitas, D. (2018). Lexical analysis of serbian with conditional random fields and large-coverage finite-state resources. In Zygmunt Vetulani, et al., editors, Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2015. Lecture Notes in ...
... Building open-source textual analysis software compatible with the tei encoding scheme. In Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation, pages 389–398. Honnibal, M. and Montani, I. (2017). spaCy 2: Natural Language Understanding with Bloom Embeddings, ...
... May. European Language Resources Association (ELRA). Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., and McClosky, D. (2014). The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of 52nd annual meeting of the association for computational linguistics: system dem ...Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
-
A Knowledge-Based Approach to Mine Ventilation Planning in Yugoslav Mining Practice
Ventilation system analysis is a complex process based on the calculation and analysis of numerous parameters. These problems can be successfully solved by the SimVent numerical package, but a full understanding and use of the obtained results require the involvement of an experienced specialist in the ventilation field. The solution was found in the creation of a hybrid system INVENTS, whose knowledge base represents a formalization of the expert knowledge in the mine ventilation field. In this paper, we ...... parameters and when differences are identified, specific changes have to be made in the planning process. The software implementation of the approach relies on available mine ventilation software as well as on the possibilities offered by the coupling of various artificial intelligence (AI) methods ...
... planning and design where it interacts with CFD (Computational Fluid Dynamics) software. INVENTS is a complex structure composed from several integrated software packages: ResNet, SimVent and VENTEX (Fig. 3). These packages integrate both well known numerical optimization and various artificial intelligence ...
... 329–352. 11. E. Rich, Artificial Intelligence (McGraw-Hill, New York, 1983). 12. Z. Y. Yang, I. S. Lowndes and B. Denby, Genetic algorithm optimization of large UK coal mine ventilation network, Proc. 8th US Mine Ventilation Symp., 1998, 625-632. 13. Z. Y. Yang, I. S. Lowndes and B. Denby, Application ...Nikola Lilić, Ivan Obradović, Ranka Stanković. "A Knowledge-Based Approach to Mine Ventilation Planning in Yugoslav Mining Practice" in Mineral Resources Engineering 11 no. 4, Imperial College Press (2002): 361-382. https://doi.org/10.1142/S0950609802001014
-
An Italian-Serbian Sentence Aligned Parallel Literary Corpus
This article presents the construction and relevance of an Italian-Serbian sentence-aligned parallel corpus, delving into the aligned sentences in order to facilitate effective translation between the two languages. The parallel corpus serves as a valuable resource for language experts, researchers, and language enthusiasts, fostering a deeper understanding of linguistic nuances and cultural expressions. By bridging the gap between Serbian and Italian, this corpus opens new avenues for cross-cultural communication and collaboration, and ultimately contributes to the improvement of language-related ...Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić. "An Italian-Serbian Sentence Aligned Parallel Literary Corpus" in Review of the National Center for Digitization, Belgrade : Faculty of Mathematics, University of Belgrade (2023). https://doi.org/10.5281/zenodo.11203388
-
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encoding was used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in relational lexical database. This approach ...... model and software solution can be successfully used for the other volumes as well. Keywords: computer lexicography, lexical database, language resources, dictionary, Serbian language 1 Introduction The first volume of the Dictionary of the Serbo-Croatian Standard and Vernacular Language (referred ...
... Mapping of the lexical entry markers to LexInfo and TEI enabled export of the lexical data to the mentioned formats. A software solution for the dictionary text analysis, parsing and lexical database population was developed and tested on the first and the last published volumes of the dictionary (which ...
... public. References Ahačič, K., Ledinek, N., & Perdih, A. (2015). Fran: The Next Generation Slovenian Dictionary Portal. In Natural Language Processing, Corpus Linguistics, Lexicography. Eight International Conference Bratislava, Slovakia, pp. 21-22. Berg, D. L., Gonnet, G. H., & Tompa, F. W. (1988) ...Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
-
OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian
This paper presents a new language resource for searching and exploring verbal aspectual pairs in BCS (Bosnian, Croatian and Serbian), created using the principles of Linguistic Linked Open Data (LLOD). Since there is no resource that would help learners of Bosnian, Croatian and Serbian as foreign languages to recognize the aspect of a verb or its pairs, we created a new resource that provides users with information about aspect, as well as a link to the aspectual pairs of a verb. This resource also contains external links to monolingual dictionaries, Wordnet and BabelNet. ...Ranka Stanković, Maxim Ionov, Medina Bajtarević, Lorena Ninčević. "OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
-
Resource-based WordNet Augmentation and Enrichment
In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...... Different methods and resources can be used for alignment. One of the common approaches is to take PWN as the source for alignment, and a bilingual dictionary of English and the target language. There are, however, several other approaches. In (Chugur et al., 2001) a monolingual and a bilingual Sp ...
... Source term, Source language identifier, Target term, Target language identifier. The number of terms differ from language to language, due to varying levels of localization. The Microsoft Terminology Collection is a set of standard technology terms used across Microsoft products, and comprises 13,147 ...
... Mariani, J., Odjik, J., Piperidis, S., Rosner, M., and Tapias, D., Eds., Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta, may. European Language Resources Association (ELRA). Matuschek, M. and Gurevych, I. (2013). Dijkstra-WSA: A graph-based ...Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
-
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The concept of this network is presented, as well as the possibilities of its application. The lexical analysis applied in the FrameNet development project is also presented, and the differences between frame-based analysis and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use ...... collaborations and future goals.” Language Resources and Evaluation 46 (2): 269–286. Boas, Hans C., and Ryan Dux. 2017. “From the past into the present: From case frames to semantic frames.” Linguistics Vanguard 3 (1): 20160003. https://doi.org/doi:10.1515/lingvan-2016-0003. Brač, Ivana, and Ana Ostroški ...
... 1976. “Frame semantics and the nature of language.” In Annals of the New York Academy of Sciences: Conference on the origin and development of language and speech, 280:20–32. 1. New York. Fillmore, Charles J. 1982. “Frame semantics.” Linguistic society of Korea (ed.), Linguistics in the morning calm, ...
... Hamilton, Craig, Svenja Adolphs, and Brigitte Nerlich. 2007. “The meanings of ‘risk’: A view from corpus linguistics.” Discourse & Society 18 (2): 163–181. Jurafsky, Dan, and James H Martin. 2020. “Semantic Role Labeling and Argument Structure.” Chap. 19 in Speech and Language Processing, 3rd ed. December ...Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1
-
Bilingual lexical extraction based on word alignment for improving corpus search
Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
-
LRMI markup of OER content within the BAEKTEL project
... next generation of the Web is denoted as Web 3.0, which is an umbrella term for customization, semantic contents, and more sophisticated web applications toward artificial intelligence, including computer-generated contents. While the conventional Web is the “Web of documents”, the Semantic Web ...
... a markup language, where its simplicity makes it useful and easily implemented convention for tagging content. Key benefits of using this approach is expanded access to descriptive data on educational resources, pooling knowledge about learning resources and providing tools and services to ...
... patterns in text, finite automata, transducers, electronic dictionaries, cascades and multi-word units. about: Unitex about: Computational linguistics about: Natural language processing about: Computational linguistics about: Natural language text processing about: electronic dictionaries about: text analysis ...Ranka Stanković, Daniela Carlucci, Olivera Kitanović, Nikola Vulović, Bojan Zlatić. "LRMI markup of OER content within the BAEKTEL project" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2015)
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ...... 313(1):93–104. Ralph Grishman and Beth Sundheim. 1996. Message Understanding Conference-6: A Brief History. In Proceedings of the 16th International Conference on Computational Linguistics (COLING 1996). volume 1. Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural Language Understanding with Bloom ...
... Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1. Association for Computational Linguistics, pages 141–150. Nathalie Friburger and Denis Maurel. 2004. Finite-state Transducer Cascades to Extract ...
... Task: Language-independent Named Entity Recognition. In COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002). Satoshi Sekine, Kiyoshi Sudo, and Chikashi Nobata. 2002. Extended Named Entity Hierarchy. In Proceedings of the Third International Conference on Language Resources ...Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
-
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities
This paper presents the activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a word sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages, which can be used as a multilingual benchmark for the evaluation of European languages with low and medium levels of resources. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and comparison with annotations in other languages. The first steps are discussed ...Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
-
From ELTeC Text Collection Metadata and Named Entities to Linked-data (and Back)
In this paper we present the wikification of the ELTeC (European Literary Text Collection), developed within the COST Action "Distant Reading for European Literary History" (CA16204). ELTeC is a multilingual corpus of novels written in the time period 1840–1920, built to apply distant reading methods and tools to explore the European literary history. We present the pipeline that led to the production of the linked dataset, the novels’ metadata retrieval and named entity recognition, transformation, mapping and Wikidata population, ...Milica Ikonić Nešić, Ranka Stanković, Christof Schöch and Mihailo Škorić. "From ELTeC Text Collection Metadata and Named Entities to Linked-data (and Back)" in Proceedings of The 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference, June 2022, Marseille, France, European Language Resources Association (2022)