Old or New, We Repair, Adjust and Alter (Texts)
Cvetana Krstev, Ranka Stanković (2020) In this paper we present how e-dictionaries and cascades of finite-state transducers implemented in the Unitex tool can be used to solve three text transformation problems: correcting texts after OCR, restoring diacritics, and switching between different language variants. Keywords: text correction, OCR errors, diacritic restoration, language variants, electronic dictionary, finite-state transducers ... Cyrillic В (corresponding to Latin V). The majority of books were obtained from the University Library “Svetozar Marković”, and the rest from other libraries and private collections. First of all, the OCR software was set to recognize only the Cyrillic script in novels printed in Cyrillic. As a consequence ...
Jaćimović, Jelena. "Textometric methods and the TXM platform for corpus analysis and visual presentation". Infotheca – Journal for Digital Humanities Vol. 19, no. 1 (2019): 30–54. https://infoteka.bg.ac.rs/ojs/index.php/Infoteka/article/view/2019.19.1.2_en Kolak, Okan and Philip Resnik
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The concept of this network is presented, as well as the possibilities of its application. The lexical analysis applied in the FrameNet development project is also presented, and the differences between frame-based analysis and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use ... ... stemming, tagging, parsing and semantic reasoning. The NLTK system uses wrappers for other Python natural language processing and lexical resource libraries. One of the APIs available within NLTK is FrameNet and the accompanying program library designed for searching this resource, as well as for extracting ...
... other languages, e.g. Serbian (keeping frame information which is shared and adding language-specific material), therefore making it applicable to multilingual resources. 25. Statistical formulas used in the Sketch Engine tool: statistics/formulae Infotheca Vol. 21, No. 1, September 2021 29 ...
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
In science, industry and many research fields, terminology develops rapidly. Most often, the language that serves as the "lingua franca" for most of these fields is English. As a consequence, in many fields domain terms are conceived in English and later translated into other languages. In this paper we present an approach to automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language, and a tool for chunk alignment. We examine the performance of the method on the domain
Mesozoic carbonate rocks in Serbia used as dimension stone
Vesna Matović, Tijana Vojnović Ćalić (2016) The building industry in Serbia uses, to a great extent, imported natural stone for architectural purposes. The significance of local deposits, particularly limestones, is not adequately perceived despite the country’s abundance of these valuable resources. Therefore, this study focuses on Serbia’s Mesozoic carbonate rocks, specifically on the deposits of four selected quarries: Klisura, Skrzut, Struganik, and Tisnica. The quality and prospects of the application of these limestones have not yet been the subject of a detailed, comprehensive investigation. Therefore, ... ... museums, residential buildings, restaurants, schools, etc.) IH-3 Paving of interior horizontal surfaces of moderate pedestrian traffic areas (libraries, archives, book stores, waiting rooms, etc.) IV-I Cladding of interior vertical surfaces EH-I Paving of exterior horizontal surfaces of very intensive ...
graphic analyses of the rocks was performed on thin sections using a Leica DMLSP microscope for polarized light connected to a Leica DC 300 digital camera. The thin sections were used to determine the mineral composition and microtextural features (size and type of allochem) for the purpose
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020) In this paper, we will present an approach to transforming Serbian morphological dictionaries from a DELA text format into a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use the category labels with which lexical entries are marked can be used to link lexical entries. ... ... M., “From DELA based dictionary to ...”, pp. 81–98 of terms, the extraction of time expressions and advanced search of text repositories and libraries. The morphological dictionaries were developed in the DELA text format (Fr. Dictionnaires électroniques du LADL), which will be discussed in Sec- ...
... can be used to link lexical entries. The initial morphological dictionaries were Serbian Morphological Dictionaries. However, we will show multilingual application of Leximirka using French Morphological Dictionaries. KEYWORDS: morphological dictionaries, language resources, Leximirka. PAPER ...
... be provided in Section 4.1. The possibilities for establishing relations among lexical entries in the database will be introduced in Section 4.2. Multilingual application of Leximirka based on French lexical entries will be presented in Section 4.3. Ideas for further work on application development will ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
The Use of the Omeka Semantic Platform for the Development of the University of Belgrade, Faculty of Mining and Geology Digital Repository
Under the regulations of the Ministry of Education, Science and Technological Development, a digital repository based on the Omeka S data storage platform has been developed for the Faculty of Mining and Geology. The platform has been upgraded with the required modular extensions, a Solr index and automatic OCR. Furthermore, document indexing and search have been fine-tuned with the aid of e-dictionaries of the Serbian language, which has brought about outstanding results in terms of usage facilitation and overall ...
Petar Popović, Mihailo Škorić, Biljana Rujević. "The Use of the Omeka Semantic Platform for the Development of the University of Belgrade, Faculty of Mining and Geology Digital Repository" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2020.20.1_2.9
A WordNet Ontology in Improving Searches of Digital Dialect Dictionary
In this paper, we present a method for automatic generation of a digital resource, which connects all indirect synonyms of a dialect term to all indirect synonyms of a corresponding term in the standard language, aiming to improve the search of a digital dialect dictionary. The method uses SWRL rules defined in the Serbian WordNet ontology to identify sets of synonymous words. It also uses e-dictionaries to produce correct lemmas in standard language that users usually employ in searches. ... ... received a new impulse with the development of many software tools and digital resources for maintaining, enhancement, sharing, visualization and analysis of digital dialect dictionaries. This includes: the development of digital dictionaries of dialects [4], [13], [1], the development of tools for ...
... techniques for representing digital resources as knowledge based resources and as Linked Open Data (LOD) on the Web [6], [15]. Digital dictionary of the South Serbian dialect*, containing over 20 thousand terms, is the first comprehensive implementation [7] of a digital version of a dialect vocabulary ...
... would enable search over a digital dialect dictionary by using terms in the standard language. In Section 2 we discuss some previous approaches to searching digital dialect dictionaries. In Section 3 we represent resources used to improve searching performances of the digital dialect dictionary: Serbian ...
Miljana Mladenović, Ranka Stanković, Cvetana Krstev. "A WordNet Ontology in Improving Searches of Digital Dialect Dictionary" in New Trends in Databases and Information Systems: ADBIS 2017 Short Papers and Workshops - SW4CH (Semantic Web for Cultural Heritage) 767, Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-67162-8_37
Classifying large strains from digital imagery: application to analogue models of lithosphere deformation
... strains from digital imagery: application to analogue models of lithosphere deformation Taco Broerse, Nemanja Krstekanić, Cor Kasbergen, Ernst Willingshofer Digital Repository of the Faculty of Mining and Geology, University of Belgrade [DR RGF] Classifying large strains from digital imagery: application ...
General Assembly 2021 © Author(s) 2021. This work is distributed under the Creative Commons Attribution 4.0 License. Classifying large strains from digital imagery: application to analogue models of lithosphere deformation Taco Broerse1, Nemanja Krstekanic1,2, Cor Kasbergen3, and Ernst Willingshofer1 1Utrecht
Split-Desktop software for the analysis of fragment size distribution of blasted rock mass
Milanka Negovanović, Lazar Kričak, Stefan Milanović, Jovan Marković, Nikola Simić, Snežana Ignjatović (2023) Rock fragmentation is the most important indicator in assessing blasting effects in production blasting in surface mining. The degree of rock fragmentation has a great influence on the efficiency of the subsequent operations of loading, transport, crushing and grinding. Optimal rock fragmentation in production blasting reduces total production costs. Reliable estimation of the fragment size of the blasted rock mass is therefore a very important issue, not only in blasting operations but also in mining production. Various empirical models exist for predicting the fragment size distribution of blasted rock mass. The KUZ-RAM model enables ...
Milanka Negovanović, Lazar Kričak, Stefan Milanović, Jovan Marković, Nikola Simić, Snežana Ignjatović. "Split-Desktop software for the analysis of fragment size distribution of blasted rock mass" in 9th International Conference Mining and environmental protection, Sokobanja, Serbia, 24 – 27. May 2023, Belgrade : University of Belgrade, Faculty of Mining and Geology (2023)
Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection
Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Duško Vitas, Mihailo Škorić, Milica Ikonić Nešić (2022) In this paper we present the Serbian part of the ELTeC multilingual corpus of novels written in the time period 1840-1920. The corpus is being built in order to test various distant reading methods and tools with the aim of re-thinking the European literary history. We present the various steps that led to the production of the Serbian sub-collection: the novel selection and retrieval, text preparation, structural annotation, POS-tagging, lemmatization and named entity recognition. The Serbian sub-collection was published ...
Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Duško Vitas, Mihailo Škorić, Milica Ikonić Nešić. "Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection" in Proceedings of the Language Resources and Evaluation Conference, June 2022, Marseille, France, European Language Resources Association (2022)
Digitalizacija u rudarstvu: Kreiranje sistema za efikasno poslovno izveštavanje -- Digitalization in Mining: Creating a System for Efficient Business Reporting
The decision-making process in mining and geology depends on the timely availability of quality data and information. The complexity of mining processes requires data to be collected at the daily or shift level. Such data alone, without an analytical approach, are not sufficient. For data access to be fast and efficient, it is necessary to have an adequate digital solution along with appropriate centralized databases. This paper provides an overview of the current position of mining from the perspective of digital transformation, as well as a proposal for a simple prototype in the form of a digital system for business reporting ...
... component/b9ed6f5e-en [3] Ganerwalla, A. & Harnathka, S. (2021). Racing Toward a Digital Future in Metals and Mining. Boston Consulting Group. Available at: https://www.bcg.com/publications/2021/adopting-a-digital-strategy-in-the-metals-and-mining-industry [4] Stolterman, E. & Fors, A.C. (2004) ...
... de Sousa, J. S., Rocha, L. O., & de Castro, R. M. (2016). Digital transformation applied to Bauxite and Alumina Business System: BABS 4.0. In Proceedings of 37th International ICSOBA Conference, pp. 119-132. [6] Berman S.J. (2012). Digital transformation: opportunities to create new business models ...
Stevan Đenadić, Aleksandar Mirković, Veljko Rupar. "Digitalizacija u rudarstvu: Kreiranje sistema za efikasno poslovno izveštavanje" in XVI Međunarodna rudarska konferencija OMC 2024, Zlatibor, 9 - 12. oktobar 2024, Jugoslovenski komitet za površinsku eksploataciju (2024)
Social-Emo.Sr: Emotional Multi-Label Categorization of Conversational Messages from Social Networks X and Reddit
In the digital environment of South Slavic languages, emotion analysis of texts on social networks is becoming increasingly important for understanding public opinion, creating personalized content, and analyzing interactions among users. In this paper we present a detailed methodology and the results of annotating a Serbian-language corpus according to Plutchik's categorization model, which recognizes eight basic emotional categories: joy, sadness, anger, fear, trust, disgust, anticipation and surprise. The aim of the research is to analyze the emotional content of texts taken from the social networks X (formerly Twitter) ...
Milena Šošić, Ranka Stanković, Jelena Graovac. "Social-Emo.Sr: Emotional Multi-Label Categorization of Conversational Messages from Social Networks X and Reddit" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
New Language Models for South Slavic Languages
Mihailo Škorić (2024) The talk will present the challenges and perspectives of modeling South Slavic languages, with special reference to general language models built on the transformer architecture (BERT, GPT), to the text datasets available for training those models, and to the quantity and quality of those datasets. The talk will offer an overview of the available datasets and models, while special attention will be paid to the newest text corpora. The first corpus, Kišobran, is an umbrella web corpus of South Slavic languages and currently the largest text corpus in the region, numbering over eighteen billion words and including all ...
Mihailo Škorić. "New Language Models for South Slavic Languages" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
Improvement of geodatabase queries within GeolISS
Ranka Stanković (2008) ... spatial multilingual database of named entities, can be used for a multilingual symbolization of maps. References [1] Richard, S.M., Matti, Jonathan, Soller, D.R., 2003. Geoscience terminology development for the National Geologic Map Database, in Soller, David R., ed., Digital Mapping ...
... very important in highly inflective languages, such as Serbian. The geological dictionary, developed within GeolISS, supports semantic and multilingual expansions of the query. The Human Language Technology group at the University of Belgrade (HLT) has been developing various lexical resources ...
... that can be used, among others, for cross-language information retrieval. For expansion of queries with proper names WS4LR is using Prolex, a multilingual database of proper names which represents the implementation of an elaborate four-layered ontology of proper names [12] organized around a ...
Ranka Stanković. "Improvement of geodatabase queries within GeolISS" in Review of the National Center for Digitization, Beograd : Faculty of Mathematics, Belgrade (2008)
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012) ... automatically correct search engine queries, as found in Google’s Did you mean… suggestions. 4.2.2 Web Search Searching the Web, intranets or digital libraries is probably the most widely used yet largely underdeveloped language technology application today. The Google search ...
... functionalities of networked information technology. The network supports a Europe that unites as a single digital market and information space. It stimulates and promotes multilingual technologies for all European languages. These technologies support automatic translation, content production, ...
... Sprache im Digitalen Zeitalter – The German Language in the Digital Age. META-NET White Paper Series. Georg Rehm and Hans Uszkoreit (Series Editors). Springer, 2012. [2] Aljoscha Burchardt, Georg Rehm, and Felix Sasaki. The Future European Multilingual Information Society: Vision Paper for a Strategic Research ...
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
The Many Faces of SrpKor
The acronym SrpKor denotes a family of electronic corpora of contemporary Serbian whose construction began in the late 1970s and which became more widely visible to the interested research community with the publication of its first version on the web in 2002. During this long period, especially before useful textual resources appeared on the web, corpus development consisted of collecting and processing material as well as developing corpus processing methods. Namely, an electronic corpus is not merely a collection of texts in digital form (as is stated, for example, ...
Duško Vitas, Ranka Stanković, Cvetana Krstev. "The Many Faces of SrpKor" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
Contrastive Analysis of Syntax Patterns in Comparable Football Corpora in Spanish and Serbian Languages
Jelena Lazarević, Olivera Kitanović (2024) The aim of the paper is to investigate collocability as the way in which lexical units combine with words from different categories, forming larger units. The investigation of the semantic and syntactic principles of these combinations in the Spanish and Serbian language of football was carried out on the comparable football corpora SrFudKo and EsFudko, developed within the doctoral dissertation of Jelena Lazarević entitled: Jezičke odlike diskursa novih medija o fudbalu: kontrastivna analiza na korpusu srpskog i španskog jezika (Linguistic features of new media discourse on football: a contrastive analysis on a corpus of Serbian and Spanish). The football corpus SrFudKo, created from texts about football from five Serbian web portals: ...
Jelena Lazarević, Olivera Kitanović. "Contrastive Analysis of Syntax Patterns in Comparable Football Corpora in Spanish and Serbian Languages" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation
electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion
2006, T. Erjavec and J. Žganec Gros, Eds. Ljubljana, Slovenia: Institut "Jožef Stefan", October 2006, pp. 192–197. [7] A. Savary, "Multiflex: A Multilingual Finite-state Tool for Multi-Word Units," in CIAA, 2009, pp. 237–240. [8] A. Savary, C. Krstev, and D. Vitas, "Inflectional Non-compositionality
... Marrakech, Morocco, 2008. [11] C. Krstev, R. Stanković, D. Vitas, and S. Koeva, “E-Connecting Balkan Languages,” in Proc. of the Workshop on Multilingual Resources, Technologies and Evaluation for Central and Eastern European Languages — RANLP09, Borovetz, Bulgaria, 2009, pp. 23–29. [12] C. Krstev ...
Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
Assessment of historical flood risk to the groundwater regime: case study of the Kolubara Coal Basin, Serbia.
Polomčić Dušan, Bajić Dragoljub, Ratković Jelena. "Assessment of historical flood risk to the groundwater regime: case study of the Kolubara Coal Basin, Serbia." in Water 10 no. 5, Basel: Multidisciplinary Digital Publishing Institute (2018): 588. https://doi.org/10.3390/w10050588