Collected Item: “BERT Downstream Task Analysis: Named Entity Recognition in Serbian”
Publication type
Paper in proceedings
Document version
published
Language
English
Author(s) (Milan Marković, Nikola Nikolić)
Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković
Paper title (Title - subtitle)
BERT Downstream Task Analysis: Named Entity Recognition in Serbian
Name of the conference (proceedings), place and date held
Lecture Notes in Networks and Systems
Publisher (Belgrade : Prosveta)
Springer Nature Switzerland
Year of publication
2024
Abstract in English
This paper compares different architectures and techniques for preparing named entity recognition (NER) models for the Serbian language by integrating BERT with spaCy. The models were trained to recognize seven named entity types (persons, locations, organisations, professions, events, demonyms, and artworks) on a dataset containing Serbian novels published between 1840 and 1920, publicly available newspaper articles, and sentences generated from the Wikidata knowledge base and the Leximirka lexical database. We explore various configurations and several training pipelines that differ in complexity and functionality. Some are dedicated solely to NER, while others encompass additional features such as part-of-speech tagging and lemmatization. One of the key aspects of this work involves testing different versions of BERT, with varied architectures, sizes, and pre-training corpora that contain the Serbian language. This approach allows us to evaluate the trade-offs between model complexity and performance and offers a nuanced understanding of how different configurations impact the efficiency and effectiveness of the NER task in Serbian.
First page of the paper
333
Last page of the paper
347
DOI
10.1007/978-3-031-71419-1_29
ISBN of the source publication
9783031714184
ISSN of the source publication
2367-3370
Link
https://link.springer.com/content/pdf/10.1007/978-3-031-71419-1_29
Broader category of the paper according to the MPNT rulebook
M30
Narrower category of the paper according to the MPNT rulebook
M33
Access level
Closed access
License
All rights reserved
File format
.pdf