Contrastive Analysis of Syntax Patterns in Comparable Football Corpora in Spanish and Serbian Languages

Type
Conference presentation published in abstract form
Version
Published
Language
English
Creator
Jelena Lazarević, Olivera Kitanović
Source
South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024
Editors
prof. dr Jasmina Moskovljević Popović, prof. dr Ranka Stanković
Publisher
University of Belgrade - Faculty of Philology
Date of issue
2024
Abstract
The aim of the paper is to explore collocability, the way in which lexical units combine with words from different categories to form larger units. The semantic and syntactic principles of these combinations in the Spanish and Serbian language of football were investigated on the comparable football corpora SrFudKo and EsFudKo, developed as part of Jelena Lazarević's doctoral dissertation, Language characteristics of the new media discourse on football: a contrastive analysis of the Serbian and Spanish language corpora.
The football corpus SrFudKo was compiled from football texts published on five Serbian news websites (B92, Blic, Mondo, Politika, and Sport klub) and contains 10,100,553 tokens, of which 8,618,426 are words. The Spanish-language football corpus EsFudKo comes from two Spanish sites, Marca fútbol and Mundo deportivo, and contains 9,106,812 tokens, of which 8,024,164 are words. Both corpora, to which corpus linguistics methods were applied for data extraction, are hosted on the platform https://noske.jerteh.rs and are available to authorized users.
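Because the two corpora differ in size, raw counts from SrFudKo and EsFudKo are not directly comparable. A common convention, assumed here rather than stated in the abstract, is to normalize frequencies per million tokens. The minimal Python sketch below uses only the token and word totals reported for the two corpora; the function names are illustrative.

```python
# Token and word totals as reported for the two corpora.
CORPORA = {
    "SrFudKo": {"tokens": 10_100_553, "words": 8_618_426},
    "EsFudKo": {"tokens": 9_106_812, "words": 8_024_164},
}


def word_share(name: str) -> float:
    """Fraction of all tokens in the corpus that are counted as words."""
    size = CORPORA[name]
    return size["words"] / size["tokens"]


def per_million(count: int, name: str) -> float:
    """Convert a raw frequency into occurrences per million tokens,
    so that counts from corpora of different sizes can be compared."""
    return count * 1_000_000 / CORPORA[name]["tokens"]


if __name__ == "__main__":
    for name in CORPORA:
        print(f"{name}: {word_share(name):.1%} of tokens are words")
```

Run as a script, this prints 85.3% for SrFudKo and 88.1% for EsFudKo; the remaining tokens are non-word tokens such as punctuation marks.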
In this paper, the mutual lexical-semantic "attraction" between the words of a collocation is determined on the basis of frequencies and other measures in the corpora. Collocations are thus viewed in the broadest sense used in corpus linguistics: as sequences of words or concepts that appear together more often than would be expected by chance. We present seven main types of collocations; examples include adjective + noun (fast counterattack), noun + noun (penalty shootout), verb + noun (to score a goal), adverb + adjective (very talented), verb + prepositional phrase (play at the stadium), and verb + adverb (to kick hard). Collocation extraction is a computational linguistics technique that identifies collocations in a text or a corpus of texts using methods similar to data mining, relying on syntactic patterns and frequencies of occurrence.
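The extraction step can be illustrated with a small, self-contained sketch. It assumes lemmatized, POS-tagged sentences with a hypothetical simplified tagset (ADJ, NOUN, VERB, ADV, ADP), matches contiguous tag sequences for the six patterns listed in the paragraph, and scores each candidate pair (first and last lemma of the match) with two standard association measures: logDice = 14 + log2(2 f(x,y) / (f(x) + f(y))) and pointwise mutual information, PMI = log2(f(x,y) N / (f(x) f(y))). The abstract does not specify which measures the authors used; these are illustrative choices, and all names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical simplified POS tags; a real corpus uses its own tagset.
PATTERNS = {
    "adj+noun": ("ADJ", "NOUN"),
    "noun+noun": ("NOUN", "NOUN"),
    "verb+noun": ("VERB", "NOUN"),
    "adv+adj": ("ADV", "ADJ"),
    "verb+prep+noun": ("VERB", "ADP", "NOUN"),
    "verb+adv": ("VERB", "ADV"),
}


def extract_candidates(sentences):
    """Count (pattern, first lemma, last lemma) for each contiguous POS match.

    `sentences` is a list of sentences, each a list of (lemma, pos) pairs.
    """
    found = Counter()
    for sent in sentences:
        tags = tuple(pos for _, pos in sent)
        for name, pattern in PATTERNS.items():
            n = len(pattern)
            for i in range(len(sent) - n + 1):
                if tags[i:i + n] == pattern:
                    found[(name, sent[i][0], sent[i + n - 1][0])] += 1
    return found


def log_dice(f_xy, f_x, f_y):
    """logDice association score."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information; n is the total number of tokens."""
    return math.log2(f_xy * n / (f_x * f_y))


def score(sentences):
    """Return (pattern, x, y, f_xy, logDice, PMI) rows, highest logDice first."""
    lemma_freq = Counter(lemma for sent in sentences for lemma, _ in sent)
    n = sum(lemma_freq.values())
    rows = []
    for (name, x, y), f_xy in extract_candidates(sentences).items():
        f_x, f_y = lemma_freq[x], lemma_freq[y]
        rows.append((name, x, y, f_xy,
                     round(log_dice(f_xy, f_x, f_y), 2),
                     round(pmi(f_xy, f_x, f_y, n), 2)))
    return sorted(rows, key=lambda row: row[4], reverse=True)


if __name__ == "__main__":
    # Toy lemmatized Serbian sentences, for illustration only.
    sentences = [
        [("postići", "VERB"), ("gol", "NOUN")],
        [("igrati", "VERB"), ("na", "ADP"), ("stadion", "NOUN")],
        [("veoma", "ADV"), ("talentovan", "ADJ"), ("igrač", "NOUN")],
    ]
    for row in score(sentences):
        print(row)
```

Because the patterns are matched on adjacent tokens only, they miss, for example, Spanish noun + adjective order (contraataque rápido) or a determiner inside a prepositional phrase (en el estadio); a real extraction would need language-specific patterns that allow such variation.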
In addition to frequencies of occurrence, we also consider other factors, such as semantic closeness and context in both languages: for example, whether certain collocations have specific meanings or are used only in particular situations. We also examine whether the identified collocations are understandable to members of the general public who do not follow sports and are not familiar with the language of football. If an average speaker understands them, the collocations have outgrown their origin in the football domain and become part of the public domain.
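Context can be inspected through concordance (key word in context, KWIC) lines, the standard corpus linguistics view for checking how a collocation is actually used. The sketch below is a plain-Python illustration over tokenized sentences, not the concordancer of the corpus platform; the function name, the case-insensitive matching, and the window size are assumptions.

```python
def kwic(sentences, node, width=3):
    """Return (left context, node, right context) for every occurrence of `node`.

    `sentences` is a list of tokenized sentences; `width` is the number of
    tokens shown on each side. Matching ignores letter case.
    """
    lines = []
    for sent in sentences:
        for i, token in enumerate(sent):
            if token.lower() == node.lower():
                left = " ".join(sent[max(0, i - width):i])
                right = " ".join(sent[i + 1:i + 1 + width])
                lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    # Hypothetical example sentence, for illustration only.
    corpus = [["Ekipa", "je", "izvela", "brzu", "kontru", "i", "postigla", "gol"]]
    for left, node, right in kwic(corpus, "kontru", width=2):
        print(f"{left:>20} [{node}] {right}")
```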
The research also contributes an analysis of the connections between collocations and multi-word terms. The connection is strong when the multi-word terms contain collocates with a clear meaning within the domain of football. This helps to explain the terminological connections within the language of football, providing insight into typical word combinations and their use, illustrated by those that occur frequently in the Serbian and Spanish football corpora.
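One simple way to relate extracted collocations to multi-word terms is to check which collocation pairs occur, in the same order, inside the entries of a term list. The sketch below assumes a hypothetical list of lemmatized Serbian multi-word terms; it is an illustration of the idea, not the procedure used in the research.

```python
def contains_in_order(words, x, y):
    """True if lemma x occurs in `words` and lemma y occurs somewhere after it."""
    return any(w == x and y in words[i + 1:] for i, w in enumerate(words))


def collocations_in_terms(collocations, terms):
    """Map each (x, y) collocation to the multi-word terms that contain it."""
    result = {}
    for x, y in collocations:
        hits = [t for t in terms if contains_in_order(t.split(), x, y)]
        if hits:
            result[(x, y)] = hits
    return result


if __name__ == "__main__":
    # Hypothetical lemmatized inputs, for illustration only.
    collocations = [("penal", "serija"), ("postići", "gol"), ("slobodan", "udarac")]
    terms = ["penal serija", "slobodan udarac", "direktan slobodan udarac"]
    print(collocations_in_terms(collocations, terms))
```

A string match like this only flags candidates; deciding whether a collocation functions as a term still requires judgment about its meaning within the football domain.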
Subject
football, corpora, terminology, collocations, Serbian, Spanish
Broader category
M60
Narrower category
M64
Is part of
Text Embeddings - Serbian Language Applications - TESLA
Rights
Open access
License
Creative Commons – Attribution 4.0 International
Format
.pdf

Jelena Lazarević, Olivera Kitanović. "Contrastive Analysis of Syntax Patterns in Comparable Football Corpora in Spanish and Serbian Languages." In South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024. University of Belgrade - Faculty of Philology, 2024.

This item was submitted on 28 November 2024 by [anonymous user] using the form “Paper in conference proceedings” on the site “Papers”: http://gabp-dl.rgf.rs/s/repo