Matches in Nanopublications for { ?s ?p ?o <https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/assertion>. }
Showing items 1 to 20 of 20 with 100 items per page.
- SSW240006 type Article assertion.
- author-list _1 0000-0001-8004-0464 assertion.
- author-list _2 0000-0002-1267-0234 assertion.
- author-list _3 0000-0002-2146-4803 assertion.
- author-list _4 0000-0002-7748-4715 assertion.
- 0000-0002-1267-0234 name "Tobias Kuhn" assertion.
- 0000-0002-7748-4715 name "Jacco van Ossenbruggen" assertion.
- 0000-0001-8004-0464 name "Margherita Martorana" assertion.
- 008xxew50 name "Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands" assertion.
- 0000-0002-2146-4803 name "Lise Stork" assertion.
- 1868-1158 title "Studies on the Semantic Web" assertion.
- SSW240006 title "Zero-Shot Topic Classification of Column Headers: Leveraging LLMs for Metadata Enrichment" assertion.
- SSW240006 date "2024-09-11" assertion.
- SSW240006 isPartOf 1868-1158 assertion.
- SSW240006 abstract "Traditional dataset retrieval systems rely on metadata for indexing, rather than on the underlying data values. However, high-quality metadata creation and enrichment often require manual annotations, which is a labour-intensive and challenging process to automate. In this study, we propose a method to support metadata enrichment using topic annotations generated by three Large Language Models (LLMs): ChatGPT-3.5, GoogleBard, and GoogleGemini. Our analysis focuses on classifying column headers based on domain-specific topics from the Consortium of European Social Science Data Archives (CESSDA), a Linked Data controlled vocabulary. Our approach operates in a zero-shot setting, integrating the controlled topic vocabulary directly within the input prompt. This integration serves as a Large Context Windows approach, with the aim of improving the results of the topic classification task. We evaluated the performance of the LLMs in terms of internal consistency, inter-machine alignment, and agreement with human classification. Additionally, we investigate the impact of contextual information (i.e., dataset description) on the classification outcomes. Our findings suggest that ChatGPT and GoogleGemini outperform GoogleBard in terms of internal consistency as well as LLM-human-agreement. Interestingly, we found that contextual information had no significant impact on LLM performance. This work proposes a novel approach that leverages LLMs for topic classification of column headers using a controlled vocabulary, presenting a practical application of LLMs and Large Context Windows within the Semantic Web domain. This approach has the potential to facilitate automated metadata enrichment, thereby enhancing dataset retrieval and the Findability, Accessibility, Interoperability, and Reusability (FAIR) of research data on the Web." assertion.
- 0000-0002-1267-0234 affiliation 008xxew50 assertion.
- 0000-0002-7748-4715 affiliation 008xxew50 assertion.
- 0000-0001-8004-0464 affiliation 008xxew50 assertion.
- 0000-0002-2146-4803 affiliation 008xxew50 assertion.
- SSW240006 authorList author-list assertion.
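
The author order is not stated directly in the listing: it has to be read off the `_1` to `_4` entries on `author-list` and joined with the `name` entries. The sketch below shows one way to do that in plain Python. It assumes the abbreviated identifiers are used exactly as displayed above (the full IRIs are not shown in this listing) and that `_1`, `_2`, ... are positional membership predicates. The function name `ordered_authors` is hypothetical, introduced only for illustration.

```python
# Triples copied from the listing above, using the abbreviated identifiers as displayed.
# The graph column ("assertion") is omitted because it is the same for every row.
triples = [
    ("author-list", "_1", "0000-0001-8004-0464"),
    ("author-list", "_2", "0000-0002-1267-0234"),
    ("author-list", "_3", "0000-0002-2146-4803"),
    ("author-list", "_4", "0000-0002-7748-4715"),
    ("0000-0002-1267-0234", "name", "Tobias Kuhn"),
    ("0000-0002-7748-4715", "name", "Jacco van Ossenbruggen"),
    ("0000-0001-8004-0464", "name", "Margherita Martorana"),
    ("0000-0002-2146-4803", "name", "Lise Stork"),
    ("SSW240006", "authorList", "author-list"),
]


def ordered_authors(triples, article):
    """Return the author names for `article` in list-position order (hypothetical helper)."""
    # Find the node that holds the article's author list.
    list_node = next(o for s, p, o in triples if s == article and p == "authorList")
    # Map each author identifier to its name.
    names = {s: o for s, p, o in triples if p == "name"}
    # Assumption: membership predicates look like "_1", "_2", ...; sort by the numeric part.
    members = sorted(
        (int(p[1:]), o)
        for s, p, o in triples
        if s == list_node and p.startswith("_") and p[1:].isdigit()
    )
    return [names[o] for _, o in members]


print(ordered_authors(triples, "SSW240006"))
```

Sorting on the integer part of the predicate, rather than on the string, keeps the order correct for lists with ten or more members.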