Communications of International Proceedings

Application of Large Language Models to Data Extraction from Text with an Unknown Structure

AI and Advanced Data-Driven Technologies: 44AI 2024

Maciej HOJDA and Wojciech LORKIEWICZ

Faculty of Computer Science and Telecommunications, Wrocław University of Science and Technology, Wrocław, Poland

Volume 2024 (22), Article ID 4441324

Abstract

We present the results of applying Large Language Models (LLMs) to extract meteorological data from weather forecasts provided in a variety of formats. Processing information expressed in natural language is difficult, and more so when the goal is to extract specific numerical values from sources whose structure is unknown at design time. We apply LLMs to this task and verify their usefulness on various data formats that describe the forecast either in natural language or in condensed tabular and/or numerical representations. All data were sourced from real meteorological systems, and the output was a fixed XML structure. We show that all the models tested succeed, with varying degrees of efficiency, in extracting basic data from the source forecasts and in encoding the extracted information into a predefined XML structure. Finally, we identify the main types of errors encountered in the transformation process.

Keywords: Text Processing, Data Extraction, Large Language Model, LLM
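Because the model's answer must fit a predefined XML structure, it helps to check each answer automatically before accepting the extracted values. The sketch below illustrates one way to do that with Python's standard `xml.etree.ElementTree`. It is a minimal sketch under stated assumptions: the root element `forecast`, the child elements `location`, `date`, `temperature`, `wind_speed`, `precipitation`, the helper name `check_forecast_xml`, and the sample answer are all hypothetical and are not the schema, data, or code used in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical target structure (assumption, not the paper's schema):
# <forecast> with two text fields and three numeric fields.
REQUIRED_TEXT = ("location", "date")
REQUIRED_NUMERIC = ("temperature", "wind_speed", "precipitation")


def check_forecast_xml(llm_output: str) -> tuple[dict, list[str]]:
    """Parse a model's XML answer and collect structural and value errors.

    Returns (values, errors); an empty error list means the answer matched
    the hypothetical structure and every numeric field parsed as a number.
    """
    errors: list[str] = []
    values: dict = {}
    try:
        root = ET.fromstring(llm_output.strip())
    except ET.ParseError as exc:
        # The answer is not well-formed XML at all.
        return values, [f"malformed XML: {exc}"]

    if root.tag != "forecast":
        errors.append(f"unexpected root element <{root.tag}>")

    # Text fields must be present and non-empty.
    for name in REQUIRED_TEXT:
        node = root.find(name)
        if node is None or not (node.text or "").strip():
            errors.append(f"missing <{name}>")
        else:
            values[name] = node.text.strip()

    # Numeric fields must be present and convertible to float.
    for name in REQUIRED_NUMERIC:
        node = root.find(name)
        if node is None:
            errors.append(f"missing <{name}>")
            continue
        try:
            values[name] = float(node.text)
        except (TypeError, ValueError):
            errors.append(f"non-numeric value in <{name}>: {node.text!r}")

    return values, errors


# Illustrative (made-up) model answer containing one typical extraction error:
# a free-text value where a number was expected.
sample = """<forecast>
  <location>Wrocław</location>
  <date>2024-05-14</date>
  <temperature unit="C">18.5</temperature>
  <wind_speed unit="m/s">4</wind_speed>
  <precipitation unit="mm">about 2</precipitation>
</forecast>"""

values, errors = check_forecast_xml(sample)
print(values)
print(errors)
```

Separating "malformed XML", "missing element", and "non-numeric value" into distinct messages makes it straightforward to tally error types across many model answers, which is the kind of error analysis the abstract mentions.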
