CLARIN – Shared Language Resources and Technological Infrastructure

The primary goal of the project is to enhance the CLARIN-PL research infrastructure by expanding its capabilities to support scientific research and innovative activities related to linguistic data analytics and natural language processing (NLP). Building on advancements achieved by the CLARIN-PL-Biz project, the initiative focuses on integrating Large Language Models (LLMs) to improve tools for information extraction, personalized solutions, and effective communication in natural language. The project aims to develop five virtual laboratories that will offer advanced analytical, personalized, and trustworthy dialogue systems while ensuring high-quality linguistic resources and computational services.

Key objectives include:

  1. Expanding tools and systems for linguistic data analysis, focusing on temporal data and integration with cutting-edge LLMs to enhance the usability and efficiency of language tools.
  2. Developing context-aware and personalized NLP tools, including systems for personalized content generation, hate speech detection, and emotion analysis tailored to user-specific needs.
  3. Creating trusted dialogue systems that ensure reliability, transparency, and security in user interactions.
  4. Enhancing linguistic resources for AI in line with the FAIR principles (Findable, Accessible, Interoperable, Reusable), to address challenges specific to the Polish language and provide a counterbalance to English-centric LLMs.
  5. Improving computational capacity to meet the demands of large-scale LLM training and usage, offering flexible and efficient solutions for data preparation, model training, and inference.

The project will leverage the latest advancements in LLM technology to integrate tools into the CLARIN-PL platform, ensuring seamless access and usability for research, business, and public service users. The ultimate aim is to create a robust, scalable, and user-focused infrastructure for linguistic research and NLP in Poland and beyond.

Partners:

  • Wroclaw University of Science and Technology
  • Institute of Computer Science, Polish Academy of Sciences
  • Institute of Slavic Studies, Polish Academy of Sciences
  • University of Łódź
  • University of Wrocław

Program: European Funds for a Modern Economy 2021–2027

Duration: 01.01.2025 – 31.12.2027

Funding: 61,141,241.03 PLN

Maciej Piasecki
Associate Professor

Professor Maciej Piasecki is a researcher specializing in computational linguistics, natural language processing, and machine learning. His work focuses on areas such as the development of linguistic resources, semantic networks, Large Language Models, and applications of AI in text analysis. He has contributed significantly to advancing methods for analyzing and processing Polish language data within multidisciplinary contexts.