KALM: Knowledge-Aware Language Models

The project “Knowledge-Enriched Representation Learning for Natural Language: Algorithms and Applications” develops algorithms that incorporate structured and unstructured knowledge sources into representation learning for natural language processing, with the aim of improving text understanding and generation.

The research spans both foundational and applied dimensions, targeting tasks such as text simplification, ESG analysis, and knowledge graph integration with contrastive reasoning, among others. A key goal is to bridge the gap between external knowledge (logical foundations, knowledge graphs) and neural representations, enabling models that are more interpretable, robust, and data-efficient.
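
As one concrete illustration of what such integration can look like, the sketch below fuses knowledge graph entity embeddings into the contextual token states of a pre-trained language model through a learned gate. This is a minimal, hypothetical example for intuition only: the class name, dimensions, and gating design are assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class KnowledgeFusion(nn.Module):
    """Hypothetical sketch: gate knowledge-graph entity embeddings into
    contextual token representations from a pre-trained language model.
    The design (projection + sigmoid gate) is illustrative, not the
    project's published method."""

    def __init__(self, hidden_dim: int, kg_dim: int):
        super().__init__()
        # Map KG embedding space into the language model's hidden space.
        self.project = nn.Linear(kg_dim, hidden_dim)
        # Gate decides, per token, how much knowledge to mix in.
        self.gate = nn.Linear(hidden_dim * 2, hidden_dim)

    def forward(self, token_states: torch.Tensor, entity_embeds: torch.Tensor) -> torch.Tensor:
        # token_states:  (batch, seq_len, hidden_dim), from a pre-trained LM
        # entity_embeds: (batch, seq_len, kg_dim), aligned entity vectors
        #                (zeros where no entity is linked to the token)
        kg = self.project(entity_embeds)
        g = torch.sigmoid(self.gate(torch.cat([token_states, kg], dim=-1)))
        # Convex combination: knowledge-aware token representation.
        return g * token_states + (1 - g) * kg
```

Because the gate can saturate toward the purely textual representation when no entity is linked (zero entity vectors), this simple pattern is one way to inject knowledge without degrading the model's general language understanding.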

Objectives

  • Develop representation learning methods that integrate external knowledge into pre-trained language models
  • Design algorithms for low-resource and cross-lingual NLP scenarios
  • Investigate applications in biomedical, legal, and social media domains
  • Publish and release open-source tools, datasets, and benchmarks for the NLP community

Funding & Support

This project is funded by CNPq (National Council for Scientific and Technological Development) under a Productivity Grant, covering the period from March 2024 to August 2027.