KALM: Knowledge-Aware Language Models

The project “Knowledge-Enriched Representation Learning for Natural Language: Algorithms and Applications” develops algorithms that incorporate structured and unstructured knowledge sources into representation learning for natural language processing, with the aim of improving text understanding and generation.

The research spans both foundational and applied dimensions, targeting tasks such as text simplification, ESG analysis, and knowledge graph integration with contrastive reasoning, among others. A key goal is to bridge the gap between external knowledge (logical foundations, knowledge graphs) and neural representations, enabling models that are more interpretable, robust, and data-efficient.
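
As one concrete illustration of what such integration can look like, the sketch below fuses knowledge graph entity embeddings into the contextual token states of a pre-trained language model through a learned gate. This is a minimal, hypothetical example for intuition only: the class name, dimensions, and gating design are assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class KnowledgeFusion(nn.Module):
    """Hypothetical sketch: gate knowledge-graph entity embeddings into
    contextual token representations from a pre-trained language model.
    The design (projection + sigmoid gate) is illustrative, not the
    project's published method."""

    def __init__(self, hidden_dim: int, kg_dim: int):
        super().__init__()
        # Map KG embedding space into the language model's hidden space.
        self.project = nn.Linear(kg_dim, hidden_dim)
        # Gate decides, per token, how much knowledge to mix in.
        self.gate = nn.Linear(hidden_dim * 2, hidden_dim)

    def forward(self, token_states: torch.Tensor, entity_embeds: torch.Tensor) -> torch.Tensor:
        # token_states:  (batch, seq_len, hidden_dim), from a pre-trained LM
        # entity_embeds: (batch, seq_len, kg_dim), aligned entity vectors
        #                (zeros where no entity is linked to the token)
        kg = self.project(entity_embeds)
        g = torch.sigmoid(self.gate(torch.cat([token_states, kg], dim=-1)))
        # Convex combination: knowledge-aware token representation.
        return g * token_states + (1 - g) * kg
```

Because the gate can saturate toward the purely textual representation when no entity is linked (zero entity vectors), this simple pattern is one way to inject knowledge without degrading the model's general language understanding.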

Objectives

  • Develop representation learning methods that integrate external knowledge into pre-trained language models
  • Design algorithms for low-resource and cross-lingual NLP scenarios
  • Investigate applications in biomedical, legal, and social media domains
  • Publish and release open-source tools, datasets, and benchmarks for the NLP community

Funding & Support

This project is funded by CNPq (National Council for Scientific and Technological Development) under a Productivity Grant, covering the period from March 2024 to August 2027.