Paper published in JBCS on Brazil’s LLM fauna
🦜 Exploring Brazil’s LLM Fauna: Evaluating Generative Performance in Portuguese
We are excited to share our new paper, “Exploring Brazil’s LLM Fauna: Investigating the Generative Performance of Large Language Models in Portuguese.”
As Large Language Models (LLMs) become increasingly embedded in real-world applications, their evaluation still relies heavily on narrow, mostly English-centered benchmarks. These traditional evaluations often neglect essential generative aspects such as discourse coherence, adequacy, and linguistic transformations, all of which are crucial for practical use.
In this work, we provide a comprehensive evaluation of Brazilian Portuguese LLMs across three core Natural Language Generation tasks:
- 📝 Text Summarization
- ✂️ Sentence Simplification
- ❓ Generative Question Answering
We evaluate six Brazilian models and compare them with GPT-4o, combining automatic metrics, an LLM-as-a-judge framework, and human evaluation.
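To give a feel for what an automatic metric looks like in this setting, here is a minimal sketch of a unigram-overlap F1 score between a generated text and a reference, in the spirit of ROUGE-1. This is an illustration only: the post does not state which metrics the paper uses, and the function names `tokenize` and `unigram_f1` are hypothetical, not taken from the project repository.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of word characters; \w is Unicode-aware
    # for Python 3 strings, so accented Portuguese letters are preserved.
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate: str, reference: str) -> float:
    # Hypothetical helper: F1 over clipped unigram counts (ROUGE-1 style).
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "o gato dorme no sofá"
    candidate = "o gato dorme"
    print(round(unigram_f1(candidate, reference), 3))
```

Surface-overlap scores like this one are cheap to compute but say little about coherence or adequacy, which is one reason to pair them with an LLM-as-a-judge framework and human evaluation.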
🔎 Key findings:
- GPT-4o achieves the strongest overall generative performance in Portuguese.
- The Sabiá-3 family follows closely behind.
- The open-weight model Tucano stands out for its computational efficiency, making it a strong candidate for deployment in resource-constrained environments.
All experimental code is publicly available: 👉 https://github.com/MeLLL-UFF/brfauna-gen-eval
This work contributes to a broader understanding of how LLMs perform beyond English and supports more realistic, generation-focused evaluation pipelines for Portuguese NLP.