Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework

1Department of Statistics, LMU Munich, 2Munich Center for Machine Learning (MCML)

Ranking models and decoding strategies for open-ended text generation.

Abstract

Open-ended text generation has become a prominent task in natural language processing due to the rise of powerful (large) language models. However, evaluating the quality of these models and the employed decoding strategies remains challenging because of trade-offs among widely used metrics such as coherence, diversity, and perplexity.

Decoding methods often excel in some metrics while underperforming in others, complicating the establishment of a clear ranking. In this paper, we present novel ranking strategies within this multicriteria framework. Specifically, we employ benchmarking approaches based on partial orderings and present a new summary metric designed to balance existing automatic indicators, providing a more holistic evaluation of text generation quality. Furthermore, we discuss the alignment of these approaches with human judgments. Our experiments demonstrate that the proposed methods offer a robust way to compare decoding strategies, exhibit similarities with human preferences, and serve as valuable tools in guiding model selection for open-ended text generation tasks. Finally, we suggest future directions for improving evaluation methodologies in text generation.
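To make the partial-ordering idea concrete, here is a minimal illustrative sketch (not the paper's implementation) of comparing decoding strategies by Pareto dominance over several automatic metrics. The strategy names and scores are hypothetical; all metrics are oriented so that higher is better (perplexity would be negated beforehand).

```python
# Illustrative sketch: ranking decoding strategies via a partial order
# (Pareto dominance) over automatic metrics. Scores are hypothetical.

def dominates(a, b):
    """a Pareto-dominates b if a >= b on all metrics and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Hypothetical (coherence, diversity) scores per decoding strategy.
strategies = {
    "greedy":      (0.82, 0.10),
    "top_k":       (0.74, 0.55),
    "contrastive": (0.80, 0.60),
}

# A strategy is incomparable-or-best if no other strategy dominates it;
# these non-dominated strategies form the Pareto front.
front = [s for s, v in strategies.items()
         if not any(dominates(w, v)
                    for t, w in strategies.items() if t != s)]
print(sorted(front))  # non-dominated strategies
```

Under a partial order, "greedy" and "contrastive" remain incomparable here (each wins on one metric), which is exactly why a single total ranking is hard to establish and a balancing summary metric is useful.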

Our codebase, datasets, and models are publicly available.

Related Links

If you are interested in related work, please also check out our other papers.

Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation introduces adaptive contrastive search, a novel decoding strategy extending contrastive search by incorporating an adaptive degeneration penalty, guided by the estimated uncertainty of the model at each generation step.

Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation provides practical guidelines for hyperparameter tuning and demonstrates the substantial influence of these choices on text quality.

BibTeX

@article{Esteban2024towards,
  author    = {Esteban Garces Arias and Hannah Blocher and Julian Rodemann and Meimingwei Li and Christian Heumann and Matthias Aßenmacher},
  title     = {Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework},
  journal   = {arXiv preprint},
  year      = {2024},
}