Evaluation in Natural Language Generation: Lessons from Referring Expression Generation

Jette Viethen* and Robert Dale*
*Centre for Language Technology, Division of Information and Communication Sciences, Macquarie University, Sydney NSW 2109, Australia
jviethen@ics.mq.edu.au, robert.dale@mq.edu.au
Abstract
As one of the most well-defined subtasks in Natural Language Generation (NLG), the generation of referring expressions looks like a strong candidate for piloting shared evaluation tasks. However, unlike in other areas of Natural Language Processing, it is still unclear what benefit the introduction of such tasks might bring to the field of NLG. Based on an earlier evaluation of a number of well-established algorithms for the generation of referring expressions, this paper explores several problems that arise in designing an evaluation for this task, and identifies general requirements that need to be met in evaluating NLG subtasks.