The TAL journal issues a call for papers on the subject of "Machine Learning for NLP". Machine Learning is the study of algorithms that allow computer programs to improve automatically through experience (a definition proposed by Tom Mitchell in his book "Machine Learning"). This field has grown dramatically in the last few years, and its interactions with NLP are increasingly close and frequent.
From a linguistic point of view, this development offers many benefits. Manually built resources are time-consuming and expensive to produce, and the process must be started again for each distinct language and each distinct sub-domain of a language. Machine Learning offers an attractive alternative, making it possible to obtain or improve such resources at a lower cost, with better guarantees of robustness and coverage. The inductive approach, long used in the "corpus linguistics" community, can now be operationalized on a large scale, and its results rigorously tested. Formal theories of learning also contribute to the long-standing debate about natural language acquisition.
From a Machine Learning point of view, NLP is a rich application domain, with numerous and difficult problems and usually large amounts of available data. However, the results obtained are often hard to interpret. Increasingly subtle mathematical tools, accessible only to specialists, are being used: in this context, is linguistics still useful? How much confidence can a linguist place in the output of a Machine Learning system?
An issue of the electronic journal TAL will be dedicated to this theme. Beyond reports of yet another experiment applying a particular Machine Learning method to a particular linguistic task, more general theoretical and methodological reflections are encouraged. For each contribution and each method used, a special effort should be made to make explicit both the linguistic and the computational hypotheses underlying it.
The Machine Learning approach considered can be:
either theoretical, concerning learnability/non-learnability results for classes of objects, with respect to formal criteria
or empirical, based on an experimental protocol exploiting annotated data (in the case of supervised learning) or unannotated data (in the case of unsupervised learning)
The methods used can be:
symbolic (grammatical inference, ILP...)
based on probabilistic models (either generative or discriminative)
based on similarity (nearest neighbours, analogy, memory-based learning...)
Application domains can be:
acquisition or improvement of resources (including automata, grammars, subcategorisation frames, concept-based ontologies...)
corpus labeling (lexical, syntactic, functional, thematic, semantic...)
clustering and classification of texts (according to various criteria: author, content, opinion...)
information extraction (including extraction and typing of named entities)
Isabelle Tellier, LIFO, University of Orléans
Mark Steedman, ICCS, University of Edinburgh, Scotland
Contributions (25 pages maximum, PDF format) must be sent by e-mail to the following address: isabelle dot tellier at univ dash orleans dot fr. Style sheets are available here. Language: manuscripts may be submitted in English or French. French-speaking authors are requested to submit in French.
01/07/2009 Detailed summary (1 page)
06/07/2009 Deadline for submission
04/09/2009 Notification to authors
02/10/2009 Deadline for submission of a revised version
10/11/2009 Final decision
February 2010 Online publication
Scientific committee:
Pieter Adriaans, HSC Lab, University of Amsterdam, Netherlands
Massih Amini, LIP6, Paris and ITI-CNRC, Canada
Walter Daelemans, CNTS, University of Antwerp, Belgium
Pierre Dupont, University of Louvain, Belgium
Alexander Clark, Royal Holloway, University of London, Great Britain
Hervé Dejean, Xerox Center, Grenoble
George Foster, National Research Council, Canada
Colin de la Higuera, Laboratoire Hubert Curien, University of St Etienne
François Denis, LIF, University of Marseille
Patrick Gallinari, LIP6, University of Paris 6
Cyril Goutte, National Research Council, Canada
Laurent Miclet, Enssat, Lannion
Richard Moot, LaBRI/CNRS, Bordeaux
Emmanuel Morin, LINA, University of Nantes
Jose Oncina, PRAI Group, University of Alicante, Spain
Pascale Sébillot, IRISA, INSA, Rennes
Marc Tommasi, LIFL-Inria, University of Lille
Menno van Zaanen, ILK, University of Tilburg, Netherlands