The language that real-world natural language processing systems have to deal with bears little resemblance to the perfectly grammatical examples often found in linguistics textbooks. Instead, it comes to us damaged in various ways: authors introduce spelling and grammatical errors into the texts they type, speakers produce incomplete or otherwise disfluent sentences, OCR systems misrecognize the characters on the printed page, and speech recognition systems produce inaccurate hypotheses as to what was actually said.
Noisy input is a fact of life: our systems ignore it at their peril. For some applications, we require mechanisms which are robust to error; for example, a spoken language dialog system may assign a low confidence to a hypothesis, and as a consequence ask the user to repeat the utterance. For other applications, we need to make use of error correction techniques, so that, for example, an OCR system might use contextual post-processing to validate the spellings of words.
This special issue aims to bring together work on error handling in natural language processing from a range of different application areas. Many subfields of NLP have a need to do something about noise in the signal, but rarely do researchers from these diverse areas have an opportunity to compare their methods and techniques. Our aim is to juxtapose work from these different areas in order to encourage cross-fertilization of ideas.
We consider in scope for this special issue any paper that describes and discusses techniques for processing linguistic data that are in some respect noisy. The most developed subfields here are spelling correction and, to a lesser extent, grammar correction; neither of these is a completely solved problem, and as far as errors at the stylistic, semantic, and discourse levels are concerned, automated textual error correction has barely scratched the surface. Robust processing regimes, where the aim is to extract something useful from a broken input, are also of interest, for both speech and text input; more broadly, repair and recovery techniques in dialog systems are also relevant.
We encourage submissions on any aspect of natural language processing related to the handling of errors, including in particular:
automatic spelling and grammar correction
semantic and logical errors
stylistic and discourse-level correction
automatic correction of machine-produced texts (OCRs, speech transcripts, etc.)
spelling correction in web search
errors in controlled language input
acquisition, annotation and analysis of errors in real texts
errors in language learning
handling performance errors
building error corpora
text normalization issues
robust NLP techniques
handling disfluent speech
handling errors in speech recognition
confidence measure estimation
managing noise in training corpora
error metrics
errors as signatures; watermarking with errors
measuring the seriousness of errors
GUEST EDITORS
Robert Dale (Macquarie University, Australia)
François Yvon (LIMSI/CNRS and Univ. Paris Sud, France)
SCIENTIFIC COMMITTEE (TBA)
IMPORTANT DATES
Deadline for submission: October 15th, 2012
First notification to authors: December 15th, 2012
Deadline for revisions: February 1st, 2013
Final decisions: April 15th, 2013
Camera-ready: June 15th, 2013
Publication: summer 2013
THE JOURNAL
TAL (Traitement Automatique des Langues / Natural Language Processing) is a forty-year-old international journal published by ATALA (French Association for Natural Language Processing) with the support of CNRS (National Centre for Scientific Research). It has moved to an electronic mode of publication, with printing on demand (see http://www.atala.org/-Revue-TAL). This in no way affects its reviewing and selection process.
PRACTICAL ISSUES
Contributions (approx. 25 pages, PDF format) must be uploaded to the submission site (http://tal-53-3.sciencesconf.org/). Style sheets are available for download on the Web site of the journal (http://www.atala.org/-Revue-TAL). The journal publishes only original contributions, in French or in English.