Shane Bergsma* and David Yarowsky*
Improved readability ratings for second-language readers could have a huge impact in areas such as education, advertising, and information retrieval. We propose ways to adapt readability measures for users who (a) are proficient in a particular domain and (b) have a particular native language (L1). Specifically, we predict the readability of individual words. Our learned models use a range of creative features based on diverse statistical, etymological, lexical, and morphological information. We evaluate on a corpus of computational linguistics articles divided according to seven L1s; we show that we can accurately predict the target readability scores in this domain. Our technique improves over several reasonable baselines. We provide an in-depth analysis showing which kinds of information are most predictive of word difficulty in different L1s, and show how this differs for style and content words.