TD: screencast – simplifying texts for your learners using VocabProfile

In this post, I’ll show you a fairly easy way of analysing and reducing the vocabulary load of any text you want to use with your class, by changing or removing some of the low-frequency words and expressions it contains. I’ll write a bit about why it can be a good idea to try and simplify authentic texts, and list some ways you can use the techniques I describe with your students.

This tutorial is designed in part to help teachers use the texts in some of the topic-based materials I’ve collected here, though it has more general applications as well, as we’ll see!

The screencast below talks you through the stages of simplifying a text for your learners, and touches on why you might want to do so. The written text underneath it goes into the same issues in a little more depth, and adds a couple of other uses for the programs we’ll encounter along the way.

Why simplify a text for your learners?

My answer to this is bound up in the idea that some words work harder than others. For example, in the introduction to this post, the word “use” occurs three times, “texts” appears twice, and “simplify” appears only once (I used the excellent, free Textalyser to uncover these word frequencies). As we collect frequency data from greater and greater numbers of texts, we find that some words appear almost everywhere, while others are far rarer, perhaps even appearing just once every five years or so. A good collection of high-frequency words is the General Service List or GSL (West: 1953), which is a set of 2,000 headwords deemed to be of greatest “general service” to English language learners (a headword plus any inflected forms and close derivatives constitutes a word family, like “conclude, concluding, inconclusive, …”). In my own little experiments, words from the GSL made up about 75% of the words in the texts I tested, give or take 5%. If this is generally true of texts in English, it means that students who are familiar with the 2,000 word families in the GSL should at least recognise about 75% of the words in any English-language article they come across, which puts them on the road towards understanding these texts.
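If you would rather count word frequencies yourself than use an online tool, the idea can be sketched in a few lines of Python. This is only an illustration and not how Textalyser works: the function name `word_frequencies` and the sample sentence are made up for the example, and the sketch counts individual word forms rather than word families.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word form occurs, ignoring case and punctuation."""
    # Treat runs of letters (optionally joined by an apostrophe or hyphen) as words.
    words = re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())
    return Counter(words)

# Invented sample sentence, chosen to mirror the counts mentioned above.
sample = "You can use short texts. Use texts you can simplify, then use them again."
freqs = word_frequencies(sample)

for word in ("use", "texts", "simplify"):
    print(word, freqs[word])
# use 3
# texts 2
# simplify 1
```

A real frequency list would also group forms such as “simplify” and “simplifying” into one family, which this sketch does not attempt.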

Here’s an example. The text below is the first few paragraphs of this article from the BBC:

The US unemployment rate dropped sharply to 8.6% in November, its lowest level in two-and-a-half years, from 9% the month before, official figures show.

The US economy added 120,000 new jobs in November, the Department of Labor said, in line with forecasts.

The number of jobs created in September and October was revised up by 72,000.

The US has struggled for many months with slow growth while the unemployment rate has remained stubbornly high.

One of the reasons for the sharp drop in the unemployment rate in November was the large number of people who gave up looking for work, and therefore were no longer counted as part of the workforce.

The report helped the US market to open higher, with the Dow Jones index climbing 0.8% in early trading.

Vocabulary researchers Paul Nation and Peter Gu have written that, for “reasonable unassisted comprehension” of a text, a learner should be familiar with 95-98% of its words (Nation and Gu: 2007, p.23). If I paste the above excerpt into the English Web VocabProfiler, I discover that 88.32% of its words appear in the first thousand of the GSL, 2.92% appear in the second thousand, and 5.11% appear in the Academic Word List. That is, if learners are familiar with both the GSL and the Academic Word List, and no other word families, they’ll still recognise 96.35% of the words in the text.
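The arithmetic behind that figure is simple enough to check by hand, but here it is as a small Python sketch. The percentages are the ones reported above for the BBC excerpt; the names `bands` and `known_coverage` are just labels chosen for this example, and the 95% figure is the lower end of Nation and Gu’s range.

```python
# Band percentages reported above for the BBC excerpt.
bands = {"K1": 88.32, "K2": 2.92, "AWL": 5.11}

def known_coverage(band_percentages, known_bands):
    """Add up the share of the text covered by the bands a learner knows."""
    return round(sum(band_percentages[name] for name in known_bands), 2)

coverage = known_coverage(bands, ["K1", "K2", "AWL"])
off_list = round(100 - coverage, 2)   # words outside all three lists
meets_threshold = coverage >= 95      # lower end of the 95-98% range

print(coverage, off_list, meets_threshold)
# 96.35 3.65 True
```

A learner who knows only the GSL (K1 and K2) would be at 91.24% for this excerpt, which is below the 95% mark.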

Of course, there’s a lot more to vocabulary than merely a list of words, and recognising most of a text does not equal understanding it: in the extract above, learners may recognise “in line with” as separate words, but not as a fixed expression; likewise, metaphors, idioms, phrasal verbs, and words with different meanings may all present problems; also, it might be that a key word is amongst the 3.65% of unfamiliar ones. However, it seems that, the more learners get acquainted with words on the GSL and the Academic Word List, the easier a time they will have decoding most texts in English, whether written or spoken. It may not even be advisable to spend time in class teaching or focussing on words which are not on these lists, at least with students at intermediate level and below, but rather let them explore low-frequency words for themselves and in their own time – perhaps by guessing in context (assisted by a dictionary) or by looking at the familiar elements of the unknown word, or by using a concordancer.

Context will affect this recommendation to an extent: if you are teaching English for specific purposes (English for nurses, or businessmen), some of the most useful word families for your students may not be on either the GSL or the Academic Word List. However, such lists could be modified for specific lexical domains, and many of their listed words will still occur within the GSL or the Academic Word List.

How to simplify authentic texts

If it is generally a good idea to focus on higher-frequency words in class, and if we want our learners to achieve reasonable, unassisted understanding of a text, we should simplify that text as necessary. The aim is for it to conform to the word families in the GSL and the Academic Word List (or to our modified versions of these lists) and, as far as possible, to our learners’ progress within them, whether that progress comes from explicit teaching or from exposure through texts, discussions, and so on.

Here are the steps I follow if I want to simplify a text for general English classes (please note, I go into this part in more detail in the screencast, above):

  1. Highlight and copy the text, and paste it into the Web VP v.3 vocabulary range finder, then press “submit”;
  2. Check the percentage of “K1 Words” in my text, as per the image below (K1 Words are the first 1,000 words on the General Service List [see above]). If this is less than about 85%, I’ll probably try to find a different text, else there will be too many low-frequency words for my class to focus on;

     [Image: Web VP vocabulary range analysis screen]

  3. Assuming all is well with the number of K1 words, open up a word processing program and copy my text into it, then go back to Web VP and scroll down until I see the colour-coded text (see the image below; light blue = K1 words, green = K2 words [also from the GSL], yellow = words from the academic word list, red = lower-frequency words). Then amend, delete or leave in place the lower-frequency words, as appropriate for my class.

     [Image: The second Web VP vocabulary range analysis screen]
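For readers who like to see the logic spelled out, the steps above can be roughly sketched in Python. This is not how Web VP itself is implemented. The tiny word sets below are placeholders invented for the example (they are not the real contents of K1, K2 or the Academic Word List), the sample sentence is made up, and the sketch matches exact word forms rather than word families.

```python
import re

def tokenize(text):
    """Split a text into lower-case word forms."""
    return re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())

def profile(text, bands):
    """Return (percentage of tokens per band, sorted list of off-list words).

    `bands` maps a band name to a set of word forms; the first band
    that contains a token claims it.
    """
    tokens = tokenize(text)
    counts = {name: 0 for name in bands}
    counts["off-list"] = 0
    off_list = set()
    for token in tokens:
        for name, words in bands.items():
            if token in words:
                counts[name] += 1
                break
        else:
            # No band contained this token.
            counts["off-list"] += 1
            off_list.add(token)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    percents = {name: round(100 * n / total, 2) for name, n in counts.items()}
    return percents, sorted(off_list)

# Placeholder bands: invented for illustration, NOT the real lists.
toy_bands = {
    "K1": {"the", "rate", "has", "high", "for", "many", "months", "now"},
    "K2": {"remained"},
    "AWL": {"revised"},
}

percents, to_review = profile(
    "The rate has remained stubbornly high for many months now.", toy_bands
)
k1_ok = percents["K1"] >= 85  # the rough cut-off from step 2

print(percents)   # {'K1': 80.0, 'K2': 10.0, 'AWL': 0.0, 'off-list': 10.0}
print(to_review)  # ['stubbornly']
print(k1_ok)      # False
```

The `to_review` list plays the part of the red words in the colour-coded text: these are the candidates to amend, delete or leave in place.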

Some uses for the Web VP application

Apart from simplifying texts, here are two useful things you or your students can do with Web VP and related programs:

  • Analyse texts

    This is what Web VP was written for, after all; and it can be an especially useful way for students to find out what level and kinds of vocabulary they need to focus on for English language exams (there are plenty of websites where you can see exam reading and listening texts – here’s a Google search for IELTS reading texts, and here’s another for FCE listening transcripts, for example).

    One problem with using Web VP v.3 to check the vocabulary range needed to understand exam texts is that it only has four frequency bands: K1 and K2 (from the GSL); words from the Academic Word List; and lower-frequency words. You or your students may prefer to use the Web VP BNC-20 application, which analyses and colour-codes texts according to a 20-band frequency list from the British National Corpus (note that Web VP BNC-20 doesn’t include the Academic Word List). You can access the BNC-20k word lists here.

  • Analyse your learners’ texts

    If your students submit digital versions of first or final drafts of their written work, you or they can copy and paste these into Web VP v.3 or Web VP BNC-20 to see the level of their written production. The more advanced the student’s English lexicon, the greater the number of words outside the K1 and K2 ranges, or outside the first 2,000 most frequent words of the BNC-20 list. An intermediate-level learner’s written work might have 15% of its different words (called “word types” or “types” in the Web VP programs) outside K1 and K2, for instance. Note, however, that your students will need to submit texts of 300 words or more to get a statistically meaningful result!

    You or your students can check their work regularly over a longer period of time (say, 10 months) to chart their progress towards proficiency. Note, though, that this method generally records slower progress than either the (receptive) Vocabulary Levels Test or the Productive Levels Test (Laufer and Paribakht: 1998, Laufer: 1998), and it will definitely record slower progress than any regular progress tests you devise for your class. So don’t let your students get too disheartened if you use this approach, and don’t rely on it as your only way of measuring vocabulary development!
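The “types outside K1 and K2” figure for a learner’s text can be sketched in Python as follows. Again, this is an illustration under stated assumptions and not the Web VP calculation itself: the function name `share_of_types_outside` is invented, `known_words` stands in for whatever list of K1 and K2 word forms you have to hand, and the 300-word minimum simply encodes the guideline mentioned above.

```python
import re

def share_of_types_outside(text, known_words, min_tokens=300):
    """Percentage of different words (types) not found in `known_words`.

    Returns None when the text is shorter than `min_tokens` running words,
    since the figure is not meaningful for very short texts.
    """
    tokens = re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())
    if len(tokens) < min_tokens:
        return None
    types = set(tokens)  # each different word form counted once
    outside = {word for word in types if word not in known_words}
    return round(100 * len(outside) / len(types), 2)

# Tiny demonstration with the length check relaxed; the word set is a placeholder.
demo_known = {"the", "cat", "sat", "on"}
print(share_of_types_outside("the cat sat on the mat", demo_known, min_tokens=1))
# 20.0
```

Running the same function on drafts collected over several months would give the series of figures you need to chart a learner’s progress.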

Laufer, B, The development of passive and active vocabulary in a second language: same or different? (Applied Linguistics vol. 19, 1998, pp. 255-271)

Laufer, B and Paribakht, TS, Relationship between passive and active vocabularies: effects of language learning context (Language Learning vol. 48, 1998, pp. 365-391)

Nation, Paul and Gu, Peter Yongqi, Focus on Vocabulary (Sydney: Macquarie University Press, 2007).