Cambridge English Language Teaching  
Cambridge International Corpus

What is it?

The Cambridge International Corpus (CIC) is a very large collection of English texts, stored in a computerised database, which can be searched to see how English is used. It has been built up by Cambridge University Press over the last ten years to help in writing books for learners of English. The English in the CIC comes from newspapers, best-selling novels, non-fiction books on a wide range of topics, websites, magazines, junk mail, TV and radio programmes, recordings of people's everyday conversations and many other sources.

How big is it?

The corpus resources of Cambridge University Press are huge and currently include over one billion words, and the CIC continues to expand each year as new data is added. This gives us access to:
British English

  No. of words    Corpus
  700 million     Written British English
  18 million      Spoken British English, including the unique CANCODE corpus, collected jointly by Cambridge University Press and the University of Nottingham
  20 million      Written British academic English
  60 million      Written British business English
  1 million       Spoken British business English (CANBEC, the Cambridge and Nottingham spoken Business English Corpus)

American English

  No. of words    Corpus
  275 million     Written American English
  30 million      Spoken American English, including the Cambridge-Cornell Corpus of Spoken North American English, collected jointly by Cambridge University Press and Cornell University in the United States
  9 million       Written American academic English
  40 million      Written American business English

Learner English

  No. of words    Corpus
  30 million      Learners' written English (the Cambridge Learner Corpus)
  15 million      Error-coded learner written English

Cambridge University Press is also a consortium member of the American National Corpus, which will substantially increase our holdings of American spoken and written data over the next few years.

Analysis of the CIC can give accurate information because of the huge size of the corpus and because the types of text are so wide-ranging and varied. We are able to avoid the pitfall of skewed results encountered when smaller amounts of data are used.

The CIC is continuing to grow as new collaborative ventures and joint research projects are initiated. This means that Cambridge University Press can remain at the forefront of corpus-based ELT publishing, with up-to-the-minute resources on which to base publications for students and teachers.

Who can use the CIC?

At the moment, because of third-party restrictions, the CIC can only be used by authors and writers working on books for Cambridge University Press.

How is the CIC used?

Authors, editors and lexicographers use the CIC and Cambridge Corpus Tools (a state-of-the-art software program developed at Cambridge University Press) when they are working on books for Cambridge University Press. They can search the CIC to find examples of how English is used and to check facts about the English language.

Course book writers, for example, are able to delve into the wealth of data available and find plenty of authentic, real-life examples, which appeal to students and are relevant to their interests.

Writers of grammar and vocabulary books, too, can use the CIC to look at grammatical constructions, at words and meanings and how they are changing, and at how we use phrases and groups of words. They can look at the frequency of words and see which words are used most commonly in different contexts.

For example, they can compare spoken and written English in the Cambridge International Corpus to find out whether a particular word or phrase is used more commonly in speech or writing.
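A comparison of this kind can be sketched in a few lines of Python. This is only an illustration of the general idea, not Cambridge Corpus Tools, whose interface is not described here. The two sample texts and the function names `tokenize` and `per_million` are invented for the example. Because the spoken and written parts of a corpus differ greatly in size, the sketch compares rates per million words rather than raw counts.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def per_million(word, tokens):
    """Return how often `word` occurs per million tokens."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] * 1_000_000 / len(tokens)


# Invented toy samples standing in for spoken and written sub-corpora.
spoken_sample = "well I mean it was kind of nice well you know"
written_sample = "The committee concluded that the proposal was well founded"

spoken = tokenize(spoken_sample)
written = tokenize(written_sample)

spoken_rate = per_million("well", spoken)
written_rate = per_million("well", written)

if spoken_rate > written_rate:
    print("'well' is relatively more frequent in the spoken sample")
else:
    print("'well' is relatively more frequent in the written sample")
```

With samples this small the result means nothing in itself; the point is that normalising by corpus size makes an 18-million-word spoken collection comparable with a 700-million-word written one.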

Our dictionary writers, like all our authors, benefit from the sophisticated software developed by Cambridge University Press, which allows them to search the corpus and analyse the results in depth. The results of such analysis are incorporated into specially designed usage notes and study pages in Cambridge dictionaries. The English language is constantly evolving and the continuing growth of the CIC means that we can monitor trends in English and see, for instance, which new words are only short-lived and which are adopted into English on a more permanent basis. In addition, dictionary examples illustrating word use can be taken from the corpus, making them sound natural and realistic.

All this enables the writers of Cambridge materials and Cambridge dictionaries to portray the English language more vividly and more accurately than ever before.

The Cambridge International Corpus is an important resource behind Cambridge publications and means teachers and students can depend on Cambridge University Press for the highest quality materials for English Language Teaching today.
