Cambridge International Corpus: Cambridge and Nottingham Corpus of Discourse in English

What is it?

CANCODE is the Cambridge and Nottingham Corpus of Discourse in English. It is a unique collection of spoken English that has been built up by Cambridge University Press and the University of Nottingham. It forms part of the Cambridge International Corpus. The recordings were collected in Britain between 1995 and 2000, keyboarded by trained transcribers, coded, and stored in a computerised database which can be searched with specially designed software. CANCODE comprises 5 million words.

About the recordings

The tapes for CANCODE were recorded at hundreds of locations across the British Isles. They include a wide variety of situations: casual conversation, people working together, people shopping, people finding out information, discussions, and many more types of interaction. Only spontaneous speech is found in the CANCODE corpus.

How is it different?

A feature of CANCODE that makes it different from other spoken corpora is that all the recordings have been coded according to the relationship between the speakers: whether they are intimates (living together), casual acquaintances, colleagues at work, or strangers.

This coding allows us to look more closely at how different levels of familiarity (formality) affect the way in which we speak to each other.

How is it used?

The University of Nottingham use CANCODE for many kinds of research into spoken English.

At Cambridge University Press, authors, editors and lexicographers use CANCODE together with the Cambridge Corpus Tools (a state-of-the-art software package developed in-house) when they are working on the Press's books. They can search CANCODE to find examples of how English is spoken today and to check facts about what people really say when they talk to each other.

Here's what a lexicographer would see if they were writing the dictionary entry for 'know' and wanted to see how people use the word 'know' when they are speaking.

The lexicographer can take any citation and look at more of the context than is shown here. They can also sort the citations in many different ways to help them analyse the use of the word and its surrounding constructions.

See how many different types of use of the word 'know' you can find in this sample. Compare your answer with the different meanings in the Cambridge Learner's Dictionary, or use Cambridge Dictionaries Online.
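
To make the idea of searching for citations and sorting them more concrete, here is a minimal sketch of a keyword-in-context search in Python. It is not the Cambridge Corpus Tools, whose workings are not described here. The names Citation, concordance, sort_by_right_context and sort_by_left_context are hypothetical, and the sample utterances are invented for illustration rather than taken from CANCODE.

```python
import re
from typing import List, NamedTuple


class Citation(NamedTuple):
    """One occurrence of the search word with its surrounding context."""
    left: str     # words before the search word
    keyword: str  # the search word as it appeared in the utterance
    right: str    # words after the search word


def concordance(utterances: List[str], keyword: str, width: int = 4) -> List[Citation]:
    """Collect every occurrence of keyword with up to `width` words either side."""
    citations = []
    for utterance in utterances:
        # Split into words, keeping apostrophes so that "don't" stays whole.
        words = re.findall(r"[\w']+", utterance)
        for i, word in enumerate(words):
            if word.lower() == keyword.lower():
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                citations.append(Citation(left, word, right))
    return citations


def sort_by_right_context(citations: List[Citation]) -> List[Citation]:
    """Order citations alphabetically by what follows the search word."""
    return sorted(citations, key=lambda c: c.right.lower())


def sort_by_left_context(citations: List[Citation]) -> List[Citation]:
    """Order citations by the words before the search word, nearest word first."""
    return sorted(citations, key=lambda c: c.left.lower().split()[::-1])


# Invented utterances for illustration only; these are NOT CANCODE data.
SAMPLE = [
    "I don't know what you mean",
    "You know it was really cold",
    "Do you know her sister",
    "I know I said that",
]

if __name__ == "__main__":
    for c in sort_by_right_context(concordance(SAMPLE, "know")):
        # Right-align the left context so the search word lines up in a column.
        print(f"{c.left:>25}  {c.keyword}  {c.right}")
```

Sorting by the words that follow the search word groups similar constructions together (for example 'know what' and 'know her'), while sorting by the preceding words brings out patterns such as 'you know' and 'don't know'.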

Which Cambridge publications use the Corpus?