Corpus workshop SWET 2019

All materials at <http://materials.stoptothink.org>

Register

https://www.english-corpora.org (BYU corpus)

Today we will be looking at some examples from the BYU corpus, using the “English Corpora” site, hosted by Professor Mark Davies at Brigham Young University in the US. The site is free, and you can make six searches per day without registering. If you register, though, you can make more searches. Registering is very easy and safe, and the site does not send any spam email. 

To register, go to https://www.english-corpora.org   

Click the “my account” tab, and then “register”.

Enter your details. Select “teacher, not university”.

Check your email for the confirmation message, then click the link in it.

An introduction to simple corpus searches

1. Corpus basics

The basics of using a real corpus as a tool for language learning: how a corpus works, what can be concluded from the search results, and the four steps of corpus investigation.

How a corpus works. A corpus is a collection of texts and/or speech in electronic form. Each word in the texts is tagged for the part of speech it represents, so that, for example, “research” can be studied as a noun or as a verb. When you look up a word, the results are usually presented as a KWIC (Key Word In Context) list, although other displays are also possible. The KWIC list shows how the word you searched for is used in real sentences, and studying the sample gives you a wide range of information about that particular language item.
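
To make the idea of a KWIC list concrete, here is a minimal Python sketch. It is only an illustration, not how the BYU site itself works: the function name `kwic`, the `width` parameter, and the four sample sentences are all made up for this handout. It finds every whole-word occurrence of a keyword and prints it with a fixed amount of context on each side, so the keyword lines up in a column.

```python
import re

def kwic(text, keyword, width=30):
    """Return one aligned line per whole-word, case-insensitive match of keyword."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Take up to `width` characters on either side of the match.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keyword forms a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented sample text (not taken from any corpus).
sample = (
    "Take a look at this. She gave him a puzzled look. "
    "Look, I never said that. They look after the children."
)

for line in kwic(sample, "look", width=25):
    print(line)
```

Reading down the aligned column is what makes patterns such as “look at” and “look after” easy to spot. Note that this sketch matches only the exact form “look”, not “looks” or “looking”.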

Two corpora, similar search interface

1. The British National Corpus (BNC, 1980s-1993) of 100 million words

2. Corpus of Contemporary American English (COCA, updated twice a year).

WARM-UP. When just one look can speak volumes

To get familiar with the BYU interface, let’s start with a simple search. Please read the instructions carefully, and then go to the BNC/COCA through a link on the left.

Type the word “look” in the search string field. Press Enter or click the search button.

Click on the item to open a KWIC-list of 100 sample sentences in the lower frame. You can get more results by clicking the link at the top of the KWIC-list: examine at least the first 200 of the search results.

Notice also that by clicking the item number in the KWIC-list (the first column on the left) you get the expanded context (a larger piece of text and references) for the phrase in question.

Now observe the sample sentences and answer the questions below.

1. What prepositions can “look” as a verb take? 

2. How does the meaning of the verb change according to the preposition it takes?

3. Which adverbs and adjectives occur with “look”?

4. “Look” can also be a noun. Which adjectives are used to modify the noun? In what type of texts do you think you could find these examples?

5. What verb is often used before the noun “look”?

6. Find at least three examples of “look” used as part of a fixed expression:

7. “Look” is sometimes used at the beginning of a sentence. Based on the examples you found, what can be said about the context in which it can be used? (e.g. written/spoken language, academic/non-academic context)

Was that it? No, there is much more to corpora than this. The purpose of this exercise was to get you started with corpus searches and to illustrate the wide range of information that a simple search for a single word can bring. By looking into the search results instead of just looking at them, you can really get to know a word. In a way, a corpus offers a shortcut to the kind of knowledge that native speakers have of their language.

The BNC/COCA functions

The corpus interface enables you to work on the English language in several ways. With BNC and COCA you can

  • search by word, phrase, part of speech (e.g. adjectives, prepositions) and lemma (e.g. all forms of be: am, are, were, being)
  • find and compare synonyms of a given word
  • find words that collocate (group together, are used side by side)
  • explore the usage (context, genre, collocates) of a word/expression
  • compare the use of words and their collocates across time periods and genres
  • find words that stem from a specific word, i.e. word families (e.g. conclude: inconclusive, conclusion)
  • explore a genre for its specific features (e.g. typical verbs in Academic English)

All these functions and possibilities serve the same end: researching the language to better understand the meaning, usage and context of words.
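
Two of these functions, searching by lemma and part of speech and finding collocates, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the BNC/COCA implementation: the hand-tagged mini corpus `TAGGED`, the tag labels, and the helper names `find_lemma` and `collocates` are all hypothetical.

```python
from collections import Counter

# Hypothetical hand-tagged mini corpus: (word form, lemma, part of speech).
TAGGED = [
    ("She", "she", "PRON"), ("looked", "look", "VERB"), ("at", "at", "PREP"),
    ("him", "he", "PRON"), ("and", "and", "CONJ"), ("he", "he", "PRON"),
    ("looks", "look", "VERB"), ("for", "for", "PREP"), ("a", "a", "DET"),
    ("new", "new", "ADJ"), ("look", "look", "NOUN"), ("while", "while", "CONJ"),
    ("looking", "look", "VERB"), ("at", "at", "PREP"), ("her", "she", "PRON"),
]

def find_lemma(tokens, lemma, pos=None):
    """Positions of every form of a lemma, optionally limited to one part of speech."""
    return [i for i, (_, lem, tag) in enumerate(tokens)
            if lem == lemma and (pos is None or tag == pos)]

def collocates(tokens, lemma, pos=None, span=1):
    """Count the words found within `span` tokens to the right of each hit."""
    counts = Counter()
    for i in find_lemma(tokens, lemma, pos):
        for word, _, _ in tokens[i + 1:i + 1 + span]:
            counts[word.lower()] += 1
    return counts

# All verb forms of the lemma "look", whatever their ending.
verb_hits = find_lemma(TAGGED, "look", pos="VERB")
print([TAGGED[i][0] for i in verb_hits])

# Words that directly follow the verb "look", most frequent first.
print(collocates(TAGGED, "look", pos="VERB").most_common())
```

Because the search runs on the lemma and tag columns rather than on the word form, one query finds “looked”, “looks” and “looking” while leaving out the noun “look”. That is the practical benefit of the part-of-speech tagging described earlier.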