Corpus data: however

A look at the difference between “but” and “however”

August 6, 2018 | language, corpus

For teachers, it’s a common situation: a student wants to know the difference between two similar words. Often, you just don’t know the answer. Most of the time, all you can do is use your own experience to make a guess.

This isn’t usually a great way to solve this problem—sometimes our intuition might just be wrong! Luckily, we can use something called a corpus—a collection of text, like magazine articles, research papers, and even transcriptions of people speaking to each other—to answer these sorts of questions.

In this article, I will briefly show how to use a corpus to compare two words. I will look at different registers, which are different styles of language, such as the style used to write a newspaper article versus the style used to talk to a friend. The two words I will look at are “but” and “however”.

TL;DR? Skip to the conclusion, but I’ll bet you’re nerdy enough to continue if you’re here in the first place.

For one thing, “however” and “but” belong to different grammatical categories (“however” is a conjunctive adverb, while “but” is a coordinating conjunction), so they are used differently. But to an ESL student, knowing that the grammar is different doesn’t explain when and how to use each word. This problem is made worse by the fact that many English textbooks and courses completely ignore the question of when and how.

I decided to put aside questions of grammar and use Brigham Young University’s Corpus of Contemporary American English (COCA) to look at different registers. This corpus includes transcriptions of spoken language. Because it contains language from many different registers, it was the right tool for this question.

I wanted to look at different registers for the two words, so I just entered them in the search box like so:

the search box for comparing different sections of the corpus

For me, searching does take some time because I live in China. (All websites outside of China are slow here.) Once the search is complete, it automatically shows me the results.

Results for “but”

Here is what a search for “but” returns:

register comparisons for the word “but” in American English

It’s clear that “but” is used most often in speech and least often in academic writing. Overall, though, we can see from the normalized word count (PER MIL; see note 1) that it’s a common word. Go figure.
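To make the idea of a normalized count concrete, here is a minimal Python sketch of a per-million calculation. The function name `per_million` and the toy numbers are my own illustration, not COCA data and not how COCA computes its figures internally; the point is only that dividing by corpus size lets you compare sections of different sizes.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Return how many times a word occurs per one million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply first so the division happens only once.
    return raw_count * 1_000_000 / corpus_size


# Hypothetical example: a word seen 8,968 times in a 2-million-word sample.
# These numbers are invented for illustration; they are not taken from COCA.
toy_count = 8_968
toy_corpus_size = 2_000_000

print(per_million(toy_count, toy_corpus_size))  # 4484.0
```

Because every section is scaled to the same base, a small section (say, a few million words of fiction) can be compared fairly with a much larger one.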

Results for “however”

“However” is different:

the same comparison for the word “however”

The most obvious difference is that it’s mainly used in academic writing and, to a lesser extent, in magazine articles. This means that students are unlikely to encounter this word often until they are at a level to read magazine articles in English.

The other thing that’s important to note is that “however” is a full 13 times less common than “but” in American English.
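For a rough sense of scale, the two figures given in this article (4,484 occurrences of “but” per million words, and the factor of 13) can be combined in a short Python sketch. The result is only an estimate derived from those two rounded numbers, not the exact figure reported by COCA, and the helper name `estimate_per_million` is hypothetical.

```python
BUT_PER_MIL = 4484  # "but" per million words, as reported in this article
RATIO = 13          # "however" is about 13 times less common (this article)


def estimate_per_million(reference_per_mil: float, times_less_common: float) -> float:
    """Estimate a word's per-million frequency from a reference word and a ratio."""
    if times_less_common <= 0:
        raise ValueError("times_less_common must be positive")
    return reference_per_mil / times_less_common


# An estimate only: derived from the rounded ratio, not read off the COCA chart.
however_estimate = estimate_per_million(BUT_PER_MIL, RATIO)
print(f"Estimated 'however' per million words: {however_estimate:.0f}")
```

In other words, a learner would meet “but” thousands of times in a million words of English, and “however” only a few hundred times.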

Making statistically informed decisions in the classroom

For students whose primary goal is oral communication, teachers shouldn’t put too much emphasis on “however” because students are unlikely to need to say it. “But” is 13 times more common. Students are likely to come across “however” in their reading, though, so at most they need to be familiar with what it means.

If teachers do teach “however” to students, we should teach them not just about the form of the grammar but also about when to use the word. Exercises and examples should be based on reading and writing magazine articles.

At the very least, teachers should tell their students that we don’t use “however” much when we speak; it’s mainly used in writing.

A corpus like COCA is a useful tool for answering questions like these. It gives teachers information that they can’t really get anywhere else. If we can avoid teaching students vocabulary and grammar that aren’t useful to them, then we become better teachers. Using COCA to see the difference between “but” and “however” is just one example of how teachers can use a corpus to make their jobs easier.

Have any comments, questions or rants to send my way? If you know me personally, just give me a call or send me a text. Otherwise, you can email me, but I may or may not receive the message.

  1. When we “normalize” a number, it means we use math to make the numbers easier to compare with each other. Each word count is in the form of “per million words”: in one million words, you will see “but” 4,484 times.

Randy Josleyn teacher-linguist-guitarist wannabe