Professor Claire Bowern of the Yale University Linguistics Department is leading a ground-breaking North American Dialects Project that aims to create a snapshot of the demographic and geographic diversity of North American speech by collecting 5,000 samples. Volunteer participants can contribute their voices through the project website.
The project has collected its first 1,500 voice samples using Evoca’s voice-to-web recording service. Bowern has embedded the Evoca browser mic on the project’s dedicated website, so anyone with a computer mic can simply click the record button and speak into the microphone. Most laptops have built-in mics, and an external mic can be plugged into any computer. Skype users can use their headsets.
Voices are instantly recorded and saved online as MP3 files in the project’s account in the Evoca system, and then analyzed by Bowern and other linguistics experts. Bowern reports that Yale students have been actively involved in the project as capable researchers and enthusiastic voice sample contributors.
Murem Sharpe, Evoca CEO, interviewed Professor Bowern about the project and her professional background in linguistics.
Bowern described the value of using Evoca with the words “…we can’t interview people in person. Otherwise that would take years and years, and maybe the dialects would change before we were able to get a lot of data together,” and added, “…we were looking to use a web-based solution to make recordings.”
“The North American Dialects Project as led by Professor Bowern is an excellent application of Evoca’s web-based voice recording production services,” commented Murem Sharpe, Evoca CEO. Sharpe noted that “Evoca Express is easy to use, and at the same time, highly secure and reliable. These features are practical for linguists in carrying out their important work in language documentation, study, and analysis.”
While pursuing her PhD in Linguistics at Harvard University, Bowern specialized in the documentation and study of endangered Aboriginal Australian languages of the Pama-Nyungan and Nyulnyulan families in the northern part of her home country, Australia.
A transcript of Sharpe’s interview with Professor Bowern, produced with the Evoca transcription service, is provided below.
Interview with Professor Claire Bowern, Yale University by Murem Sharpe, CEO, Evoca
Murem Sharpe: I’m here this morning with Professor Claire Bowern of Yale University. This is Murem Sharpe, the CEO of Evoca. Good morning, Professor Bowern.
Claire Bowern: Good morning.
Murem Sharpe: Thank you for this opportunity, first of all, to talk with you as an Evoca subscriber, but also, most importantly, to hear about the work that you’re doing in linguistics. Could you please tell us a bit about your background and how you came to be a professor of linguistics at Yale University?
Claire Bowern: Sure. I’m always happy to talk about linguistics and things like that. I’m originally from Australia, as you might be able to hear from my accent. It’s not a typical North American accent. I came to the US to go to graduate school in linguistics some time ago now. Once I got my PhD, I decided to stay in the US, and I worked at Rice University for a couple of years before coming to Yale a couple of years ago.
I work on a whole bunch of different things that relate to language in different ways, mostly historical linguistics: how language has changed over time, how we can use the information from language to find out about the past, and what correlations there are between different linguistic features and how those might vary with things like geography. And that’s how I got interested in North American dialects.
Murem Sharpe: Very good. How do you, in the best sense of the word, use Yale University and its students and resources in your projects?
Claire Bowern: Yale is a wonderful place to work. It has a really good set-up for doing research, with strong research infrastructure and that sort of thing. But the students are also very motivated to participate in research projects, pretty much from the moment they arrive on campus, it seems.
So we have a pool of undergraduates who are very keen to get involved in research and that means both being research participants and helping with the analysis of the data and the coding of the data, and things like that. I also work with graduate students who work on their own projects and they work with me on things related to what I do as well.
Murem Sharpe: Very good. Now, for the North American Dialect Project in particular, what was your purpose in organizing the project, and could you go right into how you organized it? We would certainly appreciate your mentioning how you’re using Evoca’s technology in it, but also the other technologies you’re using.
Claire Bowern: Sure. This work I should mention is being done in collaboration with some colleagues in Auckland, New Zealand, some evolutionary biologists who are interested in different ways that we can look at language and how language has changed, and the similarities between changes in language and changes in biological systems. So this is part of a much larger set of projects and set of ideas for work.
We were looking at language in general and dialects in general, not just English dialects. And we were interested in thinking about the different ways that we can categorize variation. We know that people speak differently, and the same person can also speak differently to different people. I don’t talk to my grandmother the same way that I talk to my husband, for example. And that’s true for basically everyone.
So we were interested in seeing how we could categorize some of this variation if we looked at a particular geographical area, say a city like New Haven, where Yale is located, or a city like New York, and tried to categorize the different ways in which people speak the same and also differently.
We know from the wider linguistic literature that language varies by age, so younger people can speak differently from older people. We know that ethnicity plays a role in dialect formation. That men and women can speak differently. And of course also, particularly in the US, there’s a location-based difference as well, so people from New York sound different from people from Georgia, or people from Houston, or people from LA.
But all of these factors get mixed in together. Previous work on US dialects tended to try to keep these factors as separate as possible, so researchers would record, say, only white people or only African Americans or only Hispanics. They’d try to record older people rather than younger people. They’d try to record people who had spent their entire lives in the same city and not moved around, to get a better handle on the way that geography works in categorizing different dialects.
So what we wanted to do was something different. We wanted to take the variation as read. We recognize that in a place like New York there are a lot of differences: people who live in New York have contact with many people who speak English in different ways. Then we wanted to see whether we could still extract geographical signals when we look at all of the other ways that the language varies.
Now, in order to do that, we need a very large sample of people. Previous large samples of English dialect recordings have covered about 300 to 500 people. We need recordings in the thousands to do this. That means we can’t interview people in person; otherwise it would take years and years, and maybe the dialects would change before we were able to get a lot of data together.
So we were looking to use a web-based solution to make recordings. We also needed the survey to be fairly short; otherwise people wouldn’t want to participate. And we needed to keep it fairly simple so that we would be able to get high-quality data from a lot of people without having to explain at length what they needed to do.
So something like Evoca’s web recorder that we can embed in a website has been extremely useful for us. It was the only way that we’d be able to do this project. We wouldn’t be able to do it without something like this.
So what we do is this: we have an Evoca recorder that’s linked to my Evoca account. We have a website with instructions for what people need to read, and then they enter some information about themselves: their age, gender, ethnicity, current zip code, where they went to high school, and whether they speak another language as well. Then they record the items. The items are counting from one to ten and saying a sample of words, about 60 words in total, I think. Then they stop the recording and send it to us, and we have another data point.
So far we have about 1,500 responses. We’re hoping for probably about four or five thousand from the U.S. and Canada. So we’re well on the way but we still have some way to go.
Murem Sharpe: Well, I’m certainly hopeful that in broadcasting this interview so to speak, and some information around it in our blog articles and marketing communications that we may be able to help increase that number, so that you can achieve your objective of thousands of samples.
Claire Bowern: That would be absolutely wonderful.
Murem Sharpe: Yes. And because Evoca is a software company, a technology company, we also respect the software and technology of other companies. What is the name of the software that you’re using to do the analysis?
Claire Bowern: Oh, yes, I should tell you a bit about that. Once we’ve made the recordings using Evoca and downloaded them from the Evoca site, I use a free program called Praat, which is widely used in linguistic analysis. It’s a speech analysis tool.
So I load the recordings into that and I tag them for the words that were spoken. And then there are various things that we can do with the sounds to categorize how the different vowels sound.
Just to give a very brief example, someone from Texas is likely to say something like “mah” for what I would say is “my” as in “my mother” or “my uncle”. So the vowel in “my” is quite different for someone from Texas, than from, say, someone from New York.
There are ways that we can measure that difference. The speech analysis program that I use extracts numerical measurements for a range of these vowels, and then we can put those numbers into a spreadsheet program and do statistical analysis on them.
Murem Sharpe: Good. And where do you expect to publish the results of this project, the North American Dialect Project?
Claire Bowern: We have a couple of ideas for papers and publications. We expect to write a general summary paper for a general publication such as “Language.” That’s the main journal of the Linguistic Society of America, and it’s where a lot of current research in linguistics is published.
I’m also planning to do a general summary of the methods and the results and some of the maps that we’re going to create. And I’m going to put that on the website with the dialect project so that the people who’ve contributed will be able to see what the results are. And that will be a very non-technical discussion of the results and what we can conclude from them. There are other things that we’re planning to do with the data at some point but those would be the first things that we’d want to do.
Murem Sharpe: Good. I’m sure there will be many interested parties on the lookout for that information. And this leads to a question, certainly a burning question for the Evoca team and others who are very interested in language and supportive of research concerning language. Could you please comment from your field? You have your doctorate from Harvard, a tremendous amount of experience, and you are a Yale professor. What should we be concerned about with respect to languages that are, to use a layperson’s term, disappearing or perhaps becoming extinct? And what is going on in your field regarding those languages?
Claire Bowern: There’s a lot going on at the moment. This is actually what I do when I’m not working on North American dialects. I work on a couple of Aboriginal Australian languages from the northern part of the country. When I started my PhD, about ten years ago, there were 50 speakers of the language that I was doing my PhD on. Now there are really only about five, probably fewer than five. So it’s something that I personally feel very strongly about, because I’m seeing the ends of languages that I have a lot to do with.
I think in the field in general there are a couple of things happening, a current set of trends. One is that we’re seeing an increase in public awareness of endangered languages. People like David Harrison and Gregory Anderson, who are part of the National Geographic Language Hotspots team, have been widely reported in the media, and they’ve been doing a lot of work in raising the profile of endangered languages. So we’re seeing more of that in the media, which has to be a good thing for everyone.
We’re also seeing more funding for basic language documentation. One thing that’s crucial about endangered languages from the linguistic point of view is that most of these languages are “undescribed”: we don’t have grammars or dictionaries or recordings of most of them. Many of these languages don’t have writing systems, so once they’re gone, they’re gone forever. It’s taken many, many thousands of years for this diversity to accrue in the world, and we’re going to see it disappear very quickly, and we will never see it recreated. So it’s good to see that there’s more funding for basic documentary work.
The third thing is that endangered-language speaker communities are also getting involved in asserting their rights to speak their languages and to teach those languages in community schools and community programs. There are various mailing lists, for instance, email lists and web groups, where speakers of endangered languages, such as Native American and First Nations people in the US and Canada, are getting together to share ideas and experiences about how they can preserve their languages, pass them on to their kids, and revitalize them in cases where they’re no longer spoken as a first language in the community.
Murem Sharpe: And are these projects generally funded by government agencies? Or do any non-profit foundations stand out as supportive in this area?
Claire Bowern: This is not a highly resourced area compared to, say, public health. But there are some organizations that fund a lot of this work. The National Science Foundation funds research on the documentation of endangered languages, in a joint partnership with the National Endowment for the Humanities.
There are also some private foundations. There’s the Foundation for Endangered Languages and the Endangered Language Fund. One of those is a UK-based organization and the other is actually based at Haskins Lab which is affiliated with Yale. But they both have worldwide scope.
I’m trying to think what else there is. There are other, sporadic initiatives. The Microsoft Foundation has been partnering with some Native American organizations to produce local-language versions of Microsoft software. That’s particularly the case in North and South America. They have a pilot project, I think, with about 25 languages.
And there was one other. Oh yes, Rosetta Stone, the people who make the language learning software. They also have a charitable component where they donate their software, and their recording engineers and software experts volunteer a certain amount of time to help a couple of First Nations. I think it’s been mostly First Nations in Canada and Native American groups in the US, creating Rosetta Stone language learning software for those communities.
There’s been one for Inupiaq, which is spoken in northern Alaska, and one for Navajo. And I think they have a couple more in the works as well.
Murem Sharpe: Well, that’s very good to hear. As you know, Evoca is a commercial voice-to-web service, but our approach is to make our services highly affordable, so we’re certainly hopeful that we can be part of the very important global community that is working to save and document endangered languages, as well as to study language more broadly. What you are doing with North American dialects of a broadly spoken language, American English, helps us understand all of its many differences.
Claire Bowern: So it runs from the languages with the most speakers to the languages with the fewest speakers.
Murem Sharpe: Exactly, that’s quite a spectrum. Well, I hope that this is not our last conversation, because I would like to put this interview out on the wires, along with some information around it, so that more people become aware of the North American Dialect Project in particular, understand more broadly what’s going on in the field of linguistics, and reach out to you and your colleagues, whether they are professionals such as yourself or others who want to support the work that you’re doing.
Claire Bowern: Well, thank you, Murem. It’s great to talk to you.
Murem Sharpe: Okay. Thank you so much, Professor Bowern. Bye now.
Claire Bowern: Bye.