130: Back to the FTR (with Séan Roberts, Cole Robertson, and Annemarie Verkerk)

December 6, 2025 · 2h 53m · 30,213 words


Show notes

You know the story. The language you speak doesn't determine your savings. If your language has a future tense, there's no impact on the way you see or describe the future. Language and perception are separate. Well, maybe it's time to revisit this. Séan Roberts and Cole Robertson are finding a cognitive connection, not with how our language makes us talk about the future, but with how our language lets us express uncertainty. Also, Annemarie Verkerk and Hedvig Skirgård team up to test out language universals. Which ones are getting knocked over?

Timestamps
Start: 0:00
Intros: 0:36
News: 6:08
Chat with Annemarie Verkerk and Hedvig Skirgård: 23:06
Related or Not: 49:22
Interview with Séan Roberts and Cole Robertson: 1:10:38
Words of the Week: 2:18:09
Comments: 2:37:20
The Reads: 2:42:37
Outtakes: 2:50:05

Highlighted moments

“we found across the board that, um, when you look at how, uh, the future tense works and how this temporal distance stuff works, all of that is just non-significant.”

Jump to 1:57:30 in the transcript

“LLM Grooming refers to the deliberate manipulation of large language models by flooding their training data with disinformation, aiming to bias their outputs towards specific narratives.”

Jump to 2:25:19 in the transcript

“the most exhausting relationships I've ever had are with people who you have to have reality fights with. You have to constantly do battle to establish that the version of events that you remember was real.”

Jump to 2:25:57 in the transcript

Transcript

0:00By the way, had a birthday party for my oldest daughter. Tried to bait the kids into a "6-7". Didn't work. I think it's passé now. Oh, interesting. Yeah, too many people know about it. Yeah, when sad, tired old dads like Daniel are trying to get in on the joke, that's how you know for sure it has died.

0:30Hello, and welcome to Because Language, a show about linguistics, the science of language. I'm Daniel Midgley. Let's meet the team. First up, it's Hedvig Skirgård. Hedvig, your favorite way to express the future in English. You have to choose one. Will, shall, or gonna. Or something else. There might be others that I haven't thought of. I'm trying to think of what I'm going to do tomorrow, and then I'm trying to say it. Or tonight.

1:00I'm going, going to, going to, going to. 100%. Going to. Yeah. It's a classic. Doesn't get much better. Yep. It's there for a reason. Okay, and our second co-host, Ben Ainslie. Hello. Same question. Will, shall, or gonna. Tomorrow, I will. My brain wants to say that I will, but I suspect I'm a gonna. Not a going to, a gonna. Gonna. It's gotta be gonna. I like gonna, too.

1:31I love shall. I want to bring back shall. Big fan of shall. Oh, yuck. No, that's, that's as, that's as noxious to my ears as an Australian affecting appreciate. Tomorrow, I shall feast upon dainties. But you can use I'll, and then just be like, actually, it's not I will, it is I shall. And I just. Oh. You can't tell. Yeah, okay, that's fun. You can just. I don't have to tell you that I'm thinking shall. I can say I'll, but in my head, it's shall.

2:02You'll never know my secrets. I just noticed, by the way, have you noticed that "aisle" in the supermarket and "I'll" see you tomorrow? Yes, that is something I have. We have noticed that, yes. You've noticed that. I just did.

2:17Bonus, "isle," like the small thing in a body of water. All three. Yeah. I knew someone who had internalized that as "all" and wrote "All be back," A-L-L, which I thought was really interesting. Well, for this episode, we are having a chat about the future, which is why I introduced us this way. You see, we did a news item a little while ago. We were revisiting, once again, the story about if you have a grammatical future tense, do you have better savings?

2:48And we've long pooh-poohed this story. A very Whorfian sort of idea. Memory test. Ben, do you remember, we talked about this with our guests as well, but do you remember which direction it's supposed to go in? Oh, if you do have future tense, you would be better at it, right? Better at saving? Yeah. That's what you'd think. Right. But in fact, according to work by Keith Chen, 2013, it's the opposite. Because remember, I say remember, I hardly remember.

3:18This is about temporal discounting. So if you, according to the original Whorfian paper by Chen, if you have a grammatical future tense, then it means that your present and your future are disconnected. It's like the future is some weird, different thing. Okay. And you can discount the future. So we mentioned an article by Dr. Cole Robertson and Dr. Séan Roberts, friend of the show, in which they found people on the individual level didn't discount the future just because

3:50they had a separate future tense. It just didn't work out that way. In fact, it worked the opposite. Well, we thought there was a little more to the story. We wanted to find out a bit more of the lore of how they got involved, how they got roped in to this project and what their work on the future has found. So we're going to be having a chat with Séan and Cole for this episode. Ooh. And we will be going to the future. I do think that they labeled our chat Back to the Future, which I thought was very funny.

4:20And I think we should keep for the episode if we can. Well, that's why this episode is called Back to the FTR, which means future time reference. Oh. So we'll hear that. Nested levels of meaning. Our next episode is going to be our last episode of the year. It's the Word of the Week of the Year episode. And you can get involved in lots of ways. If you are a patron, even if you're a free patron, you can come to our live episode. We'll have special guests. We'll count down our words of the week of the year and see which one wins. Does anybody remember our last year's Word of the Week of the Year?

4:51I do not. Wait, which one won? Yeah. What? I can remember. Wait. I'm thinking really hard. Oh, yes, I do remember now. No, that wasn't last year, was it? Enshittification was two years ago. Yeah, that sounds right. Last year was -washing. Oh, yeah. Different kinds of washing. Greenwashing, pinkwashing, sportswashing. Sportswashing, sanewashing, which, you know, wasn't my favorite. Get your votes in. The other thing is we are having our annual mail out.

5:23So, patrons, please make sure that your address is correct in Patreon if you want that. Maybe there are a lot of people who just don't have their address on file because they don't want it, which is fine. But if it is on file, I will send you a mail out if you are a paid patron at any level. So, we'll have some fun stuff that way. That's very cool. And we didn't end up making the Coppertone ad where my bum is, like, coquettishly being shown. So, don't let that stop you becoming a patron is what I'm saying. Oh, Ben, we've all seen your bum.

5:54I haven't. A big face right now is a very clear, like, I haven't. I haven't seen that. I haven't. I don't know if I want to. I haven't. I feel like I'm left out of something. No, no, no, no. Okay, let's get to the news. You ready? What's going on in the news? Other than my buttocks, which are obviously hot news in so many more ways than one, what linguistically is going on that is newsworthy? Have you noticed how large language models are terrible at getting stuff right when it's

6:25something that you know well? I have noticed this, yes. I wouldn't say terrible, but definitely not good. I have noticed definitely biases in training data. So, the more something is written about on the internet, the more likely it is to get it right. And when you ask it about something niche, especially if it doesn't occur a lot on the internet, it is apt to get things wrong. I found that etymology is especially bad. It just makes things up terribly because it doesn't seem to have any access to word histories. The hallucinating with stuff is, especially when you get it to cite references, is wild.

7:01It is unhinged how readily. It's like the worst kind of undergrad student. You'll just be like, oh, okay, cool. Yeah, that's a really interesting point you made about the implications of this thing on this thing. What reference were you using to base your conclusion on? And then they'll say a thing and then you'll be like, let's have a check. And it's like, that just doesn't exist. The statistically likely thing for me to say is this paper and it's just nothing. Yeah, exactly. We need to all remember that it is a fancy autocomplete. You say something, it guesses out what comes next.

7:31That is it. Why do you bring that up, Daniel? How do you think they would do at linguistic analysis? Like solving the kinds of problems that linguists typically have to do? Like what? I think, for example, it wouldn't be too bad. I would guess that if you had a text collection, a corpus, and you wanted to tag it for word class. So every word it finds, you want to know if it's a noun, adjective, or a verb. I think it could do pretty well at that. Yeah, that's true.

8:01But then a simple part of speech tagger usually gets 99% and up. So I can't say the bar is pretty high. I would imagine my intuitive sense would be quite good because the inner guts of a large language model, I mean, it's in the name, right? It's a large language model. These things have been designed to ape and to mimic as best as is possible. And we have to say, to a pretty astoundingly good degree, natural human writing and speech. So clearly something in the working there is having a really strong and nuanced understanding

8:37of linguistics and language, at least on some level. That's my intuition. Well, this paper is an attempt to test that very thing. This is work from Gašper Beguš, Maksymilian Dąbkowski, and Ryan Rhodes. Did you know Ryan Rhodes? Pretty good follow on Bluesky. Oh, nice. This is published in IEEE Transactions on Artificial Intelligence. It's open access. There's a link in the show notes. They got lots of large language models to do four kinds of linguistic analysis. And now I'm going to test you, Ben.

9:09Linguistic Olympics. Oh, no. Am I dumber than a machine? The answer is almost certainly yes. Hedvig will give us the answers. I'm so curious of what these tasks are, because linguistic analysis could literally be a lot of different things. Okay. What I love in this particular game is I have the very good job, but Hedvig has just been flung this, like, P.S., you just tell us whether it's right or wrong, and she has no idea what's coming. I'm sure that since she's a linguist who does analysis, she'll be great at this kind of thing. First one. This is the first task they gave it.

9:41Here's a sentence, and is this sentence ambiguous? Okay. Sentence number one. Eliza wanted her cast out. Eliza wanted her cast out. Are we assuming all of this is the written word and not the spoken word? It is written, not spoken. That is a... I was trying hard to speak it with a neutral sort of... That is an ambiguous sentence. Can you tell me how? Because it could mean two things, and it is equally possible that either could be true. The only...

10:12Well, it's actually... Sorry, that's not true. Tell us the two things. It's not... So, it could be that Eliza wanted someone or something cast out as in removed from a place, but the cast can also be a noun. It can refer to an actual object. So, it could be a cast that a person has on their arm, or it could be a cast, like maybe she's an artist, and she's working with like molding and casting and that sort of thing, and she wanted her cast out of the form or something like that. Okay, good. Yep. I agree. Maybe, I do think that it would have been better if it's cast off.
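One way to make the two readings concrete is to write each as a bracketed tree. The following is a minimal sketch in Python, assuming a deliberately simplified bracketing: the labels (S, NP, VP, SC, PRT, DET, N, V) and the helper `leaves` are illustrative choices, not an analysis taken from the paper or from the hosts.

```python
# Trees are nested tuples: (label, child, child, ...). A leaf is a plain string.

# Reading 1 (assumed bracketing): "her" is a person, and "cast out" is what
# Eliza wants to happen to her.
READING_PERSON = (
    "S",
    ("NP", "Eliza"),
    ("VP", ("V", "wanted"),
           ("SC", ("NP", "her"), ("VP", ("V", "cast"), ("PRT", "out")))),
)

# Reading 2 (assumed bracketing): "her cast" is a noun phrase, such as a
# plaster cast or a mould, and Eliza wants it out.
READING_OBJECT = (
    "S",
    ("NP", "Eliza"),
    ("VP", ("V", "wanted"),
           ("SC", ("NP", ("DET", "her"), ("N", "cast")), ("PRT", "out"))),
)


def leaves(tree):
    """Return the words at the bottom of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the node label
        words.extend(leaves(child))
    return words


print(" ".join(leaves(READING_PERSON)))
print(" ".join(leaves(READING_OBJECT)))
print(READING_PERSON != READING_OBJECT)
```

Both trees bottom out in the same five words, which is what makes the written sentence ambiguous: the difference lives only in the structure above the words.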

10:44Perhaps. Perhaps. She wanted her cast off. Oh, that is very good. That's ambiguous as well. Yeah, because you have your cast on your leg or something, and you want it off. Yeah. Now, if I were asking Hedvig to do a thing, I would say, all right, Hedvig, please now draw a syntax tree of both of those sentences, where you start with the... I started... You know what? I don't want you to do it. Give me a second. She's such a nerd. She already did the homework. No, I... Eliza... And what this looks like for anyone who's never seen a syntax tree is it starts at the top with something like a sentence, and then that goes to something like, you know, a noun

11:18phrase and a verb phrase, and then it breaks down and breaks down. And when you get to the final end, you've got all the words. Eliza wanted her cast out. Eliza wanted her cast... Ben, I'm going to ask you the second task. Okay. They tested these models on recursion. Okay. Recursion is when a structure gets nested into another copy of the same structure. For example, I can have one adjective in a phrase, like the blue rug, or I can start doing recursion and stacking those adjectives in the same phrase, like the big blue ornate

11:54circular rug. Sure. So I gave you an example of recursion, where we stack up adjectives. Right. For this, they tested on a particularly tricky kind of recursion, at least for humans. It's called center embedding. Are you ready for an example? Yes. Hedvig knows what I'm talking about. This is a similar problem as the cast off thing. Okay. Yes, it is. But I think this is a cognitive problem. I like it. I like it. I like it. Gimme, gimme, gimme. So here's one sentence, right? Yeah. The man left. Easy. Now I'm going to take a sentence and I'm going to embed that into the middle of the first

12:26sentence. Okay. The man the girl saw left. Okay. You see what I did? I stuck a sentence in the middle of that first sentence. Yeah. But now I'm going to do it again. Okay. The man, the girl, the dog loved, saw, left. Are we assuming there's any punctuation in this sentence? Don't have to. Yeah. That's what he's doing with stress. Yeah. The punctuation is shorthand for stress and pauses. And Daniel is doing like, the man, the girl saw, the dog loved, left.
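The pattern Daniel is building can be sketched in a few lines of Python. This is only an illustration of the word order, assuming each noun is the subject of the verb at the same position in the list; the function name `center_embed` is made up for this sketch.

```python
def center_embed(nouns, verbs):
    """Build a center-embedded string.

    nouns[i] is the subject of verbs[i]. All the subjects come first,
    outermost to innermost, and then the verbs come back out in reverse
    order, innermost to outermost.
    """
    if len(nouns) != len(verbs):
        raise ValueError("need exactly one verb per noun")
    return " ".join(nouns + verbs[::-1])


# One level: a plain sentence.
print(center_embed(["the man"], ["left"]))
# Two levels: one clause embedded in the middle.
print(center_embed(["the man", "the girl"], ["left", "saw"]))
# Three levels: the version that turns into gibberish for most listeners.
print(center_embed(["the man", "the girl", "the dog"], ["left", "saw", "loved"]))
```

The rule itself is trivial for a program, because it is just a stack: last subject in, first verb out. The difficulty the hosts describe is on the human side, where three unresolved subjects have to be held in memory before any verb arrives.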

12:56Right? So imagine sort of commas. The man, the girl, the dog loved, saw, left. Oh, fuck. You did it that way. What I find here is that when you get to three levels, it just turns into gibberish. Like when you get to that third. Now, could I ask, it only turns into gibberish because the things that you are nesting are very, very small, right? So you've essentially got your three layers of recursion all flopping over each other straight away. Now, if you stretch stuff out, right? If you added some more words in there, the old man that the young girl, the smelly dog

13:34loved, left. That's the thing. You're adding the that and that is helping you. Okay. It's the that that's helping you. It's not more words that's helping you, if that makes sense. I think that you can give it all the help you want and it would still be just a tangle of words. I won't lie. I'm desperately trying to stop myself from drowning. So I'm getting to just my nose above water on this one. We're just not good at that level of complexity. So, bad for humans, but I could ask a large language model, does this sentence have recursion

14:08where please draw a tree? Okay. Here's the third task. Okay. It involves a sentence like this. I think she will arrive at three. Okay. Now I'm going to take that sentence and I'm going to twiddle it. When do you think she will arrive? And if I were asking a linguist, I would say, show me a tree for the first sentence and then show me the second tree for the second sentence, showing where the bits moved to. This is pretty tricky stuff. All right.

14:39Okay. And then the fourth one was I could give it a phonological analysis problem, very similar to the kinds I give my students, but on a made up language, here are some words. You've got to figure out which sounds are actually the same and why they're changing into the sounds that they are. This will only make any sense to linguists. Phonological analysis problems are kind of like a second year thing. Can I ask a question? Yes. Large language models in the kind of like auto-completing shouldn't really be that good necessarily at doing these things because it's kind of, in a way, mainly influenced by

15:13its training material and some added skeletons. Lately, we've seen a lot of AI agents being added like what's called like reasoning or like tools. So famously in the beginning, people would say like, what's five plus two? And it'd be like, I don't know, eight? And it'd be like, well, good, good, good work. But if it knows to ask its little calculator tool, what's five plus two? And it says, actually, that's seven. It's seven. Then it can do better. But that is no longer a bare language model, if you know what I mean.

15:47It's a more comprehensive product. It's stapled to different tools. Mm-hmm. Yes. So in this paper, when it was able to, for example, draw trees, was it just a plain large language model? All of the models were just plain large language models. Here they were. They used Llama, GPT-3.5 and GPT-4, and then OpenAI's o1. Now, again, I'm wondering if this is to do with the fact that large language... Hang on. I haven't told you how they did.

16:17Oh, okay. Sorry. I'm assuming it's news because they did good. Like, you're not going to do the story because it's like, and it was all terrible. They were all terrible. Okay. Except for one. Okay. OpenAI's o1 did a great job. It actually managed to answer these hard linguistic questions, even on made-up data, very, very well. Which one is the o1 one? o1... What even is o1? We're releasing a preview of OpenAI o1, a new series of AI models designed to spend more

16:52time thinking before they respond. Yeah, yeah, yeah. So I think it's the most advanced one. It's the most advanced one. With a first series of reasoning models. It's not a bare language... It isn't handing off jobs to different capable tools. It's still a language model. Here's the difference. It uses something called a chain of thought mechanism. Now, usually, a chain of thought is something you do in your prompts to guide the model bit

17:22by bit. You break the task down and you say, for this first bit, think about this. Now that you've got that, give me the answer to this second step. Now the third step. Now the fourth step. And you do that bit by bit to avoid errors. But o1 has a chain of thought mechanism built into it. And it will actually show you it as well, which is a little bit interesting. Like as you are prompt crafting and waiting, it will actually sort of articulate the steps that it's undertaking, which is, you know, like I know we like to pooh-pooh, but that was

17:56a moment where I was like, oh, this is quite fascinating, really. Well, I think so too, because now what you've got is a model that, yes, takes a lot longer, but generates more steps. But what it's doing at each step is it's including more context. Instead of giving it a big task that it has no clue on, you're giving it a series of, well, it's taking it into a series of small tasks that then include more context, which then allows for a greater depth of reasoning.

18:27It's like a chunking thing, like a much more effective chunking methodology. And we've kind of known that chain of thought mechanisms have been better. Since about 2022, there was a paper called Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. It was by Jason Wei and a team. They say experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, common sense, and symbolic reasoning tasks, because you're breaking down the problem. I just feel like this is a viable avenue for hallucination mitigation.
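The prompt-side version of this, where a person breaks the task down by hand, can be sketched as plain string building. This is a minimal illustration, not how o1 or any particular model works internally: the function names, the prompt wording, and the idea of passing in the earlier answers as a ready-made list are all assumptions made for the sketch, and no model is actually called.

```python
def direct_prompt(question):
    """One big prompt with no intermediate steps."""
    return f"Question: {question}\nAnswer:"


def stepwise_prompts(question, steps, answers):
    """Build one prompt per step, each carrying the earlier results as context.

    `answers[j]` is the (hypothetical) model output for `steps[j]`. In real
    use each answer would only exist after the previous prompt had been sent;
    here they are passed in up front to keep the sketch self-contained.
    """
    prompts = []
    for i, step in enumerate(steps):
        context = "".join(
            f"Step {j + 1}: {steps[j]}\nResult: {answers[j]}\n" for j in range(i)
        )
        prompts.append(f"Question: {question}\n{context}Step {i + 1}: {step}\nResult:")
    return prompts


question = "Does 'The man the girl saw left' contain center embedding?"
steps = ["Find the main clause.", "Find any clause nested inside it."]
answers = ["the man ... left"]  # placeholder result for step 1 only

for prompt in stepwise_prompts(question, steps, answers):
    print(prompt)
    print("---")
```

The second prompt is longer than the first because it repeats the first step and its result, which is the "including more context" at each step that Daniel describes.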

19:01I think that's actually quite a good thing. I think, I don't know about, yeah, mitigation might be the right word, but even that I think is possibly a bit too kind. Possibly. Like hallucination band-aiding, maybe, I would be willing to say. But it's still, so one thing that is true about them is if they're breaking them down and going through different more steps. But one thing I've noticed, like I was having some basic programming problems the other day

19:32and I had a file and I was like, can you help me find out what's going wrong in this file? And it was like, that file's empty. And I was like, no, no, I know it's not though. And it's like, no, no, no, it's empty. And then we talked about it for a while, me and the little large language model. And then we found out that it had asked another tool if this file was empty. The tool was not able to read the file. And that means either it couldn't read it or it was empty. And it went with, it's empty and then just stuck to its guns. And I was like, can you not hold more than two thoughts in your head?

20:03It's the great, it's the great, like, I have Eleanor Shellstrop's file. Do you have the file or are you holding a cactus? I am definitely holding the file. Yes, it's exactly that. And I went like, just because I was frustrated at the end of the day, I was like, are you not able to hold two ideas? And it was like, I am not. And I was like, great. So at each of these reasoning steps, only one alternative survives. Even if it is like a 52-48 percentage split between the probabilities of two answers, only one survives to the next step.

20:35Do you see what I mean? It's decided. So if it's 98% sure or 52% sure, it doesn't know. And it goes through all those steps anyway. So it can still hallucinate. And I also read here on Wikipedia that there was some research showing that these kinds of reasoning models are, in certain tests, more able to deceive.
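Hedvig's point about only one alternative surviving each step can be illustrated with a toy example. This is a sketch of the behaviour as she describes it, not a description of how any real reasoning model is implemented; the function names and the probabilities are made up.

```python
def commit(step_probs):
    """Keep only the most probable alternative and forget how close the call was."""
    return max(step_probs, key=step_probs.get)


def chain_confidence(steps):
    """Multiply the winning probabilities to see how shaky the whole chain is."""
    confidence = 1.0
    for step_probs in steps:
        confidence *= max(step_probs.values())
    return confidence


close_call = {"file is empty": 0.52, "file could not be read": 0.48}
sure_thing = {"file is empty": 0.98, "file could not be read": 0.02}

# Both steps pass on the same single answer; the 52-48 split is not carried forward.
print(commit(close_call))
print(commit(sure_thing))

# A chain of three close calls is far less trustworthy than three sure things,
# even though each step looks equally decided from the outside.
print(round(chain_confidence([close_call] * 3), 3))
print(round(chain_confidence([sure_thing] * 3), 3))
```

After `commit`, the 52% answer and the 98% answer are indistinguishable to the next step, which is why the file in Hedvig's story stayed "empty" no matter how the conversation went.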

20:56Don't forget, when you make them smarter, you make them better liars. Oh, ouch. Yeah. Daniel, I'm surprised you didn't learn that just through parenting. It's very difficult when you have children that are much cleverer than you are, even at their young age. I don't have that problem. Let's.

21:14Like, at all. Let's go on. Let's go on to our last news story. And this one was suggested by James on our Discord. And when I read a little bit further, I found out that one of the authors, well, it's you, Hedvig. Hello. This is some of your work. And I thought, and usually I would say, who cares about that? That's just internal stuff. Whatever the Hedvig version of AI slop is. What Hedvig slop is this?

21:46What Hedvig slop is this? But then I realized we have like a publishing powerhouse among us. It's Hedvig Skirgård. And. What, what? It's also Dr. Annemarie Verkerk, another of the authors. Well, I couldn't resist. I wanted to find out about this paper myself. This week, me and a team of colleagues from my department, University of Auckland and University of Saarland published a paper about language universals. So these are the kind of statements as in like, if you have a lot of suffixes on your

22:17verb, you also have this word order or something like that. Um, and we published this after many years and it was nice this week because this week we published a paper and I got rejected from my grant. So it's like the life of a researcher in one week. Highs and lows. Ouch. Highs and lows. Eh, it's okay. I have to apply for more, but. Okay. Language universals are these things where if a language is, and we sometimes express them in terms of tendencies. So if a language is adjective noun, then we see certain other things.

22:51But if it's noun adjective, then we see the opposite. And you actually went and checked them all out. We checked a bunch of them out. How much should I say? Because Annemarie and I are probably going to say the same thing that I'm going to say now when she's here. Okay. Let's bring her on. I'm here with Annemarie Verkerk and inexplicably Hedvig Skirgård. What's she doing on this show? So we're talking about a new paper called Enduring Constraints on Grammar Revealed by Bayesian Spatiophylogenetic Analyses.

23:22What a great title. It's in Nature Human Behaviour. It's an illustrious panel of authors, Annemarie Verkerk, Olena Shcherbakova, Hannah J. Haynie, Hedvig Skirgård, Christoph Rzymski, Quentin D. Atkinson, Simon J. Greenhill, and Russell D. Gray. Lots of great authors on that project. We've got the lead author here along with Hedvig. So, Annemarie and Hedvig, welcome to the show. Thank you. Thank you. All right. So this one's about universals. And I think I know some stuff about universals, like I know why linguists would be interested in universals.

23:52But what's your view? Where did this come from in your research world? Well, in my research world, it always existed because I'm a linguistic typologist. So I got introduced to the topic in my undergraduate classes. So ever since then, I've known about them at least. And they have so many different possible origins. Like some of them are like, ooh, we think this would make like cognition easier if you are like sort of harmonious in this and this way.

24:24So if the demonstrative comes before the noun and the adjective comes before the noun, maybe like lowers processing costs. But then some other universals are like entirely differently motivated. So it's like a bit of a bag of like mixed, it's mixed nuts. It's not a homogenous set, which I find frustrating and annoying and interesting, I mean. Okay. So it sounds like we're talking about different kinds of universals, because when I think of like language universals, I think, oh, here's a thing that all languages have in common.

24:57All languages do this thing. And we've talked about those a bunch. There are some interactional ones. Some people think that there are syntactic universals, but I don't know. But what kind of universals are we talking about here, Annemarie? Yeah. So we are talking about a very specific type in a way. In the literature, we call these implicational universals. So that means that they say something like, if X, then Y, right? And the textbook example of that is, if a language has subject, object, verb, word order, it tends

25:32to have postpositions, right? So adpositions that come after the complement noun. So if I say something like, the man, the bear, saw, or I, the bear, saw, that's subject, object, verb. And if a language is like that, then it has a strong tendency to, you say, do postpositions, like instead of on the table, I might say table on. Is that what we're saying? Yeah, that's what we're saying. Okay. Yeah. Now, I can think of a few more of these.

26:03For example, if a language has a word for three, it's probably got a word for two as well, right? I mean, it wouldn't make sense for it to just skip two and go on to three. It's just, so that's, is that an implicational universal? Yeah, that would have been one. It's an interesting one, though. I got him in trouble already. No, no, no. There are actually word orders about number. The one that we had, or one of the ones that we had in the paper, is about whether, how, or how number is expressed in morphological markers for number, right?

26:40So, there's one really strong one which states if a language has a grammatical category which is called a trial, which means that it somehow expresses units of threes of things, then it must also have a dual. And now I've forgotten whether that's one that's highly supported in our favor or not, but we should check it. That's kind of what I was getting at. Like, if a language says, you know, the three of us, then it's also going to have a bit for the two of us. Yes. So, the idea here is there are tons of these supposed language universals, and you and the team started plowing through them and investigating them.
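The "if X, then Y" shape of an implicational universal can be checked against a table of languages by counting the four possible combinations. The sketch below uses entirely made-up languages and values (`lang_a` and so on), and the function name is an assumption for illustration. It is a naive count, not the statistical method used in the paper.

```python
def check_implication(languages, x, y):
    """Tabulate feature x against feature y and measure 'if x then y'.

    Returns the 2x2 table of counts and the share of x-languages that
    also have y (None when no language has x at all).
    """
    table = {(a, b): 0 for a in (True, False) for b in (True, False)}
    for features in languages.values():
        table[(features[x], features[y])] += 1
    with_x = table[(True, True)] + table[(True, False)]
    support = None if with_x == 0 else table[(True, True)] / with_x
    return table, support


# Hypothetical toy data, not drawn from Grambank.
toy = {
    "lang_a": {"trial": True, "dual": True},
    "lang_b": {"trial": False, "dual": True},
    "lang_c": {"trial": False, "dual": False},
    "lang_d": {"trial": True, "dual": True},
}

table, support = check_implication(toy, "trial", "dual")
print(table[(True, False)])  # languages with a trial but no dual: the exceptions
print(support)
```

Only the trial-without-dual cell counts against the universal. Languages with a dual and no trial, or with neither, are compatible with it, which is what makes the claim implicational rather than a claim that the two always go together.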

27:21How did you do that? Well, we were very lucky that Frans Plank and Elena Filimonova, am I getting that right, Filimonova, had made this thing called the Konstanz Universals Archive, where they took suggestions from all over the literature and wrote them down in a big old webpage. So, like, on page 92, this author said that languages that do this, do this. So, we took all of those, and then we tried to filter them for suggestions that we could actually test with our Grambank database.

27:53Okay, now remind me about Grambank, which is this awesome tool that you've got. Yes. So, Grambank is a big old database of grammatical structures. So, we have a bunch of different features, 195, and we fill them in for as many languages as we can. So, it'll be things like, is there a trial marker on the noun? It'll be things like, is there a case suffix? Blah, blah, blah. Yeah. We don't cover all things in grammar, so we couldn't match all of them. So, we went through all of the ones that Frans and Filimonova had in their database,

28:24and then we found 191 that we could actually implement meaningfully into Grambank. That's a lot. That's a lot. There were more than, what is it, more than 2,000 in there? 2,000, yeah. Yeah. Oh, my gosh. But many, many of these, of course, you can basically just immediately determine that we can't test them because they are about phonology or they are about highly specific, passive constructions that Grambank doesn't have data on. So, we struggled for the longest part, actually, with, you know, the kind of fringe cases where we're like,

28:58well, this, you know, we can cover this, but exactly how to do it was not always straightforward. We had the most discussion. And that took the most time just to figure out, like, how to actually implement them using Grambank questionnaire items. Yeah. We had a lot of revision and back and forth. So, it was mainly me, Annemarie, and Hannah Haynie, and Olena Shcherbakova who sort of trawled through that list to sort of filter it. Because, as maybe some listeners are aware, but some may not be, linguists don't use consistent terminology.

29:29Oh, no. So, what one author says is "case," is that actually what Grambank means by "case"? Maybe they're not identical, but are they close enough that we can do something with it? So, these kinds of discussions, we had to have a lot. Okay. Well, I'm still just amazed that linguists have spent so much time making up to 2,000 claims about, yeah, I've noticed that if a language does this, it tends to always do that. Yeah, and then you managed to, like, plow through 200 of them.

30:01Now, the ones that I always think of, you know, Joseph Greenberg, the linguist Joseph Greenberg, is always mentioned in connection with universals because he's sort of really, is he the first person to get people thinking about this stuff? Yeah. Probably. Not the first person to make suggestions like this, but certainly the first person to, like, wildly popularize the idea of searching for these and trying to explain them. So, he made a bunch of primarily word-order universals that have been very, like, setting the tone in the field, for sure.

30:32Now, this can't be the first time that people have plowed over this. Yes, they've been going over Greenberg's universals for a really long time, but still, I feel like you're coming after them. It's like, okay, buddy. It's time.

30:46We're going to get you. I mean, a lot of them hold up. A lot of them. All right. Well, maybe it's time to get the broad results. Annemarie, give me the top-line results for what you found. So, we did it in two different ways. First up, we did it on a more or less contemporary level, taking into account genealogical and geographical non-independence, but not doing anything with language history. And then in the second set of experiments that we did, we explicitly modeled these features as they evolve on phylogenetic trees, or a phylogenetic tree, I should say.

31:20And both of these experiments point out that about a third of them hold, if you look at both of these experiments. And what's also very interesting to us is that, depending on the type of universal that you look at, you get very different results. So, I can elaborate on that if you want. One thing before you do, though. This is for both of you. We've talked numerous times about Galton's problem. We would like to treat things like they're unrelated, but they're not unrelated.

31:53They're related by family, or they're related by place. And so, you might think that you're looking at 17 different languages to do this thing, but actually, you're only really looking at one thing. How did you control for that? Yes. So, one way you could think about it is that if you think about language families, language families are trees. So, you have one parent for each tip or node. So, each language has one parent, and if you go further back, suddenly there's only one parent for all of them. You have the root of the tree. And if a certain feature was present at the root, and that just got inherited by all the descendants, then that can look like it's very common.

32:28But if it's two features that are co-occurring, they will look like they're co-occurring independently in all those languages. But in fact, they're all just being inherited. So, this is what we call phylogenetic autocorrelation, or also commonly known as Galton's problem. So, when you conduct statistical tests, you have to meet the assumptions of your tests. And one of them is that if you think that your data is independent, they should actually be that. And we know that they aren't in this case. So, the first step, like Annemarie said, is a sort of traditional, what's called a regression model, where you say, this thing, trial, this thing, dual.

33:02Do they have an effect on each other, given these relationships between all the languages? And we also think that there could be some spatial spreading of things that could also interfere. That's called spatial autocorrelation. So, we include a spatial term as well. That's sort of the first step. And then, once we've completed that, we go on to this more nuanced analysis of actually what Annemarie described as, like, modeling it on the tree. Just a bit more involved. But I quite like it. We were discussing a lot when we were writing the paper about, like, how to do this and what order to do things.
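[Editor's illustration] The inheritance problem described above can be sketched with a toy simulation. This is not the method used in the paper; it is a minimal sketch under loud assumptions: four hypothetical families, two binary features whose root values are drawn independently, and a small made-up chance (`flip_prob`) that a daughter language changes a feature. All names here (`simulate_family`, `phi`, `flip_prob`) are invented for illustration.

```python
import random


def simulate_family(n_languages, flip_prob, rng):
    """Simulate one hypothetical family.

    The root's values for features A and B are drawn independently; each
    daughter language inherits both, flipping each with probability flip_prob.
    """
    root_a = rng.random() < 0.5
    root_b = rng.random() < 0.5
    languages = []
    for _ in range(n_languages):
        a = root_a if rng.random() >= flip_prob else not root_a
        b = root_b if rng.random() >= flip_prob else not root_b
        languages.append((a, b))
    return languages


def phi(pairs):
    """Phi coefficient between two binary features; 0.0 when undefined."""
    n11 = sum(1 for a, b in pairs if a and b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    denom = ((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)) ** 0.5
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


rng = random.Random(1)  # fixed seed so the run is repeatable
families = [simulate_family(n_languages=50, flip_prob=0.02, rng=rng) for _ in range(4)]
all_languages = [lang for family in families for lang in family]

# A naive test would treat every language as an independent data point,
# but the feature values were only drawn independently once per family.
print("languages sampled:", len(all_languages))
print("independent origins:", len(families))
print("phi across languages:", round(phi(all_languages), 2))
```

Whatever association shows up across the 200 simulated languages rests on only four independent draws, which is the sense in which 17 languages can really be "one thing". A model with a phylogenetic term is one way of correcting for that.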

33:37And we came up with this sort of, I think of it as, like, a bit of a filter process. Like, if each universal gets through step one, if it gets through the regression analysis, and if it survives that, it gets to go through the second gauntlet of this co-evolutionary analysis. Yeah, credit to Simon Greenhill for that one. Yeah, it's a good step. And I'm actually going to reuse those steps in some other papers. Because it means you do less work. You don't have to test every language for everything. You can knock out some in the early round.

34:09Yep, very good. Okay, and you found that once you had accounted for area effects and phylogenetic effects, one-third of these universals that you were looking at held up. Or, in other words, two-thirds didn't. Yeah, exactly. Yeah. So the fact that we can very confidently show with two very different types of experiments in a way that a third of them survive shows that they're relevant, right? It's not the case that we should not think about these types of patterns at all.

34:40So some of them really seem to be quite strongly supported. And I think it's super interesting, especially the way that they come about, right? So you can think about different ways in which languages get to that point, right? It's just thinking about how we arrive at these kind of somehow attractor states of two features that are combined with one another from a historical perspective. I think that's super interesting. And I think that's something that I want to explore in greater detail is exactly how that happens.

35:13Annemarie wrote a very nice supplementary section, if I can do some personal advertising. I think I contributed a bit to it, but I think Annemarie did most of the writing. If you go to our article and you go to supplementary number nine, there's a whole lit review of all kinds of classes that people have suggested, like different theories, a lot of things that we couldn't cram into the main text. So if you want to get into theories of why certain things have been suggested, I can recommend reading. Thank you. I'm aware that many people have suggested that language universals exist because there's a broad faculty of language and all human languages are basically the same language.

35:55What's your view as to why we have language universals? Well, first off, maybe we should say that we're not here looking for what's called absolute universals. These are not always true. So if you write into us and tell us that you do have a language that, for example, the verb agrees with the object but not the subject, that's technically a counter-evidence for that universal. But what we're seeing here is that the tendencies are very strong, but they're not absolute.

36:26So we're looking for meaningful related patterns, not absolute counts of like, if we found one counterexample that doesn't turn over. So what we're looking here is like strong tendencies. And what Daniel was describing is sort of the generativist paradigm of what's called universal grammar, where they are, as far as I know, not really open to strong tendencies. It should kind of always be the case, more or less, right? That's a bit of a point of difference there. To my understanding, yes.

36:57So why do we have these tendencies? Why did the tendencies that you found exist? Yeah, so this is the conundrum. I mean, for me, personally, that's a huge kind of conundrum, right? So on the one hand, you don't have to believe in universal grammar or anything like that to be able to say, well, there is something about human language that we share as a species. And may drive certain feature combinations to come up simply because they are somehow more adequate than others, right?

37:28This has been a very influential idea in functional typology. And I can see why, right? It has a natural appeal. So, for example, it's very, not the universal that we tested, but it's very strong across the world's languages that subjects precede objects. And probably this is the case because people want to put first in sentences what the sentence is about, right? So subjects come out of what was originally topics, and this all makes a lot of sense, right? But on the other hand, it cannot be the case that that's all there is because we do get all of this variation and this deviation from expected patterns, right?

38:08And I think there the role that we can explore is history, right? So historical change very often might actually be contact-induced. So one language community meets another speaker community, and, well, they do things slightly differently, and some change happens to be taken over, and this change then triggers other stuff happening in that language. And, of course, this is something that we don't know much about from any languages of the world simply because they're not recorded in history like, for example, Latin is and stuff like that.

38:38But in some cases you can discover some of these kind of historical pathways to feature combinations that you see today. And that's kind of what I find interesting, this interface between, on the one hand, preferences that are probably somehow at least global or universal, that probably have a cognitive nature. And on the other hand, these kind of historical accidents or kind of historical pathways through which things change.

39:09Another thing to think about maybe for some of these, for example, scalar ones is just raw frequencies in language use. So dual, you talk about singular things and plural things probably a lot. The amount of times you talk about specifically two is probably not as often. Probably more than you talk about things of three, though, I would suspect. So if you go from the theory of, like, you make into grammar what you do most often, it would make sense to first create a form for plural and then create a form for dual just by the frequencies of you could look at some sort of corpora or something.

39:52So some of these are explained by sort of just pragmatic use frequencies stuff as well. Which is still, I would still kind of put it in the cognitive bin, right? Because that relates to the human experience. That makes sense. Before I let you go, can you just reel off a few cool ones that you were pleased made the cut or maybe didn't? A lot of the usual suspects are topping what we're calling the narrow word order. So things like if you have your demonstrative after your noun, you have your adjective after your noun.

40:27If you have your numeral after your noun, your adjective is beyond. So if I say instead of red house, I say house red, I'm also going to say house that instead of that house. Yep. Makes sense. So you sort of like, you either put all the things before or all the things after. So be that demonstrative numerals, whatever it is. And you get like freak things like French where like some adjectives are in the other place. Yeah. Yeah. No, but that makes sense. I mean, I'm getting a really strong cognitive feeling off of that one.

40:58So cool. What else? Yeah. So those were kind of usual suspects, but those aren't very surprising. You asked us for something that's like a little bit unusual. Well, in our other category, there's a wild bunch that I don't really, I was a bit surprised by, I think. It's like if you have less case markers. So markers on nouns for like if they are the subject or object or things like that. Like going to the house or from the house or for the house or...

41:28Yeah, you do more marking of tense. If I have more cases, I'll also do more marking of tense. No, if you have less cases, you have more tense. Is that because now I'm inventing crazy theories? I don't really know what that one... That is in our other bin for a reason because we're like, I don't... Because languages spread the complexity around. And if it's too complex, they'll knock it down. No, we know that that's not true. Oh, no, I got it wrong. I have a paper and that's not really seen...

41:59At least for certain measurements of complexity, that doesn't really seem to fully hold true in certain studies. Yeah, so... Yeah, this is a curious one for sure. I think we're going to see more work on that. Well, I'm not going to beef up my verbal stuff because I already beefed up my cases. I only got so much time in a day. That is the equi-complexity hypothesis. But that has been shown to not really work out and... Damn it! You're knocking over my hypothesis. This is great. I love this.

42:31It's possible that this one is the sort of like odd fluke that is actually tracking something else. There's something else about these languages that is causing both of these things or something like that. But that's one that, at least when I saw it come out... Annemarie, what did you think when you saw less case, more tense being supported? I didn't think very hard about this one because we had another one in there that was kicked out earlier in round one, where we basically had, I think it was like more case, more tense aspect.

43:07And that one would basically speak to the general morphological profile of a language where indeed, like some languages are just morphologically more complex than others, right? They just like sticking things on the ends of words. Right? Yeah, no, that one apparently doesn't hold. But this one still does. Maybe this, I think probably Hedvig is right in the sense that there might be another factor here. That's the third factor, right? The hidden factor that we don't see that could be affecting both of these to align a certain way such that they align here.

43:45Yeah, maybe that one is a little bit of a fluke. We need to look a bit more closer because we also have the reverse one listed, which is more case, less tense. And that one is not supported. That one is supported in the synchronic analyses, but not in the BayesTraits. In the BayesTraits, yeah. That's amazing. So something is probably a little bit like funny here. Maybe. Sorry, I pointed out. Why don't we take one that we actually know like a bit better? Because that one's a mystery one to me.

44:16Annemarie, do you have a favorite one you want to take? Well, one that I think might be interesting for listeners is that there's this paper, I think it's 1988 or it's 1989, by Matthew Dryer, who says that adjective noun order is not involved in narrow word order universals. And also, if you look at WALS, right, and you look at whether the order of noun and adjective correlates with the order of verb and object, it doesn't work at all, right?

44:48But then, if you look at our list of supported narrow word order universals, we see noun, adjective, and adjective noun popping up relatively frequently. And I find this super interesting in the sense that it doesn't seem to be the case that the order of adjective and noun is correlated with the order of object and verb. I mean, and then, you know, 40 years ago, Dryer already said this and proved it. But it does seem to be the case that the order of adjective and noun seems to correlate with other adnominal word orders, right?

45:18So with the order of noun and demonstrative, with the order of noun and numeral, with et cetera, and the order of noun and genitive. So there might be something going on with somehow, you know, putting modifiers of nouns either in the same position with respect to the noun or kind of distributing them so that they're on both sides to make things somehow more balanced. I don't know. I have no idea. But I think this one stands out as something that wasn't really expected and somehow could be quite theoretically relevant as well.

45:51I sometimes like saying that as a linguist, I get to see language in the broad view. But I feel like you, the typologists, are just coming at this from the very most broadest view because you get to see so many languages, so many features over so much time. Is it not amazing? Does it make you gasp sometimes? It sometimes gives you a little bit of vertigo. You can also be subject to what I like to call a galaxy brain, where you can kind of, you see so many ways that things can be that like nothing surprises you.

46:22And you're like, oh, this could be this way and this way and this way. Like there's at least one language in the world that does that. So you kind of, you kind of wear out. It does terrible things for your, at least mine. I don't know about you, Annemarie, but like your, your writing and your, like I, anything is possible in my English. I will just do whatever I feel like, but sometimes I need to remind myself that I need to be more in on the details as well. There is that for sure. The other thing, like with respect to this paper, right, it's just that we tested 191.

47:00On the one hand, it sounds like a big number, but on the other hand, we could also say, well, we're going to test every Grambank feature in relationship to every other Grambank feature, which would be a whole large number of analyses. And this has also been suggested by people in the audience when I talked about this stuff, right? Like, why don't you only, why do you restrict yourself? Why don't you just do some mining? Yeah. Yeah. Why don't you just do some mining? Right. And then that's what I find when I go a bit like wide-eyed, like saying like, that's really, that would be a really computationally intensive task.

47:37And what probably is going to come out is that Greenberg was right all along, right? So, on the one hand, we do have, as a discipline, some ideas about patterns that still hold, right? And some of them might not, but still, I mean, we had pretty good heuristics to find these patterns in the first place. So, we don't need all of that mining.

48:03Maybe, I don't know. We already know where to look, probably. Yeah, we already know where to look, yeah. We want to do hypothesis-driven research where we're starting from a foundation of, like, people have a belief that this and this is related because of these reasons. Let's test that. Though sometimes those reasons are a little bit murkier than others. Another thing to note, if we were to take all the 195 Grambank features and test every pair for those, first of all, it would be over 30,000 pairs. And then, also, you could run the risk of what's called p-hacking yourself, which means you could find relationships that are kind of spurious and are not theoretically supported, and then you ad hoc afterwards find a reason to say why they're supported.
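[Editor's illustration] The arithmetic behind that worry can be sketched as follows. This is a back-of-the-envelope calculation, not something from the paper. Two assumptions are mine: that "over 30,000" counts ordered pairs (an implicational universal "if A then B" is directional, so A-then-B and B-then-A would be separate tests), and that each test uses a conventional 0.05 significance threshold.

```python
import math

N_FEATURES = 195  # number of Grambank features mentioned in the episode
ALPHA = 0.05      # assumed per-test significance threshold (not from the episode)

# Unordered pairs {A, B}: 195 * 194 / 2 = 18915
unordered_pairs = math.comb(N_FEATURES, 2)

# Ordered pairs (A implies B is distinct from B implies A): 195 * 194 = 37830
ordered_pairs = math.perm(N_FEATURES, 2)

# If no feature pair were truly related, this many tests would still be
# expected to come out "significant" by chance alone.
expected_false_positives = ordered_pairs * ALPHA

# A Bonferroni correction is one standard, conservative remedy: divide the
# threshold by the number of tests.
bonferroni_alpha = ALPHA / ordered_pairs

print("unordered pairs:", unordered_pairs)
print("ordered pairs:", ordered_pairs)
print("expected chance hits at alpha=0.05:", round(expected_false_positives))
print("Bonferroni-corrected threshold:", bonferroni_alpha)
```

On these assumptions, an undirected trawl would hand you well over a thousand chance "relationships" to rationalize after the fact, which is the risk being described. Starting from roughly 200 theoretically motivated claims keeps the number of tests, and the room for post hoc stories, much smaller.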

48:50But we want to start in sort of theory and hypotheses and then employ these, like, more sophisticated and nuanced ways of testing it and then discuss the results. Otherwise, you can, yeah, you can do bad science. Well, the paper is definitely worth the read. We're going to have a link to it in the show notes for this episode. We've been talking to the authors, Annemarie Verkerk and Hedvig Skirgård. Hedvig and Annemarie, thank you so much for coming on and chatting today.

49:20Thank you. And now it's time for Related or Not. Now, what have we got this week? Ben. Yes? You said that you wanted to hear a certain kind of theme music, a certain kind of jingle. I think I've put a call out for a few things, but I may have, knowing me as I do, I have either asked for a… That shows a high level of self-awareness. Knowing me as I do is… I think we've got to stop that right there. That's really good.

49:50I've either asked for an EDM banger or a rinse-tash-stick, wobble-filled, drum-and-bass nonsense one. Which one? Which one did I get? You kind of asked for all of the above. Yeah. You wanted a jungle-style EDM. Jungle. Because it's massive. Jungle is massive. Well, we got it. This one comes to us from David, who sent it to me. I've legitimately never been as excited as I am right now in my entire life. This is so great. It's going to be good. Oh, it's liquid.

50:33Related or not. I love it. There you go. I was expecting, yeah, I was expecting something really crunchy and really, like, grimy, but then he gave me this, like, awesome liquid, like, London scene, so 2008, David, get out of my head, bro. This, that was, that was fantastic. I love it. That was so good. Very good. Thanks for that. Very impressive. Our first one comes from Kalina, who says- More paper. Thought of this because I have some heavy-duty casters, those wheel things. The things that are on the bottom of beds that, when you have sex, turn your, like, mattress

51:06into, like, a yacht sailing on the high seas. You've never had that experience?

I'm really struggling here. My bed doesn't have casters. So, Hedvig, have you ever owned a bed with those tiny little black wheels? No. The kind of wheels that are on the bottom of an office chair? Yes. Okay. That's a caster wheel. On a bed? Yeah. On a bed. Sorry, that was very loud. Some cheap, shitty, some cheap, shitty beds come with them. But why would you want that on a bed? You want your bed to stay put? 100%. And like I said, when you have sex on such a bed, it turns it into a, like, into the Jolly Roger. Without the wheels. It's bad. It's really bad. No, no, it's great.

51:44You can go for, like, a distance record. It's like, that was a good one. Let's try it again and see how far we can go. So, caster wheels, the wheels that are on the bottom of office chairs. Okay. Kalina continues. Kalina's email is not about having sex, by the way. Made of cast iron. Go with me. Okay. If I were to cast the table across the floor, the casters let it glide smoothly as if on oil. Castor oil. Particularly.

52:14Okay. So, four things. You've got castor, a wheel on a table or a chair. Not a bed. Cast, as in set in a mold, like cast iron. Okay. Cast as to throw, like dice or a net. Uh-huh. And castor oil. Kalina says, love the show. It's one I regularly recommend to people looking for a new podcast to listen to. Thank you for making it. No, thank you, Kalina. Okay. That is very nice, especially after last recording where, I don't know if you kept it in, Daniel, but we read some reviews that weren't that friendly, so I really appreciate that.

52:46I sure did. And it was great. Of all our reviews. Okay. Can I ask a question? Please. The wheelie things that I had never heard of, how are they spelled? Now, that is a good question. C-A-S-T-O-R. Castor. Yes. And? Oh, no. C-A-S-T-E-R is considered a variant. Oh, yeah. That's what I was asking about. Yeah, exactly. E or O. Mm-hmm. Because English vowels, especially when they're in that position, I'm like, what are you asking me to do?

53:17Well, I mean, yes. So, is it relevant or is it not? Maybe not. Well, castor oil, I know it's with an O. That's true. Hmm. Okay. Do you want me to go first? I'll go first. Yeah, you go first. You go first. I think cast iron is cast because it's thrown into a mold. That's what I think. Okay. Yeah, I think so. Like poured, essentially. Yep. Yep. I don't think the other two are related to those or to each other. Okay. You don't think cast, like a gypsum cast, is related to cast iron?

53:48Oh, you mean like a, sorry, gypsum cast? Yes. She's talking about like what you would put on your arm or whatever. And he is saying exactly the same thing. Oh, I do think that those, I do think that that is the same process. Yeah, yeah, yeah. Okay. That process is the same process. But I don't think the bed caster is related and I don't think that the oil is related. Huh. Why? What do you think, Hedvig? What? Oh, this is a hard one. I'm actually, sorry. Yeah. You go. Ben, you go. Ben, you go. I was actually going to say, and this is very rare for me.

54:18I'm sitting pretty close to where Daniel is on this. Um, castor oil, I think is just going to have its own entirely silly, bespoke etymology related to nothing at all because it's like an oil and it's really old and it's been around for a squajillion years and blah, blah, blah, blah. Castor wheel is the odd one out for me. I completely agree that to throw something and then to pour stuff into something is just like a related sense. Castor wheel, the only thing I'm wondering is could it in some way be related to either

of the two other ones, right? So maybe it was made of a kind of plastic that was like derived from castor oil originally. I don't know. Or could it also have been like you put it on stuff and then you could like throw them across the room. Um, I'm going to, I'm going to back Daniel here. I'm going to say that castor wheel, its own thing, cast and throw, related, castor oil, its own thing. Okay. Hedvig, we both agree. Peer pressure.

55:27I am more and more working myself into the all related camp. Ooh.
