Kaleidoscience: Conversations on Cognitive Science

S3 #44 How should machines translate sensitive language? Brain-to-brain with Sabrina Frohn.

February 15, 2026 · 1h 4m · 7,989 words

Show notes

Sabrina’s LinkedIn: https://www.linkedin.com/in/sabrina-frohn/

Papers:
Not THE implicit bias paper, but one explaining implicit and explicit bias: “Social Justice in Our Minds, Homes, and Society: The Nature, Causes, and Consequences of Implicit Bias” by Laurie A. Rudman, 10.1023/B:SORE.0000027406.32604.f6
About the implicit association test I mentioned: https://www.projectimplicit.net/nosek/iat/default.html (I was not able to find the study I participated in, but I assume it is similar to this, perhaps was even based on this.)

Bias in machine translations:
“Gender Bias in Machine Translation Systems”, Stefanie Ullmann et al., ISBN: 978-3-030-88615-8
“What about em? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns”, Lauscher et al., 10.48550/arXiv.2305.16051
Comparing LLM and MT: “Benchmarking Machine Translation with Cultural Awareness” by Binwei Yao, 10.48550/arXiv.2305.14328
“Evaluating Gender Bias in Machine Translation”, Stanovsky et al., 10.18653/v1/P19-1164

machtsprache: https://www.machtsprache.de/
macht.sprache plugins:
Chrome: https://chromewebstore.google.com/detail/machtsprache-for-sensitiv/dichlnekfmanlagciihdnkgiefppilol
Firefox: addons.mozilla.org/en-GB/firefox/addon/macht-sprache/

Interesting reads:
“The complexities of linguistic discrimination”, Drożdżowicz et al., 10.1080/09515089.2024.2307993
“On the Translation of Otherness: The Univocal Case of Will Grayson, Will Grayson”, Badenes, 10.7202/1068906ar
“Word embeddings quantify 100 years of gender and ethnic stereotypes”, Garg et al., 10.1073/pnas.1720347115
https://pocolit.com/

Sabrina’s paper: https://publications.waset.org/10014353/bibtex

Podcast Credits:
Produced by: Imogen Hüsing, Clara Kühne, Sophie Kühne, Sönke Lülf and Elisa Palme
Logo by: Annika Richter
Music by: Jan-Luca Schröder
Write us an email to: kaleidopod@uos.de
Contact us on Instagram: @kaleidoscience_pod

Transcript

Introduction

0:00Hi, and welcome to Kaleidoscience. Here you find answers about cognition that you may or may not have asked yourself. This episode is hosted by Imogen Hüsing and Elisa Palme. So sit back, relax, and enjoy this week's episode. Despite the best intentions, it can be hard to use politically sensitive language even in your native tongue. Translating texts into another language with different conventions for sensitivity can be even tougher, especially when you're not proficient in the other language.

Guest Introduction

0:31Today, we are talking with Sabrina Frohn about language sensitivity in machine translation tools, whether they can be politically sensitive, and how machine translations can be improved in the future. Sabrina Frohn is currently a PhD candidate at the University of Osnabrück, where she also completed her Bachelor's in Cognitive Science. She completed her Master's in Computational Cognitive Neuroscience at Goldsmiths, University of London. At the University of Osnabrück, she also founded the Cognitive Science Student Journal

1:03and is still an active member or the active leader of the journal.

Get-to-Know Game

1:08Yeah, welcome. So nice of you to join our podcast today. Thanks for the invitation. I'm excited. And as always, at the beginning, we play our short get-to-know game where you will get five sentence beginnings. And I would ask you to complete them as spontaneously as you can. Since I know that you've listened to episodes before, do you still remember the questions or are they novel? No idea. Okay, perfect. I have something about an emoji in the back of my head, but even there, I'm not sure. Okay, perfect. Our first sentence is, as a kid, I always wanted to be...

1:46As a little kid, I wanted to be a firefighter or a policewoman.

1:53Yeah. Why?

1:56Honestly, I just thought they were cool because they could make noises with their cars. It was kindergarten time. But that's a relatable reason to think they're cool. Yeah, the dream stopped when we had an excursion to the fire station and they allowed us to go onto the car and drive up. And I realized I'm afraid of heights. So, a short dream, but a very strong one.

Emoji Association

2:25Our second sentence is, if I was an emoji, I would be... Ooh, I would be the emoji with the hands, which is smiling. The hugging one? Is it a hugging one? I don't know. I always interpreted it as an inviting hug. Oh, no, I thought it's like, hey! Okay. But I think it's called the hugging face emoji. That's where the AI and dataset website gets its name from, Hugging Face.

2:57Oh, I didn't realize. I just like that it sounds... Not sounds. It looks so inviting. Maybe now I have to rethink it again. No, I'm kidding. So you stick to the... Yes. Okay, perfect. Then our third sentence. My favorite thing to do on a day off is?

Favorite Activities

3:18Do some arts and crafts at home and go for a walk. What kinds of arts and crafts do you do? I do knitting, crocheting, currently teaching myself to sew. Anything you can do with pencils or colors and paper. Yeah, I'm kind of jumping around the arts and crafts.

3:50That's amazing. Yeah. Our next sentence.

Current Interests

3:54Right now, I'm most fascinated by?

4:00Tricky one. Actually, yesterday I was walking home and it was snowing and I was walking on top of the snow. And that actually fascinated me. Water can have so many different forms. Yeah.

4:19That I can walk on it. But I think this fascination might die when spring rolls around.

4:30Yeah. But even with snow, like snow can be really good for making snowballs or can be so powdery that you can't even... When you try to make a snowball or build a snowman that it just falls apart. Yeah. So just even that is super fascinating. That's true. Yeah. And the difference between those two states are like one or two degrees.

Last Sentence

4:53And our last sentence. I know it's time to call it a day when?

5:00When the text or the computer screen flickers in front of my eyes and I have to read sentences again and again. Then I just better go home. Similarly for art projects, when I'm making like three mistakes in a row, I'm like, okay, call it a day, start again tomorrow. Oh, interesting. Because for art projects, I tend to go always deeper the longer the day gets. Because the more tired I get, the more I want to finish whatever I just started. Yes, I get it.

5:33But then I have been on the next day side too often where I had to like unravel the scarf I was knitting too many times. So I made this a rule for myself. That's smart.

5:48Okay.

Scientific Background

5:49So now since we have gotten to know you a bit better, could you maybe tell us a bit about your scientific background, where you started and how you got into the field you are working in now? So actually in high school, for a long time, I wanted to be a chemist and study chemistry and even switched schools to be able to take an advanced course in chemistry, during which I learned I don't see myself standing in the lab for the rest of my life.

6:22Unhelpfully, my mom then said, well, you won't be if you study chemistry, you will be a manager. And then I thought, no, I want to do the cool chemistry stuff. I don't want to manage. So then that was gone. I had no idea what to do anymore, but just knew, oh, I want to do a bit of math, a bit of IT, a bit of psychology, a bit of philosophy and so on. But you can't study that. At least that's what I thought until a friend of mine at the gym said that her daughter was studying cognitive science.

6:57And I looked at the description and realized, oh, that's exactly what I want out of a study program. So I started cognitive science in Osnabrück and I pretty much realized how cool it is to dive deeper into the different areas, but also that I might not be as fond of all the areas as I thought. And yeah, what kind of drove me with research projects, even for my bachelor thesis originally, was that I wanted to do something that has kind of a purpose beyond the paper I'm writing about.

7:37Originally, for my bachelor thesis, I wanted to build it upon an internship I was doing where I was working with robots and fingertip sensors. So you could imagine them being used for prosthetic limbs. And they were so fancy. It was like 5,000 pounds per sensor and you had five on one hand. And I was very worried to break them, but everything worked out until I wanted to write my thesis about it and had some stuff coded for the project and sent it back to the lab in the UK.

8:17And they were like, well, somehow in the setup of the robot, a mistake was made and all the data you gathered is not completely random, but not usable. So I had to switch the project.

8:36But this idea of making or like working on something that, I don't know, has a bit of a purpose beyond just, I don't know, just information acquisition kind of stuck with me. So in my master's, I, it's funny now that I think about it, every time something didn't work out, maybe that's also science. So in my master's, I wanted to work with a theater company to investigate emotions and specifically the connection between people through emotions.

9:20And then COVID hit (that's where the laugh from before came from), so I had to adapt it a bit and work not with the theater company, but with other participants. But the idea there was that perhaps it could be used in aiding communication with autistic people or people who are nonverbal for any other reason. And actually, I think the project in itself is still running, at least they wanted to make a startup out of it after I finished my master's.

9:56But to be honest, I haven't checked up on it because I did other projects and now I'm in my PhD, where I was at first a bit lost because I had so many options of what I want to do and couldn't quite decide until my supervisor wanted to collaborate or start a collaboration with another department at the university.

10:26And they designed a platform that crowdsourced data on politically sensitive language, especially for translators, but also for people generally interested in translation. And yeah, I kind of got hooked on this because all through my studies, I have been studying in English, but my mother tongue is German and I have used a lot of online translators and dictionaries and everything.

11:01And I always, yeah, realized how much I have to adapt what is being translated. And then with this added level of sensitivity and quite a few phrases I was not familiar with before I worked on this project. But I had some interest in the topic beforehand, so it kind of fit together quite well with what I already knew, what I was interested in reading up on and researching in my free time, plus the scientific background of it.

11:40And so, yeah, this is kind of how I came to my project now. Thank you so much for this whole background, and I think it's super fascinating on how many different projects you worked throughout the whole way and now ended up in something completely different as where you started at the beginning.

Relevance of Sensitive Language

12:03Maybe before we go deeper into the technical part, could we just briefly talk about why it's relevant to talk about biases, and why it's relevant to talk about sensitive language? Absolutely. So, every person has biases, I would say, mostly implicit, I hope, but of course people also have explicit biases.

12:34But I think regarding language, especially the implicit ones are what actually drives the need for the topic, because one is just not aware necessarily what historical background words have, even in the language they are speaking fluently. Could you maybe give an example for implicit biases?

13:02Sure. I participated in a study once, and I don't know the people who ran the study, but what they were doing was showing you pictures of people with different skin tones and looking at word associations, I think. So, they measured how much time you needed to pick a word that fits in a sentence. So, for example, "Anna is a [find a word] person."

13:41And then they had suggestions for the words, like smart or stupid, I don't know. I hope this has been published so you can look it up. And you were tasked beforehand to pick a word. And I don't know the results of the study because I was just participating. What I assume they were measuring is how much time it took you to select it (I think they prompted you to pick the positive one).

14:14And you can measure implicit biases this way because it could be that it takes you longer to select positive words for people with certain skin colors. And you're not consciously doing it, you're not thinking consciously the one person is worse than the other. But implicitly, it has just been brought up in you, in the systems we live in, that we have these kinds of biases and stereotypes.
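
Editor's sketch: the timing idea described here can be illustrated with a few lines of Python. This is only a minimal illustration under stated assumptions, not the scoring procedure of the study the guest took part in or of any published implicit association test. The group labels, the reaction times, and the function names are all invented for the example.

```python
from statistics import mean

# Hypothetical reaction times in milliseconds, invented for illustration.
# "A" and "B" stand for two picture groups shown to a participant.
trials = [
    {"group": "A", "latency_ms": 600},
    {"group": "A", "latency_ms": 640},
    {"group": "A", "latency_ms": 620},
    {"group": "B", "latency_ms": 700},
    {"group": "B", "latency_ms": 720},
    {"group": "B", "latency_ms": 680},
]


def mean_latency_by_group(trials):
    """Average the time needed to pick the positive word, per picture group."""
    by_group = {}
    for trial in trials:
        by_group.setdefault(trial["group"], []).append(trial["latency_ms"])
    return {group: mean(values) for group, values in by_group.items()}


def latency_gap(trials, group_a, group_b):
    """Mean latency of group_b minus mean latency of group_a (in ms)."""
    means = mean_latency_by_group(trials)
    return means[group_b] - means[group_a]


print(latency_gap(trials, "A", "B"))
```

A positive gap in this toy setup would mean that picking the positive word took longer on average for group B. Real studies use many more trials and more careful statistics before drawing any conclusion.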

14:45And with language, it's a bit similar in that people can show these implicit biases more explicitly, at least to the people who are more familiar with the language. So, for example, in this crowdsourced table of words that they gathered (the project is called macht.sprache), they give explanations for the words they collected.

15:22For example, in German, the word mauscheln, which we just use if something is a bit shady, I guess, actually has a background in Jewish history and is kind of replicating this bias that, in quotes, obviously, Jewish people are a bit shady or acting shady. So, if you don't know this background, if you're not actively making yourself familiar with what words mean beyond the surface level, you can show either your biases or just a lack of knowledge through the words you use.

16:12Is it relevant for each term? Because, for example, mauscheln (I just repeat the term) is not as commonly used. And I personally wouldn't have connected it to anything negative besides that. For me, it has a meaning of you try to hide that you did something not well, in a way. And I wouldn't have connected it to any historical background.

16:42So, my question is mainly, to what extent is the original meaning still relevant if, in modern use, the word is more disconnected from this original meaning? As far as I have read and understood, it is, because of the people it still affects. So, in the case I just brought up, I'm not personally affected, so it's a bit tricky to speak of it.

17:14However, I think that, for me, it makes it even more relevant to look into these topics, because I don't want to unintentionally offend or hurt people who are affected. And I also think, in the case of translations, to take it a step further, one would not necessarily know this whole background of historic relevance, because it differs from country to country.

17:56I want to quickly give an example of an idiom from the English language, because we talked about a German word, and one that you probably don't encounter if you're not a native speaker. I think, for example, the phrase that something is a cakewalk has a background, as far as I know, in really damaging practices that plantation owners did to their slaves.

18:37And since I was made aware of that, every time I hear the term cakewalk, I just cringe.

18:49Yeah. I just wanted to give that as an example for the English language. And if you don't know this context, there's no way you can know where it comes from, because there's nothing in the word from which you can derive the context. It could have a completely different background, but you need to have quite a good understanding of the language and the culture to even know that this is a politically sensitive idiom.

19:28Yeah, absolutely. And I mean, I didn't even know this word and its background, and I'm working on sensitive language. So I find it is very difficult if you work with translations, for example, if you're not at least as familiar with the language you're translating to, you might unintentionally use language that discriminates against people. Yeah, I think that also highlights why it's relevant in the context of machine translations, right?

20:04Yeah. Yeah. Exactly. Because I always want to say, because machine translations don't have context, and then I get, like, metaphorically hit on my fingers by the computer science people. Because, of course, they get trained on so much data, and somewhere in the data, it probably, it might even say the definition of the word or the historic background of the word.

20:35But nonetheless, when they make translations, it's not based on these contextual clues in the first place, but on statistical likelihood. And that way, the majority language is also replicated, which in turn means that biases and stereotypes that are used by the majority of people are replicated in translations.
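
Editor's sketch: the point about statistical likelihood can be illustrated with a toy example. Real translation systems score whole word sequences with neural networks rather than looking up counts in a table, so this is only a simplified stand-in. The German renderings of "the doctor" and their counts are invented for the example.

```python
from collections import Counter

# Hypothetical counts of how often each rendering of "the doctor" appears
# in some training data. The numbers are invented for illustration.
observed = Counter({"der Arzt": 90, "die Ärztin": 10})


def most_likely_translation(counts):
    """Pick the single most frequent rendering (a greedy choice)."""
    return counts.most_common(1)[0][0]


def share(counts, option):
    """Fraction of the training data that uses the given rendering."""
    return counts[option] / sum(counts.values())


print(most_likely_translation(observed))   # the majority form
print(share(observed, "die Ärztin"))       # how often the minority form occurs
```

In this toy setup the minority form makes up a tenth of the data, yet a system that always picks the most likely option would never output it. That is one way to picture how majority usage gets replicated.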

Translation Tools

21:03Maybe before we go a bit deeper into insensitive or discriminating things that happen when using translation tools, Sabrina, could you explain or tell us what different tools for translations exist, and how they differ with regard to their functioning and performance? Sure. So, nowadays, there are largely two types of translation algorithms used.

21:35One is neural machine translation, and the other is large language models, and I think by now, we're all familiar with large language models, because of GPT. However, the difference between the two in the context of translation is the way they learn the translation skills, so to say. Neural machine translation tools are trained on corpuses of data that are already matching target and source language.

22:09So, for example, German and English text that is matched together, translations of books or whatever, where the tool can basically learn what is supposed to be the translation. Large language models, in contrast, are trained on all the data, for example, of the entire internet, and then they extract a lot of skills from that regarding language, and one of them is translation, because the internet exists in multiple languages.

22:45Most tools that are now available for translation, maybe even commercially, are specifically neural machine translation or some hybrid form, but if we go into hybrid forms, this becomes a lecture. So, just think of neural machine translation tools, because they perform quite well. In that case, we have DeepL, Google Translate, anything.

23:21If you go online and search for online translation tool, it will probably be a neural machine translation tool. Do neural translation tools perform differently compared to large language models? In the accuracy, or are they comparable?

23:41I am not familiar with a comprehensive study on it. In the context of sensitive language, there are some studies that compare neural machine translation tools to large language models. And also there, it's not quite conclusive. My impression is that neural machine translation can be better, however, large language models can be prompted to take sensitivity of the translation into account.

24:24And in a similar vein, maybe it also makes sense to briefly explain my definition of sensitive language. It's not my personal definition, but the one that I'm working with. And that is that language exists on a spectrum. At one end is preferred language, so the terms that the people who are affected by them, or who are marginalized in some way, want to use or want to have used.

24:59At the other end is discriminatory language, so usually language used by people who are not marginalized, or language that has somehow developed to be discriminatory. And along this spectrum there can be a lot of different levels of more or less sensitive language, for example, hate speech or derogatory terms.

25:34And what makes it tricky in translations is that you don't want to prescribe to people how to translate. You don't want to tell them you always have to translate sensitively or insensitively. Because it depends on what the original text was and what the intention is with the translation. And I also know that in some areas there is the discussion between person-first or identity-first language, for example, whether you say autistic people or people with autism.

26:18And even within the community, there is the discussion of which one appropriately describes a person who has autism or who is on the autism spectrum. And I think that will make it even more difficult, right? Yes, absolutely. Also, language changes over time. So, words are reclaimed. Words that were originally on the preferred side of the spectrum can be misappropriated and used as slurs, for example.

26:54So, it's not a static, we said it once and then it's done situation, especially in different languages. This is also influenced by the context of the country, the region, the people that the language is spoken by. Now that we talked a bit about what sensitive language is, or with which definition you are working, what strategies are there already, or are there already strategies to ensure that machine translations use more sensitive language?

27:42A lot of research has been done on gender bias in machine translations, so much so that I think they coined their own research area, and this is a special case, first of all, because in gendered language, if we think binary, about 50% of the population is affected by mistranslations. Of course, there's also a case for doing non-binary gender bias research, which is more and more done, but in gender bias research regarding translations, they look at different areas where the bias is introduced to the system.

28:35So, does it come from the data the machine translation is trained with, does it come from engineering, from how the algorithm is structured, is it what is rewarded to be an output, for example? Is it the labeling of the data, and so on? And they're still working on it. It's a complex problem, because different languages have different approaches to gendered language.

29:08So, in English, for example, there are different gendered pronouns, but the nouns are typically non-gendered, whereas if you then translate English to German or Spanish, you have many cases where you need to make decisions. Does the noun now become a female or male version of the noun, for example, when talking about jobs?

29:38And because this problem is inherent to translations, and so many people are affected, a lot of research has been done on it, and they're getting better and better in translations, which I think is great. Another strategy that I see with commercial tools nowadays is providing alternative translations. So, you have a sentence like "the doctor went to work", and then for German or Spanish the tool gives you the male version of doctor, and then there's an alternative translation with the female version of doctor, which lets the user decide how they want to proceed.
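
Editor's sketch: the "alternative translations" strategy can be pictured as returning every gendered form instead of silently choosing one. The following is a minimal illustration, not how any commercial tool is implemented. The tiny lexicon `OCCUPATIONS_DE` and the function name are assumptions made up for this example.

```python
# Hypothetical mini-lexicon of German occupational nouns with both forms.
OCCUPATIONS_DE = {
    "doctor": {"masculine": "der Arzt", "feminine": "die Ärztin"},
    "teacher": {"masculine": "der Lehrer", "feminine": "die Lehrerin"},
}


def translation_alternatives(noun):
    """Return all known gendered renderings so the user can choose.

    An empty list means the toy lexicon has no entry for the noun.
    """
    forms = OCCUPATIONS_DE.get(noun)
    if forms is None:
        return []
    return [forms["masculine"], forms["feminine"]]


for option in translation_alternatives("doctor"):
    print(option)
```

The design choice here is that the decision stays with the person translating. The function only lays out the options and does not rank them by likelihood.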

30:27And I mentioned Machtsprache before, which is the project I've been collaborating with. They have developed plugins to use alongside translation tools, such as DeepL and Google Translate, where they highlight words that have a sensitive context. And then the user can read up more on it and make a more informed decision on how they want to translate. So, there are quite a few approaches, and it's not just on gendered language or gendered bias in language, but the main focus is on it.
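
Editor's sketch: the idea of highlighting words that have a sensitive context can be illustrated with a simple glossary lookup. This is not the code of the plugins mentioned here, whose internals are not described in the episode. The glossary entry, its note, and the function name are assumptions for illustration.

```python
import re

# Hypothetical glossary. The note paraphrases what is said in this episode.
GLOSSARY = {
    "cakewalk": "Discussed in this episode as having a background in slavery-era practices.",
}


def flag_sensitive_terms(text, glossary=GLOSSARY):
    """Return (term, note) pairs for glossary terms found as whole words."""
    hits = []
    for term, note in glossary.items():
        # \b keeps "cakewalk" from matching inside longer, unrelated words.
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            hits.append((term, note))
    return hits


for term, note in flag_sensitive_terms("That exam was a Cakewalk."):
    print(f"{term}: {note}")
```

A lookup like this only flags terms and shows background. Whether and how to rephrase remains the user's decision, which matches the goal of an informed choice described above.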

31:12It's super interesting, I didn't know that those plugins existed, but I think it's helpful, especially when you're translating or when you're writing in a language where you're just learning the language as well, to also learn more about nuances within the language. Because when I think back of before I started to use, for example, English more in day-to-day life as well, I often didn't, I still sometimes struggle, of course, because I'm not a native English speaker, but I often didn't understand why a term would be not appropriate in a situation.

31:49Although, it could mean the thing I wanted to say, but not in the context. And I think it will be, I think it depends on different parts, because, for example, English is quite rich in words, while other languages have less nuanced words, for example, but then again have more things that could change meaning with the grammar used, which is a bit easier in English compared to other languages.

32:19So, I think there also, it can be super confusing when you come from a background native tongue, which is more weighted into the, for example, grammar changes everything direction compared to, there are 10 words to say the same thing, but every word has a slightly different meaning. Yeah. Yeah. Yeah, I'm also just thinking that a tool like that could help non-native speakers navigate instances where it's, for example, not clear how you would translate that in German, because you have different options.

32:55Like, for example, in German, you have two words for, for example, student or teacher or whatever, in a male or a female version. And especially when it's not certain which gender the person has, you have different techniques to basically include the two binary genders or the gender spectrum. And there are different attempts to do that, and there are no clear rules in the German language.

33:33This is something that's currently developing. So, I think a tool like that could also be super interesting for non-native speakers to read up on the different ways you could translate that to be more gender inclusive. And yeah, maybe also what to navigate when translating that. Absolutely. And from a researcher slash engineering perspective, the challenge already lies somewhere else.

34:05So, it's not so much about what rule there is in German to include all the genders, because you could suggest alternatives. But it already starts with the data we have, because we don't have so much data to train the algorithms on marginalized language, which is somehow already inherent to the fact that it's marginalized, or that it comes from people who are marginalized in society. Because these algorithms all work on huge amounts of data, to make the translations more inclusive, we first need to get some kind of solution, either more data sets or different techniques to level out the gap in data, or the disparity between

35:00majority language and minority language, though these terms are a bit iffy. Are there any, or could you give us a few examples of techniques for how you could level out the data sets? Well, if I knew.

35:26Or are there already any approaches? So, actually, there are machine learning approaches to level out data sets. The issue I find with this specific case of sensitive language is that you can't necessarily use them. So, what they do, for example, if you think of image data sets, and you have 2,000 pictures of red flowers and only 200 of yellow flowers, but you want the algorithm to learn both.

36:07What you can do is either cut down on the red flowers or increase the number of yellow flowers, for example, by artificially generating pictures of yellow flowers, or you can use different kinds of, let's say, reward systems, or different kinds of splits of data. There are so many options. The tricky thing with sensitive language is that artificially generating sentences is already questionable, because how realistic would they be in actual language use?
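
Editor's sketch: the flower example can be written out as three common rebalancing options. This is a minimal illustration under stated assumptions. The file-name-like items are placeholders, duplicating existing yellow items stands in for artificially generating new ones, and reading "reward systems" as per-class loss weights is an interpretation by the editor, not something stated in the episode.

```python
import random


def undersample(majority, minority, seed=0):
    """Cut the larger class down to the size of the smaller one."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), list(minority)


def oversample(majority, minority, seed=0):
    """Grow the smaller class by repeating randomly chosen existing items."""
    rng = random.Random(seed)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority), list(minority) + extra


def class_weights(counts):
    """Weight each class inversely to its frequency (rare classes count more)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


# Placeholder items standing in for 2,000 red and 200 yellow flower pictures.
red = [f"red_{i}" for i in range(2000)]
yellow = [f"yellow_{i}" for i in range(200)]

fewer_red, same_yellow = undersample(red, yellow)
same_red, more_yellow = oversample(red, yellow)
print(len(fewer_red), len(same_yellow))
print(len(same_red), len(more_yellow))
print(class_weights({"red": len(red), "yellow": len(yellow)}))
```

For images, repeating or synthesizing yellow flowers is usually harmless. For sensitive language, the concern raised here is exactly that repeated or generated sentences may not reflect how people actually use the words.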

36:49It's tricky to use language from, for example, social media platforms, not only because of the data restrictions of the social media platforms. One example we had in a course was, well, why can't we just use all the tweets with the hashtag Black Lives Matter and extract language from there? The issue here is, first of all, that you would get kind of free work from the people who wrote the tweets without the intention of them being used in algorithms.

37:37So you don't really have an approval concept or consent concept. One could say, well, this is just how the internet works. And I think a lot of AI people think so too. It's a kind of struggle I balance with every day, because on the social science side there's also a history of scientific misuse towards people who are marginalized. So you don't want to repeat that process either.

38:11Also, you cannot necessarily be sure that only people who are using the preferred language would be using the correct hashtag, or not the correct one, the selected hashtag. So there's an issue. Then, in different areas of marginalization, different amounts of data are available in the first place, not only online, but in general.

38:51So, where these discussions and discourses about word definitions happen is sometimes just in spoken language, sometimes it might be a Reddit thread, who knows, but it's so spread out, especially with groups who are not so online. So that's an issue, or not an issue, but an issue if you're a computer scientist.

39:29And another part is also the standpoint, or the position, I have as a researcher. So, who am I to define language, marginalized language, in an area that's not part of my personal identity? And even if it is, who am I to make decisions for an entire community? So, there need to be different strategies, and we have thought about some in workshops, and other researchers probably also have other solutions or approaches to the topic.

40:10But what currently happens in research is usually that we work with small artificial data sets. So, for example, for one specific use case, a student of mine created a data set of occupational nouns and how they could be translated. I also saw research by people who went through, I don't know, TV translations of series and filtered out slurs.

40:52So, yeah, those are the approaches now, and I think that's something we work with to see if we find different opportunities or solutions to gather data, or maybe find algorithmic solutions, computer science solutions, to the problem of not having so much data.

41:29You briefly touched on the fact that it's difficult, as a researcher, to decide on the most sensitive language for a group that is not part of the researcher's identity. How would you generally recommend that a person who wants to be more sensitive in their language use, also when using, for example, translation tools, take care of people within marginalized groups?

42:01Because you can't ask each person individually how they would like to be referred to, just because there are so many people and so many different opinions, as we said earlier, for example, on the autism example. Yeah, I don't think there's one rule to it. The way I approach this is learning from people who are part of the community, reading books that have been published, reading articles.

42:38Uh, just having an eye out for these kinds of, um, language issues that arise. I think that's maybe also just an interest of mine to learn more of different, uh, of different communities. Um, in translations, um, in the first place, you can make use of the opportunities that the translation tools already give you.

43:10So if they give you alternatives, you can think of the selection you do. Um, if you are more familiar with the language you're translating to, you can already also just be on the lookout, um, from the language or the background that you have in your native language. Um, of course, there is no, like, 100% guarantee you will find it, plus it requires a lot of personal effort.

43:45So, um, um, the other way I, I take, or I would suggest is to also be open to critique and be open to, to change words or your use of words, um, when you hear, or when you learn more about their background. Um, and yeah, I wish there was a, there was a, these are the five steps, but, uh, so far there is not.

44:20Uh, I think it's just about being open to adapting language and being aware that there is an issue with language translations and somehow balance these two things, um, to, to find your own solution to it. So one could be, um, to state when you use the translation tool, um, or, um, I don't know, open the room to, to receive comments on your language use and say, hey, if, yeah, I don't know what you would say.

45:04Hey, if anything irks you, or if I'm saying something that I'm not, uh, that has a, um, non-sensitive background, just let me know. Um, yeah, but again, these are kind of the rules I made for myself as well. Um, so see how you can work with this and adapt it and of course, give me feedback if you have better ideas or I said something that is not, uh, as sensitive as it could be.

45:36I have two small additions to that. Um, I think one tip that I heard from somebody, um, um, who was, uh, giving inputs on Instagram about, um, um, ableist, ableistic language was, um, if you're referring to, um, people with a disability and there are like, um, different preferences within the community, how they would like to be addressed, mix them up.

46:07Like, don't stick to just one version, but, um, use them all interspersed. And the other thing I think, which is really important to keep in mind, especially for people who are part of a majority, um, like, no, that was a not great way to express that. But who are part of a population that is not that much discriminated against. Like, for example, we are all, we are three white women discussing this topic right now.

46:38Um, if it's a conversation or a discussion that makes you uncomfortable, it's a discussion worth having. Um, so like, yeah, to kind of realize when something about this makes you uncomfortable and then not run away from it, but like sit with it and reflect on that. Of course, not in every context, but especially when it's about like just a sensitive language and so on and yeah, reflecting your own biases and so on. Um, yeah, great addition.

47:11Thank you. Um, I, I don't know if it's a weird question. I think not, um, if you could build the perfect translation tool that tries to incorporate everything that's important about sensitive language, what would this tool need? Oh, I love this question. Not weird at all. Um, oh, this is so many, uh, this is manifold. So first of all, I would love to have a diverse team of people working on the tool to not just have my input to it, um, which is a challenge I face in my research a lot, that I'm mainly the only person working on my research.

47:54Um, but if we're talking about the tool itself, I think I would like the suggested translation to be a sensitive one or on a preferred language side. Um, and then the options that are given to be across the spectrum of sensitive language, as opposed to having the biased one as a default. And if we go even beyond that, I would love for a translation tool to, um, take positionality.

48:34So, um, what, what is your stance as a translator? So what's the intention, for example, target audience, the, uh, all the things that professional translators think about when they're translating into account and maybe let the user also choose. Okay, I'm now translating a historic text from the perspective of a historian and translating it to the other language for an audience that is also mostly historians. That would give a totally different translation than having the first part be for historians.

49:16And then the second part for the general public, for example, because you would choose different language.

49:25So, yeah, I would like the tool to be descriptive, not prescriptive, not tell people how to use language. And, uh, ideally it would be as easy to use it as it is now, but the users are more aware or more involved in translation decisions. Because in the end, they are using the translation as their own work, so to say, or as their own words.

50:03Um, and typically don't write, it's translated by a tool underneath it or, um, similar. So, yes, I, I have thought about this, but it didn't come out as coherent as I planned because I have so many different ideas. I don't think there is one ideal tool either. I'm also not so familiar with different languages. Um, so that's also an aspect that will go into it.

50:34I don't know. But the aspect of, uh, different options to choose from, I don't just want that for sensitive language. I want that for everything. Like, there are so many nuances, uh, between different languages. And, uh, I don't know how many times I've been mad at, I don't know, Google Translate or whatever, because it translated something. And I was like, well, I guess that's what I wrote there in German, but that's not really what I want to say in English. Fair enough, yeah.

51:04I think just kind of a multi-choice option, kind of after, like after you've entered a text and have your target language. I think a tool would be awesome if they could, or if it would, um, not they, if it would kind of scan through the text, get the main parts of the text or the main content. For example, when we translate a text from English to German, so from a non-gendered language to a gendered language. And the English text is talking about, let's say, um, doctors, because we had doctors in before, as an example.

51:42Um, to then kind of give options of, okay, do you want to have, what kind of, um, gender do you want? Do you even want to have a gendered, or do you just want to use the masculine? So just to have the options where you can select, okay, I want to have that masculine. I want to have a version or an option where I have both genders, or I want to have a gender neutral version in the end. Or to also, um, for words, which can be ambiguous, have an option in general after entering a text to say, well, for this kind of context, um, I would like the translated text to have this nuance, which isn't even needed in the original language, but is needed in the target language.
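The option menu described here can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `GenderForm`, `OCCUPATION_FORMS`, `render_occupation` and `list_options` are hypothetical, the lexicon is a tiny hand-written lookup for the "doctors" example only, and the gender-star form stands in for the "gender neutral version" mentioned in the conversation. No existing translation tool is claimed to work this way.

```python
from enum import Enum


class GenderForm(Enum):
    """Hypothetical choices a user could be offered for a gendered target language."""
    MASCULINE = "masculine"   # generic masculine
    BOTH = "both"             # feminine and masculine forms spelled out
    INCLUSIVE = "inclusive"   # gender-star form, used here as the neutral option


# Hand-written illustration for English -> German; a real tool would need
# far broader coverage and community input on which forms to offer.
OCCUPATION_FORMS = {
    "doctors": {
        GenderForm.MASCULINE: "Ärzte",
        GenderForm.BOTH: "Ärztinnen und Ärzte",
        GenderForm.INCLUSIVE: "Ärzt*innen",
    },
}


def list_options(english_noun: str) -> dict:
    """Return every available German rendering so the user can choose one."""
    forms = OCCUPATION_FORMS[english_noun.lower()]
    return {form.value: text for form, text in forms.items()}


def render_occupation(english_noun: str, form: GenderForm) -> str:
    """Return the German rendering for the form the user selected."""
    return OCCUPATION_FORMS[english_noun.lower()][form]


if __name__ == "__main__":
    for label, text in list_options("doctors").items():
        print(f"{label}: {text}")
```

The point of the sketch is only that the choice is surfaced to the user instead of being made silently by a default.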

52:25Yeah, I feel like a lot of points could be resolved if the translation tool would be just more aware of context and the content of the context, be it, uh, yeah, historical discourse, activism, whatever relevant in the respective country or language you're translating to.

52:55Um, what I also thought about was, what would be really interesting would be to get the, let's call it positionality of the translation tool. So, um, know with which perspective the tool has translated your text in the first place. Yeah, I don't know if, I don't think it, I mean, it doesn't choose a perspective, um, obviously, but some perspective will be in there, be it based on the engineers who worked on it, on the data it had, on the prompts it was given, if we're talking about LLMs.

53:40And just as a user to be aware of, from which lens the translation tool translated would already be so interesting.

53:50Yeah, maybe to also just have an overview about which data sets were used to train on it, for example.

53:59I guess you have to assume the whole internet, but. Yeah, funny enough, um, as I mentioned before, like, this intersecting area that I'm researching is relatively new. So while we all have anecdotal experiences of more or less sensitive language, we have a lot of information about gender bias in machine translation, but we don't, as researchers, we don't yet know how sensitive or non-sensitive translation tools are in the first place.

54:40So actually that's what I'm working on first. I skipped the data set part, which I'm now finding was a not so smart decision, but, um, yeah, I want to establish first, are the tools as unsensitive, or, yeah, as I'm expecting, or anticipating, because from there, you can then go on and build a better translation tool.

Improving Translation Tools

55:04Or, um, um, um, target areas where it can be improved.

55:14Yeah, that makes sense to just, yeah, establish the status quo and then see what needs to be done. Yeah. Um, how are you currently working on checking out what the baseline is that you're working with? Um, yeah, so I, first I did the classic thing that every PhD does at first, which is a literature review and looked at what, what research is out there and, um, yeah, uh, not, not so much that is intersecting, um, on, on different parts of sensitive language.

55:56Um, and now for establishing it, um, I'm not quite set yet on how I want to approach it. I'm thinking about some kind of benchmark situation, but then, uh, so developing some kind of data set and evaluation tool that allows me to run it through a translation tool. And then I get a number back basically telling me, this is good, this is bad.

56:26Um, but then my next thought is benchmarks have to be maintained to be, um, useful long-term. Um, and as soon as the stuff is in the internet, the tools can train on it and then kind of optimize just for the things I want to test. So, um, um, I'm trying to come up with something that is a mixture of the classic approach to a benchmark and creative, uh, creative measurement tools or evaluation strategies, um, that, of course, the tools eventually will be able to train against.

57:14But at least, but at least don't make my work obsolete in five seconds after it's been published.
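The benchmark idea sketched in this part of the conversation (a data set, a translation tool to run it through, and a single number back) could look roughly like the Python below. This is a minimal sketch under stated assumptions, not the approach of the research project: `BenchmarkItem`, `sensitivity_score` and the stub translator are hypothetical names, the flagged terms are placeholders, and simple substring matching is a deliberately crude stand-in for a real evaluation strategy.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class BenchmarkItem:
    """One hypothetical test sentence plus the terms that should not appear."""
    source: str               # sentence in the source language
    target_lang: str          # language code of the translation, e.g. "de"
    flagged_terms: frozenset  # placeholder terms treated as non-sensitive


def sensitivity_score(
    items: Iterable[BenchmarkItem],
    translate: Callable[[str, str], str],
) -> float:
    """Return the share of items whose translation contains no flagged term."""
    items = list(items)
    if not items:
        raise ValueError("benchmark is empty")
    clean = 0
    for item in items:
        output = translate(item.source, item.target_lang).lower()
        if not any(term.lower() in output for term in item.flagged_terms):
            clean += 1
    return clean / len(items)


def stub_translate(text: str, target_lang: str) -> str:
    """Stand-in for a real translation tool, for demonstration only."""
    canned = {
        "sentence one": "output with FLAGGED_TERM_A",
        "sentence two": "output with a preferred term",
    }
    return canned[text]


DEMO_ITEMS = [
    BenchmarkItem("sentence one", "de", frozenset({"flagged_term_a"})),
    BenchmarkItem("sentence two", "de", frozenset({"flagged_term_a"})),
]

if __name__ == "__main__":
    print(sensitivity_score(DEMO_ITEMS, stub_translate))
```

Because the translator is passed in as a function and each item carries its own target language, new tools, sentences and languages can be added without changing the scoring code.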

57:22Yeah, that, uh, is a very, uh, understandable goal for the project. Or, uh, well, my aim and it might shift with deadlines rising, but my aim is to, to build the whole thing in a way that it's extendable. So for different languages, for more sentences to be input or if we're talking about like classic benchmark style for new techniques to be added and so on.

57:55And so even if I've put it into the world and the tools adapt to it, it is already designed to be extended. Um, so maybe that's a way to mitigate my worry of obsolete PhD projects. Maybe we can slowly come to an end. Um, so, um, one of the two final questions is, um, what role cognition plays in your research?

58:31Well, first of all, I think historically language has been a big marker of researching cognition and, um, what biases are, how they express and so on is also quite cognitively interesting. Oh, wait, I mean, me thinking about it, it is, it is, it's interesting, but also how they establish in the brain is interesting.

59:07Um, however, in the first place, my topic doesn't quite sound like it will figure out something about cognition. I think it's in the other direction rather, and I, at least with my background, but I hope to also do it a bit more explicitly, use, um, techniques from different areas of cognitive science and, um, different knowledge about language use, um, to work on my project.

59:47And if a person has listened to the episode, what should they definitely take home from the whole topic of sensitive language and machine translation tools? You should take away that, um, machine translation tools are, well, not yet established, but anecdotally observed, are not so good at, um, translating sensitive language.

1:00:19And this is an issue for all of us because first of all, as users of machine translation, we will encounter the issue, but as users of the internet who have ever seen an automatic translation, um, which is everywhere, uh, we are also constantly confronted with translations made by machine translation. And if we have biases or stereotypes or unsensitive language reproduced in this, um, we reproduce it, um, in our minds because we read it, we consume it and, um, and therefore having, um, more sensitive machine translation tools will allow us to have more sensitive content we consume.

1:01:14And this is, um, and this is a bit more of a stretch, but language shapes how we think and taking care of the language we use and we consume, um, will, hopefully in the long, long run also have a benefit for the people, for, for the marginalized groups and lead to more equality and, and understanding. Yeah, thank you so much for talking to us and, yeah, also explaining this topic a bit more because I think, like, I am aware of the topic because we know each other and we work in the same office, so I know what you're working on.

1:01:56Um, but before, I hadn't thought about that this much, although I was into sensitive language, um, in the context of translation tools. Um, I'm really glad that we were able to cover this in this episode. So thanks a lot for that. Thank you so much for having me. It was a pleasure.

1:02:16Also, I always love talking about my PhD stuff. Um, and if people would like to learn more about you, is there anything where they could find you online or is there any things they could look into to learn more about the topic? Um, yes, people can find me on LinkedIn. Um, I'm called Sabrina Frohn. If it's associated with University Osnabrück, that's me. Um, I have published my first paper now.

1:02:49So you could also. Congrats. Thanks. So you could also read that. Um, um, and to be honest, if you just want to look into sensitive language in general, don't read, like my, my stuff is already quite like it's research heavy, but, but read about, read books from people who are, um, affected by the sensitive language. Or not, not use of sensitive language or listen to podcasts, seek out information from black BIPOC people, um, people who are neurodivergent, people, um, who have a disability.

1:03:34Just surround yourself with more diverse content. And if it's just in quotes, just that you check out your YouTube subscriber, uh, not subscribers, but the people you subscribe to on YouTube. And have an eye out of whether this is a diverse group. Um, I think already that makes a difference for, yeah, yourself. Yeah, perfect. Uh, thank you so much. We will link the papers mentioned in this episode, um, in the show notes.

1:04:08So if someone would like to read up on it, you could also find it there. So thanks a lot for your time. Thank you. This was Kaleidoscience. We hope that you enjoyed this episode and we would love to have your feedback. You can rate our podcast and give us feedback on our Instagram account. Have a great week and you'll hear from us again in two weeks. This episode was hosted by Imogen Hüsing and Elisa Palme. Produced by Imogen Hüsing, Clara Kühne, Sophie Kühne, Sönke Lülf and Elisa Palme.

1:04:42The music is from Jan-Luca Schröder and the logo is from Annika Richter.
