How Can Math Protect Our Data?

August 7, 202539 min · 7,700 words

Open in Steadcast for Mac Apple Podcasts Overcast

Show notes

Every time data travels — from smartphones to the cloud, or across the vacuum of space — it relies on a silent but vigilant guardian in the form of error-correcting codes. These codes, baked into nearly every digital system, are designed to detect and repair any errors that noise, interference or cosmic rays might inflict.

Highlighted moments

error-correcting codes is the story of math fighting noise.

1:56

“I could send you A and B, and then A plus B mod two, or A, X, or B.”

9:33

“What polar codes utilize is a transformation that basically will transform a whole bunch of channels that are kind of equally not great into a few channels that are fantastic and a few channels that are really, really bad.”

19:00

“the types of errors that are introduced when doing DNA storage are different than the types of errors that you expect in silicon devices or something like that. And you have to designing different codes to deal with them.”

30:55

Transcript

Introduction

0:00I'm Steve Strogatz, and I'm Jana Levin, and this is The Joy of Why, a podcast from Quantum Magazine exploring some of the biggest unanswered questions in math and science today.

0:17Hi there, Jana. Hey. How's it going? Good. What do you have for me today? Well, we're more or less about the same age, so I'm guessing you remember what CDs are. Oh, yeah. The kids don't necessarily know. I guess some of them are using vinyl now, aren't they? Records. Back to vinyl. Yeah. Nobody's back to CDs, though. I don't think anyone's got a new CD, Walkman, you know. But do you remember back when CDs were the thing and people would demonstrate that you could scratch them and they would still play?

Error-Correcting Codes

0:48Yeah. I'm bringing it up because the fact that when you play a CD that's been mangled, if not too totally messed up, it'll still play, is because of a wonderful technology called error-correcting code that was built into that kind of device. And that was the first time I ever ran across the concept. But apparently, it's really out there in our modern world today when you're streaming TV shows or cell phones, probably this communication we're having with each other right now, there are probably error-correcting codes running behind the scenes, making sure that all our

1:22bits, you know, our zeros and ones are getting where they're supposed to be, even though there's always noise. Think how weird it would be if your light switch suddenly switched itself off. That doesn't really happen in our everyday macroscopic world. But apparently, there's a lot of things that can cause bits to flip, like ones become zeros when they shouldn't. They get hit by a cosmic ray, or your cell phone signal is bouncing off the walls and it can interfere with itself. But you have to add more bits to help correct the inevitable errors caused by all the noise

1:53and all the potential sources of corruption in the environment.

Math Fighting Noise

1:56And so error-correcting codes is the story of math fighting noise. How to use those extra bits as efficiently as possible, using redundancy, but in very clever ways. And would this be a computer scientist's arena? Yeah, totally. That's totally their bailiwick. They're very ingenious. This is a lot of discrete, really clever stuff about algebra, about graph theory. And do they ever get it really wrong? Do they ever make the errors worse?

2:27Do the error-correcting people need to be corrected? Right. Well, from what I hear, some of the error-correcting codes can recover messages in astonishing ways.

Interview with Mary Wooters

2:37But maybe we should dive in and let Mary Wooters, an assistant professor in electrical engineering and computer science at Stanford, tell us a little bit about the grand story of error-correcting codes.

2:54Hi, Mary. Hi, Steve. Thanks so much for having me. I'm very happy to meet you and see you here. I was wondering a little bit about this book of yours that you wrote with your co-author, Aviad, about algorithms for toddlers. You want to tell us a little bit about that? Sure. Algorithms for Toddlers is a whole semester of undergraduate algorithms condensed into 14 illustrated pages with rhyming couplets, which you can find, I would like to say, at all fine bookstores, but actually just online at barnesandnoble.com because we self-published.

3:26So if that is for you or for your inner toddler, yeah, it's a good thing. I enjoyed the few bits of it that I saw on YouTube. That's right. Yeah, I have my best kindergarten teacher voice on if you want to see me reading the book on YouTube. Do you have an illustrator or did one of you illustrate it or? I illustrated it. You illustrated it. So this is a book co-authored with my colleague here at Stanford, Aviad Rubenstein. Aviad and I both occasionally teach the undergraduate algorithms course here at Stanford, and Aviad has young kids. And so he thought it might be a good idea to try to make a book to teach algorithms to

3:58them. I see. You do seem to have some passion for helping convey the wonders of computer science, of algorithms, coding to a pretty broad audience. I mean, you're cornering the toddler market and their parents. We're going where are the niches, yeah. But so do you think it's important to share this kind of information in a very broad or accessible way? What's motivating you to do that kind of work? It's not so common. That's a good question. I mean, I think you're right. This is something that I really care about.

4:28I get really excited about my work, about the mathematical and technical ideas in it, and I want to share the joy, right? I want to communicate that to others. So I think that's where that comes from in me is basically there's like this little five-year-old kid inside being like, oh, my God, it's so cool. That's an honest answer.

Error Correction Discussion

4:46Well, so we have you with us to talk about error-correcting codes, and I would love to dive into that. But it's not something I'm very familiar with, and it's possible our listeners are not either. So if you don't mind, I'm wondering, how did you get interested in that? What inspired you to start learning about them? Yeah. So maybe just for listeners who have no idea what the heck these things are, error-correcting codes are sort of a fundamental tool for protecting data from noise. So basically, whenever you send data, think of your cell phone talking to a cell tower,

5:16or if you want to store data on a hard drive or something like that, like errors are going to happen. Radio waves get interfered with, hard disks get scratched, or things go wrong. And so you need to add redundancy, basically, in order to make sure you don't lose data. And an error-correcting code is a way to do that. And how did I get into it? One of the things I really like about this area is it's super interdisciplinary. So my background's in math, and there's a lot of really beautiful mathematics that goes on with error-correcting codes.

5:46But now I'm in a computer science department and electrical engineering department. There's also a bunch of folks in CS and EE who think about these things, too, in lots of other communities where error-correcting codes are interesting. So I think that's one of the things that really drew me in, is just the people and the opportunity to deploy a wide range of toolkits to study this one type of object. Great. Okay. You've mentioned a little bit about why error-correction is important, that there's always some source of corruption or noise or something. We hear the word noise a lot. I mean, in ordinary speech, noise means I'm talking to someone at a cocktail party and

6:20there's other people making noise or there's music. How should we think of noise generally? It's a great question. And there's not one answer. But it really depends on the application. So in communication, you have your signal that you want to send and you have lots of other signals that might interfere with it, which is not that different, at least conceptually, with the cocktail party. When things are digitized on either ends, that can be manifest as bit flips, for example. Like I meant to send a zero and actually a one gets received. Or possibly you get some kind of uncertainty where you're like, ah, an 85% sure it was a one, but it could have been

6:52a zero. So that's one sort of model of noise. But you could also have different types of noise depending on the context. So for example, you might have something that's called an erasure rather than an error, which would mean that a bit just turns into a question mark. That would be the formal version of it. Sort of like a blank or something? There's nothing there anymore? For example, in a distributed storage system or something, if a server goes down or if a storage node goes down, you know which one went down, but you still don't know the data on it. So that would be like an erasure type of model. But there are lots of other models. You could have

7:23synchronization errors. Like I'm supposed to receive a string of length seven and I actually receive one of length six. I know one bit went missing somewhere, but I don't know where. Oh, wow. Or the other way around. Things could get inserted in or insertions and deletions. There's really a rich class of errors. And that's kind of what makes this problem very interesting. So there's something really mysterious about how you could even possibly do error correction, which is if something is missing, like a one or a zero was supposed to be there, but it's not, or a one got flipped to a zero for whatever reason. How could I possibly know that? What

7:58is an error correcting code, I guess is what I'm asking. Great. Yeah. So first you're totally right. If we don't do anything to the data, there's nothing that can be done. If I'm going to be sending you a random string of zeros and one and something gets erased and turned into a question mark, you have no hope of learning what that is. But the idea of an error correcting code is that you're going to add a little bit of redundancy to help. And so the most naive way to do this is just to repeat the data a whole bunch of times. And then you'd hope that enough copies are there, that corruptions don't matter too much. And that works in most situations, but it's extremely costly. If

8:30you're talking about communication systems or storage systems or something like that, that's more power, more time, more space. That's not great. And so instead, we use something called error correcting codes to have the same idea, but add a little bit less redundancy. To show how this works, maybe I can give a quick toy example illustrating the principle, and then I could talk a little more generally about some geometric intuition behind what's going on here. So as a first example, let's suppose that I want to send you two bits, so A and B, and they're either going to be zero or one. And I know that despite all the effort that

9:02we've done to get the sound just right on this podcast and everything, unfortunately, whatever I tell you in the next like 10 seconds, one bit that I tell you is going to get erased, is going to get replaced with a question mark. And so in this setting, if I want to send you two bits, the naive thing to do is to send you four bits total. I'll send you A and B, and then A and B again. And then no matter what gets erased, you'll still get both A and B. For example, I could send you, if you hear A question mark A, B, well, you still have at least one copy of A and one copy of B, so you know what's going on. Okay, so that's the baseline. That's four bits.

9:33But here's a better scheme that I could do. I could send you A and B, and then A plus B mod two, or A, X, or B. Okay, wait, let's think about that. Go slowly. Let's see why that works. So just to give a quick example, if A and B were zero and one, then I would send you zero, one, one, because A plus B is one, mod two is one. Or if A and B were both one, I'd send you one, one, zero, because A plus B is two, which is zero mod two. Right, so I'm going to send you now three

10:04bits. And why does this work? Well, suppose that one of these bits gets erased, just for the sake of example, say it's the second one. So you hear A question mark A plus B mod two. And when you get both of those pieces of information, you can actually figure out what B was meant to be, even though it got erased. If you know that A was zero, and A plus B mod two was one, then you know that B must have been one. That's the only thing that you can add to A to get one mod two. So you can always solve the system to figure out what you are missing. I see. And so just in case anyone

10:37listening isn't familiar with the idea of mod two, I guess a simple way to say it would be, is it even? Yeah, that's exactly right. So you add those together and you'd say, is it even? If so, that's a zero. If it's odd, that's a one. At the receiving end, am I supposed to know that you're doing this A plus B mod two? Like, is that understood? Great. Yes, that is understood. So we've agreed beforehand on what scheme we're going to use. Okay. So this was just a very simple example, but it illustrates that you can do better than sort of the naive repetition. Real error correcting codes are generally much bigger than this, but there's a big class called linear error correcting

11:10codes, which have basically this idea. So you start with your message. Let's say instead of two bits, I have k bits, and I want to add some redundancy. The redundancy that I add is by taking parity checks of my message. So instead of taking all of them mod two, that's one parity check, but I could also take the first plus the third plus the seventh mod two, and the seventh plus the eighth plus the ninth mod two, and so on. And the game is kind of how do you design these parity checks so that at the end of

11:41the day, you can correct against the flavor of error that you're interested in, be that erasures or errors or synchronization errors or other sorts of things. Here's a geometric way to think about this previous example. So what's going on? There are four possible messages that I might want to send to you. There's 0, 0, 0, 1, 1, 0, or 1, 1. And correspondingly, there's four possible strings that I might send to you. These are now three-bit strings. So I might send you 0, 0, 0, 0, 1, 1, 1, 0, 1, or 1, 1, 0. Those are the allowable strings in this silly example. So now I've got these

12:16four strings of length three. These four special strings that I just said, they have the property that they all differ from each other in at least two places. So it takes two bit flips to get from one to the other. Oh, that's nice. Okay, this is helpful. Right. And so what I've just done here is I've designed a set of strings so that they all differ in at least two places. And we can see why this means that they can correct one erasure. And you can generalize this to talk about errors as well. So instead of something getting replaced with a

12:47question mark, I would have like a 0 getting replaced with a 1 or a 1 getting replaced with a 0, but I don't know where. But if I have such a design of strings all three apart, I can actually correct one bit flip error. So you could put in a lot of redundancy, but then you're not sending much message, right? Tell us how you would characterize the trade-off. So there's many, many things that one could care about. But a key trade-off that one cares about here is this notion of distance. So how far apart are my special strings in terms of bit flip error? And then also how many strings are

13:22there given the length of the string? I only get to say one message per string. So the more strings there are, the more messages I get to send. Okay, that's a nice way to say it. So let's say I'm going to have a bunch of strings of length, I don't know, 100, right? And I want them all to be at least 10 apart from each other. I want to be able to correct at least four errors or something like that. Then the more length 10 binary strings that I can fit in while never making any pair of them too close to each other, the more efficient that code is, basically the more expressive my language is.

13:58It occurs to me that maybe this is trouble with a mathematician talking to another mathy person, that we may want to back up for a second and just go a little more palpable. As a grad student, I remember someone showing me a CD. I don't know if our listeners even remember what CDs are, but it was a big deal back then to take a coin or something and scratch the CD. And then unlike vinyl records, you didn't hear it. The CD player would take care of it somehow. Wow. That is almost certainly something called

14:28Reed-Solomon error correction, which is very pretty stuff based on polynomials. That is a perfect example of error correction at work. Uh-huh. And do you have some others for us? I mean, if I'm watching some TV show or movie, are they doing error correction when they send it to me? Yeah. Basically anywhere where data is stored or transmitted today, there's going to be error correction. There are also other things you can do on top, like for Netflix or something, like they're trying to send you these relatively big packets or something. So if a packet doesn't come through, your computer will just say, hey, I didn't get that packet and they can

14:59send it again or something. I should also say, I'm not an expert on streaming video. I don't know exactly how that works, but that's my understanding. But yeah, there is error correction at some level, basically anywhere data is communicated or stored. So CDs are a great example, cell phone communication, satellite communication. It's actually very interesting. Error correcting codes are just a very useful sort of combinatorial and algorithmic primitive. They have uses in algorithm design. They have uses in complexity theory, in theoretical computer science, pseudo randomness, things like that. Well, you've mentioned some error correcting codes by name

15:32already. Is there sort of a broad category of codes that we should know about other than those? Yeah. So there are lots and lots of different types of error correcting codes out there. One family is algebraic. I mentioned Reed Solomon codes earlier. This would fit into that family. So algebraic codes are codes that are mostly based on polynomials over finite fields. So that's some jargon. But the basic idea is if our goal is to come up with a whole bunch of strings that don't agree with each

16:02other too much. And imagine I were to come up with a big set of such strings by taking a whole bunch of quadratic polynomials and evaluating them at a bunch of points. And these are going to give me a bunch of strings. In this example, it's strings of real numbers, one, two, three, four, five, six, seven, eight, and so on. And then I might ask, well, okay, how far apart are these strings from each other? Well, any two quadratic polynomials can only intersect in two points. You can kind of see this for yourself if you like imagine trying to draw parabolas and like how many points can you get

16:35them to intersect in? It's no more than two. So that means that if I were to build a bunch of strings this way, they can't agree in more than two points. That's a nice picture. And so that's the basic idea that's at the core of a lot of these algebraic constructions. Another family is what I would call graph-based codes. So these are codes that are based on bipartite graphs. And so the idea here is I'm going to view all of the elements of my code words, that is the elements of these special

17:05vectors that I will send. So maybe I have a binary string of length n. I'm going to view each one of those n positions as a vertex in a graph. Then I'm going to have a bunch of other vertices, which I'll call check vertices floating around, that are connected to some of these what's called variable vertices. So the vertices that are indexing positions of my code word. And I'm going to say that I need to come up with a big set of binary strings that are all pairwise far apart. The set of strings I'm going to consider are strings so that when I label these n vertices with those numbers, zeros or ones,

17:44then all of the other vertices, these check vertices, should always see something nice. Let's say they should always see an even number of ones or something like that. Depending on the types of constraints you put on these extra nodes, you can define a whole lots of classes of codes like this. And then you can use ideas from graph theory to analyze how far apart these strings are from each other. And you can also use that to design efficient algorithms. This is something maybe we haven't talked about so much, is that you actually don't just want a big set of strings that's all far apart from each other. You also want them to have enough structure so that you can

18:18correct errors efficiently with an efficient algorithm. And sometimes these graph-based codes give us very efficient algorithms for doing that. Okay. So we've got graph-based codes, algebraic codes. Yeah. And I think those have been the two big ones for a long time, but relatively recently, there's a new class of codes, which is more information theoretically motivated, called polar codes. These were invented in 2008, something like that. And these don't maybe quite fit into either of those categories. They're now in wireless standards and they're very good. Does the word polar have any meaning that we can understand? Is it a very technical thing or

18:53what's polar about it? Yes. So there's some really great intuition about it. So imagine that I have some noisy channel between you and me, something that's going to introduce some errors. What polar codes utilize is a transformation that basically will transform a whole bunch of channels that are kind of equally not great into a few channels that are fantastic and a few channels that are really, really bad. So that's where polar comes from. It kinds of polarizes into having a few really good

19:28channels and a few channels that are really not great. And then you just throw away the channels that are really not great and use the really good ones. And that kind of gives you your code. Wow. It seems like a big gulf between that, you know, intersecting parabolas and when your friend's trying to tell you the end of their dramatic story from last night. Yeah, I have to admit, I mean, this is very far out, sophisticated stuff. So can I run a few analogies by you? Yeah, please. They're not perfect, but I think they might get some of the essence of what she was trying to say.

19:58For instance, she mentioned first algebraic codes. So here's my analogy. It's sort of like I'm trying to send you a message and I'm going to do it by sending you a secret math puzzle. Okay. The secret puzzle is going to take the form of a function. She mentioned algebraic objects like parabolas, you know, quadratic polynomials. So there's some ingenious way that I can translate my message into a curve. That's already a cool idea, that my message is now going to have a shape. And I'm going to send you a few points on that curve. And then the idea is that even if some

20:32of the points get messed up, because there's so much structure built into the way I'm sending it to you, you'll be able to reconstruct the whole curve and thereby recover the intended message. Right. That's very helpful. Yeah. Let me give you another one. So she mentioned graph-based codes. Okay. The analogy here would be that this is like a group project with a lot of cross-checking. You know, like think of the kids, they have to do group work. Usually they hate it because some kids are loafing, other kids are doing all the work. But here it's not going to be like that. Each bit we're going to think of as like

21:06a person in the group. And each person is going to work on several parts of the project. And then small subgroups of the whole group are going to compare their answers. So that if one person messes up, that's like an error, then the group can spot the error and fix it. There's a network structure in the problem that decides who's checking who. And through all those interactions, somehow errors can get corrected collectively. It's kind of like the opposite of a game of telephone. Something like that.

21:37Right. That's just a chain and the error just has no way to correct. It just gets worse and worse and worse. Yeah. But in a more complicated graph, there's a potential for correction. It's interesting. That's the idea that the design of the graph can help you with the error correction. And then the last thing that Mary mentioned was polar codes. Those are pretty deep and modern. As she says, it's only 2008. Those are apparently close to theoretically optimal. It's like if I have a bunch of junky walkie-talkies and we're going to communicate with those. And there's some magical way to make some of the walkie-talkies fantastically clear and good while others become totally terrible

22:13and useless. And somehow you then make sure to send your message only over the good ones. Whereas the bad ones, you don't have to throw them away. They turn out to act like fillers for the rest of the structures. They play a sort of supporting role. Almost like if you built a complicated bridge to do a different analogy, where you had some strong beams and some really bad beams. The bad beams are needed to make the whole bridge, but you're not going to put any weight on those. You're only going to drive over the strong beams. It's something like that in the polar codes. You

22:45polarize the system into really good and really bad stuff and try to just use the good stuff. I'm not driving on that bridge, man, is all I'm saying, though. No, it's the best bridge you could be on in the face of noise, turns out. Right. Anyway, so it's really deep stuff. And we're going to hear more from Mary Wooters after the break about how these ideas get used in things like cloud servers, how they may help in the future for quantum computing, and just how practical it is. So stick around. We'll see you after the break.

23:28Welcome back to The Joy of Why. We're speaking with Stanford computer scientist, Mary Wooters.

23:46From a little bit of the preparation I did to talk to you, I got the impression that there are some folks in your field who might think of the whole subject as a little bit old-fashioned. Are there people making claims like that? Or what's your take on that? Yeah. So air correction has been around since probably late 40s, early 50s. So Richard Hamming, Claude Shannon, and then a whole bunch of other people, you know, sort of foundational in this area. And all of these NASA missions and stuff like that, is air correction going on there too? In my view, I think air correction is a very classical area, but I think it's also still a very

24:21interesting and important area for a couple of reasons. So the first reason is that there are actually a lot of questions that people were asking back in the 50s that we still don't know the answers to. There's been progress on that over the past 75 years or so. There's still very foundational and fundamental questions that are left to be answered. The second reason is that technology changes. And so the types of air correction that was developed in the 50s or 60s or 70s or 80s is no longer necessarily the type of air correction that we want today. One big example of this is

24:52now that everything is much more distributed, we might care a lot more about the communication costs of our air correction. This is not something that people cared about as much in the 50s. Communication between the parts of this distributed system? What would be the example? Yeah. So for example, if I want to store something in a distributed storage system, what does that mean? I have my data and it's broken up and sent all over the cloud. And the types of errors that I might be worried about in this context are some storage node goes down. I don't want to lose information if that happens. And traditionally, when such a thing were to happen,

25:26if a server were to go down, you'd want to set up a replacement node or something like that. And traditionally, that would involve downloading all of the data from elsewhere in the system, doing the error correction, figuring out what belongs on that node and putting it back there. That's a lot of sort of internal communication within the system. And one thing that researchers have been working on for a little while more recently is schemes for doing that whole repair process, but in a low bandwidth way. So a way where the internal nodes don't need to talk to each other all that much. So this language of nodes, servers, I'm sure it's very tangible to you, but I'm never exactly sure

26:03what people are talking about. So a server is a kind of computer with a particular purpose, or a node is another name. What do these things mean in plain speak? What I have in mind here is that you've got a hard drive over here, you've got a hard drive over there, and your data is getting split between those two hard drives. That's sort of the model. Okay. And so that's something that didn't used to happen when different hard drives didn't talk to each other in the old days. It was just a computer sitting on a desk with a hard drive. And now you're saying a lot of distributed systems and you refer to the cloud. I'm never exactly sure what the cloud is

26:36either. The cloud is what? Lots of computers in different locations, so-called server farms and stuff like that. Yeah, that's right. That's the model to have in mind here. So you have lots of different computers, either compute nodes. So these would be like computers that are actually doing tasks for you or storage nodes. These are just computers that are storing stuff for you. And you have many of them and they're all talking to each other. So the error correction then that is a modern problem that we didn't used to have is dealing with the errors between all the parts of this distributed system as they talk to each other.

27:09Yeah. Like if you just forget that they're different computers and imagine they're all the same computer, then this is the same question as before. But now our objective function is a little bit different. I don't just want whatever correction algorithm I'm doing to run fast. I also want it to run fast so that the computers don't have to talk to each other too much because that sort of intra-system communication is costly or more costly than it used to be. Okay. Now, so given that we're now talking about new challenges or new-ish challenges, one that I'm sure is on the minds of some of our listeners is quantum computing, which we keep

27:44hearing about, that that is going to be a real thing sometime soon and we should be prepared for it. Is there anything really different that we need to think about if we're doing error correction for quantum computing? And also, why would we need to do it for quantum computing? Definitely, error correction for quantum computing is a big thing. And at least at the moment, it's arguably much more important than error correction for classical computing because my understanding is that the implementations we have of quantum computers are very, very noisy. And so it's really

28:17important to do error correction. There's a couple of different ways to get such things. One of them is based on classical coding theory. So there's something called a CSS code, which basically you take two classical error correcting codes that have some nice relationship with each other and you can put them together to correct quantum errors. But then there's other sorts of types of error correction, which are a special thing. I'm not an expert, but I have been thinking about it a little bit. It's a really fascinating problem and I'm getting more and more into it recently. But I would say also,

28:50quantum error correction is another great example. Why should we still study error correction in this modern day and age? Didn't we know about it since the 50s? But yeah, quantum computers are a great example of a technology that at least we hope will be around fairly soon that certainly wasn't around back then. And there's something called DNA storage, which basically you're using DNA as a storage system. So the idea here is we can synthesize DNA and we can sequence it. So basically that means we can write and we can read. That allows us potentially to build storage systems out of it.

29:21And so the proposed application for such systems would be for very cold storage. So things that you want to store for a very long time because it is very costly to access. You need to sequence some DNA, which is not as fast as reading something from a hard disk. But it is extremely stable and extremely small. So you could archive things for a very long time very efficiently this way. And for this application, like any storage system, you want to design error correction for it. And what sorts of errors do you would expect? Precisely these synchronization errors like bases drop out or bases get added depending on the technology that you're using to sequence the DNA.

29:55Oh, that's an interesting idea. So rather than doing nanotechnology, this is like natural nanotechnology. Exactly. Yeah. Wow. People are seriously thinking about this? I mean, it seems so inaccessible to have stored things in DNA. Yeah. Not only are people thinking about it, people have done it. This is definitely possible. But what kinds of things would get stored that way? Well, any data you could want. So you could store a movie, you could store an image, you could store text files. It seems outrageous. True. But, well, I mean, so one important thing to note is that this is not like DNA in a living

30:26organism. It's not like you're walking around with a library of converse like on your elbow. This is just the molecules, not in a living organism. Right. There was a field some years ago that people called DNA computing. So I guess the thinking is we've got this naturally occurring wonderful storage device that evolution has used for probably literally billions of years. So why not use it ourselves? Yeah. And it brings up lots of really interesting challenges from the error correction point of view, because we were talking earlier about these sorts of synchronization errors, like the types of errors

30:59that are introduced when doing DNA storage are different than the types of errors that you expect in silicon devices or something like that. And you have to designing different codes to deal with them. Right. I mean, the biologists know quite a bit about that. Like we talked earlier about frame shift errors where you might be off by a single base and you're just shifted. A whole string is shifted. But they also have something called intercalation errors, right, where a wrong base can get wedged in there and split up a string. And there are other sorts of constraints as well. Like you can't just write any old molecule you want

31:32or any old string of A, G, Cs, and Ts, because some of them will have weird properties. Well, they'll then like kink up back on themselves or things like that. And so there are some constraints also for how you design your code just based on the biology. Well, OK, so you're giving us a pretty good broad survey here of this. And I wonder when you mentioned the foundational questions that still exist in error correction, theoretically, is that where your heart is? Like, do you like to do the real world part of error correction? Do you like the fundamental theoretical computer science parts of it? What winds your watch?

32:04I really like these fundamental math questions that sort of remain open, I think, are very important, not just from an error correcting code standpoint, but also just from a mathematical standpoint. But then it's also sometimes nice to put on a more applied hat to at least go talk to folks who are actually implementing stuff, both because that means I get to learn new things, which is fantastic. And it also, I think, informs different ways of thinking about these very classical questions. Like if you get an injection of a new, more applied problem, you're like, oh, that is a question I hadn't thought to ask before. It is really interesting. So you can sort of get

32:36new fundamental questions that way. Uh-huh. So we did talk earlier about your book, Algorithms for Toddlers, but I see that you've also designed games, that game about monsters that grew from beans. That's true. Yeah. So a project I had with some of my friends from graduate school, it's a puzzle game, it's called Bean and Nothingness. Oh, I see. Now I'm getting the joke. Hearing you saying it that way, now I get the joke. It's a philosophy pun. Ah, being in nothingness. What, is that a Jean-Paul Sartre book or something?

33:09Yeah. Jeopardy question right there. Yeah. So Being and Nothingness is a famous book. Bean and Nothingness, B-E-A-N, is our video game. And what's the idea there? Well, I think it's a very good puzzle game. Personally, I find it very difficult. The storyline has a little bit to do with being a math graduate student because we were math graduate students when we started making it. Although I think it took us nine years to finish because once we graduated, it's dragged out for quite some time. But the basic premise is you get dropped in a puzzle. It's totally static, like nothing's going on. You have to get to an end

33:44tile. And what you have is a magic wand and some recipes. And the recipes give you ways of turning beans into monsters, which will then act and interact with each other and with you in interesting ways. And they can kill you, but they can also help you in ways like destroying blocks or interacting with each other or things like this to help you get to the end. That's pretty separate from my academic work, but it's a lot of fun to make. And if your listeners enjoy difficult puzzle games, I would definitely recommend it. I have fun playing. Does that kind of playful energy come into your mathematical and scientific work?

34:16You mentioned to us that more generally, you have some interest in illustration, but some people think of art and visual things as being kind of holistic right brain stuff, whereas very logical reasoning of the type I imagine you need to do in coding theory is kind of left brain. Maybe that's all bogus. Maybe you don't buy any of that. You're using both halves of your brain all the time and you don't make that distinction. That's a good question. It definitely comes into my work in that the things that I produce in my teaching materials and also within my more academic scholarly work typically has silly

34:50illustrations. But when I'm thinking about a problem, I think my first approach is very holistic and you want to understand the whole picture. You think about new ideas. And then at some point, it's time to sit down and do all the nitty gritty details. But that's the same when you're making a picture or something. For me, at least, there is a time when you have to go down, you have to make all the little lines just right and do the inking and stuff. So to me, it seems like there's both big picture stuff and detailed stuff in both types of tasks. It does to me, too, that you have to make a mess and then you have to clean it up.

35:24Right. Or something, right? It's sort of in that sequence. You've mentioned that you like the collaborative aspect of learning new things, how you can import those ideas back to fundamental theory. How would you put it, though? What is it that gives you the most pleasure in your subject? Gosh, I'm fortunate that there are many things that bring me joy in my job. The human aspect, I really love. Another aspect is the joy of discovery. Solving puzzles is fun. That's enjoyable. But then I think also there's this larger project that all researchers are

35:56contributing to the sum of human knowledge, stuff like that. I think some of these problems are important. And even if I can't solve them perfectly, maybe I can make a dent. And that brings me joy as well. Well, Mary, it's been a real pleasure to talk to you about error correcting codes. And thanks a lot for joining us today. Thanks so much for having me.

36:17I'm struck how we always come back to the human dimension. Here, in some sense, you're mathematizing human communication in a way that's become so difficult, so challenging, and so abstract. And yet, at the end of the day, it kind of comes back to this human capacity to see things on this large scale and this small scale. Uh-huh. Is it this desire to play? Is that, do you think, a productive impulse for a scientist? Is it an essential impulse? What do you make of that? There's no question that my colleagues will say, hey, you know, it might be really fun

36:49to look into, I don't know, something crazy, a non-orientable manifold or whatever, right? This is very common in how scientists talk to each other. You know, I'm not done playing around with how that looks in that space-time. So, I think it's anecdotally a really crucial part of how we interact with the work and why we pursue certain things. Yeah. Even the word puzzle, I think, is indicative of a spirit. If you think of what you're doing as solving puzzles, that puts you in a childlike frame of mind where it's okay to make a mistake,

37:20you can explore. If you think you're doing an important problem, that feels more grown up and starchy. I definitely think there's a childlike character to a lot of the scientists I know and that there has to be a moment where you really are just playing in the sandbox before you can figure out, okay, what's the actual direction to go in? Well, as always, Jana, it's a pleasure to hang out and play with you. Exactly. Until next time. Take care. Bye. Bye-bye.

37:53If you're enjoying The Joy of Why and you're not already subscribed, hit the subscribe or follow button where you're listening. You can also leave a review for the show. It helps people find this podcast. Find articles, newsletters, videos, and more at quantamagazine.org.

38:10The Joy of Why is a podcast from Quanta Magazine. An editorially independent publication supported by the Simons Foundation. Funding decisions by the Simons Foundation have no influence on the selection of topics, guests, or other editorial decisions in this podcast or in Quanta Magazine. The Joy of Why is produced by PRX Productions. The production team is Caitlin Foulds, Livia Brock, Genevieve Sponsler, and Merit Jacob. The executive producer of PRX Productions is Jocelyn Gonzalez,

38:42Edwin Ochoa is our project manager. From Quanta Magazine, Simon France and Samir Patel provided editorial guidance with support from Matt Karlstrom, Samuel Velasco, Simone Barr, and Michael Canyungulo. Samir Patel is Quanta's editor-in-chief. Our theme music is from APM Music. The episode art is by Peter Greenwood, and our logo is by Jackie King and Christina Armitage. Special thanks to the Columbia Journalism School and the Cornell Broadcast Studios.

39:14I'm your host, Steve Strogatz. If you have any questions or comments for us, please email us at quanta at simonsfoundation.org. Thanks for listening. From PRX. .

39:46. . . . .

How Can Math Protect Our Data?

Show notes

Highlighted moments

Transcript

Introduction

Error-Correcting Codes

Math Fighting Noise

Interview with Mary Wooters

Error Correction Discussion

More from The Joy of Why

How Fast Is the Universe Really Expanding?

Will We Ever Find Alien Civilizations?

What Is the Positive Grassmannian and Why Does It Show Up Everywhere?

What’s the Future of Gene Editing?

More Conversations, Complex Questions, and Bold Ideas in Season Five of ‘The Joy of Why’