Innovation Files: Where Tech Meets Public Policy

Deciphering the World of Data, With George Sciadas

Information Technology and Innovation Foundation (ITIF): The Leading Think Tank for Science and Tech Policy

Episode 77

A data-driven world raises the stakes for numeric literacy. Rob and Jackie sit down with George Sciadas, the former director of the Statistics Canada Center for Special Business Projects and author of the new book Number Savvy, to discuss the past, present, and future of data in society.

Rob Atkinson: Welcome to Innovation Files. I’m Rob Atkinson, founder and president of the Information Technology and Innovation Foundation.

Jackie Whisman: And I’m Jackie Whisman. I head development at ITIF, which I’m proud to say is the world’s top ranked think tank for science and technology policy.

Rob Atkinson: This podcast is about the kinds of issues we cover at ITIF from the broad economics of innovation to specific policy and regulatory questions about new technologies. If you’re into this stuff, please be sure to subscribe and rate us, really does help.

Today we’re going to talk about numbers or data, where they came from, how they evolved, and where they’re going in the future. Given the fact that ITIF has had its own Center for Data Innovation now for probably over a decade, it’s an issue that we take quite seriously understanding the data economy and how it’s evolving and what role government plays.

Jackie Whisman: Our guest is George Sciadas, the former director at Statistics Canada Center for Special Business Projects. His new book Number Savvy: From the Invention of Numbers to the Future of Data shows how numbers were invented and used to quantify our world. It also explains what quantitative data means for our lives. He’s joining us from Ottawa where he’s based and we’re happy to have you. Thanks for being here.

George Sciadas: Thank you. Thank you both.

Jackie Whisman: We’ll start with an easy one. Why’d you write this book?

George Sciadas: This question has actually been at the heart of my thinking in the early days when I started to realize that it’s a book I’m writing because I didn’t know that from the get go. So there’s a number of reasons for that. Let me start by saying that our societies have done a good job in the area of literacy. Unfortunately though historically we haven’t done a very good job in the area of what we call numeracy. So there’s still among many people, a certain amount of confusion when it comes to statistics and numbers and in fact, at times even regrettably, some people say, I’m not a numbers guy, and they shrug it off with some sense of pride even. So one of the reasons for this book in that sense is to contribute the proverbial stone to the edifice of statistical literacy, numeracy as well to feel more comfortable with numbers and so on.

Another reason has to do with, let’s call them the geopolitics of our time. Now, things have changed by leaps and bounds over the last 20 years because of technology and other factors and now data are much more in, much more acceptable than they used to be and everybody’s a data convert. There’s a new-found love affair with data, which is very, very good and it’s a sign of progress in my books, but at the same time, this necessarily means that mathematically the tent of data people is much bigger than it’s ever been and there’s many solitudes in there. So communication becomes more of an issue than it has been historically, and by knowing how we got to this point, the evolution of data from the end of the 20th century to the beginning of the 21st, I believe is crucial to help communication between all the people under the tent.

Lastly, I would say an additional reason that brought me to this book is that I’m a student of data as well and I read many, many books on data. It’s something I’ve enjoyed my whole life and I still do. I realize at some point that especially now with so much interest about data, the majority of those books, if not all of these books, they’re written by keen observers of the data in the world like journalists and academics and so on.

I’m a practitioner of data. I’ve produced an enormous amount of data in my life. I’ve done my fair share of damage, you can say. So I believe by using the perspective of a data producer as well was needed and I would like to think that it has a small contribution to all the other books and all the literature that exists in numeracy because of the perspective I have as a producer of data and how we have come to the point we are now and this has also the seeds of how we may want to move going forward.

Rob Atkinson: George, you mentioned one of the three reasons and that’s the issue of numeracy. We’ve reported several times the study, I believe it was the Educational Testing Service in the US, but I could be wrong, that found that of college seniors in their last semester, only 31% were numerate. And numeracy is not knowing calculus, it’s just pretty basic stuff and only 31% were numerate.

I’m struck by the other fact, another factoid I love to repeat constantly is in the US around 90% of kids in high school take geometry, but about 8% take statistics and I find that appalling. I think the single most important thing you need to learn in high school, in my opinion besides algebra one, is statistics. Way more beneficial than algebra two or calculus unless you want to become a scientist or engineer. But we don’t teach statistics and yet there’s so much value in our day-to-day lives with statistics. What do you think?

George Sciadas: I fully agree with you. Your assessment in my mind is bang on, a great assessment of what we experience. And in fact this is exactly saying in other words why I bother to write this book as well and why many people these days are trying to help the area of numeracy. As you said, just to add a bit more meat to what you said, going back to the sixties, the seventies, even and so on, it was okay not to be able to handle numbers very well. It was okay, never good, but it was acceptable because the majority of people, as I said, had a fear or confusion about numbers and they even wore it as a badge of honor. I’m not a numbers guy, and you had lawyers and highly educated people, highly educated people, highly literate people who would say, oh, a billion with a B or a million or a billion, I don’t know, it’s a big number.

Nonsense. There’s a big difference between a million and a billion and a trillion. There is no such thing as an excuse this day and age for students not to be able to tell the difference between a million and a billion and so on. As I said, especially from the United States, many efforts have taken place. Many books have been written, and good academics have contributed, on how you can tell the difference and separate orders of magnitude and so on. This has been very, very positive and it’s part of progress. Coming to your issue now these days I fully agree curriculums and how they evolve and develop, they should be part of the answer to the fact that our society these days cannot function like that in the sixties and seventies. We need people who know numbers, they understand numbers. Data as we say, are ubiquitous.

They’re everywhere. You can’t step outside of your front door without encountering data, without having to deal with data to understand issues, to deal with issues and so on. So it should be part of the curriculum as one of the many efforts. We need much more in our curriculums. Curricula, I should say. We need much more, but that should be there. Another point to mention to your intervention on that front is that numeracy and the literacy are not some kind of a dichotomous concept as people might have thought in the past. And we used to ask, are you literate or you’re not literate and you got an X. You say 90% are literate and 10% are not literate. What did it ever mean? Not only what it means today.

Numeracy is the same. We should approach them as some kind of a continuum. It’s not a dichotomous variable, it’s a continuum. So even I, I know numbers, I love numbers, but the skills I have now may not be sufficient for 10 years from now. Or the skills I have may be those of a driver as opposed to someone who can fix a car. So we need people who know how to fix a car. We need people who know how to build the engine of a car and then we need people who know how to just drive the car. All those segments in the continuum line for data are there.

And the high school students should be exposed to a lot more data and as I’m saying in the book, they should know that data don’t fall like manna from the sky. They’re produced and that brings us to the whole thing what data are produced, how they’re produced, what do they mean, and when you have them, what do you do with them and all that, which is a very big package. But to start unpacking it, we need to start exactly from where you said educate people from a young age to be comfortable around the idea of numbers and data.

Rob Atkinson: One of the big problems in the US, we’ve done some research on this, haven’t published it yet, is a number of the major colleges and universities, particularly state universities of the US, require four years of math, but do not allow statistics to count for that four year requirement, which again, I find beyond absurd. Beyond absurd. It is just like saying hey, you can take it if you want, but kids aren’t going to take five years of math. They’re not going to take four years of math and statistics. So just to your point, I agree, it’s something we’ve got to really stress that the whole education system has to integrate statistics and data in a much deeper way, partly because there’s all this data analytics going on now. It’s not just the generation of statistics, but it’s data analytics everywhere we look.

George Sciadas: Yes, yes, absolutely. Another point I fully can agree with you, makes an awful lot of sense in my mind and it’s part of the exposure we have now as a society. You hear much more data analytics and what it means in practice as well is that data have moved out of where they used to be between a statistical office and a government, for example. Now it’s in every business. I mean data is a business of every business from the banking sector to the telecommunications sector to agriculture.

There’s agriculture, just to give you an example, with the evolution of data, the book makes a joke with a company like John Deere, which everybody thinks of as tractors or something. You would be a fool to think of John Deere as an agricultural company. It’s a data company, it’s a data company. So things have changed to such an extent that with analytics and all the noise and so on, data are no longer the prerogative of some kind of a secluded circle between your politicians and policy makers, I should say, and statistical guys. But it’s every business and every citizen.

Rob Atkinson: My son who’s off to get his PhD in the fall in computer science, his job before that was with a company in Silicon Valley, which I’ll give a commercial to, called Farmers’ Business Network. And it was a venture backed startup that works with farmers all across North America, farmers and ranchers, but one of its business models is it collects data and then does bigger scale analysis on all the data so that there’s a lot more insights and learning that can go on about how to do farming and ranching a better way. Exactly to your point, George, that’s all about data now.

George Sciadas: Yeah, yeah. No, we’re saying exactly the same thing. Yes. Farming nowadays, you can make another statement is more about data than pesticides or something like that. It’s more about data. And if you want to find out the biggest example maybe of them all, given that you’re in the US, baseball. You don’t have to go... Baseball was practically revolutionized by data. That’s very well documented and it’s one of the good examples. Again, it’s not for policymaker, it’s not for our health, it’s not for the environment still it’s for a very important sport and it was changed by data. You don’t need to go farther than that.

Rob Atkinson: Not to keep jumping in Jackie, but I just have to say I have a great book on, I’m a big basketball fan, and this book has analyzed every shot taken in the NBA for the last 30 years. Unbelievable. And it shows definitively that Steph Curry is the greatest three pointer of all time.

George Sciadas: And he’s Canadian? No, no, no, I don’t think so.

Rob Atkinson: He seems like he’s Canadian. You know who was Canadian was that great point guard Steve Nash.

George Sciadas: Of course.

Rob Atkinson: Steve Nash is Canadian, he’s from Vancouver.

George Sciadas: Oh, of course, of course. He is an amazing guy. So Jackie, I don’t know if you will have the time to analyze every shot ever made in basketball, but this is actually a bit of a segue if you allow me to make one more comment in this part of the discussion, is that data now, I mean just to signal the magnitude of how things have changed and they evolved the last 20 years alone. In the older days because we couldn’t handle data very well most of the data was what we would call macro data. The GDP or unemployment rate, inflation rate, macro for the macroeconomy. People could not handle microdata for people, for businesses, for buildings even any kind of microdata. The new kind of data now they’re way below the micro, they’re data smithereens I call them in the book.

And you have to put many of them together to arrive at microdata, which eventually have to be aggregated to arrive at macrodata, which is the only data we could handle in a paper kind of world like in the past, pre-digital, pre the eighties. Because people are asking me why today everybody cares about microdata. Nobody cared in the fifties and sixties and then you have privacy and confidentiality. The main issue is not privacy and confidentiality; it is that in the forties, the fifties, the sixties, nobody could handle microdata, neither the users nor the producers because it was paper based.

And then we came to the idea of admin data that became and are still a big thing compared to survey data because they could be digitized and dealt with. And then now you have students who can handle data more than professional statisticians half a century ago. This idea also with the basketball shots and so on is one more indication that we handle an enormous amount of data below the microdata, which was our fear in the past and now is the low end of our knowledge.

Jackie Whisman: Well, the subtitle of your book references the Future of Data, which is also the title of the book’s closing chapter. So what do you think the future holds for data?

George Sciadas: I know all about it. I can tell you all about the future of data. And in fact, it’s the subtitle of the book and the title of the last chapter. And in fact, in the beginning of the chapter, a page or two starts exactly as I told you now. But I can tell you exactly now the future of data. And I have item one, item two, item three, and at the end, after I tell you everything that can happen, I tell you also... Just kidding you, I’m not a futurologist, I don’t have a crystal ball and I don’t really know what will happen in the future of data, but the important thing the book does is to bring us as to where we are now, how did we get to this point? All the forces that have brought us to this point are still there. They’re not going to not be there after you and I have been talking. After the interview, they’re still there.

They brought us from the fifties, when all that started in the official statistics, to 2023 and they’re still there. If we follow the forces, we can have a very good idea of what will happen. That’s the first thing and I will elaborate on that. And the second big thing is that whatever is going to happen is not going to happen without us. How we will behave, react as a society, as a government, as a law making community in the next few years has a lot to do with the future of data, especially when it comes to how we’re approaching things like privacy and confidentiality. People are very confused right now because of the social media and the new data, the big data we call it, all the data that didn’t exist until the last few years and now they have everybody confused, even those who have those data as well as those who would like to get their hands on those data and so on.

There’s all kinds of issues. But let me take a stab on a few of them and for the future of data. The sure thing in my mind, I’m convinced about that, the future will have much more data than we have had up to now. If we feel today that we’re inundated with data, we’ve never had more, which is true, as the expression says, we ain’t seen nothing yet. The future will have a lot more data than we feel we have now. That’s number one. Number two, which sounds like a paradox, in the period of abundance of data, we have an abundance of data everybody says so, I have never heard more complaints from people that they can’t find the data they need. The data gaps seem to be everywhere. And you say, how can that be? We’ve never had more data. We agree on that.

And yet the data gaps and the shortage of data have never been felt more. Why? Because we use more. So it’s not a paradox. I explain that it’s not a paradox, but it could be perceived as a paradox, but it will be part of the future. In the middle of a lot of data, there will be data gaps and miraculously the data someone needs are the data that are not there. And it continues to happen. And that’s what I did in my job for 30 years almost. I try to cover the data that are not there, not to sell the ones that are there. But that’s an aside.

Another thing that’s going to happen in the future of data is from all the forces that we have now at work is that analysis, which is the higher value-added activity to move from data to insights and actionable knowledge, we need analysis. Analysis used to start with a question, a policy question or a business question. How do I improve my profits? How do I increase my profitability, my market share? How do I improve my society, my neighborhood? It was a question. And you say, I need data, research and analysis.

Now, analysis can start more from just because we have a lot of data even without a question. That’s what some people in my interpretation have called the death of theory. Theory may die and just because you can put your hands on all kinds of data and you have the technology to manipulate that you didn’t have in the past, a lot of analysis may be happening not because anybody asked anything and so on. Now, what could that lead to? I’m imagining. Since it can be done, I don’t want to say by every Tom, Dick and Harry, but it can be done by anybody really, a student or in a basement or somewhere there would be a lot of findings in quotes that will be confusing people.

So I see a big amount of confusion in the horizon because there would be a lot of findings and who’s going to check them? That’s why every chapter in my book has fact checking tips because it will come with a lot of disinformation and misinformation. Either accidentally or maliciously the good data will come packaged with a lot of bad data. Who’s going to separate that? If George, because he spent 30 years in statistics can tell the difference, I will be too busy to take care of all of you because someone has to be doing that. So it’s upon all of us to be able to have some sort of a mechanism. A numerate society is very important, but it’s not enough. It’s a necessary but not a sufficient condition to do all the fact checking that will have to be there. But the fact is, I can see for the longest time new insights and disinformation will be living, will be cohabitating on the same shelf.

Another thing I can see for the future, because of that, because of the impurity in statistics, who’s producing what and what it means and how do we use it? And is it true is it not true? There will be a lot of resistance in the movement. Some of the purists would want to have boutiques for statistics, a small thing, not everything, but it’ll be perceived of a higher quality and you can pay higher price. In the interest of time another thing I can see for the future of course is that a lot of data will not be produced by people. It has started, it’s not my imagination. It has started. Many data is produced by sensors and so on, and eventually they’re aggregated by other machines.

So the production of data by machines has started a little bit, and I don’t want to get there today, but with AI, artificial intelligence, the sky is the limit. It’s a bottomless pit. It can absorb any data you can throw at it. It’s never enough to feed that beast. And the rest of the story.

There’s one more dimension here. Because of the proliferation of data and it could be overwhelming to know and to deal with, I can see also ingestion of data by machines. Not only the production of data will be done by machines, but part of the consumption will be done by machines. Machines will be programmed to get the daily release of the CPI or the GDP or the unemployment rate before any human goes. It’ll be taken by a machine, put in some kind of an algorithm. So I can see even in the future, the production and the consumption of data done outside the humans.

Those are some of the things I can see for the future of data. But as I said, I’m not a futurologist, I’m just a statistical guy, I have no crystal ball. But if you follow the trends that are happening now, you will see. Just to mention the last thing, it’ll create in the new data environment with statistical offices and Google and the social media and others and the utilities will have a lot of data in the banks, it’ll be very interesting on the one hand and confusing on the other because someone has to separate the good data from the bad data in that sense. And there will be partnerships, which you can call competitive friendships.

So when you have Facebook collaborating with the OECD or small businesses, this is something that would have been unheard of 20 years ago. And now you have a giant in the data world like the OECD collaborating with a giant in the social media, Google, and they produce data. So these strange bedfellows are happening a lot. There’s many examples, I list many in my book, and I can’t say this will be lasting. It will be lasting, but it may be changing. And it’s not one of those that because two players collaborated, it will stay like that. In fact, many of those may very well be ephemeral partnerships that don’t have a lot of traction. But all that depends partly, as I said, on how we are going to deal with the issues of privacy and confidentiality.

Rob Atkinson: George, that was really, really fascinating. I could ask many more questions. All these issues, particularly machine to machine to machine, I think that’s going to be a fascinating area, but unfortunately we have to wrap up. So I just want to say thank you so much and I’m looking forward, I have your book. I’m looking forward to reading. I hope everybody else listening does as well.

George Sciadas: Every good thing has to come to an end. And this has too. I really appreciate it, Jackie, and you Rob, for the invitation to join you. I really enjoyed it and I can’t thank you enough for this. And I hope that some people are interested and they look at data a bit more or differently than they had in mind before this.

Rob Atkinson: That would be a wonderful outcome. And again, it was our pleasure as well, George.

George Sciadas: I really appreciate it. Thank you.

Jackie Whisman: And that’s it for this week. If you liked it, please be sure to rate us and subscribe. Feel free to email show ideas or questions to podcast@itif.org. You can find the show notes and sign up for our weekly email newsletter on our website itif.org. And follow us on Twitter, Facebook, and LinkedIn @ITIFdc.

Rob Atkinson: We have more episodes and great guests lined up. We hope you’ll continue to tune in.


