Episode #201: Leland Wilkinson

50:29
 
The PolicyViz Podcast. Copyright is held by the publisher; the audio is streamed directly from the publisher’s servers.

Leland Wilkinson is Chief Scientist at H2O and Adjunct Professor of Computer Science at the University of Illinois Chicago. He received an A.B. degree from Harvard in 1966, an S.T.B. degree from Harvard Divinity School in 1969, and a Ph.D. from Yale in 1975. Wilkinson wrote the SYSTAT statistical package and founded SYSTAT Inc. in 1984. After the company grew to 50 employees, he sold SYSTAT to SPSS in 1994 and worked there for ten years on research and development of visualization systems. Wilkinson subsequently worked at Skytree and Tableau before joining H2O.

Wilkinson is a Fellow of the American Statistical Association, an elected member of the International Statistical Institute, and a Fellow of the American Association for the Advancement of Science. He has won the best speaker award at the National Computer Graphics Association and the Youden prize for best expository paper in the statistics journal Technometrics. He has served on the Committee on Applied and Theoretical Statistics of the National Research Council and is a member of the Boards of the National Institute of Statistical Sciences (NISS) and the Institute for Pure and Applied Mathematics (IPAM). In addition to authoring journal articles, the original SYSTAT computer program and manuals, and patents in visualization and distributed analytic computing, Wilkinson is the author (with Grant Blank and Chris Gruber) of Desktop Data Analysis with SYSTAT. He is also the author of The Grammar of Graphics, the foundation for several commercial and open-source visualization systems (IBM RAVE, Tableau, R’s ggplot2, and Python’s Bokeh).

Episode Notes

Grammar of Graphics
Hadley Wickham | R for Data Science
The Idea Factory: Bell Labs and the Great Age of American Innovation

People mentioned:
Jacques Bertin | Book: Semiology of Graphics: Diagrams, Networks, Maps
William Cleveland | Book: Visualizing Data
Jeff Heer
Jock Mackinlay
Miriah Meyer
Tamara Munzner | Book: Visualization Analysis and Design
Dan Rope
Robert Rosenthal
Martin Wattenberg
Graham Wills | Book: Visualizing Time: Designing Graphical Representations for Statistical Data

Tools
R
SPSS
SYSTAT
Tableau

Related Episodes

William Cleveland
Miriah Meyer
Hadley Wickham

Support the Show

This show is completely listener-supported. There are no ads on the show notes page or in the audio. If you would like to financially support the show, please check out my Patreon page, where just for a few bucks a month, you can get a sneak peek at guests, grab stickers, or even a podcast mug. Patrons also have the opportunity to ask questions to guests, so not only will you get a sneak peek at guests but also have the opportunity to submit your own questions. You can also send a one-time donation through PayPal. Your support helps me cover audio editing services, transcription services, and more. You can also support the show by sharing it with others and reviewing it on iTunes or your favorite podcast provider.

Transcript

Welcome back to the PolicyViz podcast. I am your host, Jon Schwabish. Welcome to Season 8 and Episode 201 of the podcast – that’s right, 201 episodes of this podcast. Thanks so much for tuning in. I hope you’re learning a lot about how to communicate your data, how to visualize your data, and all the things that are required to be an effective data communicator. On this new season of the show, I’m really excited to bring you some fantastic guests; I have a whole lineup set up, going through the fall of 2021 and into 2022. But before we get into this week’s episode, just a few updates about the show. I’m still going to bring you great sound quality and great transcription of the show. I’ve also started to add a little bit of video content, so for at least the first few episodes of the season, you’ll be able to go over to YouTube, if you want, and watch the actual video recording of my interviews with the guests. So if you’re interested in watching the interview, in addition to just listening to it, head over to the YouTube channel. It’ll be a little bit different because of the audio editing versus the video editing, but I think you will enjoy being able to see the faces of the folks that I talk with on the podcast.

I’ve also set up some new tiers on the Patreon page; if you’re interested in supporting the show, you can head over there. And if you do become a patron, you will have the opportunity to ask my guests questions. Every month, I’m not just going to give you a sneak peek at who’s going to be on the show; I’ll also give you the opportunity to send me questions that I will ask those guests. If you’re interested in making a one-time payment to help support the show, I have a PayPal link set up on the show page. I would love your support to help bring you the show every other week with great guests in the world of data, data visualization, presentation skills, and more.

Now, to kick off season eight, I can’t think of anyone better to have on the show than Leland Wilkinson. If you are in the field of data visualization, you know that name: father of The Grammar of Graphics. If you’re an R user, you, of course, know what The Grammar of Graphics is; it underlies the entire ggplot2 system. I sit down with Leland and we talk about the history of ggplot, his work in the field, and his perspective on data visualization tools; of course, being the author of one of the early tools in the field, he has a lot to say and a lot of thoughts about that. I met Leland early in 2021 as part of a panel for a federal agency that was seeking to improve the way it was visualizing its data, and Leland and I got to know each other during that experience. So I was so grateful that he would take time out of his schedule to come and chat with me about his work. I’m going to turn it over to that discussion. Thanks so much again for tuning back into the PolicyViz podcast, and here is my discussion with Leland Wilkinson.

Jon Schwabish: Hi Le, good to see you. How are things?

Leland Wilkinson: Hi, good, Jon. Things are going well.

JS: Great. Great to see you again. We talked at length, I think, earlier in the year on this panel that we were on, so it’s good to see you again. I’m excited to have you on the show. I thought we would start with what I think is kind of the obvious question, which is about The Grammar of Graphics. Can you talk about the origin of it – how you developed it? Are people missing something about it? And how does it work across these multiple fields? I mean, it’s applied in statistics and computer science and mathematics. So how do you think about all this, and its evolution over the last few decades?

LW: Well, yeah, there are a lot of interesting questions associated with it. And, by the way, I can’t resist, now that I see your book on the bookshelf up there, giving it a plug. I have been plugging it to friends because I do think it is the best book on the use of visualization for business applications. It’s chock-full of research and examples and so on. So it’s just a beautiful piece, and the publisher did a great job. It’s Columbia University Press, I think?

JS: Yeah.

LW: Great reproduction job.

JS: Thank you.

LW: It’s nice to see, and it is one of the things I’m proudest of: for The Grammar of Graphics, Springer handled the reproduction, and it was the first four-color book published – not the first, period, but the first published using four-color PDFs, with the software that would take PDFs and generate the actual print plates. So I never let the printer or the editors touch the manuscript or the plates, and that, I think, accounted for why it’s got so few errors. It’s got two bugs that I know of, but that’s about it. So yeah, it was funny. When I went to SPSS, after selling them SYSTAT, which was the stat software company that I created, I had already written a graphics package that was pretty powerful. It’s not widely known now, but it was called SYGRAPH, and it was part of SYSTAT, and my goal there was to create every possible scientific, or at least statistical, graphic I’d ever seen, because I was teaching a graduate course in visualization, statistical graphics. And I thought, well, gee, none of the software can actually do that; yeah, they do a great job of pie charts, bar charts, whatever, but they don’t have things like biplots, and even parallel coordinates were very rare back in the 1980s, which is when that came out. So I sold it to SPSS, I joined them, and I met a lot of wonderful people, but basically, without going into details, certain managers assiduously opposed all my efforts. I reported to the president and nobody reported to me. And he was greatly encouraging; he was a huge fan of the SYSTAT graphics, and we toured Europe, showing them off in SPSS offices and so on. But you know how it happens in corporations, even relatively small ones like SPSS: when you get deeper than the C-suite, you suddenly find these entrenched bureaucrats you’ve got to deal with.

Well, so anyway, I think I’m going a little bit too much into the dirty laundry. I will simply say that I finally had sort of a blowup in a meeting where, after I had again been explaining how I architected the graphics in SYSTAT and got more opposition, I said, very well, I’m going to write a book, so the world can see what I’m trying to tell you here. You don’t have to use it, I don’t care, but I’m just going to do it. And the president supported me through the whole thing, and it really became a wonderful, subversive subgroup inside of SPSS. As far as I was concerned, the right people supported me; I don’t want to name them, but there were plenty of right people, especially the president. So I sat down, I was authorized to build a team, and I built a team of seven people who, not surprisingly, were considered privileged characters inside the company, because we didn’t have to follow all the regulations about pair programming and which language to use. We used Java, because it was hot at that time, and off we went. So I then started to look at the design, we sort of tabulated all these possible charts, and I suddenly realized, my god, there’s an algebra under these charts. If you want to create a chart, pay attention to the algebra, not to the chart type, because the chart type doesn’t tell you anything about how to hook up an element, so to speak, or what Hadley Wickham and other people call a geom.

So how do you hook that up to a column of the data or a subset of rows? We began to code this in Java and achieved some real success along these lines, at which point I contacted Springer. I had a favorite editor, John Kimmel, who just retired, but he was an editor for all the great Bell Labs people, and later AT&T Labs, an extremely supportive gentleman, and he gave me free rein, even though, ultimately, he lined up reviewers for the manuscript. Now, the thing is, I didn’t set out to write a computer program, even though that’s exactly what we did in Java, because, as you know, a computer program that implements an algorithm is in effect formalizing an argument you would otherwise make in ordinary scientific discourse. And so, I regarded the computer program as a way of checking my ideas. And as we proceeded, it began to occur to me that my motivation in doing all of this, despite the fact that it would be very, very nice in the end to have something to give [inaudible 00:10:18] work, my motivation was to go down as deep as possible to understand the meaning of graphics. And now we get into a slippery term, because “the meaning of visualization” is a favorite phrase, especially among certain people who regard themselves as authorities on visualization without going into any quantitative or other aspects of the thing. I won’t mention his name, but maybe the most famous example of that is someone who, both you and I actually believe, has contributed virtually nothing to the understanding of graphics but, by contrast, has made magnificent contributions to the visual design of graphics and deserves credit for that. But a lot of the book, over the years, has been trying to argue against those ideas, particularly the extreme, post-modernist ideas of the meaning of visualization, but other varieties too, like, oh, here’s a taxonomy of charts, and that’ll tell you something about the meaning of visualization.
And my reply, and I made it pretty clear in the book, is that that is absolutely wrong. In other words, for several people who’ve done taxonomies of charts, I have pointed out examples where, in fact, they’re not only incorrect, they’re dangerous, along the lines of that wonderful paper you may remember, Dijkstra’s, I think, on the go-to statement being considered harmful in computer languages.

JS: Yeah.

LW: Well, from my point of view, chart taxonomies are truly harmful to the understanding of the meaning of graphics. So now let’s take this idea that I evolved over writing that book towards an extreme, because I believe there’s a lot of work to be done today in exploiting the meaning of graphics from the point of view of a particular system to create software that does remarkable things. I’ll only point this out now, and we can talk more about it later, but I’ve talked with, for example, Jeff Heer, one of my heroes in this field, I’ll say, because he’s very adept at formalisms and languages and so on. And I’ve said, suppose, as I outlined in the book, you want to develop a program that reads the newspaper, finds graphs, translates the graphs into a spreadsheet, and then does a statistical analysis and alternative analyses. Yeah, that sounds like cheap thrills, a cheesy little program. Well, imagine doing that for the entire corpus of the New York Times, going back to the early 19th century; there are many, many thousands of graphics that could be analyzed. And of course, that doesn’t begin to cover scientific graphs from journals and so on. So at this point, we developed the algebra, and it turns out I got the ideas for the algebra, or the structure, from work that was done mainly in statistics by people like John Nelder in England, people like John Chambers at Bell Labs, and also a project of the Bureau of Labor Statistics called TPL, Tables Production Language. I was able to go deeper into that because, at SPSS, I was able to hire for our little team Dan Rope, who came straight from the Bureau of Labor Statistics and who had already written Java software to generate visualizations. It was a first effort, but Dan was extremely creative, smart, and so on. And what happened was, as the algebra evolved, it turned out I only needed three operators to do what I regard as the entire corpus of graphics.
And, by the way, after the book came out, I anticipated somebody somewhere adding two, three, four, five more operators to this language, and you don’t need to. I had other ones myself, but after months of work, actually more than a year, I would throw out certain operators because I realized they were redundant; I just didn’t need them.

So I remember certain highlights where we were just jumping up and down when they happened. Namely, we developed all sorts of scatterplots, even pie charts, bar charts, etc., using the algebra and a renderer which incorporated a number of graphical elements and features, topological types of features. And then we started to get stuck with certain graphics, and one of them was a scatterplot matrix. Now, if you think about a scatterplot matrix, you could write one pretty trivially by making two iterators, embedding one iterator inside the other, and just iterating all the way through all the possible subplots. Then all you do is position each of the subplots in the right place, that’s trivial, and you’ve got a scatterplot matrix. But that doesn’t fit the algebra that I designed. It’s not incompatible, it’s just irrelevant to that algebra, and I was struggling to find a way. Dan Rope and I sat there, I’d say we worked for maybe two months on that problem, and all of a sudden, I looked at a symmetric scatterplot matrix, and I said, oh my God, this thing is a classic quadratic form. If you think about it in matrix terms, it’s X transpose X, and therefore it’s a product term. And I bet if Dan codes the product properly (and I mentioned basically how we did that), out should pop a scatterplot matrix. And by God, it did. We were just blown away. Without doing anything about positioning the plots or specifying how big or small they were, we made a scatterplot matrix. It’s in the examples in the book: you write the list of variables as X, then the asterisk, which is the product operator, then X again, you type that into the program, and out pops the scatterplot matrix.
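To make the product idea concrete, here is a minimal, symbolic sketch in Python. It is not the SPSS implementation or the book’s formal definition of the operators. It assumes a simplified model in which an expression is a list of terms, `+` (blend) collects terms, and `*` (cross) pairs every term on the left with every term on the right, much like multiplying out a polynomial. The helper names `var`, `blend`, `cross`, and `layout`, and the iris-style column names, are hypothetical.

```python
from itertools import product

# Hypothetical, symbolic model of two algebra operators.
# An expression is a list of terms; each term is a tuple of variable names
# that together span one frame (one panel of a chart).

def var(name):
    """A single variable: an expression holding one 1-D term."""
    return [(name,)]

def blend(*exprs):
    """'+' (blend): collect the terms of several expressions side by side."""
    return [term for expr in exprs for term in expr]

def cross(left, right):
    """'*' (cross): pair every left term with every right term, like expanding a product."""
    return [left_term + right_term for left_term, right_term in product(left, right)]

def layout(rows, cols):
    """Assign each term of cross(rows, cols) to a grid cell from its position in the product."""
    n_cols = len(cols)
    return {divmod(k, n_cols): term for k, term in enumerate(cross(rows, cols))}

X = blend(var("sepal_length"), var("sepal_width"), var("petal_length"))
splom = layout(X, X)  # X * X: a symmetric 3 x 3 scatterplot matrix
for (row, col), (row_var, col_var) in sorted(splom.items()):
    print(f"panel ({row}, {col}): {row_var} against {col_var}")
```

Crossing X with a different blend, for example `layout(X, blend(var("petal_width"), var("species")))`, gives a rectangular 3 × 2 grid instead. In this model, each panel’s grid position comes only from the order in which the product enumerates its terms, so no chart-specific nested loops are written.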

Now, the other benefit of that is that you then embed in that expression (because the algebra is only one-seventh of The Grammar of Graphics) what I called elements, though I kind of like geoms even better as a descriptive term. So then you can say, good, I’ve got a product of frames, in this case a symmetric product, although it’s trivial to make it X asterisk Y and do a rectangular SPLOM. Now just go ahead and grab anything you want, line, area, interval, point, put it in there, and it plots in the proper place in the scatterplot matrix. So now you have a system where, if you go to the state-of-the-art systems, at least at the time, things like SigmaPlot or SPSS graphics or SAS/GRAPH, you had a limited number of things you could put in the scatterplot matrix, in some cases mainly just a set of points. But I was able to show numerous examples of a SPLOM with points and then joint confidence intervals on those points, either using ellipses from a normal assumption or using kernels from a nonparametric approach.
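Here is a rough sketch of one element that can be dropped into every panel: an ellipse computed from each pair of variables under a bivariate normal assumption. The sketch draws a data (concentration) ellipse at a chosen coverage level. The interview does not say whether the book’s examples use this exact construction, and the function names are hypothetical. Both the eigenvalues of a 2 × 2 covariance matrix and the chi-square quantile with 2 degrees of freedom, which equals -2 ln(1 - level), have closed forms, so no library is needed.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Sample covariance (n - 1 denominator)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def normal_ellipse(xs, ys, level=0.95):
    """Center, semi-axis lengths, and major-axis angle (radians) of a normal-theory ellipse."""
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Closed-form eigenvalues of the 2 x 2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    major, minor = half_trace + spread, max(half_trace - spread, 0.0)
    # With 2 degrees of freedom, the chi-square quantile is -2 ln(1 - level).
    k = math.sqrt(-2 * math.log(1 - level))
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mean(xs), mean(ys)), (k * math.sqrt(major), k * math.sqrt(minor)), angle

def splom_ellipses(columns, level=0.95):
    """One ellipse per off-diagonal panel: x from the column variable, y from the row variable."""
    return {
        (row, col): normal_ellipse(columns[col], columns[row], level)
        for row in columns
        for col in columns
        if row != col
    }
```

Passing a mapping of column names to equal-length lists of numbers returns one ellipse per off-diagonal panel. A renderer could then trace each ellipse in its panel alongside the points. The nonparametric alternative mentioned in the interview would replace this function with a kernel density contour.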

So there were other breakthroughs like that, where I thought, don’t put, as some people have written, a bag on the side. There was a wonderful book on the development of the Wang computer system back in the day; it was the world’s first word processor, and the people on that team at Wang used to joke about how the IBM programmers, to do the same word processing type of function, would hang a bag of shit on the side of the program and make an exception. And I refused to make any special cases in the book. So that took care of the algebra. And, of course, then there are six other major objects. Now, here was the trick, which is vastly unappreciated by readers of the book, including the people who actually read it. And boy, I tell you, just as an aside, I had people run up to me at VisWeek…

JS: Yeah.

LW: When I first went there in 1999, they didn’t know me, but they said, oh my God, you wrote The Grammar of Graphics, I just loved it, it’s so fantastic, and this and that. And then they said something which revealed to me that they hadn’t read the book at all. They’d opened it up and looked at all the pretty pictures. And one of the impacts of the book itself is quite conspicuous: because of the seminar I taught a few years earlier, I included examples of graphs nobody had ever heard of, at least in the Vis community, like phase plane graphs from physics or biplots from statistics and so on. And lo and behold, I’ve noticed with some amusement that over the subsequent years, people would do papers on phase plane plots. Or when I did scagnostics, which is (I can talk about it later) a particular type of structure to impose on a scatterplot to analyze shape, I started to see papers coming out (and, by the way, I introduced that in The Grammar of Graphics, but only in a page or two) where people started to do things like opixnostics, parnostics. In other words, they grabbed any kind of element they could use in visualization and then applied the same sort of characteristics. But that sort of misses the point that the idea behind scagnostics was about point sets in higher-dimensional spaces, that is, actually 2D subspaces of high-dimensional spaces. And you couldn’t just apply the same ideas willy-nilly to anything. Well, all right, here’s what I’m leading to: it was an equal amount of intense work to develop these seven steps that underlie every graphic. And when I say every graphic, I mean every graphic. You cannot draw what I called a well-formed statistical graphic without implementing code to do every one of those seven steps.
And they’re fairly self-evident, many of them. For example, data source is an object that needs to input data that has an arbitrary organization, and we don’t care what that organization is. But the output is what we now call a frame, and what at that time I called a table. And you have to have a table to do this class of graphics. Now, that immediately tells you that there are some graphs that are not suited to The Grammar of Graphics, and those would be certain kinds that cannot be organized in a table. But that doesn’t, by the way, rule out node-edge graphs, for example, which are a huge class, because edge lists are easily organized as a table.

JS: Sure.

LW: So now, we get through data, and we then go through the other things, which involve things like geometry, which is all those different forms or geoms that are needed to draw these graphs. Now, there was another thing – boy, I hope I’m not – I’m going into some detail here.

JS: That’s fine.

LW: It will give the interested reader an understanding of how the thought process developed, because with things like geoms you’d immediately think, oh okay, that’s easy, we’re going to go draw bars, lines, pie slices, etc. Oh, wait a minute, pie slices are nothing but rectangular elements that have been put through a polar transformation. For some incredible reason, nobody had ever said that, unless I’ve missed it. So, in fact, I made this big collection of geoms and then realized I could get rid of most of them; all you need is about, you know, 10 of them to draw anything that appears in journals or newspapers or whatever. Similar things happened in other areas like aesthetics. Now, for aesthetics I drew heavily on Jacques Bertin’s work. And, by the way, this is probably not surprising to you, but almost the only people I found of any intellectual use in this field (this is getting to be really arrogant here) are Jacques Bertin, who profoundly wrestled with these problems of geometric forms, and Jock Mackinlay, who is now at Tableau. Jock did an absolutely brilliant dissertation on a program that would draw a graph based on some of Bertin’s theory and other ideas. In the first edition of the book I didn’t recognize or know about Jock’s research, and I credited him in the second edition, because the stuff was really good. Well, I won’t mention in passing all the others I consider hugely significant; I mention them in the book, obviously the entire group at Bell Labs, and I would claim that almost everything I’ve ever seen at VisWeek involving interactive data analysis can be traced back to the Bell Labs group. There’s nothing new under the sun there.
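The pie-slice observation can be checked with a few lines of arithmetic. This sketch assumes a single stacked bar whose segments span 0 to 1 along one axis. A polar transformation then maps position along that axis to an angle and the bar’s height to a radius, so each rectangular segment becomes a wedge. The function names are hypothetical, and a real renderer would also handle labels, the starting angle, and the direction of rotation.

```python
import math

def stacked_bar(values):
    """Stack values into one bar: each category gets an interval [start, end) on a 0..1 axis."""
    total = sum(values)
    intervals, start = [], 0.0
    for value in values:
        end = start + value / total
        intervals.append((start, end))
        start = end
    return intervals

def to_polar(position, radius):
    """Polar transformation: position along the bar axis (0..1) becomes an angle in radians."""
    theta = 2 * math.pi * position
    return (radius * math.cos(theta), radius * math.sin(theta))

def wedge(interval, steps=8):
    """Send a bar segment's outline through the polar transform.

    The segment's edge at radius 1 becomes an arc; its edge at radius 0
    collapses to the center, so the rectangle comes out as a pie slice.
    """
    start, end = interval
    arc = [to_polar(start + (end - start) * i / steps, 1.0) for i in range(steps + 1)]
    return [(0.0, 0.0)] + arc + [(0.0, 0.0)]
```

For example, `[wedge(interval) for interval in stacked_bar([30, 50, 20])]` yields three slice outlines covering 30%, 50%, and 20% of the circle. Nothing in the code is specific to pie charts. The only pie-specific step is the choice of coordinate transformation.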

Okay, well, let me get back to the point here. When you go through each of those other classes, those seven fundamental classes, a similar simplification takes place. Statistics, of course, was a piece of cake, because I’m a statistician, and so I drew heavily on Bill Cleveland’s work and John Chambers and others to understand how you create statistical functions that can inject statistical summaries into these graphs. By the way, I’m sorry if I wander here, but I should mention that a major assumption of the book (I’d almost call it a breakthrough, except it’s so trivial) is that every graph is a function. Now, that’s high school algebra, at least if you learned it after the new math and not back in the 1950s. But the fact is, when you understand that every single graph, and I’m talking about every one in the book and every one anybody draws this week, is actually a function and so can be expressed as a function, suddenly, again, you get this huge simplification. And that’s what statistics does: it exploits a tremendous number of statistical functions that do things like regression lines or kernel smooths and so on, and that are able to inject themselves into what I call the frame that contains the actual graph.
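One way to read the “every graph is a function” point is that a statistic turns a frame of rows into something that has exactly one y for each x. The sketch below is loosely modeled on a mean summary for a bar chart of means. The function name, column names, and values are hypothetical, and real systems offer many more statistics, such as regression lines and kernel smooths.

```python
from collections import defaultdict

def summary_mean(rows, x, y):
    """Hypothetical statistic: collapse a frame to one mean y per distinct x value.

    The output is a function in the ordinary sense: every x maps to exactly one y.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[x]].append(row[y])
    return {key: sum(values) / len(values) for key, values in groups.items()}

# Toy frame; the numbers are illustrative only.
frame = [
    {"region": "north", "sales": 10},
    {"region": "north", "sales": 14},
    {"region": "south", "sales": 7},
]
bars = summary_mean(frame, "region", "sales")  # bar heights for a chart of means
```

Swapping `summary_mean` for a different statistic changes what is drawn without changing the rest of the pipeline. That substitutability is the simplification the “graph as function” view buys.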

Now, you’d think we’re done there, with everything you need to make the graph, short of having a renderer, which is very important. You have to write one of those, but that’s not hard in Java or C++ or even Python; there are plenty of renderers you can draw on. But then I thought of coordinates. My God, pie charts require coordinates; those aren’t statistical functions. You simply generate a bar (and that’s chapter one in the book), run it through a polar transformation, and now you’ve got a pie slice. All right, so now we’re done with seven elements, and presumably you’ve written all seven. And here’s what people missed who read the book and think it’s all about charts: it’s a total order. Some reviewers of the book completely ignored this as well. I said it was a total order, and there is no other way to execute those seven objects except in the order I printed in the book. And that’s the order we used in the software. I’ll give you a quick example. The people at Tableau, for example, but at other companies as well, who implemented The Grammar of Graphics as the basis for their engine, and by and large did a really good job, didn’t believe me when I said it’s a total order.

JS: Yeah.

LW: So when I got to Tableau, since I was the so-called VP of Statistics, I did a document for them evaluating the accuracy of the statistical routines, etc. One test I did surprised me a lot. I did a scatterplot and fit a regression line in the scatterplot; it did a beautiful job. Then I went back, because Tableau allows you to do this, and in the book I described how to do this, and I decided to log the X axis and log the Y axis. Now, if you do that, the regression line that went through all the points needs to adjust to the transformation. What happened instead was that the points all worked perfectly, because they executed those steps in the right order, but they had inserted the regression line at the end of everything. So the regression line didn’t realize, wait a minute, I’m in log-log space, and it flew right off the top of the chart. They were quite disturbed by that, justifiably, and I worked with them to help them realign the order in which things were executed.
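The Tableau anecdote can be reproduced with toy numbers. This sketch assumes data that lie exactly on y = x², which is a straight line of slope 2 on log-log axes. Fitting the regression after the log scales are applied recovers that line. Fitting it to the raw values and only then drawing it on log axes does not. The example illustrates the ordering point, that scale transformations come before the statistic, and is not Tableau’s actual code. The helper names are hypothetical.

```python
import math

def ols(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def log10_scale(values):
    """A log scale transformation applied to raw data values."""
    return [math.log10(v) for v in values]

# Toy data lying exactly on y = x**2 (a line of slope 2 on log-log axes).
xs = [1, 10, 100, 1000]
ys = [x ** 2 for x in xs]

# Scales first, then the statistic: fit in the transformed (log-log) space.
a_log, b_log = ols(log10_scale(xs), log10_scale(ys))
print(f"log-log fit: intercept={a_log:.3f}, slope={b_log:.3f}")

# Statistic first, scales last: fit raw values, then try to place the line on log axes.
a_raw, b_raw = ols(xs, ys)
y_hat_at_1 = a_raw + b_raw * xs[0]
print(f"raw fit evaluated at x=1: {y_hat_at_1:.1f}")  # negative, so no position on a log axis
```

With the raw fit, the predicted value at x = 1 is negative, so that part of the line cannot be placed on a log axis at all. Elsewhere, the straight raw line bends into a curve on log axes. Both effects are ways a line can “fly off” the chart when the statistic runs in the wrong place in the pipeline.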

So in wrapping up this architecture question, I have to say that the extreme – well, let me put it this way: God is in the details. It’s a saying that I put in the frontispiece of the book, and it’s probably more true of this system than any other. Namely, it took me 10 years, with a very talented committee, to generate software that implemented this, and, as I’m sure you know, for each graph in the book there’s an actual language specification that needs to be executed, in the proper order, by the interpreter built into the program. If you don’t get the details right, you’re going to draw garbage. And I distinguish garbage, which is actually ill-formed specifications, literally like you wrote a calculator program, typed in two plus two, and it came out as five. That’s ill-formed, and the graph is meaningless. Versus weird shit that actually – and this was one of my goals, and I never quite got to it – weird things that came out of the algebra. I’ll give you an example. Graham Wills is the extremely talented Brit; actually, he got his PhD in Ireland, in Dublin, because he grew up there. Anyway, Graham was an expert in time series, and he wrote a beautiful book for Springer called, I think, Time Series Visualization. He used the Grammar of Graphics program at SPSS to generate all the graphs, and one day he was playing and mistakenly put a double [inaudible 00:33:50] polar transformation into the specification, and he got an outlandish pie chart. Every slice of the pie was then put through another nonlinear transformation. But the point is, it was not meaningless; it was ugly, it was hard to read, but, in fact, it was a faithful representation of the data.

So in addition to my goal of making a program that could understand the meaning of graphs, which I still believe in, I was also looking forward to the interactive program we developed to execute it, which would allow you to type in some arbitrary algebra expression with no idea what was going to come out. And I thought, if you played around with this thing long enough, you were going to invent a ton of new chart types. I don’t know if anyone has done that yet; the full language is embedded inside SPSS. And I’ve almost run out of time here, I’m sorry.

JS: No, I’m fascinated by the whole process, and as you’re talking, I’m curious: do you think you would have developed it differently if, say, you hadn’t gone to SPSS and worked with this team in a statistical software company, and had instead gone to a university? Given the way you describe iterating on the theory and then implementing it in Java, I’m just curious how you think it would have evolved differently if you had not gone to SPSS.

LW: Well, that’s a really good question no one’s asked me before. It wouldn’t have worked. You immediately reminded me that it’s probably an interaction with my own personality, which can be very oppositional at times. If you want me to do good work, go get somebody who has the power to stop me and whose ideas I’ll dismiss as, oh yeah, conventional-wisdom crap. It goes back to an old theory in psychology, actually, achievement motivation: you find high achievers who often devise methods to break all the rules, make an end run around the rules, and then produce something, and that’s what happened at SPSS. Now, I don’t want to slander SPSS. It has some tremendously smart, well-motivated people, but there was a small group, as I said, that was working every day to shut us down. For example, there was the head of marketing there, who eventually got fired, who just asked, well, when will you finish, and I said, never. He basically tried to rearrange the personnel, change job titles and so on, and didn’t succeed. So yes, I also taught, in a wonderful university environment where there wasn’t a lot of conflict. And yeah, I did learn a lot of stuff there as I was teaching, but I don’t know about you, when it gets too comfortable, you stop having ideas that contradict conventional wisdom. So yeah…

JS: You can question these different ways, yeah.

LW: Yes, and actually, Jack Noonan, the president of SPSS, shortly before the company got sold to IBM, did something brilliant. I went to him and I said, they’re just not going to implement our software, we’re done. It’s all working and everything, but they won’t do a single menu for it; they’ve refused. And he said, Le, I’ve got an idea. You embed the interpreter inside SPSS, and you simply do a block in the code that says begin grammar of graphics, then the grammar of graphics language. And I said, oh my God. And we did that, and they couldn’t stop us. And it turns out there are a few select users in the SPSS community out there, internationally, who are actually coding the Grammar of Graphics despite the wishes of all the UI people and whatever.

JS: It’s like an Easter egg. It’s like a grammar of graphics Easter egg in SPSS.

LW: It was.

JS: That is amazing.

LW: Yeah, and I have to say, without going into names for obvious reasons, that when I left SPSS, there were similar reactions elsewhere, where people simply didn’t want to take the time to understand the book. But fortunately, I had people like Hadley Wickham come along. Hadley was a grad student of Di Cook, again one of my heroes, who is behind GGobi, XGobi, and some of those wonderful programs at [inaudible 00:39:08]. Anyway, Hadley read the damn book, sat down and coded it properly, put it into R, and it just took off. And I told him, you know, Hadley, thank you so much, because without you this book would have gone back into the library. And ironically, and I think in large measure due to Hadley, the damn sales of the book have increased every single year over the last 20 years. I told my editor that when I first published the book, and he said, well, I’ve never seen that before with one of our books. And it really has, because people, after they used ggplot2, thought, well, Hadley keeps mentioning The Grammar of Graphics, bless his heart. And to me, the highest standard an academic can aspire to is to distribute the credit backwards in history and then talk about his or her innovation. That’s what academics is all about, not, oh look, I invented this new thing, and I’m not going to tell you where I got the ideas.

JS: Right. So you’ve mentioned Hadley, and you mentioned Jock and Tableau. Before we go, I want to get your sense of looking now and forward in the data visualization toolkit. What are your thoughts on where things are and where things are going? I mean, clearly, you’re a fan of Tableau; clearly, you’re a fan of R, but where do you see the field of DataViz tools now and going forward?

LW: Yeah. Well, the good, bad, and the ugly.

JS: Yeah, right.

LW: I was at a 20th anniversary session on Grammar of Graphics at the statistical, one of the statistical conferences. Someone asked me the obvious, so when are you going to do a third edition. I said, never. It’s a math book. You don’t…

JS: Right, you don’t need.

LW: Well, if I made a bug, obviously, I’ve got to fix it. But you don’t add to a theorem that is designed to cover X, Y, Z, and then sort of just change it willy-nilly into a new quote theory. And one of my gripes with, at least, the Vis community version of computer science is like using words like scientific, theory, I mean, give me a break, those aren’t theories. They’re interesting observations about the world, very useful, but please don’t try to dignify them by calling them a new theory of visualization, they’re not. That’s why I was quite confident that when people finally read the book, you were going to see spinoffs, and I have to say the Ant Group in China, this bunch of like, I swear, they’re teenagers, but they’ve been going lickety-split, developing a JavaScript grammar of graphics program. And there, of course, is Hadley and then the groups at Python and elsewhere, who’ve done other stuff.

Now, I’ve already alluded, so I won’t say much more, about one of my big gripes about InfoVis or now VisWeek is that they get too easily impressed with high school math. My daughter is a pure mathematician. My son-in-law is a pure mathematician. And I sort of feel like the VP candidate who said, I know John Kennedy – I knew John Kennedy, you are no John Kennedy. Well, I’m sorry, I spent enough time asking mathematicians, what do they think, how do they, you know, and this and so on. And whenever I bring up some topic where it’s some InfoVis paper that has three pages of symbols or algebra, whatever, the pure mathematicians look and say, that’s high school math, that’s not mathematics. And there’s nothing wrong with applied math, but don’t try to impress me by using so much math, that you make it hard for me to understand what you’re actually saying. So I’ve reviewed papers for InfoVis where I looked at a particular new “graph” and I said, congratulations, you’ve designed the first graph I’ve ever seen – this is really snotty, I’m sorry, I apologize – but first graph I’ve ever seen that made it impossible to see the three clusters in the Fisher Anderson Iris data. And you looked at it, and by God, they put that in this example of, like, look how nifty this new graph is. Well, no, unless you can show a real…

The second thing, and I want to get to the positives, but I’ll just mention it, is that there’s a whole drive inside the InfoVis movement to evaluate the graphs that people are inventing, and that’s laudable. The only problem is any psychologist who looks at those “experiments” knows right away, sorry, those aren’t experiments, they’re not even randomized, you don’t even know the basics of experimenter effects, go read Robert Rosenthal to learn about this. And so, inevitably, I review some paper that says, well, our graph was preferred by 70% of the users and the alternative graphs were preferred by 20% of the users. Really? What does preferred mean?

JS: Yeah.

LW: And did you do a double blind experiment in this, because they probably knew which one you invented, and then you ask them which they prefer.

JS: Which they prefer, right.

LW: So now, having said a lot of these negative points, and I’ve said things at some meetings that I regret having said, although they weren’t nasty, I still want to say that the amazing thing about the Vis community is its creativity. I mean, they just sometimes come out of left field and do remarkable stuff. And so, when I consider things by Tamara Munzner and Jeff Heer, and Martin Wattenberg, and Maria for another, though I didn’t know her work as well until we had our committee session, I thought, damn, you’re really on the right track with that stuff. And I never would have thought of it that way, so to speak, and whatever, that is the very best of the best. But when you do impose requirements, like, you have to use LaTeX, which I use for everything I ever write, and you may have only 10 pages, not including references, and you may use only double column for blah, blah, blah, by the time you’re finished, it is possible to produce a production-ready piece of crap. I mean, the tools are so sophisticated that reviewers miss the essence of what someone’s writing about. And I just submitted a paper to an ASA journal, the Journal of Computational and Graphical Statistics, which has got a pretty tough reviewing schedule, and they seem to take about a year before they get your reviews back. But you get pages and pages of single-spaced reviews, whereas in InfoVis, I’ve had some reviews that consist of four sentences. And obviously, the person hadn’t read the paper. And some of this isn’t always related to intelligence.

JS: No, that’s right. Well, it’s interesting.

LW: Truly brilliant people say incredibly stupid things, and I don’t know what it is.

JS: Well, I mean, Maria and I actually talked about this for the interview I did with her a few months ago. I think there’s an effort in some aspects of the field to open it up to more practitioners and data journalists who all have a lot of things to say, but then you get into this, like you said, there’s this specific format, and it’s just so hard to use, and it’s like this gatekeeper situation, right? We need to make it this difficult to get people in the door.

LW: Right, I completely agree. And that is, again, one of the laudable aspects of InfoVis, and you know the program committee every year works to refine and learn from their mistakes and so on. So they need to be praised for that. It’s just that whenever you make rules and get things set up, it’s like in tax law, right? Somebody smart enough, sitting in an office in New York City is going to figure out how to get around what you tried.

JS: That’s right. Well, Le, thanks so much. I mean, this history is really interesting, and I appreciate you taking the time and telling me all about it, and the origin of Grammar of Graphics, really appreciate it.

LW: Well, thanks. I probably have ended my career here from all the gossip I just gave you. But you know what, I’m 77 years old and I don’t give a shit anymore. I’m having fun, I’m still writing and thinking and so on, but at a certain point, I don’t depend on getting tenure or whatever by getting the citation counts or h index, I don’t care. But I’ve met so many, just brilliant people, and I was delighted to meet you on that committee, you know, who really are passionate about visualization, and it’s going to be a great future.

JS: Yeah. On that note, we’ll close up. Thanks so much, Le, appreciate it.

LW: Terrific, yeah, thanks.

JS: Thanks.

LW: Bye-bye.

And thanks to everyone for tuning in to this week’s episode of the show. I hope you enjoyed that interview with Le. I’ve put all the links to the things we’ve talked about in the show notes page, so go check them out. If you would like to be a financial supporter of the show, please head over to my Patreon page, or to make a one-time donation, head over to PayPal. So until next time, this has been the PolicyViz podcast. Thanks so much for listening.

A number of people help bring you the PolicyViz podcast. Music is provided by the NRIs, audio editing is provided by Ken Skaggs, and each episode is transcribed by Jenny Transcription Services. If you’d like to help support the podcast, please share it and review it on iTunes, Stitcher, Spotify or wherever you get your podcasts. The PolicyViz podcast is ad free and supported by listeners. If you’d like to help support the show financially, please visit our PayPal page or our Patreon page at patreon.com/policyviz.

The post Episode #201: Leland Wilkinson appeared first on PolicyViz.
