Charlie Smart is a visual journalist based in Brooklyn, New York. He is currently a graphics editor at the New York Times where his work focuses on mapping the coronavirus pandemic and U.S. elections.
Charlie is one member of a large team that created and has maintained the Times’ COVID dashboard. We talk about the origins of the project, edits along the way, and what it’s like to create a live dashboard that can affect people’s health.
Support the Show
This show is completely listener-supported. There are no ads on the show notes page or in the audio. If you would like to financially support the show, please check out my Patreon page, where just for a few bucks a month, you can get a sneak peek at guests, grab stickers, or even a podcast mug. Your support helps me cover audio editing services, transcription services, and more. You can also support the show by sharing it with others and reviewing it on iTunes or your favorite podcast provider.
Welcome back to the PolicyViz Podcast. I am your host, Jon Schwabish. Now, I think all of us around the world have been checking our favorite newspaper or website or dashboard to get information on COVID-19 infections, deaths, and now, fortunately, changes in vaccination rates. And I tend to check two main websites, The New York Times, and also the Washington Post. So I’m located right outside Washington DC, so the Post is sort of like my local newspaper, so I generally check the Post every day to see what’s going on in my county, here outside of Washington DC. But I also really enjoy, primarily because of the data visualizations, the dashboard over at the New York Times. And, of course, in tracking these day to day infections and deaths and vaccines, the team over at the Times, and lots of other media organizations, have had to make lots of different decisions about the visuals that they’re going to create and how they are going to communicate this information on a day to day basis. So I’m really excited to have Charlie Smart on this week’s episode of the podcast. Charlie works in the New York Times graphics department. He is one of the team members working on their COVID-19 dashboard. He had a little bit of a Twitter thread some months ago about some of the changes they made to the color palette in the maps about the COVID-19 infection rates. And so, that really spurred me to reach out to Charlie to see if he’d like to chat about the decisions that they’ve made in and around the dashboard. So I’m excited to have this conversation with Charlie, I think you’re going to learn a lot about the insides of how Charlie and other members of the New York Times think about communicating COVID-19 data, which of course is sort of different than showing information about the unemployment rate or GDP, because COVID-19 information is potentially life threatening.
It’s making life and death decisions about am I going to wear a mask, am I going to go outside and am I going to be around other people. So these are really important decisions that are driven by data. So I think you’ll enjoy this week’s episode of the podcast. And so, here is my interview with Charlie.
Jon Schwabish: Hey, Charlie, how are you? Thanks for coming on the show.
Charlie Smart: Doing well. Thank you for having me.
JS: I am really excited to talk with you about all this great work you and your team at the Times have been doing, specifically on the coronavirus tracking dashboard and all the other work that you’ve been doing. So we’ve got a lot to talk about, so maybe we start by having you just talk a little bit about yourself and your background and how you got over to the Times, and then I will ping you with a series of questions to get how you’ve been managing all this data and DataViz.
CS: Sure. So I went to college for journalism, and I actually thought I wanted to do radio journalism. When I was in school, I was very into podcasts and NPR and did college radio, and did my first journalism internship at an NPR affiliate in Connecticut. And while I was there, they found out that I knew a little bit of HTML and knew how to code a little bit, but not really well at all, but they found that out and so they asked if I could build some tables for a story, and then make some maps for a story. And I kind of, at that time was sort of just learning that like data journalism was a thing, I had gone to a conference with some radio stations and college radio folks earlier that year and I’d seen actually someone from the Times graphics department speak there and was like, whoa, this is really cool. And so, kind of through doing this radio internship kind of started doing the data and graphics journalism stuff, and just thought that was really cool, and kind of did that work through college. And then when I graduated, I worked for a little while at a design studio in Boston, that sort of does data visualization work, but in more of a design studio environment, not so much news; and then, did some freelance work and worked for a little while with the folks over at the Pudding, and was just kind of like learning a lot of this stuff. I think a lot of people kind of, there’s no, I guess, there are programs now that sort of focus on data visualization and this sort of journalism, but I think a lot of people are sort of self-taught and kind of figure it out as they go along, and that’s very much what I was doing. I started at the Times a little over a year ago now, in late 2019, and I was hired to focus on elections to work on graphics for the primaries that were coming up at that time. And that lasted until, I think the Florida primary was the last one that I directly worked on, which was March 17, which was…
JS: Yeah, like, almost to the day of when everything shut down.
CS: Exactly, yeah, I think we’ve been out of the office at that point for about a week, and, yeah, New York had kind of shut down right around then, and that was when I switched over to working mostly on the coronavirus graphics. And that just sort of was dominating the news cycle and the things that we were covering at the time. And yeah, so I’ve kind of been working on a mix of coronavirus and elections things since then.
JS: Right. Before we talk about the coronavirus stuff, do you miss the radio part of your early interests?
CS: I do. It’s kind of fun to be on a podcast now. I do miss, you know, I always love audio and radio as a medium. And I think there’s people doing really, really cool work in that area, and I do definitely miss it. What I like about data journalism and visual journalism is that it sort of combines these like different interests I have in journalism and in design and programming, and kind of lets you do a real mix of things, and it’s not, you’re not siloed into one area, and you kind of move between these different things. So I really enjoy that, but I do miss college radio and that sort of thing.
JS: Right. Have you, and maybe this is premature, but have you thought about or talked about combining audio into some of the visual data visualizations at the Times? I know, Amanda Cox had done some, years ago had done some audio stuff, but it’s not a very common form, I’m just curious.
CS: Yeah, it’s not something that – I can’t speak for everyone in the department, I don’t know what else is going on. It’s not something that I’ve been kind of focused on, these larger dashboard projects. But it’s definitely an interesting, there’s been some interesting work I’ve seen sort of in that area.
JS: Yeah, interesting. So let’s talk about the Times coronavirus tracker, because there’s a lot there, you’re updating it every day, and then you’ve made some larger changes at certain points, and I’m sure some smaller changes over the course of the last, I mean, almost, you know, we’re in early March right now, so almost a year to the day, since you’ve been doing it. So can you talk a little bit about the evolution of the dashboard, and some of the, you know, both the smaller changes and maybe some of the tweaks that you had made, and then I know, sort of, later in 2020, in November or so, there were some bigger changes that had to be made?
CS: Yeah, absolutely. So just the first thing I want to say about this project before I kind of get into talking about it is that this has been just a huge team effort in this project. So a lot of the things that I’m going to be talking about are things that other folks have worked on, I don’t know what the biggest byline ever on a New York Times story is, but this one takes up like most of the vertical space of my browser window when you scroll down to it.
CS: So there’s just so many people that are working on this, and I just want to make sure, sort of, everyone on this team, sort of [inaudible 00:07:48].
JS: Yeah. Sorry, before you dive in, I assume that’s not just the people on the graphics team. Does that include like other reporters and folks from the health department sort of weighing in and helping give your team the perspective that you need to make sure you’re representing the data in the right way?
CS: Yeah, absolutely. It’s really a cross department, like, multiple desks working on this, we sort of have people from all over the newsroom contributing to this project in collecting data, in figuring out what the data means and reporting out stories based on the data. So yeah, it’s a huge, it’s like a massive sort of cross newsroom effort, sort of similar to what might happen for an election project or something like that, where it’s just, it’s such a big project that we have people from all over working on it. So in terms of sort of the history of this project, we started tracking coronavirus cases in the United States in, in, I believe, late January. And this started out as literally a Google spreadsheet where people would go in, and every time there was a new case, would add a row to the spreadsheet, and were reporting out this data very, very manually. And we sort of very quickly realized that that was just not tenable for this project as the virus started to spread. And so, this is a part where we sort of worked with the team that does sort of more database development at the Times, and they helped us build a system to sort of more robustly in a real database sort of track these cases as they’re coming in. And this was just data that didn’t really exist in this sort of unified form anywhere, we were tracking from every county; and part of this was that we needed to make it so that the data from different places in the country were comparable. So because things were being reported at the state level, different states would report things in different ways. 
So for example, some states might not include cases from people incarcerated in that state in their totals, and so, we thought that those should go into the state totals, and so we did that work of adding those numbers that we reported out to the state numbers and keeping those things sort of consistent across states, and also just doing lots of reporting on the way that states were reporting cases, whether they were reporting only confirmed cases or also suspected cases, and the same with deaths. And so, it was just like a huge reporting effort to collect these, and we started mapping these around that time, I think in early March, we published the first US coronavirus map, and it was a pretty simple map, it was just like circles over counties to show how many cases there were there. I remember in the first map, actually, we highlighted states when they had had a case, to show where in the country the virus had been, and that feature quickly became sort of obsolete as all 50 states had the virus. And so, it’s just sort of been this continuous shifting of responding to the changing nature of the pandemic and responding to that and how we are graphing it.
CS: Yeah. So then, sort of later on in March, we started building out more maps, we made the map fully interactive, we added lots more charts. We started with just sort of this basic, like, bar chart of cases per day, and we started building out sort of different views of that, with more detail, like showing the seven-day average of cases, as this went on, and you need to know sort of trends, and not just daily figures. We added what we call our sort of curve grid, which is the section that’s like where cases are going up and where cases are going down. And then we started thinking about other map views. It got to a point sort of in April-May, where, for a while, the virus had really been, sort of epicenter was New York City and New York, and there got to be a point in late spring of 2020 where New York had gotten the situation somewhat under control and the virus was spreading rapidly in other parts of the country. And at that point, we realized that the focus needed to be not so much on case totals, but on what’s happening right now in my area. So it’s not just about which places had the most cases because, in that map, New York always had the biggest bubble, because it’s been so bad. But that didn’t mean that things weren’t bad in other places then.
And so, we started playing with different ways of showing that. So we tried different versions of the map showing – we did a version of the map for a little while, that color coded things by whether cases were going up or down, like, daily case numbers, sort of a change from the week before, and how steeply they were going up. And that was good for a little while, but could be kind of confusing we found. And we worked on organizing this curve grid by showing where cases are increasing, and then also, we changed it to show not just where cases are increasing, but sort of adding this other dimension of where cases are high and increasing, and where cases are still high but going down, because those are sort of, like, just because something is going down doesn’t mean things aren’t bad there. So just trying to get as much information as we could. And what we ended up doing for the map was we settled on sort of showing the number of cases per capita in the last seven days and doing sort of a choropleth map based on that, and we thought that was a pretty good way of showing how bad are things near me right now compared to elsewhere in the country. And so that sort of what the map has largely been, since I think, late May or early June was when we made that change.
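As a rough illustration of the measure Charlie describes here (cases per capita over the last seven days), the following is a minimal sketch, not the Times' actual code. The function name, the per-100,000 scaling, and the two example counties are assumptions made up for this example.

```python
def cases_per_100k_last_7_days(daily_cases, population):
    """Sum the most recent seven daily counts and scale to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    recent = daily_cases[-7:]  # the last seven days of new cases
    return sum(recent) / population * 100_000


# Hypothetical counties (invented numbers): daily new cases and population.
counties = {
    "Large County": ([900, 950, 1000, 1100, 1050, 980, 1020], 8_000_000),
    "Small County": ([40, 35, 50, 45, 60, 55, 65], 50_000),
}

rates = {
    name: cases_per_100k_last_7_days(cases, population)
    for name, (cases, population) in counties.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} cases per 100,000 in the last seven days")
```

In this made-up example the large county has far more total cases, but the small county has the much higher recent rate, which is the "how bad are things near me right now" comparison a per capita choropleth is meant to surface.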
JS: So when you are going through these iterations, you had said earlier that you had found that this quite didn’t work, and maybe this did work – when you say that you found that, was that just a collaboration within the newsroom or were you asking people, asking your Times readers to help you all understand what they want, what works well for them, or was it more this huge team of people in the Times newsroom saying, yeah, the pandemic’s sort of moving in this direction, we can see things spreading across the states in this way, and this is not really representing the data as clearly as we want?
CS: Yeah, it was sort of a combination of those things. We spent a lot of time thinking about this internally, and sketching lots of different possible ways of mapping and charting these things, and doing just lots of sort of experimentation to try to find the best way of showing this and just discussing internally with people both on graphics and other reporters and other desks who are covering the story, kind of like, what’s the best way that we can be showing this and what’s most important to get across to people right now, but at the same time this is a piece that lots of people look at and lots of people look at every day, and so we do get lots and lots of reader feedback on these pages and are definitely responsive to that. If we see people are interested in seeing certain things or are confused about the way we’re showing things, like, we definitely take that into consideration throughout this whole process.
JS: Can I ask, this is a little bit of a like maybe too inside baseball, but I’m curious. I mean, the Times is read by millions of people around the world, presumably, you’re getting thousands or hundreds of thousands of comments from people – how does that work? Is there a team that’s going through those comments, and are they feeding them to you, the ones that they seem relevant? And then, if there’s enough, presumably there’s so many of them, are you actually trying to quantify or visualize those comments, so that you can help improve the tool? I mean it’s its own kind of data. Right?
CS: Yeah. I’m a little limited in what I can say about specifics of how those sorts of systems work. But yeah, it’s definitely something that we’re responsive to. We’re seeing what readers are asking about these things, and it’s definitely something we like, we think about and talk about during meetings, when we’re planning out how to do these things.
CS: Another thing on the mapping design front, and sort of the way things have changed that I wanted to talk about is that sort of early on in the pandemic, when the virus was largely hitting cities like New York and sort of large cities, and especially on the East Coast, we made a decision on the maps to add what’s called a dasymetric filter basically, so that we were only showing areas on the map above a certain population. And so we were showing counties, but we had this sort of filter layer on top based on census block groups, and we were not filling in areas that very few people lived in. And the idea behind that was basically that counties in the west are geographically much larger, generally, than counties in the east; and that there were, in many of those counties, sort of, contained outbreaks in things like prisons or in meatpacking facilities. And we realized that if we filled in the whole county in that color, it gave the impression that the virus was very widespread in this place, when that wasn’t really the case. And so, we wanted to sort of have a visual filter to say, yes, the virus is here, but it’s not like the entire state of Nevada is overrun by the coronavirus. And so, we had this filter on the map. And I think for a while that was a really useful sort of visual tool to indicate that, but then there was this point in November around the time when we made these other map changes, where basically what had happened was that for a long time the virus, like I said, was sort of concentrated in East Coast cities, and then moved to the south, and there was a point in November where the Midwest started to get hit really, really hard by the virus.
Michigan, for example, had been seeing through much of the summer, had been seeing 700 cases a day; and in November-December started seeing 7,000 or 8,000 cases a day; and same with South Dakota, which had gone from 100 to 1,500 cases a day. So just this huge increase in cases in a lot of these Midwestern states. And so, that kind of caused two problems for our graphics. The first was that the states had sort of maxed out the scale we were using on the map. So you just couldn’t see any variation anymore, and everything was just solid red. And that sort of, on the one hand, things were very bad there, and so having everything be red was not wrong, but it’s also not especially useful, if you live there to see how is my county doing compared to other counties around it, and that’s still information we wanted to get in there. And it’s always tough to make changes to these scales, because this is something that people look at every day, and people sort of become very used to the scales and can kind of identify, like, they see the red value, and they know that things are bad, and we didn’t want to just shift the scale down so that everything that was red became orange again, because people might see that and be like, oh, things are better, which was not the case.
So what we did instead was we added more values on top of the scale, we actually extended the color range into the sort of dark purple, and extended the values and sort of did that not just linearly, but so that the maximum values were pretty high, and that sort of allowed us to get a little more range in the scale and allow for some of that, you know, you can see the geographic variation in those places. And the other thing we did was, at that point, we decided to remove that dasymetric filter and just show all of every county, and the reason we did that was because the virus had moved from largely urban areas to rural areas. And so, these areas were being hit really, really hard by the virus, but they were not showing up, you know, they were just showing up as a small little, like, the one city in that county was showing up on the map. And we realized that that was sort of misleading in the opposite direction that we had initially intended this to go. And this was another point where we sort of had some reader feedback from people who lived in the Dakotas and places like that, where they’re saying the virus is really bad in my area, but it’s not showing up clearly on your map, what can you do about that. So that was when we decided to remove that layer, and we’ve kept the map like that since then, yeah.
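The scale change described above (adding new classes on top rather than shifting the existing ones) can be sketched roughly as follows. This is only an illustration under stated assumptions: the break values, color names, and the `color_for` helper are hypothetical and are not the Times' actual scale.

```python
from bisect import bisect_right

# Hypothetical class breaks (recent cases per 100,000) and color names.
ORIGINAL_BREAKS = [10, 30, 50, 70]
ORIGINAL_COLORS = ["pale yellow", "yellow", "orange", "dark orange", "red"]

# Extend the scale upward instead of re-dividing it: the old breaks are kept,
# and the new, non-linearly spaced breaks only split what used to be solid red.
EXTENDED_BREAKS = ORIGINAL_BREAKS + [100, 250]
EXTENDED_COLORS = ORIGINAL_COLORS + ["dark red", "dark purple"]


def color_for(rate, breaks, colors):
    """Return the color of the class that `rate` falls into."""
    return colors[bisect_right(breaks, rate)]


for rate in (20, 60, 120, 300):
    before = color_for(rate, ORIGINAL_BREAKS, ORIGINAL_COLORS)
    after = color_for(rate, EXTENDED_BREAKS, EXTENDED_COLORS)
    print(f"rate {rate}: {before} -> {after}")
```

The design point is that every value below the old top break keeps the color readers already associate with it; only the places that had maxed out the scale gain new distinctions.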
JS: So hypothetically speaking, if you were working on a project where you had a similar scaling issue, where the line or whatever, the value sort of punched its way through the maximum of your scale, but it wasn’t the sort of dashboard where you thought people were checking it every day, or, it’s not, you know, which [inaudible 00:21:14] it’s not life threatening, knowing how many goals the Caps scored last night isn’t life threatening, but knowing how many infections are in my county is potentially life threatening. So if this wasn’t that sort of dashboard, do you think you would have taken a different approach to that color challenge?
CS: Yeah, I think absolutely. I think this is a really unique project in the way that people interact with it. It’s not a news story where it publishes once and a lot of people read it on the day that it publishes, or in the week that it publishes, and then it sort of has this very quick drop off of readership. This is very, very consistent readership of people coming back to this every day, every week, and checking it. And, like I said, when you check something this often, people sort of begin to associate these colors with the specific situations. And so, it’s hard to make those changes, and to require people to sort of adjust their own mental model of how they understand these graphics. So yeah, if this was not a story that people were checking all the time, I think we probably would have just adjusted the breaks on the scale, and that would have been that – it sort of requires you to think a little differently of how you’re making changes. Another really important part is just signaling very clearly that we are making changes, even when we adjusted that scale, we included a big note in a box on the top of the thing that’s like, we adjusted the scale, and here’s the reasons why we had to do that. And that was when I posted that tweet thread too, sort of explaining that that was part of just like trying to get this message across of, like, what we changed and why we felt like we needed to change things at this point.
JS: Does that notice still sit there, or, was there a period of time where, like, okay, so this has been existing for X number of days, and that’s probably enough?
CS: Yeah, it was there for a few weeks, I think, and then we pulled it. But we still do things like that for other changes, like, for example, counties in Iowa, the State of Iowa recently stopped reporting data at the county level in the way that we need it for these maps, and is only reporting that sort of data at the state level. And so, we started showing Iowa on the maps as just a state where everything else is just counties, Iowa you hover over, and it’s just the State of Iowa. And so that’s another thing where we really want to be clear about the messaging about why we’re doing that, and that is not just like, we forgot about Iowa counties, it’s like we’re responding to these ongoing changes in the data collection, and in the state of the pandemic.
JS: Yeah. And I seem to remember early on that there was a day where there was a big data dump, and so there are these big spikes in multiple places. And I remember there being a big note that says, February 25th is not really representative, because it was like a big data dump that day.
CS: Yeah, totally. We call them anomalies in data reporting. The one thing about this is that this is a very messy dataset. It’s just, it’s reported from so many different places, and it’s inherently a hard thing to capture data for, and it’s hard to track this data, and we’re really trying our best to make this the most, as useful as it can be, but it is just, it’s a tough data source to work with. And a lot of the times states might, for example, some states just don’t report data on weekends, and so Mondays will have a larger spike than other days. That’s why we use the seven-day average in most places to sort of smooth that out. And there are times when states will have sort of a backlog of tests that were never logged or reported, and will report those all at once, and they’re from some indeterminate dates in the last month, and it’ll just show up as a spike. And so when we know that that’s happening, we have a team that does a lot of reporting around this data, not just collecting what states publish on their website, but also reporting on why is there a spike on this day, and when we’re able to identify the reason behind that, we have a system where we can sort of flag that day as an anomalous day and include a note with that, and have it highlighted in a different color on our charts and have a little arrow pointing to it, saying, this day is an anomaly and that there were not actually four times as many cases on this day as the rest of the week. We think doing that sort of stuff is really important, because we don’t want people to get the wrong message about the real state of the pandemic from these artificial spikes and dips in the data.
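To make the two ideas in this answer concrete (smoothing with a seven-day average, and attaching a note to a day known to be anomalous), here is a minimal sketch with invented numbers. The trailing window, the `anomalies` lookup, and the row layout are assumptions for illustration; the interview does not say how the Times' system stores these flags or whether flagged days are treated differently in the average.

```python
def seven_day_average(daily):
    """Trailing mean over up to seven days ending at each day."""
    averages = []
    for i in range(len(daily)):
        window = daily[max(0, i - 6): i + 1]
        averages.append(sum(window) / len(window))
    return averages


# Invented daily counts: no weekend reporting (days 2 and 3), a catch-up
# Monday (day 4), and a backlog dump (day 8).
daily = [100, 100, 0, 0, 200, 100, 100, 100, 500, 100]

# Hypothetical anomaly flags: day index -> explanatory note for the chart.
anomalies = {8: "Backlog of older cases reported at once"}

averages = seven_day_average(daily)
rows = [
    {"day": i, "cases": cases, "avg7": avg, "note": anomalies.get(i)}
    for i, (cases, avg) in enumerate(zip(daily, averages))
]

for row in rows:
    marker = f"  <-- anomaly: {row['note']}" if row["note"] else ""
    print(f"day {row['day']}: {row['cases']:>4} cases, 7-day avg {row['avg7']:.1f}{marker}")
```

In this sketch the flag is used only for annotation, so a chart could highlight the day and show the note while the raw count stays visible.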
JS: Yeah. Right. So can you talk a little bit about the overall user experience of the visuals, because in some of the views, it’s a little bit more about comparing my state to another state and my area to another area, and other views are, here’s just your area, here’s just Virginia or New York State, so how do you all think about balancing those two types of users, some users probably want to make a comparison, and some users just want to know, like, is it okay for me to go to the restaurant today?
CS: Yeah, it’s like, I think there’s a range of reasons that you look at these dashboards, I think the main sort of US dashboard page with a big map at the top is really useful for just getting a picture of how is the United States in general doing in the coronavirus. We lead with sort of the curve, like, the waves of the virus, like, that main chart at the top, that’s become sort of a symbol of how well are we doing compared to the spring, compared to the peak in the winter. And we lead with this map, that’s just sort of the big map of where is the virus worst in the country right now. And so, those pages are useful for just getting sort of an overall picture of the virus in this country, and also we have the world page too where you can see sort of that, and like the global scale of like where is it worst in the world right now, and we do also publish these pages for sub-national geography for some other countries too, when we’re able to get that data. So those are good for just sort of giving an overview of how is the virus affecting this place right now. But we also know that people want information about where they’re living, like, this is not just data that’s interesting in the abstract, it’s very useful specifically to people’s lives and day to day decision making.
And so, this is something we’ve been, you know, we’ve always had a page for every state in the country, and that sort of was the first level of personalization that we had, that you can go to your state and see how are coronavirus cases in New York right now, and we actually have a page for New York City specifically, because we’re able to get zip code level geography for there. But more recently, in the last five months or so, we’ve been focusing more on very detailed personalization. So, in late November, we published a page that lets you search for counties that you’re interested in and create sort of a personalized dashboard of counties interesting to you. And so, you might look up the place where you live, and the place where your parents live, and where your siblings live, and just have these sort of locations that are relevant to you in there. And that just sort of gives you the very basic information of how bad are things right now, what direction are things trending, how are things compared to the peak of the virus, just a sort of like, at a glance, core information. And we also have that in a newsletter where you can get your personalized places, just that very simple data sent to you. And that’s very useful for people.
And then also, in December, the Department of Health and Human Services started releasing very detailed hospital data on, like, at the specific hospital, single hospital level of COVID data. So that includes things like what percent of ICU beds are available, how many COVID patients are there, what share of the patients there are COVID patients. And so, we started publishing, we initially published a map showing that at the hospital service area level, which is just sort of a kind of small geographic unit that usually has between like one and 10 hospitals in it. And just showing like a choropleth map of how bad things are, and then we wanted to get even more detailed and show at the individual hospital level, how bad are those specific hospitals in my area. And so, we made a map that lets you search for your address and shows you the hospitals closest to you, and we kind of used a new sort of visual tool in that, where you can sort of pan around the map and there’s sort of a column on the left hand side that updates some sort of summary statistics of the hospitals you’re looking at, sort of, like, if you can imagine, panning around in Google Maps and seeing the places that you’re looking at update in that little sidebar, it’s similar to that idea.
And then, the most recent step we’ve taken in sort of the personalized view of this is that earlier this year, we started publishing a page for every county in the United States. So we’re now publishing over 3000 tracker pages, multiple times every day, and that was a sort of large effort to make that technically feasible to make things fast and efficient enough to be able to update these and publish these. And those pages focus, you know, they give you sort of the overview data that the rest of the tracker pages do, but they also focus on, specifically, on risk – we worked with health researchers at Johns Hopkins to determine a way to calculate sort of risk levels based on a number of factors, cases, testing and things like that. And then based on those risk levels, we give very specific advice on what activities are safe and not safe, and how people can protect themselves based on the current situation in their area. And so, we’re really trying to give people information that’s not only interesting, and gives a picture of how the virus is doing nationally, but also very useful and actionable, that is, instead of saying, there’s been 50 cases in your county today, you might not know what that means, or, it’s like, what do I do with this information. But if instead we say, your county is at an extremely high risk level, and here are the things that you should and shouldn’t do right now, that’s information that we think is very useful to readers. And that’s kind of what we want to be focusing on.
JS: Yeah. I mean, when I go to the tracker page, and I don’t know if this is because I’m logged into my account as a subscriber, or, it’s because I’ve done a search for my county in the past, or, it’s the IP or whatever it is, but definitely when I get to that part, it says, you know, right now, it’s saying that my county is at a very high risk and it’s like red and bold in that little teaser. And when you click over there, you get basically this big headline that’s like in all red, very high risk. So it’s definitely like pointing my attention to my area.
CS: Yeah, I think you talked about, like, on the homepage, that’s something that we’ve been, you know, a lot of design work has gone into that little homepage widget of showing, like, it’s not just we don’t just have the case tracker pages, we also have lots of other tracker pages for vaccines and for colleges and nursing homes and metro areas. And we have many, many different trackers that we’re publishing, and so, we want to be able to use that homepage widget to highlight the things that are interesting, or that are recently updated, or that are specific to your location; when we’re able to get that location or have you search for it, we want this to just be a very quick and useful little bit of information, like, those little sparkline arrows on the homepage dashboard just to show how are things now, how are they compared to last week. Those are just sort of pieces of information that are everyday sort of useful and interesting for people.
JS: It’s interesting in a lot of ways, because it almost feels like the trackers are more of a public health resource than news reporting, in sort of the traditional sense of here’s the story of what’s going on in this particular county. It feels like you have all taken this as almost your responsibility as experts in this field of data communication, and being able to wrangle the data and the technology to provide this resource that one might think that the CDC or the Department of Health and Human Services would have on their webpage, but it seems like you’ve all taken this responsibility of being this gatekeeper in some sense of this coronavirus information.
CS: Yeah, and we definitely are very aware that people use this as a resource, and we want to make it as useful as possible. The other thing we do is that every day we publish this data on a GitHub page, so that folks are able to use it for research or for their own analysis. We want this data to be out there and for people to use it and make use of it. And we also, like, this is definitely a sort of resource both for the public and for us inside the newsroom, like, we report stories based on this data, and based on these dashboards every day. We have people working on just looking at these numbers and seeing interesting trends, and using both the sort of visualizations we’ve created, and also just the sort of raw data. And we have teams, both on graphics and on other desks that are using this data all the time for reporting, and for doing the more sort of traditional journalism with this information. And so, it’s great to be able to support both the sort of public need for this information and the ability of the Times to publish work based on this.
JS: Yeah, I wanted to ask you a couple more quick questions. Well, I don’t know if this first question is that quick, but I have one quick question before we end. When you and the folks you work with started creating these trackers, did you have a different feeling or a different approach to creating this tracker and creating the visualizations in the sense that, like you just mentioned, people are using this as a resource, and it’s potentially life-threatening information, right? I mean, this virus has killed more than half a million people in the United States alone. It’s not like sports scores or the stock market, which is interesting, and it’s useful for lots of people, but not going to affect my personal health or the health of my family in a direct sense. So, do you feel that way, do you approach it that way, or, do you try to put that aside and say, it’s another news story that we’re working on?
CS: No, I mean, I definitely think like, it’s definitely hard to ignore the sort of implications of this data that we’re working with, like, there have been times where, when we’re approaching one of these big milestones of case numbers or death numbers that we’re updating this and we want to be able to update it so that people are aware of these milestones, but it’s also like, it’s terrible to watch these numbers go up in real time as we’re pulling in this data. And it’s definitely like, it’s sort of hard to ignore the fact that these are real people with a terrible virus. And so, I think, we just have to kind of focus on communicating this information, like, in as clear and as useful a way as we can. It’s a news story, and it’s important information that we have to get across. One thing that we’re doing in this project actually that I think is interesting is that we offer these pages in both English and Spanish, so we have Spanish translations for these pages, which I think is really important, because this is such crucial information, and we want as many people as possible to be able to access it, and that was sort of a challenge in building this, in that none of the text on these pages is hard coded anywhere, like, we built a whole system to be able to swap out the translations for everything, from the main copy of the story to the individual chart labels, which are all translated too. And so, we thought that was really important as part of the sort of service that we’re doing.
On a sort of technical level, an interesting thing that sort of makes us different from a traditional news story is that there’s copy on these pages, like, there’s charts as well as sort of explanatory copy, but a good chunk of that copy is actually generated, like, we have scripts that we run that generate sentences based on the data, and basically that lets us keep this copy for all these pages fresh and up-to-date with the latest information without having an editor have to go through it, update the copy of 3000 pages every day. And so, it lets us sort of communicate this information, not just in charts, but give sort of a little more explanation, but in a way that’s like sustainable long-term for us to keep these updated. I think that’s been a big sort of constant challenge in this project is just keeping this in a way that, as this project grows, and as the data grows, and the number of pages we’re publishing grows, just being able to keep on the cycle of updating four times a day, and keeping the data as fresh as possible, is something that we’re constantly thinking about and thinking of ways we can improve, and we’re making changes right now to slim down the sizes of files that we’re sending to users, so that the page loads faster on your phone. We just want people to be able to get this data and this information sort of as easily as possible, in a way that’s as useful to them as possible.
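[Editor’s note: The generated copy Charlie describes, together with the swappable English and Spanish text he mentions just before, can be sketched roughly as below. This is only a minimal illustration under stated assumptions, not the Times’ actual system: the function name `generate_sentence`, the `PHRASES` table, the wording of the templates, the two-week comparison, and the 5 percent "flat" threshold are all made up for the example.]

```python
# Hypothetical phrase table: every string lives here, keyed by language,
# so nothing is hard coded in the sentence-building logic itself.
PHRASES = {
    "en": {
        "sentence": "{county} reported an average of {avg:.0f} cases per day "
                    "over the past week, {trend}.",
        "up": "up {pct:.0f} percent from two weeks ago",
        "down": "down {pct:.0f} percent from two weeks ago",
        "flat": "about the same as two weeks ago",
        "no_baseline": "with no cases two weeks ago to compare against",
    },
    "es": {
        "sentence": "{county} registró un promedio de {avg:.0f} casos diarios "
                    "en la última semana, {trend}.",
        "up": "un {pct:.0f} por ciento más que hace dos semanas",
        "down": "un {pct:.0f} por ciento menos que hace dos semanas",
        "flat": "casi igual que hace dos semanas",
        "no_baseline": "sin casos hace dos semanas con los que comparar",
    },
}


def generate_sentence(county, avg_now, avg_before, lang="en", flat_pct=5.0):
    """Build one sentence of copy from two daily-average case counts.

    flat_pct is an assumed cutoff: changes smaller than this (in percent)
    are described as roughly unchanged.
    """
    phrases = PHRASES[lang]
    if avg_before <= 0:
        # No baseline to divide by, so skip the percentage entirely.
        trend = phrases["no_baseline"]
    else:
        pct = (avg_now - avg_before) / avg_before * 100
        if abs(pct) < flat_pct:
            trend = phrases["flat"]
        elif pct > 0:
            trend = phrases["up"].format(pct=pct)
        else:
            trend = phrases["down"].format(pct=-pct)
    return phrases["sentence"].format(county=county, avg=avg_now, trend=trend)


if __name__ == "__main__":
    # Run once per county and per language each time new data arrives.
    print(generate_sentence("Example County", 120, 100, lang="en"))
    print(generate_sentence("Example County", 80, 100, lang="es"))
```

[Because the sentence is rebuilt from the numbers on every run, refreshing the copy on thousands of pages is a loop over counties rather than an editing task, and adding a language means adding one more entry to the phrase table.]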
JS: Yeah. I think there’s a whole technology side of this tracker that I think is fascinating for lots of different people for lots of different reasons. I think, a few years ago, having a copy where the numbers in the copy would update automatically was a very new thing, and people were very excited about how the graph and the data and the text actually interact with one another, and you all have sort of done that. You’ve sort of taken that all the way to the, I don’t want to say, the endpoint, because I don’t know what the endpoint is, but you’ve taken that all the way where it’s basically automating this whole thing. And this is a really interesting technology that I think, I would suspect that many DataViz folks would be interested in doing in their own work, because it does allow you to more integrate those two things together.
CS: Yeah, I think actually, in some way because we, at the Times, have done sort of elections, live elections results pages for so long, and have a lot of institutional experience working on those sorts of things, I think obviously, it’s a very different set of data and a different set of needs that you’re communicating. But those generated sentences were something that we had started using in the primaries this year, and that just seemed like a natural thing to bring into these coronavirus dashboards, and a lot of those sort of mapping techniques from a technical perspective, are things that we pulled in from these other sorts of experiences in making these sort of large dashboards that are the sort of thing that people check regularly and have new data coming into regularly. And so, I think that was sort of a useful background to be able to do this, especially, on such a tight timeframe, when we were first getting these pieces out, and it was like, there was a very high sense of urgency, especially a lot of us being located in New York, that things were so bad here that we really needed to get these pages out.
JS: Yeah. Okay, I’ve got one last question for you, you might not be able to answer this question, but we’ll see. [inaudible 00:40:50] might say all of it. So when I go to the main Times tracker page, there are, more or less, just boil down to three main graphics, there’s the map, county level map, and as you noted, not for Iowa right now, but county level map; then there’s a series of small multiple, little line charts for cases getting higher, going lower for all the different states; and then the third sort of main visual is these little stripe charts for each state and sort of this really, what I think is a really cool table. So from those three main sections, do you have a favorite part of the, from a data, DataViz perspective, not from the impact of the virus, but from a DataViz, technology perspective, do you have a favorite part of the page?
CS: So on a sort of personal level, I mostly work on the maps, and so, I think the maps have been very useful, especially, at different points of the virus in showing, just in highlighting, like, where in the country, things are worse and I think that’s a very easily understandable visualization. I do think the small multiples, the sort of curve grid, we call it, is maybe one of the most useful things on the page, in that it shows at the level of each state, it shows both the curve of cases in that place, but also, I think the way that we group them is extremely useful, and this was something that I talked about earlier, where it’s not just like the direction, but it’s grouped by both direction and sort of how bad things are, and I think people, you know, there was a time when there was just no states in, like, where cases are low, maybe the US Virgin Islands, was in that, but there was hardly any places in the cases are low area. And that was something that I saw lots of people tweeting about and responding to that it was just bad everywhere at that point, and everything was going up, and it was really sort of striking just to see sort of how these move around between those sections. And so, I look at that section a lot, I think the table is useful as like [inaudible 00:42:57] I’m just going to say all of them, but I do think they’re all useful for different reasons. I think the table is…
JS: Yeah, [inaudible 00:43:03] for different reasons for different people doing different things.
CS: Yeah, that’s really the big part of this is that we know, you know, we have a very large audience that’s looking at this and that everyone sort of has different needs and different interests, whether it’s like you want to know, is it safe to see friends outside or not, or, just trying to figure out your day to day decision making, versus someone who’s looking at this to get a sense of how things are doing in the whole country versus epidemiologists who might just want the data. And so, we publish the data separately from the whole visual presentation of it. I think we have spent a lot of time thinking about how each piece functions, and sort of what the different needs of different users are. The one other part that you didn’t mention is the very top chart, that’s just this sort of curve and those top figures, and I think, as a sort of, I feel like this curve has just become sort of like a symbol in a lot of people’s minds of the virus and just sort of like these different waves of it, and you can kind of like, I know, at least for me, I can remember, sort of, memories of what things were like at the top of that second wave or at the first wave in April, and sort of have a lot of associations with this curve specifically. So I think all the parts of it kind of function together to make a tool that we think is pretty useful for people, and we hope that people are able to use to get the information that they need.
JS: Yeah, that’s great. Charlie, thanks so much for coming on the show. I mean, we covered a lot of ground and it is a remarkable project, and congrats on working on that. And hopefully, you’ll get to maybe work on other things at some point.
JS: [inaudible 00:44:53]
CS: I hope so.
JS: Yeah. Thanks for coming on the show, I appreciate it.
CS: Yeah, thank you for having me. It was great talking to you.
And thanks to everyone for tuning into this week’s episode of the show. I hope you enjoyed that. I hope you’ll take a look at all the links and resources that I put in the episode notes of this week’s episode of the podcast, and go check out the New York Times COVID-19 dashboard so you can see all the things that we talked about in this discussion. So, until next time, this has been the PolicyViz podcast. Thanks so much for listening.
A number of people help bring you the PolicyViz podcast. Music is provided by the NRIs, audio editing is provided by Ken Skaggs and each episode is transcribed by Jenny Transcription Services. If you’d like to help support the podcast, please share it and review it on iTunes, Stitcher, Spotify or wherever you get your podcasts. The PolicyViz podcast is ad free and supported by listeners. If you’d like to help support the show financially, please visit our Patreon page at patreon.com/PolicyViz.