Visualizations consist of two components. There is the space where the data is displayed, and then the objects that represent the data. So a lot of this course is going to be about the data itself, right? How to think about it, how to display it, how to make it compelling, etc. In this video, we're going to concentrate on the space where the data is displayed. We're mostly going to be looking at 2D planes here, although there is such a thing as 3D visualization also, but that's outside of the scope of this conversation. The point is that the size of that plane is very important.
The scale drastically affects the visualization and the story that it tells. So here we have a bar chart with some interesting and easy-to-see patterns, right? What's going on? A is doing well, B is doing pretty well, and the rest are doing less well. Here we have another chart. No one's doing that well in this one, right? A is sort of slightly better than the rest, but they're all kind of doing pretty poorly. It's also hard to see the variation between the groups, D, E, and F look identical here. In this chart, all of them but A and B are doing really pretty poorly. Now maybe you noticed that these charts are actually all the same. If you look at the Y axis, you'll see that the data values are all consistent, it's just the scale that's changing.
So in the left-hand chart, the scale goes from zero to 250, in the right-hand chart, it's from 100 to 240, and the middle is from zero to 2,000. And that dramatically affects your perception of the data that you're looking at. And this is really true with all charts. There's no way to display data visually and divorce it from the scale in which it's being displayed. But you might ask how you can know what scale to set for a chart. Which one is right when you look at these three charts? Which is the correct way to do it? It sort of depends, but there are definitely some good rules to follow, and in this particular example, the one on the left is, quote unquote, "right," and by the end of this video, you'll probably agree.
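To make the effect of the three scales concrete, here is a minimal sketch that computes how much of the plot height each bar fills under each axis range. The axis ranges come from the charts described above; the data values are hypothetical, picked only to resemble the pattern in the example (A high, B fairly high, D, E, and F close together just above 100), and the names `values`, `scales`, and `bar_fraction` are illustrative.

```python
# Hypothetical values, chosen only to resemble the chart described above.
values = {"A": 230, "B": 200, "C": 150, "D": 120, "E": 115, "F": 110}

# The three y-axis ranges mentioned in the transcript.
scales = {"left": (0, 250), "middle": (0, 2000), "right": (100, 240)}


def bar_fraction(value, lo, hi):
    """Fraction of the plot height a bar fills when the axis runs from lo to hi."""
    return (value - lo) / (hi - lo)


def fractions_by_scale(values, scales):
    """Map each scale name to the rounded height fraction of every bar."""
    return {
        name: {k: round(bar_fraction(v, lo, hi), 3) for k, v in values.items()}
        for name, (lo, hi) in scales.items()
    }


for name, fracs in fractions_by_scale(values, scales).items():
    print(name, fracs)
```

With these assumed values, every bar on the zero-to-2,000 scale fills well under an eighth of the plot, which is why the groups become hard to tell apart, while on the 100-to-240 scale F nearly disappears.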
So, the first thing to consider is, am I using a chart type that requires a certain approach to scaling? And for the most part, the answer is no, but there are some chart types that do have specific rules to follow. So, as an example, bar charts really should always start at zero. And here's why, the height of the bars in a bar chart actually means something. So if you look at the right-hand example, if you look at F, for instance, that data point looks like it has practically no data in it. But the fact is, we're already above 100 at this point.
The left-hand example is an accurate representation of it. I can see that F has a bunch of data, it's above 100, it has less than A maybe, but that missing data in the right-hand example just isn't fair, it just doesn't accurately represent how much data is in that F category. There are plenty of other chart types that don't have to start at zero, but bar charts really always should.
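One way to see why a non-zero baseline misleads in a bar chart is to compare how tall two bars are drawn relative to each other. The sketch below is an illustration only: the two values are hypothetical stand-ins for A and F, and `apparent_ratio` is a made-up helper name.

```python
def apparent_ratio(a, b, baseline=0):
    """How many times taller bar a is drawn than bar b for a given axis baseline."""
    return (a - baseline) / (b - baseline)


# Hypothetical values standing in for categories A and F.
a_value, f_value = 230, 110

honest = apparent_ratio(a_value, f_value)          # axis starts at zero
truncated = apparent_ratio(a_value, f_value, 100)  # axis starts at 100

print(f"baseline 0:   A is drawn {honest:.1f}x as tall as F")
print(f"baseline 100: A is drawn {truncated:.1f}x as tall as F")
```

Under these assumed numbers, A is really about twice F, but a baseline of 100 draws it 13 times as tall, which is the kind of distortion the zero-baseline rule for bar charts guards against.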
The second question to consider is, am I comparing things in a self-contained context, within the context of the chart? So, in this example, let's say you have a chart that has no external reference, there's no need to put these numbers in the context of other numbers outside of the data that I'm actually displaying. So here, we're looking at widget sales for a company. And I actually generated this chart in Excel, and it automatically exported it with a scale that started at zero, and went up to $12,000. But as you can see, the numbers are only from $5,500 to $10,000 or so. So since I don't need to compare this to any other numbers, and I'm not in a bar chart, I don't need to show the numbers below 5,500, I don't need to show the entire scale. I'm just showing a change over time. So I can actually set the scale here to 5,000 to 10,000 if I want to, right?
I have a nice round top and bottom number on my graph, and I'm telling a complete story. Or, I could actually add the exact same value below the minimum and above the maximum, so that there's the exact same buffer at the bottom and the top of this chart. So this is actually the most balanced visually, right? There's exactly the same number of pixels above and below the line. But it's kind of weirder numbers on the axis, right? 4,472 to 10,472. I'm not a big fan of this approach. More importantly, research has shown that people are much better at remembering round numbers, so using random-seeming non-round numbers like these won't help your audience understand or remember your data. I could also set essentially an arbitrary scale.
So let's say that I'm a sales manager at this company, and I have a sales target for my people of selling $20,000-worth of product, and I want to show these numbers in the context of that reference. I'm manipulating the scale for a valid purpose. I'm not intending to influence the perception of the data for evil purposes, but within the valid context of this sales target, I'm just showing people where they stand. It's okay to change scales for good reasons, but you really do have to have a solid reason and you have to be consistent with that. Another influence on scale is whether or not you have an external reference, right? You have some arbitrary number outside of the context of the data that I'm referring to. So here we have the same chart, and this is very similar to the last example, where we were showing it in the context of an internal sales target. Now sometimes you might want to set the scale based on an external target, right?
This is not about showing my salespeople how they're doing compared to a target that I set, but it's more about an external number, right? This is the total widget sales in the entire marketplace, so I want to show them within that context. It's a very similar motivation, it's a very similar thing, I'm just sort of setting context and comparing it to a number, but if it's external to you and external to your data, you have to do it that much more carefully and thoughtfully so that you avoid the perception of bias on your part. And speaking of bias, that's always a great question to ask yourself: am I being fair and unbiased? Especially when thinking about scale, you have to think about this very, very carefully. You don't want to be that guy, right? You don't want to put on a suit and look all pretty and legitimate and then play games behind people's backs. No one likes a cheater.
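The scale choices discussed for the widget sales chart (round limits, equal padding, and a scale stretched to include a reference number) can be sketched as three small helpers. This is only an illustration: the function names are made up, and the maximum of 9,444 and the padding of 1,028 are assumed values chosen so that the padded limits reproduce the 4,472 to 10,472 axis mentioned earlier; the transcript itself only says the data runs from about $5,500 to about $10,000.

```python
def round_limits(lo, hi, step):
    """Snap limits outward to the nearest multiple of step (integer inputs)."""
    bottom = (lo // step) * step
    top = -(-hi // step) * step  # ceiling division for integers
    return bottom, top


def padded_limits(lo, hi, pad):
    """Add the same buffer below the minimum and above the maximum."""
    return lo - pad, hi + pad


def limits_with_reference(lo, hi, reference, step):
    """Round limits that also cover a reference value such as a sales target."""
    return round_limits(min(lo, reference), max(hi, reference), step)


# Assumed data range for the widget sales example (see the note above).
sales_min, sales_max = 5500, 9444

print(round_limits(sales_min, sales_max, 1000))
print(padded_limits(sales_min, sales_max, 1028))
print(limits_with_reference(sales_min, sales_max, 20000, 5000))
```

Printing the candidates side by side is a cheap way to try several scales before committing to one, and it makes the reason for each choice explicit: memorable round numbers, visual balance, or a stated reference such as the $20,000 target.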
So it's a really good exercise when you're creating visualizations, to look at your chart with different scales. I would recommend that you experiment with scales, change it up, see how it looks, think about whether or not you're being accurate, think about whether there's a reason you're setting a particular scale if you are, and above all, look for bias, and eliminate bias whenever you can. And really when in doubt, channel your audience. Think about a few things. One is think about people who don't know the data and don't know the story you're trying to tell, and make sure that your scale is going to help them understand your data and help them get the story that you're trying to communicate.
But especially, think about your audience in terms of two categories. Think about your skeptics and your believers. When you set a scale, ask yourself, are the skeptics going to believe this? Are they going to buy it, are they going to think that this is an honest representation of the data? And the same thing on the believers' side. Make sure that when you present data in a certain scale, you're not just reinforcing the believers, you're not just giving them the data story that they want. Make sure that it's a really valid, accurate representation of the data that you're sharing. Use your powers of scaling for good, not evil.
Legends and sources
In this video I'm going to talk about the necessity for clear legends, so that users understand what they're looking at in your visualizations, and also the importance of including your sources. These details can make or break an information design or data visualization project. Legends, sometimes called keys, are those explanatory cues that are often found in the bottom right-hand corner of a chart. Even when you create a chart in Excel, it sort of puts them over there on the right-hand side, and they help the viewer understand what they're looking at. In the most basic form of a chart, like this one here, where everything is labeled, there's really only one thing: I know how many Snoods each one of my Whatchamacallits has, and Whosawhatsits, et cetera. I don't actually need a legend in this case, it's one of those rare times when I don't. But for the most part, you're going to need legends, especially if you start adding weird shapes or colors, or more data, more layers of data. In this case I need to know what the Tally Hos are, versus the Hither Tos and the Be Bops.
If we don't include a legend, if I can't understand what these different colors mean, then I'm creating art, it's not pretty art, but it is just a picture, there's no knowledge, there's no value here. But my job is to inform people, not just to create pictures. Better than a legend or a key that's off to the side, by the way, is in-line labeling. If you can include your labels identifying color values, shapes, and the like in this way, you're helping your audience keep their eye on the data, which will always be better than forcing the tennis match watching, back and forth, swivel head behavior a legend requires.
Zen Buddhism has a concept called beginner's mind. And the idea is that you should come to everything with an open mind, without any preconceptions. This can be hard to do when you're doing data visualizations, when you're working on something for a long time, it's very detail-oriented, you know everything about it by the time you're done, and trying to think of it as someone who knows nothing about it can be difficult. But if you can channel a child's mind, or a novice's mind in the topic area that you're discussing, if you can get to a point where you have great user empathy for people who know nothing about what you're showing them, then that will help you understand what's hard for them to understand, which will help you figure out what you should include in the legend.
Sometimes you need more than just a legend, especially in interactive graphics. So in this hospital pricing visualization that I shared earlier, there's a lot going on. There's first of all a ton of data, there are different colors, there are these different-sized bars, there are two different types of bars. If I click into this thing, I get a lot more detail: what do these dots mean, what are these axes, what are these other bars? There's a lot here. And so no little legend or bit of labeling on this screen is going to be enough for me to understand it, even if I was fairly knowledgeable about this topic area.
So what I always do, especially for interactive graphics, is I actually create an entire how-to. I'll take a screenshot of the interface, and I'll look at it, again with that beginner's mind: what might I not understand? And I will draw little lines, and I will label everything. I won't leave what's going on here up to anyone's imagination. I make it very, very clear what all the details and shapes and colors represent.
The other thing that I always include in screens like this, usually at the bottom, are the notes, sources, sometimes it's just about the data, just where the data comes from, oftentimes I'll also include notes on the technology. One of the most important reasons to provide sources is for credibility, so if you're creating a visualization on a topic, especially if it's a controversial one like politics or climate change, then when you provide your sources, it'll allow your users, whether they're believers or skeptics, to look at the data themselves. And this has two advantages, one, credibility, as I mentioned, in that they can disprove your thesis or confirm your thesis, or at least get the sense that they have the opportunity to judge you based on the merits of the data, and not just have to take you at your word for it.
And the other reason is that a lot of your users, if they're interested in the topic area, might want to dig deeper into the data themselves, so by providing a resource, you're actually being a good citizen and giving them access to the information that they can go play with on their own. If your mission is to inform your audience, which I would argue it should be, if you're in data visualization, then that's great. If not, then you might want to try another calling.
Data visualization can be a long and complicated process, and when you're finished, the last thing you want to do is all the busywork, going through the details, like the legends and the sources and the how-tos, but it really is as important as everything else that you're doing, don't rush it, don't neglect it, give it the time and effort that it deserves.