Visual Display
Illustration and iconography
- In this video, we're going to talk about the use of illustration and iconography in data visualization. I can really sum it up in one sentence. Visual elements, illustration and iconography, are really essential for visualization in many cases. They help make your content relatable to your audience, but you do have to be careful. You have to resist the temptation to overdo it and understand the time required to do it right. We communicate visually for many reasons, and in visualization, I think it comes down to three things primarily.
One is tangibility. We're trying to make the intangible tangible. We're trying to make data, numbers, into things that people can relate to and understand, and visual elements really help do that. Whether it's a chart where you can easily visually compare datasets or illustration to help bring themes to life, it's really an important part of what we do to make the intangible tangible. Second is making the complex simple. Illustration and iconography can really help you reduce text. You don't have to explain something in long paragraphs of text when an image will help bring it home.
They can help convey meaning quickly and easily. And finally, it's about context setting. Visual elements can add a lot of value and they can reduce distraction. They can actually help grab attention, and of course, they can really help establish and emphasize themes in your infographics and visualizations.
So, let's say we're going to create some healthcare infographic, and of course we want it to be shared widely on Twitter and Facebook and Pinterest, so we want our imagery to really jump out and grab attention. So, our first temptation, of course, is to use a very large illustration or a photo of a theme-oriented image.
So, in this case, a healthcare image. We want it to be immediately clear, as soon as someone sees this graphic, this is about healthcare. If you see it on Facebook and you're a doctor or you're in healthcare or you care about healthcare, as soon as you see this stethoscope, you're going to say, oh, this is all about healthcare, this is about me, I want to look at it. So, as you start adding content on your infographic, and let's say we have four pieces of content we want to share, of course you're going to start to want to add imagery to go with the elements of content that, again, reinforce your visual theme.
And probably, those little images that go with each little content bite should be relevant to the content that it's related to. Both uses of imagery, the large, thematic image as well as the smaller more focused images, of course, are very valid uses of illustration and iconography. They really will help capture attention and help your users understand the meaning in the graphic. They'll draw the eye and they can help reinforce your themes.
But you have to be really careful in what you do and how you do it. So, for instance, in the lower left-hand corner here, we've added a bar chart made out of hypodermic needles. And while I can certainly see that the third syringe is the tallest and the fourth syringe is the shortest and I can read the data to some degree, it actually has been proven that it's harder to read the real relative values when you're using irregular shapes like this as opposed to just the trusty old rectangular bar.
So, in this case, we're actually detracting from understanding rather than adding to it, even though I can sort of read general trends in the data. Now, if you have to use and you want to use theme-based imagery, which I do recommend, just be careful about how you do it. So, here we have pills, which, of course, are related to our theme of healthcare, but they're much more uniform. So, the shapes of these bars now are much more uniform and clear, and I can more easily tell the relative values of the bars, as opposed to just sort of generally the values of the bars. Also note how the main image can really draw the eye from the top left to the bottom right.
So, our image is really reinforcing our linear storytelling structure. We're drawing the eye through a linear progression. One thing to keep in mind is that illustration and iconography can be difficult. It can be hard to create even very simple concepts in icons, and even sourcing icons from third parties, like stock photo and stock illustration sites, can be hard.
So, that's not to discourage you from doing it, I actually strongly recommend it, but you do want to be careful about it. You want to make sure you factor it into your schedules and your budgets on the projects that you do.
Typography
- Typography is at the foundation of great design and it's at the foundation of really good data visualization as well. All the standard design typography principles apply, but it's extra important here because every emphasis that you make in data visualization is so important. It can really change your audience's perception and understanding of the data you're communicating if your typography is done carelessly or if it's not done thoughtfully and strategically.
Accuracy is the key, so you have to be extra thoughtful about your typography to call attention to the things that you want to emphasize. This video won't cover the fundamentals of typography, but there are other great courses, like this one from Ina Saltz here on LinkedIn, so you should definitely check that out. But I will point out some specific ways typography applies to infographics and data viz specifically. So, there are basic types of type in charts and graphs. It may seem obvious how to approach them, but there is some nuance between them. And so you want to think about these categories and use your type design skills to help users understand what type of information they're looking at, right?
Legends and axes labels versus data labels versus callouts, et cetera. So in other words, you want to sort of keep legends and axes in one style for instance. Labels in another style, callouts in another, et cetera. Makes it easier for your audience to quickly understand the visual language of the visualization that you're creating for them. Axes and legends should always be labeled, right?
Here we have Snoods and Whatchamacallits and if there were no labels on the y and x-axis how would I know what I was looking at? Right? The norm is to use small text, you don't want to draw attention to it. And I always recommend that you use gray if you're on a white background, right? You want your legends and your labels to be faded back. They have to be legible but they're not what should be drawing the eyes, so they should not be high contrast.
You very typically see labels like this where we have a lot of stuff going on so people will turn the labels on their side just to squeeze them in. While not strictly forbidden, it's not really a good idea. Not to mention in this example, you don't really need to label every single bar.
Once I know that the black bar is Whatchamacallits and the lightest bar is Gizmos, I don't have to repeat that labeling. Might as well reduce visual distraction and not label it all the way across the chart. But a better way to do it is to not force your audience to tilt their head to read your labels. So, you know, not that this is the only way to do it but make 'em, you know, horizontal so they're legible and readable. Don't really make your audience tilt their heads. Now, of course, usually, almost always in a chart, you need numbers on it, right? I need my y-axis to tell you that it goes from zero to 1,000 in this case. No numbers doesn't make sense. Sometimes you need to add more numbers, right? There may be an argument for including 250, 500, 750, et cetera in this chart. You know, it's not always the right choice, I always say at a minimum you have to have the bottom number and the top number.
But if there's a compelling argument for including numbers in the middle, include 'em. And maybe 750 is a really important number. So in this case, add 750, make sure it's really bold and red and maybe even draw a dotted line. If it's really important to see that the third black column to the right is almost at 750 but not quite, then that's a compelling argument for making sure 750 stands out. Maybe 750's a benchmark, a target, another argument for making sure that it's really there and bold and bright and visible.
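As a rough sketch of that rule of thumb (always show the bottom and top numbers, add a middle number only when there is a reason), here is a small helper. The function name `axis_ticks` and the style strings are made up for illustration; they are not from any charting library.

```python
def axis_ticks(lower, upper, highlights=()):
    """Return (value, emphasized) pairs: the two axis ends plus any highlighted values."""
    values = {lower, upper}
    # Only keep highlighted values that actually fall on the axis.
    values.update(v for v in highlights if lower <= v <= upper)
    return [(v, v in highlights) for v in sorted(values)]


# The y-axis from the example: zero to 1,000, with 750 as the benchmark.
ticks = axis_ticks(0, 1000, highlights=(750,))
for value, emphasized in ticks:
    style = "bold red, dotted guide line" if emphasized else "small gray"
    print(f"{value:>5}  {style}")
```

The returned pairs could then be handed to whatever charting tool is in use; the point is only that the emphasized tick is a deliberate, separate decision from the two end values.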
As with axes labels, there's a constant tension with data labels, right? There's a balance between being accurate and also remaining legible and readable and beautiful, right? Because, you know, a designer would say let's label nothing, it's much more pretty and aesthetically pleasing. And a data person says, label everything 'cause every single data point is important. And you have to find that balance for yourself. I would certainly tend towards labeling fewer things rather than everything because when you label everything you're saying, everything is important. Really what you're saying is that nothing is important. So, I can label everything or I can label nothing or I can just label maybe the few things that are actually important to this audience. The things that I'm really focused on. Maybe, sometimes, it's really just one thing that needs a label. So I can just label that one thing and make it really big and bold. Or, use design, even better, to really make it pop and really make it stand out with background color, with contrast, with design, et cetera.
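One way to make the "label only what matters" choice explicit is to decide the labels in data before drawing anything. This is a minimal sketch; the function `labels_to_show`, the category names, and the numbers are all hypothetical.

```python
def labels_to_show(data, important):
    """Map each category to its label text, or None when it should stay unlabeled."""
    return {
        name: (f"{value:,}" if name in important else None)
        for name, value in data.items()
    }


# Hypothetical values; only one data point is considered important here.
values = {"Category A": 420, "Category B": 735, "Category C": 310}
labels = labels_to_show(values, important={"Category B"})
print(labels)
```

Passing every category as important reproduces the "label everything" extreme, and passing an empty set reproduces "label nothing", so the same helper makes it easy to compare the three options.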
Good design will always find that balance between accuracy, readability, storytelling, the granularity of the data you need to communicate, as well as aesthetics. You can make the case that this really isn't even a label anymore. This is more of a callout, right? This is really drawing attention to something. You can use different font face, weight, color, backgrounds, images, et cetera, as I have here, to really make a big impression. To really make a callout draw the eye to something. With, you know, not really crazy adjustments to your typography, it's sort of like a label, it's sort of like a callout. And, you know, in this case, it's really making it clear that I want you to look at that one data point. Here's an example from the real world.
A project that my company did for a client. And just about every example of typography is at play here. And there's a lot of variety in the typography without it feeling like it's 800 fonts and all kinds of things going on. You know, we have a title which is the very large, bold type in the upper left hand corner. We have a callout quote which is clearly different typography than above and draws the eye in a very different way. Obviously, the most important data point that we want you to see is that 5.69 in the center of the image there. That's the number that is the most important number here so I'm really making it clear to look there. But then using a very similar graphical style, with a little callout box, for the other similar types of data points but lighter typography to make it clear that they're sort of less important, in addition to the size of the callout box.
You'll also notice that we have two different types of labels on our axes. So we're showing both the percentiles, the 25th, 50th, et cetera percentile of the scores, as well as the actual values of the scores, the 5.5, 5.7, in the lighter gray. And you'll notice in this case, you know, you can make the case, hey, why is the value, the 5.7, a lighter number when the actual number down below, the 5.69 in the big callout, is the number we wanted you to see? And that was just a client decision. They wanted to sort of emphasize the percentiles in the labels but emphasize the value in the callout. So that was a conversation that we had, and it was a very strategic decision, even though the decision seems like it's sort of at odds, you know, internally within the same graphic. And again, typography, just from that standpoint: on the far left side at the bottom we have our sort of legend explaining the relative value of the scores, one versus seven. And then finally, the very small type, you know, very deemphasized typography, where we sort of have our footnote. So a lot of different examples of typography, all in the service of a data visualization, using a lot of different sort of basic principles of typography, but nothin' fancy goin' on here. So I started this with a list of four things but really only talked about the first three. We talked about axes and legends, data labels, callouts.
You know, infographics are sort of their own thing, right? They're like any design project. All of your type design skills and experience will go into creating a full infographic. Great infographics always have great type design. So, I recommend looking at the typography courses here on LinkedIn for a deeper dive into the basic principles. It's a really fascinating and important topic for designers. It's a very powerful tool to make your work more beautiful and to help improve understanding and impact for data visualization specifically.
Follow the rules and best practices for typography in your data visualization and you'll be doin' really well. And then think about the nuances to type that are specific to visualization, such as, making your axes labels gray and desaturated, your footnotes really small of course, and really calling attention to the data points that are the ones you really want to call attention to.
Position, size, color, contrast, and shape
- There's a limited number of ways to show differentiation between objects on a page or a screen in order to reveal trends, patterns, and outliers, which of course is the primary goal of data visualization.
There may be an infinite number of ways to execute on these things, but it's really a pretty simple list of ways to do this. And it's limited to position, size, color, contrast, and shape. It's maybe surprising that there are so few options. This is what data visualization is really all about. Your goal is to share knowledge based on data, and usually what's interesting is where it varies. And so there's a valid story in this chart, but if all charts looked like this, the field wouldn't be very interesting. Usually we're looking at charts that look more like this, right? Or maybe this or this. What's interesting are the outliers, the trends, the patterns in these different views. So there are five primary ways to show that differentiation, to allow your audience to see these things. And some of these are more powerful than others, by the way. And let's start off by talking about position, okay.
So position is interesting. Position in many visualizations is actually driven by the data, right? You're not going to change the position of these dots in the scatterplot. The X and the Y coordinates put the dot exactly where it belongs based on the data, right? You can't change them manually. By the way, position is a very strong pre-attentive trigger. We talked about visual perception, the idea that you react subconsciously, very quickly, pre-attentively to certain types of triggers. Position is very, very powerful. So human beings are very good at seeing when the dots are very close to each other, when they're far apart, and when they track in certain sorts of patterns.
Now in an infographic, you can use position to provide emphasis, right? If you put things at the top of the page, you're telling your audience, look here, these are more important than the stuff at the bottom of the page, or you can draw the eye with the position towards the center of the page or towards the upper left-hand corner, which is a good spot for left-to-right language cultures like English, but maybe less so in other cultures, right.
In infographics, position is an arbitrary decision to draw attention, to tell your audience something is more or less important. Of course, in data visualization, it may or may not be possible to do that.
Size can also be data-driven, right? By the way, size is also a very good, strong pre-attentive trigger. We immediately, pre-attentively see large dots and small dots. Now in this case, a bubble chart, the large dots are driven by some variable, right? And so again, the data decides the size. You can't manually change it. Now you could on a chart, theoretically, have a scatterplot, just X and Y coordinates, a bunch of dots, and manually change the size of one in order to draw attention to it and maybe label it. But boy, you should do this very cautiously, because most people might assume that this is a third variable, right? It's not really the same thing as an infographic, where yes, you can make certain things really big to create a focal point, to draw the eye, and to, again, communicate emphasis or importance to your audience.
Color is another method. It's very, very challenging, but it can be pretty powerful. Now it's not as good a pre-attentive trigger as some of these others that we're talking about, but it also depends on what color you're using, and other attributes of the color, such as the intensity of the color, the luminance, which we'll talk about in a moment. But like size and position, it can be used to draw the eye manually for emphasis on, say, an infographic, or it might be used to represent a variable, for instance categories, a fourth variable in this bubble chart, right? The green category is one type of thing, and the gray is another kind of thing. Now you have to be cautious with color for a few reasons. One, like I said, it's not a very strong pre-attentive trigger. Now, here we have green versus dark gray, light green, dark gray. This is easy to see pre-attentively, but if you had a bunch of light blue dots and a bunch of light green dots, especially if they're smaller, it gets harder to pick up on those differences pre-attentively. So color is an interesting one that you have to be careful about.
You also have to worry with color about accessibility. The most common form of colorblindness, which occurs in up to 8% of men, by the way, is called red-green colorblindness. People who have it will have difficulty distinguishing between certain shades of colors, particularly reds and greens. And so here we have a rainbow run through three different colorblindness simulators, and you can see the orange stripe and the green stripe in the lower right-hand example. They're barely detectable as different. So even if you have a legend, this can be a problem. In addition to colorblindness, there is also the issue of contrast sensitivity with accessibility. So if you use a very light shade of gray against a white background, people with certain visual difficulties may have trouble picking up and certainly reading that text. This affects, by the way, a much larger number of people, the contrast issue, up to 30% of people, right? So I can tell you from experience, as I get older, it's harder for me to see smaller type and/or less contrasty type. It just happens. So it's something to be very aware of when you're creating visualizations. Now, luckily for us, coming back to colorblindness, there are websites like this one, ColorBrewer, which allow us to identify colorblind-friendly palettes.
And so I can find collections of colors that will not cause problems for people with different forms of colorblindness. And there are different options, different decisions I can make along here for addressing different types of color palettes, such as sequential palettes versus diverging palettes and other things. Not going to get into these here, but long story short, there are tools that can help us avoid some of the challenges with color. And by the way, coming back to this image here, you know, this visual on the left was run through simulators. These simulators are available on various websites and/or as extensions now in the browser. So you can test images you create and see what they would look like, simulating different types of colorblindness and even other vision deficiencies. So we can solve for these problems with tools like ColorBrewer and some of these filtering tools that help us sort of see what our work will look like, so we can change it, of course. Another really good way to emphasize things is with contrast. And so contrast, unlike color, although clearly they sort of go together to some degree, is about the intensity of the color, and it's a very strong pre-attentive trigger. So light blue, light green, not so good; dark blue, light green, much more effective. And that's more about the contrast than the color differentiation. And of course we use color in data visualization in a bunch of different ways. Sometimes, like with size or position, it can be simply data-driven.
For instance, in this choropleth map, we have geographies, and the color intensity within the geographic shape tells us what the measure is. So the darker color means a higher number, the lighter color means the lower number, et cetera. And of course we can use color manually too. We might fade back all but one data point or all but one geography and have that one pop out in color just to bring attention to it. So maybe it's data-driven, maybe it's not. So here's an example of a project that I did that uses contrast. And so what we're looking at here at the moment, we're just sort of passing time. You can see the keyboard typing as you have an opportunity to read on the left, and what's happening is we're looking at one letter being typed at a time from the book, Anna Karenina, and then eventually it just speeds way up. And what we're being shown here is actually the concentration of usage of different letters on a keyboard from the book, Anna Karenina. So if you took the entire text of Anna Karenina, there are a lot more A's and very few Q's, okay. And no surprise there, but the goal of the project here was to look at whether certain keyboard layouts maybe are more efficient than others. So this is the QWERTY layout. You also get to see the pattern of usage of different fingers, but the transparency, the contrast, tells us which letters are used more often. And we can see that in different patterns, different keyboard layouts, some are much more efficient than others. Obviously you want your fingers to be all sort of stuck in one area; the more your fingers jump around, the less efficient it is. And so that's the purpose of using contrast for this particular visualization. So it's a data-driven use of contrast in this case, but the other use of contrast is simply to allow you to see volume. And so here's an example from the same project. Here, we're looking at those keyboard jumps.
So how often do you jump from the top row to the bottom or the bottom to the top, if you were typing out all of Anna Karenina in this case. And so it's a bunch of lines, you know, sort of all on top of each other. If these were all opaque, it would be harder to see the patterns.
Just by making them translucent, we can see where all the lines crisscross, which ones are thicker, which ones are thinner. It's just easier to sort of pick up the actual pattern through the spaghetti, through the noise. And here's another example where we use contrast to simply draw the eye to things. Here we have a visualization which is all about job trends in the United States. And I'm already part way down the visualization at this point, where I talk about how wages have increased or decreased relative to inflation for different categories of jobs. And this is sort of the overview look at all of the data. I don't even need to explain what we're looking at here necessarily, but as you scroll through this story and we highlight a portion of the story, for instance, this category of jobs, management occupations, well, we can see that category lights up and then all the other categories fade back. So we're just using contrast to draw attention to things: in that case, an entire job category; in this case, three specific jobs whose pay went up the most; in this case, another category; in this case, other specific jobs, et cetera, et cetera. So contrast outlining can draw the eye manually to things, to help tell a story.
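The data-driven use of contrast in the keyboard example, where more frequently used letters appear more opaque, can be sketched in a few lines. This is only an illustration of the idea, not the project's actual code: the function name `letter_opacity`, the linear scaling, and the 0.1 minimum opacity are assumptions, and the sample text is just the novel's opening sentence standing in for the full book.

```python
from collections import Counter
import string


def letter_opacity(text, min_alpha=0.1):
    """Map each letter a-z to an opacity between min_alpha and 1.0 by frequency."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    peak = max(counts.values(), default=0)
    opacity = {}
    for letter in string.ascii_lowercase:
        # Share of the most frequent letter's count; 0.0 for unused letters.
        share = counts[letter] / peak if peak else 0.0
        opacity[letter] = round(min_alpha + (1.0 - min_alpha) * share, 3)
    return opacity


sample = "Happy families are all alike; every unhappy family is unhappy in its own way."
alpha = letter_opacity(sample)
for letter in "aiq":
    print(letter, alpha[letter])
```

Keeping a non-zero minimum opacity means rarely used keys stay faintly visible, so the keyboard layout itself remains readable while the heavily used letters stand out.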
And last but not least, there is shape. And actually, maybe it is the least; it's my least favorite way to distinguish between different types of data, for a couple of reasons. One is it can be visually confusing, especially if it's not used sparingly. For instance, if you had a scatterplot with five categories of things and used five different shapes, that's not going to work well. And the second reason I don't like it is because it's a very weak pre-attentive trigger. So here's an example: pre-attentively, you don't pick up on where all the squares are versus where all the circles are. Is there a pattern here? Is there a difference between the two? I don't know. I have to really pay attention and look around and investigate.
Whereas if I add, in this case, color and/or contrast, now I can actually make a pre-attentive judgment. And by the way, circles and squares are very different shapes. One has pointy corners, the other one doesn't. So even very well-differentiated shapes are not well differentiated pre-attentively. Now imagine if these were squares and diamonds, right? A square and a diamond are the exact same shape, just one is rotated 45 degrees, right? Or at least close enough. Long story short, shape is a very weak pre-attentive trigger, and therefore not a powerful tool to use in the tool belt. If you're going to use shape though, make sure you dual encode it. Double up on the pre-attentive triggers by using the shape together with, as we're doing here, something like color and/or contrast. So yes, we have five primary techniques to show differences for our audience. Experiment, figure out what works for your audience.
Think about how many different ways, how many different types of things you need to reveal to them. Experiment with it, really test it, mix and match, find rules that you can use over and over, but really see what works, and make sure you're thoughtful and careful about your audience's pre-attentive response to your visuals.
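The earlier point about light gray on a white background can be checked with a number rather than by eye. The video doesn't name a metric; the sketch below uses the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG), which is one widely used option, and the specific gray values are just examples.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color with channels in 0-255."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


white = (255, 255, 255)
for name, gray in [("light gray", (204, 204, 204)), ("mid gray", (118, 118, 118))]:
    print(f"{name} on white: {contrast_ratio(gray, white):.2f}:1")
```

WCAG's AA level asks for at least 4.5:1 for normal-size text, so a check like this is one way to keep axis labels faded back while making sure they stay legible.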
The importance of scale
- Visualizations consist of two components. There is the space where the data is displayed, and then the objects that represent the data. So a lot of this course is going to be about the data itself, right? How to think about it, how to display it, how to make it compelling, etc. In this video, we're going to concentrate on the space where the data is displayed. We're mostly going to be looking at 2D planes here, although there is such a thing as 3D visualization also, but that's outside of the scope of this conversation. The point is that the size of that plane is very important. The scale drastically affects the visualization and the story that it tells. So here we have a bar chart with some interesting and easy-to-see patterns, right? What's going on? A is doing well, B is doing pretty well, and the rest are doing less well. Here we have another chart. No one's doing that well in this one, right? A is sort of slightly better than the rest, but they're all kind of doing pretty poorly. It's also hard to see the variation between the groups; D, E, and F look identical here. In this chart, all of them but A and B are doing really pretty poorly. Now maybe you noticed that these charts are actually all the same. If you look at the Y axis, you'll see that the data values are all consistent, it's just the scale that's changing. So in the left-hand chart, the scale goes from zero to 250, in the right-hand chart, it's from 100 to 240, and the middle is from zero to 2,000. And that dramatically affects your perception of the data that you're looking at. And this is really true with all charts. There's no way to display data visually and divorce it from the scale in which it's being displayed. But you might ask how you can know how to set the scale for a chart. Which one is right when you look at these three charts? Which is the correct way to do it?
It sort of depends, but there are definitely some good rules to follow, and in this particular example, the one on the left is quote unquote, right, and by the end of this video, you'll probably agree. So, the first thing to consider is, am I using a chart type that requires a certain approach to scaling? And for the most part, the answer is no, but there are some chart types that do have specific rules to follow. So, as an example, bar charts really should always start at zero. And here's why, the height of the bars in a bar chart actually means something. So if you look at the right-hand example, if you look at F, for instance, that data point looks like it has practically no data in it. But the fact is, we're already above 100 at this point. The left-hand example is an accurate representation of it. I can see that F has a bunch of data, it's above 100, it has less than A maybe, but that missing data in the right-hand example just isn't fair, it just doesn't accurately represent how much data is in that F category. There are plenty of other chart types that don't have to start at zero, but bar charts really always should.
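A quick calculation shows why the truncated axis is unfair to F. The values below are hypothetical (the video only says F is above 100 and smaller than A), and `drawn_height_ratio` is an illustrative helper, not part of any charting tool.

```python
def drawn_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for values a and b when the axis starts at baseline."""
    if min(a, b) <= baseline:
        raise ValueError("both values must sit above the axis baseline")
    return (a - baseline) / (b - baseline)


# Hypothetical values for bars A and F.
a_value, f_value = 240, 110
honest = drawn_height_ratio(a_value, f_value)          # axis starts at zero
truncated = drawn_height_ratio(a_value, f_value, 100)  # axis starts at 100
print(f"zero baseline: A is drawn {honest:.1f}x as tall as F")
print(f"baseline 100:  A is drawn {truncated:.1f}x as tall as F")
```

With these numbers A is a little more than twice F, but starting the axis at 100 draws it fourteen times as tall, which is exactly the kind of distortion that starting bar charts at zero avoids.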
The second question to consider is, am I comparing things in a self-contained context, within the context of the chart? So, in this example, let's say you have a chart that has no external reference; there's no need to put these numbers in the context of other numbers outside of the data that I'm actually displaying. So here, we're looking at widget sales for a company. And I actually generated this chart in Excel, and it automatically exported it with a scale that started at zero and went up to $12,000. But as you can see, the numbers are only from $5,500 to $10,000 or so. So since I don't need to compare this to any other numbers, and I'm not in a bar chart, I don't need to show the numbers below 5,500, I don't need to show the entire scale. I'm just showing a change over time. So I can actually set the scale here to 5,000 to 10,000 if I want to, right? I have a nice round top and bottom number on my graph, and I'm telling a complete story. Or, I could actually add the exact same value above and below the minimum and maximum values so that there's the exact same buffer below the bottom value and above the top value in this chart. So this is actually the most balanced visually, right? There's exactly the same number of pixels above and below the line. But it's kind of weirder numbers on the axis, right? 4,472 to 10,472. I'm not a big fan of this approach. More importantly, research has shown that people are much better at remembering round numbers, so using random-seeming non-round numbers like these won't help your audience understand or remember your data. I could also set essentially an arbitrary scale. So let's say that I'm a sales manager at this company, and I have a sales target for my people of selling $20,000 worth of product, and I want to show these numbers in the context of that reference. I'm manipulating the scale for a valid purpose.
I'm not intending to influence the perception of the data for evil purposes, but within the valid context of this sales target, I'm just showing people where they stand. It's okay to change scales for good reasons, but you really do have to have a solid reason and you have to be consistent with that. Another influence on scale is whether or not you have an external reference, right? You have some arbitrary number outside of the context of the data that you're referring to. So here we have the same chart, and this is very similar to the last example, where we were showing it in the context of an internal sales target. Now sometimes you might want to set the scale based on an external target, right? This is not about showing my salespeople how they're doing compared to a target that I set, but it's more about an external number, right? This is the total widget sales in the entire marketplace, so I want to show them within that context. It's a very similar motivation, it's a very similar thing, I'm just sort of setting context and comparing it to a number, but if it's external to you and external to your data, you have to do it that much more carefully and thoughtfully so that you avoid the perception of bias on your part. And speaking of bias, that's always a great question to ask yourself: am I being fair and unbiased? Especially when thinking about scale, you have to think about this very, very carefully. You don't want to be that guy, right? You don't want to put on a suit and look all pretty and legitimate and then play games behind people's backs. No one likes a cheater. So it's a really good exercise, when you're creating visualizations, to look at your chart with different scales. I would recommend that you experiment with scales, change it up, see how it looks, think about whether or not you're being accurate, think about whether there's a reason you're setting a particular scale if you are, and above all, look for bias, and eliminate bias whenever you can.
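To experiment with scales the way this video suggests, it can help to compute the candidate axis limits side by side. The sketch below covers the three options discussed for the widget sales chart: round limits, an equal buffer, and a scale stretched to a target. The helper names are made up, and 5,472 and 9,472 are hypothetical data extremes chosen so that a 1,000 buffer reproduces the 4,472 to 10,472 axis mentioned earlier.

```python
import math


def round_limits(lo, hi, step):
    """Expand (lo, hi) outward to the nearest multiples of step."""
    return math.floor(lo / step) * step, math.ceil(hi / step) * step


def padded_limits(lo, hi, pad):
    """Add the same buffer below the minimum and above the maximum."""
    return lo - pad, hi + pad


def target_limits(hi, target, floor=0):
    """Stretch the axis so a reference value, such as a sales target, is visible."""
    return floor, max(hi, target)


# Hypothetical minimum and maximum of the widget sales data.
data_min, data_max = 5472, 9472

options = {
    "round numbers": round_limits(data_min, data_max, 5000),
    "equal buffer": padded_limits(data_min, data_max, 1000),
    "sales target": target_limits(data_max, 20000),
}
for name, (low, high) in options.items():
    print(f"{name:>14}: {low:,} to {high:,}")
```

Printing the options together makes the trade-off easy to see: the equal buffer is the most balanced visually but gives awkward axis numbers, while the round and target-based limits are easier to remember and to justify.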
And really, when in doubt, channel your audience. Think about a few things. One is to think about people who don't know the data and don't know the story you're trying to tell, and make sure that your scale is going to help them understand your data and get the story that you're trying to communicate. But especially, think about your audience in terms of two categories: your skeptics and your believers. When you set a scale, ask yourself, are the skeptics going to believe this? Are they going to buy it? Are they going to think that this is an honest representation of the data? And the same thing on the believers' side. Make sure that when you present data at a certain scale, you're not just reinforcing the believers, you're not just giving them the data story that they want. Make sure that it's a really valid, accurate representation of the data that you're sharing. Use your powers of scaling for good, not evil.
Legends and sources
- In this video I'm going to talk about the necessity for clear legends, so that users understand what they're looking at in your visualizations, and also the importance of including your sources. These details can make or break an information design or data visualization project. Legends, sometimes called keys, are those explanatory cues that are often found in the bottom right-hand corner of a chart. Even when you create a chart in Excel, it sort of puts them over there on the right-hand side. A legend helps the viewer understand what they're looking at. In the most basic form of a chart, like this one here, where everything is labeled, there's really only one thing going on: I know how many Snoods each one of my Whatchamacallits has, and Whosawhatsits, et cetera. I don't actually need a legend in this case. It's one of those rare times when I don't. But for the most part, you're going to need legends, especially if you start adding unusual shapes or colors, or more data, more layers of data. In this case I need to know what the Tally Hos are, versus the Hither Tos and the Be Bops.
If we don't include a legend, if I can't understand what these different colors mean, then I'm creating art. It's not pretty art, but it is just a picture; there's no knowledge, there's no value here. But my job is to inform people, not just to create pictures. Better than a legend or a key that's off to the side, by the way, is in-line labeling. If you can include your labels identifying color values, shapes, and the like in this way, you're helping your audience keep their eye on the data, which will always be better than forcing the back-and-forth, swivel-head, tennis-match-watching behavior a legend requires.
Zen Buddhism has a concept called beginner's mind. The idea is that you should come to everything with an open mind, without any preconceptions. This can be hard to do when you're doing data visualizations. When you're working on something for a long time and it's very detail-oriented, you know everything about it by the time you're done, and trying to think of it as someone who knows nothing about it can be difficult. But if you can channel a child's mind, or a novice's mind in the topic area that you're discussing, if you can get to a point where you have great user empathy for people who know nothing about what you're showing them, then that will help you understand what's hard for them to understand, which will help you figure out what you should include in the legend. Sometimes you need more than just a legend, especially in interactive graphics. Take this hospital pricing visualization that I shared earlier. There's a lot going on here. First of all, there's a ton of data. There are different colors, there are these different-sized bars, there are two different types of bars. If I click into this thing, I get a lot more detail. What do these dots mean? What are these axes? What are these other bars? There's a lot here. And so no little legend or bit of labeling on this screen is going to be enough for me to understand it, even if I were fairly knowledgeable about this topic area. So what I always do, especially for interactive graphics, is create an entire how-to. I'll take a screenshot of the interface and look at it, again with that beginner's mind, asking what I might not understand. I'll draw little lines and label everything. I won't leave what's going on here up to anyone's imagination. I make it very, very clear what all the details and shapes and colors represent.
The other thing that I always include in screens like this, usually at the bottom, is the notes and sources. Sometimes it's just about the data, just where the data comes from. Oftentimes I'll also include notes on the technology. One of the most important reasons to provide sources is credibility. If you're creating a visualization on a topic, especially a controversial one like politics or climate change, then providing your sources allows your users, whether they're believers or skeptics, to look at the data themselves. This has two advantages. One is credibility, as I mentioned: they can disprove your thesis or confirm your thesis, or at least get the sense that they have the opportunity to judge you based on the merits of the data, and not just have to take your word for it. The other is that a lot of your users, if they're interested in the topic area, might want to dig deeper into the data themselves, so by providing a source, you're being a good citizen and giving them access to information that they can go play with on their own. If your mission is to inform your audience, which I would argue it should be if you're in data visualization, then that's great. If not, then you might want to try another calling.
Data visualization can be a long and complicated process, and when you're finished, the last thing you want to do is all the busywork, going through details like the legends and the sources and the how-tos. But it really is as important as everything else that you're doing. Don't rush it, don't neglect it, give it the time and effort that it deserves.
The right paradigm: Basic charts
- One of the most difficult things to do when you're starting in data visualization is figuring out which charts to use in which situation. Now, eventually you're going to want to push the envelope and try different forms, different chart types, really out-there alternative approaches to visualization. But before that, I'd recommend having a really good handle on the basic charts and when you might use them. You know the saying: before you can walk, you need to learn how to crawl. This video is a high-level overview, but hopefully a really good introduction to the topic of when to use which basic chart form in which situation. These charts, by the way, for the most part allow you to easily display one to three variables. So these are the most basic chart types that I'm going to be talking about in this movie: bar charts, line charts, area charts, timelines, scatter plots, bubble charts, and pie charts.
These seven forms are very straightforward and basic, and for the most part are forms that people naturally understand. So let's start by talking about the bar chart, or you could call it the boring old bar chart. The reason it's the boring old bar chart is that it is used so widely, and it's used so widely because it is so effective. The fact is that humans have a built-in capacity to easily parse the differences between these rectangular shapes. We're wired to see this type of chart. So while it is boring and old, the fact of the matter is, it's extremely effective, and what I would argue is that anytime you're doing a visualization you should start off by thinking of it as a bar chart and ask yourself, is there a reason that this should not just be a bar chart? It's effective, it's easily understood, you won't be confusing your users, et cetera. Lack of confusion is a good thing.
Now, a bar chart is really great at showing just one or two variables, but you can add more data to it. So here we have what's called a grouped bar chart. We have two different data points, the gray and the black, and it's still very easy to understand. When you start adding more data points, it can start to get a little bit harder to read. At that point a grouped bar chart certainly gets into the category of maybe I want to try a different form. And when you have a whole bunch of them, even if they're separated into different groups in this way, with a lot more spacing, et cetera, it can get overwhelming quickly, although it is still decipherable. If you're trying to emphasize a comparison within groups, if you want to think of the elements of data as parts of a whole, like a category, then a stacked bar chart might be a better way to go. Here you can see that the whole bar represents the total value for each group and each segment represents the category value, the proportion of the data within the group. If you want to emphasize not the total value for each group but the relative value, so how much each category contributes to the total, then what's called a stacked percentage bar might be a good choice. So here each bar represents 100% of its group, and each segment within, each color, represents the relative value within the whole as a percentage.
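The step from a stacked bar to a stacked percentage bar is just a small calculation: divide each segment by its group's total. Here is a minimal Python sketch of that conversion. The function name, group names, and values are all hypothetical, and treating an all-zero group as 0% is just one possible convention.

```python
def to_percent_stack(groups):
    """Convert each group's category values into percentages of that group's total."""
    result = {}
    for group, categories in groups.items():
        total = sum(categories.values())
        result[group] = {
            # A group whose total is zero gets 0% everywhere instead of a division error.
            name: (100 * value / total if total else 0.0)
            for name, value in categories.items()
        }
    return result

# Made-up values: two groups, three categories each.
data = {
    "Group 1": {"A": 30, "B": 50, "C": 20},
    "Group 2": {"A": 10, "B": 10, "C": 20},
}

shares = to_percent_stack(data)
```

In the raw values, category C is the same size (20) in both groups, but as a share of the whole it goes from 20% of Group 1 to 50% of Group 2. That kind of relationship is what the percentage version brings out.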
It's often easier to see relationships like this in a stacked percentage chart, if I want to show the relative strength of a category within the whole. A bar chart can't convey all types of data, though. So if you look at these two charts, both are showing changes in values over time. The problem is that the bar chart really only shows each value at a single point. So for instance, maybe this is telling me the change in value of a stock price every January 1st over X number of years. So I get that sort of snapshot of a moment in time, whereas the line is really great at showing me the trend over continuous time. Line charts are a great default choice of chart type when you're showing things over time. Information designers telling content-driven stories with time-based elements will often use what I call timelines as a good default paradigm, right? Each one of these dots is on a timeline, at a certain point in time, and might have more information inside of it. An area chart is like a filled-in line chart. Line charts are often better than filled-in area charts like this one because, with the filled-in areas, it can be hard to see what happens where the lines cross: the dark gray is covering up the lighter gray and the medium gray behind it. It's hard to tell where the relative values are. I don't know where the bottoms of those troughs are in the light gray. But this is an interesting way of looking at data. There's also what's called a stacked area chart. Here you have the filled-in line charts, and they're treated sort of like the stacked bars, where they're on top of each other. Again, this is good at showing categories of data over time and how they relate to each other. And finally, like the stacked percentage bar chart, we have the stacked percentage area chart, which again is a really effective way of showing the relative strength of categories across time, or whatever the X axis represents, but as a portion of a whole. So again, the top here is 100%.
Another great chart type for showing two variables is called a scatter plot. Scatter plots are great at showing correlation. So here you can see that as X increases, as things get further to the right, Y also increases; the points also tend to go further up. You could have shown this data in a bar chart, but you wouldn't see the correlation as easily. This one shows what's called a positive correlation, where as things go up on one axis, they go up on the other axis. This one has a negative correlation. As one increases, the other decreases. Here is a scatter plot with no discernible correlation, but there are some interesting patterns, and certain types of patterns will show up much better in a scatter plot than in a bar chart, for instance. Bubble charts are great at showing three variables. A bubble chart is really just a scatter plot, but now we have a third variable. The size of the dot represents that third variable. And so we can add more interesting layers to the data by looking at it this way. And once again, in this example, we have correlation. As X goes up, so does the size of the dot, generally. But again it's easy to see the outliers. So we have that one giant dot over on the left-hand side. That's something worth investigating: it bucks the trend and it's immediately visible.
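The positive and negative correlations that a scatter plot shows visually can also be summarized as a single number. One common measure, which the video itself doesn't mention, is the Pearson correlation coefficient: close to +1 for a rising pattern, close to -1 for a falling one, and near 0 when there is no linear trend. Here is a minimal Python sketch, with a hypothetical helper name and made-up points.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y vary together, and how much each varies on its own.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)

# Made-up points: one set trends upward with x, the other downward.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]
falling = [6, 4, 5, 4, 2]

r_up = pearson_r(xs, rising)     # positive: y tends to rise with x
r_down = pearson_r(xs, falling)  # negative: y tends to fall as x rises
```

A single number like this is no substitute for the plot. It says nothing about outliers or the other interesting patterns described above, which is exactly why you still want to look at the scatter plot.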
Lastly, we're going to look at the pie chart, and you can spend a lot of time reading about the intense debate over how worthless the pie chart is. There are plenty of detractors of this form and a few defenders, but really, to my eye, the pie chart has two major problems. One is that it's really hard to parse when there are more than a couple of data points. So here we have two, four, six different pieces of data, and it's a little bit hard to parse. I mean, I can certainly see the smallest slice and the biggest slice, but with three of the medium slices, I can't really tell anything about them. And that gets to the second point, which is that the pie chart is really bad at showing slight variance between data points. So those two top wedges look almost identical in size. Again, human eyes, human brains, have a hard time parsing circular shapes and arcs, whereas if these were bars, I could probably immediately see the difference between those two data points. But with all that said, the pie chart actually is pretty effective at comparing the difference between two data points, especially if it's just one variable. So if the point is really just to show that this is a lot more than that, or that this is pretty much the same as that, then I would say that the pie chart works pretty well. So these are the most basic chart forms. I'm sure you are already familiar with all of them, or certainly most of them. Hopefully this video has helped you understand specifically when to use each form.