This is the sixth in a series of editorials covering all aspects of good science writing. Figures are an extremely important part of any scientific publication. It is a rare paper that contains no figures (such papers are mostly of the theoretical variety, though even a pure theory paper often benefits from a good graph). As the renowned guru of graphics Edward Tufte put it, “At their best, graphics are instruments for reasoning about quantitative information.”1 Since almost all scientific publications include quantitative information to be reasoned about, figures are almost always called for.

As a form of communication, figures (and in particular, the graphical display of quantitative data) are uniquely suited to conveying information from complex data sets quickly and effectively. While statistical analysis aims for data reduction, expressing a mass of data by a few simple metrics, graphing retains the full information of the data. Graphs take advantage of the magnificent power of the human brain to recognize visual/spatial patterns and to quickly change focus from the big picture to small details. Graphs are used for data analysis2 and for data communication, though only the latter application will be discussed here. Graphs are extremely popular in scientific literature3 for the simple reason that they work so well. But like all forms of communication, graphics can be used to explain and clarify, or to confuse and deceive. Thus, the first rule of graphics is a simple one: they must help to reveal the truth.

Just as disorganized writing often indicates disorganized thinking, a chart that fails to tell the story of the data usually means the author does not recognize what story should be told. Accordingly, sufficient care should be given to the design and execution of graphics, just as to the design and execution of the written paper itself. What does a graph aim to do? Here are some of the more important goals of using a graphic for communication in a scientific publication:
The first choice in designing a graphic is what data to present. “Displays of evidence implicitly but powerfully define the scope of the relevant, as presented data are selected from a larger pool of material. Like magicians, chartmakers reveal what they choose to reveal.”4 Thus, this first choice is probably the most important, since it defines what the graph (and the paper) will and will not be about. Graphs should communicate the essence of the results from the paper and not get bogged down in detail. The design of the graph itself should be driven by the structure in the data and the story the data has to tell. Since most graphics make comparisons (theory to experiment, condition A to condition B, etc.), deciding on the comparison to display defines the arc of the plot that unfolds. There is a fine line, however, between allowing the data to speak for itself and forcing the data into the story you want to tell. Well-presented data should encourage the consideration of alternate explanations, not just your preferred explanation.

Overall, the process of creating a graphical display follows these basic steps:5 choose the data to be presented, define the message to be conveyed, pick a style of graph that supports the message, construct the graph seeking clarity, then revise it until it is right. As Tufte has pointed out,6 the design and execution of a graphic are not unlike the overall scientific enterprise. We are searching for a quantitative and demonstrable cause-and-effect mechanism, and we use scientific reasoning about quantitative evidence to lead us there. Since science is about building models that describe our experiences, graphs should aid in finding and evaluating these models.

1. Errors in Graphs

Given the complexities involved in graphing large data sets, there are many ways for errors to creep in. Still, I was very surprised to read in a study by William S. Cleveland that 30% of all graphs published in volume 207 of Science (1980) contained errors.3 The error types he found were classified as mistakes of construction (mislabels, wrong tick marks or scales, missing items: 6% of graphs), poor reproduction (with some aspect of the graph missing as a result: 6% of graphs), poor discrimination (items such as symbol types and line styles could not be distinguished: 10% of graphs), and poor explanation (something on the graph is not explained in either the caption or the text: 15% of graphs). This total, by the way, only included graphs with actual errors, not graphs that were merely poor at performing the function of communication (of which there were many more, according to Cleveland).

Since 1980, a lot about the process of producing graphs has changed. It is likely that ubiquitous computing and graphing software has diminished the frequency of some error types. But while such tools can make producing quality graphs much faster and easier, they also make it easier to produce bad graphs. Since the most common type of error, incomplete explanation of what is on the graph, lies outside the technical process of producing the graph itself, it is doubtful that our software tools have helped much with this error type. Unfortunately, I am forced to admit that Cleveland’s 30% error rate is probably not too different from today’s performance.

2. Graphical Integrity

As with every aspect of science writing, integrity plays a key role in designing and executing figures and tables. A graph is a powerful tool for communicating, and one must choose to communicate truth rather than falsehood.
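To make the construction steps above (and Cleveland’s error categories) concrete, here is a minimal sketch of how one might build a simple comparison plot. It is only an illustration, not anything from the papers surveyed here: it assumes Python with matplotlib, and the dose and feature-size numbers are wholly hypothetical placeholders.

```python
# Minimal sketch of the five steps: choose data, define the message,
# pick a chart type, construct for clarity, then revise.
# Assumes matplotlib; all values below are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt

# Steps 1-2: choose the data and define the message (here, how feature
# size responds to dose for two hypothetical process conditions).
dose = np.array([10, 20, 30, 40, 50])      # exposure dose (mJ/cm^2)
cd_a = np.array([105, 92, 84, 79, 76])     # feature size, condition A (nm)
cd_b = np.array([118, 101, 90, 83, 78])    # feature size, condition B (nm)

# Step 3: pick a style of graph that supports the message
# (an x-y plot comparing the two conditions).
fig, ax = plt.subplots(figsize=(5, 4))
ax.plot(dose, cd_a, "o-", label="Condition A")
ax.plot(dose, cd_b, "s--", label="Condition B")  # distinct marker and line style

# Step 4: construct the graph seeking clarity: numeric axis labels with
# units, and a legend that explains every symbol on the plot.
ax.set_xlabel("Exposure dose (mJ/cm$^2$)")
ax.set_ylabel("Critical dimension (nm)")
ax.set_title("Feature size vs. dose (hypothetical data)")
ax.legend(frameon=False)

# Step 5: revise until it is right (limits, ticks, aspect, etc.).
fig.tight_layout()
fig.savefig("dose_response.png", dpi=300)
```

Every plotted element in this sketch is labeled and distinguishable, which guards against the construction, discrimination, and explanation errors described above.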
Tufte suggests these questions as a test for graphical integrity:7
To these I would add three more:
This last question is part of the overriding ethic of scientific publishing: for a result to be scientific, and contribute to the body of scientific knowledge, it must be described sufficiently well that it could be reproduced by others. As a straightforward example, any graph that does not numerically label its axes cannot be published (and unfortunately, we sometimes get such graphs submitted to JM3). Working to ensure both graphical integrity and low error rates in the execution of a graph will greatly enhance the ability of the graph to meet its goals and the goals of the paper. A well-written paper with poor graphs will never be remembered as a well-written paper.

3. A Few Guidelines

Graphs come in an extremely wide variety of types, a testament to the innovations of the last two centuries of chart making. Still, rapid communication is generally best served by using one of several familiar chart types, since familiarity speeds cognition. The overriding principles of design should be to seek clarity and avoid clutter.8 With that in mind, here are some miscellaneous guidelines for good graphics that might prove useful on different occasions:
I’m sure that there are many more tidbits of advice that would be valuable to share, but these are the first that come to mind. I’d be interested in hearing from the readers of JM3 about their experiences, good and bad, with graphs.

4. Figures and Tables in JM3

How are graphs used in our journal, JM3? The table below shows my counts of figures and tables found in the 2012 issues of JM3. The graph types I used are somewhat arbitrary (as all categories are), but hopefully useful. JM3 papers in 2012 had an average of 19 figures and one table per paper, attesting to the importance of figures in our field. About 20% of the figures were used to explain the theory or experimental setup, and the rest showed results. By far the most common figure was the ubiquitous x-y plot, accounting for one-third of all figures and tables. Results micrographs (optical and scanning electron micrographs, as well as atomic force microscope renderings) made up 25% of the figures. Contour and 3-D plots were used about 10% of the time, with other types of charts filling in the remainder. While I made no attempt to rate or judge the quality of the figures, it was clear to me from my survey that there were many excellent examples of figures and tables in all categories. There were some poor ones as well. I hope this editorial will spur attention to the difficult process of building quality graphs and that JM3 figure quality will improve over time.

Table 1. Figure and table counts for JM3 papers published in 2012.
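For readers who want to build this kind of summary themselves, here is a minimal sketch of tabulating figure-type counts and, if a chart is truly wanted, drawing a single, on-message bar chart. It assumes Python with pandas and matplotlib, and the counts are hypothetical placeholders rather than the actual Table 1 values.

```python
# Minimal sketch: summarize figure-type counts as a table first, then
# (optionally) as one simple bar chart. Assumes pandas and matplotlib;
# the counts are hypothetical placeholders, not the Table 1 data.
import pandas as pd
import matplotlib.pyplot as plt

counts = pd.DataFrame(
    {"Count": [120, 90, 35, 40]},  # hypothetical one-year totals
    index=["x-y plots", "Micrographs", "Contour/3-D plots", "Other"],
)
counts["Fraction"] = counts["Count"] / counts["Count"].sum()

# The table itself is often the better display: compact, exact, self-documenting.
print(counts.round(2))

# If a chart is wanted, keep the emphasis on the relative frequency of each type.
ax = counts["Count"].sort_values(ascending=False).plot.bar(rot=0)
ax.set_ylabel("Number of figures")
ax.set_title("Figure types (hypothetical counts)")
plt.tight_layout()
plt.savefig("figure_types.png", dpi=300)
```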
As an exercise, I rendered the data from the “Results” figures of the above table into a variety of bar charts (see Figure 1). Most of them fail the test of staying “on message.” The first four draw attention to the variations between issues, either in actual numbers or in percentages, though the per-issue variation is not important to my story here. The last two correctly keep the emphasis on the relative frequency of each figure type. Even then, they do no better a job of conveying the message than the table, and the table is far richer and denser in information (and has the added benefit of documenting the data better). This conclusion is quite frequently true of bar charts: a table would be better.

5. Conclusions

They say a picture is worth a thousand words. In a scientific journal, each figure occupies the space of anywhere from 150 to 500 words. So at the very least, a figure should convey more information than the words it displaces; otherwise, valuable space has been wasted. A good graph can certainly do that, though not all figures do. As the abstract artist Ad Reinhardt so aptly put it, “As for a picture, if it isn’t worth a thousand words, the hell with it.” Next time I’ll focus on how to make the most of one specific graph type: the ever-popular x-y scatter plot.

References
1. Edward R. Tufte, The Visual Display of Quantitative Information, p. 6, Graphics Press, Cheshire, Connecticut (1983).
2. John W. Tukey, Exploratory Data Analysis, Addison-Wesley, Reading, MA (1977).
3. William S. Cleveland, “Graphs in Scientific Publications,” The American Statistician 38(4), 261–269 (1984). http://dx.doi.org/10.1080/00031305.1984.10483223
4. Edward R. Tufte, Visual Explanations, p. 43, Graphics Press, Cheshire, Connecticut (1997).
5. Marcin Kozak, “Basic principles of graphing data,” Sci. Agric. 67(4), 483–494 (2010). http://dx.doi.org/10.1590/S0103-90162010000400017
6. Edward R. Tufte, Visual Explanations, p. 53, Graphics Press, Cheshire, Connecticut (1997).
7. Ibid., p. 70.
8. William S. Cleveland, The Elements of Graphing Data, Wadsworth & Brooks/Cole, Pacific Grove, California (1985).
9. Edward R. Tufte, The Visual Display of Quantitative Information, p. 77, Graphics Press, Cheshire, Connecticut (1983).
10. Ibid., p. 168.
11. William S. Cleveland, The Elements of Graphing Data, p. 57, Wadsworth & Brooks/Cole, Pacific Grove, California (1985).