Data-Driven Documents

D3.js is a small, free JavaScript library for manipulating documents based on data.

D3 allows you to bind arbitrary data to a Document Object Model (DOM), and then apply data-driven transformations to the document. As a trivial example, you can use D3 to generate a basic HTML table from an array of numbers. Or, use the same data to create an interactive SVG bar chart with smooth transitions.

D3 is not a traditional visualization framework. Rather than provide a monolithic system with all the features anyone may ever need, D3 solves only the crux of the problem: efficient manipulation of documents based on data. This gives D3 extraordinary flexibility, exposing the full capabilities of underlying technologies such as CSS3, HTML5 and SVG, without requiring you to learn a new intermediate proprietary representation. With minimal overhead, D3 is extremely fast, supporting large datasets and dynamic behaviors for interaction and animation. And, for common needs, D3’s functional style allows code reuse through a diverse collection of optional modules.

Selections

Modifying documents using the native W3C DOM API is indubitably tedious; not only are the method names verbose, but the imperative approach requires manual iteration and bookkeeping of temporary state. For example, to change the text color of paragraph elements:

var paragraphs = document.getElementsByTagName("p");
for (var i = 0; i < paragraphs.length; i++) {
  var paragraph = paragraphs.item(i);
  paragraph.style.setProperty("color", "white", null);
}

Instead of manipulating individual nodes, D3 employs a declarative approach, operating on arbitrary sets of nodes called selections. For example, you can rewrite the above loop in 40 characters rather than 190:

d3.selectAll("p")
    .style("color", "white");

Of course, a selection may trivially consist of only a single node:

d3.select("body")
    .style("background-color", "black");

The selector format is defined by the W3C Selectors API, supported natively by modern browsers. Backward compatibility for older browsers can be provided by Sizzle. The above examples select nodes by tag name (“p” and “body”, respectively). Elements may be selected using a variety of predicates, including containment, attribute values, and associated class or ID. And in the future, D3 could be extended to support additional selector formats, such as XPath.

D3 provides standard facilities for mutating nodes: setting attributes or styles; registering event listeners; adding, removing or sorting nodes; and changing HTML or text content. These suffice for the vast majority of needs. However, if the underlying DOM API is strictly needed, the nodes in a selection can be accessed directly, as each D3 selection is simply an array of nodes.
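The "array of nodes" idea can be made concrete with a minimal selection-like wrapper in plain JavaScript. This is not D3's implementation. The `toySelection` function and its `style` method are hypothetical, and plain objects with a `style` map stand in for DOM elements so the sketch runs without a browser.

```javascript
// A toy "selection": an array of node-like objects plus a chainable style method.
// Hypothetical sketch; plain objects stand in for DOM elements.
function toySelection(nodes) {
  const selection = nodes.slice(); // the selection is simply an array of nodes
  // Set a style property on every node and return the selection for chaining.
  selection.style = function (name, value) {
    for (const node of this) node.style[name] = value;
    return this;
  };
  return selection;
}

// Three fake "paragraph" nodes.
const paragraphs = [{ style: {} }, { style: {} }, { style: {} }];

toySelection(paragraphs).style("color", "white");

// Because the selection is an array, the underlying nodes remain directly accessible.
console.log(paragraphs.map((p) => p.style.color)); // → ["white", "white", "white"]
```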

Dynamic Properties

Readers familiar with jQuery or Prototype should immediately recognize similarities with D3. However, styles, attributes, and other properties can be specified as functions of data in D3, not just simple constants. Although their appearance is simple, these functions can be surprisingly powerful; the d3.geo.path function, for example, projects geographic coordinates into SVG path data. D3 provides many built-in reusable functions and function factories, such as graphical primitives for area, line and pie charts.
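The idea of a function factory can be sketched without D3. A factory takes configuration (here, accessor functions for x and y) and returns a function that turns an array of data into SVG path data. The name `makeLine` and its accessor arguments are hypothetical. D3's own line generator has a different, richer API.

```javascript
// Hypothetical function factory: configure accessors once, reuse the returned function.
function makeLine(x, y) {
  // The returned function maps each datum to a point and joins the points
  // into SVG path data: "M" moves to the first point, "L" draws lines to the rest.
  return function (data) {
    return data
      .map((d, i) => (i === 0 ? "M" : "L") + x(d, i) + "," + y(d, i))
      .join("");
  };
}

// Usage: plot values at x = 10 * index, flipping y so larger values sit higher.
const line = makeLine(
  (d, i) => i * 10,
  (d) => 100 - d
);
console.log(line([4, 8, 15])); // → "M0,96L10,92L20,85"
```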

How might you use dynamic properties? To start with a simple functional example, color paragraphs by picking random colors from the rainbow:

d3.selectAll("p")
    .style("color", function() {
      return "hsl(" + Math.random() * 360 + ",100%,50%)";
    });

Or use the node index i, provided as the second argument, to alternate shades of gray for even and odd nodes:

d3.selectAll("p")
    .style("color", function(d, i) {
      return i % 2 ? "#fff" : "#eee";
    });
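A property can be either a constant or a function of data. For each node, a function value is called with the datum `d` and index `i`, while a constant is used as-is. The `resolveStyles` helper is hypothetical and works on a plain array of data rather than DOM nodes. It only illustrates the calling convention described above.

```javascript
// Hypothetical helper: compute one style value per node from a constant or a function(d, i).
function resolveStyles(data, value) {
  return data.map((d, i) =>
    // Functions are evaluated per node with the datum and index; constants are shared.
    typeof value === "function" ? value(d, i) : value
  );
}

const nodeData = [undefined, undefined, undefined, undefined]; // nodes with no bound data

// Constant: every node gets the same color.
console.log(resolveStyles(nodeData, "white")); // → ["white", "white", "white", "white"]

// Function of index: alternate shades for even and odd nodes, as in the example above.
console.log(resolveStyles(nodeData, (d, i) => (i % 2 ? "#fff" : "#eee")));
// → ["#eee", "#fff", "#eee", "#fff"]
```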

D3 also allows you to bind data to a selection; this data is available when computing properties. The data is specified as an array of arbitrary values (whatever you want), and each value is passed as the first argument (d) to property functions. The first element in the data array is passed to the first node in the selection, the second element to the second node, and so on. For example, if you bind an array of numbers to paragraph elements, you can use these numbers to compute dynamic font sizes:

d3.selectAll("p")
    .data([4, 8, 15, 16, 23, 42])
    .style("font-size", function(d) { return d + "px"; });

Once the data has been bound to the document, you can omit the data operator, and D3 will retrieve the previously-bound data per node. This allows you to recompute properties without explicitly respecifying the associated data.
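Pairing data with nodes by index, and later reusing the bound data, can be sketched with plain objects. D3 itself stores each node's datum in a `__data__` property. The `bindByIndex` and `fontSizes` helpers below are hypothetical and skip everything else D3 does, such as handling surplus data or surplus nodes.

```javascript
// Hypothetical sketch: pair the i-th datum with the i-th node, storing it on the node.
function bindByIndex(nodes, data) {
  const n = Math.min(nodes.length, data.length); // ignore surplus nodes or data here
  for (let i = 0; i < n; i++) nodes[i].__data__ = data[i];
  return nodes;
}

// Later property computations read the previously bound datum from each node,
// so the data does not need to be respecified.
function fontSizes(nodes) {
  return nodes.map((node) => node.__data__ + "px");
}

const ps = [{}, {}, {}];
bindByIndex(ps, [4, 8, 15]);
console.log(fontSizes(ps)); // → ["4px", "8px", "15px"]
```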

Enter and Exit

D3 can easily manipulate existing nodes, but what if the nodes don’t exist yet? Similarly, what if there are more nodes in the document than elements in your data array, and you want to remove the surplus? Using the enter and exit selections, you can add new nodes to match your data, and remove nodes that are no longer needed.

When data is bound to a selection of nodes, each element in the data array is paired with the corresponding node in the selection. If there are fewer nodes than data, the extra data elements form the enter selection, which you can instantiate by calling the append operator on the result of the enter operator. The append operator takes the name of the node to append to the document, such as “p” for paragraph elements:

d3.select("body").selectAll("p")
    .data([4, 8, 15, 16, 23, 42])
  .enter().append("p")
    .text(function(d) { return "I'm number " + d + "!"; });

A common pattern is to break the initial selection into three parts: the updating nodes to modify, the entering nodes to add, and the exiting nodes to remove.

// Update…
var p = d3.select("body").selectAll("p")
    .data([4, 8, 15, 16, 23, 42])
    .text(String);

// Enter…
p.enter().append("p")
    .text(String);

// Exit…
p.exit().remove();

By handling these three cases separately, you can perform only the necessary modifications on each set of nodes. This is particularly useful when specifying transitions. For example, with a bar chart you might initialize entering bars using the old scale, and then transition entering bars to the new scale along with the updating and exiting bars. If you want to share dynamic properties across enter and update, you can reselect nodes after instantiating the enter selection, or use the call (mix-in) operator.

Note that the updating nodes are actually the default selection—the result of the data operator. Thus, if you forget about the enter and exit selections, you will automatically select only the elements for which there exists corresponding data.
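The split into update, enter, and exit can be modeled as a pure function over arrays. This simplified sketch assumes index-based pairing (no key function). `joinByIndex` is a hypothetical name, and the result lists which data and nodes fall into each group rather than touching any document.

```javascript
// Hypothetical sketch of an index-based data join.
// nodes: existing nodes; data: new data. Returns the three groups.
function joinByIndex(nodes, data) {
  const shared = Math.min(nodes.length, data.length);
  return {
    // Update: pairs of (existing node, new datum) at the same index.
    update: data.slice(0, shared).map((d, i) => ({ node: nodes[i], datum: d })),
    // Enter: data with no node yet; new nodes must be created for these.
    enter: data.slice(shared),
    // Exit: nodes with no datum; these are typically removed.
    exit: nodes.slice(shared),
  };
}

// Two existing paragraphs, six data values: two updates, four entering values, no exits.
const join = joinByIndex(["p0", "p1"], [4, 8, 15, 16, 23, 42]);
console.log(join.update.length, join.enter, join.exit); // → 2 [15, 16, 23, 42] []
```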

To reiterate, D3 lets you transform existing documents (node hierarchies) based on data. This generalization includes creating new documents, where the starting selection is the empty node. D3 allows you to change an existing document in response to user interaction, animation over time, or even asynchronous notification from a third party. A hybrid approach is even possible, where the document is initially rendered on the server, and updated on the client via D3.

Transformation, not Representation

D3 does not provide a new graphical representation: unlike Processing, Raphaël, or Protovis, there is no new vocabulary of marks to learn. Instead, you build directly on standards such as CSS3, HTML5 and SVG. This approach offers numerous advantages. You have full access to the underlying browser’s functionality; for example, you can create elements using D3, and then style them with external stylesheets. You can use advanced features such as dashed strokes and composite filter effects. If browser makers introduce new features to CSS tomorrow, you’ll be able to use them immediately rather than waiting for a toolkit update. And, if you decide in the future to use a toolkit other than D3, you can take your enhanced knowledge of open standards with you!

Consider the wheel. In Processing, you create a circle using the ellipse operator, which takes four arguments: the x and y of the ellipse center, and the width and height. Raphaël provides an ellipse operator that takes the center and the horizontal and vertical radii, and a circle operator that takes three arguments: the center x and y, and the radius. Protovis defines pv.Dot and pv.Wedge mark types. D3, in contrast, does not reinvent the wheel, instead using the standard svg:circle element:

// svg is an existing selection of an svg:svg element,
// for example d3.select("body").append("svg:svg").
svg.append("svg:circle")
    .attr("cx", 50)
    .attr("cy", 40)
    .attr("r", 10);

Because D3 does not specify a particular representation of a circle, you can define alternate forms that may offer better performance or compatibility, such as pure HTML:

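// body is a selection of the body element, for example d3.select("body").
// The box below matches the circle above: center (50, 40), radius 10.
body.append("div")
    .style("position", "absolute")
    .style("left", "40px")
    .style("top", "30px")
    .style("width", "20px")
    .style("height", "20px")
    .style("border-radius", "10px")
    .style("background-color", "#000");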
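The div above gets the same geometry as the circle, assuming both share the same origin. Its top-left corner is the circle's center minus the radius, its width and height equal the diameter, and a border radius equal to the radius rounds the square into a circle. The hypothetical helper below makes that arithmetic explicit.

```javascript
// Hypothetical helper: CSS box for a circle with center (cx, cy) and radius r.
function circleToDivStyle(cx, cy, r) {
  return {
    position: "absolute",
    left: cx - r + "px", // left edge = center x minus radius
    top: cy - r + "px", // top edge = center y minus radius
    width: 2 * r + "px", // width and height are the diameter
    height: 2 * r + "px",
    "border-radius": r + "px", // rounding by the radius turns the square into a circle
  };
}

console.log(circleToDivStyle(50, 40, 10));
// → { position: "absolute", left: "40px", top: "30px", width: "20px", height: "20px", "border-radius": "10px" }
```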

D3 is easy to debug using the browser’s built-in inspector: the nodes that you manipulate are exactly those that can be inspected natively by the browser. Furthermore, operations are applied immediately (within the scope of operators such as style), and the selection object is an array of nodes.

Transitions

Given D3’s focus on manipulation (not just a one-time mapping of data to a static representation), it naturally includes support for smooth transitions. These gradually interpolate styles or attributes over time. Various easing functions are provided to vary the tweening, such as “elastic”, “cubic-in-out” and “linear”. D3 knows how to interpolate basic types, such as numbers and numbers embedded within strings (font sizes, path data, etc.). You can provide your own interpolator to extend transitions to more complex properties, such as a backing data structure.
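Numbers embedded in strings can be interpolated by pairing the numbers found in the start and end strings, blending each pair, and keeping the surrounding text from the end string. This simplified sketch is not D3's implementation. `interpolateEmbedded` is a hypothetical name, and the sketch assumes both strings contain the same numbers in the same positions.

```javascript
// Matches integers and decimals, with an optional sign and exponent.
const NUMBER = /[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/g;

// Hypothetical sketch: returns a function of t in [0, 1] producing an in-between string.
function interpolateEmbedded(a, b) {
  const from = (a.match(NUMBER) || []).map(Number);
  return function (t) {
    let k = 0;
    // Replace each number in b with its blend against the matching number in a.
    return b.replace(NUMBER, (m) => {
      const end = Number(m);
      const start = k < from.length ? from[k] : end; // unmatched numbers stay at the end value
      k++;
      return String(start + (end - start) * t);
    });
  };
}

const size = interpolateEmbedded("10px", "20px");
console.log(size(0), size(0.5), size(1)); // → 10px 15px 20px
```

The same function handles strings with several numbers. For example, halfway from "translate(0,0)" to "translate(100,50)" it produces "translate(50,25)".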

Transitions are trivially created from selections via the transition operator. To fade the background of the page to black, say:

d3.select("body").transition()
    .style("background-color", "black");

You can use CSS3 transitions, too! D3 does not replace the browser’s toolbox, but instead exposes it in a way that is easier to use. A more complex resizing of circles in a symbol map can still be expressed succinctly:

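// scale is assumed to be a number defined elsewhere; taking the square root
// makes each circle's area, not its radius, proportional to d.
d3.selectAll("circle").transition()
    .duration(750)
    .delay(function(d, i) { return i * 10; })
    .attr("r", function(d) { return Math.sqrt(d * scale); });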

The transition’s duration and delay parameters can be customized, and as with other properties, specified as functions of data. This is particularly convenient for running a staggered delay by index (i), allowing the viewer to follow individual elements across the transition more easily.
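A staggered delay combines with duration and easing as a function of elapsed time. Each element starts `i * 10` milliseconds after the previous one and runs for 750 milliseconds, and an easing function then reshapes its linear progress. The cubic-in-out curve below is a common textbook definition chosen for illustration. `progressAt` is a hypothetical helper, not part of D3.

```javascript
// A common cubic-in-out easing: slow start, fast middle, slow end; maps [0, 1] to [0, 1].
function cubicInOut(t) {
  return t < 0.5 ? 4 * t * t * t : 1 - Math.pow(-2 * t + 2, 3) / 2;
}

// Hypothetical helper: eased progress of element i at time `elapsed` (ms),
// given a per-element delay function and a shared duration.
function progressAt(elapsed, i, delay, duration, ease) {
  const t = (elapsed - delay(i)) / duration; // raw progress, may fall outside [0, 1]
  const clamped = Math.max(0, Math.min(1, t)); // not started yet -> 0, finished -> 1
  return ease(clamped);
}

const delay = (i) => i * 10;
// At 375 ms, element 0 is exactly halfway; element 100 (delay 1000 ms) has not started.
console.log(progressAt(375, 0, delay, 750, cubicInOut)); // → 0.5
console.log(progressAt(375, 100, delay, 750, cubicInOut)); // → 0
```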

By dirtying only the attributes that actually change during the transition, D3 minimizes overhead, allowing greater graphical complexity and higher frame rates. Transitions dispatch an event on end that allows sequencing of complex multi-stage transitions.

Subselections

Most documents have some hierarchical structure. For example, what if you wanted to first select all lists, and then select all their list items? By calling selectAll on an existing selection, you generate a subselection for each node:

d3.selectAll("ul")
  .selectAll("li")
    .text(function(d, i) { return "I'm number " + i + "!"; });

The result of the first selectAll contains all ul elements, while the second contains all li elements that are within ul elements. This results in a simple tree structure that mirrors the document:

[Figure: subselection tree]

The second selection is grouped according to the first selection: the index (i) for the list items (li elements) corresponds to their index within the list, rather than across all lists. By grouping elements, D3 allows you to maintain the hierarchical structure as you recursively descend into the document.

For example, if your associated data is hierarchical—say a list of multiple choice questions, each with a set of possible responses—you can map the list of questions to the first ul selection, and then each set of responses to the groups in the second li selection. The data property is evaluated for each group of the subselection:

d3.selectAll("ul")
    .data(questions) // an array of questions
  .selectAll("li")
    .data(function(d) { return d.responses; }) // a nested array of responses
    .text(function(d) { return d.text; }); // the text of the response

Thus, the data property can also be defined as a function, taking as an argument the data associated with the parent node. By combining subselection with the enter and exit operators, you can use D3 to construct and update complex hierarchical documents with a minimum amount of code.
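Grouping can be modeled with nested arrays. The outer array plays the role of the ul selection, each inner array is one group of li elements, and the data function is called once per group with the parent's datum. The `questions` shape (objects with a `responses` array of `{ text }` objects) follows the comments in the example above. The sample values and the `subselectText` helper are hypothetical.

```javascript
// Hypothetical sample data shaped like the multiple-choice example.
const questions = [
  { responses: [{ text: "Yes" }, { text: "No" }] },
  { responses: [{ text: "Red" }, { text: "Green" }, { text: "Blue" }] },
];

// Model of a grouped subselection: one group per parent, data computed per group
// from the parent's datum, and index i restarting at 0 within each group.
function subselectText(parents, childData, text) {
  return parents.map((parentDatum) =>
    childData(parentDatum).map((d, i) => text(d, i))
  );
}

const groups = subselectText(
  questions,
  (q) => q.responses,
  (d, i) => i + ": " + d.text
);
console.log(groups);
// → [["0: Yes", "1: No"], ["0: Red", "1: Green", "2: Blue"]]
```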

Data Keys

With static documents, it often suffices to map data elements to nodes by index. However, if your data changes, you may need to rebind new data to existing nodes. In this case, you provide a key function to the data operator; data is rebound to nodes by matching string keys on the old and new data. For example:

d3.selectAll("ul")
    .data(data, function(d) { return d.id; });

The key function also determines the enter and exit selections: the new data for which there is no corresponding key in the old data become the enter selection, and the old data for which there is no corresponding key in the new data become the exit selection. The remaining data become the default update selection.
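A keyed join can be built on a map from key to old datum. This sketch sorts old and new data into enter, update, and exit with the key function, as described above. `joinByKey` is a hypothetical name. It works on data arrays rather than nodes and ignores details such as duplicate keys.

```javascript
// Hypothetical sketch of a keyed data join over plain arrays.
function joinByKey(oldData, newData, key) {
  // Keys are compared as strings.
  const oldByKey = new Map(oldData.map((d) => [String(key(d)), d]));
  const newKeys = new Set(newData.map((d) => String(key(d))));

  const update = [];
  const enter = [];
  for (const d of newData) {
    const k = String(key(d));
    // A new datum whose key existed before updates that node; otherwise it enters.
    if (oldByKey.has(k)) update.push({ old: oldByKey.get(k), datum: d });
    else enter.push(d);
  }
  // Old data whose key is absent from the new data exit.
  const exit = oldData.filter((d) => !newKeys.has(String(key(d))));
  return { update, enter, exit };
}

const byId = (d) => d.id;
const result = joinByKey(
  [{ id: 1 }, { id: 2 }, { id: 3 }],
  [{ id: 2 }, { id: 3 }, { id: 4 }],
  byId
);
console.log(result.update.length, result.enter, result.exit);
// → 2 [{ id: 4 }] [{ id: 1 }]
```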

To continue the previous example of a multiple-choice test, here is the skeleton code to update the state of the document to match the array of questions:

// Update… (select within body so that entering ul elements are appended to body)
var ul = d3.select("body").selectAll("ul")
    .data(questions, function(d) { return d.id; });

// Enter…
ul.enter().append("ul");

// Exit…
ul.exit().remove();

For more on data keys, see part 2 of the bar chart tutorial.

Modules

D3 is highly extensible, with optional modules available as needed, without bloating the core library. The only required feature of D3 is the selection implementation, along with transitions. For convenience, the default d3.js file also includes standard SVG shape generators and utilities, such as scales and data transformations.

Several additional modules are available that are not included in the default build. The geo module adds support for geographic data, such as translating GeoJSON into SVG path data. The Albers equal-area projection is included in this module, as it is well-suited to choropleth maps. The geom module includes several computational geometry utilities, such as algorithms for Voronoi diagrams and convex hulls. The csv module supports reading and writing comma-separated values, a common alternative to JSON. Lastly, the layout module includes various reusable visualization layouts, such as force-directed graphs, treemaps, and chord diagrams.

Copyright © 2011 Mike Bostock