Bringing Data to Life

Building Reusable Data Visualization Components for a Modern Web

This is part one of a two-part series discussing why and how you can build your own reusable data visualization components to improve your development workflow and enjoy your work more. This article is aimed at intermediate-level JavaScript data visualization developers with at least a year or two of experience. That being said, novice and expert developers alike will most likely find something useful, and possibly think about approaching their data visualization development in ways they hadn’t previously.

In Part 1 we will cover what a reusable component is and why such components are helpful for building data visualizations. Part 2 of this series will talk about how to go about actually building these components, with plenty of real-world examples and code.

What is a reusable data visualization component, and why should we be concerned with building one? In my experience over the last 15 years designing and developing hundreds of data visualization solutions, I’ve come to realize that following a development pattern of building reusable data visualizations allowed me to enjoy my work more. While I loved the thrill of quickly compositing (read cut-and-paste) sample code liberally borrowed from around the web to build a visualization, the refinement process would always become progressively more painful and tedious. At the same time, I didn’t want to sit down and write copious amounts of generic code just to prototype my idea.

I wanted to focus on the fun stuff: innovation, novel designs, and making cool things quickly. But, besides enjoying my work more, there are more fundamental and pragmatic reasons to consider developing data visualizations for reusability.

What is a reusable component?

At a high level, a reusable data visualization component is an encapsulated unit of code that exposes a defined interface allowing a developer to pass in parameters, call methods, and respond to events to achieve a specific visual and functional outcome.

For instance, with a reusable chart a developer could simply reference the required script, instantiate the chart, and pass it the appropriate data and configuration parameters. The developer would not have to worry about any of the implementation details, just how they wanted the chart to look and behave with respect to their data. Why is this better than simply cutting and pasting some code snippets into our app? Here are a few (but not all) of the benefits of developing with reusability in mind.

  • Encapsulation: All the code relevant to the function of the component resides within that component while application-implementation-specific details can be abstracted out. Why is this helpful? This follows the computer science principle of Separation of Concerns, which is a long way of saying it helps keep you focused on the code for the task at hand. By physically separating code into logical groupings you can be better organized when it comes to your development. Mixing your application logic with your component implementation logic can turn into a real headache down the line as complexity and intermingled code become more difficult to manage.
  • Portability: A well-designed component with the proper levels of abstraction can be used in multiple places within your application or across a wide variety of projects while only having to maintain a single codebase. So, if your application requires a custom bar chart in 10 different locations, you don’t have to copy and paste code to each of those locations; you can simply reference the one bar chart. In the same vein, the latest and greatest visualization you just built for project A could most likely be used on project B with minimal or no modifications. With this approach, if you want to make a modification to the component code, or fix a bug, you are generally just fixing it in one place, not in the 10 other places where you copied and pasted that code.
  • Playing Nicely with Others: With a well-thought-out component API, you can more easily share your work with other developers, who will only need to understand how to use your API rather than the inner details of how your component works. So, when working with a team of engineers, it is easier to have one or more developers focus specifically on creating data visualizations that other developers then integrate into the application.
  • Better Code: Building an encapsulated and reusable component makes you think about your code a little more than just whipping together a prototype. It forces you to think about not only the purpose of the visualization, but how to better abstract it and create the logic for it. Once you have done it a few times, patterns start to emerge and evolve in your data visualization work that end up making you a more efficient developer.
  • Development Speed: When done well, it can actually be faster to build a reusable component than a prototype. The more complex a component is, the more important it is to be clear in your own mind about the purpose, place, and flow of your visualization logic. With an appropriate component lifecycle pattern, you can more finely break down the common functional tasks it takes to render a visualization, which will make it easier to modify and extend your component as you will be able to more quickly drill into a functional area of your code knowing where to look.
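
To make the encapsulation and portability points above concrete, here is a minimal sketch. It is not the framework discussed in this series: `createBarChart`, `setData`, `setHeight`, and `layout` are hypothetical names chosen for illustration, and the component returns plain bar descriptors instead of drawing anything so that the sketch runs without a browser.

```javascript
// Hypothetical component factory: all names are illustrative, not a real library API.
function createBarChart() {
  // Private state: callers can only reach it through the returned interface.
  let data = [];
  let height = 100;

  // Private helper: an implementation detail the application never sees.
  function scale(value, max) {
    return max === 0 ? 0 : (value / max) * height;
  }

  return {
    setData(values) { data = values.slice(); },
    setHeight(px) { height = px; },
    // Returns plain bar descriptors instead of touching the DOM.
    layout() {
      const max = Math.max(0, ...data);
      return data.map((value, index) => ({ index, barHeight: scale(value, max) }));
    },
  };
}

// The same component code serves two independent places in an application.
const salesChart = createBarChart();
salesChart.setData([5, 10, 20]);

const trafficChart = createBarChart();
trafficChart.setHeight(50);
trafficChart.setData([1, 4]);

console.log(salesChart.layout());
console.log(trafficChart.layout());
```

A bug fix inside `scale` would reach both charts at once, which is the single-codebase benefit described in the Portability bullet.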

I realize that, when you read everything I describe above about building reusable components, nothing about this seems fast or easy compared to copying and pasting code snippets, but bear with me and let me explain further.

When I first started working with D3.js, one of the things I loved was ALL of the great examples I could find online with the accompanying source code. I would have an idea for a visualization and then hunt across the web for various examples that embodied one or more features or techniques I wanted to employ. I would grab this sample code and start merging it together into a functional but Frankenstein-like hairball of code. As I would iterate on the design/functionality, I would refactor the code as needed, trying to make it easier to understand and work on. But inevitably, as the prototype evolved, the code would become more fractured and brittle, never really following a consistent workflow. If the prototype was deemed worthy of production use, I would then have to go through and refactor the code to follow more accepted coding best practices. I did this for several years. 🙁

Sometimes, when I would get a new project and want to implement something similar to a prototype I had built a year or two prior, I would grab that prototype code, but it looked foreign to me. After 30 years of programming in dozens of languages on various platforms, I have written literally millions of lines of code, and trying to remember the intricate details of a one-off prototype from a couple of years prior was challenging, to say the least. So I was forced to learn it all over again… not good, and definitely not fast. There had to be a better way.

A few years prior I had written an open source data visualization component framework for Adobe Flex (www.axiis.org). Axiis was nowhere near as robust or powerful as D3, but it served its purpose and was a great exercise in building a component framework. I also learned a lot about what I did and didn’t like about working within proprietary frameworks.

The two things that I found most irksome about working with more traditional object-oriented component frameworks were:

  • Boilerplate: Having to write lots of repetitive code that performed the same task over and over again — think of public setter/getters that take up at least 10 lines of code (depending on syntax) or more just to set one public property. The same holds true of events, styles, and other common attributes of a reusable component. Code like this is tedious to write and does not focus on the logic unique and specific to the component you are building — just scaffold code needed to support OO design constructs as implemented in the language du jour.
  • Deep Class Hierarchies: Code that follows classic object-oriented principles, where objects inherit properties and functionality from parent objects and/or compose functionality from other objects. While, theoretically, object-oriented principles make a lot of sense, pragmatically they aren’t always the most efficient way to develop code or maintain it. Having to debug down several levels of an inheritance chain when trying to fix a bug or just figure out how an object works is not easy, and definitely makes it hard to really understand how the component functions as a whole. And most of the time, unless the class hierarchies are really well designed, they tend to be brittle and require tweaking when you need to modify some core functionality.

What I really liked about some of these other frameworks was the consistency and repeatability of how the objects were structured in code. Well-designed frameworks have a clear and concise object lifecycle covering object instantiation, property setting, event capture, measuring, rendering, styling, and tear-down. Each component in a framework follows the same general pattern. It makes it a lot easier to debug issues when you know where to go looking for them. Instead of having to sort through a bunch of interdependent classes or lines of code, you can usually get to a specific function or piece of functionality quickly to investigate bugs or change behavior.

So the challenge I set out for myself was to create a reusable component framework specific to data visualization work that would not hinder my speed or creativity when it came to prototyping, while still providing all the benefits of encapsulated reusability and predictable object patterns so the code was more readable and maintainable. Here is a list of the primary functional requirements I wanted to achieve:

Public properties — A straightforward way to create public object properties and default values without having to write a lot of boilerplate setter/getter functions.
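
One way to meet this requirement, sketched here as an assumption rather than as the framework's actual implementation, is to generate a combined getter/setter for each property from an object of defaults. `addProperties` is a hypothetical helper name; the accessor behavior (return the value when called with no arguments, otherwise set it and return the component) follows the convention D3.js uses for its own accessors.

```javascript
// Hypothetical helper: generates one combined getter/setter per declared property.
function addProperties(component, defaults) {
  // Property values live in a private copy of the defaults.
  const values = Object.assign({}, defaults);

  Object.keys(defaults).forEach(function (name) {
    component[name] = function (value) {
      if (arguments.length === 0) return values[name]; // called as a getter
      values[name] = value;                             // called as a setter
      return component;                                 // returning the component allows chaining
    };
  });

  return component;
}

// Three public properties with defaults, declared in a single line.
const viz = addProperties({}, { width: 300, height: 200, data: [] });

viz.width(640);
console.log(viz.width());  // 640
console.log(viz.height()); // 200
```

Adding another public property then costs one extra entry in the defaults object instead of another block of hand-written accessor code.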

Events — The ability for a component builder (me) to declare specific events that a developer using the component can respond to. This would include user interaction events such as ‘mouseover’ or ‘click’, as well as component lifecycle events like ‘measure’, ‘update’, and also property watchers like ‘data changed’, or ‘width changed’.
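
As a rough illustration of declared events, the sketch below keeps a list of handlers per declared event name and rejects names that were never declared. `createEvents`, `on`, `fire`, and the event name `widthchange` are hypothetical; in a D3-based component the d3-dispatch module could fill the same role.

```javascript
// Hypothetical event helper: names are illustrative only.
function createEvents(declaredNames) {
  const handlers = {};
  declaredNames.forEach(function (name) { handlers[name] = []; });

  function requireDeclared(name) {
    if (!handlers[name]) throw new Error("Unknown event: " + name);
  }

  return {
    // Used by the application developer to respond to an event.
    on(name, handler) {
      requireDeclared(name);
      handlers[name].push(handler);
      return this;
    },
    // Used by the component builder to announce that something happened.
    fire(name, payload) {
      requireDeclared(name);
      handlers[name].forEach(function (handler) { handler(payload); });
    },
  };
}

// Interaction, lifecycle, and property-watcher events declared up front.
const events = createEvents(["click", "measure", "update", "widthchange"]);
const log = [];

events
  .on("measure", function () { log.push("measured"); })
  .on("widthchange", function (width) { log.push("width is now " + width); });

events.fire("widthchange", 400);
events.fire("measure");
console.log(log); // [ 'width is now 400', 'measured' ]
```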

Dynamic Styles — The ability to apply styles to various display elements from outside the component by using static values OR functions. This is probably one of the most powerful features of D3.js and for data visualizations in general. The ability to call something like the code shown below allows a developer to employ a wide array of data-bound styling techniques that make customizing a visualization for a specific implementation very easy.

viz.style('bar-fill', 'red');
/*** OR ***/
viz.style('bar-fill', function (d) {
  return (d.value < 0) ? 'red' : 'green';
});
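
For readers wondering how a component could accept either form, here is a minimal sketch under my own assumptions. `createStyles` and `resolve` are hypothetical names; the only idea taken from the calls above is that a style value may be a constant or a function of the datum.

```javascript
// Hypothetical style store: a style is either a static value or a function of the datum.
function createStyles() {
  const styles = {};

  return {
    // style(name) reads a style; style(name, value) sets it and allows chaining.
    style(name, value) {
      if (arguments.length === 1) return styles[name];
      styles[name] = value;
      return this;
    },
    // Called by the component while rendering each element.
    resolve(name, d, i) {
      const value = styles[name];
      return typeof value === "function" ? value(d, i) : value;
    },
  };
}

const viz = createStyles();
viz
  .style("bar-fill", function (d) { return d.value < 0 ? "red" : "green"; })
  .style("bar-stroke", "black");

const data = [{ value: -3 }, { value: 7 }];
const fills = data.map(function (d, i) { return viz.resolve("bar-fill", d, i); });
const strokes = data.map(function (d, i) { return viz.resolve("bar-stroke", d, i); });

console.log(fills);   // [ 'red', 'green' ]
console.log(strokes); // [ 'black', 'black' ]
```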

Common API Convention — Use a proven API convention that developers are already familiar with. In this case, I wanted to use a function chaining pattern that is similar to what you see with D3.js or jQuery. I also preferred explicitly setting configuration properties on a component versus passing in long configuration objects like you see with many other libraries. Function chains allow you to write relatively succinct and self-explanatory procedural code something along the lines of this:

viz.data(myData)
  .width(300)
  .height(300)
  .style("bar-fill", "red")
  .update();

Code like this is relatively easy to read and understand at a glance, and while it is still procedural code it reads much more like markup.
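
The mechanics behind a chain like this are simple: every method returns the component itself. The sketch below is a hypothetical stand-in, not the component from this series; `createViz` and `inspect` are invented names, and `update()` only counts calls rather than rendering.

```javascript
// Hypothetical chainable component: every configuration method returns `viz`.
function createViz() {
  const config = { data: [], width: 0, height: 0, styles: {} };
  let updateCount = 0;

  const viz = {
    data(values) { config.data = values; return viz; },
    width(px) { config.width = px; return viz; },
    height(px) { config.height = px; return viz; },
    style(name, value) { config.styles[name] = value; return viz; },
    // Setters only record configuration; the work is deferred until update().
    update() { updateCount += 1; return viz; },
    // Added only so this sketch can show what the chain produced.
    inspect() { return { config: config, updateCount: updateCount }; },
  };

  return viz;
}

const myData = [3, 1, 4];
const state = createViz()
  .data(myData)
  .width(300)
  .height(300)
  .style("bar-fill", "red")
  .update()
  .inspect();

console.log(state.config.width, state.updateCount); // 300 1
```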

Object Lifecycle — A consistent set of discrete steps the component uses to perform its functionality, applicable to a wide array of data visualization use cases. One of the more helpful patterns in a data visualization workflow is to separate the measuring/layout of graphical elements from the actual rendering of them. So in the case of a bar chart, we might call viz.update(), which first calls a measuring routine that determines the size, shape, and placement of each bar; once that is complete, the update function uses these measured values to add/update/delete DOM elements. This makes your visualization code much easier to read and understand, as layout logic is separated from rendering logic.
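
The measure-then-render split can be sketched as follows. This is an illustration under stated assumptions, not the lifecycle the framework actually implements: `createBarViz`, `measure`, and `render` are hypothetical names, values are assumed to be non-negative, and the render step returns SVG-like strings as a stand-in for the DOM work a D3 data join would do in a browser.

```javascript
// Hypothetical bar component with separate measure and render steps.
function createBarViz() {
  let data = [];
  const width = 300;
  const height = 100;
  let measured = []; // output of the measure step, input to the render step

  // Step 1: layout only. Computes geometry and touches no display elements.
  function measure() {
    const max = Math.max(1, ...data);
    const barWidth = data.length ? width / data.length : 0;
    measured = data.map(function (value, i) {
      const barHeight = (value / max) * height;
      return { x: i * barWidth, y: height - barHeight, width: barWidth, height: barHeight };
    });
  }

  // Step 2: rendering only. Strings stand in for DOM elements in this sketch.
  function render() {
    return measured.map(function (b) {
      return `<rect x="${b.x}" y="${b.y}" width="${b.width}" height="${b.height}"/>`;
    });
  }

  return {
    data(values) { data = values; return this; },
    // update() always measures first, then renders from the measured values.
    update() { measure(); return render(); },
  };
}

const rects = createBarViz().data([50, 100]).update();
console.log(rects.join("\n"));
```

Because `render` reads only from `measured`, a layout bug and a drawing bug each have exactly one place to look.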

All the code in one place — I wanted each component to truly be encapsulated, with all the code necessary for the component in ONE file. I find it extremely tedious when, in order to modify a component’s functionality, I have to edit two or more tightly coupled code files all referring to the same component. Additionally, I wanted to keep the code as succinct as possible while still making it readable. The caveat is that you don’t want monolithic components with thousands of lines of code, since that clearly defeats the purpose of being easy to read and understand. In practice, most components I have built this way are less than 500 lines of code and very easy to manage in one file.

Ability to highly customize — One of the challenges I have found with other charting libraries is that if I want to customize a pre-built chart in a way the original developers didn’t explicitly plan for, it can be very challenging. I either end up having to figure out a clever hack (that will probably break with the next component release) or I might even have to modify the component source code. For this framework, I wanted to assume that the application developer would want to customize and tweak the component in ways I didn’t anticipate, and to provide the hooks to do that in a flexible way. I also wanted to make it really easy to modify the source code of any given component should the developer so choose.

Leverage D3.js — I think it would be not only arrogant but a colossal waste of time to think I was going to build something fundamentally better than what Mike Bostock has done with D3.js. It is such a data viz powerhouse and is full of great utilities that make building data visualizations so much easier. While the enter/update/exit data joins D3.js employs are a little unconventional, D3.js really is purpose-built for doing this type of work. At the same time, for more novice JavaScript developers new to D3.js, there can be a bit of a learning curve that takes a while to overcome. I wanted this framework to work right out of the box for a novice developer while at the same time not being limited by my preconceived API, letting advanced developers use D3.js to modify and add functionality.

No build tools required — Finally I wanted to make sure developers were not required to use a toolchain to compile or work with this library. Simply reference the library in a <script> tag and you would be off to the races.

In Part 2 of this article, I will walk you through the concept and the thought process I followed in building a super lightweight framework (less than 500 lines of code) that supports the objectives above. With this lightweight framework, I will then show you step by step how you can go about building your own reusable data visualization components that are easy to read and modify. All of the code will be open source (MIT) and free for you to use in your projects.
