Wednesday 31st May, 2017

Iterators in DataTables

By its fundamental nature DataTables operates with repeating data. As developers we naturally want to be able to access and work with that data, processing each item one at a time or manipulating them in some way. This we term iterating or looping over the items. DataTables has a number of built-in iteration methods, but if you are new to the DataTables API, it isn't always obvious which method you should use.

In this post I want to take some time to delve into the API and explain the iterators that are built into DataTables and when it is best to use each one. I'll also discuss if there are any penalties for using each iterator (hint - yes, there is usually a trade-off between ease of use and performance).

There are three main iterators in DataTables:

rows().every(), columns().every() and cells().every() - grouped together as they all share the same basic behaviour.
each()
iterator()

There are also a number of helper methods which can make accessing data from a repeating, array-like structure easier, such as reduce() and map(). These will be covered at the end, but in less depth.

Fundamentals

The DataTables API is array-like in nature - i.e. it looks a lot like a JavaScript array. It has a length property, along with the slice() and push() methods that you would normally associate with an array - indeed it makes use of many of the array functions built into JavaScript. You can even use a for loop to iterate over it, but it is not actually an array. It is an instance of a class (DataTables.Api) which looks and acts a lot like an array.
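
To make the array-like idea concrete, here is a minimal sketch of a class that behaves like an array without being one. ArrayLikeApi is a hypothetical stand-in written for this illustration only; it is not how DataTables.Api is actually implemented.

```javascript
// Hypothetical stand-in for an array-like API object (NOT the real DataTables.Api)
class ArrayLikeApi {
    constructor( items ) {
        // Copy the items onto numeric keys and expose a length, as an array would
        for ( var i = 0; i < items.length; i++ ) {
            this[ i ] = items[ i ];
        }
        this.length = items.length;
    }

    // Borrow a built-in array function and re-wrap the result so calls can be chained
    slice( start, end ) {
        return new ArrayLikeApi( Array.prototype.slice.call( this, start, end ) );
    }

    // Convert to a genuine array
    toArray() {
        return Array.prototype.slice.call( this );
    }
}

var api = new ArrayLikeApi( [ 'Airi', 'Brielle', 'Cedric' ] );

// A plain for loop works because of the numeric keys and the length property
var seen = [];
for ( var i = 0, ien = api.length; i < ien; i++ ) {
    seen.push( api[ i ] );
}

// ...but the object is still not an array
var isRealArray = Array.isArray( api ); // false
```

The point of the sketch is that anything with numeric keys and a length can be looped over and passed to the built-in array functions, which is what allows an API object to feel like an array while carrying its own methods.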

It's important to keep the above in mind as we discuss the iteration methods, as this is key to DataTables and how its API works (along with the concept of chaining).

Core iteration methods

The iteration method you should use will depend upon the operation or data retrieval that you actually want to perform on the table. As a quick summary:

rows().every(), columns().every() and cells().every() - Use when you want to perform an API method on each selected item, but there is a performance penalty.
each() - Iterator for any data type that is not a row, column or cell (i.e. use for DOM elements and data)
iterator() - Low-level API that is very fast, but carries a significant development overhead.

*.every methods

You might have noticed already that the *.every() method is always shown chained onto one of the table item access API methods (rows().every(), columns().every() and cells().every()) rather than simply being described as every() like each() and the other utility methods.

The exact behaviour of each *.every() method, and the parameters passed into its callback, are specific to the item access method being used. Each one provides an easy way to access a DataTables API object for each of the items selected, in the scope of that object.

Let's use an example to illustrate: you want to use row().child.show() on every row in a table to make all child rows visible. There is no plural version of that method, so you need to access each row in turn and call that method. You might be tempted to do something like:

var nodes = table.rows().nodes();
for ( var i=0, ien=nodes.length ; i<ien ; i++ ) {
    table.row( nodes[i] ).child.show();
}

This will work but it is cumbersome since you need to select all rows, loop over them, select each individual row and then perform the required operation on them.

The *.every() methods will automatically change the scope of the inner function to be a DataTable API instance for the item in question. This means that you can use this in place of table.row( ... ) in the above - i.e. the selection is already done for you. Thus we can rewrite the above as:

table.rows().every( function () {
    this.child.show();
} );

Much easier to read and far more concise!

This works equally for the columns() and cells() methods as well, but cannot be used for any other data type the API works with. You cannot use rows().data().every() for example - the whole reason for the *.every() functions is to make it easy to access an API instance for the item, which isn't a valid thing to do on the other data types.
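
As a sketch of the same pattern applied to columns, the helper below sums the data in every column. It assumes that table is an initialised DataTables API instance and that every cell holds a number; columnTotals is a hypothetical name chosen for this example.

```javascript
// Hypothetical helper: total the data in each column of a DataTable.
// Assumes `table` is a DataTables API instance and all cell data is numeric.
function columnTotals( table ) {
    var totals = [];

    table.columns().every( function ( colIdx ) {
        // `this` is the API instance for a single column, so column-level
        // methods such as data() can be called on it directly
        totals[ colIdx ] = this.data().reduce( function ( a, b ) {
            return a + b;
        }, 0 );
    } );

    return totals;
}
```

Calling columnTotals( table ) would then give one total per column, indexed by column index.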

Now the downside - because of the context switching the *.every() methods are the slowest iterators in DataTables. A new API instance must be created for every item and the function context switched to that instance. While this is fast in modern browsers, on large data sets it can still be noticeable. As such these methods shouldn't be used where maximum performance is required (for example a drag event). However, their ease of use can't be overstated and they should be used in the majority of cases (click events for example).
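
The cost described above can be sketched in plain JavaScript. Everything here (ItemApi, everyStyle, eachStyle) is a hypothetical illustration of the idea, not DataTables' internal code: the every()-style loop has to build a wrapper object for each item before calling the callback, while the each()-style loop just hands the value over.

```javascript
// Illustrative only: these names are hypothetical, not DataTables internals.
var instancesCreated = 0;

// Stand-in for the per-item API instance that every() provides as `this`
function ItemApi( item ) {
    instancesCreated++;
    this.item = item;
}
ItemApi.prototype.data = function () {
    return this.item;
};

// every()-style loop: build a wrapper for each item, then call fn with it as `this`
function everyStyle( items, fn ) {
    for ( var i = 0, ien = items.length; i < ien; i++ ) {
        fn.call( new ItemApi( items[ i ] ), i );
    }
}

// each()-style loop: pass the raw value straight to the callback
function eachStyle( items, fn ) {
    for ( var i = 0, ien = items.length; i < ien; i++ ) {
        fn( items[ i ], i );
    }
}

var rows = [ { name: 'Airi' }, { name: 'Brielle' }, { name: 'Cedric' } ];
var viaEvery = [];
var viaEach = [];

everyStyle( rows, function () {
    viaEvery.push( this.data().name );
} );
var createdByEvery = instancesCreated; // 3: one wrapper per row

eachStyle( rows, function ( value ) {
    viaEach.push( value.name );
} );
var createdByEach = instancesCreated - createdByEvery; // 0: no wrappers needed
```

Both loops visit the same data; the difference is the extra object construction and the call with a switched context on every pass, which is what adds up on large data sets.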

each() method

Where the *.every() methods don't go, the each() method does! One of the limitations of *.every() is that it can only be chained to the three plural item selector methods as mentioned above. For all other data types that the DataTables API can carry we use the each() method to access the data in the instance.

Many of you will be familiar with the jQuery.each method which can be used to iterate over objects and arrays - DataTables' own each() method is basically the same: a callback function is called once for every item that is in the instance, allowing it to be operated on:

table
    .rows()
    .nodes()
    .each( function( value, index, api ) {
        $( value ).addClass( 'loopy' );
    } );

In this case we use rows().nodes() to get all of the row nodes in the table (tr elements) and then loop over them adding a class to each using jQuery.

This iterator method is very fast since it doesn't require a new API instance to be created every time the callback function is executed. Having said that it still isn't as fast as a traditional for loop. The above could be written as the following if this is in a performance critical area of your code:

var nodes = table.rows().nodes();

for ( var i=0, ien=nodes.length ; i<ien ; i++ ) {
    $( nodes[i] ).addClass( 'loopy' );
}

It is worth noting that the above is for code demonstration purposes only - jQuery has very good array handling built in (it is array-like as well after all) and we could just pass the array of nodes to jQuery. We could also use the to$() method to convert the DataTables API instance to a jQuery one:

table.rows().nodes().to$().addClass( 'loopy' );

iterator() method

The iterator() method is similar to the *.every() methods in that it can be used to access item information from the table. But in this case it doesn't create a new API instance and doesn't change the callback function's scope. Instead it simply provides an index to the item in DataTables' internal data store - settings().

This is where things can get a little hairy. If you review the settings() documentation you will note that it is strongly recommended that you do not use it! The information in the settings object is considered to be a non-public API and the parameter names can change between versions without warning. However, any discussion of iterators in DataTables would be seriously lacking if it didn't mention the iterator() method.

The primary advantage of the iterator() method is that it is fast. It is a simple loop with a callback function and an index to look data up with. It is used by the API methods that DataTables provides, and if you are writing code that has to perform extremely well on the micro level, this is how you would do it.

I would strongly recommend that any use of iterator() be written inside a custom API method, so that if any internal information you access changes between versions, you have only a single point of change. In general it is best to stick to the *.every() methods when accessing the table items in "user space"!
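
A rough sketch of that recommendation is shown below. It assumes Api is the DataTables API constructor (normally reached as $.fn.dataTable.Api) and registers a plug-in method whose name, rows().indexList(), is hypothetical. The callback deliberately only returns the row index, so that nothing in the non-public settings object is relied upon.

```javascript
// Sketch only: `rows().indexList()` is a hypothetical plug-in method name.
// `Api` is assumed to be the DataTables API constructor ($.fn.dataTable.Api).
function registerIndexList( Api ) {
    Api.register( 'rows().indexList()', function () {
        // iterator() calls the function once per selected row, passing the
        // table's settings object and the index of the row
        return this.iterator( 'row', function ( settings, rowIdx ) {
            // Any lookup into `settings` (a non-public structure) would go
            // here, keeping version-sensitive code in this one place
            return rowIdx;
        }, true ); // true: collect the return values into the result set
    } );
}
```

With this in place, calling registerIndexList( $.fn.dataTable.Api ) once would be expected to make table.rows().indexList() available, and any later change to the internals would only need to be handled inside this one function.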

Utility methods

While the discussion above covers the most commonly used iteration methods in DataTables, there are a number of other methods that are worth highlighting:

map()

Very similar to the each() method (and also its jQuery cousin), the map() method is used to create one array from another. Consider for example if we had an array of objects and we want to extract a single data point from the objects - e.g. a totalCost parameter:

var totalCosts = table
    .rows()
    .data()
    .map( function ( data ) {
        return data.totalCost;
    } );

pluck()

It's not uncommon to want to do the above - in fact it is known as "plucking data" and DataTables provides a utility method to extract a single data point from an object with ease: pluck(). With this method we can write the above as:

var totalCosts = table.rows().data().pluck( 'totalCost' );

It is worth noting that although pluck() is useful for accessing a single data point, it can't be used to build more complex data objects. Nor can it be used to access nested data. For that the map() method would still come into play!
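
The difference can be sketched on a plain JavaScript array, using the built-in Array map() with a callback of the same shape as the map() example above. The row objects and their field names (name.first, name.last, totalCost) are made up for this illustration.

```javascript
// Plain-array sketch; the row objects and field names are hypothetical
var rowData = [
    { name: { first: 'Airi', last: 'Satou' }, totalCost: 120 },
    { name: { first: 'Cedric', last: 'Kelly' }, totalCost: 80 }
];

// What plucking a top-level property amounts to
var totalCosts = rowData.map( function ( data ) {
    return data.totalCost;
} );

// Nested data: reach into the object inside the callback
var firstNames = rowData.map( function ( data ) {
    return data.name.first;
} );

// A more complex object built from several data points
var summaries = rowData.map( function ( data ) {
    return {
        label: data.name.first + ' ' + data.name.last,
        cost: data.totalCost
    };
} );
```

The first case is the one a pluck-style shortcut covers; the other two need a callback, which is where map() earns its place.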

reduce()

The reduce() method is used to transform an array into a single scalar value. Typically this is used for summation calculations in a DataTable, although you could use it for any other calculation that requires the dataset to be reduced to a single value. Continuing our example of totalCost, if we want to sum that value we could use:

var totalCost = table
    .rows()
    .data()
    .pluck( 'totalCost' )
    .reduce( function ( a, b ) {
        return a + b;
    }, 0 );
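
The role of the initial value (the 0 passed as the second argument) is easiest to see on a plain array. The sketch below uses JavaScript's built-in Array reduce() with made-up numbers; it also shows a pitfall worth checking for if your data happens to be held as strings rather than numbers.

```javascript
// Plain-array sketch of the summation; the sample values are made up
function sum( a, b ) {
    return a + b;
}

var costs = [ 120, 80, 45.5 ];
var total = costs.reduce( sum, 0 ); // 245.5

// With the initial value, an empty set simply reduces to 0
var emptyTotal = [].reduce( sum, 0 ); // 0

// Without it, a native array throws a TypeError on an empty set
var threwOnEmpty = false;
try {
    [].reduce( sum );
} catch ( e ) {
    threwOnEmpty = e instanceof TypeError;
}

// If the values are strings, + concatenates instead of adding...
var concatenated = [ '120', '80' ].reduce( sum, 0 ); // '012080'

// ...so convert each value to a number inside the callback
var parsedTotal = [ '120', '80' ].reduce( function ( a, b ) {
    return a + parseFloat( b );
}, 0 ); // 200
```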

toArray() and to$()

While the DataTables API instance is useful and provides a number of utility methods, there are times when you might just want a plain array of data (for example when you JSON.stringify() it), which is what toArray() gives you, or you want to convert the DataTables API instance to be a jQuery object so you can use the jQuery API, which is what to$() is for (typically when working with rows().nodes(), column().nodes() or cells().nodes()).

var data = table.rows().data().toArray();
JSON.stringify( data );
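
As a rough illustration of why the conversion to a plain array matters for serialisation, consider how a generic array-like object behaves. The arrayLike object here is a hypothetical stand-in, not a real DataTables API instance (which carries additional properties of its own).

```javascript
// Hypothetical array-like stand-in for a result set (not a real API instance)
var arrayLike = { 0: 'Airi', 1: 'Cedric', length: 2 };

// Serialised as-is, it is treated as an object, so the keys and the
// length property come along too
var asObject = JSON.stringify( arrayLike );
// '{"0":"Airi","1":"Cedric","length":2}'

// Converted to a true array first, only the values are serialised
var asArray = JSON.stringify( Array.prototype.slice.call( arrayLike ) );
// '["Airi","Cedric"]'
```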

unique()

Finally in this discussion there is the unique() method. This quite simply takes a result set and removes any duplicate values:

table
    .rows()
    .data()
    .pluck( 'lastName' )
    .unique();
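
For readers who want to see what removing duplicates amounts to, here is a plain-array sketch. uniqueValues is a hypothetical helper written for this post; it is not DataTables' own implementation and the sample names are made up.

```javascript
// Plain-array sketch of de-duplication (not DataTables' own implementation)
function uniqueValues( values ) {
    var out = [];

    for ( var i = 0, ien = values.length; i < ien; i++ ) {
        // Keep a value only the first time it is seen
        if ( out.indexOf( values[ i ] ) === -1 ) {
            out.push( values[ i ] );
        }
    }

    return out;
}

var lastNames = [ 'Satou', 'Kelly', 'Satou', 'Cox', 'Kelly' ];
var distinct = uniqueValues( lastNames ); // [ 'Satou', 'Kelly', 'Cox' ]
```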

It is worth highlighting that the DataTables API isn't trying to duplicate the full functionality of a utility library such as Underscore or Lodash for array manipulation. The utility methods available are those which are most useful when working with a table of data. toArray() can be used to convert the data to a plain array if you do want to use the more complex methods of those libraries.

Conclusion

There is no single best iterator for DataTables that you should stick to. Instead you will find that you need to make use of the various methods available, selecting which one to use based upon the data you are working with and what operation you want to perform. Hopefully this article will help shed some light on when to use each one.

It has been quite a technical and theory based post this month - I'll return to something practical in June!