
Control Flow in Node Part III

While working on my quest to make async programming easier, or at least bearable, I noticed that in programming you often work with a set of data and want to perform some operation on all the items in that set at once.

This article explains a way to write async filter and map functions where the callback to map or filter is itself an async operation. To compare sync and async styles of programming, I'll use the simple task of reading all the files in a directory into memory.

UPDATE This article has been heavily updated to use callbacks and new node APIs. See the past revisions in the panel to the right for the original Promise/Continuable based article.

The Blocking Way

In a synchronous programming language where I/O is blocking, this task is very straightforward, and it can be done in node too, as long as you understand the consequences. Node exposes Sync versions of many of its I/O functions for the special cases where you don't care about performance and would rather have the much simpler coding style (like during server startup).

For this example we will need three functions from the fs module. We need readdir to get a listing of the files in a directory, stat to test each entry (we only want files, not directories), and readFile to read the contents into memory.
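
For reference, each of these comes in a blocking and a non-blocking flavor. The Sync versions return their result directly (and throw on error), while the async versions take a callback whose first argument is an error, if any. Here are their shapes, with placeholder paths:

var fs = require('fs');

// Blocking versions return results directly and throw on error:
var filenames = fs.readdirSync('/some/dir');  // array of names
var stat = fs.statSync('/some/path');         // fs.Stats object
var data = fs.readFileSync('/some/file');     // Buffer of contents

// Non-blocking versions take an (err, result) callback instead:
fs.readdir('/some/dir', function (err, filenames) { /* ... */ });
fs.stat('/some/path', function (err, stat) { /* ... */ });
fs.readFile('/some/file', function (err, data) { /* ... */ });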

Solving the problem is very straightforward using sync style coding:

sync-loaddir.js
var fs = require('fs');

// Here is the sync version:
function loaddirSync(path) {
  return fs.readdirSync(path).filter(function (filename) {
    // readdirSync returns bare names, so prefix the directory path
    return fs.statSync(path + '/' + filename).isFile();
  }).map(function (filename) {
    return fs.readFileSync(path + '/' + filename);
  });
}

// And it's used like this
console.dir(loaddirSync(__dirname));

Since the calls are synchronous, we can use the built-in filter and map from Array.prototype on the array returned by fs.readdirSync.

This is extremely easy to code, but it has a dangerous side effect: the whole process stalls while waiting for the blocking fs operations to finish. Since CPUs are very fast compared to other hardware (like hard drives), the CPU sits idle when it could be working on requests from other clients, which matters a lot if this code runs inside a hot event loop.

Obviously this is not optimal. Nothing is done in parallel, and many CPU cycles are wasted.
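
One way to see the stall for yourself (assuming a file large enough to take noticeable time to read): the timer below can't fire while the Sync call runs, because the single thread is stuck inside it.

var fs = require('fs');

// This timer wants to tick every 10ms...
setInterval(function () { console.log('tick'); }, 10);

// ...but the first tick can't fire until the blocking read returns,
// because the event loop is frozen while readFileSync runs.
var data = fs.readFileSync('/some/large/file');
console.log('read ' + data.length + ' bytes');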

The Non-Blocking Way

They say that in computer science there is always a give and take when comparing different algorithms. The pro of the synchronous coding style is that it's very easy to read and write; the con is that it's very inefficient. That's why most programming languages need threads to achieve any level of concurrency, yet node is able to do quite a bit on a single thread.

(Yes, I'm aware of coroutines, but in JavaScript, where everything is so mutable, they don't work well and are about as complicated as multi-threading. See the archives for information on Node's experiment with this idea.)

To make the comparison simple, I'll do the same thing using non-blocking APIs and callbacks. A first implementation of our loaddir function looks like this:

async-loaddir.js
var fs = require('fs');

// Here is the async version without helpers
function loaddir(path, callback) {
  fs.readdir(path, function (err, filenames) {
    if (err) { callback(err); return; }
    // readdir returns bare names, so prefix the directory path
    filenames = filenames.map(function (filename) {
      return path + '/' + filename;
    });
    var realfiles = [];
    var count = filenames.length;
    // Guard against an empty directory, or the callback never fires
    if (count === 0) { callback(null, []); return; }
    filenames.forEach(function (filename) {
      fs.stat(filename, function (err, stat) {
        if (err) { callback(err); return; }
        if (stat.isFile()) {
          realfiles.push(filename);
        }
        count--;
        if (count === 0) {
          // Same guard: a directory with no regular files
          if (realfiles.length === 0) { callback(null, []); return; }
          var results = [];
          realfiles.forEach(function (filename) {
            fs.readFile(filename, function (err, data) {
              if (err) { callback(err); return; }
              results.push(data);
              if (results.length === realfiles.length) {
                callback(null, results);
              }
            });
          });
        }
      });
    });
  });
}

// And it's used like this
loaddir(__dirname, function (err, result) {
  if (err) throw err;
  console.dir(result);
});

Yikes! That is several times as long and nested several levels deeper. I know it's a trade-off, but at this point I'm tempted to go back to Ruby with clusters of servers on the backend to handle concurrency.

Map and Filter Helpers for Async Code

Since map and filter are common tasks in programming and that's what we really want here, let's write some helpers to make this beast of code a little smaller.

Here is a map helper. It takes an array, a filter function, and a callback. The filter function is itself an async function that takes a callback.

helpers.js#map
function map(array, filter, callback) {
  var counter = array.length;
  var new_array = [];
  // Guard against an empty input, or the callback never fires
  if (counter === 0) { callback(null, new_array); return; }
  array.forEach(function (item, index) {
    filter(item, function (err, result) {
      if (err) { callback(err); return; }
      // Store by index so results keep the original order
      new_array[index] = result;
      counter--;
      if (counter === 0) {
        callback(null, new_array);
      }
    });
  });
}
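
Because results are stored by index, the output array keeps the input's order even though the callbacks may finish in any order. Used on its own, map turns any node-style async function into a bulk operation. For example, with the helper in scope and some hypothetical paths:

var fs = require('fs');

// Stat three paths in parallel; stats[i] lines up with the i-th path.
map(['a.txt', 'b.txt', 'c.txt'], fs.stat, function (err, stats) {
  if (err) throw err;
  console.dir(stats.map(function (stat) { return stat.size; }));
});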

And here is a filter helper. It works the same, but removes items that don't pass the filter.

helpers.js#filter
function filter(array, filter, callback) {
  var counter = array.length;
  var valid = {};
  // Guard against an empty input, or the callback never fires
  if (counter === 0) { callback(null, []); return; }
  array.forEach(function (item, index) {
    filter(item, function (err, result) {
      if (err) { callback(err); return; }
      valid[index] = result;
      counter--;
      if (counter === 0) {
        // Walk the original array so the output keeps its order
        var results = [];
        array.forEach(function (item, index) {
          if (valid[index]) {
            results.push(item);
          }
        });
        callback(null, results);
      }
    });
  });
}
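
As with map, completion order doesn't matter: the final pass walks the original array, so the output keeps the input's order. A toy example with an async predicate:

// Keep the even numbers. The predicates complete in random order,
// but the output is still [ 2, 4 ], in the original order.
filter([1, 2, 3, 4], function (n, done) {
  setTimeout(function () { done(null, n % 2 === 0); }, Math.random() * 10);
}, function (err, evens) {
  if (err) throw err;
  console.dir(evens);
});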

Now with our helpers, let's try the async version again to see how much shorter we can make it:

async-loaddir2.js
var fs = require('fs'),
    helpers = require('./helpers');

// Here is the async version with filter and map helpers:
function loaddir(path, callback) {
  fs.readdir(path, function (err, filenames) {
    if (err) { callback(err); return; }
    // readdir returns bare names, so prefix the directory path
    filenames = filenames.map(function (filename) {
      return path + '/' + filename;
    });
    helpers.filter(filenames, function (filename, done) {
      fs.stat(filename, function (err, stat) {
        if (err) { done(err); return; }
        done(null, stat.isFile());
      });
    }, function (err, filenames) {
      if (err) { callback(err); return; }
      helpers.map(filenames, fs.readFile, callback);
    });
  });
}

// And it's used like this
loaddir(__dirname, function (err, result) {
  if (err) throw err;
  console.dir(result);
});

That code is much shorter and easier to read. Since fs.readFile and our callback both follow node's standard (err, result) callback convention, we can pass them directly as the second and third arguments to the helpers.map call. This is the payoff of such a consistent pattern.

Also, now that the code executes in parallel, we can issue a stat call for every file in the directory at once and collect the results as they come in. But with this version, not a single readFile can start until all the stat calls have finished. In an ideal world, the program would start reading each file as soon as it knows it's a file and not a directory.

Combined Filter and Map Helper

Often you will want to filter and then map on the same data set. Let's make a combined filterMap helper and see how it helps:

helpers.js#filtermap
function filterMap(array, filter, callback) {
  var counter = array.length;
  var new_array = [];
  // Guard against an empty input, or the callback never fires
  if (counter === 0) { callback(null, new_array); return; }
  array.forEach(function (item, index) {
    filter(item, function (err, result) {
      if (err) { callback(err); return; }
      new_array[index] = result;
      counter--;
      if (counter === 0) {
        // Entries left undefined were filtered out, so drop them
        callback(null, new_array.filter(function (item) {
          return typeof item !== 'undefined';
        }));
      }
    });
  });
}
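
One caveat: undefined doubles as the "filtered out" marker, so this helper only works when undefined is never a legitimate mapped value. That's safe here, since fs.readFile always yields a Buffer. A toy example of the combined behavior:

// Double the odd numbers and drop the evens in one pass.
filterMap([1, 2, 3, 4, 5], function (n, done) {
  process.nextTick(function () {
    done(null, n % 2 ? n * 2 : undefined);
  });
}, function (err, results) {
  if (err) throw err;
  console.dir(results); // => [ 2, 6, 10 ]
});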

Now with this combined helper, let's write a truly parallel loaddir function:

async-loaddir3.js
var fs = require('fs'),
    helpers = require('./helpers');

// Here is the async version with a combined filter and map helper:
function loaddir(path, callback) {
  fs.readdir(path, function (err, filenames) {
    if (err) { callback(err); return; }
    // readdir returns bare names, so prefix the directory path
    filenames = filenames.map(function (filename) {
      return path + '/' + filename;
    });
    helpers.filterMap(filenames, function (filename, done) {
      fs.stat(filename, function (err, stat) {
        if (err) { done(err); return; }
        if (stat.isFile()) {
          fs.readFile(filename, done);
          return;
        }
        done();
      });
    }, callback);
  });
}

// And it's used like this
loaddir(__dirname, function (err, result) {
  if (err) throw err;
  console.dir(result);
});

Here we issue all the stat calls at once, and as each one comes back we check whether the entry is a file. If it is, we fire off the readFile call right away; if not, we pass undefined to signal to filterMap that we're not interested in that entry. When a readFile call comes back, we hand the file contents to the helper. Once every item has produced either undefined or some contents, the helper knows it's done and gives us the result.

Conclusion and Source Code

There is a tradeoff between code complexity and performance, but with a little thinking and some good libraries we can make async programming manageable and understandable while taking full advantage of the parallel nature of non-blocking I/O in node.

All source code used in these examples is linked to on the right side of the page or in the upper-right corner of the code snippets.

UPDATE I've since made a general purpose callback library called Step. While it doesn't include map and filter helpers, it does have the more useful parallel and group helpers.
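
For a taste, here is a sketch of loaddir rewritten with Step's group() helper. Treat it as an approximation rather than a drop-in: each step receives the previous step's results, this is the callback for the next step, thrown errors are forwarded down the chain, and this.group() fans out to many parallel callbacks whose results are collected into an array.

var fs = require('fs');
var Step = require('step');

function loaddir(path, callback) {
  var names;
  Step(
    function () {
      fs.readdir(path, this);
    },
    function (err, filenames) {
      if (err) throw err; // Step catches this and passes it along
      names = filenames.map(function (name) { return path + '/' + name; });
      var group = this.group();
      names.forEach(function (name) { fs.stat(name, group()); });
    },
    function (err, stats) {
      if (err) throw err;
      var group = this.group();
      stats.forEach(function (stat, i) {
        if (stat.isFile()) { fs.readFile(names[i], group()); }
      });
    },
    callback
  );
}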

