Wednesday, November 25, 2009

Joining Promises for Parallel RPCs in Node.JS

I've been playing around a bit more with Node.JS since my last post and I decided to experiment with the asynchronous process.Promise object this time around. In other languages I believe this concept is sometimes referred to as a future.

Diving in, suppose the following:

  • You're writing a blogging engine.
  • Blog Posts are kept in one data store, and Comments are in another.
  • A request for a Post object takes 1 second to return.
  • A request for a list of Comments on a Post takes 2 seconds to return (the comments data store is run by a bunch of slackers who don't care about latency).
  • You want to have /posts/{postId} return an HTML page that renders both a Post and all the Comment objects on it.

When you make an RPC (or any I/O call) in Node.JS you should wrap it in a Promise so the process doesn't block on your HTTP request.

So our getPost and getComments RPCs (faked out) look like this:

var getCommentsPromise = function(postId) {
  var promise = new process.Promise();
  var comments = ["Comment 1 on " + postId, "Comment 2 on " + postId];
  setTimeout(function() { promise.emitSuccess(comments); }, 2000);
  return promise;
}

var getPostPromise = function(postId) {
  var promise = new process.Promise();
  setTimeout(function() { promise.emitSuccess({title: "Post Title " + postId, body: "Post Body " + postId}); }, 1000);
  return promise;
}

Now, if all you had to render on /posts/{postId} was the Post object and not the comments, you could just put the rendering code inside the handler for the Post RPC and be done with it, like so (building on the URI template router from my last post):

var handlers = {
  '/posts/{postId}' : {
    GET : function(request, response, args) {
      response.sendHeader(200, {"Content-Type": "text/html"});
      // postId comes from the URI template match, so read it off args.
      var postPromise = getPostPromise(args.postId);

      postPromise.addCallback(function(post) {
        var postTemplate = tmpl['post-template.html'];
        var pageHtml = postTemplate({'post': post});
        response.sendBody(pageHtml);
        response.finish();
      });
    }
  }
};


But life is never that simple, and /posts/{postId} has to make two RPCs to get the data required to render a page. This is complicated because you can't render the page until both RPCs are complete.

There are at least two ways to deal with this situation. One sucks and the other doesn't suck as much.

Teh Suck: Serialize the RPCs, then render.

You can serialize the RPCs by nesting the call to the second one inside the handler for the first:

'/slowposts/{postId}' : {
  GET : function(request, response, args) {
    response.sendHeader(200, {"Content-Type": "text/html"});
    var postPromise = getPostPromise(args.postId);
    postPromise.addCallback(function(post) {
      // The comments RPC does not start until the post RPC has finished.
      var commentsPromise = getCommentsPromise(args.postId);
      commentsPromise.addCallback(function(comments) {
        var postTemplate = tmpl['post-template.html'];
        var pageHtml = postTemplate({'post': post, 'comments': comments});
        response.sendBody(pageHtml);
        response.finish();
      });
    });
  }
}

This takes 3 seconds to complete: 1 for fetching the post, then 2 more for fetching the comments.

Teh Not So Suck: Parallelize the RPCs, join them and render when the join is complete.

'/fasterposts/{postId}' : {
  GET : function(request, response, args) {
    response.sendHeader(200, {"Content-Type": "text/html"});
    // Kick off both RPCs right away so they run in parallel.
    var commentsPromise = getCommentsPromise(args.postId);
    var postPromise = getPostPromise(args.postId);
    var templateVars = {};
    commentsPromise.addCallback(function(comments) {
      templateVars.comments = comments;
    });
    postPromise.addCallback(function(post) {
      templateVars.post = post;
    });

    var joinedPromise = join([commentsPromise, postPromise]);

    // Render once both promises have succeeded.
    joinedPromise.addCallback(function() {
      var postTemplate = tmpl['post-template.html'];
      var pageHtml = postTemplate(templateVars);
      response.sendBody(pageHtml);
      response.finish();
    });
  }
}

This takes 2 seconds to complete: the RPCs are made in parallel, so the total time is just that of the slowest RPC (2 seconds for fetching comments).
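The arithmetic behind the two approaches can be sketched in a few lines. The helper names serialLatency and parallelLatency are hypothetical, and the sketch assumes the RPCs are independent and that nothing besides the RPCs themselves adds to the total.

```javascript
// Serialized RPCs: each one waits for the previous one, so latencies add up.
function serialLatency(latenciesMs) {
  return latenciesMs.reduce(function (sum, ms) { return sum + ms; }, 0);
}

// Parallel RPCs: all start together, so the slowest one decides the total.
function parallelLatency(latenciesMs) {
  return latenciesMs.length ? Math.max.apply(null, latenciesMs) : 0;
}

// Latencies from this post: 1000 ms for the post, 2000 ms for the comments.
var rpcLatenciesMs = [1000, 2000];
console.log(serialLatency(rpcLatenciesMs));   // 3000
console.log(parallelLatency(rpcLatenciesMs)); // 2000
```

The gap grows with every RPC added to the page: the serialized total keeps climbing, while the parallel total only changes if a new RPC is slower than all the others.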

This method needs a helper function, join(), to make it work. join() takes an array of promise objects and returns another promise that fires once all of them have completed:

function join(promises) {
  // Number of promises that have not fired yet.
  var count = promises.length;
  var p = new process.Promise();
  for (var i = 0; i < promises.length; i++) {
    promises[i].addCallback(function() {
      // The last promise to succeed fires the joined promise.
      if (--count === 0) { p.emitSuccess(); }
    });
  }

  return p;
}

Note that this example ignores errors, which make things even more complicated. What should join do when one of the promise objects fires an error instead of success? Probably a good topic for another post in the future.
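One possible answer, as a rough sketch rather than a definitive design: fail the joined promise on the first error and ignore anything that arrives afterwards. FakePromise is a hypothetical stand-in written only so the sketch runs on its own; it mimics the addCallback/emitSuccess calls used in this post and assumes addErrback/emitError as the matching error-side names. joinWithErrors is also a made-up name, and collecting the results into an array is an extra design choice, not something the join above does.

```javascript
// Hypothetical stand-in for the promise object used in this post.
function FakePromise() {
  this.callbacks = [];
  this.errbacks = [];
}
FakePromise.prototype.addCallback = function (fn) { this.callbacks.push(fn); return this; };
FakePromise.prototype.addErrback = function (fn) { this.errbacks.push(fn); return this; };
FakePromise.prototype.emitSuccess = function (value) {
  this.callbacks.forEach(function (fn) { fn(value); });
};
FakePromise.prototype.emitError = function (err) {
  this.errbacks.forEach(function (fn) { fn(err); });
};

// Succeeds with all results (in input order) or fails with the first error.
function joinWithErrors(promises) {
  var joined = new FakePromise();
  var remaining = promises.length;
  var results = new Array(promises.length);
  var failed = false;

  if (remaining === 0) {
    // Nothing to wait for: fire later so the caller can attach handlers first.
    setTimeout(function () { joined.emitSuccess(results); }, 0);
    return joined;
  }

  promises.forEach(function (promise, i) {
    promise.addCallback(function (value) {
      if (failed) { return; }           // a sibling already failed
      results[i] = value;
      if (--remaining === 0) { joined.emitSuccess(results); }
    });
    promise.addErrback(function (err) {
      if (failed) { return; }           // report only the first error
      failed = true;
      joined.emitError(err);
    });
  });

  return joined;
}
```

In a handler, the errback on the joined promise would be the natural place to send an error page instead of rendering the template. Other policies are just as defensible, for example waiting for every promise to settle and rendering the post without comments.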

Also, I've been using Jed Schmidt's tmpl-node engine to render html in this example. Templating in Node.JS appears to be an active area of debate, but this one works fine for my purposes.

Note that one could also parallelize the rendering of the template, so that the postPromise handler renders the html for the Post while commentsPromise is still fetching/rendering comments. Then the join handler would stitch together the final html.
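A minimal sketch of that idea follows, under some loud assumptions: FakePromise is a hypothetical stand-in so the code runs on its own, renderPage is a made-up helper, and plain string concatenation stands in for real templates (no HTML escaping is done). The join function has the same shape as the one in this post.

```javascript
// Hypothetical stand-in for the promise object used in this post.
function FakePromise() { this.callbacks = []; }
FakePromise.prototype.addCallback = function (fn) { this.callbacks.push(fn); return this; };
FakePromise.prototype.emitSuccess = function (value) {
  this.callbacks.forEach(function (fn) { fn(value); });
};

// Same shape as join() in this post, built on the stand-in.
function join(promises) {
  var count = promises.length;
  var p = new FakePromise();
  promises.forEach(function (promise) {
    promise.addCallback(function () {
      if (--count === 0) { p.emitSuccess(); }
    });
  });
  return p;
}

// Render each fragment as soon as its data arrives; stitch them at the join.
function renderPage(postPromise, commentsPromise, done) {
  var fragments = {};

  postPromise.addCallback(function (post) {
    fragments.post = '<h1>' + post.title + '</h1><p>' + post.body + '</p>';
  });
  commentsPromise.addCallback(function (comments) {
    var items = comments.map(function (c) { return '<li>' + c + '</li>'; });
    fragments.comments = '<ul>' + items.join('') + '</ul>';
  });

  // Registered after the fragment callbacks, so both fragments exist by now.
  join([postPromise, commentsPromise]).addCallback(function () {
    done(fragments.post + fragments.comments);
  });
}
```

With the latencies assumed earlier, the post fragment would be rendered during the second in which the comments RPC is still outstanding, so only the comments fragment and the final concatenation remain once the join fires.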

3 comments:

  1. I've been impressed with node.js ever since I first learned about it. You've outlined the bare bones of a way to parallelize multiple independent promises. Seems like this should be part of node.js (with proper error handling, etc)

    Thanks for the post!!

  2. Hi sembiance! I'm diggin node.js too. Looks like there are some other people thinking about adding this kind of operation to the process.Promise class too:

    http://groups.google.com/group/nodejs/browse_thread/thread/94bfddb1c4376529/c77b7882c0520e46?lnk=gst&q=promise+depend+group#c77b7882c0520e46

    Though I'm not sure what the difference between depend() and group() would be.

    Also, Jed Schmidt had to do something similar for his secure cookie parser: http://github.com/jed/cookie-node/blob/master/cookie-node.js#L2 (he calls it "combine()" - how many names can we give it? :)

  3. How about another name: promise group?

    http://github.com/technoweenie/wheres-waldo/blob/master/lib/promise-group.js
