Thursday, February 25, 2010

PubSubHubbub for NodeJS: Callbacks All the Way Down

NodeJS is a callback-based Javascript server API.

PubSubHubbub is a callback-based web protocol.

I put them together and the result is a PubSubHubbub client for NodeJS:

node-pshb on github


This project only includes a PubSubHubbub client interface, but to me that's the interesting part. You can specify an atom feed url, and functions to call when events happen on that feed due to PubSubHubbub.

The client library takes care of identifying the hub for that feed, requesting a subscription, and listening for subscription confirmation requests and feed updates from the hub.
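The subscription confirmation step is worth a quick illustration. Below is a minimal sketch in present-day JavaScript, not the code from node-pshb: the hub sends a GET to the callback URL with hub.mode, hub.topic and hub.challenge query parameters, and the subscriber confirms by echoing the challenge back with a 2xx status. The function name handleVerification and the wantedTopics list are made up for this example.

```javascript
// Hypothetical helper: decide how to answer a hub's verification request.
// queryString is the raw query part of the callback URL the hub requested.
// wantedTopics is the list of feed URLs this subscriber actually asked for.
function handleVerification(queryString, wantedTopics) {
  const params = new URLSearchParams(queryString);
  const mode = params.get('hub.mode');
  const topic = params.get('hub.topic');
  const challenge = params.get('hub.challenge');

  const modeOk = mode === 'subscribe' || mode === 'unsubscribe';
  if (!modeOk || !challenge || !wantedTopics.includes(topic)) {
    // We never asked for this, so refuse the verification.
    return { status: 404, body: '' };
  }
  // Echo the challenge back to confirm the request really came from us.
  return { status: 200, body: challenge };
}
```

An HTTP server would call this from its request handler and write the returned status and body to the response.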

It provides callback hooks for "subscribed", "update" and "error" events.

A simple client app looks like this:
var callbackPort = 4443;
var subscriber = new pshb.Subscriber(callbackPort);

// Start listening for subscription confirmation callbacks.
subscriber.startCallbackServer(); 

var topicUri = url.parse("http://localhost/foo"); // Dummy feed, always has updates

var feedEvents = subscriber.listen(topicUri);

feedEvents.addListener('subscribed', 
  function(atomFeed) {
    sys.puts('subscribed: ' + atomFeed.id);
  });

feedEvents.addListener('error', 
  function(error) {
    sys.puts('ERROR: ' + error);
  });

feedEvents.addListener('update',
  function(atomFeed) {
    sys.puts('got a PubSubHubbub update: ' + atomFeed.id);
  });

I tested this out with the Demo Hub running on a local AppEngine launcher.

The demo app creates a second server to host a dummy feed on port 80, so http://localhost/foo always returns a feed with the current time as its "last updated" value. This is so the test hub always thinks there's an update ready for you.

So start the appengine with the hub running locally (in this demo it's assumed to be on port 8086), then run the test.js app, then go to your hub with your browser and manually update http://localhost/foo.

I noticed that I had to manually run some tasks in the hub's work queue, so if you don't see any updates, try checking the Task Queues in the app console for the hub. Run any pending "feed-pulls" and "event-delivery" tasks. I imagine there's a way to make them run automatically, but I haven't dug around enough in there to find it.

So there you go, NodeJS and PubSubHubbub: it's callbacks all the way down.

Friday, February 19, 2010

Webfinger Client for Node.JS

In a previous post, I demonstrated how you could use webfinger with nothing more than curl.   This post is about how you can use webfinger from nodejs with a non-blocking webfinger client.

Code for node-webfinger is here on github.

The project contains a simple webfinger-buzz.js command line app that demonstrates the webfinger client. It uses webfinger to find a google buzz feed based on a gmail address, then fetches the updates as an Atom feed, and then prints out the latest entry from that feed.

This could be generalized to support any other webfinger-enabled site like yahoo (though it looks like they're using an older version of XRD which my code can't parse :/).
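For reference, the lookup a webfinger client performs boils down to two URL-building steps: find the domain's host-meta document, then fill the user's address into the URI template that document advertises. Here is a rough sketch of just those two steps in present-day JavaScript. The function names are invented for this example and are not the node-webfinger API, and the template string is the one gmail.com served at the time.

```javascript
// Hypothetical helper: where to fetch the host-meta document for an address.
function hostMetaUrl(email) {
  const domain = email.slice(email.lastIndexOf('@') + 1);
  return 'http://' + domain + '/.well-known/host-meta';
}

// Hypothetical helper: fill the {uri} slot of an lrdd template.
// The value is percent-encoded so that "@" becomes "%40".
function expandTemplate(template, uri) {
  return template.split('{uri}').join(encodeURIComponent(uri));
}

const template = 'http://www.google.com/s2/webfinger/?q={uri}';
const lookupUrl = expandTemplate(template, 'banksean@gmail.com');
// lookupUrl is "http://www.google.com/s2/webfinger/?q=banksean%40gmail.com"
```

The real client then fetches that URL and parses the XRD it gets back, which is where the SAX parsing comes in.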

The webfinger-buzz.js client looks something like this:
var sys = require('sys'),
  http = require("http"),
  url = require("url"),
  atom = require("./lib/atom"),
  webfinger = require('./lib/webfinger-client');

if (process.argv.length < 3) {
  sys.puts("usage: " + process.argv[0] + " " + process.argv[1] + " <user uri>");
  process.exit();
}
 
var userUri =   process.argv[2];
 
sys.puts("fingering " + userUri);

var wf = new webfinger.WebFingerClient();
var fingerPromise = wf.finger(userUri);
fingerPromise.addCallback(function(xrdObj) {
  var statusLinks = xrdObj.getLinksByRel("http://schemas.google.com/g/2010#updates-from");
  var statusUrl = url.parse(statusLinks[0].getAttrValues('href')[0]);
  var httpClient = http.createClient(80, statusUrl.hostname);
  var path = statusUrl.pathname;
  if (statusUrl.search) {
    path += statusUrl.search;
  }
 
  var request = httpClient.request("GET", path, {"host": statusUrl.hostname});
 
  request.addListener('response', function (response) {
    response.setBodyEncoding("utf8");
    var body = "";
    response.addListener("data", function (chunk) {
      body += chunk;
    });
    response.addListener("end", function() {
      var atomParser = new atom.AtomParser(false);
      var atomPromise = atomParser.parse(body);
      atomPromise.addCallback(function(atomFeed) {
        sys.puts("Feed: " + atomFeed.title);
        sys.puts(atomFeed.entries.length + " entries");
        sys.puts("Updated: " + atomFeed.entries[0].updated);
        sys.puts(atomFeed.entries[0].title + ": " + atomFeed.entries[0].summary);
      });
    });
  });
  request.close();
});

hehe fingerPromise. Is that a generalization of pinkySwear?

In the process of writing this webfinger client I used a couple of libraries I found on teh internets: sax-js by Isaac Z. Schlueter - a SAX parser for nodejs, and this URI Template library by James Snell. Both worked well and I recommend them.

The remaining non-webfinger-specific pieces I needed were an XRD parser and an Atom parser, both for javascript and SAX (as opposed to DOM). I couldn't find much in the way of those, so I rolled my own. They are included in the node-webfinger project in the lib/ directory. They're pretty crude parsers but they worked for this example. I'll probably use them in other projects in the future and make improvements as necessary. Unless something better comes along. That seems inevitable.

Monday, February 15, 2010

Bring the (Perlin) Noise

If you just want the source code: The JavaScript Perlin noise generator code is here.

I mentioned in a previous post that I was working on a Perlin noise generator for Art Evolver.

Perlin noise is a function of (x, y) that produces a random-ish pattern. It's not completely random because it has smooth hills and valleys, but the distribution of those hills and valleys is random.
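To make that a bit more concrete, here is a small sketch of 2-D gradient noise in the classical Perlin style. It is written fresh for illustration and is not the code I ended up using: the seeded generator is a toy linear congruential generator, the 8-entry gradient table is my own simplification, and the names makeLcg and makeNoise2D are made up for this example.

```javascript
// Toy seeded PRNG (linear congruential) so the pattern is repeatable.
function makeLcg(seed) {
  let state = seed >>> 0;
  return function () {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // float in [0, 1)
  };
}

function makeNoise2D(seed) {
  const rand = makeLcg(seed);
  // A shuffled table of 0..255 decides which gradient each grid point gets.
  const perm = Array.from({ length: 256 }, (_, i) => i);
  for (let i = 255; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [perm[i], perm[j]] = [perm[j], perm[i]];
  }
  const grads = [[1, 1], [-1, 1], [1, -1], [-1, -1], [1, 0], [-1, 0], [0, 1], [0, -1]];
  const fade = (t) => t * t * t * (t * (t * 6 - 15) + 10); // smooth step
  const lerp = (a, b, t) => a + t * (b - a);

  // Dot product of the gradient at grid point (ix, iy) with the offset (dx, dy).
  function corner(ix, iy, dx, dy) {
    const g = grads[perm[(perm[ix & 255] + iy) & 255] & 7];
    return g[0] * dx + g[1] * dy;
  }

  return function noise(x, y) {
    const x0 = Math.floor(x), y0 = Math.floor(y);
    const fx = x - x0, fy = y - y0;
    const u = fade(fx), v = fade(fy);
    // Blend the four surrounding corners: first along x, then along y.
    const top = lerp(corner(x0, y0, fx, fy), corner(x0 + 1, y0, fx - 1, fy), u);
    const bottom = lerp(corner(x0, y0 + 1, fx, fy - 1),
                        corner(x0 + 1, y0 + 1, fx - 1, fy - 1), u);
    return lerp(top, bottom, v);
  };
}
```

The smooth hills and valleys come from the fade-weighted blending between grid corners, and the randomness comes from which gradient each corner happens to get. The value is always zero exactly on the grid points, which is a known property of this kind of noise.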

A side note about this algorithm: Usually if you're a computer scientist and you come up with a clever algorithm to solve a particular problem, you get an award from a university, or a CS-centric professional organization like the ACM or IEEE. Perlin got an Academy Award for this noise function. As in, the Oscar kind of Academy Award.  For an algorithm, something not usually considered artsy.  I found that interesting.

Anyways, rather than dive into Perlin's impenetrable description of how the algorithm works, I set out to find an existing JavaScript implementation. That led me to this message board, and specifically this example.

Unfortunately that implementation has some pretty serious directional artifacts:

Note the horizontal and vertical stripes.  There's almost an upside down cross in the lower right. SAAAATAAAAN!

Rather than try to fix that source code (which the author apparently closureized (making it very difficult to understand)) I kept searching.

From the main Wikipedia entry on Perlin noise, I ran across a variant called Simplex noise.  This is an improvement on the original algorithm, also written by Ken Perlin, in 2001.  That Wikipedia page linked to a paper by Stefan Gustavson(pdf) that explains both classical and Simplex Perlin noise in a much easier to grok way than anything I've read by Perlin himself.  I highly recommend Gustavson's paper if you found Perlin difficult.

I took the Java source code in Gustavson's paper and ported it to JavaScript, and the results are here on github.

I ran some performance comparisons between the classical and Simplex algorithms, and for 2-D I only saw a ~10% improvement with Simplex.  Granted, Simplex is supposed to pull ahead mainly in higher dimensions (classical is O(2^N) vs. Simplex O(N^2), where N is the number of dimensions), so the speed difference doesn't matter much for my 2-D purposes.

Classical Perlin noise

Simplex Perlin noise

Subjectively I think I prefer the Simplex noise to classical, so I'll probably go with that for Art Evolver.

Again, the source code is here.

Saturday, February 13, 2010

Curl-ing up with WebFinger and PubSubHubbub

This morning I've been playing around with WebFinger and PubSubHubbub. One of the great things about open web APIs is that you can tinker around with them without even writing an application. Just use curl!

Let's start with WebFinger. First, we need to figure out how to get my WebFinger data from gmail. There's a standard place to look for that explanation, given a domain name:
http[s]://{domain-name}/.well-known/host-meta
So for gmail we get the explanation of how to get my WebFinger data like so:
$ curl http://gmail.com/.well-known/host-meta
Out pops an XRD doc that contains a URI template (the template attribute on the Link element below):
<?xml version='1.0' encoding='UTF-8'?>
<!-- NOTE: this host-meta end-point is a pre-alpha work in progress.   Don't rely on it. -->
<!-- Please follow the list at http://groups.google.com/group/webfinger -->
<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0' 
     xmlns:hm='http://host-meta.net/xrd/1.0'>
  <hm:Host xmlns='http://host-meta.net/xrd/1.0'>gmail.com</hm:Host>
  <Link rel='lrdd' 
        template='http://www.google.com/s2/webfinger/?q={uri}'>
    <Title>Resource Descriptor</Title>
  </Link>
</XRD>
Substitute my email address for {uri} and curl it:
$ curl http://www.google.com/s2/webfinger/?q=banksean@gmail.com
That spits out another XRD that describes some other resources associated with my email address:
<?xml version='1.0'?>
<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>
 <Subject>acct:banksean@gmail.com</Subject>
 <Alias>http://www.google.com/profiles/banksean</Alias>
 <Link rel='http://portablecontacts.net/spec/1.0'
href='http://www-opensocial.googleusercontent.com/api/people/'/>
 <Link rel='http://webfinger.net/rel/profile-page' 
href='http://www.google.com/profiles/banksean' type='text/html'/>
 <Link rel='http://microformats.org/profile/hcard' 
href='http://www.google.com/profiles/banksean' type='text/html'/>
 <Link rel='http://gmpg.org/xfn/11' 
href='http://www.google.com/profiles/banksean' type='text/html'/>
 <Link rel='http://specs.openid.net/auth/2.0/provider' 
href='http://www.google.com/profiles/banksean'/>
 <Link rel='describedby' 
href='http://www.google.com/profiles/banksean' type='text/html'/>
 <Link rel='describedby' 
href='http://s2.googleusercontent.com/webfinger/?q=banksean%40gmail.com&amp;fmt=foaf'
type='application/rdf+xml'/>
 <Link rel='http://schemas.google.com/g/2010#updates-from' 
href='http://buzz.googleapis.com/feeds/103419049256232792514/public/posted' 
type='application/atom+xml'/>
</XRD>
The last Link above, the one with rel='http://schemas.google.com/g/2010#updates-from', gives the URI for my public status updates. Let's fetch that:
$ curl http://buzz.googleapis.com/feeds/103419049256232792514/public/posted
<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' 
xmlns:thr='http://purl.org/syndication/thread/1.0' 
xmlns:media='http://search.yahoo.com/mrss' 
xmlns:activity='http://activitystrea.ms/spec/1.0/'>
<link rel='self' type='application/atom+xml'
href='http://buzz.googleapis.com/feeds/103419049256232792514/public/posted'/>
<link rel='hub' href='http://pubsubhubbub.appspot.com/'/>
<!-- lots more feed data not relevant to this discussion -->
It's a standard Atom feed. Amongst a lot of other stuff in the beginning is a rel="hub" link. This is where PubSubHubbub comes in. Suppose that I want some other service to be notified whenever I post a status update (for instance, I have an app that reposts it in the sidebar of my blog). I could poll this Atom feed but polling is pretty janky. With PuSH I can register a callback to be notified whenever I post an update. Since this feed has a rel="hub" link set to http://pubsubhubbub.appspot.com/, that's where I go to register a callback.
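If you'd rather not eyeball the feed for that link, here is a quick-and-dirty sketch of pulling the hub URL out of the feed text in JavaScript. It scans link tags with regular expressions instead of using a real XML parser, so treat it as a toy. The function name findHubUrl is made up for this example.

```javascript
// Hypothetical helper: return the href of the first <link rel="hub"> in an
// Atom document, or null if there isn't one. Regex-based, so it ignores
// namespaces, comments and CDATA; fine for poking around, not for production.
function findHubUrl(atomXml) {
  const linkTags = atomXml.match(/<link\b[^>]*>/gi) || [];
  for (const tag of linkTags) {
    const attrs = {};
    // Collect name='value' or name="value" pairs from this one tag.
    const attrPattern = /([\w:-]+)\s*=\s*(['"])(.*?)\2/g;
    let m;
    while ((m = attrPattern.exec(tag)) !== null) {
      attrs[m[1]] = m[3];
    }
    if (attrs.rel === 'hub' && attrs.href) {
      return attrs.href;
    }
  }
  return null;
}
```

Fed the feed snippet above, this returns http://pubsubhubbub.appspot.com/ and skips the rel='self' link.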

If you just navigate to http://pubsubhubbub.appspot.com/ with your browser you get a form you can fill out to create a subscription. One of the fields is for a "Callback" url. I don't run any websites that know how to handle PuSH callbacks (or subscription confirmation, for that matter). Luckily there is a test subscriber on appspot that accepts subscription requests for anything: http://pubsubhubbub-subscriber.appspot.com/. To create your own callback URL for testing, just add /subscriber.{some_unique_identifier} to the end of it.

According to the PuSH spec, I should POST some form fields to the hub URL like so:
$ curl -v http://pubsubhubbub.appspot.com/subscribe \
-d hub.callback=http://pubsubhubbub-subscriber.appspot.com/subscriber.banksean\&\
hub.topic=http://buzz.googleapis.com/feeds/103419049256232792514/public/posted\&\
hub.verify=sync\&hub.mode=subscribe\&hub.verify_token=\&hub.secret= 
And that creates the subscription. Here's the verbose output:
* About to connect() to pubsubhubbub.appspot.com port 80 (#0)
*   Trying 74.125.19.141... connected
* Connected to pubsubhubbub.appspot.com (74.125.19.141) port 80 (#0)
> POST /subscribe HTTP/1.1
> User-Agent: curl/7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
> Host: pubsubhubbub.appspot.com
> Accept: */*
> Content-Length: 219
> Content-Type: application/x-www-form-urlencoded
> 
< HTTP/1.1 204 No Content
< Cache-Control: no-cache
< Content-Type: text/plain
< Expires: Fri, 01 Jan 1990 00:00:00 GMT
< Date: Sat, 13 Feb 2010 17:41:29 GMT
< Server: Google Frontend
< Content-Length: 0
< X-XSS-Protection: 0
< 
* Connection #0 to host pubsubhubbub.appspot.com left intact
* Closing connection #0

The 204 No Content response indicates the subscription was created and is active, according to the spec.
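For anyone who does want code after all, the same request body can be assembled in JavaScript. This is a minimal sketch using the standard URLSearchParams class. The function names are invented for this example, and the status meanings in describeSubscribeStatus are my reading of the spec's 204 and 202 responses.

```javascript
// Hypothetical helper: build the form-encoded body for a subscribe request.
function buildSubscribeBody(callbackUrl, topicUrl) {
  const form = new URLSearchParams();
  form.set('hub.callback', callbackUrl);
  form.set('hub.topic', topicUrl);
  form.set('hub.verify', 'sync');
  form.set('hub.mode', 'subscribe');
  return form.toString(); // values are percent-encoded for us
}

// Hypothetical helper: turn the hub's status code into something readable.
function describeSubscribeStatus(status) {
  if (status === 204) return 'subscription verified and active';
  if (status === 202) return 'accepted, verification still pending';
  return 'subscription request failed (HTTP ' + status + ')';
}

const body = buildSubscribeBody(
  'http://pubsubhubbub-subscriber.appspot.com/subscriber.banksean',
  'http://buzz.googleapis.com/feeds/103419049256232792514/public/posted');
```

POST that body to the hub's /subscribe URL with Content-Type application/x-www-form-urlencoded, which is what curl's -d flag does.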

And if you go to http://pubsubhubbub-subscriber.appspot.com/ right now (Saturday morning, February 13th 2010), you'll indeed see a bunch of my posts on it.

Ta Da. No code. Just curl. I love the internet.

Wednesday, February 10, 2010

Mersenne Twister to the Rescue

For Art Evolver I'd like to add a Perlin noise function. The problem with doing this in javascript is the noise is unstable from generation to generation because you can't specify a seed value for Math.random().

Luckily there was an existing implementation of a pseudorandom number generator in javascript here (it's an implementation of Mersenne Twister). The problem with this code is that its functions and state variables are all in the global namespace, meaning you can only have one generator. I need an arbitrary number of them at any given time, so I wrapped Makoto Matsumoto and Takuji Nishimura's code in a namespace.

Now I can use it like so:
var m = new MersenneTwister(123);

// now calling m.random() four times should return
// the following sequence:
// 2991312382
// 3062119789
// 1228959102
// 1840268610

The namespaced Mersenne Twister code is here.
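The wrapping idea itself is simple, and a toy version shows why it matters. The sketch below is not Mersenne Twister: it uses a tiny linear congruential generator as a stand-in so the example stays short, and the names SeededGenerator and nextInt are made up. The point is only that the state lives on each instance instead of in globals.

```javascript
// Stand-in generator (NOT Mersenne Twister): state is per-instance.
function SeededGenerator(seed) {
  this.state = seed >>> 0;
}

// Advance this instance's state and return an unsigned 32-bit integer.
SeededGenerator.prototype.nextInt = function () {
  this.state = (Math.imul(this.state, 1664525) + 1013904223) >>> 0;
  return this.state;
};

// Two generators with the same seed stay in lockstep, and a third one
// running alongside them does not disturb either sequence.
const a = new SeededGenerator(123);
const b = new SeededGenerator(123);
const other = new SeededGenerator(999);

const fromA = [a.nextInt(), a.nextInt()];
other.nextInt(); // interleaved call on an unrelated instance
const fromB = [b.nextInt(), b.nextInt()];
// fromA and fromB hold the same two numbers.
```

With the state in globals, that interleaved call would have advanced the one shared sequence and the two runs would no longer match.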

Sunday, February 7, 2010

Logging With HTML5 WebWorkers

I'm converting some of my code for Art Evolver to use HTML5 WebWorkers instead of doing CPU-intensive operations in the main UI thread (cardinal sin, that is). It went pretty smoothly but I ran into a small problem while debugging: you can't log messages to console.log from a WebWorker. I assume this is for security reasons.

I worked around this by calling postMessage() from my WebWorker to get the log message into the browser window where it could be logged:

// inside render-worker.js
function log(msg) {
  postMessage("log: " + msg);
}
and in the code that calls the worker, I added this to the onmessage callback:
// inside UI code running in the main browser window
worker.onmessage = function(e) {
  if (e.data.indexOf("log:") == 0) {
    window.console.log(e.data);
    return;
  }

  // assume it's not a log message, JSON.parse() it,
  // or do other stuff with it here.
}

Voila- logging from inside a WebWorker. Totally a hack, but it worked for me in a pinch.
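A slightly tidier variation, assuming the browser's postMessage can carry plain objects (current browsers can), is to tag each message with a type instead of sniffing a string prefix. This is a sketch with made-up names (logMessage, makeMessageRouter), not what Art Evolver actually does.

```javascript
// Worker side (hypothetical): wrap a log line in a tagged object,
// then send it with postMessage(logMessage("...")).
function logMessage(text) {
  return { type: 'log', text: String(text) };
}

// Main-thread side (hypothetical): build an onmessage handler that sends
// log messages to logFn and everything else to dataFn.
function makeMessageRouter(logFn, dataFn) {
  return function onmessage(e) {
    const msg = e.data;
    if (msg && msg.type === 'log') {
      logFn(msg.text);
      return;
    }
    dataFn(msg);
  };
}
```

In the page you would wire it up with something like worker.onmessage = makeMessageRouter(function (t) { console.log(t); }, handleResult), where handleResult is whatever processes the worker's real output.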

NodeJS + WebSockets = Stoopid Easy Comet Chat

A few weeks ago, amix wrote a blog post about Plurk's use of NodeJS for Comet chat.

He didn't post source code (as far as I could tell, after searching for two minutes :) but I was able to cobble something together yesterday along the same lines.

Here's the client:

and server:


This works with Guille's node.websocket.js library.  Start with that, then add chat.js into ./modules.

One last tweak is you have to add an onConnect hook to node.websocket.js.  I did that by adding the following:


if (this.module.onConnect) {
    this.module.onConnect(this);
}


to the end of Connection.prototype._handshake. I suppose it could be called from elsewhere but that seems to work.

Check out amix's blog - lots of interesting details about how Plurk scales out to over 100,000 open connections. Hint: you don't do it with one thread per connection. The Servlet/CGI model is starting to look pretty old and creaky these days.

Update: bru has a node.websocket.js fork that includes a comet chat app. It's more complete than what I've posted here and he appears to have worked around the missing onConnect by just adding new connection objects in the onData callback.