Wednesday, January 28, 2009

CS4 Master Collection Dead Drop

Adobe Platform Evangelist Lee Brimelow presented a pretty awesome challenge, CS4 Master Collection dead drop:

This is your chance to get a free copy of CS4 Master Collection which is valued at over $2500. I have long been a fan of spy movies and the various aspects of tradecraft that intelligence agencies use. With that being said I have created a dead drop which now contains the software. Watch the video below to get all the clues you’ll need to find the drop.

I just heard about this from my friend, E. He's the one who actually picked up the drop last night. Awesome. What are the odds? He just happened to be:

  • Geeky enough to be reading flash blogs

  • Reading the post soon after it went live

  • Sitting 5 miles away from the drop site when he read about it

  • An Eagle Scout, totally at home in the great outdoors at night


The last one is important because at least one of the commenters got close, but then got lost.

That's like the definition of opportunity: When luck meets preparation.

Tuesday, January 27, 2009

Sharding

Joe Gregorio's article on sharding counters highlights an example of the counter(heh ;)intuitive techniques that work well on applications with large data sets. "Large" meaning "too big to fit into RAM on a single machine."
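To make the idea concrete, here's a rough in-memory sketch of a sharded counter. This is not the code from Joe's article: createShardedCounter is a made-up name, and a plain array stands in for what would be separate datastore entities on App Engine. Each increment touches one randomly chosen shard, so writers rarely contend for the same record, and reading the total means summing all the shards.

```javascript
// Hypothetical in-memory stand-in for a sharded counter.
// In a real datastore each shard would be its own entity.
function createShardedCounter(numShards) {
  var shards = [];
  for (var i = 0; i < numShards; i++) {
    shards.push(0);
  }
  return {
    // Write path: bump one shard picked at random.
    increment: function () {
      var index = Math.floor(Math.random() * numShards);
      shards[index] += 1;
    },
    // Read path: the total is the sum over every shard.
    total: function () {
      var sum = 0;
      for (var i = 0; i < shards.length; i++) {
        sum += shards[i];
      }
      return sum;
    }
  };
}

var counter = createShardedCounter(20);
for (var n = 0; n < 1000; n++) {
  counter.increment();
}
var total = counter.total();
```

The trade-off is that reads get more expensive (one lookup per shard) in exchange for cheaper, less contended writes.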

One of the reasons sharding works so well is that reading from another computer's RAM over a fast network is typically much faster than seeking on your own disk. (memcached takes advantage of the same trade-off.)

Here's a presentation with some more details and examples of sharding in appengine:

App Engine Google

Wednesday, January 14, 2009

Interview With an Adware Developer

Check out this great interview with an adware developer. (thanks for the link, Adam)

Lots of juicy technical details in this interview. It's been years since I worked with Windows code but some parts were familiar:

IE has a mechanism called a Browser Helper Object (BHO) which is basically a gob of executable code that gets informed of web requests as they’re going. It runs in the actual browser process, which means it can do anything the browser can do– which means basically anything. We would have a Browser Helper Object that actually served the ads, and then we made it so that you had to kill all the instances of the browser to be able to delete the thing.

Back in my WebTaggers days we used the BHO API as well. When the BHO interface first came out, it generated all these wonderful ideas, like "you could load a bunch of post-it notes created by you and your friends every time you visit a web page!" - but the BHO turned out to be more widely employed by the dark side. I have to wonder if MS really considered the potential to use the BHO as more of a weapon than a tool when they were designing it.

And other stuff like this reinforces my position that Windows is a threat to public health:

...The Win32 API is fundamentally Ascii. There are strings that you can express in 16-bit counted Unicode that you can’t express in ASCII. Most notably, you can have things with a Null in the middle of it.

That meant that we could, for instance, write a Registry key that had a Null in the middle of it.
[...]
Because of that, we were able to make registry keys that were invisible or immutable to anyone using the Win32 API. Interestingly enough, this was not only all civilians and pretty much all of our competitors, but even most of the antivirus people.

Then he drops this awesomebomb:

Eventually, we got sick of writing a new C program every time we wanted to go kick somebody off of a machine. Everybody said, “What we need is something configurable.” I said, “Let’s install a Turing-complete language,” and for that I used tinyScheme, which is a BSD licensed, very small, very fast implementation of Scheme that can be compiled down into about a 20K executable if you know what you’re doing.

I wonder if he released his improvements to tinyScheme as open source.

Anyway, I recommend reading the entire interview.

Thursday, January 8, 2009

MapReduce with JavaScript

Inspired by Michael Nielsen's Write your first MapReduce program in 20 minutes, I've ported his example to JavaScript.

The m/r kernel is pretty small, actually just a single mapReduce function and a helper groupBy function:

// mapper should return an array like [{key:'somekey', value:'somevalue'}, ...]
// reducer should return a single {key:'somekey', value:'somevalue'}
function mapReduce(i, mapper, reducer) {
 var intermediate = [];
 var output = [];
 for(var key in i) {
  var value = i[key];
  intermediate = intermediate.concat(mapper(key, value));
 }

 var groups = groupBy(intermediate);
 for(var key in groups) {
  var values = groups[key];
  output.push(reducer(key, values));
 }

 return output;
}

// list should be [{key:k, value:v}, ....] where key may be repeated.
// returns an object {key: [v1, v2, v3...], ...} where each key appears once.
function groupBy(list) {
 var ret = {};
 for (var i=0; i<list.length; i++) {
  var key = list[i].key;
  var value = list[i].value;
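  // own-property check, so a key like "constructor" doesn't hit an inherited member
  if (!Object.prototype.hasOwnProperty.call(ret, key)) {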
   ret[key] = [];
  }

  ret[key].push(value);
 }
 return ret;
}

And you could use it for the canonical "word count" example like so:

function myMapper(key, value) {
 var ret = [];
 var words = normalizeText(value).split(' ');
 for (var i=0; i<words.length; i++) {
  // skip the empty strings split(' ') leaves at leading/trailing separators
  if (words[i]) {
   ret.push({key:words[i], value:1});
  }
 }
 return ret;
}

function myReducer(intermediateKey, values) {
 var sum = 0;
 for (var i=0; i<values.length; i++) {
  sum += values[i];
 }
 return {key:intermediateKey, value:sum};
}

function normalizeText(s) {
 s = s.toLowerCase();
 s = s.replace(/[^a-z]+/g, ' ');
 return s;
}

var i = {};
i.atxt = "The quick brown fox jumped over the lazy grey dogs.";
i.btxt = "That's one small step for a man, one giant leap for mankind.";
i.ctxt = "Mary had a little lamb, Its fleece was white as snow; And everywhere that Mary went, The lamb was sure to go.";

var out = mapReduce(i, myMapper, myReducer);


This example just allows you to write programs in the MapReduce style. It doesn't do any of the fancy footwork necessary to actually parallelize the operations and manage the execution.
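One piece of that footwork is deciding which worker reduces which key. Here's a minimal sketch, assuming the common approach of hashing each intermediate key modulo the number of reducers. hashKey and partition are hypothetical helpers made up for illustration, and the hash is a toy one. The point is only that every pair with the same key lands in the same bucket, so each bucket could be grouped and reduced independently.

```javascript
// Toy string hash (an assumption for this sketch, not a recommendation).
function hashKey(key) {
  var h = 0;
  for (var i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) % 1000003;
  }
  return h;
}

// Splits [{key:k, value:v}, ...] into numReducers buckets.
// All pairs sharing a key end up in the same bucket.
function partition(intermediate, numReducers) {
  var buckets = [];
  for (var r = 0; r < numReducers; r++) {
    buckets.push([]);
  }
  for (var i = 0; i < intermediate.length; i++) {
    var pair = intermediate[i];
    buckets[hashKey(pair.key) % numReducers].push(pair);
  }
  return buckets;
}

var pairs = [
  {key: 'mary', value: 1},
  {key: 'lamb', value: 1},
  {key: 'mary', value: 1},
  {key: 'snow', value: 1}
];
var buckets = partition(pairs, 3);
```

Each bucket could then be handed to a separate worker. Nothing here actually runs in parallel or handles failures.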

That does make me think of some interesting possibilities though.

If only there were a vast sea of computers, all running JavaScript interpreters, all connected to the internet, all capable of downloading and running your m/r jobs. :)