Kevin Kim mentions, in a post published this evening, his curiosity about what he refers to as “real-time searching”. If things had gone a little differently back in 2006, real-time search would be a lot farther along than it is: this is exactly what the start-up company I used to work for, PubSub Concepts, was all about. We were all set to take over the world.
We called what we did at PubSub “prospective” search. Here’s how it was different from traditional, retrospective search.
What search engines like Google do is crawl the Web and index what they find. When a user comes to them for a search (almost always an HTTP request), the engine very quickly rummages around in its data repository for any previously indexed pages that match the user’s query; the answer is then given to the user as an HTTP response. It’s a one-off transaction. The problem is that only pages that have already been found by the automated crawl can be given to the user — and given the enormous size of the Web, those pages might be days or weeks old.
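For the curious, here’s the retrospective model in miniature. This is only a toy Python sketch with made-up pages, nothing like Google’s actual machinery; the point is that the index is built ahead of time, so a query can only ever surface what the crawl has already found:

```python
from collections import defaultdict

# Pretend these pages were gathered by a crawl some time ago.
crawled_pages = {
    "page1": "prospective search matches new content as it arrives",
    "page2": "retrospective search engines crawl and index the web",
}

# Inverted index: term -> set of page ids containing that term.
index = defaultdict(set)
for page_id, text in crawled_pages.items():
    for term in text.lower().split():
        index[term].add(page_id)

def search(query: str) -> set:
    """One-off lookup: return pages containing every query term."""
    terms = query.lower().split()
    results = set(index[terms[0]])
    for term in terms[1:]:
        results &= index[term]
    return results

# Only pages the crawl has already seen can ever be returned.
print(search("retrospective search"))  # {'page2'}
```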
Prospective search is exactly the opposite. These engines don’t crawl; they read live feeds of various sorts in real time, digesting new content as fast as it is created. The user comes in with a query, just as before, but rather than a one-shot response based on historical data, the prospective-search engine holds on to the query. And rather than searching, these engines do matching: as the torrent of real-time data pours through the system, each item is checked against all of the stored queries, and matching results can be “pushed” back to the users as fast as they happen (the way we did this at PubSub was with an instant-messaging framework connected to a scrolling display on the user’s machine).
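In skeletal form, the inversion looks something like the sketch below. The names are my own inventions, and real queries were far richer than bags of words, but the shape is right: the query is stored, and the stream is matched against it.

```python
# Stored queries: user -> the set of terms they subscribed with.
stored_queries: dict[str, set[str]] = {}

def subscribe(user: str, query: str) -> None:
    """Hold on to the query instead of answering it once and forgetting it."""
    stored_queries[user] = set(query.lower().split())

def push(user: str, text: str) -> None:
    # Stand-in for delivery; PubSub pushed matches over an
    # instant-messaging framework to a scrolling sidebar.
    print(f"-> {user}: {text}")

def on_new_item(text: str) -> None:
    """Called once per incoming item; match it against every stored query."""
    item_terms = set(text.lower().split())
    for user, query_terms in stored_queries.items():
        if query_terms <= item_terms:  # every query term appears in the item
            push(user, text)

subscribe("alice", "prospective search")
on_new_item("a fresh post about prospective search just appeared")  # pushed to alice
```

Note that the naive loop above checks every stored query for every item, which is exactly the hard problem the next paragraph describes.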
There are several hard problems to solve with this model, foremost of which is the matching of huge volumes of incoming data against millions of stored queries. But at PubSub we had an extraordinarily efficient “matching engine” of our own design, and were able to do this very well.
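I won’t describe our engine here, and the sketch below is emphatically not PubSub’s design; it just illustrates one well-known trick for scaling this kind of matching: invert the problem and index the stored queries themselves by term, so each incoming item is checked only against the queries that share a term with it, rather than all of them.

```python
from collections import defaultdict

# Millions of stored queries in real life; two here, keyed by query id.
queries = {
    1: {"prospective", "search"},
    2: {"real-time", "feeds"},
}

# Invert the queries: term -> ids of every stored query using that term.
query_index = defaultdict(set)
for qid, terms in queries.items():
    for term in terms:
        query_index[term].add(qid)

def match(item_text: str) -> list:
    """Return ids of stored queries fully satisfied by one incoming item."""
    item_terms = set(item_text.lower().split())
    # Gather only the queries that share at least one term with the item...
    candidates = set()
    for term in item_terms:
        candidates |= query_index.get(term, set())
    # ...then verify those few, instead of scanning every stored query.
    return [qid for qid in candidates if queries[qid] <= item_terms]

print(match("breaking items on the real-time feeds wire"))  # [2]
```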
What this meant was that I could sit at my computer and watch a live “news zipper” (it was a browser add-in that took the form of a scrolling “sidebar”) of newly generated content from all over the Web, filtered just for me, according to my own custom search queries. Usually the items were less than a minute old, often just a few seconds. It was enormously addictive. Others thought so too; some of the biggest names in venture capital were lined up to fund this enterprise.
So what happened? I won’t go into the details here, but let’s just say that there was bitter strife between the company’s founding officers, and it got so bad that it scared away all the money, and PubSub died on the vine.
It was an awful shame, and it still saddens me to think of it; we all had the certain feeling that we were onto something really big.
Mainly, I miss that sidebar.
2 Comments
Have you played with Google News? It has some similar functionality. It’s only updated daily, but I have a list of keywords registered with them; whenever any news comes in that is related, they shoot me an email.
Hi “B.C.”, and welcome.
Yes, I have several of those alerts myself. The biggest difference is latency, though admittedly Google can keep that low for the high-value sources it re-indexes often.