Skip navigation
sponsored by 

From ranking blogs to predicting posture

New algorithm could pave the way to improved sensor networks

By Bryn Nelson
Columnist
MSNBC
updated 9:27 a.m. ET Jan. 28, 2008

Image: Bryn Nelson
Bryn Nelson
Columnist

Ranking blogs that deliver the biggest bang for the buck may not seem to flow naturally from detecting contamination in a city water system.

With a new mathematical formula, however, Carlos Guestrin and his research team at Carnegie Mellon University in Pittsburgh are setting their sights on improving how scientists monitor everything from algal blooms in lakes to subtle differences in how someone sits or slouches in a chair.

Story continues below ↓
advertisement

Scientists have long studied how information, influence or physical items move through networks. But by combining that field of research with how to optimally detect the flow in a cost-effective way, the Carnegie Mellon researchers have devised a formula, or algorithm, that could lead to dramatically improved sensor networks, whether geared toward political blogs or posture.

Guestrin, an assistant professor of computer science and machine learning, received his initial inspiration through a collaboration with civil engineers on a problem-solving competition sponsored by the Environmental Protection Agency. Given a model of an unnamed city’s network of water pipes, a simulation of water consumption and different contamination scenarios, the teams had to decide where to place a limited number of $5,000 sensors to detect contamination as quickly as possible.

The competition organizers initially declared every team a winner, to Guestrin’s frustration, but a subsequent evaluation awarded his team the top score. Buoyed by his success, he began thinking about how the central idea could be adapted for other applications, including how information spreads across the Internet.

“Somebody places a story and some people link to the story, and some people link to the links, and so on,” he said. “You can think of it as a cascade of information.”

The Cascades algorithm
But how would this cascade be modeled? What sensors, or blogs in this case, should be tapped to maximize the likelihood of capturing a big story early on during its propagation over the blogosphere?

A team of researchers and graduate students from Carnegie Mellon eventually created a complex mathematical equation called the cost-effective lazy forward-selection algorithm, later dubbed the Cascades algorithm for simplicity’s sake.

One part seeks to maximize reward, in this case detecting the most news in the least amount of time. Within the algorithm, that reward concept is captured by tallying the number of people who read a news item after it appears on a specific blog. If 10 million people read a story after its initial posting on Blog A but only 1,000 had read it beforehand, the story would be deemed both newsworthy and early-breaking for Blog A’s readers.

A second part of the algorithm seeks to minimize cost, namely the inordinate time that could be spent reading blogs. The team also exploited a mathematical relationship known as the law of diminishing returns.

“If I place five sensors in the water distribution system, the sixth sensor helps me much more than if I have 10,000 sensors and want to place the 10,001st,” Guestrin said. The same holds true for adding more blogs to a news-detection system.

For one scenario, the researchers assumed that a reader could peruse no more than 100 blogs while another scenario had a budget set at no more than 5,000 postings. For each, the researchers asked the same question: which blogs should one read to be the most up-to-date on newsworthy stories? The team considered 45,000 blogs in all, obtaining 10 million posts and identifying 350,000 news cascades over the course of 2006.

Although the resulting top 100 blogs contain some familiar names, the two scenarios led to almost completely different lists.

Given a budget of 100 blogs, the biggest bang for the buck belonged to the popular Instapundit blog, which featured more than 4,500 postings throughout the year. Assuming a budget of 5,000 posts, however, the top-scoring blog was the less well-known sisu site, which featured only 331 posts for all of 2006.


Resource guide

Get Your 2008 Credit Score

Find a business to start

Try for Free

Search Jobs

Find Your Dream Home

$7 trades, no fee IRAs

Find your next car