Saturday, March 1, 2008

New Technique for Estimates Relating to Massive Quantities of Data

Brown mathematicians prove new way to build a better estimate

phys.org


Brown University mathematician Charles “Chip” Lawrence and graduate student Luis Carvalho have proved a new way to build a better estimate, one that answers the question: "How do you sift through hundreds of billions of bits of information and make accurate inferences from such gargantuan sets of data?"


"For more than 80 years, one of the most common methods of statistical prediction has been maximum likelihood estimation (MLE). This method is used to find the single most probable solution, or estimate, from a set of data.

"But new technologies that capture enormous amounts of data – human genome sequencing, "Internet transaction tracking, instruments that beam high-resolution images from outer space – have opened opportunities to predict discrete “high dimensional” or “high-D” unknowns. The huge number of combinations of these “high-D” unknowns produces enormous statistical uncertainty. Data has outgrown data analysis.

"This discrepancy creates a paradox. Instead of producing more precise predictions about gene activity, shopping habits or the presence of faraway stars, these large data sets are producing more unreliable predictions, given current procedures. That’s because maximum likelihood estimators use data to identify the single most probable solution. But because any one data point swims in an increasingly immense sea, it’s not likely to be representative...

"Lawrence and Carvahlo used statistical decision theory to understand the limitations of the old procedure when faced with new “high-D” problems. They also used statistical decision-making theory to find an estimation procedure that applies to a broad range of statistical problems. These “centroid” estimators identify not the single most probable solution, but the solution that is most representative of all the data in a set."
