Google Could Make Data Marketplaces Actually Useful

February 20, 2011 By David
Grazed from GigaOM. Author: Paul Miller.

This week, Google announced that its Public Data Explorer now lets end users upload and visualize their own data. Until now, the visualization tool only let users analyze a relatively small group of official data sets from bodies such as the United Nations and the World Bank. With this release, Google may be moving into the increasingly crowded data marketplace, joining Microsoft, IBM, Factual, Infochimps and many others.

The key to helping Google understand the structure of uploaded data is a new encoding format, the Dataset Publishing Language (DSPL). It lets data owners describe the structure within their data (“continents” contain “countries”), and it takes some rudimentary steps toward encouraging them to link concepts across data sets.
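To make the value of declared structure concrete, here is a minimal Python sketch of how a “countries within continents” hierarchy, once stated explicitly by the data owner, lets software roll country-level figures up to continent level without guessing. It is not DSPL's actual syntax; the `HIERARCHY` and `MEMBERSHIP` tables, the `roll_up` function and the values are all hypothetical, chosen only for illustration.

```python
# Hypothetical model of declared structure; this is NOT DSPL's actual syntax.

# Concept hierarchy declared by the data owner: a "country" is contained in a "continent".
HIERARCHY = {"country": "continent"}

# Hypothetical membership data: which continent each country belongs to.
MEMBERSHIP = {
    "country": {"France": "Europe", "Germany": "Europe", "Kenya": "Africa"},
}


def roll_up(concept: str, values: dict[str, float]) -> dict[str, float]:
    """Sum values keyed by members of `concept` into members of its declared parent."""
    if concept not in HIERARCHY:
        # Refuse to aggregate when the owner has not declared a parent concept.
        raise ValueError(f"no parent concept declared for {concept!r}")
    totals: dict[str, float] = {}
    for member, value in values.items():
        parent_member = MEMBERSHIP[concept][member]
        totals[parent_member] = totals.get(parent_member, 0.0) + value
    return totals


# Made-up values, for illustration only.
print(roll_up("country", {"France": 1.0, "Germany": 2.0, "Kenya": 4.0}))
# prints {'Europe': 3.0, 'Africa': 4.0}
```

The aggregation relies only on relationships the data owner declared; if no parent is declared for a concept, the sketch raises an error rather than inventing one.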

Most existing services in this space concentrate on helping data consumers find a single data set provided by someone else and then download or visualize it. Many also target data contributors, helping them upload individual data sets and then share or sell them.

Matters become far more complex when you want to start combining different data sets, even within a single data marketplace. Most of these services are not designed for that, and the data sets they hold typically lack the metadata needed to combine them sensibly. For example, combining the “height” of buildings in one data set with the “height” of, say, trees or mountains in another is a recipe for disaster if one is measured in feet and the other in meters. Without knowing which units were used, the combined data set is worthless and possibly dangerously misleading. Factual already does some of this work, tidying the data it collects, but Google’s DSPL is an interesting example of encouraging data owners to make these things explicit themselves. Whether it will catch on remains to be seen.
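The feet-versus-meters problem is easy to illustrate. Below is a minimal Python sketch, assuming each record carries an explicit unit field supplied by the data owner; the `Measurement` type, the `combine` function and the sample records are hypothetical. With that metadata, a merge can normalize every height to meters and refuse records whose unit it does not recognize, instead of silently mixing incompatible numbers.

```python
from dataclasses import dataclass

# Conversion factors to meters; one international foot is exactly 0.3048 m.
TO_METERS = {"m": 1.0, "ft": 0.3048}


@dataclass(frozen=True)
class Measurement:
    name: str
    height: float
    unit: str  # explicit unit metadata supplied by the data owner


def combine(*datasets: list[Measurement]) -> list[Measurement]:
    """Merge data sets, normalizing every height to meters.

    Raises ValueError for an unknown unit rather than silently
    mixing values measured on different scales.
    """
    combined = []
    for dataset in datasets:
        for m in dataset:
            if m.unit not in TO_METERS:
                raise ValueError(f"unknown unit {m.unit!r} for {m.name}")
            combined.append(Measurement(m.name, m.height * TO_METERS[m.unit], "m"))
    return combined


# Hypothetical records: a building height in feet, a mountain height in meters.
buildings = [Measurement("Tower A", 1000.0, "ft")]
mountains = [Measurement("Peak B", 500.0, "m")]
for m in combine(buildings, mountains):
    print(m.name, round(m.height, 1), m.unit)
```

Here the 1,000 ft building becomes roughly 304.8 m and can be compared meaningfully with the 500 m mountain; without the `unit` field, the two raw numbers would have been treated as if they were on the same scale.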