- September 9, 2011
Let’s start with storage. Not too long ago, geo developers had two choices: file formats or proprietary object-relational databases. Today there are production-ready open source object-relational databases such as PostgreSQL/PostGIS and MySQL; even mobile devices have lightweight databases with spatial capabilities, such as SQLite. In addition to traditional object-relational databases, NoSQL databases such as Cassandra, CouchDB, and MongoDB have spatial capabilities. Bigtable clones such as HBase can also store spatial data, and there is ongoing work on a spatial index to facilitate spatial queries and operations. Neo4j is a graph database that also handles spatial data. Finally, even full text search engines such as ElasticSearch provide geospatial search capabilities.
Manipulating spatial data and performing analysis used to be dominated by specialized proprietary Geographic Information Systems (GIS) desktop software. The geospatial software landscape has since expanded to include many open source desktop products such as QGIS, uDig and gvSIG. While desktop products are typically used for spatial analysis or cartographic production, they also provide a quick way to visualize data and results from API queries. Many open source desktop products are built on standard geospatial libraries such as JTS (the Java Topology Suite) and GEOS, the C++ port of JTS. These spatial libraries also have bindings to popular scripting languages like Python and Ruby, which lets developers process geospatial data in their language of choice. For example, Shapely is a Python library, and RGeo and GeoRuby are Ruby geospatial libraries. Software for data extraction, translation and loading (ETL) tasks is also available as open source or as proprietary software. GDAL/OGR is an open source geospatial ETL library and collection of utilities that works with most of the common raster and vector formats. FME (Feature Manipulation Engine) is a commercial product that can perform ETL on most geospatial formats.
The growth of geospatial developer tools has been driven by the availability of spatial data. Collecting spatial data was once the domain of government agencies, but the widespread availability of consumer GPS on smartphones has created an explosion of spatial data generated through social media and check-in services. Transparency efforts at all levels of government have added to the growing amount of spatial data. While you can store your own data in one of the solutions mentioned previously or host it on a service such as Google Fusion Tables, another alternative is to use a service that provides a consistent API to spatial data. A number of data providers, including Infochimps, provide spatial data, but when working with spatial data it is easy to overwhelm browser-based map clients with the sheer volume of data. Schuyler Erle coined the term “red dot fever” to describe the situation where data markers obscure the map and any discernible patterns. Two ways to overcome data overload are clustering, which groups the data so that outliers stand out, and aggregation, which decreases spatial resolution but reveals patterns. Infochimps provides the Summarizer tool to make data query results more usable by organizing data points into intelligent geographic clusters. Another advantage of the Infochimps Geo API is consistency across data through a unifying schema called the Infochimps Simple Schema (ICSS). ICSS is based on schema.org to provide a consistent and web-friendly way to access data.

The mind map above is a first stab at organizing the ever-growing array of geospatial tools and data. It’s a work in progress and comments are welcome.
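
As a rough illustration of the aggregation approach to red dot fever described above, here is a minimal Python sketch that bins points into a regular grid and reports one count per cell instead of one marker per point. This is not how the Infochimps Summarizer works internally; the function names, the grid size, and the sample check-in coordinates are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def aggregate_to_grid(points, cell_size_deg):
    """Count (lon, lat) points per square grid cell of the given size in degrees."""
    counts = Counter()
    for lon, lat in points:
        # Use floor (not int truncation) so negative coordinates fall in the right cell.
        cell = (math.floor(lon / cell_size_deg), math.floor(lat / cell_size_deg))
        counts[cell] += 1
    return counts


def cell_centers(counts, cell_size_deg):
    """Return (center_lon, center_lat, count) for each occupied cell, sorted by cell index."""
    return [
        ((cx + 0.5) * cell_size_deg, (cy + 0.5) * cell_size_deg, n)
        for (cx, cy), n in sorted(counts.items())
    ]


# Hypothetical check-in coordinates as (lon, lat) pairs.
checkins = [
    (-97.74, 30.27), (-97.75, 30.26), (-97.73, 30.28),
    (-122.42, 37.77), (-122.41, 37.78),
]

counts = aggregate_to_grid(checkins, 1.0)
for lon, lat, n in cell_centers(counts, 1.0):
    print(f"cell centered at ({lon}, {lat}): {n} points")
```

A map client would then draw one symbol per cell, sized or colored by the count, rather than every individual point. A larger cell size gives a coarser but less cluttered picture, which is the trade-off between spatial resolution and visible pattern mentioned above.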