HANA Zeppelin Query Builder with Map Visualization

SAP HANA Query Builder On Apache Zeppelin Demo

In working with Apache Zeppelin, I found that users wanted a way to explore data and build charts without needing to know SQL right away. This is an attempt to build a Zeppelin note that lets a new data scientist get familiar with the structure of their database. It also lets them build simple single-table queries and quickly turn the results into charts and maps, and it shows the SQL used to perform the work.

Demo

This video demonstrates how it works. I built on Randy Gelhausen's query builder post, which shows how to make a where clause builder, and on Damien Sorel's jQuery Query Builder. With these I made a series of paragraphs that look up tables and columns in HANA and let the user build a custom query. The resulting data can be graphed quickly with the Zeppelin Helium visualizations.

The Code

This is for those data scientists and coders that want to replicate this in their Zeppelin.

Note that this code is imperfect; I have not worked out all of its issues, and you may need to make changes to get it to work. It only works on the Zeppelin 0.8.0 snapshot, and it is written for SAP HANA as the database.

It supports only one type of aggregation, sum, and it has no way to add a HAVING clause. Both features could be added easily.

This Zeppelin note is dependent on code from a previous post. Follow the directions in Using Zeppelin to Explore a Database first.

Paragraph One

%spark
//Get list of columns on a given table
def columns1(table: String) : Array[(String)] = {
 sqlContext.sql("select * from " + table + " limit 0").columns.map(x => x.asInstanceOf[String])
}

def columns(table: String) : Array[(String, String)] = {
 sqlContext.sql("select * from " + table + " limit 0").columns.map(x => (x, x))
}

def number_column_types(table: String) : Array[String] = {
 val columnType = sqlContext.sql("select column_name from table_columns where table_name='" +
    table + "' and data_type_name = 'INTEGER'")
 // collect the rows to the driver and read the column name from each one
 columnType.collect().map(row => row.getString(0))
}

// set up the tables select list
val tables = sqlContext.sql("show tables").collect.map(s=>s(1).asInstanceOf[String].toUpperCase())
z.angularBind("tables", tables)
var sTable ="tables"
z.angularBind("selectedTable", sTable)


z.angularUnwatch("selectedTable")
z.angularWatch("selectedTable", (before:Object, after:Object) => {
 println("running " + after)
 sTable = after.asInstanceOf[String]
 // put the id for paragraph 2 and 3 here
 z.run("20180109-121251_268745664")
 z.run("20180109-132517_167004794")
})


var col = columns1(sTable)
col = col :+ "*"
z.angularBind("columns", col)
// hack to make the where clause work on initial load
var col2 = columns(sTable)
var extra = ("1","1")
col2 = col2 :+ extra
z.angularBind("columns2", col2)
var colTypes = number_column_types(sTable)
z.angularBind("numberColumns", colTypes)
var sColumns = Array("*")
// hack to make the where clause work on initial load
var clause = "1=1"
var countColumn = "*"
var limit = "10"

// setup for the columns select list
z.angularBind("selectedColumns", sColumns)
z.angularUnwatch("selectedColumns")
z.angularWatch("selectedColumns", (before:Object, after:Object) => {
 sColumns = after.asInstanceOf[Array[String]]
 // put the id for paragraph 2 and 3 here
 z.run("20180109-121251_268745664")
 z.run("20180109-132517_167004794")
})
z.angularBind("selectedCount", countColumn)
z.angularUnwatch("selectedCount")
z.angularWatch("selectedCount", (before:Object, after:Object) => {
 countColumn = after.asInstanceOf[String]
})
// bind the where clause
z.angularBind("clause", clause)
z.angularUnwatch("clause")
z.angularWatch("clause", (oldVal: Object, newVal: Object) => {
 clause = newVal.asInstanceOf[String]
})

z.angularBind("limit", limit)
z.angularUnwatch("limit")
z.angularWatch("limit", (oldVal: Object, newVal: Object) => {
 limit = newVal.asInstanceOf[String]
})

This paragraph is Scala code that defines functions to query the table that lists the tables and the table that lists the columns. You must have the tables loaded into Spark as views or tables to see them in the select lists. This paragraph also performs all the binding, so that the next paragraph, which is Angular code, can get the data built here.
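Paragraphs one and three cast bound values directly, for example after.asInstanceOf[Array[String]]. That cast assumes the value is a Scala array; a value changed in the browser may be deserialized into a different type, such as a java.util.List, depending on the Zeppelin version. A defensive alternative is a small normalizing helper. The sketch below is plain Scala with no Zeppelin dependency, and AngularValues and toStringArray are hypothetical names:

```scala
object AngularValues {
  // Hypothetical helper: null becomes "", anything else uses toString.
  private def str(x: Any): String = if (x == null) "" else x.toString

  /** Normalizes a value read from z.angular(...) into an Array[String].
    * Assumption: the value may be a Scala Array, a java.util.List,
    * a Scala Seq, a single String, or null (nothing bound yet).
    */
  def toStringArray(value: Any, default: Array[String] = Array("*")): Array[String] =
    value match {
      case null                         => default
      case arr: Array[_]                => arr.iterator.map(str).toArray
      case list: java.util.List[_]      => list.toArray.iterator.map(str).toArray
      case seq: scala.collection.Seq[_] => seq.iterator.map(str).toArray
      case other                        => Array(str(other))
    }
}
```

In paragraph one, the selectedColumns watcher could then assign sColumns = AngularValues.toStringArray(after) instead of casting.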

Paragraph Two

%angular
<link rel="stylesheet" href="https://cdn.rawgit.com/mistic100/jQuery-QueryBuilder/master/dist/css/query-builder.default.min.css">
<script src="https://cdn.rawgit.com/mistic100/jQuery-QueryBuilder/master/dist/js/query-builder.standalone.min.js"></script>

<script type="text/javascript">
  var button = $('#generateQuery');
  var qb = $('#builder');
  var whereClause = $('#whereClause');
 
  button.click(function(){
    whereClause.val(qb.queryBuilder('getSQL').sql);
    whereClause.trigger('input'); //triggers Angular to detect changed value
  });
 
  // this builds the where statement builder
  var el = angular.element(qb.parent('.ng-scope'));
  angular.element(el).ready(function(){
    var integer_columns = angular.element('#numCol').val()
    //Executes on page-load and on update to 'columns', defined in first snippet
    window.watcher = el.scope().compiledScope.$watch('columns2', function(newVal, oldVal) {
      //Append each column to QueryBuilder's list of filters
      var options = {allowEmpty: true, filters: []}
      $.each(newVal, function(i, v){
        if(integer_columns.split(',').indexOf(v._1) !== -1){
          options.filters.push({id: v._1, type: 'integer'});
        } else if(v._1.indexOf("DATE") !== -1) {
          options.filters.push({id: v._1, type: 'date'})
        } else { 
          options.filters.push({id: v._1, type: 'string'});
        }
      });
      qb.queryBuilder(options);
    });
  });
</script>
<input type="text" ng-model="numberColumns" id="numCol"></input>
<form class="form-inline">
 <div class="form-group">
 Please select table: Select Columns:<br>
 <select size=5 ng-model="selectedTable" ng-options="o as o for o in tables" 
       data-ng-change="z.runParagraph('20180109-151738_134370871')"></select>
 <select size=5 multiple ng-model="selectedColumns" ng-options="o as o for o in columns">
 <option value="*">*</option>
 </select>
 Sum Column:
 <select ng-model="selectedCount" ng-options="o as o for o in columns">
 <option value="*">*</option>
 </select>
 <label for="limitId">Limit: </label> <input type="text" class="form-control" 
       id="limitId" placeholder="Limit Rows" ng-model="limit"></input>
 </div>
</form>
<div id="builder"></div>
<button type="submit" id="generateQuery" class="btn btn-primary" 
       ng-click="z.runParagraph('20180109-132517_167004794')">Run Query</button>
<input id="whereClause" type="text" ng-model="clause" class="hide"></input>

<h3>Query: select {{selectedColumns.toString()}} from {{selectedTable}} where {{clause}} 
   with a sum on: {{selectedCount}} </h3>

Paragraph two uses the jQuery and jQuery Query Builder JavaScript libraries. In the z.runParagraph call on the Run Query button, use the paragraph id of paragraph three.

Paragraph Three

The results of the query show up in this paragraph. Its function is to generate the query and run it for display.

%spark
// Read the current selections back from the Angular bindings
var selected_count_column = z.angular("selectedCount").asInstanceOf[String]
var selected_columns = z.angular("selectedColumns").asInstanceOf[Array[String]]
var limit = z.angular("limit").asInstanceOf[String]
var limit_clause = ""
// "*" or an empty limit box means no limit
if (limit != null && limit.trim.nonEmpty && limit.trim != "*") {
 limit_clause = "limit " + limit.trim
}
var selected_columns_n = selected_columns.toBuffer
// remove the summed column from the select and group by lists
selected_columns_n -= selected_count_column

if (selected_count_column != "*") {
 val query = "select "+ selected_columns_n.mkString(",") + ", sum(" + selected_count_column +
     ") "+ selected_count_column +"_SUM from " + z.angular("selectedTable") + " where " + 
      z.angular("clause") + " group by " + selected_columns_n.mkString(",") + " " + 
      limit_clause
 println(query)
 z.show(sqlContext.sql(query))
} else {
 val query2 = "select "+ selected_columns.mkString(",") +" from " + z.angular("selectedTable") + 
      " where " + z.angular("clause") + " " + limit_clause
 println(query2)
 z.show(sqlContext.sql(query2))
}
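The string concatenation in paragraph three can also be written as a pure function that is easy to test outside Zeppelin. The sketch below is a hypothetical refactoring (QueryBuilder and buildQuery are illustrative names, not part of Zeppelin or Spark). It follows the same rules as the paragraph: when a sum column is chosen, the other selected columns become the group by list, and "*" or a blank limit means no LIMIT clause. It additionally drops "*" from the group by list.

```scala
object QueryBuilder {
  /** Builds a single-table query from the selections in paragraph two.
    *
    * @param table       table or view name
    * @param columns     selected columns (may be just "*")
    * @param countColumn column to SUM, or "*" for no aggregation
    * @param whereClause clause from the query builder; blank means "1=1"
    * @param limit       row limit as text; blank or "*" means no LIMIT
    */
  def buildQuery(table: String,
                 columns: Seq[String],
                 countColumn: String,
                 whereClause: String,
                 limit: String): String = {
    val where =
      if (whereClause == null || whereClause.trim.isEmpty) "1=1" else whereClause
    val limitClause =
      if (limit == null || limit.trim.isEmpty || limit.trim == "*") ""
      else s" limit ${limit.trim.toInt}" // toInt throws on non-numeric input

    if (countColumn != null && countColumn != "*") {
      // Group by every selected column except "*" and the summed column.
      val groupCols = columns.filterNot(c => c == countColumn || c == "*")
      val selectList = (groupCols :+ s"sum($countColumn) ${countColumn}_SUM").mkString(", ")
      val groupBy =
        if (groupCols.isEmpty) "" else " group by " + groupCols.mkString(", ")
      s"select $selectList from $table where ${where}${groupBy}${limitClause}"
    } else {
      s"select ${columns.mkString(", ")} from $table where ${where}${limitClause}"
    }
  }
}
```

In paragraph three the resulting string would be passed to sqlContext.sql and z.show as before. Like the original, it concatenates user input directly into SQL, so it only suits trusted, exploratory use.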

If everything is set up just right, you can now query your tables without writing SQL. This is a limited example: I have not provided options for other types of aggregation, advanced grouping, or joins across multiple tables.

 

Please follow us on our website at https://volumeintegration.com and on Twitter at @volumeint.

Query of a geographic region.

Zeppelin Maps the Hard Way

In Zeppelin Maps the Easy Way I showed how to add a map to Zeppelin with a Helium module. But what if you do not have access to the Helium NPM server to load in that module? And what if you want to add features to your Leaflet Map that are not supported in the volume-leaflet package?

This post shows how to use Zeppelin's Angular display system to add a map user interface to a Zeppelin paragraph.

Zeppelin Angular Leaflet Map

Zeppelin Angular Leaflet Map with Markers

First we want to get a map on the screen with markers.

In Zeppelin create a new note.

As shown in How to Use Zeppelin With SAP HANA, we create a separate paragraph to build the database connection. Substitute your own database driver and connection string to make it work with other databases. There are other examples that pull in data from a CSV file and turn it into a table object.

In the next paragraph, we place the Spark Scala code that queries the database and builds the markers passed to the final paragraph, which is built with Angular.

The data query paragraph has a basic way to query a bounding box. It simply keeps the points whose longitude and latitude fall between the southwest and northeast corners of the box.

var sql1 = "select comments desc, lat, lng from EVENT_VIEW "
if (box.length > 0) {
    var coords = box.split(",")
    sql1 = sql1 + " where lng > " + coords(0).toFloat + " and lat > " + coords(1).toFloat +
        " and lng < " + coords(2).toFloat + " and lat < " + coords(3).toFloat
}

var sql = sql1 +" limit 20"
val map_pings = jdbcDF.sqlContext.sql(sql)
z.angularBind("locations", map_pings.collect()) 
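The filter above assumes box always holds four comma-separated numbers. A slightly more defensive sketch parses the string first and skips the filter when it is blank or malformed. It relies on Leaflet's toBBoxString() ordering of west, south, east, north; BoundingBox, parse and predicate are hypothetical names:

```scala
object BoundingBox {
  // Leaflet's toBBoxString() order: southwest lng, southwest lat,
  // northeast lng, northeast lat.
  final case class BBox(west: Double, south: Double, east: Double, north: Double)

  /** Parses "west,south,east,north"; None for null, blank, or malformed input. */
  def parse(box: String): Option[BBox] =
    if (box == null) None
    else box.split(",").map(_.trim) match {
      case Array(w, s, e, n) =>
        try Some(BBox(w.toDouble, s.toDouble, e.toDouble, n.toDouble))
        catch { case _: NumberFormatException => None }
      case _ => None
    }

  /** WHERE-clause predicate; column names default to the post's lat/lng. */
  def predicate(b: BBox, latCol: String = "lat", lngCol: String = "lng"): String =
    s"$lngCol > ${b.west} and $latCol > ${b.south} and $lngCol < ${b.east} and $latCol < ${b.north}"
}
```

BoundingBox.parse(box).map(b => BoundingBox.predicate(b)) yields Some(predicate) to append after where, or None to run the query without a filter.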

The result of this query, map_pings, is collected and bound to Angular with z.angularBind so that any Angular code in the note can reference it. Zeppelin can share bound data across languages, so different paragraphs in the same note can use it. There are samples for other databases, JSON and CSV files at this link.

We do not have access to HANA's proprietary functions, because Zeppelin loads the data into its own table view of the HANA table and the queries run in Spark. We use createOrReplaceTempView so that a copy of the data is not made in Zeppelin; the view just passes the data through.

Note that you should set up the HANA JDBC driver as described in How to Use Zeppelin With SAP HANA.

It is best to add the HANA JDBC jar as a dependency of the Spark interpreter. Go to the Zeppelin settings menu.

Zeppelin Settings Menu

Choose Interpreter, find the Spark section, and press edit.

Zeppelin Interpreter Screen

Then add the path where the SAP HANA JDBC driver, ngdbc.jar, is installed.

Configure HANA jdbc in Spark Interpreter

First Paragraph

%spark
import org.apache.spark.sql._
val driver ="com.sap.db.jdbc.Driver"
val url="jdbc:sap://11.1.88.110:30015/tri"
val database   = "database schema"   
val username   = "username for the database"
val password   = "the Password for the database"
val table_view = "event_view"
var box=""
val jdbcDF = sqlContext.read.format("jdbc").option("driver",driver)
                                           .option("url",url)
                                           .option("databaseName", database)
                                           .option("user", username)
                                           .option("password",password)
                                           .option("dbtable", table_view).load()
jdbcDF.createOrReplaceTempView("event_view")

Second Paragraph

%spark

var box = "20.214843750000004,1.9332268264771233,42.36328125000001,29.6880527498568";
var sql1 = "select comments desc, lat, lng from EVENT_VIEW "
if (box.length > 0) {
    var coords = box.split(",")
    sql1 = sql1 + " where lng  > " + coords(0).toFloat + " and lat > " +  
        coords(1).toFloat + " and lng < " + coords(2).toFloat + " and lat < " +
        coords(3).toFloat
}
var sql = sql1 +" limit 20" 

val map_pings = jdbcDF.sqlContext.sql(sql)
z.angularBind("locations", map_pings.collect())
z.angularBind("paragraph", z.getInterpreterContext().getParagraphId())
// get the paragraph id of the angular paragraph and put it below
z.run("20171127-081000_380354042")

Third Paragraph

In the third paragraph, we add the Angular code with the %angular directive. Note the forEach loop, which builds the markers and adds them to the map.

%angular 
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.5/leaflet.css" />
.
<div id="map" style="height: 300px; width: 100%"></div>
<script type="text/javascript">
function initMap() {
    var element = $('#textbox');
    var map = L.map('map').setView([30.00, -30.00], 3);
   
    L.tileLayer('http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png').addTo(map);
    var geoMarkers = L.layerGroup().addTo(map);
    
    var el = angular.element($('#map').parent('.ng-scope'));
    var $scope = el.scope().compiledScope;
   
    angular.element(el).ready(function() {
        window.locationWatcher = $scope.$watch('locations', function(newValue, oldValue) {
            //geoMarkers.clearLayers();
            angular.forEach(newValue, function(event) {
                if (event)
                  var marker = L.marker([event.values[1], event.values[2]]).bindPopup(event.values[0]).addTo(geoMarkers);
            });
        })
    });
}
if (window.locationWatcher) { window.locationWatcher(); }

// ensure we only load the script once, seems to cause issues otherwise
if (window.L) {
    initMap();
} else {
    console.log('Loading Leaflet library');
    var sc = document.createElement('script');
    sc.type = 'text/javascript';
    sc.src = 'https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.5/leaflet.js';
    sc.onerror = function(err) { alert(err); }
    sc.onload = initMap;  // build the map once Leaflet has finished loading
    document.getElementsByTagName('head')[0].appendChild(sc);
}
</script>
<p>Testing the Map</p>

<form class="form-inline">
  <div class="form-group">
    <input id="textbox" ng-model="box" data-ng-change="z.runParagraph(paragraph);"></input>
    <label for="paragraphId">Paragraph Id: </label>
    <input type="text" class="form-control" id="paragraphId" placeholder="Paragraph Id ..." ng-model="paragraph"></input>
  </div>
  <button type="submit" class="btn btn-primary" ng-click="z.runParagraph(paragraph)"> Run Paragraph</button>
</form>

Run the three paragraphs in order, and you should get a map with markers on it.

The next step is to add a way to query the database by drawing a box on the screen. In the Scala/Spark code, we bind a variable for the bounding box with z.angularBind(). Then we add a watcher with z.angularWatch() so that when the variable changes, the new value is used to run the query.

Modify Second Paragraph

%spark
z.angularBind("box", box)
// Get the bounding box; clear any watcher registered by an earlier run first
z.angularUnwatch("box")
z.angularWatch("box", (oldValue: Object, newValue: Object) => {
    println(s"value changed from $oldValue to $newValue")
    box = newValue.asInstanceOf[String]
})

var sql1 = "select comments desc, lat, lng from EVENT_VIEW "
if (box.length > 0) {
    var coords = box.split(",")
    sql1 = sql1 + " where lng  > " + coords(0).toFloat + " and lat > " +
        coords(1).toFloat + " and lng < " + coords(2).toFloat + " and lat < " +
        coords(3).toFloat
}
var sql = sql1 +" limit 20" 

val map_pings = jdbcDF.sqlContext.sql(sql)
z.angularBind("locations", map_pings.collect())
z.angularBind("paragraph", z.getInterpreterContext().getParagraphId())
z.run("20171127-081000_380354042") // put the paragraph id for your angular paragraph here

In the Angular paragraph, we add an additional Leaflet plugin called leaflet.draw by including its CSS file and its JavaScript file. Then the draw controls are added as shown in the code below.

Modify the Third Paragraph

%angular 
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.5/leaflet.css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet.draw/0.4.13/leaflet.draw.css" />
.
<script src='https://cdnjs.cloudflare.com/ajax/libs/leaflet.draw/0.4.13/leaflet.draw.js'></script>
<div id="map" style="height: 300px; width: 100%"></div>

<script type="text/javascript">
function initMap() {
    var element = $('#textbox');
    var map = L.map('map').setView([30.00, -30.00], 3);
   
    L.tileLayer('http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png').addTo(map);
    var geoMarkers = L.layerGroup().addTo(map);
    var drawnItems = new L.FeatureGroup();
    
    map.addLayer(drawnItems);
    
    var drawControl = new L.Control.Draw({
        draw: {
             polygon: false,
             marker: false,
             polyline: false
        },
        edit: {
            featureGroup: drawnItems
        }
    });
    map.addControl(drawControl);
    
    map.on('draw:created', function (e) {
        var type = e.layerType;
        var layer = e.layer;
        drawnItems.addLayer(layer);
        element.val(layer.getBounds().toBBoxString());
        map.fitBounds(layer.getBounds());
        window.setTimeout(function(){
           //Triggers Angular to do its thing with changed model values
           element.trigger('input');
        }, 500);
    });
    
    var el = angular.element($('#map').parent('.ng-scope'));
    var $scope = el.scope().compiledScope;
   
    angular.element(el).ready(function() {
        window.locationWatcher = $scope.$watch('locations', function(newValue, oldValue) {
            $scope.latlng = [];
            angular.forEach(newValue, function(event) {
                if (event) {
                    var marker = L.marker([event.values[1], event.values[2]]).bindPopup(event.values[0]).addTo(geoMarkers);
                    $scope.latlng.push(L.latLng(event.values[1], event.values[2]));
                }
            });
            // only zoom when the query returned at least one marker
            if ($scope.latlng.length > 0) {
                var bounds = L.latLngBounds($scope.latlng);
                map.fitBounds(bounds);
            }
        })
    });

}

if (window.locationWatcher) { window.locationWatcher(); }

// ensure we only load the script once, seems to cause issues otherwise
if (window.L) {
    initMap();
} else {
    console.log('Loading Leaflet library');
    var sc = document.createElement('script');
    sc.type = 'text/javascript';
    sc.src = 'https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.5/leaflet.js';
    sc.onerror = function(err) { alert(err); }
    sc.onload = initMap;  // build the map once Leaflet has finished loading
    document.getElementsByTagName('head')[0].appendChild(sc);
}
</script>
<p>Testing the Map</p>

<form class="form-inline">
  <div class="form-group">
    <input id="textbox" ng-model="box" data-ng-change="z.runParagraph(paragraph);"></input>
    <label for="paragraphId">Paragraph Id: </label>
    <input type="text" class="form-control" id="paragraphId" placeholder="Paragraph Id ..." ng-model="paragraph"></input>
  </div>
  <button type="submit" class="btn btn-primary" ng-click="z.runParagraph(paragraph)"> Run Paragraph</button>
</form>

A few features here took some investigation to figure out.

Within Zeppelin I was unable to make the box visible while it is being drawn. Instead, drawing a box makes the map zoom to the selected area, using this code:
element.val(layer.getBounds().toBBoxString());
map.fitBounds(layer.getBounds());

To make the map zoom to the returned markers after the query runs, this code is triggered:

$scope.latlng.push(L.latLng(event.values[1], event.values[2]))
...
var bounds = L.latLngBounds($scope.latlng)
map.fitBounds(bounds)
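L.latLngBounds builds the smallest box that contains every marker, and map.fitBounds zooms to it. The same calculation is sketched in Scala below (MarkerBounds is a hypothetical name), mainly to show the edge case: with no markers there is no meaningful box, so it is worth checking for an empty result before calling fitBounds. The sketch ignores boxes that cross the antimeridian.

```scala
object MarkerBounds {
  final case class LatLng(lat: Double, lng: Double)
  final case class Bounds(southWest: LatLng, northEast: LatLng)

  /** Smallest lat/lng box containing all points; None when there are no points. */
  def of(points: Seq[LatLng]): Option[Bounds] =
    if (points.isEmpty) None
    else Some(Bounds(
      LatLng(points.map(_.lat).min, points.map(_.lng).min),
      LatLng(points.map(_.lat).max, points.map(_.lng).max)))
}
```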

To run the Spark/Scala query paragraph after a box is drawn, the text box uses data-ng-change to call z.runParagraph with the id stored in the bound paragraph variable:

<input id="textbox" ng-model="box" data-ng-change="z.runParagraph(paragraph);"></input>

The HTML form at the bottom holds the bound values and passes them back and forth between the paragraphs. It is visible for debugging at the moment.

Query of a geographic region with Zeppelin

Query of a geographic region

Please let us know how it works out for you. Hopefully this will help you add maps to your Zeppelin notebook. I am sure there are better ways to accomplish this feature set, but this is the first way I was able to get it all to work together.

Demo of the interface:

You can contact us using twitter at @volumeint.

Some code borrowed from: https://gist.github.com/granturing/a09aed4a302a7367be92 and https://zeppelin.apache.org/docs/latest/displaysystem/front-end-angular.html

War what is it good for

War, What is it Good For?

Do you remember the Edwin Starr song “War” from 1969? The chorus repeats:

War, huh yeah

What is it good for?

Absolutely nothing, oh hoh, oh
Well, war is good for at least one thing…maps!

Wartime Maps

Mapping data before computers was difficult and seems to have been a primary concern during war. In fact, wars have advanced the state of the art in mapping data for situational awareness throughout history. The speed at which we can determine events and plot them on a map shows amazing technical advancement.

The basic idea is to visualize the placement of the enemy and friendly forces on a paper map with pins, which we still do today. But instead of physical pins, we use images of pins on an electronic map.

Churchill’s War Rooms

The Map room – Churchill War Rooms

I want to take you to where Winston Churchill pored over maps during World War II. His war rooms were contained in an underground bunker beneath five feet of concrete in London. According to the Imperial War Museums, there was a concern that Londoners would feel abandoned and evacuation would be slow. So the government built a bunker right in London for use during the next war.

These rooms were left exactly the way they were found on August 16, 1945, at the end of the war. You can still see the pin holes in the maps for past troop movements and ships as they crossed the ocean.

Large Wall Map – Churchill War Rooms

There are also walls full of graphs and charts. They are the 1940s version of today's management dashboard. These charts outlined the number of troops and were kept up to date by an army of people moving pins and updating charts.

Informational Bar Charts – Churchill War Rooms

It is obvious how these maps and charts were used to enhance decision-making. They provided accurate knowledge of the location, type and count of equipment, and the health of the troops for both the Axis and the Allies.

Graphs – Churchill War Rooms

There is even a map of Germany with an acetate covering that allowed them to write on it. The last thing they wrote on it was the outline of the zones each country would administer after Germany was divided.

Germany Divided

Men of Maps

Churchill enjoyed studying maps so much that he had his sleeping/office quarters in the bunker papered with maps from floor to ceiling. His love for maps was well known.

In fact, his peer and collaborator in America, Franklin Roosevelt, was also a big fan of maps and had a steady stream of updated maps provided to him by the National Geographic Society. In the FDR White House, there was a cloakroom converted into a map room modeled after Churchill’s map room. The FDR Library says, “Maps posted in the room were used to track the locations of land, sea and air forces.”

Secret Room

There was another, more secretive part of the Churchill War Rooms. Down a back hallway there was a restroom, or as it is called in England, the WC.

It was reserved for Winston Churchill’s use alone. Very few people really knew what was on the other side of the door.

Churchill’s “Water Closet” in the War Rooms

Typical Restroom Lock Indicator for Restrooms in England

The space was actually a secret telephone room with a direct line to FDR in the White House. The two leaders would coordinate war operations over this line. It was encrypted with a system called SIGSALY, which sat under the Selfridges department store at one end and in the Pentagon at the other.

Innovations Continue Today

The use of great human effort, paper maps, and telecommunications aided the war effort and led to innovations in managing logistics and monitoring world events geospatially. We have come a long way, but we still put pins in a map; they just happen to be electronic. The militaries of the world continue to upgrade their map rooms into walls of video screens and server rooms of computers to make visualization updates in near real-time. Onward!

 

To learn more about Volume Labs and Volume Integration, please follow us on Twitter @volumeint and check out our website.

Mapping an Epidemic

This map changed the way we see the world and the way we study science, nature, and disease.

In August of 1854, cholera was ravaging the Soho neighborhood of London where John Snow was a doctor. People were fleeing the area because they thought cholera was spread by gases in the air or, as they called it, "bad air."

Just as there is disinformation today about Ebola being airborne, the experts of that time thought most disease was spread in the air. There was no concept that disease might be in the water. They had no idea that bacteria even existed.

John had worked as a doctor in a major outbreak of cholera in a mine. But despite working in close quarters with the miners, he never contracted the disease. He wondered why the air did not affect him.

This inspired him to write a paper on why he believed cholera was spread through water and bodily fluids. The experts at the time did not accept his theory; they continued to believe cholera was caused by the odors emitted by rotting waste.

In the Soho outbreak in August 1854, John Snow saw a chance to further prove his theory. He went door to door keeping a tally of deaths at each home. This was only part of his quest to find evidence to prove the source of the plagues of the day.

He had been collecting statistical information, personal interviews, and other research for many years. He added this information to his paper, “On the Mode of Communication of Cholera.” The paper and his work in researching and collecting evidence founded the science of epidemiology.

One of the most innovative features was plotting data on a map; it was the first published use of marks on a map supporting a scientific conclusion. Each of the bars on John Snow's map represents one death. Using this visual technique, he could show that the deaths were centered around a point, and then investigate further and interview people in the area. He could also find anomalies and outliers, such as deaths far from the concentration and areas with no deaths.

Epicenter Pump and Brewery

He found through personal interviews and mapping the data that the workers in the brewery (in the epicenter of the epidemic) were not dying. The owner of the brewery said that the workers were given free beer, and he thought that they never drank water at all. In fact, there was a deep well in the brewery used in the beer. In other cases, John Snow found that addresses with low deaths had their own personal well.

He also investigated the outlying incidents through interviews: some worked in the area of the pump or walked by it on the way to school. One woman who got sick had the water brought to her by a wagon each day because she liked the taste of that particular well water. One person he talked to even said the water smelled like sewage and did not drink it, but his servant did and came down with a case of cholera.

The incidents highlighted the area around a public pump on Broad Street. Using his data, he convinced the local authorities to have the pump handle removed.

The most innovative feature of the map is that it changed the way we use maps. The idea that data could be visualized to prove a fact was very new.

John Snow’s map of the service areas of two water companies

John Snow also produced another map showing which water companies supplied water in London. This map showed that the water company which stopped using water from the Thames had a lower death rate due to cholera. The map allowed John Snow to provide further evidence of disease spread through water and what could be done to fix the issue.

This is similar to the Ebola outbreak of today where tracking the disease is important. John Snow’s idea of collecting data in the field and mapping it lives on in maps like those from HealthMap, which show the spread of the Ebola virus.

Data Exploration via Map

Today, we use data driven maps as a powerful tool for all sorts of reasons. But it all started with John Snow.

(For an interesting take on this event and other historical technology that changed the way we live today, watch the "Clean" episode of the How We Got to Now series on PBS.)

To learn more about Volume Labs and Volume Integration, please follow us on Twitter @volumeint and check out our website.

10+ Surprising Geospatial Technologies

Data Organized on Map

I’ve spent years in the geospatial arena, so I’m a bit of a geospatial technology geek. But now it seems like the rest of the world is increasingly interested in this technology too.

You may remember the old latitude and longitude numbers that you learned about in school. Perhaps they didn't seem very useful or relevant to life at the time, but these coordinates are now tracked constantly by our various GPS-enabled gadgets. It's becoming increasingly common to use coordinates to define the location of collected data, a person, a landmark, and more. We can add even further accuracy by recording elevation and point in time.

I would like to describe some of the components that fall under the umbrella of geospatial technology. You might find some surprises!

Equipment

First, let’s discuss some of the tools used to collect geospatial data.

1. GPS

Global Positioning System (GPS) technology is the software and equipment needed to provide the location of things on the planet. This is usually done with special satellites, and it is often augmented by other methods such as WiFi signals. There are even technologies in use that determine location by looking at the stars.

2. Field Sensors

Field sensors are electronic devices that are placed to collect information about weather, soil, or other environmental conditions. These data collecting devices could be anything from a camera to a cell phone. During collection, the data is tagged with geospatial information, so the location of the event is known and can be mapped.

Overhead Imagery

My next geospatial category is overhead imagery. This includes all imagery collected from aircraft and satellites.

3. Visual Overhead Imagery

Visual overhead imagery includes what you see in Google Maps and Google Earth when you use the satellite function. This imagery could be collected via satellite or aircraft, and the technology used involves cameras, aircraft, satellites, global positioning systems, altimeters, and microwave transmission equipment. Today, even video is collected overhead by Planet Labs.

If you don’t own an airplane or satellite, can you collect visual overhead imagery? Yes! It doesn’t have to be expensive. Some hobbyists and students are cutting their teeth on low-cost imagery collection using kites and balloons.

Balloon mapping of Lake Borgne, Louisiana (Cartographer: Stewart Long/publiclab.org)

4. Hyperspectral Overhead Imagery

Hyperspectral imaging captures light in many narrow bands across the spectrum, including wavelengths beyond human sight. Engineers have developed sensors that can gather this light from space, and it can also be done from aircraft. The data is then transformed into a visual representation through analysis and processing to create hyperspectral overhead imagery.

This type of geospatial technology has some surprising uses. At the US Geological Survey (USGS), researchers used hyperspectral overhead imagery collected via satellite to detect the presence of arsenic in the leaves of ferns. Further analysis helped locate arsine gas canisters buried in Washington, DC. For more information, check out the full dissertation entitled _Remote Sensing Investigations of Fugitive Soil Arsenic and its Effects on Vegetation Reflectance_.

5. LIDAR

Light Detection and Ranging (LIDAR) is a technology that measures distance by shining a laser at a target and measuring the reflected light. For mapping, the system is usually airborne and the laser is aimed at the ground. This yields a very accurate contour of the Earth's surface, as shown in the image of the Three Sisters below.

LIDAR image of the Three Sisters volcanic peaks in Oregon (DOGAMI)

LIDAR can also measure objects on the ground such as trees and houses. This type of data is used to determine elevation and is often used when processing other imagery to improve accuracy.
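The ranging idea can be written down in a couple of lines: light travels to the surface and back, so the distance is the speed of light times the round-trip time, divided by two. The sketch below also derives the elevation of whatever reflected the pulse, under the simplifying assumption that the laser points straight down from a sensor at a known altitude. Real systems also account for scan angle, aircraft attitude, and the slightly slower speed of light in air; the function names and readings here are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # in a vacuum; the small slowdown in air is ignored


def range_from_round_trip(round_trip_s: float) -> float:
    """Distance to the reflecting surface: the pulse goes there and back, so halve it."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def return_elevation(sensor_altitude_m: float, round_trip_s: float) -> float:
    """Elevation of the surface that reflected the pulse, assuming the laser
    points straight down (nadir) from a sensor at a known altitude."""
    return sensor_altitude_m - range_from_round_trip(round_trip_s)


# Hypothetical readings from a sensor flying at 1,500 m.
ground = return_elevation(1500.0, 2000.0 / SPEED_OF_LIGHT_M_PER_S)   # about 500 m
treetop = return_elevation(1500.0, 1960.0 / SPEED_OF_LIGHT_M_PER_S)  # about 520 m
print(round(treetop - ground, 3))  # approximately 20 m of tree height
```

Because a treetop reflects the pulse sooner than the ground beneath it, comparing the two return elevations gives an estimate of the object's height, which is how LIDAR can measure trees and houses as well as the terrain.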

How do autonomous vehicles “see” where they are going and what is in the way? LIDAR, of course! Plus, it’s even used in various industries to make 3D models of buildings and topography.

Processing

So now that we have collected all this imagery, how do we use it?

6. Imagery Processing Systems

The overhead imagery produced by satellites and aircraft is not well suited to human viewing in raw form. So we use imagery processing systems to help automate the manipulation of the images and data collected. These computer systems make the images and data useful to us.

Most images are taken at an angle and must be adjusted, or warped. Imagery processing systems assign each pixel a geographic coordinate and an elevation. This is done by combining the image with the GPS data collected at each click of the camera.

This process is often called orthorectification. To see a simplified illustration, take a look at this orthorectification animation from Satellite Imaging Corporation.
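Full orthorectification also corrects for camera angle and terrain relief using an elevation model, which is well beyond a short example. The sketch below shows only the end result: once an image has been corrected, each pixel's position can be mapped to a geographic coordinate with a simple affine transform. The six coefficients follow the ordering of GDAL's geotransform convention, but this is plain Python rather than GDAL itself, and the `GeoTransform` and `pixel_to_geo` names and sample values are hypothetical.

```python
from typing import NamedTuple, Tuple


class GeoTransform(NamedTuple):
    """Six affine coefficients, ordered like GDAL's geotransform."""
    origin_x: float      # x (e.g. longitude) of the image's top-left corner
    pixel_width: float   # change in x per column
    row_rotation: float  # change in x per row (0 for a north-up image)
    origin_y: float      # y (e.g. latitude) of the image's top-left corner
    col_rotation: float  # change in y per column (0 for a north-up image)
    pixel_height: float  # change in y per row (negative for north-up: y decreases going down)


def pixel_to_geo(gt: GeoTransform, col: float, row: float) -> Tuple[float, float]:
    """Map a pixel position (col, row) to a geographic coordinate (x, y)."""
    x = gt.origin_x + col * gt.pixel_width + row * gt.row_rotation
    y = gt.origin_y + col * gt.col_rotation + row * gt.pixel_height
    return x, y


# Hypothetical north-up image: top-left corner at 122°W, 45°N, with 0.001-degree pixels.
gt = GeoTransform(-122.0, 0.001, 0.0, 45.0, 0.0, -0.001)
print(pixel_to_geo(gt, 100, 50))  # approximately (-121.9, 44.95)
```

The rotation terms let the same formula handle images that are not aligned north-up; for an already orthorectified, north-up image they are simply zero.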

7. Geospatial Mapping

Geospatial mapping is the process and technology involved in placing information on a map. It is often the final stage of geospatial processing.

Mapping combines data from many sources and layers it onto a map, so conclusions can be drawn about the data. There are different degrees of accuracy required in this process. For some applications, showing data in an approximate relation to each other is sufficient. But other applications, like construction and military exercises, require specialized software and equipment to be as precise as possible.

In an earlier post, I wrote about creating maps with D3. The goal was to build a heat map to display the count of documents for each place name as shown in the image below.

Data Organized on Map
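The earlier post built this map in D3; the Python sketch below is not that code. It only illustrates the counting step behind such a heat map, assuming each document has already been tagged with a single place name. Counts are scaled linearly so the busiest place gets 1.0, which a mapping library could then turn into a color. The `place_intensities` name and sample place names are hypothetical, and looking up coordinates for each place is left out.

```python
from collections import Counter
from typing import Dict, Iterable


def place_intensities(place_names: Iterable[str]) -> Dict[str, float]:
    """Count documents per place name and scale the counts to 0..1 for coloring."""
    counts = Counter(place_names)
    if not counts:
        return {}
    highest = max(counts.values())
    # The busiest place gets 1.0; every other place is shaded relative to it.
    return {place: n / highest for place, n in counts.items()}


# Hypothetical place names extracted from six documents.
docs = ["Paris", "Paris", "Berlin", "Paris", "Rome", "Berlin"]
print(place_intensities(docs))
```

Linear scaling is the simplest choice; when a few places dominate the counts, a logarithmic scale keeps the less frequent places visible.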

Applications

Let's explore some of the applications of all this geospatial technology.

8. Geospatial Marketing

Geospatial marketing is the use of geospatial tools and collected location information to improve marketing to customers. It is often a subset of geospatial mapping that incorporates data about customers' locations. This can help determine where to place a store or how many customers purchase from a particular location. For example, companies can use data about where people typically go after a ballgame to decide where to place advertisements.

Another widespread application of geospatial data in marketing uses the IP addresses collected when customers browse websites and view advertisements. These IP addresses can be geographically located, sometimes as precisely as a person's house, and then used to target advertisements or redesign a website.

9. Location-Aware Applications

Location-aware applications are a category of technologies that are aware of their location and provide feedback based on that location. In fact, if an IP address can be tied to a location, almost any application can be location-aware.

With the advent of smart phones, location-aware applications have become even more common. Of course, your phone’s mapping application can display your location on a map.

There are also smartphone apps that will trigger events or actions on a phone when you cross into a geospatial area. Some examples are Geofencer and PhoneWeaver.
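Apps like Geofencer and PhoneWeaver implement their own logic, which is not shown here. As a general sketch of how a geofence can work, the code below assumes a circular fence defined by a center point and radius, measures distance with the haversine great-circle formula (treating the Earth as a sphere with a mean radius of 6,371 km), and reports an event only when the device crosses the boundary. The `CircularGeofence` class and its method names are hypothetical.

```python
import math
from typing import Optional

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; treats the Earth as a sphere


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to guard against tiny floating-point overshoots above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


class CircularGeofence:
    """Hypothetical circular fence that reports entry and exit transitions."""

    def __init__(self, lat: float, lon: float, radius_m: float) -> None:
        self.lat, self.lon, self.radius_m = lat, lon, radius_m
        self.inside = False  # assume the device starts outside the fence

    def update(self, lat: float, lon: float) -> Optional[str]:
        """Feed a new position; return "enter", "exit", or None if nothing changed."""
        now_inside = haversine_m(self.lat, self.lon, lat, lon) <= self.radius_m
        event = None
        if now_inside and not self.inside:
            event = "enter"
        elif self.inside and not now_inside:
            event = "exit"
        self.inside = now_inside
        return event


# Hypothetical 100 m fence; each call stands in for a new GPS fix from the phone.
fence = CircularGeofence(45.0, -122.0, 100.0)
for lat in (45.01, 45.0005, 45.0006, 45.01):
    print(lat, fence.update(lat, -122.0))
```

Reporting only transitions, rather than "inside" on every fix, is what lets an app trigger an action once when you arrive and once when you leave.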

Additionally, smartphone cameras can record the phone's location when taking a picture. This location is embedded within the picture and can be used by Facebook, Picasa, Photoshop, and other photo software to display the locale on a map. (You may want to disable this feature if you would rather not have people know where you live.)
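In JPEG photos this location usually lives in the EXIF GPS tags, which store latitude and longitude as degrees, minutes, and seconds plus a hemisphere reference (N/S or E/W). Mapping software typically needs decimal degrees, so a conversion step like the sketch below is common. It assumes the three values and the reference letter have already been read from the photo by some EXIF library (not shown); the `dms_to_decimal` name is hypothetical.

```python
from fractions import Fraction
from typing import Union

Number = Union[int, float, Fraction]  # EXIF stores these values as rationals


def dms_to_decimal(degrees: Number, minutes: Number, seconds: Number, ref: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    ref = ref.strip().upper()
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected hemisphere reference: {ref!r}")
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South latitudes and west longitudes are negative in decimal degrees.
    return -value if ref in ("S", "W") else value


# Hypothetical values as an EXIF reader might return them.
print(dms_to_decimal(45, 30, 0, "N"))   # prints 45.5
print(dms_to_decimal(122, 15, 0, "W"))  # prints -122.25
```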

10. Internet of Things

The Internet of Things (IoT) is the category of technology that includes electronic objects that connect to the internet and transmit their location. This is a broad and emerging area of geospatial technology that will add even more location data to the world.

IoT includes objects like cars, fire alarms, energy-saving devices like Nest and Neurio, fitness-tracking bands like the ones from Jawbone or Nike, and more. For these IoT applications and devices to work optimally, they need to know your location and combine it with other information sensed around them.

Nike+ FuelBand (Peter Parkes/flickr.com)

11. Geospatial Virtual Reality

Virtual reality that makes use of geospatial data is another emerging category. This technology will allow for an immersive experience in realistic geospatial models.

Geospatial virtual reality incorporates all of the technologies listed above to put people into the middle of simulated real-world environments. It's already been implemented with new hardware like the Oculus Rift, a virtual reality headset that enables players to step inside their favorite games and virtual worlds.

Oculus Rift (Sebastian Stabinger/commons.wikimedia.org)

Show Me the Data!

At the base of all of this technology is data. Increasingly, we have to invent new ways to store geospatial data so that it can be processed and analyzed. The next step for geospatial technologies is attaching geospatial information to all collected data and then processing and filtering the resulting massive volumes of data, which is known as big data.

This is my list of surprising geospatial technologies that matter today. It started out as a top 10 list, but evolved to 11 because I just couldn’t leave out geospatial virtual reality. It’s so cool! Feel free to add your suggestions of geospatial technologies in the comments below or as a pingback.