
Zeppelin Maps the Easy Way

You want your data on a map in Apache Zeppelin.

Apache Zeppelin lets you query all sorts of databases and big data systems using many languages. The results can then be displayed as graphs and charts, and there is also a way to make a data-driven map.

A map with data markers on it in Zeppelin

There is a very easy way to get a basic map with markers on it into Zeppelin. The first step is to query the database for the data you want to put on the map: create a new note and add a query for your database of choice.

If you are using SAP HANA, there are directions on how to install the JDBC driver in How to Use Zeppelin With SAP HANA.

If you are using another database, just use the %jdbc interpreter and modify your database configuration settings in the interpreter settings section of Zeppelin.

The query you write must return one column for latitude and one column for longitude.
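
Before sending results to the map, it can help to confirm that the two coordinate columns actually hold usable decimal degrees. This is a minimal Python sketch, assuming each row arrives as a dictionary keyed by column name and using the lat/lng column names from the query in this post; the `valid_points` helper and the sample rows are hypothetical.

```python
from typing import Iterable, Mapping


def valid_points(rows: Iterable[Mapping], lat_col: str = "lat", lng_col: str = "lng") -> list:
    """Keep only rows whose latitude and longitude are valid decimal degrees (hypothetical helper)."""
    kept = []
    for row in rows:
        try:
            lat = float(row[lat_col])
            lng = float(row[lng_col])
        except (KeyError, TypeError, ValueError):
            # Missing column, NULL (None) value, or non-numeric text: skip the row.
            continue
        # Latitude must lie in [-90, 90] and longitude in [-180, 180].
        if -90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0:
            kept.append(row)
    return kept


# Hypothetical sample rows, not real query output.
rows = [
    {"country_name": "South Sudan", "lat": "4.85", "lng": "31.6"},
    {"country_name": "South Sudan", "lat": None, "lng": "31.6"},   # NULL latitude
    {"country_name": "South Sudan", "lat": "31.6", "lng": "200"},  # longitude out of range
]
print(len(valid_points(rows)))  # 1
```

Rows that fail the check are dropped rather than raising an error, so one bad record does not stop the rest of the markers from being drawn.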

Query for data with Zeppelin

In my example, I am using the query “select country_name, lat, lng, event_type from event_view where actor_type_id='2' and country_name='South Sudan' limit 10”. Note that my interpreter is Hana, as described in a previous post about Hana and Zeppelin.

Once the data is in the table, the plug-in for the map needs to be enabled. In the upper right corner, click on the login indicator and select Helium.

The Zeppelin Setup Menu

A list of Helium plug-ins will be displayed. Enable the one called zeppelin-leaflet or volume-leaflet. This Helium module for maps uses another JavaScript library called Leaflet.

Helium Modules List

Once it is enabled, go back to the note where you created the map query. A new button will show up; it looks like the earth on a white background. When you press it, a map will appear that needs to be configured.

Configure Map Settings

Press the settings link. As shown above, you will see the fields from the query. Just drag the fields down into the boxes labeled latitude, longitude, tooltip and popup.

Mapping the Columns for the map

Once the latitude and longitude are filled in, the map will appear with markers for the data points. If you put columns into the tooltip and popup boxes, the tooltips and popups will work as well.
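
The settings screen is essentially a mapping from query columns to marker properties. This post does not describe the plug-in's internal data format, so the following is only a hypothetical illustration of the same idea in Python: it turns rows into GeoJSON, a standard format that Leaflet can display. The `to_geojson` helper and the sample row (including its event_type value) are invented for the example. Note that GeoJSON lists longitude before latitude, so swapping the two columns would put markers in the wrong place.

```python
import json


def to_geojson(rows, lat_col, lng_col, tooltip_col=None, popup_col=None):
    """Build a GeoJSON FeatureCollection with one Point feature per row (hypothetical helper)."""
    features = []
    for row in rows:
        properties = {}
        if tooltip_col is not None:
            properties["tooltip"] = str(row[tooltip_col])
        if popup_col is not None:
            properties["popup"] = str(row[popup_col])
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [float(row[lng_col]), float(row[lat_col])],
            },
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical row shaped like the query in this post.
rows = [{"country_name": "South Sudan", "lat": 4.85, "lng": 31.6, "event_type": "Protest"}]
collection = to_geojson(rows, "lat", "lng", tooltip_col="country_name", popup_col="event_type")
print(json.dumps(collection["features"][0]["geometry"]))
# {"type": "Point", "coordinates": [31.6, 4.85]}
```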

The Finished Map

This demonstrates the easiest way to add a map to Zeppelin. More advanced data processing before a map is built requires writing a paragraph of code in Spark Scala, Python or another language supported by Apache Zeppelin.
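
As a taste of that kind of preprocessing: Zeppelin's display system renders paragraph output that begins with `%table` as a table, with tab-separated columns, newline-separated rows, and the first row as the header. The sketch below assumes a Python paragraph (for example `%python`) and assumes the map plug-in accepts this table result the same way it accepts a query result. It counts hypothetical events at each coordinate pair so that each marker can carry a total; the `to_zeppelin_table` helper and the event data are invented for illustration.

```python
from collections import Counter


def to_zeppelin_table(rows, columns):
    """Format rows as Zeppelin %table output: tab-separated columns, newline-separated rows."""
    def clean(value):
        # A tab or newline inside a value would break the table layout.
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(columns)]  # header row
    for row in rows:
        lines.append("\t".join(clean(row[c]) for c in columns))
    return "%table " + "\n".join(lines)


# Hypothetical query results; in a real note these would come from the database.
events = [
    {"lat": 4.85, "lng": 31.6},
    {"lat": 4.85, "lng": 31.6},
    {"lat": 9.53, "lng": 31.66},
]

# Preprocessing step: count events at each coordinate pair.
counts = Counter((e["lat"], e["lng"]) for e in events)
rows = [{"lat": lat, "lng": lng, "events": n} for (lat, lng), n in sorted(counts.items())]

print(to_zeppelin_table(rows, ["lat", "lng", "events"]))
```

The events column can then be dragged into the tooltip or popup box like any other column.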

Future posts will show how to write Scala code to preprocess data and how to draw a box on the map to select the area of the earth to be queried for data points.

Please follow us on our website at https://volumeintegration.com and on Twitter at @volumeint.

Mapping an Epidemic

This map changed the way we see the world and the way we study science, nature, and disease.

In August of 1854, cholera was ravaging the Soho neighborhood of London, where John Snow was a doctor. People were fleeing the area because they thought cholera was spread by gases in the air, or, as they called it, “bad air.”

Just as there is disinformation today about Ebola being airborne, the experts of that time thought most disease was spread in the air. There was no concept that disease might be in the water. They had no idea that bacteria even existed.

John had worked as a doctor in a major outbreak of cholera in a mine. But despite working in close quarters with the miners, he never contracted the disease. He wondered why the air did not affect him.

This inspired him to write a paper on why he believed cholera was spread through water and bodily fluids. The experts at the time did not accept his theory; they continued to believe cholera was caused by the odors emitted by rotting waste.

In the Soho outbreak in August 1854, John Snow saw a chance to further prove his theory. He went door to door keeping a tally of deaths at each home. This was only part of his quest to find evidence to prove the source of the plagues of the day.

He had been collecting statistical information, personal interviews, and other research for many years. He added this information to his paper, “On the Mode of Communication of Cholera.” The paper and his work in researching and collecting evidence founded the science of epidemiology.

One of the most innovative features of his work was plotting the data on a map; it was one of the first published uses of marks on a map to support a scientific conclusion. Each of the bars on John Snow’s map represents one death. Using this visual technique, he could show that the deaths were centered around a point and then investigate and interview people in that area. He could also find anomalies and outliers, such as deaths far from the concentration and areas with no deaths.
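
Snow did this analysis by hand, but the two ideas in that paragraph, finding the point the deaths cluster around and flagging cases far from it, are easy to sketch in code. The example below is purely illustrative: the pump names, grid coordinates, and radius are invented and are not Snow's data, and straight-line distance stands in for real street distances.

```python
import math
from collections import Counter

# Hypothetical positions on a flat grid (arbitrary units), NOT Snow's data.
pumps = {"Pump A": (0.0, 0.0), "Pump B": (10.0, 0.0), "Pump C": (0.0, 10.0)}
deaths = [(0.5, 0.2), (1.0, -0.5), (-0.3, 0.8), (0.2, 1.1), (9.0, 8.0)]


def nearest_pump(point):
    """Return the name of the pump closest to a point (straight-line distance)."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))


# Tally deaths by nearest pump: the pump with the most deaths marks the cluster.
tally = Counter(nearest_pump(p) for p in deaths)
print(tally.most_common(1))  # [('Pump A', 4)]

# Outliers: deaths farther than a chosen radius from their nearest pump.
RADIUS = 3.0  # arbitrary threshold for this sketch
outliers = [p for p in deaths if math.dist(p, pumps[nearest_pump(p)]) > RADIUS]
print(outliers)  # [(9.0, 8.0)]
```

Each outlier is the kind of case Snow followed up with interviews, as the brewery and delivered-water stories show.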

Epicenter Pump and Brewery

Through personal interviews and mapping the data, he found that the workers in the brewery (at the epicenter of the epidemic) were not dying. The owner of the brewery said that the workers were given free beer, and he thought that they never drank water at all. In fact, the brewery had its own deep well, which was used for the beer. In other cases, John Snow found that addresses with few deaths had their own private well.

He also investigated the outlying incidents through interviews: some worked in the area of the pump or walked by it on the way to school. One woman who got sick had the water brought to her by a wagon each day because she liked the taste of that particular well water. One person he talked to even said the water smelled like sewage and did not drink it, but his servant did and came down with a case of cholera.

The incidents highlighted the area around a public pump on Broad Street. Using his data, he convinced the local authorities to have the pump handle removed.

Perhaps the map's most lasting innovation is that it changed the way we use maps. The idea that data could be visualized to prove a fact was very new.

John Snow’s map of the service areas of two water companies

John Snow also produced another map showing which water companies supplied which parts of London. It showed that customers of the company that had moved its intake upstream on the Thames, away from London's sewage, had a lower cholera death rate. The map gave John Snow further evidence that the disease spread through water, and showed what could be done to fix the issue.

Tracking the disease is just as important in today's Ebola outbreak. John Snow’s idea of collecting data in the field and mapping it lives on in maps like those from HealthMap, which show the spread of the Ebola virus.

Data Exploration via Map

Today, we use data driven maps as a powerful tool for all sorts of reasons. But it all started with John Snow.

(For an interesting take on this event and other historical technology that changed the way we live today, watch the “Clean” episode of the How We Got to Now series on PBS.)

To learn more about Volume Labs and Volume Integration, please follow us on Twitter @volumeint and check out our website.