DC Crime in Zeppelin Hana

HANA & Zeppelin Map DC Crime

In working on a small demonstration application with a map to showcase Volume Analytics and SAP HANA we needed to import the DC Crime data into the HANA database and visualize it in Apache Zeppelin in a leaflet map.

The data is hosted on a dc.gov site with an odd interface. You need to specify that you want the entire district and then pick a time bounded slice of data. Once it finds everything there is a small link that says “Download crime data”. Then you pick the fields you want and press the get data button.

Get DC Crime Data

Data Format

The data it delivers is in a comma delimited format as seen below.

REPORT_DAT,SHIFT,OFFENSE,METHOD,BLOCK,DISTRICT,PSA,WARD,ANC,NEIGHBORHOOD_CLUSTER,BLOCK_GROUP,CENSUS_TRACT,VOTING_PRECINCT,CCN,XBLOCK,YBLOCK,START_DATE,END_DATE
11/16/2018 4:25:34 PM,EVENING,THEFT/OTHER,OTHERS,5100 - 5199 BLOCK OF WISCONSIN AVENUE NW,2,202,3,3E,Cluster 11,001001 1,001001,Precinct 31,18194950,392758,143179,11/16/2018 3:19:45 PM,11/16/2018 4:25:42 PM
11/16/2018 9:46:50 PM,EVENING,THEFT F/AUTO,OTHERS,1400 - 1499 BLOCK OF PARK ROAD NW,3,302,1,1A,Cluster 2,002802 2,002802,Precinct 39,18194960,397068,140306,11/16/2018 3:40:02 PM,11/16/2018 4:30:14 PM

Create Table

I used the HANA studio’s import feature to import the data into a table. I defined the coordinates as DOUBLE columns.

CREATE COLUMN TABLE "CYBERFLOW"."IP_LOCATION" ("IP" VARCHAR(65),
"COUNTRY_CODE"VARCHAR(10),
"COUNTRY" VARCHAR(150),
"REGION_CODE"VARCHAR(10),
"REGION_NAME"VARCHAR(300),
"CITY" VARCHAR(300),
"ZIP" VARCHAR(50),
"LAT" DOUBLE CS_DOUBLE,
"LON" DOUBLE CS_DOUBLE,
"ISP" VARCHAR(150),
"ORG" VARCHAR(150),
"AAS" VARCHAR(150),
"SOURCE" VARCHAR(500),
"LOOKUP_DATE" DATE CS_DAYDATE) UNLOAD PRIORITY 5 AUTO MERGE

The main issue is the spatial reference system being used. It is encoded in NAD 83 Maryland. This page at spatialreference.org describes it.

The problem is that most web map tools do not support this reference system. Most web maps work best with 4326 / WGS 84. NAD 83 Maryland is a flat earth model that works best for the local Washington DC and Maryland area.

Using HANA to Convert

This tutorial helped me figure this out, which explains how to transform from one reference system to another. HANA comes with four spacial systems defined. Of course NAD 83 Maryland is not one of them. But it is not hard to add a new one. This document describes the command needed to add a reference system to HANA.

The first step is to get the well known text for NAD 83 / Maryland. It is found at spatialreference.org. And we need the coordinate range for X and Y. I pulled it from this page but I now believe that is incorrect but my dots on the map of DC seem to be correct.

This is what I ended up with. This HANA command will create the reference system that is close to what I need. I recommend consulting a geospatial expert to confirm accuracy. I made it planar because this is a flat earth model. Taking the code from the SAP HANA tutorial I modified the coordinates in the transform definition at the bottom based on the well known text.

CREATE SPATIAL REFERENCE SYSTEM "NAD 83 / Maryland" 
IDENTIFIED BY 1000026985 
LINEAR UNIT OF MEASURE "meter" 
TYPE PLANAR 
COORDINATE X BETWEEN 36655.85913986 AND 181219.09721397
COORDINATE Y BETWEEN 230987.04872254 AND 574026.61735490
DEFINITION 'PROJCS["NAD83 / Maryland",
    GEOGCS["NAD83",
        DATUM["North_American_Datum_1983",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            AUTHORITY["EPSG","6269"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.01745329251994328,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4269"]],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    PROJECTION["Lambert_Conformal_Conic_2SP"],
    PARAMETER["standard_parallel_1",39.45],
    PARAMETER["standard_parallel_2",38.3],
    PARAMETER["latitude_of_origin",37.66666666666666],
    PARAMETER["central_meridian",-77],
    PARAMETER["false_easting",400000],
    PARAMETER["false_northing",0],
    AUTHORITY["EPSG","26985"],
    AXIS["X",EAST],
    AXIS["Y",NORTH]]'
    TRANSFORM DEFINITION '+proj=lcc
 +lat_1=39.45 +lat_2=38.3 
 +lat_0=37.66666666666666 +lon_0=-77 +x_0=400000
 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs';

Create HANA Table

Now we need a table that we can move the data into. I created a table with both points for both coordinate systems.

CREATE COLUMN TABLE "CYBERFLOW"."DC_CRIME_GEO" 
("REPORT_DAT" NVARCHAR(22), 
"SHIFT" NVARCHAR(20), 
"OFFENSE" NVARCHAR(100), 
"METHOD" NVARCHAR(100), 
"BLOCK" NVARCHAR(200), 
"DISTRICT" SMALLINT CS_INT, 
"PSA" SMALLINT CS_INT, 
"WARD" TINYINT CS_INT, 
"ANC" NVARCHAR(10), 
"NEIGHBORHOOD_CLUSTER" NVARCHAR(20), 
"BLOCK_GROUP"NVARCHAR(20), 
"CENSUS_TRACT"NVARCHAR(20), 
"VOTING_PRECINCT"NVARCHAR(15), 
"CCN" INTEGER CS_INT, 
"GEO_POINT" ST_POINT(1000026985) CS_POINT, 
"GEO_POINT_4326" ST_POINT(4326) CS_POINT, 
"START_DATE"NVARCHAR(22), 
"END_DATE" NVARCHAR(22)) 
UNLOAD PRIORITY 5 AUTO MERGE;

Move and Convert Data

Then I could use a select with insert statement to convert the latitude and longitude doubles into HANA ST_POINTS. By using the ST_TRANSFORM function it is possible to convert from one coordinate system to another. But you must have the coordinate system defined in HANA.

insert into "CYBERFLOW"."DC_CRIME_GEO" select "REPORT_DAT","SHIFT","OFFENSE","METHOD","BLOCK","DISTRICT","PSA","WARD","ANC","NEIGHBORHOOD_CLUSTER","BLOCK_GROUP", "CENSUS_TRACT","VOTING_PRECINCT","CCN", NEW ST_POINT( XBLOCK,YBLOCK).ST_SRID(1000026985).ST_TRANSFORM( 1000026985), NEW ST_POINT( XBLOCK,YBLOCK).ST_SRID(1000026985).ST_TRANSFORM( 4326), "START_DATE","END_DATE" from "CYBERFLOW"."DC_CRIME"

View In Zeppelin on a Map

Now that the data is in HANA in the proper coordinate system we can view it in Zeppelin with the Volume map helium plug-in.

DC Crime in Zeppelin & HANA


0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *