Visualizing HANA Graph with Zeppelin

The SAP HANA database has the capability to store information in a graph. A graph is a data structure with vertex or nodes and edges. Graph structures are powerful because you can perform some types of analysis more quickly like nearest neighbor or shortest path calculations. It also enables faceted searching. Zeppelin has recently added support for displaying network graphs.

Simple Network Graph

Zeppelin 8 has support for network graphs. You need to download the Zeppelin 8 Snapshot and build it to get these features. The code to make this graph is:

This is described in the documentation. Note that the part of the json that holds the attributes for the node or edge is in an inner json object called “data”. This is how each node and edge can have different data depending on what type of node or edge it is.

"data": {"fullName":"Andrea Santurbano"}}

Because I happen to be learning the features of SAP HANA I wanted to display a graph from HANA in using Zeppelin. A previous post shows how to connect Zeppelin to HANA.

I am using a series of paragraphs to build the data and then visualize the graph. I have already built my graph workspace in Hana with help from Creating Graph Database Objects and Hana Graph Reference. The difficult part is transforming the data and relationships so they fit into a vertex / node table and an edge table.

I am using sample data from meetup.com which contains events organized by groups and attended by members and held at venues.

It is important to figure out which attributes should exist on the edge and which ones should be in the nodes. HANA is good at allowing sparse data in the attributes because of the way it stores data in a columnar form. If you wish to display the data on a map and in a graph using the same structure supplied by Zeppelin it is important to put your geo coordinates in the nodes and not in the edges.

First we connect to the database and load the node and edge tables with Spark / Scala and build the data frames. One issue here was converting the HANA ST_POINT data type into latitude and longitude values. I defined a select statement with the HANA functions of ST_X() and ST_Y() to perform the conversions before the data is put into the dataframe.

You will have problems and errors if your tables in HANA have null values. Databases don’t mind null values. Scala seems to hate null. So you have to convert any columns that could have null values to something that makes sense. In this case I converted varchar to empty strings and double to 0.0

Then I query the data frames to get the data needed for the visualization and transform it into json strings for the collections of nodes and edges. In the end this note outputs two json arrays. One is the nodes and the other is the edges.


Now we will visualize the data using the new Zeppelin directive called %network.

In my example data extracted from meetup.com. I have four types of nodes: venue, member, group and event. These are defined as labels. My edges or relationships I have defined as: held, sponsored and rsvp. These become the lines on the graph. Zeppelin combines the data from the edges and nodes into a single table view.

So in tabular format it will look like this:

Zeppelin Edges

 

Zeppelin Nodes

When you press the network button in Zeppelin the graph diagram appears.

Zeppelin Network Graph Diagram

Under settings you can specify what data displays on the screen. It does not allow for specifying the edge label displays and does not seem to support a weight option.

Network Graph Settings

If you select the map option and have the leaflet visualization loaded you can show the data on a map. Since I put the coordinates in the edges it will map the edge data. It would be better if I moved the coordinates into each node so the nodes could be displayed on the map.

Zeppelin Graph On a Map

This will help you get past some of the issues I had with getting the Zeppelin network diagram feature to work with HANA and perhaps with other databases.

In the future I hope to show how to call HANA features such as nearest neighbor, shortest path and pattern  matching algorithms to enhance the graph capabilities in Zeppelin.

Please follow us on our website at https://volumeintegration.com and on twitter at volumeint

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *