A Volume Analytics Flow for Finding Social Media Bots

Volume Analytics Chaos Control

Volume Analytics is a software tool used to build, deploy and manage data processing applications.

Volume Analytics is a scalable data management platform that allows the rapid ingest, transformation, and loading of high volumes of data into multiple analytic models, as defined by your requirements or your existing data models.

Volume Analytics is a platform for streaming large volumes of varied data at high velocity.

Volume Analytics enables rapid software development and operational maintainability while scaling to high data volumes. It can be used for all of your data mining, fusion, extraction, transformation and loading needs. Volume Analytics has been used to mine and analyze social media feeds, monitor and alert on insider threats, and automate the search for cyber threats. In addition, it is being used to consolidate data from many data sources (databases, HDFS, file systems, data lakes) and produce multiple data models for multiple data analytics visualization tools. It could also be used to consolidate sensor data from IoT devices or monitor a SCADA industrial control network.

Volume Analytics makes it possible to quickly develop highly redundant software that is both scalable and maintainable. In the end, you save money on labor for the development and maintenance of systems built with Volume Analytics.

In other words, Volume Analytics provides the plumbing of a data processing system. The application you are building has distinct units of work that need to be done. We might compare it to a water treatment plant. Dirty water enters the system through a pipe and arrives at a large contaminant filter. The filter is a work task and the pipe is a topic. Together they make a flow.

After the first filter another pipe carries the water minus the dirt to another water purification worker. In the water plant there is a dashboard for the managers to monitor the system to see if they need to fix something or add more pipes and cleaning tasks to the system.

Volume Analytics provides the pipes, a platform to run the worker tasks and a management tool to control the flow of data through the system.
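To make the analogy concrete, here is a minimal sketch of pipes (topics) and filters (tasks) in plain Python. It is not the Volume Analytics API. The class names Topic and Task, the run_flow helper and the water strings are all hypothetical and exist only to illustrate the idea of a flow.

```python
from collections import deque
from typing import Callable, List, Optional


class Topic:
    """A pipe: an in-memory FIFO queue of messages (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._messages: deque = deque()

    def publish(self, message: str) -> None:
        self._messages.append(message)

    def consume(self) -> Optional[str]:
        # Return the oldest message, or None when the pipe is empty.
        return self._messages.popleft() if self._messages else None

    def drain(self) -> List[str]:
        # Remove and return everything currently waiting on the topic.
        items = list(self._messages)
        self._messages.clear()
        return items


class Task:
    """A filter: reads from one topic, does a unit of work, writes to another."""

    def __init__(self, name: str, work: Callable[[str], str],
                 source: Topic, sink: Topic) -> None:
        self.name = name
        self.work = work
        self.source = source
        self.sink = sink

    def run_once(self) -> bool:
        message = self.source.consume()
        if message is None:
            return False  # nothing waiting in the pipe
        self.sink.publish(self.work(message))
        return True


def run_flow(tasks: List[Task]) -> None:
    """Keep running the tasks until none of them has work left."""
    progressed = True
    while progressed:
        progressed = False
        for task in tasks:
            if task.run_once():
                progressed = True


# Hypothetical water plant: two filters connected by three pipes.
raw = Topic("raw-water")
filtered = Topic("filtered-water")
clean = Topic("clean-water")
flow = [
    Task("contaminant-filter", lambda m: m.replace("+dirt", ""), raw, filtered),
    Task("purifier", lambda m: m.replace("+germs", ""), filtered, clean),
]
raw.publish("water+dirt+germs")
raw.publish("water+dirt")
run_flow(flow)
print(clean.drain())  # ['water', 'water']
```

In this sketch, adding a third cleaning step means adding one more Task and one more Topic, with no change to the existing filters.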

A Volume Analytics Flow for Finding Social Media Bots

In addition, Volume Analytics has redundancy for disaster recovery, high availability and parallel processing. This is where our analogy fails. Data is duplicated across multiple topics. The failure of a particular topic (pipe) does not destroy any data because it is preserved on another topic. Topics are optimally set up in multiple data centers to maintain high availability.

In Volume Analytics, the water filters in the analogy are called tasks. Tasks are groups of code that perform some unit of work. Your specific application will have its own tasks. The tasks are deployed on more than one server in more than one data center.

Benefits

Faster start up time saves money and time.

Volume Analytics allows a faster start up time for a new application or system being built. The team does not need to build the platform that moves the data to tasks. They do not need to build a monitoring system as those features are included. However, Volume Analytics will integrate with your current monitoring systems.

System is down less often

The DevOps team gets visibility into the system out of the box. They do not have to stand up a log search system, which saves time. They can see what is going on and fix it quickly.

Plan for Growth

As your data grows and the system needs to process more of it, Volume Analytics grows too. Add server instances to increase the processing power. As work grows, Volume Analytics allocates work to the new instances. No re-coding is needed. You save time and money because developers are not needed to re-implement the code to work at a larger scale.
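How Volume Analytics allocates work internally is not described here. As a rough illustration only, the sketch below assumes the work on a topic is split into numbered partitions that are dealt out round-robin to whatever instances are running. The function name assign_partitions and the instance names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(partition_count: int,
                      instances: List[str]) -> Dict[str, List[int]]:
    """Deal partitions out to instances in round-robin order.

    Assumption: work is divided into numbered partitions. This is an
    illustrative model, not the product's actual allocation scheme.
    """
    if not instances:
        raise ValueError("at least one instance is required")
    assignment: Dict[str, List[int]] = {name: [] for name in instances}
    for partition in range(partition_count):
        owner = instances[partition % len(instances)]
        assignment[owner].append(partition)
    return assignment


# Two instances share six partitions.
print(assign_partitions(6, ["worker-1", "worker-2"]))
# {'worker-1': [0, 2, 4], 'worker-2': [1, 3, 5]}

# Adding a third instance spreads the same work without code changes.
print(assign_partitions(6, ["worker-1", "worker-2", "worker-3"]))
# {'worker-1': [0, 3], 'worker-2': [1, 4], 'worker-3': [2, 5]}
```

The task code is untouched in both cases. Only the list of instances changes, which is the point of the paragraph above.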

Less Disruptive Deployments

Construct your application in a way that allows new features to be deployed with less impact on the features already in production. New code libraries and modules can be deployed to the platform and allowed to interact with the already running parts of the system without an outage. A built-in code library repository is included.

In addition, currently running flows can be terminated while the data waits on the topics for the newly programmed flow to be started.

This Flow processes files to find IP addresses, searches multiple APIs for matches and inserts data into a HANA database

A threat-search data processing flow in production. Each of the boxes is a task that performs a unit of work. The task puts the processed data on the topic represented by the star. Then the next task picks up the data and does another part of the job. The combination of a set of tasks and topics is a flow.

Geolocate IP Flow

An additional flow to geolocate IP addresses, added while the first flow is running.

Combined Flows

The combination of flows working together. The topic ip4-topic is an integration point.
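One way to picture an integration point is a topic that more than one flow reads from, each at its own pace. The sketch below assumes a topic behaves like an append-only log with a separate read position per flow. The source does not state this. The class LogTopic and the flow names are hypothetical, and only the topic name ip4-topic comes from the diagram.

```python
from collections import defaultdict
from typing import DefaultDict, List


class LogTopic:
    """Append-only log with an independent read position per flow.

    Assumption for illustration only: this is not the product's API.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: List[str] = []
        # flow name -> index of the next record that flow will read
        self.positions: DefaultDict[str, int] = defaultdict(int)

    def publish(self, record: str) -> None:
        self.records.append(record)

    def poll(self, flow: str) -> List[str]:
        """Return the records this flow has not yet seen."""
        start = self.positions[flow]
        batch = self.records[start:]
        self.positions[flow] = len(self.records)
        return batch


ip4 = LogTopic("ip4-topic")
ip4.publish("203.0.113.7")

# The first flow is already running and reads the first address.
print(ip4.poll("threat-search-flow"))  # ['203.0.113.7']

ip4.publish("198.51.100.23")

# A geolocate flow added later still sees everything on the topic.
print(ip4.poll("geolocate-flow"))      # ['203.0.113.7', '198.51.100.23']

# The first flow is undisturbed and only receives what is new to it.
print(ip4.poll("threat-search-flow"))  # ['198.51.100.23']
```

Under this assumption, a stopped flow simply stops advancing its position, and the data waits on the topic until a flow reads it again.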

Modular

Volume Analytics is modular and tasks are reusable. You can reconfigure your data processing pipeline without introducing new code. You can use tasks in more than one application.

Highly Available

Out of the box, Volume Analytics is highly available due to its built-in redundancy. Work tasks and topics (pipes) run in triplicate. As long as your compute instances are in multiple data centers, you will have redundancy built in. Volume Analytics knows how to balance the data between duplicates and avoid data loss if one or more work tasks fail. This extends to queuing up work if all work tasks fail.
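The idea of running in triplicate can be sketched as writing every record to three copies and reading from any copy that is still up. This is a simplified model under stated assumptions, not the product's replication protocol. The classes Replica and ReplicatedTopic are hypothetical, and catching up a copy that comes back online is deliberately left out.

```python
from typing import List


class Replica:
    """One copy of a topic, for example in one data center (illustrative)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.records: List[str] = []


class ReplicatedTopic:
    """Writes go to every live replica; reads come from the first live one."""

    def __init__(self, name: str, replica_count: int = 3) -> None:
        self.replicas = [Replica(f"{name}-replica-{i}")
                         for i in range(replica_count)]

    def publish(self, record: str) -> int:
        written = 0
        for replica in self.replicas:
            if replica.alive:
                replica.records.append(record)
                written += 1
        if written == 0:
            raise RuntimeError("no replica available to accept the record")
        return written  # number of copies that now hold the record

    def read_all(self) -> List[str]:
        for replica in self.replicas:
            if replica.alive:
                return list(replica.records)
        raise RuntimeError("no replica available to read from")


topic = ReplicatedTopic("ip4-topic")
print(topic.publish("203.0.113.7"))    # 3

# Simulate losing one copy. The record is still held by the other two.
topic.replicas[0].alive = False
print(topic.publish("198.51.100.23"))  # 2
print(topic.read_all())                # ['203.0.113.7', '198.51.100.23']
```

Spreading the three copies across data centers is what turns this duplication into protection against a site outage.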

Integration

Volume Analytics integrates with other products. It can retrieve and save data to other systems like topics, queues, databases, file systems and data stores. In addition these integrations happen over encrypted channels.

In our sample application, CyberFlow, there are many tasks that integrate with other systems. The read bucket task reads files from an AWS S3 bucket, the ThreatCrowd task makes an API call to https://www.threatcrowd.org and the Honeypot task calls https://www.projecthoneypot.org. Then the insert tasks integrate with the SAP HANA database used in this example.

Volume Analytics integrates with your enterprise authentication and authorization systems like LDAP, Active Directory, CAP and more.

Data Management

Volume Analytics ingests datasets from throughout the enterprise, tracking each delivery and routing it through the platform to extract the greatest benefit. It shares common capabilities such as text extraction, sentiment analysis, categorization, and indexing. A series of services makes those datasets discoverable and available to authorized users and other downstream systems.

Data Analytics

In addition to the management console, Volume Analytics comes with a notebook application. This allows a data scientist or analyst to discover and convert data into information on reports. After your data is processed by Volume Analytics and put into a database, the Notebook can be used to visualize the data. The data is sliced and diced and displayed on graphs, charts and maps.

Volume Analytics Notebook

Flow Control Panel

The flow control panel allows for control and basic monitoring of flows. Flows are groupings of tasks and topics working together. You can stop, start and terminate flows. From this screen you can launch additional flow virtual machines when there is a heavy load of data processing work. The panel also gives access to start up extra worker tasks as needed. There is also a link that will allow you to analyze the logs in Kibana.

Topic Control Panel

The topic control panel allows for the control and monitoring of topics. You can monitor and delete topics from here.

Consumer Monitor Panel

The consumer monitor panel allows for the monitoring of consumer tasks. Consumer tasks are the tasks that read from a topic. They may also write to a topic. This screen will allow you to monitor that the messages are being processed and determine if there is a lag in the processing.
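Lag is usually the gap between what has been written to a topic and what a consumer has finished processing. The sketch below assumes messages are numbered by offset, which the source does not confirm. The function names, topic names and numbers are hypothetical.

```python
from typing import Dict, List


def consumer_lag(latest_offsets: Dict[str, int],
                 processed_offsets: Dict[str, int]) -> Dict[str, int]:
    """Messages written to each topic that the consumer has not processed yet.

    Assumption: every message on a topic has an increasing offset number.
    """
    return {
        topic: max(latest - processed_offsets.get(topic, 0), 0)
        for topic, latest in latest_offsets.items()
    }


def lagging_topics(lag: Dict[str, int], threshold: int) -> List[str]:
    """Topics whose backlog is larger than the alert threshold."""
    return sorted(topic for topic, backlog in lag.items() if backlog > threshold)


latest = {"ip4-topic": 1500, "reports-topic": 320}
processed = {"ip4-topic": 1100, "reports-topic": 320}

lag = consumer_lag(latest, processed)
print(lag)                       # {'ip4-topic': 400, 'reports-topic': 0}
print(lagging_topics(lag, 100))  # ['ip4-topic']
```

A backlog that keeps growing suggests the consumer task needs more instances, while a steady zero means it is keeping up.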

Volume Analytics is used by our customers to process data from many data streams and data sources quickly and reliably. In addition, it has enabled the production of prototype systems that scale up into enterprise systems without rebuilding and re-coding the entire system.

And now this tour of Volume Analytics leads into a video demonstration of how it all works together.

Demonstration Video

This video will further describe the features of Volume Analytics using an example application which parses IP addresses out of incident reports and searches other systems for indications of those IP addresses. The data is saved into an SAP HANA database.
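As a rough idea of what the parsing step might involve, the sketch below pulls IPv4 addresses out of free text using only the Python standard library. It is not the code used in the demonstration. The function name extract_ipv4 and the sample report text are hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress
import re
from typing import List

# Four dot-separated groups of 1 to 3 digits that are not part of a longer
# dotted number. Range checking is left to the ipaddress module.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")


def extract_ipv4(text: str) -> List[str]:
    """Return the distinct valid IPv4 addresses in text, in order of appearance."""
    found: List[str] = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            address = str(ipaddress.IPv4Address(candidate))
        except ValueError:
            continue  # looks like an address but is not one, e.g. 999.1.1.1
        if address not in found:
            found.append(address)
    return found


report = (
    "Login attempts from 203.0.113.7 and 198.51.100.23; "
    "bogus value 999.1.1.1, repeat from 203.0.113.7."
)
print(extract_ipv4(report))  # ['203.0.113.7', '198.51.100.23']
```

In a flow, a function like this would sit inside one task, with each address it finds published to the next topic for the lookup tasks to pick up.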

Request a Demo Today

Volume Analytics is scalable, fast, maintainable and repeatable. Contact us to request a free demo and experience the power and efficiency of Volume Analytics today.
