
Volume Analytics Chaos Control


Volume Analytics is a software tool used to build, deploy and manage data processing applications.

Volume Analytics is a scalable data management platform that allows the rapid ingest, transformation, and loading of high volumes of data into multiple analytic models as defined by your requirements or your existing data models.

Volume Analytics is a platform for streaming large volumes of varied data at high velocity.

Volume Analytics is a tool that enables both rapid software development and operational maintainability, with scalability for high data volumes. Volume Analytics can be used for all of your data mining, fusion, extraction, transform, and loading needs. It has been used to mine and analyze social media feeds, monitor and alert on insider threats, and automate the search for cyber threats. It is also being used to consolidate data from many data sources (databases, HDFS, file systems, data lakes) and to produce multiple data models for multiple data analytics visualization tools. It could also be used to consolidate sensor data from IoT devices or to monitor a SCADA industrial control network.

Volume Analytics makes it easy to quickly develop highly redundant software that is both scalable and maintainable. In the end, you save money on labor for the development and maintenance of systems built with Volume Analytics.

In other words, Volume Analytics provides the plumbing of a data processing system. The application you are building has distinct units of work that need to be done. We might compare it to a water treatment plant: dirty water comes into the system through a pipe and arrives at a large contaminant filter. The filter is a work task and the pipe is a topic. Together they make a flow.

After the first filter, another pipe carries the water, minus the dirt, to another water purification worker. In the water plant there is a dashboard the managers use to monitor the system and see whether they need to fix something or add more pipes and cleaning tasks.

Volume Analytics provides the pipes, a platform to run the worker tasks and a management tool to control the flow of data through the system.

A Volume Analytics Flow for Finding Social Media Bots


In addition, Volume Analytics provides redundancy for disaster recovery, high availability, and parallel processing. This is where our analogy fails: data is duplicated across multiple topics, so the failure of a particular topic (pipe) does not destroy any data, because it is preserved on another topic. Topics are optimally set up in multiple data centers to maintain high availability.

In Volume Analytics, the water filters of the analogy are called tasks. Tasks are groups of code that perform some unit of work. Your specific application will have its own tasks, and those tasks are deployed on more than one server in more than one data center.

Benefits

Faster start-up time saves money and time.

Volume Analytics allows a faster start-up time for a new application or system being built. The team does not need to build the platform that moves the data to tasks, and they do not need to build a monitoring system, because those features are included. Volume Analytics will also integrate with your current monitoring systems.

System is down less often

The DevOps team gets visibility into the system out of the box. They do not have to stand up a log search system, which saves time. They can see what is going on and fix it quickly.

Plan for Growth

As your data grows and the system needs to process more of it, Volume Analytics grows too. Add server instances to increase the processing power; as work grows, Volume Analytics allocates work to the new instances. There is no re-coding needed, which saves time and money because developers are not needed to re-implement the code to work at a larger scale.

Less Disruptive Deployments

Construct your application in a way that allows new features to be deployed with a lower impact on features already in production. New code libraries and modules can be deployed to the platform and allowed to interact with the already running parts of the system without an outage. A built-in code library repository is included.

In addition, currently running flows can be terminated while the data waits on the topics for the newly programmed flow to be started.

This Flow processes files to find IP addresses, searches multiple APIs for matches and inserts data into a HANA database


A threat-search data processing flow in production. Each of the boxes is a task that performs a unit of work. The task puts the processed data on the topic represented by the star; then the next task picks up the data and does another part of the job. The combination of a set of tasks and topics is a flow.
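The Volume Analytics task API itself is not shown in this post, but the following minimal sketch illustrates the task-and-topic idea with a hypothetical IP-extraction task, assuming Kafka-style topics; the topic names, broker address, and task structure are illustrative only.

import re

from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

# Hypothetical IP-extraction task. "incident-report-topic", "ip4-topic", and
# "broker:9092" are placeholders, not the actual Volume Analytics API.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

consumer = KafkaConsumer("incident-report-topic", bootstrap_servers="broker:9092")
producer = KafkaProducer(bootstrap_servers="broker:9092")

for message in consumer:
    text = message.value.decode("utf-8", errors="replace")
    # Publish each candidate IPv4 address to the next topic, where the
    # enrichment tasks pick it up and do the next part of the job.
    for ip in set(IP_PATTERN.findall(text)):
        producer.send("ip4-topic", ip.encode("utf-8"))

Because a task like this only reads from one topic and writes to another, additional copies of it can be started on other servers without changing the code.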

Geolocate IP Flow


An additional flow to geolocate IP addresses, added while the first flow is running.

Combined Flows


The combination of flows working together. The topic ip4-topic is an integration point.

Modular

Volume Analytics is modular and tasks are reusable. You can reconfigure your data processing pipeline without introducing new code. You can use tasks in more than one application.

Highly Available

Out of the box, Volume Analytics is highly available due to its built-in redundancy. Work tasks and topics (pipes) run in triplicate. As long as your compute instances are in multiple data centers, you have redundancy built in. Volume Analytics knows how to balance the data between duplicates and avoid data loss if one or more work tasks fail; this extends to queuing up work if all work tasks fail.

Integration

Volume Analytics integrates with other products. It can retrieve and save data to other systems like topics, queues, databases, file systems and data stores. In addition these integrations happen over encrypted channels.

In our sample application, CyberFlow, there are many tasks that integrate with other systems. The read-bucket task reads files from an AWS S3 bucket, the ThreatCrowd task calls the API at https://www.threatcrowd.org, and the Honeypot task calls https://www.projecthoneypot.org. The insert tasks then write to the SAP HANA database used in this example.
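As a rough illustration of what one of those enrichment calls looks like, here is a minimal Python sketch of an IP lookup against ThreatCrowd's public v2 API; the endpoint path and response fields should be checked against the current ThreatCrowd documentation before relying on them.

import requests

def threatcrowd_ip_report(ip):
    """Fetch ThreatCrowd's JSON report for an IP address, or None on error."""
    resp = requests.get(
        "https://www.threatcrowd.org/searchApi/v2/ip/report/",
        params={"ip": ip},
        timeout=10,
    )
    return resp.json() if resp.ok else None

report = threatcrowd_ip_report("8.8.8.8")
if report:
    # Fields such as "votes" and "resolutions" appear in the v2 report format.
    print(report.get("votes"), len(report.get("resolutions", [])))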

Volume Analytics integrates with your enterprise authentication and authorization systems like LDAP, Active Directory, CAP, and more.

Data Management

Volume Analytics ingests datasets from throughout the enterprise, tracking each delivery and routing it through the platform to extract the greatest benefit. It shares common capabilities such as text extraction, sentiment analysis, categorization, and indexing. A series of services makes those datasets discoverable and available to authorized users and other downstream systems.

Data Analytics

In addition to the management console, Volume Analytics comes with a notebook application. This allows a data scientist or analyst to discover data and convert it into information on reports. After your data is processed by Volume Analytics and put into a database, the notebook can be used to visualize the data: it is sliced and diced and displayed on graphs, charts, and maps.

Volume Analytics Notebook

Flow Control Panel


The Flow Control Panel allows for control and basic monitoring of flows. Flows are groupings of tasks and topics working together. You can stop, start, and terminate flows, and launch additional flow virtual machines from this screen when there is a heavy load of data processing work. The panel also gives access to start up extra worker tasks as needed, and there is a link that lets you analyze the logs in Kibana.

Topic Control Panel


The Topic Control Panel allows for the control and monitoring of topics. Monitor and delete topics from here.

Consumer Monitor Panel


The Consumer Monitor Panel allows for the monitoring of consumer tasks. Consumer tasks are the tasks that read from a topic; they may also write to a topic. This screen lets you confirm that messages are being processed and determine whether there is a lag in the processing.

Volume Analytics is used by our customers to process data from many data streams and data sources quickly and reliably. In addition, it has enabled the production of prototype systems that scale up into enterprise systems without rebuilding and re-coding the entire system.

This tour of Volume Analytics leads into a video demonstration of how it all works together.

Demonstration Video

This video further describes the features of Volume Analytics using an example application that parses IP addresses out of incident reports and searches other systems for indications of those IP addresses. The data is saved into a SAP HANA database.

Request a Demo Today

Volume Analytics is scalable, fast, maintainable and repeatable. Contact us to request a free demo and experience the power and efficiency of Volume Analytics today.

Contact


How Do You Host Website on Amazon AWS?

At Volume Labs we have been working to convert our site from WordPress to a static site. In doing this, we determined that Hexo was the best tool for us. When considering where to deploy the new site, we instantly thought of AWS because it can host static pages right out of S3. We have deployed Volume Labs and Volume Integration to AWS, and in this post I will show you how.

First, create an S3 bucket. I named ours using the name of our website. AWS S3 buckets are a place to store files on AWS, and each bucket name must be unique across all users, so our domain name works well as the bucket name. S3 is more cost effective than using an AWS server instance.

S3 Bucket Button aws

 

Create the Bucket

S3 is redundant as the data you store there is spread across at least three data centers. You pay for the amount of storage used and the bandwidth used to get it in and out of S3.

When you create the S3 bucket, give it the following properties by clicking the Properties button. This configures it to act like a web host and serve up the web pages.

S3 Bucket Properties

Note your website address for the bucket. You will need this later.
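If you prefer to script these steps, a minimal boto3 sketch of the bucket creation and static website configuration might look like this; the bucket name, region note, and error document are placeholders.

import boto3

s3 = boto3.client("s3")

# Create the bucket named after the site. Outside us-east-1, also pass
# CreateBucketConfiguration={"LocationConstraint": "<your-region>"}.
s3.create_bucket(Bucket="yoursite.com")

# Equivalent of the Static Website Hosting properties in the console.
s3.put_bucket_website(
    Bucket="yoursite.com",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

The website endpoint AWS assigns (for example, yoursite.com.s3-website-us-east-1.amazonaws.com) is the address you will paste into CloudFront later.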

Then configure the policy document to allow everyone on the internet read access to your files.

S3 Bucket Policy

S3 Bucket Policy

{ "Version": "2012-10-17", "Id": "Policy1477706476623", "Statement": [ { "Sid": "Stmt14777064735", "Effect": "Allow", "Principal": "*", "Action": "s3:GetObject", "Resource":"arn:aws:s3:::yoursite.com/*" } ] }

Upload your files to the S3 bucket, either with the Upload button or with the S3 API.
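Scripted with boto3, applying a read-only policy like the one above and uploading the generated site could look roughly like the sketch below; the bucket name, the Sid, and the local public/ directory (Hexo's default output folder) are placeholders.

import json
import mimetypes
import os

import boto3

s3 = boto3.client("s3")
bucket = "yoursite.com"   # placeholder: your bucket / domain name

# Apply a public-read policy equivalent to the one shown above.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::" + bucket + "/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

# Walk the generated site and upload each file with a sensible Content-Type
# so browsers render the pages correctly.
for root, _dirs, files in os.walk("public"):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, "public").replace(os.sep, "/")
        content_type = mimetypes.guess_type(path)[0] or "binary/octet-stream"
        s3.upload_file(path, bucket, key, ExtraArgs={"ContentType": content_type})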

Copy your website address that AWS gave you for the S3 bucket and test it out in a browser.

Now configure a certificate with Certificate Manager. Make sure you configure the certificate before you change the DNS settings at the provider of your domain name. When you request a certificate, AWS will send you an e-mail to authorize it using the contacts in your DNS entry. It will also try webmaster, hostmaster, administrator, postmaster, and admin@your-domain.com.

Certificate Button

Request a certificate

Enter all of the domains that your site should respond to; use your main domain name and the www subdomain at least. When you finalize your request, AWS will send you an e-mail to validate your certificate. Make sure, as the owner of your domain name, that your domain registration is set up to send you e-mail. If the MX record is set up correctly with your domain name provider, the e-mail will reach you.
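The same request can be made with boto3, as in the sketch below. CloudFront only uses certificates from the us-east-1 region; the e-mail validation shown here matches this post, though DNS validation is also available. The domain names are placeholders.

import boto3

acm = boto3.client("acm", region_name="us-east-1")  # CloudFront reads certificates from us-east-1

response = acm.request_certificate(
    DomainName="yoursite.com",
    SubjectAlternativeNames=["www.yoursite.com"],
    ValidationMethod="EMAIL",   # AWS e-mails the domain contacts for approval
)
certificate_arn = response["CertificateArn"]
print(certificate_arn)          # you will need this ARN when configuring CloudFront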

Then we set up CloudFront. CloudFront distributes content across the world: it caches your content on many servers all over the globe, which puts the content closer to the people viewing it.

In CloudFront you configure the connection to the S3 bucket and to the certificate. It is also possible to have it serve the files with compression, which further improves the speed at which your website is delivered to the browser.

CloudFront Button

Press the Create Distribution button. Pick the web delivery method on the next screen.

CloudFront Delivery Method

Set the origin settings. This is where you tell CloudFront to read the files from your S3 bucket. Paste the URL for your S3 bucket that you saved earlier into the Origin Domain Name field.

CloudFront Origin Settings

Now set up the cache behavior. Since we are setting up an SSL/TLS certificate, turn on the Redirect HTTP to HTTPS setting. Also turn on the Compress Objects setting to improve the speed of downloading your pages. Keep the other settings as they are. You can reduce the Time To Live (TTL) if your pages change more often.

CloudFront Cache Behavior

For worldwide coverage, set the distribution behavior to use all edge locations; this pushes your pages out to servers all over the world. Enter the alternate domain names your site will use, and then set the custom SSL certificate to the one we created with AWS Certificate Manager.

CloudFront Distribution Behavior

In addition, turning off IPv6 makes it easier to deploy using Route 53, so I turned off IPv6 in the distribution behavior section.

Press the Create Distribution button to finish the work here in CloudFront.
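For completeness, here is a hedged boto3 sketch of the same distribution settings (website-endpoint origin, HTTP-to-HTTPS redirect, compression, all edge locations, IPv6 off, custom certificate). Field names follow the current CloudFront API; the origin domain, aliases, and certificate ARN are placeholders and may need adjusting for your account.

import time

import boto3

cloudfront = boto3.client("cloudfront")

origin_id = "S3-Website-yoursite.com"
certificate_arn = "arn:aws:acm:us-east-1:...:certificate/..."  # placeholder: ARN from the ACM step

cloudfront.create_distribution(DistributionConfig={
    "CallerReference": str(time.time()),   # any unique string per request
    "Comment": "Static site served from the S3 website endpoint",
    "Enabled": True,
    "IsIPV6Enabled": False,                # simplifies the Route 53 setup below
    "Aliases": {"Quantity": 2,
                "Items": ["yoursite.com", "www.yoursite.com"]},
    "Origins": {"Quantity": 1, "Items": [{
        "Id": origin_id,
        "DomainName": "yoursite.com.s3-website-us-east-1.amazonaws.com",
        # The S3 website endpoint only speaks HTTP; CloudFront terminates TLS.
        "CustomOriginConfig": {"HTTPPort": 80, "HTTPSPort": 443,
                               "OriginProtocolPolicy": "http-only"},
    }]},
    "DefaultCacheBehavior": {
        "TargetOriginId": origin_id,
        "ViewerProtocolPolicy": "redirect-to-https",  # redirect HTTP to HTTPS
        "Compress": True,                             # the compress objects setting
        "ForwardedValues": {"QueryString": False,
                            "Cookies": {"Forward": "none"}},
        "MinTTL": 0,
    },
    "PriceClass": "PriceClass_All",        # use all edge locations
    "ViewerCertificate": {
        "ACMCertificateArn": certificate_arn,
        "SSLSupportMethod": "sni-only",
        "MinimumProtocolVersion": "TLSv1.2_2021",
    },
})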

After CloudFront finishes distributing your site, use the URL it generates to view the site. Save this URL in order to configure Route 53. (Route 53 is called that because the default port for DNS is port 53.)

Now you need to configure Route 53. Route 53 is a DNS service on steroids with none of the side effects. This is the final step in getting your personal domain name to serve up the content.

Route 53 allows aliases that route requests for your root domain and the www subdomain to the CloudFront distribution. It will also direct HTTP traffic to HTTPS.

Go to Route 53 in the AWS Console. Press the Create Hosted Zone button.

Route 53 Hosted Zone Button

Enter the top-level domain name of your site and press Create. You will see that some settings are created.

Route 53 Name Servers

Take the name server (NS) settings generated by Route 53 and enter them in the DNS settings at your domain name provider. This allows AWS Route 53 to act as your domain name service and gives you all the nice features in Route 53.

The next step is to set up an alias that will guide requests to your pages sitting in CloudFront which is getting them from S3.

We need two alias routes set as record sets on this screen, so press the Create Record Set button.

Route 53 Alias Record

Click the Yes radio button for Alias and enter the CloudFront URL where your site is hosted. Leave the Name field blank. Then do it again: create another record set with the name www, set Alias to Yes, and enter the URL for the CloudFront distribution again. This creates the route for the www subdomain of your site.
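Scripted, the two alias records might be created like this with boto3; the hosted zone ID and CloudFront domain name are placeholders, while Z2FDTNDATAQYW2 is the fixed hosted zone ID AWS documents for CloudFront alias targets.

import boto3

route53 = boto3.client("route53")

hosted_zone_id = "ZXXXXXXXXXXXXX"                   # placeholder: your Route 53 hosted zone
cloudfront_domain = "dxxxxxxxxxxxx.cloudfront.net"  # placeholder: from the distribution

changes = [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
        "Name": name,
        "Type": "A",
        "AliasTarget": {
            "HostedZoneId": "Z2FDTNDATAQYW2",       # CloudFront's alias hosted zone ID
            "DNSName": cloudfront_domain,
            "EvaluateTargetHealth": False,
        },
    },
} for name in ("yoursite.com", "www.yoursite.com")]

route53.change_resource_record_sets(
    HostedZoneId=hosted_zone_id,
    ChangeBatch={"Changes": changes},
)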

Now wait. It takes 1 to 48 hours for the worldwide network of DNS servers to pick up the changes you made at your domain name provider.

As another option you can purchase or move your domain name to AWS.

To set up e-mail forwarding, I use the free e-mail forwarding service ImproveMX. Just register your domain with ImproveMX and create a record set for an MX record in Route 53 with the mail server settings.

Now enjoy your site. You should see increased performance over a run-of-the-mill web hosting provider, and your costs might be lower.

Please follow us at VolumeInt, and check us out at Volume Integration.

Cloud Management Demands an Organizational Shake-Up


(flickr.com/George Thomas)

The cloud is here. Most organizations now have contracts that allow for the construction of applications in a cloud environment. The cloud has promised lower costs, greater efficiency, and greater security. But these cost savings cannot be realized without simplifying the organizational structure.

The Rules Have Changed

Every new technology creates new processes. When the personal computer came around, it forced the established information technology department to change or die. The departments that would not change had their work supplanted by other departments that found lower cost and more efficient ways to work. Why pay your big IT department for time on a mainframe, when you can just go buy a PC that does it cheaper?

The same scenario is happening again with the cloud. Cloud computing provides the ability to only pay for the processing and storage you need on demand. In addition, it does not require staff to install and configure servers since all cloud services provide a web interface to instantly use a new server.

We do not manage client/server networks in the same way that we managed mainframes. So depending on old processes and labor categories will greatly hamper the cloud and ultimately make it as inefficient as the old mainframe.

Let's use companies that build internet applications as a guide. It's possible for a small department to create an innovation so powerful that it can supersede much larger organizational groups. This can occur when those small departments use the cloud, which allows them to purchase only the computing power they need. The first group that figures out how to bypass the old processes, policies, and rules is able to build something so important that policies, rules, and organizational structures are redefined.

I have fallen into the trap of rigidly following old policies myself. There is a great new technology called Hadoop, which is a way to process big data over many servers by dividing it into small pieces. But Hadoop requires code and data to be distributed on many computers. So I rejected Hadoop because of an organizational policy against automatic remote code execution. But it turns out that there were other divisions building tools with Hadoop and proving that the technology could change the way data analysis happens. After they showed the power of Hadoop, the policy changed.

For revolutionary technology, there is a way to mitigate risk and modify policies. The adoption of Hadoop has revolutionized the way big data is processed in my organization and made many small groups the most powerful and efficient in meeting new mission sets.

Change the Roles

Right now, many teams are divided into these groups: users (analysts, statisticians, etc.), decision makers, requirements, system administrators, database administrators, systems engineers, programmers, testers, project managers, and security engineers. With the new cloud systems, where almost anyone in this list can learn how to start up a new virtual machine and install software on it, why are all these roles needed?

I have found that it works best to employ one or more technical generalists who know system administration, databases, programming, systems engineering, testing, and requirements. This technical generalist can get a new application running very easily. After sitting with the actual users, technical generalists can collect ideas, build, and show progress quickly.

The cloud is part of what makes this possible. People who build the solution that users need can instantly start up new servers and compute clusters on demand. If the application is successful, they can instantly scale up without having to wait on another department to start up the servers.

Teamwork?

DevOps

Consider combining development with operations and maintenance. The cloud forces this issue anyway, and you gain great efficiencies.

In the cloud, developers and testers are concerned about real production issues. The cloud makes it easier to deploy new software because the developers think of it up front and build code to automatically deploy the software. Developers need to learn how to do system administration in order to write better code, and system administrators need to learn how to code to deploy applications better.

If a developer or analyst can make the decision to implement a new service on a new virtual machine, why should he wait on someone else to click a few buttons on a web interface to start up a new instance? In the worst cases, there is a long list of departments and boards that need to approve the action.

It’s even better if you are able to find a motivated user who can actually build or prototype the application. They know exactly what is needed, and if they have the right technical team supporting them, they will find a way to get it done. I have seen analysts and mathematicians with access to computational power manipulate it on demand as the mission shifted. They were able to gain the technical title that allowed them to be analyst, system administrator, and programmer all in one; they changed the way things work.

Automate

Instead of hiring more people, make sure the people you have are writing code and scripts to automate the work.

  • Testers should be using code to test applications repetitively.
  • System administrators can write code to deploy patches and code.
  • Programmers can write code to test and deploy systems and set up monitoring tools to watch everything.
  • Analysts and mathematicians can write code to filter and sift the data.

Everyone can rely on others in the group to come up with solutions together. But if the key players are in different departments, they will not be able to work together effectively. In my experience, it’s better to have a small team together than a large, disparate one.

With cloud computing, it is possible to automate the scaling up and down of servers. The right code makes it possible to have servers deploy themselves automatically as load goes up or when there are server failures. Work toward building systems that can utilize this feature of the cloud.
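As one concrete illustration of that idea on AWS, a target-tracking scaling policy attached to an Auto Scaling group adds and removes servers automatically as load changes. This boto3 sketch assumes an existing group whose name and target value are placeholders.

import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-app-asg",      # placeholder: an existing Auto Scaling group
    PolicyName="keep-cpu-near-50-percent",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,                # add instances above ~50% CPU, remove below
    },
)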

Accomplish the Mission

The key is to get everyone closer to the actual mission – solving the problem with software. Too often, work is tightly controlled by functional divisions. In these cases, the system administrator is able to keep the server running but has no power or responsibility to keep the network, database, or application up. But a running server with no network isn’t very useful to the mission.

The use of the cloud puts more power than ever before in the hands of people with technical skills. Anyone with an internet connection can write an application and deploy it on a cloud server for almost nothing. But within large organizations, we get stymied by processes and labor categories. We lack the access to develop and deploy new technology without impediments.

My recommendation is to find ways to collapse job roles and allow technical generalists to gain direct access to the resources they need. The cloud will only live up to its promise if we can control it directly.

What is your experience in deployment of solutions to the cloud? Does your bureaucracy get in the way?

To learn more about Volume Labs and Volume Integration, please follow us on Twitter @volumeint and check out our website.