Log Analytics and Infrastructure Metrics

A complete end-to-end solution to collect, store and analyse log data and infrastructure metrics using open-source technologies, designed to comply with SOX, GDPR, PII-protection and other requirements.

Introduction

The exponential growth of log data, the need to react in time, the business need to store and analyse data, and regulatory pressure (SOX, GDPR, PII protection and others) are among the key forces driving the search for a viable solution.

Log analytics is at the heart of capabilities such as:

  • Predictive Security Monitoring
  • Real-Time Detection of Anomalies
  • Personalisation engines for online stores
  • Discovery of data-driven competitive advantages

It’s almost impossible for traditional log analysis software to enable even two of these while covering the whole enterprise at once, at least in a cost-effective way. Given that data volumes and the pressure to shorten response times will only keep increasing, whatever we build must be open, modular and horizontally scalable.

Openness – will ensure we are not tied to any particular vendor’s solution, which is typically supported by a much smaller developer community.

Modularity – will ensure that we can use best-of-breed components to get maximum performance and results from our solution.

Horizontal scalability – will ensure our OPEX does not skyrocket if the solution has to grow 100-fold.
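To illustrate why horizontal scalability keeps costs predictable, here is a minimal Python sketch of storage-driven cluster sizing. The function name nodes_required and all parameter values (30-day retention, two stored copies of every record, 2 TB of usable disk per data node) are hypothetical assumptions, and the model deliberately ignores master nodes, headroom and coordination overhead. Its only point is that when capacity is added by adding nodes, the node count grows linearly with data volume.

```python
import math


def nodes_required(daily_ingest_gb: float,
                   retention_days: int = 30,
                   copies: int = 2,
                   usable_gb_per_node: float = 2000.0) -> int:
    """Hypothetical storage-driven estimate of data nodes in a shared-nothing cluster."""
    # Every ingested GB is stored `copies` times for the whole retention period.
    total_gb = daily_ingest_gb * retention_days * copies
    # Round up: a partially used node is still a whole node.
    return math.ceil(total_gb / usable_gb_per_node)


if __name__ == "__main__":
    print(nodes_required(100), nodes_required(100 * 100))  # prints: 3 300
```

Under these assumptions, 100 GB/day needs 3 data nodes and 10 TB/day needs 300: a 100-fold increase in volume means roughly 100 times as many commodity nodes, which is the linear cost behaviour horizontal scaling aims for.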

Business Requirements

1. The solution must be able to collect log files from various operating systems and applications on source systems and transfer them to centralized storage.

2. The storage component must be able to provide quantitative statistics and search over log files.

3. The BI component must have the capability to perform statistical analysis (count, aggregation, regression, mean, standard deviation, etc.) on the stored data.

4. The solution must be open source, horizontally scalable and enterprise-ready, with professional 24/7 support.

5. The solution must provide open, well-documented APIs for data ingestion and consumption.

6. The solution must enforce role-based access control and must log all security-related activities of both service and user accounts.

Picture1

Solution Summary

We are building the data lake according to the Lambda Architecture reference model, which enables both near-real-time and historical data analytics. The solution will centrally store and manage device logs, user logs and infrastructure metrics from all connected devices and systems.
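The core idea of the Lambda Architecture can be sketched in a few lines of Python: a batch layer periodically recomputes views from the immutable master dataset, a speed layer incrementally covers only the events that arrived after the last batch run, and a serving layer merges both at query time. The event shape (timestamp, host), the per-host count used as the view and the cutoff mechanism are illustrative assumptions; in the stack described here the batch and speed layers would be distributed jobs, not in-process functions.

```python
from collections import Counter

# Hypothetical events are (timestamp_seconds, host) tuples; the "view" is a per-host event count.


def batch_view(master_dataset, cutoff):
    """Batch layer: recompute the view from scratch over all events before `cutoff`."""
    return Counter(host for ts, host in master_dataset if ts < cutoff)


class SpeedLayer:
    """Speed layer: incrementally update a view with events the batch run has not covered."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.counts = Counter()

    def on_event(self, ts, host):
        if ts >= self.cutoff:  # older events are already in the batch view
            self.counts[host] += 1


def query(host, batch, speed):
    """Serving layer: merge the batch view and the real-time view at query time."""
    return batch[host] + speed.counts[host]


if __name__ == "__main__":
    master = [(10, "web-1"), (20, "web-2"), (30, "web-1"), (40, "web-2")]
    cutoff = 25  # the last batch run covered events with timestamps below 25
    batch = batch_view(master, cutoff)
    speed = SpeedLayer(cutoff)
    for ts, host in master:  # every incoming event also flows through the speed layer
        speed.on_event(ts, host)
    print(query("web-1", batch, speed))  # prints 2
```

Splitting on the cutoff keeps each event counted exactly once. When the next batch run moves the cutoff forward, the speed layer's results for the newly covered period can be discarded, so the batch layer remains the source of truth.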

Event logs – provide a comprehensive view of how your systems and their components are performing at any point in time: whether your servers are running fine, or whether there are network failures or other anomalies in your network.
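Before event logs can be searched or aggregated, raw lines have to be parsed into structured fields. The Python sketch below assumes the classic BSD syslog format described in RFC 3164 (a <PRI> value followed by timestamp, host, tag and message); real sources will use many other formats, and the regular expression and field names here are illustrative, not a complete parser.

```python
import re
from typing import Optional

# Illustrative pattern for RFC 3164-style lines, e.g.
# "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog(line: str) -> Optional[dict]:
    """Turn one syslog line into a dict of fields, or None if it does not match."""
    match = SYSLOG_RE.match(line)
    if match is None:
        return None  # the caller decides what to do, e.g. route it to a dead-letter topic
    event = match.groupdict()
    pri = int(event.pop("pri"))
    # PRI encodes facility * 8 + severity.
    event["facility"], event["severity"] = divmod(pri, 8)
    return event


if __name__ == "__main__":
    print(parse_syslog("<13>Feb  5 09:01:02 web-1 sshd[4242]: Accepted publickey"))
```

For the RFC 3164 example line shown in the comment, PRI 34 decodes to facility 4 (security/authorization) and severity 2 (critical).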

User logs – give a detailed understanding of your online users’ behaviour, such as what they did on your website and what they clicked on during the buyer's journey. Analysing raw user logs provides more control, accuracy and transparency into user activity than the statistics offered by standard web analytics services such as Google Analytics or Omniture.
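One example of analysis that raw user logs make possible is rebuilding user sessions yourself instead of relying on an analytics vendor's definition. This Python sketch groups click events per user and starts a new session after a period of inactivity; the 30-minute timeout and the (user_id, timestamp, url) event shape are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Assumption: a session ends after 30 minutes without activity.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionise(clicks):
    """Split (user_id, timestamp, url) click events into per-user sessions."""
    sessions = []
    # groupby only groups adjacent items, so sort by user first, then by time.
    ordered = sorted(clicks, key=lambda click: (click[0], click[1]))
    for user, events in groupby(ordered, key=lambda click: click[0]):
        current, last_seen = [], None
        for _, ts, url in events:
            if last_seen is not None and ts - last_seen > SESSION_TIMEOUT:
                sessions.append((user, current))  # inactivity gap: close the session
                current = []
            current.append(url)
            last_seen = ts
        sessions.append((user, current))  # close the user's last session
    return sessions


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 10, 0)
    clicks = [
        ("u1", start, "/home"),
        ("u1", start + timedelta(minutes=5), "/product/42"),
        ("u1", start + timedelta(hours=2), "/home"),
        ("u2", start + timedelta(minutes=1), "/cart"),
    ]
    for user, pages in sessionise(clicks):
        print(user, pages)
```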

Infrastructure metrics – provide an understanding of the state of your systems, which is essential for ensuring the reliability and stability of your applications and services.
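A simple way to turn infrastructure metrics into early warnings, in the spirit of the real-time anomaly detection listed in the introduction, is a rolling z-score: flag a sample that lies more than a chosen number of standard deviations from the mean of recent samples. The Python sketch below shows this one basic technique; it is not a prescribed part of this solution, and the window size and threshold of 3 are assumptions to tune per metric.

```python
import statistics
from collections import deque


class ZScoreDetector:
    """Flags samples far from the mean of a sliding window of recent samples."""

    def __init__(self, window: int = 30, threshold: float = 3.0):
        self.samples = deque(maxlen=window)  # the oldest samples fall out automatically
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous, then add it to the window."""
        is_anomaly = False
        if len(self.samples) >= 2:  # a sample standard deviation needs two points
            mean = statistics.fmean(self.samples)
            stdev = statistics.stdev(self.samples)
            if stdev > 0 and abs(value - mean) / stdev > self.threshold:
                is_anomaly = True
        self.samples.append(value)
        return is_anomaly


if __name__ == "__main__":
    cpu_percent = [50, 52, 49, 51, 50, 48, 52, 50, 51, 49, 95]  # made-up CPU samples
    detector = ZScoreDetector(window=10)
    print([detector.observe(v) for v in cpu_percent])
```

Anomalous samples are still added to the window here, so the detector gradually adapts to a genuine level shift; excluding flagged samples is the alternative design choice.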

Horizontal scalability is one of the core principles behind the choice of technology stack and architecture. I’ve decided to use the following open-source technology stack, which is ready for large-scale deployments and comes with enterprise-level support:

  • Hortonworks Data Platform (HDP)
  • Apache Kafka (as part of Hortonworks HDF)
  • Elastic Stack
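Whichever shipper moves events from Kafka into Elasticsearch (Logstash or a Kafka connector, for instance), writes are most efficient in batches through the Elasticsearch _bulk API, which takes newline-delimited JSON: an action line followed by the document, with a mandatory trailing newline. The Python sketch below only builds such a body; the daily index naming scheme (logs-YYYY.MM.DD) and the event fields are hypothetical choices.

```python
import json
from datetime import datetime


def index_name(ts: datetime, prefix: str = "logs") -> str:
    """Hypothetical daily index naming, e.g. logs-2024.01.31."""
    return f"{prefix}-{ts:%Y.%m.%d}"


def bulk_body(events) -> str:
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for event in events:
        ts = datetime.fromisoformat(event["@timestamp"])
        lines.append(json.dumps({"index": {"_index": index_name(ts)}}))  # action line
        lines.append(json.dumps(event))  # document line
    return "\n".join(lines) + "\n"  # the body must end with a newline


if __name__ == "__main__":
    sample = [{"@timestamp": "2024-01-31T23:59:00+00:00", "host": "web-1", "message": "disk usage 91%"}]
    print(bulk_body(sample), end="")
```

The resulting string would be sent as a POST to the /_bulk endpoint with the Content-Type application/x-ndjson. Daily indices are a common pattern because expiring old data then means deleting whole indices rather than individual documents.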


Picture2

Configuration files

//to be added

Reports

//to be added

Capacity Planning

//to be added

References

//to be added