Data Engineering

An engineering way of thinking is something I was born with. From the very beginning of my conscious life, every aspect or interest of mine was meant to be explainable by the rules of science. Even human psychology became a topic I studied with engineering techniques: classification, case studies, and ultimate goals.

I started my engineering journey in high school, fixing clocks in my free time to earn some extra money. I went on to study Applied Mathematics and Computer Science, learning computing from the very bottom with Assembler language.

Below is a summary of my current engineering expertise. What I could not fit into it is my passion for new technology and my strong desire to apply it to real-life cases to make a difference.

Hortonworks

Hortonworks is the leading contributor to and provider of Apache Hadoop big data architecture for the enterprise. Built on Apache Hadoop and powered by YARN, the Hortonworks Data Platform (HDP) enables organizations to deploy big data applications quickly and effectively with its real-time data processing, security, and data governance capabilities.

I’m a big fan of Hortonworks (HWX) and have been working with its platforms in almost all of my Big Data assignments for more than four years. HWX has an outstanding team that provides support at every level of a project, starting from capacity planning and initial impact assessment. Apart from that, HWX is a major contributor to next-generation open-source projects such as Apache Metron.

Elastic Stack

Elastic (formerly Elasticsearch) is a leading provider of open source enterprise solutions for search and big data log analytics, helping customers worldwide make data-driven, informed decisions based on real-time, smart data. Elastic stands behind its widely leveraged open source Elastic Stack (Elasticsearch, Logstash, Kibana, and Beats), as well as other products like Shield (Security), Watcher (Alerting), Marvel (Monitoring), and Graph (collectively known as X-Pack).

I love Elastic for its simplicity and its complete data-to-dashboard architecture. I have been working on BI and reporting requests for almost my entire professional career. A solid tool with underlying storage and fast search is a great asset for the open-source community, and it is central to the solutions I describe in my blogs.

Hortonworks Cybersecurity Platform (powered by Apache Metron)

Hortonworks Cybersecurity Platform (HCP) is powered by Apache Metron and other open-source big data technologies. At the prime intersection of Big Data and Machine Learning, HCP employs a data-science-based approach to visualize diverse, streaming security data at scale to aid Security Operations Centers (SOC) in real-time detection and response to threats.

Apache Metron provides a scalable, advanced security analytics framework built by the Hadoop community, evolving from the Cisco OpenSOC project. It is a cybersecurity application framework that gives organizations the ability to detect cyber anomalies and to respond rapidly to the anomalies it identifies.

Core features of HCP include:

  • Real-time ingestion and enrichment of security data sources at millions of events per second.
  • Real-time behavior profiling at scale.
  • Petabyte-scale storage platform allows larger training sets and detailed forensic replay when a cyber threat is detected.
  • Rapid productionization of machine learning, allowing data scientists to work in real-time and monitor environments faster.
  • “Single view of risk” user interfaces make SOC analysts more productive, and dashboard and notebook interfaces make data scientists more effective.

Amazon Web Services

In 2017, Amazon Web Services (AWS) comprised more than 90 services spanning a wide range of areas, including computing, storage, networking, database, analytics, application services, deployment, management, mobile, developer tools, and tools for the Internet of Things. The most popular include Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3). As of 2017, AWS held a dominant 34% share of the cloud market (IaaS, PaaS), while its next three competitors, Microsoft, Google, and IBM, held 11%, 8%, and 6% respectively, according to Synergy Research Group.

I have been building storage and computing platforms in AWS with private networks and a security architecture. Organizations commonly extend their existing infrastructure to the cloud to balance cost against the speed of rolling out new infrastructure for developers and experimental projects.

Apache Hadoop

Unstructured content is fundamentally different from structured data and must be prepared appropriately for use in big data analysis applications. Search Technologies has unrivalled experience with unstructured content processing. The technologies and processes involved are common to enterprise search and business insight applications. The keys to creating insight from unstructured content are: derived metadata provenance, process transparency, an agile environment, and complexity control.

Apache Kafka

Kafka started as an internal infrastructure system built at LinkedIn. It is a high-speed messaging broker that is very good at handling complex event processing and can potentially handle over a million events per second. Today, Kafka is used by tens of thousands of organizations, including over a third of the Fortune 500 companies. It is among the fastest-growing open-source projects and has spawned an immense ecosystem. It is at the heart of a movement toward managing and processing streams of data.

I have experience with a Kafka use case handling 1 TB of data daily, which could potentially grow to 100 TB daily. Horizontal scalability and in-memory processing make Kafka a really good choice for my projects. I use Kafka extensively in my reference architecture blogs.
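
To put the 1 TB and 100 TB daily volumes in perspective, the sketch below converts a daily volume into an average ingest rate. It is a back-of-the-envelope calculation, not Kafka code, and it assumes decimal units (1 TB = 10^6 MB) and a perfectly even load over 24 hours; the function name is illustrative only. Real traffic has peaks well above the average, so the result is a lower bound when sizing brokers, partitions, and network capacity.

```python
# Back-of-the-envelope sizing sketch (assumes decimal units and an even load).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def avg_throughput_mb_per_s(tb_per_day: float) -> float:
    """Average ingest rate in MB/s for a daily volume in TB (1 TB = 10**6 MB)."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


if __name__ == "__main__":
    # Compare today's volume with the projected growth.
    for volume in (1, 100):
        rate = avg_throughput_mb_per_s(volume)
        print(f"{volume:>3} TB/day ~ {rate:,.1f} MB/s on average")
```

Under these assumptions, 1 TB/day averages about 11.6 MB/s, while 100 TB/day averages about 1.16 GB/s, which is the kind of sustained rate where Kafka's horizontal scaling across partitions and brokers matters.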

Apache NiFi

Apache NiFi is a software project from the Apache Software Foundation designed to automate the flow of data between software systems. It is based on the "NiagaraFiles" software previously developed by the NSA, which is also the source of a part of its present name – NiFi. It was open-sourced as a part of NSA's technology transfer program in 2014.

The software design is based on the flow-based programming model and offers features that prominently include the ability to operate within clusters, security using TLS encryption, extensibility (users can write their own software to extend its abilities), and improved usability features such as a portal that can be used to view and modify behaviour visually. Software development and commercial support are currently offered by Hortonworks, which acquired NiFi's originator, Onyara Inc.

My experience with NiFi started in 2018, when I built my first complex NiFi flow in just a few days without writing any code. Since then, I have included NiFi for ETL and flow processing in most of my projects.

Relational Databases: Oracle Database 11g, PostgreSQL, MySQL, MSSQL

Relational databases are at the heart of any MIS, BI, or ERP system, so I learned database fundamentals early in my professional career and have been working with Oracle Database 11g, PostgreSQL, MySQL, and MSSQL ever since. Despite their classical nature, all of these technologies have evolved to support extensive use cases with enterprise-level support.

I have experience designing and building both transactional and analytical databases. An example of a transactional database is an Order Management System; an example of an analytical database is an Enterprise Data Warehouse (EDW).

Using Oracle Database 11g, I architected an EDW for the largest regional card-processing company, serving over 7 million customers.

Oracle BI Enterprise Edition

Oracle BI EE is Oracle Corporation's set of business intelligence tools, consisting of the former Siebel Systems and Hyperion Solutions business intelligence offerings. It is one of the top BI systems on the market, with extensive connectivity to various data sources, including Hadoop.

I have extensive experience with Oracle BI in the financial and retail sectors as a solution architect, dashboard designer, and developer from 2010 to 2015.

Others

I have worked with quite a few other technologies during my professional career, most of them distinctive in approach and implementation. One of them is Lotus Notes, which was very popular in the 2000s for its document-based storage and portal-type approach. Brilliant collaborative solutions were built on top of it.

On the engineering side, I always try to avoid solutions that tie you closely to a particular software vendor, and I prefer open-source solutions. However, 10 to 15 years ago the open-source stack was too weak to reach business goals without experienced specialists, who were scarce at that time.