Data Science is no different from any fundamental science: it requires dedication, collaboration with the proper institutions, and time for research. I distinguish two meanings of the term: Data Science as a science, and Data Science as an empowerment of Big Data Engineering and Big Data Development, using scientific methods to achieve better business results. It is hardly possible to find a Big Data Developer or Engineer who has no idea about Data Science or who cannot develop classification, regression or other well-known scientific methods.
My personal involvement in Data Science has been aimed at empowering (or complementing) Data Engineering to achieve business objectives.
Master’s degree with Honors in Applied Mathematics
With a Master’s degree with Honors in Applied Mathematics, it is natural for me to define a problem, divide it into smaller chunks and work with the information to resolve it. I have forgotten most of the course material, but the skills are there to refresh it quickly and dive into a problem. I have fully covered courses relevant to Data Science:
Programming was among the core subjects I studied in depth during my Bachelor’s degree. I took quite a few courses on fundamental Computer Science: solving problems algorithmically (I am not sure of the exact course name), Operating Systems and Computer Hardware, Compilers, Assembler programming, etc.
In my early days I used C, Pascal and Java for development, but my most relevant recent skills are:
- SQL, PL/SQL with 10+ years
- R with 5+ years
- Linux / Shell scripting with 10+ years
At some point in my life I learnt that even the most comprehensive, fact-heavy report can be useless for the business. Humans have certain cognitive capabilities for comprehending information visually, and these must be learnt and respected for a report to achieve its objective: give insights, answer questions or maybe generate new questions.
I’ll keep this short and just list the topics I can cover. I have done a few descriptive data analyses for a retail-sector client, but in general it is a tedious process and every case is different. These are the descriptive statistics topics I can handle:
- Relationship analysis (for numerical, nominal, ordinal data)
- Similarity and association metrics
- Homogeneity analysis
- Linear Regression
- Outlier detection
- Missing data processing (imputation techniques)
- Cross validation
- Decision Trees
- Clustering Analysis: hierarchical methods
- Clustering Analysis: k-means
Time Series is the area I have spent the most time on. I tried plenty of different methods to get better retail sales forecasting results, read many articles, papers and books on the subject, and finally arrived at the model I was looking for. Details are provided in one of my blog posts dedicated to dynamic pricing. I was able to reach 98% accuracy on weekly store sales and a little less than that on weekly category sales.
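For readers wondering how a forecast accuracy figure like this can be computed: one common convention (an assumption here, since the exact metric is not stated above) is 100% minus the mean absolute percentage error (MAPE). A minimal sketch in Python, with hypothetical function names and made-up numbers that are not the actual retail data:

```python
def mape(actual, forecast):
    """Mean absolute percentage error, as a fraction (0.05 means 5%)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    # MAPE divides by the actual value, so zero actuals are rejected explicitly.
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def accuracy_pct(actual, forecast):
    """Forecast accuracy as 100% minus MAPE (an assumed definition)."""
    return 100.0 * (1.0 - mape(actual, forecast))


# Made-up weekly store sales and forecasts, for illustration only.
weekly_sales = [100.0, 200.0, 400.0, 500.0]
weekly_forecast = [95.0, 210.0, 380.0, 525.0]

print(f"accuracy: {accuracy_pct(weekly_sales, weekly_forecast):.1f}%")
```

Every forecast in this toy series is off by exactly 5%, so the reported accuracy is 95%. Other definitions (for example, weighting by sales volume) would give different figures for the same forecasts.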
Here is the list of topics I can cover for Time Series data:
- Imputation Methods
- Bootstrapping techniques
- Logistic regression
- Ridge regression
- Panel transition regression
- Clustering Analysis (any other one besides k-means)/Fuzzy Clustering analysis
- Classification/Discriminant analysis
- Anomaly detection
- Multivariate space reduction/Factor analysis
- Random Forests
- Gradient Boosted Decision Trees
- Conditional Random Forests
- Naive Bayes
- Bayesian Networks
- Markov Process