Dynamic pricing is a strategy in which retailers adjust product prices based on a broader concept of price elasticity that includes product attributes, seasonality, competitive positioning, store formats, and geography. Price is one of the most important factors in a customer's purchase decision and the only factor that directly affects retail store profits.
The goal of dynamic pricing optimization is to keep optimal prices in place for every store and every SKU, at all times and across all channels. Knowing product elasticity across all stores and all products throughout the year opens new possibilities for setting better retail prices based on a consumer demand model and, ultimately, for increasing retailer profitability.
- The system must dynamically calculate optimal prices for each SKU at every store for a period of 1 to 4 weeks. Optimal prices are constrained at least by: organizational strategy, product and category seasonality, product brand trends, competition, and store location and format.
- All optimization and forecast formulas must be based on scientific methods with proven results and practical implementations.
- The system must dynamically calculate (and recalculate by schedule or request) point Price Elasticity of Demand (PED) for 80% of the assortment by sales value. PED must be either SKU/Store or SKU/Cluster based.
- The system must calculate weekly store level sales forecasts for 52 weeks. It must recalculate the forecast based on ongoing sales every week.
- The system must calculate weekly category level sales forecasts for 52 weeks. It must recalculate the forecast based on ongoing sales every week.
- A full recalculation should finish within 3 hours. The system must allow horizontal scaling for future growth.
- The system must be able to run on low-range servers, Linux, and open-source software components.
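The point-elasticity requirement above can be illustrated with a minimal sketch. The log-difference form below approximates point PED between two observed price/quantity pairs; the numbers are hypothetical, not from the actual system:

```python
import math

def point_elasticity(p0, q0, p1, q1):
    """Approximate point Price Elasticity of Demand (PED) from two
    observed (price, quantity) pairs using log differences:
    PED ~= d(ln Q) / d(ln P)."""
    return (math.log(q1) - math.log(q0)) / (math.log(p1) - math.log(p0))

# Hypothetical SKU/Store observation: price raised 10%, unit sales drop 15%
ped = point_elasticity(p0=100.0, q0=200.0, p1=110.0, q1=170.0)
print(round(ped, 2))  # |PED| > 1: demand for this SKU is elastic
```

In practice the elasticity would be estimated per SKU/Store (or SKU/Cluster) over many observations, but the sign and magnitude read the same way: values below -1 indicate elastic demand.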
Approach and Practical Results
Our goal is to increase retailer profitability. For that we'll use four enablers: Technology, Data, Methodology, and timely Decisions. Each one lays the groundwork for the others.
The Technology layer is a classic Big Data stack that I'm using across all my solutions. With it I can scale out easily, ingest new kinds of data (NoSQL), and introduce advanced solutions such as:
- multichannel price optimization,
- personalization engine (e-commerce),
- customer micro-segmentation,
- cross-channel marketing campaigns, and others.
Our solution receives three types of inputs:
- Store Sales Data, in the form of individual tickets, with an initial load of 36 months of historical data;
- Competitor Data, in the form of regular scans of competitor prices for a selected number of SKUs; and
- Strategy and Business Rules, in the form of configuration sheets.
Speed Layer is a classical ETL process handler implemented on top of a NiFi server. NiFi allows quick and efficient visual design with over 270 ready-made processors. Another advantage of NiFi is its readiness to handle near-real-time data.
As a future growth option I would like to add a Kafka messaging system and a Spark Streaming engine for scenarios such as real-time personalization in e-commerce or real-time decision making.
Batch Layer consists of three core components: Hadoop (distributed storage), R (statistical engine), and Elasticsearch (search), integrated with Kibana, the BI and Reporting engine. Logstash and ES-Hadoop are included for integration purposes.
Serving Layer consists of Kibana, a graphical UI component with easy-to-customize dashboards, and Hive with HiveQL, which can be used to integrate my solution with classical ERP and retail POS management systems.
The solution must continuously monitor consumer purchase behaviour (elasticity of demand) and recommend changes in retail prices to achieve higher profitability. The solution also produces several supplementary outputs:
- store and category level sales forecasts for 52 weeks
- seasonal decomposition of category and store sales
- product point elasticity of demand
The ultimate goal is to find the optimal price (where profit is maximized) for a given SKU, store, and time frame.
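For intuition about where profit maximizes, consider the textbook constant-elasticity demand model Q(p) = A·p^e with elasticity e < -1 and unit cost c: profit (p - c)·Q(p) peaks at the classic markup price p* = c·e/(1 + e). The sketch below verifies this numerically; A, c, and e are illustrative values, not parameters from the actual solution:

```python
def profit(p, c=10.0, A=1000.0, e=-2.0):
    """Profit (p - c) * Q(p) under constant-elasticity demand Q(p) = A * p**e."""
    return (p - c) * A * p**e

# Closed-form optimum for e < -1: p* = c * e / (1 + e)
c, e = 10.0, -2.0
p_star = c * e / (1 + e)  # 20.0 for these values

# A grid search over candidate prices confirms the analytic optimum
grid = [p / 100 for p in range(1100, 3001)]  # prices 11.00 .. 30.00
best = max(grid, key=profit)
print(p_star, best)
```

Real demand curves are not constant-elasticity, which is why the solution estimates elasticity per SKU/store from data rather than assuming a fixed e; but the markup rule shows why knowing elasticity is the key to price optimization.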
We achieve our goal by decomposing the sales data into three components that are easier to forecast: Store Traffic (52 weeks), Category Sales Conversion (52 weeks), and SKU Sales (4-8 weeks).
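The three components multiply back together: a weekly SKU forecast is traffic × category conversion × the SKU's share of category sales. A toy recomposition, with all numbers hypothetical:

```python
# Weekly SKU sales forecast recomposed from the three forecast layers:
#   store traffic -> category conversion -> SKU share of category sales
traffic_forecast = 12_000      # forecasted store visitors for the week
category_conversion = 0.08     # probability a visitor buys in the category
sku_share = 0.15               # SKU's share of category unit sales

sku_units = traffic_forecast * category_conversion * sku_share
print(round(sku_units, 1))
```

Each layer is easier to forecast than raw SKU sales: traffic and category conversion have long, stable histories, while the volatile SKU-level component is reduced to a share within its category.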
Store Traffic is forecasted directly: stlf(d.ts, h=h, s.window=3, method='ets', ic='bic', opt.crit='mae'), where d.ts is a time series of store tickets (visitors).
Category sales are forecasted as a conversion probability from traffic: stlf(d.ts, h=h, s.window=3, method='ets', ic='bic', opt.crit='mae'), where d.ts is a time series of category quantity sales divided by traffic.
SKU sales probability is calculated as a probability of sale given category sales, and ultimately given traffic: lm(log(prd_cnvq) ~ fk_org + log(gr_ave_price) + log(gr_promo+1) + promo + log(prd_price), data=ds), where prd_cnvq is the ratio of SKU sales to category sales.
Using a linear model at the SKU level helps with new items with short history, irregular sales, and out-of-stock (OOS) cases.
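In a log-log model like the lm() call above, the coefficient on log(prd_price) is directly the point price elasticity. A simplified, single-regressor sketch of that idea (closed-form OLS slope; the data are synthetic with a known true elasticity of -1.5, not real sales data):

```python
import math

def loglog_elasticity(prices, quantities):
    """Fit log(Q) = a + b*log(P) by ordinary least squares; the slope b
    is the point price elasticity of demand. A stripped-down, single-
    regressor version of the full lm() model in the text."""
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx  # OLS slope = cov(x, y) / var(x)

# Synthetic noise-free demand with true elasticity -1.5: Q = 500 * P**-1.5
prices = [8.0, 9.0, 10.0, 11.0, 12.0]
quantities = [500 * p ** -1.5 for p in prices]
print(round(loglog_elasticity(prices, quantities), 6))  # -1.5
```

The extra regressors in the full model (organization, group average price, promo terms) control for confounders so the price coefficient stays a clean elasticity estimate.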
As I've noted earlier, our three-step forecasting process generates several important supplementary outputs. One of them is the seasonal decomposition of store sales (pic.5). This weekly report gives a clear view of the actual trend with the seasonal component removed.
Another example shows seasonal category demand as a percentage of average category demand. This particular example is for non-alcoholic beverages, with seasonal growth during the summer weeks.
Storage is organized as Archive + DWH, both implemented on top of Hadoop/Hive. The Archive (dimension and fact tables) consists of gzipped CSV files of historical tickets stored in separate folders (partitions) for each month (YYYY-MM).
All other tables (Summary, Forecasts, and Statistics) are implemented as internal ORC tables for faster data access.
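The Archive + DWH split can be sketched in HiveQL; the table and column names below are illustrative assumptions, not the actual schema:

```sql
-- Archive: external table over gzipped CSV tickets, partitioned by month
CREATE EXTERNAL TABLE archive_tickets (
  ticket_id  STRING,
  store_id   STRING,
  sku_id     STRING,
  qty        DOUBLE,
  price      DOUBLE
)
PARTITIONED BY (ym STRING)             -- one 'YYYY-MM' folder per month
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE                     -- Hive reads .csv.gz transparently
LOCATION '/data/archive/tickets';

-- DWH: internal ORC table for fast access to summaries and forecasts
CREATE TABLE dwh_forecasts (
  store_id  STRING,
  sku_id    STRING,
  wk        STRING,
  fc_qty    DOUBLE
)
STORED AS ORC;
```

The external table keeps the cheap, compressed archive queryable in place, while ORC gives the frequently scanned DWH tables columnar storage and predicate pushdown.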
Process Flow Diagram
There are two loosely integrated processes. The first runs once a week and is responsible for long-term forecasts and SKU model fitting. The second takes care of the next recommended prices and needs to run on a daily basis to keep prices aligned with changing supplier and competitor prices.
Hortonworks Data Platform (HDP)
A nicely packaged open-source distribution of Hadoop with enterprise-level support. I was personally inspired by, and drawn into, the complex Hadoop ecosystem thanks to its professionally prepared documentation. Many thanks to the Hortonworks team!
Open Source Search and Analytics Platform - Elastic Stack
I've been working with different BI and Reporting packages for many years, and finding an open-source front-end has always been a challenge. With Elasticsearch and Kibana, that challenge is finally being overcome. Elasticsearch provides fast NoSQL search capabilities, and Kibana provides a rich BI/Reporting GUI on top of it, so these two components can be used for a wide range of tasks. I'd highly recommend this stack for near-real-time BI/Reporting.
Demand Estimation and Assortment Optimization Under Substitution: Methodology and Application (pdf)
A. Gürhan Kök & Marshall L. Fisher
Operations Research Vol. 55, No. 6, November–December 2007, pp. 1001–1021
I've studied many scientific papers on demand estimation, and this one stands out as exceptional. Gürhan Kök and Marshall L. Fisher put together a very interesting paper on demand estimation back in 2007, and the fundamentals haven't changed since then.
Kaggle: Walmart Recruiting - Store Sales Forecasting
This particular contest on Walmart store sales forecasting from 2017 provided a very good practical foundation for my final forecasting model. A competition with hundreds of professional data scientists leads to a reliable, high-quality model.
Online Book: Forecasting: Principles and Practice
Excellent introductory book to practical forecasting by Rob J Hyndman and George Athanasopoulos, Monash University, Australia.