Machine learning with SAP ECC 6.0 and SAP BW 7.50

Sergiu Iatco
5 min read · Oct 24, 2021



What if you currently work with SAP ECC 6.0 and you want to build a proof of concept with machine learning, and be prepared for when the time comes to migrate your ideas to the SAP HANA Predictive Analysis Library?

I started to work with SAP about 20 years ago, and I stepped through different stages of understanding business data in various implementation projects: master data, materials management, sales and distribution, plant maintenance, demand planning, business warehouse and integrated planning, and interfaces with other satellite systems. A few years ago, machine learning started to flood the media. This caught my attention because I had seen so much data, and I was curious about the revamped age of artificial intelligence I had read about in science fiction books and seen in movies, up to the point of singularity in Ex Machina, but in reality all that had worked in the past were basic statistical models. When setting expectations, take into account the divergence of goals and the different degrees of domain knowledge: The Expert (Short Comedy Sketch).

People Tend To Overestimate What Can Be Done In One Year And To Underestimate What Can Be Done In Five Or Ten Years.

Predictive Is The Next Step In Analytics Maturity? It’s More Complicated Than That!

Source: timoelliott.com

I completed various AI courses from https://open.sap.com/ and at the same time I started to look for a second opinion on the core solutions, to break the concept down into basic elements in my mind. I found out that most of the data science environment relies on Python and that the tech giants have democratized the machine learning libraries. The most notable neural network library is TensorFlow, with Keras on top, provided by Google.

Once you understand the basics, you want to understand the scale, and because of that you will inevitably land on a Kaggle competition. I took my chances with learning and scraping code from the Earthquake Kaggle competition.

A Kaggle competition is a fast start and gives you the real flavor of machine learning. The Kaggle playground works out of the box, and from the community's shared exploratory data analysis you can piece together your own ideas. After you are convinced that machine learning has a real flavor, you start asking whether it works because of the data or whether it might work with any data.

Supervised learning is a more commonly used form of machine learning than reinforcement learning, in part because it is a faster, cheaper form of machine learning. With labeled data sets, a supervised learning model maps inputs to outputs to solve a regression problem or a classification problem such as image recognition, machine translation, or another category allocation model. In supervised learning you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output. You can't write this function Y = f(X) in a deterministic way, but you can approximate it with a learned, stochastic function.
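
As a toy illustration of Y = f(X), here is a minimal sketch with scikit-learn (my choice for the example, not prescribed by the stack above): the model never sees the underlying function, only noisy (X, Y) samples, and it learns an approximate mapping from them.

```python
# Toy illustration of supervised learning: the model never sees f itself,
# only noisy (X, Y) samples, and learns an approximate mapping from them.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
Y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=500)  # noisy "unknown" f

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, Y)

print(model.predict([[4.0]]))  # approximately 3*4 + 5 = 17
```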

Probably one of the most common and important areas in a company to try out machine learning is price prediction. Sales processes are probably the most detailed, with many attributes in sales orders and sales invoices. Sales reporting relies heavily on standard reports, logical databases, and profitability reports, whether for real-time data from ECC 6.0 or for aggregated data from SAP BW 7.50.

I liked the course SAP Getting Started with Data Science (Edition 2021) because it discloses so many of the public e-learning resources that machine learning relies on, together with the community you can learn from and contribute to.

The hardest part is setting the roadmap: there are so many libraries that you face the problem of overchoice. You know the quotation: "I can only show you the door, you have to walk through it."

The connect-the-dots solution to the problem of overchoice:

  1. The source data comes from reporting and contains categorical and numerical fields.
  2. The data format can be TXT, CSV, or XLS. I used XLS BEx workbooks from BW 7.50 because of the convenient data update with the refresh functionality. If you intend to continually update the data in a pipeline, you have to post the source data through an API.
  3. The environment for exploratory data analysis and training is anaconda.com with Jupyter Notebook.
  4. The main library for data manipulation is pandas.
  5. The library for data preprocessing is sklearn. Save the label encodings with pickle.
  6. The library for training is XGBoost. I like the built-in feature importance function. Fit the model with XGBRegressor() and save the best model with pickle (see the training sketch after this list).
  7. The library for graphing is plotly. I like the intuitive composition of chart objects from dictionaries.
  8. The user interface is built with ipywidgets and Voila. Voila is slow and sometimes refuses to start, but the advantage is that you don't need to adapt the code. Otherwise, I would use Streamlit; it is faster.
  9. The API called from SAP for predictions is built with FastAPI. Load the label encodings and the best model with pickle (see the API sketch after this list).
  10. The class for the ABAP HTTP request is SE24: IF_HTTP_CLIENT.
  11. The class for the ABAP JSON data-interchange format with FastAPI is SE24: /UI2/CL_JSON.
  12. You can use an RFC for prediction in SAP ECC for real-time execution in custom-developed reports.
  13. You can use the same RFC for prediction in an SAP BW process chain, inside a routine of the data transfer process.
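
To make steps 4 to 6 concrete, here is a minimal training sketch. The file name sales_report.xlsx, the column names, and the net_price target are hypothetical placeholders for whatever your BEx workbook export contains; assume pandas, scikit-learn, and XGBoost are installed in the Anaconda environment.

```python
# Minimal training sketch for steps 4-6: pandas + sklearn label encoding + XGBRegressor.
# The file name, column names, and target are hypothetical placeholders.
import pickle

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBRegressor

# 4. Load the BEx workbook export into a DataFrame.
df = pd.read_excel("sales_report.xlsx")

categorical_cols = ["customer", "material", "sales_org"]          # assumed columns
feature_cols = ["customer", "material", "sales_org", "quantity"]  # assumed columns
target_col = "net_price"                                          # assumed target

# 5. Encode the categorical columns and keep the encoders for later predictions.
encoders = {}
for col in categorical_cols:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col].astype(str))
    encoders[col] = enc

X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df[target_col], test_size=0.2, random_state=42
)

# 6. Fit the regressor and inspect the built-in feature importance.
model = XGBRegressor(n_estimators=200, max_depth=5, learning_rate=0.1)
model.fit(X_train, y_train)
print(dict(zip(feature_cols, model.feature_importances_)))
print("R2 on holdout:", model.score(X_test, y_test))

# Save the label encoders and the model with pickle for the API in step 9.
with open("encoders.pkl", "wb") as f:
    pickle.dump(encoders, f)
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```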

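And here is a minimal sketch for the FastAPI service in step 9, which the ABAP side can call with IF_HTTP_CLIENT and /UI2/CL_JSON; the field names mirror the hypothetical columns from the training sketch.

```python
# Minimal prediction API sketch for step 9: load the pickled encoders and model,
# accept a JSON payload (posted from ABAP via IF_HTTP_CLIENT / /UI2/CL_JSON),
# and return the predicted price. Field names mirror the training sketch above.
import pickle

import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

with open("encoders.pkl", "rb") as f:
    encoders = pickle.load(f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

app = FastAPI()

class Order(BaseModel):
    customer: str
    material: str
    sales_org: str
    quantity: float

@app.post("/predict")
def predict(order: Order):
    row = order.dict()
    # Apply the saved label encodings (unseen values would need extra handling).
    for col, enc in encoders.items():
        row[col] = int(enc.transform([str(row[col])])[0])
    # Column order must match the training features.
    X = pd.DataFrame([row], columns=["customer", "material", "sales_org", "quantity"])
    return {"net_price": float(model.predict(X)[0])}

# Run with: uvicorn main:app --host 0.0.0.0 --port 8000
```

Run the service with uvicorn and point the ABAP HTTP client at the /predict endpoint; the JSON response can be parsed back with /UI2/CL_JSON.
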
It works!

What's the point of all these steps apart from building your knowledge with a proof of concept? Probably that going deeper into the basic elements gives you more flexibility to fill the gaps between different environments and to use various libraries, external API services, or even OpenAI Generative Pre-trained Transformer models like GPT-3. It would certainly be helpful to have a kind of davinci-codex engine to explain ABAP code in natural language and to translate natural language into ABAP code. After all, a programming language is only a tool we use to communicate with a computer to solve a human problem, and we would like it to be as close as possible to our natural language.

How did you solve the problem of overchoice when checking out your machine learning ideas with proofs of concept?

Update on 05.11.2021

When you have a lot of incoming orders with many characteristics, you have to analyze them thoroughly and assign them to specialized internal customer representatives. Traditionally, service agents do this manually. When you have a lot of data, you may try to automate it with a machine learning classifier as a first layer for allocating orders, to increase productivity and accuracy. You can extract all your data from the SAP system, train a model with XGBClassifier(), build an API, and integrate it into the order allocation application (see the sketch below).
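
A hedged sketch of the classifier idea, mirroring the regression pipeline above but with XGBClassifier(); the orders.xlsx extract, its columns, and the representative target are hypothetical placeholders.

```python
# Sketch of order allocation as classification: the target is the internal customer
# representative who handled each historical order. Extract and columns are assumed.
import pickle

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

df = pd.read_excel("orders.xlsx")  # assumed historical orders extract

feature_cols = ["customer", "material_group", "sales_org", "order_value"]  # assumed
target_col = "representative"                                              # assumed

# Encode the categorical features and the target; keep the encoders for serving.
encoders = {}
for col in ["customer", "material_group", "sales_org", "representative"]:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col].astype(str))
    encoders[col] = enc

X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df[target_col], test_size=0.2, random_state=42
)

model = XGBClassifier(n_estimators=200, max_depth=5, learning_rate=0.1)
model.fit(X_train, y_train)
print("Accuracy on holdout:", model.score(X_test, y_test))

# Pickle the encoders and model, then expose them through the same kind of
# FastAPI endpoint as above and call it from the order allocation application.
with open("classifier.pkl", "wb") as f:
    pickle.dump({"model": model, "encoders": encoders}, f)
```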

Update on 12.11.2021

Expectations from AI are about important decisions on important issues. However, every important decision is the result of many micro-decisions. Micro-decisions in SAP mean the ability to incorporate machine learning into all the small functionalities like search helps, filling proposals, matching proposals, substitution proposals, validation proposals, BAPIs, and reporting.

Update on 24.11.2021

String matching can be useful in a variety of situations and can save you ample amounts of time. For instance, material master data contains different external classifications, and one has to choose the best match by description from thousands of rows. Simplicity is the true elegance. It is worth exploring the simple-to-use libraries fuzzywuzzy and gensim, with a focus on unsupervised NLP models, to solve semantic text matching requirements (see the sketch below).
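
A small sketch with fuzzywuzzy, assuming the package is installed; the material and classification descriptions are made-up examples. For semantic rather than lexical matching, gensim similarity models would be the next step.

```python
# Sketch of string matching with fuzzywuzzy: find the best external classification
# description for a material description. The descriptions are made-up examples.
from fuzzywuzzy import fuzz, process

material_description = "HEX BOLT M8X40 STAINLESS STEEL A2"

external_classifications = [
    "Hexagon head bolt, stainless steel",
    "Hexagon socket screw, carbon steel",
    "Washer, stainless steel",
    "Hex nut M8, zinc plated",
]

# token_set_ratio is robust to word order and extra tokens in the descriptions.
best_match, score = process.extractOne(
    material_description, external_classifications, scorer=fuzz.token_set_ratio
)
print(best_match, score)
```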

This blog post was originally published and is updated here.
