
Harnessing SAP Analytics Cloud’s Smart Predict
Data Preparation and Structures for Predictive Models

Shikha Parihar
July 29, 2024

In the first instalment of our predictive analytics series, lead software engineer Shikha Parihar explored how various industries are leveraging predictive analytics to boost their revenue, and how the SAP Analytics Cloud (SAC) is the perfect tool to achieve that. In the second, she delved into the phases of predictive modelling and how to train SAC’s Smart Predict tool. Here, she examines predictive modelling concepts in SAC by describing the data structure required for each of the three models. 

There are three types of predictive scenario in SAP Analytics Cloud (SAC) — regression, classification, and time series. Let’s dive into each… 

1. Regression  

Regression analysis models the relationship between a continuous target and the variables that influence it, and shows which of those variables has the greatest influence. The example below shows a regression model created to forecast the number of complaints a customer service team will receive the following week.

To accomplish this, we must first create a training dataset with historical values for several prior weeks. In this dataset, the target variable (the number of complaints per week) holds known, actual values.

An application dataset with the same influencers is then used to apply the regression predictive model. The target variable’s values for the following week (the number of complaints) are unknown because they lie in the future. The output dataset contains the predicted number of complaints for the upcoming week.
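The three datasets can be pictured with a small sketch. It is illustrative only: the column names and values are made up, and a one-variable least-squares fit stands in for whatever Smart Predict does internally.

```python
# Illustrative only: column names and values are hypothetical, and the
# one-variable least-squares fit is a stand-in for Smart Predict's own model.

# Training dataset: the target (complaints) is known for past weeks.
training = [
    {"week": 1, "orders_shipped": 100, "complaints": 12},
    {"week": 2, "orders_shipped": 150, "complaints": 17},
    {"week": 3, "orders_shipped": 200, "complaints": 22},
    {"week": 4, "orders_shipped": 250, "complaints": 27},
]

# Application dataset: same influencer, but the target is still unknown.
application = [{"week": 5, "orders_shipped": 300, "complaints": None}]


def fit_line(rows, influencer, target):
    """Ordinary least squares for a single influencer."""
    xs = [row[influencer] for row in rows]
    ys = [row[target] for row in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(training, "orders_shipped", "complaints")

# Output dataset: the application rows plus a predicted target column.
output = [
    {**row, "predicted_complaints": slope * row["orders_shipped"] + intercept}
    for row in application
]
print(output)
```

With these made-up numbers, the prediction for week 5 comes out at roughly 32 complaints.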

2. Classification  

Classification is the process of dividing data into categories. In this example, we are building a classification model to determine whether a potential customer will purchase a product. The process is similar to the regression model: we create a training dataset with historical customer data from previous purchases of comparable products.

After the predictive model is constructed, it is applied to an application dataset to forecast whether other customers will purchase the product. This dataset has the same customer columns as the training dataset; however, since the target variable (will buy?) needs to be predicted, its value is unknown (empty).

Smart Predict uses the predictive model to determine the likelihood that each customer in the application dataset will purchase the product. The output dataset now includes the target variable column with the predicted result (will they buy: yes/no).
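As a rough sketch of that flow, the snippet below builds a training dataset, an application dataset with an empty target, and an output dataset with a likelihood and a yes/no result. The column names, the values, and the 0.5 cut-off are assumptions, and the per-group purchase rate is only a stand-in for Smart Predict's own classification model.

```python
from collections import defaultdict

# Illustrative only: column names and values are hypothetical, and the
# per-group purchase rate is a stand-in for Smart Predict's own model.
training = [
    {"customer_id": 1, "age_group": "18-30", "will_buy": "yes"},
    {"customer_id": 2, "age_group": "18-30", "will_buy": "yes"},
    {"customer_id": 3, "age_group": "18-30", "will_buy": "yes"},
    {"customer_id": 4, "age_group": "18-30", "will_buy": "no"},
    {"customer_id": 5, "age_group": "31-50", "will_buy": "yes"},
    {"customer_id": 6, "age_group": "31-50", "will_buy": "no"},
    {"customer_id": 7, "age_group": "31-50", "will_buy": "no"},
    {"customer_id": 8, "age_group": "31-50", "will_buy": "no"},
]

# Same columns as the training data, but the target is empty.
application = [
    {"customer_id": 101, "age_group": "18-30", "will_buy": None},
    {"customer_id": 102, "age_group": "31-50", "will_buy": None},
]


def purchase_rates(rows, influencer, target):
    """Share of 'yes' outcomes for each value of the influencer."""
    counts = defaultdict(lambda: [0, 0])  # value -> [yes count, total count]
    for row in rows:
        counts[row[influencer]][0] += row[target] == "yes"
        counts[row[influencer]][1] += 1
    return {value: yes / total for value, (yes, total) in counts.items()}


rates = purchase_rates(training, "age_group", "will_buy")

# Output dataset: likelihood per customer plus the filled-in target.
output = []
for row in application:
    probability = rates[row["age_group"]]
    output.append({
        **row,
        "probability": probability,
        "will_buy": "yes" if probability >= 0.5 else "no",  # assumed cut-off
    })
print(output)
```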

3. Time series  

A time series forecast identifies key trends that directly influence future performance. In the example below, we are creating a time series model to project product sales over the following three months.  

After being trained on the training dataset, the forecasting model automatically generates signal forecasts for N future periods. The training dataset and the application dataset are therefore the same dataset in this case. In this example, an output dataset is created with product sales projections for the following three months.

To further strengthen the projections, you can choose to incorporate more variables in the data model (e.g., a flag, Super Promo, that shows when a sales promotion occurred). To create the forecasts, the future values of these additional variables are necessary inputs.
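A minimal sketch of such a dataset is shown here. The history and the periods to forecast sit in one table, and the promotion flag is filled in for the future rows even though the sales figure is not. The column names and values are made up, and averaging sales by flag value is a deliberately simple stand-in for Smart Predict's forecasting algorithm.

```python
from statistics import mean

# Illustrative only: values are hypothetical, and averaging by promotion
# flag is a stand-in for Smart Predict's own forecasting algorithm.
# One dataset holds both the history and the periods to forecast.
dataset = [
    {"month": "2024-01", "super_promo": 0, "sales": 100},
    {"month": "2024-02", "super_promo": 0, "sales": 104},
    {"month": "2024-03", "super_promo": 1, "sales": 130},
    {"month": "2024-04", "super_promo": 0, "sales": 96},
    {"month": "2024-05", "super_promo": 1, "sales": 134},
    {"month": "2024-06", "super_promo": 0, "sales": 100},
    # Future periods: the signal is empty, but the promotion flag is known.
    {"month": "2024-07", "super_promo": 0, "sales": None},
    {"month": "2024-08", "super_promo": 1, "sales": None},
    {"month": "2024-09", "super_promo": 0, "sales": None},
]

history = [row for row in dataset if row["sales"] is not None]
future = [row for row in dataset if row["sales"] is None]

# Average historical sales for months with and without a promotion.
average_by_flag = {
    flag: mean(row["sales"] for row in history if row["super_promo"] == flag)
    for flag in (0, 1)
}

# Output dataset: one forecast per future period.
output = [{**row, "forecast": average_by_flag[row["super_promo"]]} for row in future]
print(output)
```

If the flag were missing for July to September, the sketch would have no basis for choosing between the two averages, which is the point the paragraph above makes about future values.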

Datasets used for a segmented time series predictive scenario. 

In the example below, we are forecasting sales for several products by constructing the time series model with training and application datasets. To accomplish this, we must add the product-related data to the dataset. This is referred to as the entity. Here, we indicate that the forecast model will be segmented using the entity variable, Product Name.
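Segmentation can be sketched as grouping the rows by the entity and forecasting each group on its own. The data below is made up, and the average-change ("drift") forecast is an assumption chosen for brevity, not Smart Predict's method.

```python
from collections import defaultdict

# Illustrative only: values are hypothetical, and the average-change
# ("drift") forecast is a stand-in for Smart Predict's own algorithm.
dataset = [
    {"product_name": "Product A", "month": 1, "sales": 10},
    {"product_name": "Product A", "month": 2, "sales": 12},
    {"product_name": "Product A", "month": 3, "sales": 14},
    {"product_name": "Product B", "month": 1, "sales": 50},
    {"product_name": "Product B", "month": 2, "sales": 45},
    {"product_name": "Product B", "month": 3, "sales": 40},
]


def forecast_by_entity(rows, entity, periods=3):
    """Forecast each entity separately; needs at least two rows per entity."""
    segments = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["month"]):
        segments[row[entity]].append(row)

    output = []
    for name, history in segments.items():
        first, last = history[0], history[-1]
        # Average change per period over the segment's history.
        drift = (last["sales"] - first["sales"]) / (len(history) - 1)
        for step in range(1, periods + 1):
            output.append({
                entity: name,
                "month": last["month"] + step,
                "forecast": last["sales"] + drift * step,
            })
    return output


output = forecast_by_entity(dataset, "product_name")
for row in output:
    print(row)
```

Each product gets its own trend: in this toy data Product A rises while Product B falls, which a single unsegmented model would blur together.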

Once we have retrieved the dataset for our predictive model from the data repository, we need to train the model before its accuracy can be evaluated. To do this, we go into the settings of our model.

Untrained predictive model 

If we save the predictive scenario before training it, the predictive model will be saved with the status ‘Not Trained’ in the predictive model list. 

Trained predictive model 

When we train a predictive model, Smart Predict will explore the relationships between the different influencer variables contained in the dataset to find the best combination to predict the target. The status is then updated to ‘Trained’. 

Defining automated data encoding 

The foundation of a predictive model is its ability to comprehend and characterise past events to forecast what is likely to occur in the future. In this section, we’ll cover some, but not all, of the basic ideas underlying the automated approach.

  • Variables in Smart Predict

A variable corresponds to a column in the dataset, and rows contain the observations for the variable. For example, in a database containing information about customers, the <name> and <address> of those customers are variables. 

  • Properties of variables used in SAC Smart Predict

Imagine a supermarket retailer wishing to improve its marketing. Determining the probability that a consumer will be interested in a certain product is an excellent way to increase the effectiveness of a marketing campaign. We can analyse just that, using a classification model on historical data, including any available consumer data like demographic or behavioural information. 

The target variable tells us whether a certain consumer bought a particular product or not. The classification model will search for trends that characterise the consumer’s behaviour prior to making that purchase. Then, using the most recent data, this model is used to forecast the interest of other consumers in the same product, yielding a likelihood for each consumer.

  • Missing values 

An empty cell in your dataset is a missing value. Values can be missing because of a mistake in the data collection process or simply because they are not available. Smart Predict handles these missing values automatically and replaces them with a constant called ‘Missing’, which the model then treats like any other category.
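The effect can be sketched in a few lines. Smart Predict does this automatically; the function below is a hypothetical illustration with made-up data, not part of any SAC interface.

```python
from collections import Counter

# Illustrative only: the rows and the helper are hypothetical.
rows = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": None},
    {"customer_id": 3, "region": "South"},
    {"customer_id": 4, "region": ""},
]


def fill_missing(rows, column, placeholder="Missing"):
    """Replace empty cells in one column with a placeholder category."""
    return [
        {**row, column: placeholder if row[column] in (None, "") else row[column]}
        for row in rows
    ]


encoded = fill_missing(rows, "region")

# 'Missing' now counts as a category alongside 'North' and 'South'.
category_counts = Counter(row["region"] for row in encoded)
print(category_counts)
```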

  • Outliers 

Outliers can be of two types and are handled automatically: 

  1. Anomalous values of a predictor variable. For numerical variables, these are unusually high or low values; for nominal or ordinal variables, they are uncommon values. Outliers of numerical variables are sorted into bins according to how large or small the encoded variable’s values are. Outliers of nominal and ordinal variables are grouped together with other uncommon values.
  2. Unusual rows of data which may require urgent attention. These are flagged by automated analytics for manual investigation.
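The first type of handling can be pictured with the sketch below. The article does not spell out the exact rules Smart Predict applies, so the rank-based binning, the minimum count of two, and the ‘Other’ label are all assumptions made for illustration.

```python
from collections import Counter

# Illustrative only: the binning rule, the min_count threshold and the
# 'Other' label are assumptions, not Smart Predict's documented behaviour.


def bin_by_rank(values, n_bins=4):
    """Assign each value to one of n_bins equal-sized bins by sorted rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins


def group_rare(categories, min_count=2, other="Other"):
    """Merge categories seen fewer than min_count times into one group."""
    counts = Counter(categories)
    return [c if counts[c] >= min_count else other for c in categories]


# A numerical variable with one extreme value (900).
incomes = [30, 32, 35, 38, 40, 41, 45, 900]
income_bins = bin_by_rank(incomes)

# A nominal variable with two uncommon values.
cities = ["Paris", "Paris", "Lyon", "Lyon", "Lyon", "Nice", "Brest"]
grouped_cities = group_rare(cities)

print(income_bins)
print(grouped_cities)
```

Once binned, the extreme income shares the top bin with its nearest neighbour, so its raw magnitude no longer dominates, and the two rare cities are treated as a single category.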

Across these three blogs, you’ve seen the use cases for Predictive Analytics, how SAP enables a plethora of benefits, and how SAP has democratised access to and use of data analysis and predictive analytics for all users. You should also now understand how the techniques covered in this last instalment connect instantaneously and seamlessly with other SAP Analytics Cloud features, offering deep insights into datasets. 

Capgemini and SAP 

With four decades of experience with SAP solutions, serving 1,800 clients across the world, we are a leader in SAP certifications, an SAP Global Strategic Services Partner, and an SAP Global Platinum Reseller Partner. We can help you innovate, integrate and transform, so you can continue to grow, quickly adapt to any context, unlock and enhance business value, and stay ahead of your competition. 


Author

Shikha Parihar

Lead Software Engineer
Shikha Parihar joined Capgemini in March 2023 after a career break of 5 years. With a strong focus on visual data analytics and 11 years of experience, she is an SAP BI/BW skilled professional.
