Saturday, June 03, 2017

Using Machine Learning in a SAP project

SAP projects can involve very complex data mapping and data cleaning tasks. People in data science spend most of their time on this kind of work, known as data wrangling. There is no doubt that a lot of effort is needed at this stage. In most SAP projects the work is done manually, using extractions and Excel files, in a process that is very likely to be polluted by human errors. As data size and complexity increase, final data quality decreases. This always happens, no matter how skilled people are with Excel.

I think there is a lot from data science that can be very effective in SAP projects: not only the tools for building the data wrangling tasks, but also the machine learning tools that can be used to extract knowledge from data.

In a recent project the product pricing master data consisted of about 50 thousand entries. This is still a small dataset, but already too large for the user to validate line by line.

With machine learning it was possible to identify the errors in the data using algorithms for outlier detection. These methods work quite well when there is a lot of data. In this pricing data there were many very similar prices for the same material across different customers, and many similar prices for products from the same product group. This allows the algorithm to identify areas in space with a high density of entries; the outliers are the elements outside those areas.
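The post does not name the algorithm that was used, so the following is only a minimal sketch of the density idea, not the project's actual method. It assumes entries are `(material, customer, price)` tuples and counts, for each entry, how many other prices of the same material lie within a small radius on a log scale. The function name, the `radius` and `min_neighbors` parameters, and the sample data are all hypothetical. In a real project a library implementation such as scikit-learn's `LocalOutlierFactor` or `DBSCAN` would probably be a better starting point.

```python
import math
from collections import defaultdict


def find_price_outliers(entries, radius=0.3, min_neighbors=2):
    """Flag entries whose price has few close neighbours within its material.

    entries: list of (material, customer, price) tuples; prices must be > 0.
    radius: neighbourhood size in log10 units (0.3 is roughly a factor of 2).
    min_neighbors: entries with fewer neighbours than this are outliers.
    """
    by_material = defaultdict(list)
    for entry in entries:
        by_material[entry[0]].append(entry)

    outliers = []
    for group in by_material.values():
        logs = [math.log10(price) for _, _, price in group]
        for i, entry in enumerate(group):
            # Count the other entries of the same material with a similar price.
            neighbors = sum(
                1 for j, other in enumerate(logs)
                if j != i and abs(other - logs[i]) <= radius
            )
            if neighbors < min_neighbors:
                outliers.append(entry)
    return outliers


# Made-up sample data: one price of material M1 is about 100 times too high.
prices = [
    ("M1", "C1", 10.0), ("M1", "C2", 10.5), ("M1", "C3", 9.8),
    ("M1", "C4", 10.2), ("M1", "C5", 1020.0),
    ("M2", "C1", 250.0), ("M2", "C2", 245.0), ("M2", "C3", 255.0),
]
print(find_price_outliers(prices))
```

Working on the logarithm of the price is a design choice of this sketch: it makes "similar" mean "within a given factor", so the same radius works for cheap and expensive materials.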

The picture below illustrates what the algorithm does: the areas with a high density of points are what the algorithm sees as normal values, and everything outside those areas is an outlier.

In this specific case machine learning uncovered a large number of outliers, and after analysis it was possible to identify several reasons:

  • prices can be maintained with a quantity factor (e.g. it can be a price for 1 unit, or a price for 100 units); wrong quantity factors produced prices that were off by orders of magnitude
  • prices can be maintained in different currencies (which were all converted to a base currency before using the algorithm) and there were cases where the price was calculated in a currency but then maintained in a different currency
  • there were software bugs in reading the upload data format
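The first two causes suggest normalizing every record to a price per single unit in the base currency before looking for outliers. The sketch below illustrates that step under stated assumptions: the record layout, the function name `unit_price_in_base`, and the exchange rates are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def unit_price_in_base(amount, per_quantity, currency, rates_to_base):
    """Return the price of a single unit expressed in the base currency."""
    if per_quantity <= 0:
        raise ValueError("quantity factor must be positive")
    return amount / per_quantity * rates_to_base[currency]


# Hypothetical exchange rates to the base currency (EUR), for illustration only.
rates = {"EUR": 1.0, "USD": 0.5}

# Three records for the same material. The second one stores the price of
# 100 units with a quantity factor of 1, which is the kind of error described above.
records = [
    {"amount": 200.0, "per": 100, "currency": "EUR"},
    {"amount": 200.0, "per": 1, "currency": "EUR"},
    {"amount": 4.0, "per": 1, "currency": "USD"},
]
normalized = [
    unit_price_in_base(r["amount"], r["per"], r["currency"], rates)
    for r in records
]
print(normalized)
```

After normalization the correct records agree on the unit price, while the record with the wrong quantity factor stands out by a factor of 100.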

Using machine learning made it possible to quickly extract the few hundred cases of errors from the dataset, and this simplified the correction activity. Because it is such a generic tool, it can be used with any data to find entries that are outliers. If after inspection the outliers turn out to be correct entries, then we can have more confidence in the data quality.

Master data quality is a big problem. Machine learning will not magically solve all master data issues, but it is a strong tool to help with that.

Saturday, August 20, 2016

Using deep learning models on Nespresso images

One great resource in machine learning is pre-trained neural networks for image processing. While training a deep network is complex and needs large amounts of data, using pre-trained models is as easy as using functions from a software library.

Just for fun I picked a few images from the Nespresso webshop and used the VGG19 pre-trained network with the goal of finding a way to sort the images by similarity. This needs just two steps. The first is to get the network layer outputs for each image.

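    import numpy as np
    from keras.applications.vgg19 import VGG19
    from keras.models import Model
    base_model = VGG19(include_top=True, weights='imagenet')
    model = Model(inputs=base_model.input,
                  outputs=base_model.get_layer('block5_pool').output)
    # imglst is assumed to hold preprocessed image arrays of shape (1, 224, 224, 3)
    img_features = np.vstack([model.predict(img).flatten() for img in imglst])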

The network returns a vector for each image. Using PCA, each vector is then reduced to a single dimension, and that value defines the sort order.

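    from sklearn.decomposition import PCA
    # Project each feature vector onto the first principal component.
    pca = PCA(n_components=1)
    img_score = pca.fit_transform(img_features)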
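To make the last step concrete, here is a small sketch of turning a one-dimensional score into a sort order. It does not need the network: the toy matrix stands in for `img_features`, and the first principal component is computed with NumPy's SVD instead of scikit-learn. The function name `sort_by_first_component` and the toy values are assumptions made for illustration.

```python
import numpy as np


def sort_by_first_component(features):
    """Return row indices ordered by their score on the first principal component."""
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    return np.argsort(scores), scores


# Toy feature vectors standing in for img_features: the rows lie roughly
# along one direction, at positions 3, 0, 2 and 1.
toy = np.array([
    [3.0, 6.1, 0.0],
    [0.0, 0.0, 0.1],
    [2.0, 3.9, 0.0],
    [1.0, 2.1, 0.1],
])
order, scores = sort_by_first_component(toy)
print(order)
```

The sign of a principal component is arbitrary, so the resulting order may come out reversed. For sorting by similarity this does not matter, since neighbouring images stay neighbours either way.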

The result is the following sort order for the images (sorted from top to bottom, left to right).




Not perfect but very close, don't you think?

Tuesday, January 19, 2016

SAPYard

I recently published a post on SAPYard about the SAP Debugging course that I created for Academy OnSap. SAPYard is a great site, full of useful information for SAP consultants and users. I am very grateful for their help spreading the word about the SAP debugging course, and I also want to highlight that they have great content.

For example, they have a series of tutorials about Web Dynpro, a topic that nowadays is fundamental for programmers. Still on ABAP, you will also find this more advanced post about delete on internal tables interesting. And they also have a lot of content and expertise on SAP HANA (e.g. HANA Views).

There are amazing SAP resources outside the walls of the SAP company. Official SAP portals are good and important but in my opinion the quality of the independent community is what makes the SAP ecosystem so great.

Monday, June 15, 2015

An online course on SAP debugging

I think debugging is one of the most important skills to do advanced work on SAP. At least from my experience, when things get hard it is either the debugger or data analysis that will help find the solution. As far as I know there are not many resources to learn and improve the practice of debugging. So, as an attempt to improve this, I worked on assembling a mini course on SAP Debugging, following the patterns of the modern MOOCs.

I think it turned out quite OK and it was fun, so I will probably be doing more of these. You can find the debugging course here.

The course is for beginners, but it also includes some advanced topics that can be interesting to experienced people. And it has some assignments that should not take much time and (hopefully) will be fun.

Looking forward to seeing you in the class!

Monday, May 18, 2015

SAP Unit Tests actually exist, report of a sighting

One of the most impressive things in SAP, and impressive in a bad way, is the huge effort of people executing tests to validate the software after upgrades or larger developments. It should not be like this. We now look back and find it funny to see a room full of people performing calculations, like in this picture from the 40s.

But some 70 years later this is more or less how an SAP upgrade is tested.

The reason is historical. The "old" SAP code is large, monolithic and highly integrated (a nicer way to describe a crazy web of dependencies). Not surprisingly, this older code comes without automated tests, which would be hard to implement in such an architecture. But not all SAP code is written the same way. Components developed in the last 10-15 years follow the typical object-oriented, modular software best practices. And SAP also includes a complete framework for unit tests. My expectation was that SAP would start shipping more and more unit tests, and that this would help reduce the need for manual testing and the risk of regressions when installing SAP updates.

So for many years I asked myself: Where are the SAP unit tests???

Many times I tried the option to see the unit tests in SAP objects, just to find out ... there were none.

Does anyone remember something similar? The documentation option in functions. At least the experience is consistent: it almost never shows any documentation.

It actually shows a message to make you feel there could be something in another language. But, of course, there was nothing.

But now I have finally seen some unit tests in SAP code. They were in the class /SAPAPO/CL_ALL_LOCK_DELTA (in the SAP SCM software).

So unit tests exist, and it is worth searching for them. Looking at the unit tests is also a good way to understand how to use that part of the code.

And now I have an SAP example to show when trying to bring unit testing best practices to the world of custom developments. SAP recommends using unit tests. You know those words can take you a long way.