Saturday, March 24, 2018

My wrap-up after an NLP machine learning competition

I recently participated in a natural language processing (NLP) machine learning competition, and there were some interesting lessons that I think are worth sharing. NLP is a bit different from my usual topics, but I think machine learning is the most interesting thing currently happening in supply chain software, and working with text is an important part of machine learning.

The competition was Jigsaw Toxic Comment, with over 4500 teams competing, and my team got a great 5th place. The goal was to build a classification system able to classify comments into given types (toxic, insult, obscene, etc.) or as clean.

The best results reached 0.9885 AUC, which is impressive: it is another task where the machine performs at the same level as humans.
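The post does not spell out how the AUC score is computed. Here is a minimal sketch that assumes it is the mean column-wise ROC AUC: one ROC AUC per label column (toxic, severe_toxic, obscene, threat, insult, identity_hate), then averaged. The two-column data below is a toy example and not competition data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_columnwise_auc(y_true, y_score):
    """Average of the ROC AUC computed separately for each label column."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = [roc_auc_score(y_true[:, j], y_score[:, j]) for j in range(y_true.shape[1])]
    return float(np.mean(aucs))

# Toy example with two label columns (invented data, not from the competition)
y_true = [[0, 1], [1, 0], [1, 1], [0, 0]]
y_score = [[0.1, 0.7], [0.9, 0.65], [0.8, 0.6], [0.2, 0.1]]
print(mean_columnwise_auc(y_true, y_score))  # column AUCs are 1.0 and 0.75 → 0.875
```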

To achieve these results all top teams used deep learning, and the best models were almost always recurrent neural networks (LSTM or GRU) using pre-trained word vectors. In case you don't know, word vectors are mappings between words and high-dimensional vectors (300 dimensions is common), obtained by training on very large collections of text (like Wikipedia). In the word vector space, words with similar meanings have nearby vectors, and to some extent distances and directions in the space relate to concepts. A typical example: take the vector for the word "King", subtract the vector for "Man" and add the vector for "Woman", and the closest vector to the result is the one for "Queen". This example is known to be a bit cherry-picked, and word vectors are not that perfect yet, but they are certainly one of the best tools in NLP. There are also many good pre-trained vectors to choose from, so as always in machine learning, combining them leads to a better result.
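The analogy arithmetic can be illustrated with a toy sketch. The 2-D vectors below are invented for illustration: axis 0 roughly stands for "royalty" and axis 1 for "gender". Real pre-trained vectors have around 300 dimensions. With real vectors loaded in gensim, `KeyedVectors.most_similar(positive=["king", "woman"], negative=["man"])` performs the same kind of query.

```python
import numpy as np

# Invented 2-D "word vectors" for illustration only (real ones are ~300-D)
vectors = {
    "king":  np.array([0.9,  0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1,  0.8]),
    "woman": np.array([0.1, -0.8]),
    "apple": np.array([-0.7, 0.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vectors):
    """Word closest (by cosine) to vec(a) - vec(b) + vec(c), excluding the query words."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman", vectors))  # → queen
```

The query words are excluded because the input vectors themselves are often the nearest neighbours of the result.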

One interesting fact from the data is that even though some word vectors have vocabularies of more than 2 million words, a very large share of the words in the comment texts (around 30%) were missing from the vectors. This is due to bad spelling and foreign-language words, and in many cases people deliberately write things like "heeeyyyy", "d0n't" or "bs'ing". Subword embeddings like fastText help in these cases: they are trained on parts of words, so they can build vectors for unknown words from those smaller pieces.
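The following simplified sketch shows the subword idea. A word is split into character n-grams with `<` and `>` boundary markers (n from 3 to 6, the fastText defaults), and a vector for an unknown word is the average of the vectors of the n-grams we already know. This is roughly how fastText builds out-of-vocabulary vectors. Real fastText also hashes n-grams into buckets and learns their vectors, whereas the n-gram table here is filled with random vectors as a stand-in.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, with fastText-style boundary markers."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(token) - n + 1)]

def oov_vector(word, ngram_vectors, dim):
    """Average the vectors of known n-grams; zero vector if none are known."""
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)

# Stand-in n-gram table: random vectors for the n-grams of two "in-vocabulary" words
rng = np.random.default_rng(0)
dim = 8
ngram_vectors = {}
for w in ["don't", "hey"]:
    for g in char_ngrams(w):
        ngram_vectors.setdefault(g, rng.normal(size=dim))

# "d0n't" is unknown, but it shares some n-grams with "don't"
shared = set(char_ngrams("d0n't")) & set(char_ngrams("don't"))  # {"n't", "'t>", "n't>"}
v = oov_vector("d0n't", ngram_vectors, dim)
```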

Since the comments are full of misspelled words and non-letter characters, one might expect that heavy pre-processing to clean the text and fix the misspellings would improve the results. This was not the case in my experience, and other teams reported the same conclusion. The combination of subword embeddings and the ability of neural networks to learn the necessary filtering internally seems to work better.

Deep learning systems are hungry for data, and if we give them more data we will probably get better results. An interesting trick I learned is that we can easily use machine translation to get small variations of the texts. For example, with the TextBlob Python package it is as easy as:

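    from textblob import TextBlob

    # translate() calls a Google web service; later TextBlob releases deprecated it
    new_text = TextBlob(text).translate(to="de").translate(to="en")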

This gives a new text obtained by translating the original English text to German and then back to English (using a Google web service). In a few cases the result is exactly the same, but in most cases there is some small change: different words, but hopefully the same meaning.

Data generated this way can be used for additional training of models, or for what is called test-time data augmentation. This last concept is simple to explain. It is often used in image classification, where rotating an image sometimes makes it easier for the model to predict correctly, so making predictions on several rotations and averaging them leads to a better result. With text it also worked quite well, using the different translation variants.
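Both uses fit in a short sketch. `variant_fns` is a hypothetical list of functions that each turn a text into a paraphrase. In practice each one would be a back-translation such as `lambda t: str(TextBlob(t).translate(to="de").translate(to="en"))`. To keep the sketch offline and runnable, simple string functions and a fake scoring table stand in for the translations and the model.

```python
import numpy as np

def augment_training_set(texts, labels, variant_fns):
    """Add one paraphrased copy of every text per variant function, keeping its label."""
    aug_texts, aug_labels = list(texts), list(labels)
    for fn in variant_fns:
        aug_texts += [fn(t) for t in texts]
        aug_labels += list(labels)
    return aug_texts, aug_labels

def tta_predict(predict, text, variant_fns):
    """Test-time augmentation: average predictions over the text and its variants."""
    texts = [text] + [fn(text) for fn in variant_fns]
    return np.mean([predict(t) for t in texts], axis=0)

# Offline stand-ins: real variants would be back-translations, real scores a trained model
variants = [str.upper, lambda t: t + "!"]
fake_scores = {"you are nice": [0.1, 0.0], "YOU ARE NICE": [0.3, 0.1], "you are nice!": [0.2, 0.2]}
avg = tta_predict(lambda t: fake_scores[t], "you are nice", variants)  # ≈ [0.2, 0.1]
```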

In my experience the language that worked best for this trick was German, followed by Spanish. I also tried a few other European languages, such as Portuguese, French and Swedish, and also a very different one, Japanese.

I did not try it, but other teams reported that translating the comments into another language and training with word vectors for that language also improved the results.

This competition was great fun, and my team was absolutely amazing.

Wednesday, September 13, 2017

Testing SAP in the age of the robot

As everywhere else in the software industry, testing is the key to having SAP systems work in a stable and robust way. The main difference is that while some teams evolve their systems on top of a large set of automated tests, SAP projects commonly depend on people performing tests.

We could dig up historical and technical reasons for doing things manually in SAP, but that is just looking for excuses. The reality is that we can, and should, build testing robots for SAP.

I believe the best way to explain how this can be so effective is to show some examples. I will start with a master data test example.

    def test_atp_group(self):
        wrong = marc[(marc. != 'X1')&(marc.matnr.isin(mara[mara.mtart=='FERT'].matnr))]

So what is so special here? First, the code is very short: it takes advantage of Python data science tools to write very compact code. Being able to write a test in very few lines makes it possible not only to create a large number of tests but also to rewrite them as many times as needed. Most of my tests are 1 to 5 lines of code, and that is important. With a big screen I can see a few hundred tests without much scrolling, which makes it much easier to go from test failures back to the testing rules. Also, it does not feel bad to delete a line of code when a rule changes, while deleting 100 lines would feel a bit depressing.
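Here is a self-contained sketch of the same kind of check on toy data. The original snippet does not show which MARC field holds the ATP group. This sketch assumes it is `mtvfp` (the MARC checking group for availability check) and that the expected value for finished products (material type `FERT`) is `'X1'`. The material numbers are invented.

```python
import pandas as pd

def wrong_atp_group(mara, marc, expected="X1"):
    """MARC rows of finished products (MTART 'FERT') whose checking group is not `expected`.

    Assumption: the checking group column is 'mtvfp'; the original post does not show it.
    """
    fert = mara.loc[mara.mtart == "FERT", "matnr"]
    return marc[(marc.mtvfp != expected) & (marc.matnr.isin(fert))]

# Toy material master data (invented values)
mara = pd.DataFrame({"matnr": ["M1", "M2", "M3"], "mtart": ["FERT", "FERT", "ROH"]})
marc = pd.DataFrame({"matnr": ["M1", "M2", "M3"],
                     "werks": ["P001", "P001", "P001"],
                     "mtvfp": ["X1", "02", "02"]})

wrong = wrong_atp_group(mara, marc)
# Inside a unittest.TestCase the check could end with:
#     self.assertTrue(wrong.empty, wrong.to_string())
```

M3 is not flagged even though its checking group differs, because it is a raw material (`ROH`) and the rule only applies to `FERT`.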

And why is testing master data so worthwhile? In my experience master data tests have the best return on investment. Tests take little time to write (5 or 10 minutes), will run dozens of times and will find hundreds (or thousands) of errors. The same tests run in development, regression, pre-production and production systems. Most projects create master data in waves, so the tests also run for multiple waves. And sometimes we need to ask others more than once to get something properly fixed. All of this multiplied adds up to many checks, many errors found and a lot of saved time. During a project, writing master data tests early is like investing in bitcoin when it started: we get a huge return on the time invested.

The code above uses the variables mara and marc, and you may wonder where they come from. They hold two material master database tables, stored in a special data structure called a data frame. To get these data frames from SAP I use an API inspired by Django that we built at Cognitiva, which works like this:

    marc = ReadTable(con).table('marc').filter('werks',['P001','P002']).all()
    mara = ReadTable(con).table('mara').filter('matnr',marc.matnr.unique()).all()

Again it takes only one line of code per table, so it is easy to get data from 50 tables or more.
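The Cognitiva API itself is not shown in the post. As a rough illustration of how such a chainable, Django-style reader can be structured, here is a hypothetical `TableQuery` class. It collects a table name and filters, and delegates the actual read to a `con` callable with the signature `(table, filters) -> DataFrame`. In a real setup that callable would wrap an RFC read against SAP. Here it is a stub over in-memory data.

```python
import pandas as pd

class TableQuery:
    """Hypothetical chainable table reader (not the Cognitiva implementation)."""

    def __init__(self, con):
        self._con = con          # assumed: callable (table_name, filters) -> DataFrame
        self._table = None
        self._filters = []

    def table(self, name):
        self._table = name
        return self

    def filter(self, field, values):
        self._filters.append((field, list(values)))
        return self

    def all(self):
        return self._con(self._table, self._filters)

# In-memory stub standing in for the SAP connection
DATA = {
    "marc": pd.DataFrame({"matnr": ["M1", "M2", "M3"], "werks": ["P001", "P002", "P003"]}),
    "mara": pd.DataFrame({"matnr": ["M1", "M2", "M3"], "mtart": ["FERT", "FERT", "ROH"]}),
}

def fake_con(table, filters):
    df = DATA[table]
    for field, values in filters:
        df = df[df[field].isin(values)]
    return df.reset_index(drop=True)

marc = TableQuery(fake_con).table('marc').filter('werks', ['P001', 'P002']).all()
mara = TableQuery(fake_con).table('mara').filter('matnr', marc.matnr.unique()).all()
```

Each method returns `self`, which is what allows a whole query to be written as a single chained line.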

After master data, I think the next most useful test robots (larger ROI) are the ones that test large calculations depending on complex master data setups. One such example is delivery scheduling, which depends on multiple condition tables and calendars. Another is pricing calculation with multiple, complex discount schemes. Whenever these complex tasks can be isolated, it is easier to build a specific test robot than to test them as part of end-to-end testing. This is what a delivery scheduling test looks like, where the call to schedule is just a wrapper around BAPI_APO_SCHEDULING. These tests are also very quick to write, so it is easy to get extensive coverage.

    def test_std_10_03(self):
        """standard request before cut-off on Wednesday
           GI Wednesday at 18h, delivery Friday at 18h
        """
        items = [('ZDMODE','STD'),('LOCFR','7309'), ...]  # remaining items elided in the original post
        times = self.schedule(20140423100000, items)
        self.assertSchedEqual(times['WADAT'], 20140423180000)
        self.assertSchedEqual(times['LFDAT'], 20140425180000)
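The post does not show `assertSchedEqual`. One possible hypothetical sketch treats the expected values as `YYYYMMDDhhmmss` integers, as in the test above, and reports failures with readable weekdays and times. That makes a wrong calendar or cut-off rule easy to spot.

```python
import unittest
from datetime import datetime

def parse_sap_ts(ts):
    """Convert a YYYYMMDDhhmmss integer timestamp into a datetime."""
    return datetime.strptime(str(ts), "%Y%m%d%H%M%S")

class SchedAssertions(unittest.TestCase):
    """Hypothetical mixin with a scheduling-specific assertion."""

    def assertSchedEqual(self, actual, expected):
        a, e = parse_sap_ts(actual), parse_sap_ts(expected)
        if a != e:
            self.fail(f"scheduled {a:%a %Y-%m-%d %H:%M}, "
                      f"expected {e:%a %Y-%m-%d %H:%M} (difference {a - e})")
```

A scheduling test class could inherit from this mixin instead of directly from `unittest.TestCase`.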

Finally, end-to-end (E2E) test robots are also quite useful. Some years ago E2E testing was mainly about testing the GUI that users would use to perform actions on the system. Nowadays, however, much of the usage of SAP systems happens through interfaces (EDI, e-commerce, point of sale, external warehouse systems, etc.), so automatically testing the interfaces and batch jobs exactly as in production, and replacing GUI actions with equivalent RFC calls, is a good strategy. An example would be replacing the user action in VA02 that removes the delivery block of a sales order with a call to the sales order BAPI that makes the same delivery block change. The latter is technically simpler to automate and good enough to catch errors.
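This sketch shows the VA02 replacement as an RFC call. It assumes a pyrfc-style connection object whose `call(function_name, **params)` method returns a dict, and it uses `BAPI_SALESORDER_CHANGE`, with the `INX` structure flagging which header field changes, followed by a commit. This is not the author's code, and field handling may need adjusting per system. A recording stub allows a dry run without SAP.

```python
def remove_delivery_block(conn, order_id):
    """Clear the header delivery block of a sales order via BAPI, as a VA02 user would.

    `conn` is assumed to behave like pyrfc's Connection: call(name, **params) -> dict.
    """
    result = conn.call(
        "BAPI_SALESORDER_CHANGE",
        SALESDOCUMENT=str(order_id).zfill(10),                   # assumes numeric order numbers
        ORDER_HEADER_IN={"DLV_BLOCK": ""},                       # new value: no delivery block
        ORDER_HEADER_INX={"UPDATEFLAG": "U", "DLV_BLOCK": "X"},  # mark the field as changed
    )
    errors = [r for r in result.get("RETURN", []) if r.get("TYPE") in ("E", "A")]
    if errors:
        conn.call("BAPI_TRANSACTION_ROLLBACK")
        raise RuntimeError("; ".join(r.get("MESSAGE", "") for r in errors))
    conn.call("BAPI_TRANSACTION_COMMIT", WAIT="X")
    return result

class RecordingConn:
    """Stub connection for a dry run: records calls and returns a canned BAPI result."""

    def __init__(self, bapi_return):
        self.calls = []
        self.bapi_return = bapi_return

    def call(self, name, **params):
        self.calls.append((name, params))
        return {"RETURN": self.bapi_return} if name == "BAPI_SALESORDER_CHANGE" else {}
```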

Building E2E test robots requires some time spent creating building blocks, like a DESADV IDOC for the external warehouse, or a TLB queue for orders created from the APO Transport Load Builder. But once these building blocks exist, writing E2E tests is also quick. An example test looks like this:

    def vmi_sales(self, plant, mats, country, routedays=2, channel='04'):
        transport_size = ...  # elided in the original post
        sold_to, ship_to = ...  # elided in the original post
        items = ...  # elided in the original post
        mdoc = self.load_stock(items, plant)
        self.check_material_mov(mdoc, plant, country)
        order_id = self.create_tlb(sold_to, plant, items, ship_to=ship_to)  # further arguments elided in the original post
        time.sleep(15) # wait for CIF processing
        self.check_sales(plant, country, order_id)
        dlv = self.create_delivery(order_id)
        self.check_delivery(dlv, plant, country)
        self.delivery_pgi(dlv, plant)
        return {'order_id':order_id, 'dlv':dlv, 'mdoc':mdoc, 'plant':plant}  # further keys elided in the original post

This test loads stock, creates a VMI sales order through the CIF queue, checks that the sales order is correct, creates and checks the delivery, and posts goods issue (PGI) for the delivery by creating an external WMS IDOC. Finally, the invoice is created by a batch job. Extending this to more complex E2E flows, with transfers between plants and multiple invoices, takes just a few more lines of code. Then each variant of an E2E test can be written in a single line.

    def vmi_sweden(self):
        return self.vmi_sales('P001',[123,321,112,221],'SE',routedays=14)

Because variants are so quick to build, robots can test multiple plants, combinations of transfers through a sequence of plants followed by final sales, multiple material types, partial deliveries, batch splits, etc.
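One way to multiply variants is table-driven, sketched here under stated assumptions. A grid of hypothetical plants, countries with route days, and material sets is expanded with `itertools.product`, and each combination runs as a `subTest`, so one failing variant does not hide the others. `vmi_sales` is replaced by a stand-in that returns a fake result so the sketch runs without SAP. Only P001, SE with 14 route days, and the material numbers come from the post. The other values are invented.

```python
import itertools
import unittest

# Hypothetical variant dimensions (mostly invented for illustration)
PLANTS = ["P001", "P002"]
COUNTRIES = [("SE", 14), ("DE", 2)]
MATERIAL_SETS = [[123, 321], [112, 221]]

def build_variants():
    """Every combination of plant, destination country and material set."""
    return [dict(plant=p, country=c, routedays=d, mats=m)
            for p, (c, d), m in itertools.product(PLANTS, COUNTRIES, MATERIAL_SETS)]

class VmiVariants(unittest.TestCase):
    def vmi_sales(self, plant, mats, country, routedays=2, channel='04'):
        # Stand-in for the real E2E robot; returns a fake result so the sketch runs
        return {'plant': plant, 'country': country}

    def test_all_variants(self):
        for v in build_variants():
            with self.subTest(**v):
                result = self.vmi_sales(v['plant'], v['mats'], v['country'],
                                        routedays=v['routedays'])
                self.assertEqual(result['plant'], v['plant'])
```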

Although the time investment in E2E test robots is higher, in my opinion it is well worth it. Running hundreds or thousands of E2E tests is the best way to be sure we have a robust system, or to know which areas need improvement.

Robots are good. We need more robot testing in SAP projects.


Saturday, June 03, 2017

Using Machine Learning in an SAP project

SAP projects can involve a very complex task of data mapping and data cleaning. Data scientists spend most of their time on this kind of task, called data wrangling, and there is no doubt it needs a lot of effort. In SAP projects it is most often done manually, using extractions and Excel files, in a process that is very likely to be polluted by human errors. As data size and complexity increase, final data quality decreases. This always happens, no matter how strong people are with Excel.

I think a lot from data science can be very effective in SAP projects: not only the tools for building data wrangling tasks, but also the machine learning tools that can be used to extract knowledge from data.

In a recent project the product pricing master data consisted of about 50 thousand entries. This is still a small dataset, but already too large for the user to validate line by line.

With machine learning it was possible to identify errors in the data using outlier detection algorithms. These methods work quite well when there is a lot of data. In this pricing data there were many very similar prices for the same material across different customers, and many similar prices for products from the same product group. This allows the algorithm to identify regions of the space with a high density of entries, and the outliers are the entries outside those regions.
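The post does not name the algorithm used. This sketch swaps in scikit-learn's `LocalOutlierFactor`, a density-based method that scores each entry by how isolated it is compared with its neighbours. Prices are compared on a log scale, so an order-of-magnitude error stands out clearly. The prices and customer ids are invented, and the prices are assumed to be positive and already converted to a base currency. Instead of relying on a fixed threshold, the function ranks entries so the most suspicious ones can be reviewed first.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

def rank_price_outliers(prices, n_neighbors=5):
    """Rank prices by LOF score on a log scale (higher = more isolated from dense regions)."""
    X = np.log10(prices.to_numpy(dtype=float)).reshape(-1, 1)
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    lof.fit(X)
    scores = -lof.negative_outlier_factor_   # close to 1 for entries inside dense regions
    return prices.to_frame("price").assign(lof=scores).sort_values("lof", ascending=False)

# Invented prices for one material across customers; the last looks like a quantity-factor error
prices = pd.Series([9.8, 10.0, 10.1, 10.2, 9.9, 10.05, 9.95, 1000.0],
                   index=[f"C{i}" for i in range(8)])
ranked = rank_price_outliers(prices)
```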

The picture below illustrates what the algorithm does: the areas with a high density of points are what the algorithm sees as normal values, and everything outside them is an outlier.

In this specific case machine learning uncovered a large number of outliers, and after analysis it was possible to identify several causes:

  • prices can be maintained with a quantity factor (e.g., a price for 1 unit or a price for 100 units); wrong quantity factors produced prices that were off by orders of magnitude
  • prices can be maintained in different currencies (all were converted to a base currency before running the algorithm), and in some cases the price was calculated in one currency but then maintained in a different one
  • there were software bugs in the code that read the upload data format
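The first two causes disappear once every price is expressed per single unit in one currency. That is also the comparison the outlier detection needs. This sketch assumes KONP-style columns: `kbetr` for the amount, `kpein` for the pricing unit (for example 1 or 100), and `konwa` for the currency. The conversion rates are hypothetical placeholders, not real rates.

```python
import pandas as pd

# Hypothetical conversion rates to the base currency (EUR); real ones come from the system
RATES_TO_EUR = {"EUR": 1.0, "SEK": 0.1, "USD": 0.8}

def unit_price_in_base(conditions, rates=RATES_TO_EUR):
    """Price per single unit in the base currency.

    Assumes columns kbetr (amount), kpein (pricing unit) and konwa (currency).
    """
    per_unit = conditions["kbetr"] / conditions["kpein"]
    return per_unit * conditions["konwa"].map(rates)

# Invented condition records: the same 10 EUR unit price maintained three different ways
conditions = pd.DataFrame({"kbetr": [10.0, 1000.0, 100.0],
                           "kpein": [1, 100, 1],
                           "konwa": ["EUR", "EUR", "SEK"]})
unit_prices = unit_price_in_base(conditions)
```

A currency missing from the rate table maps to NaN. This makes such records visible instead of silently converting them.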

Machine learning made it possible to quickly extract the few hundred erroneous entries from the dataset, which simplified the correction work. Because it is such a generic tool, it can be used with any data to find entries that are outliers. And if, after inspection, the outliers turn out to be correct entries, we can have more confidence in the data quality.

Master data quality is a big problem. Machine learning will not magically solve all master data issues, but it is a strong tool to help with them.

Saturday, August 20, 2016

Using deep learning models on Nespresso images

One great resource in machine learning is pre-trained neural networks for image processing. While training a deep network is complex and needs large amounts of data, using pre-trained models is as easy as using functions from a software library.

Just for fun, I picked a few images from the Nespresso webshop and used the pre-trained VGG19 network to find a way to sort the images by similarity. This needs just two steps. The first is to get the network layer outputs for each image.

    base_model = VGG19(include_top=True, weights='imagenet')
    model = Model(input=base_model.input,
    img_features = np.vstack([model.predict(img).flatten() for img in imglst])

The network returns a feature vector for each image. PCA then reduces each vector to a single dimension, which gives the sort order.

    from sklearn.decomposition import PCA

    pca = PCA(n_components=1)
    img_score = pca.fit_transform(img_features)
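Turning `img_score` into an actual ordering is an `argsort` on the single PCA component, as in this small sketch. Toy feature vectors stand in for the VGG19 outputs, and the image names are hypothetical. The sign of a principal component is arbitrary, so the resulting order may come out reversed. Both directions are equally valid similarity orderings.

```python
import numpy as np
from sklearn.decomposition import PCA

def sort_by_first_component(features, names):
    """Order items by their projection on the first principal component of the features."""
    score = PCA(n_components=1).fit_transform(features).ravel()
    order = np.argsort(score)
    return [names[i] for i in order]

# Toy stand-ins for image feature vectors: points spread along one direction
names = ["a", "b", "c", "d"]
t = np.array([3.0, 0.0, 2.0, 1.0])
features = np.outer(t, [1.0, 2.0, 3.0])
ordered = sort_by_first_component(features, names)  # b, d, c, a (or the reverse)
```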

The result is the following sort order for the images (sorted from top to bottom, left to right).

Not perfect but very close, don't you think?

Tuesday, January 19, 2016


I recently published a post on SAPYard about the SAP Debugging course that I created for Academy OnSap. SAPYard is a great site, full of useful information for SAP consultants and users. I am very grateful for their help spreading the word about the SAP debugging course, and I also want to highlight their great content.

For example, they have a series of tutorials about Web Dynpro, a topic that is fundamental for programmers nowadays. Still on ABAP, you may also find this more advanced post about DELETE on internal tables interesting. They also have a lot of content and expertise on SAP HANA (e.g., HANA Views).

There are amazing SAP resources outside the walls of the SAP company. Official SAP portals are good and important but in my opinion the quality of the independent community is what makes the SAP ecosystem so great.