Wednesday, September 13, 2017

Testing SAP in the age of the robot

Like everywhere else in the software industry, in SAP testing is the key to keeping things working in a stable and robust way. The main difference is that while others evolve their systems on top of a large set of automated tests, SAP projects commonly depend on people performing tests manually.

We could dig for historical and technical reasons why things are done manually in SAP, but that would just be looking for excuses. The reality is that we can, and should, build testing robots for SAP.

I believe the best way to explain how this can be so effective is to show some examples. I will start with a master data test example.

    def test_atp_group(self):
        # finished goods (MTART = FERT) must have ATP group X1 in the plant data;
        # MARC-MTVFP is the availability check (ATP) group field
        fert = mara[mara.mtart == 'FERT']
        wrong = marc[(marc.mtvfp != 'X1') & (marc.matnr.isin(fert.matnr))]
        self.assertEqual(len(wrong), 0, wrong.matnr.tolist())

So what is so special here? First, the code is very short: it takes advantage of Python data science tools to stay very compact. Being able to write tests in very few lines makes it possible not only to create a large number of tests but also to rewrite tests as many times as needed. Most of my tests are 1 to 5 lines of code, and that matters. With a big screen I can see a few hundred tests without much scrolling, and that makes it much easier to go back from test failures to the testing rules. It also does not feel bad to delete a line of code when a rule changes, while deleting 100 lines would feel a bit depressing.
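
To make the pattern concrete, here is a self-contained version of such a check with invented sample data; only the pandas idiom follows the real tests, the table contents are made up:

```python
import pandas as pd

# Invented sample data mimicking MARA (material master) and MARC (plant data)
mara = pd.DataFrame({'matnr': ['M1', 'M2', 'M3'],
                     'mtart': ['FERT', 'FERT', 'ROH']})
marc = pd.DataFrame({'matnr': ['M1', 'M2', 'M3'],
                     'werks': ['P001', 'P001', 'P001'],
                     'mtvfp': ['X1', '02', '02']})  # availability check (ATP) group

# One line finds every finished good whose ATP group is not X1
fert = mara[mara.mtart == 'FERT']
wrong = marc[(marc.mtvfp != 'X1') & (marc.matnr.isin(fert.matnr))]
print(wrong.matnr.tolist())   # prints ['M2']
```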

And why is testing master data good? My experience shows that master data tests have the best return on investment. A test takes little time to write (5 or 10 minutes), will run dozens of times and will find hundreds (or thousands) of errors. The same tests run in development, regression, pre-production and production systems. Most projects create master data in waves, so tests will also run for multiple waves. And sometimes we need to ask others more than once to get something properly fixed. All this multiplies into many checks, many errors found and a lot of time saved. In a project, writing master data tests early is like investing in bitcoin when it started: we get a huge return on the time invested.

The code above uses mara and marc variables, and you may wonder how they get there. These are the two database tables for the material master, stored in a special data structure called a data frame. To get these data frames from SAP data I use an API inspired by Django, which we built at Cognitiva, that works like this:

    marc = ReadTable(con).table('marc').filter('werks',['P001','P002']).all()
    mara = ReadTable(con).table('mara').filter('matnr',marc.matnr.unique()).all()

Again, it takes only one line of code per table, so it is quite easy to get data from 50 tables or more.
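
The Cognitiva API itself is not public, so here is only a minimal sketch of how such a Django-inspired fluent reader could look on top of the standard RFC_READ_TABLE function module; the class internals, the delimiter handling and the pyrfc-style connection are all assumptions:

```python
import pandas as pd

class ReadTable:
    """Minimal sketch of a fluent, Django-inspired reader for SAP tables.
    Only the usage pattern follows the post; the internals are illustrative."""

    def __init__(self, con):
        self.con = con           # e.g. a pyrfc.Connection
        self._table = None
        self._filters = []

    def table(self, name):
        self._table = name
        return self

    def filter(self, field, values):
        self._filters.append((field, list(values)))
        return self

    def _options(self):
        # build the WHERE clause that RFC_READ_TABLE expects in OPTIONS
        clauses = []
        for field, values in self._filters:
            ors = " OR ".join("{0} EQ '{1}'".format(field.upper(), v)
                              for v in values)
            clauses.append("( {0} )".format(ors))
        return " AND ".join(clauses)

    def all(self):
        # note: each OPTIONS line is limited to 72 characters, so a real
        # implementation must split long clauses across several lines
        result = self.con.call('RFC_READ_TABLE',
                               QUERY_TABLE=self._table.upper(),
                               DELIMITER='|',
                               OPTIONS=[{'TEXT': self._options()}])
        cols = [f['FIELDNAME'].lower() for f in result['FIELDS']]
        rows = [r['WA'].split('|') for r in result['DATA']]
        return pd.DataFrame(rows, columns=cols)
```

With this shape, `ReadTable(con).table('marc').filter('werks', ['P001','P002']).all()` reads exactly like the one-liners above.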

After master data, I think the next most useful test robots (largest ROI) are those that test large calculations depending on complex master data setups. One example is delivery scheduling, which depends on multiple condition tables and calendars. Another is pricing calculation when there are multiple, complex discount schemes. Whenever these complex tasks can be isolated, it is easier to build a specific test robot than to try to cover them as part of end-to-end testing. This is what a delivery scheduling test looks like, where the call to schedule is just a wrapper around BAPI_APO_SCHEDULING. Tests are also very quick to write, so it is easy to get extensive coverage.

    def test_std_10_03(self):
        """standard request before cut-off on Wednesday
           GI Wednesday at 18h, delivery Friday at 18h"""
        items = [('ZDMODE', 'STD'), ('LOCFR', '7309')]
        times = self.schedule(20140423100000, items)
        self.assertSchedEqual(times['WADAT'], 20140423180000)
        self.assertSchedEqual(times['LFDAT'], 20140425180000)

And finally, end-to-end (E2E) test robots are also quite useful. Some years ago E2E testing was mainly about testing the GUI that users would use to perform actions on the system. But nowadays much of the usage of an SAP system happens through interfaces (EDI, e-commerce, point of sale, external warehouse systems, etc.), so testing the interfaces and batch jobs exactly as in production, and replacing the GUI with equivalent RFC calls, is a good strategy. An example would be replacing the action a user takes in VA02 to remove the delivery block of a sales order with a call to the sales order BAPI that makes the same change in the delivery block. The latter option is technically simpler to automate and good enough to catch errors.
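
As a sketch of that idea: BAPI_SALESORDER_CHANGE is the standard BAPI for changing a sales order, and the delivery block sits in the header structure (field DLV_BLOCK). The helper below only builds the call parameters; both the helper and the pyrfc usage shown in the comment are illustrative, not the post's actual code.

```python
def unblock_order_params(vbeln):
    """Build parameters for BAPI_SALESORDER_CHANGE that clear the
    delivery block of sales order vbeln (illustrative helper)."""
    return dict(
        SALESDOCUMENT=vbeln,
        ORDER_HEADER_IN={'DLV_BLOCK': ''},      # blank value removes the block
        ORDER_HEADER_INX={'UPDATEFLAG': 'U',    # 'U' = update existing document
                          'DLV_BLOCK': 'X'},    # 'X' marks the changed field
    )

params = unblock_order_params('0000012345')

# With a live pyrfc connection this would be roughly:
#   con.call('BAPI_SALESORDER_CHANGE', **params)
#   con.call('BAPI_TRANSACTION_COMMIT', WAIT='X')
```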

To build E2E test robots some time is first spent creating building blocks, like a DESADV IDOC for an external warehouse, or a TLB queue for orders created from the APO transport load builder. But once these building blocks exist, writing E2E tests is also quick; an example test looks like:

    def vmi_sales(self, plant, mats, country, routedays=2, channel='04'):
        # the helper calls on the next three lines are illustrative
        # reconstructions; the original right-hand sides were truncated
        transport_size = self.transport_size(plant)
        sold_to, ship_to = self.vmi_partners(plant, country)
        items = self.build_items(mats, transport_size)
        mdoc = self.load_stock(items, plant)
        self.check_material_mov(mdoc, plant, country)
        order_id = self.create_tlb(sold_to, plant, items, ship_to=ship_to,
                                   routedays=routedays, channel=channel)
        time.sleep(15)  # wait for CIF processing
        self.check_sales(plant, country, order_id)
        dlv = self.create_delivery(order_id)
        self.check_delivery(dlv, plant, country)
        self.delivery_pgi(dlv, plant)
        return {'order_id': order_id, 'dlv': dlv, 'mdoc': mdoc, 'plant': plant,
                'country': country}

This test loads stock, creates a VMI sale through the CIF queue, checks sales correctness, creates and checks the delivery, and posts goods issue (PGI) on the delivery by creating an external WMS IDOC. Finally, the invoice is created by a batch job. Extending this to more complex E2E flows, with transfers between plants and multiple invoices, is just a few more lines of code. Then, for each E2E flow, writing variants can be just one line of code.

    def vmi_sweden(self):
        return self.vmi_sales('P001',[123,321,112,221],'SE',routedays=14)

Because it is so quick to build variants, robots can test multiple plants, combinations of transfers across sequences of plants followed by final sales, multiple types of materials, partial deliveries, batch splits, etc.

Although the time investment in E2E test robots is higher, in my opinion it is well worth it. Running hundreds or thousands of E2E tests is the best way to be sure we have a robust system, or to know which areas need improvement.

Robots are good. We need more robot testing in SAP projects.


Saturday, June 03, 2017

Using Machine Learning in a SAP project

SAP projects can involve a very complex task of data mapping and data cleaning. People in data science spend most of their time on this task, called data wrangling. There is no doubt a lot of effort is needed at this stage. Most of the time in SAP projects this task is done manually, using extractions and Excel files, in a process that is very likely to be polluted by human errors. As data size and complexity increase, final data quality decreases. This happens always, no matter how strong people are with Excel.

I think there is a lot from data science that can be very effective in SAP projects. Not only the tools for building the data wrangling tasks but also the machine learning tools that can be used to extract knowledge from data.

In a recent project the product pricing master data consisted of about 50 thousand entries. This is still a small dataset, but already too large for the user to validate line by line.

With machine learning it was possible to identify the errors in the data using outlier detection algorithms. These methods work quite well when there is a lot of data. In this pricing data there were many very similar prices for the same material across different customers, and many similar prices for products from the same product group. This lets the algorithm identify areas of the space with a high density of entries; outliers are the elements outside those areas.

The picture below illustrates what the algorithm does, the areas with high density of points are what the algorithm sees as normal values and everything else outside are the outliers.
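
The post does not name the exact algorithm, so here is only a minimal numpy sketch of the density idea: flag points whose distance to their k-th nearest neighbour is far above typical, i.e. points lying outside the high-density areas. The prices are synthetic.

```python
import numpy as np

def knn_outliers(x, k=20, quantile=0.99):
    """Flag points whose distance to their k-th nearest neighbour is far
    above typical, i.e. points lying outside the high-density areas."""
    d = np.abs(x[:, None] - x[None, :])     # pairwise distances (1-D prices)
    kdist = np.sort(d, axis=1)[:, k]        # distance to the k-th neighbour
    return np.where(kdist > np.quantile(kdist, quantile))[0]

rng = np.random.default_rng(0)
normal = rng.normal(10.0, 0.5, size=200)    # dense cluster of plausible prices
wrong = np.array([1000.0])                  # e.g. a wrong quantity factor
prices = np.concatenate([normal, wrong])
print(knn_outliers(prices))                 # index 200 (the wrong price) is flagged
```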

In this specific case machine learning uncovered a large number of outliers, and after analysis it was possible to identify several causes:

  • prices can be maintained with a quantity factor (e.g. a price for 1 unit, or a price for 100 units); wrong quantity factors produced prices that were orders of magnitude off
  • prices can be maintained in different currencies (which were all converted to a base currency before using the algorithm) and there were cases where the price was calculated in a currency but then maintained in a different currency
  • there were software bugs in reading the upload data format

Using machine learning made it possible to quickly extract the few hundred error cases from the dataset, which simplified the correction activity. Because it is such a generic tool, it can be used with any data to find outlier entries. If, after inspection, the outliers turn out to be correct entries, then we can have more confidence in the data quality.

Master data quality is a big problem. Machine learning will not magically solve all master data issues, but it is a strong tool to help on that.

Saturday, August 20, 2016

Using deep learning models on Nespresso images

One great resource in machine learning is pre-trained neural networks for image processing. While training a deep network is complex and needs large amounts of data, using pre-trained models is as easy as using functions from a software library.

Just for fun, I picked a few images from the Nespresso webshop and used the VGG19 pre-trained network with the goal of sorting the images by similarity. This needs just two steps; the first is to get the network's layer outputs for each image.

    base_model = VGG19(include_top=True, weights='imagenet')
    model = Model(inputs=base_model.input,
                  outputs=base_model.get_layer('fc2').output)  # e.g. the fc2 layer
    img_features = np.vstack([model.predict(img).flatten() for img in imglst])

The network returns a vector for each image; then, using PCA, the vector is reduced to a single dimension, which gives the sort order.

    pca = PCA(n_components=1)
    img_score = pca.fit_transform(img_features)
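
The only missing step is turning the 1-D scores into a display order, e.g. with argsort; in this self-contained sketch the feature vectors are random stand-ins for the VGG19 outputs:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
img_features = rng.normal(size=(10, 64))     # stand-in for the VGG19 features

pca = PCA(n_components=1)
img_score = pca.fit_transform(img_features)  # one score per image
order = np.argsort(img_score[:, 0])          # display order: similar images adjacent
```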

The result is the following sort order for the images (sorted from top to bottom, left to right).

Not perfect but very close, don't you think?

Tuesday, January 19, 2016


I recently published a post on SAPYard about the SAP Debugging course that I created for Academy OnSap. SAPYard is a great site, full of useful information for SAP consultants and users. I am very grateful for their help spreading the word about the SAP debugging course, and I also want to highlight that they have great content.

For example, they have a series of tutorials about Web Dynpro, a topic that nowadays is fundamental for programmers. Still on ABAP, you will also find interesting this more advanced post about deleting from internal tables. And they also have a lot of content and expertise on SAP HANA (e.g., HANA Views).

There are amazing SAP resources outside the walls of the SAP company. Official SAP portals are good and important but in my opinion the quality of the independent community is what makes the SAP ecosystem so great.

Monday, June 15, 2015

An online course on SAP debugging

I think debugging is one of the most important skills for advanced work on SAP. At least in my experience, when things get hard it is either the debugger or data analysis that helps find the solution. As far as I know there are not many resources to learn and improve the practice of debugging. So, as an attempt to improve this, I worked on assembling a mini course on SAP Debugging, following the patterns of modern MOOCs.

I think it turned out quite OK, and it was fun; I will probably be doing more of these. You can find the debugging course here.

The course is for beginners, but it also includes some advanced topics that can be interesting to experienced people. And it has some assignments that should not take much time and (hopefully) will be fun.

Looking forward to seeing you in class!