Digitalization is changing the way business is done and affects companies in every industry and consumers worldwide. At its core, digitalization transforms analogue values into digital formats. As a result, more and more data is being created. A recent study by the International Data Corporation (IDC) illustrates this: it forecasts that the worldwide volume of data will grow from 33 zettabytes in 2018 to 175 zettabytes in 2025.

The effect of digitalization is particularly noticeable within companies. Most of their business processes are performed using IT, so the majority of process-relevant information exists in digital form and can only be accessed using IT. This data, for example data created as part of a business process, usually contains information that internal auditing has to access as part of its audits. A major part of many audits still consists of drawing a sample from the data for evaluation, mainly because of the large volume of data. The findings based on the sample are then generalized to the complete dataset. This carries the so-called “sampling risk”: there may be problems in the data that was not part of the sample. To remove this “sampling risk”, assure the validity of audit results and thus better protect companies, new data science approaches that can consider the complete data of a business process in an audit have to be pursued.
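
To make the “sampling risk” concrete, the following minimal sketch computes the probability that a simple random sample, drawn without replacement, contains none of the problematic records in a dataset. The function name and the example figures (10,000 records, 5 problematic ones, a sample of 100) are illustrative assumptions and do not come from the project; real audit sampling schemes may differ.

```python
from math import comb


def miss_probability(population: int, problematic: int, sample_size: int) -> float:
    """Probability that a simple random sample (without replacement)
    contains none of the problematic records.

    This is the hypergeometric probability of drawing zero problematic
    records: C(N - k, n) / C(N, n).
    """
    if not 0 <= problematic <= population:
        raise ValueError("problematic must be between 0 and population")
    if not 0 <= sample_size <= population:
        raise ValueError("sample_size must be between 0 and population")
    # math.comb returns 0 when the sample is larger than the number of
    # unproblematic records, i.e. the sample cannot avoid every problem.
    return comb(population - problematic, sample_size) / comb(population, sample_size)


# Hypothetical figures, chosen only for illustration.
sampled = miss_probability(population=10_000, problematic=5, sample_size=100)
full = miss_probability(population=10_000, problematic=5, sample_size=10_000)
print(f"Sample of 100: probability of missing every problem = {sampled:.3f}")
print(f"Full population: probability of missing every problem = {full:.3f}")
```

Under these assumed figures the sample would most likely contain none of the problematic records, whereas testing the full population cannot skip any of them. Whether a problematic record is then actually recognized depends on the analysis method, which this sketch does not model.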

Goal and Approach

The goal of the research cooperation DIfA (Data Intelligence for Audit) is to develop data-driven methods and approaches for internal auditing that enable the identification of process weaknesses, inconsistencies, manipulations and fraud in business processes. This supports the transition from a sampling-based audit approach to full population testing and shows how new data science approaches can be applied in an internal auditing context.

Since internal auditing considers both structured data (e.g. database tables) and unstructured data (e.g. PDF documents), the approach to the research goal is divided into two parts: the hypothesis-free analysis of structured data and natural language processing for the analysis of unstructured data.

The research project DIfA is carried out in cooperation with Volkswagen AG and is scheduled to run for three years. During this time, students will have the opportunity to write bachelor’s and master’s theses within the project.