Federation, Replication and Transformation: Data Integration with SAP Datasphere


SAP’s data and analytics portfolio increasingly leans on SAP Datasphere rather than Data Intelligence for integration and connectivity. With the gradual phase-out of Data Intelligence, whose maintenance ends by the end of 2026 (on-premise) or 2028 (cloud), we at Expertum notice that customers need more clarity on their data integration options. In this blog, I will touch upon some of the key options that SAP Datasphere offers today, and what we might expect from the solution in the (near) future.

Methods for connecting

From its inception as Data Warehouse Cloud, SAP has always positioned Datasphere (DSP) as a solution capable of facilitating deep and diverse integration scenarios. Today, there are several main directions you can look in to realise these with your SAP Datasphere system.

Standard (built-in) connectors are the primary method of connecting other (SAP or partner) systems with DSP. For connections that are not available out of the box, Open Connectors (through SAP’s Integration Suite) offer a viable alternative. These work both for SAP solutions that are not yet supported and for non-SAP solutions such as Precog and Oracle (through API-like configuration). Note that by the end of 2024, new Open Connectors will no longer be supported, and existing connectors will move to Cloud Integration, which will require a manual migration on the user’s end.

Finally, the Data Marketplace within Datasphere allows users to exchange data products with other users of DSP. This results in a platform with a lot of potential, although it does not necessarily offer the data that you need for your specific business case.
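To make the Open Connectors route a bit more tangible: once a connector instance is provisioned, it is addressed through a harmonised REST API. Below is a minimal sketch in Python; the tenant URL, the "accounts" resource and all tokens are placeholders that depend on your own Integration Suite setup.

```python
import requests

# Base URL of an Open Connectors tenant (region-specific; placeholder here).
BASE_URL = "https://api.openconnectors.ext.hana.ondemand.com/elements/api-v2"

# Open Connectors authenticates with a combined header carrying three tokens:
# the organization and user secrets plus the token of the connector instance.
HEADERS = {
    "Authorization": "User <user-secret>, Organization <org-secret>, Element <instance-token>",
    "Accept": "application/json",
}

# Fetch records from the connected source; the 'accounts' resource name is
# hypothetical and depends on the connector you instantiated.
response = requests.get(f"{BASE_URL}/accounts", headers=HEADERS, params={"pageSize": 50})
response.raise_for_status()

for record in response.json():
    print(record)
```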

A technical perspective

If we look at integration from a more technical perspective, we can distinguish two (traditional) methods of data access: federation (virtual access) and replication (persistent access). There are several objects within SAP Datasphere that facilitate either or both approaches.

Remote Tables are based on Smart Data Access (SDA) and Smart Data Integration (SDI, which accesses remote sources through the dpServer), technologies that we are quite familiar with by now. Remote Tables can be used for federation (the data stays in the source system, with no upfront data movement) or replication (replicate data in real time through change data capture, or as scheduled snapshots via task chains). Note that it is also possible to partition Remote Tables into smaller objects. Once the data is available, it can also be consumed in other DSP objects such as (Graphical) Views or Analytic Models.
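One appealing property is that consumers query a remote table the same way regardless of whether it is federated or replicated. Below is a minimal sketch using the hdbcli driver with a Datasphere database user; the host, credentials, space and table names are all placeholders.

```python
from hdbcli import dbapi  # SAP HANA client for Python (pip install hdbcli)

# Connect with a Datasphere database analysis user; all credentials and
# names below are placeholders.
conn = dbapi.connect(
    address="<tenant>.hanacloud.ondemand.com",
    port=443,
    user="<SPACE>#ANALYSIS_USER",
    password="<password>",
    encrypt=True,
)

cursor = conn.cursor()
# The same SELECT works whether SALES_ORDERS is federated (fetched live
# from the source via SDA/SDI) or replicated into Datasphere.
cursor.execute('SELECT TOP 10 * FROM "<SPACE>"."SALES_ORDERS"')
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```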

Data Flows can be used for the traditional ETL process, from sources either external or internal to Datasphere. Here, you can use the standard set of operators, complemented with SQL expressions and Python (Pandas and NumPy), to load the data into your Datasphere target. Data Flows also work with Open Connectors that you configure in Datasphere yourself.
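To give an impression of the Python option: the script operator in a Data Flow hands the incoming records to a transform function as a Pandas DataFrame and expects a DataFrame in return. A minimal sketch, with entirely hypothetical column names:

```python
import numpy as np
import pandas as pd

# In a Data Flow script operator, Datasphere passes the incoming records to
# transform() as a Pandas DataFrame and expects a DataFrame back.
def transform(data: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical wrangling: normalise a text column and derive a margin,
    # guarding against division by zero.
    data["PRODUCT"] = data["PRODUCT"].str.strip().str.upper()
    data["MARGIN"] = np.where(
        data["REVENUE"] > 0,
        (data["REVENUE"] - data["COST"]) / data["REVENUE"],
        np.nan,
    )
    # Drop records without revenue before loading them into the target.
    return data[data["REVENUE"].notna()]
```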

Last but not least, Transformation Flows use SQL-based logic to apply various data wrangling and transformation operations to local tables in Datasphere. It is also possible to use Transformation Flows in the context of BW Bridge and Open SQL schemas. You can apply aggregations, joins, functions and more to data that is stored within your solution. The support for initial loads is already quite advanced, but the options for delta capture (which works through an internal table) are limited, especially for external delta loads. If you want to load data into non-SAP systems in general, the way to go is Premium Outbound Integration, about which my colleague Rogier wrote a blog two weeks ago.
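To illustrate the kind of SQL-based logic a Transformation Flow applies, here is a sketch of a join plus aggregation, prototyped against an Open SQL schema via hdbcli before modelling it as a flow. All schema, table and connection details are placeholders.

```python
from hdbcli import dbapi

# SQL of the kind a Transformation Flow's view might contain: a join plus
# an aggregation. All table and column names are placeholders.
TRANSFORM_SQL = """
    SELECT o."CUSTOMER_ID",
           c."COUNTRY",
           SUM(o."REVENUE") AS "TOTAL_REVENUE",
           COUNT(*)         AS "ORDER_COUNT"
    FROM "SALES_ORDERS" AS o
    JOIN "CUSTOMERS"    AS c
      ON c."CUSTOMER_ID" = o."CUSTOMER_ID"
    GROUP BY o."CUSTOMER_ID", c."COUNTRY"
"""

# Test the logic against an Open SQL schema first; connection details
# are placeholders.
conn = dbapi.connect(address="<tenant>.hanacloud.ondemand.com", port=443,
                     user="<SPACE>#SQL_USER", password="<password>", encrypt=True)
cursor = conn.cursor()
cursor.execute(TRANSFORM_SQL)
print(cursor.fetchmany(10))
cursor.close()
conn.close()
```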

Note that if you look at Datasphere’s current (June 2024) roadmap, you will see that SAP plans to support Lakehouse architectures in the context of data integration in the coming quarters. How this translates to each of the ‘flows’ within Datasphere is something to keep an eye on, especially if you have (for example) a Databricks solution in your landscape.

Task chains

Task chains in Datasphere are comparable to process chains in BW(/4HANA). They can be used to combine different actions within DSP into a single workflow. Task chains are developing at a fast pace and already support Data Flows and Transformation Flows (to a limited degree), Intelligent Lookup, view persistency, Remote Table replication and even other task chains in the form of nesting. Combine this with a set of operators, run options (e.g. parallel, sequential or scheduled down to the minute) and email notifications (e.g. for errors during initialization or processing), and you are looking at an increasingly powerful function set. If you want to know more about task chains, refer to SAP Note 3018945 or the official SAP documentation.

An example of a Task Chain in SAP Datasphere.

Data Intelligence conversion & wrap-up

In this blog, I’ve briefly walked you through the integration options SAP Datasphere currently offers. As SAP’s flagship data warehouse solution, DSP is constantly being updated and its roadmap is very much in motion. While its initial years, especially as SAP Data Warehouse Cloud, were rather erratic in the face of its competition and the rest of SAP’s (legacy) solution portfolio, DSP has now found solid ground and is evolving in a direction where it can (start to) meet customer expectations and ambitions.

Without a doubt, Datasphere has taken significant steps in both the data catalog and the various integration flows to close the gap with Data Intelligence. In spite of this, we still see some areas of concern. With DI’s Machine Learning Scenarios moving to either the HANA Predictive Analysis Library (PAL) or SAP AI Core, it remains to be seen whether all data flow and pipeline features from DI will eventually be covered in Datasphere. The latter, which concerns REST service exposure, more complex ETL scenarios and system orchestration, should be covered by one of the three Datasphere ‘flows’ we discussed in this blog; though I’m sure more advanced DI data engineers will spot some limitations right off the bat (e.g. with regard to the available Python libraries or the lack of R support). Performance tuning of data replication and processing in general is another topic in which Datasphere is quite limited compared with the options customers currently have in Data Intelligence.

All in all, I believe that the transition from DI to DSP will require a healthy degree of attention from customers: SAP offers no fully automated migration, but does provide some guidance for making the transition over time. Understanding your existing Data Intelligence constructs and translating them to their Datasphere equivalents (or the closest thing to them) sooner rather than later will be the key to a successful migration.

If you wonder where to go with your SAP BI solutions or if you need advice regarding your (SAP) data landscape, don’t hesitate to contact us!

*All images shown are either property of or adapted from SAP SE.


About the author

Lars van der Goes

Lars van der Goes is SAP Data & Analytics Lead at Expertum. Lars combines strong functional skills with broad knowledge of Analytics principles, products and possibilities.