Federation, replication and transformation: Data Integration with SAP Datasphere

 

SAP’s data and analytics portfolio is increasingly leaning on SAP Datasphere rather than Data Intelligence for integration and connectivity. With the gradual phase-out of Data Intelligence, whose maintenance ends at the end of 2026 (on-premise) or 2028 (cloud), we at Expertum notice that customers need more clarity on the data integration options available to them. In this blog, I will touch upon some of the key options that SAP Datasphere offers today, and what we might expect from the solution in the (near) future.

Methods for connecting

From its inception as Data Warehouse Cloud, SAP has always positioned Datasphere (DSP) as a solution capable of facilitating deep and diverse integration scenarios. Today, there are several main directions you can explore with your SAP Datasphere system. Standard (built-in) connectors are the primary method of connecting other (SAP or partner) systems with DSP. For connections that are not available out of the box, Open Connectors (through SAP’s Integration Suite) offer a viable alternative; these work both for SAP solutions that are not yet supported and for non-SAP solutions such as Precog and Oracle (through API-like configuration). Note that by the end of this year, new Open Connectors will no longer be supported and existing connectors will move to Cloud Integration, which will require a manual migration on the user's end. Finally, the Data Marketplace within Datasphere allows users to exchange data products with other DSP users, a platform with a lot of potential that nevertheless may not offer the data you need for your specific business case.

A technical perspective

If we look at integration from a more technical perspective, we can distinguish two (traditional) methods of data access: federation (virtual access) and replication (persistent access). There are several objects within SAP Datasphere that facilitate either or both approaches.

Remote Tables are based on Smart Data Access (SDA) and Smart Data Integration (SDI, which accesses remote sources through the dpServer), technologies that we are quite familiar with by now. Remote Tables support either federation (the data stays in the source system, with no upfront data movement) or replication (replicate in real time through change data capture, or load snapshots on a schedule via task chains). Note that it is also possible to partition Remote Tables into smaller objects. After replication, the data can also be consumed in other DSP objects such as (Graphical) Views or Analytic Models.

Data Flows can be used for the traditional ETL process, with sources either external or internal to Datasphere. Here, you can use the standard set of operators, complemented with SQL expressions and Python (Pandas and NumPy), to load the data into your Datasphere target. Data Flows also work with Open Connectors that you might configure in Datasphere yourself.
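
To give a feel for what such a script operator can do, here is a minimal Pandas sketch. It assumes (as an illustration, not a definitive rendering of the operator contract) that the operator hands each incoming batch to a transform function as a DataFrame and expects the transformed DataFrame back; all column names (GROSS_AMOUNT, TAX_RATE, ORDER_ID) are hypothetical.

```python
import pandas as pd
import numpy as np

def transform(data: pd.DataFrame) -> pd.DataFrame:
    # Derive a net amount from hypothetical gross and tax-rate columns.
    data["NET_AMOUNT"] = data["GROSS_AMOUNT"] / (1 + data["TAX_RATE"])

    # Use NumPy to bucket records, e.g. for downstream filtering.
    data["SIZE_CLASS"] = np.where(data["NET_AMOUNT"] > 10000, "LARGE", "SMALL")

    # Drop rows without a key; the (hypothetical) target table requires one.
    return data.dropna(subset=["ORDER_ID"])
```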


Based on SAP Data Intelligence (DI) technology (DI Replication Flows), Replication Flows are relatively new to Datasphere and allow you to replicate data from a source to a target system, with or without persisting this data in DSP itself. In the first use case, you store the incoming data in local tables in Datasphere, which can also be partitioned where necessary. The second use case does not store data in Datasphere at all and eliminates the need for a middleware solution such as a data provisioning agent. You can monitor the entire load through Datasphere’s Data Integration Monitor and capture changes through the initial/delta setup (note that there are limitations for deltas). Although the list of supported systems is growing, with some new additions planned for Q3 of this year (e.g. Confluent Kafka, Google BigQuery), your options are currently still limited (e.g. SAP’s flagship business solutions, MS Azure SQL), so make sure to validate your requirements before counting on Replication Flows. Finally, Q3 will also bring support for cloud object stores as a source for Replication Flows, a very welcome update.


Last but not least, Transformation Flows use SQL-based logic to apply various data wrangling and transformation operations to local tables in Datasphere. It is also possible to use Transformation Flows in the context of BW Bridge and Open SQL schemas. You can apply aggregations, joins, functions and more to data that is stored within your solution. The support for initial loads is already quite advanced, but the options for delta capture (which works through an internal table) are limited, especially for external delta loads. If you want to load data into non-SAP systems in general, the way to go is the Premium Outbound Integration, about which my colleague Rogier wrote a blog two weeks ago.
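
To illustrate the kind of join and aggregation logic a Transformation Flow applies, here is a hedged Python sketch that runs comparable SQL against a space's Open SQL schema using SAP's hdbcli driver. To be clear: a Transformation Flow itself is built in the Datasphere modeling UI, not via a client script; this only shows the style of SQL involved, and the connection details, schema and table names (MYSPACE, SALES, PRODUCTS) are hypothetical.

```python
from hdbcli import dbapi  # SAP's HANA Python driver (pip install hdbcli)

# Hypothetical connection details for a space's Open SQL schema;
# in practice these come from a database user created in Datasphere.
conn = dbapi.connect(
    address="<tenant>.hanacloud.ondemand.com",
    port=443,
    user="MYSPACE#ETL_USER",
    password="...",
    encrypt=True,
)

# The same style of join/aggregation logic a Transformation Flow applies,
# expressed here against two hypothetical local tables.
SQL = """
SELECT p.CATEGORY,
       SUM(s.NET_AMOUNT) AS TOTAL_NET,
       COUNT(*)          AS ORDER_COUNT
FROM   "MYSPACE"."SALES"    AS s
JOIN   "MYSPACE"."PRODUCTS" AS p
       ON p.PRODUCT_ID = s.PRODUCT_ID
WHERE  s.ORDER_DATE >= ADD_DAYS(CURRENT_DATE, -30)
GROUP  BY p.CATEGORY
"""

cur = conn.cursor()
cur.execute(SQL)
for category, total_net, order_count in cur.fetchall():
    print(category, total_net, order_count)
cur.close()
conn.close()
```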


Note that if you look at Datasphere’s current (June 2024) roadmap, you will see that SAP plans to support Lakehouse architectures in the context of data integration in the coming quarters. How this translates to each of the ‘flows’ within Datasphere is something to keep an eye on, especially if you have (for example) a Databricks solution in your landscape.

Task chains

Task chains in Datasphere are comparable to process chains in BW(/4HANA): they can be used to combine different actions within DSP into a single workflow. Task chains are developing at a fast pace and already support Data and Transformation Flows (to a limited degree), Intelligent Lookup, View persistency, Remote Table replication and even other task chains in the form of nesting. Combine this with a range of operators, run options (e.g. parallel, sequential or scheduled down to the minute) and email notifications (e.g. for errors during initialization or processing), and you are looking at an increasingly powerful feature set. If you want to know more about task chains, please refer to SAP note 3018945 or this page.


An example of a Task Chain in SAP Datasphere.

Data Intelligence conversion & wrap-up

In this blog, I’ve briefly walked you through the integration options SAP Datasphere currently offers. As SAP’s flagship data warehouse solution, DSP is constantly being updated and its roadmap is very much in motion. While its initial years, especially as SAP Data Warehouse Cloud, were rather erratic in the face of its competition and the rest of SAP’s (legacy) solution portfolio, DSP has now found solid ground and is evolving in a direction where it can (start to) meet customer expectations and ambitions.

Without a doubt, Datasphere has taken significant steps in both the data catalog and the different integration flows to close the gap with Data Intelligence. In spite of this, we still see some areas of concern. With DI’s Machine Learning Scenarios moving to either the HANA Predictive Analytics Library (PAL) or SAP AI Core, it remains to be seen if all data flow and pipeline features from DI will eventually be covered in Datasphere. The latter, which concerns REST service exposure, more complex ETL scenarios and system orchestration, should be covered by one of the three Datasphere ‘flows’ we discussed in this blog, though I’m sure more advanced DI data engineers will spot some limitations right off the bat (e.g. with regard to the available Python libraries or the lack of R support). Performance tuning of data replication and processing in general is another topic in which Datasphere is quite limited compared to the options customers currently have in Data Intelligence.

All in all, I believe the transition from DI to DSP will be one that requires a healthy degree of attention from customers: SAP offers no fully automated migration, but does provide some guidance to make the transition over time. Understanding your existing Data Intelligence constructs and translating them to their Datasphere equivalents (or the closest available alternative) sooner rather than later will be the key to a successful migration.

If you are wondering where to go with your SAP BI solutions or if you need advice regarding your (SAP) data landscape, don’t hesitate to contact us!


*All images shown are either property of or adapted from SAP SE.


About the author

Lars van der Goes

Lars van der Goes is SAP Data & Analytics Lead at Expertum. Lars combines strong functional skills with broad knowledge of Analytics principles, products and possibilities.
