SAP Datasphere and Snowflake: An (im)possible marriage? (Part I)


It is no secret that SAP has been pushing its Datasphere solution as not just a Cloud data warehouse, but a data integration platform as well. Partnerships with Databricks, Google and Collibra, among others, have been promoted to appeal to (potential) customers whose IT landscapes are increasingly vendor-diverse. However, non-SAP solutions that fall outside of this partner ecosystem are often difficult to integrate with Datasphere, and with SAP taking steps to restrict the permitted use of the APIs of its Operational Data Provisioning (ODP) framework by third-party tools, things are not getting easier. Does that mean you have no options at all to connect to popular non-SAP solutions such as Snowflake? Not at all, but you do have to take some additional steps. In this blog, my colleague Dirk-Jan will show you exactly how, by demonstrating a data connection from SAP S/4HANA to Datasphere and, finally, Snowflake.

In part 1 of this blog series, Dirk-Jan will achieve this by using a dedicated ETL tool in the form of Azure Data Factory, Azure’s cloud ETL service for scale-out serverless data integration and data transformation (this could also be any other supported ETL solution). Part 2 will dive into a direct connection between Datasphere and Snowflake, so stay tuned for that as well!
Getting started: steps to take

S/4HANA

First, we want to create a table or view in S/4HANA that we will import into Datasphere and use in a replication flow. So let’s get started by creating a table in ABAP and putting some data in it. Alternatively, you could create a CDS view or, even easier, select an already existing table or view for the follow-up steps in SAP Datasphere. In our example, we will use this table (ZDJ_TABLE) and its corresponding data in Datasphere later on.

Datasphere

To connect S/4HANA to Datasphere, we need to set up a Cloud Connector and a data provisioning agent to enable Replication flows. The specific technical and connection setup of the Cloud Connector will not be covered in this blog, but a successful connection should give you the following result:

(Model import is disabled because there is no need for it in this scenario, and therefore I did not create a tunnel connection.)

In Datasphere, we will create a new view and select our S/4HANA connection as a source. Next, select the custom table we created in S/4HANA:

After you have created the view from the table, select the “Expose for Consumption” option, then save and deploy the view.

Create a database user for your Datasphere space

Within Space management in Datasphere, create a new database user which will be used to expose data to external tools. Ensure that the database user has the necessary read/write privileges. You will need the host name, database username and password later on to configure Azure Data Factory.
You also need to allow Azure Data Factory to communicate with Datasphere. Microsoft’s documentation library provides a JSON file listing all Azure Data Factory IP address ranges. Look up the region your ADF instance is set up in and validate that these IP ranges are whitelisted in Datasphere.
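For illustration, the entry to look for in that file is the DataFactory service tag for your region; the structure looks roughly like this (the region and address prefixes below are placeholders, so always take the actual values from the downloaded file):

{
  "values": [
    {
      "name": "DataFactory.WestEurope",
      "properties": {
        "region": "westeurope",
        "systemService": "DataFactory",
        "addressPrefixes": [
          "x.x.x.x/26",
          "y.y.y.y/28"
        ]
      }
    }
  ]
}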

Azure

The Azure configuration will require some steps, starting with the storage account, a virtual machine where we can install our integration runtime and of course Azure Data Factory.

Storage Account

Create a storage account (general-purpose V2) in which we can create a Blob storage container for staging purposes.
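If you prefer to script this step instead of clicking through the portal, a minimal ARM template sketch for such a storage account could look like the following; the account name and location are placeholders, and I am assuming locally redundant storage is sufficient for a staging area:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2021-09-01",
      "name": "mystagingaccount",
      "location": "westeurope",
      "sku": { "name": "Standard_LRS" },
      "kind": "StorageV2",
      "properties": {
        "minimumTlsVersion": "TLS1_2"
      }
    }
  ]
}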

Virtual machine with the Integration Runtime

Integration Runtime

After you have set up the Data Factory, you need to install the Integration Runtime. Because we can’t use the AutoResolveIntegrationRuntime for connections to SAP HANA, we will need to create a self-hosted integration runtime on a virtual machine in Azure.

Run the integration runtime setup on your newly created VM. The easiest way for me was to open the Azure portal on the virtual machine, go to the ADF and choose Option 1: “Click here to launch the express setup for this computer”.

After a while, the Integration Runtime is set up and working.

On the Virtual Machine we need to set up an ODBC connection to Datasphere.

Enter the username and password of the database user you created earlier for your Datasphere space.

Azure Data Factory

Now that the integration runtime is set up correctly, we can create a Linked Service. Linked Services are much like connection strings: they define the connection information the service needs to connect to external resources.

Linked Services to Azure Blob Storage, Datasphere and Snowflake

Next, we must connect ADF to HANA and Snowflake. This is set up via Linked Services; we need three of them: one to Datasphere, one to Snowflake and one to a Blob Storage for staging purposes.

Let’s first create a Linked Service to an Azure Blob Storage.
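The JSON definition behind such a Linked Service looks roughly like this; the name and connection string values are placeholders, and you could of course also reference a key vault secret instead of putting the account key in the connection string:

{
  "name": "LS_BlobStaging",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage account>;AccountKey=<account key>;EndpointSuffix=core.windows.net"
    }
  }
}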

Subsequently, we will create one to SAP Datasphere.
Fill in the server name of Datasphere and the username and password of the database user you set up. The server name has to be entered together with port number 443.
Press Test connection to see if the connection is successful.

I had some problems saving this Linked Service, but after some tweaks in the JSON configuration I managed to save it. Your mileage may vary, so if you encounter save issues as well, my advice would be to review the JSON configuration settings.
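For reference, the underlying JSON definition of the Datasphere (SAP HANA) Linked Service looks roughly like this; the names are placeholders, and the integration runtime reference should point to the self-hosted runtime you installed on the VM:

{
  "name": "LS_Datasphere",
  "properties": {
    "type": "SapHana",
    "typeProperties": {
      "connectionString": "SERVERNODE=<Datasphere host name>:443;UID=<database user>;PWD=<password>"
    },
    "connectVia": {
      "referenceName": "<name of your self-hosted integration runtime>",
      "type": "IntegrationRuntimeReference"
    }
  }
}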

Finally, we establish a Linked Service to Snowflake. We will use the AutoResolveIntegrationRuntime for Snowflake. Fill in your account name, database name, warehouse, username and password. I also had to specify the role, so please do this as well if the system asks you to. Note that I used ACCOUNTADMIN for this, which is of course not best practice; ideally, you create a dedicated role with the appropriate privileges instead.
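A sketch of the corresponding Snowflake Linked Service JSON, assuming the (legacy) Snowflake connector is used; all values are placeholders and, as mentioned, the role should ideally be a dedicated one rather than ACCOUNTADMIN:

{
  "name": "LS_Snowflake",
  "properties": {
    "type": "Snowflake",
    "typeProperties": {
      "connectionString": "jdbc:snowflake://<account name>.snowflakecomputing.com/?user=<user name>&db=<database>&warehouse=<warehouse>&role=<role>",
      "password": {
        "type": "SecureString",
        "value": "<password>"
      }
    }
  }
}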

Once again, test the connection to validate that everything has been set up correctly.

Now that we have created the Linked Services, we can configure the pipeline in ADF.

Go to Author in ADF, select Pipelines and then New pipeline.

Add a “Copy data” activity under Move and transform in the pipeline.
Name it appropriately:
Under Source, create a new dataset pointing to the Datasphere view:
Next, create a Sink dataset for the Snowflake table. Please also check that your Snowflake account is hosted on Azure, otherwise you will get an error at this point.
We’re almost there; now, create the mapping between the HANA table and the Snowflake table.
If everything has been set up, we can test the pipeline. Press Add trigger, then Trigger now and, in the next screen, OK.
Check the Monitor to see whether the activity has been successful (a sketch of the resulting copy activity definition is shown below):
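For reference, the copy activity that these steps generate looks roughly like this in the pipeline JSON; the dataset and Linked Service names are the hypothetical ones used above, and staging is enabled so that the data is routed through the Blob container:

{
  "name": "Copy_Datasphere_to_Snowflake",
  "type": "Copy",
  "inputs": [ { "referenceName": "DS_DatasphereView", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "DS_SnowflakeTable", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "SapHanaSource" },
    "sink": {
      "type": "SnowflakeSink",
      "importSettings": { "type": "SnowflakeImportCopyCommand" }
    },
    "enableStaging": true,
    "stagingSettings": {
      "linkedServiceName": { "referenceName": "LS_BlobStaging", "type": "LinkedServiceReference" }
    }
  }
}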

Snowflake

Finally, let’s have a look at our Snowflake table to see whether the data has been loaded successfully:

As you can see, the data has been successfully populated. This concludes part I of this blog. In the next entry, I will dive into a direct connection between SAP Datasphere and Snowflake. Should you have any questions in the meantime, please do not hesitate to contact us.

Credits

This blog was written by our experts Dirk-Jan Kloezeman and Lars van der Goes.
