Why would you want a direct connection between two cloud data warehouses? First of all, such a connection is beneficial because there are no components (e.g. ETL tools) between the two applications, which results in a simplified architecture. A possible downside of this setup is that query performance depends on the data source and the available bandwidth, so if you have a large dataset, this is not the best way to integrate the two systems. Additionally, there are some limitations to this connection, which I will bring up later in this blog.
Architectural overview of the setup
Let us get started.
Prerequisites for this setup:
- SAP Datasphere
- Snowflake
- Snowflake JDBC driver and the SAP HANA JDBC driver
- A functional SAP Data Provisioning Agent (including some configuration)
- Configuration in Snowflake (network rule, secret, external access integration and a function)
Datasphere to Snowflake connection
To establish the connection from Datasphere to Snowflake, we need to configure the SAP Data Provisioning Agent, create a user in Snowflake with the appropriate privileges, and create a connection from Datasphere to Snowflake. Because neither SAP Datasphere nor Snowflake offers a generic connection via a wizard, we will have to create it ourselves via JDBC drivers.
Configuring the Data Provisioning Agent
The Data Provisioning agent needs to be configured; I will walk you through the steps below.
- Download the Snowflake JDBC driver and put it in the /camel/lib folder of the DP Agent installation
- Uncomment the Camel JDBC section in /camel/adapters.xml. Also make sure that CAP_LIMIT is added to the Capabilities section; see SAP Note 3348461 for more information.
- Check that delimident=true is set in the configfile-jdbc.properties file
- Next, we have to check /camel/samples/sample-jdbc-dialect.xml against SAP Note 3428356
- After these adjustments, we can create a connection to Datasphere in the DP agent.
- Restart the agent and check if it is working in Datasphere. Remember to enable the adapters you want to use, in our case this is the CamelJDBCAdapter (shown below).
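For reference, the delimident setting from the steps above is a single line in the properties file (the path below assumes a default DP Agent installation directory):

```properties
# <DPAgent_root>/camel/configfile-jdbc.properties
# Required so that quoted/delimited identifiers are handled correctly
delimident=true
```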
Create a user in Snowflake
Our next step is to create a dedicated user in Snowflake with the appropriate privileges. You can do this either via SQL commands or via the user interface. Make sure that you grant USAGE and OPERATE privileges on your warehouse to a role, and assign that role to this user. Otherwise, you will get errors such as this one:
camelnet.snowflake.client.jdbc.SnowflakeSQLException: No active warehouse selected in the current session. Select an active warehouse with the 'use warehouse' command or internal error: Remote execution error Camel execution failed.
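As a sketch, the user setup could look like the following SQL. All names (DATASPHERE_ROLE, DATASPHERE_USER, MY_WH, MY_DB, MY_SCHEMA) are placeholders of my own; adapt them to your environment:

```sql
-- Role that the Datasphere connection will use
CREATE ROLE IF NOT EXISTS DATASPHERE_ROLE;

-- Dedicated user for the connection
CREATE USER IF NOT EXISTS DATASPHERE_USER
  PASSWORD = '<choose-a-strong-password>'
  DEFAULT_ROLE = DATASPHERE_ROLE
  DEFAULT_WAREHOUSE = MY_WH;

-- Without these warehouse grants you will hit the
-- "No active warehouse selected" error shown above
GRANT USAGE, OPERATE ON WAREHOUSE MY_WH TO ROLE DATASPHERE_ROLE;

-- Read access to the data you want to expose to Datasphere
GRANT USAGE ON DATABASE MY_DB TO ROLE DATASPHERE_ROLE;
GRANT USAGE ON SCHEMA MY_DB.MY_SCHEMA TO ROLE DATASPHERE_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA MY_DB.MY_SCHEMA TO ROLE DATASPHERE_ROLE;

GRANT ROLE DATASPHERE_ROLE TO USER DATASPHERE_USER;
```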
Create the connection in Datasphere
Next, we need to create a connection in your desired space. Click Connections -> Space -> Create -> Connection Type: Generic JDBC and fill in your connection information.
It is better to create one connection for each database in Snowflake, because otherwise you will run into difficulties later. For example: if your user has a default database configured, you could end up connected to that database instead of the desired one. Hence, specify your database and warehouse in the JDBC URL after the connection string. Another possible issue is that if you connect without specifying a database, you will see multiple schemas with the same name and you cannot tell which database they belong to. Finally, choose your Business Name wisely, as my example in the screenshot (Snowflake) is not the best name to use, for obvious reasons.
The JDBC driver Class is as follows: net.snowflake.client.jdbc.SnowflakeDriver
The JDBC URL should be in this format: jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?db=DATABASE_NAME&warehouse=WAREHOUSE_NAME
Validate the connection to see if everything (on a connection level) is correct. As you can see under Features, only Remote Tables are supported, which is a limitation of this connection. We also cannot use Transformation Flows or Data Flows.
Next, in the Data Builder section we can select our connection to Snowflake under Sources.
Now you can use tables in, for example, graphical views.
Snowflake to Datasphere connection
Now that we have a connection from SAP Datasphere to Snowflake, we will configure the same connection the other way around. Although the architecture is once again very simple, with no intermediate components, it will take a bit more effort to get this working.
We will use External Network Access to access the SAP HANA Cloud database within SAP Datasphere directly, where you configure database access in your space.
Configuration in Datasphere
- Creation of a space (which I will skip in the context of this blog)
- A database user in this space
- IP Addresses from Snowflake to add to your whitelist
Database user in space
In our space we must configure Database Access. To achieve this, in SAP Datasphere, go to Space Management, click Edit on the space you want Database Access configured for, and go to the Database Access section. Fill in the mandatory information and press Create. For the configuration in Snowflake we need the database username (the one containing the #), the hostname and port, and of course the password of this user.
After creation of the database user, edit the privileges. Some of the features will not be necessary, but I had to select them for troubleshooting.
IP addresses from Snowflake in the IP Allowlist
Add the IP addresses to the Allowlist (whitelist). You can find them via the SYSTEM$ALLOWLIST function in Snowflake.
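Retrieving the addresses is a simple query; the function returns a JSON array of host/port entries, which you can optionally flatten into rows:

```sql
-- Raw JSON result
SELECT SYSTEM$ALLOWLIST();

-- Optionally flatten the JSON into one row per entry
SELECT f.value:host::string AS host,
       f.value:port::int    AS port,
       f.value:type::string AS type
FROM TABLE(FLATTEN(input => PARSE_JSON(SYSTEM$ALLOWLIST()))) f;
```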
What do we need to configure in Snowflake?
- SAP HANA JDBC driver in an internal stage
- Network rule with the hostname of the database of Datasphere
- Secret to hold the credentials of the user we use in Datasphere
- External access integration
- A User-Defined Function
SAP HANA JDBC driver in an internal stage
Download the SAP HANA JDBC driver and upload it to an internal stage. We will need to use this driver in the UDF:
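A minimal sketch of the stage and upload. The PUT command runs from SnowSQL (or another client) on your workstation; the stage name, local path and driver version are examples of my own:

```sql
-- Internal stage to hold the SAP HANA JDBC driver
CREATE STAGE IF NOT EXISTS JDBC_STAGE;

-- From SnowSQL; adjust the path and jar version to your download
PUT file:///tmp/ngdbc-2.19.16.jar @JDBC_STAGE AUTO_COMPRESS = FALSE;
```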
Network rule with the hostname of the Datasphere database
Create a network rule with the hostname and port from the Database Access section configured in Datasphere:
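As a sketch, with the hostname as a placeholder (use the value from your space's Database Access section; SAP HANA Cloud listens on port 443):

```sql
CREATE OR REPLACE NETWORK RULE DATASPHERE_NETWORK_RULE
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('<your-datasphere-hana-hostname>:443');
```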
Configure the secret to hold the credentials of the user we use in Datasphere
Create a secret with the username and password from Datasphere:
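A sketch of the secret, using the database user from the Database Access section (the username includes the # separator):

```sql
CREATE OR REPLACE SECRET DATASPHERE_SECRET
  TYPE = PASSWORD
  USERNAME = '<SPACE>#<USER>'
  PASSWORD = '<your-password>';
```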
External access integration
Create an External Access Integration through the method shown below:
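The integration ties the network rule and the secret together; the names below match the placeholders from my earlier snippets:

```sql
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION DATASPHERE_ACCESS_INTEGRATION
  ALLOWED_NETWORK_RULES = (DATASPHERE_NETWORK_RULE)
  ALLOWED_AUTHENTICATION_SECRETS = (DATASPHERE_SECRET)
  ENABLED = TRUE;
```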
User Defined Function
Next, we will need to create a function. The screenshot below shows only a small portion of it:
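Since the screenshot only shows part of the function, here is a minimal sketch of how such a Java UDF could look. All object names are the placeholders from my earlier snippets, the hostname is a placeholder, and this sketch takes only the query as input and returns the result as a JSON string; the actual function in this blog differs in its details:

```sql
CREATE OR REPLACE FUNCTION DATASPHERE_CONNECT(QUERY STRING)
RETURNS STRING
LANGUAGE JAVA
RUNTIME_VERSION = '11'
IMPORTS = ('@JDBC_STAGE/ngdbc-2.19.16.jar')
EXTERNAL_ACCESS_INTEGRATIONS = (DATASPHERE_ACCESS_INTEGRATION)
SECRETS = ('cred' = DATASPHERE_SECRET)
HANDLER = 'DatasphereClient.run'
AS
$$
import com.snowflake.snowpark_java.types.SnowflakeSecrets;
import com.snowflake.snowpark_java.types.UsernamePassword;
import java.sql.*;

public class DatasphereClient {
    public String run(String query) throws Exception {
        // Read the Datasphere credentials from the Snowflake secret
        UsernamePassword cred =
            SnowflakeSecrets.newInstance().getUsernamePassword("cred");
        String url = "jdbc:sap://<your-datasphere-hana-hostname>:443/?encrypt=true";
        try (Connection con = DriverManager.getConnection(
                 url, cred.getUsername(), cred.getPassword());
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            // Build a simple JSON array of row objects from the result set
            ResultSetMetaData md = rs.getMetaData();
            StringBuilder json = new StringBuilder("[");
            boolean firstRow = true;
            while (rs.next()) {
                if (!firstRow) json.append(",");
                firstRow = false;
                json.append("{");
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    if (i > 1) json.append(",");
                    json.append("\"").append(md.getColumnName(i))
                        .append("\":\"").append(rs.getString(i)).append("\"");
                }
                json.append("}");
            }
            return json.append("]").toString();
        }
    }
}
$$;
```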
After creating the function, you can finally call it in SQL. The function DATASPHERE_CONNECT has two inputs, one of them a string with the SQL query you want to execute in Datasphere.
Now we have the end result of our configuration. As you can see, the result is returned as JSON and can be combined with data in Snowflake for further processing.
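Because the function returns a string, you can parse it with Snowflake's JSON functions. A sketch, assuming the result is a JSON array of row objects as in my function sketch above (MY_VIEW is a placeholder for a view in your Datasphere space):

```sql
-- Parse the JSON result and flatten it into one row per record
SELECT f.value
FROM TABLE(FLATTEN(input =>
       PARSE_JSON(DATASPHERE_CONNECT('SELECT * FROM MY_VIEW')))) f;
```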
Let's take it one step further and create another function to also get the metadata.
If you put this function in a SQL query and execute it, you get the metadata of the table. This can be handy if you don't have access to the resource itself.
With these results, we can then execute another query to get the data in a more "standard" form.
Curious if this or a similar scenario could be helpful for your organization? At Expertum, we look beyond your SAP solutions in order to realize your data ambitions. Contact us to discuss your plans and enjoy a cup of coffee together.