Cloud Storage Backint agent for SAP HANA overview

You can send SAP HANA backups directly to Cloud Storage from SAP HANA instances that are running on Google Cloud, on Bare Metal Solution, on premises, or on other cloud platforms by using the SAP-certified Cloud Storage Backint agent for SAP HANA (Backint agent).

The Backint agent is integrated with SAP HANA so that you can store and retrieve backups directly from Cloud Storage by using the native SAP backup and recovery functions.

When you use the Backint agent, you don't need to use persistent disk storage for backups.

For installation instructions for the Backint agent, see the Cloud Storage Backint agent for SAP HANA installation guide.

For more information about the SAP certification of the Backint agent, see:

The Backint agent configuration file

You configure the Backint agent by specifying parameters in a plain text file.

The default configuration file is called parameters.txt and the default location is /usr/sap/SID/SYS/global/hdb/opt/backint/backint-gcs/parameters.txt.

You can specify multiple configuration files by giving each file a different name.

For example, you might specify a configuration for log backups in a file called backint-log-backups.txt and a configuration for data backups in a file called backint-data-backups.txt.

Storing backups in Cloud Storage buckets

The Backint agent stores your SAP HANA backups in a Cloud Storage bucket.

When you create a bucket, you can choose the bucket location and the bucket storage class.

A bucket location can be regional, dual-regional, or multi-regional. The location type that you choose depends on your need to restrict the location of your data, your latency requirements for backups and restores, and your need for protection against regional outages. For more information, see Bucket locations.

Select dual- or multi-regional buckets in regions that are the same as or close to the regions in which your SAP HANA instances are running.

Choose a storage class based on how long you need to keep your backups, how frequently you expect to access them, and the cost. For more information, see Storage classes.

Multistreaming data backups with the Backint agent

For versions prior to SAP HANA 2.0 SP05, SAP HANA supports multistreaming for databases larger than 128 GB. As of SAP HANA 2.0 SP05, this threshold is configurable by using the SAP HANA parameter parallel_data_backup_backint_size_threshold, which specifies the minimum database backup size, in GB, at which multistreaming is enabled.

Multistreaming is useful for increasing throughput and for backing up databases that are larger than 5 TB, which is the maximum size for a single object in Cloud Storage.

The optimum number of channels that you use for multistreaming depends on the Cloud Storage bucket type you are using and the environment in which SAP HANA is running. Also consider the throughput capability of the data disk attached to your HANA instance, as well as the bandwidth your administrator allocates for backup activities.

You can adjust the throughput by changing the number of streams, or limit throughput by using the #RATE_LIMIT_MB parameter in parameters.txt, the Backint agent configuration file.

For a multi-regional bucket, start with 8 channels by setting the parallel_data_backup_backint_channels parameter to 8 in the SAP HANA global.ini configuration file.

For a regional bucket, start with 12 channels by setting the parallel_data_backup_backint_channels parameter to 12 in the global.ini file.

Adjust the number of channels as necessary to meet your backup performance objectives.

As stated in the SAP HANA documentation, each additional channel requires an I/O buffer of 512 MB. Specify the size of the I/O buffer by using the data_backup_buffer_size parameter in the backup section of the global.ini file. For more information about the effect of the I/O buffer size on backup times, see SAP Note 2657261. As of SAP HANA 2.0 SP05, SAP specifies a maximum value of 4 GB for this parameter. Testing in Google Cloud has not shown a benefit from increasing the buffer size significantly beyond the default, but this might vary for your workload.
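The per-channel buffer requirement can be turned into a quick memory estimate. The following Python sketch is illustrative only: the helper name is hypothetical, and it assumes that buffer memory scales linearly with the number of channels at the 512 MB figure quoted above.

```python
# Hypothetical helper: estimate the total I/O buffer memory for multistreamed backups.
# Assumption: each channel needs one buffer of 512 MB, the figure quoted above.
BUFFER_MB_PER_CHANNEL = 512


def total_backup_buffer_mb(channels: int, buffer_mb: int = BUFFER_MB_PER_CHANNEL) -> int:
    """Return the estimated buffer memory, in MB, for the given channel count."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return channels * buffer_mb


# Suggested starting points: 8 channels (multi-regional), 12 channels (regional).
for channels in (8, 12):
    print(channels, "channels:", total_backup_buffer_mb(channels), "MB")
```

Use an estimate like this only as a first check that the host has enough free memory before you raise the channel count.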

For more information about multistreaming, in the SAP HANA Administration Guide that is specific to your SAP HANA version, see Multistreaming Data Backups with Third-Party Backup Tools.

Parallel uploads

You can improve the upload performance of log backup files by enabling the parallel upload feature of the Backint agent. This is especially useful for log backup files because SAP HANA cannot multistream them.

For data backups, you can tune the number of SAP HANA backup channels by using only the SAP HANA parameter parallel_data_backup_backint_channels.

When parallel upload is enabled, the Backint agent splits each individual backup file that is received from SAP HANA into multiple parts that are then uploaded in parallel, which improves upload performance.

As the parts are received by Cloud Storage, they are reassembled and stored as the original single file that the Backint agent received from SAP HANA. The single file is subject to the 5 TB size limit for objects in Cloud Storage.
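To get a feel for how a backup file breaks down into parts, the following Python sketch computes the part count and byte ranges for a given file size. It illustrates the arithmetic only and is not the agent's implementation: the function names are hypothetical, and the 128 MB default is taken from the #PARALLEL_PART_SIZE_MB description in the parameter reference.

```python
import math

# Default part size, per the #PARALLEL_PART_SIZE_MB description in the parameter reference.
DEFAULT_PART_SIZE_MB = 128


def part_count(file_size_mb: int, part_size_mb: int = DEFAULT_PART_SIZE_MB) -> int:
    """Number of parts needed to cover a file of file_size_mb."""
    if file_size_mb <= 0 or part_size_mb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(file_size_mb / part_size_mb)


def part_ranges(file_size_mb: int, part_size_mb: int = DEFAULT_PART_SIZE_MB):
    """List of (offset_mb, length_mb) tuples; the last part may be shorter."""
    ranges = []
    offset = 0
    while offset < file_size_mb:
        length = min(part_size_mb, file_size_mb - offset)
        ranges.append((offset, length))
        offset += length
    return ranges


print(part_count(1024))   # parts for a 1024 MB log backup
print(part_ranges(300))   # ranges for a 300 MB file
```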

Configuring parallel upload

The parallel upload feature is enabled in the parameters.txt configuration file by specifying the maximum number of parallel upload threads on the #PARALLEL_FACTOR parameter.

The parameters #PARALLEL_PART_SIZE_MB, which sets the size of each part, and #THREADS, which determines the number of worker threads, are for advanced tuning only. Don't change these settings unless you are instructed to do so by Cloud Customer Care. The default values rarely need to be changed.

For more information about the parallel upload parameters, see Configuration options for the Backint agent.

Parallel upload restrictions

The following restrictions apply to the parallel upload feature:

  • If you enable encryption with either the #ENCRYPTION_KEY or #KMS_KEY_NAME configuration parameter, then you cannot use parallel upload. Encryption is incompatible with parallel upload. If you specify the #PARALLEL_FACTOR parameter with either of these encryption parameters, then the Backint agent exits with a status of 1.
  • If you enable compression, then you cannot use parallel upload. Compression is incompatible with parallel upload. From version 1.0.22, if you specify the #PARALLEL_FACTOR parameter and omit the #DISABLE_COMPRESSION parameter in your configuration, then the Backint agent exits with a status of 1.
  • If your Cloud Storage bucket implements a retention policy, then the bucket does not support parallel uploads. A retention policy prevents the reassembly of the parts into a single file, which causes the upload to fail.
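The restrictions above can be checked before a configuration is deployed. The following Python sketch is a hypothetical pre-flight check, not part of the Backint agent. It assumes the configuration has already been read into a dictionary keyed by parameter name, and that you know separately whether the bucket has a retention policy.

```python
def parallel_upload_conflicts(params: dict, bucket_has_retention_policy: bool = False) -> list:
    """Return a list of problems that conflict with parallel upload.

    params maps parameter names (including the leading '#') to their values.
    """
    problems = []
    if "#PARALLEL_FACTOR" not in params:
        return problems  # parallel upload is not configured, nothing to check

    # Encryption with either key type is incompatible with parallel upload.
    for key in ("#ENCRYPTION_KEY", "#KMS_KEY_NAME"):
        if key in params:
            problems.append(f"{key} cannot be combined with #PARALLEL_FACTOR")

    # Compression must be disabled when parallel upload is used.
    if "#DISABLE_COMPRESSION" not in params:
        problems.append("#DISABLE_COMPRESSION must be specified with #PARALLEL_FACTOR")

    # A retention policy prevents reassembly of the uploaded parts.
    if bucket_has_retention_policy:
        problems.append("bucket retention policy does not support parallel uploads")

    return problems


# Example with a hypothetical bucket name and placeholder key path.
example = {
    "#BUCKET": "my-backup-bucket",
    "#PARALLEL_FACTOR": "8",
    "#KMS_KEY_NAME": "projects/p/locations/l/keyRings/r/cryptoKeys/k",
}
for problem in parallel_upload_conflicts(example):
    print(problem)
```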

For more information about the parallel upload parameters, see Configuration options for the Backint agent.

Tuning parallel upload

For log backups, parallel uploads can significantly improve the backup throughput because SAP HANA does not multistream log backups. In most cases, specifying a #PARALLEL_FACTOR of 16 or less is sufficient. For very large log volumes, you can maximize the throughput by using a high #PARALLEL_FACTOR value, such as 16, and increasing the values for the SAP HANA parameters log_segment_size_mb and max_log_backup_size.

In some cases, using a high #PARALLEL_FACTOR value can decrease the overall throughput, such as might happen if you are also using a high number of parallel backup channels.

To limit the network bandwidth that your backups use, use #RATE_LIMIT_MB to set the maximum amount of bandwidth that parallel uploads can use.

To find a good setting for your specific environment, workload, and backup type, perform tests with different settings and measure the backup throughput.

Authentication and access control for the Backint agent

Google Cloud uses service accounts to identify programs like the Backint agent and to control which Google Cloud resources the programs can access.

Required Cloud Storage permissions

A service account for the Backint agent must be granted permissions to the Google Cloud resources that the Backint agent accesses. The Storage Object Admin role provides list, get, create, and delete permissions for objects in Cloud Storage buckets.

You can set the permissions for the service account at the project level or the bucket level. If you set it at the project level, you give the Backint agent access to all of the buckets in your project. If you set it at the bucket level, you give the Backint agent access to only a single bucket. For more information about Cloud Storage bucket permissions, see:

Service account options for the Backint agent

If SAP HANA is running on a Compute Engine VM, by default, the Backint agent uses the service account of the VM.

If you use the VM service account, the Backint agent has the same project-level permissions as all of the other programs and processes that use the VM service account.

For the strictest access control, create a separate service account for the Backint agent and grant the service account access to the bucket at the bucket level.

If SAP HANA is not running on a Compute Engine VM, you must create a service account for the Backint agent. Create the service account in the Google Cloud project that contains the Cloud Storage bucket that the Backint agent will use.

When you create a service account for the Backint agent, you also need to create a service account key. You store the key on the SAP HANA host and specify the path to the key in the parameters.txt file. When SAP HANA is running on a Compute Engine VM, specifying the path to a key directs the Backint agent to use the service account that is associated with the key instead of the VM service account.

When using a dedicated service account for the Backint agent, rotate your keys regularly as a best practice to protect against unauthorized access.

If you use a customer-managed encryption key that is generated by Cloud Key Management Service to encrypt your backups in Cloud Storage, you need to give your service account access to the encryption key. For more information, see Assigning a Cloud KMS key to a service account.

Access to Google Cloud APIs and metadata servers

The Backint agent requires access to the following Google Cloud IP addresses and hosts during backup and recovery operations:

  • For access to Cloud Storage:
    • Version 1.0.14 and later of the agent: storage.googleapis.com
    • Version 1.0.13 and earlier: www.googleapis.com
  • If you specify a service account on the #SERVICE_ACCOUNT property, oauth2.googleapis.com for authentication.
  • 169.254.169.254 for the Compute Engine instance metadata server which, by default, resolves internal DNS names.
  • metadata.google.internal also for VM instance metadata.

If the Backint agent and SAP HANA are running on a Compute Engine VM that does not have access to the internet, you need to configure Private Google Access so that the Backint agent can interact with Cloud Storage and, if it uses a dedicated service account, authenticate itself with Google Cloud.

To configure Private Google Access, see Configuring Private Google Access.

Proxy servers and the Backint agent

By default, the Backint agent bypasses any HTTP proxy and does not read proxy environment variables in the operating system, such as http_proxy, https_proxy, or no_proxy.

You can configure the Backint agent to use a proxy if you have no alternative, or if your organization understands the performance implications of routing backups through a proxy server and has the expertise that is required to support it.

The proxy settings for the Backint agent are contained in the net.properties file:

/usr/sap/SID/SYS/global/hdb/opt/backint/backint-gcs/jre/conf/net.properties

Bypassing a proxy server for backups and recoveries

Although the Backint agent bypasses proxy servers by default, you can make the bypass explicit by specifying the required Google Cloud domain names and IP addresses on the http.nonProxyHosts parameter in the /usr/sap/SID/SYS/global/hdb/opt/backint/backint-gcs/jre/conf/net.properties file. For example:

http.nonProxyHosts=localhost|127.*|[::1]|*.googleapis.com|169.254.169.254|metadata.google.internal

Using a proxy server for backups and recoveries

To configure the Backint agent to send backups through a proxy server, specify the proxy host and port number parameters in the file /usr/sap/SID/SYS/global/hdb/opt/backint/backint-gcs/jre/conf/net.properties.

For queries to the VM instance metadata, the Backint agent cannot use a proxy, so you must specify the domain name and IP address for instance metadata on the http.nonProxyHosts parameter.

The following example shows a valid proxy configuration for the Backint agent:

http.proxyHost=proxy-host
http.proxyPort=proxy-port
http.nonProxyHosts=localhost|127.*|[::1]|169.254.169.254|metadata.google.internal
https.proxyHost=proxy-host
https.proxyPort=proxy-port
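Because metadata queries cannot go through a proxy, it can be worth verifying that both metadata entries are present on http.nonProxyHosts. The following Python sketch is a hypothetical check, not an agent feature. It assumes simple key=value lines and compares entries literally, so it does not evaluate wildcard patterns.

```python
# Entries that must bypass the proxy for instance metadata queries.
REQUIRED_BYPASS = ("169.254.169.254", "metadata.google.internal")


def missing_metadata_bypass(properties_text: str) -> list:
    """Return the required metadata hosts that are absent from http.nonProxyHosts."""
    non_proxy = ""
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip comments and lines that are not in key=value form.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "http.nonProxyHosts":
            non_proxy = value.strip()
    entries = [entry.strip() for entry in non_proxy.split("|") if entry.strip()]
    return [host for host in REQUIRED_BYPASS if host not in entries]


sample = """\
http.proxyHost=proxy-host
http.proxyPort=proxy-port
http.nonProxyHosts=localhost|127.*|[::1]|169.254.169.254|metadata.google.internal
"""
print(missing_metadata_bypass(sample))
```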

Updates for the Backint agent

Google Cloud periodically releases new versions of the Backint agent that you can download and install yourself at no additional cost.

Before you update the Backint agent to a new version in your production environment, make sure to test the new version in a non-production environment.

Updating the Backint agent requires the SAP HANA host to support remote HTTP requests to https://www.googleapis.com/.

To update an existing instance of the Backint agent to a new version, see Updating the Backint agent to a new version.

Encryption for backups

Cloud Storage always encrypts your data before it is written to disk. To apply your own additional layer of encryption, you can provide your own encryption keys for the server-side encryption of your Backint agent backups.

You have two options for providing your own keys with the Backint agent:

To use a customer-managed encryption key, specify the path to the key on the #KMS_KEY_NAME parameter in the parameters.txt file. You also need to give the VM or Backint agent service account access to the key. For more information about giving a service account access to an encryption key, see Assigning a Cloud KMS key to a service account.

To use a customer-supplied encryption key, specify the path to the key on the #ENCRYPTION_KEY parameter in the parameters.txt file. The key must be a base64 encoded AES-256 key string, as described in Customer-supplied encryption keys.
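As an illustration of the key format, the following Python sketch generates a random 256-bit key and encodes it as a base64 string. The function name is hypothetical, and how you store and protect the resulting key file on the SAP HANA host is left to your own security practices.

```python
import base64
import secrets


def new_customer_supplied_key() -> str:
    """Return a random AES-256 key (32 bytes) as a base64-encoded string."""
    raw_key = secrets.token_bytes(32)  # 32 bytes = 256 bits
    return base64.b64encode(raw_key).decode("ascii")


key = new_customer_supplied_key()
# A 32-byte key always encodes to 44 base64 characters, including padding.
print(len(key))
```

If you lose a customer-supplied key, backups that were encrypted with it cannot be decrypted, so keep a secure copy of the key outside the SAP HANA host.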

Encryption restrictions

The following restrictions apply to the encryption feature:

  • If both #KMS_KEY_NAME and #ENCRYPTION_KEY are specified, the Backint agent fails and exits with a status of 1.

  • If #PARALLEL_FACTOR is specified with either #KMS_KEY_NAME or #ENCRYPTION_KEY, the Backint agent fails and exits with a status of 1.

Configuration parameter reference

You can specify a number of options for the Backint agent in the parameters.txt configuration file.

When you first download the Backint agent, the parameters.txt file contains only two parameters:

  • #BUCKET
  • #DISABLE_COMPRESSION

Note that the # is part of the parameter, and not a comment indicator.

Specify each parameter on a new line. Separate parameters and values with a space.
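The file format described above is simple enough to read with a few lines of code. The following Python sketch is a hypothetical reader for inspection or validation scripts, not the agent's own parser. The bucket name in the sample is a placeholder.

```python
def parse_parameters(text: str) -> dict:
    """Parse parameters.txt content into a dict.

    Each non-empty line holds one parameter. The leading '#' is part of the
    parameter name, and the name is separated from its value by a space.
    Parameters without a value, such as #DISABLE_COMPRESSION, map to ''.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, _, value = line.partition(" ")
        params[name] = value.strip()
    return params


sample = """\
#BUCKET my-backup-bucket
#DISABLE_COMPRESSION
#PARALLEL_FACTOR 8
"""
print(parse_parameters(sample))
```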

The Backint agent configuration parameters are shown in the following table.

Parameter and value Description
#BUCKET bucket-name A required parameter that specifies the name of the Cloud Storage bucket that the Backint agent writes to and reads from. The Backint agent creates backup objects with the storage class of the bucket and supports all storage classes. The Backint agent uses Compute Engine default encryption to encrypt data at rest.
#CHUNK_SIZE_MB MB Advanced tuning parameter.

Controls the size of HTTPS requests to Cloud Storage during backup or restore operations. The default chunk size is 100 MB, which means that a single HTTP request stream to or from Cloud Storage is kept open until 100 MB of data is transferred.

Do not modify this setting unless instructed to do so by Customer Care. The default setting, which balances throughput and reliability, rarely needs to be changed.

Because the Backint agent retries failed HTTP requests multiple times before failing an operation, smaller chunk sizes result in less data that needs to be retransmitted if a request fails. Larger chunk sizes can improve throughput, but require more memory and more time to resend data if a request fails.

#DISABLE_COMPRESSION

Optional parameter that disables the default, on-the-fly compression when the Backint agent writes backups to the Cloud Storage bucket. #DISABLE_COMPRESSION is specified by default.

Specifying #DISABLE_COMPRESSION is recommended. Although compression reduces the cost of storage for backups in Cloud Storage, it requires more CPU processing during backup operations and slows down the effective backup throughput.

Regardless of this setting, the Backint agent supports either compressed or uncompressed backup files during a restore operation.

#ENCRYPTION_KEY path/to/key/file Specifies a path to a customer-supplied encryption key that Cloud Storage uses to encrypt backups. The path must be specified as a fully qualified path to a base64-encoded AES-256 key.

You cannot specify #ENCRYPTION_KEY with #KMS_KEY_NAME or #PARALLEL_FACTOR.

For more information about using your own encryption keys on Google Cloud, see Customer-supplied encryption keys.

#KMS_KEY_NAME path/to/key/file Specifies a path to a customer-managed encryption key that is generated by Cloud Key Management Service. Cloud Storage uses this key to encrypt backups.

If SAP HANA is running on a Compute Engine VM, the key must be accessible to the VM. If SAP HANA is not running on Google Cloud, the Cloud KMS key must be linked to the Backint agent service account. For information, see Service accounts.

Specify the path by using the following format: projects/key_project/locations/location/keyRings/key_ring_name/cryptoKeys/key_name

Where:

  • key_project is the ID of the project that is associated with the key.
  • location is the regional availability of the key. For more information, see Types of locations for Cloud KMS.
  • key_ring_name is the name of the key ring that contains the key.
  • key_name is the name of the key.
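The path format above can be assembled from its four components. The following Python sketch is illustrative only: the function name and the sample values are hypothetical, and the check only confirms that each component is non-empty and contains no slash.

```python
import re

# Shape of the path format shown above: four fixed labels, four components.
KMS_KEY_PATTERN = re.compile(
    r"^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$"
)


def kms_key_path(key_project: str, location: str, key_ring_name: str, key_name: str) -> str:
    """Build the value for #KMS_KEY_NAME and verify its overall shape."""
    path = (
        f"projects/{key_project}/locations/{location}"
        f"/keyRings/{key_ring_name}/cryptoKeys/{key_name}"
    )
    if not KMS_KEY_PATTERN.match(path):
        raise ValueError(f"malformed key path: {path}")
    return path


# Sample values are placeholders, not real resources.
print(kms_key_path("my-project", "us-central1", "backint-ring", "hana-backup-key"))
```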

You cannot specify #KMS_KEY_NAME with #ENCRYPTION_KEY or #PARALLEL_FACTOR.

For more information about managing your own encryption keys on Google Cloud, see Customer-managed encryption keys.

#MAX_GCS_RETRY integer Defines the maximum number of times the Backint agent retries a failed attempt to read and write to Cloud Storage. The default is 5, which is the recommended value.
#PARALLEL_FACTOR integer

Optional parameter that enables parallel upload and sets the maximum number of parallel uploads. A value of 1 disables parallel uploads. The default is 1.

Do not enable parallel upload if:

  • The target bucket uses a retention policy.
  • #ENCRYPTION_KEY or #KMS_KEY_NAME is specified.
#PARALLEL_PART_SIZE_MB integer Advanced tuning parameter.

Sets the size, in MB, of each part that is uploaded in parallel. The default is 128 MB.

Do not modify this setting unless instructed to do so by Customer Care. The default setting rarely needs to be changed.

#RATE_LIMIT_MB integer Optional parameter that sets an upper limit, in MB, on the outbound bandwidth to Compute Engine during backup or restore operations. By default, Google Cloud does not limit network bandwidth for the Backint agent. When set, throughput might vary, but will not exceed the specified limit.
#SERVICE_ACCOUNT path/to/key/file Optional parameter that specifies the fully-qualified path to the JSON-encoded Google Cloud service account key when Compute Engine default authentication is not used. Specifying #SERVICE_ACCOUNT directs the Backint agent to use the key when authenticating to the Cloud Storage service. The Compute Engine default authentication is recommended.
#THREADS integer Advanced tuning parameter.

Sets the number of worker threads. The default is the number of processors in the machine.

Do not modify this setting unless instructed to do so by Customer Care. The default setting rarely needs to be changed.

#READ_IDLE_TIMEOUT integer Advanced tuning parameter.

Sets the maximum amount of time in milliseconds that the Backint agent will wait to open the backup file. The default is 1000.

Do not modify this setting unless instructed to do so by Customer Care. The default setting rarely needs to be changed.

#HTTP_READ_TIMEOUT integer Advanced tuning parameter.

Sets the timeout, in milliseconds, for reading responses to Cloud Storage API requests. The default is -1, which means no timeout.

Do not modify this setting unless instructed to do so by Customer Care. The default setting rarely needs to be changed.

Logging for the Backint agent

In addition to the logs kept by SAP HANA in backup.log, the Backint agent writes operational and communication-error events to log files in the logs subdirectory in /usr/sap/SID/SYS/global/hdb/opt/backint/backint-gcs.

When the size of a log file reaches 10 MB, the Backint agent rotates the log files.

If necessary, you can edit the Backint agent logging configuration in /usr/sap/SID/SYS/global/hdb/opt/backint/backint-gcs/logging.properties.

The Backint agent also supports Cloud Logging. To enable Cloud Logging, see the Cloud Storage Backint agent for SAP HANA installation guide.

Using the Backint agent in SAP HANA HA deployments

In an SAP HANA high-availability cluster, you need to install the Backint agent on each node in the cluster.

Use the same Backint agent configuration with the same Cloud Storage bucket specifications for each SAP HANA instance in the HA cluster. You can use the same bucket specifications because, during normal operations, only the active SAP HANA instance in an HA configuration writes backups to Cloud Storage. The secondary system is in replication mode. This is true for data, log, and catalog backups.

Further, application clustering software, such as Pacemaker, prevents split-brain scenarios, in which more than one SAP HANA system in a cluster thinks that it is the primary instance.

However, during maintenance activities, when clustering might be disabled, if the standby database is removed from replication and brought back online, you need to make sure that backups are triggered only on the primary database.

Because the Backint agent is unaware of which SAP HANA system is currently the active system and has no scheduling or triggering mechanisms, you need to manage the scheduling and backup triggers by using SAP mechanisms, such as the SAP ABAP transaction DB13.

SAP ABAP applications connect to the HA cluster through the virtual IP, so the trigger is always routed to the active SAP HANA instance.

If the backup trigger is defined locally on each server, for example as a local operating system script, and both the primary and secondary systems think they are the active system, they both might attempt to write backups to the storage bucket.

Using the Backint agent in SAP HANA DR deployments

In a disaster recovery configuration, where a recovery instance of SAP HANA in another Google Cloud region is kept in sync by using asynchronous SAP HANA System Replication, specify a different bucket for the recovery instance than the primary SAP HANA system uses.

While the DR system is usually in replication mode and therefore cannot run a backup itself, during regular disaster recovery testing, the recovery instance is brought online and could trigger backups. If it does and the recovery system doesn't use a separate bucket, the backups might overwrite data from the primary database.

In the case of an actual disaster that requires you to recover from a backup to your DR region, you can update the Backint agent configuration to reference the multi-regional bucket that your primary HA system uses.

Using the Backint agent in SAP HANA scale-out systems

In SAP HANA scale-out systems, you need to install the Backint agent on each node in the system.

To simplify the management of the parameters.txt configuration file and, if you are using one, the Backint agent service account key, you can place these files in a shared NFS directory.