
Client-side encryption with Python

Overview

The Azure Storage Client Library for Python supports encrypting data within client applications prior to uploading the data to Azure Storage, as well as decrypting data while it is being downloaded to the client.

Note

The Azure Storage Python library is in preview.

Encryption and decryption using the envelope method

The encryption and decryption processes follow the envelope method.

Encryption using the envelope method

Encryption using the envelope method works as follows:

  1. The Azure storage client library generates a content encryption key (CEK), which is a one-time symmetric key.
  2. User data is encrypted with this CEK.
  3. The CEK is then wrapped (encrypted) using the key encryption key (KEK). The KEK is identified by a key identifier and can be an asymmetric key pair or a symmetric key that is managed locally. The storage client library itself never has access to the KEK; it only invokes the key-wrapping algorithm that the KEK provides. Users can supply custom providers for key wrapping and unwrapping as needed.
  4. The encrypted data is then uploaded to the Azure Storage service. The wrapped key and some additional encryption metadata are either stored as metadata (on a blob) or interpolated with the encrypted data (queue messages and table entities).

Decryption using the envelope method

Decryption using the envelope method works as follows:

  1. The client library assumes that the user manages the KEK. The user does not need to know which specific key was used for encryption. Instead, a key resolver can be set up that resolves different key identifiers to keys.
  2. The client library downloads the encrypted data along with any encryption material stored on the service.
  3. The wrapped content encryption key (CEK) is then unwrapped (decrypted) using the key encryption key (KEK). Here too, the client library does not have access to the KEK; it simply invokes the unwrapping algorithm of the custom provider.
  4. The content encryption key (CEK) is then used to decrypt the encrypted user data.
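The encryption and decryption steps above can be sketched as a round trip. This is only an illustration of the envelope flow, not the library's implementation: the cipher is a standard-library stand-in (an XOR keystream built from SHA-256, which is NOT AES-CBC and NOT secure), and the names ToyKek, envelope_encrypt, and envelope_decrypt are hypothetical.

```python
import hashlib
import os


def _keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stand-in cipher (NOT AES-CBC, NOT secure): XOR with a SHA-256 keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + iv + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class ToyKek:
    """Hypothetical KEK. Only this object ever sees the wrapping secret."""

    def __init__(self, kid: str, secret: bytes):
        self._kid = kid
        self._secret = secret

    def get_kid(self) -> str:
        return self._kid

    def get_key_wrap_algorithm(self) -> str:
        return "toy-xor"  # made-up algorithm name for this sketch

    def wrap_key(self, cek: bytes) -> bytes:
        return _keystream_xor(self._secret, b"wrap", cek)

    def unwrap_key(self, cek: bytes, algorithm: str) -> bytes:
        # XOR is its own inverse, so unwrapping repeats the same operation.
        return _keystream_xor(self._secret, b"wrap", cek)


def envelope_encrypt(plaintext: bytes, kek: ToyKek) -> dict:
    cek = os.urandom(32)  # step 1: one-time content encryption key
    iv = os.urandom(16)
    ciphertext = _keystream_xor(cek, iv, plaintext)  # step 2: encrypt the data
    return {
        "kid": kek.get_kid(),
        "algorithm": kek.get_key_wrap_algorithm(),
        "wrapped_cek": kek.wrap_key(cek),  # step 3: wrap the CEK with the KEK
        "iv": iv,
        "ciphertext": ciphertext,  # step 4: this is what would be uploaded
    }


def envelope_decrypt(envelope: dict, resolve_key) -> bytes:
    kek = resolve_key(envelope["kid"])  # the key resolver maps kid -> KEK
    cek = kek.unwrap_key(envelope["wrapped_cek"], envelope["algorithm"])
    return _keystream_xor(cek, envelope["iv"], envelope["ciphertext"])
```

Note that envelope_encrypt and envelope_decrypt never touch the KEK secret directly; they only call wrap_key and unwrap_key, which mirrors the separation described in the steps.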

Encryption mechanism

The storage client library uses AES to encrypt user data, specifically in Cipher Block Chaining (CBC) mode. Because each service works somewhat differently, each is discussed separately below.

Blobs

The client library currently supports encryption of whole blobs only. Encryption is supported when users use the create* methods. Both full and range downloads are supported, and both uploads and downloads can be parallelized.

When encrypting, the client library generates a random initialization vector (IV) of 16 bytes, along with a random content encryption key (CEK) of 32 bytes. This information is used to perform envelope encryption of the blob data. The wrapped CEK and some additional encryption metadata are then stored as blob metadata along with the encrypted blob on the service.

Warning

If you edit or upload your own metadata for the blob, make sure that the encryption metadata is preserved. If you upload new metadata without it, the wrapped CEK, IV, and other encryption metadata are lost and the blob content can never be retrieved again.
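One way to follow this warning is to merge new values into the blob's current metadata rather than replacing it. The helper below is a minimal sketch that works on plain dictionaries; the function name is hypothetical, and the reserved metadata key name "encryptiondata" is an assumption that should be checked against the metadata your blobs actually carry.

```python
# Assumed name of the reserved metadata entry that holds the wrapped CEK and IV.
ENCRYPTION_METADATA_KEY = "encryptiondata"


def merge_blob_metadata(existing: dict, updates: dict) -> dict:
    """Return the metadata to upload: current values plus updates.

    The encryption entry from `existing` is always carried over, and any
    attempt to overwrite it through `updates` is rejected.
    """
    merged = dict(existing)  # copy, so the caller's dictionary is untouched
    for name, value in updates.items():
        if name.lower() == ENCRYPTION_METADATA_KEY:
            raise ValueError("refusing to overwrite encryption metadata")
        merged[name] = value
    return merged
```

In practice, `existing` would come from reading the blob's current metadata immediately before the update.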

Downloading an encrypted blob involves retrieving the content of the entire blob with the get* convenience methods. The wrapped CEK is unwrapped and used together with the IV (stored as blob metadata in this case) to return the decrypted data to the user.

When downloading an arbitrary range of an encrypted blob (get* methods with range parameters passed in), the client library adjusts the range specified by the user to retrieve a small amount of additional data, which it needs to decrypt the requested range successfully.
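The reason extra data is needed follows from CBC mode: each 16-byte ciphertext block is decrypted with the help of the preceding ciphertext block (or the stored IV for the very first block). The function below is a sketch of that reasoning only. The library's actual adjustment is not specified here and may differ, for example in how it handles padding in the final block; the function name and return shape are hypothetical.

```python
BLOCK = 16  # AES block size in bytes


def adjust_range(start: int, end: int) -> tuple:
    """Expand an inclusive byte range so that AES-CBC can decrypt it.

    Returns (fetch_start, fetch_end, trim), where `trim` is the number of
    leading plaintext bytes to discard after decryption.
    """
    if start < 0 or end < start:
        raise ValueError("invalid range")
    aligned_start = start - (start % BLOCK)
    if aligned_start > 0:
        # Fetch one earlier block to act as the IV for the first needed block.
        fetch_start = aligned_start - BLOCK
    else:
        # The range begins in the first block, so the stored IV is used.
        fetch_start = 0
    # Extend to the last byte of the block that contains `end`.
    fetch_end = end + (BLOCK - 1 - end % BLOCK)
    return fetch_start, fetch_end, start - aligned_start
```

For example, a request for bytes 100 through 200 would fetch bytes 80 through 207 under this scheme, use bytes 80 to 95 as the IV, and drop the first 4 decrypted bytes.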

Only block blobs and page blobs can be encrypted and decrypted with this scheme. Encryption of append blobs is not currently supported.

Queues

Because queue messages can be in any format, the client library defines a custom format that includes the initialization vector (IV) and encrypted content encryption key (CEK) in the message body.

When encrypting, the client library generates a random IV of 16 bytes along with a random CEK of 32 bytes. This information is used to perform envelope encryption of the queue message text. The wrapped CEK and some additional encryption metadata are then added to the encrypted queue message. This modified message is stored on the service.

During decryption, the wrapped key is extracted from the queue message and unwrapped. The IV is also extracted from the queue message and used together with the unwrapped key to decrypt the queue message data. Note that the encryption metadata is small (less than 500 bytes). Although it counts toward the 64 KB limit for a queue message, the impact should be manageable.
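To make the idea of a self-describing message body concrete, here is a minimal sketch that packs the ciphertext, wrapped CEK, and IV into one JSON string and checks it against the 64 KB limit. The exact wire format used by the library is not reproduced in this article, so the JSON layout and every field name below are illustrative assumptions, as are the function names.

```python
import base64
import json
import os

MAX_QUEUE_MESSAGE_BYTES = 64 * 1024


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def build_encrypted_message(ciphertext: bytes, wrapped_cek: bytes, iv: bytes, kid: str) -> str:
    """Pack ciphertext and encryption metadata into one message body.

    Field names are illustrative only, not the library's wire format.
    """
    body = {
        "EncryptedMessageContents": _b64(ciphertext),
        "EncryptionData": {
            "KeyId": kid,
            "WrappedContentKey": _b64(wrapped_cek),
            "ContentEncryptionIV": _b64(iv),
        },
    }
    message = json.dumps(body)
    if len(message.encode("utf-8")) > MAX_QUEUE_MESSAGE_BYTES:
        raise ValueError("encrypted message exceeds the 64 KB queue limit")
    return message


def parse_encrypted_message(message: str) -> tuple:
    """Inverse of build_encrypted_message: returns (ciphertext, wrapped_cek, iv, kid)."""
    body = json.loads(message)
    meta = body["EncryptionData"]
    return (
        base64.b64decode(body["EncryptedMessageContents"]),
        base64.b64decode(meta["WrappedContentKey"]),
        base64.b64decode(meta["ContentEncryptionIV"]),
        meta["KeyId"],
    )
```

Because the metadata travels inside the body, the usable payload is slightly smaller than the nominal message limit, and base64 encoding of the ciphertext reduces it further.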

Tables

The client library supports encryption of entity properties for insert and replace operations.

Note

Merging is not currently supported. Because a subset of the properties may already have been encrypted with a different key, simply merging the new properties and updating the metadata will result in data loss. Merging either requires additional service calls to read the pre-existing entity from the service, or the use of a new key per property. Both methods are unsuitable for performance reasons.

The encryption of table data works as follows:

  1. Users specify the properties to be encrypted.

  2. The client library generates a random initialization vector (IV) of 16 bytes along with a random content encryption key (CEK) of 32 bytes for every entity, and performs envelope encryption on each property to be encrypted by deriving a new IV per property. The encrypted property is stored as binary data.

  3. The wrapped CEK and some additional encryption metadata are then stored as two additional reserved properties. The first reserved property (_ClientEncryptionMetadata1) is a string property that holds the information about the IV, version, and wrapped key. The second reserved property (_ClientEncryptionMetadata2) is a binary property that holds the information about which properties are encrypted. The information in this second property (_ClientEncryptionMetadata2) is itself encrypted.

  4. Because these additional reserved properties are required for encryption, users can now have only 250 custom properties instead of 252. The total entity size must be less than 1 MB.

    Note that only string properties can be encrypted. If other types of properties are to be encrypted, they must be converted to strings. The encrypted strings are stored on the service as binary properties, and they are converted back to strings after decryption (raw strings, not EntityProperties of type EdmType.STRING).

    For tables, in addition to the encryption policy, users must specify the properties to be encrypted. This can be done either by storing these properties in TableEntity objects with the type set to EdmType.STRING and encrypt set to true, or by setting the encryption_resolver_function on the tableservice object. An encryption resolver is a function that takes a partition key, row key, and property name and returns a Boolean indicating whether that property should be encrypted. During encryption, the client library uses this information to decide whether a property should be encrypted while it is written to the wire. The delegate also allows for logic around how properties are encrypted. (For example: if X, then encrypt property A; otherwise encrypt properties A and B.) Note that it is not necessary to provide this information when reading or querying entities.
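An encryption resolver with the conditional logic from the example above could look like the following sketch. The condition standing in for "X" (a partition key equal to "public") and the property names "A" and "B" are placeholders chosen for illustration.

```python
def my_encryption_resolver(pk, rk, property_name):
    """Return True if the named property of entity (pk, rk) should be encrypted."""
    if pk == "public":  # hypothetical stand-in for condition "X"
        return property_name == "A"
    # Otherwise encrypt both A and B.
    return property_name in ("A", "B")
```

A function like this would then be assigned to the encryption_resolver_function field of the tableservice object, as described above.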

Batch operations

A single encryption policy applies to all rows in the batch. The client library internally generates a new random IV and a random CEK per row in the batch. Users can also choose to encrypt different properties for each operation in the batch by defining this behavior in the encryption resolver. If a batch is created as a context manager through the tableservice batch() method, the tableservice's encryption policy is automatically applied to the batch. If a batch is created explicitly by calling the constructor, the encryption policy must be passed as a parameter and must not change for the lifetime of the batch. Note that entities are encrypted as they are inserted into the batch, using the batch's encryption policy (entities are NOT encrypted at the time the batch is committed using the tableservice's encryption policy).

Queries

Note

Because the entities are encrypted, you cannot run queries that filter on an encrypted property. If you try to do this, you will get incorrect results because the service compares encrypted data with unencrypted data.

To perform query operations, you must specify a key resolver that can resolve all the keys in the result set. If an entity in the query result cannot be resolved to a provider, the client library raises an error. For any query that performs server-side projections, the client library by default adds the special encryption metadata properties (_ClientEncryptionMetadata1 and _ClientEncryptionMetadata2) to the selected columns.
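The projection behavior can be pictured with a small helper that extends a list of selected columns. This is a sketch of the idea, not the library's code, and the function name is hypothetical; only the two reserved property names come from the text above.

```python
ENCRYPTION_COLUMNS = ("_ClientEncryptionMetadata1", "_ClientEncryptionMetadata2")


def with_encryption_columns(select):
    """Return the projected column list with the reserved encryption properties appended."""
    columns = list(select)
    for name in ENCRYPTION_COLUMNS:
        if name not in columns:  # avoid duplicates if the caller already asked for them
            columns.append(name)
    return columns
```

Without these two columns in the result, the client would have no wrapped key or IV with which to decrypt the projected properties.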

Important

When using client-side encryption, keep the following important points in mind:

  • When reading from or writing to an encrypted blob, use whole-blob upload commands and range or whole-blob download commands. Avoid writing to an encrypted blob with protocol operations such as "Put Block", "Put Block List", "Write Pages", or "Clear Pages". Otherwise, the encrypted blob may become corrupted and unreadable.
  • A similar limitation applies to tables. Do not update encrypted properties without also updating the encryption metadata.
  • If you set metadata on the encrypted blob, you may overwrite the encryption-related metadata required for decryption, because setting metadata is not additive. This also applies to snapshots: do not specify metadata while taking a snapshot of an encrypted blob. If metadata must be set, call the get_blob_metadata method first to get the current encryption metadata, and avoid concurrent writes while metadata is being set.
  • Enable the require_encryption flag on the service object for users who should work only with encrypted data. See below for more details.

The storage client library expects the provided KEK and key resolver to implement the following interface. Azure Key Vault support for Python KEK management is pending and will be integrated into this library once it is complete.

Client API / Interface

After a storage service object (for example, blockblobservice) has been created, the user can assign values to the fields that make up an encryption policy: key_encryption_key, key_resolver_function, and require_encryption. Users can provide only a KEK, only a resolver, or both. key_encryption_key is the basic key type, identified by a key identifier, and provides the logic for wrapping and unwrapping. key_resolver_function is used to resolve a key during decryption: given a key identifier, it returns a valid KEK. This lets users choose between multiple keys that are managed in multiple locations.

The KEK must implement the following methods in order to successfully encrypt data:

  • wrap_key(cek): Wraps the specified CEK (bytes) using an algorithm of the user's choice. Returns the wrapped key.
  • get_key_wrap_algorithm(): Returns the algorithm used to wrap keys.
  • get_kid(): Returns the string key ID for this KEK.

The KEK must implement the following methods in order to successfully decrypt data:

  • unwrap_key(cek, algorithm): Returns the unwrapped form of the specified CEK using the algorithm specified by the string.
  • get_kid(): Returns the string key ID for this KEK.
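Since the interface is duck-typed, it can help to check up front which role a given key object is able to play. The helper below is a minimal sketch; the method names come from the lists above, while missing_methods and DecryptOnlyKek are hypothetical names used only for illustration.

```python
ENCRYPT_METHODS = ("wrap_key", "get_key_wrap_algorithm", "get_kid")
DECRYPT_METHODS = ("unwrap_key", "get_kid")


def missing_methods(kek, required):
    """List the required method names that `kek` does not provide as callables."""
    return [name for name in required if not callable(getattr(kek, name, None))]


class DecryptOnlyKek:
    """Hypothetical KEK that can only unwrap keys; the body is a placeholder."""

    def unwrap_key(self, cek, algorithm):
        raise NotImplementedError("real unwrapping logic goes here")

    def get_kid(self):
        return "key1"
```

Here, missing_methods(DecryptOnlyKek(), DECRYPT_METHODS) is empty, while the same object checked against ENCRYPT_METHODS reports that wrap_key and get_key_wrap_algorithm are absent.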

The key resolver must implement at least one method that, given a key identifier, returns the corresponding KEK implementing the interface above. Only this method is to be assigned to the key_resolver_function property on the service object.

  • For encryption, the key is always used. A missing key results in an error.

  • The following applies to decryption:

    • If a key resolver is specified, it is called to retrieve the key. If the resolver is specified but has no mapping for the key identifier, an error is raised.

    • If no resolver is specified but a key is, the key is used if its identifier matches the required key identifier. If the identifier does not match, an error is raised.

The encryption examples in azure.storage.samples illustrate a more detailed end-to-end scenario for blobs, queues, and tables. Sample implementations of the KEK and the key resolver are provided in the sample files as KeyWrapper and KeyResolver, respectively.
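The decryption rules above can be sketched in a few lines. This is an illustration of the described precedence, not the sample code or the library's implementation: DictKeyResolver, StubKek, and select_decryption_key are hypothetical names, and the choice of ValueError as the error type is an assumption.

```python
class StubKek:
    """Hypothetical KEK stub that only carries a key identifier."""

    def __init__(self, kid):
        self._kid = kid

    def get_kid(self):
        return self._kid


class DictKeyResolver:
    """Hypothetical resolver that keeps KEKs in a dictionary keyed by key ID."""

    def __init__(self):
        self._keys = {}

    def put_key(self, kek):
        self._keys[kek.get_kid()] = kek

    def resolve_key(self, kid):
        return self._keys.get(kid)  # None when there is no mapping


def select_decryption_key(required_kid, key_encryption_key=None, key_resolver_function=None):
    """Pick the KEK for decryption according to the rules described above."""
    if key_resolver_function is not None:
        # A resolver, when specified, takes precedence over the key.
        kek = key_resolver_function(required_kid)
        if kek is None:
            raise ValueError("resolver has no mapping for key id %r" % required_kid)
        return kek
    if key_encryption_key is not None:
        if key_encryption_key.get_kid() != required_kid:
            raise ValueError("key id does not match %r" % required_kid)
        return key_encryption_key
    raise ValueError("neither a key nor a key resolver was provided")
```

Only the bound method resolver.resolve_key would be passed as the resolver function, in line with the requirement that the method itself is what gets assigned to key_resolver_function.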

RequireEncryption mode

Users can optionally enable a mode of operation in which all uploads and downloads must be encrypted. In this mode, attempts to upload data without an encryption policy, or to download data that is not encrypted on the service, fail on the client. The require_encryption flag on the service object controls this behavior.
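The client-side checks implied by this mode can be pictured as follows. This is a sketch of the behavior described above rather than the library's code: the function names are hypothetical, the use of ValueError is an assumption, and "encryptiondata" is an assumed name for the marker that identifies encrypted content.

```python
def check_upload(require_encryption, key_encryption_key):
    """Fail on the client if encryption is required but no KEK is configured."""
    if require_encryption and key_encryption_key is None:
        raise ValueError("require_encryption is set but no key_encryption_key was provided")


def check_download(require_encryption, metadata):
    """Fail on the client if encryption is required but the data is not marked as encrypted."""
    # "encryptiondata" is an assumed marker name, used here for illustration only.
    if require_encryption and "encryptiondata" not in metadata:
        raise ValueError("require_encryption is set but the downloaded data is not encrypted")
```

Both checks happen before any plaintext leaves or reaches the application, which is why the failure is reported by the client rather than by the service.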

Blob service encryption

Set the encryption policy fields on the blockblobservice object. Everything else is handled internally by the client library.

We are currently working on snippets of code for version 12.x of the Azure Storage Client Libraries. For more information, see Announcing the Azure Storage v12 Client Libraries.

Queue service encryption

Set the encryption policy fields on the queueservice object. Everything else is handled internally by the client library.

We are currently working on snippets of code for version 12.x of the Azure Storage Client Libraries. For more information, see Announcing the Azure Storage v12 Client Libraries.

Table service encryption

In addition to creating an encryption policy and setting it on the request options, you must either specify an encryption_resolver_function on the tableservice object or set the encrypt attribute on the EntityProperty.

Using the resolver

We are currently working on snippets of code for version 12.x of the Azure Storage Client Libraries. For more information, see Announcing the Azure Storage v12 Client Libraries.

Using attributes

As mentioned above, a property can be marked for encryption by storing it in an EntityProperty object and setting the encrypt field.

We are currently working on snippets of code for version 12.x of the Azure Storage Client Libraries. For more information, see Announcing the Azure Storage v12 Client Libraries.

Encryption and performance

Note that encrypting your storage data adds performance overhead. The content key and IV must be generated, the content itself must be encrypted, and additional metadata must be formatted and uploaded. This overhead varies with the amount of data being encrypted. We recommend that customers always test their applications for performance during development.

Next Steps