Secure Data Encryption in Custom Python/PySpark Notebooks

Before you start, make sure you have completed the Initialization steps. This ensures your environment is ready for secure encryption and decryption operations.

Encrypting Data in Your Notebook

To protect sensitive information, use the encrypt_df() function from the nccutils module. For details about its usage and parameters, see the Data Privacy documentation.

Step-by-step: Encrypt Data in a PySpark Notebook

Ensure the following prerequisites are met:

- df is a PySpark DataFrame containing your data.
- PrivacyColumns is automatically generated by NCC; manual configuration is not required.
- crypto_key is securely retrieved from Azure Key Vault.
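Since PrivacyColumns arrives as a JSON string, one way to sanity-check it before encrypting is to parse it rather than compare raw text. This is only a sketch under the assumption that the NCC-generated value is valid JSON; the sample value below is hypothetical:

```python
import json

PrivacyColumns = '{}'  # hypothetical NCC-generated value; '{}' means no privacy columns

# Parse the JSON string; an empty object means there is nothing to encrypt
parsed = json.loads(PrivacyColumns or '{}')
needs_encryption = bool(parsed)
```

With a non-empty configuration such as '{"Email": "hash"}', needs_encryption evaluates to True.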

Retrieve your encryption key from Azure Key Vault:

crypto_key = notebookutils.credentials.getSecret(f'https://{EncryptionKeyVault}.vault.azure.net/', KeyName)
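The first argument above is the vault URI built from the Key Vault name. A minimal sketch of that string construction, assuming EncryptionKeyVault and KeyName are defined earlier in the notebook (the values below are hypothetical placeholders):

```python
EncryptionKeyVault = "my-keyvault"   # hypothetical Key Vault name
KeyName = "ncc-crypto-key"           # hypothetical secret name

# getSecret expects the full vault URI, not just the vault name
vault_uri = f"https://{EncryptionKeyVault}.vault.azure.net/"
# notebookutils.credentials.getSecret(vault_uri, KeyName) would then return the secret value
```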

Encrypt your DataFrame:

# Encrypt only when the DataFrame has rows and NCC has configured privacy columns
if df.count() > 0:
    if PrivacyColumns != "{}":
        df2 = encrypt_df(
            input_dataframe=df,
            privacy_columns=PrivacyColumns,
            crypto_key=crypto_key,
            column_mapping=entity_json.get('columnMappings', ''),
            PrimaryKeyColumns=entity_json.get('primaryKeys', ''),
            NoHistoryColumns=entity_json.get('noHistoryColumns', ''),
            ExcludeColumns=entity_json.get('excludeColumns', ''),
            # Default to an empty dict so the chained .get cannot fail
            foreign_key_columns=entity_json.get('values', {}).get('foreignkeys', '')
        )
        output_data = df2
    else:
        # No privacy columns configured; pass the data through unchanged
        output_data = df
else:
    # Empty DataFrame; nothing to encrypt
    output_data = df

df = output_data
df.count()  # trigger evaluation and confirm the row count
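The foreign_key_columns lookup chains two .get calls on entity_json; if the 'values' key can be absent, the inner default must be a dict (not a string), or the second .get would raise. A small sketch of that pattern with a hypothetical entity_json:

```python
# Hypothetical NCC entity configuration
entity_json = {"primaryKeys": "Id", "values": {"foreignkeys": "CustomerId"}}

# Chained .get with a dict default never raises, even when 'values' is missing
foreign_keys = entity_json.get('values', {}).get('foreignkeys', '')

# When 'values' is absent, the chain falls through to the empty-string default
missing = {"primaryKeys": "Id"}
no_keys = missing.get('values', {}).get('foreignkeys', '')
```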

This approach encrypts sensitive columns in your DataFrame using NCC configuration. For further details on encryption and available options, refer to the Data Privacy documentation.


NCC Configuration

For more information about configuring entities in NCC, see NCC Data Privacy under the 'Encryption' section.