Custom notebook¶
Secure Data Encryption in Custom Python/PySpark Notebooks¶
Before you start, make sure you have completed the Initialization steps. This ensures your environment is ready for secure encryption and decryption operations.
Encrypting Data in Your Notebook¶
To protect sensitive information, use the encrypt_df() function from the nccutils module. For details about its usage and parameters, see the Data Privacy documentation.
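As a rough mental model, column-level encryption in PySpark typically applies a keyed cipher to each configured column through a UDF. The sketch below is illustrative only and is not the nccutils implementation; the function name encrypt_columns, the use of a Fernet key, and the plain list of column names are all assumptions made for this example.

# Illustrative sketch only -- not the actual nccutils encrypt_df() implementation.
from cryptography.fernet import Fernet
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def encrypt_columns(input_dataframe, columns, crypto_key):
    # Encrypt each listed string column with a symmetric (Fernet) key.
    # The real encrypt_df() additionally takes NCC-specific configuration.
    encrypt_udf = F.udf(
        lambda value: Fernet(crypto_key).encrypt(value.encode()).decode()
        if value is not None else None,
        StringType(),
    )
    for column in columns:
        input_dataframe = input_dataframe.withColumn(column, encrypt_udf(F.col(column)))
    return input_dataframe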
Step-by-step: Encrypt Data in a PySpark Notebook¶
Ensure the following prerequisites are met:
- df is a PySpark DataFrame containing your data.
- PrivacyColumns is generated automatically by NCC; no manual configuration is required (a hypothetical example follows this list).
- crypto_key is retrieved securely from Azure Key Vault, as shown in the next step.
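For orientation, the values below show one hypothetical shape these inputs might take; the exact JSON that NCC generates may differ, so treat every field name here as a placeholder.

# Hypothetical example values -- the actual structure generated by NCC may differ.
PrivacyColumns = '{"Email": "encrypt", "PhoneNumber": "encrypt"}'
entity_json = {
    'columnMappings': '',
    'primaryKeys': 'CustomerId',      # placeholder primary key column
    'noHistoryColumns': '',
    'excludeColumns': '',
    'values': {'foreignkeys': ''}
}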
Retrieve your encryption key from Azure Key Vault:
crypto_key = notebookutils.credentials.getSecret(f'https://{EncryptionKeyVault}.vault.azure.net/', KeyName)
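Here, EncryptionKeyVault (the name of your Key Vault) and KeyName (the name of the secret that holds the key) must already be defined in the notebook. A minimal sketch, assuming you set them directly rather than receiving them as notebook parameters:

# Hypothetical values -- in practice these often arrive as notebook parameters.
EncryptionKeyVault = 'my-key-vault'    # Azure Key Vault name (placeholder)
KeyName = 'ncc-encryption-key'         # Secret name in the vault (placeholder)
crypto_key = notebookutils.credentials.getSecret(
    f'https://{EncryptionKeyVault}.vault.azure.net/', KeyName
)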
Encrypt your DataFrame:
# Encrypt only when the DataFrame has rows and NCC has flagged privacy columns.
if df.count() > 0:
    if PrivacyColumns != "{}":
        df2 = encrypt_df(
            input_dataframe=df,
            privacy_columns=PrivacyColumns,
            crypto_key=crypto_key,
            column_mapping=entity_json.get('columnMappings', ''),
            PrimaryKeyColumns=entity_json.get('primaryKeys', ''),
            NoHistoryColumns=entity_json.get('noHistoryColumns', ''),
            ExcludeColumns=entity_json.get('excludeColumns', ''),
            # Fall back to an empty dict so the chained .get() does not
            # fail when 'values' is missing from entity_json.
            foreign_key_columns=entity_json.get('values', {}).get('foreignkeys', '')
        )
        output_data = df2
    else:
        # No privacy columns configured: pass the DataFrame through unchanged.
        output_data = df
else:
    # Empty DataFrame: nothing to encrypt.
    output_data = df

df = output_data
df.count()  # Trigger evaluation and confirm the row count.
This approach encrypts the sensitive columns in your DataFrame according to the NCC configuration. For further details on encryption and the available options, refer to the Data Privacy documentation.
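To sanity-check the result, inspect a few rows of an encrypted column; the values should now be ciphertext rather than plain text. The column name 'Email' below is a placeholder for one of your configured privacy columns.

# 'Email' is a placeholder -- substitute one of your own privacy columns.
df.select('Email').show(5, truncate=False)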
NCC Configuration¶
For more information about configuring entities in NCC, see NCC Data Privacy under the 'Encryption' section.