Data Privacy in Microsoft Fabric¶
Introduction¶
Safeguarding personal and sensitive information is a critical responsibility when handling data. Microsoft Fabric offers integrated features to help you secure data through encryption, ensuring that only authorized users can access protected information. The Data Privacy module streamlines the encryption and decryption of sensitive data in your tables, supporting compliance and privacy standards.
Key Components¶
- Presidio: Microsoft’s solution for detecting and masking personally identifiable information (PII), such as names and email addresses.
- Faker: Generates synthetic data to replace real personal details for testing or anonymization purposes.
Presidio identifies sensitive data, while encryption and decryption functions enable you to secure and restore it as required.
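As a rough illustration of the detect-then-replace idea, the sketch below uses only the Python standard library. It is a simplified stand-in and does not use the Presidio or Faker APIs: the regular expression, the find_emails and replace_with_synthetic helpers, and the user<N>@example.com placeholder scheme are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately simple email pattern; Presidio's recognizers
# are far more thorough than this stand-in.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return every substring of `text` that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


def replace_with_synthetic(text):
    """Replace each detected email with a synthetic placeholder address.

    The same real address always maps to the same placeholder within one call.
    """
    mapping = {}

    def substitute(match):
        real = match.group(0)
        if real not in mapping:
            mapping[real] = f"user{len(mapping) + 1}@example.com"
        return mapping[real]

    return EMAIL_PATTERN.sub(substitute, text)


sample = "Contact jane.doe@contoso.com or bob@fabrikam.org; jane.doe@contoso.com prefers email."
masked = replace_with_synthetic(sample)
```

Mapping each real address to one stable placeholder keeps repeated values consistent, which tends to matter when anonymized data is later joined or grouped.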
Core Functions¶
encrypt_df¶
Encrypts specified columns in your data table to protect sensitive information.
Parameters:
- input_dataframe: The data table to process (PySpark DataFrame).
- privacy_columns: JSON string listing columns to encrypt.
- crypto_key: Key used for encryption.
Returns:
A new DataFrame with encrypted columns.
decrypt_dataframe¶
Decrypts previously encrypted columns to restore original data.
Parameters:
- df: DataFrame with encrypted columns.
- crypto_key: Key used for decryption.
Returns:
A DataFrame with decrypted columns.
Function Parameters¶
The following tables describe the required and optional parameters for each function:
encrypt_df¶
| Input Parameter | Description | Required | Example Value |
|---|---|---|---|
| input_dataframe | Your table of data (PySpark DataFrame). | Yes | input_dataframe |
| privacy_columns | JSON string listing columns to encrypt. | Yes | '[{"columnname": "email"}]' |
| crypto_key | Encryption key. | Yes | 'my_secret_key' |
decrypt_dataframe¶
| Input Parameter | Description | Required | Example Value |
|---|---|---|---|
| df | DataFrame with encrypted columns. | Yes | encrypted_df |
| crypto_key | Decryption key. | Yes | 'my_secret_key' |
The input_dataframe and df parameters accept data tables such as lists of dictionaries or DataFrames. The privacy_columns parameter should be a JSON string listing columns to protect, e.g. '[{"columnname": "email"}, {"columnname": "phone"}]'. The crypto_key acts as a password: keep it secure and do not share it. Optional parameters can be omitted; default values will be used.
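One way to prepare these inputs is sketched below. It builds the privacy_columns string with json.dumps rather than writing the JSON by hand, and it reads the key from an environment variable so the key never appears in notebook code. The helper names build_privacy_columns and load_crypto_key, and the DATA_PRIVACY_KEY variable name, are hypothetical and not part of the module.

```python
import json
import os


def build_privacy_columns(column_names):
    """Serialize column names into the JSON string format described above."""
    return json.dumps([{"columnname": name} for name in column_names])


def load_crypto_key(env_var="DATA_PRIVACY_KEY"):
    """Read the key from an environment variable instead of hard-coding it.

    DATA_PRIVACY_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


privacy_columns = build_privacy_columns(["email", "phone"])
```

Generating the string programmatically avoids quoting mistakes, and failing loudly when the key is missing prevents an empty key from being used by accident.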
Example: Encrypting and Decrypting Data¶
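```python
from nccutils.encryption.encryption import decrypt_dataframe, encrypt_df

# input_dataframe is an existing PySpark DataFrame that has an "email" column
privacy_columns = '[{"columnname": "email"}]'
crypto_key = 'my_secret_key'  # placeholder value; do not hard-code real keys

# Encrypt sensitive data
encrypted_df = encrypt_df(input_dataframe, privacy_columns, crypto_key)
# Decrypt data when needed
decrypted_df = decrypt_dataframe(encrypted_df, crypto_key)
```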
Secure Deletion of Sensitive Data¶
Microsoft Fabric follows a secure process to ensure sensitive data is permanently removed:
- Detection: Identify sensitive data for encryption.
- Encryption: Encrypt and store data in a new column.
- Overwriting: Replace original data with empty or null values to prevent recovery.
- Deletion: Remove the overwritten data from the table.
This process ensures that the original plaintext values are removed from the table and cannot be recovered from it. Only the encrypted copy remains, and reading it requires the crypto_key.
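The steps above can be sketched in plain Python on a table held as a list of dictionaries. This is an illustration under stated assumptions and not the module's implementation: secure_delete_column, the _encrypted column suffix, and the reverse_text placeholder are hypothetical, and the placeholder is not encryption. Detection is assumed to have happened already, so the sensitive column is passed in by name.

```python
def secure_delete_column(rows, column, encrypt_value, encrypted_suffix="_encrypted"):
    """Encrypt one column into a new column, then overwrite and drop the original.

    `encrypt_value` is a caller-supplied function standing in for real
    encryption; `encrypted_suffix` is a hypothetical naming choice.
    Returns a new list of rows; the input rows are not modified.
    """
    encrypted_column = column + encrypted_suffix
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data stays untouched
        original = new_row.get(column)
        # Encryption: store the encrypted value in a new column.
        new_row[encrypted_column] = None if original is None else encrypt_value(original)
        # Overwriting: replace the original value with a null.
        new_row[column] = None
        # Deletion: remove the overwritten column from the row.
        del new_row[column]
        result.append(new_row)
    return result


def reverse_text(value):
    """Placeholder transformation for the demo only; this is NOT encryption."""
    return value[::-1]


rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
cleaned = secure_delete_column(rows, "email", reverse_text)
```

Passing the encryption function in as an argument keeps the sketch independent of any particular cipher; in practice the module's own encryption would take its place.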
Summary¶
The Data Privacy module in Microsoft Fabric enables you to protect personal data by encrypting it and restoring it only when necessary, using Microsoft tools and automated workflows.