
Data Privacy in Microsoft Fabric

Introduction

Safeguarding personal and sensitive information is a critical responsibility when handling data. Microsoft Fabric offers integrated features that help you secure data through encryption, so that only users who hold the encryption key can read protected values. The Data Privacy module simplifies encrypting and decrypting sensitive columns in your tables, which supports compliance with privacy requirements.

Key Components

  • Presidio: Microsoft’s solution for detecting and masking personally identifiable information (PII), such as names and email addresses.
  • Faker: Generates synthetic data to replace real personal details for testing or anonymization purposes.

Presidio identifies sensitive data, while encryption and decryption functions enable you to secure and restore it as required.
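To show how detection feeds the encryption step, here is a minimal sketch that uses a simple regular expression as a stand-in for Presidio (it does not call Presidio's API). It finds columns containing e-mail addresses and builds a privacy_columns JSON string in the format used by encrypt_df. The function names are hypothetical, and the rows are plain Python dictionaries rather than a DataFrame.

```python
import json
import re

# Simplified stand-in for Presidio's detection step: a regex that flags
# e-mail addresses. Presidio's own recognizers cover many more entity types.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def detect_email_columns(rows: list[dict]) -> list[str]:
    """Return the names of columns where any value looks like an e-mail address."""
    flagged = []
    for row in rows:
        for column, value in row.items():
            if column not in flagged and isinstance(value, str) and EMAIL_PATTERN.search(value):
                flagged.append(column)
    return flagged


def to_privacy_columns(columns: list[str]) -> str:
    """Build a privacy_columns JSON string such as '[{"columnname": "email"}]'."""
    return json.dumps([{"columnname": name} for name in columns])


rows = [
    {"name": "Alice", "email": "alice@example.com", "city": "Oslo"},
    {"name": "Bob", "email": "bob@example.org", "city": "Lima"},
]
print(to_privacy_columns(detect_email_columns(rows)))
# prints [{"columnname": "email"}]
```

A regex only catches well-formed patterns; names, addresses, and free-text PII are what a dedicated detector such as Presidio is for.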

Core Functions

encrypt_df

Encrypts specified columns in your data table to protect sensitive information.

Parameters:
  • input_dataframe: The data table to process (PySpark DataFrame).
  • privacy_columns: JSON string listing the columns to encrypt.
  • crypto_key: Key used for encryption.

Returns:
A new DataFrame with encrypted columns.

decrypt_dataframe

Decrypts previously encrypted columns to restore original data.

Parameters:
  • df: DataFrame with encrypted columns.
  • crypto_key: Key used for decryption; this must be the same key that was used for encryption.

Returns:
A DataFrame with decrypted columns.

Function Parameters

The following tables describe the parameters for each function. All parameters listed are required:

encrypt_df

| Input Parameter | Description | Required | Example Value |
| --- | --- | --- | --- |
| input_dataframe | Your table of data (PySpark DataFrame). | Yes | input_dataframe |
| privacy_columns | JSON string listing columns to encrypt. | Yes | '[{"columnname": "email"}]' |
| crypto_key | Encryption key. | Yes | 'my_secret_key' |

decrypt_dataframe

| Input Parameter | Description | Required | Example Value |
| --- | --- | --- | --- |
| df | DataFrame with encrypted columns. | Yes | encrypted_df |
| crypto_key | Decryption key. | Yes | 'my_secret_key' |

The input_dataframe and df parameters take a PySpark DataFrame; data held as a list of dictionaries can be converted first with spark.createDataFrame. The privacy_columns parameter is a JSON string listing the columns to protect, for example '[{"columnname": "email"}, {"columnname": "phone"}]'. The crypto_key acts as a password: keep it secret, do not share it, and use the same key for decryption that you used for encryption. Optional parameters, if a function accepts any, can be omitted, in which case their default values are used.
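Because privacy_columns is passed as a string, a malformed value may only surface when the function runs. The following sketch parses and checks the string before you call encrypt_df. parse_privacy_columns is a hypothetical helper, not part of nccutils, and this guide does not document how encrypt_df itself validates its input.

```python
import json


def parse_privacy_columns(privacy_columns: str, available_columns: list[str]) -> list[str]:
    """Parse a privacy_columns JSON string into a list of column names.

    Raises ValueError if the JSON is malformed, is not an array, an entry
    lacks "columnname", or a named column is not in available_columns.
    """
    try:
        entries = json.loads(privacy_columns)
    except json.JSONDecodeError as exc:
        raise ValueError(f"privacy_columns is not valid JSON: {exc}") from exc
    if not isinstance(entries, list):
        raise ValueError("privacy_columns must be a JSON array")

    names = []
    for entry in entries:
        if not isinstance(entry, dict) or "columnname" not in entry:
            raise ValueError(f"entry without 'columnname': {entry!r}")
        names.append(entry["columnname"])

    missing = [name for name in names if name not in available_columns]
    if missing:
        raise ValueError(f"columns not found in table: {missing}")
    return names


# With a PySpark DataFrame, df.columns gives the list of column names.
print(parse_privacy_columns('[{"columnname": "email"}, {"columnname": "phone"}]',
                            ["name", "email", "phone"]))
# prints ['email', 'phone']
```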

Example: Encrypting and Decrypting Data

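from nccutils.encryption.encryption import decrypt_dataframe, encrypt_df

# Example inputs using the values from the parameter tables.
# Assumes a Fabric notebook, where the `spark` session is predefined.
input_dataframe = spark.createDataFrame([("Alice", "alice@example.com")], ["name", "email"])
privacy_columns = '[{"columnname": "email"}]'
crypto_key = "my_secret_key"  # example only; do not hard-code real keys

# Encrypt sensitive data
encrypted_df = encrypt_df(input_dataframe, privacy_columns, crypto_key)

# Decrypt data when needed, using the same key
decrypted_df = decrypt_dataframe(encrypted_df, crypto_key)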
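Rather than typing the key into a notebook, you can load it at runtime. This is a minimal sketch, assuming the key has been placed in an environment variable with the hypothetical name DATA_PRIVACY_CRYPTO_KEY; in Fabric, a secret store such as Azure Key Vault is a common alternative source. The returned string is what you would pass as crypto_key to encrypt_df and decrypt_dataframe.

```python
import os


def load_crypto_key(env_var: str = "DATA_PRIVACY_CRYPTO_KEY") -> str:
    """Read the crypto key from an environment variable (hypothetical name).

    Raises RuntimeError if the variable is unset or empty, so a missing key
    fails loudly instead of encrypting with an empty string.
    """
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return key


# Usage:
# crypto_key = load_crypto_key()
# encrypted_df = encrypt_df(input_dataframe, privacy_columns, crypto_key)
```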

Secure Deletion of Sensitive Data

Microsoft Fabric follows the process below to ensure that sensitive plaintext data is removed from a table once it has been encrypted:

  1. Detection: Identify the sensitive data to encrypt, for example with Presidio.
  2. Encryption: Encrypt the data and store it in a new column.
  3. Overwriting: Replace the original data with empty or null values to prevent recovery.
  4. Deletion: Remove the overwritten original column from the table.

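This process removes the plaintext values from the current version of the table, so the sensitive information can be read again only by decrypting the encrypted column with the correct key. If the table is stored in Delta format, earlier table versions that still contain the plaintext may remain accessible through time travel until their files are removed, for example with VACUUM.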
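The four steps can be traced on plain Python dictionaries. This is a minimal sketch, not the module's implementation: the "<column>_encrypted" naming is an assumption, the encrypt function is passed in, and the usage example substitutes Base64 purely so the code runs with the standard library. Base64 is an encoding, not encryption. Overwriting and then deleting a dictionary entry also only mirrors the order of the steps; it does not securely erase memory or storage.

```python
import base64
from typing import Callable


def secure_replace(rows: list[dict], columns: list[str],
                   encrypt: Callable[[str], str]) -> list[dict]:
    """Apply steps 2-4 to each row; `columns` is the output of step 1 (detection).

    Step 2 stores the encrypted value in a new "<column>_encrypted" column,
    step 3 overwrites the original value with None, and step 4 removes the
    overwritten column. The input rows are left unmodified.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # work on a copy of the row
        for column in columns:
            value = new_row.get(column)
            # Step 2: encrypt into a new column (a missing or None value stays None)
            new_row[f"{column}_encrypted"] = None if value is None else encrypt(str(value))
            # Step 3: overwrite the plaintext value
            new_row[column] = None
            # Step 4: drop the overwritten column
            del new_row[column]
        result.append(new_row)
    return result


def placeholder_encrypt(text: str) -> str:
    # Placeholder ONLY so the sketch runs: Base64 is reversible without a key
    # and provides no security. Use a real cipher (e.g. via encrypt_df) instead.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


rows = [{"name": "Alice", "email": "alice@example.com"}]
print(secure_replace(rows, ["email"], placeholder_encrypt))
# prints [{'name': 'Alice', 'email_encrypted': 'YWxpY2VAZXhhbXBsZS5jb20='}]
```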

Summary

The Data Privacy module in Microsoft Fabric enables you to protect personal data by detecting it with Presidio, encrypting the selected columns with encrypt_df, and restoring them with decrypt_dataframe only when necessary, using the same crypto key.