Configure Data Cleansing Options in the Silver Layer

Learn how to apply data cleansing functions to incoming data in the Silver Layer of your solution. For example, you can convert all text in a column to uppercase. To enable data cleansing, add the entity value DQ_CleansingOptions to a Silver Layer Entity. The value should contain a JSON array that specifies one or more cleansing functions to execute.

Define the JSON Structure

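Each object in the array supports the following properties:

`function`: The name of the cleansing function to run (required).
`columns`: A semicolon-separated list of the columns to which the function is applied, for example "TransactionTypeName;LastEditedBy" (required).
`parameters`: A JSON object that maps parameter names to their values. Omit it for functions that take no parameters, as the `to_upper` entry in the following example does.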
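If you generate the DQ_CleansingOptions value from a script instead of typing it by hand, serializing Python objects with `json.dumps` guarantees valid JSON and keeps the semicolon separator in one place. The sketch below uses a hypothetical helper, `cleansing_entry`, that is not part of NCC; it produces the same array as the Example that follows.

```python
import json
from typing import Any, Dict, List, Optional


def cleansing_entry(
    function: str,
    columns: List[str],
    parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build one entry of the DQ_CleansingOptions array (hypothetical helper)."""
    entry: Dict[str, Any] = {
        "function": function,
        # The columns property is one semicolon-separated string, not a JSON list.
        "columns": ";".join(columns),
    }
    if parameters:  # leave "parameters" out when the function takes none
        entry["parameters"] = parameters
    return entry


options = [
    cleansing_entry("to_upper", ["TransactionTypeName"]),
    cleansing_entry(
        "custom_function_with_params",
        ["TransactionTypeName", "LastEditedBy"],
        {"param1": "abc", "param2": "123"},
    ),
]

# The string to store as the DQ_CleansingOptions entity value.
dq_cleansing_options = json.dumps(options)
print(dq_cleansing_options)
```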

Example

[
    {
        "function": "to_upper",
        "columns": "TransactionTypeName"
    },
    {
        "function": "custom_function_with_params",
        "columns": "TransactionTypeName;LastEditedBy",
        "parameters": {"param1": "abc", "param2": "123"}
    }
]
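To catch configuration mistakes before a load runs, you can check a DQ_CleansingOptions value against the structure described above. The following is a hypothetical validator, not NCC's own parser. It assumes that `function` and `columns` are required, treats a missing `parameters` property as an empty object (as in the `to_upper` entry), and splits `columns` on semicolons while trimming surrounding whitespace; the names `CleansingStep` and `parse_cleansing_options` are illustrative.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class CleansingStep:
    function: str
    columns: List[str]
    parameters: Dict[str, Any] = field(default_factory=dict)


def parse_cleansing_options(raw: str) -> List[CleansingStep]:
    """Validate a DQ_CleansingOptions value and return one step per entry."""
    entries = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(entries, list):
        raise ValueError("DQ_CleansingOptions must be a JSON array")

    steps: List[CleansingStep] = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} must be a JSON object")
        function = entry.get("function")
        columns = entry.get("columns")
        if not isinstance(function, str) or not function:
            raise ValueError(f"entry {index}: 'function' must be a non-empty string")
        if not isinstance(columns, str) or not columns.strip():
            raise ValueError(f"entry {index}: 'columns' must be a non-empty string")
        parameters = entry.get("parameters", {})  # optional; defaults to {}
        if not isinstance(parameters, dict):
            raise ValueError(f"entry {index}: 'parameters' must be a JSON object")
        # "A;B; C" -> ["A", "B", "C"]; empty segments are dropped
        names = [name.strip() for name in columns.split(";") if name.strip()]
        steps.append(CleansingStep(function, names, parameters))
    return steps


example = """[
  {"function": "to_upper", "columns": "TransactionTypeName"},
  {"function": "custom_function_with_params",
   "columns": "TransactionTypeName;LastEditedBy",
   "parameters": {"param1": "abc", "param2": "123"}}
]"""

for step in parse_cleansing_options(example):
    print(step.function, step.columns, step.parameters)
```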

Built-in NCC Functions

The following table lists the built-in cleansing functions available in NCC:

| Function name | Description | Parameters | Example parameters |
|---|---|---|---|
| `to_upper` | Converts the specified columns to uppercase. | None | `{}` |

Add Custom Cleansing Functions

You can extend the cleansing capabilities for each customer by adding custom functions to the notebook NB_CUSTOM_DATACLEANSING_FUNCTIONS.

Note: The df argument passed to each cleansing function is a PySpark DataFrame, and your function must also return a PySpark DataFrame. Always return the transformed df, never None or another type. PySpark DataFrames are immutable, so every transformation returns a new DataFrame; reassign df = df.<transformation> after each change so that the DataFrame you return, and the caller receives, includes it.

Custom functions should use the following structure, where columns holds the column names listed in the columns property and args holds the parameters object:

def <function_name>(df, columns, args):
    print(args['<custom parameter name>'])  # Use custom parameters

    for column in columns:  # Apply the function to each column
        df = df.<custom logic>  # Reassign: each transformation returns a new DataFrame

    return df  # Always return the DataFrame
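As a concrete illustration of the structure above, here is a hypothetical custom function, `fill_nulls`, that replaces null values with a default supplied through a custom parameter named `value`. Both names are assumptions for this example. It relies on PySpark's `DataFrame.fillna(value, subset=...)`, which accepts a list of columns, so a single call replaces the per-column loop from the template.

```python
from typing import TYPE_CHECKING, Any, Dict, List

if TYPE_CHECKING:  # type hints only; the notebook already provides PySpark
    from pyspark.sql import DataFrame


def fill_nulls(
    df: "DataFrame", columns: List[str], args: Dict[str, Any]
) -> "DataFrame":
    """Hypothetical custom cleansing function: replace nulls with args["value"].

    DataFrame.fillna ignores subset columns whose type does not match the
    fill value (for example, a string value on an integer column).
    """
    fill_value = args["value"]  # custom parameter from the "parameters" object
    # fillna returns a new DataFrame, so reassign df before returning it
    df = df.fillna(fill_value, subset=list(columns))
    return df  # always return the DataFrame
```

Under these assumptions, a matching DQ_CleansingOptions entry would look like `{"function": "fill_nulls", "columns": "TransactionTypeName;LastEditedBy", "parameters": {"value": "Unknown"}}`.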