Configure Data Cleansing Options in the Silver Layer
Learn how to apply data cleansing functions to incoming data in the Silver Layer of your solution. For example, you can convert all text in a column to uppercase. To enable data cleansing, add the entity value DQ_CleansingOptions to a Silver Layer Entity. The value should contain a JSON array that specifies one or more cleansing functions to execute.
Define the JSON Structure
Each entry in the array is a JSON object with the following properties:
- function: The name of the cleansing function.
- columns: A semicolon-separated list of the columns to which the function is applied.
- parameters: A JSON object that maps parameter names to their values. The example below omits this property for to_upper, which takes no parameters.
Example
```json
[
  {"function": "to_upper",
   "columns": "TransactionTypeName"},
  {"function": "custom_function_with_params",
   "columns": "TransactionTypeName;LastEditedBy",
   "parameters": {"param1": "abc", "param2": "123"}}
]
```
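To check a DQ_CleansingOptions value before adding it to an entity, you can parse it according to the properties described above. The following Python sketch is not NCC's own parser: the function name parse_cleansing_options and the choice to treat a missing parameters property as an empty object are assumptions made for illustration.

```python
import json

# Hypothetical DQ_CleansingOptions value, copied from the example above.
raw_options = """
[
  {"function": "to_upper", "columns": "TransactionTypeName"},
  {"function": "custom_function_with_params",
   "columns": "TransactionTypeName;LastEditedBy",
   "parameters": {"param1": "abc", "param2": "123"}}
]
"""


def parse_cleansing_options(raw: str) -> list[tuple[str, list[str], dict]]:
    """Parse a DQ_CleansingOptions JSON array into (function, columns, parameters) tuples."""
    entries = json.loads(raw)
    if not isinstance(entries, list):
        raise ValueError("DQ_CleansingOptions must be a JSON array")
    parsed = []
    for entry in entries:
        name = entry["function"]
        # Split the semicolon-separated column list, dropping empty items and stray spaces.
        columns = [c.strip() for c in entry["columns"].split(";") if c.strip()]
        # Assumption: a missing "parameters" property is treated as an empty object.
        params = entry.get("parameters", {})
        if not isinstance(params, dict):
            raise ValueError(f"parameters for {name!r} must be a JSON object")
        parsed.append((name, columns, params))
    return parsed


for name, columns, params in parse_cleansing_options(raw_options):
    print(name, columns, params)
```

Splitting on ";" and stripping whitespace means a value such as "TransactionTypeName; LastEditedBy" still yields two clean column names.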
Built-in NCC Functions
The following table lists the default cleansing functions available in NCC:
| Function name | Description | Parameters | Example parameters |
|---|---|---|---|
| to_upper | Converts the specified columns to uppercase | {} | |
Add Custom Cleansing Functions
You can extend the cleansing capabilities with customer-specific functions by adding them to the notebook NB_CUSTOM_DATACLEANSING_FUNCTIONS. Custom functions should use the following structure:
Note: The df argument passed into each cleansing function is a PySpark DataFrame, and your function must also return a PySpark DataFrame. Always return the transformed df (never None or another type). If you add or modify columns, reassign df = df.<transformation> before returning so that the caller receives the updated DataFrame.