
Configure Entity Values for Bronze Layer Entities in NCC

Learn how to configure Entity Values for Bronze layer entities in NCC to control deduplication and JSON parsing. This article covers deduplication strategies and how to work with JSON data using Entity Values.

Introduction

Bronze entities in NCC can be customized with Entity Values to support a variety of scenarios. For a complete list of available Entity Values, see Entity Values Reference.

Manage Data Quality

Control duplicate data during source file ingestion by configuring deduplication options for Bronze layer entities.

Deduplication Modes

Set the DQ_Deduplication_Mode parameter to define the deduplication strategy. Supported modes include:

  • None
    All rows from the source file are loaded without deduplication.

  • Row
    Rows are compared on their entire content. Rows that are identical in every column are treated as duplicates, and only one copy is kept.

  • Key
    Deduplication is performed using primary key column(s). Only the first occurrence of each key is kept.

TIP
Choose the deduplication mode that best matches your data quality requirements and the structure of your source data.

[Figure: Deduplication modes illustration]

Deduplication Mode Examples

None

Before:
name   age  height
Alice  5    80
Alice  5    80
Alice  10   80

After:
name   age  height
Alice  5    80
Alice  5    80
Alice  10   80

All rows are loaded, including duplicates.

Row

Before:
name   age  height
Alice  5    80
Alice  5    80
Alice  10   80

After:
name   age  height
Alice  5    80
Alice  10   80

The exact duplicate is removed; each distinct row is kept once.

Key (name is the key column)

Before:
name   age  height
Alice  5    80
Alice  5    80
Alice  10   80

After:
name   age  height
Alice  5    80

Only the first row for each key value is kept.

IMPORTANT
When enabling the Key option, make sure your primary key columns uniquely identify each record. Otherwise, rows that share a key value are discarded, and you may lose data extracted from the source system.
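The three modes can be reproduced with a short plain-Python sketch. It is an illustration only, not NCC's implementation: the function name `deduplicate`, the list-of-dicts row format, and the choice to keep the first occurrence in Row mode are assumptions made for this example.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def deduplicate(rows: Iterable[Row], mode: str, keys: Optional[List[str]] = None) -> List[Row]:
    """Hypothetical sketch of the None, Row, and Key deduplication modes."""
    if mode == "None":
        # Load every row, duplicates included.
        return list(rows)
    if mode not in ("Row", "Key"):
        raise ValueError(f"Unknown deduplication mode: {mode}")
    if mode == "Key" and not keys:
        raise ValueError("Key mode requires at least one key column")

    seen = set()
    kept: List[Row] = []
    for row in rows:
        if mode == "Row":
            # Row mode: the identity is the entire row content.
            identity = tuple(sorted(row.items()))
        else:
            # Key mode: the identity is only the key column values.
            identity = tuple(row[k] for k in keys)
        if identity not in seen:
            # Keep the first row seen for each identity.
            seen.add(identity)
            kept.append(row)
    return kept


source = [
    {"name": "Alice", "age": 5, "height": 80},
    {"name": "Alice", "age": 5, "height": 80},
    {"name": "Alice", "age": 10, "height": 80},
]

print(len(deduplicate(source, "None")))           # 3
print(deduplicate(source, "Row"))                 # age 5 and age 10 rows
print(deduplicate(source, "Key", keys=["name"]))  # age 5 row only
```

In PySpark, similar results are usually obtained with `DataFrame.distinct()` or `DataFrame.dropDuplicates(subset)`. Note that Spark does not guarantee which duplicate `dropDuplicates` keeps, so a strict "first occurrence" rule needs an explicit ordering, for example a window with `row_number()`.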


For more details on Bronze layer entity configuration, see NCC documentation.

Parse JSON Data with Entity Values

Entity Values let you control how JSON data is parsed and transformed in Microsoft Fabric. This section describes how to use the Collection and DateFormat Entity Values.

Collection Entity Value

Use the Collection Entity Value to specify an array in a Landing zone entity's JSON. Arrays are ordered collections of values, enclosed in square brackets ([]) with elements separated by commas.

Example JSON array:

["2024-01-01T13:00:00.000", "2024-02-01T13:00:00.000"]

To access collection members, use the collection name, followed by a period (.), and then the member name.

[Figure: Example JSON Bronze layer entity with Collection]

Tip:
Wrap the collection name and each member name in backticks (`), and separate them with a period.
Example: `billingAddress`.`street`

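If each name has its own pair of backticks and the period sits between them, as in `@odata`.`context`, the path references a nested dictionary key:

{"@odata": {"context": "http://services.odata.org"}}

If the period is inside a single pair of backticks, as in `@odata.context`, the path references one key whose name contains a period:

{"@odata.context": "http://services.odata.org"}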
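To make the difference concrete, the plain-Python sketch below resolves a backtick-quoted path against a parsed JSON object. The helpers `split_path` and `resolve` are hypothetical; the sketch assumes the quoting rules described above (a period outside backticks separates levels, a period inside backticks is part of the key name) and does not handle escaped backticks.

```python
import json
from typing import Any, List


def split_path(path: str) -> List[str]:
    """Split a path such as `a`.`b` or a.b into key names.

    A period outside backticks separates levels; a period inside
    backticks is kept as part of the key name.
    """
    parts: List[str] = []
    current: List[str] = []
    quoted = False
    for ch in path:
        if ch == "`":
            quoted = not quoted  # toggle quoting; the backtick itself is dropped
        elif ch == "." and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if quoted:
        raise ValueError(f"Unbalanced backtick in path: {path}")
    parts.append("".join(current))
    return parts


def resolve(document: Any, path: str) -> Any:
    """Walk a parsed JSON object one key at a time."""
    value = document
    for key in split_path(path):
        value = value[key]
    return value


nested = json.loads('{"@odata": {"context": "http://services.odata.org"}}')
flat = json.loads('{"@odata.context": "http://services.odata.org"}')

print(split_path("`@odata`.`context`"))  # ['@odata', 'context']
print(split_path("`@odata.context`"))    # ['@odata.context']
print(resolve(nested, "`@odata`.`context`"))
print(resolve(flat, "`@odata.context`"))
```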

Data Processing with collection and column_mapping

Configure the collection and column_mapping settings to process JSON files using PySpark. These settings help transform nested JSON structures into a flat DataFrame.

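  • The collection field lists the JSON arrays to explode into rows, so that each array element becomes its own row (lower granularity).
  • For multiple arrays:
    • Nested arrays: list each array path, separated by a semicolon (;), for example orders;orders.items.
    • Non-nested (sibling) arrays: configure a separate Bronze entity for each array, all reading from the same Landing zone entity.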

Example JSON:

{
    "id": 1,
    "name": "John Doe",
    "orders": [
        {
            "order_id": 101,
            "amount": 250
        },
        {
            "order_id": 102,
            "amount": 450
        }
    ]
}

To explode the orders array, set collection to orders.
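Conceptually, exploding orders produces one row per order, with the parent fields repeated on each row. The plain-Python sketch below illustrates that result; the helper `explode` is hypothetical and is not how NCC implements the step. In PySpark, the comparable building block is `pyspark.sql.functions.explode`.

```python
import json
from typing import Any, Dict, List

customer = json.loads("""
{
    "id": 1,
    "name": "John Doe",
    "orders": [
        {"order_id": 101, "amount": 250},
        {"order_id": 102, "amount": 450}
    ]
}
""")


def explode(record: Dict[str, Any], array_field: str) -> List[Dict[str, Any]]:
    """Return one copy of record per element of record[array_field].

    Each copy keeps the parent fields and replaces the array with a single
    element. A missing, null, or empty array yields no rows, which mirrors
    Spark's explode() rather than explode_outer().
    """
    return [{**record, array_field: element} for element in record.get(array_field) or []]


for row in explode(customer, "orders"):
    print(row)
# {'id': 1, 'name': 'John Doe', 'orders': {'order_id': 101, 'amount': 250}}
# {'id': 1, 'name': 'John Doe', 'orders': {'order_id': 102, 'amount': 450}}
```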

Column Mapping

The column_mapping field defines how JSON fields map to DataFrame or Delta Parquet columns. Use it to rename columns or to select specific fields. Each target column name should be unique.

Example JSON:

{
    "id": 1,
    "name": "John Doe",
    "billingAddress": {
        "street": "123 Main St",
        "city": "Anytown"
    },
    "shippingAddress": {
        "street": "123 Main St",
        "city": "Anytown"
    }
}

Example column mapping:

[
    {"source": "id", "target": "user_id"},
    {"source": "name", "target": "full_name"},
    {"source": "`billingAddress`.`street`", "target": "billingAddress_street"},
    {"source": "`billingAddress`.`city`", "target": "billingAddress_city"},
    {"source": "`shippingAddress`.`street`", "target": "shippingAddress_street"},
    {"source": "`shippingAddress`.`city`", "target": "shippingAddress_city"}
]
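Applying a mapping like this amounts to reading each source path and writing its value to the target column. The plain-Python sketch below, built around a hypothetical `apply_mapping` helper, shows the flat row produced from the example JSON. It assumes the backtick-quoting convention described earlier and rejects duplicate target names, which is a choice of this sketch.

```python
import json
from typing import Any, Dict, List


def split_path(path: str) -> List[str]:
    """Split `a`.`b` or a.b into ['a', 'b']; a period inside backticks stays in the name."""
    parts, current, quoted = [], "", False
    for ch in path:
        if ch == "`":
            quoted = not quoted
        elif ch == "." and not quoted:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def apply_mapping(record: Dict[str, Any], mapping: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build a flat row with one target column per mapping entry."""
    targets = [m["target"] for m in mapping]
    if len(targets) != len(set(targets)):
        raise ValueError("Target column names must be unique")
    row: Dict[str, Any] = {}
    for m in mapping:
        value: Any = record
        for key in split_path(m["source"]):
            value = value[key]  # descend one level per path segment
        row[m["target"]] = value
    return row


customer = json.loads("""
{
    "id": 1,
    "name": "John Doe",
    "billingAddress": {"street": "123 Main St", "city": "Anytown"},
    "shippingAddress": {"street": "123 Main St", "city": "Anytown"}
}
""")

column_mapping = [
    {"source": "id", "target": "user_id"},
    {"source": "name", "target": "full_name"},
    {"source": "`billingAddress`.`street`", "target": "billingAddress_street"},
    {"source": "`billingAddress`.`city`", "target": "billingAddress_city"},
    {"source": "`shippingAddress`.`street`", "target": "shippingAddress_street"},
    {"source": "`shippingAddress`.`city`", "target": "shippingAddress_city"},
]

print(apply_mapping(customer, column_mapping))
```

In PySpark, a comparable selection could be written as `df.select([F.col(m["source"]).alias(m["target"]) for m in column_mapping])`, where `F` is `pyspark.sql.functions`; Spark's column-name parser accepts the same backtick-quoted paths.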

Nested arrays example:

{
    "id": 1,
    "name": "John Doe",
    "orders": [
        {
            "order_id": 101,
            "amount": 250,
            "items": [
                {"item_id": 1, "product": "Book"},
                {"item_id": 2, "product": "Pen"}
            ]
        },
        {
            "order_id": 102,
            "amount": 450,
            "items": [
                {"item_id": 3, "product": "Notebook"}
            ]
        }
    ]
}

Example column mapping:

[
    {"source": "id", "target": "user_id"},
    {"source": "name", "target": "full_name"},
    {"source": "`orders`.`order_id`", "target": "order_id"},
    {"source": "`orders`.`amount`", "target": "order_amount"},
    {"source": "`orders`.`items`.`item_id`", "target": "item_id"},
    {"source": "`orders`.`items`.`product`", "target": "product_name"}
]

Configuration example:

# Explode orders first, then the items array inside each order.
collection = "orders;orders.items"

# Same mapping as above: backtick-quoted source paths, one unique target per column.
column_mapping = [
    {"source": "id", "target": "user_id"},
    {"source": "name", "target": "full_name"},
    {"source": "`orders`.`order_id`", "target": "order_id"},
    {"source": "`orders`.`amount`", "target": "order_amount"},
    {"source": "`orders`.`items`.`item_id`", "target": "item_id"},
    {"source": "`orders`.`items`.`product`", "target": "product_name"},
]
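The effect of this configuration can be sketched in plain Python: each semicolon-separated collection path is exploded in turn, and the column mapping is then applied to every resulting row. All helper names are hypothetical, and the sketch assumes each collection path is written from the root of the record (as in orders.items), which matches the example but is not a documented NCC rule. It accepts both quoted and unquoted source paths.

```python
import json
from typing import Any, Dict, List

Record = Dict[str, Any]


def split_path(path: str) -> List[str]:
    """Split `a`.`b` or a.b into ['a', 'b']; a period inside backticks stays in the name."""
    parts, current, quoted = [], "", False
    for ch in path:
        if ch == "`":
            quoted = not quoted
        elif ch == "." and not quoted:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def get_path(record: Any, keys: List[str]) -> Any:
    for key in keys:
        record = record[key]
    return record


def set_path(record: Any, keys: List[str], value: Any) -> Any:
    """Return a copy of record with the value at keys replaced; the input is not mutated."""
    if not keys:
        return value
    head, *rest = keys
    updated = dict(record)
    updated[head] = set_path(record[head], rest, value)
    return updated


def explode_path(records: List[Record], keys: List[str]) -> List[Record]:
    """One output row per element of the array found at keys in each record."""
    return [set_path(r, keys, element) for r in records for element in get_path(r, keys)]


def process(record: Record, collection: str, mapping: List[Dict[str, str]]) -> List[Record]:
    rows = [record]
    # Explode each semicolon-separated collection path in the order given.
    for path in collection.split(";"):
        rows = explode_path(rows, split_path(path))
    # Apply the column mapping to every exploded row.
    return [{m["target"]: get_path(r, split_path(m["source"])) for m in mapping} for r in rows]


customer = json.loads("""
{"id": 1, "name": "John Doe", "orders": [
  {"order_id": 101, "amount": 250,
   "items": [{"item_id": 1, "product": "Book"}, {"item_id": 2, "product": "Pen"}]},
  {"order_id": 102, "amount": 450,
   "items": [{"item_id": 3, "product": "Notebook"}]}
]}
""")

column_mapping = [
    {"source": "id", "target": "user_id"},
    {"source": "name", "target": "full_name"},
    {"source": "`orders`.`order_id`", "target": "order_id"},
    {"source": "`orders`.`amount`", "target": "order_amount"},
    {"source": "`orders`.`items`.`item_id`", "target": "item_id"},
    {"source": "`orders`.`items`.`product`", "target": "product_name"},
]

# Prints three rows: Book and Pen for order 101, Notebook for order 102.
for row in process(customer, "orders;orders.items", column_mapping):
    print(row)
```

Exploding two sibling arrays in one pass would pair every element of one with every element of the other (a cross product), which is one reason to give each non-nested array its own Bronze entity.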

NOTE
Make sure every path in the collection field points to an array in your JSON structure.
Make sure column_mapping maps each source field to the intended target column in the DataFrame.


DateFormat Entity Value

The DateFormat Entity Value allows you to specify a custom date format for all datetime columns in a JSON entity.

[Figure: Example JSON Bronze layer entity with DateFormat]

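By default, dates use the ISO-8601 format. You can change this to another format, such as "dd-MM-yyyy HH:mm:ss" (day, month, four-digit year, then hours on a 24-hour clock, minutes, and seconds), to match regional preferences. In Spark datetime patterns, MM denotes the month and mm denotes minutes, so the two are not interchangeable.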

Note:
The specified format applies to all datetime columns in the JSON entity.
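As an illustration of what reformatting a datetime value involves, the sketch below parses an ISO-8601 timestamp (the same value used in the Collection example) and renders it day-first with Python's `datetime`, then converts it back. Python uses strftime directives (%d, %m, %Y, %H, %M, %S) rather than pattern letters such as dd or yyyy, and the helper `to_iso` is hypothetical; this is not how NCC applies the DateFormat value.

```python
from datetime import datetime

# ISO-8601 value, as in the Collection example earlier in this article.
iso_value = "2024-02-01T13:00:00.000"

# Parse the ISO-8601 string; fromisoformat accepts the 3-digit fraction.
parsed = datetime.fromisoformat(iso_value)

# Day-first rendering: %d day, %m month, %Y four-digit year,
# %H 24-hour clock, %M minutes, %S seconds.
# Comparable to the Spark-style pattern "dd-MM-yyyy HH:mm:ss".
day_first = parsed.strftime("%d-%m-%Y %H:%M:%S")
print(day_first)  # 01-02-2024 13:00:00


def to_iso(value: str, fmt: str = "%d-%m-%Y %H:%M:%S") -> str:
    """Parse a day-first string and return ISO-8601 with millisecond precision."""
    return datetime.strptime(value, fmt).isoformat(timespec="milliseconds")


print(to_iso(day_first))  # 2024-02-01T13:00:00.000
```

In PySpark, the analogous functions are `to_timestamp` and `date_format` in `pyspark.sql.functions`, which take Spark datetime patterns such as "dd-MM-yyyy HH:mm:ss".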