Bronze Entities in Medallion Architecture
Bronze entities are the initial stage of processed data in the Medallion Architecture. After data is ingested into the landing zone, it transitions to the bronze stage for preliminary cleaning and transformation. At this stage, duplicates are removed and basic formatting is applied, preparing the data for further enrichment and analysis in the silver and gold stages.
How bronze entities relate to landing zone entities
A single landing zone entity can be associated with multiple bronze entities. This flexible relationship enables efficient data processing and organization as data moves through the architecture.
Bronze entity values
Bronze entities support a range of configuration values. The following tables describe the available options for each data source type.
All data source types
| Name | Description | Used for | Required | Default |
|---|---|---|---|---|
| timeout_total_in_seconds | Maximum duration for processing the entity (in seconds) | All Sources | No | 43200 |
| timeout_per_cell_in_seconds | Maximum duration for processing a notebook cell (in seconds) | All Sources | No | 1800 |
| valid_dq_deduplication_modes | Deduplication method if primary keys are insufficient | All Sources | No | none (other valid values: row, key) |
TIP
For more information about the Data Quality options and examples, go to Data-Quality.
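The defaults in the table above can be pictured with a small Python sketch. Everything here is illustrative: the `BronzeEntitySettings` class, the `dq_deduplication_mode` field name, and the `deduplicate` helper are hypothetical, and the meaning given to the `row` and `key` modes (drop fully identical rows, or keep one row per primary key) is an assumption rather than documented behavior. See the Data-Quality page for the actual semantics.

```python
from dataclasses import dataclass

# Values listed in the table above.
VALID_DQ_DEDUPLICATION_MODES = ("none", "row", "key")


@dataclass
class BronzeEntitySettings:
    """Hypothetical container for the 'All Sources' values."""
    timeout_total_in_seconds: int = 43200    # default from the table (12 hours)
    timeout_per_cell_in_seconds: int = 1800  # default from the table (30 minutes)
    dq_deduplication_mode: str = "none"      # hypothetical field name

    def __post_init__(self):
        if self.dq_deduplication_mode not in VALID_DQ_DEDUPLICATION_MODES:
            raise ValueError(
                f"Unknown deduplication mode: {self.dq_deduplication_mode!r}"
            )


def deduplicate(rows, mode, primary_keys=()):
    """Deduplicate a list of dict rows, keeping the first occurrence.

    Assumed semantics (not confirmed by the documentation):
      none: return the rows unchanged
      row:  drop rows that are identical in every column
      key:  keep one row per combination of primary key values
    """
    if mode == "none":
        return list(rows)
    if mode == "row":
        def identity(row):
            return tuple(sorted(row.items()))
    elif mode == "key":
        def identity(row):
            return tuple(row[key] for key in primary_keys)
    else:
        raise ValueError(f"Unknown deduplication mode: {mode!r}")

    seen = set()
    result = []
    for row in rows:
        marker = identity(row)
        if marker not in seen:
            seen.add(marker)
            result.append(row)
    return result
```

Under these assumptions, `row` is the safer choice when the primary keys do not uniquely identify a record, because `key` discards rows that share a key but differ in other columns.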
CSV data source types
| Name | Description | Used for | Required | Default |
|---|---|---|---|---|
| CompressionType | Compression type for CSV files | CSV Source | Yes, if CSV | none (valid values: gzip, bzip2, lz4, snappy, deflate) |
| ColumnDelimiter | Character used to separate columns | CSV Source | Yes, if CSV | ; |
| RowDelimiter | Character used to separate rows | CSV Source | Yes, if CSV | \r\n |
| EscapeCharacter | Character used for escaping | CSV Source | Yes, if CSV | \ |
| Encoding | Encoding format for the data | CSV Source | Yes, if CSV | UTF-8 |
| FirstRowIsHeader | Indicates if the first row contains headers | CSV Source | Yes, if CSV | 1 (0 = False, 1 = True) |
TIP
If FirstRowIsHeader is False, the columns are named _c0 ... _c99.
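To make the CSV options concrete, here is a minimal sketch that reads CSV bytes with Python's standard `csv` module using the defaults from the table. The `read_csv_rows` function and its parameter names are hypothetical and only mirror the option names; this is not how the platform itself reads files. Only `none` and `gzip` compression are handled in this sketch.

```python
import csv
import gzip
import io


def read_csv_rows(raw, column_delimiter=";", escape_character="\\",
                  encoding="UTF-8", first_row_is_header=1,
                  compression_type="none"):
    """Parse CSV bytes into a list of dicts (illustrative helper)."""
    if compression_type == "gzip":
        raw = gzip.decompress(raw)
    elif compression_type != "none":
        raise NotImplementedError(f"Not covered here: {compression_type}")

    text = raw.decode(encoding)
    # csv.reader recognises \r\n and \n row endings on its own, so the
    # RowDelimiter option needs no separate handling in this sketch.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=column_delimiter,
        escapechar=escape_character,
    )
    records = [record for record in reader if record]  # skip blank lines
    if not records:
        return []

    if first_row_is_header:
        header, data = records[0], records[1:]
    else:
        # Without a header row, generate positional names: _c0, _c1, ...
        header = [f"_c{i}" for i in range(len(records[0]))]
        data = records
    return [dict(zip(header, record)) for record in data]
```

For example, `read_csv_rows(b"id;name\r\n1;Ada\r\n", first_row_is_header=0)` yields rows keyed by `_c0` and `_c1`, with the original header line treated as data.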
JSON data source types
| Name | Description | Used for | Required | Default |
|---|---|---|---|---|
| Collection | Name of the collection to extract data from | JSON Source | No | |
| DateFormat | Format for date values (ISO-8601 by default) | JSON Source | No | |
| multiline | Indicates if the JSON contains multiline data | JSON Source | No | false |
TIP
For more information about the Entity values or examples, go to Collection and DateFormat.
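The sketch below illustrates one plausible reading of the `multiline` and `Collection` values, using only Python's standard `json` module. It assumes that `multiline = false` means one JSON document per line and that `Collection` names a key whose value holds the records; both are assumptions for illustration, and `read_json_records` is a hypothetical helper. The Collection and DateFormat pages describe the actual behavior.

```python
import json


def read_json_records(text, multiline=False, collection=None):
    """Return a flat list of records from JSON text (illustrative helper)."""
    if multiline:
        # The whole text is a single JSON document spanning several lines.
        documents = [json.loads(text)]
    else:
        # Assumed: one complete JSON document per non-empty line.
        documents = [
            json.loads(line) for line in text.splitlines() if line.strip()
        ]

    records = []
    for document in documents:
        # Assumed: Collection is a key in the document that holds the records.
        payload = document[collection] if collection else document
        if isinstance(payload, list):
            records.extend(payload)
        else:
            records.append(payload)
    return records
```

With these assumptions, a pretty-printed file such as `{"orders": [{"id": 1}, {"id": 2}]}` spread over several lines would need `multiline=True` and `collection="orders"` to produce two records.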
Excel data source types
| Name | Description | Used for | Required | Default |
|---|---|---|---|---|
| SheetName | Name of the sheet containing the data | Excel Source | Yes, if Excel | |
| FirstRowIsHeader | Row number where the header is located | Excel Source | Yes, if Excel | 1 |
| ColumnRange | Range of columns to extract | Excel Source | Yes, if Excel | A:E |
| RowsRange | Range of rows to extract | Excel Source | Yes, if Excel | ALL |
| NaValues | Strings to treat as null values | Excel Source | No | NONE |
| Thousands | Character used as thousands separator | Excel Source | No | |
| Decimal | Character used as decimal separator | Excel Source | No | , |
| Comment | Rows containing comments to exclude | Excel Source | No | |
| SkipRows | Rows to skip | Excel Source | No | |
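A value such as `A:E` for ColumnRange uses Excel column letters. The following sketch shows how such a range could be translated into zero-based column positions. The helper names are hypothetical, and accepting a single letter such as `C` as a one-column range is an assumption of this sketch, not documented behavior.

```python
def column_letters_to_index(letters):
    """Convert Excel column letters to a zero-based index (A -> 0, AA -> 26)."""
    if not letters:
        raise ValueError("Empty column reference")
    index = 0
    for char in letters.upper():
        if not "A" <= char <= "Z":
            raise ValueError(f"Invalid column reference: {letters!r}")
        # Column letters are a base-26 number with digits A=1 ... Z=26.
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def parse_column_range(column_range):
    """Expand a range such as 'A:E' into a list of zero-based column indexes."""
    start, separator, end = column_range.strip().partition(":")
    if not separator:
        end = start  # assumed: a single letter selects one column
    first = column_letters_to_index(start)
    last = column_letters_to_index(end)
    if first > last:
        raise ValueError(f"Range is reversed: {column_range!r}")
    return list(range(first, last + 1))
```

The default `A:E` therefore covers the first five columns of the sheet.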
How to create a bronze entity
To create a bronze entity:
- Use the Entity Wizard.
- Alternatively, create an entity directly from the landing zone entities panel.