Bronze Entities in Medallion Architecture
Bronze entities are the initial stage of processed data in the Medallion Architecture. After data is ingested into the landing zone, it transitions to the bronze stage for preliminary cleaning and transformation. At this stage, duplicates are removed and basic formatting is applied, preparing the data for further enrichment and analysis in the silver and gold stages.
How bronze entities relate to landing zone entities
A single landing zone entity can be associated with multiple bronze entities. This flexible relationship enables efficient data processing and organization as data moves through the architecture.
Bronze entity values
Bronze entities support a range of configuration values. The following tables describe the available options for each data source type.
All data source types
| Name | Description | Used for | Required | Default |
|---|---|---|---|---|
| timeout_total_in_seconds | Maximum duration for processing the entity (in seconds) | All Sources | No | 43200 |
| timeout_per_cell_in_seconds | Maximum duration for processing a notebook cell (in seconds) | All Sources | No | 1800 |
| valid_dq_deduplication_modes | Deduplication method if primary keys are insufficient | All Sources | No | none (other valid values: row, key) |
TIP
For more information about the Data Quality options and examples, go to Data-Quality.
CSV data source types
| Name | Description | Used for | Required | Default |
|---|---|---|---|---|
| CompressionType | Compression type for CSV files | CSV Source | Yes, if CSV | none (valid values: gzip, bzip2, lz4, snappy, deflate) |
| ColumnDelimiter | Character used to separate columns | CSV Source | Yes, if CSV | ; |
| RowDelimiter | Character used to separate rows | CSV Source | Yes, if CSV | \r\n |
| EscapeCharacter | Character used for escaping | CSV Source | Yes, if CSV | \ |
| Encoding | Encoding format for the data | CSV Source | Yes, if CSV | UTF-8 |
| FirstRowIsHeader | Indicates if the first row contains headers | CSV Source | Yes, if CSV | 1 (0 = False, 1 = True) |
TIP
If FirstRowIsHeader is False, the columns are named _c0 ... _c99.
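To illustrate how these CSV values interact, the sketch below parses a small semicolon-delimited sample with Python's standard csv module. It is not the engine's actual reader. The function name read_csv_rows and the sample data are hypothetical, and the code only mimics the documented defaults (ColumnDelimiter ;, EscapeCharacter \, FirstRowIsHeader) and the positional _c0, _c1, ... naming when no header row is present.

```python
import csv
import io


def read_csv_rows(text, column_delimiter=";", escape_character="\\",
                  first_row_is_header=1):
    """Parse CSV text into a list of dicts (illustrative only)."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=column_delimiter,
        escapechar=escape_character,
    )
    rows = list(reader)
    if not rows:
        return []
    if first_row_is_header:
        # The first row supplies the column names.
        header, data = rows[0], rows[1:]
    else:
        # No header row: generate positional names _c0, _c1, ...
        header = [f"_c{i}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(header, row)) for row in data]


# Hypothetical sample using the default RowDelimiter \r\n.
sample = "id;name\r\n1;Anna\r\n2;Tom\r\n"
with_header = read_csv_rows(sample, first_row_is_header=1)
without_header = read_csv_rows(sample, first_row_is_header=0)
```

With FirstRowIsHeader set to 0, the first line is treated as data, so the sample yields three records keyed by _c0 and _c1 instead of two records keyed by id and name.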
JSON data source types
| Name | Description | Used for | Required | Default |
|---|---|---|---|---|
| Collection | Name of the collection to extract data from | JSON Source | No | |
| DateFormat | Format for date values (ISO-8601 by default) | JSON Source | No | |
| Multiline | Indicates if the JSON contains Multiline data | JSON Source | No | false |
TIP
For more information about the Entity values or examples, go to Collection and DateFormat.
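The Multiline value can be pictured with a short sketch. This page does not spell out its exact semantics, so the code below assumes the common convention: false means one complete JSON record per line, and true means a single JSON document that spans several lines. The function parse_json and the sample strings are hypothetical and do not represent the engine's implementation.

```python
import json


def parse_json(text, multiline=False):
    """Return a list of records from JSON text (illustrative only)."""
    if multiline:
        # Assumption: the whole text is one JSON document.
        document = json.loads(text)
        return document if isinstance(document, list) else [document]
    # Assumption: one complete JSON record per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


json_lines = '{"id": 1}\n{"id": 2}\n'
pretty_printed = '[\n  {"id": 1},\n  {"id": 2}\n]'

records_a = parse_json(json_lines, multiline=False)
records_b = parse_json(pretty_printed, multiline=True)
```

Under this assumption both inputs produce the same two records. Parsing the pretty-printed input line by line would fail, because no individual line is valid JSON on its own.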
'Fixed width' data source types
| Name | Description | Used for | Required | Default |
|---|---|---|---|---|
| TxtType | Must be specified if the txt file is of type 'fixed width' | Fixed width Source | Yes | FixedWidth |
| FirstRow | The row where the header is located | Fixed width Source | No | False |
Column mapping defines how fixed‑width text files are sliced. SourceColumn must always be "start,width" (start is 1‑based). TargetColumn is the name of the resulting column. The engine extracts width characters starting at start, trims the value, and assigns it to TargetColumn. Example:
[
{
"SourceColumn": "1,5",
"TargetColumn": "ID"
},
{
"SourceColumn": "6,10",
"TargetColumn": "Name"
}
]
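The slicing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's code. The function slice_fixed_width and the sample line are hypothetical; the mapping is the one from the example.

```python
def slice_fixed_width(line, column_mapping):
    """Cut one fixed-width line into named, trimmed values."""
    record = {}
    for mapping in column_mapping:
        start_text, width_text = mapping["SourceColumn"].split(",")
        start, width = int(start_text), int(width_text)
        # start is 1-based, while Python slicing is 0-based.
        begin = start - 1
        record[mapping["TargetColumn"]] = line[begin:begin + width].strip()
    return record


column_mapping = [
    {"SourceColumn": "1,5", "TargetColumn": "ID"},
    {"SourceColumn": "6,10", "TargetColumn": "Name"},
]

# Hypothetical input: a 5-character ID followed by a 10-character name field.
line = "00042" + "Anna".ljust(10)
record = slice_fixed_width(line, column_mapping)
```

Here ID receives characters 1 to 5 and Name receives characters 6 to 15, with the padding spaces trimmed.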
Excel data source types
| Name | Description | Used for | Required | Default |
|---|---|---|---|---|
| SheetName | Name of the sheet containing the data | Excel Source | Yes, if Excel | |
| FirstRowIsHeader | Row number where the header is located | Excel Source | Yes, if Excel | 1 |
| ColumnRange | Range of columns to extract | Excel Source | Yes, if Excel | A:E |
| RowsRange | Range of rows to extract | Excel Source | Yes, if Excel | ALL |
| NaValues | Strings to treat as null values | Excel Source | No | NONE |
| Thousands | Character used as thousands separator | Excel Source | No | |
| Decimal | Character used as decimal separator | Excel Source | No | , |
| Comment | Rows containing comments to exclude | Excel Source | No | |
| SkipRows | Rows to skip | Excel Source | No | |
XML data source types
| Name | Description | Used for | Required | Default |
|---|---|---|---|---|
| RootTag | The single outermost element that encloses all other elements in a document | XML Source | Yes | |
| RowTag | Refers to a repeating element within the root (or another parent) that represents a single record or data row | XML Source | No | |
XML example
<library>
<name>Liberty Library</name>
<books>
<book>
<title>De Ontdekking van de Hemel</title>
<author>Harry Mulisch</author>
</book>
<book>
<title>Het Diner</title>
<author>Herman Koch</author>
</book>
</books>
<members>
<member>
<name>Anna</name>
<borrowed>
<book>Het Diner</book>
</borrowed>
</member>
<member>
<name>Tom</name>
</member>
</members>
</library>
Translated into a diagram:
graph TD
A --> B[books]
A[library] --> A1[name: Liberty Library]
A --> C[members]
B --> D[book 1]
B --> E[book 2]
D --> D1[title: De Ontdekking van de Hemel]
D --> D2[author: Harry Mulisch]
E --> E1[title: Het Diner]
E --> E2[author: Herman Koch]
C --> F[member 1]
C --> G[member 2]
F --> F1[name: Anna]
F --> F2[borrowed]
F2 --> F21[book: Het Diner]
G --> G1[name: Tom]
There are a couple of ways to configure this XML, each with a different result:
Option 1: Only RootTag
When RootTag is set to library, the result in Bronze will be:
| name | books | members |
|---|---|---|
| Liberty Library | [{"title": "De Ontdekking van de Hemel", "author": "Harry Mulisch"}, {"title": "Het Diner", "author": "Herman Koch"}] | [{"name": "Anna", "borrowed": [{"book": "Het Diner"}]}, {"name": "Tom", "borrowed": []}] |
Option 2: Set RootTag on the outermost layer and RowTag on the innermost level
When RootTag is set to library and RowTag is set to borrowed, the result in Bronze will be:
| name | members_member_name | members_member_borrowed_book |
|---|---|---|
| Liberty Library | Anna | Het Diner |
IMPORTANT
Because Tom hasn't borrowed any books, his record no longer appears when the RowTag is set at this level.
Parent element names are prefixed to the element names, for example members_member_name.
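The Option 2 result can be reproduced with a small sketch built on Python's xml.etree.ElementTree. This is not the engine's implementation. The function rows_for_row_tag is hypothetical, and its flattening rules (one row per RowTag element, text-only children of each ancestor become columns, parent names joined with underscores) are inferred from the example table.

```python
import xml.etree.ElementTree as ET

XML = """
<library>
  <name>Liberty Library</name>
  <books>
    <book><title>De Ontdekking van de Hemel</title><author>Harry Mulisch</author></book>
    <book><title>Het Diner</title><author>Herman Koch</author></book>
  </books>
  <members>
    <member><name>Anna</name><borrowed><book>Het Diner</book></borrowed></member>
    <member><name>Tom</name></member>
  </members>
</library>
"""


def rows_for_row_tag(xml_text, root_tag, row_tag):
    """Emit one flattened row per row_tag element (illustrative only)."""
    root = ET.fromstring(xml_text)
    if root.tag != root_tag:
        root = root.find(f".//{root_tag}")
    rows = []

    def leaf_values(element, prefix):
        # Direct children that hold only text become columns.
        return {prefix + child.tag: (child.text or "").strip()
                for child in element if len(child) == 0}

    def walk(element, prefix, inherited):
        current = {**inherited, **leaf_values(element, prefix)}
        if element.tag == row_tag:
            rows.append(current)
            return
        for child in element:
            if len(child) > 0:
                # Carry ancestor columns down and extend the name prefix.
                walk(child, prefix + child.tag + "_", current)

    walk(root, "", {})
    return rows


rows = rows_for_row_tag(XML, root_tag="library", row_tag="borrowed")
```

Tom's member element contains no borrowed element, so the walk never reaches a RowTag beneath it and no row is produced for him.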
Option 3: Set RootTag on an inner layer
When RootTag is set to books, the result in Bronze will be:
| book_title | book_author |
|---|---|
| De Ontdekking van de Hemel | Harry Mulisch |
| Het Diner | Herman Koch |
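As a rough illustration of Option 3, the sketch below selects an inner element as the root and turns each of its children into one row. The function rows_under_root_tag is hypothetical and the XML is a shortened copy of the example. The column naming (child tag, underscore, field tag) is inferred from the table above, not taken from the engine.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example document (members omitted).
XML = """
<library>
  <name>Liberty Library</name>
  <books>
    <book><title>De Ontdekking van de Hemel</title><author>Harry Mulisch</author></book>
    <book><title>Het Diner</title><author>Herman Koch</author></book>
  </books>
</library>
"""


def rows_under_root_tag(xml_text, root_tag):
    """Return one row per child of the chosen root element (illustrative only)."""
    document = ET.fromstring(xml_text)
    if document.tag == root_tag:
        root = document
    else:
        # Pick the first descendant whose tag matches root_tag.
        root = document.find(f".//{root_tag}")
    rows = []
    for record in root:
        # Prefix each field with the name of its parent element.
        rows.append({f"{record.tag}_{field.tag}": (field.text or "").strip()
                     for field in record})
    return rows


rows = rows_under_root_tag(XML, root_tag="books")
```

Everything outside the chosen root, such as the library name, is not part of the result.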
How to create a bronze entity
To create a bronze entity:
- Use the Entity Wizard.
- Alternatively, create an entity directly from the landing zone entities panel.