Medallion Architecture in NCC
Learn how the NCC platform uses the Medallion Architecture to organize and manage data in a lakehouse environment. This layered approach improves data consistency, quality, governance, and performance by structuring data into four stages: Landing Zone, Bronze, Silver, and Gold.
What is Medallion Architecture?
Medallion Architecture is a data management pattern that divides data processing into distinct layers. Each layer serves a specific purpose and builds on the previous one, enabling scalable and reliable analytics.
Layers in NCC Medallion Architecture
| Layer | Description |
|---|---|
| Landing Zone | Initial staging area for raw data. No transformation or validation is applied. |
| Bronze | Raw but structured data. Minimal transformations and schema enforcement. |
| Silver | Metadata-enriched, deduplicated data with historical records (SCD2). |
| Gold | Curated, high-quality data for analytics, reporting, and machine learning. |
Landing Zone
The Landing Zone is the entry point for raw data ingestion. Data is stored in its original format, without transformation or validation, serving as a buffer between external sources and the structured Medallion layers.
Key features:
- Supports CSV, JSON, Parquet, and Excel formats.
- Enables auditing, reprocessing, and debugging.
- Uses time-based partitioning for traceability.
- No schema enforcement or quality checks.
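To illustrate time-based partitioning, the sketch below builds a storage path for a file as it lands. The folder layout (`landing/<source>/<entity>/year=…/month=…/day=…/load=…`) and the function name `landing_zone_path` are hypothetical, not NCC's actual layout. The point is that every load gets its own timestamped folder, so the original file can be traced, audited, or reprocessed later.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def landing_zone_path(source: str, entity: str, file_name: str, loaded_at: datetime) -> str:
    """Build a time-partitioned Landing Zone path (hypothetical layout, not NCC's)."""
    # Normalize to UTC so partitions line up regardless of the loader's time zone.
    ts = loaded_at.astimezone(timezone.utc)
    return str(
        PurePosixPath(
            "landing",
            source,
            entity,
            f"year={ts:%Y}",
            f"month={ts:%m}",
            f"day={ts:%d}",
            f"load={ts:%H%M%S}",  # one folder per load keeps each delivery traceable
            file_name,  # the file is stored as-is: no parsing, no validation
        )
    )
```

For example, a CRM export loaded at 14:30 UTC on 5 March 2024 would land under `landing/crm/customers/year=2024/month=03/day=05/load=143000/customers.csv`.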
Bronze Layer
The Bronze Layer contains raw but structured data. Minimal transformations are applied, including parsing, schema enforcement, and basic metadata enrichment. This layer maintains the latest version of each dataset.
Key features:
- Data is readable and queryable.
- Basic quality checks, such as primary key enforcement and type casting.
- Includes ingestion metadata (for example, timestamps).
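The Bronze steps described above (parsing, schema enforcement through type casting, primary key enforcement, and ingestion metadata) can be sketched as follows. This is a minimal illustration, assuming a CSV source; the schema, the primary key, and the metadata column names `_loaded_at` and `_source_file` are hypothetical examples of what a Bronze entity might configure.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical Bronze entity configuration: target column types and primary key.
SCHEMA = {"customer_id": int, "name": str, "credit_limit": float}
PRIMARY_KEY = ("customer_id",)


def parse_to_bronze(raw_csv: str, source_file: str) -> dict[tuple, dict]:
    """Parse raw Landing Zone CSV text into typed Bronze rows keyed by primary key."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    rows: dict[tuple, dict] = {}
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Schema enforcement: keep only configured columns and cast each value.
        # A value that cannot be cast (e.g. int("abc")) raises ValueError.
        row = {col: cast(record[col]) for col, cast in SCHEMA.items()}
        key = tuple(row[k] for k in PRIMARY_KEY)
        # Primary key enforcement: a key may appear only once per load.
        if key in rows:
            raise ValueError(f"duplicate primary key {key} in {source_file}")
        # Basic metadata enrichment.
        row["_loaded_at"] = loaded_at
        row["_source_file"] = source_file
        rows[key] = row
    return rows
```

Because Bronze keeps only the latest version of each record, a stored Bronze table in this sketch could be refreshed with `bronze.update(parse_to_bronze(raw, name))`, where later loads overwrite earlier rows with the same key.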
Silver Layer
The Silver Layer stores metadata-enriched data and maintains historical records using a Slowly Changing Dimension Type 2 (SCD2) approach.
Key features:
- Data is deduplicated.
- Historical versions of records are retained.
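The SCD2 pattern used by the Silver Layer can be illustrated with a small in-memory merge. This is a generic sketch of Type 2 history building, not NCC's implementation: the `Version` record and the `valid_from`/`valid_to` columns are assumptions. Unchanged records produce no new version (deduplication), and changed records close the current version and open a new one. Deleted keys are not handled here, for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Version:
    key: int
    attrs: dict
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means this is the current version

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(history: list[Version], bronze_rows: dict[int, dict], as_of: datetime) -> list[Version]:
    """Merge the latest Bronze snapshot into Silver history (mutates and returns it)."""
    current = {v.key: v for v in history if v.is_current}
    for key, attrs in bronze_rows.items():
        existing = current.get(key)
        if existing is not None and existing.attrs == attrs:
            continue  # unchanged record: skip, so duplicates never become new versions
        if existing is not None:
            existing.valid_to = as_of  # close the previous version
        history.append(Version(key, dict(attrs), valid_from=as_of))  # open a new version
    return history
```

Reloading an identical snapshot leaves the history untouched, while a changed attribute yields two versions of the same key: the closed old one and a new current one.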
Gold Layer
The Gold Layer provides curated, high-quality data for business intelligence, reporting, and machine learning. Business logic, joins, and filtering are applied to prepare data for analytics.
Key features:
- Joins related datasets, for example customer and transaction data.
- Filters out invalid records.
- Aggregates and models data for specific use cases.
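As an example of the Gold steps listed above, the sketch below joins transactions to customers, filters out invalid records, and aggregates revenue per customer. The function name, field names, and validity rules (unknown customer or non-positive amount) are hypothetical business logic chosen for illustration. Customer rows are assumed to come from a Silver entity, where only the current SCD2 version is used.

```python
from collections import defaultdict


def customer_revenue(customers: list[dict], transactions: list[dict]) -> list[dict]:
    """Join transactions to current customers, drop invalid rows, and total per customer."""
    # Use only current Silver versions; rows without the flag are treated as current.
    by_id = {c["customer_id"]: c for c in customers if c.get("is_current", True)}
    totals: dict[int, float] = defaultdict(float)
    for t in transactions:
        # Filter invalid records: unknown customer or non-positive amount.
        if t["customer_id"] not in by_id or t["amount"] <= 0:
            continue
        totals[t["customer_id"]] += t["amount"]
    # Model the result as one row per customer, ordered by customer_id.
    return [
        {"customer_id": cid, "name": by_id[cid]["name"], "total_amount": round(total, 2)}
        for cid, total in sorted(totals.items())
    ]
```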
Medallion Entities in NCC
Entities in NCC are defined for each Medallion Architecture layer:
- Landing Zone entities extract data from external sources into the Landing Zone.
- Bronze entities parse Landing Zone data into the Bronze Layer.
- Silver entities parse Bronze Layer data into the Silver Layer.
- Gold entities model business logic in the Gold Layer.
An entity is a metadata collection in NCC that contains all information required to process data through each Medallion layer. Each entity is tailored to its layer’s requirements. For example, a Landing Zone entity includes connection and data source details, a Bronze entity specifies primary keys and column mappings, and a Silver entity defines how record-level history is built.
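To make the idea of layer-specific metadata concrete, the sketch below models the first three entity types as Python dataclasses. All class and field names are hypothetical and do not reflect NCC's actual metadata schema; they only mirror the examples given above (connection and source details, primary keys and column mappings, history settings).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LandingZoneEntity:
    name: str
    connection: str  # connection used to reach the external data source
    source_object: str  # e.g. a table name, file path, or endpoint
    file_type: str = "csv"  # csv, json, parquet, or excel


@dataclass
class BronzeEntity:
    name: str
    landing_zone: LandingZoneEntity  # the Landing Zone entity this entity parses
    primary_keys: list[str]
    column_mapping: dict[str, str] = field(default_factory=dict)  # source -> target name
    sheet: Optional[str] = None  # e.g. one tab of a multi-tab Excel file


@dataclass
class SilverEntity:
    name: str
    bronze: BronzeEntity  # the single Bronze entity this entity builds history from
    history_columns: list[str]  # columns whose changes create a new SCD2 version
```

With these classes, an Excel workbook configured once as a Landing Zone entity could feed a Bronze entity per tab, for example `BronzeEntity("customers", lz, ["customer_id"], sheet="Customers")`.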
Entity Relationships
Entities in NCC are directly linked, except for Gold entities, which are populated through business logic and may reference zero or more Silver entities:
- Landing Zone entities form the first layer and depend only on a connection.
- One or more Bronze entities can be based on a single Landing Zone entity. For example, if an Excel file with multiple tabs is configured in a Landing Zone entity, each tab can be loaded by a separate Bronze entity.
- Each Silver entity is linked to one Bronze entity to build the slowly changing dimension (SCD2).
- Gold entities are not directly linked to Silver entities in NCC, but typically depend on data loaded in the Silver Layer. A Gold entity can reference zero, one, or many Silver entities as needed for business logic and analytics. This dependency is managed by users.
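The linking rules above can be expressed as a simple validation check. This sketch is an illustration, not NCC functionality: it assumes links are given as plain name mappings (`bronze_to_lz`, `silver_to_bronze`), which is a hypothetical representation. Mapping each Bronze or Silver name to a single parent already enforces "one parent per entity". The function then checks that parents exist and that no Bronze entity feeds more than one Silver entity. Gold entities are not checked, because their dependencies are managed by users.

```python
from collections import Counter


def validate_links(
    landing_zone: set[str],
    bronze_to_lz: dict[str, str],
    silver_to_bronze: dict[str, str],
) -> list[str]:
    """Return a list of link-rule violations (an empty list means the links are valid)."""
    errors = []
    # Each Bronze entity must reference an existing Landing Zone entity (1:* is allowed).
    for bronze, lz in bronze_to_lz.items():
        if lz not in landing_zone:
            errors.append(f"Bronze '{bronze}' references unknown Landing Zone entity '{lz}'")
    # Each Silver entity must reference an existing Bronze entity.
    for silver, bronze in silver_to_bronze.items():
        if bronze not in bronze_to_lz:
            errors.append(f"Silver '{silver}' references unknown Bronze entity '{bronze}'")
    # A Bronze entity may feed at most one Silver entity (1:1 or 1:0).
    for bronze, count in sorted(Counter(silver_to_bronze.values()).items()):
        if count > 1:
            errors.append(f"Bronze '{bronze}' feeds {count} Silver entities; at most 1 is allowed")
    return errors
```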
Schematic Overview of Entity Relationships
```mermaid
flowchart TD
    subgraph Medallion Layers
        LZ["Landing Zone Entity"]
        BZ1["Bronze Entity 1"]
        BZ2["Bronze Entity ..."]
        SZ1["Silver Entity 1"]
        SZ2["Silver Entity ..."]
        subgraph Gold Layers
            GE1["Gold Entity 1"]
            GE2["Gold Entity 2"]
            GE3["Gold Entity 3"]
        end
    end
    ext["External Data Source"]
    conn["Connection"]
    ext --> conn
    conn --> LZ
    LZ --> BZ1
    BZ1 --> SZ1
    LZ -.-> BZ2
    BZ2 -.-> SZ2
    SZ1 -.-> GE1
    SZ2 -.-> GE1
    SZ1 -.-> GE2
    GE3
```
NOTE
The diagram above illustrates the relationships between entities in the NCC Medallion Architecture.
There can be a `1:*` relationship between Landing Zone entities and Bronze entities, and a `1:1` or `1:0` relationship between Bronze and Silver entities.
Gold entities are not directly linked but can reference zero, one, or many Silver entities, depending on business logic requirements.