This article describes how to configure and manage PySpark notebook data sources in the NCC Portal.
TIP
For instructions on creating a custom PySpark notebook, see Create a custom notebook.
Prerequisites
- Access to the NCC Portal.
- Required permissions to add data sources and entities.
Step 1: Set Up the PySpark Notebook Connection
No connection needs to be created in Fabric for this data source.
Step 2: Add a Data Source
- In the NCC Portal, select Tenant Settings > Data Sources.
- Select Add DataSource.
- Complete the following fields:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Name | Name of the data source in NCC | |
| Data Source Type | Type of data source | NOTEBOOK |
| Namespace | Prefix for storing data in Lakehouses | |
| Code | Identifier for pipelines | NB |
| Description | Description of the data source | |
| Connection | Name of the connection in Fabric | (not required) |
| Environment | NCC environment for the data source | Development |
Step 3: Create a Landing Zone Entity
- Go to Landing Zone Entities.
- Select New Entity.
- Enter the required details:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Pipeline | Not used | |
| Data Source | Data source for the connection | Set in the previous step |
| Source schema | Parameter for custom notebook | |
| Source name | Parameter for custom notebook | |
| Incremental | Extract data incrementally | False |
| Has encrypted columns | Indicates if table has sensitive data | False |
| Entity value | NotebookName required, CustomParametersJSON optional. Entity values reference | |
| Lake house | Lakehouse for storing data | LH_Data_Landingzone |
| File path | File path for data storage | Filled automatically |
| File name | File name for data storage | Filled automatically |
| File type | Expected file type | For example: Json, Csv, Parquet, Xlsx, Txt, Xml |
Step 4: Create a Bronze Zone Entity
- Go to Bronze Zone Entities.
- Select New Entity.
- Enter the following information:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Pipeline | Orchestrator pipeline for parsing | PL_BRZ_COMMAND |
| Landing zone entity | Landing zone entity to be parsed | InSpark_Sales/total_sales |
| Entity value | Optional. Entity values reference | |
| Column mappings | Optional. Column mapping info | |
| Lake house | Lakehouse for storing data | LH_Bronze_Layer |
| Schema | Schema for storing data | dbo |
| Name | Table name for storing data | total_sales |
| Primary keys | Unique identifier fields (case sensitive) | id |
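The primary keys determine how incoming rows are merged into the bronze table. The following sketch illustrates the general upsert idea with plain Python; the table and column names (`total_sales`, `id`) come from the example above, but the merge logic is an illustration, not the NCC implementation.

```python
# Illustrative sketch of key-based upsert semantics: rows whose key columns
# already exist are updated, new keys are inserted. This is NOT the NCC
# pipeline code, only a demonstration of what "Primary keys" controls.

def upsert(existing, incoming, keys):
    """Merge incoming rows into existing rows on the given key columns."""
    index = {tuple(row[k] for k in keys): row for row in existing}
    for row in incoming:
        index[tuple(row[k] for k in keys)] = row  # update on match, else insert
    return list(index.values())

existing = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
incoming = [{"id": 2, "amount": 300}, {"id": 3, "amount": 50}]

merged = upsert(existing, incoming, keys=["id"])
# id 2 is updated, id 3 is appended, id 1 is untouched
```

Because the key comparison is case sensitive, `id` and `Id` would be treated as different columns.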
Step 5: Create a Silver Zone Entity
- Go to Silver Zone Entities.
- Select New Entity.
- Provide the following details:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Pipeline | Orchestrator pipeline for parsing | PL_SLV_COMMAND |
| Bronze layer entity | Bronze layer entity to be parsed | dbo.total_sales |
| Entity value | Optional. Entity values reference | |
| Lake house | Lakehouse for storing data | LH_Silver_Layer |
| Schema | Schema for storing data | dbo |
| Name | Table name for storing data | total_sales |
| Columns to exclude | Comma-separated columns to exclude (case sensitive) | |
| Columns to exclude from history | Comma-separated columns to exclude from compare (case sensitive) | |
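The "Columns to exclude from history" setting means that two versions of a row are treated as unchanged when they differ only in the excluded columns (for example, a load timestamp). A minimal sketch of that comparison, assuming a hypothetical `loaded_at` column:

```python
# Illustrative sketch (not the NCC implementation) of excluding columns from
# the history comparison: a row counts as changed only if a non-excluded
# column differs.

def has_changed(old_row, new_row, exclude_from_history):
    compared = set(old_row) - set(exclude_from_history)
    return any(old_row[c] != new_row[c] for c in compared)

old = {"id": 1, "amount": 100, "loaded_at": "2024-01-01"}
new = {"id": 1, "amount": 100, "loaded_at": "2024-01-02"}

unchanged = has_changed(old, new, exclude_from_history=["loaded_at"])          # False
changed = has_changed(old, {**new, "amount": 120}, ["loaded_at"])              # True
```

As with primary keys, the column names here are case sensitive.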
Example Configuration
The company InSpark has created a PySpark notebook named NB_CUST_Get_Total_Sales_data in Fabric. This notebook extracts a file named total_sales.csv and expects one parameter: the name of the file to extract (total_sales.csv). The following configuration demonstrates how to set up this scenario.
Data Source
| Field | Value |
| --- | --- |
| Name | InSpark |
| Data Source Type | NOTEBOOK |
| Namespace | InSpark_Sales |
| Code | NB |
| Description | Custom notebooks for InSpark |
| Connection | |
| Environment | Development |
Landing Zone Entity
| Field | Value |
| --- | --- |
| Pipeline | PL_LDZ_COPY_FROM_ADF_PIPELINE |
| Data Source | InSpark |
| Source schema | sales |
| Source name | total_sales |
| Incremental | False |
| Entity value | |
| Lake house | LH_Data_Landingzone |
| File path | InSpark_Sales |
| File name | total_sales |
| File type | Parquet |
Example Entity value:
| Name | Value |
| --- | --- |
| CustomParametersJSON | {"FileName": "total_sales.csv"} |
| NotebookName | NB_CUST_Get_Total_Sales_data |
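Inside the custom notebook, CustomParametersJSON is assumed to arrive as a JSON string parameter. A hedged sketch of how the notebook might parse it; the parameter name and shape match the entity value above, but the parsing code itself is illustrative:

```python
import json

# Assumption: the orchestrator passes CustomParametersJSON to the notebook as
# a string parameter. In a real notebook this would arrive via a parameter
# cell; here it is hard-coded for illustration.
custom_parameters_json = '{"FileName": "total_sales.csv"}'

params = json.loads(custom_parameters_json)
file_name = params["FileName"]
# The notebook would then read file_name (total_sales.csv) and write the
# result to the landing zone Lakehouse.
```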
Bronze Zone Entity
| Field | Value |
| --- | --- |
| Pipeline | PL_BRZ_COMMAND |
| Landing zone entity | InSpark_Sales/total_sales |
| Entity value | |
| Column mappings | |
| Lake house | LH_Bronze_Layer |
| Schema | dbo |
| Name | total_sales |
| Primary keys | id |
Example Entity value:
| Name | Value |
| --- | --- |
| ColumnDelimiter | , |
| CompressionType | none |
| Encoding | UTF-8 |
| EscapeCharacter | \ |
| FirstRowIsHeader | 1 |
| RowDelimiter | \r\n |
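These entity values describe how a delimited landing-zone file should be parsed. The sketch below demonstrates the effect of the delimiter, escape character, and header settings using Python's standard `csv` module; NCC's pipelines would apply the equivalent options in their own reader, so this is an illustration, not the pipeline code.

```python
import csv
import io

# Entity values from the table above, expressed as a plain dict.
entity_values = {
    "ColumnDelimiter": ",",
    "EscapeCharacter": "\\",
    "FirstRowIsHeader": "1",
}

# Hypothetical file content for the total_sales example.
raw = "id,amount\n1,100\n2,250\n"

reader = csv.reader(
    io.StringIO(raw),
    delimiter=entity_values["ColumnDelimiter"],
    escapechar=entity_values["EscapeCharacter"],
)
rows = list(reader)

# FirstRowIsHeader = 1 means the first row is treated as column names.
if entity_values["FirstRowIsHeader"] == "1":
    header, data = rows[0], rows[1:]
else:
    header, data = None, rows
# header -> ['id', 'amount']; data -> [['1', '100'], ['2', '250']]
```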
Silver Zone Entity
| Field | Value |
| --- | --- |
| Pipeline | PL_SLV_COMMAND |
| Bronze layer entity | dbo.total_sales |
| Entity value | |
| Lake house | LH_Silver_Layer |
| Schema | dbo |
| Name | total_sales |
| Columns to exclude | |
| Columns to exclude from history | |
Next steps