This article describes how to configure and manage PySpark notebook data sources in the NCC Portal.
TIP
For instructions on creating a custom PySpark notebook, see Create a custom notebook.
Prerequisites
- Access to the NCC Portal.
- Required permissions to add data sources and entities.
Step 1: Set Up the PySpark Notebook Connection
No connection needs to be created in Fabric for this data source.
Step 2: Add a Data Source
- In the NCC Portal, select Tenant Settings > Data Sources.
- Select Add DataSource.
- Complete the following fields:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Name | Name of the data source in NCC | |
| Data Source Type | Type of data source | NOTEBOOK |
| Namespace | Prefix for storing data in Lakehouses | |
| Code | Identifier for pipelines | NB |
| Description | Description of the data source | |
| Connection | Name of the connection in Fabric | (not required) |
| Environment | NCC environment for the data source | Development |
Step 3: Create a Landing Zone Entity
- Go to Landing Zone Entities.
- Select New Entity.
- Enter the required details:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Pipeline | Not used | |
| Data Source | Data source for the connection | Set in the previous step |
| Source schema | Parameter for custom notebook | |
| Source name | Parameter for custom notebook | |
| Incremental | Extract data incrementally | False |
| Has encrypted columns | Indicates if table has sensitive data | False |
| Entity value | NotebookName required, CustomParametersJSON optional. Entity values reference | |
| Lake house | Lakehouse for storing data | LH_Data_Landingzone |
| File path | File path for data storage | Filled automatically |
| File name | File name for data storage | Filled automatically |
| File type | Expected file type | For example: Json, Csv, Parquet, Xlsx, Txt, Xml |
Step 4: Create a Bronze Zone Entity
- Go to Bronze Zone Entities.
- Select New Entity.
- Enter the following information:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Pipeline | Orchestrator pipeline for parsing | PL_BRZ_COMMAND |
| Landing zone entity | Landing zone entity to be parsed | InSpark_Sales/total_sales |
| Entity value | Optional. Entity values reference | |
| Column mappings | Optional. Column mapping info | |
| Lake house | Lakehouse for storing data | LH_Bronze_Layer |
| Schema | Schema for storing data | dbo |
| Name | Table name for storing data | total_sales |
| Primary keys | Unique identifier fields (case sensitive) | id |
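The primary keys determine how incoming rows are merged into the bronze table. The following sketch illustrates the general upsert idea with plain Python; the table and column names (`total_sales`, `id`) come from the example above, but the merge logic is an illustration, not the NCC implementation.

```python
# Illustrative sketch of key-based upsert semantics: rows whose key columns
# already exist are updated, new keys are inserted. This is NOT the NCC
# pipeline code, only a demonstration of what "Primary keys" controls.

def upsert(existing, incoming, keys):
    """Merge incoming rows into existing rows on the given key columns."""
    index = {tuple(row[k] for k in keys): row for row in existing}
    for row in incoming:
        index[tuple(row[k] for k in keys)] = row  # update on match, else insert
    return list(index.values())

existing = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
incoming = [{"id": 2, "amount": 300}, {"id": 3, "amount": 50}]

merged = upsert(existing, incoming, keys=["id"])
# id 2 is updated, id 3 is appended, id 1 is untouched
```

Because the key comparison is case sensitive, `id` and `Id` would be treated as different columns.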
Step 5: Create a Silver Zone Entity
- Go to Silver Zone Entities.
- Select New Entity.
- Provide the following details:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Pipeline | Orchestrator pipeline for parsing | PL_SLV_COMMAND |
| Bronze layer entity | Bronze layer entity to be parsed | dbo.total_sales |
| Entity value | Optional. Entity values reference | |
| Lake house | Lakehouse for storing data | LH_Silver_Layer |
| Schema | Schema for storing data | dbo |
| Name | Table name for storing data | total_sales |
| Columns to exclude | Comma-separated columns to exclude (case sensitive) | |
| Columns to exclude from history | Comma-separated columns to exclude from compare (case sensitive) | |
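The "Columns to exclude from history" setting means that two versions of a row are treated as unchanged when they differ only in the excluded columns (for example, a load timestamp). A minimal sketch of that comparison, assuming a hypothetical `loaded_at` column:

```python
# Illustrative sketch (not the NCC implementation) of excluding columns from
# the history comparison: a row counts as changed only if a non-excluded
# column differs.

def has_changed(old_row, new_row, exclude_from_history):
    compared = set(old_row) - set(exclude_from_history)
    return any(old_row[c] != new_row[c] for c in compared)

old = {"id": 1, "amount": 100, "loaded_at": "2024-01-01"}
new = {"id": 1, "amount": 100, "loaded_at": "2024-01-02"}

unchanged = has_changed(old, new, exclude_from_history=["loaded_at"])          # False
changed = has_changed(old, {**new, "amount": 120}, ["loaded_at"])              # True
```

As with primary keys, the column names here are case sensitive.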
Example Configuration
The company InSpark has created a PySpark notebook named NB_CUST_Get_Total_Sales_data in Fabric. This notebook extracts a file named total_sales.csv and expects one parameter: the name of the file to extract (total_sales.csv). The following configuration demonstrates how to set up this scenario.
Data Source
| Field | Value |
| --- | --- |
| Name | InSpark |
| Data Source Type | NOTEBOOK |
| Namespace | InSpark_Sales |
| Code | NB |
| Description | Custom notebooks for InSpark |
| Connection | |
| Environment | Development |
Landing Zone Entity
| Field | Value |
| --- | --- |
| Pipeline | PL_LDZ_COPY_FROM_ADF_PIPELINE |
| Data Source | InSpark |
| Source schema | sales |
| Source name | total_sales |
| Incremental | False |
| Entity value | |
| Lake house | LH_Data_Landingzone |
| File path | InSpark_Sales |
| File name | total_sales |
| File type | Parquet |
Example Entity value:
| Name | Value |
| --- | --- |
| CustomParametersJSON | {"FileName": "total_sales.csv"} |
| NotebookName | NB_CUST_Get_Total_Sales_data |
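Inside the custom notebook, CustomParametersJSON is assumed to arrive as a JSON string parameter. A hedged sketch of how the notebook might parse it; the parameter name and shape match the entity value above, but the parsing code itself is illustrative:

```python
import json

# Assumption: the orchestrator passes CustomParametersJSON to the notebook as
# a string parameter. In a real notebook this would arrive via a parameter
# cell; here it is hard-coded for illustration.
custom_parameters_json = '{"FileName": "total_sales.csv"}'

params = json.loads(custom_parameters_json)
file_name = params["FileName"]
# The notebook would then read file_name (total_sales.csv) and write the
# result to the landing zone Lakehouse.
```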
Bronze Zone Entity
| Field | Value |
| --- | --- |
| Pipeline | PL_BRZ_COMMAND |
| Landing zone entity | InSpark_Sales/total_sales |
| Entity value | |
| Column mappings | |
| Lake house | LH_Bronze_Layer |
| Schema | dbo |
| Name | total_sales |
| Primary keys | id |
Example Entity value:
| Name | Value |
| --- | --- |
| ColumnDelimiter | , |
| CompressionType | none |
| Encoding | UTF-8 |
| EscapeCharacter | \ |
| FirstRowIsHeader | 1 |
| RowDelimiter | \r\n |
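These entity values describe how a delimited landing-zone file should be parsed. The sketch below demonstrates the effect of the delimiter, escape character, and header settings using Python's standard `csv` module; NCC's pipelines would apply the equivalent options in their own reader, so this is an illustration, not the pipeline code.

```python
import csv
import io

# Entity values from the table above, expressed as a plain dict.
entity_values = {
    "ColumnDelimiter": ",",
    "EscapeCharacter": "\\",
    "FirstRowIsHeader": "1",
}

# Hypothetical file content for the total_sales example.
raw = "id,amount\n1,100\n2,250\n"

reader = csv.reader(
    io.StringIO(raw),
    delimiter=entity_values["ColumnDelimiter"],
    escapechar=entity_values["EscapeCharacter"],
)
rows = list(reader)

# FirstRowIsHeader = 1 means the first row is treated as column names.
if entity_values["FirstRowIsHeader"] == "1":
    header, data = rows[0], rows[1:]
else:
    header, data = None, rows
# header -> ['id', 'amount']; data -> [['1', '100'], ['2', '250']]
```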
Silver Zone Entity
| Field | Value |
| --- | --- |
| Pipeline | PL_SLV_COMMAND |
| Bronze layer entity | dbo.total_sales |
| Entity value | |
| Lake house | LH_Silver_Layer |
| Schema | dbo |
| Name | total_sales |
| Columns to exclude | |
| Columns to exclude from history | |
Next steps