
Configure PySpark Notebook Data Sources in NCC Portal

This article describes how to configure and manage PySpark notebook data sources in the NCC Portal.

TIP
For instructions on creating a custom PySpark notebook, see Create a custom notebook.

Prerequisites

  • Access to the NCC Portal.
  • Required permissions to add data sources and entities.

Step 1: Set Up the PySpark Notebook Connection

No connection needs to be created in Fabric for this data source.

Step 2: Add a Data Source

  1. In the NCC Portal, select Tenant Settings > Data Sources.
  2. Select Add DataSource.
  3. Complete the following fields:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Name | Name of the data source in NCC | |
| Data Source Type | Type of data source | NOTEBOOK |
| Namespace | Prefix for storing data in lakehouses | |
| Code | Identifier for pipelines | NB |
| Description | Description of the data source | |
| Connection | Name of the connection in Fabric | Not required |
| Environment | NCC environment for the data source | Development |

Step 3: Create a Landing Zone Entity

  1. Go to Landing Zone Entities.
  2. Select New Entity.
  3. Enter the required details:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Pipeline | Not used | |
| Data Source | Data source for the connection | Set in the previous step |
| Source schema | Parameter for the custom notebook | |
| Source name | Parameter for the custom notebook | |
| Incremental | Extract data incrementally | False |
| Has encrypted columns | Indicates whether the table contains sensitive data | False |
| Entity value | NotebookName is required; CustomParametersJSON is optional. See Entity values reference | |
| Lake house | Lakehouse for storing data | LH_Data_Landingzone |
| File path | File path for data storage | Filled automatically |
| File name | File name for data storage | Filled automatically |
| File type | Expected file type | E.g. Json, Csv, Parquet, Xlsx, Txt, Xml |
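The Entity value requirements above can be sketched as a small payload. The names NotebookName and CustomParametersJSON come from the table; the concrete values below are taken from the example later in this article and are illustrative, not NCC defaults:

```python
import json

# Illustrative sketch of a landing-zone entity value for a NOTEBOOK
# data source. NotebookName is required; CustomParametersJSON is
# optional and holds the parameters your custom notebook expects.
entity_values = {
    "NotebookName": "NB_CUST_Get_Total_Sales_data",
    "CustomParametersJSON": json.dumps({"FileName": "total_sales.csv"}),
}

print(entity_values["CustomParametersJSON"])  # {"FileName": "total_sales.csv"}
```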


Step 4: Create a Bronze Zone Entity

  1. Go to Bronze Zone Entities.
  2. Select New Entity.
  3. Enter the following information:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Pipeline | Orchestrator pipeline for parsing | PL_BRZ_COMMAND |
| Landing zone entity | Landing zone entity to be parsed | InSpark_Sales/total_sales |
| Entity value | Optional. See Entity values reference | |
| Column mappings | Optional. See Column mapping info | |
| Lake house | Lakehouse for storing data | LH_Bronze_Layer |
| Schema | Schema for storing data | dbo |
| Name | Table name for storing data | total_sales |
| Primary keys | Unique identifier fields (case sensitive) | id |

Step 5: Create a Silver Zone Entity

  1. Go to Silver Zone Entities.
  2. Select New Entity.
  3. Provide the following details:
| Field | Description | Example/Default Value |
| --- | --- | --- |
| Pipeline | Orchestrator pipeline for parsing | PL_SLV_COMMAND |
| Bronze layer entity | Bronze layer entity to be parsed | dbo.total_sales |
| Entity value | Optional. See Entity values reference | |
| Lake house | Lakehouse for storing data | LH_Silver_Layer |
| Schema | Schema for storing data | dbo |
| Name | Table name for storing data | total_sales |
| Columns to exclude | Comma-separated columns to exclude (case sensitive) | |
| Columns to exclude from history | Comma-separated columns to exclude from the history compare (case sensitive) | |
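Excluding a column from history means a change in that column alone does not register as a changed row. A generic sketch of that compare semantics (an illustration only, not NCC's actual implementation; the column names are made up):

```python
def rows_differ(old: dict, new: dict, exclude_from_history: set) -> bool:
    """Compare two row versions, ignoring columns excluded from history.

    Generic illustration of history-compare semantics.
    """
    keys = (set(old) | set(new)) - exclude_from_history
    return any(old.get(k) != new.get(k) for k in keys)

old = {"id": 1, "amount": 100, "loaded_at": "2024-01-01"}
new = {"id": 1, "amount": 100, "loaded_at": "2024-01-02"}

print(rows_differ(old, new, {"loaded_at"}))                     # False
print(rows_differ(old, {**new, "amount": 120}, {"loaded_at"}))  # True
```

Only the change in `amount` counts; the excluded `loaded_at` column alone does not trigger a new history record.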

Example Configuration

The company InSpark has created a PySpark notebook named NB_CUST_Get_Total_Sales_data in Fabric. This notebook extracts a file named total_sales.csv and expects one parameter: the filename to extract (total_sales.csv). The following configuration demonstrates how to set up this scenario.
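A minimal sketch of how such a notebook could receive its parameter, assuming the CustomParametersJSON value is injected into a Fabric parameter cell at run time (the variable name and the landing path layout are assumptions; the actual extract and Spark write are omitted):

```python
import json

# Parameter cell (in Fabric, mark this cell as the parameter cell).
# The value shown is the example from this article; the orchestrator
# would overwrite it at run time.
custom_parameters_json = '{"FileName": "total_sales.csv"}'

params = json.loads(custom_parameters_json)
file_name = params["FileName"]

# Derive the landing-zone target from Namespace and Source name
# (path layout is an assumption; NCC fills File path/File name itself).
landing_path = f"Files/InSpark_Sales/{file_name}"
print(landing_path)  # Files/InSpark_Sales/total_sales.csv
```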

Data Source

| Field | Value |
| --- | --- |
| Name | InSpark |
| Data Source Type | NOTEBOOK |
| Namespace | InSpark_Sales |
| Code | NB |
| Description | Custom notebooks for InSpark |
| Connection | |
| Environment | Development |

Landing Zone Entity

| Field | Value |
| --- | --- |
| Pipeline | PL_LDZ_COPY_FROM_ADF_PIPELINE |
| Data Source | InSpark |
| Source schema | sales |
| Source name | total_sales |
| Incremental | False |
| Entity value | |
| Lake house | LH_Data_Landingzone |
| File path | InSpark_Sales |
| File name | total_sales |
| File type | Parquet |

Example Entity value:

| Name | Value |
| --- | --- |
| CustomParametersJSON | `{"FileName": "total_sales.csv"}` |
| NotebookName | NB_CUST_Get_Total_Sales_data |

Bronze Zone Entity

| Field | Value |
| --- | --- |
| Pipeline | PL_BRZ_COMMAND |
| Landing zone entity | InSpark_Sales/total_sales |
| Entity value | |
| Column mappings | |
| Lake house | LH_Bronze_Layer |
| Schema | dbo |
| Name | total_sales |
| Primary keys | id |

Example Entity value:

| Name | Value |
| --- | --- |
| ColumnDelimiter | `,` |
| CompressionType | none |
| Encoding | UTF-8 |
| EscapeCharacter | `\` |
| FirstRowIsHeader | 1 |
| RowDelimiter | `\r\n` |
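These parsing settings resemble Spark's CSV reader options. A hedged sketch of how they might translate (`sep`, `encoding`, `escape`, `header`, and `lineSep` are real `spark.read` CSV option keys, but the mapping itself is an assumption, not documented NCC behavior):

```python
# Entity values from the table above.
entity_values = {
    "ColumnDelimiter": ",",
    "CompressionType": "none",
    "Encoding": "UTF-8",
    "EscapeCharacter": "\\",
    "FirstRowIsHeader": "1",
    "RowDelimiter": "\r\n",
}

# Possible translation to Spark CSV reader options. Compression is
# typically inferred from the file extension on read, so it is left
# out of the mapping here.
spark_options = {
    "sep": entity_values["ColumnDelimiter"],
    "encoding": entity_values["Encoding"],
    "escape": entity_values["EscapeCharacter"],
    "header": entity_values["FirstRowIsHeader"] == "1",
    "lineSep": entity_values["RowDelimiter"],
}
# Usage in PySpark: spark.read.options(**spark_options).csv(path)
```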

Silver Zone Entity

| Field | Value |
| --- | --- |
| Pipeline | PL_SLV_COMMAND |
| Bronze layer entity | dbo.total_sales |
| Entity value | |
| Lake house | LH_Silver_Layer |
| Schema | dbo |
| Name | total_sales |
| Columns to exclude | |
| Columns to exclude from history | |

Next steps