Performance Best Practices¶
This article provides recommendations to optimize performance across multiple layers in your solution, including Configuration Nitrogen Control Center, Warehouse, and Power BI.
Configuration Nitrogen Control Center¶
Limit processed data¶
Load only the data required to generate accurate facts and dimensions. Avoid unnecessary data retrieval to improve efficiency.
Use LastLoadValue¶
By default, source parameters are loaded using SELECT * FROM. For source parameters with more than one million rows, evaluate whether full history is necessary. If not, set the LastLoadValue to a specific date (e.g., 01-01-2020) during the initial load to restrict the dataset.
Limit extracted data from source via source query (to be developed)¶
Select only the columns needed from a table, rather than using SELECT * FROM, to minimize data extraction.
Use column mapping and schema mapping¶
Choose appropriate data types and lengths for columns to reduce overall data size.
Warehouse¶
Use include/exclude columns¶
When loading entire tables, select only the columns required for dimensions and facts in the StgKey layer.
Determine active/historical process type to limit data¶
In the StgKey layer, use N7_IsDeleted = 0 and N7_IsCurrent = 1 by default. Process the full dataset only when necessary to minimize the amount of data stored.
Use datamart persist¶
If executing a view in the datastore is time-consuming, leverage datamart persist functionality to store results in a table for improved performance.
Power BI¶
- Use import or Direct Lake mode instead of Direct Query for better performance.
- Ensure keys in facts and dimensions are integers.
- Import only the data required for visualization to reduce dataset size and optimize performance.
- Avoid importing columns with high cardinality, as they do not compress efficiently and can impact performance.
- Apply correct data types to imported columns.
- Business logic should be implemented as close to the source as possible, preferably in the Warehouse, not in Power Query.
- Avoid using calculated columns to improve performance.
- Disable the following settings for optimal performance:
- 'Auto date/time' (Options > Data Load > Time Intelligence)
- 'Import relationships from data sources on first load'
- 'Update or delete relationships when refreshing data'
- 'Autodetect new relationships after data is loaded'