Skip to content
All posts
Engineering May 18, 2026

When SCD Type 6 Becomes Accidental Bloat: A Real-World Dimensional Modelling Fix

A client implemented SCD Type 6 across all warehouse dimensions but never used the "current value" columns. The result - doubled storage, slower queries, avoidable ETL complexity - until a targeted redesign restored performance.

Hristo Dolutarov 10 min read
SCD Type 6 customer_dim table with duplicate _current columns highlighted as accidental warehouse bloat
SCD Type 6 customer_dim table with duplicate _current columns highlighted as accidental warehouse bloat

A client implemented Slowly Changing Dimension (SCD) Type 6 across their warehouse dimensions but never used the “current value” columns it adds. The result was a doubled dimension footprint, slower queries, and avoidable ETL complexity - until a targeted redesign restored performance and reduced cost.

Introduction

Dimensional models are built to make analytics fast and understandable. But the techniques we choose to capture history - especially in dimension tables - have a direct cost in storage, procedures runtime, and query performance. This use case highlights a surprisingly common situation: a sophisticated history-tracking pattern was introduced “just in case,” then never actually used by downstream reporting or data science.

Use case at a glance: The client’s warehouse dimensions were modelled as SCD Type 6, meaning each tracked attribute existed twice - one set preserving history (Type 2) and another set storing the latest value (Type 1). However, business users always reported by the historical values-in-effect and never filtered or grouped by the “current” columns. The extra columns inflated row width, increased I/O, and slowed joins without providing analytical value. Although the solution was in MS Fabric, it still impedes performance, because even in columnar data storages, the leading unit is still the row and wider rows means more clusters to traverse when seeking required data.

Background: What SCD Type 6 Is (and When It Helps)

SCD Type 6 is a hybrid technique designed to support two valid but different analytical questions at the same time:

  • As-was reporting (historical truth): group/filter facts by the attribute values that were true when the transaction happened (Type 2 behaviour).
  • As-is reporting (current perspective): re-state historical facts according to the attribute’s current value (Type 1 behaviour embedded alongside Type 2 rows).

In the Kimball technique definition, Type 6 extends a Type 2 dimension by also storing “current” (Type 1) versions of the same attributes in each row, overwriting those current columns across all rows for the same durable business key whenever the attribute changes. This dual storage provides flexibility, but it increases row width (effectively doubling it) and ETL work, so it should be used only when both perspectives are genuinely needed.

The Situation: Type 6 Everywhere, but the Extra Columns Nowhere

The client had standardised on SCD Type 6 for several core dimensions (Customer, Product, Organization, and Location). Each dimension row included:

  • Type 2 metadata (surrogate key, effective start/end dates, current-row flag)
  • Historical attribute columns (the values-in-effect for that row version)
  • Duplicated “current” attribute columns (often prefixed like current_* or suffixed like *_current) updated with Type 1 overwrite logic

When performance issues emerged, the team reviewed semantic models, BI reports, and the most expensive warehouse queries. They found that every join and aggregation referenced only the Type 2 historical attributes and the “current row” flag - none of the duplicated current-value columns were referenced in SQL, measures, or filters. In other words, the warehouse was paying the Type 6 cost without receiving the Type 6 benefit.

Impact: Bigger Rows, More I/O, Slower Joins

Even if a query doesn’t explicitly select the extra columns, wider dimension rows can still degrade performance. Wider rows reduce the number of rows that fit into a data page, increase memory pressure during scans and joins, and can slow down sort/hash operations. They also increase ingestion time because more data is written, indexed, and maintained during every load.

  • Storage: duplicated attribute sets effectively doubled the dimension table footprint (especially painful for large, high-cardinality dimensions).
  • Query latency: common star-schema joins slowed because dimension scans and hash joins processed more bytes per row.
  • ETL duration: overwrite logic for the current columns increased update work and transaction log volume.
  • Cost: increased compute time for scheduled transformations and more storage consumed over retention windows.

Root Cause: A Powerful Pattern Applied as a Default

SCD Type 6 is usually introduced to satisfy a specific requirement: “report history correctly as of the time of the event but also allow slicing all history by today’s definition.” The client had the first requirement (as-was). They did not have the second (as-is). Yet the architecture template and ETL framework treated Type 6 as the safest, most flexible option and applied it broadly.

Signals that Type 6 is overkill:

  • Reports always use the Type 2 “current row” (or effective date) logic and never re-state history under current attributes.
  • No semantic model exposes the duplicated current columns, or they exist but are never used in measures/filters.
  • Business definitions rarely change, or when they do, the organization prefers historical correctness over restatement.
  • Dimension tables are large enough that row width is a material performance factor.

Remediation: Right-Size the SCD Strategy

The fix was not to “tune the queries,” but to remove unnecessary work from the model. After confirming the current-value columns were unused, the team moved from a Type 6 implementation to a simpler Type 2-only pattern for the affected attributes. The guiding principle was: store only what you will query.

  • Validate requirements with stakeholders: confirm whether “as-is restatement” is required for any dimension/attribute (not assumed).
  • Prove non-usage: search query logs, semantic models, dashboards, and ad hoc notebooks for references to the current-value columns.
  • Simplify the schema: remove duplicated current-value columns for attributes that don’t need restatement; keep Type 2 columns and effective dating.
  • Simplify ETL logic: eliminate the Type 1 overwrite step that updates current columns across all historical rows for a durable key.
  • Regression test: reconcile aggregates (before/after) for key dashboards; confirm history is preserved and joins still behave.
  • Roll out iteratively: prioritize the largest dimensions first to unlock the biggest performance gains early.

Expected Results: Faster Queries and a Leaner Warehouse

After the redesign, dimension row width decreased substantially and the ETL flow performed less update-heavy work. The most noticeable improvements appeared in star-join queries that touched the largest dimensions: fewer bytes scanned, smaller hash tables, and better cache efficiency.

  • Dimension storage footprint: track table size before/after (data + indexes).
  • Average and P95 query duration: focus on top star-join dashboards and their peak-time performance.
  • ETL runtime: compare load windows, especially steps that previously updated many historical rows.
  • Compute cost: measure Warehouse/Lakehouse compute minutes per day/week.

Lessons Learned (and a Practical Checklist)

Although Type 6 is a valid technique, there are not many real-world use cases that can have a true benefit from it. You may use it with limited capacity - attribute by attribute, based on how the business wants to interpret history. The following checklist helped the client avoid repeating the mistake:

  • Start with questions, not patterns: “Do we need historical truth, current restatement, or both at once?”
  • Limit Type 6 to a small set of attributes: typically, only those used for both ‘as-was’ and ‘as-is’ analysis at the same time.
  • Measure dimension width: treat row size as a first-class performance variable, especially for high-cardinality dimensions.
  • Expose only what’s used: if current-value columns aren’t consumed in the semantic layer, remove them - or don’t create them.
  • Revisit annually: reporting needs evolve; revalidate whether Type 6 attributes still justify their cost.

Conclusion

This case underscores a core dimensional modelling truth: flexibility isn’t free. SCD Type 6 is designed to support both historical accuracy and reporting history under current attribute values, but if you only need one of those perspectives, the extra columns and overwrite logic become pure overhead. By aligning the SCD strategy to actual reporting behaviour, the client reduced storage, simplified ETL, and recovered query performance - without sacrificing the history that mattered.

Important Addendum: When a Functional Mindset Meets Set-Based Databases

One of the most common “looks helpful” additions in SCD implementations is an IsCurrent (or “current flag”) column. In many Type 2 and Type 6 designs, that flag is logically redundant because “currentness” is already implied by the effective dating rules: the current version is simply the row whose EndDate is set to a far-future sentinel value (or is NULL), e.g.:

WHERE EndDate > CURRENT_TIMESTAMP
-- or
WHERE EndDate = '9999-12-31'
-- or
WHERE EndDate IS NULL

The redundancy often comes from a functional programming mindset: we like explicit markers that make a record’s state easy to reason about in isolation (“this row is current = true”). But a relational database is optimised for set thinking: it derives meaning from predicates over sets of rows and relies on the optimiser to choose efficient access paths. When we add an explicit flag that duplicates a rule already encoded in dates, we create one more value that must be maintained, indexed, and kept consistent - without adding new information.

Why this can be more than a cosmetic issue: in many warehouses the “current row lookup” is one of the hottest access patterns (dimension joins, point lookups, late-arriving facts, CDC merges). A redundant current flag can make those lookups slower - not faster - depending on how indexes are designed and how selective the predicate is.

  • Low selectivity: a flag often has a 1/N distribution per business key, but across the whole table it is still a near 50/50 or “mostly false” column. Optimisers may treat it as weakly selective and prefer scans, especially if statistics are stale or if the query also needs other columns not covered by an index.
  • Index shape trade-offs: teams frequently add an index like (BusinessKey, IsCurrent). That index can help a narrow lookup, but it competes with (and can prevent) more useful composites like (BusinessKey, EndDate) or (BusinessKey, StartDate, EndDate) that also support range joins and as-of queries.
  • Columnar engines nuance: in columnstores (including many lakehouse/warehouse engines), unused columns are cheaper to scan, but predicates still need to be evaluated and segment pruning depends on sort order and metadata. If data is sorted/clustered by EndDate (or by a composite that includes it), the date predicate can prune efficiently; a flag that is not part of the sort/cluster key may not.

Practical guidance: if your “current row” definition is already encoded in EndDate, prefer querying on EndDate and support it with physical design rather than duplicating the concept in a flag.

  • Use a single source of truth: choose one representation of “current” (typically EndDate sentinel or NULL) and standardise it across all dimensions.
  • Index/cluster for the predicate you actually run: for rowstore, a composite like (BusinessKey, EndDate) often serves both “current row” and time-travel queries. For platforms that support it, consider a filtered/partial index on EndDate = sentinel to speed up current-row lookups without maintaining a full-table flag index.
  • If you need a flag for ergonomics, compute it: expose IsCurrent as a computed/derived field in a view or semantic model (e.g., CASE WHEN EndDate = sentinel THEN 1 ELSE 0 END) so it cannot drift out of sync with dates.
  • Validate invariants: enforce “one current row per business key” with checks/tests (ETL assertions, data quality rules) rather than relying on a flag to represent correctness.

In short: adding an explicit current flag feels convenient, but in a set-based system it can be “accidental complexity.” When it duplicates what EndDate already guarantees, it increases maintenance work and can interfere with the most efficient index/cluster strategy. The same theme appears throughout this use case: SCD patterns should be requirement-driven, not template-driven - otherwise “flexibility” quietly turns into bloat.

Share
X LinkedIn
About the author

Hristo Dolutarov

SoftPapaya

Read on
Careers

We work on interesting things. Maybe with you?

We're looking for people who like reading posts like this - and writing them. Engineering, design, devrel - check open roles.