Columnar Datamart


When a Datamart’s Refresh Data Load is executed, data from its Data Sources is copied into separate tables within the database. Effectively, a snapshot of the source data is created, becoming the new, updated content of the Datamart.

Optionally, this data can undergo further processing or enrichment through (Distributed) Calculation Data Loads before making it available to application users and business logic via the Datamart’s Publish Data Load. Publishing produces another snapshot—this time, reflecting the refreshed Datamart data after processing or enrichment has been applied. When the Citus is installed in the Analytics Database (PA DB), the publish table is columnar, often offering improved performance for analytical queries, which typically utilize only a subset of the fields defined in the Datamart.

However, rows in a columnar table cannot be directly modified. This means that the table is truncated and all data is re-inserted each time an update is necessary.

The refresh process is usually configured to perform incremental data loads into the Datamart, where most rows remain unchanged, and only a small proportion are updated or added. This approach enhances efficiency and speed; however, as the Datamart grows, the refresh time increases, primarily because existing rows must be identified and replaced. Consequently, the process slows as the number of rows in the Datamart increases.

For this reason, and in cases where new data cannot be delivered incrementally, a full reload of the Datamart during each Refresh may be required. It is important to note that for larger Datamarts, this involves processing a substantial volume of data with each refresh. Therefore, more frequent-than-weekly refreshes are generally not recommended in such cases.

Given the context, creating a Columnar Datamart can be advantageous in certain situations. Although not strictly defined, the following guidelines will help determine when to consider using a columnar Datamart—and more importantly, when to avoid it:

  • When a Datamart contains fewer than approximately 100 fields, the benefits of a columnar table diminish significantly. In general, adopting a Columnar Datamart is advisable only for those with 100 or more fields.

  • If incremental refreshes are not feasible, necessitating full reloads, the switch to a Columnar Datamart has minimal impact on the Refresh Data Load. This is because, similar to other columnar tables, the target table is truncated and populated a new during each refresh.

  • When publishing is not required, for example, when no post-processing or enrichment of refreshed data is needed, queries can benefit from the performance improvements provided by columnar refresh tables.
    info Note that without publishing, Datamart queries are executed directly against its refresh tables.

  • A columnar Datamart should not be used if frequent refreshes are necessary. The acceptable frequency depends on the size, but daily updates are generally the upper limit; weekly or slower schedules are highly recommended.

If a columnar Datamart is chosen, consider setting the LoadMode to APPEND. In this mode, the refresh tables do not need to be truncated, as only new rows are added and existing rows remain unchanged. This approach requires that existing rows are never updated. If the refresh is configured as append-only and attempts to reload existing rows, the Data Load will result in an error, and no data will be loaded.

If columnar tables are not supported in the database, the following warning message will be displayed: Columnar tables not currently supported in this partition.