With the release of version 15.0, we have added support for the Citus database on our AWS platform. This enhancement enables users to leverage the scalability and distributed capabilities of Citus.
Description
Citus Database is an extension to PostgreSQL that enables horizontal scalability by distributing data across multiple nodes. This allows for handling large datasets and high-throughput workloads more efficiently than a single database server. Citus achieves this by partitioning tables into shards and distributing them across nodes, improving performance and enabling real-time analytics at scale.
Key Features
- Distributed SQL capabilities.
- Scalability for transactional and analytical workloads.
- Compatibility with PostgreSQL, making it easy for PostgreSQL users to adopt.
- Support for multi-tenant applications and multi-shard queries.
Sharding in PostgreSQL
- Capability – Enabled via the Citus open source extension for PostgreSQL.
- Technique – Splits large PostgreSQL tables into smaller parts, called “shards”, which are distributed horizontally across multiple servers (nodes).
- Benefit – Allows for distributing the workload and storage, improving performance and scalability.
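As a rough illustration of the technique, the sketch below routes rows to shards by hashing a distribution column and then places shards on nodes. It is not how Citus is implemented internally: the hash function (CRC32), the shard count, the node names, and the round-robin placement are all assumptions chosen for the example.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 8                                # hypothetical shard count
NODES = ["worker-1", "worker-2", "worker-3"]   # hypothetical node names


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard index with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def node_for(shard: int, nodes=NODES) -> str:
    """Assign shards to nodes round-robin (an assumption for this sketch)."""
    return nodes[shard % len(nodes)]


# Sample rows; productId plays the role of the distribution column.
rows = [
    {"productId": "P-1", "qty": 3},
    {"productId": "P-2", "qty": 1},
    {"productId": "P-1", "qty": 5},
    {"productId": "P-3", "qty": 2},
]

# Group rows by (node, shard) to show where each row would be stored.
placement = defaultdict(list)
for row in rows:
    shard = shard_for(row["productId"])
    placement[(node_for(shard), shard)].append(row)

for (node, shard), stored in sorted(placement.items()):
    print(node, shard, stored)
```

Because the hash is deterministic, every row with the same productId lands in the same shard, which is what keeps lookups and aggregates for a single key local to one node.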
Why Use Citus for Sharding?
- Data Handling – Citus can manage large amounts of data by spreading it across multiple servers, eliminating performance bottlenecks as data grows.
- Faster Reports – Executes analysis in parallel, helping with quicker, data-driven decision-making.
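The parallel execution mentioned above follows a scatter-gather pattern: each shard computes a partial result, and the partials are then combined. The sketch below imitates that pattern with a thread pool over in-memory lists. The shard contents, function names, and the use of threads are assumptions for illustration, not a description of the Citus executor.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards, each holding (productId, revenue) rows.
# Rows for one productId live in a single shard.
shards = [
    [("P-1", 10.0), ("P-1", 2.5)],
    [("P-2", 5.0), ("P-3", 7.5)],
    [("P-4", 1.0)],
]


def partial_sum(shard):
    """Work done independently on one shard: sum its revenue column."""
    return sum(revenue for _, revenue in shard)


def total_revenue(all_shards):
    """Scatter the aggregate to every shard in parallel, then combine."""
    with ThreadPoolExecutor(max_workers=len(all_shards)) as pool:
        partials = list(pool.map(partial_sum, all_shards))
    return sum(partials)


total = total_revenue(shards)
print(total)
```

The combining step only sees one number per shard, so the amount of data moved between nodes stays small even when the shards themselves are large.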
Citus in Pricefx Data Distribution
The application of Citus is specifically related to the Analytics module.
How We Apply Citus in Pricefx
- Citus acts solely as the database for OLAP, specifically for Pricing Analytics (PA) data.
- Datamarts containing hundreds of millions of rows and hundreds of columns are considered potential candidates for a Citus-based setup.
- The threshold for using Citus depends on the number of rows, the number of fields, and how the data is organized.
- Data distribution is typically most effective when based on productId or customerId.
- The operational cost of Citus may be higher than that of PostgreSQL, since it requires more pods and nodes/servers.
- Citus was introduced as a stability measure, not to improve performance.
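One way to see why the choice of distribution column matters is to compare how evenly candidate columns spread rows across shards. The sketch below is a hypothetical check, not a Pricefx or Citus tool: the CRC32 hash, the shard count, the sample data, and the low-cardinality `region` column are all assumptions made for the example.

```python
import zlib
from collections import Counter


def shard_sizes(rows, column, shard_count=4):
    """Count how many rows each shard would hold if `column` were the key."""
    counts = Counter(
        zlib.crc32(str(row[column]).encode("utf-8")) % shard_count
        for row in rows
    )
    return [counts.get(i, 0) for i in range(shard_count)]


def skew(sizes):
    """Largest shard divided by the mean shard size; 1.0 means perfectly even."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Hypothetical data: many distinct productId values, only two region values.
rows = [
    {"productId": f"P-{i}", "region": "EU" if i % 2 else "US"}
    for i in range(1000)
]

by_product = shard_sizes(rows, "productId")
by_region = shard_sizes(rows, "region")

print("productId:", by_product, "skew:", skew(by_product))
print("region:   ", by_region, "skew:", skew(by_region))
```

A column with only two distinct values can fill at most two shards, so at least half of the shards stay empty however many nodes are added. A high-cardinality key such as productId or customerId gives the hash many more values to spread.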