2/13/20

Learn How to Create Dynamic Partitions with PostgreSQL


Learn How to Create Dynamic Partitions with PostgreSQL

A step by step guide to automatically shard your big data tables

Songtham Tung

Songtham Tung

Oct 30, 2018 · 5 min read



Some basic definitions

Before we get started, here’s a few basic terms.
  • Dynamic refers to constantly changing.
  • Dynamic Partitioning thus refers to automatically splitting a large table into smaller tables.

PostgreSQL

  • Functions
  • Triggers




TLDR; Show me the code!

Our goal is to automatically achieve the creation of new partitions each month — instead of manually creating each one ahead of time.

Visualization of a chat table partitioned by month.


I added sequence and index which can be nice, but it’s not required for partitioning.


This contains the bulk of the logic.

  • Then we DECLARE the variables that will be used
  • Afterwards we BEGIN initializing the variables
  • Next we query to see if the partitioned table exists. If not, we dynamically create a new partitioned table e.g. (chat_yyyy_mm) that INHERITS from chat_master.
  • Finally, we insert the new record to the partitioned table
  • “NEW” signifies a new database row for INSERT/UPDATE operations
  • Tables cannot contain dash (-), so table names are underscored (_)
  • “||” is PostgreSQL syntax for string concatenation
  • For STRING INTERPOLATION, we use the built-in function “format()”
  • The syntax for escape string is backslash (\), but the letter ‘E’ has to come before the opening single quote — see line 19.
  • RETURN NULL means that this function returns nothing to the caller, which is the TRIGGER.





Querying the data

Now that we’ve created a script that would dynamically create partitions when new records get inserted — what about READing from them?
INSERT INTO public.chat_master (
 program_id, user_id, dialogue, created_at)
 VALUES ('program_1', 'A01', 'hello world!', '2018-11-11'
);
NOTICE:  A partition has been created chat_2018_11 
INSERT 0 0
SELECT * FROM chat_master|id | program_id  |user_id|   dialogue     | created_at
-----------------------------------------------------------------
| 2 | "program_1" | "A01" | "hello world!" | "2018-11-11 00:00:00"
SELECT * FROM chat_2018_11|id | program_id  |user_id|   dialogue     | created_at
-----------------------------------------------------------------
| 2 | "program_1" | "A01" | "hello world!" | "2018-11-11 00:00:00"
Wait a second ... isn’t this duplicate data?
Nope! That’s what I thought at first too, but this isn’t the case. When you’re inserting data into chat_master, it’s only writing to the partitioned table. However, when you’re reading from chat_master, it’s grabbing data from all the inherited partitioned tables.
INSERT INTO public.chat_master (
 program_id, user_id, dialogue, created_at)
 VALUES ('program_2', 'A01', 'hello panya!', '2018-12-12'
);NOTICE:  A partition has been created chat_2018_12 
INSERT 0 0
SELECT * FROM chat_master|id | program_id  |user_id|   dialogue     | created_at
-----------------------------------------------------------------
| 2 | "program_1" | "A01" | "hello world!" | "2018-11-11 00:00:00"
| 3 | "program_2" | "A01" | "hello panya!" | "2018-12-12 00:00:00"SELECT * FROM chat_2018_12|id | program_id  |user_id|   dialogue     | created_at
-----------------------------------------------------------------
| 2 | "program_2" | "A01" | "hello panya!" | "2018-12-12 00:00:00"
SELECT * FROM ONLY chat_master



Why Partitioning Matters

Dynamic partitioning is great when you want to automatically split a large table into smaller ones. This can be beneficial when running full table scans and filtering by the partitions on the WHERE clause.

No comments: