Mastering PostgreSQL Efficiency - Tips & Tricks ✨

Tips I've learned over four years of working with PostgreSQL. I'll keep updating this post as I learn more.

While working with PostgreSQL I ran into plenty of issues, like slow queries and data inconsistencies, and picked up a few things along the way that helped me a lot. Hope you find them useful 🧙‍♂️.

1. Prevent Lock Contention for Updates on Hot Rows 🔒

Imagine your database as a busy kitchen. When too many cooks try to update the same recipe at once, chaos ensues. The INSERT ... ON CONFLICT ... DO UPDATE spell (an "upsert") is like the head chef: instead of a separate SELECT followed by an UPDATE, a single atomic statement either inserts the row or updates the existing one, so the lock on a hot row is held as briefly as possible.

INSERT INTO products (
  product_id, quantity, last_update
) VALUES (
  123456, FLOOR(RANDOM() * 100), CURRENT_TIMESTAMP
) ON CONFLICT (product_id) DO UPDATE SET
  quantity = products.quantity + EXCLUDED.quantity,
  last_update = EXCLUDED.last_update;
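
One thing to note: ON CONFLICT (product_id) only works if product_id is covered by a unique constraint or unique index. Here is a minimal, hypothetical sketch of the table this example assumes:

-- Hypothetical definition of the products table used above;
-- the PRIMARY KEY is what makes ON CONFLICT (product_id) legal.
CREATE TABLE products (
  product_id  bigint PRIMARY KEY,
  quantity    integer NOT NULL DEFAULT 0,
  last_update timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);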

2. Updates Based on a Select Query 🔄

Think of this as upgrading your computer's software with just one click. With the UPDATE spell and the FROM clause, you can update many records at once based on information from another table, like recalculating all products' prices from their categories' discounts.

UPDATE products
SET price = price_base - price_base * categories.discount
FROM categories
WHERE products.category_id = categories.category_id;

In the above example, we are updating the price of all products based on the discount of the category they belong to.
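
Before running a bulk UPDATE like this, it can help to preview the new values with a plain SELECT built on the same join:

-- Dry run: preview the prices the UPDATE above would write.
SELECT products.price AS old_price,
       products.price_base - products.price_base * categories.discount AS new_price
FROM products
JOIN categories ON products.category_id = categories.category_id;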

3. Return the Values of Modified Rows 📜

Ever wished you could see exactly which rows a statement touched without running a second query? The RETURNING clause in PostgreSQL does just that: it hands back the rows affected by your DELETE, INSERT, or UPDATE spells – the deleted rows for a DELETE, the freshly written values for an INSERT or UPDATE.

DELETE FROM borrowed_books
WHERE return_date = CURRENT_DATE
RETURNING book_id, borrower_id, return_date;
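
RETURNING works with INSERT and UPDATE too. For instance, an UPDATE can hand back the new values without a follow-up SELECT (a sketch reusing the products table from tip 1):

-- Restock low-stock products and get the new quantities
-- back in a single round trip.
UPDATE products
SET quantity = quantity + 10, last_update = CURRENT_TIMESTAMP
WHERE quantity < 5
RETURNING product_id, quantity;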

4. Delete Duplicate Rows 🗑️

This saved me a lot of time. Duplicate rows in your database are like stray extra socks cluttering the drawer – they serve no purpose. Window functions like ROW_NUMBER() OVER (...) number the rows within each group, so you can keep the newest entry and delete the rest, leaving your database sock drawer neat and tidy.

WITH duplicates AS (
  SELECT guest_id, ROW_NUMBER() OVER(
    PARTITION BY guest_name, event_date
    ORDER BY rsvp_timestamp DESC
  ) AS rownum
  FROM guest_list
)
DELETE FROM guest_list
USING duplicates
WHERE guest_list.guest_id = duplicates.guest_id
  AND duplicates.rownum > 1;
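
Before deleting anything, it's worth checking how many duplicate groups exist. A quick count over the same columns used in PARTITION BY does the job:

-- Count duplicate groups before running the DELETE above.
SELECT guest_name, event_date, COUNT(*) AS copies
FROM guest_list
GROUP BY guest_name, event_date
HAVING COUNT(*) > 1;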

5. Reduce the Number of GROUP BY Columns 📊

When you're at a party, you don't introduce yourself by listing all your personal details. Similarly, you can skip unnecessary columns in your GROUP BY clause: group by a table's primary key and PostgreSQL knows every other column of that table is functionally dependent on it. It's like introducing yourself with just your name – simple and effective.

SELECT products.name, products.brand, COUNT(*) AS count
FROM products
JOIN stocks ON stocks.product_id = products.id
GROUP BY products.id;

In the above example, we group by products.id alone. Because it's the primary key, products.name and products.brand are uniquely determined by it, so PostgreSQL lets them appear in the SELECT list without being grouped.
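
For comparison, the usual approach is to repeat every selected column in the GROUP BY – noisier, and it would even merge distinct products that happen to share a name and brand:

-- Works everywhere, but every non-aggregated column must be
-- spelled out, and same-named products get merged into one row.
SELECT products.name, products.brand, COUNT(*) AS count
FROM products
JOIN stocks ON stocks.product_id = products.id
GROUP BY products.name, products.brand;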

6. Simplified Inequality Checks with Nullable Columns 🤷‍♂️

Dealing with nullable columns can be tricky: any ordinary comparison with NULL yields NULL, so those rows silently vanish from your results. The IS DISTINCT FROM operator treats NULL as a regular, comparable value, making your queries both clearer and correct.

SELECT * FROM products WHERE price IS DISTINCT FROM 0;

In the above example, we select every product whose price is anything other than 0, including rows where price is NULL; a plain price <> 0 would silently drop those NULL rows.
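
The difference is easy to see in isolation: a normal comparison with NULL yields NULL, while IS DISTINCT FROM treats NULL as an ordinary value.

SELECT NULL <> 0;               -- NULL, so the row is filtered out by WHERE
SELECT NULL IS DISTINCT FROM 0; -- true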

7. Fill a Table with Random Data 🫙

This is useful when you want to test your queries with a large amount of data.

INSERT INTO products (name, price)
SELECT
  md5(random()::text),
  floor(random() * 1000)
FROM generate_series(1, 1000000);

In the above example, we are inserting 1 million rows into the products table with random data.
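
If you want the same "random" data on every run, say for reproducible benchmarks, you can seed the generator first:

-- setseed() makes later random() calls in this session
-- reproducible; the seed must be between -1 and 1.
SELECT setseed(0.42);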

8. Use EXPLAIN to Understand the Query Plan 📊

This is useful when you want to understand how your query is executed. EXPLAIN shows the planner's estimated query plan; EXPLAIN ANALYZE actually runs the statement and adds real timings and row counts, which is usually what you want when optimizing.

EXPLAIN SELECT * FROM products WHERE price > 100;
EXPLAIN ANALYZE SELECT * FROM products WHERE price > 100;

Most useful when writing complex queries for reporting.
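
Because ANALYZE executes the statement for real, it's worth wrapping data-modifying queries in a transaction you roll back. The BUFFERS option adds shared-buffer hit/read counts to the plan:

-- BUFFERS shows how much data came from cache vs. disk.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS) UPDATE products SET price = price * 1.1;
ROLLBACK;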

9. Prevent Division by Zero Errors ➗

Dividing by zero is like trying to share a pizza when there's no pizza. The NULLIF function transforms division by zero into a friendly null, so your database doesn't throw a tantrum.

SELECT 1 / NULLIF(0, 0);
SELECT visitors_today / NULLIF(visitors_yesterday, 0)
FROM logs_aggregated;

In the above example, we divide today's visitor count by yesterday's. If yesterday's count is 0, NULLIF turns it into NULL and the division returns NULL instead of raising a division-by-zero error.
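
If a 0 is nicer to report than a NULL, wrap the whole division in COALESCE:

SELECT COALESCE(visitors_today / NULLIF(visitors_yesterday, 0), 0)
FROM logs_aggregated;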

10. Fill Gap in Data for Statistical Analytics 📈

SELECT dates_without_gaps.day, COALESCE(SUM(statistics.count), 0) AS count
FROM generate_series(
  CURRENT_DATE - INTERVAL '14 days',
  CURRENT_DATE,
  '1 day'
) AS dates_without_gaps(day)
LEFT JOIN statistics ON statistics.day = dates_without_gaps.day
GROUP BY dates_without_gaps.day;

In the above example, generate_series produces one row for each of the last 14 days, and the LEFT JOIN plus COALESCE turn days with no statistics rows into explicit zeros.
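
For reference, here is a minimal, hypothetical shape of the statistics table the query assumes – one row per day that had activity, with gaps on quiet days:

-- Hypothetical table backing the example above.
CREATE TABLE statistics (
  day   date    NOT NULL,
  count integer NOT NULL
);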

11. Use WITH to Make Your Query More Readable 📖

WITH random_names AS (
    SELECT
        CONCAT('firstname-', i) AS firstname,
        CONCAT('lastname-', i) AS lastname
    FROM generate_series(1, 100000) AS i
)
INSERT INTO contacts (firstname, lastname)
SELECT firstname, lastname
FROM random_names;

In the above example, we use a CTE to generate 100,000 name pairs and insert them into the contacts table; naming the intermediate step keeps the INSERT itself short and readable.

12. Limit Rows Also Including Ties 🏆

SELECT *
FROM teams
ORDER BY winning_games DESC
FETCH FIRST 3 ROWS WITH TIES;

In the above example, we select the top 3 teams by winning_games. If there's a tie at the cutoff, the query returns every team with that same number of winning games instead of stopping at exactly 3 rows. This is useful wherever the same rank can be shared by multiple rows, for example in a leaderboard or a performance review. Note that FETCH FIRST ... WITH TIES requires PostgreSQL 13 or newer.
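
Compare that with a plain LIMIT, which always cuts the list at exactly 3 rows:

-- A team tied for 3rd place may be silently dropped,
-- and which one survives depends on the sort order.
SELECT *
FROM teams
ORDER BY winning_games DESC
LIMIT 3;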

13. Strengthen Security with UUID Keys 🕵️‍♂️

ALTER TABLE users
ADD COLUMN uuid uuid NOT NULL DEFAULT gen_random_uuid();
CREATE UNIQUE INDEX users_uuid ON users (uuid);

In the above example, we add a uuid column to the users table (gen_random_uuid() is built into PostgreSQL 13+; older versions need the pgcrypto extension) and back it with a unique index. Sequential integer ids are guessable (/users/1, /users/2, ...), so exposing the random uuid instead prevents enumeration attacks.
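
The sequential id stays for internal joins; only the uuid is exposed in URLs and APIs. A hypothetical public-facing lookup:

-- The uuid literal here is just a placeholder.
SELECT * FROM users
WHERE uuid = 'a3c52f0e-6a7d-4b26-9d0a-8f2f9be0c001';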

14. Constraints for Improved Data Strictness 📏

-- Make sure that the email is unique
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);

-- Make sure that checkin_at is before checkout_at
ALTER TABLE reservations
ADD CONSTRAINT start_before_end CHECK (checkin_at < checkout_at);

-- Make sure that the vatid is not null if the customer is from the EU
ALTER TABLE invoices
ADD CONSTRAINT eu_vat CHECK (
  NOT (is_europeanunion) OR vatid IS NOT NULL
);
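
Once in place, invalid writes fail loudly instead of silently corrupting data. For example, assuming the reservations columns are timestamps, this insert is rejected with a check constraint violation:

-- Fails: checkin_at is after checkout_at.
INSERT INTO reservations (checkin_at, checkout_at)
VALUES ('2024-05-10', '2024-05-01');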

15. Multiple Aggregates In One Query 🧭✨

SELECT
  COUNT(*) FILTER (WHERE genre = 'Action') AS action_shows,
  COUNT(*) FILTER (WHERE genre = 'Drama') AS drama_shows,
  COUNT(*) FILTER (WHERE genre = 'Comedy') AS comedy_shows,
  AVG(viewers) AS average_viewers,
  SUM(duration) AS total_watch_time
FROM netflix_catalog;

In the above example, we are calculating the number of shows in each genre, average viewers, and total watch time in one query.
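
FILTER is PostgreSQL's cleaner spelling of the classic CASE trick, which remains the portable fallback on databases that lack it:

-- COUNT ignores NULLs, so each CASE contributes a row
-- only when the genre matches.
SELECT
  COUNT(CASE WHEN genre = 'Action' THEN 1 END) AS action_shows,
  COUNT(CASE WHEN genre = 'Drama'  THEN 1 END) AS drama_shows
FROM netflix_catalog;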

There are many more small things like these that I've learned over the years. But because I don't use them often, I tend to forget them, so I decided to write them down. I will keep updating this list as I learn more. If you have any suggestions, please let me know.

Next Steps

This week my target is to learn more about PostgreSQL. Here is what I plan to cover:

  • Full-text search in PostgreSQL
  • Basic queue in PostgreSQL
  • Basic cache in PostgreSQL
  • Basic pub/sub in PostgreSQL
  • Indexing in PostgreSQL
  • Time zones in PostgreSQL