Postgres Book

– title: “Postgres reading notes” image: categories: [ Databases ] —

Intro

I am currently reading the book Mastering PostgreSQL 15 in order to get a better understanding of this specific database technology. In the past I was working more with analytical/OLAP-oriented databases like Snowflake, but now OLTP databases are more important to my current job, so I decided to start getting into that with Postgres. I will share my reading notes here.

Transactions and Locking

  • transaction chaining using COMMIT AND CHAIN
  • savepoints using SAVEPOINT abc
  • most DDL statements are transactional!
  • Multi-Versions Concurrency Control: a transaction can see data that has been committed before the start of this transaction
    • Writes do not block reads
    • In Concurrent write transactions, the second write will wait until the first one has committed
  • SELECT … FOR UPDATE lets you select rows and update these rows in a later statement, avoiding race conditions when two processes do a select and update just with WHERE conditions - nice!
    • NOWAIT or lock_timeout to avoid waiting for the other process to long
    • SKIP LOCKED will let a second transaction do a select, skipping the locked rows from the first transaction
  • Supported isolation levels: read committed, repeatable read, serializable
  • Unique row identifier: ctid, can be used as SELECT ctid … or WHERE ctid …
  • Advisory locks: named locks, SELECT pg_advisory_lock(1) waits until lock named “1” is released
  • VACUUM cleans up unused space in the file system, tables will remain the same size
    • autovacuum usually runs every minute, no manual start needed
    • VACUUM FULL rebuilds a table but locks it completely, avoid usint it!
    • If there is free space in a table after a VACUUM, it will be used by the next command allocating memory for this table
    • Row gets delted if it cannot be seen any more by anybody

Indexes

  • Index in postgres is a B-tree
  • EXPLAIN to show details of execution plan
  • Execution costs are calculated with points: reading one sequential block costs one point (configured by seq_page_cost)
    • cpu_tuple_cost and cpu_operator_cost needed for processing the blocks; many more cost parameters e.g. for parallel queries
    • Costs are an estimate, cannot be translated to real execution time
  • One index can be used in a single query multiple times (e.g. search for a column equal to one value OR another value)
  • Query optimizer can decide not use an index if it does not make sense based on the table statistics
    • Plans change based on the queries input values because postgres will optimize based on those
  • CLUSTER tables in order to store them in the same order as an index
    • Locks the table while running
    • Table can only be clustered by one indes
    • Clustered state of table will not be maintained automatically!
  • Index-only scan: If just the index column is queried, it is not necessary to go to the actual table storage