– title: “Postgres reading notes” image: categories: [ Databases ] —
Intro
I am currently reading the book Mastering PostgreSQL 15 in order to get a better understanding of this specific database technology. In the past I was working more with analytical/OLAP-oriented databases like Snowflake, but now OLTP databases are more important to my current job, so I decided to start getting into that with Postgres. I will share my reading notes here.
Transactions and Locking
- transaction chaining using COMMIT AND CHAIN
- savepoints using SAVEPOINT abc
- most DDL statements are transactional!
- Multi-Versions Concurrency Control: a transaction can see data that has been committed before the start of this transaction
- Writes do not block reads
- In Concurrent write transactions, the second write will wait until the first one has committed
- SELECT … FOR UPDATE lets you select rows and update these rows in a later statement, avoiding race conditions when two processes do a select and update just with WHERE conditions - nice!
- NOWAIT or lock_timeout to avoid waiting for the other process to long
- SKIP LOCKED will let a second transaction do a select, skipping the locked rows from the first transaction
- Supported isolation levels: read committed, repeatable read, serializable
- Unique row identifier: ctid, can be used as SELECT ctid … or WHERE ctid …
- Advisory locks: named locks, SELECT pg_advisory_lock(1) waits until lock named “1” is released
- VACUUM cleans up unused space in the file system, tables will remain the same size
- autovacuum usually runs every minute, no manual start needed
- VACUUM FULL rebuilds a table but locks it completely, avoid usint it!
- If there is free space in a table after a VACUUM, it will be used by the next command allocating memory for this table
- Row gets delted if it cannot be seen any more by anybody
Indexes
- Index in postgres is a B-tree
- EXPLAIN to show details of execution plan
- Execution costs are calculated with points: reading one sequential block costs one point (configured by seq_page_cost)
- cpu_tuple_cost and cpu_operator_cost needed for processing the blocks; many more cost parameters e.g. for parallel queries
- Costs are an estimate, cannot be translated to real execution time
- One index can be used in a single query multiple times (e.g. search for a column equal to one value OR another value)
- Query optimizer can decide not use an index if it does not make sense based on the table statistics
- Plans change based on the queries input values because postgres will optimize based on those
- CLUSTER tables in order to store them in the same order as an index
- Locks the table while running
- Table can only be clustered by one indes
- Clustered state of table will not be maintained automatically!
- Index-only scan: If just the index column is queried, it is not necessary to go to the actual table storage