10 Things I Hate About PostgreSQL


Josh Kupershmidt
March 2013, Updated June 2013

Arizona PUG

Disclaimer: I don't actually hate PostgreSQL
Elephants
Photo credit: Today is a good day on flickr

Praise for psql

  • Simple
  • Easy to understand
  • Easy to use correctly
  • Concise documentation
  • Works the same way on all platforms
  • Has sophisticated tab-completion
  • Modern codebase
  • Works equally well for interactive use and executing SQL scripts

Criticism of psql

Previous slide is a lie!
  • Simple?
  • Easy to understand?
  • Easy to use?
  • Concise documentation?
  • Works the same way on all platforms?
  • Has sophisticated tab-completion?
  • Modern codebase?
  • Works equally well for interactive use and executing SQL scripts?

Simple?

  • psql section of manual is 30 pages (A4 PDF, 9.2)
  • Loaded to the gills with: command-line options, environment variables, metacommands, formatting options, hooks to your shell and editor.
  • Default behavior is often not what you want!
  • Must read: Peter Eisentraut's excellent Running SQL scripts with psql
  • Be particularly careful when SQL files source in other files: prefer \ir to \i, and watch for extra COMMITs in sourced files
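
The advice above can be sketched as a defensive psql invocation (file names here are hypothetical; the flags are standard psql options):

```shell
# Abort at the first error instead of plowing ahead, and wrap the whole
# script in one transaction so a failed run rolls back cleanly.  Note
# --single-transaction is silently defeated by stray COMMITs inside
# sourced files -- one reason to audit them.
psql --set ON_ERROR_STOP=on --single-transaction \
     --file=deploy.sql mydb

# Inside deploy.sql, prefer \ir over \i: \ir resolves the path relative
# to the file doing the including, while \i resolves it against psql's
# current working directory.
#   \ir schema/tables.sql
#   \ir schema/functions.sql
```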

Sophisticated Tab-Completion?

The whole tab completion machinery is pretty much just throwing darts while blindfolded, but the effort to replace it with something better is so large that we just keep poking at it the way it is...

- Robert Haas

Works the same on all platforms?

  • libedit vs. libreadline differences
  • limited libedit buffer size (EL_BUFSIZ)
  • poor multibyte character support in libedit
  • OpenSSL + libreadline licensing headache
  • Thanks a lot, Apple (10.6)
    josh@[local] [test] # \d pg_catalog.pg_c[TAB]
    
    malloc: *** error for object 0x9: pointer being freed was not allocated
    *** set a breakpoint in malloc_error_break to debug
    Abort trap
    
  • Oh, thanks again Apple (10.7)

pg_dump

  • Undocumented gotcha: dropping tables is verboten during pg_dump initialization
  • From a comment in pg_dump's source:
     * ... So
     * it is possible to get 'cache lookup failed' error if someone
     * performs DDL changes while a dump is happening. The window for this
     * sort of thing is from the acquisition of the transaction snapshot to
     * getSchemaData() (when pg_dump acquires AccessShareLock on every
     * table it intends to dump).
    
  • In fact, performing any DDL while pg_dump is running is dangerous and may give you a broken dump.
  • Possible fix in MVCC Catalog Access patch?

pg_dump

  • Don't trust pg_dump to tell you if your command-line options are nonsense. Here --clean applies only to plain-format output, yet pg_dump accepts it without complaint:
    $ pg_dump -Fc --clean ... 
  • Watch out for errors with --clean
    $ make installcheck
    $ pg_dump -Fc -d regression --file=regression.dump
    $ pg_restore --clean -d regression -1 regression.dump
    pg_restore: [archiver (db)] could not execute query: ERROR:  type "widget" does not exist
        Command was: DROP FUNCTION public.widget_out(widget);
  • Only plain-format dumps are allowed with pg_dumpall, making whole-cluster backups awkward
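
A common workaround for the pg_dumpall limitation, sketched below: dump cluster-wide objects as plain SQL and each database in custom format (assumes a running, accessible cluster; file names are illustrative):

```shell
# Roles and tablespaces exist only cluster-wide: grab them as plain SQL.
pg_dumpall --globals-only > globals.sql

# Dump each non-template database in custom format, so pg_restore can
# later do selective or parallel restores.
for db in $(psql -At -c "SELECT datname FROM pg_database
                         WHERE NOT datistemplate" postgres); do
    pg_dump -Fc --file="$db.dump" "$db"
done
```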

Bloat

  • Why doesn't autovacuum just deal with it?
  • autovacuum may fall behind a heavy write load if not tuned aggressively.
  • Ever perform bulk updates on a large table behind an Access Exclusive lock?
  • autovacuum has only limited support for compacting tables, as opposed to simply marking tuples for reuse
    /*
     * lazy_truncate_heap - try to truncate off any empty pages at the end
     */
    
    /*
     * We need full exclusive lock on the relation in order to do
     * truncation. If we can't get it, give up rather than waiting --- we
     * don't want to block other backends, and we don't want to deadlock
     * (which is quite possible considering we already hold a lower-grade
     * lock).
     */
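
Aggressive tuning here usually starts with a few autovacuum GUCs. A postgresql.conf sketch, with illustrative starting values rather than recommendations (defaults shown in comments):

```
# postgresql.conf -- wake autovacuum more often and let it work harder
# than the conservative defaults
autovacuum_naptime = 15s                # default: 1min
autovacuum_vacuum_scale_factor = 0.05   # default: 0.2 (20% dead before vacuum)
autovacuum_vacuum_cost_limit = 1000     # default: -1 (uses vacuum_cost_limit)
autovacuum_max_workers = 5              # default: 3
```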
    

Bloat Fighting

  • Keep an eye on autovacuum logs, and avoid very long-running transactions
  • Set up a bloat report
  • Shameless plug: pg_repack, formerly pg_reorg
  • See also Denish Patel's talk P90x Your Database for discussion of alternative tools.
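
A minimal bloat report can come straight from the statistics collector. This sketch counts dead tuples rather than measuring physical bloat, but it is cheap and catches autovacuum falling behind:

```sql
-- Tables with the highest fraction of dead tuples, per the stats
-- collector.  Not a true bloat estimate; see the tools above for that.
SELECT relname,
       n_dead_tup,
       n_live_tup,
       round(n_dead_tup::numeric /
             greatest(n_live_tup + n_dead_tup, 1), 3) AS dead_frac,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY dead_frac DESC
LIMIT 10;
```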

Performance

Still in the Dark Ages of understanding PostgreSQL performance

Witch Burning
Drawing by Jan Luyken, uploaded by rosewithoutathorn84

Performance

Our Competition

Our most recent change is that we now include a regression test for performance for sysbench. For each push into our "staging" tree we run a full test at different steps of "connections". We test both read-only and read-write workloads. ...
Today we generate automated reports for cachegrind, callgrind, and valgrind. We run pahole by hand. ...
All of these need to go into the process. We should never regress on L2 misses or branch predictions. We should never see holes in our structures/classes.

- Brian "Krow" Aker, June 2009

Performance

How do we stack up?

  • No "Performance Farm" to complement the Build Farm. Desperately needed.
  • Consequently, no one can tell you how PostgreSQL performance on standard benchmarks has changed over time with major and minor PostgreSQL releases, OS versions, different compilers, libc variants, x86 to x86_64, etc.
  • Little attention paid to the low-level optimizations Krow mentions (columns: struct name, size in bytes, number of alignment holes):
    $ pahole --sizes ~/runtime/bin/postgres  | sort -k3 -nr | head -5
    CopyStateData  440  13
    HbaLine        480  9
    XLogCtlData    1376 7
    VacAttrStats   320  6
    Tuplesortstate 488  6
    

Performance

  • pgbench-tools is very handy for automating performance tests across different scale factors, client counts, and numbers of workers.
  • But needs a lot of time to produce good results, and results are sometimes still too noisy to be usable:
     scale | clients |  tps  | stddev | num
    -------+---------+-------+--------+-----
         1 |       1 | 708.9 |    9.3 |  10
         1 |       2 | 607.1 |    8.5 |  10
         1 |       4 | 616.7 |    7.2 |  10
        10 |       1 | 507.9 |   59.4 |  10
        10 |       2 | 483.9 |   60.6 |  10
        10 |       4 | 506.0 |    7.4 |  10
        50 |       1 | 190.8 |    5.5 |  10
        50 |       2 | 191.1 |    2.7 |  10
        50 |       4 | 191.3 |    3.4 |  10
       100 |       1 | 142.0 |    4.0 |  10
       100 |       2 | 158.6 |    4.5 |  10
       100 |       4 | 162.1 |    4.4 |  10
       250 |       1 |  45.3 |    3.4 |  10
       250 |       2 |  50.2 |    1.5 |  10
       250 |       4 |  53.8 |    2.0 |  10
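
One crude way to decide which rows above are "too noisy": flag runs whose coefficient of variation (stddev divided by mean tps) exceeds some threshold, say 5%. A sketch using the data from the table:

```shell
# Flag pgbench runs whose relative spread (stddev / tps) exceeds 5%.
# Rows: scale clients tps stddev, copied from the table above.
noisy=$(printf '%s\n' \
    "1 1 708.9 9.3"    "1 2 607.1 8.5"    "1 4 616.7 7.2"   \
    "10 1 507.9 59.4"  "10 2 483.9 60.6"  "10 4 506.0 7.4"  \
    "50 1 190.8 5.5"   "50 2 191.1 2.7"   "50 4 191.3 3.4"  \
    "100 1 142.0 4.0"  "100 2 158.6 4.5"  "100 4 162.1 4.4" \
    "250 1 45.3 3.4"   "250 2 50.2 1.5"   "250 4 53.8 2.0"  |
  awk '$4 / $3 > 0.05 { print "scale=" $1, "clients=" $2 }')
echo "$noisy"
```

By that yardstick the scale-10 runs at 1 and 2 clients, and the scale-250 run at 1 client, are the suspect measurements.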
    

Performance

Patch Hurdles

Author: Here is a patch to improve performance in this particular use case; see the results from this micro-benchmark

Skeptic: What about outside of a micro-benchmark?

Skeptic: Might it negatively impact different workloads?

Skeptic: Changing around this code in HEAD will make it harder to maintain the back-branches

Skeptic: The patch adds complexity for only a possible small performance gain

Skeptic: Bizarre platforms may not properly implement the new system or library call you are relying on

Skeptic: Forensics of on-disk data may become more difficult

Backwards Compatibility

XKCD 1172

Backwards Compatibility

  • Ancient behaviors preserved for "compatibility", even if awful:
    reindexdb --table='foo; ALTER ROLE limited WITH superuser'
          
  • libpq V3 protocol is about 10 years old now, with many desired changes.
  • pg_upgrade compatibility an obstacle to on-disk changes
  • psql, pg_dump, etc. code clogged with support for 7.x servers

Documentation

Delicious final product...

Chicken Nuggets
Photo credit: theimpulsivebuy on flickr

Documentation

Just don't ask how it's made...

Chicken Goo

Documentation Gripes

  • DocBook 4.2, SGML
  • Not covered by buildfarm
  • Finicky toolset
$ make postgres-US.pdf
        ...
        ! TeX capacity exceeded, sorry [number of strings=84278].
\IfFileExists #1#2#3->\openin \@inputcheck #1 
                                              \ifeof \@inputcheck \ifx \inpu...
l.691151 {}
           \Seq%
!  ==> Fatal error occurred, no output PDF file produced!
  • Dependent on old version of openjade, at least on Debian/Ubuntu
  • For now, Tom's duct tape keeps the PDF builds working
  • Lingering openjade bugs

Documentation Gripes

  • Flood of texpdf errors:
Underfull \hbox (badness 10000) in paragraph at lines 1016507--1016535
 []  \T1/pcr/m/n/9 src/backend/optimizer/geqo/geqo_main.c\T1/ptm/m/n/10 , rou-t
ines []  \T1/pcr/m/n/9 gimme_pool_size \T1/ptm/m/n/10 and
[1943.0.23] [1944.0.23]
        
Overfull \hbox (141.59937pt too wide) in paragraph at lines 78386--78386
[]\T1/pcr/m/n/9 createdb: could not connect to database postgres: could not connect to server: No such file or directory 

Documentation Gripes

  • Broken intra-page links throughout the PDFs
  • Several good suggestions for improving our PDF formatting, but would need DSSSL changes
  • Structure of documentation doesn't do a great job of keeping up with new features. See the sheer amount of material on replication crammed into warm-standby.html.

Documentation Gripes

  • PDF builds take a long time (~11 mins. each), plus html build
  • No ccache equivalent for building documentation
  • Oh, and Google doesn't like us
Google returning old doc page links

Development

Herding Cats
Cats
Photo credit: Tjflex on flickr
  

Commitfest

  • Worst way to review patches, except all the others
  • Running way over schedule is the norm
  • Stuck in back-to-back 'fests nowadays, although the intent was to leave breathing space for design and discussion in between.
  • Long waits until commit are the norm, even for small, uncontroversial patches.
  • CF App encourages "one patch, one reviewer" mindset, to the detriment of large, complicated patches
  • Does an okay job of solving the "sometimes patches were ignored and forgotten" problem, but doesn't help much for the "no one wants to spend a hundred hours reviewing a giant patch for little credit" problem
  • Bellyaching every few months about the process

Commitfest

Schedule, 9.3
  • CF1
Start: 2012-06-15
Proposed End: 2012-07-15
Actual End: 2012-08-23
  • CF2
Start: 2012-09-15
Proposed End: 2012-10-15
Actual End: 2012-11-05
  • CF3
Start: 2012-11-15
Proposed End: 2012-12-15
Actual End: 2013-01-17
  • CF4
Start: 2013-01-15
Proposed End: 2013-02-15
Actual End: 2013-04-14

TODO

  • Todo list is incomplete, including several gripes from these slides
  • Many complaints end with consensus "that's too bad, but there's not much we can do without causing mass breakage/breaking pg_upgrade/breaking V3 protocol" and are not TODOs
  • "Please do not add items here without discussion on the mailing list" — leads to items being quietly overlooked
  • No way to vote for feature requests, see e.g. Audacity's wiki
  • Our Todo list is a barrier to new contributors, who often try to use the list to pick a first project. We need more Easy-tagged items, in particular.