10 Things I Hate About PostgreSQL


Josh Kupershmidt
March 2013, Updated June 2013

Arizona PUG

Disclaimer: I don't actually hate PostgreSQL
Elephants
Photo credit: Today is a good day on flickr

Praise for psql

  • Simple
  • Easy to understand
  • Easy to use correctly
  • Concise documentation
  • Works the same way on all platforms
  • Has sophisticated tab-completion
  • Modern codebase
  • Works equally well for interactive use and executing SQL scripts

Criticism of psql

Previous slide is a lie!
  • Simple?
  • Easy to understand?
  • Easy to use?
  • Concise documentation?
  • Works the same way on all platforms?
  • Has sophisticated tab-completion?
  • Modern codebase?
  • Works equally well for interactive use and executing SQL scripts?

Simple?

  • psql section of manual is 30 pages (A4 PDF, 9.2)
  • Loaded to the gills with: command-line options, environment variables, metacommands, formatting options, hooks to your shell and editor.
  • Default behavior is often not what you want!
  • Must read: Peter Eisentraut's excellent Running SQL scripts with psql
  • Be particularly careful when SQL files source in other files: prefer \ir to \i, and watch for extra COMMITs in sourced files
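
The advice above can be sketched as a defensive psql invocation (file names here are hypothetical; the flags are standard psql options):

```shell
# Abort at the first error instead of plowing ahead, and wrap the whole
# script in one transaction so a failed run rolls back cleanly.  Note
# --single-transaction is silently defeated by stray COMMITs inside
# sourced files -- one reason to audit them.
psql --set ON_ERROR_STOP=on --single-transaction \
     --file=deploy.sql mydb

# Inside deploy.sql, prefer \ir over \i: \ir resolves the path relative
# to the file doing the including, while \i resolves it against psql's
# current working directory.
#   \ir schema/tables.sql
#   \ir schema/functions.sql
```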

Sophisticated Tab-Completion?

The whole tab completion machinery is pretty much just throwing darts while blindfolded, but the effort to replace it with something better is so large that we just keep poking at it the way it is...

- Robert Haas

Works the same on all platforms?

  • libedit vs. libreadline differences
  • limited libedit buffer size (EL_BUFSIZ)
  • poor multibyte character support in libedit
  • OpenSSL + libreadline licensing headache
  • Thanks a lot, Apple (10.6)
    josh@[local] [test] # \d pg_catalog.pg_c[TAB]
    
    malloc: *** error for object 0x9: pointer being freed was not allocated
    *** set a breakpoint in malloc_error_break to debug
    Abort trap
    
  • Oh, thanks again Apple (10.7)

pg_dump

  • Undocumented gotcha: dropping tables is verboten during pg_dump initialization
  • From a comment in pg_dump's source:
     * ... So
     * it is possible to get 'cache lookup failed' error if someone
     * performs DDL changes while a dump is happening. The window for this
     * sort of thing is from the acquisition of the transaction snapshot to
     * getSchemaData() (when pg_dump acquires AccessShareLock on every
     * table it intends to dump).
    
  • In fact, performing any DDL while pg_dump is running is dangerous and may give you a broken dump.
  • Possible fix in MVCC Catalog Access patch?

pg_dump

  • Don't trust pg_dump to tell you if your command-line options are nonsense. Here --clean applies only to plain-format output, yet pg_dump accepts it without complaint:
    $ pg_dump -Fc --clean ... 
  • Watch out for errors with --clean
    $ make installcheck
    $ pg_dump -Fc -d regression --file=regression.dump
    $ pg_restore --clean -d regression -1 regression.dump
    pg_restore: [archiver (db)] could not execute query: ERROR:  type "widget" does not exist
        Command was: DROP FUNCTION public.widget_out(widget);
  • Only plain-format dumps are allowed with pg_dumpall, making whole-cluster backups awkward
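
A common workaround for the pg_dumpall limitation, sketched below: dump cluster-wide objects as plain SQL and each database in custom format (assumes a running, accessible cluster; file names are illustrative):

```shell
# Roles and tablespaces exist only cluster-wide: grab them as plain SQL.
pg_dumpall --globals-only > globals.sql

# Dump each non-template database in custom format, so pg_restore can
# later do selective or parallel restores.
for db in $(psql -At -c "SELECT datname FROM pg_database
                         WHERE NOT datistemplate" postgres); do
    pg_dump -Fc --file="$db.dump" "$db"
done
```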

Bloat

  • Why doesn't autovacuum just deal with it?
  • autovacuum may fall behind a heavy write load if not tuned aggressively.
  • Ever perform bulk updates on a large table behind an Access Exclusive lock?
  • autovacuum has only limited support for compacting tables, as opposed to simply marking tuples for reuse
    /*
     * lazy_truncate_heap - try to truncate off any empty pages at the end
     */
    
    /*
     * We need full exclusive lock on the relation in order to do
     * truncation. If we can't get it, give up rather than waiting --- we
     * don't want to block other backends, and we don't want to deadlock
     * (which is quite possible considering we already hold a lower-grade
     * lock).
     */
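
Aggressive tuning here usually starts with a few autovacuum GUCs. A postgresql.conf sketch, with illustrative starting values rather than recommendations (defaults shown in comments):

```
# postgresql.conf -- wake autovacuum more often and let it work harder
# than the conservative defaults
autovacuum_naptime = 15s                # default: 1min
autovacuum_vacuum_scale_factor = 0.05   # default: 0.2 (20% dead before vacuum)
autovacuum_vacuum_cost_limit = 1000     # default: -1 (uses vacuum_cost_limit)
autovacuum_max_workers = 5              # default: 3
```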
    

Bloat Fighting

  • Keep an eye on autovacuum logs, and avoid very long-running transactions
  • Set up a bloat report
  • Shameless plug: pg_repack, formerly pg_reorg
  • See also Denish Patel's talk P90x Your Database for discussion of alternative tools.
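
A minimal bloat report can come straight from the statistics collector. This sketch counts dead tuples rather than measuring physical bloat, but it is cheap and catches autovacuum falling behind:

```sql
-- Tables with the highest fraction of dead tuples, per the stats
-- collector.  Not a true bloat estimate; see the tools above for that.
SELECT relname,
       n_dead_tup,
       n_live_tup,
       round(n_dead_tup::numeric /
             greatest(n_live_tup + n_dead_tup, 1), 3) AS dead_frac,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY dead_frac DESC
LIMIT 10;
```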

Performance

Still in the Dark Ages of understanding PostgreSQL performance

Witch Burning
Drawing by Jan Luyken, uploaded by rosewithoutathorn84

Performance

Our Competition

Our most recent change is that we now include a regression test for performance for sysbench. For each push into our "staging" tree we run a full test at different steps of "connections". We test both read-only and read-write workloads. ...
Today we generate automated reports for cachegrind, callgrind, and valgrind. We run pahole by hand. ...
All of these need to go into the process. We should never regress on L2 misses or branch predictions. We should never see holes in our structures/classes.

- Brian "Krow" Aker, June 2009

Performance

How do we stack up?

  • No "Performance Farm" to complement the Build Farm. Desperately needed.
  • Consequently, no one can tell you how PostgreSQL performance on standard benchmarks has changed over time with major and minor PostgreSQL releases, OS versions, different compilers, libc variants, x86 to x86_64, etc.
  • Little attention paid to the low-level optimizations Krow mentions (columns: struct name, size in bytes, number of alignment holes):
    $ pahole --sizes ~/runtime/bin/postgres  | sort -k3 -nr | head -5
    CopyStateData  440  13
    HbaLine        480  9
    XLogCtlData    1376 7
    VacAttrStats   320  6
    Tuplesortstate 488  6
    

Performance

  • pgbench-tools is very handy for automating performance tests across different scale factors, client counts, and numbers of workers.
  • But needs a lot of time to produce good results, and results are sometimes still too noisy to be usable:
     scale | clients |  tps  | stddev | num
    -------+---------+-------+--------+-----
         1 |       1 | 708.9 |    9.3 |  10
         1 |       2 | 607.1 |    8.5 |  10
         1 |       4 | 616.7 |    7.2 |  10
        10 |       1 | 507.9 |   59.4 |  10
        10 |       2 | 483.9 |   60.6 |  10
        10 |       4 | 506.0 |    7.4 |  10
        50 |       1 | 190.8 |    5.5 |  10
        50 |       2 | 191.1 |    2.7 |  10
        50 |       4 | 191.3 |    3.4 |  10
       100 |       1 | 142.0 |    4.0 |  10
       100 |       2 | 158.6 |    4.5 |  10
       100 |       4 | 162.1 |    4.4 |  10
       250 |       1 |  45.3 |    3.4 |  10
       250 |       2 |  50.2 |    1.5 |  10
       250 |       4 |  53.8 |    2.0 |  10
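
One crude way to decide which rows above are "too noisy": flag runs whose coefficient of variation (stddev divided by mean tps) exceeds some threshold, say 5%. A sketch using the data from the table:

```shell
# Flag pgbench runs whose relative spread (stddev / tps) exceeds 5%.
# Rows: scale clients tps stddev, copied from the table above.
noisy=$(printf '%s\n' \
    "1 1 708.9 9.3"    "1 2 607.1 8.5"    "1 4 616.7 7.2"   \
    "10 1 507.9 59.4"  "10 2 483.9 60.6"  "10 4 506.0 7.4"  \
    "50 1 190.8 5.5"   "50 2 191.1 2.7"   "50 4 191.3 3.4"  \
    "100 1 142.0 4.0"  "100 2 158.6 4.5"  "100 4 162.1 4.4" \
    "250 1 45.3 3.4"   "250 2 50.2 1.5"   "250 4 53.8 2.0"  |
  awk '$4 / $3 > 0.05 { print "scale=" $1, "clients=" $2 }')
echo "$noisy"
```

By that yardstick the scale-10 runs at 1 and 2 clients, and the scale-250 run at 1 client, are the suspect measurements.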
    

Performance

Patch Hurdles

Author: Here is a patch to improve performance in this particular use case; see the results from this micro-benchmark

Skeptic: What about outside of a micro-benchmark?

Skeptic: Might it negatively impact different workloads?

Skeptic: Changing around this code in HEAD will make it harder to maintain the back-branches

Skeptic: The patch adds complexity for only a possible small performance gain

Skeptic: Bizarre platforms may not properly implement the new system or library call you are relying on

Skeptic: Forensics of on-disk data may become more difficult

Backwards Compatibility

XKCD 1172

Backwards Compatibility

  • Ancient behaviors preserved for "compatibility", even if awful:
    reindexdb --table='foo; ALTER ROLE limited WITH superuser'
          
  • libpq V3 protocol is about 10 years old now, with many desired changes.
  • pg_upgrade compatibility an obstacle to on-disk changes
  • psql, pg_dump, etc. code clogged with support for 7.x servers

Documentation

Delicious final product...

Chicken Nuggets
Photo credit: theimpulsivebuy on flickr

Documentation

Just don't ask how it's made...

Chicken Goo

Documentation Gripes

  • DocBook 4.2, SGML
  • Not covered by buildfarm
  • Finicky toolset
$ make postgres-US.pdf
        ...
        ! TeX capacity exceeded, sorry [number of strings=84278].
\IfFileExists #1#2#3->\openin \@inputcheck #1 
                                              \ifeof \@inputcheck \ifx \inpu...
l.691151 {}
           \Seq%
!  ==> Fatal error occurred, no output PDF file produced!
  • Dependent on old version of openjade, at least on Debian/Ubuntu
  • For now, Tom's duct tape keeps the PDF builds working
  • Lingering openjade bugs

Documentation Gripes

  • Flood of texpdf errors:
Underfull \hbox (badness 10000) in paragraph at lines 1016507--1016535
 []  \T1/pcr/m/n/9 src/backend/optimizer/geqo/geqo_main.c\T1/ptm/m/n/10 , rou-t
ines []  \T1/pcr/m/n/9 gimme_pool_size \T1/ptm/m/n/10 and
[1943.0.23] [1944.0.23]
        
Overfull \hbox (141.59937pt too wide) in paragraph at lines 78386--78386
[]\T1/pcr/m/n/9 createdb: could not connect to database postgres: could not connect to server: No such file or directory 

Documentation Gripes

  • Broken intra-page links throughout the PDFs
  • Several good suggestions for improving our PDF formatting, but would need DSSSL changes
  • Structure of documentation doesn't do a great job of keeping up with new features. See the sheer amount of material on replication crammed into warm-standby.html.

Documentation Gripes

  • PDF builds take a long time (~11 mins. each), plus html build
  • No ccache equivalent for building documentation
  • Oh, and Google doesn't like us
Google returning old doc page links

Development

Herding Cats
Cats
Photo credit: Tjflex on flickr
  

Commitfest

  • Worst way to review patches, except all the others
  • Running way over schedule is the norm
  • Stuck in back-to-back 'fests nowadays, although the intent was to leave breathing space for design and discussion in between.
  • Long waits until commit are the norm, even for small, uncontroversial patches.
  • CF App encourages "one patch, one reviewer" mindset, to the detriment of large, complicated patches
  • Does an okay job of solving the "sometimes patches were ignored and forgotten" problem, but doesn't help much for the "no one wants to spend a hundred hours reviewing a giant patch for little credit" problem
  • Bellyaching every few months about the process

Commitfest

Schedule, 9.3
  • CF1
Start: 2012-06-15
Proposed End: 2012-07-15
Actual End: 2012-08-23
  • CF2
Start: 2012-09-15
Proposed End: 2012-10-15
Actual End: 2012-11-05
  • CF3
Start: 2012-11-15
Proposed End: 2012-12-15
Actual End: 2013-01-17
  • CF4
Start: 2013-01-15
Proposed End: 2013-02-15
Actual End: 2013-04-14

TODO

  • Todo list is incomplete, including several gripes from these slides
  • Many complaints end with consensus "that's too bad, but there's not much we can do without causing mass breakage/breaking pg_upgrade/breaking V3 protocol" and are not TODOs
  • "Please do not add items here without discussion on the mailing list" — leads to items being quietly overlooked
  • No way to vote for feature requests, see e.g. Audacity's wiki
  • Our Todo list is a barrier to new contributors, who often try to use the list to pick a first project. We need more Easy-tagged items, in particular.