Postgres Vacuum Cleaners

Dominic Mitchell dom at happygiraffe.net
Mon Nov 20 14:21:15 GMT 2006


On Mon, Nov 20, 2006 at 01:28:01PM +0000, Raf wrote:
> Gents and Ladies,
>
> I've become rusty with postgres and figured that someone might be able
> to shed some light.
>
> I am trying to understand postgres' vacuum functionality.  I've
> started working with a DB, which I'm told uses "auto_vacuum."
>
> Now, to me this says little about the vacuuming policy, however, I
> have seen the there are some very large files under the postgres data
> device.
> It seems that the size of these files exceeds the typical size of a
> database dump, which to me suggests that there is still a physical
> history of stale tuples in the DB and that it may need a full vacuum.
> Googling has led me to believe that auto_vacuum only rebuilds stats /
> does a vacuum analyse.  Thus, I don't believe that this would free up
> space on the device?

It's possible to free up some space, but not necessarily likely.

Backing up a step, note that PostgreSQL has two forms of vacuum:
"ordinary" vacuum and "full" vacuum.

The ordinary vacuum goes through and marks out-of-date or deleted tuples
(rows) as free space.  If it finds a full page (8kb) of contiguous free
space, then it's possible to shrink the file it's using, but this is
relatively rare.

The full vacuum locks the entire table, and effectively writes out a new
file containing no space at all, removing deleted and out-of-date tuples
along the way.  Once upon a time (Pg 7.1), this was the only kind of
vacuum.

So a full vacuum gets you a smaller space in use.  *But*, it's not
necessarily the right thing to do.  This is because there's inevitably
some amount of "churn" in your database as rows get deleted and updated.
So keeping around a bit of empty space in the file turns out to help
matters, because you don't have to go off and ask the O/S for more
space.

In recent versions of PostgreSQL (8.1 and later I think), there is a
tool called auto_vacuum.  This is a little daemon which runs alongside
PostgreSQL, monitoring the update / insert / delete stats.  When so many
operations have occurred, it triggers an ordinary vacuum for you.  This
is (in general) a good idea, as it means you don't end up with dodgy
cron scripts trying to guess when is a good time to do a daily vacuum.

  NB: If you're running 7.4 or 8.0, all is not lost.  The auto_vacuum
  daemon is in the "contrib" package and has to be stopped/started
  separately.

If you're wondering when a full vacuum might be useful, I'd say that
it's handy when you have deleted most of the rows in a table and do not
plan on refilling that table.  For example, in one of our databases, we
went from "keep history data forever" to "keep 100 days of history".
Performing a one-off full vacuum shrunk the space used in that table
considerably.

> Now, I was hoping that someone might be able to give be a brief
> synopsis of the different vacuum arguments and their functionality.
> I've tried googl'ing, but I'm hoping that this will be more
> meaningful.  Further, is it correct to assume that while vacuuming the
> db would be taken off line, or is there a way to expose it in a
> query-only mode?

The PostgreSQL documentation is actually quite reasonable.

  http://www.postgresql.org/docs/8.1/static/maintenance.html#ROUTINE-VACUUMING

The "ordinary" vacuum will let queries (and updates) run at the same
time.  Although the vacuum can still be heavy on the disk I/O, so you
may need to override the auto_vacuum daemon and run it manually at a
time of day when the database is less busy.

> The goal of my mission is to free up space and speed up queries.
> Suggestions?

To start with, use the auto_vacuum daemon until it becomes a problem.
When it comes to speeding up queries, there are a few parameters  in
postgresql.conf that can make a big difference to the speed.  I
recommend this page for tuning PostgreSQL.

  http://varlena.com/varlena/GeneralBits/Tidbits/perf.html

It's a little bit old now, but it still seems to cover the salient
points.

I'd also recommend browsing the "runtime configuration" section of the
manual, so you know what's available for tweaking.

  http://www.postgresql.org/docs/8.1/static/runtime-config.html

Oh, and there's a section on disk space usage as well.

  http://www.postgresql.org/docs/8.1/static/diskusage.html

-Dom


More information about the london.pm mailing list