Postgres Vacuum Cleaners
Raf
rafiq at dreamthought.com
Mon Nov 20 17:08:18 GMT 2006
Many thanks to all who answered. I'm going to do a full vacuum and then
enable the auto_vacuum_rumba daemon, which everyone 'believed' was
running.
Appreciated.
Raf
On Mon, 20 Nov 2006, Dominic Mitchell wrote:
> On Mon, Nov 20, 2006 at 01:28:01PM +0000, Raf wrote:
> > Gents and Ladies,
> >
> > I've become rusty with postgres and figured that someone might be able
> > to shed some light.
> >
> > I am trying to understand postgres' vacuum functionality. I've
> > started working with a DB, which I'm told uses "auto_vacuum."
> >
> > Now, to me this says little about the vacuuming policy, however, I
> > have seen the there are some very large files under the postgres data
> > device.
> > It seems that the size of these files exceeds the typical size of a
> > database dump, which to me suggests that there is still a physical
> > history of stale tuples in the DB and that it may need a full vacuum.
> > Googling has led me to believe that auto_vacuum only rebuilds stats /
> > does a vacuum analyse. Thus, I don't believe that this would free up
> > space on the device?
>
> It's possible to free up some space, but not necessarily likely.
>
> Backing up a step, note that PostgreSQL has two forms of vacuum:
> "ordinary" vacuum and "full" vacuum.
>
> The ordinary vacuum goes through and marks out-of-date or deleted tuples
> (rows) as free space. If it finds a full page (8kb) of contiguous free
> space, then it's possible to shrink the file it's using, but this is
> relatively rare.
>
> The full vacuum locks the entire table, and effectively writes out a new
> file containing no space at all, removing deleted and out-of-date tuples
> along the way. Once upon a time (Pg 7.1), this was the only kind of
> vacuum.
>
> So a full vacuum gets you a smaller space in use. *But*, it's not
> necessarily the right thing to do. This is because there's inevitably
> some amount of "churn" in your database as rows get deleted and updated.
> So keeping around a bit of empty space in the file turns out to help
> matters, because you don't have to go off and ask the O/S for more
> space.
>
> In recent versions of PostgreSQL (8.1 and later I think), there is a
> tool called auto_vacuum. This is a little daemon which runs alongside
> PostgreSQL, monitoring the update / insert / delete stats. When so many
> operations have occurred, it triggers an ordinary vacuum for you. This
> is (in general) a good idea, as it means you don't end up with dodgy
> cron scripts trying to guess when is a good time to do a daily vacuum.
>
> NB: If you're running 7.4 or 8.0, all is not lost. The auto_vacuum
> daemon is in the "contrib" package and has to be stopped/started
> separately.
>
> If you're wondering when a full vacuum might be useful, I'd say that
> it's handy when you have deleted most of the rows in a table and do not
> plan on refilling that table. For example, in one of our databases, we
> went from "keep history data forever" to "keep 100 days of history".
> Performing a one-off full vacuum shrunk the space used in that table
> considerably.
>
> > Now, I was hoping that someone might be able to give be a brief
> > synopsis of the different vacuum arguments and their functionality.
> > I've tried googl'ing, but I'm hoping that this will be more
> > meaningful. Further, is it correct to assume that while vacuuming the
> > db would be taken off line, or is there a way to expose it in a
> > query-only mode?
>
> The PostgreSQL documentation is actually quite reasonable.
>
> http://www.postgresql.org/docs/8.1/static/maintenance.html#ROUTINE-VACUUMING
>
> The "ordinary" vacuum will let queries (and updates) run at the same
> time. Although the vacuum can still be heavy on the disk I/O, so you
> may need to override the auto_vacuum daemon and run it manually at a
> time of day when the database is less busy.
>
> > The goal of my mission is to free up space and speed up queries.
> > Suggestions?
>
> To start with, use the auto_vacuum daemon until it becomes a problem.
> When it comes to speeding up queries, there are a few parameters in
> postgresql.conf that can make a big difference to the speed. I
> recommend this page for tuning PostgreSQL.
>
> http://varlena.com/varlena/GeneralBits/Tidbits/perf.html
>
> It's a little bit old now, but it still seems to cover the salient
> points.
>
> I'd also recommend browsing the "runtime configuration" section of the
> manual, so you know what's available for tweaking.
>
> http://www.postgresql.org/docs/8.1/static/runtime-config.html
>
> Oh, and there's a section on disk space usage as well.
>
> http://www.postgresql.org/docs/8.1/static/diskusage.html
>
> -Dom
>
More information about the london.pm
mailing list