Web Caching

Author: Duane Wessels



Publisher: O'Reilly

Reviewed by: Andy Williams

Why do we need web caching?

Web caching improves the speed and efficiency of the Web. Used correctly, it can shorten response times to keep users happy and make the best use of network bandwidth to keep network administrators happy.

Chapter 1 is an introduction, or a refresher for those who already work with the Web, covering the basics of web architecture, protocols, the reasons for and against web caching, and the types of web caching. It explains the difference between server and client caching. Although this is not rocket science, it has been my experience that a lot of developers do not understand the difference.

Chapter 2 explains how web caching works. It covers the differences between HTTP and non-HTTP (FTP, Gopher, etc.) proxy requests, why certain pages should or shouldn't be cached, which HTTP headers are used to help caching, and how to force a cache refresh. This chapter also has a section on caching algorithms, although only at a reasonably high level.
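To make the "force a cache refresh" idea concrete, here is a minimal sketch of my own (not taken from the book) using Python's standard urllib. It assumes the usual HTTP approach of sending a "Cache-Control: no-cache" request header, with "Pragma: no-cache" for older HTTP/1.0 caches; the helper name and the example URL are hypothetical.

```python
import urllib.request


def build_refresh_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks intermediate caches to revalidate
    with the origin server instead of answering from stored copies."""
    return urllib.request.Request(
        url,
        headers={
            # HTTP/1.1 request directive: do not serve this from cache
            # without first revalidating with the origin server.
            "Cache-Control": "no-cache",
            # HTTP/1.0 equivalent, for caches that predate Cache-Control.
            "Pragma": "no-cache",
        },
    )


# Hypothetical URL; nothing is sent until the request is actually opened.
req = build_refresh_request("http://www.example.com/index.html")
for name, value in req.header_items():
    print(f"{name}: {value}")
```

Passing the request to urllib.request.urlopen would send it; the sketch stops short of that so it can be run without network access.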

In chapter 3 the author covers the politics of a web caching system. It includes how web cache logs can be used to compromise a user's privacy, and how caching can be used to strengthen privacy. There is a nice section on copyright which, in my opinion, does an excellent job of explaining the issues involved.

The focus of chapter 4 is on configuring the caching mechanism on the client side. This section was a little disappointing in that it focused mainly on Microsoft Internet Explorer and Netscape. I would have liked to see some code-based configuration, e.g. LWP. It does, however, briefly mention Mosaic, Lynx and Wget. There is a nice section on proxy auto-configuration scripts, which I have never been able to master, although I am not entirely sure that this section goes into quite enough detail.
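As a rough illustration of what code-based client configuration can look like, here is a minimal sketch using Python's standard urllib rather than LWP. It is not from the book; the proxy host and port are placeholders, and the function name is my own.

```python
import urllib.request

# Hypothetical caching proxy address; substitute your own host and port.
PROXY_URL = "http://proxy.example.com:3128"


def make_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via the proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


opener = make_proxied_opener(PROXY_URL)
# opener.open("http://www.example.com/") would now go through the proxy.
```

Building the opener does not contact the proxy; only opener.open does, so the configuration can be checked offline.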

Chapter 5 explains how to make your network users actually use the cache - interception proxying and caching.

In chapter 6 we discover how to configure servers to work with caches. This section is aimed at web administrators. It explains the important HTTP response headers that should be used. There is also a nice list of "ten ways to be cache-friendly". Apache seems to be the server of choice in this section, so sorry to all you IIS users out there.
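For readers who want a feel for what such response headers look like, here is a small sketch of my own. The choice of headers (Date, Last-Modified, Expires and Cache-Control) is my assumption of a typical cache-friendly set, not a list taken from the book, and the function name is hypothetical.

```python
import time
from email.utils import formatdate
from typing import Optional


def cache_friendly_headers(last_modified: float, max_age: int,
                           now: Optional[float] = None) -> dict:
    """Return response headers that give caches explicit freshness data.

    last_modified and now are Unix timestamps; max_age is in seconds.
    """
    now = time.time() if now is None else now
    return {
        # HTTP dates are always expressed in GMT.
        "Date": formatdate(now, usegmt=True),
        # Lets caches revalidate cheaply with If-Modified-Since.
        "Last-Modified": formatdate(last_modified, usegmt=True),
        # Absolute expiry time, understood by HTTP/1.0 caches.
        "Expires": formatdate(now + max_age, usegmt=True),
        # Relative lifetime, preferred by HTTP/1.1 caches.
        "Cache-Control": f"public, max-age={max_age}",
    }


# Example: a page last changed a day ago that may be cached for an hour.
for name, value in cache_friendly_headers(time.time() - 86400, 3600).items():
    print(f"{name}: {value}")
```

In practice these values would be set in the web server configuration rather than generated by hand; the sketch only shows the shape of the headers.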

Chapter 7 explains how caches work with other caches - a cache hierarchy. It builds on some of the knowledge from chapter 2 - hit ratios and freshness.
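The two ideas carried over from chapter 2 can be sketched in a few lines. This is my own simplified illustration, assuming the usual definitions: hit ratio is the fraction of requests served from the cache, and a response is fresh while its age is below its freshness lifetime. The function names are hypothetical.

```python
def hit_ratio(hits: int, requests: int) -> float:
    """Fraction of requests answered from the cache (0.0 if no requests)."""
    return hits / requests if requests else 0.0


def is_fresh(age_seconds: float, freshness_lifetime_seconds: float) -> bool:
    """A cached response is fresh while its age is below its lifetime."""
    return age_seconds < freshness_lifetime_seconds


# Example: 300 hits out of 1000 requests, and an object cached 10 minutes
# ago with a one-hour freshness lifetime.
print(hit_ratio(300, 1000))
print(is_fresh(600, 3600))
```

A byte hit ratio can be computed the same way by counting bytes served from the cache instead of requests.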

Chapter 8 is a good section on intercache protocols - ICP, CARP, HTCP and Cache Digests. There are comparisons between the protocols and a "Which Protocol to Use" section, which is extremely helpful.

Chapters 9 and 10 cover the physical requirements of a caching system - clustering, load sharing, design.

Chapters 11 and 12 are there to help with monitoring and benchmarking your new cache system. They include descriptions of several tools - UCD-SNMP from the University of California, RRDTool and Web Polygraph. The benchmarking section (chapter 12) is incredibly detailed and has proved very useful in setting up my web cache.

Appendix A covers analysis of your web cache trace data, and backs up what Duane has been talking about throughout the book.

Appendix B is an in-depth description of the Internet Cache Protocol. Most of this information is contained in RFCs 2186 and 2187, but the layout in this appendix is much easier to read and comprehend. This section builds on chapter 8.
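To give a flavour of the protocol, here is a minimal sketch that builds an ICP query message following the layout in RFC 2186 as I understand it: a 20-byte header (opcode, version, message length, request number, options, option data, sender host address) followed by a 4-byte requester address and a null-terminated URL. The function name is my own, and the address fields are simply left as zero in this sketch.

```python
import struct

ICP_OP_QUERY = 1   # opcode for a query message
ICP_VERSION = 2    # ICP version 2

# Network byte order: opcode (1 byte), version (1), message length (2),
# request number (4), options (4), option data (4), sender address (4).
ICP_HEADER = struct.Struct("!BBHIIII")


def build_icp_query(url: str, request_number: int) -> bytes:
    """Build an ICP_OP_QUERY message asking a neighbour cache about url."""
    # Query payload: requester host address (zeroed here) + URL + NUL.
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    # The length field covers the whole message, header included.
    length = ICP_HEADER.size + len(payload)
    header = ICP_HEADER.pack(
        ICP_OP_QUERY, ICP_VERSION, length, request_number, 0, 0, 0
    )
    return header + payload


packet = build_icp_query("http://www.example.com/", request_number=1)
print(len(packet), packet[:4].hex())
```

The resulting bytes would normally be sent over UDP to a neighbour cache; sending and parsing the reply are left out here.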

Appendix C builds on the CARP protocol originally discussed in chapter 8.

Appendix D builds on the HTCP protocol originally discussed in chapter 8.

Appendix E builds on Cache Digests originally discussed in chapter 8.

Appendix F is a very nice section on HTTP status codes. Although all this information is in the relevant RFCs, I find it much easier to have in paper form.

There is a list of all the acronyms used in the book in appendix H.

I was hoping that this book was going to cover not only the system administration of web caching but also some programming insight. Unfortunately I was mistaken. However, what this book does give you is pretty much everything you ever wanted to know about the design, deployment and operation of a web cache.

Another good book on caching/proxying is "Web Proxy Servers" by Ari Luotonen, which is well worth a read.