Web weirdness
Smylers
Smylers at stripey.com
Wed Jun 20 14:11:06 BST 2007
David Cantrell writes:
> My understanding is that you didn't need to encode ampersands in URLs
You probably don't _need_ to, in that browsers try to be very
accommodating, but if you send a URL with a raw ampersand through the
W3C HTML validator it complains with:
Entity references start with an ampersand (&) and end with a semicolon
(;). If you want to use a literal ampersand in your document you must
encode it as "&amp;" (even inside URLs!).
and points you here:
http://www.htmlhelp.com/tools/validator/problems.html#amp
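As a rough illustration of what the validator is asking for, here is a minimal Python sketch using the standard library's html.escape. The example.com URL and its parameters are made up for illustration.

```python
from html import escape

# Hypothetical URL with two query parameters joined by a raw ampersand.
url = "http://example.com/show?id=42&image=blah"

# escape() rewrites & as &amp; (and also escapes <, > and quotes),
# which is the form expected inside an HTML attribute value.
href = escape(url, quote=True)
link = '<a href="%s">picture</a>' % href
print(link)
```

The browser undoes the &amp; when it reads the attribute, so the URL it requests still has a plain ampersand between the two parameters.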
> unless they would otherwise look like the beginning of an entity - so
> the string '&quot;' would have to be represented as '%XXquot;' or
> somesuch.
There are two ways of writing that, with different meanings:
* &amp;quot%3B is the way of writing in HTML a URL fragment which will
display in a browser's URL bar like &quot%3B, where (presuming this is
in the query part) the ampersand signifies that this is a new
parameter.
* %26quot%3B would appear exactly like that in a displayed URL bar; as
part of a URL query it is all literal text, with all of the
characters, including the ampersand, being continuations of the value
for the preceding parameter.
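The two layers of encoding can be checked with Python's standard library. This is only a sketch: the leading parameter a=1 is invented so that there is a "preceding parameter" to look at.

```python
from html import escape
from urllib.parse import parse_qsl, quote

# Case 1: the ampersand separates parameters, so only the semicolon
# is percent-encoded at the URL level.
separator_query = "a=1&quot" + quote(";")
# Case 2: the ampersand is data, so it is percent-encoded too.
literal_query = "a=1" + quote("&quot;")

# Encoding for HTML is a separate, later step; it only touches the raw &.
print(escape(separator_query))
print(escape(literal_query))

# Decoding each query shows the difference in meaning.
print(parse_qsl(separator_query, keep_blank_values=True))
print(parse_qsl(literal_query))
```

In the first case the parser sees a second parameter named "quot;" with an empty value; in the second it sees a single parameter whose value is the literal text &quot;.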
> &image isn't a named entity though. Anything that thinks it is is
> broken.
I believe there are circumstances in SGML (which HTML claims to be) in
which the trailing semicolon is optional. I don't think this is one of
them, but ...
> Browsers, for example, treat &image=blah correctly,
Arguably the 'correct' thing to do is to report a syntax error and
refuse to parse the document. Otherwise software has to guess at what
the error and fix are; that different software makes different guesses
isn't surprising.
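To see one such guess in action, here is a small sketch with Python's html.unescape, which is just one lenient decoder among many; the query strings are made up, and other software may well decide differently.

```python
from html import unescape

# No semicolon-less entity name matches here, so the text is left alone.
image_guess = unescape("?a=1&image=blah")
# "copy" is a legacy name this decoder accepts without a semicolon,
# so &copy is replaced by the copyright sign.
copy_guess = unescape("?a=1&copy=2")

print(image_guess)
print(copy_guess)
```

The second parameter survives in the first string but is silently mangled in the second, even though both inputs have the same kind of error.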
Smylers