Issue 195 in html5lib: Self closing <title/> breaks parsing

Discussion:

h***@public.gmane.org

13 years ago

Status: New
Owner: ----

New issue 195 by kovidgo...-***@public.gmane.org: Self closing <title/> breaks parsing
http://code.google.com/p/html5lib/issues/detail?id=195

A self-closing title tag causes the parser to treat all subsequent markup as
text (CDATA). To reproduce, run the following code in a Python interpreter:

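import html5lib
from lxml import etree

# Parse with the lxml tree builder, then serialize the resulting tree.
tree = html5lib.parse('<html><head><title/></head></html>',
                      namespaceHTMLElements=False, treebuilder='lxml')
print(etree.tostring(tree, encoding='unicode'))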

Output is: <html><head><title>&lt;/head&gt;&lt;/html&gt;</title></head><body/></html>

Output should be
<html><head><title/></head><body/></html>


h***@public.gmane.org

13 years ago

Updates:
Status: Invalid

Comment #1 on issue 195 by t.broyer: Self closing <title/> breaks parsing
http://code.google.com/p/html5lib/issues/detail?id=195

There's no such thing as a "self-closing tag" in HTML.

In this case, the '/' is ignored (though it generates a "parse error"), so
everything following it is read as the title's content.

h***@public.gmane.org

13 years ago

Comment #2 on issue 195 by kovidgo...-***@public.gmane.org: Self closing <title/> breaks parsing
http://code.google.com/p/html5lib/issues/detail?id=195

First, thanks for the quick response.

Note that html5lib handles self-closing a, p, div, table, body and even
html tags correctly. Handling <title/> as an instruction to treat the rest
of the document as text is the wrong decision. It is *never* going to be
the right thing to do. But given that browsers make this same mistake, I'm
guessing you are not going to change this. Oh well, a spot of regex to the
rescue.
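
For anyone landing here with the same problem, the regex workaround mentioned above could look roughly like the sketch below. It is not part of html5lib; the function name fix_self_closing_title is made up for illustration, and the pattern assumes attribute values never contain '<' or '>'. It rewrites <title/> to <title></title> before the markup reaches the parser.

```python
import re

# Hypothetical helper, not an html5lib API.
# Matches <title/>, <title />, <TITLE lang="en"/> and similar,
# but not <titlebar/> (word boundary) or a normal <title> start tag.
_SELF_CLOSING_TITLE = re.compile(r'<(title)\b([^<>]*?)\s*/>', re.IGNORECASE)


def fix_self_closing_title(markup):
    """Expand a self-closing title tag into an explicit empty element."""
    # \1 is the tag name as written, \2 is any attribute text.
    return _SELF_CLOSING_TITLE.sub(r'<\1\2></\1>', markup)


if __name__ == '__main__':
    print(fix_self_closing_title('<html><head><title/></head></html>'))
```

The cleaned string can then be passed to html5lib.parse() as in the original report. The same approach would presumably be needed for other elements whose content is parsed as text, such as textarea.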

h***@public.gmane.org

13 years ago

Comment #3 on issue 195 by hob...-***@public.gmane.org: Self closing <title/> breaks parsing
http://code.google.com/p/html5lib/issues/detail?id=195

html5lib is behaving correctly per the spec's HTML parsing algorithm.
When the tree builder sees a start tag whose tag name is "title" while in
the relevant insertion modes, it puts the tokenizer into the RCDATA state.
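
To make the RCDATA behaviour concrete, here is a toy model in plain Python. It is only a sketch, not html5lib's tokenizer: the function name title_content is invented for illustration, and it ignores details such as character references. It shows the rule described above: once the title start tag is seen (with any trailing '/' ignored), everything up to the next </title> end tag, or the end of input, becomes the title's text.

```python
import re


def title_content(html):
    """Return the text a spec-following parser would give the first title.

    Toy model only: returns None when there is no title start tag.
    """
    # The start tag ends at the first '>'; a trailing '/' changes nothing.
    start = re.search(r'<title\b[^>]*>', html, re.IGNORECASE)
    if start is None:
        return None
    rest = html[start.end():]
    # Only a matching end tag leaves RCDATA; other tags are plain text here.
    end = re.search(r'</title[\s/>]', rest, re.IGNORECASE)
    return rest if end is None else rest[:end.start()]


if __name__ == '__main__':
    print(title_content('<html><head><title/></head></html>'))
    print(title_content('<title>Hi</title>'))
```

For the markup in the report there is no </title> anywhere, so the model returns the whole remainder of the document, which is what ends up inside the title element.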

h***@public.gmane.org

13 years ago

Comment #4 on issue 195 by kovidgo...-***@public.gmane.org: Self closing <title/> breaks parsing
http://code.google.com/p/html5lib/issues/detail?id=195

Sure, I can believe that, given that browsers do the same thing. If it were
up to me, I would depart from the spec on something that is so obviously
wrong. But I can see the value in having a parser that hews closely to the
spec. I can always use a regex or patch html5lib myself for my use cases.