h***@public.gmane.org
2012-03-06 22:16:10 UTC
Status: New
Owner: ----
New issue 200 by vova...-***@public.gmane.org: html5lib.treebuilders.dom.dom2sax
crashes on 'xml:lang' attribute
http://code.google.com/p/html5lib/issues/detail?id=200
A simple test case (my program has a more complex handler implementation, but
the problem is reproducible with the default handler):
import xml.sax.handler
import html5lib

def test(html):
    handler = xml.sax.handler.ContentHandler()
    parser = html5lib.HTMLParser(
        tree=html5lib.treebuilders.getTreeBuilder('dom'))
    dom = parser.parse(html)
    html5lib.treebuilders.dom.dom2sax(dom, handler)

html = '<html xml:lang="en">'
test(html)
With html5lib 0.95 it produces the following traceback:
python test.py
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    test(html)
  File "test.py", line 10, in test
    html5lib.treebuilders.dom.dom2sax(dom, handler)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 271, in dom2sax
    for child in node.childNodes: dom2sax(child, handler, nsmap)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 256, in dom2sax
    del attributes[(attr.namespaceURI, attr.nodeName)]
KeyError: (None, u'xml:lang')
With previous versions (at least 0.11) there is no error. I assume this
attribute may be invalid in the xml namespace, but even so I don't think it
is acceptable for the parser to simply crash. I've seen a lot of real-world
HTML documents that have such an attribute.
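The failing statement in the traceback is a plain dictionary deletion, so the crash itself can be illustrated without html5lib. The sketch below is only an illustration under stated assumptions: the mapping contents and the helper names (make_attributes, strict_delete, tolerant_delete) are hypothetical and are not taken from dom.py, and how html5lib actually keys its attribute dictionary is not confirmed here.

```python
# Hypothetical stand-in for the attribute mapping; NOT html5lib's real data.
# Assumption: the attribute is stored under some key other than
# (None, 'xml:lang'), so the key used by the delete is missing.
def make_attributes():
    return {("http://www.w3.org/XML/1998/namespace", "lang"): "en"}


def strict_delete(attributes, key):
    # Same shape as the statement in the traceback:
    # raises KeyError when the key is absent.
    del attributes[key]


def tolerant_delete(attributes, key):
    # pop() with a default never raises.
    # Returns True only if an entry was actually removed.
    sentinel = object()
    return attributes.pop(key, sentinel) is not sentinel


key = (None, "xml:lang")

try:
    strict_delete(make_attributes(), key)
    strict_raised = False
except KeyError:
    strict_raised = True

attrs = make_attributes()
tolerant_removed = tolerant_delete(attrs, key)
```

With a missing key, strict_delete raises KeyError as in the report, while tolerant_delete leaves the mapping untouched and reports that nothing was removed. Whether a tolerant delete is the right fix for dom2sax depends on why the key is missing, which this sketch does not answer.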
Tested with Python 2.6.5 on Linux.
Please advise.
Thanks,
--Vladimir
--
You received this message because you are subscribed to the Google Groups "html5lib-discuss" group.
To post to this group, send an email to html5lib-discuss-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to html5lib-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/html5lib-discuss?hl=en-GB.