Discussion:
Issue 186 in html5lib: Errors on null-bytes
h***@public.gmane.org
2011-06-19 19:33:03 UTC
Status: New
Owner: ----

New issue 186 by valievka...-***@public.gmane.org: Errors on null-bytes
http://code.google.com/p/html5lib/issues/detail?id=186

What steps will reproduce the problem?

***@karim-AO531h:~$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import html5lib
>>> html5lib.parse("<p val='hu\x00'>", treebuilder="lxml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 54, in parse
    return p.parse(doc, encoding=encoding)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 213, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 115, in _parse
    self.mainLoop()
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 172, in mainLoop
    new_token = self.phase.processStartTag(new_token)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 481, in processStartTag
    return self.startTagHandler[token["name"]](token)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 1024, in startTagCloseP
    self.tree.insertElement(token)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/_base.py", line 291, in insertElementNormal
    element.attributes = token["data"]
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/etree_lxml.py", line 221, in _setAttributes
    self._attributes = Attributes(self, attributes)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/etree_lxml.py", line 191, in __init__
    self._element._element.attrib[name] = value
  File "lxml.etree.pyx", line 2145, in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:46818)
  File "apihelpers.pxi", line 563, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:15781)
  File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22211)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
>>> html5lib.parse("<p val=\"hu\x00\">", treebuilder="lxml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 54, in parse
    return p.parse(doc, encoding=encoding)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 213, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 115, in _parse
    self.mainLoop()
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 172, in mainLoop
    new_token = self.phase.processStartTag(new_token)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 481, in processStartTag
    return self.startTagHandler[token["name"]](token)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py", line 1024, in startTagCloseP
    self.tree.insertElement(token)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/_base.py", line 291, in insertElementNormal
    element.attributes = token["data"]
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/etree_lxml.py", line 221, in _setAttributes
    self._attributes = Attributes(self, attributes)
  File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/etree_lxml.py", line 191, in __init__
    self._element._element.attrib[name] = value
  File "lxml.etree.pyx", line 2145, in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:46818)
  File "apihelpers.pxi", line 563, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:15781)
  File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22211)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
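The ValueError is raised by lxml rather than by html5lib itself: lxml refuses attribute values that contain characters XML does not allow, and U+0000 is one of them. As a rough illustration, the sketch below approximates that check in pure Python using the XML 1.0 `Char` production. `is_xml_char` and `find_non_xml_chars` are hypothetical helper names, not part of html5lib or lxml, and lxml's real check lives in its compiled code and may differ in edge cases.

```python
from typing import List, Tuple

# Code point ranges allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml_char(ch: str) -> bool:
    """Return True if the single character ch is allowed by XML 1.0."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _XML_CHAR_RANGES)


def find_non_xml_chars(value: str) -> List[Tuple[int, str]]:
    """Return (index, 'U+XXXX') pairs for characters XML 1.0 forbids."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(value)
        if not is_xml_char(ch)
    ]


if __name__ == "__main__":
    # The attribute value from the report: "hu" followed by a NUL.
    print(find_non_xml_chars("hu\x00"))  # [(2, 'U+0000')]
    print(find_non_xml_chars("hu"))      # []
```

Running a value through such a check before handing it to lxml makes it easy to see which character triggers the error; here it is the NUL at index 2.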



What is the expected output? What do you see instead?


Please provide any additional information below.
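Null bytes are handled incorrectly by the tokenizer: the U+0000 inside the attribute value is passed through unchanged to the tree builder, and lxml then rejects it.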
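For comparison, the WHATWG HTML tokenizer does not pass a NUL through in this position: in the quoted attribute value states, U+0000 is an `unexpected-null-character` parse error and U+FFFD REPLACEMENT CHARACTER is appended to the value instead. The following is a simplified sketch of that one state, not html5lib's tokenizer code. The function name `consume_quoted_attribute_value` is hypothetical, and character references (`&...`) are deliberately left out.

```python
from typing import List, Tuple

REPLACEMENT_CHARACTER = "\ufffd"


def consume_quoted_attribute_value(
    text: str, pos: int, quote: str
) -> Tuple[str, int, List[str]]:
    """Simplified sketch of the HTML 'attribute value (quoted)' states.

    text:  the input being tokenized
    pos:   index just after the opening quote character
    quote: the opening quote, either '"' or "'"

    Returns (value, next_pos, errors); next_pos is the index just past the
    closing quote, or len(text) if input ends first. Character references
    are NOT handled in this sketch.
    """
    value_chars: List[str] = []
    errors: List[str] = []
    while pos < len(text):
        ch = text[pos]
        pos += 1
        if ch == quote:
            # Closing quote: the attribute value is complete.
            return "".join(value_chars), pos, errors
        if ch == "\x00":
            # Parse error; append U+FFFD instead of the NUL.
            errors.append("unexpected-null-character")
            value_chars.append(REPLACEMENT_CHARACTER)
        else:
            value_chars.append(ch)
    # Input ended before the closing quote. The spec drops the unfinished
    # tag here; this sketch only records the error name.
    errors.append("eof-in-tag")
    return "".join(value_chars), pos, errors


markup = "<p val='hu\x00'>"
start = markup.index("'") + 1
print(consume_quoted_attribute_value(markup, start, "'"))
# ('hu\ufffd', 12, ['unexpected-null-character'])
```

Since U+FFFD is an ordinary XML character, a value produced this way would be accepted by lxml. A cruder caller-side stopgap is `markup.replace("\x00", "\ufffd")` before parsing, though that also rewrites NULs in text content, which the spec's "in body" insertion mode ignores rather than replaces.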


Attachments:
null-byte-errors.diff 1.6 KB
--
You received this message because you are subscribed to the Google Groups "html5lib-discuss" group.
To post to this group, send an email to html5lib-discuss-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to html5lib-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/html5lib-discuss?hl=en-GB.
h***@public.gmane.org
2013-05-01 22:38:19 UTC
Updates:
Status: GitHub

Comment #2 on issue 186 by geoffers: Errors on null-bytes
http://code.google.com/p/html5lib/issues/detail?id=186

https://github.com/html5lib/html5lib-python/issues/33
--
You received this message because this project is configured to send all
issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings