h***@public.gmane.org
2011-06-19 19:33:03 UTC
Status: New
Owner: ----
New issue 186 by valievka...-***@public.gmane.org: Errors on null-bytes
http://code.google.com/p/html5lib/issues/detail?id=186
What steps will reproduce the problem?
***@karim-AO531h:~$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import html5lib
>>> html5lib.parse("<p val='hu\x00'>", treebuilder="lxml")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 54, in parse
return p.parse(doc, encoding=encoding)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 213, in parse
parseMeta=parseMeta, useChardet=useChardet)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 115, in _parse
self.mainLoop()
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 172, in mainLoop
new_token = self.phase.processStartTag(new_token)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 481, in processStartTag
return self.startTagHandler[token["name"]](token)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 1024, in startTagCloseP
self.tree.insertElement(token)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/_base.py",
line 291, in insertElementNormal
element.attributes = token["data"]
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/etree_lxml.py",
line 221, in _setAttributes
self._attributes = Attributes(self, attributes)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/etree_lxml.py",
line 191, in __init__
self._element._element.attrib[name] = value
File "lxml.etree.pyx", line 2145, in lxml.etree._Attrib.__setitem__
(src/lxml/lxml.etree.c:46818)
File "apihelpers.pxi", line 563, in lxml.etree._setAttributeValue
(src/lxml/lxml.etree.c:15781)
File "apihelpers.pxi", line 1366, in lxml.etree._utf8
(src/lxml/lxml.etree.c:22211)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL
bytes or control characters
>>> html5lib.parse("<p val=\"hu\x00\">", treebuilder="lxml")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 54, in parse
return p.parse(doc, encoding=encoding)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 213, in parse
parseMeta=parseMeta, useChardet=useChardet)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 115, in _parse
self.mainLoop()
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 172, in mainLoop
new_token = self.phase.processStartTag(new_token)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 481, in processStartTag
return self.startTagHandler[token["name"]](token)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/html5parser.py",
line 1024, in startTagCloseP
self.tree.insertElement(token)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/_base.py",
line 291, in insertElementNormal
element.attributes = token["data"]
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/etree_lxml.py",
line 221, in _setAttributes
self._attributes = Attributes(self, attributes)
File "/usr/local/lib/python2.6/dist-packages/html5lib-0.95_dev-py2.6.egg/html5lib/treebuilders/etree_lxml.py",
line 191, in __init__
self._element._element.attrib[name] = value
File "lxml.etree.pyx", line 2145, in lxml.etree._Attrib.__setitem__
(src/lxml/lxml.etree.c:46818)
File "apihelpers.pxi", line 563, in lxml.etree._setAttributeValue
(src/lxml/lxml.etree.c:15781)
File "apihelpers.pxi", line 1366, in lxml.etree._utf8
(src/lxml/lxml.etree.c:22211)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL
bytes or control characters
What is the expected output? What do you see instead?
Please provide any additional information below.
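Null bytes in attribute values are handled incorrectly in the tokenizer: they reach the lxml treebuilder unchanged, and lxml then rejects them with the ValueError above.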
Attachments:
null-byte-errors.diff 1.6 KB
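
Until the tokenizer is fixed, one possible stopgap is to replace null characters in the input before parsing. The HTML5 parsing rules treat U+0000 inside an attribute value as a parse error and substitute U+FFFD, so the sketch below does the same to the whole input. Note that `replace_nulls` is a hypothetical helper, not part of html5lib, and the attached patch may take a different approach. Replacing nulls everywhere also differs from the specification for nulls outside attribute values.

```python
# -*- coding: utf-8 -*-
# Hypothetical workaround sketch; not part of html5lib or the attached patch.

REPLACEMENT_CHARACTER = u"\ufffd"


def replace_nulls(text):
    """Return text with every U+0000 replaced by U+FFFD.

    Assumes text is already a decoded (unicode) string.
    """
    return text.replace(u"\x00", REPLACEMENT_CHARACTER)


cleaned = replace_nulls(u"<p val='hu\x00'>")

# The cleaned string could then be passed to the parser, for example:
# html5lib.parse(cleaned, treebuilder="lxml")
```

This only sidesteps the error on the caller's side. The tokenizer would still need to handle the null character itself for input that is not preprocessed.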
--
You received this message because you are subscribed to the Google Groups "html5lib-discuss" group.
To post to this group, send an email to html5lib-discuss-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to html5lib-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/html5lib-discuss?hl=en-GB.