alexander
2013-01-19 04:24:17 UTC
*For example* (missing quotes):
1. <a href="http://www.somewebsite.com> some link </a>
2. <img src=">
It only happens when the input has nothing valid. If I replace example 2
with either of the following, it works:
1. <div> <img src="> </div>
2. <div> </div> <img src=">
(it can be div, or anything else, as long as the malformed tag is not the
only element)
*Code fragment*:
import html5lib
from html5lib import sanitizer, serializer, treebuilders, treewalkers

bad_html = '<img src=">'  # example 2 above
parser = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder('dom'), tokenizer=sanitizer.HTMLSanitizer)
sometree = parser.parseFragment(bad_html)
walker = treewalkers.getTreeWalker('dom')
stream = walker(sometree)
s = serializer.htmlserializer.HTMLSerializer(quote_attr_values=True)
nice_html = s.render(stream)  # <---- *it fails here*
*The question*:
I would like to know whether this is the expected behavior or whether I am doing something wrong.
*Additional info*:
I'm using the lib for sanitizing user input.
*Output*:
File "somemodule.py", line 20, in somefunction
nice_html = s.render(stream)
File "some_env/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/serializer/htmlserializer.py", line 302, in render
return u"".join(list(self.serialize(treewalker)))
File "some_env/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/serializer/htmlserializer.py", line 192, in serialize
for token in treewalker:
File "some_env/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/filters/optionaltags.py", line 15, in __iter__
type = token["type"]
TypeError: 'NoneType' object has no attribute '__getitem__'
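My guess at what the traceback shows, written as a standalone sketch rather than the actual html5lib code: if the filter pairs each token with its neighbours and still emits one final triple when the stream is empty, the "current" token is None and indexing it raises the TypeError. The names with_neighbours, token_types and safe_token_types are hypothetical, and I have not confirmed that optionaltags.py works this way.

```python
def with_neighbours(source):
    """Yield (previous, current, next) triples over source.

    Hypothetical stand-in for a look-ahead filter; not the html5lib code.
    """
    previous1 = previous2 = None
    for token in source:
        if previous1 is not None:
            yield previous2, previous1, token
        previous2 = previous1
        previous1 = token
    # This final triple is emitted even when source was empty,
    # in which case the current token is None.
    yield previous2, previous1, None


def token_types(source):
    """Index token["type"] for every triple, like the failing line in the traceback."""
    return [token["type"] for _, token, _ in with_neighbours(source)]


def safe_token_types(source):
    """Same as token_types, but skip the None token an empty stream produces."""
    return [token["type"] for _, token, _ in with_neighbours(source)
            if token is not None]
```

With a non-empty list of token dicts, token_types returns their types. With an empty list it raises TypeError, which would match the case where the sanitizer leaves nothing to serialize. safe_token_types returns an empty list instead.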
--
You received this message because you are subscribed to the Google Groups "html5lib-discuss" group.
To view this discussion on the web, visit https://groups.google.com/d/msg/html5lib-discuss/-/sSiTs1l1xNcJ.
To post to this group, send an email to html5lib-discuss-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to html5lib-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/html5lib-discuss?hl=en-GB.