Discussion:
Problem with attribute having unclosed quotes (python html5lib)
alexander
2013-01-19 04:24:17 UTC
*For example* (missing closing quotes):
1. <a href="http://www.somewebsite.com> some link </a>
2. <img src=">

It only happens when the input contains nothing valid. If I replace example 2
with either of the following, the error does not occur:
1. <div> <img src="> </div>
2. <div> </div> <img src=">

(It can be a div or anything else, as long as the malformed tag is not the
only element.)


*Code fragment*:

import html5lib
from html5lib import sanitizer, serializer, treebuilders, treewalkers

# bad_html holds one of the malformed inputs listed above
parser = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder('dom'), tokenizer=sanitizer.HTMLSanitizer)
sometree = parser.parseFragment(bad_html)
walker = treewalkers.getTreeWalker('dom')
stream = walker(sometree)

s = serializer.htmlserializer.HTMLSerializer(quote_attr_values=True)
nice_html = s.render(stream)  # <-- it fails here



*The question*:

I would like to know whether this is the expected behavior or whether I am doing something wrong.



*Additional info*:

I'm using the lib for sanitizing user input.
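Because the library is handling untrusted input here, one option while the crash is unresolved is to fail closed: treat any exception from the sanitizing step as "no output". The following is only a minimal sketch of that idea. The name sanitize_or_empty is hypothetical, and the sanitize argument stands in for whatever function wraps the parse-and-serialize code; neither is part of html5lib.

```python
import logging

logger = logging.getLogger(__name__)


def sanitize_or_empty(sanitize, raw_html):
    """Run `sanitize` on `raw_html`; return "" if it raises.

    `sanitize` is a hypothetical stand-in for the caller's own
    parse-and-serialize function, not an html5lib API.
    """
    try:
        return sanitize(raw_html)
    except Exception:
        # Fail closed: never pass unsanitized input through on error.
        logger.exception("sanitizer failed; dropping input")
        return ""
```

Catching every exception is deliberately blunt. It hides the underlying bug from the user but keeps it in the log, which seems a reasonable trade-off for user-supplied markup.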


*Output*:

File "somemodule.py", line 20, in somefunction
  nice_html = s.render(stream)
File "some_env/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/serializer/htmlserializer.py", line 302, in render
  return u"".join(list(self.serialize(treewalker)))
File "some_env/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/serializer/htmlserializer.py", line 192, in serialize
  for token in treewalker:
File "some_env/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/filters/optionaltags.py", line 15, in __iter__
  type = token["type"]
TypeError: 'NoneType' object has no attribute '__getitem__'
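One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the filter in optionaltags.py looks at each token together with its neighbours, and if that look-ahead emits a final (previous, current, next) triple even when the token stream is empty, the "current" token is None and subscripting it raises TypeError. That would fit the observation that the error only appears when the input has nothing valid. The sketch below reproduces that failure mode with plain dictionaries; slider, guarded_slider and token_types are illustrative names, not html5lib's code.

```python
def slider(source):
    """Yield (previous, current, next) triples; unguarded variant."""
    previous1 = previous2 = None
    for token in source:
        if previous1 is not None:
            yield previous2, previous1, token
        previous2 = previous1
        previous1 = token
    # Emits a final triple unconditionally, so an empty `source`
    # produces (None, None, None).
    yield previous2, previous1, None


def guarded_slider(source):
    """Same window, but emits nothing for an empty `source`."""
    previous1 = previous2 = None
    for token in source:
        if previous1 is not None:
            yield previous2, previous1, token
        previous2 = previous1
        previous1 = token
    if previous1 is not None:
        yield previous2, previous1, None


def token_types(window):
    """Read the "type" key of each current token, as a filter would."""
    return [token["type"] for _previous, token, _next in window]
```

With a non-empty stream both variants behave the same. With an empty stream, token_types(slider([])) raises TypeError, while token_types(guarded_slider([])) returns an empty list.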
--
You received this message because you are subscribed to the Google Groups "html5lib-discuss" group.
To view this discussion on the web, visit https://groups.google.com/d/msg/html5lib-discuss/-/sSiTs1l1xNcJ.
To post to this group, send an email to html5lib-discuss-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to html5lib-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/html5lib-discuss?hl=en-GB.
Geoffrey Sneddon
2013-01-22 13:31:28 UTC
Post by alexander
1. <a href="http://www.somewebsite.com> some link </a>
2. <img src=">
It only happens when the input has nothing valid. If i replace example 2
1. <div> <img src="> </div>
2. <div> </div> <img src=">
(it can be div, or anything else, as long as the malformed tag is not the
only element)

Post by alexander
I would like to know if this is the expected behavior or i am doing something wrong.


Post by alexander
TypeError: 'NoneType' object has no attribute '__getitem__'

That it throws an exception is a bug. The expected output would be
something along the lines of:

1. <a href="http://www.somewebsite.com> some link</a>"></a>
2. <img src=">">

Thanks,

Geoffrey.