h***@public.gmane.org
2012-09-04 15:06:03 UTC
Status: New
Owner: ----
New issue 212 by alexaho...-***@public.gmane.org: Incorrect parsing of attribute names
that contain the ":" (colon) character
http://code.google.com/p/html5lib/issues/detail?id=212
What steps will reproduce the problem?
{{{
import html5lib
import lxml.etree

# The root element declares a custom "foo" prefix and uses it on an attribute.
tree = html5lib.parse(
    u'<html xmlns:foo="http://www.example.com/ns/foo" foo:bar="baz"></html>',
    treebuilder='lxml', namespaceHTMLElements=True)
print(lxml.etree.tostring(tree))
}}}
What is the expected output?
{{{
<html:html xmlns:html="http://www.w3.org/1999/xhtml"
xmlns:foo="http://www.example.com/ns/foo"
foo:bar="baz"><html:head/><html:body/></html:html>
}}}
What do you see instead?
{{{
<html:html xmlns:html="http://www.w3.org/1999/xhtml"
xmlnsU0003Afoo="http://www.example.com/ns/foo"
fooU0003Abar="baz"><html:head/><html:body/></html:html>
}}}
Note the `U0003A`, which is the code point of ":" (colon, U+003A), and the
fact that the `xmlns:html` attribute is serialized correctly, which makes me
think it's hardcoded.
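The token seen in the output can be reproduced by formatting the colon's code
point as `U` followed by five uppercase hex digits. This is only a sketch of
the apparent pattern; `escape_char` is a hypothetical helper written for
illustration, not a function taken from html5lib.

```python
def escape_char(ch):
    # Hypothetical helper: format a character's code point as "U" + 5 hex digits.
    return "U%05X" % ord(ch)

# Replace the colon in an attribute name the way the observed output suggests.
escaped = "xmlns:foo".replace(":", escape_char(":"))
print(escaped)  # xmlnsU0003Afoo
```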
Please provide any additional information below.
Given:
{{{
html = tree.getroot()
}}}
`html.nsmap` does not contain the `foo` namespace. Also, the attribute
`xmlns:foo` does not exist in `html.attrib`, but instead
`html.attrib['xmlnsU0003Afoo']` works.
I assume it has nothing to do with the treebuilder (`lxml`), since parsing
the same string with lxml's own HTML parser (`lxml.html`) works as
expected.
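Until this is resolved, the original attribute names could be recovered from
the escaped keys. A minimal sketch, assuming every escape has the form `U`
plus exactly five uppercase hex digits as in the output above;
`unescape_name` and `unescaped_attributes` are hypothetical helpers, not part
of html5lib or lxml.

```python
import re

# Assumption: every escape is "U" followed by exactly five uppercase hex digits.
ESCAPE_RE = re.compile(r"U([0-9A-F]{5})")

def unescape_name(name):
    # Hypothetical helper: turn each escape back into the character it encodes.
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), name)

def unescaped_attributes(attrib):
    # Build a plain dict of restored names; accepts any dict-like mapping,
    # such as the `html.attrib` object mentioned above.
    return {unescape_name(key): value for key, value in attrib.items()}

attrs = unescaped_attributes({
    "xmlnsU0003Afoo": "http://www.example.com/ns/foo",
    "fooU0003Abar": "baz",
})
print(attrs["foo:bar"])  # baz
```

A name that genuinely contains such a sequence would also be rewritten, so
this is only safe when the input is known not to contain one.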
--
You received this message because you are subscribed to the Google Groups "html5lib-discuss" group.
To post to this group, send an email to html5lib-discuss-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to html5lib-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/html5lib-discuss?hl=en-GB.