Discussion:
Issue 212 in html5lib: Incorrect parsing of attribute names that contain the ":" (colon) character
h***@public.gmane.org
2012-09-04 15:06:03 UTC
Status: New
Owner: ----

New issue 212 by alexaho...-***@public.gmane.org: Incorrect parsing of attribute names
that contain the ":" (colon) character
http://code.google.com/p/html5lib/issues/detail?id=212

What steps will reproduce the problem?
{{{
import html5lib
import lxml.etree

# Parse into an lxml tree, keeping HTML elements in the XHTML namespace.
tree = html5lib.parse(
    u'<html xmlns:foo="http://www.example.com/ns/foo" foo:bar="baz"></html>',
    treebuilder='lxml', namespaceHTMLElements=True)
print(lxml.etree.tostring(tree))
}}}


What is the expected output?
{{{
<html:html xmlns:html="http://www.w3.org/1999/xhtml"
xmlns:foo="http://www.example.com/ns/foo"
foo:bar="baz"><html:head/><html:body/></html:html>
}}}


What do you see instead?
{{{
<html:html xmlns:html="http://www.w3.org/1999/xhtml"
xmlnsU0003Afoo="http://www.example.com/ns/foo"
fooU0003Abar="baz"><html:head/><html:body/></html:html>
}}}
Note the `U0003A`, which is the escaped form of U+003A, the code point for
":" (colon), and the fact that the `xmlns:html` attribute comes through
intact, which makes me think it's hardcoded.


Please provide any additional information below.
Given:
{{{
html = tree.getroot()
}}}
`html.nsmap` does not contain the `foo` namespace. Also, the attribute
`xmlns:foo` does not exist in `html.attrib`, but instead
`html.attrib['xmlnsU0003Afoo']` works.

I assume it has nothing to do with the treebuilder (`lxml`) since parsing
the same string with the included HTML parser (`lxml.html`) works as
expected.
--
You received this message because you are subscribed to the Google Groups "html5lib-discuss" group.
To post to this group, send an email to html5lib-discuss-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to html5lib-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/html5lib-discuss?hl=en-GB.
h***@public.gmane.org
2012-09-04 15:39:59 UTC
Updates:
Status: Invalid

Comment #1 on issue 212 by t.broyer: Incorrect parsing of attribute names
that contain the ":" (colon) character
http://code.google.com/p/html5lib/issues/detail?id=212

This is the expected behavior.
See
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#coercing-an-html-dom-into-an-infoset

See also issue 131
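
As a rough illustration of the coercion described there, the sketch below escapes the colon the way the output in the report shows it. This is an assumption-laden simplification, not html5lib's actual code: `coerce_attribute_name` is a hypothetical name, only ":" is handled, and the five-hex-digit form is taken from the report's output rather than from the spec text.

```python
def coerce_attribute_name(name):
    """Hypothetical helper: escape ':' as 'U' + five uppercase hex digits.

    Mirrors the form seen in the report (e.g. 'fooU0003Abar'); a real
    coercion would cover every character the XML API rejects.
    """
    return "".join("U%05X" % ord(ch) if ch == ":" else ch for ch in name)
```

Under these assumptions, `coerce_attribute_name("foo:bar")` gives the same `fooU0003Abar` spelling that shows up in `html.attrib`.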
h***@public.gmane.org
2012-09-04 18:21:45 UTC
Comment #2 on issue 212 by alexaho...-***@public.gmane.org: Incorrect parsing of
attribute names that contain the ":" (colon) character
http://code.google.com/p/html5lib/issues/detail?id=212

Thank you for your answer!

There are a lot of "if"s there ("If the XML API being used restricts"),
which suggests that parsers are not required to restrict the allowable
characters, so ":"s (colons) might be allowed in the future. In that case,
what is the recommended thing to do? Should I decode the Unicode escapes
found in attribute names, or look for both `xmlns:foo` and
`xmlnsU0003Afoo`?
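
To make the two options concrete, here is a minimal Python 3 sketch that handles both spellings. The helper names `decode_coerced_name` and `get_attr` are hypothetical, and the pattern assumes the five-hex-digit `UXXXXX` form seen in the output above. A name that legitimately contains such a sequence would be decoded as well, so this is lossy in that corner case.

```python
import re

# Escape form observed in this thread: "U" followed by five uppercase hex digits.
_ESCAPE = re.compile(r"U([0-9A-F]{5})")


def decode_coerced_name(name):
    """Hypothetical helper: turn 'xmlnsU0003Afoo' back into 'xmlns:foo'."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def get_attr(attrib, name, default=None):
    """Look up `name` in a dict-like attrib, accepting either spelling."""
    for key, value in attrib.items():
        if key == name or decode_coerced_name(key) == name:
            return value
    return default
```

With the tree from the report, `get_attr(html.attrib, 'foo:bar')` should find the value whether the key is stored as `foo:bar` or as `fooU0003Abar`.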