Discussion:
Issue 217 in html5lib: invalid characters in tag and attribute names
h***@public.gmane.org
2012-11-20 22:54:25 UTC
Status: New
Owner: ----

New issue 217 by szepe.vi...-***@public.gmane.org: invalid characters in tag and
attribute names
http://code.google.com/p/html5lib/issues/detail?id=217

original "HTML" code:

pedig meg is szólaltattuk. <FOTÓGALÉRIA, VIDEÓ</b>

after html5lib.HTMLParser:

pedig meg is szólaltattuk. <fotÓgalÉria, b="" videÓ<=""></fotÓgalÉria,>

HTMLParser then can't handle tag and attribute names that contain accented
characters, commas, etc.:

Exception in html2text: <class 'HTMLParser.HTMLParseError'>; malformed
start tag, at line 89, column 154


"What is the expected output? What do you see instead?"

<fotgalria b="" vide=""></fotgalria>

Thank you!
--
You received this message because you are subscribed to the Google Groups "html5lib-discuss" group.
To post to this group, send an email to html5lib-discuss-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to html5lib-discuss+***@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/html5lib-discuss?hl=en-GB.
h***@public.gmane.org
2012-11-21 09:03:17 UTC
Updates:
Status: Invalid

Comment #1 on issue 217 by ja...-***@public.gmane.org: invalid characters in
tag and attribute names
http://code.google.com/p/html5lib/issues/detail?id=217

This behaviour is correct per the spec: tag names, attribute names and so on
are converted to ASCII lowercase.
h***@public.gmane.org
2012-11-21 10:44:33 UTC
Comment #2 on issue 217 by szepe.vi...-***@public.gmane.org: invalid characters in tag
and attribute names
http://code.google.com/p/html5lib/issues/detail?id=217

As far as I can see, html5lib returns "<fotÓgalÉria>" as a tag name,
so "Ó" and "É" are NOT dropped.

Please implement this.

Thank you!!
h***@public.gmane.org
2012-11-21 10:49:14 UTC
Comment #3 on issue 217 by ja...-***@public.gmane.org: invalid characters in
tag and attribute names
http://code.google.com/p/html5lib/issues/detail?id=217

Ó and É are not ASCII characters, so they are not lowercased.
Compare with what browsers do:
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1922
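
To illustrate the point, here is a minimal Python sketch of ASCII-only lowercasing. The helper name ascii_lower is hypothetical and is not part of html5lib's API; it only shows why A-Z change while Ó and É are left alone.

```python
import string

# Translation table that maps only A-Z to a-z; every other
# code point (including Ó and É) is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(name):
    """Lowercase ASCII letters only (hypothetical helper, not html5lib API)."""
    return name.translate(_ASCII_LOWER)


tag = "FOTÓGALÉRIA,"
print(ascii_lower(tag))  # fotÓgalÉria,
print(tag.lower())       # fotógaléria,  (full Unicode lowercasing, for contrast)
```

The first result matches the tag name in the report: the comma and the accented capitals survive because they fall outside A-Z.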
h***@public.gmane.org
2012-11-21 11:38:20 UTC
Comment #4 on issue 217 by szepe.vi...-***@public.gmane.org: invalid characters in tag
and attribute names
http://code.google.com/p/html5lib/issues/detail?id=217

Thank you for the demonstration!
In the end I used lxml to strip the invalid characters.
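
For anyone who wants the output requested in the original report, a rough sketch of the name clean-up step follows. It uses only the standard library rather than lxml, since the thread does not show how lxml was used. The helper name strip_invalid_name_chars and the set of allowed characters are assumptions made for illustration.

```python
import re

# Assumption: keep only ASCII letters, digits, '-', '_', '.' and ':' in a name.
# Everything else (accented letters, commas, '<', ...) is dropped.
_INVALID_NAME_CHARS = re.compile(r"[^A-Za-z0-9_.:-]")


def strip_invalid_name_chars(name):
    """Drop characters outside the assumed safe set (hypothetical helper)."""
    return _INVALID_NAME_CHARS.sub("", name)


print(strip_invalid_name_chars("fotÓgalÉria,"))  # fotgalria
print(strip_invalid_name_chars("videÓ<"))        # vide
```

A name made up entirely of disallowed characters would come back empty, so a real clean-up pass would need to decide what to do with such tags or attributes.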