An issue was discovered in lxml before 4.2.5. lxml/html/clean.py in the lxml.html.clean module does not remove javascript: URLs that use escaping, allowing a remote attacker to conduct XSS attacks, as demonstrated by "j a v a s c r i p t:" in Internet Explorer. This is a similar issue to CVE-2014-3146. References: https://github.com/lxml/lxml/commit/6be1d081b49c97cfd7b3fbd934a193b668629109
Created python-lxml tracking bugs for this issue: Affects: fedora-all [bug 1660236]
Easy to reproduce. For example, '<a href="javascrip%20t%20:evil_function()">poc</a>' should be cleaned to '<a href="">poc</a>', but it is not. Apparently Internet Explorer will execute "j a v a s c r i p t:" (with embedded spaces). I have no experience with that behavior, but I'll trust upstream.
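
For anyone triaging this, here is a minimal stdlib-only sketch of the kind of check that catches the escaped form: percent-decode the link, drop whitespace and control characters, then test the scheme. This is an illustration under my own assumptions, not the upstream patch; the helper name looks_like_javascript_url and the exact character class are hypothetical, so see the linked commit for what lxml actually does.

```python
import re
from urllib.parse import unquote_plus

# Whitespace and ASCII control characters that a lenient browser
# might ignore inside a URL scheme (assumed set, for illustration only).
_IGNORABLE = re.compile(r"[\s\x00-\x1f]+")


def looks_like_javascript_url(link: str) -> bool:
    """Return True if the link is a javascript: URL once escaping is undone.

    Hypothetical helper; not part of lxml.
    """
    decoded = unquote_plus(link)             # "%20" -> " ", "+" -> " "
    collapsed = _IGNORABLE.sub("", decoded)  # remove spaces and control chars
    return collapsed.lower().startswith("javascript:")


if __name__ == "__main__":
    # The href from the reproducer above, and an ordinary link for contrast.
    print(looks_like_javascript_url("javascrip%20t%20:evil_function()"))
    print(looks_like_javascript_url("https://example.com/a%20b"))
```

A naive startswith("javascript:") test on the raw attribute value misses the reproducer, because the scheme only appears after decoding and whitespace removal. Note that the normalized string is used only for the check; the original link is not rewritten.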