Description of problem:
Bad browsers can inject non-UTF-8 data into text fields.

How reproducible:
Always

Steps to Reproduce:
1. Ask warren to do what we did for bug 122992 :)

Actual results:
Some non-UTF-8 data in the description.

Expected results:
Bugzilla should not accept bad character data, and the database should be cleaned of any non-UTF-8 text.

Additional info:
Additionally, after the data has been checked for UTF-8 validity, the checked UTF-8 string should be run through the NFKC($string) function in the Unicode::Normalize module, although this may be computationally expensive. The goal is to guarantee that most browsers have a chance of displaying the Unicode, should some valid-but-wacko Unicode make it through.
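
A minimal sketch of the validate-then-normalize step requested above. It assumes the field arrives as raw bytes and that the core Encode module is acceptable for the strict UTF-8 check. The name clean_field is hypothetical and is not taken from Bugzilla or from the attachments on this bug.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);
use Unicode::Normalize qw(NFKC);

# Hypothetical helper (not part of Bugzilla): takes the raw bytes of a
# text field and returns the NFKC-normalized character string, or
# undef (in scalar context) if the bytes are not well-formed UTF-8.
sub clean_field {
    my ($bytes) = @_;

    # 'UTF-8' (with the hyphen) selects Encode's strict decoder;
    # FB_CROAK makes it die on malformed input instead of substituting,
    # and LEAVE_SRC keeps decode() from modifying $bytes.
    my $chars = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return unless defined $chars;

    # Fold compatibility forms (ligatures, full-width letters, ...)
    # into their canonical composed equivalents.
    return NFKC($chars);
}
```

For example, clean_field("\xEF\xAC\x81") (the UTF-8 bytes of the U+FB01 ligature) would come back as the two-letter string "fi", while clean_field("caf\xE9") would be rejected because a lone 0xE9 byte is not valid UTF-8.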
Created attachment 100201
Perl subroutines to check for good UTF-8/chars

Simple Perl subroutine that checks for good UTF-8 and does a very simple sanity check on the Unicode (no noncharacter code points, as of Unicode 4.0).
Created attachment 100203
This one has an improved not_a_char subroutine

Slightly shorter and faster not_a_char function. It also checks for chars > U+10FFFF, which are obsoleted as of Unicode 3.0.
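
For readers without access to the attachment, the kind of check described above might look roughly like the following. This is an illustrative stand-in written from the description only, not the attached code. It assumes the argument is a numeric code point and uses the noncharacter set as defined in Unicode 4.0 (U+FDD0..U+FDEF plus the last two code points of every plane).

```perl
use strict;
use warnings;

# Illustrative stand-in for the not_a_char subroutine described above
# (the attachment itself is not reproduced here). Takes a numeric code
# point and returns 1 if it should be rejected, 0 otherwise.
sub not_a_char {
    my ($cp) = @_;

    return 1 if $cp > 0x10FFFF;                   # beyond the Unicode code space
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;   # contiguous noncharacter block
    return 1 if ($cp & 0xFFFE) == 0xFFFE;         # U+nFFFE / U+nFFFF on every plane
    return 0;
}
```

A decoded string could then be screened with something like `grep { not_a_char(ord) } split //, $string`, which yields the offending characters, if any.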
Red Hat's current Bugzilla version is 2.18. I am moving all older open bugs to this version. Any bugs filed against the older versions will need to be re-verified to confirm that they are still bugs. This will also help me sort them better.
Red Hat Bugzilla is now using version 3.2 of the Bugzilla codebase and therefore this bug will need to be re-verified against the new release. With the updated code this bug may no longer be relevant or may have been fixed in the new code. Updating bug version to 3.2.