123142 – Bogus UTF-8 char data can be input

Summary: Bogus UTF-8 char data can be input

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Bugzilla
Classification:	Community
Component:	Bugzilla General
Sub Component:
Version:	3.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	PnT DevOps Devs
QA Contact:	David Lawrence
Docs Contact:
URL:	https://bugzilla.redhat.com/bugzilla/...
Whiteboard:
Depends On:
Blocks:

Reported:	2004-05-12 22:37 UTC by Toshiyuki Takamiya
Modified:	2015-06-02 01:18 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-11-04 21:53:26 UTC
Embargoed:

Attachments
Perl subroutines to check for good UTF-8/chars (3.05 KB, text/plain) 2004-05-12 22:39 UTC, Eido Inoue
this one has an improved not_a_char subroutine (2.16 KB, text/plain) 2004-05-12 22:45 UTC, Eido Inoue

Description Eido Inoue 2004-05-12 22:37:45 UTC

Description of problem:
Misbehaving browsers can submit non-UTF-8 data in text fields

How reproducible:
always

Steps to Reproduce:
1. ask warren to do what we did for bug 122992 :)

Actual results:
Some non-UTF-8 in the description

Expected results:
Bugzilla should reject invalid character data, and any non-UTF-8 text
already in the database should be cleaned up

Additional info:

Additionally, once the data has been checked for valid UTF-8, the
checked string should be run through the NFKC($string) function in the
Unicode::Normalize module, so that most browsers have a chance of
displaying the text should some valid-but-wacko Unicode make it
through. This may be computationally expensive, though.
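
A minimal sketch of the check-then-normalize order suggested above. It is written in Python for illustration, not in the Perl that Bugzilla uses, and it is not taken from the attachments on this bug. `clean_field` is a hypothetical name; `unicodedata.normalize("NFKC", ...)` stands in for NFKC($string) from Unicode::Normalize.

```python
import unicodedata


def clean_field(raw: bytes) -> str:
    """Reject input that is not well-formed UTF-8, then NFKC-normalize it.

    Hypothetical helper for illustration only.
    """
    # Strict decoding raises UnicodeDecodeError on malformed sequences,
    # overlong encodings, encoded surrogates, and values above U+10FFFF.
    text = raw.decode("utf-8", errors="strict")
    # Normalize only after validation, so the normalizer never sees bad data.
    return unicodedata.normalize("NFKC", text)
```

A caller would presumably catch `UnicodeDecodeError` and refuse the form submission rather than store the bytes.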

Comment 1 Eido Inoue 2004-05-12 22:39:12 UTC

Created attachment 100201
Perl subroutines to check for good UTF-8/chars

Simple Perl subroutine that checks for well-formed UTF-8 and does a very basic
sanity check on the Unicode (no noncharacter code points, as of Unicode 4.0)

Comment 2 Eido Inoue 2004-05-12 22:45:24 UTC

Created attachment 100203
this one has an improved not_a_char subroutine

Slightly shorter and faster not_a_char function. It also checks for chars >
U+10FFFF, which are obsoleted as of Unicode 3.0
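
The attachment itself is not reproduced here, so the following is only a sketch of what a check like this could look like, written in Python rather than Perl. The name `not_a_char` is borrowed from the comment above, while `has_bad_chars` is a hypothetical wrapper. It assumes the 66 noncharacters defined by Unicode (U+FDD0..U+FDEF plus the last two code points of each plane) and treats anything above U+10FFFF as invalid.

```python
def not_a_char(cp: int) -> bool:
    """Return True if cp is a Unicode noncharacter or lies above U+10FFFF."""
    # Code points beyond the last plane are not valid Unicode.
    if cp > 0x10FFFF:
        return True
    # U+FDD0..U+FDEF is a contiguous block of 32 noncharacters.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane (U+xxFFFE and U+xxFFFF)
    # are noncharacters; masking off the low bit catches both.
    return (cp & 0xFFFE) == 0xFFFE


def has_bad_chars(text: str) -> bool:
    """Return True if any code point in text fails the not_a_char check."""
    return any(not_a_char(ord(ch)) for ch in text)
```

The bit mask replaces a per-plane loop, which is one way such a function might end up "shorter and faster"; whether the attachment uses the same trick is not known from this report.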

Comment 3 David Lawrence 2006-04-08 18:03:35 UTC

Red Hat's current Bugzilla version is 2.18. I am moving all older open bugs to
this version. Any bugs filed against older versions will need to be re-verified
to confirm they are still bugs. This will also help me sort them better.

Comment 5 David Lawrence 2008-09-16 16:50:56 UTC

Red Hat Bugzilla is now using version 3.2 of the Bugzilla codebase and therefore this bug will need to be re-verified against the new release. With the updated code this bug may no longer be relevant or may have been fixed in the new code.
Updating bug version to 3.2.
