Bug 74088

Summary: DB_File broken in perl-DB_File-1.804-51 -- can't store key
Product: [Retired] Red Hat Raw Hide
Reporter: Jonathan Kamens <jik>
Component: perl
Assignee: Warren Togami <wtogami>
Status: CLOSED CURRENTRELEASE
QA Contact: David Lawrence <dkl>
Severity: medium
Priority: medium
Version: 1.0
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-05-28 06:59:58 UTC
Attachments:
Description           Flags
test database file    none
C test case           none
Perl test case        none

Description Jonathan Kamens 2002-09-15 18:18:25 UTC
Save the attached files in a directory and run these commands:

bunzip2 testdb.db.bz2
gcc -o testdb testdb.c -ldb
chmod +x testdb.pl
cp testdb.db pltest.db
cp testdb.db ctest.db
./testdb.pl pltest.db
./testdb ctest.db
db_dump testdb.db | db_load pltest2.db
./testdb.pl pltest2.db

You will see this output:

noaddr CMfR 2:  (11287 keys before, 0 keys after)
noaddr CMfR 2: 1 (11287 keys before, 11288 keys after)
noaddr CMfR 2: 1 (11287 keys before, 11288 keys after)

What this proves is that a C program linked directly against the db4 library was
able to store a key in the database that Perl was unable to store.  Furthermore,
Perl was able to store the key after the database was dumped and reloaded.

I have perl-5.8.0-51, perl-DB_File-1.804-51, and db4-4.0.14-14.

Comment 1 Jonathan Kamens 2002-09-15 18:19:12 UTC
Created attachment 76212
test database file

Comment 2 Jonathan Kamens 2002-09-15 18:19:42 UTC
Created attachment 76213
C test case

Comment 3 Jonathan Kamens 2002-09-15 18:20:09 UTC
Created attachment 76214
Perl test case

Comment 4 Chip Turner 2002-09-16 21:35:16 UTC
looks like you're running into UTF8 issues.  when I changed your key to be valid
ascii (i.e., all bytes < 128), it seems to work.  also, if I 'utf8::upgrade($key)'
it works as well.  this is almost certainly a bug in DB_File itself.  it looks
like the data in your db was possibly already utf8 encoded; where does it come
from?  a result of disk IO?

Comment 5 Jonathan Kamens 2002-09-17 09:59:57 UTC
It's an IP address in binary (packed sockaddr_in) format.  It is not UTF8 data,
and neither Perl nor DB_File should be treating it like UTF8 data.  There's
nothing that says I can't use binary keys in hashes or databases.
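A minimal C sketch of why a key like this falls outside the "valid ascii" case from Comment 4: a packed sockaddr_in for an address with any octet of 128 or more contains bytes >= 0x80, while a low address such as 10.0.0.1 on port 80 does not (port bytes can also exceed 0x7F, e.g. port 443). The helper names `has_non_ascii` and `pack_ipv4_key` are hypothetical, and the layout assumes Linux's `struct sockaddr_in`; this only classifies keys, it does not reproduce the DB_File failure.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if any byte in buf lies outside 7-bit ASCII (>= 0x80), else 0. */
int has_non_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= 0x80)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: build a packed IPv4 socket address in *sa, roughly
 * the layout Perl's Socket::pack_sockaddr_in produces on Linux (family,
 * port in network byte order, address, zero padding).
 * Returns 1 on success, 0 if `dotted` is not a valid dotted quad. */
int pack_ipv4_key(struct sockaddr_in *sa, const char *dotted, unsigned short port)
{
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    return inet_pton(AF_INET, dotted, &sa->sin_addr) == 1;
}
```

Calling `has_non_ascii((const unsigned char *)&sa, sizeof sa)` after packing "200.200.200.200" reports a non-ASCII key, because each 200 octet is stored as the byte 0xC8.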

If it is a UTF8 problem, it's certainly a weird one, given that the key is added
successfully by Perl if I dump and reload the database.

I don't really care whether it's a UTF8 bug or some other kind of bug; all I
know is that it is a bug.  It has the potential to affect anyone who uses
DB_File with binary keys (and since we haven't even proven that it depends on
the key, perhaps even non-binary keys!), so it seems like a rather significant
bug.


Comment 6 Chip Turner 2002-09-17 15:44:35 UTC
perl 5.8.0 (and 5.6.1 to a lesser extent) has been fully tooled for utf8 inside
and out.  you may not intend the data to be utf8, but the internal data model
perl uses IS utf8 in almost every case, more so than it has been in previous
releases.  can you provide a smaller test case that doesn't involve your already
existing database?  preferably a test case that creates the database itself.

typically these kinds of utf8 issues show up when dealing with perl modules that
have C bindings, especially the Digest:: modules, but possibly this case as
well.  if you can simplify your test case I will submit the issue upstream to
the perl maintainers.

Comment 7 Jonathan Kamens 2002-09-17 17:16:05 UTC
Perl utf8 support is only supposed to affect actual program source code, and
this problem was occurring even in a dynamically generated variable; the only
reason the key is a constant in the test case I submitted is to reduce the size
of the test case.  Furthermore, I don't have "use utf8" in my source code, which
means Perl shouldn't be enabling its utf8 support, and the problem occurs even
if I put "no utf8" in the source code explicitly.

Even if the bug *is* in some way due to the new utf8 source code, that doesn't
change the fact that it's a bug.

No, I can't reduce the test case any further.  The database having this problem
has been built up over the course of many years; I have no idea which particular
operations and in what order would cause the bug to manifest itself.  That is
why I included the database as part of the test case. The database attachment is
only 137KB bzipped; I hardly think that's so large that it can't be passed upstream.

This looks to me like it has the potential to be a rather serious bug.  Given
that, I don't understand why it seems like you're looking for a reason to ignore
it rather than aggressively pursuing it, especially when I've given you a simple
test case which reproduces it on demand.


Comment 8 Chip Turner 2002-09-17 18:30:43 UTC
A smaller test case that shows the bug more simply is always better than one
that doesn't.  A 12k entry database isn't as good a test case as a smaller,
simpler, self-contained test case.  If one isn't available, then we'll do what
we can, but chances of getting a fix from upstream are much greater if the test
case is simpler.  I'm sorry that you feel this is some attempt to avoid the issue.

utf8 issues do indeed come up in more places than just the actual source of
your program.  an example is this:
echo y | perl -e 'use Devel::Peek; my $x = "y\n"; Dump($x); $x = <>; Dump($x)'

Notice that they are the same string, but one has the UTF8 flag.  This causes a
number of differences in how perl treats the variable internally.

Also notice:
perl -le 'use Socket; print inet_aton("200.200.200.200")' | hexdump

When printed, the 4 bytes become 8 bytes.  If you use binmode with a PerlIO
layer (see perldoc PerlIO), the output is correct:

perl -le 'use Socket; binmode(STDOUT, ":raw"); print inet_aton("200.200.200.200")' | hexdump
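A sketch of why four bytes turn into eight: `inet_aton("200.200.200.200")` yields four 0xC8 bytes, and if the output handle encodes each byte as a Latin-1 character in UTF-8 (an assumption about how that perl 5.8.0 STDOUT was set up), every byte >= 0x80 becomes a two-byte sequence, here C3 88. The function name `latin1_to_utf8` is hypothetical; this is plain Latin-1-to-UTF-8 transcoding, not Perl's own code.

```c
#include <stddef.h>

/* Hypothetical helper: write the UTF-8 encoding of buf[0..len), treating
 * each byte as a Latin-1 code point (U+0000..U+00FF), into out.
 * out must have room for 2 * len bytes. Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *buf, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c < 0x80) {
            out[n++] = c;                                  /* ASCII: one byte, unchanged */
        } else {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte: 0xC2 or 0xC3 */
            out[n++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    return n;
}
```

Bytes below 0x80 pass through unchanged, which is consistent with the observation in Comment 4 that all-ASCII keys behave normally.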

utf8 is most definitely more than just an issue surrounding what encoding your
.pl or .pm file uses.  Also note that the above snippets behave very differently
in perl 5.6.1.

Getting these issues solved is very high priority.  The solution comes quicker
with simpler test cases.  There are also issues with other modules, though, so
any help you can provide will make the process much smoother.

Comment 9 Warren Togami 2005-05-28 06:59:58 UTC
Closing due to inactivity.  Assuming this is fixed with modern perl.