Bug 479291 - gdbm creates unexpectedly large db files on PPC platforms [RHEL-5]
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gdbm
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Karel Klíč
QA Contact: BaseOS QE
Depends On:
Blocks:
Reported: 2009-01-08 12:25 EST by Tomas Hoger
Modified: 2010-11-09 08:19 EST
CC List: 1 user

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-04-19 09:28:48 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Test cases (2.87 KB, application/x-compressed-tar)
2009-01-08 12:42 EST, Tomas Hoger

Description Tomas Hoger 2009-01-08 12:25:49 EST
Description of problem:
gdbm seems to create unexpectedly large db files on PPC platforms: about 10 times larger than on other platforms, or than the files produced by the RHEL-4 and RHEL-5 GA versions. The problem was spotted through an unexpected rpmdiff error for a rebuild of a RHEL-5 package.

Version-Release number of selected component (if applicable):
gdbm-1.8.0-26.2.1

Steps to Reproduce:
see below
Comment 2 Tomas Hoger 2009-01-08 12:42:31 EST
Created attachment 328477 [details]
Test cases

This test is based on what avahi uses, as rebuilding avahi on EL5 causes a .db file generated during the build to grow unexpectedly.

Archive contains:
- build-db - python script using gdbm module (from avahi)
- service-types - input file that is turned to .db using build-db during
  avahi build
- c-build-db.c - quick-n-dirty re-implementation of build-db to make sure it's
  not the python module to blame for the .db file growth

When running build-db on an up-to-date RHEL5 PPC machine, the generated .db file is ~200k, while the avahi package from RHEL5 GA contains a .db file of ~15k (similar to other platforms).

Checking other versions as well: the file is ~25k on EL4 and ~15k on F10.

EL4 PPC

# make
gcc -Wall -lgdbm c-build-db.c -o c-build-db
./build-db
./c-build-db

ls -l service-types*.db
-rw-------  1 root root 26453 Jan  8  2009 service-types.c.db
-rw-r--r--  1 root root 26453 Jan  8  2009 service-types.db

file service-types*.db
service-types.c.db: GNU dbm 1.x or ndbm database, big endian
service-types.db:   GNU dbm 1.x or ndbm database, big endian

./read-db service-types.db | sort> out.py.txt
./read-db service-types.c.db | sort > out.c.txt
diff -u out.py.txt out.c.txt

md5sum service-types*.db
51987a1dadc4fac56d6c07ed4080eabb  service-types.c.db
51987a1dadc4fac56d6c07ed4080eabb  service-types.db


EL5 PPC

# make
gcc -Wall -lgdbm c-build-db.c -o c-build-db
./build-db
./c-build-db

ls -l service-types*.db
-rw------- 1 root root 198485 Jan  8 12:00 service-types.c.db
-rw-r--r-- 1 root root 198485 Jan  8 12:00 service-types.db

file service-types*.db
service-types.c.db: GNU dbm 1.x or ndbm database, big endian
service-types.db:   GNU dbm 1.x or ndbm database, big endian

./read-db service-types.db | sort> out.py.txt
./read-db service-types.c.db | sort > out.c.txt
diff -u out.py.txt out.c.txt

md5sum service-types*.db
b1758d78c36512263c234da6df1957fd  service-types.c.db
b1758d78c36512263c234da6df1957fd  service-types.db


F10 PPC

# make
gcc -Wall -lgdbm c-build-db.c -o c-build-db
./build-db
./c-build-db

ls -l service-types*.db
-rw------- 1 root root 14165 Jan  8 18:02 service-types.c.db
-rw-r--r-- 1 root root 14165 Jan  8 18:02 service-types.db

file service-types*.db
service-types.c.db: GNU dbm 1.x or ndbm database, big endian
service-types.db:   GNU dbm 1.x or ndbm database, big endian

./read-db service-types.db | sort> out.py.txt
./read-db service-types.c.db | sort > out.c.txt
diff -u out.py.txt out.c.txt

md5sum service-types*.db
9c8d351703debe47a225ac27481166fc  service-types.c.db
44c395c348a623b99bcd62cad9a6ef36  service-types.db


The .db file generated on EL5 works fine and has the same content as the old .db from the RHEL5 GA avahi packages, but it seems to contain lots of extra \0 bytes.
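
To put a number on "lots of extra \0 bytes", one could count the zero bytes in each generated file and compare. The following is a minimal sketch; count_nul_bytes() and count_nul_bytes_in_file() are hypothetical helpers, not part of the attached test cases.

```c
#include <stddef.h>
#include <stdio.h>

/* Count the zero bytes in a memory buffer. */
size_t count_nul_bytes(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0)
            count++;
    return count;
}

/* Count the zero bytes in a file; returns -1 if it cannot be opened or read. */
long count_nul_bytes_in_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    unsigned char buf[4096];
    size_t n;
    long total = 0;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)count_nul_bytes(buf, n);

    int read_error = ferror(fp);
    fclose(fp);
    return read_error ? -1 : total;
}
```

Running count_nul_bytes_in_file("service-types.db") on the EL5 PPC output and on an F10 output, and comparing each count with the file size, would show how much of the extra size is zero padding.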
Comment 3 RHEL Product and Program Management 2009-03-26 12:48:09 EDT
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".
Comment 4 Karel Klíč 2010-04-16 10:22:41 EDT
Reproduced. 

On F-12 x86:
-rw-------.  1 kklic kklic 14165 2010-04-16 15:48 service-types.c.db
-rw-rw-r--.  1 kklic kklic 14165 2010-04-16 15:49 service-types.db

On RHEL-5.5 PPC64:
-rw------- 1 root root 198485 Apr 16 10:15 service-types.c.db
-rw-rw-r-- 1 root root 198485 Apr 16 10:15 service-types.db

[root@ibm-sf2a-lp2 ~]# uname -a
Linux ibm-sf2a-lp2.rhts.eng.rdu.redhat.com 2.6.18-194.el5 #1 SMP Tue Mar 16 22:03:12 EDT 2010 ppc64 ppc64 ppc64 GNU/Linux
Comment 5 Karel Klíč 2010-04-16 12:50:19 EDT
It is a compilation/gcc/environment issue. Even when I compile gdbm from RHEL-5.5 on a RHEL-4.8 PPC64 machine, it correctly creates small db files.

On RHEL-4.8 PPC64:
-rw-------  1 root root 14165 Apr 16 12:40 service-types.c.db
-rw-rw-r--  1 root root 14165 Apr 16 12:40 service-types.db

[root@ibm-js20-5 test]# uname -a
Linux ibm-js20-5.rhts.eng.rdu.redhat.com 2.6.9-89.EL #1 SMP Mon Apr 20 10:25:13 EDT 2009 ppc64 ppc64 ppc64 GNU/Linux

RHEL-4.8 compilation:
gcc -c -DHAVE_CONFIG_H -I. -I. -O2 -g -pipe -m32 -fsigned-char update.c  -fPIC -DPIC -o .libs/update.o

RHEL-5.5 compilation: 
gcc -c -DHAVE_CONFIG_H -I. -I. -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 update.c  -fPIC -DPIC -o .libs/update.o
Comment 6 Karel Klíč 2010-04-19 09:28:48 EDT
In the end, it is neither a compilation/gcc issue nor a gdbm issue.

The stat function (see `man 2 stat`) fills in a struct stat, which contains the st_blksize field. This field gives the "preferred" block size for efficient file system I/O. Gdbm uses st_blksize as its internal database block size when the caller does not provide one in gdbm_open(file, block_size, ...); per the gdbm manual, any block_size below 512 means "use the file system block size".
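
The block-size selection described above can be sketched in C. This is a minimal illustration, not gdbm's actual code: effective_block_size() and preferred_io_block_size() are hypothetical helpers, and the 512-byte threshold follows the gdbm manual's description of the block_size parameter of gdbm_open().

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/stat.h>

/* Mirrors the documented rule: a requested block size below 512 (e.g. 0)
 * means "use the file system's preferred I/O size"; otherwise the caller's
 * value is used unchanged. */
long effective_block_size(long requested, long st_blksize)
{
    return requested < 512 ? st_blksize : requested;
}

/* Returns st_blksize for a path, or -1 if stat() fails. */
long preferred_io_block_size(const char *path)
{
    struct stat sb;
    if (stat(path, &sb) != 0)
        return -1;
    return (long)sb.st_blksize;
}
```

On the RHEL-5.5 PPC kernel discussed here, preferred_io_block_size() should report 65536 for a file on the affected file system, so effective_block_size(0, 65536) picks 64 KiB blocks. Passing an explicit block_size of 512 or more to gdbm_open() should make the database block size independent of st_blksize.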

The preferred I/O block size reported in st_blksize:
- RHEL-5.5 PPC kernel: 65536 bytes
- Fedora 12 x86: 4096 bytes
- RHEL-6 PPC: 4096 bytes

The attached example uses 3 blocks, so the database file takes approx. 3*4096 bytes on RHEL-6 and 3*65536 bytes on RHEL-5.5.
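
The estimate above is simple arithmetic. The sketch below (hypothetical helper names) checks it against the file sizes reported earlier in this bug.

```c
/* Hypothetical helpers for the size estimate: a gdbm file occupying
 * nblocks database blocks takes roughly nblocks * block_size bytes. */
long estimated_block_bytes(long nblocks, long block_size)
{
    return nblocks * block_size;
}

/* Bytes in an observed file size beyond the whole-block estimate. */
long bytes_beyond_blocks(long observed_size, long nblocks, long block_size)
{
    return observed_size - estimated_block_bytes(nblocks, block_size);
}
```

With the Fedora 12 x86 size (14165 bytes, 4096-byte blocks) and the RHEL-5.5 PPC64 size (198485 bytes, 65536-byte blocks) from Comment 4, bytes_beyond_blocks() gives 1877 in both cases. That is consistent with the block size being the only difference between the two files.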

Tomas, this looks to me like a property of the kernel rather than a flaw, and gdbm's use of st_blksize is appropriate. Closing as NOTABUG; please reopen if you disagree.
Comment 7 Tomas Hoger 2010-04-19 09:41:29 EDT
Thank you for digging out the details.  I'll trust your analysis here.
