Bug 215180

Summary:	rpm segfaults on an attempt to rebuild database
Product:	[Fedora] Fedora	Reporter:	Michal Jaegermann <michal>
Component:	rpm	Assignee:	Paul Nasrat <nobody+pnasrat>
Status:	CLOSED DUPLICATE	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	6	CC:	imc
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Last Closed:	2007-07-17 12:44:25 UTC	Type:	---

Description Michal Jaegermann 2006-11-12 00:35:58 UTC

Description of problem:

After 'rpm --rebuilddb --verbose' I got a segfault and a core (I had just
turned cores on).  With data from rpm-debuginfo, gdb produces the following:

Core was generated by `/usr/lib/rpm/rpmd --rebuilddb --verbose'.
Program terminated with signal 11, Segmentation fault.
#0  __memp_fget_rpmdb (dbmfp=0x9c55788, pgnoaddr=0xbfefbbac, flags=0,
    addrp=0xbfefbb88) at ../db/dist/../mp/mp_fget.c:190
190     ../db/dist/../mp/mp_fget.c: No such file or directory.
        in ../db/dist/../mp/mp_fget.c
(gdb) where
#0  __memp_fget_rpmdb (dbmfp=0x9c55788, pgnoaddr=0xbfefbbac, flags=0,
    addrp=0xbfefbb88) at ../db/dist/../mp/mp_fget.c:190
#1  0x003c8510 in __db_goff_rpmdb (dbp=0x9c55488, dbt=0x9c5899c, tlen=12052,
    pgno=6916, bpp=0x9c55914, bpsz=0x9c5591c)
    at ../db/dist/../db/db_overflow.c:147
#2  0x003cfe4d in __db_ret_rpmdb (dbp=0x9c55488, h=0xb7c935c4, indx=11,
    dbt=0x9c5899c, memp=0x9c55914, memsize=0x9c5591c)
    at ../db/dist/../db/db_ret.c:50
#3  0x003bb115 in __db_c_get_rpmdb (dbc_arg=0x9c558c8, key=0x9c58984,
    data=0x9c5899c, flags=<value optimized out>)
    at ../db/dist/../db/db_cam.c:778
#4  0x003c15f6 in __db_c_get_pp_rpmdb (dbc=0x9c558c8, key=0x9c58984,
    data=0x9c5899c, flags=18) at ../db/dist/../db/db_iface.c:1741
#5  0x00351706 in db3cget (dbi=0x9c54f30, dbcursor=0x5704db86, key=0x9c58984,
    data=0x9c5899c, flags=1459936134) at db3.c:612
#6  0x0034d333 in rpmdbNextIterator (mi=0x9c58968) at rpmdb.h:591
#7  0x0034ee04 in rpmdbRebuild (prefix=0x9c41f30 "/", ts=0x9c53cd8,
    hdrchk=0x160830 <headerCheck>) at rpmdb.c:3854
#8  0x00184af6 in rpmtsRebuildDB (ts=0x9c53cd8) at rpmts.c:209
#9  0x08049822 in main (argc=3, argv=Cannot access memory at address 0x5704db8a
) at ./rpmqv.c:633
#10 0x00a3cf2c in __libc_start_main () from /lib/libc.so.6
#11 0x080490c1 in _start ()
(gdb)

Locations like "../db/dist/../mp/mp_fget.c:190" are somewhat nasty
to look at, but it is possible to find the file outside of gdb.
The code in question looks like this:

	/* Search the hash chain for the page. */
retry:	st_hsearch = 0;
	MUTEX_LOCK(dbenv, &hp->hash_mutex);
	for (bhp = SH_TAILQ_FIRST(&hp->hash_bucket, __bh);
	    bhp != NULL; bhp = SH_TAILQ_NEXT(bhp, hq, __bh)) {
		++st_hsearch;
-- bomb! -->	if (bhp->pgno != *pgnoaddr || bhp->mf_offset != mf_offset)
			continue;

and gdb prints:

(gdb) p bhp
$1 = (BH *) 0x5704db86
(gdb) p *bhp
Cannot access memory at address 0x5704db86
(gdb) p pgnoaddr
$2 = (db_pgno_t *) 0xbfefbbac
(gdb) p bhp->pgno
Cannot access memory at address 0x5704dbfa

Trying to access memory which was already freed?

Version-Release number of selected component (if applicable):
rpm-4.4.2-32

How reproducible:
The next attempt at --rebuilddb succeeded.  I tried it in the first
place because I got a segfault from yum during an installation, and
maybe that was really an rpm fault?

Comment 1 Jeff Johnson 2006-11-12 04:57:06 UTC

The segfault is likely the result of bad data, which is likely corrected by --rebuilddb.

Comment 2 Michal Jaegermann 2006-11-12 05:15:27 UTC

> The segfault is likely the result of bad data ...
These "bad data" were produced by nothing but rpm, and
an attempt to correct them resulted in a segfault.  Luckily
the condition did not persist.

Comment 3 Michal Jaegermann 2006-11-12 05:21:22 UTC

BTW, the segfault in 'yum update' mentioned in the report is now
bug 215184.  I am afraid there is not much information there beyond
the nasty result.  It happened after all the new packages were already
installed, when yum was supposed to do all the cleanups, so it left me
with a pile of duplicates.

Comment 4 Jeff Johnson 2006-11-12 07:24:49 UTC

rpm (and Berkeley DB) relies on shared POSIX mutexes for locking to ensure data integrity.

There's been a rash of recent rpmdb problems, and I don't know the cause. Sure, blame rpm, which has not
changed in over a year (the rpmdb code certainly hasn't). YMMV.

A --dupes option can be added to rpm with this line in /etc/popt:

    rpm     alias --dupes   --qf '%|SOURCERPM?{%{name}.%{arch}}:{%|ARCH?{%{name}}:{%{name}-%{version}}|}|\n' --pipe "sort | uniq -d" \
        --POPTdesc=$"list duplicated packages"

Invoke as rpm -qa --dupes.

Comment 5 Jeff Johnson 2006-11-12 07:34:39 UTC

BTW, doing
    rm -f /var/lib/rpm/__db*
before --rebuilddb --verbose would have eliminated a corrupt cache.

Comment 6 Ian Collier 2006-11-18 16:12:17 UTC

Like a few other users, it seems, I'm finding rpm and yum very unstable
under FC6.  Just now:

# rpm -ivh /home/imc/rpmbuild/RPMS/i386/xli-1.17.0-6.fc6.i386.rpm
Preparing...                Segmentation fault (core dumped)

But where's my core file?

# ls -l core
ls: core: No such file or directory
# ulimit -c
unlimited

Comment 7 Michal Jaegermann 2006-11-18 17:14:53 UTC

> But where's my core file?

If you have 'ulimit -c' set to 'unlimited', your core file may
really have a name like core.<process_id> (this depends on the kernel's
core_uses_pid setting), so try 'ls -l core*'.
Also, the process which dumped core may be a child with a different
context or working directory, so the core is somewhere else (maybe /?).
To look for all possible core files, run updatedb so the locate database
is current, and then try the following

   locate -r '/core\.[1-9]'

This may give a few wrong hits, but not too many.

Comment 8 Jeff Johnson 2006-11-18 18:22:13 UTC

If you give me a pointer to a core (from a run using -ivv) and the packages involved,
I'll diagnose the segfault.

Be forewarned: almost all segfaults in rpm are caused by bad data.

Comment 9 Jeff Johnson 2006-12-03 18:35:15 UTC

Segfaults and loss of data are likely due to removing an rpmdb environment
without correcting other problems in the rpmdb.

FYI: Most rpmdb "hangs" are now definitely fixed in rpm-4.4.8-0.4 by purging stale read locks
when opening a database environment. There's more to do, but I'm quite sure that a large
class of problems with "hang" symptoms is now corrected.

Detecting damage, by verifying when needed, is well automated in rpm-4.4.8-0.4. Automatically
correcting all possible damage is going to take more work, but a large class of problems is likely
already fixed in rpm-4.4.8-0.8 as well.

UPSTREAM

Comment 10 Panu Matilainen 2007-07-17 12:44:25 UTC


*** This bug has been marked as a duplicate of 213963 ***