Bug 215184 - 'yum update' segfaults
Summary: 'yum update' segfaults
Keywords:
Status: CLOSED DUPLICATE of bug 213963
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: 6
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Paul Nasrat
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-11-12 01:33 UTC by Michal Jaegermann
Modified: 2014-01-21 22:55 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-07-17 12:45:28 UTC
Type: ---
Embargoed:



Description Michal Jaegermann 2006-11-12 01:33:52 UTC
Description of problem:

.....
  Updating  : eel2                         ####################### [ 2/29]
  Updating  : policycoreutils              ####################### [ 3/29]
  Updating  : nautilus-extensions          ####################### [ 4/29]
  Updating  : selinux-policy               ####################### [ 5/29]
  Updating  : gnome-vfs2-devel             ####################### [ 6/29]
  Updating  : gnome-vfs2-smb               ####################### [ 7/29]
  Updating  : nautilus-cd-burner           ####################### [ 8/29]
  Installing: policycoreutils-newrole      ####################### [ 9/29]
  Updating  : perl-DateManip               ####################### [10/29]
  Updating  : nautilus                     ####################### [11/29]
  Updating  : eel2-devel                   ####################### [12/29]
  Updating  : selinux-policy-targeted      ####################### [13/29]
  Updating  : yumex                        ####################### [14/29]
  Updating  : selinux-policy-strict        ####################### [15/29]
Segmentation fault

I did not have 'ulimit -c' set higher than 0 at that time.  Sigh.
This left me with the following list of duplicates:

eel2-2.16.0-1.fc6.i386
eel2-2.16.1-1.fc6.i386
eel2-devel-2.16.0-1.fc6.i386
eel2-devel-2.16.1-1.fc6.i386
gnome-vfs2-2.16.0-4.fc6.i386
gnome-vfs2-2.16.2-1.fc6.i386
gnome-vfs2-devel-2.16.0-4.fc6.i386
gnome-vfs2-devel-2.16.2-1.fc6.i386
gnome-vfs2-smb-2.16.0-4.fc6.i386
gnome-vfs2-smb-2.16.2-1.fc6.i386
nautilus-2.16.0-7.fc6.i386
nautilus-2.16.2-3.fc6.i386
nautilus-cd-burner-2.16.0-3.fc6.i386
nautilus-cd-burner-2.16.0-5.fc6.i386
nautilus-extensions-2.16.0-7.fc6.i386
nautilus-extensions-2.16.2-3.fc6.i386
perl-DateManip-5.44-1.2.1.noarch
perl-DateManip-5.44-2.fc6.noarch
policycoreutils-1.30.30-1.i386
policycoreutils-1.32-2.fc6.i386
selinux-policy-2.3.18-10.noarch
selinux-policy-2.4.3-2.fc6.noarch
selinux-policy-strict-2.3.18-10.noarch
selinux-policy-strict-2.4.3-2.fc6.noarch
selinux-policy-targeted-2.3.18-10.noarch
selinux-policy-targeted-2.4.3-2.fc6.noarch
yumex-1.1.7-1.0.fc6.noarch
yumex-1.2.0-1.0.fc6.noarch

and just removing the older versions is not enough because after that
'rpm -V ...' on the corresponding packages is often unhappy.
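A duplicate list like the one above can be recovered mechanically.  A minimal
sketch (the `find_dupes` helper is hypothetical; it assumes the standard
`name-version-release.arch` lines that `rpm -qa` prints):

```shell
#!/bin/sh
# find_dupes: read "name-version-release.arch" lines on stdin and print
# the package names that occur more than once (i.e. duplicate installs).
find_dupes() {
    # strip the ".arch" suffix, then the "-version-release" suffix,
    # leaving the bare package name; names appearing twice are dupes
    sed -e 's/\.[^.]*$//' -e 's/-[^-]*-[^-]*$//' | sort | uniq -d
}

# In practice this would be fed from the rpm database:
#   rpm -qa | find_dupes
```

(On later systems `package-cleanup --dupes` from yum-utils does the same job,
but that tool may not have been available on an FC6 install of this era.)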

Maybe this is really the same as bug 215180 with rpm?  There is
a gdb backtrace there.

Version-Release number of selected component (if applicable):
rpm-4.4.2-32

How reproducible:
Hard to tell, but FC6 seems to have various troubles with yum/rpm.

Comment 1 Paul Nasrat 2006-11-16 08:39:15 UTC
Without the core it's pretty much impossible to say what went wrong.  If we blow
up mid-transaction then yes, the system may be in an inconsistent state (what is
on disk vs. the rpmdb).  Remove the dupes, then reinstall the highest version of
the packages.

Can you give me any more information?

Comment 2 Michal Jaegermann 2006-11-16 17:47:53 UTC
> Without the core it's pretty impossible to say what went wrong.
Yes.  Unfortunately the default is 'ulimit -c 0'.

> Can you give me any more information?
I am afraid that the above is all I managed to collect.  As mentioned,
it is quite possible that bug 215180 points to an underlying cause.
I got that one right after I did 'ulimit -c unlimited' and attempted
to collect some information which would tell more about what happened
here.

It is rather nasty that yum failures, for whatever real reasons,
leave a system in a state which often requires massive repairs.  It
has happened to me on more than one occasion.  I understand the reasons
behind transaction ordering, but maybe it would be possible to split
bigger transactions into small consistent groups, each with its own
"Update-or-Install/Cleanup" cycle?  If something then went wrong,
the resulting mess would range from minimal to maybe nonexistent.

Maybe it was not self-evident in what I wrote, but a big, or maybe even
the main, part of this report was really that users are left with not
at all obvious cleanup jobs.  Recently I had to "straighten up"
an installation with literally hundreds of duplicates accumulated
by its owner over months, and the owner was clearly not doing
anything nasty.  Probably his rpm database just got hiccups.

I can do such splits as suggested above even right now by checking
for updates first and then updating packages from the resulting list one
by one, leaving yum to resolve dependencies as they come, in multiple
yum invocations.  It would be so much simpler if yum could do the
same job internally (and then I would not be afraid to ask an "average
user" to do that).  That way of proceeding would also likely be
faster with longer update lists on machines without massive amounts
of memory.
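The split-into-small-transactions workflow described here can be approximated
from the shell.  A hedged sketch, not a tested tool: `updatable_names` is a
hypothetical helper, and it assumes the usual three-column `yum check-update`
output (name.arch, version, repo):

```shell
#!/bin/sh
# updatable_names: extract bare package names from `yum check-update` output,
# skipping header and noise lines that do not match the three-column format.
updatable_names() {
    awk 'NF == 3 && $1 ~ /\./ { sub(/\.[^.]*$/, "", $1); print $1 }'
}

# One small transaction per package, letting yum resolve dependencies each time:
#   yum check-update | updatable_names | while read -r pkg; do
#       yum -y update "$pkg"
#   done
```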

Comment 3 Jeff Johnson 2006-12-03 18:43:30 UTC
Segfaults and loss of data are likely due to removing an rpmdb environment
without correcting other problems in the rpmdb.

FYI: Most rpmdb "hangs" are now definitely fixed by purging stale read locks
when opening a database environment in rpm-4.4.8-0.4.  There's more to do, but
I'm quite sure that a large class of problems with symptoms of "hang" is now
corrected.

Detecting damage by verifying when needed is well automated in rpm-4.4.8-0.4.
Automatically correcting all possible damage is going to take more work, but a
large class of problems is likely already fixed in rpm-4.4.8-0.8 as well.

UPSTREAM

Comment 4 Michal Jaegermann 2006-12-03 19:34:00 UTC
Again!  This bug report is mostly about _yum_ messing up heavily
due to unwarranted assumptions that nothing will fail in the underlying
infrastructure.  I have no idea who switched the component here to
"rpm", but for me this is an obvious misunderstanding and an ERROR.

What went wrong is really not that important here.  Two days ago
I was doing an update of around 30 packages on some FC6 system.  I went
away, and when I came back I was staring at a gdm login screen.
I have no idea about the reasons for this failure.  There were no traces
of anything amiss in the logs.  The result was that _all_ packages in
that transaction ended up as duplicates, and cleaning that up is far
from automatic.  This is, again, a yum problem; please see the last
three paragraphs in comment #2.  The issue could be at least mitigated.
Yes, it would be nice to have absolute reliability, but we do not have
that, I am afraid.  Actually, in recent times, what was a rare oddity,
i.e. yum bailing out in the middle of a transaction, for some reason
seems to have become a quite frequent occurrence.

BTW - the simplest way to recover from such a mess appears to be to
'rpm -e ...' all the _newer_, just-installed packages from the duplicate
pairs and to run the update again.  Otherwise there are pretty good
chances that you will end up with some missing files in what is left.
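That recovery recipe can be sketched in shell.  This is only an illustration
under stated assumptions: `newer_of` is a hypothetical helper, and `sort -V`
merely approximates rpm's real version comparison (rpmvercmp), so the result
should be checked by hand before feeding it to 'rpm -e':

```shell
#!/bin/sh
# newer_of: given the NVRA strings of one duplicate pair on stdin, print the
# newer one (the candidate for 'rpm -e' before re-running the update).
newer_of() {
    sort -V | tail -n 1
}

# Example with a pair from the list in the original report:
printf 'eel2-2.16.0-1.fc6.i386\neel2-2.16.1-1.fc6.i386\n' | newer_of
# -> eel2-2.16.1-1.fc6.i386
```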

I may also be underestimating the frequency of past hits.  Relatively
recently I was dealing with an FC4 installation, cleaned up and
upgraded to FC6, where there were literally hundreds of duplicate,
and in some cases triplicate, packages installed with different
versions.  The owner was unable to explain how he ended up in such
a state, but I suspect multiple upgrades where yum terminated
prematurely (for whatever reason).

Comment 5 Ian Collier 2006-12-12 23:30:40 UTC
There are so many open bugs now about yum and rpm crashing in FC6 that it's hard
to know which one to comment on.  However, if anyone's interested I have about a
gigabyte of saved /var/lib/rpm databases...

The most interesting thing to me is that on my last update I looked at the
database state *before* invoking yum, and it was different from the last saved
copy, with no updates or queries having happened in the meantime (there was only
one different file, namely __db.002).

So, I did this: yum install 'ImageMagick-perl' and, as luck would have it, it
died with a segfault.  I reset the database to the last known state, repeated
the install, and it succeeded.

Therefore I strongly suspect that yum-updatesd is corrupting the rpm cache. 
It's not running right now (I can't remember whether I shut it down deliberately
or whether it must have crashed) and my rpm database is the same as it was two
days ago.  I think I might keep it turned off - it looks like my life will be a
lot easier that way.

So:
http://users.comlab.ox.ac.uk/ian.collier/debug/rpm/15/ 
 - complete contents of /var/lib/rpm after previous update on Dec 7th
http://users.comlab.ox.ac.uk/ian.collier/debug/rpm/16/before/__db.002
 - copy of the changed file as it existed before executing yum on Dec 10th
http://users.comlab.ox.ac.uk/ian.collier/debug/rpm/16/
 - contents of the database after yum crashed
http://users.comlab.ox.ac.uk/ian.collier/debug/rpm/16/core.25910
 - core file from the crashed "yum" process

(I assume there's nothing vital to my security in any of these files...)

Comment 6 Michal Jaegermann 2006-12-12 23:47:03 UTC
> Therefore I strongly suspect that yum-updatesd is corrupting the
> rpm cache.

I think that this is a false track.  I do not run yum-updatesd,
especially while on a DSL line.  So far, for me, it has turned out
to be much more bother than help.

I agree that ultimately the problems seem to be caused by troubles
in the rpm databases, which sometimes get handled and sometimes
not - depending on the phase of the moon.

Comment 7 Jeff Johnson 2006-12-13 01:47:48 UTC
Again, I suggest rpm-4.4.8-0.4 or later. Working well for me, hangs are gone, and verify is automated.

The real fix is in yum, which plainly and simply does not need to reopen the
rpmdb thousands of times per transaction in order to handle ^C.
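The alternative Jeff describes is the usual Unix pattern: trap the signal once
and poll a flag at safe boundaries, instead of closing and reopening state
around every step.  A sketch only (not yum's actual code; `run_transaction`
and its echoed steps are hypothetical):

```shell
#!/bin/sh
# Trap ^C once; the handler only sets a flag, and the transaction loop
# checks it between elements, so the rpmdb could stay open throughout.
interrupted=0
trap 'interrupted=1' INT

run_transaction() {
    for step in "$@"; do
        if [ "$interrupted" -eq 1 ]; then
            echo "aborted before $step"   # safe boundary: db still consistent
            return 1
        fi
        echo "applying $step"
    done
}
```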

Comment 8 Michel Van den Bergh 2006-12-21 09:43:01 UTC
(In reply to comment #7)
> Again, I suggest rpm-4.4.8-0.4 or later. Working well for me, hangs are gone,
> and verify is automated.
>
> The real fix is in yum, which plainly and simply does not need to reopen the
> rpmdb thousands of times per transaction in order to handle ^C.

Could you please be a little more helpful and tell us where we can find
rpm-4.4.8?  Even Fedora development has only 4.4.2.  This is a crucial system
component that is not working properly, and I think it should be treated with
a little more consideration.

Comment 9 Panu Matilainen 2007-07-17 12:45:28 UTC

*** This bug has been marked as a duplicate of 213963 ***

