Bug 492947

Summary: /etc/passwd moved to /etc/passwd.rpmsave during update transaction.
Product: [Fedora] Fedora
Reporter: Lennart Poettering <lpoetter>
Component: rpm
Assignee: Panu Matilainen <pmatilai>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: rawhide
CC: bnocera, erik-fedora, fche, ffesti, herrold, james.leddy, jlayton, jnovy, jpriddy, jreiser, jspaleta, mitr, mschmidt, n3npq, notting, pebolle, pmatilai, rjones, rvandolson, yersinia.spiros
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-04-03 07:59:39 UTC
Bug Blocks: 446452
Attachments:
My current yum.log (flags: none)
yum.log (flags: none)

Description Lennart Poettering 2009-03-30 20:57:11 UTC
I upgraded my rawhide system today (last update before on fri) and fucking yum/rpm deleted my /etc/passwd.

I cannot tell you anything about the versions, since that made my system unusable and I am still busy trying to unfuck this mess here.

The person who is responsible for this disaster deserves to be hit with a big big stone.

Fuck man, this is so uncool. I mean I expect breakage from time to time in Rawhide. But this is too much. Especially with the release looming. I don't know who is responsible for this, but he should know that he pissed me off big time.

Comment 1 Jeff Johnson 2009-03-31 01:10:59 UTC
If you *really* want to do more than vent, you will need
to supply the details that convinced you that yum/rpm
deleted /etc/passwd. Otherwise *shrug* ...

Comment 2 Lennart Poettering 2009-03-31 01:34:50 UTC
During the upgrade the old passwd got renamed to /etc/passwd.rpmsave.

Also, before the yum upgrade everything was fine, afterwards it wasn't.

Comment 3 Jeff Johnson 2009-03-31 01:50:29 UTC
Good, that's a start. So /etc/passwd wasn't removed, just renamed.
Still a largish flaw, I'm not quibbling, just stating facts.

The .rpmsave suffix is appended iff the file resolution is FA_BACKUP.

The resolution FA_BACKUP is assigned iff a file is marked %config
(likely the case for /etc/passwd), and is modified with respect to
the digest in the rpmdb.

Since digests have just changed from MD5 -> SHA256 (so "changed"
cannot be computed through usual means), what likely has
happened is that /etc/passwd contents ended up being renamed
with .rpmsave because of the switch from MD5 to SHA256 in
recent versions of F11 beta.
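
A minimal Python sketch of the reasoning above, as an illustration of the hypothesis only and not rpm's actual code. The function names and the "OTHER" result are made up for the example; only FA_BACKUP, %config and the MD5/SHA256 switch come from this comment.

```python
import hashlib


def file_digest(data: bytes, algo: str) -> str:
    """Hex digest of data using a hashlib algorithm name ("md5", "sha256")."""
    return hashlib.new(algo, data).hexdigest()


def resolve_config_file(on_disk: bytes, recorded_digest: str,
                        algo: str, is_config: bool) -> str:
    """Hypothetical helper: "FA_BACKUP" if a %config file looks modified.

    "Modified" means the digest of the on-disk contents differs from the
    digest recorded in the (simulated) rpmdb. Anything else is "OTHER".
    """
    if not is_config:
        return "OTHER"
    modified = file_digest(on_disk, algo) != recorded_digest
    return "FA_BACKUP" if modified else "OTHER"


passwd = b"root:x:0:0:root:/root:/bin/bash\n"
recorded_md5 = file_digest(passwd, "md5")  # what an older package recorded

# Same algorithm on both sides: the untouched file compares equal.
same_algo = resolve_config_file(passwd, recorded_md5, "md5", True)
# New algorithm against an old MD5 value: the file looks "modified"
# although its contents never changed.
mixed_algo = resolve_config_file(passwd, recorded_md5, "sha256", True)

print(same_algo, mixed_algo)
```

The point of the sketch is only that comparing digests across algorithms can never report "unchanged"; whether rpm really compares this way is exactly what is in question here.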

Yes still a flaw, but also not exactly an easy problem to solve once the
decision to switch from MD5 has been made.

hth Enjoy!

Comment 4 Lennart Poettering 2009-03-31 11:18:16 UTC
BTW. A lot of other important config files got renamed this way too. inittab, nsswitch.conf and more. I had to rename quite a few files beneath /etc back to their original names to get back to a bootable system.

Comment 5 Jeff Johnson 2009-03-31 12:41:00 UTC
Likely the same %config digest check cause. Lather rinse repeat.

Comment 6 Bill Nottingham 2009-03-31 20:05:11 UTC
Can you attach your yum.log from the transaction in question, just to clarify what packages were updated?

Comment 7 John Reiser 2009-04-01 14:31:46 UTC
Sometimes I find that SELinux is involved.  As part of recovery, I boot to single user mode (by appending " single" to the kernel command line) and run "restorecon -R /" before proceeding.  Or if things are really bad, or just to make sure things work, then boot into rescue mode and create the file .autorelabel in the root directory (chroot /mnt/sysimage; touch /.autorelabel), then reboot into that root.

Comment 8 Lennart Poettering 2009-04-01 14:48:15 UTC
John: I don't have SELinux enabled.

Comment 9 Adam Jackson 2009-04-01 14:48:54 UTC
How on earth does a file marked %config _ever_ get renamed away, regardless of whether the checksum is computable?  That's just insane.

Comment 10 Lennart Poettering 2009-04-01 14:50:21 UTC
Created attachment 337550 [details]
My current yum.log

Comment 11 Lennart Poettering 2009-04-01 14:51:44 UTC
Bill, that attachment is my complete yum.log. Not sure which one was the transaction that went wrong.

Comment 13 James M. Leddy 2009-04-01 15:12:57 UTC
(In reply to comment #3)
> Good, that's a start. So /etc/passwd wasn't removed, just renamed.
> Still a largish flaw, I'm not quibbling, just stating facts.
> 
> The .rpmsave suffix is appended iff the file resolution is FA_BACKUP.
> 
> The resolution FA_BACKUP is assigned iff a file is marked %config
> (likely the case for /etc/passwd), and is modified with respect to
> the digest in the rpmdb.
> 
> Since digests have just changed from MD5 -> SHA256 (so "changed"
> cannot be computed through usual means), what likely has
> happened is that /etc/passwd contents ended up being renamed
> with .rpmsave because of the switch from MD5 to SHA256 in
> recent versions of F11 beta.
> 
> Yes still a flaw, but also not exactly an easy problem to solve once the
> decision to switch from MD5 has been made.
> 

I don't know much about rpm, but how hard would it be to write a script to enter all new SHA values into the database?

And why Apr 1?  Are these just the first two packages to use SHA among many?  If so, wouldn't we be in for a world of hurt?

Comment 14 Lennart Poettering 2009-04-01 15:18:23 UTC
What does this have to do with April 1st? I posted this bug two days ago.

Comment 15 Jeff Layton 2009-04-01 15:21:29 UTC
I assume that there will need to be some way to regenerate the hashes before F11 is released. Otherwise F10 -> F11 upgrades will break the same way, correct?

Comment 16 Miloslav Trmač 2009-04-01 15:29:26 UTC
The switch to SHA-256 was done a few weeks ago, and /etc/passwd is
%config(noreplace).

I did test the case of %config(noreplace) - I even prepared a patch to make
sure that %config(noreplace) is not moved to .rpmsave (see #479869).

Could someone retry the test cases in
https://bugzilla.redhat.com/show_bug.cgi?id=479869#c3 , please?

Comment 17 Miloslav Trmač 2009-04-01 15:32:29 UTC
(In reply to comment #16)
> Could someone retry the test cases in
> https://bugzilla.redhat.com/show_bug.cgi?id=479869#c3 , please?  
(You'll need to define %_binary_filedigest_algorithm to "1" to create MD5 packages, to "8" to create SHA-256 packages.)
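
For anyone comparing the two package flavours, here is a small Python sketch of what the two settings mean for the recorded file digests. The values 1 and 8 are the ones quoted above; the mapping to Python's hashlib names and the helper name are assumptions made for the example, not rpm identifiers.

```python
import hashlib

# 1 and 8 are the %_binary_filedigest_algorithm values quoted above.
# The names on the right are Python hashlib names, not rpm identifiers.
FILEDIGEST_ALGOS = {1: "md5", 8: "sha256"}


def file_digest(data: bytes, algo_id: int) -> str:
    """Hex digest of data for the given algorithm id (hypothetical helper)."""
    return hashlib.new(FILEDIGEST_ALGOS[algo_id], data).hexdigest()


sample = b"hosts: files dns\n"
md5_hex = file_digest(sample, 1)
sha256_hex = file_digest(sample, 8)

# The two digests differ in length, so a stored MD5 value can never
# equal a freshly computed SHA-256 value.
print(len(md5_hex), len(sha256_hex))  # 32 64
```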

Comment 18 Frank Ch. Eigler 2009-04-01 15:55:48 UTC
(In reply to comment #16)
> The switch to SHA-256 was done a few weeks ago

Can someone provide a link to the discussion I presume existed,
where the idea that new RPMs carry both MD5 and SHA1 hashes
must have been proposed and rejected?

Comment 19 James M. Leddy 2009-04-01 18:45:04 UTC
(In reply to comment #14)
> What does this have to do with April 1st? I posted this bug two days ago.  

Whoops, was looking at the mailing list.

Comment 20 James M. Leddy 2009-04-01 18:58:18 UTC
(In reply to comment #18)
> (In reply to comment #16)
> > The switch to SHA-256 was done a few weeks ago
> 
> Can someone provide a link to the discussion I presume existed,
> where the idea that new RPMs carry both MD5 and SHA1 hashes
> must have been proposed and rejected?  

This is about file signatures (I presume somewhere in the rpm database).  RPMS themselves can already be signed by both MD5 and SHA1, iirc

Comment 21 Jef Spaleta 2009-04-01 19:57:40 UTC
Lennart,

From the yum log you did a significant number of updates on the 30th.

Can you do a quick scan of your /etc/ to see how many rpmsave files are on the system and see if other noreplace config files were impacted?  Maybe this is some sort of hiccup with the setup package specifically and not a general case noreplace problem.

-jef
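
One possible way to do that scan, sketched in Python. The function name is made up for the example; it only reads the directory tree and reports, for each *.rpmsave file, whether a file with the original name still exists next to it.

```python
import os


def find_rpmsave(root: str):
    """Yield (saved_path, original_path, original_exists) per *.rpmsave file."""
    suffix = ".rpmsave"
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(suffix):
                saved = os.path.join(dirpath, name)
                original = saved[: -len(suffix)]
                # lexists() also counts dangling symlinks as "present".
                yield saved, original, os.path.lexists(original)


if __name__ == "__main__":
    for saved, original, exists in find_rpmsave("/etc"):
        status = "present" if exists else "MISSING"
        print(f"{saved}: original {status}")
```

Entries reported as MISSING are the interesting ones: the config file was moved away and nothing was put in its place.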

Comment 22 James M. Leddy 2009-04-01 20:36:26 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > Could someone retry the test cases in
> > https://bugzilla.redhat.com/show_bug.cgi?id=479869#c3 , please?  
> (You'll need to define %_binary_filedigest_algorithm to "1" to create MD5
> packages, to "8" to create SHA-256 packages.)  

has it been carried forward?

from the yum logs:

Mar 04 21:51:01 Updated: rpm-libs-4.6.0-11.fc11.x86_64
Mar 18 12:24:04 Updated: rpm-libs-4.7.0-0.beta1.3.fc11.x86_64
Mar 30 19:37:44 Updated: rpm-libs-4.7.0-0.beta1.7.fc11.x86_64

Comment 23 Lennart Poettering 2009-04-01 23:15:22 UTC
(In reply to comment #21)
> Lennart,
> 
> From the yum log you did a significant number of updates on the 30th.
> 
> Can you do a quick scan of your /etc/ to see how many rpmsave files are on the
> system and see if other noreplace config files were impacted?  Maybe this is
> some sort of hiccup with the setup package specifically and not a general case
> noreplace problem.

As mentioned above, quite a few config files got renamed. inittab, group, the shadow files, nsswitch.conf. All in all about 20 files or so. Since I renamed them all back I cannot tell you this in more detail, sorry.

Comment 24 Panu Matilainen 2009-04-02 05:53:31 UTC
(In reply to comment #9)
> How on earth does a file marked %config _ever_ get renamed away, regardless of
> whether the checksum is computable?  That's just insane.  

That is the real question here. Digest change shouldn't affect %config(noreplace) behavior at all, and even for %config without noreplace it shouldn't have been *renamed* with no /etc/passwd left around at all.

I'll try to figure out what might cause such behavior, but if somebody can come up with an actual reproducer, that'd be most helpful (can't reproduce this with a single package with a config(noreplace) file in it, digest change or no)

Comment 25 Panu Matilainen 2009-04-02 07:20:11 UTC
Looking at the yum.log from comment #10:

...
Mar 30 21:13:13 Updated: 1:gnome-applets-2.25.92-4.fc11.x86_64
Mar 30 21:13:16 Updated: libselinux-devel-2.0.79-4.fc11.x86_64
Mar 30 21:13:16 Updated: gnome-common-2.26.0-1.fc11.noarch
Mar 30 21:13:17 Updated: libselinux-2.0.79-4.fc11.i586
Mär 30 21:25:04 Erased: tkinter
Mär 30 21:25:07 Erased: PersonalCopy-Lite-patches
Mär 30 21:25:09 Erased: audit-libs
Mär 30 21:25:11 Erased: opal
...
Mär 30 21:28:35 Erased: setup
...
Mär 30 21:29:55 Erased: libgcc
...

So setup, and a big pile of other stuff like libgcc, bash, etc., have been *removed* - no wonder there's no /etc/passwd or much of anything left. There's a twelve minute break in times between the last "Updated" and the first "Erased", which almost certainly means a new transaction starting there (iirc you can't even perform simultaneous install/update/erase with yum).
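
The time break can be found mechanically. Below is a rough Python sketch that parses yum.log lines of the form shown above and reports consecutive entries separated by more than a threshold. The helper names, the 5 minute threshold and the year 2009 are assumptions (the log itself carries no year); "Mär" is accepted as a locale variant of "Mar" because it appears in this log.

```python
from datetime import datetime, timedelta

# English month abbreviations plus "Mär", the German-locale form seen here.
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Mär": 3, "Apr": 4, "May": 5,
          "Jun": 6, "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11,
          "Dec": 12}


def parse_line(line: str, year: int = 2009):
    """Parse 'Mon DD HH:MM:SS Action: package' into (datetime, action, package)."""
    month, day, clock, action, package = line.split(None, 4)
    hour, minute, second = map(int, clock.split(":"))
    stamp = datetime(year, MONTHS[month], int(day), hour, minute, second)
    return stamp, action.rstrip(":"), package


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (gap, previous_entry, next_entry) for gaps above threshold."""
    entries = [parse_line(l) for l in lines
               if l.strip() and l.strip() != "..."]
    return [(b[0] - a[0], a, b)
            for a, b in zip(entries, entries[1:])
            if b[0] - a[0] > threshold]


LOG = """\
Mar 30 21:13:16 Updated: gnome-common-2.26.0-1.fc11.noarch
Mar 30 21:13:17 Updated: libselinux-2.0.79-4.fc11.i586
Mär 30 21:25:04 Erased: tkinter
Mär 30 21:25:07 Erased: PersonalCopy-Lite-patches
""".splitlines()

for gap, before, after in find_gaps(LOG):
    print(gap, before[1], "->", after[1])  # 0:11:47 Updated -> Erased
```

A gap alone does not prove a separate transaction; it only marks where to look in the shell history.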

Lennart, please check your root's command history and see what exact yum commands have been run around that time. From the log it seems that there's been what amounts to self-destruct 'yum remove' command issued on March 30th after the big update.

Comment 26 Paul Bolle 2009-04-02 11:12:42 UTC
(In reply to comment #25)
> So setup, and big pile of other stuff like libgcc, bash, etc has been *removed*
> - no wonder there's no /etc/passwd or much anything left. There's a twelve
> minute time break in times between last "Updated" and the first "Erased" which
> almost certainly means a new transaction starting there (iirc you can't even
> perform simultaneous install/update/erase with yum).

Also notable:
- the sudden change in arch for libselinux (x86_64 -> i586)
- the change in language of the month in the printed messages (English -> German?)

Comment 27 Panu Matilainen 2009-04-02 12:00:52 UTC
Hmm, the 12 minute time jump might be an artifact of how yum logs things: "Cleanup" actions (ie erase caused by upgrade) aren't logged, only obsoletions and real removals are logged as "Erased".

In any case, the issue here is that something added a huge pile of erasure
elements that shouldn't have been there. Now we just need to figure out whether
it is rpm or yum and what triggers it; so far I haven't been able to reproduce it.

I am seeing some other strangeness in logging times though (this from a chrooted test-upgrade from f10 to rawhide):
...
Apr 02 14:25:18 Updated: 1:gdm-user-switch-applet-2.26.0-7.fc11.x86_64
Apr 02 14:25:32 Installed: gnome-bluetooth-2.27.1-4.fc11.x86_64
Apr 02 14:25:33 Updated: bluez-4.34-1.fc11.x86_64
Apr 02 07:27:06 Erased: libdhcp4client
Apr 02 07:27:24 Erased: bluez-gnome
Apr 02 07:27:35 Erased: pulseaudio-core-libs

Comment 28 seth vidal 2009-04-02 12:55:01 UTC
Here's my guess:

1. yum update was run - something (probably a crash of something like X or selinux or dbus) took down the system mid-transaction
2. depending on the version of yum he had on there at the time - he ran yum-complete-transaction which finished the erasure portion of the update process
3. something in the erasure portion went wrong.

Lennart,
 Does that sound familiar?

Comment 29 David Woodhouse 2009-04-02 13:05:52 UTC
Created attachment 337802 [details]
yum.log

Seth, your guess fairly much precisely matches what just triggered it for me (yeah, I picked last night to upgrade from F-10 to rawhide; I'm a masochist).

Yum died because the X session restarted itself (and brought me back to a new session, bizarrely. The X server didn't crash and restart and bring me back to the gdm login screen). I restarted it with yum-complete-transaction.

Comment 30 Jeff Layton 2009-04-02 13:34:31 UTC
Similar situation for me. I was yum updating over an ssh session and lost connectivity. When I logged back in, I ran yum-complete-transaction and then rebooted, at which point the boot failed due to missing /etc/passwd.

Comment 31 Lennart Poettering 2009-04-02 14:08:55 UTC
(In reply to comment #25)
> Looking at the yum.log from comment #10:
> 
> ...
> Mar 30 21:13:13 Updated: 1:gnome-applets-2.25.92-4.fc11.x86_64
> Mar 30 21:13:16 Updated: libselinux-devel-2.0.79-4.fc11.x86_64
> Mar 30 21:13:16 Updated: gnome-common-2.26.0-1.fc11.noarch
> Mar 30 21:13:17 Updated: libselinux-2.0.79-4.fc11.i586
> Mär 30 21:25:04 Erased: tkinter
> Mär 30 21:25:07 Erased: PersonalCopy-Lite-patches
> Mär 30 21:25:09 Erased: audit-libs
> Mär 30 21:25:11 Erased: opal
> ...
> Mär 30 21:28:35 Erased: setup
> ...
> Mär 30 21:29:55 Erased: libgcc
> ...
> 
> So setup, and big pile of other stuff like libgcc, bash, etc has been *removed*
> - no wonder there's no /etc/passwd or much anything left. There's a twelve
> minute time break in times between last "Updated" and the first "Erased" which
> almost certainly means a new transaction starting there (iirc you can't even
> perform simultaneous install/update/erase with yum).
> 
> Lennart, please check your root's command history and see what exact yum
> commands have been run around that time. From the log it seems that there's
> been what amounts to self-destruct 'yum remove' command issued on March 30th
> after the big update.  

Dude, I am not stupid. I did a "yum upgrade" that's all. And bash is still there. No clue why yum claims they got removed. Possibly some multi-arch issue? Or an upgrade? (i.e. first install new package, then remove old package?) Also bash, libgcc are not /etc/passwd.

Comment 32 Panu Matilainen 2009-04-02 14:23:43 UTC
(In reply to comment #31)
> Dude, I am not stupid. I did a "yum upgrade" that's all. 

Nobody intentionally removes half the system, I'm just trying to find out where the bug is. Others have had similar experience with upgrade crashing somewhere in the middle and then tried yum-complete-transaction which ended up erasing things it certainly shouldn't have (see comments 28-30). 

So just to be certain: did you use yum-complete-transaction on the system? If not, then we're back to the drawing board here.

Comment 33 John Reiser 2009-04-02 14:42:36 UTC
In my case, I was doing "yum --nogpgcheck localupdate *.fc11.$ARCH.rpm
*.fc11.noarch.rpm" from directory /var/cache/pungi/rawhide/packages after "rm
-f $(repomanage -o .)"  This would update at least several dozen packages.  The
rpm_debug_check and transaction check succeeded, and the replacements started. 
Then yum failed after (during?) replacement of bash (which was something like
the fifth or sixth package replaced), claiming that some package (dependency?)
was already installed (or something).  Inspecting the results, "rpm -q bash"
showed that two different versions of bash (for the same $ARCH) were installed.
 I removed the older one by hand with "rpm --erase".

I tried re-invoking the localupdate, and received the notice that there were
pending transactions, and recommending yum-complete-transaction.  So I invoked
yum-complete-transaction, and saw that yum proposed to remove hundreds of
packages.  I said, "No, thank you" to that.  Instead I invoked
"yum-complete-transaction --cleanup-only" and then "yum update", suffering the
re-download of all those packages, but saving sanity.

This whole process happened about three times total in the last week or so, on
both i386 and x86_64; but the most recent runs of localupdate from my pungi
caches succeeded.  [I considered filing a bug report, but I was more interested
in testing Rescue mode of my newly-composed DVDs, and thought that explaining
it all would be messy, and that my use case would be ridiculed as outrageous.]

Comment 34 Jeff Johnson 2009-04-02 15:51:58 UTC
Hint: FA_SKIP != FA_BACKUP.

Reproducer:
 Take an upgrade transaction.
  Split the installs from the erases.
  Run the upgrade transaction, terminate after installs are done.

   (wait a bit so that yum.log shows clearly that we're
   gonna run a new&different transaction)

 Now run the erases.

Have fun!

Comment 35 Erik van Pienbroek 2009-04-02 22:06:30 UTC
I've also been hit by this bug. Starting point was Rawhide which was fully updated (all packages which were available before/during the beta freeze). After the unfreeze, I ran 'yum update' and during the installation of 'glibc-common' the computer froze.

After a reboot I first manually installed glibc* which was hanging around in /var/cache/yum using 'rpm -Uvh --replacefiles --replacepkgs' (so that my glibc installation was sane..translations didn't work anymore).

Afterwards I ran 'yum-complete-transaction'. This caused files like /etc/passwd to be renamed to .rpmsave.

Comment 36 Panu Matilainen 2009-04-03 07:59:39 UTC
Fixed in rpm-4.7.0-0.beta1.9.fc11: file states of skipped files (such as %config(noreplace) on upgrade, and %ghost files) weren't getting recorded correctly, causing them to be inappropriately handled.

The fixed rpm won't magically fix the incorrect states already in rpmdb, but further updates will correct the issue, and the issue should only be generally seen in special circumstances like when running yum-complete-transaction.
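
To make the failure mode easier to picture, here is a toy model in Python. It is not rpm code: the "rpmdb" is a plain dict, the state names and package names are illustrative, and the assumption that a wrongly recorded state makes the file look unowned by the new package is mine. It models an upgrade whose install half completes and whose erase half runs later as a separate step, as with yum-complete-transaction.

```python
# Toy model only. State names are illustrative, not rpm's.
NORMAL, NOT_INSTALLED = "normal", "not-installed"


def upgrade_install(rpmdb, pkg, files, record_skipped_correctly):
    """Install the new package; its %config(noreplace) files are skipped,
    meaning the copy already on disk is kept."""
    state = NORMAL if record_skipped_correctly else NOT_INSTALLED
    rpmdb[pkg] = {path: state for path in files}


def erase(rpmdb, pkg, disk):
    """Erase pkg. A file that another package owns in NORMAL state is
    left alone; otherwise the (modified) config file is backed up."""
    files = rpmdb.pop(pkg)
    for path in files:
        shared = any(other.get(path) == NORMAL for other in rpmdb.values())
        if not shared and path in disk:
            disk[path + ".rpmsave"] = disk.pop(path)


def run(record_skipped_correctly):
    disk = {"/etc/passwd": "locally modified"}
    rpmdb = {"setup-old": {"/etc/passwd": NORMAL}}
    upgrade_install(rpmdb, "setup-new", ["/etc/passwd"],
                    record_skipped_correctly)
    erase(rpmdb, "setup-old", disk)  # the erase half, run separately
    return sorted(disk)


print(run(False))  # ['/etc/passwd.rpmsave']
print(run(True))   # ['/etc/passwd']
```

With the state recorded wrongly, the later erase of the old package no longer sees /etc/passwd as shared with the new one and moves it aside, which matches the symptom reported in this bug.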

Comment 37 seth vidal 2009-04-03 20:53:19 UTC
The bug making the session abort but not logout is here:

https://bugzilla.redhat.com/show_bug.cgi?id=494046