Bug 235524 - rpm leaves old file behind when upgrading hal
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Panu Matilainen
QA Contact:
URL:
Whiteboard: bzcl34nup
Depends On:
Blocks: multilib
Reported: 2007-04-06 17:59 UTC by David Woodhouse
Modified: 2008-05-07 01:26 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2008-05-07 01:26:41 UTC
Type: ---
Embargoed:


Attachments
Output from 'rpm -Uvv hal-0.5.9-5.fc7.ppc.rpm hal-libs-0.5.9-5.fc7.ppc64.rpm hal-libs-0.5.9-5.fc7.ppc.rpm' (77.42 KB, text/plain)
2007-04-27 07:59 UTC, David Woodhouse

Description David Woodhouse 2007-04-06 17:59:28 UTC
Initial state: old hal packages are installed -- both arches. /usr/sbin/hald is
64-bit:

[root@ps3 hal]# rpm -qa | grep ^hal
hal-0.5.9-0.git20070401.1.fc7
hal-gnome-0.5.9-0.git20070401.1.fc7
hal-devel-0.5.9-0.git20070401.1.fc7
hal-devel-0.5.9-0.git20070401.1.fc7
hal-0.5.9-0.git20070401.1.fc7
[root@ps3 hal]# rpm -qV hal.ppc64 | grep hald
.......T c /etc/rc.d/init.d/haldaemon
[root@ps3 hal]# rpm -qV hal.ppc64 | grep /usr/sbin/hald
[root@ps3 hal]# rpm -qV hal.ppc | grep /usr/sbin/hald
S.5....T   /usr/sbin/hald
[root@ps3 hal]# file /usr/sbin/hald ; md5sum /usr/sbin/hald
/usr/sbin/hald: ELF 64-bit MSB executable, 64-bit PowerPC or cisco 7500, version
1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped
a382beb878be3060161a7f2ed900a24c  /usr/sbin/hald



Upgrade hal, and now we should have only the 32-bit version of hal installed,
because of the Obsoletes: hal < 0.5.9-1 in hal-libs.

[root@ps3 hal]# rpm -Uhv hal-0.5.9-2.fc7.ppc.rpm hal-devel-0.5.9-2.fc7.ppc64.rpm
hal-devel-0.5.9-2.fc7.ppc.rpm hal-libs-0.5.9-2.fc7.ppc.rpm
hal-libs-0.5.9-2.fc7.ppc64.rpm hal-gnome-0.5.9-2.fc7.ppc.rpm
Preparing...                ########################################### [100%]
   1:hal-libs               ########################################### [ 17%]
   2:hal-libs               ########################################### [ 33%]
   3:hal                    ########################################### [ 50%]
   4:hal-devel              ########################################### [ 67%]
   5:hal-devel              ########################################### [ 83%]
   6:hal-gnome              ########################################### [100%]
[root@ps3 hal]# rpm -qa --qf %{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\\n | grep ^hal
hal-0.5.9-2.fc7.ppc
hal-libs-0.5.9-2.fc7.ppc
hal-libs-0.5.9-2.fc7.ppc64
hal-devel-0.5.9-2.fc7.ppc64
hal-gnome-0.5.9-2.fc7.ppc
hal-devel-0.5.9-2.fc7.ppc


But hald didn't actually change!

[root@ps3 hal]# file /usr/sbin/hald ; md5sum /usr/sbin/hald
/usr/sbin/hald: ELF 64-bit MSB executable, 64-bit PowerPC or cisco 7500, version
1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped
a382beb878be3060161a7f2ed900a24c  /usr/sbin/hald


RPM's own verification seems confused now...

[root@ps3 hal]# rpm -qV hal.ppc
[root@ps3 hal]# rpm -qV hal.ppc64
[root@ps3 hal]# rpm -qV hal.ppc62
package hal.ppc62 is not installed
[root@ps3 hal]# rpm -e hal.ppc64
[root@ps3 hal]# rpm -q hal.ppc64
[root@ps3 hal]# rpm -q hal.ppc62
package hal.ppc62 is not installed

Comment 1 Jeff Johnson 2007-04-07 01:52:24 UTC
rpm always prefers ELF64 executables.

What do you expect from -qV? query and verify are different rpm modes.
The command you likely wanted was
    rpm -V hal
Add -v if you want to see every file checked.

What does "rpm -q --qf '%{name}.%{arch}\n' hal" say?

Comment 2 David Woodhouse 2007-04-07 02:10:10 UTC
(In reply to comment #1)
> rpm always prefers ELF64 executables.

That's considered a bug by some on ppc, where we actually prefer 32-bit
executables. There's a lot to be said for removing this facility from rpm, and
just banning conflicts on executables instead.

And it should be irrelevant here anyway -- we don't have the 64-bit package
installed at all any more. I went from having hal-v1.ppc and hal-v1.ppc64 both
installed, to having only hal-v2.ppc installed. Yet the file left in the
filesystem was the one from the hal-v1.ppc64 package. That's just broken.
 
> What does "rpm -q --qf '%{name}.%{arch}\n' hal" say?

hal.ppc



Comment 3 Jeff Johnson 2007-04-07 02:21:34 UTC
Having 8+ ppc arches is considered a bug by many, as only elf32/elf64 is meaningful.

The only reason that rpm implements
    Always prefer ELF64.
is that only the elf64 /sbin/ldconfig had support for both elf32/elf64; the elf32 version
only supported elf32. That forced the policy in the original implementation.

You can certainly detect executable conflicts. Again, the marching orders for
multilib (which I think is seriously deficient in many ways) were
   1) libraries on separate paths
   2) executables don't matter (in the sense that, say, /bin/ls functionality is independent of elf32/elf64)

I implemented what was requested, exactly as requested. It was a job mon.

What exactly do you expect having
   1) both elf32 and elf64 packages installed.
   2) a path shared between the 2 packages that happens to be occupied by elf64.
   3) removing the elf64 package.

Should rpm remove the elf64 executable or not? Note that there is no way to
replace the elf64 with the elf32 executable without access to the original package,
that file done been discarded.

Swap elf32 <-> elf64 in the above scenario, you have exactly the same problem.

I suggest you stop attempting to put differing content on the same path and
change your packaging.

Comment 4 David Woodhouse 2007-04-07 02:34:28 UTC
(In reply to comment #3)
> What exactly do you expect having
>    1) both elf32 and elf64 packages installed.
>    2) a path shared between the 2 packages that happens to be occupied by elf64.
>    3) removing the elf64 package.


You misunderstand. Both elf64 and elf32 packages were removed, and a _new_ elf32
package was installed. At least, that's what RPM was asked for and that's what
the rpmdb says. But the files left behind on the filesystem are still from the
_old_ package.

> Should rpm remove the elf64 executable or not? Note that there is no way to
> replace the elf64 with the elf32 executable without access to the original
> package, that file done been discarded.

No. There is a _new_ elf32 package installed.
  
> I suggest you stop attempting to put differing content on the same path and
> change your packaging.

Er, yeah. I buy that argument 100%. That's what we _did_. That's why it's
removing the conflicting 32-bit and 64-bit packages, and replacing them with a
single 32-bit package.

Comment 5 Jeff Johnson 2007-04-24 18:49:44 UTC
Here's the original state on CentOS5/x86_64:

# rpm -qs hal | grep /usr/sbin/hald
normal        /usr/sbin/hald
wrong color   /usr/sbin/hald

i.e. the elf64 version of /usr/sbin/hald is installed, the elf32 version is "wrong color" (i.e. not installed),
and both hal.x86_64 and hal.i386 packages are installed.
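
For anyone who wants to check this state on their own box, here is a minimal Python sketch that groups the state column of `rpm -qs` output by path and reports paths that are "normal" for one package and "wrong color" for another. It assumes only the two-column layout shown above (state text, then an absolute path); the function names are made up for illustration and are not part of rpm.

```python
from collections import defaultdict


def states_by_path(qs_output):
    """Map each path to the list of states reported for it, in input order.

    Assumes each line looks like "<state text>   /absolute/path", as in the
    `rpm -qs` output quoted above. Lines without a state column are skipped.
    """
    states = defaultdict(list)
    for line in qs_output.splitlines():
        line = line.strip()
        slash = line.find("/")
        if slash <= 0:
            # No path on this line, or no state text before the path.
            continue
        states[line[slash:]].append(line[:slash].strip())
    return dict(states)


def shadowed_paths(qs_output):
    """Paths installed from one package and skipped as 'wrong color' in another."""
    return sorted(
        path
        for path, states in states_by_path(qs_output).items()
        if "normal" in states and "wrong color" in states
    )


if __name__ == "__main__":
    sample = (
        "normal        /usr/sbin/hald\n"
        "wrong color   /usr/sbin/hald\n"
    )
    print(shadowed_paths(sample))
```

Feeding it the full output of `rpm -qs hal` (without the grep) should list every path shared between the two arches, assuming no path contains whitespace followed by a slash.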

Comment 6 David Woodhouse 2007-04-24 18:52:18 UTC
Ok, now do an update which removes both the existing packages and installs one
new 32-bit package. What happens?

Comment 7 Jeff Johnson 2007-04-24 19:15:47 UTC
After stubbing out the FC67 ConsoleKit dependency (not present on CentOS5),
this is the minimal necessary upgrade transaction:

    # rpm -Uvv hal-0.5.9-5.fc7.x86_64.rpm hal-libs-0.5.9-5.fc7.x86_64.rpm hal-libs-0.5.9-5.fc7.i386.rpm hal-info-20070402-1.fc7.noarch.rpm hal-gnome-0.5.9-5.fc7.x86_64.rpm

(aside) note the loop which needs repairing:

D: ========== tsorting packages (order, #predecessors, #succesors, tree, Ldepth, Rbreadth)
D:     0    0    1    3    0    0 +hal-libs-0.5.9-5.fc7.i386
D:     1    0    1    2    0    1 +hal-libs-0.5.9-5.fc7.x86_64
D:     2    0    0    4    0    2 -hal-0.5.8.1-19.el5.i386
D:     3    0    0    1    0    3 -hal-0.5.8.1-19.el5.x86_64
D: LOOP:
D: removing hal-info-20070402-1.fc7.noarch "Requires: hal >= 0.5.9" from tsort relations.
D:     hal-info-20070402-1.fc7.noarch           Requires: hal >= 0.5.9
D: removing hal-0.5.9-5.fc7.x86_64 "Requires: hal-info" from tsort relations.
D:     hal-0.5.9-5.fc7.x86_64                   Requires: hal-info
D: ========== continuing tsort ...
D:     4    1    1   -1    0    4 +hal-info-20070402-1.fc7.noarch
D: ========== successors only (150191 bytes)
D:     5    3    1   -1    0    5 +hal-0.5.9-5.fc7.x86_64


The flaw in this bug is in computing the file resolution for erasing /usr/sbin/hald (and other elf64 
executables). Note the "unknown" below:

D: ========== --- hal-0.5.8.1-19.el5 i386-linux 0x1 
...
D: fini      040755  3 (   0,   0)        4096 /usr/share/doc/hal-0.5.8.1 skip
D: fini      100755  1 (   0,   0)      268024 /usr/sbin/hald unknown
D: fini      100755  1 (   0,   0)       19672 /usr/libexec/hald-runner unknown
D: fini      100755  1 (   0,   0)       38280 /usr/libexec/hald-probe-volume unknown
...

Examining the ordering, one finds the reason for the "unknown" state.

In order to compute a resolution, rpm walks hash buckets in the order that
package installs/erases occur.

The file resolution computation depends intimately on installs occurring before erases.

Because the hal <-> hal-info loop delayed ordering of hal, the resolution was computed
incorrectly.
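
To make the ordering dependence concrete, here is a deliberately simplified toy model in Python. It is NOT rpm's code: the rule "an erased file is 'skip' if an install owning the same path was already processed, otherwise 'unknown'" is an assumption chosen only to mirror the debug output above, and `erase_dispositions` is a hypothetical name.

```python
def erase_dispositions(elements, path):
    """Toy model: disposition of `path` for each erased package.

    `elements` is the transaction in processing order, as tuples of
    (op, package, paths) where op is "+" (install) or "-" (erase).
    Assumption for illustration only: an erase resolves to "skip" when an
    install that owns the same path has already been seen, else "unknown".
    """
    install_seen = False
    result = {}
    for op, package, paths in elements:
        if path not in paths:
            continue
        if op == "+":
            install_seen = True
        elif op == "-":
            result[package] = "skip" if install_seen else "unknown"
    return result


HALD = "/usr/sbin/hald"

# Order produced when the hal <-> hal-info loop delays the new hal.
with_loop = [
    ("-", "hal-0.5.8.1-19.el5.i386", {HALD}),
    ("-", "hal-0.5.8.1-19.el5.x86_64", {HALD}),
    ("+", "hal-0.5.9-5.fc7.x86_64", {HALD}),
]

# Order one would expect with no loop: install first, then the erases.
without_loop = [
    ("+", "hal-0.5.9-5.fc7.x86_64", {HALD}),
    ("-", "hal-0.5.8.1-19.el5.x86_64", {HALD}),
    ("-", "hal-0.5.8.1-19.el5.i386", {HALD}),
]

if __name__ == "__main__":
    print(erase_dispositions(with_loop, HALD))
    print(erase_dispositions(without_loop, HALD))
```

The same packages and the same path give different answers purely because of where the install lands in the order, which is the point being made here.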

The next step is to verify the explanation above by adding a dependency whiteout.

Comment 8 Jeff Johnson 2007-04-24 19:49:06 UTC
After restoring to initial state, I've added to /etc/rpm/macros

    %_dependency_whiteout_system    hal>hal-info

which breaks the loop.

Now the same upgrade as before looks like

D: ========== tsorting packages (order, #predecessors, #succesors, tree, Ldepth, Rbreadth) 
D:     0    0    1    3    0    0 +hal-libs-0.5.9-5.fc7.i386
D:     1    0    1    2    0    1 +hal-libs-0.5.9-5.fc7.x86_64
D:     2    2    2    2    1    0   +hal-0.5.9-5.fc7.x86_64
D: ========== successors only (529020 bytes)
D:     3    0    0    1    0    2 -hal-0.5.8.1-19.el5.x86_64
D:     4    0    0    4    0    3 -hal-0.5.8.1-19.el5.i386
D:     5    1    0    2    2    0     +hal-info-20070402-1.fc7.noarch
D:     6    1    0    2    2    1     +hal-gnome-0.5.9-5.fc7.x86_64

Note no loop.

And the disposition for /usr/sbin/hald (and other executables) erasing old hal.i386 and hal.x86_64 is 
"skip":

D: fini      040755  3 (   0,   0)        4096 /usr/share/doc/hal-0.5.8.1 skip
D: fini      100755  1 (   0,   0)      296264 /usr/sbin/hald skip
D: fini      100755  1 (   0,   0)       15840 /usr/libexec/hald-runner skip
D: fini      100755  1 (   0,   0)       35816 /usr/libexec/hald-probe-volume skip
D: fini      100755  1 (   0,   0)       31744 /usr/libexec/hald-probe-storage skip
D: fini      100755  1 (   0,   0)       11960 /usr/libexec/hald-probe-smbios skip
D: fini      100755  1 (   0,   0)        7928 /usr/libexec/hald-probe-serial skip
D: fini      100755  1 (   0,   0)       10040 /usr/libexec/hald-probe-printer skip
D: fini      100755  1 (   0,   0)        8048 /usr/libexec/hald-probe-pc-floppy skip
D: fini      100755  1 (   0,   0)        9024 /usr/libexec/hald-probe-input skip

which is correct, because the new hal.x86_64 has just been installed.

The final test is checking what is actually installed:
# file /usr/sbin/hald
/usr/sbin/hald: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, 
dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped

Which is "correct" because I asked for hal.x86_64, not hal.i386.

So could you try fixing the hal <-> hal-info loop and report your ppc/ppc64 experience please?

I do *NOT* see a multilib problem yet, only a dependency loop flaw in packaging.

Yes, there are other multilib problems, just not this bug (yet).

NEEDINFO

Comment 9 David Zeuthen 2007-04-25 15:47:24 UTC
(In reply to comment #8)
> I do *NOT* see a multilib problem yet, only a dependency loop flaw in packaging.

Care to explain why it's a bug that hal <-> hal-info? Thanks.


Comment 10 Jeff Johnson 2007-04-25 17:17:53 UTC
I'm not sure what you are asking. Perhaps you are asking how to express the
semantic requirement that hal must indeed have hal-info to function properly
when installed, and that hal-info indeed needs a specific version of hal.

The answer lies in dependency contexts. Likely (I have not looked), nothing
in the hal and hal-info packages is executed while installing, only after being
installed.

So one way to break the loop is to add dependency context markers (like Requires(post))
so that the loop does not exist while installing or erasing hal and hal-info.

Rearranging packaging is another approach to break the loop.

Adding a dependency whiteout is a 3rd approach.

The real flaw needs to be fixed in dependency loop handling.

The current tsort implementation runs until empty, then checks for loops.
The end result is that the new hal is installed after, not before, the old hal
packages are erased, breaking the computation of file resolutions.

Adding an ordering relation for tsort (as smart does) to force all erasures after
whatever package caused the erasure is the best long term solution. I shall be
attempting that in rpm-4.4.9 shortly.

But that assumes that erasures are ordered (not true in rpm-4.4.2) and is a very
different topic.
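
A rough sketch of that extra ordering relation, using Python's standard `graphlib` (3.9+) rather than rpm's own tsort. The `requires` and `erased_by` mappings and the function name are hypothetical; the only idea taken from the discussion is that each erasure gets an edge placing it after the install that caused it.

```python
from graphlib import TopologicalSorter


def order_transaction(requires, erased_by):
    """Return one valid processing order for a transaction.

    requires:  {element: [elements that must be processed before it]}
    erased_by: {erased element: install element that caused the erasure}

    Sketch only: the second loop adds the extra relation so that every
    erasure sorts after the install that triggered it.
    """
    sorter = TopologicalSorter()
    for element, predecessors in requires.items():
        sorter.add(element, *predecessors)
    for erased, installer in erased_by.items():
        sorter.add(erased, installer)
    # static_order() raises graphlib.CycleError if a dependency loop remains.
    return list(sorter.static_order())


if __name__ == "__main__":
    requires = {
        "+hal": ["+hal-libs.i386", "+hal-libs.x86_64"],
        "+hal-info": ["+hal"],
    }
    erased_by = {
        "-hal.i386 (old)": "+hal",
        "-hal.x86_64 (old)": "+hal",
    }
    for step in order_transaction(requires, erased_by):
        print(step)
```

Note that the example input has the hal <-> hal-info loop already broken. With the loop left in, `static_order()` raises `CycleError` instead of producing an order, so loop handling would still need its own answer.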

Eliminating the loop so that hal upgrades correctly is all that I can think of for FC.

Meanwhile, since all I have for multilib testing is x86_64, which prefers elf64, not
elf32, I patiently wait for dwmw2's report about whether removing the loop fixes
the problem on ppc/ppc64.

Comment 11 David Woodhouse 2007-04-27 07:03:40 UTC
(In reply to comment #10)
> Meanwhile, since all I have for multilib testing is x86_64, which prefers
> elf64, not elf32, I patiently wait for dwmw2's report about whether removing
> the loop fixes the problem on ppc/ppc64.

I keep offering you accounts on ppc64. Anyway, with the patch I just added to
bug #235757 you should be able to set '%_prefer_color 1' on your x86_64 and
reproduce the problem locally.


Comment 12 David Woodhouse 2007-04-27 07:59:19 UTC
Created attachment 153587
Output from 'rpm -Uvv hal-0.5.9-5.fc7.ppc.rpm hal-libs-0.5.9-5.fc7.ppc64.rpm hal-libs-0.5.9-5.fc7.ppc.rpm'

Even without the loop, it doesn't work.

Comment 13 Jeff Johnson 2007-04-27 10:41:58 UTC
This is the multilib flaw
    D: fini      100755  1 (   0,   0)    337764 /usr/sbin/hald;4631acad skipcolor

Offer (and patch) appreciated, but I won't be beholden to @redhat.com. Fork you!



Comment 14 David Woodhouse 2007-04-27 11:09:40 UTC
I wasn't offering access to redhat.com machines. These would be my own. If you
change your mind, just send me a SSH public key. But with that other patch you
ought to be able to reproduce on x86_64 anyway. 

In fact, even without it you could have gone from hal.i386+hal.x86_64 to
hal.i386+hal-libs.i386+hal-libs.x86_64, and observed the problem, surely?

Comment 15 Jeff Johnson 2007-04-27 15:46:06 UTC
As long as you don't own any ia64's, I may take you up on your kind and generous offer.

Yep, I could have reproduced the problem in any of a number of ways, including
installing x86_64 on i686, the same way that all of the multilib code in rpm was written.

But I'm a lazy schmuck, and I needed an attribution to someone other than me for the change.
You will do ;-) ;-)

The %_prefer_color patch is being added now.

Comment 16 Red Hat Bugzilla 2007-08-21 05:33:18 UTC
User pnasrat's account has been closed

Comment 17 Panu Matilainen 2007-08-22 06:34:34 UTC
Reassigning to owner after bugzilla made a mess, sorry about the noise...

Comment 18 Bug Zapper 2008-04-03 23:59:18 UTC
Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

Comment 19 Bug Zapper 2008-05-07 01:26:39 UTC
This bug has been in NEEDINFO for more than 30 days since feedback was
first requested. As a result we are closing it.

If you can reproduce this bug in the future against a maintained Fedora
version please feel free to reopen it against that version.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

