Bug 119185 (readonlyrpmdb) - if rpm exits in mid-transaction, the RPM data base can be left in an inconsistent state
Status: CLOSED UPSTREAM
Alias: readonlyrpmdb
Product: Fedora
Classification: Fedora
Component: rpm
Version: 1
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Paul Nasrat
Reported: 2004-03-26 01:17 UTC by Tethys
Modified: 2011-02-23 09:16 UTC
Doc Type: Bug Fix
Last Closed: 2006-02-20 22:45:18 UTC

Description Tethys 2004-03-26 01:17:41 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624

Description of problem:
I have /usr mounted readonly. I forgot to remount it readwrite
before doing a yum update, which resulted in the following
output:

leto:~# yum update
Gathering header information file(s) from server(s)
Server: Fedora Core 1 - i386 - Base
Server: Fedora Core 1 - i386 - Released Updates
Finding updated packages
Downloading needed headers
getting
/var/cache/yum/updates-released/headers/ghostscript-0-7.07-15.2.i386.hdr
ghostscript-0-7.07-15.2.i 100% |=========================|  29 kB   
00:00     
getting /var/cache/yum/updates-released/headers/xmms-1-1.2.10-1.p.i386.hdr
xmms-1-1.2.10-1.p.i386.hd 100% |=========================| 8.7 kB   
00:00     
getting
/var/cache/yum/updates-released/headers/openssl-devel-0-0.9.7a-33.10.i386.hdr
openssl-devel-0-0.9.7a-33 100% |=========================|  24 kB   
00:00     
getting
/var/cache/yum/updates-released/headers/openssl-0-0.9.7a-33.10.i686.hdr
openssl-0-0.9.7a-33.10.i6 100% |=========================| 8.9 kB   
00:00     
getting
/var/cache/yum/updates-released/headers/openssl096-0-0.9.6-26.i386.hdr
openssl096-0-0.9.6-26.i38 100% |=========================| 4.8 kB   
00:00     
getting
/var/cache/yum/updates-released/headers/xcin-0-2.5.3.pre3-21.fc1.2.i386.hdr
xcin-0-2.5.3.pre3-21.fc1. 100% |=========================| 6.6 kB   
00:00     
getting
/var/cache/yum/updates-released/headers/xmms-devel-1-1.2.10-1.p.i386.hdr
xmms-devel-1-1.2.10-1.p.i 100% |=========================| 5.2 kB   
00:00     
getting
/var/cache/yum/updates-released/headers/ghostscript-devel-0-7.07-15.2.i386.hdr
ghostscript-devel-0-7.07- 100% |=========================| 6.4 kB   
00:00     
getting
/var/cache/yum/updates-released/headers/xmms-skins-1-1.2.10-1.p.i386.hdr
xmms-skins-1-1.2.10-1.p.i 100% |=========================| 5.8 kB   
00:00     
getting
/var/cache/yum/updates-released/headers/openssl096b-0-0.9.6b-18.i386.hdr
openssl096b-0-0.9.6b-18.i 100% |=========================| 5.9 kB   
00:00     
getting
/var/cache/yum/updates-released/headers/openssl-perl-0-0.9.7a-33.10.i386.hdr
openssl-perl-0-0.9.7a-33. 100% |=========================| 6.3 kB   
00:00     
getting /var/cache/yum/updates-released/headers/hpijs-0-1.5-4.2.i386.hdr
hpijs-0-1.5-4.2.i386.hdr  100% |=========================| 6.5 kB   
00:00     
Resolving dependencies
Dependencies resolved
I will do the following:
[update: xmms 1:1.2.10-1.p.i386]
[update: openssl 0.9.7a-33.10.i686]
[update: ghostscript 7.07-15.2.i386]
[update: openssl-devel 0.9.7a-33.10.i386]
Is this ok [y/N]: y
Getting xmms-1.2.10-1.p.i386.rpm
xmms-1.2.10-1.p.i386.rpm  100% |=========================| 1.9 MB   
00:35     
Getting openssl-0.9.7a-33.10.i686.rpm
openssl-0.9.7a-33.10.i686 100% |=========================| 1.1 MB   
00:19     
Getting ghostscript-7.07-15.2.i386.rpm
ghostscript-7.07-15.2.i38 100% |=========================| 7.5 MB   
02:17     
Getting openssl-devel-0.9.7a-33.10.i386.rpm
openssl-devel-0.9.7a-33.1 100% |=========================| 1.6 MB   
00:30     
Running test transaction:
Test transaction complete, Success!
openssl 100 % done 1/8 
error: unpacking of archive failed on file /usr/bin/openssl;40637f96:
cpio: open
xmms 100 % done 2/8 
error: unpacking of archive failed on file /usr/bin/wmxmms;40637f96:
cpio: open
ghostscript 100 % done 3/8 
error: unpacking of archive failed on file /usr/bin/bdftops;40637f96:
cpio: open
openssl-devel 100 % done 4/8 
error: unpacking of archive failed on file /usr/include/openssl: cpio:
chown
Completing update for xmms  - 5/8
Completing update for openssl  - 6/8
Completing update for ghostscript  - 7/8
Updated:  xmms 1:1.2.10-1.p.i386 openssl 0.9.7a-33.10.i686 ghostscript
7.07-15.2.i386 openssl-devel 0.9.7a-33.10.i386
Transaction(s) Complete

Not a problem, I thought -- just remount it readwrite and try again:

leto:~# yum update
Gathering header information file(s) from server(s)
Server: Fedora Core 1 - i386 - Base
Server: Fedora Core 1 - i386 - Released Updates
Finding updated packages
Downloading needed headers
getting
/var/cache/yum/updates-released/headers/openssl-0-0.9.7a-33.10.i386.hdr
openssl-0-0.9.7a-33.10.i3 100% |=========================| 8.9 kB   
00:00     
Resolving dependencies
.Dependencies resolved
I will do the following:
[update: openssl-devel 0.9.7a-33.10.i386]
I will install/upgrade these to satisfy the dependencies:
[deps: openssl 0.9.7a-33.10.i686]
Is this ok [y/N]: y
Running test transaction:
Test transaction complete, Success!
openswarning: /usr/share/ssl/openssl.cnf created as
/usr/share/ssl/openssl.cnf.rpmnew
openssl 100 % done 1/3 
openssl-devel 100 % done 2/3 
Completing update for openssl-devel  - 3/3
Updated:  openssl-devel 0.9.7a-33.10.i386
Dep Installed:  openssl 0.9.7a-33.10.i686
Transaction(s) Complete

That's odd... it seems to have done openssl, but what about
ghostscript and xmms? Somehow it's managed to delete them from
the rpm database:

leto:~# rpm -q ghostscript xmms
package ghostscript is not installed
package xmms is not installed
leto:~# ls -l /usr/bin/{xmms,gs}
-rwxr-xr-x  1 root root 3172440 Jan 15 12:29 /usr/bin/gs
-rwxr-xr-x  1 root root 1086504 Oct 22 19:50 /usr/bin/xmms
leto:~# rpm -qf /usr/bin/{xmms,gs}
file /usr/bin/xmms is not owned by any package
file /usr/bin/gs is not owned by any package

Not ideal, so I tried:

leto:~# rpm --rebuilddb
leto:~# rpm -q ghostscript xmms
package ghostscript is not installed
package xmms is not installed

Now what? :-)

Not sure if this should be logged against yum or rpm.

Version-Release number of selected component (if applicable):
Unknown ("rpm -q yum" claims it's not installed)

How reproducible:
Didn't try


Additional info:

Comment 1 Jeff Johnson 2004-03-28 12:09:59 UTC
Why do you expect a package installer to "work" if the
majority of paths in packages begin with /usr and /usr is
readonly and cannot be written to ?!?

You can try adding to /etc/rpm/macros
    %_netsharedpath /usr
to disable installing files on paths that begin with "/usr/"


Comment 2 Tethys 2004-03-28 22:35:12 UTC
Please reread the bug report. I don't expect it to work. I do,
however, expect it to not corrupt the RPM database in the process.
Since having a readonly /usr is a perfectly valid state for the
machine, I would expect yum and/or rpm to be able to gracefully
handle that situation, and fail in a sane manner.

Comment 3 Jeff Johnson 2004-03-29 02:52:28 UTC
Please re-read my reply. Did you try
    %_netsharedpath /usr
    

Comment 4 Tethys 2004-03-29 16:32:28 UTC
Yes, I read your reply.

You suggested that as a way of disabling installing files to /usr.
Why on earth would I want to do that? Since RH (and now FC) packages
aren't relocatable, that would render my system unusable. What I want
is to have RPM fail gracefully when it can't write to a file.

As for asking if I've tried it? No. Since I currently have a borked
RPM database, I *can't* try it until I reinstall. How you can class
behaviour that leaves my system in that state as "NOTABUG" is beyond
me. Reopening...

Comment 5 Jeff Johnson 2004-03-29 19:08:48 UTC
How exactly is rpm to install files onto a RDONLY mount point?

%_netsharedpath is the existing mechanism for disabling
installs onto RDONLY mount points which you did not use.
That is NOTABUG in my book, ymmv.

Your database can be fixed by verifying what you have
installed is what you want. You might examine the
output of rpm -qa --last to see the order in
which pkgs were installed. You might also look
at /var/log/rpmpkgs*, which is a cron driven query
listing what was in your database at the time the query
was run.

Any missing packages are easily reinstalled.

"corruption" is different than "missing", there are procedures
to deal with that as well.

Comment 6 Tethys 2004-03-29 19:24:00 UTC
Sigh. I really feel like I'm banging my head against a wall with
this one.

>How exactly is rpm to install files onto a RDONLY mount point?

*** IT'S NOT *** I have never once said I wanted it to. I've said
I want it to fail without corrupting the database. That's what this
bug is about. The clue is in the title.

>You might also look
>at /var/log/rpmpkgs*, which is a cron driven query
>listing what was in your database at the time the query
>was run.

leto:~# grep xmms /var/log/rpmpkgs*
/var/log/rpmpkgs.2:xmms-1.2.8-3.p.i386.rpm
/var/log/rpmpkgs.3:xmms-1.2.8-3.p.i386.rpm
/var/log/rpmpkgs.4:xmms-1.2.8-3.p.i386.rpm
leto:~# ls -l /var/log/rpmpkgs*
-rw-r--r--  1 root root 23647 Mar 29 04:04 /var/log/rpmpkgs
-rw-r--r--  1 root root 22939 Mar 27 04:02 /var/log/rpmpkgs.1
-rw-r--r--  1 root root 22988 Mar 20 04:02 /var/log/rpmpkgs.2
-rw-r--r--  1 root root 23256 Mar 13 04:02 /var/log/rpmpkgs.3
-rw-r--r--  1 root root 23256 Mar  6 04:02 /var/log/rpmpkgs.4

So as you can see, xmms disappeared from the RPM database sometime
between March 20th and March 27th. In actual fact, it was on March
25th, as evidenced by the date I logged this bug.

>Any missing packages are easily reinstalled.
>
>"corruption" is different than "missing", there are procedures
>to deal with that as well.

Agreed. But the packages *AREN'T MISSING*. They're still there, and
still usable. They're just not in the RPM database. Thus "corruption"
rather than "missing". Had the upgrade actually deleted them, I'd
agree with you. But it didn't. It just corrupted the RPM database.
All of this was clearly shown in my original post.
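
The log comparison in this comment (checking which rotated /var/log/rpmpkgs* snapshots still list a package) could be automated roughly as follows. This is only a sketch: the function name `snapshots_containing` and the name-plus-digit matching rule are assumptions made for illustration, and the only thing taken from the thread is that each snapshot lists one package file name per line.

```python
from pathlib import Path


def snapshots_containing(package: str, log_dir: str, pattern: str = "rpmpkgs*") -> list:
    """Return the names of snapshot files that still list `package`.

    Heuristic (an assumption, not rpm behaviour): a line belongs to `package`
    when it starts with "<package>-" followed by a digit, so "xmms-1.2.8-..."
    matches "xmms" but "xmms-devel-..." does not.
    """
    prefix = package + "-"
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        for line in path.read_text().splitlines():
            if line.startswith(prefix) and line[len(prefix):len(prefix) + 1].isdigit():
                hits.append(path.name)
                break  # one hit per snapshot is enough
    return hits
```

The oldest snapshot missing from the result, combined with the file dates from `ls -l`, brackets the period in which the package dropped out of the database.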

Comment 7 Jeff Johnson 2004-03-29 19:53:37 UTC
Arguing with me is not gonna fix your rpmdb.

Comment 8 Tethys 2004-03-29 21:42:25 UTC
>Arguing with me is not gonna fix your rpmdb.

True. But I thought I might at least be able to get you to acknowledge 
the problem. Can you even see what I'm trying to get at? RPM has,
through normal operation, corrupted its own database. How can that
not be a bug? I agree that installing to a readonly filesystem is
always going to fail. Like I said, it was accidental. But why do
you believe the correct course of action at that point is to corrupt
its database?

Comment 9 meltie 2004-04-09 01:24:46 UTC
What's the problem here, Jeff? RPM should be detecting that it can't
write to a filesystem, and failing gracefully - which it isn't, and a
side-effect of this behaviour is the corruption of rpmdb.

Comment 10 Juha Sahakangas 2004-04-15 19:36:56 UTC
I agree, this is clearly a bug.

Nobody is asking rpm to do the impossible and magick more space into
/usr; what this is all about is that it should notice that something
went (badly!) wrong and abort the installation in time instead of
forcefully wrecking through like all was in order, even though it's
clearly not, and blowing things up as a result.

All software should fail gracefully, and RPM being a critical system
component, even more so.


Comment 11 Shawn McMahon 2004-05-11 14:53:16 UTC
You have got to be kidding me.  By what rationale is this NOT a bug?

Are you saying that, BY DESIGN, it is intended to corrupt the database
if a target filesystem is read-only?


Comment 12 sam 2004-05-11 15:11:05 UTC
Shawn...   
The two posts prior to yours just said that it is indeed a bug and  
that jeff johnson is a moron.   
   
I will reiterate:  
  
This is definitely a bug, and jeff johnson is definitely a moron.
 
Someone needs to mail this jeff person and get it to reopen the damn 
bug. 

Comment 13 Seth Vidal 2004-05-11 15:13:45 UTC
just a suggestion - if you want jeff to acknowledge the bug and work
on a fix - calling him a moron is not the best way to go about
achieving that goal.



Comment 14 Peter Finlayson 2004-05-11 15:31:15 UTC
This is clearly a bug. If Jeff doesn't know how to fix it, or is
unwilling to do so for any reason, then he should get the bug
reassigned to somebody else, not close it.

Comment 15 Janne Pikkarainen 2004-05-25 07:43:05 UTC
This really, really is a bad bug and IMHO now that Red Hat is hoppin'
on the SELinux train, this becomes even more of an issue. Under no
circumstances should the RPM database become corrupted. Here's some
pseudo code where I'll try to explain what all the other guys in this
bug have already been trying to explain:

---
if (/usr_is_mounted_read_only) {
    die("Hey! Your /usr is mounted read-only. I'm unable to continue so I quit now. See you later!");
}
else {
    do_this_installation_stuff();
}
---

What currently happens is something like

---
   do_this_installation_stuff();
   if (/usr_is_mounted_read_only) {
       corrupt_rpm_database_without_notice();
   }
---

Is that REALLY supposed to happen?

Comment 16 Janne Pikkarainen 2004-05-25 07:45:28 UTC
Uh. My first example is of course the preferred way to handle the
ro-situation, and the latter is how this currently happens.
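
The "check first, then install" pseudo code above can be made concrete with a small runnable sketch. It assumes a POSIX system, where `os.statvfs` reports mount flags and `os.ST_RDONLY` marks a read-only mount. The names `ReadOnlyTargetError`, `is_read_only` and `preflight` are hypothetical, and this is not how rpm itself is implemented.

```python
import os


class ReadOnlyTargetError(RuntimeError):
    """Raised before any work starts when a target directory is read-only."""


def is_read_only(path: str) -> bool:
    """Return True if the filesystem holding `path` is mounted read-only."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def preflight(install_dirs, probe=is_read_only) -> None:
    """Refuse to start if any directory we intend to write to is read-only.

    `probe` is injectable so the check can be exercised without a real
    read-only mount.
    """
    blocked = [d for d in install_dirs if probe(d)]
    if blocked:
        raise ReadOnlyTargetError(
            "read-only filesystem, nothing was changed: " + ", ".join(blocked)
        )
```

A check like this only covers the one condition it probes for. Errors that appear later (a full disk, an I/O error) still have to be handled where they occur.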

Comment 17 Seth Vidal 2004-05-25 12:46:10 UTC
Changing component to rpm instead of yum.


Comment 18 Michiel Toneman 2004-06-10 18:57:40 UTC
What makes Linux such a joy to work with is the hierarchy in
responsibility. Fix the problems at the origin.

e.g.
If an application crashes the kernel, the kernel should be fixed.
If an application crashes X11, X11 should be fixed.
If a webpage crashes the browser, the browser should be fixed.

and definitely:

If an application corrupts the primary system software database, the
package manager needs to be fixed.

A friend showed me this incident, and frankly I'm a little shocked. We
had this http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=73097
horror on RH8, let's not repeat this. Everyone I know who ran
RH8 at one time or another has had the problem, and it has turned away
many of them from RH.

RPM should not be able to corrupt the system because it found
something slightly unexpected in /usr.

Comment 19 Tethys 2004-06-23 20:22:15 UTC
The consensus seems to agree with me that this really is a bug,
so reopening...

Comment 20 Jeff Johnson 2004-06-25 11:26:44 UTC
Package management on RO /usr is not supported by rpm, never has been.

I see no other bug here.

Comment 21 Simon Lucy 2004-06-25 12:00:37 UTC
I suggest that a new bug is opened that simply defines the exact
behaviour.  It's quite obviously a bug that the rpm database is
corrupted, the reason it's corrupted is the context of the bug, but it
is not the bug itself.  The bug is that the rpm database is corrupted.

The only reason for marking a bug WONTFIX or NOTABUG is that the
behaviour (if verified) is as by design.  And in this case the design
is NOT whether rpm is being asked to write to a read only path but
that the rpm database may be corrupted.  I hardly think this was
intended by design.


Comment 22 Jeff Johnson 2004-06-25 12:13:55 UTC
Yes, %_netsharedpath is the designed behavior for handling
RO mount points in rpm.

The RO mount is the source of the problem. Configure
rpm and use appropriately with RO /usr is the answer.

Other problems, like "corrupted" rpmdb, are derivative on
the correct use and configuration. Screaming about the
symptom when the cure for the cause has been described
is, well, pointless.

Now will you please leave this bug closed?

Comment 23 Simon Lucy 2004-06-25 12:27:38 UTC
So accidentally installing to a read only path is a user problem and
corruption of the rpm database is their own fault?

I'd have thought that the graceful exit after testing the destination
would be to inform the user that the destination is read only.

Suppose there was a situation where in copying a file to a read only
destination the original file's directory entry was corrupted?  This
is almost analogous and has similar consequences.  The contents can be
rebuilt or recovered but the user is hardly likely to expect or know
what to do in this situation.

Comment 24 Need Real Name 2004-06-25 15:20:45 UTC
I agree with Simon.

"The RO mount is the source of the problem. Configure
rpm and use appropriately with RO /usr is the answer."

So then shouldn't rpm check for this condition and inform the user of
their mistake (i.e. "rpm cannot work properly with a RO mount; please
adjust the permissions") instead of messing up the rpm db?

Maybe this should be lowered in priority, or changed to an
enhancement, but I don't think it's total b.s. either.

Comment 25 David Chubb 2004-06-25 18:45:58 UTC
As a system administrator I would expect that if the rpm system could 
not successfully apply a given package (regardless of the reason why) 
it would not update the database.  Reading his bug report leads me to 
believe it updated the rpm database before the package was 
successfully applied (which is why it doesn't rerun the packages for 
GhostScript & xmms).

In my mind it is more of a process issue and error handling within 
the rpm system.

I'm not reopening this...but I think it is a valid bug (either with 
yum or rpm).
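
The ordering this comment asks for (record the new package only after its files were actually written) can be shown with a toy model. Everything here is an assumption made for illustration: the database is a plain dict, `upgrade` is a hypothetical function, and `unpack` stands in for whatever writes the payload to disk. It says nothing about rpm's real internals.

```python
def upgrade(db: dict, name: str, new_version: str, unpack) -> bool:
    """Toy upgrade step: `db` maps package name -> installed version.

    `unpack(name, version)` writes the files and may raise OSError
    (read-only filesystem, disk full, ...).
    """
    try:
        unpack(name, new_version)
    except OSError as err:
        # The old entry is kept, so a retry still sees the old package.
        print(f"{name}: unpack failed ({err}); database left unchanged")
        return False
    db[name] = new_version  # record success only after the files landed
    return True
```

With this ordering, a second run after remounting /usr read-write would still see the old package entry and could retry the upgrade.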

Comment 26 Tethys 2004-06-25 19:37:59 UTC
>Configure rpm and use appropriately with RO /usr is the answer.

No, Jeff, that's not the answer. If you believe it is, then you
either didn't understand the problem correctly, or have just failed
to explain your solution well enough (I concede that the latter is
a possibility, albeit a slim one in my eyes).

>Screaming about the symptom when the cure for the cause has
>been described is, well, pointless.
>
>Now will you please leave this bug closed?

No, I won't leave it closed until I can see that you've understood
the problem. If you can show that you have, and still believe it's
not a bug, then yes, I'll (reluctantly) leave it closed. You haven't
yet described your cure for the problem. If you do, I'll go away a
happy man.

So, once more...

*** I DON'T EXPECT RPM TO WORK WHILE MY /usr IS MOUNTED READ ONLY ***

With that out of the way, how exactly will %_netsharedpath help here?
From your description, it will prevent me from installing anything in
/usr. But I *want* to install stuff there. In fact RH/FC won't let me
install anywhere else, so I *need* to install stuff there:

   leto:~% rpm -qa | wc -l
   1630
   leto:~% for i in $(rpm -qa); do rpm -qi "$i" | head -1 | egrep 'not relocatable'; done | wc -l
   1499

Of the packages installed on this FC2 system, only about 8% (131 of
1630) can be installed anywhere other than /usr.

So, how will %_netsharedpath help me install updates on my machine?
Since you believe it to be a cure for the problem, how should I use
it?

My current approach is:

1. mount -oremount,rw /usr
2. Install/upgrade any packages that I need
3. mount -oremount,ro /usr

The bug was triggered by me forgetting to do step 1 first. However,
by using %_netsharedpath, step 2 will fail. How am I meant to keep
my system up to date using %_netsharedpath? Or are you just claiming
that a readonly /usr is not a valid configuration for a Red Hat or
Fedora machine and that I shouldn't do it?
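
The three-step routine above is easy to get wrong by skipping step 1, so one option is to wrap it so that the remounts cannot be forgotten. The sketch below is an assumption-laden illustration, not a recommended tool: `remounted_rw` is a hypothetical helper, it needs root to do anything real, and the only commands it issues are the two `mount -o remount` invocations listed in this comment.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def remounted_rw(mount_point: str, run=subprocess.check_call):
    """Remount `mount_point` read-write for the duration of the block.

    `run` is injectable so the sequence can be tested without root.
    If the first remount fails, check_call raises and the block never runs.
    """
    run(["mount", "-o", "remount,rw", mount_point])
    try:
        yield
    finally:
        # Always attempt to restore the read-only state, even after an error.
        run(["mount", "-o", "remount,ro", mount_point])
```

Usage would look like `with remounted_rw("/usr"): subprocess.check_call(["yum", "update"])`. This only avoids the trigger; it does not change how rpm behaves when a write fails.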

Comment 27 Tethys 2004-06-25 22:13:58 UTC
Sigh. Look, if you want to keep closing this, then there's nothing
I can do to stop you. But at least have the decency to explain
your rationale. You haven't answered any of my questions, and you
haven't explained what I'm meant to do to avoid further corrupting my
RPM database.

*Please* explain what I'm meant to do. If you believe I should be
using %_netsharedpath, then please answer my questions in comment 26.

Comment 28 Tethys 2004-06-26 06:40:05 UTC
Nice to see a reasoned response there, Jeff. Sigh. Fine. If you
insist on acting like a moron, there's nothing more I can do, and
this will stay closed.

But it's still a bug, and it's still not fixed.

Comment 29 Haji Hill 2004-07-12 17:47:55 UTC
Reading through this, it seems obvious that the solution to the real 
problem has not been identified; that being how to fix the corrupted 
rpm database.

Jeff has identified how to avoid corrupting it in the future (and 
really a script could be written easily enough that would do the 
remounting of /usr for you), but as to the solution to fixing the rpm 
database after it has already been corrupted he has given no real 
insight at all... or did I miss that?

Jeff, if rpm is designed to corrupt its own database as punishment 
for use under improper configuration conditions, what is the solution 
to fixing that rpm database once this state has been reached?

Comment 30 Jeff Johnson 2004-07-13 17:05:40 UTC
You are correct that no one in this bug has cared to ask
     How do I fix a corrupted rpm database?
Presumably, that's because everyone is more inclined to call
me a "moron", and/or ask leading questions with "punishment".

There are gazillions of bug reports describing how to repair
a database. And I'll be quite happy to walk anyone through the rather
simple procedure.

Open a different bug, please. This bug is not the place to fix anything.

Comment 31 Tethys 2004-07-13 19:48:37 UTC
That's because the bug wasn't concerned with fixing a corrupted
RPM database. It was concerned with preventing RPM from corrupting
its own database in the course of normal operation. But you seem
utterly blind to that fact.

I didn't call you a moron, believing you to simply be not understanding
the problem. Right up until the point where you simply refused to
listen to reason.

Comment 32 Sibil Llort 2004-08-16 15:34:01 UTC
I have to agree with Jeff Johnson here. Linux is a system for experts
only. If you make a mistake you deserve to have your database
corrupted. I think that, in the future, if a user makes spelling
errors, typos, or any other kind of mistake, we should just go ahead
and corrupt the RPM database for good measure. Also if a user visits
AOL.COM we should corrupt their filesystem. Developers like Jeff are
what make Linux what it is: mightier than Microsoft. Leave this "bug"
closed!

Comment 33 Adam 2004-08-16 16:45:04 UTC
Look guys, maybe it will help if we re-phrase the problem this way:

1. What is the expected behaviour of rpm when it encounters a problem
   that prevents it from continuing?

2. Does "/usr is readonly"  fall under the above situation?

I understand that there are many cases to take care of in a complex
program. None of us are saying that rpm should work with a read-only
/usr. What we are talking about is the failure mode of the software.

I think we can all agree that if a program detects (or CAN detect)
an improper configuration or input, it should give an error message
and exit.

Adam


Comment 34 Deven T. Corzine 2004-08-16 17:04:29 UTC
Sibil, I have to disagree.  While the update should obviously fail
with a readonly /usr, it should fail gracefully, not corrupt its own
database!  Especially in a critical system program like this, such
failure modes are not acceptable for a production system.  If you're
going to take an "experts only" stance, you might as well be using
Gentoo, where at least they seem to care that the system WORKS.

While this corruption is obviously caused by "user error", the
software should be resilient to such errors, as users WILL make
mistakes.  Call the user a "moron" in the error if you'd like, but
there should be a sanity check here, as it's obviously needed.

However, there should be no need for a special check for a readonly
filesystem -- normal error checking should catch the failure to write
the data to the filesystem in the first place.  Does this mean that
rpm ISN'T checking for errors while installing packages?!?  That's
truly disturbing.  What if the filesystem is full?  What if an I/O
error occurs?  Blindly assuming that every operation works is
incredibly dangerous, and inexcusable for rpm in particular.
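
The point that ordinary error checking covers all of these cases at once can be illustrated with a short sketch. The operating system reports a read-only filesystem, a full disk and an I/O error through the same mechanism (an error code on the failed call), so one handler sees all three. `HINTS` and `try_write` are hypothetical names used only for this example.

```python
import errno
import os
from typing import Optional

# Friendlier wording for the failure modes named in the comment.
HINTS = {
    errno.EROFS: "target filesystem is mounted read-only",
    errno.ENOSPC: "target filesystem is full",
    errno.EIO: "I/O error on the target device",
}


def try_write(path: str, data: bytes) -> Optional[str]:
    """Write `data` to `path`; return None on success or a reason on failure."""
    try:
        with open(path, "wb") as handle:
            handle.write(data)
    except OSError as err:
        # Any other errno falls back to the system's own message.
        return HINTS.get(err.errno, os.strerror(err.errno))
    return None
```

A caller that treats any non-None result as "stop and leave the bookkeeping alone" needs no special case for read-only mounts.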

Between this and the refusal to consider the Windows XP dual-booting
problem as a "stop-ship" bug for Fedora Core 2, I'm becoming
increasingly convinced that the Fedora Core approach to replacing the
Red Hat desktop distributions is a dismal failure, and the FC releases
cannot be trusted for production use...

Comment 35 Gary Niger 2004-08-16 21:24:01 UTC
since Jeff doesn't want to fix this, I have. Patches available at 
http://dc.h4xx.com

Comment 36 Martin C. Messer 2004-08-17 01:03:41 UTC
Warning, don't click on the link in comment 35. Someone's idea of a
joke, certainly not an rpm patch. You've been warned.

Comment 37 Ed Wilts 2004-08-17 02:22:55 UTC
Section 3.2 of the Red Hat Enterprise Linux Reference Guide:
http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/ref-guide/s1-filesystem-fhs.html

"Red Hat Enterprise Linux uses the Filesystem Hierarchy Standard 
(FHS) file system structure, which defines the names, locations, and
permissions for many file types and directories.

The FHS document is the authoritative reference to any FHS-compliant
file system, but the standard leaves many areas undefined or
extensible. This section is an overview of the standard and a
description of the parts of the file system not covered by the standard.

Compliance with the standard means many things, but the two most
important are compatibility with other compliant systems and the
ability to mount a /usr/ partition as read-only. This second point is
important because the directory contains common executables and should
not be changed by users. Also, since the /usr/ directory is mounted as
read-only, it can be mounted from the CD-ROM or from another machine
via a read-only NFS mount."

Comment 38 Steven Brown 2004-08-17 03:59:12 UTC
This is obviously a bug in rpm/yum.  It should be doing such updates 
as a transaction, or at least fail gracefully.


Comment 39 Corey 2004-08-17 04:44:24 UTC
I agree... it's one if-statement, I don't know what the big deal is.

Comment 40 Doug 2004-08-17 15:25:56 UTC
So I guess we just need to refile this bug and get someone else to 
fix it for us.

Comment 41 Ken Snider 2004-08-30 20:45:58 UTC
Am I reading this correctly, that the problem is related to the rpm db
being "out of sync", because the db was mounted read-only when the
install was performed, and therefore not present in the db at this point?

Comment 42 Bennett Mead 2004-08-31 21:14:36 UTC
Jeff, what's the deal here?  This bug has been in limbo for almost 6 
months.  Do you have something against Red Hat/Fedora?  By refusing 
to transform RPM into a robust/mature product you are damaging the 
credibility of Linux.  Are we arguing over the description of the 
bug?  Over sysadmin responsibility?

Error Checking is good.  If you know that something will damage the 
system, prevent the system from doing that action before the damage 
is done.  Easy as that.  A sysadmin would switch editors in a moment 
if half his script disappeared if he hit just one wrong key.  People 
expect a mature program like YUM or RPM to prevent whatever 
reasonable mistakes a user can throw at it.  

**Any action that results in a corrupt database is a bug!** 

What's wrong with that statement?  Is there a single command with no 
other user input that can be run to fix the database or something so 
that you can brush it off as trivial?  Sysadmins are busy people, 
every time you have a bug that takes 15 minutes or even just one 
minute to clean up after, you're costing companies money.  At least 
keep it around as a bug someone else can fix.

Where are you coming from on this?  Linux is on the path towards 
greater user acceptance, why stand up for things that will be 
considered reasons not to go with Red Hat/Fedora/Anything that uses 
RPM and YUM?  Package adding/updating is so fundamental, it should be 
more bug free than mere applications.


Comment 43 dan 2004-09-14 08:28:12 UTC
Granted, most software problems lie between the headphones... this is
not one of them. The user made a mistake that's easy to protect
against. If modifying rpm to fix this issue would harm it in some
other way, then state that. Or state WHY this behavior is NORMAL to
the operation of rpm. Otherwise, fix the fucking problem. It ain't the
user's fault that the db got corrupted and it shouldn't be a
complicated problem to correct.

Comment 44 Max Gilead 2004-09-19 19:33:50 UTC
I think Tethys or someone else with good writing skills should mail
some RedHat manager and suggest that Mr. Johnson should be fired
immediately for refusing to fix a clearly identified problem in one of
the company's core products.

Comment 45 Chuck Mead 2004-09-20 16:35:28 UTC
I find it difficult to believe that this silly argument over an edge
case bug which only arises through a PEBKAC condition is still going
on. I also find the comments Max made in #44 above to be reprehensible
and beneath contempt. Beyond that it's clear that since it's yum we're
discussing this bug is not a serious issue for anyone except the
victim with the, obvious, PEBKAC problem. It is clearly an edge case
now being used to bludgeon Jeff. Get off it... grow up. If I were Jeff
I'd quit before I fixed this now! You don't solve problems or interact
well with others via threats and abusive language. It only makes
people mad.

NB. PEBKAC == Problem Exists Between Keyboard And Chair

Comment 46 Daniel Reed 2004-09-20 17:24:10 UTC
It seems my #130260 is actually a duplicate of this bug. If something
related to rpm (rpm? something in librpm?) exits unexpectedly, it can
leave the RPM data base in an inconsistent state.

The workaround I used for #130260 was to find files in /usr/bin and
/usr/lib that were not symlinks and that rpm -qf did not believe were
a part of any package, then find which package the files actually were
a part of on another system (or perhaps using rpm --redhatprovides),
and manually rpm -Uvh the .rpm.

My specific situation was the loss of the .rpm repository location (an
NFS volume) during a yum upgrade, but it seems to be a general flaw in
rpm's handling of unexpected failures. I believe this bug should
remain open until such time as the flaw in rpm has been identified and
corrected. This is a robustness issue.
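
The first half of the workaround described in this comment (find regular files that are not symlinks and that `rpm -qf` does not attribute to any package) might be scripted along these lines. This is a sketch under assumptions: `rpm_owner` and `unowned_files` are hypothetical names, and `rpm_owner` assumes `rpm -qf` exits non-zero for a file that no package owns. Working out which package each file really belongs to, and reinstalling it, is left as the manual step it was in the comment.

```python
import os
import subprocess


def rpm_owner(path: str):
    """Return the owning package reported by `rpm -qf`, or None if unowned."""
    result = subprocess.run(["rpm", "-qf", path], capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


def unowned_files(directory: str, owner=rpm_owner) -> list:
    """List regular, non-symlink files in `directory` that no package owns.

    `owner` is injectable so the scan can be tested without an rpm database.
    """
    found = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if entry.is_symlink() or not entry.is_file():
            continue  # skip symlinks, directories and special files
        if owner(entry.path) is None:
            found.append(entry.path)
    return found
```

Running `unowned_files("/usr/bin")` and `unowned_files("/usr/lib")` would give the candidate list to check against another system.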

Comment 47 Dzhugashvili 2004-09-28 20:21:39 UTC
@Deven T. Corzine

I'm with Sibil on this one. One of the big problems with Windows is the idiotic users 
blindly installing malware left right and centre. If Linux holds their hands when they make 
a mistake then they will never learn. Much better to show them the error of their ways - an 
afternoon spent fixing the rpm database will drum it into their heads that installing to a 
RO directory is wrong!

By making the users think about their actions they will not make the same mistake twice - 
this will free up valuable developer time. Jeff must have better things to do, such as all the 
urgent patches to TuxRacer, than tell users not to install to RO locations!

Comment 48 Deven T. Corzine 2004-09-29 22:29:06 UTC
This has nothing to do with idiotic users blindly installing malware.

Stupid or lazy users are not the ones who have to worry about this
bug, since they would never have /usr readonly in the first place.

The people most likely to have /usr readonly are conscientious,
competent (but perhaps harried) system administrators who have to
worry that one tiny mistake (forgetting to remount /usr read-write)
will trash the RPM database.

ANY professional system administrator will tell you that this failure
mode in the core package management software is absolutely inexcusable
and reason enough to consider other Linux distributions.  Anyone who
says "just don't make mistakes" is out of touch with the real world. 
Even supremely competent experts make mistakes and typos on occasion.
Nobody's perfect.  But nobody should have to worry that a trivial
user error will cause such drastic system corruption.

The competent system administrator who forgets to remount /usr expects
an error to occur.  Within seconds, they'll probably realize that they
forgot to remount /usr and do that, then try again.  This should work.
 But with this bug, that minor error corrupts the system and leaves
the poor administrator with significant recovery work to do because of
a trivial mistake.  Draconian punishment for trivial mistakes may work
in a totalitarian state, but in the real world, system administrators
who get burned by this bug will be looking for alternative Linux
distributions in short order.

We're not talking about handholding here.  Handholding would be to
offer to remount /usr read-write for the user (since they forgot) --
but nobody is suggesting anything of the sort.

We're talking about robust software design.  No user action (no matter
how wrong) should cause severe internal corruption like this.  The
software should fail gracefully, preferably with an informative error
message, but at least without self-immolation.  Of all the bugs that a
developer should fix, a bug like this should be the TOP priority, not
written off as "not a bug".

You're right about one thing, though.  Spending an afternoon repairing
the RPM database because of a trivial mistake will teach them "the
error of their ways" -- but not in the way you think.  It will teach
them that they were foolish to trust this software and (by extension)
any Linux distribution which uses it.

This is a very serious bug, yet some people just don't get it.

Comment 49 Sibil Llort 2004-10-04 20:25:39 UTC
Deven Corzine, you are WRONG! Expert system administrators, people
like me who use Linux, never ever make mistakes. Quite simply, we are
perfect - this is why we use Linux, which is also perfect. There are
some who say "but making /usr readonly is in the Red Hat official
documentation! It is documented standard procedure!" Well, this is
true. But that is not the point - the point is, ONLY IDIOTS READ THE
MANUAL. Sure, you read the manual, you do what it says, it corrupts
your RPM database. Fine. You're not perfect, you're not an expert, you
don't DESERVE linux. Now if you'll excuse me, Jeff Johnson and I need
to go patch TuxRacer, it's still way better than Counter Strike but I
think more can be done.

Comment 50 Haji Hill 2004-10-05 11:28:09 UTC
Ummmm... Like hella LOL dude... I'll P$WN you're a$$ckor's at 
TuxRacer yo' and you'll never even get a chance to counter$trike my 
axxors, $0 pwned will you be....
101 @ Joo 4 ever

Comment 51 Tethys 2006-01-20 20:43:03 UTC
If you're going to make this CLOSED->WONTFIX, then I'll keep opening it until
you give some justification for that decision.

Comment 52 Tethys 2006-01-23 20:07:26 UTC
Well well... hiding behind that mac.com address it seems we have one Jeff
Johnson. Taking this personally are you Jeff? Well, it's still a bug, it's still not
fixed, and so CLOSED->WONTFIX is inappropriate. Reopening.

Comment 53 Michael Jennings (KainX) 2006-01-23 20:09:53 UTC
What everyone seems to be missing here, and what Jeff is getting very sick of
repeating, is that RPM uses a "best effort" algorithm when doing upgrades.  The
RPM database was not corrupted; believe me, I can show you what a corrupted RPM
database looks like.

What actually happened here is that the previous versions were removed as part
of the upgrade process, and the newer versions were installed.  At least as much
as possible.  :-)  As many files as could be removed from the old install were,
and that install was removed from the RPM DB.  As much of the new stuff as could
be installed was, but the install of some items failed, so the new packages were
not added to the RPM DB.  While this may not be what *you* think should've
happened, it is consistent with RPM's upgrade algorithm and does NOT result in a
corrupted database in any way, shape, or form.

What is really being asked for is different (not "better," just "different")
handling of failures during upgrades.  For this, you want to look at the
auto-rollback features RPM has.  You'll also want to look at RPM's ability to
repackage previously-installed versions during package upgrades, and how to
permit yum to activate either or both of these features.

Comment 54 Janne Pikkarainen 2006-01-23 20:20:27 UTC
And I don't get why on earth rpm could not and should not check the overall
mount point state before it attempts to do ANYTHING. If any of the required
mount points were found mounted in read-only mode, rpm should abort and at
least ask the sysadmin what to do next. Let's try this one more time. THIS is
the behaviour I kindly ask rpm to implement:
 
1) rpm update/installation/other modification starts 
 
2) rpm checks if any of the mount points is mounted read-only 
 
3) should something be mounted read-only, rpm should throw out "DANGER, 
DANGER, WILL ROBINSON! Your mount point <x> is mounted read-only! Are you 
STILL sure you want to continue? Press Yes and I shall not take any 
responsibility from consequences. Press No and I shall abort. What is your 
command, my master?" 
 
4) if user still decides to continue... well, then just do it, whatever that 
means. 
 
5) if user chooses wisely and decides to abort, then abort. 
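
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not anything rpm itself does: the function names read_only_mounts and may_proceed are made up for this example, and the mount table is assumed to be in the /proc/mounts format (device, mount point, type, comma-separated options).

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    mounts_text is assumed to be in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


def may_proceed(ro_mounts, ask):
    """Steps 2-5: continue silently if nothing is read-only, else ask.

    ask is a hypothetical callback that shows the warning for the given
    mount points and returns True (continue) or False (abort).
    """
    if not ro_mounts:
        return True
    return bool(ask(ro_mounts))
```

On Linux the first function could be fed open("/proc/mounts").read(). A real check would only look at mount points that the transaction actually writes to.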

Comment 55 Tethys 2006-01-23 20:35:32 UTC
At last, a reasoned, sane response. A response that's wrong on some details,
but at least someone's making the attempt to talk about the problem. Thank you.

The RPM database *was* corrupted. Not to the point of unreadability, to be
sure, but since it didn't represent the state of the filesystem, then I'd
call that corruption. A file was installed as part of an RPM package. After
the events described here, RPM is unaware that those files exist, and claims
they're not associated with any package. Well how else did they get there,
and what would you call that if not db corruption?

The files associated with the previous version weren't removed from the system,
as I showed in the initial bug report. Now perhaps the previous version was
removed from the RPM database before the new version was attempted to be
installed. In which case, great. We've found the bug. It should be trivial to
only alter the state of the database after the transaction is complete, rather
than do it mid way through and risk the transaction failing later on.
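
To make that suggestion concrete, here is a toy sketch in Python. It is not how rpm is implemented; the database is modelled as a plain dict of package name to version, and upgrade, write_files and read_only_writer are hypothetical names. The only point is the ordering: the database record changes after the file operations have succeeded, never before.

```python
import errno


def upgrade(db, name, new_version, write_files):
    """Record new_version in db only if write_files succeeds.

    db is a dict mapping package name to installed version (a stand-in
    for the package database). write_files is a callable that delivers
    the package's files and raises OSError on failure.
    """
    try:
        write_files(name, new_version)
    except OSError as exc:
        # Files could not be delivered: leave the old record in place.
        print(f"upgrade of {name} failed: {exc}; database left unchanged")
        return False
    db[name] = new_version  # commit the record last
    return True


def read_only_writer(name, version):
    """Simulates delivering files onto a read-only filesystem."""
    raise OSError(errno.EROFS, "Read-only file system")
```

This sketch does not undo files that were partially written before the failure; it only keeps the database from forgetting the old package.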

So RPM has auto-rollback features? Sounds great. If the transaction fails, roll
back to a previously known good state. But that's not what happened here.
Hence it's still a bug, whichever way you look at it. Plus, rpm(8) makes no
mention of these features, and whether or not they need to be explicitly called
or if they're enabled by default.

Oh, and having reread the comments, I haven't seen Jeff mention once that
RPM uses a best effort algorithm for upgrades. All he's said is that I should
be telling it my /usr is mounted readonly and not to install anything there
(and thus completely missing the point of this bug).

Comment 56 James Olin Oden 2006-01-23 21:31:52 UTC
> The RPM database *was* corrupted. Not to the point of unreadability, to be
> sure, but since it didn't represent the state of the filesystem, then I'd
> call that corruption. A file was installed as part of an RPM package. After
> the events described here, RPM is unaware that those files exist, and claims
> they're not associated with any package. Well how else did they get there,
> and what would you call that if not db corruption?

The rpm database was not corrupted (which would mean that its data structures
were garbled), but in an inconsistent state.  The state it was in is quite 
explainable in that the error happened in the "PROCESS" stage of the PSM (and in 
particular in the call to the FSM that delivers the files).  This is before the 
POST stage of the PSM (Package State Machine) where the header is loaded into 
the database (or erased if it's an erase element being run through the PSM). 

> So RPM has auto-rollback features? Sounds great. If the transaction fails, roll
> back to a previously known good state. But that's not what happened here.
> Hence it's still a bug, whichever way you look at it. Plus, rpm(8) makes no
> mention of these features, and whether or not they need to be explicitly called
> or if they're enabled by default.
It didn't do that because you did not configure rpm to do that.  It's not 
documented because the feature is experimental.  It is used in some production 
sites, since some of us are required to provide a rollback capability for our 
upgrades.  

Concerning the "best effort" policy of rpm, it has been this way since its 
creation AFAIK.   In some environments it's not acceptable, in others it is the 
sanest policy to use.

Concerning the real problem at hand, it would be very nice if, when rpm 
detects read-only filesystems that are not in the %_netsharedpath macro,
it failed early before running any package through the package state 
machine, because it can (there are issues, but bottom line it is very very 
possible).
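
As a rough sketch of that kind of early check (Python, purely illustrative, not rpm's code): blocking_dirs, is_read_only and under are hypothetical names, and %_netsharedpath is assumed here to be a colon-separated list of path prefixes.

```python
import os


def is_read_only(path):
    """True if the filesystem holding path is mounted read-only (Unix)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def under(path, prefix):
    """True if path equals prefix or lies below it."""
    path, prefix = os.path.normpath(path), os.path.normpath(prefix)
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def blocking_dirs(target_dirs, netsharedpath, is_ro=is_read_only):
    """Return target directories that are read-only and not shared.

    netsharedpath is assumed to be a colon-separated list of prefixes.
    A non-empty result means the transaction should fail before any
    package is run through the package state machine.
    """
    shared = [p for p in netsharedpath.split(":") if p]
    return [d for d in target_dirs
            if not any(under(d, s) for s in shared) and is_ro(d)]
```

The is_ro parameter exists only so the logic can be exercised without a real read-only mount.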

That said, expressing anger and directing it towards others in public will not 
encourage anyone to work on this problem for you. 


Comment 57 Michael Jennings (KainX) 2006-01-23 21:40:23 UTC
> And I don't get, why on earth rpm could and should not check the overall mount
> point state before it attempts to do ANYTHING. If any of the required mount
> points were found mounted in read-only mode, rpm should abort and at least ask
> from the sysadmin what to do next.

"Best effort" is the key portion of my message that you missed.  And to answer
your initial question, "checking the overall mountpoint state" is not the job of
RPM.  Package management does not include compensating for every stupid mistake
an administrator can make.  Should RPM also check to make sure you didn't set
the "immutable" attribute on files it's trying to overwrite?  Should it check to
make sure you don't have /usr pointing to a RAMdisk and warn you of that too? 
Next you'll be wanting RPM to warn you before installing programs which don't
pass a nessus audit, libraries with improper/undefined symbols or versioning,
scripts with race conditions, documentation with words that are unsafe for
children, and toilet paper that might cause chafing.

Clearly you want a package manager which exceeds what RPM was/is designed and
intended to do.  You should either find one or write one yourself.

> At last, a reasoned, sane response. A response that's wrong on some details,
> but at least someone's making the attempt to talk about the problem. Thank you.

You're welcome.  Though you disagreeing with certain implementation/design
decisions of RPM does not make me "wrong on some details."  :-)

> The RPM database *was* corrupted. Not to the point of unreadability, to be
> sure, but since it didn't represent the state of the filesystem, then I'd
> call that corruption.

You'd be incorrect.  The RPM DB isn't designed to represent the state of the
filesystem.  It's designed to represent the state of the package manager's
knowledge of packages installed.  And it was correct in that regard.

> A file was installed as part of an RPM package. After
> the events described here, RPM is unaware that those files exist, and claims
> they're not associated with any package. Well how else did they get there,
> and what would you call that if not db corruption?

Your /usr was read-only.  The files in /usr/bin were not installed by the
packages you were trying to install.  They were from the previous packages which
were uninstalled.  The files themselves couldn't be uninstalled because of the
read-only filesystem, but the package was uninstalled.  Remember, best effort
means "do as much as we can."  We could update the database, so we did.  We
could not remove the files, so we didn't.

> The files associated with the previous version weren't removed from the system,
> as I showed in the initial bug report. Now perhaps the previous version was
> removed from the RPM database before the new version was attempted to be
> installed. In which case, great. We've found the bug. It should be trivial to
> only alter the state of the database after the transaction is complete, rather
> than do it mid way through and risk the transaction failing later on.

Again, you're missing the "best effort" part.  What you want is transaction
rollbacks, and as I said, that can be found elsewhere.

> So RPM has auto-rollback features? Sounds great. If the transaction fails, roll
> back to a previously known good state. But that's not what happened here.
> Hence it's still a bug, whichever way you look at it. Plus, rpm(8) makes no
> mention of these features, and whether or not they need to be explicitly called
> or if they're enabled by default.

No, it's not a bug.  If there's a "bug" anywhere, it's in your lack of knowledge
of the current RPM algorithms, the alternatives available, and the details of
how to activate them; and in the documentation which fails to mention these
things.  RPM is behaving as designed and implemented.  That is, by definition,
not a bug.

> Oh, and having reread the comments, I haven't seen Jeff mention once that
> RPM uses a best effort algorithm for upgrades. All he's said is that I should
> be telling it my /usr is mounted readonly and not to install anything there
> (and thus completely missing the point of this bug).

Jeff is one of the most skilled and intelligent programmers (and thinkers) I
know.  He can think through the repercussions and ramifications of things almost
instantly.  He didn't miss anything; you simply don't know him well enough to
grasp his subtle sense of humor and his succinct style.

You see, users often use Bugzilla and other web-based bug trackers as a way to
label behavior they don't understand or that doesn't coincide with what they
expected to occur as a Bug without actually trying to ask the first and more
important question:  Why did this happen?  They also use it as a way to avoid
taking the time to do due diligence research to find the answer to that
question, like searching Google or mailing list archives.  Had you done this,
you'd have almost certainly stumbled upon at least one of the following links:

http://www.linuxjournal.com/article/7034
http://www.redhat.com/archives/rpm-list/2003-January/msg00342.html
https://lists.dulug.duke.edu/pipermail/rpm-devel/2006-January/000692.html

Or even if you had not, a post to the mailing list would've resolved the issue
much more quickly as someone like myself or James tried to explain to you why
your expectations did not meet with reality, why the idea of a "rollback" is a
far more complex notion than you have imagined, what (if any) rollback features
exist, what their capabilities and flaws are, etc.

Having explained the situation to the best of my ability, and having provided
links for further research on the part of anyone not thus satisfied with my
responses, I will make none further.  You are more than welcome to continue to
disagree regarding what the correct behavior should be, but as Jeff is the RPM
author and has made his decision, you're not likely to make much progress in
that regard.  And continuing to use Bugzilla as leverage for empowering your
disagreement is simply childish.

To summarize:

1.  The behavior described is not a bug.  The application is behaving as designed.
2.  Requests for additional pre-install checks (like filesystems being
read-only) should be filed independently as RFE's and will be given attention
directly proportional to their perceived sanity.
3.  Your options with respect to transactions and rollbacks thereof are
described in great detail here:  http://www.linuxjournal.com/article/7034
4.  Calling Jeff a "moron" says far more about you than it does about him. 
Largely because most of us know better.

Comment 58 Tethys 2006-01-23 23:56:22 UTC
Fair enough. I think we'll have to agree to disagree over terminology here,
particularly regarding "corruption" and "best effort".

I don't believe I'm asking more of a package manager than RPM was (or at
least should have been) designed to do. I merely expect it to keep track
of the packages that are currently installed on my system, and by extension,
the files associated with those packages. To be honest, although rollbacks
are a nice feature, I don't even think they're necessary. The ability to
roll back is essential for many environments, but it could be equally well
achieved outside of RPM (for example in yum).

>And to answer your initial question, "checking the overall mountpoint
>state" is not the job of RPM. Package management does not include
>compensating for every stupid mistake an administrator can make. Should
>RPM also check to make sure you didn't set the "immutable" attribute on
>files it's trying to overwrite? Should it check to make sure you don't
>have /usr pointing to a RAMdisk and warn you of that too?

I actually agree with you here. RPM *shouldn't* be checking whether the
filesystem is writable, or that files it's trying to write aren't immutable.
What I believe it should do is have more sane behaviour in the event that
a write to a file fails (which can, after all, happen for any number of
reasons).

>You'd be incorrect.  The RPM DB isn't designed to represent the state of the
>filesystem.  It's designed to represent the state of the package manager's
>knowledge of packages installed.  And it was correct in that regard.

Hmmm. I'll have to disagree there. The presence of a package and the presence
of its associated files in the filesystem are inextricably linked IMHO. You
can't reasonably claim that a package isn't installed if the files that it is
supposed to install are present on the system. True, you can make that claim
from the perspective of the package manager, but that's just playing with
semantics to get the result you're looking for. By any sane definition, the
package *is* installed, and the end user is unable to tell the difference.

>If there's a "bug" anywhere, it's in your lack of knowledge of the current
>RPM algorithms, the alternatives available, and the details of how to
>activate them

Remember that I didn't actually invoke the rpm command at any point here.
All I did was "yum update". Indeed, it was initially filed as a yum bug.
Perhaps if yum had used the --repackage and --rollback options to rpm, then
this problem would never have shown itself.

>Jeff [...] didn't miss anything; you simply don't know him well enough to
>grasp his subtle sense of humor and his succinct style.

Correct. In fact, I don't know him at all. All I get to see of him is his
comments here. Even after rereading all of the comments in this bug, I
fail to see any indication that he did grasp the problem I was describing,
and plenty of evidence to suggest otherwise (e.g., his suggestion to use
%_netsharedpath)

>The behavior described is not a bug.  The application is behaving as designed.

Then there's still a bug, it just happens to be in the design of the
application, rather than in the code, and it's still just as much in
need of fixing. Can you honestly claim, with a straight face, that the
package manager was designed to become inconsistent with the filesystem
through normal usage, and that if it was, that this represents an acceptable
state of affairs?

>Calling Jeff a "moron" says far more about you than it does about him.
>Largely because most of us know better.

I'd ask you to reread through the comments in this thread, and see which
ones came from me. It should be pointed out that Jeff could have saved
himself a lot of bother by simply explaining 18 months ago what you've
just explained here...

Comment 59 Dzhugashvili 2006-01-24 13:08:33 UTC
> Should RPM also check to make sure you didn't set
> the "immutable" attribute on files it's trying to overwrite?  Should it check to
> make sure you don't have /usr pointing to a RAMdisk and warn you of that too? 
> Next you'll be wanting RPM to warn you before installing programs which don't
> pass a nessus audit, libraries with improper/undefined symbols or versioning,
> scripts with race conditions, documentation with words that are unsafe for
> children, and toilet paper that might cause chafing.

I for one am glad that Linux continues to take a hardline stance against user inadequacy. Any more 
hand-holding and we would be taken over by the desktop hordes. rpm's behaviour is exactly correct: 
the user has issued a command, therefore the OS ploughs on as best as it can. So the mount is read 
only? Who is the OS to question the user! Rollbacks and pre-flights are for the faint-hearted and ill-
prepared. It's the user's job to consider the repercussions *before* issuing commands - ideally by 
reading a few years worth of magazines first in order to find out all the undocumented secrets.

Comment 60 Dmitry Bolkhovityanov 2006-02-09 05:45:25 UTC
The problem still exists in 4.4.2-7, compiled for FC4 from FC5-test.

Just my 2 cents: such behaviour is PLAINLY INTOLERABLE.
All arguments were stated many times above, and it is obvious that SUCH reaction
to a trivial human mistake (in otherwise a perfectly legitimate setup with /usr
usually readonly) is DEFINITELY A BUG.
Whether it is an "implementation bug" or "design bug" -- is another question,
but this is more a question of terminology.
But it should be FIXED ANYWAY.

Comment 61 Dmitry Bolkhovityanov 2006-02-09 05:50:55 UTC
To Sibil Llort and Dzhugashvili (hello, mr. Iosif Stalin! :-): would you be so
kind as to add smileys to your comments, since some people can't grok your light
vein of humour and take your comments seriously.

Comment 62 Dmitry Bolkhovityanov 2006-02-09 05:55:26 UTC
To Tethys: since you are the reporter, don't you think it would be the right thing
to set this bug's priority=high and severity=high? This behaviour of rpm does
severe damage to the system, so such attributes are justified...

Comment 63 Dmitry Bolkhovityanov 2006-02-09 06:54:37 UTC
Since Jeff Johnson isn't even mentioned in "Email sent to:", and "Assigned To"
is Paul Nasrat, I have to conclude that Paul is currently the person responsible
for this bugreport.

Paul, what's the problem with fixing this? While such behaviour may seem correct
from initial design's point of view (that discussion can be infinite), it is
extremely inconvenient for "users", and as such is worth fixing.

As for technical details: 

- As I understand, when detecting that some filesystem operation (file
creation/removal) had failed from deep inside of RPM, it can be 
hard to "fail gracefully", i.e., leave everything "as was" -- since some work
could already have been done (in other dirs, with other packages, etc.).

- So, a pre-check, as some people above suggested, can be done (before actual
actions start!): for each file/dir affected from each package, call
"access(filename,W_OK)", and if it fails -- then print descriptive message
(including strerror(errno)), and exit(1).

(Since speed is not such a critical aspect of RPM operation, a little overhead
due to access() is tolerable, and those files/dirs would be touched by the
subsequent install/remove/upgrade anyway.)

- Since there is no "C_OK" flag (analogue of O_CREAT), and
access("/existent/dir/nonexistent-file",W_OK) returns -1/ENOENT, the check
"access(filename,W_OK)<0" can be extended to
"access(filename,W_OK)<0&&access(dirname(filename),W_OK)<0" for cases when an
ability to create a file should be checked.

- While the above checks cover 99.9...% of situations, cases may exist when
"current" ("dumb"?) behaviour is preferred; for such cases the "--force" flag
can switch off the pre-check.

Is such a modification possible?
It looks low-cost and would cure the problem in 99.9% of cases.
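
A minimal sketch of the pre-check, written in Python for brevity (os.access wraps the same access() call). The names can_write and precheck are made up for this example. It varies the proposal slightly: the parent directory is consulted only when the file does not exist yet.

```python
import os


def can_write(path):
    """Approximate 'could this path be written or created?'.

    For an existing path, test write access to the path itself. For a
    missing path, test write and search access to its parent directory,
    since access() on a nonexistent file fails with ENOENT.
    """
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    parent = os.path.dirname(path) or "."
    return os.access(parent, os.W_OK | os.X_OK)


def precheck(paths):
    """Return the paths that fail the check; empty means go ahead."""
    return [p for p in paths if not can_write(p)]
```

A caller would run precheck over every file in the transaction and exit with a descriptive message if the result is non-empty. Like any check made before the real work, it cannot rule out failures that appear later, such as a volume disappearing mid-transaction.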

Comment 64 Dmitry Bolkhovityanov 2006-02-09 11:37:31 UTC
BTW -- access()-based approach would also take care of other similar situations,
like "immutable" bit set (which is also perfectly legitimate -- rpm should just
fail gracefully).

Comment 65 Jeff Johnson 2006-02-20 22:45:18 UTC
RO mounts detected in rpm-4.4.5.

Comment 66 Tethys 2006-02-20 23:08:55 UTC
Thank you, Jeff. Does this explicitly check for RO mounts, or does it just not
update the RPM database if the files couldn't be written (for whatever reason)?

Comment 67 Rolloffle 2006-05-16 15:51:25 UTC
Wow, Jeff. You stalled passive-aggressively for *23 months*, fighting against
what is no doubt a mere 3 or 4 line patch, to fix a blatant bug in the system.
No wonder Bill Gates is a billionaire; are you autistic or just a cunt?

Comment 68 Michael Jennings (KainX) 2006-05-16 16:49:07 UTC
While I'm certain the chafing can get quite uncomfortable, even the most severe
cases of diaper rash do not justify such verbal tantrums.  I'm sure we can find
some people willing to chip in and buy you some talcum powder.


Comment 69 Serge van den Boom 2006-06-06 11:27:15 UTC
After being repeatedly forsaken by package managers in the past, I've been 
compiling all my software manually for the past 5 years.

As the work involved is getting tiresome, I'm lately becoming more prepared to 
give package managers (yet) another chance.

Now someone happened to mention this bug report (with the remark "tell me this 
is a joke"). I was astounded to see that the maintainer of such a crucial piece 
of software doesn't see the value of error handling and considers leaving the 
system in an inconsistent state to be EVER acceptable, although it did explain 
some past experiences.

I'm relieved to see that after two years, he finally seems to accept the common 
wisdom exhibited in this thread, although it isn't clear from his note here 
whether he just fixed this problem by checking for this specific case, or 
solved the general case of ensuring the system is left in a consistent state 
after an unexpected error has been encountered.

I hope he's learned something from this, but don't think I'll be trying any rpm-
based distro any time soon again.



Comment 70 Dmitry Bolkhovityanov 2006-06-07 03:39:56 UTC
Jeff Johnson wrote 
"RO mounts detected in rpm-4.4.5."

BUT: what is rpm-4.4.5 and WHERE is it?  Or, maybe, a better question is "WHEN?"?

Fedora5 includes rpm-4.4.2, so does the "rawhide" --
ftp://download.fedora.redhat.com/pub/fedora/linux/core/development/source/SRPMS/.
Even RHEL's updates include only 4.3.3.
And RPM.org site doesn't mention any version numbers at all, also looking deserted.

So, where is the currently authoritative RPM site? (Wraptastic.org? When did it
split from RedHat, and what is their current relationship?)
And when will this new RPM version be shipped with RedHat/Fedora?

Thanks in advance to anyone who can shed some light!

Comment 71 Jon Dowland 2006-06-08 09:45:38 UTC
See ftp://jbj.org/pub/rpm-4.4.x/

Comment 72 kushaldas@gmail.com 2006-06-08 16:58:35 UTC
Jeff Johnson wrote 
"RO mounts detected in rpm-4.4.5."

When will we get that in the repo?


Comment 73 Robert Scheck 2006-06-08 17:48:04 UTC
For updating Fedora Core to a current RPM version, have a look at bug #174307.

Comment 74 Ken Snider 2006-06-08 20:27:25 UTC
http://www.kuro5hin.org/story/2006/6/5/101431/9311

If this can be believed, Jeff Johnson lost his job as a result of this ticket.

Comment 75 Michael Jennings (KainX) 2006-06-08 21:01:53 UTC
You'll also want to read the following enlightening documents of equal merit:

http://www.carnicom.com/contrails.htm
http://www.travis-walton.com/
http://www.nardwuar.com/vs/bill_kaysing/index.html


Comment 76 Bennett Mead 2006-06-08 21:49:41 UTC
Yah, totally.  Struck me as a surly guy doing a spin job on the facts.  I assume
Jeff is indeed not working for Red Hat any more?  That kuro5hin post is almost
certainly a third person account mostly devoid of facts...

Comment 77 Michael Jennings (KainX) 2006-06-09 01:00:13 UTC
Jeff does indeed not work for RedHat any more.  And having talked to him myself
on numerous occasions, I have it on good authority that he's not the least bit
upset about it.

If you want more details, ask him yourself, or visit his web site at
http://wraptastic.org/blog/index.php


