From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624

Description of problem:
I have /usr mounted readonly. I forgot to remount it readwrite before doing a yum update, which resulted in the following output:

leto:~# yum update
Gathering header information file(s) from server(s)
Server: Fedora Core 1 - i386 - Base
Server: Fedora Core 1 - i386 - Released Updates
Finding updated packages
Downloading needed headers
getting /var/cache/yum/updates-released/headers/ghostscript-0-7.07-15.2.i386.hdr
ghostscript-0-7.07-15.2.i 100% |=========================| 29 kB 00:00
getting /var/cache/yum/updates-released/headers/xmms-1-1.2.10-1.p.i386.hdr
xmms-1-1.2.10-1.p.i386.hd 100% |=========================| 8.7 kB 00:00
getting /var/cache/yum/updates-released/headers/openssl-devel-0-0.9.7a-33.10.i386.hdr
openssl-devel-0-0.9.7a-33 100% |=========================| 24 kB 00:00
getting /var/cache/yum/updates-released/headers/openssl-0-0.9.7a-33.10.i686.hdr
openssl-0-0.9.7a-33.10.i6 100% |=========================| 8.9 kB 00:00
getting /var/cache/yum/updates-released/headers/openssl096-0-0.9.6-26.i386.hdr
openssl096-0-0.9.6-26.i38 100% |=========================| 4.8 kB 00:00
getting /var/cache/yum/updates-released/headers/xcin-0-2.5.3.pre3-21.fc1.2.i386.hdr
xcin-0-2.5.3.pre3-21.fc1. 100% |=========================| 6.6 kB 00:00
getting /var/cache/yum/updates-released/headers/xmms-devel-1-1.2.10-1.p.i386.hdr
xmms-devel-1-1.2.10-1.p.i 100% |=========================| 5.2 kB 00:00
getting /var/cache/yum/updates-released/headers/ghostscript-devel-0-7.07-15.2.i386.hdr
ghostscript-devel-0-7.07- 100% |=========================| 6.4 kB 00:00
getting /var/cache/yum/updates-released/headers/xmms-skins-1-1.2.10-1.p.i386.hdr
xmms-skins-1-1.2.10-1.p.i 100% |=========================| 5.8 kB 00:00
getting /var/cache/yum/updates-released/headers/openssl096b-0-0.9.6b-18.i386.hdr
openssl096b-0-0.9.6b-18.i 100% |=========================| 5.9 kB 00:00
getting /var/cache/yum/updates-released/headers/openssl-perl-0-0.9.7a-33.10.i386.hdr
openssl-perl-0-0.9.7a-33. 100% |=========================| 6.3 kB 00:00
getting /var/cache/yum/updates-released/headers/hpijs-0-1.5-4.2.i386.hdr
hpijs-0-1.5-4.2.i386.hdr 100% |=========================| 6.5 kB 00:00
Resolving dependencies
Dependencies resolved
I will do the following:
[update: xmms 1:1.2.10-1.p.i386]
[update: openssl 0.9.7a-33.10.i686]
[update: ghostscript 7.07-15.2.i386]
[update: openssl-devel 0.9.7a-33.10.i386]
Is this ok [y/N]: y
Getting xmms-1.2.10-1.p.i386.rpm
xmms-1.2.10-1.p.i386.rpm 100% |=========================| 1.9 MB 00:35
Getting openssl-0.9.7a-33.10.i686.rpm
openssl-0.9.7a-33.10.i686 100% |=========================| 1.1 MB 00:19
Getting ghostscript-7.07-15.2.i386.rpm
ghostscript-7.07-15.2.i38 100% |=========================| 7.5 MB 02:17
Getting openssl-devel-0.9.7a-33.10.i386.rpm
openssl-devel-0.9.7a-33.1 100% |=========================| 1.6 MB 00:30
Running test transaction:
Test transaction complete, Success!
openssl 100 % done 1/8
error: unpacking of archive failed on file /usr/bin/openssl;40637f96: cpio: open
xmms 100 % done 2/8
error: unpacking of archive failed on file /usr/bin/wmxmms;40637f96: cpio: open
ghostscript 100 % done 3/8
error: unpacking of archive failed on file /usr/bin/bdftops;40637f96: cpio: open
openssl-devel 100 % done 4/8
error: unpacking of archive failed on file /usr/include/openssl: cpio: chown
Completing update for xmms - 5/8
Completing update for openssl - 6/8
Completing update for ghostscript - 7/8
Updated: xmms 1:1.2.10-1.p.i386 openssl 0.9.7a-33.10.i686 ghostscript 7.07-15.2.i386 openssl-devel 0.9.7a-33.10.i386
Transaction(s) Complete

Not a problem, I thought -- just remount it readwrite and try again:

leto:~# yum update
Gathering header information file(s) from server(s)
Server: Fedora Core 1 - i386 - Base
Server: Fedora Core 1 - i386 - Released Updates
Finding updated packages
Downloading needed headers
getting /var/cache/yum/updates-released/headers/openssl-0-0.9.7a-33.10.i386.hdr
openssl-0-0.9.7a-33.10.i3 100% |=========================| 8.9 kB 00:00
Resolving dependencies
.Dependencies resolved
I will do the following:
[update: openssl-devel 0.9.7a-33.10.i386]
I will install/upgrade these to satisfy the dependencies:
[deps: openssl 0.9.7a-33.10.i686]
Is this ok [y/N]: y
Running test transaction:
Test transaction complete, Success!
openswarning: /usr/share/ssl/openssl.cnf created as /usr/share/ssl/openssl.cnf.rpmnew
openssl 100 % done 1/3
openssl-devel 100 % done 2/3
Completing update for openssl-devel - 3/3
Updated: openssl-devel 0.9.7a-33.10.i386
Dep Installed: openssl 0.9.7a-33.10.i686
Transaction(s) Complete

That's odd... it seems to have done openssl, but what about ghostscript and xmms?
Somehow it's managed to delete them from the rpm database:

leto:~# rpm -q ghostscript xmms
package ghostscript is not installed
package xmms is not installed
leto:~# ls -l /usr/bin/{xmms,gs}
-rwxr-xr-x 1 root root 3172440 Jan 15 12:29 /usr/bin/gs
-rwxr-xr-x 1 root root 1086504 Oct 22 19:50 /usr/bin/xmms
leto:~# rpm -qf /usr/bin/{xmms,gs}
file /usr/bin/xmms is not owned by any package
file /usr/bin/gs is not owned by any package

Not ideal, so I tried:

leto:~# rpm --rebuilddb
leto:~# rpm -q ghostscript xmms
package ghostscript is not installed
package xmms is not installed

Now what? :-) Not sure if this should be logged against yum or rpm.

Version-Release number of selected component (if applicable):
Unknown ("rpm -q yum" claims it's not installed)

How reproducible:
Didn't try

Additional info:
Why do you expect a package installer to "work" if the majority of paths in packages begin with /usr and /usr is readonly and cannot be written to?!? You can try adding

%_netsharedpath /usr

to /etc/rpm/macros to disable installing files on paths that begin with "/usr/".
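To make the suggestion concrete, here is a simplified Python model of what such a filter does. It assumes the macro value is a colon-separated list of path prefixes and that rpm skips, rather than writes, any payload file under one of them. The function names and the sample payload are hypothetical; this is not rpm's code.

```python
def is_under(path: str, prefix: str) -> bool:
    """True if `path` is `prefix` itself or lies below it (whole path components only)."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def skipped_by_netshared(files: list[str], netsharedpath: str) -> list[str]:
    """Files that a simplified %_netsharedpath filter would leave uninstalled.

    Assumption: the macro value is a colon-separated list of path prefixes.
    """
    prefixes = [p for p in netsharedpath.split(":") if p]
    return [f for f in files if any(is_under(f, p) for p in prefixes)]


# Hypothetical payload loosely based on the openssl update in the report.
payload = ["/usr/bin/openssl", "/usr/share/ssl/openssl.cnf", "/lib/libssl.so.0.9.7a"]
print(skipped_by_netshared(payload, "/usr"))
# ['/usr/bin/openssl', '/usr/share/ssl/openssl.cnf']
```

With `/usr` in the list, nearly every file of a typical Red Hat/Fedora package falls under the filter.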
Please reread the bug report. I don't expect it to work. I do, however, expect it to not corrupt the RPM database in the process. Since having a readonly /usr is a perfectly valid state for the machine, I would expect yum and/or rpm to be able to gracefully handle that situation, and fail in a sane manner.
Please re-read my reply. Did you try %_netsharedpath /usr?
Yes, I read your reply. You suggested that as a way of disabling installing files to /usr. Why on earth would I want to do that? Since RH (and now FC) packages aren't relocatable, that would render my system unusable. What I want is to have RPM fail gracefully when it can't write to a file. As for asking if I've tried it? No. Since I currently have a borked RPM database, I *can't* try it until I reinstall. How you can class behaviour that leaves my system in that state as "NOTABUG" is beyond me. Reopening...
How exactly is rpm to install files onto a RDONLY mount point? %_netsharedpath is the existing mechanism for disabling installs onto RDONLY mount points, which you did not use. That is NOTABUG in my book, ymmv. Your database can be fixed by verifying what you have installed is what you want. You might examine the output of rpm -qa --last to see the order in which pkgs were installed. You might also look at /var/log/rpmpkgs*, which is a cron driven query listing what was in your database at the time the query was run. Any missing packages are easily reinstalled. "corruption" is different than "missing", there are procedures to deal with that as well.
Sigh. I really feel like I'm banging my head against a wall with this one.

>How exactly is rpm to install files onto a RDONLY mount point?

*** IT'S NOT ***

I have never once said I wanted it to. I've said I want it to fail without corrupting the database. That's what this bug is about. The clue is in the title.

>You might also look
>at /var/log/rpmpkgs*, which is a cron driven query
>listing what was in your database at the time the query
>was run.

leto:~# grep xmms /var/log/rpmpkgs*
/var/log/rpmpkgs.2:xmms-1.2.8-3.p.i386.rpm
/var/log/rpmpkgs.3:xmms-1.2.8-3.p.i386.rpm
/var/log/rpmpkgs.4:xmms-1.2.8-3.p.i386.rpm
leto:~# ls -l /var/log/rpmpkgs*
-rw-r--r-- 1 root root 23647 Mar 29 04:04 /var/log/rpmpkgs
-rw-r--r-- 1 root root 22939 Mar 27 04:02 /var/log/rpmpkgs.1
-rw-r--r-- 1 root root 22988 Mar 20 04:02 /var/log/rpmpkgs.2
-rw-r--r-- 1 root root 23256 Mar 13 04:02 /var/log/rpmpkgs.3
-rw-r--r-- 1 root root 23256 Mar 6 04:02 /var/log/rpmpkgs.4

So as you can see, xmms disappeared from the RPM database sometime between March 20th and March 27th. In actual fact, it was on March 25th, as evidenced by the date I logged this bug.

>Any missing packages are easily reinstalled.
>
>"corruption" is different than "missing", there are procedures
>to deal with that as well.

Agreed. But the packages *AREN'T MISSING*. They're still there, and still usable. They're just not in the RPM database. Thus "corruption" rather than "missing". Had the upgrade actually deleted them, I'd agree with you. But it didn't. It just corrupted the RPM database. All of this was clearly shown in my original post.
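The grep above can be generalized to every package at once. Here is a minimal Python sketch. It assumes each /var/log/rpmpkgs* snapshot holds one `name-version-release.arch.rpm` entry per line, which is the format shown in the grep output. It strips version, release and arch to get package names, then reports names that vanish between consecutive snapshots. The function names and sample contents are hypothetical.

```python
def package_name(entry: str) -> str:
    """'xmms-1.2.8-3.p.i386.rpm' -> 'xmms': drop '.rpm', then version and release.arch."""
    stem = entry.strip()
    if stem.endswith(".rpm"):
        stem = stem[: -len(".rpm")]
    return stem.rsplit("-", 2)[0]


def vanished(snapshots: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """For (label, contents) snapshots ordered oldest to newest, list
    (package, last snapshot containing it, first snapshot missing it)."""
    gone = []
    for (old_label, old), (new_label, new) in zip(snapshots, snapshots[1:]):
        old_names = {package_name(line) for line in old.splitlines() if line.strip()}
        new_names = {package_name(line) for line in new.splitlines() if line.strip()}
        gone.extend((name, old_label, new_label) for name in sorted(old_names - new_names))
    return gone


# Hypothetical sample contents, oldest snapshot first.
snapshots = [
    ("rpmpkgs.2", "xmms-1.2.8-3.p.i386.rpm\nopenssl-0.9.7a-33.10.i686.rpm\n"),
    ("rpmpkgs.1", "openssl-0.9.7a-33.10.i686.rpm\n"),
]
print(vanished(snapshots))
# [('xmms', 'rpmpkgs.2', 'rpmpkgs.1')]
```

The sketch compares names rather than full entries, so a package that was merely upgraded between two snapshots is not reported.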
Arguing with me is not gonna fix your rpmdb.
>Arguing with me is not gonna fix your rpmdb.

True. But I thought I might at least be able to get you to acknowledge the problem. Can you even see what I'm trying to get at? RPM has, through normal operation, corrupted its own database. How can that not be a bug? I agree that installing to a readonly filesystem is always going to fail. Like I said, it was accidental. But why do you believe the correct course of action at that point is to corrupt its database?
What's the problem here, Jeff? RPM should be detecting that it can't write to a filesystem and failing gracefully, which it isn't, and a side effect of this behaviour is the corruption of the rpmdb.
I agree, this is clearly a bug. Nobody is asking rpm to do the impossible and magically conjure more space in /usr. What this is all about is that it should notice that something went (badly!) wrong and abort the installation in time, instead of forcibly wrecking through as if all were in order, even though it's clearly not, and blowing things up as a result. All software should fail gracefully, and RPM, being a critical system component, even more so.
You have got to be kidding me. By what rationale is this NOT a bug? Are you saying that, BY DESIGN, it is intended to corrupt the database if a target filesystem is read-only?
Shawn... The two posts prior to yours just said that it is indeed a bug and that jeff johnson is a moron. I will reiterate: This is definitely a bug, and jeff johnson is definitely a moron. Someone needs to mail this jeff person and get it to reopen the damn bug.
just a suggestion - if you want jeff to acknowledge the bug and work on a fix - calling him a moron is not the best way to go about achieving that goal.
This is clearly a bug. If Jeff doesn't know how to fix it, or is unwilling to do so for any reason, then he should get the bug reassigned to somebody else, not close it.
This really, really is a bad bug, and IMHO now that Red Hat is hoppin' on the SELinux train, this becomes even more of an issue. Under no circumstances should the RPM database become corrupted. Here's some pseudocode where I'll try to explain what all the other guys in this bug have already been trying to explain:

---
if (/usr_is_mounted_read_only) {
    die("Hey! Your /usr is mounted read-only. I'm unable to continue so I quit now. See you later!");
} else {
    do_this_installation_stuff();
}
---

What currently happens is something like

---
do_this_installation_stuff();
if (/usr_is_mounted_read_only) {
    corrupt_rpm_database_without_notice();
}
---

Is that REALLY supposed to happen?
Uh. My first example is of course the preferred way to handle the ro-situation, and the latter is how this currently happens.
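The first pseudocode branch can be approximated in Python with `os.statvfs`. On Unix systems its `f_flag` field carries the `os.ST_RDONLY` bit for read-only mounts. This is a minimal sketch. It assumes the installer already knows which directories it will write to. `ensure_writable` and its message are hypothetical and are not part of rpm or yum.

```python
import os
import sys


def is_read_only(path: str) -> bool:
    """True if the filesystem holding `path` is mounted read-only (Unix only)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def ensure_writable(paths: list[str]) -> None:
    """Refuse to start a transaction if any target filesystem is read-only."""
    read_only = [p for p in paths if is_read_only(p)]
    if read_only:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("error: read-only filesystem under " + ", ".join(read_only)
                 + "; remount it read-write and try again")
```

Calling `ensure_writable(["/usr"])` before any package is touched gives the "die early" behaviour. It only covers the read-only case, not full disks or I/O errors.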
Changing component to rpm instead of yum.
What makes Linux such a joy to work with is the hierarchy in responsibility: fix the problems at the origin. For example:

If an application crashes the kernel, the kernel should be fixed.
If an application crashes X11, X11 should be fixed.
If a webpage crashes the browser, the browser should be fixed.
And definitely: if an application corrupts the primary system software database, the package manager needs to be fixed.

A friend showed me this incident, and frankly I'm a little shocked. We had this http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=73097 horror on RH8; let's not repeat it. Everyone I know who ran RH8 at one time or another has had the problem, and it has turned many of them away from RH. RPM should not be able to corrupt the system because it found something slightly unexpected in /usr.
The consensus seems to agree with me that this really is a bug, so reopening...
Package management on RO /usr is not supported by rpm, never has been. I see no other bug here.
I suggest that a new bug be opened that simply defines the exact behaviour. It's quite obviously a bug that the rpm database is corrupted; the reason it's corrupted is the context of the bug, but it is not the bug itself. The bug is that the rpm database is corrupted. The only reason for marking a bug WONTFIX or NOTABUG is that the behaviour (if verified) is by design. And in this case the question is NOT whether rpm is being asked to write to a read-only path, but whether the rpm database may be corrupted. I hardly think this was intended by design.
Yes, %_netsharedpath is the designed behavior for handling RO mount points in rpm. The RO mount is the source of the problem. Configure rpm and use appropriately with RO /usr is the answer. Other problems, like "corrupted" rpmdb, are derivative on the correct use and configuration. Screaming about the symptom when the cure for the cause has been described is, well, pointless. Now will you please leave this bug closed?
So accidentally installing to a read only path is a user problem and corruption of the rpm database is their own fault? I'd have thought that the graceful exit after testing the destination would be to inform the user that the destination is read only. Suppose there was a situation where in copying a file to a read only destination the original file's directory entry was corrupted? This is almost analogous and has similar consequences. The contents can be rebuilt or recovered but the user is hardly likely to expect or know what to do in this situation.
I agree with Simon. "The RO mount is the source of the problem. Configure rpm and use appropriately with RO /usr is the answer." So then shouldn't rpm check for this condition and inform the user of their mistake (i.e. "rpm cannot work properly with a RO mount; please adjust the permissions") instead of messing up the rpm db? Maybe this should be lowered in priority, or changed to an enhancement, but I don't think it's total b.s. either.
As a system administrator I would expect that if the rpm system could not successfully apply a given package (regardless of the reason why) it would not update the database. Reading his bug report leads me to believe it updated the rpm database before the package was successfully applied (which is why it doesn't rerun the packages for GhostScript & xmms). In my mind it is more of a process issue and error handling within the rpm system. I'm not reopening this...but I think it is a valid bug (either with yum or rpm).
>Configure rpm and use appropriately with RO /usr is the answer.

No, Jeff, that's not the answer. If you believe it is, then you either didn't understand the problem correctly, or have just failed to explain your solution well enough (I concede that the latter is a possibility, albeit a slim one in my eyes).

>Screaming about the symptom when the cure for the cause has
>been described is, well, pointless.
>
>Now will you please leave this bug closed?

No, I won't leave it closed until I can see that you've understood the problem. If you can show that you have, and still believe it's not a bug, then yes, I'll (reluctantly) leave it closed. You haven't yet described your cure for the problem. If you do, I'll go away a happy man. So, once more...

*** I DON'T EXPECT RPM TO WORK WHILE MY /usr IS MOUNTED READ ONLY ***

With that out of the way, how exactly will %_netsharedpath help here? From your description, it will prevent me from installing anything in /usr. But I *want* to install stuff there. In fact RH/FC won't let me install anywhere else, so I *need* to install stuff there:

leto:~% rpm -qa | wc -l
1630
leto:~% for i in $(rpm -qa); do rpm -qi "$i" | head -1 | egrep 'not relocatable'; done | wc -l
1499

Of the packages installed on this FC2 system, only about 8% (131 of 1630) are relocatable at all. So, how will %_netsharedpath help me install updates on my machine? Since you believe it to be a cure for the problem, how should I use it? My current approach is:

1. mount -oremount,rw /usr
2. Install/upgrade any packages that I need
3. mount -oremount,ro /usr

The bug was triggered by me forgetting to do step 1 first. However, by using %_netsharedpath, step 2 will fail. How am I meant to keep my system up to date using %_netsharedpath? Or are you just claiming that a readonly /usr is not a valid configuration for a Red Hat or Fedora machine and that I shouldn't do it?
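The three-step routine above can be wrapped so that step 3 runs even if step 2 fails. Here is a minimal Python sketch. It assumes the script runs as root and that `yum update` is the step-2 command. The `run` parameter is a hypothetical hook that exists only so the sequence can be exercised without touching real mounts. The wrapper guards against the forgotten remount; it does not change how rpm itself reacts to a failed write.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def check_call(cmd: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit status."""
    subprocess.run(list(cmd), check=True)


def update_with_remount(mountpoint: str = "/usr",
                        update_cmd: Sequence[str] = ("yum", "update"),
                        run: Runner = check_call) -> None:
    """Remount read-write, run the update, and always remount read-only afterwards."""
    run(["mount", "-o", "remount,rw", mountpoint])      # step 1
    try:
        run(list(update_cmd))                           # step 2
    finally:
        run(["mount", "-o", "remount,ro", mountpoint])  # step 3, even if step 2 failed
```

Because step 1 raises if the remount fails, the update never starts against a /usr that is still read-only.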
Sigh. Look, if you want to keep closing this, then there's nothing I can do to stop you. But at least have the decency to explain your rationale. You haven't answered any of my questions, and you haven't explained what I'm meant to do to avoid further corrupting my RPM database. *Please* explain what I'm meant to do. If you believe I should be using %_netsharedpath, then please answer my questions in comment 26.
Nice to see a reasoned response there, Jeff. Sigh. Fine. If you insist on acting like a moron, there's nothing more I can do, and this will stay closed. But it's still a bug, and it's still not fixed.
Reading through this, it seems obvious that the solution to the real problem has not been identified; that being how to fix the corrupted rpm database. Jeff has identified how to avoid corrupting it in the future (and really a script could be written easily enough that would do the remounting of /usr for you), but as to the solution to fixing the rpm database after it has already been corrupted he has given no real insight at all... or did I miss that? Jeff, if rpm is designed to corrupt its own database as punishment for use under improper configuration conditions, what is the solution to fixing that rpm database once this state has been reached?
You are correct that no one in this bug has cared to ask "How do I fix a corrupted rpm database?" Presumably, that's because everyone is more inclined to call me a "moron" and/or ask leading questions with "punishment". There are gazillions of bug reports describing how to repair a database, and I'll be quite happy to walk anyone through the rather simple procedure. Open a different bug, please. This bug is not the place to fix anything.
That's because the bug wasn't concerned with fixing a corrupted RPM database. It was concerned with preventing RPM from corrupting its own database in the course of normal operation. But you seem utterly blind to that fact. I didn't call you a moron, believing you simply didn't understand the problem. Right up until the point where you simply refused to listen to reason.
I have to agree with Jeff Johnson here. Linux is a system for experts only. If you make a mistake you deserve to have your database corrupted. I think that, in the future, if a user makes spelling errors, typos, or any other kind of mistake, we should just go ahead and corrupt the RPM database for good measure. Also if a user visits AOL.COM we should corrupt their filesystem. Developers like Jeff are what make Linux what it is: mightier than Microsoft. Leave this "bug" closed!
Look guys, maybe it will help if we re-phrase the problem this way:

1. What is the expected behaviour of rpm when it encounters a problem that prevents it from continuing?
2. Does "/usr is readonly" fall under the above situation?

I understand that there are many cases to take care of in a complex program. None of us are saying that rpm should work with a read-only /usr. What we are talking about is the failure mode of the software. I think we can all agree that if a program detects (or CAN detect) an improper configuration or input, it should give an error message and exit.

Adam
Sibil, I have to disagree. While the update should obviously fail with a readonly /usr, it should fail gracefully, not corrupt its own database! Especially in a critical system program like this, such failure modes are not acceptable for a production system. If you're going to take an "experts only" stance, you might as well be using Gentoo, where at least they seem to care that the system WORKS. While this corruption is obviously caused by "user error", the software should be resilient to such errors, as users WILL make mistakes. Call the user a "moron" in the error if you'd like, but there should be a sanity check here, as it's obviously needed. However, there should be no need for a special check for a readonly filesystem -- normal error checking should catch the failure to write the data to the filesystem in the first place. Does this mean that rpm ISN'T checking for errors while installing packages?!? That's truly disturbing. What if the filesystem is full? What if an I/O error occurs? Blindly assuming that every operation works is incredibly dangerous, and inexcusable for rpm in particular. Between this and the refusal to consider the Windows XP dual-booting problem as a "stop-ship" bug for Fedora Core 2, I'm becoming increasingly convinced that the Fedora Core approach to replacing the Red Hat desktop distributions is a dismal failure, and the FC releases cannot be trusted for production use...
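The point that ordinary error checking should catch a failed write, without any special read-only test, can be shown with a small Python sketch. The helper `install_file` is hypothetical, not rpm's code. It writes to a temporary name in the target directory and renames the file into place. Any OSError, whether EROFS on a read-only mount, ENOSPC on a full disk, or EIO, propagates to the caller, which can then stop before recording anything in a package database. The `;40637f96`-suffixed names in the original report's errors suggest rpm also unpacks to temporary names first, but that is only an inference from the log.

```python
import os
import tempfile


def install_file(dest: str, data: bytes, mode: int = 0o644) -> None:
    """Write `data` to `dest` via a temporary file and an atomic rename.

    Nothing is caught for the caller: EROFS (read-only mount), ENOSPC (disk full),
    EIO and the like all surface as OSError, so the caller can abort the transaction.
    """
    directory = os.path.dirname(dest) or "."
    # On a read-only mount this already fails with OSError(EROFS).
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".install-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, mode)
        os.replace(tmp, dest)  # atomic when tmp and dest share a filesystem
    except BaseException:
        # Remove the partial temporary file, then re-raise the original error.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

A package installer built on a helper like this would gather the first error and abort the transaction, rather than report it and carry on.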
since Jeff doesn't want to fix this, I have. Patches available at http://dc.h4xx.com
Warning, don't click on the link in comment 35. Someone's idea of a joke, certainly not an rpm patch. You've been warned.
Section 3.2 of the Red Hat Enterprise Linux Reference Guide: http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/ref-guide/s1-filesystem-fhs.html "Red Hat Enterprise Linux uses the Filesystem Hierarchy Standard (FHS) file system structure, which defines the names, locations, and permissions for many file types and directories. The FHS document is the authoritative reference to any FHS-compliant file system, but the standard leaves many areas undefined or extensible. This section is an overview of the standard and a description of the parts of the file system not covered by the standard. Compliance with the standard means many things, but the two most important are compatibility with other compliant systems and the ability to mount a /usr/ partition as read-only. This second point is important because the directory contains common executables and should not be changed by users. Also, since the /usr/ directory is mounted as read-only, it can be mounted from the CD-ROM or from another machine via a read-only NFS mount."
This is obviously a bug in rpm/yum. It should be doing such updates as a transaction, or at least fail gracefully.
I agree... it's one if-statement, I don't know what the big deal is.
So I guess we just need to refile this bug and get someone else to fix it for us.
Am I reading this correctly, that the problem is related to the rpm db being "out of sync", because the db was mounted read-only when the install was performed, and therefore not present in the db at this point?
Jeff, what's the deal here? This bug has been in limbo for almost 6 months. Do you have something against Red Hat/Fedora? By refusing to transform RPM into a robust/mature product you are damaging the credibility of Linux. Are we arguing over the description of the bug? Over sysadmin responsibility? Error checking is good. If you know that something will damage the system, prevent the system from doing that action before the damage is done. Easy as that. A sysadmin would switch editors in a moment if half his script disappeared when he hit just one wrong key. People expect a mature program like YUM or RPM to prevent whatever reasonable mistakes a user can throw at it. **Any action that results in a corrupt database is a bug!** What's wrong with that statement? Is there a single command with no other user input that can be run to fix the database, so that you can brush it off as trivial? Sysadmins are busy people; every time you have a bug that takes 15 minutes, or even just one minute, to clean up after, you're costing companies money. At least keep it around as a bug someone else can fix. Where are you coming from on this? Linux is on the path towards greater user acceptance, so why stand up for things that will be considered reasons not to go with Red Hat/Fedora/anything that uses RPM and YUM? Package adding/updating is so fundamental that it should be more bug-free than mere applications.
Granted, most software problems lie between the headphones... this is not one of them. The user made a mistake that's easy to protect against. If modifying rpm to fix this issue would harm it in some other way, then state that. Or state WHY this behavior is NORMAL to the operation of rpm. Otherwise, fix the fucking problem. It ain't the user's fault that the db got corrupted, and it shouldn't be a complicated problem to correct.
I think Tethys or someone else with good writing skills should mail some Red Hat manager and suggest that Mr. Johnson be fired immediately for refusing to fix a clearly identified problem in one of the company's core products.
I find it difficult to believe that this silly argument over an edge-case bug, which only arises through a PEBKAC condition, is still going on. I also find the comments Max made in #44 above to be reprehensible and beneath contempt. Beyond that, it's clear that since it's yum we're discussing, this bug is not a serious issue for anyone except the victim with the obvious PEBKAC problem. It is clearly an edge case now being used to bludgeon Jeff. Get off it... grow up. If I were Jeff I'd quit before I fixed this now! You don't solve problems or interact well with others via threats and abusive language. It only makes people mad. NB. PEBKAC == Problem Exists Between Keyboard And Chair
It seems my #130260 is actually a duplicate of this bug. If something related to rpm (rpm? something in librpm?) exits unexpectedly, it can leave the RPM database in an inconsistent state. The workaround I used for #130260 was to find files in /usr/bin and /usr/lib that were not symlinks and that rpm -qf did not believe were part of any package, then find which package the files actually belonged to on another system (or perhaps using rpm --redhatprovides), and manually rpm -Uvh the .rpm. My specific situation was the loss of the .rpm repository location (an NFS volume) during a yum upgrade, but it seems to be a general flaw in rpm's handling of unexpected failures. I believe this bug should remain open until such time as the flaw in rpm has been identified and corrected. This is a robustness issue.
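The first half of that workaround can be scripted. The task is to find regular files under /usr/bin and /usr/lib that `rpm -qf` reports as "not owned by any package", which is the message shown in the original report. Here is a minimal Python sketch. The `is_owned` parameter is a hypothetical hook so the filtering can be checked without rpm. The sketch runs one `rpm -qf` per file, so it is slow on large trees.

```python
import os
import subprocess
from typing import Callable, Iterable, Iterator


def rpm_owns(path: str) -> bool:
    """Ask rpm whether any installed package claims `path`."""
    result = subprocess.run(["rpm", "-qf", path], capture_output=True, text=True)
    return "is not owned by any package" not in result.stdout + result.stderr


def unowned_files(dirs: Iterable[str],
                  is_owned: Callable[[str], bool] = rpm_owns) -> Iterator[str]:
    """Yield regular (non-symlink) files under `dirs` that no package claims."""
    for top in dirs:
        for root, _subdirs, names in os.walk(top):
            for name in sorted(names):
                path = os.path.join(root, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                if not is_owned(path):
                    yield path
```

Typical use would be `for path in unowned_files(["/usr/bin", "/usr/lib"]): print(path)`. Matching each file back to its package still has to be done by hand, as described above.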
@Deven T. Corzine I'm with sybyl on this one. One of the big problems with Windows is the idiotic users blindly installing malware left right and centre. If Linux holds their hands when they make a mistake then they will never learn. Much better to show them the error of their ways - an afternoon spent fixing the rpm database will drum it into their heads that installing to a RO directory is wrong! By making the users think about their actions they will not make the same mistake twice - this will free up valuable developer time. Jeff must have better things to do, such as all the urgent patches to TuxRacer, than tell users not to install to RO locations!
This has nothing to do with idiotic users blindly installing malware. Stupid or lazy users are not the ones who have to worry about this bug, since they would never have /usr readonly in the first place. The people most likely to have /usr readonly are conscientious, competent (but perhaps harried) system administrators who have to worry that one tiny mistake (forgetting to remount /usr read-write) will trash the RPM database. ANY professional system administrator will tell you that this failure mode in the core package management software is absolutely inexcusable and reason enough to consider other Linux distributions. Anyone who says "just don't make mistakes" is out of touch with the real world. Even supremely competent experts make mistakes and typos on occasion. Nobody's perfect. But nobody should have to worry that a trivial user error will cause such drastic system corruption. The competent system administrator who forgets to remount /usr expects an error to occur. Within seconds, they'll probably realize that they forgot to remount /usr and do that, then try again. This should work. But with this bug, that minor error corrupts the system and leaves the poor administrator with significant recovery work to do because of a trivial mistake. Draconian punishment for trivial mistakes may work in a totalitarian state, but in the real world, system administrators who get burned by this bug will be looking for alternative Linux distributions in short order. We're not talking about handholding here. Handholding would be to offer to remount /usr read-write for the user (since they forgot) -- but nobody is suggesting anything of the sort. We're talking about robust software design. No user action (no matter how wrong) should cause severe internal corruption like this. The software should fail gracefully, preferably with an informative error message, but at least without self-immolation. 
Of all the bugs that a developer should fix, a bug like this should be the TOP priority, not written off as "not a bug". You're right about one thing, though. Spending an afternoon repairing the RPM database because of a trivial mistake will teach them "the error of their ways" -- but not in the way you think. It will teach them that they were foolish to trust this software and (by extension) any Linux distribution which uses it. This is a very serious bug, yet some people just don't get it.
Deven Corzine, you are WRONG! Expert system administrators, people like me who use Linux, never ever make mistakes. Quite simply, we are perfect - this is why we use Linux, which is also perfect. There are some who say "but making /usr readonly is in the Red Hat official documentation! It is documented standard procedure!" Well, this is true. But that is not the point - the point is, ONLY IDIOTS READ THE MANUAL. Sure, you read the manual, you do what it says, it corrupts your RPM database. Fine. You're not perfect, you're not an expert, you don't DESERVE linux. Now if you'll excuse me, Jeff Johnson and I need to go patch TuxRacer, it's still way better than Counter Strike but I think more can be done.
Ummmm... Like hella LOL dude... I'll P$WN you're a$$ckor's at TuxRacer yo' and you'll never even get a chance to counter$trike my axxors, $0 pwned will you be.... 101 @ Joo 4 ever
If you're going to make this CLOSED->WONTFIX, then I'll keep opening it until you give some justification for that decision.
Well well... hiding behind that mac.com address it seems we have one Jeff Johnson. Taking this personally are you Jeff? Well, it's still a bug, it's still not fixed, and so CLOSED->WONTFIX is inappropriate. Reopening.
What everyone seems to be missing here, and that Jeff is getting very sick of repeating, is that RPM uses a "best effort" algorithm when doing upgrades. The RPM database was not corrupted; believe me, I can show you what a corrupted RPM database looks like. What actually happened here is that the previous versions were removed as part of the upgrade process, and the newer versions were installed. At least as much as possible. :-) As many files as could be removed from the old install were, and that install was removed from the RPM DB. As much of the new stuff as could be installed was, but the install of some items failed, so the new packages were not added to the RPM DB. While this may not be what *you* think should've happened, it is consistent with RPM's upgrade algorithm and does NOT result in a corrupted database in any way, shape, or form. What is really being asked for is different (not "better," just "different") handling of failures during upgrades. For this, you want to look at the auto-rollback features RPM has. You'll also want to look at RPM's ability to repackage previously-installed versions during package upgrades, and how to permit yum to activate either or both of these features.
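Here is a toy Python model of the best-effort sequence described above. It assumes, as the comment states, that the old version is erased from the database once removal of its files has been attempted, and that the new version is recorded only if its files unpack. The dictionaries standing in for the database and the filesystem are hypothetical; this is not rpm's implementation. On a read-only /usr it reproduces the state seen in the original report: the package is absent from the database while its old files remain on disk.

```python
def best_effort_upgrade(db: dict[str, str], disk: dict[str, str], writable: bool,
                        name: str, new_version: str, new_files: list[str]) -> None:
    """Simplified best-effort upgrade. `db` maps package -> installed version and
    `disk` maps file path -> package that installed it. Errors are not fatal."""
    # 1. Remove the old version's files; on a read-only mount every removal
    #    fails and the failure is ignored.
    if writable:
        for path in [p for p, owner in disk.items() if owner == name]:
            del disk[path]
    # 2. The old version's entry is erased from the database regardless.
    db.pop(name, None)
    # 3. Unpack the new files; only if that works is the new version recorded.
    if writable:
        for path in new_files:
            disk[path] = name
        db[name] = new_version


db = {"xmms": "1.2.8-3.p"}
disk = {"/usr/bin/xmms": "xmms"}
best_effort_upgrade(db, disk, writable=False, name="xmms",
                    new_version="1:1.2.10-1.p", new_files=["/usr/bin/xmms"])
print(db)    # {}  (so "rpm -q xmms" reports it as not installed)
print(disk)  # {'/usr/bin/xmms': 'xmms'}  (yet the old binary is still on disk)
```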
And I don't get why on earth rpm can't, or shouldn't, check the overall mount point state before it attempts to do ANYTHING. If any of the required mount points were found mounted in read-only mode, rpm should abort and at least ask the sysadmin what to do next. Let's try this one more time. THIS is the behaviour I kindly ask rpm to have:

1) rpm update/installation/other modification starts
2) rpm checks if any of the mount points is mounted read-only
3) should something be mounted read-only, rpm should throw out "DANGER, DANGER, WILL ROBINSON! Your mount point <x> is mounted read-only! Are you STILL sure you want to continue? Press Yes and I shall not take any responsibility for the consequences. Press No and I shall abort. What is your command, my master?"
4) if the user still decides to continue... well, then just do it, whatever that means.
5) if the user chooses wisely and decides to abort, then abort.
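Step 2 of the proposal above can be sketched in C with the POSIX statvfs() call, whose f_flag field carries ST_RDONLY for read-only mounts. This is a minimal illustration, not rpm code: both function names are hypothetical, and the list of paths to probe is assumed to be supplied by the caller (a real check would have to derive it from the directories the transaction touches).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>

/* Hypothetical helper: 1 if the filesystem holding `path` is mounted
 * read-only, 0 if it is not, -1 if statvfs() failed (errno is set). */
int is_readonly_mount(const char *path)
{
    struct statvfs sv;
    if (statvfs(path, &sv) != 0)
        return -1;
    return (sv.f_flag & ST_RDONLY) ? 1 : 0;
}

/* Hypothetical pre-flight pass: warn about each read-only location and
 * return how many were found, so the caller can ask "continue? [y/N]"
 * before any package is modified. */
int count_readonly_mounts(const char *const *paths, size_t n)
{
    int found = 0;
    for (size_t i = 0; i < n; i++) {
        int ro = is_readonly_mount(paths[i]);
        if (ro == 1) {
            fprintf(stderr, "warning: %s is on a read-only filesystem\n",
                    paths[i]);
            found++;
        } else if (ro < 0) {
            fprintf(stderr, "warning: cannot check %s: %s\n",
                    paths[i], strerror(errno));
        }
    }
    return found;
}
```

A caller would run this over, say, `/usr` and `/var` before starting the transaction and prompt only when the count is non-zero, which corresponds to the abort-or-continue choice in steps 3 to 5.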
At last, a reasoned, sane response. A response that's wrong on some details, but at least someone's making the attempt to talk about the problem. Thank you. The RPM database *was* corrupted. Not to the point of unreadability, to be sure, but since it didn't represent the state of the filesystem, I'd call that corruption. A file was installed as part of an RPM package. After the events described here, RPM is unaware that those files exist, and claims they're not associated with any package. Well, how else did they get there, and what would you call that if not db corruption? The files associated with the previous version weren't removed from the system, as I showed in the initial bug report. Now perhaps the previous version was removed from the RPM database before installation of the new version was attempted. In which case, great. We've found the bug. It should be trivial to alter the state of the database only after the transaction is complete, rather than do it midway through and risk the transaction failing later on. So RPM has auto-rollback features? Sounds great. If the transaction fails, roll back to a previously known good state. But that's not what happened here. Hence it's still a bug, whichever way you look at it. Plus, rpm(8) makes no mention of these features, or of whether they need to be explicitly enabled or are enabled by default. Oh, and having reread the comments, I haven't seen Jeff mention once that RPM uses a best effort algorithm for upgrades. All he's said is that I should be telling it my /usr is mounted readonly and not to install anything there (and thus completely missing the point of this bug).
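The disagreement in the comments above is about when the database gets updated: "best effort" versus altering it only after the whole transaction succeeds. Below is a deliberately tiny toy model of that difference, not rpm's implementation. It follows the thread's own description (the new package's header is added only after its files are written; the old package's header is dropped even if its files stay behind) and contrasts it with the commit-on-success behaviour proposed above. The `pkgdb` struct and both functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the package database during one upgrade:
 * which of the two versions currently has a header recorded. */
struct pkgdb {
    bool old_recorded;
    bool new_recorded;
};

/* Best effort, as described in the comments: every step does what it can.
 * Install element: files are written (PROCESS), then the header is added
 * (POST); if the write fails, POST is never reached.
 * Erase element: file removal errors are tolerated, but the old header is
 * removed regardless. */
void upgrade_best_effort(struct pkgdb *db, bool files_writable)
{
    if (files_writable)
        db->new_recorded = true;
    db->old_recorded = false;
}

/* Commit-on-success, as proposed above: touch the database only after
 * every file operation has succeeded; otherwise leave it describing the
 * old version. Returns false if nothing was committed. */
bool upgrade_all_or_nothing(struct pkgdb *db, bool files_writable)
{
    if (!files_writable)
        return false;
    db->new_recorded = true;
    db->old_recorded = false;
    return true;
}
```

With a read-only /usr, the best-effort path ends with neither version recorded, which is the state reported in this bug, while the commit-on-success path leaves the old header in place. Note that the latter only keeps the *database* consistent; undoing file changes that were already made is the harder rollback problem discussed in this thread.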
> The RPM database *was* corrupted. Not to the point of unreadability, to be
> sure, but since it didn't represent the state of the filesystem, then I'd
> call that corruption. A file was installed as part of an RPM package. After
> the events described here, RPM is unaware that those files exist, and claims
> they're not associated with any package. Well how else did they get there,
> and what would you call that if not db corruption?

The rpm database was not corrupted (which would mean that its data structures were garbled), but it was in an inconsistent state. The state it was in is quite explainable: the error happened in the "PROCESS" stage of the PSM (Package State Machine), and in particular in the call to the FSM that delivers the files. This is before the POST stage of the PSM, where the header is loaded into the database (or erased, if it's an erase element being run through the PSM).

> So RPM has auto-rollback features? Sounds great. If the transaction fails, roll
> back to a previously known good state. But that's not what happened here.
> Hence it's still a bug, whichever way you look at it. Plus, rpm(8) makes no
> mention of these features, and whether or not they need to be explicitly called
> or if they're enabled by default.

It didn't do that because you did not configure rpm to do that. It's not documented because the feature is experimental. It is, however, used at some production sites, since some of us are required to provide a rollback capability for our upgrades.

Concerning the "best effort" policy of rpm, it has been this way since its creation AFAIK. In some environments it's not acceptable; in others it is the sanest policy to use.

Concerning the real problem at hand, it would be very nice if, when rpm detects read-only filesystems that are not in the %_netsharedpath macro, it failed early, before running any package through the package state machine, because it can (there are issues, but bottom line it is very, very possible).
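The early failure suggested above needs one extra rule: read-only locations covered by %_netsharedpath are expected and should not abort the run. A minimal C sketch of that rule, assuming the macro has already been expanded to its usual colon-separated list of path prefixes; both function names are hypothetical, and the whole-component prefix match is an assumption about the intended semantics rather than a copy of rpm's code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does `path` fall under one of the colon-separated
 * prefixes in `netshared` (e.g. an expanded %_netsharedpath value)?
 * A prefix matches only on a whole path component, so "/usr/share"
 * covers "/usr/share/doc" but not "/usr/shareware". */
bool path_in_netshared(const char *path, const char *netshared)
{
    const char *p = netshared;
    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        /* Ignore trailing slashes such as "/mnt/nfs/", but keep "/". */
        while (len > 1 && p[len - 1] == '/')
            len--;
        if (len > 0 && strncmp(path, p, len) == 0 &&
            (path[len] == '/' || path[len] == '\0' || p[len - 1] == '/'))
            return true;
        p = end ? end + 1 : NULL;
    }
    return false;
}

/* Fail early only for read-only locations that are not net-shared:
 * files under a net-shared prefix are skipped anyway, so a read-only
 * mount there is expected. */
bool should_abort_early(const char *path, bool readonly, const char *netshared)
{
    return readonly && !path_in_netshared(path, netshared);
}
```

For example, with `/usr/share/doc` net-shared, a read-only `/usr/share/doc/xmms` would be skipped quietly, while a read-only `/usr/bin/xmms` would stop the transaction before the package state machine runs.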
That said, expressing anger and directing it at others in public will not encourage anyone to work on this problem for you.
> And I don't get, why on earth rpm could and should not check the overall mount
> point state before it attempts to do ANYTHING. If any of the required mount
> points were found mounted in read-only mode, rpm should abort and at least ask
> from the sysadmin what to do next.

"Best effort" is the key portion of my message that you missed. And to answer your initial question, "checking the overall mountpoint state" is not the job of RPM. Package management does not include compensating for every stupid mistake an administrator can make. Should RPM also check to make sure you didn't set the "immutable" attribute on files it's trying to overwrite? Should it check to make sure you don't have /usr pointing to a RAMdisk and warn you of that too? Next you'll be wanting RPM to warn you before installing programs which don't pass a nessus audit, libraries with improper/undefined symbols or versioning, scripts with race conditions, documentation with words that are unsafe for children, and toilet paper that might cause chafing. Clearly you want a package manager which exceeds what RPM was/is designed and intended to do. You should either find one or write one yourself.

> At last, a reasoned, sane response. A response that's wrong on some details,
> but at least someone's making the attempt to talk about the problem. Thank you.

You're welcome. Though your disagreeing with certain implementation/design decisions of RPM does not make me "wrong on some details." :-)

> The RPM database *was* corrupted. Not to the point of unreadability, to be
> sure, but since it didn't represent the state of the filesystem, then I'd
> call that corruption.

You'd be incorrect. The RPM DB isn't designed to represent the state of the filesystem. It's designed to represent the state of the package manager's knowledge of packages installed. And it was correct in that regard.

> A file was installed as part of an RPM package. After
> the events described here, RPM is unaware that those files exist, and claims
> they're not associated with any package. Well how else did they get there,
> and what would you call that if not db corruption?

Your /usr was read-only. The files in /usr/bin were not installed by the packages you were trying to install. They were from the previous packages, which were uninstalled. The files themselves couldn't be removed because of the read-only filesystem, but the package was uninstalled. Remember, best effort means "do as much as we can." We could update the database, so we did. We could not remove the files, so we didn't.

> The files associated with the previous version weren't removed from the system,
> as I showed in the initial bug report. Now perhaps the previous version was
> removed from the RPM database before the new version was attempted to be
> installed. In which case, great. We've found the bug. It should be trivial to
> only alter the state of the database after the transaction is complete, rather
> than do it mid way through and risk the transaction failing later on.

Again, you're missing the "best effort" part. What you want is transaction rollbacks, and as I said, those can be found elsewhere.

> So RPM has auto-rollback features? Sounds great. If the transaction fails, roll
> back to a previously known good state. But that's not what happened here.
> Hence it's still a bug, whichever way you look at it. Plus, rpm(8) makes no
> mention of these features, and whether or not they need to be explicitly called
> or if they're enabled by default.

No, it's not a bug. If there's a "bug" anywhere, it's in your lack of knowledge of the current RPM algorithms, the alternatives available, and the details of how to activate them; and in the documentation, which fails to mention these things. RPM is behaving as designed and implemented. That is, by definition, not a bug.
> Oh, and having reread the comments, I haven't seen Jeff mention once that
> RPM uses a best effort algorithm for upgrades. All he's said is that I should
> be telling it my /usr is mounted readonly and not to install anything there
> (and thus completely missing the point of this bug).

Jeff is one of the most skilled and intelligent programmers (and thinkers) I know. He can think through the repercussions and ramifications of things almost instantly. He didn't miss anything; you simply don't know him well enough to grasp his subtle sense of humor and his succinct style.

You see, users often use Bugzilla and other web-based bug trackers as a way to label behavior they don't understand, or that doesn't match what they expected to occur, as a Bug, without first asking the more important question: Why did this happen? They also use it as a way to avoid taking the time to do due-diligence research to find the answer to that question, like searching Google or mailing list archives. Had you done this, you'd have almost certainly stumbled upon at least one of the following links:

http://www.linuxjournal.com/article/7034
http://www.redhat.com/archives/rpm-list/2003-January/msg00342.html
https://lists.dulug.duke.edu/pipermail/rpm-devel/2006-January/000692.html

Or even if you had not, a post to the mailing list would've resolved the issue much more quickly, as someone like myself or James would have explained to you why your expectations did not match reality, why the idea of a "rollback" is a far more complex notion than you have imagined, what (if any) rollback features exist, what their capabilities and flaws are, etc.

Having explained the situation to the best of my ability, and having provided links for further research for anyone not satisfied with my responses, I will make none further.
You are more than welcome to continue to disagree regarding what the correct behavior should be, but as Jeff is the RPM author and has made his decision, you're not likely to make much progress in that regard. And continuing to use Bugzilla as leverage for your disagreement is simply childish.

To summarize:
1. The behavior described is not a bug. The application is behaving as designed.
2. Requests for additional pre-install checks (like filesystems being read-only) should be filed independently as RFEs and will be given attention directly proportional to their perceived sanity.
3. Your options with respect to transactions and rollbacks thereof are described in great detail here: http://www.linuxjournal.com/article/7034
4. Calling Jeff a "moron" says far more about you than it does about him. Largely because most of us know better.
Fair enough. I think we'll have to agree to disagree over terminology here, particularly regarding "corruption" and "best effort". I don't believe I'm asking more of a package manager than RPM was (or at least should have been) designed to do. I merely expect it to keep track of the packages that are currently installed on my system, and by extension, the files associated with those packages. To be honest, although rollbacks are a nice feature, I don't even think they're necessary. The ability to roll back is essential for many environments, but it could equally well be achieved outside of RPM (for example in yum).

>And to answer your initial question, "checking the overall mountpoint
>state" is not the job of RPM. Package management does not include
>compensating for every stupid mistake an administrator can make. Should
>RPM also check to make sure you didn't set the "immutable" attribute on
>files it's trying to overwrite? Should it check to make sure you don't
>have /usr pointing to a RAMdisk and warn you of that too?

I actually agree with you here. RPM *shouldn't* be checking whether the filesystem is writable, or that files it's trying to write aren't immutable. What I believe it should do is behave more sanely in the event that a write to a file fails (which can, after all, happen for any number of reasons).

>You'd be incorrect. The RPM DB isn't designed to represent the state of the
>filesystem. It's designed to represent the state of the package manager's
>knowledge of packages installed. And it was correct in that regard.

Hmmm. I'll have to disagree there. The presence of a package and the presence of its associated files in the filesystem are inextricably linked IMHO. You can't reasonably claim that a package isn't installed if the files that it is supposed to install are present on the system. True, you can make that claim from the perspective of the package manager, but that's just playing with semantics to get the result you're looking for.
By any sane definition, the package *is* installed, and the end user is unable to tell the difference.

>If there's a "bug" anywhere, it's in your lack of knowledge of the current
>RPM algorithms, the alternatives available, and the details of how to
>activate them

Remember that I didn't actually invoke the rpm command at any point here. All I did was "yum update". Indeed, it was initially filed as a yum bug. Perhaps if yum had used the --repackage and --rollback options to rpm, then this problem would never have shown itself.

>Jeff [...] didn't miss anything; you simply don't know him well enough to
>grasp his subtle sense of humor and his succinct style.

Correct. In fact, I don't know him at all. All I get to see of him is his comments here. Even after rereading all of the comments in this bug, I fail to see any indication that he grasped the problem I was describing, and plenty of evidence to suggest otherwise (e.g., his suggestion to use %_netsharedpath).

>The behavior described is not a bug. The application is behaving as designed.

Then there's still a bug; it just happens to be in the design of the application rather than in the code, and it's still just as much in need of fixing. Can you honestly claim, with a straight face, that the package manager was designed to become inconsistent with the filesystem through normal usage, and that, if it was, this represents an acceptable state of affairs?

>Calling Jeff a "moron" says far more about you than it does about him.
>Largely because most of us know better.

I'd ask you to reread the comments in this thread, and see which ones came from me. It should be pointed out that Jeff could have saved himself a lot of bother by simply explaining 18 months ago what you've just explained here...
> Should RPM also check to make sure you didn't set
> the "immutable" attribute on files it's trying to overwrite? Should it check to
> make sure you don't have /usr pointing to a RAMdisk and warn you of that too?
> Next you'll be wanting RPM to warn you before installing programs which don't
> pass a nessus audit, libraries with improper/undefined symbols or versioning,
> scripts with race conditions, documentation with words that are unsafe for
> children, and toilet paper that might cause chafing.

I for one am glad that Linux continues to take a hardline stance against user inadequacy. Any more hand-holding and we would be taken over by the desktop hordes. rpm's behaviour is exactly correct: the user has issued a command, therefore the OS ploughs on as best it can. So the mount is read only? Who is the OS to question the user! Rollbacks and pre-flights are for the faint-hearted and ill-prepared. It's the user's job to consider the repercussions *before* issuing commands - ideally by reading a few years' worth of magazines first in order to find out all the undocumented secrets.
The problem still exists in 4.4.2-7, compiled for FC4 from FC5-test. Just my 2 cents: such behaviour is PLAINLY INTOLERABLE. All arguments were stated many times above, and it is obvious that SUCH a reaction to a trivial human mistake (in an otherwise perfectly legitimate setup with /usr usually read-only) is DEFINITELY A BUG. Whether it is an "implementation bug" or a "design bug" is another question, but this is more a question of terminology. But it should be FIXED ANYWAY.
To Sibil Llort and Dzhugashvili (hello, Mr. Iosif Stalin! :-): would you be so kind as to add smileys to your comments, since some people can't grok your light vein of humour and take your comments seriously.
To Tethys: since you are the reporter, don't you think it would be the right thing to set this bug's priority=high and severity=high? This rpm behaviour does severe damage to the system, so such attributes are justified...
Since Jeff Johnson isn't even mentioned in "Email sent to:", and "Assigned To" is Paul Nasrat, I have to conclude that Paul is currently the person responsible for this bug report. Paul, what's the problem with fixing this? While such behaviour may seem correct from the initial design's point of view (that discussion can be infinite), it is extremely inconvenient for "users", and as such is worth fixing.

As for technical details:
- As I understand it, when RPM detects deep inside its code that some filesystem operation (file creation/removal) has failed, it can be hard to "fail gracefully", i.e., leave everything "as was", since some work could already have been done (in other dirs, with other packages, etc.).
- So a pre-check, as some people above suggested, can be done before the actual actions start: for each file/dir affected by each package, call "access(filename,W_OK)", and if it fails, print a descriptive message (including strerror(errno)) and exit(1). (Since speed is not a critical aspect of RPM operation, a little overhead due to access() is tolerable, and those files/dirs would be touched by the subsequent install/remove/upgrade anyway.)
- Since there is no "C_OK" flag (analogue of O_CREAT), and access("/existent/dir/nonexistent-file",W_OK) returns -1/ENOENT, the check "access(filename,W_OK)<0" can be extended to "access(filename,W_OK)<0&&access(dirname(filename),W_OK)<0" for cases when the ability to create a file should be checked.
- While the above checks cover 99.9...% of situations, cases may exist where the "current" ("dumb"?) behaviour is preferred; for such cases the "--force" flag can switch off the pre-check.

Is such a modification possible? It looks low-cost and would cure the problem in 99.9% of cases.
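The access()-based pre-check proposed above might look roughly like this. It is a minimal sketch with hypothetical names, not a patch against rpm. One deliberate deviation from the suggested `access(f,W_OK)<0 && access(dirname(f),W_OK)<0` condition: the parent directory is consulted only when the file is missing (ENOENT), so an existing file that cannot be written, for example one marked immutable (EPERM on Linux), is still reported when its directory is writable. The `force` parameter stands in for the proposed `--force` behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <libgen.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: 0 if `path` can be written, or created when it does
 * not exist yet; otherwise the errno reported by access(). */
int check_writable(const char *path)
{
    if (access(path, W_OK) == 0)
        return 0;
    if (errno != ENOENT)
        return errno;               /* e.g. EROFS, EACCES, EPERM */

    /* No such file: check the parent directory instead. dirname() may
     * modify its argument, so work on a private copy. */
    size_t len = strlen(path) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return ENOMEM;
    memcpy(copy, path, len);
    int rc = (access(dirname(copy), W_OK) == 0) ? 0 : errno;
    free(copy);
    return rc;
}

/* Hypothetical pre-check over every file a transaction would touch.
 * Prints one message per problem and returns the number of failures;
 * the caller would exit(1) on a non-zero result. `force` skips the check. */
int precheck_files(const char *const *files, size_t n, bool force)
{
    if (force)
        return 0;
    int failures = 0;
    for (size_t i = 0; i < n; i++) {
        int err = check_writable(files[i]);
        if (err != 0) {
            fprintf(stderr, "error: cannot write %s: %s\n",
                    files[i], strerror(err));
            failures++;
        }
    }
    return failures;
}
```

Two caveats: access() checks permissions against the real rather than the effective user ID, which is fine when rpm is run directly as root; and a pre-check cannot foresee every failure (a full disk, for instance), so it complements rather than replaces sane handling of write errors during the transaction.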
BTW, an access()-based approach would also take care of other similar situations, like the "immutable" bit being set (which is also perfectly legitimate; rpm should just fail gracefully).
RO mounts detected in rpm-4.4.5.
Thank you, Jeff. Does this explicitly check for RO mounts, or does it just not update the RPM database if the files couldn't be written (for whatever reason)?
Wow, Jeff. You stalled passive-aggressively for *23 months*, fighting against what is no doubt a mere 3 or 4 line patch, to fix a blatant bug in the system. No wonder Bill Gates is a billionaire; are you autistic or just a cunt?
While I'm certain the chafing can get quite uncomfortable, even the most severe cases of diaper rash do not justify such verbal tantrums. I'm sure we can find some people willing to chip in and buy you some talcum powder.
After being repeatedly forsaken by package managers in the past, I've been compiling all my software manually for the past 5 years. As the work involved is getting tiresome, I've lately become more prepared to give package managers (yet) another chance. Now someone happened to mention this bug report (with the remark "tell me this is a joke"). I was astounded to see that the maintainer of such a crucial piece of software doesn't see the value of error handling and considers leaving the system in an inconsistent state EVER acceptable, although it did explain some past experiences. I'm relieved to see that after two years he finally seems to accept the common wisdom exhibited in this thread, although it isn't clear from his note here whether he just fixed this problem by checking for this specific case, or solved the general case of ensuring the system is left in a consistent state after an unexpected error. I hope he's learned something from this, but I don't think I'll be trying any rpm-based distro again any time soon.
Jeff Johnson wrote "RO mounts detected in rpm-4.4.5." BUT: what is rpm-4.4.5 and WHERE is it? Or, maybe, a better question is "WHEN?" Fedora 5 includes rpm-4.4.2, as does "rawhide" (ftp://download.fedora.redhat.com/pub/fedora/linux/core/development/source/SRPMS/). Even RHEL's updates include only 4.3.3. And the RPM.org site doesn't mention any version numbers at all, and also looks deserted. So, where is the currently authoritative RPM site? (Wraptastic.org? When did it split from RedHat, and what is their current relationship?) And when will this new RPM version be shipped with RedHat/Fedora? Thanks in advance to anyone who can shed some light!
See ftp://jbj.org/pub/rpm-4.4.x/
Jeff Johnson wrote "RO mounts detected in rpm-4.4.5." When will we get that in the repo?
For updating Fedora Core to a current RPM version, have a look at bug #174307.
http://www.kuro5hin.org/story/2006/6/5/101431/9311 If this can be believed, Jeff Johnson lost his job as a result of this ticket.
You'll also want to read the following enlightening documents of equal merit: http://www.carnicom.com/contrails.htm http://www.travis-walton.com/ http://www.nardwuar.com/vs/bill_kaysing/index.html
Yah, totally. Struck me as a surly guy doing a spin job on the facts. I assume Jeff is indeed not working for Red Hat any more? That kuro5hin post is almost certainly a third-person account mostly devoid of facts...
Jeff does indeed not work for RedHat any more. And having talked to him myself on numerous occasions, I have it on good authority that he's not the least bit upset about it. If you want more details, ask him yourself, or visit his web site at http://wraptastic.org/blog/index.php