From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Foobar/1.0.1-1.4.3.foo

Description of problem:
Because rhel-3 had rh-postgresql and rhel-4 has postgresql (the package name is different), the upgrade removes the postgres user from the system.

Version-Release number of selected component (if applicable):
postgresql-7.4.7-2.RHEL4.1

How reproducible:
Always

Steps to Reproduce:
1. rpm -Uvh postgresql-*7.4.7-2.RHEL4.1*

Actual Results:
There is no postgres user on the system any more.

Expected Results:
The postgres user should still be on the system.

Additional info:
Suggested fix:

%triggerpostun -n postgresql-server -- rh-postgresql-server
groupadd -g 26 -o -r postgres >/dev/null 2>&1 || :
useradd -M -n -g postgres -o -r -d /var/lib/pgsql -s /bin/bash \
    -c "PostgreSQL Server" -u 26 postgres >/dev/null 2>&1 || :
touch /var/log/pgsql
chown postgres:postgres /var/log/pgsql
chmod 0700 /var/log/pgsql
I suppose this is a side effect of the Obsoletes: bug you already noted (bug #151909). The postgres user is (and should be IMHO) added and removed as a consequence of installing or uninstalling the postgresql-server package, not the base package. So in the state with no server package, you won't have a postgres user either. There's no particular harm done ... when you reinstall the server package the user will be redefined with the same UID, and so will regain ownership of /var/lib/pgsql.
OK. Let's assume that bug #151909 is fixed. That still won't fix the removed postgres user:
1. The %pre script of the postgresql-server package runs first and tries to add the postgres user to the system.
2. The %postun script of rh-postgresql-server gets the parameter "0" (no rh-postgresql packages left on the system), so it removes the postgres user.
3. The system now has the postgresql-server package installed but no postgres user.
A %triggerpostun scriptlet is the only way to fix it.
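The three steps above can be played through in a toy model. This is only a sketch: a temporary file stands in for /etc/passwd, the function names are illustrative, and the bodies are assumptions about what the real scriptlets do (add the user in %pre, delete it in %postun when the count is 0), not copies of them.

```shell
#!/bin/sh
# Toy model of the upgrade: a temp file stands in for /etc/passwd.
# Function names are illustrative; they are not the real scriptlet bodies.
users=$(mktemp)

has_user() { grep -qx "$1" "$users"; }
add_user() { has_user "$1" || echo "$1" >> "$users"; }
del_user() {
    # grep -v exits non-zero when no line is left, hence the "|| :".
    grep -vx "$1" "$users" > "$users.tmp" || :
    mv "$users.tmp" "$users"
}

# postgresql-server %pre: create the user (a no-op if it already exists).
new_pre() { add_user postgres; }

# rh-postgresql-server %postun: $1 is the instance count of the OLD name.
old_postun() {
    if [ "$1" -eq 0 ]; then del_user postgres; fi
}

# postgresql-server %triggerpostun -- rh-postgresql-server: re-create it.
new_triggerpostun() { add_user postgres; }

echo postgres > "$users"   # the user exists before the upgrade
new_pre                    # step 1
old_postun 0               # step 2: the old package thinks it is being erased
has_user postgres || echo "after %postun: postgres user is gone"
new_triggerpostun          # the proposed fix runs last
has_user postgres && echo "after trigger: postgres user is back"
```

The point of the model is only the ordering: because the trigger runs after the old %postun, it is the one place where the new package can undo the removal.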
So if package A replaces package B, B's uninstall script is run after A's install script? Isn't that an RPM bug? Seems like it's guaranteed to break things in quite a lot of scenarios.
Nope, that's the defined behavior. From /usr/share/doc/rpm-<version>/triggers:

For reference, here's the order in which scripts are executed on a single package upgrade:

  new-%pre      for new version of package being installed
  ...           (all new files are installed)
  new-%post     for new version of package being installed

  any-%triggerin (%triggerin from other packages set off by new install)
  new-%triggerin
  old-%triggerun
  any-%triggerun (%triggerun from other packages set off by old uninstall)

  old-%preun    for old version of package being removed
  ...           (all old files are removed)
  old-%postun   for old version of package being removed

  old-%triggerpostun
  any-%triggerpostun (%triggerpostun from other packages set off by old uninstall)
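Applied to this bug, where the new and old packages have different names, the quoted order reduces to something like the following. This is a hedged sketch: it only prints labels, it leaves out the %triggerin/%triggerun steps, and it assumes (as the earlier comments do) that a replacement via Obsoletes: follows the same order as a same-name upgrade. upgrade_order is a hypothetical helper, not an rpm command.

```shell
#!/bin/sh
# Print the scriptlets relevant to this bug in the order quoted above.
# The two arguments are plain labels; nothing here calls rpm.
upgrade_order() {
    new_pkg=$1
    old_pkg=$2
    printf '%s\n' \
        "$new_pkg %pre" \
        "$new_pkg %post" \
        "$old_pkg %preun" \
        "$old_pkg %postun" \
        "$new_pkg %triggerpostun -- $old_pkg"
}

upgrade_order postgresql-server rh-postgresql-server
```

Read top to bottom, the listing shows why only a %triggerpostun in the new package gets a chance to run after the old %postun.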
notting: the fact that it's documented doesn't mean it's not broken ;-) In particular, it seems awfully brain-dead that the old preun script runs after the new package is already installed. What if that script expects to be able to run programs belonging to the old package? They're very likely nonfunctional now that the package fileset has been partially overwritten. Another bug (in the context of rh-postgresql to postgresql anyway) is that the old preun script will decide to stop the postgres server ... and with this ordering of events, what it'll be stopping is the new-version server. I don't think I can fix that one in the triggerpostun script, since that script can't tell if the server had been running and hence mustn't issue a blanket service-start request. I'll put the proposed trigger into postgresql.spec, but I think there are a ton of similar problems lurking due to this poor design of the update sequence.
I believe the original ordering was done this way in order to have the best guarantee of a working setup at any particular failure step. For example, if the old preun ran before new-pre, you could end up with a very broken system if new-pre fails. Or I could be remembering wrong, and they are just scheduled at that point because that (after all the new %pre/%install/%post) is when the actual remove step for the old package is. Perhaps the latter. In any case, since this RPM behavior has been around since the inception, I doubt that it will be changed. Note that in the same case where you're seeing the server get stopped, chkconfig --del is also getting run, so if the initscript has the same name in the two versions, you'll need a trigger to re-run chkconfig --add.
No trigger is needed, just use the instance counts:

%preun
INSTANCES=$1
case $INSTANCES in
    0)  # Real erase: do chkconfig
        chkconfig --del your_service
        ;;
    1)  # It's an upgrade: do nothing
        :
        ;;
esac
exit 0

Cheers...james
James: no, you missed the point: the case of concern is where postgresql-server is replacing rh-postgresql-server. AIUI the instances count will be counting only postgresql-server in that package's scripts, and only rh-postgresql-server in that package's scripts; so rh-postgresql-server is gonna think it's an "rpm -e" and do the wrong things. Bill: thanks for the note about chkconfig, but that at least is a bullet we dodged --- rh-postgresql-server uses different initscript names. I think the only real risk here is a possible need to start the server manually after upgrade. The fix is in postgresql-7.4.7-5.RHEL4.2 which I hope we can still squeeze into U1.
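A minimal sketch of the point about instance counts, assuming (as stated above, "AIUI") that the count passed as $1 refers only to the package name whose scriptlet is running. scriptlet_action is a hypothetical helper name.

```shell
#!/bin/sh
# Classify a %preun/%postun run from its first argument ($1), the number of
# instances of THIS package name left after the operation.
# scriptlet_action is a hypothetical helper name.
scriptlet_action() {
    case "$1" in
        0) echo "erase" ;;     # no instance of this name remains
        *) echo "upgrade" ;;   # a newer instance of this name remains
    esac
}

# Same-name upgrade: the old postgresql-server scriptlet sees 1.
scriptlet_action 1
# Rename: the old rh-postgresql-server scriptlet sees 0, although
# postgresql-server has just been installed in its place.
scriptlet_action 0
```

Under that assumption the count alone cannot distinguish a real erase from a replacement by a differently named package, which is why the trigger is still needed.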
And a stopped service is not so bad here after all; the database needs a dump before the upgrade and a restore after it to be usable anyway. The current system doesn't handle on-disk database format upgrades. Usually this really means that after the upgrade, when the database won't start, you have to:
1. Revert to the old rh-postgresql-server package (this is not easy because of the Obsoletes: tags and the different package names; it needs rpm -i --force and other crude operations to get it running).
2. Dump the database.
3. Back up the /var/lib/pgsql/data directory and remove it.
4. Initialize the database.
5. Restore the old database from the dump.
A stopped service is not such a big deal. :-)
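The dump and restore part of that procedure can be sketched as a dry run that only prints the commands. Several things here are assumptions: the dump file name is made up, the commands would be run as the postgres user, the old server must be running for the dump and the new one for the restore, and the package revert and reinstall steps are not covered at all.

```shell
#!/bin/sh
# Dry run: print the dump-and-restore steps instead of executing them.
# DATA_DIR matches the path in the thread; DUMP_FILE is a hypothetical name.
DATA_DIR=/var/lib/pgsql/data
DUMP_FILE=/var/lib/pgsql/backup.sql

migration_plan() {
    # 1. With the OLD server running, dump every database to a SQL file.
    echo "pg_dumpall > $DUMP_FILE"
    # 2. Keep the old on-disk data as a backup and move it out of the way.
    echo "mv $DATA_DIR $DATA_DIR.old"
    # 3. With the NEW packages installed (assumed), create a fresh cluster.
    echo "initdb -D $DATA_DIR"
    # 4. Start the new server, then replay the dump into it.
    echo "psql -f $DUMP_FILE template1"
}

migration_plan
```

Printing the plan rather than running it keeps the sketch harmless on a live system; the commands should be checked against the documentation for the installed PostgreSQL version before anything is executed.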
This fix will be pushed out into RHEL4, FC3, and FC4 along with security updates currently being made (PG 7.4.8 and 8.0.3 releases).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-433.html