Bug 77562
Summary: | rpm sometimes hangs | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | djh <djh> |
Component: | rpm | Assignee: | Jeff Johnson <jbj> |
Status: | CLOSED WORKSFORME | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 8.0 | CC: | binand, gczarcinski, jason, jr-redhatbugs2, jsightler, nerijus, per.starback, rivenburgh, tristan.hill, wtogami, yaneti, yiango |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2002-11-11 05:04:31 UTC | Type: | --- |
Description
djh
2002-11-09 01:32:09 UTC
Once "every 3 weeks" or so indicates that the missed SIGCHLD is fixed IMHO. That's all that's claimed in 4.1-9. BTW, there's still a teensy race in -9; the final fix will be to s/pause/sleep(1)/ in lib/psm.c. That isn't done yet because adding sleep(1) makes a hard problem even harder to sort out. However, before I can make that claim, other sources of hangs need to be excluded. Can you read #73097 and try to figger out what else was happening on the machine when rpm hung every 3 weeks? And please, let's not start another bug pileon. You guys got me outnumbered :-)

The back trace indicates that the "hang" is of the "stale lock" variety, not the missed SIGCHLD. What I need is an informed (guess if necessary) hint about what else was happening around the time rpm "hung".

For me, this type of hang appears to always occur prior to installing/upgrading a particular package. If I specify two packages on the command line, it can either hang before installing the first package, e.g.:

    # rpm -ivh pkg1.rpm pkg2.rpm
    Preparing... ########################################### [100%]
    <HANG>

OR after the first package is installed but before installing the second package:

    # rpm -ivh pkg1.rpm pkg2.rpm
    Preparing... ########################################### [100%]
    1:pkg1 ########################################### [100%]
    <HANG>

I don't recall seeing any hangs during the "erase" phase after all packages have been installed/upgraded.

Unfortunately, I don't know of any way to consistently reproduce the hang. If I make a shell script that rapidly uninstalls/installs/upgrades packages, it'll never hang. Yet, on a few occasions, I've seen this hang occur when installing a single package after not using rpm for a week.

FYI, I'm not running multiple instances of rpm simultaneously, I'm not using any programs other than rpm to access the rpm databases, and when I experience a hang it's almost always after the preceding rpm completed successfully.
Did you throw an strace on the <HANG>, or just assume that since the error bar sez' 100%, then rpm must be hung? See #73200 for details. FWIW, I ran randomized install/upgrade transactions in a loop for 24 hours, so I believe there is no missed SIGCHLD anymore. But, yes, the problem (if any) is gonna be a bear to track down.

Yes, I have strace'd many of the hangs, and every single time I've gotten the same results as djdave.au -- endless select() timeouts. Same backtrace results too when I've done that. 'ps' always shows rpm in an S state.

select timeouts (and the backtrace) are from the database. I need to characterize the condition somehow in order to begin to attempt a fix. If there's a problem within rpm, then the problem will track with either certain packages, or with the state of those packages. For example, the kernel package is often installed multiple times, which is different for that package. There's a screwy code path in rpm associated with packages that both contain identical files that is often lightly tested. Another condition that tends to vary is whether this is/was an upgrade (i.e. previous version existed) or an install. Then there's where the hang occurs: in the trace above it is during rpmdbAdd(); the other common location when upgrading is rpmdbRemove(). Can you remember what package names were involved, or was it just random?

> Another condition that tends to vary is whether this is/was
> an upgrade (i.e. previous version existed) or an install.

I've seen hangs with both upgrades (-U, -F) and new installs (-i) of packages.

> Can you remember what package names were involved,
> or was it just random?

It's been a couple of weeks since the last hang (though I haven't been using rpm much lately since the systems are mostly configured), but from what I remember it was completely random. I got hangs with large packages (like kernel), and small utility packages (like ipchains, IIRC).
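The 'ps' observation above (rpm sitting in an S state) can also be checked programmatically on Linux. The sketch below reads the state field from /proc/<pid>/stat. The helper names `parse_proc_state` and `process_state` are made up for illustration, and the sample lines in the usage note are invented, not taken from a real hang.

```python
def parse_proc_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST ')' before tokenizing.
    rest = stat_line.rsplit(")", 1)[1]
    return rest.split()[0]


def process_state(pid):
    """Read the current state of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_state(f.read())
```

For example, `parse_proc_state("1234 (rpm) S 1 1234 1234 0")` yields `"S"` (interruptible sleep), which is what a process blocked in a select() timeout loop would normally show. The state alone does not distinguish a stale lock from any other sleep, so an strace or backtrace is still needed.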
After killing rpm, issuing the very same rpm command again did not hang; it didn't seem to make any difference whether I removed the __* files first or not (although afterward I did for good measure, and also an 'rpm --rebuilddb').

Removing __db* files first would prevent the hang (but opens a lock window). I claim that --rebuilddb is unnecessary, only rm -f __db* is needed, but won't hurt.

I don't see any way to reproduce your problem so I'm gonna close this bug Yet Again. Feel free to reopen if you need help or have more info.
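The `rm -f __db*` step can be sketched with a dry-run guard, so the files are listed before anything is deleted. This is only an illustration: the function names are made up, and the default directory `/var/lib/rpm` is an assumption about where the rpm database lives on the affected system. As noted above, removing these files opens a lock window, so this is only sensible while no rpm process is running.

```python
import glob
import os


def find_db_region_files(db_dir="/var/lib/rpm"):
    """List the __db* files in db_dir (the default path is an assumption)."""
    return sorted(glob.glob(os.path.join(db_dir, "__db*")))


def remove_db_region_files(db_dir="/var/lib/rpm", dry_run=True):
    """Remove __db* files; with dry_run=True only report what would go."""
    affected = []
    for path in find_db_region_files(db_dir):
        if not dry_run:
            os.remove(path)
        affected.append(path)
    return affected
```

Calling `remove_db_region_files()` with the default `dry_run=True` deletes nothing and returns the matching paths; pass `dry_run=False` to actually remove them. Other database files in the directory are not matched by the `__db*` pattern and are left alone.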