Red Hat Bugzilla – Bug 72148
Rpm hangs (SIGTERM immune) when fed with unsolicited standard input.
Last modified: 2008-05-01 11:38:03 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3; Linux; X11; i686)
Description of problem:
When you run rpm and type ahead the next commands to be executed, rpm can
hang and require a SIGKILL. SIGTERM doesn't help.
It happens about 90% of the time, almost always with larger rpm workloads (many packages).
Some packages seem to be immune.
Steps to Reproduce:
1. Run the rpm installer (-U, -i) on almost any package. KDE packages are good to
2. While rpm is running, type ahead something for the command line (at least
one full line, followed by Enter)
3. Most of the time, rpm will hang after finishing the current package, or
4. kill -9, followed by cd /var/lib/rpm && db_recover, is required to bring it
back to a workable state
Actual Results: rpm hangs, requiring SIGKILL
Expected Results: rpm finishes and exits cleanly
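The reproduction steps can be approximated with a small harness. This is a minimal sketch, assuming Python 3 on a POSIX system; the function name survives_sigterm and its parameters are hypothetical. A pipe only approximates real type-ahead, which arrives through the terminal, so a negative result does not rule the bug out. The harness writes one full line to the child's standard input, waits, sends SIGTERM, and reports whether the child was still alive after a grace period, falling back to SIGKILL as in step 4.

```python
import signal
import subprocess
import time


def survives_sigterm(cmd, typeahead=b"ls -l\n", work_secs=1.0, grace_secs=5.0):
    """Return True if cmd was still running grace_secs after SIGTERM.

    Hypothetical harness: typeahead is written to the child's stdin
    (a pipe, not a terminal) to mimic typing ahead a full command line.
    """
    # bufsize=0: unbuffered stdin, so nothing is left to flush on close.
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, bufsize=0)
    try:
        try:
            proc.stdin.write(typeahead)
        except BrokenPipeError:
            pass  # the child already exited or closed its stdin
        time.sleep(work_secs)  # let the child get busy
        proc.send_signal(signal.SIGTERM)
        try:
            proc.wait(timeout=grace_secs)
            return False  # the child terminated after SIGTERM
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL, like "kill -9" in step 4
            proc.wait()
            return True
    finally:
        proc.stdin.close()
```

For rpm this would be called as, e.g., survives_sigterm(["rpm", "-Uhv", *packages]) with a hypothetical packages list. Because the SIGKILL fallback leaves the rpm database needing recovery (step 4), it is only worth trying on a scratch system.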
This bug seems to be present in all rpm 4.1 versions since the first or second
Limbo beta release. It is not present in rpm 4.0.x.
It's pretty inconvenient at times, as some people are used to typing ahead.
rpm has a database; rpm-4.1 now traps signals
in order to avoid stale database locks left
from interrupted installs. Providing a stronger
guarantee of data integrity has to be balanced with
I'm gonna mark this WONTFIX in the sense that rpm
needs to have a signal handler from now on.
Responsiveness will slowly improve (by polling for
signals more often), but the overriding goal is
data integrity, not typeahead responsiveness, in rpm.
It's not about the signal handler; that would be another bug.
This bug is about rpm going dead when it's presented with unsolicited standard
input.
This is a regression against 4.0.x, and it doesn't make sense for rpm to
freeze when something arrives on standard input. Does standard input
handling have anything to do with signal handling? As it is, rpm now hangs
cold when you type anything on the keyboard while it's running. I don't think
that's something that users expect or desire.
rpm (4.1-0.85 if it matters) does not "hang"
when presented with endless 'a' characters
on stdin during a large upgrade: WORKSFORME
just now. Feel free to try to reproduce it.
Again, I suspect that the behavior you are seeing
has everything to do with signal handling; why
else would you put SIGTERM in the subject line?
Okay, I'll try to get the exact steps to reproduce it; it seems that only
certain kinds of jobs are susceptible to this.
The SIGTERM thing was a side effect of my main problem: after it hung
due to blind typing (i.e. being presented with standard input), it was
impossible to Ctrl-C it. But that's another story.
OK. Watch out for the following 2 effects that
might otherwise be interpreted as "hangs":
1) On upgrade, erased packages are
sorted to the end of the transaction,
leading to an unpleasantly long delay without
progress bars at the end of a transaction.
2) SIGHUP/SIGTERM/SIGINT/SIGQUIT are all
trapped while the database is open, and
a caught signal is only checked for (to start early-exit
processing) when signals are unblocked, i.e. after
most database operations. IMHO, this is the
"hang" that you are reporting.
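The trapping described in item 2 can be illustrated with a minimal Python sketch. It is not rpm's code (rpm is written in C); it assumes a POSIX system, and the names TRAPPED, caught, TrappedSignals and db_operation are hypothetical. While the database is "open", the four signals get a handler that merely records them; during each database operation they are also blocked with pthread_sigmask, so a SIGTERM sent mid-operation stays pending and only takes effect, as a request for early exit, once the operation finishes and the mask is restored. A long operation therefore looks like a process that ignores SIGTERM.

```python
import signal

# The signals listed above as trapped while the database is open.
TRAPPED = {signal.SIGHUP, signal.SIGTERM, signal.SIGINT, signal.SIGQUIT}
caught = []  # signals received while the (hypothetical) database was open


def _record(signum, frame):
    # Don't exit here; just remember the signal so the caller can clean up.
    caught.append(signum)


class TrappedSignals:
    """Install recording handlers for TRAPPED while the database is open."""

    def __enter__(self):
        self._old = {s: signal.signal(s, _record) for s in TRAPPED}
        return self

    def __exit__(self, *exc_info):
        for s, handler in self._old.items():
            # None means the old handler was not set from Python; use the default.
            signal.signal(s, handler if handler is not None else signal.SIG_DFL)
        return False


def db_operation(op):
    """Run op with TRAPPED blocked; return True if early exit was requested."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, TRAPPED)
    try:
        op()  # a trapped signal arriving now stays pending
    finally:
        # Unblocking delivers the pending signal; CPython runs _record
        # before pthread_sigmask returns.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return bool(caught)
```

Recording the signal instead of exiting inside the handler lets the caller close the database, and so keep the shared /var/lib/rpm/__db* state consistent, before exiting; the cost is that the exit waits for the current operation to finish.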
1. On upgrade, erased packages are sorted to the end of the transaction.
Then I assume that would happen again and in the same way after I restart it:
killall -9 rpm
cd /var/lib/rpm && db_recover
rpm -Uhv ...
Oh, I must have forgotten to mention: the problem I'm seeing is that it hangs
without any CPU usage! So unless you have some kind of O(1) sorting with a
dummy sleep() afterwards, it's not due to actual work being done.
2. SIGHUP/SIGTERM/SIGINT/SIGQUIT are all trapped while the database is open.
I don't care much; I'm using SIGKILL ;-) And again, that's a nicety I don't
care about yet. What bothers me is that feeding rpm unsolicited
standard input makes it *sometimes* stop and hang (hang without CPU usage).
I should strace it to see exactly what it hangs on.
NB: I initially thought it was all due to using a database created by the older
db3/rpm-4.0.x, but I removed the database and reinstalled all modules by
hand just to make sure it isn't that.
No, the database format is the same.
If it's truly a "hang", then it's due to
a database lock, possibly stale and persistent.
Run rm -f /var/lib/rpm/__db*
to eliminate the possibility of an old, stale
lock hanging out. FYI: handling the reference
count on the persistent /var/lib/rpm/__db*
files is the whole reason for trapping signals.
And the /var/lib/rpm/__db* files are now persistent,
so you should only have to remove them under rare and
exceptional conditions, like an rpm segfault. This
is new and different behavior in rpm-4.1, which
permits concurrent database access rather than
the traditional exclusive/shared fcntl locking scheme.
If you are hanging on a database lock, attach strace
and look for a steady heartbeat of select() calls, about 1
BTW, killall -9 is *exactly* the rare, exceptional, and
pathological condition where
rm -f /var/lib/rpm/__db*
is gonna be needed. Otherwise you *will* have stale