Red Hat Bugzilla – Bug 155456
SG_IO with timeout value of UINT_MAX results in invalid timeout value passed to scsi_add_timer
Last modified: 2009-08-22 09:32:31 EDT
Description of problem:
According to the comment in /usr/include/scsi/sg.h describing the timeout field
in sg_io_hdr_t, a MAX_UINT value is supposed to indicate "no timeout" for a SCSI
passthru command submitted via the SG_IO ioctl:
unsigned int timeout; /* [i] MAX_UINT->no timeout (unit: millisec) */
In RHEL3 and RHEL2.1, this seemed to work correctly.
However, on RHEL4, it looks like the sg module timeout calculation was changed
so that a value of UINT_MAX entered in the 'timeout' field of sg_io_hdr_t
results in an invalid timeout, and the scsi eh thread being called immediately
for a command submitted in this fashion. I've tested this on both i386 and
x86_64 arches, and they both behave the same.
Version-Release number of selected component (if applicable):
I've tested both 2.6.9-5.ELsmp and 2.6.9-6.37.ELsmp, both contain the error.
Steps to Reproduce:
1. Enable scsi logging via /sys/module/scsi_mod/scsi_logging_level
2. Use the SG_IO ioctl to issue a scsi passthru command with MAX_UINT timeout (I
sometimes put the attached test program in a while loop to issue the command
over and over for easier reproducibility).
3. Observe that the trace indicates scsi_add_timer getting called with a -2
timeout value, and the scsi eh thread getting woken up immediately as a result.
Actual results:
SCSI passthru command may complete or may get aborted, depending on who wins the
race.
Expected results:
SCSI passthru command should not get immediately aborted, but should be given
essentially infinite time to complete. Also, scsi_add_timer should be called
with a valid value, not -2.
I'll attach a little test program you can use to reproduce this case easily. I
originally found this when testing iscsi for rhel4 u2 on the rhel4 u1 beta
kernel, 2.6.9-6.37.ELsmp. I've also found the failure on the rhel4 kernel
2.6.9-5.ELsmp. Running the same test program on rhel3 or rhel2.1 works fine:
the invalid timeout calculation is not seen, nor does the scsi eh thread fire.
Just at a glance, in my rhel3 tree I'm looking at (2.4.21-25.EL), sg_new_write()
calls sg_ms_to_jif(), which is a fairly straightforward function. In rhel4
though, it looks like sg_new_write() now calls a more generic function,
msecs_to_jiffies(), which is more complicated, and apparently doesn't handle
this MAX_UINT value correctly.
Created attachment 113413 [details]
test program to reproduce the failure
I take back the comment about i386. It looks like this problem only occurs on
x86_64 from what I can tell. The i386 platform seems OK, at least on
2.6.9-5.ELsmp.
Just to clarify the scope of this problem. We've only been able to reproduce it
consistently on x86_64 with 2.6.9-6.37.ELsmp (rhel4 u1 beta). It doesn't appear
that the original rhel4 kernel, 2.6.9-5.ELsmp, is affected.
One more datapoint. On x86_64, the attached test program seems to run ok on
2.6.9-5.ELsmp. However, we have another tool that still fails in a similar
fashion on 2.6.9-5.ELsmp x86_64. This same tool passes on 2.6.9-5.ELsmp i386.
So it looks like the scope is limited to x86_64 arch, and is easiest to
reproduce on 2.6.9-6.37.ELsmp.
Is someone chasing this down for inclusion in RHEL4 U2?
It's on my ToDo list.
OK, I think I may have found the problem. I have a small change that may fix
it. I'm building a test RPM right now (but that takes several hours), so if you
want to try the fix sooner, I'm uploading the proposed patch.
Created attachment 116940 [details]
Possible fix for SG-IO timeout issue
Explanation of the patch is embedded in the comments.
Test kernel located at
I ran the reproducer on the test kernel for 1,000,000 iterations without a
failure. I'm now booting back to 2.6.9-5.ELsmp in order to try the same thing
there and see if it reproduces on that kernel (I don't have the 2.6.9-6.37.ELsmp
kernel available).
Created attachment 117013 [details]
Test case reproducer
The reproducer, as posted, does not work (amongst other things it doesn't even
use the MAX_UINT define for the timeout and therefore doesn't even begin to
test the failing case). I've modified the reproducer to actually test the
problem at hand. Using my modified reproducer, an i386 test machine works
perfectly fine running the 2.6.9-6.13.ELdledfordsmp kernel. On the other hand,
an x86_64 machine fails within usually less than 60 seconds with either the
2.6.9-5.ELsmp or 2.6.9-11.32.ELdledfordsmp kernel. The test patch doesn't
work, but with the reproducer now working properly I hopefully will be able to
find the actual problem.
In order to narrow this down, can you please check all machines that exhibit
this problem and see if the dmesg output contains a line such as this:
    Losing some ticks... checking if CPU frequency changed.
    warning: many lost ticks.
    Your time source seems to be instable or some driver is hogging interupts
I suspect that timer coherency problems may be related to this issue and
knowing if afflicted machines display this message while non-afflicted machines
don't would narrow that down. Thanks for checking on this.
Sorry about the snafu with the test program - I must have posted the version I
was using to test a potential workaround (use a large timeout but not MAX_UINT).
Not sure if I can test this tonight but I'll try. I'll also try to dig up the
messages file from when I originally repro'd this problem to check for the
things you mention.
Unfortunately, I no longer have the messages file from the machine that I was
hitting the failure on 3 months ago, so I can't check for those messages.
I ran some tests on a x86_64 2.6.9-11.15.ELsmp machine, but using the test
program (yours or mine - the corrected one) I'm no longer able to repro the
problem. Looks like I need to run on one of the original kernels I saw the
problem on, or create another test case.
We do have a piece of software in house that can repro the problem on this
latest kernel, but it's more complicated, and I can't just attach the source to
this bz. When I modify this software to just use a large timeout value but not
MAX_UINT, I don't see the problem on this latest kernel. So I'm pretty sure the
problem is still there, but it's just no longer reproducible with the simple
test program.
At any rate, don't kill yourself for the freeze this week. I have a workaround
I can use for now. I'll try to find another test case to repro the problem on
the latest kernel and run on the original kernels where I first saw it. Let me
know if you have a kernel you want me to test.
I've forwarded this bug to Ingo Molnar for review. I suspect the actual problem
is in the scalable timer code that he helped to write. Specifically, any time
you add a timer, the maximum allowed offset from the current time is jiffies +
MAX_JIFFIE_OFFSET. At the time the timer is initialized by the SCSI stack, we
check against jiffies to get jiffies + MAX_JIFFIE_OFFSET (it doesn't use "no
timeout" as the docs say; it just uses the maximum allowed timer value for
MAX_UINT).

However, the scalable timer code keeps a per-CPU list of bases that it sorts
newly added timers into. Each CPU keeps track of which jiffie it has serviced
up to. So, if the CPU in question is lagging behind by a couple of jiffies, the
value jiffies + MAX_JIFFIE_OFFSET will actually be a wrapped negative number
relative to the base_jiffies stored on that CPU. My theory is that when a CPU
that is behind by a couple of ticks in servicing timers (which is why I posted
the messages I did about losing timer ticks; my test machine displays that at
bootup, and that could easily make the per-CPU base_jiffies variable fall
behind the real global jiffies variable) gets a new timer set to the global
jiffies + MAX_JIFFIE_OFFSET, it will erroneously place that timer into a bucket
that is about to be run almost immediately.

This would only be a problem on machines that are either A) heavily loaded
enough that the timer servicing routines are running behind, or B) losing lots
of ticks and getting behind as a result (which is more common on x86_64 due to
things like not having proper IDE drivers on newer chipsets and therefore
running with interrupts masked during IDE operations, which is the case on my
test machine, whereas i386 hardware is more likely to have optimized,
interrupt-friendly drivers and less likely to exhibit this problem).

I'm not entirely sure what would be required to fix this, and since Ingo wrote
the timer code in question, I sent him an email requesting a review of this
bug. I suspect that in the mod_timer() function we could check to see if the
timer offset being added needs fixing up with this test:

    if (time_after(timer->expires, base_jiffies + MAX_JIFFIE_OFFSET) &&
        time_before_eq(timer->expires, jiffies + MAX_JIFFIE_OFFSET))
            timer->expires = base_jiffies + MAX_JIFFIE_OFFSET;

I'll try that on my test machine, which crashes rather quickly, and see if it
solves the problem (although I'm not sure exactly how the per-CPU stuff is
handled in this code, so I may run into cross-CPU issues that make the test
fail for other reasons, which is why I want Ingo to review this).
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time. This request will be
reviewed for a future Red Hat Enterprise Linux release.