Bug 18334 - Heavy I/O load causes deadlock
Product: Red Hat Linux
Classification: Retired
Component: kernel
Hardware: ia64 Linux
Priority: high, Severity: high
Assigned To: Michael K. Johnson
Reported: 2000-10-04 11:05 EDT by Matt Domsch
Modified: 2008-05-01 11:37 EDT

Doc Type: Bug Fix
Last Closed: 2000-10-18 14:39:16 EDT

Attachments
cpcmp.tgz (3.89 KB, application/octet-stream)
2000-10-04 11:06 EDT, Matt Domsch

Description Matt Domsch 2000-10-04 11:05:02 EDT
With the 2.4.0-0.31 IA-64 kernel (and earlier versions), the system 
deadlocks under heavy disk I/O load.  Magic-SysRq shows that it is 
waiting on one or more spinlocks.  The spinlocks I've observed it 
waiting on include:

file_move() getting file_list_lock.
sync_old_buffer() getting the big kernel lock.
schedule() getting the runqueue lock.

Issue seen on SMP systems (2 or 4 B0-step processors) with either 1GB or 
5GB of RAM.

I'll attach a copy-and-compare tool that triggers the behavior within 
several seconds.

cpcmp.tgz attached.  Untar, cd cpcmp.
The command looks like:
./cpcmp.pl 6 /usr/src/linux-2.4 /mnt/disk2/a,/mnt/disk1/b,50000,20

6 is the syslog level at which to write messages.
/usr/src/linux-2.4 is your original set of data.
This first gets copied to /mnt/disk2/a{1..20}/.
Then the data gets copied from /mnt/disk2/a{1..20}/ to /mnt/disk1/b{1..20}/.
50000 is the number of times each thread runs (enough to last a long, long time).
20 is the number of threads to run.
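
For readers without the attachment, here is a rough sketch of the copy-and-compare pattern described above. It is not the cpcmp.pl source (that is Perl and isn't reproduced in this report). It is written in Python, and the names run, copy_and_compare and trees_match are made up for illustration. It assumes each thread stages its own copy of the source tree, then repeatedly copies the staged tree to the second location and compares the result byte for byte. It does no syslog logging.

```python
import filecmp
import shutil
import threading
from pathlib import Path


def trees_match(a: Path, b: Path) -> bool:
    """Return True if both trees hold the same files with identical bytes."""
    files_a = sorted(p.relative_to(a) for p in a.rglob("*") if p.is_file())
    files_b = sorted(p.relative_to(b) for p in b.rglob("*") if p.is_file())
    if files_a != files_b:
        return False
    # shallow=False forces a content comparison instead of a stat() check.
    return all(filecmp.cmp(a / rel, b / rel, shallow=False) for rel in files_a)


def copy_and_compare(source: Path, stage: Path, dest: Path, passes: int) -> int:
    """Stage one copy of source, then copy stage -> dest `passes` times.

    Returns the number of passes in which dest did not match stage.
    """
    shutil.copytree(source, stage)  # stage must not exist yet
    mismatches = 0
    for _ in range(passes):
        if dest.exists():
            shutil.rmtree(dest)
        shutil.copytree(stage, dest)
        if not trees_match(stage, dest):
            mismatches += 1
    return mismatches


def run(source: Path, stage_prefix: str, dest_prefix: str,
        passes: int, workers: int) -> int:
    """Run `workers` threads; thread n uses <stage_prefix>n and <dest_prefix>n."""
    results = [0] * workers

    def worker(index: int) -> None:
        n = index + 1
        results[index] = copy_and_compare(
            source, Path(f"{stage_prefix}{n}"), Path(f"{dest_prefix}{n}"), passes
        )

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With this sketch, the equivalent of the command above would be roughly run(Path("/usr/src/linux-2.4"), "/mnt/disk2/a", "/mnt/disk1/b", 50000, 20).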

Of course, change parameters at will.  For me, this test with 2 disks died 
in about 2.5 seconds, waiting on a spinlock.
Comment 1 Matt Domsch 2000-10-04 11:06:19 EDT
Created attachment 3740
Comment 2 Matt Domsch 2000-10-04 11:09:57 EDT
This was reproduced on Dell "Bordeaux" systems (Intel Lion beta units), Intel 
BIOS 56.
Comment 3 Matt Domsch 2000-10-04 14:26:45 EDT
A uniprocessor kernel on the same system does not fail.  The same test has been 
running for several hours now with no problems.  My guess is that either a) the 
processor B0 stepping isn't guaranteeing atomicity with respect to spinlock 
operations, b) the ia64 spinlock operations are wrong, or c) the compiler is 
generating bad assembly for the SMP spinlock functions.
Comment 4 Matt Domsch 2000-10-04 16:13:30 EDT
The 2.4.0-0.31 kernel SRPM doesn't work on IA-32 platforms because the SCSI 
layer gets initialized twice, which was fixed in the 2.4.0-test9 series.  
Instead, I built a 2.4.0-test9-final kernel with the i386 .config file from the 
2.4.0-0.31 SRPM and ran cpcmp.pl on my IA-32 SMP system (Dell PowerEdge 2400).  
It has not locked up after 20 minutes, and I'll let it keep running.
Comment 5 Matt Domsch 2000-10-18 14:39:13 EDT
Kernel 2.4.0-test9 + the IA-64 -test9 patch + modutils 2.3.18 solves the 
problem.  Tests ran for more than 16 hours with no failures.
