Bug 82231
Summary: | SCSI I/O hangs system with NFS writes, nfsd status = DW | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Ethan Vanmatre <evm> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 8.0 | ||
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Doc Type: | Bug Fix | Last Closed: | 2004-09-30 15:40:25 UTC |
Description
Ethan Vanmatre
2003-01-20 05:56:44 UTC
OK, I've had more time to look at this problem. It is not an NFS problem; NFS just uses the disks via the aic7xxx driver. It appears that kupdated and bdflush deadlock.

I can recreate the SCSI hang at any time by causing heavy I/O through this SCSI controller. For example, if I start more than one fsck on the disks attached to that controller, they hang one by one. I can start 8 fsck processes on the 8 disks (514 GB each) and see the system hang in a matter of minutes. The fsck processes hang one by one as kupdated and bdflush go to status DW, then SW. After a while all of the fsck processes, both kupdated and bdflush, and sometimes one or more kjournald threads hang.

I have a ps oxw pid,command,wchan capture, but it is on the system and the system is hung. I'm away from work right now. The system responds to pings but does not allow ssh into it. Usually you can log in to the system and do almost everything that does not need this SCSI adapter.

I have noted that reducing the amount of memory from 1.5 GB to 512 MB appears to allow the fsck to run longer before hanging. I can fsck all 8 disks serially, but that takes over 8 hours. Doing 2 at a time usually causes the hang in the first 30 to 40 minutes. More info tomorrow.

After replacing the external RAID controller I did not see any change in status. The system still hung. I tried the 6.2.28 aic7xxx driver, but it does not load with 2.4.18-19.8.0*. Thinking the 6.2.8 driver is bad, I dropped back to the 2.4.18-14 kernel, and the 6.2.28 aic7xxx driver loaded. So the 2.4.18-14* kernel and the 6.2.28 aic7xxx driver allowed all 8 fsck processes to run in parallel without a hang.

Problem solved? Well, yes. BUT... write rates to the external RAID are very strange. Eight arrays are defined: SCSI ID 2, LUN 0 through LUN 7. When writing a 1.1 GB file, most writes complete in 30 seconds to a minute, but for 2 of the "disks" the time is 7 and 13 minutes respectively. And it appears to change over time. Any thoughts?

Thanks,
Ethan

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/
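
For anyone chasing a similar hang, the ps capture described in the report (processes stuck in state D, with the kernel function they are waiting in) can be approximated by reading /proc directly. The following is only a minimal sketch, not something taken from this bug: the helper names parse_stat and blocked_processes are made up for illustration, and it assumes a kernel that exposes /proc/<pid>/wchan, which 2.4-era kernels such as the ones discussed here may not provide.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]


def blocked_processes(proc_root="/proc"):
    """Yield (pid, comm, wchan) for every process in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited (or entry unreadable) while scanning
        if state != "D":
            continue
        try:
            # Assumption: this file exists; older kernels may lack it.
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip() or "-"
        except OSError:
            wchan = "?"
        yield pid, comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print(f"{pid:>7}  {comm:<16}  {wchan}")
```

Run periodically from a console that does not touch the affected controller, and redirect the output to a disk on a different adapter so the capture survives the hang.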