Bug 433267
Summary: | [Stratus 4.6.z bug] iounmap may sleep while holding vmlist_lock, causing a deadlock. | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | RHEL Program Management <pm-rhel> |
Component: | kernel | Assignee: | Vitaly Mayatskikh <vmayatsk> |
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 4.6.z | CC: | andriusb, chas.horvath, damin, dmair, jbaron, jskrabal, lwoodman, smcgrath |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHSA-2008-0167 | Doc Type: | Bug Fix |
Last Closed: | 2008-03-14 10:31:03 UTC | Type: | --- |
Bug Depends On: | 361931 | ||
Bug Blocks: | 240187 |
Description
RHEL Program Management
2008-02-18 08:48:29 UTC
Chas, this is slated to be delivered on 13-Mar-08 in the 4.6.z stream.

Jiri, what exactly do you need (and when) from Stratus? Bug 361931 has the exact patch needed... Are we still on schedule for this being released 13-Mar-08?

Hi Andrius, I'm a little bit confused here. From the bug activity list I see that you have set the NEEDINFO flag, and now you are asking me what information I need. It looks like you did it by mistake, or I missed something, or I made some mistake myself. I'm still quite new here, so it may be my fault. From the original bug history it's obvious that the patch has been tested on the 4.6 release and that it is working. Also, the devel_ack is on in the EUS bug. It looks like the only blocking issue here is the bug status (NEEDINFO), so I'm changing the status to ASSIGNED. The bug shall still be delivered as originally planned.

Thanks Jiri - I see Vitaly just spun kernel-2.6.9-67.0.7.EL, and I'm assuming there will be another internal include/spin prior to the 13-Mar-08 GA date...

Patch included in kernel 2.6.9-67.0.7.EL

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0167.html

Gentlemen, while this bug has been closed, I was wondering if there is any possible way that this issue could affect the mpt-fusion drivers and cause a potential Journal Abort error on an EXT3 filesystem.

    [root@dmx-node5 ~]#
    Message from syslogd@dmx-node5 at Sat Mar 22 03:42:13 2008 ...
    dmx-node5 kernel: journal commit I/O error

We are running under VMware ESX 3.5 w/ an iSCSI SAN, and in most cases I would attribute this to very high load on the SAN. In fact, I've not had issues w/ this until more recent kernels. At the time that this issue is happening (it is not isolated to this VM, but affects all machines running 2.6.9-67.0.4) there is nominal load, and no indication of SAN timeout issues or SCSI mid-layer issues in the VM or the logs. Am I chasing a red herring here? If so, any suggestions on debugging procedures that I should use to diagnose the specific issue? This does not seem to affect anything running RHEL5 w/ latest kernels.

We're running kernel-hugemem-2.6.9-67.EL on a RAID-10 setup, and just recently we noticed issues with "find" spinning in "D" state (uninterruptible sleep). Can this be related?

    time strace -c find . -type d >../find.out

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     49.44  114.793832          63   1821161           getdents64
     18.73   43.484328          16   2731259           lstat64
     11.44   26.549265          15   1820837           chdir
      5.78   13.427308          15    910426           close
      5.31   12.329499          14    910427           open
      4.78   11.103291          12    910426           fstat64
      4.42   10.251790          11    910419           fcntl64
      0.10    0.230356          31      7375           write
      0.00    0.000206          69         3           mremap
      0.00    0.000203         203         1           execve
      0.00    0.000127          21         6           read
      0.00    0.000124          31         4           munmap
      0.00    0.000103          13         8           old_mmap
      0.00    0.000061          12         5           mmap2
      0.00    0.000032          11         3           brk
      0.00    0.000029          15         2         1 access
      0.00    0.000028          14         2           mprotect
      0.00    0.000023          12         2           fchdir
      0.00    0.000012          12         1           time
      0.00    0.000012          12         1           uname
      0.00    0.000012          12         1           set_thread_area
    ------ ----------- ----------- --------- --------- ----------------
    100.00  232.170641              10022369         1 total

    real    73m26.471s
    user    1m27.163s
    sys     5m54.949s

Furthermore, on two systems at this kernel level we see "First orphan inode" showing up in the output of tune2fs -l, like so:

    First orphan inode: 9519201

On one of these hosts gconf problems appeared and df -hl said /tmp/ was full. du -sh did not agree with df, which isn't surprising since df keeps track of things differently. Please advise.