Bug 433267 - [Stratus 4.6.z bug] iounmap may sleep while holding vmlist_lock, causing a deadlock.
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6.z
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Vitaly Mayatskikh
QA Contact: Martin Jenner
Depends On: 361931
Blocks: 240187
Reported: 2008-02-18 08:48 UTC by RHEL Program Management
Modified: 2008-05-02 19:04 UTC
CC List: 8 users

Fixed In Version: RHSA-2008-0167
Doc Type: Bug Fix
Last Closed: 2008-03-14 10:31:03 UTC


Links:
Red Hat Product Errata RHSA-2008:0167 (private: 0, priority: normal, status: SHIPPED_LIVE): "Moderate: kernel security and bug fix update", last updated 2008-03-14 10:30:46 UTC

Description RHEL Program Management 2008-02-18 08:48:29 UTC
This bug has been copied from bug #361931 and has been proposed
to be backported to 4.6 z-stream (EUS).

Comment 2 Andrius Benokraitis 2008-02-18 13:48:13 UTC
Chas, this is slated to be delivered on 13-Mar-08 in the 4.6.z stream.

Comment 3 Andrius Benokraitis 2008-02-26 19:05:40 UTC
Jiri, what exactly do you need (and when) from Stratus? Bug 361931 has the exact
patch needed... Are we still on schedule for this being released 13-Mar-08?

Comment 4 Jiri Skrabal 2008-02-27 07:49:29 UTC
Hi Andrius,

I'm a little bit confused here. The bug activity list shows that you set the
NEEDINFO flag, and now you are asking me what information I need.

It looks like either you set it by mistake, or I missed something, or I made a
mistake myself. I'm still quite new here, so it may be my fault.

From the original bug's history, it's obvious that the patch has been tested on
the 4.6 release and works. Also, devel_ack is set on the EUS bug. It looks like
the only blocking issue here is the bug status (NEEDINFO).

So I'm changing the status to ASSIGNED. The fix should still be delivered as
originally planned.

Comment 5 Andrius Benokraitis 2008-02-27 18:15:32 UTC
Thanks Jiri - I see Vitaly just spun kernel-2.6.9-67.0.7.EL, and I'm assuming
there will be another internal include/spin prior to the 13-Mar-08 GA date...

Comment 6 Vitaly Mayatskikh 2008-02-27 18:45:07 UTC
Patch included in kernel 2.6.9-67.0.7.EL

Comment 10 errata-xmlrpc 2008-03-14 10:31:03 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0167.html


Comment 11 damin 2008-03-24 19:26:22 UTC
Gentlemen, I know this bug has been closed, but I was wondering whether this
issue could possibly affect the mpt-fusion drivers and cause a journal abort
error on an ext3 filesystem.

[root@dmx-node5 ~]# 
Message from syslogd@dmx-node5 at Sat Mar 22 03:42:13 2008 ...
dmx-node5 kernel: journal commit I/O error

We are running under VMware ESX 3.5 with an iSCSI SAN, and in most cases I would
attribute this to very high load on the SAN. However, I did not have this problem
until the more recent kernels. When the error occurs (it is not isolated to this
VM; it happens on all machines running 2.6.9-67.0.4), the load is nominal, and
there is no indication of SAN timeouts or SCSI mid-layer problems in the VM or
in the logs.

Am I chasing a red herring here? If so, what debugging procedures would you
suggest for diagnosing the specific issue?

This does not seem to affect any machine running RHEL5 with the latest kernels.

Comment 12 R.H. 2008-05-02 19:04:45 UTC
We're running kernel-hugemem-2.6.9-67.EL on a RAID-10 setup, and we recently
noticed "find" getting stuck in "D" state (uninterruptible sleep). Could this be
related?

time strace -c find . -type d >../find.out


% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 49.44  114.793832          63   1821161           getdents64
 18.73   43.484328          16   2731259           lstat64
 11.44   26.549265          15   1820837           chdir
  5.78   13.427308          15    910426           close
  5.31   12.329499          14    910427           open
  4.78   11.103291          12    910426           fstat64
  4.42   10.251790          11    910419           fcntl64
  0.10    0.230356          31      7375           write
  0.00    0.000206          69         3           mremap
  0.00    0.000203         203         1           execve
  0.00    0.000127          21         6           read
  0.00    0.000124          31         4           munmap
  0.00    0.000103          13         8           old_mmap
  0.00    0.000061          12         5           mmap2
  0.00    0.000032          11         3           brk
  0.00    0.000029          15         2         1 access
  0.00    0.000028          14         2           mprotect
  0.00    0.000023          12         2           fchdir
  0.00    0.000012          12         1           time
  0.00    0.000012          12         1           uname
  0.00    0.000012          12         1           set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00  232.170641              10022369         1 total

real    73m26.471s
user    1m27.163s
sys     5m54.949s

In addition, on two systems running this kernel version, a "First orphan inode"
entry shows up in the output of tune2fs -l, like so:

First orphan inode:       9519201

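On one of these hosts, gconf problems appeared and df -hl reported /tmp/ as full.
du -sh did not agree with df, which isn't surprising: df reports block usage for
the whole filesystem (including space still held by deleted files that remain
open), while du only sums the files it can reach in the directory tree.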

Please advise.

