
Bug 546060

Summary: soft lockup while unmounting a read-only filesystem with errors (As per Redhat Bug #429054)
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.4
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: rc
Reporter: Rhys McMurdo <r.mcmurdo>
Assignee: Eric Sandeen <esandeen>
QA Contact: Igor Zhang <yugzhang>
CC: esandeen, jwest, qcai, r.mcmurdo, rwheeler, tao, tumeya, yugzhang
Doc Type: Bug Fix
Last Closed: 2011-01-13 20:56:58 UTC
Attachments: patch that fixed the issue against -164. (flags: none)

Description Rhys McMurdo 2009-12-09 22:00:43 UTC
Description of problem:
I am opening a new bug report because I am not sure whether reports that have already been closed will still be acted on.

As per my comments in RH Bug #429054, I believe this issue has regressed, as it occurs on the latest kernel release of RHEL 5.4.

Version-Release number of selected component (if applicable):
Kernel 2.6.18-164.6.1.el5 x86_64


How reproducible:
Always

Steps to Reproduce:

mkfs.ext3 /dev/hdb1
tune2fs -e remount-ro /dev/hdb1 
mount -o quota /dev/hdb1 /mnt
quotacheck /mnt
setquota -u root 10000 10000 10000 10000 /mnt
quotaon /mnt
dd if=/dev/zero of=/mnt/dump
# Cause corruption
dd if=/dev/zero of=/dev/hdb1
# Cause filesystem to remount ro
file /mnt/dump
# Cause Soft lockup
umount /mnt  
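
The steps above are destructive: mkfs.ext3 and the second dd overwrite the whole device. As a cautious sketch, the wrapper below only prints the same command sequence for a chosen device and mount point, so it can be reviewed before anything is run. The function name repro_steps and its two parameters are hypothetical and not part of the original report; /dev/hdb1 and /mnt are the values the report used.

```shell
#!/bin/sh
# Dry-run sketch: print the reproduction commands, do not execute them.
# repro_steps, dev and mnt are illustrative names, not from the report.
# WARNING: running the printed commands destroys all data on the device.
repro_steps() {
    dev=$1
    mnt=$2
    cat <<EOF
mkfs.ext3 $dev
tune2fs -e remount-ro $dev
mount -o quota $dev $mnt
quotacheck $mnt
setquota -u root 10000 10000 10000 10000 $mnt
quotaon $mnt
dd if=/dev/zero of=$mnt/dump
# Cause corruption
dd if=/dev/zero of=$dev
# Cause filesystem to remount ro
file $mnt/dump
# Cause soft lockup
umount $mnt
EOF
}

# Print the sequence with the device and mount point used in the report.
repro_steps /dev/hdb1 /mnt
```

Printing rather than executing is a deliberate choice here: the output can be checked against the intended scratch device before it is run by hand as root.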
  
Actual results:
Soft lockup as per the messages in 429054

Expected results:
No soft lockup

Additional info:

Comment 1 Ric Wheeler 2009-12-10 03:34:01 UTC
When you say a soft lockup, do you mean that your file system umount hangs or that other processes hang?

Comment 2 Rhys McMurdo 2009-12-10 05:16:49 UTC
Hi,

umount sits at 100% CPU utilisation and hangs. The rest of the system appears to remain available, but the only way to clear the hang is a reboot. I have sometimes experienced a complete system lockup as well (though the first behaviour is more frequent). Messages like the following appear on the console every 10 seconds (NB: this is from another server's log; however, the same behaviour has been observed on a 2.6.18-164.6.1.el5 kernel):

Dec  7 22:27:22 flurry-srv1 kernel: BUG: soft lockup - CPU#6 stuck for 10s! [umount:24310]
Dec  7 22:27:22 flurry-srv1 kernel: CPU 6:
Dec  7 22:27:22 flurry-srv1 kernel: Modules linked in: ipt_MASQUERADE iptable_nat ip_nat nfs fscache mptctl mptbase ipmi_devintf ipmi_si ipmi_msghandler dell_rbu nfsd exportfs lockd nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth sunrpc bnx2 ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp xt_multiport iptable_filter ip_tables x_tables ib_iser libiscsi scsi_transport_iscsi ib_srp ib_sdp ib_ipoib ipv6 xfrm_nalgo crypto_api rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev ata_piix libata qla2xxx(U) ib_mthca ib_mad floppy ide_cd ib_core i5000_edac cdrom e1000e sg pcspkr edac_mc serio_raw dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx_conf(U) intermodule(U) shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Dec  7 22:27:22 flurry-srv1 kernel: Pid: 24310, comm: umount Tainted: G      2.6.18-128.1.6.el5 #1
Dec  7 22:27:22 flurry-srv1 kernel: RIP: 0010:[<ffffffff88055c40>]  [<ffffffff88055c40>] :ext3:ext3_write_dquot+0x73/0x74
Dec  7 22:27:22 flurry-srv1 kernel: RSP: 0018:ffff810439c81de0  EFLAGS: 00000206
Dec  7 22:27:22 flurry-srv1 kernel: RAX: 00000000ffffffe2 RBX: ffff81043b058800 RCX: ffff810434b47a78
Dec  7 22:27:22 flurry-srv1 kernel: RDX: ffffffff800fb14c RSI: 0000000000000002 RDI: ffff81043b058800
Dec  7 22:27:22 flurry-srv1 kernel: RBP: ffff810439c81e18 R08: ffff81044d187ee8 R09: ffff810439c81e18
Dec  7 22:27:22 flurry-srv1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Dec  7 22:27:22 flurry-srv1 kernel: R13: 0000000000000000 R14: ffff81043b058800 R15: 0000000000000000
Dec  7 22:27:22 flurry-srv1 kernel: FS:  00002b4f209c8570(0000) GS:ffff81010f3a5b40(0000) knlGS:0000000000000000
Dec  7 22:27:22 flurry-srv1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  7 22:27:22 flurry-srv1 kernel: CR2: 00000034f28d3150 CR3: 000000043a6a2000 CR4: 00000000000006e0
Dec  7 22:27:22 flurry-srv1 kernel: 
Dec  7 22:27:22 flurry-srv1 kernel: Call Trace:
Dec  7 22:27:22 flurry-srv1 kernel:  [<ffffffff800fb1d7>] dqput+0x10c/0x19f
Dec  7 22:27:22 flurry-srv1 kernel:  [<ffffffff800fbeae>] vfs_quota_off+0xf6/0x3c7
Dec  7 22:27:22 flurry-srv1 kernel:  [<ffffffff800decf8>] deactivate_super+0x5b/0x82
Dec  7 22:27:22 flurry-srv1 kernel:  [<ffffffff800e8290>] sys_umount+0x245/0x27b
Dec  7 22:27:22 flurry-srv1 kernel:  [<ffffffff800b46af>] audit_syscall_entry+0x16e/0x1a1
Dec  7 22:27:22 flurry-srv1 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
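
To check whether another machine is hitting the same hang, the "BUG: soft lockup" header lines can be pulled out of a syslog file. The following is a minimal sketch that assumes the message format shown above; the function name soft_lockups is hypothetical, and the sample input is the header line from this log.

```shell
#!/bin/sh
# Summarise "BUG: soft lockup" lines from syslog-style input.
# Prints one "command pid cpu seconds" line per event.
# soft_lockups is an illustrative name; it reads the files given as
# arguments, or standard input when called without arguments.
soft_lockups() {
    sed -n 's/.*BUG: soft lockup - CPU#\([0-9][0-9]*\) stuck for \([0-9][0-9]*\)s! \[\([^:]*\):\([0-9][0-9]*\)].*/\3 \4 \1 \2/p' "$@"
}

# Demo on the header line from the log above.
soft_lockups <<'EOF'
Dec  7 22:27:22 flurry-srv1 kernel: BUG: soft lockup - CPU#6 stuck for 10s! [umount:24310]
Dec  7 22:27:22 flurry-srv1 kernel: CPU 6:
EOF
```

On a live system the function could be pointed at the system log, for example soft_lockups /var/log/messages, where the log path is an assumption about the local syslog configuration.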

Comment 3 Takuma Umeya 2010-03-17 00:29:28 UTC
Copied from Bug 429054:
I believe the initial patch didn't fix the problem, so this is not a regression;
the problem simply was never fixed until now. I looked around for another patch
upstream, and this one should fix the problem:
b48d380541f634663b71766005838edbb7261685
I built it on 5.4 and the issue is gone. I have already started work on
incorporating it into the tree.

Comment 4 Eric Sandeen 2010-03-17 01:08:44 UTC
Great, thanks.  Feel free to take the bug as well, and sorry for the misfire on the original bug ...

-Eric

Comment 7 Takuma Umeya 2010-03-30 01:20:21 UTC
Created attachment 403390 [details]
patch that fixed the issue against -164.

Comment 8 Eric Sandeen 2010-03-30 02:51:09 UTC
Thanks for the patch!  Sorry the other bug didn't take care of this.

-Eric

Comment 11 RHEL Program Management 2010-05-20 12:46:24 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 13 Jarod Wilson 2010-05-25 21:10:46 UTC
in kernel-2.6.18-200.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
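
For anyone checking whether a host already runs a build at or past this test kernel, a rough sketch follows. It assumes RHEL 5 release strings of the form 2.6.18-<build>[.x.y].el5 and that every build from 200 onward carries the patch; neither assumption is confirmed in this report, and the function name has_quota_umount_fix is hypothetical.

```shell
#!/bin/sh
# Succeeds when a RHEL 5 kernel release string is at or past build 200.
# Assumptions (not confirmed by the report): releases look like
# 2.6.18-<build>[.x.y].el5, and builds from 200 onward keep the patch.
has_quota_umount_fix() {
    case $1 in
        2.6.18-[0-9]*) ;;
        *) return 2 ;;     # not a RHEL 5 style release string
    esac
    build=${1#2.6.18-}     # strip the upstream version prefix
    build=${build%%.*}     # keep only the leading build number
    [ "$build" -ge 200 ]
}

if has_quota_umount_fix "$(uname -r)"; then
    echo "running kernel is at or past 2.6.18-200.el5"
else
    echo "running kernel predates 2.6.18-200.el5 or is not a RHEL 5 kernel"
fi
```

The check compares only the build number after the 2.6.18- prefix, which is why it returns a distinct status for release strings that do not match that pattern.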

Comment 17 errata-xmlrpc 2011-01-13 20:56:58 UTC
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html