Bug 546060
| Summary: | soft lockup while unmounting a read-only filesystem with errors (As per Redhat Bug #429054) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Rhys McMurdo <r.mcmurdo> | ||||
| Component: | kernel | Assignee: | Eric Sandeen <esandeen> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Igor Zhang <yugzhang> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 5.4 | CC: | esandeen, jwest, qcai, r.mcmurdo, rwheeler, tao, tumeya, yugzhang | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-01-13 20:56:58 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
Rhys McMurdo
2009-12-09 22:00:43 UTC
When you say a soft lockup, do you mean that your file system umount hangs or that other processes hang?

Hi, umount sits at 100% CPU utilisation and hangs. While the system still appears to be available, the only way to clear the hang is via a reboot. Sometimes I have experienced a complete system lockup as well (though the first behaviour seems to be more frequent).

Messages like the following appear on the console every 10 seconds (NB: This is from another server log, however the same behaviour has been observed on a 2.6.18-164.6.1.el5 kernel):

Dec 7 22:27:22 flurry-srv1 kernel: BUG: soft lockup - CPU#6 stuck for 10s! [umount:24310]
Dec 7 22:27:22 flurry-srv1 kernel: CPU 6:
Dec 7 22:27:22 flurry-srv1 kernel: Modules linked in: ipt_MASQUERADE iptable_nat ip_nat nfs fscache mptctl mptbase ipmi_devintf ipmi_si ipmi_msghandler dell_rbu nfsd exportfs lockd nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth sunrpc bnx2 ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp xt_multiport iptable_filter ip_tables x_tables ib_iser libiscsi scsi_transport_iscsi ib_srp ib_sdp ib_ipoib ipv6 xfrm_nalgo crypto_api rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev ata_piix libata qla2xxx(U) ib_mthca ib_mad floppy ide_cd ib_core i5000_edac cdrom e1000e sg pcspkr edac_mc serio_raw dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx_conf(U) intermodule(U) shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Dec 7 22:27:22 flurry-srv1 kernel: Pid: 24310, comm: umount Tainted: G 2.6.18-128.1.6.el5 #1
Dec 7 22:27:22 flurry-srv1 kernel: RIP: 0010:[<ffffffff88055c40>] [<ffffffff88055c40>] :ext3:ext3_write_dquot+0x73/0x74
Dec 7 22:27:22 flurry-srv1 kernel: RSP: 0018:ffff810439c81de0 EFLAGS: 00000206
Dec 7 22:27:22 flurry-srv1 kernel: RAX: 00000000ffffffe2 RBX: ffff81043b058800 RCX: ffff810434b47a78
Dec 7 22:27:22 flurry-srv1 kernel: RDX: ffffffff800fb14c RSI: 0000000000000002 RDI: ffff81043b058800
Dec 7 22:27:22 flurry-srv1 kernel: RBP: ffff810439c81e18 R08: ffff81044d187ee8 R09: ffff810439c81e18
Dec 7 22:27:22 flurry-srv1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Dec 7 22:27:22 flurry-srv1 kernel: R13: 0000000000000000 R14: ffff81043b058800 R15: 0000000000000000
Dec 7 22:27:22 flurry-srv1 kernel: FS: 00002b4f209c8570(0000) GS:ffff81010f3a5b40(0000) knlGS:0000000000000000
Dec 7 22:27:22 flurry-srv1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 7 22:27:22 flurry-srv1 kernel: CR2: 00000034f28d3150 CR3: 000000043a6a2000 CR4: 00000000000006e0
Dec 7 22:27:22 flurry-srv1 kernel:
Dec 7 22:27:22 flurry-srv1 kernel: Call Trace:
Dec 7 22:27:22 flurry-srv1 kernel: [<ffffffff800fb1d7>] dqput+0x10c/0x19f
Dec 7 22:27:22 flurry-srv1 kernel: [<ffffffff800fbeae>] vfs_quota_off+0xf6/0x3c7
Dec 7 22:27:22 flurry-srv1 kernel: [<ffffffff800decf8>] deactivate_super+0x5b/0x82
Dec 7 22:27:22 flurry-srv1 kernel: [<ffffffff800e8290>] sys_umount+0x245/0x27b
Dec 7 22:27:22 flurry-srv1 kernel: [<ffffffff800b46af>] audit_syscall_entry+0x16e/0x1a1
Dec 7 22:27:22 flurry-srv1 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0

cnp from Bz429054...

I believe the initial patch didn't fix the problem. So this is not a regression, but it wasn't fixed until now.

Looked around for another patch upstream and this should fix the problem: b48d380541f634663b71766005838edbb7261685

Built on 5.4 and the issue is gone. I've already started to take action toward incorporating this into the tree.

Great, thanks. Feel free to take the bug as well, and sorry for the misfire on the original bug ...

-Eric

Created attachment 403390 [details]
patch that fixed the issue against -164.

Thanks for the patch! Sorry the other bug didn't take care of this.

-Eric

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

in kernel-2.6.18-200.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html
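
A note on reading the register dump in the description: RAX holds 00000000ffffffe2 while RIP sits at the last instruction of ext3_write_dquot. Assuming RAX carries that function's return value at the sampled moment (an inference from the dump, not something stated in this report), the low 32 bits can be read as a negative errno. The sketch below does that conversion; the helper names are hypothetical, and the errno names come from Python's standard errno table on the machine running the script.

```python
import errno


def as_signed32(reg: int) -> int:
    """Interpret the low 32 bits of a register value as a signed C int."""
    low = reg & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def describe(reg: int) -> str:
    """Render a register value as a signed int, naming the errno if it looks like one."""
    value = as_signed32(reg)
    if value < 0 and -value in errno.errorcode:
        return f"{value} (-{errno.errorcode[-value]})"
    return str(value)


# RAX value copied from the soft lockup report above.
rax = 0x00000000FFFFFFE2
print(describe(rax))  # on Linux: -30 (-EROFS)
```

Under that assumption the value reads as -EROFS (read-only file system), which would be consistent with the bug summary of unmounting a read-only filesystem with errors.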