Bug 657166 - XFS causes kernel panic due to double free of log tickets
Summary: XFS causes kernel panic due to double free of log tickets
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Lachlan McIlroy
QA Contact: Eryu Guan
Reported: 2010-11-25 05:49 UTC by Lachlan McIlroy
Modified: 2018-11-14 16:50 UTC (History)

Doc Type: Bug Fix
Last Closed: 2011-07-21 09:26:57 UTC

Links
System: Red Hat Product Errata
ID: RHSA-2011:1065
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update
Last Updated: 2011-07-21 09:21:37 UTC

Description Lachlan McIlroy 2010-11-25 05:49:05 UTC
Description of problem:

A customer reported this panic after remounting their XFS filesystem following a forced shutdown.

Sep 27 21:08:44 r05b16 kernel: XFS mounting filesystem sdm1
Sep 27 21:08:45 r05b16 kernel: Starting XFS recovery on filesystem: sdm1 (logdev: internal)
Sep 27 21:08:45 r05b16 kernel: Ending XFS recovery on filesystem: sdm1 (logdev: internal)
Sep 27 21:08:46 r05b16 hotswap[5834]: Mount of c2d66959-52fb-4354-90ad-5114897037b6 successful.
Sep 27 21:08:46 r05b16 signal_video-server[5835]: Signalling Storage Server that c2d66959-52fb-4354-90ad-5114897037b6 is added back
Sep 27 21:09:46 r05b16 kernel: ----------- [cut here ] --------- [please bite here ] ---------
Sep 27 21:09:46 r05b16 kernel: Kernel BUG at mm/slab.c:3114
Sep 27 21:09:46 r05b16 kernel: invalid opcode: 0000 [1] SMP
Sep 27 21:09:46 r05b16 kernel: last sysfs file: /block/sdl/stat
Sep 27 21:09:46 r05b16 kernel: CPU 7
Sep 27 21:09:46 r05b16 kernel: Modules linked in: xfs ses(FU) enclosure(FU) ipv6 xfrm_nalgo crypto_api autofs4 lockd sunrpc video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport cdc_ether usbnet sd_mod sg shpchp cxgb3 ata_piix i2c_i801 i2c_core uhci_hcd ehci_hcd pcspkr libata mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas scsi_mod igb 8021q dca
Sep 27 21:09:46 r05b16 kernel: Pid: 5832, comm: xfssyncd Tainted: GF 2.6.18-194.el5 #1
Sep 27 21:09:46 r05b16 kernel: RIP: 0010:[<ffffffff800dc5fd>] [<ffffffff800dc5fd>] __cache_alloc_node+0x61/0xd2
Sep 27 21:09:46 r05b16 kernel: RSP: 0018:ffff8105e7c11d00 EFLAGS: 00010046
Sep 27 21:09:46 r05b16 kernel: RAX: 0000000000000013 RBX: ffff8101747db000 RCX: 0000000000000001
Sep 27 21:09:46 r05b16 kernel: RDX: 0000000000000000 RSI: 0000000000000250 RDI: ffff81032a60f580
Sep 27 21:09:47 r05b16 kernel: RBP: ffff81032a60f540 R08: 0000000000000000 R09: ffff8101749f77a0
Sep 27 21:09:47 r05b16 kernel: R10: 0000000000000000 R11: 000002d000000000 R12: ffff81032a604500
Sep 27 21:09:47 r05b16 kernel: R13: 0000000000000000 R14: 0000000000000250 R15: 0000000000000000
Sep 27 21:09:47 r05b16 kernel: FS: 0000000000000000(0000) GS:ffff81038aaff3c0(0000) knlGS:0000000000000000
Sep 27 21:09:47 r05b16 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 27 21:09:47 r05b16 kernel: CR2: 00002aac8370a000 CR3: 000000037f5e0000 CR4: 00000000000006e0
Sep 27 21:09:47 r05b16 kernel: Process xfssyncd (pid: 5832, threadinfo ffff8105e7c10000, task ffff8101749f77a0)
Sep 27 21:09:47 r05b16 kernel: Stack: 0000000000000246 0000000000000250 ffff81032a604500 ffff81032a604500
Sep 27 21:09:47 r05b16 kernel: 0000000000000069 ffffffff8000abeb 0000000000000009 0000000000000000
Sep 27 21:09:47 r05b16 kernel: 0000000000000250 ffffffff883f711a 00000000000002d0 0000000000000000
Sep 27 21:09:47 r05b16 kernel: Call Trace:
Sep 27 21:09:47 r05b16 kernel: [<ffffffff8000abeb>] kmem_cache_alloc+0x34/0x76
Sep 27 21:09:47 r05b16 kernel: [<ffffffff883f711a>] :xfs:kmem_zone_alloc+0x56/0xa3
Sep 27 21:09:47 r05b16 kernel: [<ffffffff883f7175>] :xfs:kmem_zone_zalloc+0xe/0x2f
Sep 27 21:09:48 r05b16 kernel: [<ffffffff883e6e8f>] :xfs:xlog_ticket_get+0x30/0xe6
Sep 27 21:09:48 r05b16 kernel: [<ffffffff883e6fcd>] :xfs:xfs_log_reserve+0x88/0xc9
Sep 27 21:09:48 r05b16 kernel: [<ffffffff883ef2e1>] :xfs:xfs_trans_reserve+0xe4/0x1c5
Sep 27 21:09:48 r05b16 kernel: [<ffffffff883f241f>] :xfs:xfs_syncsub+0x167/0x226
Sep 27 21:09:48 r05b16 kernel: [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
Sep 27 21:09:48 r05b16 kernel: [<ffffffff883ffc8c>] :xfs:xfs_sync_worker+0x17/0x36
Sep 27 21:09:48 r05b16 kernel: [<ffffffff88400ba2>] :xfs:xfssyncd+0xfe/0x138
Sep 27 21:09:48 r05b16 kernel: [<ffffffff88400aa4>] :xfs:xfssyncd+0x0/0x138
Sep 27 21:09:48 r05b16 kernel: [<ffffffff80032bdc>] kthread+0xfe/0x132
Sep 27 21:09:48 r05b16 kernel: [<ffffffff8005efb1>] child_rip+0xa/0x11
Sep 27 21:09:48 r05b16 kernel: [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
Sep 27 21:09:48 r05b16 kernel: [<ffffffff80032ade>] kthread+0x0/0x132
Sep 27 21:09:49 r05b16 kernel: [<ffffffff8005efa7>] child_rip+0x0/0x11
Sep 27 21:09:49 r05b16 kernel:
Sep 27 21:09:49 r05b16 kernel:
Sep 27 21:09:49 r05b16 kernel: Code: 0f 0b 68 84 2b 2b 80 c2 2a 0c 48 89 de 4c 89 e7 44 89 ea e8
Sep 27 21:09:49 r05b16 kernel: RIP [<ffffffff800dc5fd>] __cache_alloc_node+0x61/0xd2
Sep 27 21:09:49 r05b16 kernel: RSP <ffff8105e7c11d00>
Sep 27 21:09:49 r05b16 kernel: <0>Kernel panic - not syncing: Fatal exception 

Version-Release number of selected component (if applicable):
kernel-2.6.18-194.el5

How reproducible:
Every time they run their test.

Steps to Reproduce:
From the customer:

"The setup we are talking about is a 12 disk machine that has a disk with
hardware problems.

We are using the fio tool to fill up the bad disk. During the test we get
SCSI errors that lead the controller to remove the device and then
reattach it after a few seconds.

What we do in this case is unmount and then mount the new device.

The kernel panic occurs after 1 minute.

The scenario above occurred a few times, but now the same scenario gives
different behavior: we do not get a kernel panic, but we do see the CPU
get stuck for more than 10 seconds."
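
The trace above shows the BUG firing in an allocation path (xlog_ticket_get calling kmem_cache_alloc) about a minute after the mount, not at the moment anything is freed. One way to see why a double free can surface this late is the toy model below. It is only a sketch under stated assumptions: ToyPool, its free list, and the consistency check are hypothetical illustrations, not XFS or slab allocator code, and the sketch does not claim to reproduce the actual check at mm/slab.c:3114.

```python
class ToyPool:
    """Toy fixed-size object pool with a LIFO free list (hypothetical, not kernel code)."""

    def __init__(self, slots):
        self.slots = slots
        self.free_list = list(range(slots))

    def alloc(self):
        # Consistency check at allocation time: the free list can never
        # legitimately hold more entries than the pool has slots.
        if len(self.free_list) > self.slots:
            raise RuntimeError("free list larger than pool: corruption detected")
        if not self.free_list:
            raise MemoryError("pool exhausted")
        return self.free_list.pop()

    def free(self, slot):
        # No validation here: a second free of the same slot is silently accepted.
        self.free_list.append(slot)


def demo():
    pool = ToyPool(slots=4)
    ticket = pool.alloc()
    pool.free(ticket)
    pool.free(ticket)  # double free: nothing fails at this point
    try:
        pool.alloc()  # a later, unrelated allocation trips the check
    except RuntimeError as exc:
        return str(exc)
    return "no corruption detected"


if __name__ == "__main__":
    print(demo())
```

In this model the faulty caller (the one that frees twice) returns normally, and the error is reported by whichever caller allocates next. That would be consistent with the panic appearing in xfssyncd's periodic sync rather than in the code path that released the ticket.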

Comment 7 RHEL Program Management 2011-02-01 16:59:06 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Jarod Wilson 2011-02-21 20:57:17 UTC
in kernel-2.6.18-245.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 13 Chao Ye 2011-06-15 05:18:22 UTC
Confirmed that the patch is in the kernel git tree.

Comment 15 errata-xmlrpc 2011-07-21 09:26:57 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

