Bug 604244
| Field | Value |
|---|---|
| Summary | GFS2: kernel NULL pointer dereference from dlm_astd |
| Product | Red Hat Enterprise Linux 6 |
| Component | kernel |
| Version | 6.0 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | low |
| Target Milestone | rc |
| Target Release | 6.0 |
| Reporter | Robert Peterson <rpeterso> |
| Assignee | Robert Peterson <rpeterso> |
| QA Contact | Cluster QE <mspqa-list> |
| CC | adas, arozansk, bmarzins, nstraz, swhiteho, syeghiay |
| Doc Type | Bug Fix |
| Bug Blocks | 612608 |
| Last Closed | 2010-11-15 14:28:06 UTC |

Description
Robert Peterson
2010-06-15 17:11:14 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Created attachment 424231
Upstream patch
Here is the upstream patch to fix the problem. It should
just apply to the RHEL6 branch I think, but I'll make sure.
Unfortunately, I discovered a problem with the patch. I know what it is, so I'll create a new one and test it well before reposting upstream.

Although Steve claims to have pushed the patch upstream, it appears he hasn't, and that's a good thing, since it gives me time to rework it as needed.

Created attachment 424941
Upstream patch - try #2
The previous patch worked perfectly for the failing scenario
but got into trouble when the genesis test was run. This
version of the patch has passed more than 50 iterations of the
genesis program, with flying colors.
It was tested on RHEL6-beta system roth-08. I could recreate
the problem with the previous version pretty reliably, so I'm
confident that problem is fixed.
I emailed the patch upstream to cluster-devel, but since Steve
is on holiday/vacation it won't be pushed upstream until he
returns in one week.
The brawl test completed successfully on the west cluster.

The genesis test completed successfully on RHEL6 node west-08.

I posted the patch for inclusion into the RHEL6 kernel today. Changing status to POST.

The initial backtrace says rhel5, but this bug is against rhel6 and mentions testing on rhel6, so I presume you actually meant to cc Aris, not me.

Patch(es) available on kernel-2.6.32-42.el6

*** Bug 610136 has been marked as a duplicate of this bug. ***

I hit this BUG with kernel 2.6.32-42.el6.x86_64. It is the same backtrace as 610136 which was dup'd to this bz. It was hit while running brawl w/ a 1k file system block size. The flock below corresponds to a file generated by the test program accordion.

    3689713 -rw-rw-r--. 1 root root 27189 Jul 7 23:34 accrdfile2l

    G: s:UN n:6/384cf1 f:I t:UN d:EX/0 a:0 r:0
    ------------[ cut here ]------------
    kernel BUG at fs/gfs2/glock.c:173!
    invalid opcode: 0000 [#1]
    Modules linked in: sctp libcrc32c gfs2 dlm configfs sunrpc cpufreq_ondemand powernow_k8 freq_table ipv6 dm_mirror dm_region_hash dm_log dcdbas k8temp hwmon serio_raw amd64_edac_mod edac_core edac_mce_amd tg3 sg i2c_piix4 shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom qla2xxx scsi_transport_fc scsi_tgt sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: configfs]
    Pid: 6793, comm: dlm_astd Not tainted 2.6.32-42.el6.x86_64 #1 PowerEdge SC1435
    RIP: 0010:[<ffffffffa0435680>] [<ffffffffa0435680>] gfs2_glock_hold+0x20/0x30 [gfs2]
    RSP: 0018:ffff88011a075e10 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff8801fa45ba28 RCX: 000000000000264e
    RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000000
    RBP: ffff88011a075e10 R08: ffffffff818bb9c0 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000001
    R13: 0000000000000000 R14: 0000000000000001 R15: ffff88011a12f000
    FS: 00007f1497a47700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 0000000002886000 CR3: 00000001bdbc4000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process dlm_astd (pid: 6793, threadinfo ffff88011a074000, task ffff8801186c7580)
    Stack:
     ffff88011a075e40 ffffffffa0436141 0000000000000001 0000000000000000
    <0> 0000000000000001 ffff880107b80000 ffff88011a075e60 ffffffffa0453a5d
    <0> ffffffffa0416aa8 ffff8801d229a078 ffff88011a075ee0 ffffffffa03f93dd
    Call Trace:
     [<ffffffffa0436141>] gfs2_glock_complete+0x31/0xd0 [gfs2]
     [<ffffffffa0453a5d>] gdlm_ast+0xfd/0x110 [gfs2]
     [<ffffffffa03f93dd>] dlm_astd+0x25d/0x2b0 [dlm]
     [<ffffffffa0453860>] ? gdlm_bast+0x0/0x50 [gfs2]
     [<ffffffffa0453960>] ? gdlm_ast+0x0/0x110 [gfs2]
     [<ffffffffa03f9180>] ? dlm_astd+0x0/0x2b0 [dlm]
     [<ffffffff810909e6>] kthread+0x96/0xa0
     [<ffffffff810141ca>] child_rip+0xa/0x20
     [<ffffffff81090950>] ? kthread+0x0/0xa0
     [<ffffffff810141c0>] ? child_rip+0x0/0x20
    Code: ff ff c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 8b 47 28 85 c0 74 06 f0 ff 47 28 c9 c3 48 89 fe 31 ff e8 a0 fc ff ff <0f> 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48
    RIP [<ffffffffa0435680>] gfs2_glock_hold+0x20/0x30 [gfs2]
     RSP <ffff88011a075e10>

The duplicate bug #610136 was a duplicate because it was an improperly referenced i_iopen glock, as shown by the "5/" in the glock dump:

    G: s:UN n:5/9d14 f:I t:UN d:EX/0 a:0 r:0

However, in this case, the glock referenced improperly is

    G: s:UN n:6/384cf1 f:I t:UN d:EX/0 a:0 r:0

and "6/" indicates a glock for an flock: LM_TYPE_FLOCK. The patch for this bug record affected only i_iopen glocks. Therefore, although this symptom is nearly identical, the problem is not with this patch. This has got to be another similar bug somewhere in the flock code. Please open a new bugzilla record with the symptom from comment #12 and assign it to me. Setting this one back to ON_QA.
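
For anyone triaging similar reports, the glock-type reasoning above can be sketched as a small parser. This is only an illustration, not part of the fix: the function name `decode_glock_line` and the `GLOCK_TYPES` table are hypothetical helpers. Types 5 (iopen) and 6 (flock) are the ones identified in this bug; the other entries are assumed from the `LM_TYPE_*` constants in the GFS2 sources and should be checked against the kernel in use. The sketch also assumes the type is printed in decimal and the glock number in hex.

```python
import re

# Glock type numbers. 5 (iopen) and 6 (flock) are the two identified in
# this bug report; the remaining entries are an assumption based on the
# LM_TYPE_* constants in the GFS2 sources and may differ between kernels.
GLOCK_TYPES = {
    1: "LM_TYPE_NONDISK",
    2: "LM_TYPE_INODE",
    3: "LM_TYPE_RGRP",
    4: "LM_TYPE_META",
    5: "LM_TYPE_IOPEN",
    6: "LM_TYPE_FLOCK",
    7: "LM_TYPE_PLOCK",
    8: "LM_TYPE_QUOTA",
    9: "LM_TYPE_JOURNAL",
}

# Matches the "n:<type>/<number>" field of a glock dump line.
# Assumption: <type> is decimal and <number> is hexadecimal.
_NAME_FIELD = re.compile(r"\bn:(\d+)/([0-9a-fA-F]+)\b")


def decode_glock_line(line):
    """Return (type, number, type_name) for a 'G:' glock dump line."""
    match = _NAME_FIELD.search(line)
    if match is None:
        raise ValueError("no n:<type>/<number> field in: %r" % line)
    gl_type = int(match.group(1))
    gl_number = int(match.group(2), 16)
    return gl_type, gl_number, GLOCK_TYPES.get(gl_type, "unknown")


if __name__ == "__main__":
    for dump in (
        "G: s:UN n:5/9d14 f:I t:UN d:EX/0 a:0 r:0",
        "G: s:UN n:6/384cf1 f:I t:UN d:EX/0 a:0 r:0",
    ):
        print(decode_glock_line(dump))
```

Under those assumptions, the second dump decodes to type 6 with glock number 0x384cf1, which is 3689713 in decimal and matches the inode number shown in the file listing for accrdfile2l.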
Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.