Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.


Bug 698094

Summary:	NULL pointer dereference, IP: blkiocg_lookup_group+0x9/0x40
Product:	Red Hat Enterprise Linux 6
Component:	kernel
Version:	6.1
Reporter:	Jan Stancek <jstancek>
Assignee:	Vivek Goyal <vgoyal>
QA Contact:	Red Hat Kernel QE team <kernel-qe>
CC:	arozansk, jburke, kzhang
Status:	CLOSED ERRATA
Severity:	medium
Priority:	medium
Target Milestone:	rc
Hardware:	Unspecified
OS:	Unspecified
Fixed In Version:	kernel-2.6.32-160.el6
Doc Type:	Bug Fix
Last Closed:	2011-12-06 13:15:52 UTC

Description Jan Stancek 2011-04-20 07:50:29 UTC

Description of problem:
While running cgroups tests on 2.6.32-131.0.1.el6.x86_64, the kernel panicked:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 
IP: [<ffffffff81254fb9>] blkiocg_lookup_group+0x9/0x40 
PGD 0  
Oops: 0000 [#1] SMP  
last sysfs file: /sys/devices/system/node/has_normal_memory 
CPU 1  
Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log tg3 microcode dcdbas serio_raw i2c_i801 sg iTCO_wdt iTCO_vendor_support shpchp i3000_edac edac_core ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ata_generic pata_acpi ata_piix radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan] 
 
Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log tg3 microcode dcdbas serio_raw i2c_i801 sg iTCO_wdt iTCO_vendor_support shpchp i3000_edac edac_core ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ata_generic pata_acpi ata_piix radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan] 
Pid: 11419, comm: umount Not tainted 2.6.32-131.0.1.el6.x86_64 #1 PowerEdge SC440               
RIP: 0010:[<ffffffff81254fb9>]  [<ffffffff81254fb9>] blkiocg_lookup_group+0x9/0x40 
RSP: 0018:ffff88007c1cf6a8  EFLAGS: 00010007 
RAX: ffff880078a17540 RBX: ffff8800789cf380 RCX: 000000000000ffff 
RDX: 000000000000449d RSI: ffff88003772c000 RDI: 0000000000000000 
RBP: ffff88007c1cf6a8 R08: 0000000000000000 R09: 0000000000000023 
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003772c028 
R13: ffff880078a17540 R14: ffff88003772c000 R15: 0000000000000001 
FS:  00007f647c96e740(0000) GS:ffff880002040000(0000) knlGS:0000000000000000 
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
CR2: 0000000000000028 CR3: 0000000078f91000 CR4: 00000000000006e0 
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
Process umount (pid: 11419, threadinfo ffff88007c1ce000, task ffff88007b7500c0) 
Stack: 
 ffff88007c1cf768 ffffffff812578da ffff88007c1cf728 ffff880057304420 
<0> ffff88007c1cf708 0000000000000246 ffff880079e17748 ffff88007c1cf778 
<0> ffff8800377b7000 ffff88007c1cf758 ffff880057292368 0000000000000000 
Call Trace: 
 [<ffffffff812578da>] blk_throtl_bio+0xba/0x550 
 [<ffffffffa02610d9>] ? ext4_discard_preallocations+0x3a9/0x460 [ext4] 
 [<ffffffff81249002>] generic_make_request+0x1f2/0x5b0 
 [<ffffffff8108e16f>] ? wake_up_bit+0x2f/0x40 
 [<ffffffff8124944f>] submit_bio+0x8f/0x120 
 [<ffffffff811a2996>] submit_bh+0xf6/0x150 
 [<ffffffff811a4700>] __block_write_full_page+0x1e0/0x3b0 
 [<ffffffff811a4040>] ? end_buffer_async_write+0x0/0x190 
 [<ffffffffa0239020>] ? noalloc_get_block_write+0x0/0x60 [ext4] 
 [<ffffffffa0239020>] ? noalloc_get_block_write+0x0/0x60 [ext4] 
 [<ffffffff811a5340>] block_write_full_page_endio+0xe0/0x120 
 [<ffffffff811a4040>] ? end_buffer_async_write+0x0/0x190 
 [<ffffffff811a5395>] block_write_full_page+0x15/0x20 
 [<ffffffffa02365e8>] ext4_writepage+0xd8/0x3c0 [ext4] 
 [<ffffffffa0234dbc>] mpage_da_submit_io+0x14c/0x1d0 [ext4] 
 [<ffffffffa0206405>] ? jbd2_journal_start+0xb5/0x100 [jbd2] 
 [<ffffffffa023a659>] ext4_da_writepages+0x419/0x660 [ext4] 
 [<ffffffff81122541>] do_writepages+0x21/0x40 
 [<ffffffff8110db4b>] __filemap_fdatawrite_range+0x5b/0x60 
 [<ffffffff8110e02c>] filemap_flush+0x1c/0x20 
 [<ffffffffa0234eae>] ext4_alloc_da_blocks+0x4e/0x80 [ext4] 
 [<ffffffffa024157d>] ext4_rename+0x19d/0x750 [ext4] 
 [<ffffffff81180472>] vfs_rename+0x3b2/0x440 
 [<ffffffff8120e162>] ? selinux_inode_permission+0x72/0xb0 
 [<ffffffff81183330>] sys_renameat+0x230/0x260 
 [<ffffffff81262791>] ? cpumask_any_but+0x31/0x50 
 [<ffffffff8113cca0>] ? unmap_region+0x110/0x130 
 [<ffffffff8113ae7e>] ? remove_vma+0x6e/0x90 
 [<ffffffff810d1b82>] ? audit_syscall_entry+0x272/0x2a0 
 [<ffffffff814e0aae>] ? do_page_fault+0x3e/0xa0 
 [<ffffffff8118337b>] sys_rename+0x1b/0x20 
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b 
Code: 14 45 85 c0 75 e4 83 78 18 02 75 de 48 85 c0 74 09 8b 40 20 c9 c3 0f 1f 40 00 8b 47 20 c9 c3 0f 1f 00 55 48 89 e5 0f 1f 44 00 00 <48> 8b 57 28 48 85 d2 75 0e eb 24 0f 1f 40 00 48 8b 12 48 85 d2  
RIP  [<ffffffff81254fb9>] blkiocg_lookup_group+0x9/0x40 
 RSP <ffff88007c1cf6a8> 
CR2: 0000000000000028 
---[ end trace d16e7c3f28faea25 ]--- 
Kernel panic - not syncing: Fatal exception 
Pid: 11419, comm: umount Tainted: G      D    ----------------   2.6.32-131.0.1.el6.x86_64 #1 
Call Trace: 
 [<ffffffff814daaa1>] ? panic+0x78/0x143 
 [<ffffffff814deae4>] ? oops_end+0xe4/0x100 
 [<ffffffff81040cdb>] ? no_context+0xfb/0x260 
 [<ffffffff8126d536>] ? __const_udelay+0x46/0x50 
 [<ffffffff81040f65>] ? __bad_area_nosemaphore+0x125/0x1e0 
 [<ffffffff810097bc>] ? __switch_to+0x1ac/0x320 
 [<ffffffff8104108e>] ? bad_area+0x4e/0x60 
 [<ffffffff810417b3>] ? __do_page_fault+0x3c3/0x480 
 [<ffffffff8108e047>] ? bit_waitqueue+0x17/0xd0 
 [<ffffffff8108e16f>] ? wake_up_bit+0x2f/0x40 
 [<ffffffff8108e047>] ? bit_waitqueue+0x17/0xd0 
 [<ffffffff8108e16f>] ? wake_up_bit+0x2f/0x40 
 [<ffffffff814e0aae>] ? do_page_fault+0x3e/0xa0 
 [<ffffffff814dde55>] ? page_fault+0x25/0x30 
 [<ffffffff81254fb9>] ? blkiocg_lookup_group+0x9/0x40 
 [<ffffffff812578da>] ? blk_throtl_bio+0xba/0x550 
 [<ffffffffa02610d9>] ? ext4_discard_preallocations+0x3a9/0x460 [ext4] 
 [<ffffffff81249002>] ? generic_make_request+0x1f2/0x5b0 
 [<ffffffff8108e16f>] ? wake_up_bit+0x2f/0x40 
 [<ffffffff8124944f>] ? submit_bio+0x8f/0x120 
 [<ffffffff811a2996>] ? submit_bh+0xf6/0x150 
 [<ffffffff811a4700>] ? __block_write_full_page+0x1e0/0x3b0 
 [<ffffffff811a4040>] ? end_buffer_async_write+0x0/0x190 
 [<ffffffffa0239020>] ? noalloc_get_block_write+0x0/0x60 [ext4] 
 [<ffffffffa0239020>] ? noalloc_get_block_write+0x0/0x60 [ext4] 
 [<ffffffff811a5340>] ? block_write_full_page_endio+0xe0/0x120 
 [<ffffffff811a4040>] ? end_buffer_async_write+0x0/0x190 
 [<ffffffff811a5395>] ? block_write_full_page+0x15/0x20 
 [<ffffffffa02365e8>] ? ext4_writepage+0xd8/0x3c0 [ext4] 
 [<ffffffffa0234dbc>] ? mpage_da_submit_io+0x14c/0x1d0 [ext4] 
 [<ffffffffa0206405>] ? jbd2_journal_start+0xb5/0x100 [jbd2] 
 [<ffffffffa023a659>] ? ext4_da_writepages+0x419/0x660 [ext4] 
 [<ffffffff81122541>] ? do_writepages+0x21/0x40 
 [<ffffffff8110db4b>] ? __filemap_fdatawrite_range+0x5b/0x60 
 [<ffffffff8110e02c>] ? filemap_flush+0x1c/0x20 
 [<ffffffffa0234eae>] ? ext4_alloc_da_blocks+0x4e/0x80 [ext4] 
 [<ffffffffa024157d>] ? ext4_rename+0x19d/0x750 [ext4] 
 [<ffffffff81180472>] ? vfs_rename+0x3b2/0x440 
 [<ffffffff8120e162>] ? selinux_inode_permission+0x72/0xb0 
 [<ffffffff81183330>] ? sys_renameat+0x230/0x260 
 [<ffffffff81262791>] ? cpumask_any_but+0x31/0x50 
 [<ffffffff8113cca0>] ? unmap_region+0x110/0x130 
 [<ffffffff8113ae7e>] ? remove_vma+0x6e/0x90 
 [<ffffffff810d1b82>] ? audit_syscall_entry+0x272/0x2a0 
 [<ffffffff814e0aae>] ? do_page_fault+0x3e/0xa0 
 [<ffffffff8118337b>] ? sys_rename+0x1b/0x20 
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b 
panic occurred, switching back to text console 

Version-Release number of selected component (if applicable):
RHEL6.1-20110419.n.0_nfs-Server-x86_64
kernel-2.6.32-131.0.1.el6.x86_64

How reproducible:
Unknown; this is the first time I have seen this panic.

Steps to Reproduce:
The problem was triggered by ltp/generic running RHEL6CGROUP. The test ran for only about 5 minutes, so the trigger was probably cgroup_regression_test.sh.

Comment 3 Vivek Goyal 2011-04-20 20:13:50 UTC

Ok, I looked at this bz.

The function in question is:

struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key)
{
        struct blkio_group *blkg;
        struct hlist_node *n;
        void *__key;

        hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node) {
                __key = blkg->key;
                if (__key == key)
                        return blkg;
        }

        return NULL;
}

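I think we were passed blkcg=NULL; that is why it crashes when it reads blkcg->blkg_list. Note that blkg_list is at offset 0x28 in struct blkio_cgroup, which matches the faulting address (CR2: 0000000000000028). The faulting instruction bytes, 48 8b 57 28, decode to mov 0x28(%rdi),%rdx, and RDI, which holds the first argument (blkcg), is 0.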

The function above is called from throtl_find_alloc_tg():

static struct throtl_grp * throtl_find_alloc_tg(struct throtl_data *td,
                        struct cgroup *cgroup)
{
        struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup);

..
..
..

        if (blkcg == &blkio_root_cgroup)
                tg = &td->root_tg;
        else
                tg = tg_of_blkg(blkiocg_lookup_group(blkcg, key));

..
..
..
}

So it looks like cgroup_to_blkio_cgroup() somehow returned NULL, and at this point I have no idea why that can happen. This code runs in the context of the submitting process; since that process is alive and still part of the cgroup, the cgroup cannot go away, so the function above should not return NULL.

This smells like a generic cgroup-layer bug that is triggered in some corner case. We will have to keep trying and see if we can find a way to reproduce it.

Comment 4 Jan Stancek 2011-04-21 21:31:52 UTC

Here is another one:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 
IP: [<ffffffff81254fb9>] blkiocg_lookup_group+0x9/0x40 
PGD 795e7067 PUD 2c4ee067 PMD 0  
Oops: 0000 [#1] SMP  
last sysfs file: /sys/devices/system/node/has_normal_memory 
CPU 1  
Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log tg3 microcode dcdbas serio_raw i2c_i801 sg iTCO_wdt iTCO_vendor_support i3000_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan] 
 
Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log tg3 microcode dcdbas serio_raw i2c_i801 sg iTCO_wdt iTCO_vendor_support i3000_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan] 
Pid: 18182, comm: rmdir Not tainted 2.6.32-131.0.1.el6.x86_64 #1 PowerEdge SC440               
RIP: 0010:[<ffffffff81254fb9>]  [<ffffffff81254fb9>] blkiocg_lookup_group+0x9/0x40 
RSP: 0018:ffff880079d8b8b8  EFLAGS: 00010007 
RAX: ffff88007886f540 RBX: ffff88007afb8980 RCX: 000000000000ffff 
RDX: 000000000000649b RSI: ffff880037665800 RDI: 0000000000000000 
RBP: ffff880079d8b8b8 R08: 0000000000000000 R09: 0000000000000025 
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880037665828 
R13: ffff88007886f540 R14: ffff880037665800 R15: 0000000000000010 
FS:  00007f752fd62700(0000) GS:ffff880002040000(0000) knlGS:0000000000000000 
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
CR2: 0000000000000028 CR3: 0000000037a5f000 CR4: 00000000000006e0 
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
Process rmdir (pid: 18182, threadinfo ffff880079d8a000, task ffff880045594a80) 
Stack: 
 ffff880079d8b978 ffffffff812578da ffffc90000307020 ffffea00017cc990 
<0> ffff880045594a80 000000500002bb00 ffff880045594a80 0000000000020058 
<0> ffff880079d8ba18 ffffffff8111fc01 ffff8800000126c0 0000000000000000 
Call Trace: 
 [<ffffffff812578da>] blk_throtl_bio+0xba/0x550 
 [<ffffffff8111fc01>] ? __alloc_pages_nodemask+0x111/0x8b0 
 [<ffffffff81249002>] generic_make_request+0x1f2/0x5b0 
 [<ffffffff8110d1fe>] ? find_get_page+0x1e/0xa0 
 [<ffffffff811a2c6f>] ? __find_get_block_slow+0xaf/0x130 
 [<ffffffff8124944f>] submit_bio+0x8f/0x120 
 [<ffffffff811a2996>] submit_bh+0xf6/0x150 
 [<ffffffff811a4313>] ll_rw_block+0x143/0x150 
 [<ffffffff811a434e>] __breadahead+0x2e/0x40 
 [<ffffffffa0236f7e>] __ext4_get_inode_loc+0x33e/0x3b0 [ext4] 
 [<ffffffffa0237076>] ext4_iget+0x86/0x7e0 [ext4] 
 [<ffffffff81205ffb>] ? security_d_instantiate+0x1b/0x30 
 [<ffffffffa023f3d5>] ext4_lookup+0xa5/0x140 [ext4] 
 [<ffffffff8118068b>] do_lookup+0x18b/0x220 
 [<ffffffff81180c89>] __link_path_walk+0x569/0x820 
 [<ffffffff8118160a>] path_walk+0x6a/0xe0 
 [<ffffffff811817db>] do_path_lookup+0x5b/0xa0 
 [<ffffffff81173d31>] ? get_empty_filp+0xa1/0x170 
 [<ffffffff8118244b>] do_filp_open+0xfb/0xd90 
 [<ffffffff810415d4>] ? __do_page_fault+0x1e4/0x480 
 [<ffffffff81262791>] ? cpumask_any_but+0x31/0x50 
 [<ffffffff8113cca0>] ? unmap_region+0x110/0x130 
 [<ffffffff8118f0b2>] ? alloc_fd+0x92/0x160 
 [<ffffffff8116f859>] do_sys_open+0x69/0x140 
 [<ffffffff8116f970>] sys_open+0x20/0x30 
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b 
Code: 14 45 85 c0 75 e4 83 78 18 02 75 de 48 85 c0 74 09 8b 40 20 c9 c3 0f 1f 40 00 8b 47 20 c9 c3 0f 1f 00 55 48 89 e5 0f 1f 44 00 00 <48> 8b 57 28 48 85 d2 75 0e eb 24 0f 1f 40 00 48 8b 12 48 85 d2  
RIP  [<ffffffff81254fb9>] blkiocg_lookup_group+0x9/0x40 
 RSP <ffff880079d8b8b8> 
CR2: 0000000000000028 
---[ end trace fa132e811bbf87a1 ]--- 
Kernel panic - not syncing: Fatal exception 
Pid: 18182, comm: rmdir Tainted: G      D    ----------------   2.6.32-131.0.1.el6.x86_64 #1 
Call Trace: 
 [<ffffffff814daaa1>] ? panic+0x78/0x143 
 [<ffffffff814deae4>] ? oops_end+0xe4/0x100 
 [<ffffffff81040cdb>] ? no_context+0xfb/0x260 
 [<ffffffff8137316f>] ? ata_bmdma_start+0x2f/0x40 
 [<ffffffff81040f65>] ? __bad_area_nosemaphore+0x125/0x1e0 
 [<ffffffff813633c1>] ? ata_qc_issue+0x1d1/0x340 
 [<ffffffff8104108e>] ? bad_area+0x4e/0x60 
 [<ffffffff8136b440>] ? ata_scsi_rw_xlat+0x0/0x1f0 
 [<ffffffff810417b3>] ? __do_page_fault+0x3c3/0x480 
 [<ffffffff8110fa45>] ? mempool_alloc_slab+0x15/0x20 
 [<ffffffff8110fb53>] ? mempool_alloc+0x63/0x140 
 [<ffffffff8125fe5f>] ? cfq_set_request+0x18f/0x520 
 [<ffffffff814e0aae>] ? do_page_fault+0x3e/0xa0 
 [<ffffffff814dde55>] ? page_fault+0x25/0x30 
 [<ffffffff81254fb9>] ? blkiocg_lookup_group+0x9/0x40 
 [<ffffffff812578da>] ? blk_throtl_bio+0xba/0x550 
 [<ffffffff8111fc01>] ? __alloc_pages_nodemask+0x111/0x8b0 
 [<ffffffff81249002>] ? generic_make_request+0x1f2/0x5b0 
 [<ffffffff8110d1fe>] ? find_get_page+0x1e/0xa0 
 [<ffffffff811a2c6f>] ? __find_get_block_slow+0xaf/0x130 
 [<ffffffff8124944f>] ? submit_bio+0x8f/0x120 
 [<ffffffff811a2996>] ? submit_bh+0xf6/0x150 
 [<ffffffff811a4313>] ? ll_rw_block+0x143/0x150 
 [<ffffffff811a434e>] ? __breadahead+0x2e/0x40 
 [<ffffffffa0236f7e>] ? __ext4_get_inode_loc+0x33e/0x3b0 [ext4] 
 [<ffffffffa0237076>] ? ext4_iget+0x86/0x7e0 [ext4] 
 [<ffffffff81205ffb>] ? security_d_instantiate+0x1b/0x30 
 [<ffffffffa023f3d5>] ? ext4_lookup+0xa5/0x140 [ext4] 
 [<ffffffff8118068b>] ? do_lookup+0x18b/0x220 
 [<ffffffff81180c89>] ? __link_path_walk+0x569/0x820 
 [<ffffffff8118160a>] ? path_walk+0x6a/0xe0 
 [<ffffffff811817db>] ? do_path_lookup+0x5b/0xa0 
 [<ffffffff81173d31>] ? get_empty_filp+0xa1/0x170 
 [<ffffffff8118244b>] ? do_filp_open+0xfb/0xd90 
 [<ffffffff810415d4>] ? __do_page_fault+0x1e4/0x480 
 [<ffffffff81262791>] ? cpumask_any_but+0x31/0x50 
 [<ffffffff8113cca0>] ? unmap_region+0x110/0x130 
 [<ffffffff8118f0b2>] ? alloc_fd+0x92/0x160 
 [<ffffffff8116f859>] ? do_sys_open+0x69/0x140 
 [<ffffffff8116f970>] ? sys_open+0x20/0x30 
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b 
panic occurred, switching back to text console

Comment 9 Vivek Goyal 2011-04-25 21:32:56 UTC

Jan,

Is it possible to enable kdump in your recipe and capture a vmcore when this bug reproduces?

Vivek

Comment 15 Vivek Goyal 2011-05-02 20:36:16 UTC

Jan,

Can you please try running your tests with the following kernel? I have put a fix in it and am hoping it resolves this issue.

http://download.lab.bos.redhat.com/brewroot/scratch/vgoyal/task_3291406/

Comment 16 Jan Stancek 2011-05-09 08:04:52 UTC

(In reply to comment #15)

Vivek,

test_10 from LTP's cgroup regression tests has now been running for more than 6 days on your test kernel from comment #15 without any crash. Previously this test hit the problem within a couple of hours, which makes me believe that the fix, and the suspicion about rebind_subsystems() and RCU protection, are correct.

static int rebind_subsystems(struct cgroupfs_root *root,
...
                        cgrp->subsys[i] = NULL;   /* line 1022 */

Comment 17 Vivek Goyal 2011-05-09 12:41:30 UTC

(In reply to comment #16)

Thanks, Jan. I will send the fix upstream first and then pull it into RHEL 6.

Comment 19 RHEL Program Management 2011-06-03 20:00:36 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 20 Aristeu Rozanski 2011-06-27 19:03:51 UTC

Patch(es) available on kernel-2.6.32-160.el6

Comment 23 Mike Gahagan 2011-10-03 17:03:38 UTC

We have been running the LTP cgroup test suite, along with other tests that reproduced this bug in the past, on numerous systems since July, so I think we can consider this issue verified.

Comment 24 errata-xmlrpc 2011-12-06 13:15:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html