Red Hat Bugzilla – Bug 671477
[RHEL6.1] possible vmalloc_sync_all() bug
Last modified: 2013-07-03 03:27:56 EDT
Description of problem:
Multiple BUGs reporting CPUs being stuck:

BUG: soft lockup - CPU#5 stuck for 61s! [stapio:8604]
Modules linked in: stap_850fe20eb529fd0f6b13f9a95a4cdd61_882(U) cryptd aes_x86_64 aes_generic ts_kmp nls_koi8_u nls_cp932 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix dm_mod [last unloaded: stap_02b9727bb984f2da043cd88e66031bd0_878]
irq event stamp: 1099684
hardirqs last enabled at (1099683): [<ffffffff8100bc10>] restore_args+0x0/0x30
hardirqs last disabled at (1099684): [<ffffffff8100afea>] save_args+0x6a/0x70
softirqs last enabled at (1099680): [<ffffffff8107095d>] __do_softirq+0x14d/0x220
softirqs last disabled at (1099667): [<ffffffff8100c3cc>] call_softirq+0x1c/0x30
CPU 5:
Modules linked in: stap_850fe20eb529fd0f6b13f9a95a4cdd61_882(U) cryptd aes_x86_64 aes_generic ts_kmp nls_koi8_u nls_cp932 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix dm_mod [last unloaded: stap_02b9727bb984f2da043cd88e66031bd0_878]
Pid: 8604, comm: stapio Not tainted 2.6.32-102.el6scratch.x86_64.debug #1 Express5800/T110b [N8100-1589]
RIP: 0010:[<ffffffff81044e38>]  [<ffffffff81044e38>] flush_tlb_others_ipi+0x118/0x130
RSP: 0000:ffff88003a181828  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff88003a181868 RCX: 0000000000000008
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff82009550
RBP: ffffffff8100bd8e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88003a180000 R15: ffffffff81791100
FS:  00007f4b4f00c710(0000) GS:ffff880004800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f4b4efebd2c CR3: 000000003a486000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Call Trace:
 [<ffffffff81044e48>] ? flush_tlb_others_ipi+0x128/0x130
 [<ffffffff81044ec6>] ? native_flush_tlb_others+0x76/0x90
 [<ffffffff81044fee>] ? flush_tlb_page+0x5e/0xb0
 [<ffffffff81043d50>] ? ptep_clear_flush_young+0x50/0x70
 [<ffffffff8114f02c>] ? page_referenced_one+0x9c/0x1d0
 [<ffffffff8114e829>] ? page_lock_anon_vma+0x69/0xb0
 [<ffffffff8114e7c0>] ? page_lock_anon_vma+0x0/0xb0
 [<ffffffff8114fd92>] ? page_referenced+0x2f2/0x3f0
 [<ffffffff814faa30>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff810a76cd>] ? trace_hardirqs_on_caller+0x14d/0x190
 [<ffffffff81134824>] ? shrink_active_list+0x1c4/0x370
 [<ffffffff81096a8d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff81135f6d>] ? shrink_zone+0x34d/0x510
 [<ffffffff8109b8b9>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff8113624e>] ? do_try_to_free_pages+0x11e/0x520
 [<ffffffff8113684d>] ? try_to_free_pages+0x9d/0x130
 [<ffffffff81133bb0>] ? isolate_pages_global+0x0/0x3a0
 [<ffffffff8112ddc0>] ? __alloc_pages_nodemask+0x4a0/0x910
 [<ffffffff81013233>] ? native_sched_clock+0x13/0x60
 [<ffffffff81162ee3>] ? alloc_pages_vma+0x93/0x150
 [<ffffffff8117f8f5>] ? do_huge_pmd_anonymous_page+0x135/0x310
 [<ffffffff814fdd77>] ? do_page_fault+0xc7/0x3c0
 [<ffffffff811467d5>] ? handle_mm_fault+0x245/0x2b0
 [<ffffffff814fddee>] ? do_page_fault+0x13e/0x3c0
 [<ffffffff814fb625>] ? page_fault+0x25/0x30

Full console log: http://rhts.redhat.com/testlogs/2011/01/182206/474344/3992197/console.txt

When the NMI watchdog kicks in:

sending NMI to all CPUs:
NMI backtrace for cpu 4
CPU 4:
Modules linked in: stap_850fe20eb529fd0f6b13f9a95a4cdd61_882(U) cryptd aes_x86_64 aes_generic ts_kmp nls_koi8_u nls_cp932 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix dm_mod [last unloaded: stap_02b9727bb984f2da043cd88e66031bd0_878]
Pid: 8593, comm: stapio Not tainted 2.6.32-102.el6scratch.x86_64.debug #1 Express5800/T110b [N8100-1589]
RIP: 0010:[<ffffffff81283e11>]  [<ffffffff81283e11>] delay_tsc+0x61/0x80
RSP: 0018:ffff88003a587d60  EFLAGS: 00000093
RAX: 000000000bbb6f15 RBX: ffff88003b21ef40 RCX: 000000000bbb6f15
RDX: 0000000000000072 RSI: ffff880004612340 RDI: 0000000000000001
RBP: ffff88003a587d68 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: 00000000a6726d88
R13: ffff88003bf287c0 R14: ffff88003bf28eb8 R15: 00000000a12d7b43
FS:  00007f4b501ba700(0000) GS:ffff880004600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f4b4f00bff8 CR3: 000000003a486000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 <#DB[1]> <<EOE>>
Pid: 8593, comm: stapio Not tainted 2.6.32-102.el6scratch.x86_64.debug #1
Call Trace:
 <NMI> [<ffffffff81009d59>] ? show_regs+0x49/0x50
 [<ffffffff814fcca8>] nmi_watchdog_tick+0x1d8/0x200
 [<ffffffff814fbde3>] do_nmi+0x1d3/0x300
 [<ffffffff814fb940>] nmi+0x20/0x39
 [<ffffffff81283e11>] ? delay_tsc+0x61/0x80
 <<EOE>> [<ffffffff81283d4f>] ? __delay+0xf/0x20
 [<ffffffff81289570>] _raw_spin_lock+0x110/0x180
 [<ffffffff814facd6>] _spin_lock+0x56/0x70
 [<ffffffff8103fcee>] ? vmalloc_sync_all+0x10e/0x180
 [<ffffffff814faaeb>] ? _spin_unlock+0x2b/0x40
 [<ffffffff8103fcee>] vmalloc_sync_all+0x10e/0x180
 [<ffffffff81153c3b>] alloc_vm_area+0x4b/0x70
 [<ffffffffa05aaf1e>] _stp_ctl_write_cmd+0x19e/0x440 [stap_850fe20eb529fd0f6b13f9a95a4cdd61_882]
 [<ffffffff8122722b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8121b946>] ? security_file_permission+0x16/0x20
 [<ffffffff81184d98>] vfs_write+0xb8/0x1a0
 [<ffffffff81186156>] ? fget_light+0x66/0x100
 [<ffffffff811857d1>] sys_write+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

Note: on the list of modules, e1000e is one of the drivers using the vzalloc function backports.
The problem is with THP. The page reclaim code calls page_referenced_one(), which takes the mm->page_table_lock on one CPU before sending an IPI to other CPU(s).

On CPU1 we take the mm->page_table_lock, send IPIs and wait for a response:

page_referenced_one(...)
        if (unlikely(PageTransHuge(page))) {
                pmd_t *pmd;

                spin_lock(&mm->page_table_lock);
                pmd = page_check_address_pmd(page, mm, address,
                                             PAGE_CHECK_ADDRESS_PMD_FLAG);
                if (pmd && !pmd_trans_splitting(*pmd) &&
                    pmdp_clear_flush_young_notify(vma, address, pmd))
                        referenced++;
                spin_unlock(&mm->page_table_lock);
        } else {

CPU2 can race in vmalloc_sync_all() because it disables interrupts (preventing a response to the IPI from CPU1), takes the pgd_lock, and then spins on the mm->page_table_lock which is already held on CPU1:

        spin_lock_irqsave(&pgd_lock, flags);
        list_for_each_entry(page, &pgd_list, lru) {
                pgd_t *pgd;
                spinlock_t *pgt_lock;

                pgd = (pgd_t *)page_address(page) + pgd_index(address);
                pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
                spin_lock(pgt_lock);

At this point the system is deadlocked. pmdp_clear_flush_young_notify() needs to do its PMD business with the page_table_lock held, then release that lock before sending the IPIs to the other CPUs.

Larry
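For readers who want to see the interleaving concretely, below is a minimal userspace sketch of the same lock/IPI interaction, written with pthreads and C11 atomics. The names page_table_lock, pgd_lock and ipi_pending mirror the kernel objects, but spin_lock_model(), the irqs_disabled flag and the timings are inventions for illustration only; this is not RHEL kernel code and not the posted fix.

/* Userspace model of the deadlock described above (gcc -pthread deadlock.c). */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_bool page_table_lock;   /* plays mm->page_table_lock */
static atomic_bool pgd_lock;          /* plays pgd_lock */
static atomic_bool ipi_pending;       /* a TLB-flush IPI waiting to be acked */
static _Thread_local bool irqs_disabled;

/* A CPU spinning on a lock normally still takes interrupts, so it can
 * acknowledge a pending flush IPI while it waits - unless irqs are disabled. */
static void spin_lock_model(atomic_bool *lock)
{
        bool expected = false;
        while (!atomic_compare_exchange_weak(lock, &expected, true)) {
                expected = false;
                if (!irqs_disabled && atomic_load(&ipi_pending))
                        atomic_store(&ipi_pending, false);   /* "ack" the IPI */
        }
}

static void spin_unlock_model(atomic_bool *lock)
{
        atomic_store(lock, false);
}

static void *cpu1(void *arg)   /* page_referenced_one() side */
{
        (void)arg;
        spin_lock_model(&page_table_lock);
        atomic_store(&ipi_pending, true);    /* flush_tlb_others_ipi() */
        while (atomic_load(&ipi_pending))    /* wait for the ack from cpu2... */
                ;                            /* ...which never arrives */
        spin_unlock_model(&page_table_lock);
        puts("cpu1 finished");
        return NULL;
}

static void *cpu2(void *arg)   /* vmalloc_sync_all() side */
{
        (void)arg;
        irqs_disabled = true;                /* spin_lock_irqsave(&pgd_lock, flags) */
        spin_lock_model(&pgd_lock);
        spin_lock_model(&page_table_lock);   /* spins forever: cpu1 holds it */
        spin_unlock_model(&page_table_lock);
        spin_unlock_model(&pgd_lock);
        irqs_disabled = false;
        puts("cpu2 finished");
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, cpu1, NULL);
        usleep(100000);                      /* let cpu1 take page_table_lock first */
        pthread_create(&t2, NULL, cpu2, NULL);

        sleep(3);                            /* our stand-in for the soft-lockup watchdog */
        printf("still deadlocked: ipi_pending=%d\n", (int)atomic_load(&ipi_pending));
        return 0;
}

If cpu2 is changed to leave irqs_disabled false (the analogue of taking pgd_lock without _irqsave), it acknowledges the pending IPI while it spins, cpu1 gets its ack and releases page_table_lock, and both threads complete.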
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.
https://brewweb.devel.redhat.com/taskinfo?taskID=3096320

This has the fix I posted upstream. I'm waiting for upstream comment before submitting the fix to rhkernel-list. I couldn't see anywhere that takes pgd_lock from irq context and so would need the irqsave around it, although it seems almost too easy that I can just remove the _irqsave and be done with it. But until I see something that takes it from irq context, I choose to believe it is safe.
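As a rough illustration of what dropping the _irqsave means for the vmalloc_sync_all() excerpt quoted in the analysis above (a sketch of the idea only, not the patch that was actually posted, and not compilable outside a kernel tree):

        /* Sketch: nothing takes pgd_lock from interrupt context, so a plain
         * spin_lock() is enough.  With interrupts left enabled, this CPU can
         * still service a TLB-flush IPI while it spins on pgt_lock below,
         * so the CPU sending that IPI never gets stuck waiting for us. */
        spin_lock(&pgd_lock);
        list_for_each_entry(page, &pgd_list, lru) {
                spinlock_t *pgt_lock;

                pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
                spin_lock(pgt_lock);
                /* ... sync the vmalloc pgd entry as before ... */
                spin_unlock(pgt_lock);
        }
        spin_unlock(&pgd_lock);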
Fix posted to rhkernel-list, Message-ID: <20110215184909.GK5935@random.random>
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Posted a second approach to the fix in Message-ID: <20110228222138.GP22700@random.random>. The old fix should work too, but this one is more obviously safe (for non-Xen users). The old fix remains a good idea, but with this applied it becomes only a cleanup, so it is fine for upstream but not worth the risk for RHEL if this new patch is applied.
Patch(es) available on kernel-2.6.32-122.el6
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html