Bug 1083448 - kernel(2.6.32-431.5.1.el6.x86_64) crash on AMD Opteron when running qemu-kvm with windows xp
Keywords:
Status: CLOSED DUPLICATE of bug 1116398
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Radim Krčmář
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1069309
Reported: 2014-04-02 09:27 UTC by Lulin Fan
Modified: 2023-09-14 02:05 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-14 13:11:42 UTC
Target Upstream Version:
Embargoed:



Description Lulin Fan 2014-04-02 09:27:33 UTC
Description of problem:

On an AMD server, the kernel crashes after 2-3 days of running qemu-kvm with a Windows XP guest. It has crashed 5 times in the last 2 weeks.

PID: 7405   TASK: ffff8826ad06caa0  CPU: 35  COMMAND: "qemu-system-x86"
 #0 [ffff88103c6c7b40] machine_kexec at ffffffff81038f3b
 #1 [ffff88103c6c7ba0] crash_kexec at ffffffff810c5d82
 #2 [ffff88103c6c7c70] panic at ffffffff8152751a
 #3 [ffff88103c6c7cf0] watchdog_overflow_callback at ffffffff810e696d
 #4 [ffff88103c6c7d10] __perf_event_overflow at ffffffff8111c847
 #5 [ffff88103c6c7d90] perf_event_overflow at ffffffff8111ce14
 #6 [ffff88103c6c7da0] x86_pmu_handle_irq at ffffffff8101e7e8
 #7 [ffff88103c6c7e90] perf_event_nmi_handler at ffffffff8152bd69
 #8 [ffff88103c6c7ea0] notifier_call_chain at ffffffff8152d825
 #9 [ffff88103c6c7ee0] atomic_notifier_call_chain at ffffffff8152d88a
#10 [ffff88103c6c7ef0] notify_die at ffffffff810a153e
#11 [ffff88103c6c7f20] do_nmi at ffffffff8152b4eb
#12 [ffff88103c6c7f50] nmi at ffffffff8152adb0
    [exception RIP: flush_tlb_others_ipi+282]
    RIP: ffffffff8104fa5a  RSP: ffff8824ff603778  RFLAGS: 00000046
    RAX: 0000000000000000  RBX: 00000000000006c0  RCX: 0000000000000039
    RDX: 0000000000000000  RSI: 0000000000000040  RDI: ffffffff81e27058
    RBP: ffff8824ff6037b8   R8: 0000000000000001   R9: 0000000000000040
    R10: 0000000000013560  R11: 0000000000000007  R12: ffffffff81e27058
    R13: 0000000000000003  R14: ffffffff81e27050  R15: ffff8824ff7610c8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#13 [ffff8824ff603778] flush_tlb_others_ipi at ffffffff8104fa5a
#14 [ffff8824ff6037c0] native_flush_tlb_others at ffffffff8104fae6
#15 [ffff8824ff6037f0] flush_tlb_page at ffffffff8104fc0e
#16 [ffff8824ff603820] do_wp_page at ffffffff811499f7
#17 [ffff8824ff6038c0] handle_pte_fault at ffffffff8114a81d
#18 [ffff8824ff6039a0] handle_mm_fault at ffffffff8114b27a
#19 [ffff8824ff603a10] __do_page_fault at ffffffff8104a8d8
#20 [ffff8824ff603b30] do_page_fault at ffffffff8152d76e
#21 [ffff8824ff603b60] page_fault at ffffffff8152ab25
    [exception RIP: copy_user_generic_string+50]
    RIP: ffffffff8128d432  RSP: ffff8824ff603c10  RFLAGS: 00010097
    RAX: ffff8824ff602000  RBX: ffff883e26abc758  RCX: 0000000000000004
    RDX: 0000000000000004  RSI: ffff8824ff603c50  RDI: 00007f88cbecb300
    RBP: ffff8824ff603c38   R8: 00000000ffffffff   R9: ffff8824ff603ac4
    R10: 0000000000000000  R11: 0000000000000007  R12: ffff8834e1fcc000
    R13: ffff8824ff603c50  R14: 0000000000000004  R15: ffff8826ad06caa0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#22 [ffff8824ff603c10] kvm_write_guest_cached at ffffffffa02f96ab [kvm]
#23 [ffff8824ff603c40] kvm_lapic_sync_to_vapic at ffffffffa0325a8f [kvm]
#24 [ffff8824ff603c80] kvm_arch_vcpu_ioctl_run at ffffffffa0312715 [kvm]
#25 [ffff8824ff603dc0] kvm_vcpu_ioctl at ffffffffa02f9b04 [kvm]
#26 [ffff8824ff603e60] vfs_ioctl at ffffffff8119db32
#27 [ffff8824ff603ea0] do_vfs_ioctl at ffffffff8119dffa
#28 [ffff8824ff603f30] sys_ioctl at ffffffff8119e251
#29 [ffff8824ff603f80] system_call_fastpath at ffffffff8100b072
    RIP: 00007f89451bf7b7  RSP: 00007f893bffea68  RFLAGS: 00010206
    RAX: 0000000000000010  RBX: ffffffff8100b072  RCX: 0000000000000009
    RDX: 0000000000000000  RSI: 000000000000ae80  RDI: 000000000000001c
    RBP: 00007f894b940890   R8: 0000000000000000   R9: 0000000000000007
    R10: fffffffffffffdd7  R11: 0000000000000246  R12: 0000000000000001
    R13: 00007f894b940980  R14: 00007f8948fae940  R15: 00007f8948388000
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b


PID: 7406   TASK: ffff8826ad06d500  CPU: 56  COMMAND: "qemu-system-x86"
 #0 [ffff88383c407e90] crash_nmi_callback at ffffffff8102fee6
 #1 [ffff88383c407ea0] notifier_call_chain at ffffffff8152d825
 #2 [ffff88383c407ee0] atomic_notifier_call_chain at ffffffff8152d88a
 #3 [ffff88383c407ef0] notify_die at ffffffff810a153e
 #4 [ffff88383c407f20] do_nmi at ffffffff8152b4eb
 #5 [ffff88383c407f50] nmi at ffffffff8152adb0
    [exception RIP: _spin_lock+30]
    RIP: ffffffff8152a61e  RSP: ffff8824c190d818  RFLAGS: 00000097
    RAX: 0000000000000589  RBX: ffff8825d04ec700  RCX: 0000000000000001
    RDX: 0000000000000588  RSI: ffff88401c4cb7b0  RDI: ffffea00d363eca8
    RBP: ffff8824c190d818   R8: 00003ffffffff000   R9: 0000000000001000
    R10: 0000000000013560  R11: 0000000000000007  R12: ffffea00cd4082e8
    R13: ffffea00d2a66760  R14: 8000003c2f8b4065  R15: ffff883c65b15658
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff8824c190d818] _spin_lock at ffffffff8152a61e
 #7 [ffff8824c190d820] do_wp_page at ffffffff81149926
 #8 [ffff8824c190d8c0] handle_pte_fault at ffffffff8114a81d
 #9 [ffff8824c190d9a0] handle_mm_fault at ffffffff8114b27a
#10 [ffff8824c190da10] __do_page_fault at ffffffff8104a8d8
#11 [ffff8824c190db30] do_page_fault at ffffffff8152d76e
#12 [ffff8824c190db60] page_fault at ffffffff8152ab25
    [exception RIP: copy_user_generic_string+50]
    RIP: ffffffff8128d432  RSP: ffff8824c190dc10  RFLAGS: 00010097
    RAX: ffff8824c190c000  RBX: ffff8835b799a698  RCX: 0000000000000004
    RDX: 0000000000000004  RSI: ffff8824c190dc50  RDI: 00007f88cbecb380
    RBP: ffff8824c190dc38   R8: ffff8824c190db18   R9: ffff8824c190dac4
    R10: 0000000000000000  R11: 0000000000000007  R12: ffff8834e1fcc000
    R13: ffff8824c190dc50  R14: 0000000000000004  R15: ffff8826ad06d500
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#13 [ffff8824c190dc10] kvm_write_guest_cached at ffffffffa02f96ab [kvm]
#14 [ffff8824c190dc40] kvm_lapic_sync_to_vapic at ffffffffa0325a8f [kvm]
#15 [ffff8824c190dc80] kvm_arch_vcpu_ioctl_run at ffffffffa0312715 [kvm]
#16 [ffff8824c190ddc0] kvm_vcpu_ioctl at ffffffffa02f9b04 [kvm]
#17 [ffff8824c190de60] vfs_ioctl at ffffffff8119db32
#18 [ffff8824c190dea0] do_vfs_ioctl at ffffffff8119dffa
#19 [ffff8824c190df30] sys_ioctl at ffffffff8119e251
#20 [ffff8824c190df80] system_call_fastpath at ffffffff8100b072
    RIP: 00007f89451bf7b7  RSP: 00007f893b5fda68  RFLAGS: 00010206
    RAX: 0000000000000010  RBX: ffffffff8100b072  RCX: 0000000000000009
    RDX: 0000000000000000  RSI: 000000000000ae80  RDI: 000000000000001d
    RBP: 00007f894b96c1d0   R8: 0000000000000000   R9: 0000000000000007
    R10: fffffffffffffdd7  R11: 0000000000000246  R12: 0000000000000001
    R13: 00007f894b96c2c0  R14: 00007f8948fae940  R15: 00007f8948380000
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
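
A quick way to check whether interrupts were masked at each point in the two backtraces above is to test bit 9 of RFLAGS, which is the x86 interrupt-enable flag (IF). The sketch below is only an illustration: the RFLAGS values are copied from the dumps above, while the dictionary labels and the helper name `interrupts_enabled` are made up for this example.

```python
# Bit 9 of RFLAGS is the interrupt-enable flag (IF) on x86.
IF_MASK = 1 << 9

# RFLAGS values copied from the two backtraces above.
# The labels are illustrative only.
samples = {
    "CPU 35 flush_tlb_others_ipi": 0x00000046,
    "CPU 35 copy_user_generic_string": 0x00010097,
    "CPU 56 _spin_lock": 0x00000097,
    "CPU 56 copy_user_generic_string": 0x00010097,
    "user space (both tasks)": 0x00010206,
}


def interrupts_enabled(rflags: int) -> bool:
    """Return True if the IF bit is set in the given RFLAGS value."""
    return bool(rflags & IF_MASK)


report = {where: interrupts_enabled(value) for where, value in samples.items()}

for where, enabled in report.items():
    print(f"{where}: IF={'1' if enabled else '0'}")
```

In this sketch every kernel-side sample has IF clear and only the user-space sample has it set, which matches the observation that both CPUs were running with interrupts disabled when the NMI watchdog fired.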


processor       : 63
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 1
model name      : AMD Opteron(TM) Processor 6272
stepping        : 2
cpu MHz         : 2100.089
cache size      : 2048 KB
physical id     : 3
siblings        : 16
core id         : 7
cpu cores       : 8
apicid          : 111
initial apicid  : 111
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core cpb npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bogomips        : 4199.79
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual



Version-Release number of selected component (if applicable):

kernel: 2.6.32-431.5.1.el6.x86_64
qemu-kvm: 1.4.2

How reproducible:

sometimes.

Steps to Reproduce:
1.
2.
3.

Actual results:
kernel crash

Expected results:
no crash

Additional info:

After spending several days reading the kernel/KVM code and the kdump, I think the problem is in KVM. vcpu_enter_guest() disables IRQs, and a page fault in copy_to_user() on that CPU then calls flush_tlb_others_ipi() to send an IPI to the other CPUs. At the same time, another CPU is in the same page-fault path, also with IRQs disabled, spinning on the lock that the first CPU holds, so it never handles the IPI. That causes a deadlock.

CPU(A)                                      CPU(B)
vcpu_enter_guest                            vcpu_enter_guest
  disable_irq                                 disable_irq
    kvm_lapic_sync_to_vapic                     kvm_lapic_sync_to_vapic
      kvm_write_guest_cached                      kvm_write_guest_cached
        page_fault                                  page_fault
          do_wp_page (holds a lock)                   do_wp_page (tries to get the lock)
            flush_tlb_others_ipi                        waiting...
              waiting for CPU B to handle the IPI...
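
The circular wait described above can be sketched as a tiny wait-for graph. This is a simplified model, not kernel code: the names `build_waits` and `find_cycle` and the two-node graph are assumptions made for illustration. CPU B always waits for the lock held by CPU A; CPU A stays blocked on CPU B only if B cannot take the TLB-flush IPI because its IRQs are disabled.

```python
def build_waits(b_irqs_enabled: bool) -> dict:
    """Build a hypothetical wait-for graph for the two CPUs."""
    # CPU B spins on the lock that CPU A took in do_wp_page.
    waits = {"CPU B": "CPU A"}
    if not b_irqs_enabled:
        # CPU A waits for CPU B to acknowledge the TLB-flush IPI,
        # which B cannot do while spinning with IRQs disabled.
        waits["CPU A"] = "CPU B"
    return waits


def find_cycle(waits_for: dict, start: str):
    """Follow wait-for edges from start; return the cycle if one exists."""
    seen = []
    node = start
    while node in waits_for:
        if node in seen:
            return seen[seen.index(node):]
        seen.append(node)
        node = waits_for[node]
    return None


deadlock = find_cycle(build_waits(b_irqs_enabled=False), "CPU A")
no_deadlock = find_cycle(build_waits(b_irqs_enabled=True), "CPU A")

print("IRQs disabled on B:", deadlock)
print("IRQs enabled on B:", no_deadlock)
```

In this model the cycle disappears as soon as CPU B can service the IPI while it waits for the lock, which is the intuition behind not touching guest memory with IRQs disabled.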


I also found this commit, which seems to resolve the issue:
http://git.kernel.org/cgit/virt/kvm/kvm.git/commit/?id=b463a6f744a263fccd7da14db1afdc880371a280

I am not sure this analysis is right. Any suggestions?

Comment 2 Ademar Reis 2014-07-14 18:53:19 UTC
Thanks for taking the time to enter a bug report with us. We're not able to guarantee the timeliness or suitability of a resolution for issues entered here because this is not a mechanism for requesting support.

If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain it receives the proper attention and prioritization that will result in a timely resolution.

For information on how to contact the Red Hat production support team, please visit: https://www.redhat.com/support/process/production/#howto

Comment 4 Qunfang Zhang 2014-07-30 07:27:02 UTC
Hi, fanlulin

(1) Does this bug happen only with the Windows XP guest? Are other Windows guests affected?
(2) The qemu-kvm version you used is not a RHEL 6 build. Could you try the RHEL 6 qemu-kvm version?

Thanks,
Qunfang

Comment 8 Paolo Bonzini 2015-01-14 13:11:42 UTC

*** This bug has been marked as a duplicate of bug 1116398 ***

Comment 9 Red Hat Bugzilla 2023-09-14 02:05:48 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

