Bug 505440
| Summary: | Panic on suspend with KSM module loaded | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Michael Solberg <msolberg> |
| Component: | kvm | Assignee: | Andrea Arcangeli <aarcange> |
| Status: | CLOSED ERRATA | QA Contact: | Lawrence Lim <llim> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 5.4 | CC: | cpelland, ehabkost, ieidus, lihuang, sghosh, syeghiay, tburke, tools-bugs, virt-maint, yxie |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kvm-83-88.el5 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-09-02 09:25:36 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 510812 | | |
Attachments: |
|
Description
Michael Solberg
2009-06-11 22:07:47 UTC
I need to see the panic message, as I couldn't reproduce it. (actually, I think I could reproduce it if the KSM module is loaded, so I would like to confirm if you are seeing the KSM problem I am seeing, or another issue)

Does it happen on suspend to disk, too?

For reference, this is the soft lockup warning I am seeing if I suspend with KSM loaded:

Disabling non-boot CPUs ...
CPU 1 is now offline
CPU1 is down
Breaking affinity for irq 114
CPU 2 is now offline
CPU2 is down
Breaking affinity for irq 4
Breaking affinity for irq 14
CPU 3 is now offline
CPU3 is down
Stopping tasks: ===================================================================================================<3>BUG: soft lockup - CPU#0 stuck for 10s! [kksmd:1917]
CPU 0:
Modules linked in: ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs auth_rpcgss autofs4 hidp nfs lockd fscache nfs_acl rfcomm l2cap bluetooth sunrpc bridge ipv6 xfrm_nalgo crypto_api ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp ksm(U) kvm_intel(U) kvm(U) snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device ata_piix snd_pcm_oss sg snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd parport_pc i2c_i801 ide_cd tg3 parport i2c_core shpchp i5000_edac soundcore cdrom edac_mc pcspkr serio_raw dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 1917, comm: kksmd Tainted: G 2.6.18-152.el5 #1
RIP: 0010:[<ffffffff8003b068>] [<ffffffff8003b068>] prepare_to_wait+0xc/0x5c
RSP: 0018:ffff81023a707df8 EFLAGS: 00000246
RAX: ffff81023a707e78 RBX: ffffffff88431990 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff81023a707e60 RDI: ffffffff88431990
RBP: 0000000000000000 R08: ffff81023a706000 R09: 000000000000003c
R10: ffffffff803ed5a0 R11: 0000000000000000 R12: ffffffff88431990
R13: 0000001c0000000a R14: 0000000000004724 R15: ffffffff80063fc8
FS: 0000000000000000(0000) GS:ffffffff803c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b62faaf5000 CR3: 0000000224918000 CR4: 00000000000026e0

Call Trace:
 [<ffffffff8842e1e4>] :ksm:kthread_ksm_scan_thread+0x0/0xc90
 [<ffffffff8842edbf>] :ksm:kthread_ksm_scan_thread+0xbdb/0xc90
 [<ffffffff800a05aa>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800a0392>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8842e1e4>] :ksm:kthread_ksm_scan_thread+0x0/0xc90
 [<ffffffff800a0392>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80033031>] kthread+0xfe/0x132
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff800a0392>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032f33>] kthread+0x0/0x132
 [<ffffffff8005efa7>] child_rip+0x0/0x11

BUG: soft lockup - CPU#0 stuck for 10s! [kksmd:1917]
CPU 0:

Created attachment 348174 [details]
ksm not working w/ suspend, need try_to_freeze
This patch should fix up the problem and let the ksm thread properly freeze.
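
The soft lockup quoted above is recognizable by its `BUG: soft lockup ... [kksmd:PID]` line. As a rough aid for confirming that another host is hitting the same symptom, a kernel log could be filtered along these lines. This is only a sketch: the function name `has_kksmd_lockup` is hypothetical, and the pattern is derived solely from the one log excerpt in this report.

```shell
#!/bin/sh
# Sketch: succeed if the kernel log on stdin contains a soft lockup
# attributed to the kksmd thread. The function name is hypothetical.
has_kksmd_lockup() {
    # Signature line from this report:
    #   BUG: soft lockup - CPU#0 stuck for 10s! [kksmd:1917]
    grep -q 'BUG: soft lockup - CPU#[0-9]* stuck for [0-9]*s! \[kksmd:[0-9]*]'
}

# Example (run after a failed suspend attempt):
#   dmesg | has_kksmd_lockup && echo "kksmd soft lockup found"
```

A lockup reported against any other thread does not match, so a negative result only says that this particular signature is absent.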
(In reply to comment #1)
> I need to see the panic message, as I couldn't reproduce it. (actually, I think
> I could reproduce it if the KSM module is loaded, so I would like to confirm if
> you are seeing the KSM problem I am seeing, or another issue)
>
> Does it happen on suspend to disk, too?

I'm having trouble getting my output - the screen blanks and I can't seem to figure out how to keep it from doing it. I'll try playing with hal quirks when I get a little time and see what I can come up with. I'll also check against suspend.

(In reply to comment #4)
>
> I'm having trouble getting my output - the screen blanks and I can't seem to
> figure out how to keep it from doing it. I'll try playing with hal quirks when
> I get a little time and see what I can come up with.

While you don't have the full output, could you test suspend while keeping the kvm modules loaded, but unloading the ksm module?

If it works with ksm unloaded, it is very likely you are seeing the same problem I am seeing. If it doesn't work even with ksm unloaded, there may be other issues in addition to the KSM problem.

(In reply to comment #5)
> (In reply to comment #4)
> >
> > I'm having trouble getting my output - the screen blanks and I can't seem to
> > figure out how to keep it from doing it. I'll try playing with hal quirks when
> > I get a little time and see what I can come up with.
>
> While you don't have the full output, could you test suspend while keeping the
> kvm modules loaded, but unloading the ksm module?
>
> If it works if ksm is unloaded, so it is very likely you are seeing the same
> problem I am seeing. If it doesn't work even if ksm is unloaded, there may be
> other issues in addition to the KSM problem.

Sure. If I remove the ksm module, the system does indeed sleep correctly. Also, the system wakes correctly, which it had not done previously with kvm-83 from linux-kvm.org.

(In reply to comment #1)
> Does it happen on suspend to disk, too?
Yes it does - it appears to be the same output you're getting up there.

So the issue seems to be in the KSM module only. Changing description.

Can reproduce in RHEL5U4 Server x86_64 20090701.0

[root@dhcp-66-70-3 ~]# rpm -q kernel
kernel-2.6.18-156.el5
[root@dhcp-66-70-3 ~]# rpm -q kvm
kvm-83-82.el5

Michael, can you test the first or second patch I posted to virtualist on your laptop?

The way to test this is:
1) rmmod ksm, then suspend and resume.
2) modprobe ksm, suspend and resume.
3) modprobe ksm and run kvm, verify with "fuser /dev/ksm" that kvm registered into ksm, start ksm with "./ksmctl start 100000 1", verify with `top` that kksmd runs at 100% cpu load, and finally suspend and resume.

Tested in kvm-83-89.el5. Cannot reproduce the panic.

Host A: Intel(R) Core(TM) i7 CPU 920, kernel-2.6.18-157.el5
Host B: Intel(R) Core(TM)2 Quad CPU Q9550, kernel-2.6.18-156.el5

Steps:
1. modprobe ksm, suspend and resume (echo mem > /sys/power/state) => PASS
2. modprobe ksm, suspend and resume (echo disk > /sys/power/state) => PASS
3. modprobe ksm and run kvm, verify with "fuser /dev/ksm" that kvm registered into ksm, start ksm with "ksmctl start 100000 1", verify with `top` that kksmd runs at 100% cpu load, and finally suspend and resume (echo disk > /sys/power/state) => PASS
4. modprobe ksm and run kvm, verify with "fuser /dev/ksm" that kvm registered into ksm, start ksm with "ksmctl start 100000 1", verify with `top` that kksmd runs at 100% cpu load, and finally suspend and resume (echo mem > /sys/power/state) => PASS

Setting to *VERIFIED*.

Thanks a lot for checking!

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1272.html
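
The suspend/resume checks described in the comments above could be scripted roughly as follows. This is a sketch under several assumptions: the helper names (`run`, `suspend_cycle`, `ksm_suspend_test`, `ksm_busy_suspend_test`) and the `DRY_RUN` switch are hypothetical; the script writes to the standard `/sys/power/state` interface; the `ksmctl` arguments are copied as-is from the comments; and the KVM guest for the busy case is assumed to be started by hand before the function is called. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the suspend/resume checks from this report.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 and run
# as root on a disposable test host to actually execute them.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# suspend_cycle MODE: MODE is "mem" (suspend to RAM) or "disk" (suspend to disk).
suspend_cycle() {
    run sh -c "echo $1 > /sys/power/state"
}

# Idle case: load ksm, then suspend and resume.
ksm_suspend_test() {
    run modprobe ksm
    suspend_cycle "$1"
}

# Busy case: assumes ksm is loaded and a KVM guest was started by hand.
# Confirms something has /dev/ksm open, starts scanning, then suspends.
ksm_busy_suspend_test() {
    run fuser /dev/ksm || return 1
    run ksmctl start 100000 1
    suspend_cycle "$1"
}

# Example:
#   ksm_suspend_test mem
#   ksm_suspend_test disk
#   ksm_busy_suspend_test disk
#   ksm_busy_suspend_test mem
```

The check that kksmd is running at 100% CPU in `top` is left as a manual step, since the comments do not say how it was measured beyond watching `top`.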