Bug 505440

Summary: Panic on suspend with KSM module loaded
Product: Red Hat Enterprise Linux 5
Component: kvm
Version: 5.4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Michael Solberg <msolberg>
Assignee: Andrea Arcangeli <aarcange>
QA Contact: Lawrence Lim <llim>
CC: cpelland, ehabkost, ieidus, lihuang, sghosh, syeghiay, tburke, tools-bugs, virt-maint, yxie
Target Milestone: rc
Fixed In Version: kvm-83-88.el5
Doc Type: Bug Fix
Last Closed: 2009-09-02 09:25:36 UTC
Bug Blocks: 510812
Attachments: ksm not working w/ suspend, need try_to_freeze (flags: none)

Description Michael Solberg 2009-06-11 22:07:47 UTC
Description of problem:

Suspending my T61 laptop with the kvm modules loaded results in a kernel panic.

Version-Release number of selected component (if applicable):

kvm-83-74.el5

How reproducible:

100%

Steps to Reproduce:
1.  Right-click the power management icon and select suspend
  
Actual results:
Panic

Expected results:
System suspends

Additional info:
For a while I've compiled my own kvm modules from linux-kvm.org on RHEL 5 and they would always panic when waking from suspend.  The modules in 5.4 panic as the system goes to uni-processor mode.  If I unload the modules before suspending, everything works as expected.

Comment 1 Eduardo Habkost 2009-06-16 18:06:31 UTC
I need to see the panic message, as I couldn't reproduce it. (actually, I think I could reproduce it if the KSM module is loaded, so I would like to confirm if you are seeing the KSM problem I am seeing, or another issue)

Does it happen on suspend to disk, too?

Comment 2 Eduardo Habkost 2009-06-16 21:39:37 UTC
For reference, this is the soft lockup warning I am seeing if I suspend with KSM loaded:

Disabling non-boot CPUs ...
CPU 1 is now offline
CPU1 is down
Breaking affinity for irq 114
CPU 2 is now offline
CPU2 is down
Breaking affinity for irq 4
Breaking affinity for irq 14
CPU 3 is now offline
CPU3 is down
Stopping tasks: ===================================================================================================<3>BUG: soft lockup - CPU#0 stuck for 10s! [kksmd:1917]
CPU 0:
Modules linked in: ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs auth_rpcgss autofs4 hidp nfs lockd fscache nfs_acl rfcomm l2cap bluetooth sunrpc bridge ipv6 xfrm_nalgo crypto_api ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp ksm(U) kvm_intel(U) kvm(U) snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device ata_piix snd_pcm_oss sg snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd parport_pc i2c_i801 ide_cd tg3 parport i2c_core shpchp i5000_edac soundcore cdrom edac_mc pcspkr serio_raw dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 1917, comm: kksmd Tainted: G      2.6.18-152.el5 #1
RIP: 0010:[<ffffffff8003b068>]  [<ffffffff8003b068>] prepare_to_wait+0xc/0x5c
RSP: 0018:ffff81023a707df8  EFLAGS: 00000246
RAX: ffff81023a707e78 RBX: ffffffff88431990 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff81023a707e60 RDI: ffffffff88431990
RBP: 0000000000000000 R08: ffff81023a706000 R09: 000000000000003c
R10: ffffffff803ed5a0 R11: 0000000000000000 R12: ffffffff88431990
R13: 0000001c0000000a R14: 0000000000004724 R15: ffffffff80063fc8
FS:  0000000000000000(0000) GS:ffffffff803c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b62faaf5000 CR3: 0000000224918000 CR4: 00000000000026e0

Call Trace:
 [<ffffffff8842e1e4>] :ksm:kthread_ksm_scan_thread+0x0/0xc90
 [<ffffffff8842edbf>] :ksm:kthread_ksm_scan_thread+0xbdb/0xc90
 [<ffffffff800a05aa>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800a0392>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8842e1e4>] :ksm:kthread_ksm_scan_thread+0x0/0xc90
 [<ffffffff800a0392>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80033031>] kthread+0xfe/0x132
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff800a0392>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032f33>] kthread+0x0/0x132
 [<ffffffff8005efa7>] child_rip+0x0/0x11

BUG: soft lockup - CPU#0 stuck for 10s! [kksmd:1917]
CPU 0:

Comment 3 Chris Wright 2009-06-16 22:12:44 UTC
Created attachment 348174
ksm not working w/ suspend, need try_to_freeze

This patch should fix up the problem and let the ksm thread properly freeze.
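
For readers unfamiliar with the freezer: during suspend the kernel asks every task to stop and waits for each one to acknowledge, and a kernel thread acknowledges only if its own main loop calls try_to_freeze(). The following is a minimal userspace model of that handshake in Python. It is not the attached patch and not kernel code; Task, run_one_iteration and freeze_tasks are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for a kernel thread such as kksmd."""
    name: str
    calls_try_to_freeze: bool
    freeze_requested: bool = False
    frozen: bool = False

    def try_to_freeze(self) -> bool:
        # Models the kernel helper: park the task only if a freeze was requested.
        if self.freeze_requested:
            self.frozen = True
        return self.frozen

    def run_one_iteration(self) -> None:
        # One pass of the thread's main loop; a frozen task does no work.
        if self.frozen:
            return
        if self.calls_try_to_freeze:
            self.try_to_freeze()
        # ... the thread's real work (page scanning) would happen here ...


def freeze_tasks(tasks, max_polls=10):
    """Request a freeze, poll a bounded number of times, return the stragglers."""
    for task in tasks:
        task.freeze_requested = True
    for _ in range(max_polls):
        for task in tasks:
            task.run_one_iteration()
        if all(task.frozen for task in tasks):
            return []
    return [task.name for task in tasks if not task.frozen]
```

In this model a thread whose loop never calls try_to_freeze() is always returned as a straggler, which is consistent with the "Stopping tasks" stall and kksmd soft lockup shown in comment 2.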

Comment 4 Michael Solberg 2009-06-16 23:37:12 UTC
(In reply to comment #1)
> I need to see the panic message, as I couldn't reproduce it. (actually, I think
> I could reproduce it if the KSM module is loaded, so I would like to confirm if
> you are seeing the KSM problem I am seeing, or another issue)
> 
> Does it happen on suspend to disk, too?  

I'm having trouble getting my output - the screen blanks and I can't seem to figure out how to keep it from doing it.  I'll try playing with hal quirks when I get a little time and see what I can come up with.

I'll also check suspend to disk.

Comment 5 Eduardo Habkost 2009-06-16 23:58:46 UTC
(In reply to comment #4)
> 
> I'm having trouble getting my output - the screen blanks and I can't seem to
> figure out how to keep it from doing it.  I'll try playing with hal quirks when
> I get a little time and see what I can come up with.

While you don't have the full output, could you test suspend while keeping the kvm modules loaded, but unloading the ksm module?

If it works with ksm unloaded, it is very likely you are seeing the same problem I am seeing. If it doesn't work even with ksm unloaded, there may be other issues in addition to the KSM problem.

Comment 6 Michael Solberg 2009-06-17 00:23:54 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > 
> > I'm having trouble getting my output - the screen blanks and I can't seem to
> > figure out how to keep it from doing it.  I'll try playing with hal quirks when
> > I get a little time and see what I can come up with.
> 
> While you don't have the full output, could you test suspend while keeping the
> kvm modules loaded, but unloading the ksm module?
> 
> If it works with ksm unloaded, it is very likely you are seeing the same
> problem I am seeing. If it doesn't work even with ksm unloaded, there may be
> other issues in addition to the KSM problem.

Sure.  If I remove the ksm module, the system does indeed sleep correctly.  Also, the system wakes correctly, which it had not done previously with kvm-83 from linux-kvm.org.

Comment 7 Michael Solberg 2009-06-17 13:20:46 UTC
(In reply to comment #1)
> Does it happen on suspend to disk, too?  

Yes it does - it appears to be the same output you're getting up there.

Comment 8 Eduardo Habkost 2009-06-17 13:25:48 UTC
So the issue seems to be in the KSM module only. Changing description.

Comment 14 Mark Xie 2009-07-03 13:33:29 UTC
Can reproduce in RHEL5U4 Server x86_64 20090701.0

[root@dhcp-66-70-3 ~]# rpm -q kernel
kernel-2.6.18-156.el5
[root@dhcp-66-70-3 ~]# rpm -q kvm
kvm-83-82.el5

Comment 15 Andrea Arcangeli 2009-07-07 15:30:01 UTC
Michael, can you test the first or second patch I posted to virtualist on your laptop?

The way to test this is:

1) rmmod ksm, then suspend and resume.
2) modprobe ksm, suspend and resume.
3) modprobe ksm and run kvm, verify with "fuser /dev/ksm" that kvm registered into ksm, start ksm with "./ksmctl start 100000 1", verify with `top` that kksmd runs at 100% cpu load, and finally suspend and resume
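
The three scenarios above can be written down as a small dry-run checklist. This is only a sketch: the steps in angle brackets are manual actions that the comment does not spell out as commands, and ksm_suspend_test_plan and format_plan are hypothetical helper names.

```python
def ksm_suspend_test_plan():
    """Return the three scenarios as (description, steps) pairs."""
    # Manual step: the comment does not say how the suspend is triggered.
    suspend = "<suspend and resume the machine>"
    return [
        ("ksm unloaded", ["rmmod ksm", suspend]),
        ("ksm loaded, idle", ["modprobe ksm", suspend]),
        ("ksm loaded, scanning", [
            "modprobe ksm",
            "<start a kvm guest>",
            "fuser /dev/ksm",            # confirm kvm registered into ksm
            "./ksmctl start 100000 1",   # start scanning, arguments as given above
            "<check in top that kksmd runs at 100% cpu>",
            suspend,
        ]),
    ]


def format_plan(plan):
    """Render the plan as numbered scenarios with indented steps."""
    lines = []
    for number, (description, steps) in enumerate(plan, start=1):
        lines.append(f"{number}) {description}")
        lines.extend(f"   {step}" for step in steps)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_plan(ksm_suspend_test_plan()))
```

Nothing here loads modules or suspends the machine; it only prints the checklist so that each scenario can be run by hand and its result recorded.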

Comment 20 lihuang 2009-07-14 06:11:42 UTC
Tested in kvm-83-89.el5. Cannot reproduce the panic.

Host A : Intel(R) Core(TM) i7 CPU         920
         kernel-2.6.18-157.el5
Host B : Intel(R) Core(TM)2 Quad CPU    Q9550
         kernel-2.6.18-156.el5

steps
1. modprobe ksm, suspend and resume (echo mem > /sys/power/state)
=> PASS
2. modprobe ksm, suspend and resume (echo disk > /sys/power/state)
=> PASS
3. modprobe ksm and run kvm, verify with "fuser /dev/ksm" that kvm registered
into ksm, start ksm with "ksmctl start 100000 1", verify with `top` that
kksmd runs at 100% cpu load, and finally suspend and resume (echo disk > /sys/power/state)
=> PASS
4. modprobe ksm and run kvm, verify with "fuser /dev/ksm" that kvm registered
into ksm, start ksm with "ksmctl start 100000 1", verify with `top` that
kksmd runs at 100% cpu load, and finally suspend and resume (echo mem > /sys/power/state)
=> PASS
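
The suspend in each step goes through the kernel's sysfs power interface, /sys/power/state: reading it lists the sleep states the machine supports, and writing one of them (as root) starts the transition. A minimal sketch of a guard around that write follows. check_mode and request_suspend are hypothetical helpers, and dry_run defaults to True so that nothing is suspended by accident.

```python
from pathlib import Path

POWER_STATE = Path("/sys/power/state")


def check_mode(mode, supported_text):
    """Return mode if it appears in the text read from /sys/power/state."""
    supported = supported_text.split()
    if mode not in supported:
        raise ValueError(f"{mode!r} not supported; available: {supported}")
    return mode


def request_suspend(mode, dry_run=True):
    """Validate mode against the running kernel, then optionally suspend."""
    check_mode(mode, POWER_STATE.read_text())
    if dry_run:
        # Only report the equivalent shell command.
        return f"echo {mode} > {POWER_STATE}"
    # Needs root; the write is expected to return only after resume.
    POWER_STATE.write_text(mode)
    return None
```

For example, request_suspend("mem") only returns the equivalent shell command, while request_suspend("disk", dry_run=False) would actually hibernate the host.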


setting to *VERIFIED*

Comment 21 Andrea Arcangeli 2009-07-14 21:25:54 UTC
thanks a lot for checking!

Comment 23 errata-xmlrpc 2009-09-02 09:25:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1272.html