Bug 1038929 - BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:2229]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: i686
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Marcelo Tosatti
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1082268 1082272
Depends On:
Blocks:
Reported: 2013-12-06 07:41 UTC by Masao Takahashi
Modified: 2014-06-09 22:33 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-09 12:32:33 UTC
Type: Bug
Embargoed:


Attachments
dmesg output containing the lockup messages (72.90 KB, text/plain)
2013-12-06 07:41 UTC, Masao Takahashi

Description Masao Takahashi 2013-12-06 07:41:20 UTC
Created attachment 833450
dmesg output containing the lockup messages

Description of problem:
When I start a SPICE server (qemu-system-x86) that emulates a Windows XP guest,
the Windows boot is stopped by a "BUG: soft lockup" message.

Version-Release number of selected component (if applicable):
linux-3.13.0-0.rc2.git5.1.fc21.i686 


How reproducible:
always

Steps to Reproduce:
1. Start a SPICE server (qemu-system-x86).
2. qemu then crashes with the bug.

Actual results:
qemu-system-x86 crashes.

Expected results:
The Windows boot process completes.

Additional info:

Comment 1 Josh Boyer 2013-12-16 15:10:23 UTC
Are you still seeing this with 3.13-rc4?

Comment 2 Masao Takahashi 2013-12-16 23:13:30 UTC
(In reply to Josh Boyer from comment #1)
> Are you still seeing this with 3.13-rc4?

Wait a moment. I will try it.

Comment 3 Masao Takahashi 2013-12-17 04:15:05 UTC
I tried it, but the result is the same, as follows.

---------------------------------------------
[  256.107005] BUG: soft lockup - CPU#1 stuck for 22s! [qemu-system-x86:2267]
[  256.107005] Modules linked in: fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle tun bridge stp llc arc4 md4 nls_utf8 cifs dns_resolver snd_pcm_oss snd_mixer_oss fscache kvm_amd dm_service_time kvm snd_hda_codec_idt joydev snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc ppdev tg3 snd_timer snd soundcore dcdbas nv_tco i2c_nforce2 k8temp serio_raw ptp pps_core parport_pc parport binfmt_misc uinput dm_multipath usb_storage nouveau ata_generic pata_acpi video mxm_wmi wmi i2c_algo_bit drm_kms_helper ttm drm i2c_core
[  256.107005] CPU: 1 PID: 2267 Comm: qemu-system-x86 Not tainted 3.13.0-rc4.git0.1.fc21.i686 #1
[  256.107005] Hardware name: Dell Inc. OptiPlex 740 Enhanced/0YP693, BIOS 2.2.2  04/01/2009
[  256.107005] task: dd3b4100 ti: d6e5a000 task.ti: d6e5a000
[  256.107005] EIP: 0060:[<f84b5e12>] EFLAGS: 00000246 CPU: 1
[  256.107005] EIP is at svm_complete_interrupts+0x32/0x1a0 [kvm_amd]
[  256.107005] EAX: dbaef000 EBX: d6f40000 ECX: 00000000 EDX: dbaef000
[  256.107005] ESI: 80000008 EDI: 00000000 EBP: d6e5be0c ESP: d6e5be00
[  256.107005]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  256.107005] CR0: 8005003b CR2: 00000000 CR3: 16e1b000 CR4: 000007f0
[  256.107005] Stack:
[  256.107005]  d6f40000 00000001 00000000 d6e5be14 f84b5fb5 d6e5beb0 f846b739 00000000
[  256.107005]  d6e5be54 002c4020 df4af898 00000000 c0dcbf3c 00000286 00000000 df4af898
[  256.107005]  00000000 4e5ae5e7 00000035 00000000 00e388a4 f4f2e024 7ffbfeff fffffffe
[  256.107005] Call Trace:
[  256.107005]  [<f84b5fb5>] svm_cancel_injection+0x35/0x40 [kvm_amd]
[  256.107005]  [<f846b739>] kvm_arch_vcpu_ioctl_run+0x289/0x11a0 [kvm]
[  256.107005]  [<f8468474>] ? kvm_arch_vcpu_load+0x194/0x1f0 [kvm]
[  256.107005]  [<f84b20d6>] ? svm_vcpu_put+0x36/0x50 [kvm_amd]
[  256.107005]  [<f8457d28>] ? vcpu_load+0x58/0xa0 [kvm]
[  256.107005]  [<f84581ce>] kvm_vcpu_ioctl+0x40e/0x4d0 [kvm]
[  256.107005]  [<c04c14d4>] ? do_futex+0xf4/0xa10
[  256.107005]  [<f8457dc0>] ? vcpu_put+0x50/0x50 [kvm]
[  256.107005]  [<c058c4c2>] do_vfs_ioctl+0x2e2/0x4d0
[  256.107005]  [<c0a0737b>] ? __do_page_fault+0x1eb/0x560
[  256.107005]  [<c04c1e7c>] ? SyS_futex+0x8c/0x140
[  256.107005]  [<c0594eaf>] ? fget_light+0x6f/0xc0
[  256.107005]  [<c058c710>] SyS_ioctl+0x60/0x80
[  256.107005]  [<c0a0ac0d>] sysenter_do_call+0x12/0x28
[  256.107005] Code: 8d 74 26 00 89 c3 8b 80 80 34 00 00 f6 83 68 01 00 00 10 8b bb 30 35 00 00 8b b0 88 00 00 00 c7 83 30 35 00 00 00 00 00 00 75 1e <85> f6 c6 83 64 1c 00 00 00 c6 83 48 05 00 00 00 c6 83 50 05 00
---------------------------------------------------------------------------
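For anyone collecting reports like the one above from logs, the watchdog headline can be split into fields mechanically. The following is a minimal C sketch, assuming the message layout seen in this trace (`BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]`). The struct and function names are hypothetical helpers for illustration, not kernel or libc APIs, and the parser assumes the task name contains no colon.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one soft-lockup headline. */
struct lockup_report {
    int cpu;          /* CPU number after "CPU#" */
    int seconds;      /* how long the CPU was reported stuck */
    char comm[64];    /* task name inside the brackets */
    int pid;          /* PID inside the brackets */
};

/*
 * Parse a dmesg/syslog line such as
 *   [  256.107005] BUG: soft lockup - CPU#1 stuck for 22s! [qemu-system-x86:2267]
 * Returns 1 on success, 0 if the line is not a soft-lockup headline.
 * Assumption: the task name has no ':' in it (true for qemu-system-x86).
 */
static int parse_soft_lockup(const char *line, struct lockup_report *out)
{
    /* Skip any timestamp or syslog prefix in front of the message. */
    const char *msg = strstr(line, "BUG: soft lockup - CPU#");
    if (msg == NULL)
        return 0;

    int n = sscanf(msg,
                   "BUG: soft lockup - CPU#%d stuck for %ds! [%63[^:]:%d]",
                   &out->cpu, &out->seconds, out->comm, &out->pid);
    return n == 4;
}
```

Feeding it the first line of the trace above should yield CPU 1, 22 seconds, task `qemu-system-x86`, PID 2267; other lines of the trace are rejected.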

Comment 4 Masao Takahashi 2014-01-17 05:50:00 UTC
linux-3.12.8-200.fc19.i686 is good.
There is no soft lockup.

Comment 5 Masao Takahashi 2014-01-28 08:55:02 UTC
I have tested linux-3.14.0-0.rc0.git12.1.fc21.i686.
First, I edited arch/x86/kvm/x86.c as follows.

static int __vcpu_run(struct kvm_vcpu *vcpu)
          
		if (need_resched()) {
			srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
			//cond_resched();
			schedule();
			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
		}

That is, I replaced cond_resched() with schedule() and built a kernel.
Then I ran qemu-kvm.

The soft lockup above disappeared.
The kernel works.

I don't know why.
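One possible reading of this result, offered only as a sketch and not as a confirmed diagnosis: in kernels of this era cond_resched() enters the scheduler only when should_resched() returns true, while schedule() enters it unconditionally. The userspace model below illustrates that difference. Every name with the `model_` prefix is a hypothetical stand-in, not kernel code, and the booleans stand for the results of need_resched() and should_resched().

```c
#include <stdbool.h>

/* Hypothetical counter: how many times the model "entered the scheduler". */
static int model_yields;

/* Stand-in for schedule(): always gives up the CPU. */
static void model_schedule(void)
{
    model_yields++;
}

/*
 * Stand-in for cond_resched(): gives up the CPU only if the
 * should_resched() predicate (passed in as a plain bool) is true.
 * Returns 1 if it yielded, 0 otherwise.
 */
static int model_cond_resched(bool should_resched)
{
    if (should_resched) {
        model_schedule();
        return 1;
    }
    return 0;
}

/*
 * Model of one pass through the quoted loop body. need_resched is the
 * test guarding the block; use_schedule selects the edited variant.
 */
static void model_vcpu_loop_step(bool need_resched, bool should_resched,
                                 bool use_schedule)
{
    if (need_resched) {
        if (use_schedule)
            model_schedule();
        else
            (void)model_cond_resched(should_resched);
    }
}
```

In this model, a step with need_resched true but should_resched false never yields in the cond_resched() variant and always yields in the schedule() variant, which would match the observed behaviour if the two predicates disagreed on the affected machines.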

Comment 6 Marcelo Tosatti 2014-01-28 16:00:33 UTC
Masao,

Can you try reverting

https://git.kernel.org/cgit/virt/kvm/kvm.git/patch/?id=01b71917b55d28c09ade9fb8c683cf0d2aad1858

Comment 7 Masao Takahashi 2014-01-29 04:11:14 UTC
(In reply to Marcelo Tosatti from comment #6)
> Masao,
> 
> Can you try reverting
> 
> https://git.kernel.org/cgit/virt/kvm/kvm.git/patch/
> ?id=01b71917b55d28c09ade9fb8c683cf0d2aad1858

I have tried it, but the soft lockup still occurs.

Comment 8 Marcelo Tosatti 2014-01-29 18:55:41 UTC
(In reply to Masao Takahashi from comment #7)
> (In reply to Marcelo Tosatti from comment #6)
> > Masao,
> > 
> > Can you try reverting
> > 
> > https://git.kernel.org/cgit/virt/kvm/kvm.git/patch/
> > ?id=01b71917b55d28c09ade9fb8c683cf0d2aad1858
> 
> I have tried.
> But, the soft lockup is occurred.

Please try the patch at

http://marc.info/?l=linux-kernel&m=139038631607917&q=raw

Comment 9 Masao Takahashi 2014-01-30 03:55:03 UTC
(In reply to Marcelo Tosatti from comment #8)
> (In reply to Masao Takahashi from comment #7)
> > (In reply to Marcelo Tosatti from comment #6)
> > > Masao,
> > > 
> > > Can you try reverting
> > > 
> > > https://git.kernel.org/cgit/virt/kvm/kvm.git/patch/
> > > ?id=01b71917b55d28c09ade9fb8c683cf0d2aad1858
> > 
> > I have tried.
> > But, the soft lockup is occurred.
> 
> Please try patch at 
> 
> http://marc.info/?l=linux-kernel&m=139038631607917&q=raw

I have tried linux-3.14.0-0.rc0.git15.1.fc21.i686 with the above patch applied.
The soft lockup still occurs.

Comment 10 Masao Takahashi 2014-02-03 04:09:13 UTC
I have resolved this issue, but I have a question, described below.

The should_resched() function is defined in two different files.
One: arch/x86/include/asm/preempt.h
   static __always_inline bool should_resched(void)
   {
	return unlikely(!__this_cpu_read_4(__preempt_count)); /* here */
   }
I will call this should-1.

The other: include/asm-generic/preempt.h
   static __always_inline bool should_resched(void)
   {
	return unlikely(!preempt_count() && tif_need_resched());
   }
I will call this should-2.

I then replaced should-1 with should-2 and rebuilt the kernel.
With that change, the soft lockup above disappeared.

What is wrong?
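To make the difference between the two definitions concrete, here is a minimal userspace model in C. It assumes the 3.13-era x86 layout, in which the need-resched state is folded into the per-CPU preempt count as an inverted top bit (PREEMPT_NEED_RESCHED, 0x80000000: bit set means no reschedule is pending), and it assumes preempt_count() masks that bit out. The `model_`/`MODEL_` names are hypothetical and this is a sketch of the predicates only, not a statement about the root cause of this bug.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inverted flag folded into the x86 per-CPU count:
 * bit SET means "no reschedule needed". */
#define MODEL_PREEMPT_NEED_RESCHED 0x80000000u

/* should-1 (x86): true only when the raw word is zero, i.e. preemption
 * is enabled AND the inverted bit has been cleared. */
static bool model_should_resched_x86(uint32_t raw_count)
{
    return raw_count == 0;
}

/* should-2 (generic): masks the folded bit out of the count and tests
 * the thread flag (TIF_NEED_RESCHED) directly. */
static bool model_should_resched_generic(uint32_t raw_count,
                                         bool tif_need_resched)
{
    uint32_t count = raw_count & ~MODEL_PREEMPT_NEED_RESCHED;
    return count == 0 && tif_need_resched;
}
```

The two predicates agree whenever the folded bit mirrors the thread flag. They disagree only if the thread flag is set while the folded bit has not been cleared: should-1 then reports false and should-2 reports true. Whether that state actually arose on the affected i686 hosts is not established in this report.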

Comment 11 H.-P. Sorge 2014-02-09 20:06:21 UTC
Just to confirm:
I have the same problem during power down. Suspend is OK.

Current kernel:
3.12.9-201.fc19.x86_64

The problem came up after I changed the motherboard a few years ago.
I thought it was hardware related.
I didn't bother to investigate because the machine runs 24/7.
Since I want to power off the system remotely, it got nasty.

I fiddled around with ACPI and friends but did not find a working combination.

In case it matters 
cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 4
model name      : AMD Phenom(tm) II X4 965 Processor
stepping        : 3
microcode       : 0x10000c8
cpu MHz         : 800.000
...

Plan to upgrade to F20. But not tomorrow...

Comment 12 Peter Bieringer 2014-03-08 11:35:40 UTC
Got hit by the same issue.

OS: Fedora 20 with latest updates

kernel 3.12.9-301.fc20.i686 
 => kvm virtual machine starts

kernel >= 3.13.x
 => same report as topic

kernel-3.14.0-0.rc5.git2.1.fc21.i686
 => same report as topic

Is there any newer kernel around to test?

System: Dell Latitude D620

CPU:

$ cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 CPU         T7400  @ 2.16GHz
stepping	: 6
microcode	: 0xd1
cpu MHz		: 2167.000
cache size	: 4096 KB
...

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 CPU         T7400  @ 2.16GHz
stepping	: 6
microcode	: 0xd1
cpu MHz		: 1000.000
cache size	: 4096 KB
..

Comment 13 Jeff Buhrt 2014-03-10 17:28:30 UTC
I am seeing a similar problem this morning on the newest kernel.

Fedora (3.12.7-300.fc20.i686+PAE) 20 (Heisenbug) -KVM seems ok
  -this is what I fell back to and the guests are fine now

Fedora (3.13.5-202.fc20.i686+PAE) 20 (Heisenbug) -KVM breaks
https://www.dropbox.com/s/0qvej1hem2z1z3c/2014-03-10%2011.17.29.jpg
https://www.dropbox.com/s/q4opmf4vd08ax89/2014-03-10%2011.26.00.jpg
a couple more shots:
https://www.dropbox.com/s/rfu73iqyhxj2zbt/2014-03-10%2010.38.04.jpg
https://www.dropbox.com/s/lmkyxe78hdaa2t2/2014-03-10%2010.38.23.jpg
Fedora (3.12.10-300.fc20.i686+PAE) 20 (Heisenbug) -KVM also breaks

Fedora 3.12.7-300.fc20.i686+PAE also showed:
 vcpu0 unhandled rdmsr: 0xc0010001
 vcpu1 unhandled rdmsr: 0x3a
https://www.dropbox.com/s/db2wa9d7go592iw/2014-03-10%2010.13.13.jpg

/var/log/messages from the tests:
https://www.dropbox.com/s/tiafantivfpos51/messages

From 3am today:
https://www.dropbox.com/s/x6s7mq0e4cqecxz/messages-20140310
This is something that started happening maybe once a month. At first I thought it was the swap size, but right now there is 16GB RAM and swapon reports Size 17825788. Ironically, the newer kernels are worse.


The box:
Gigabyte GA-78LMT
AMD FX(tm)-6100 Six-Core Processor (bogomips 6629.62 per core)
16GB
two 3TB SATA drives, mdraid mirrored with LVM on top of the md's
32bit PAE kernel (longish story but a new 64bit install wasn't booting, but installed fine... I fell back to grub2/32bit with PAE... which also matches the kernel in our other systems including blackboxes in customers' closets)
guests are 32bit Fedora and XPee Pro
two NIC's: one public for VM's access only, 2nd for private local net

I wonder if this ticket is indirectly related to the lockups in https://bugzilla.redhat.com/show_bug.cgi?id=1061885

[I have done ~30ish years of Unix, so if there are things you would like me to try/see, please let me know.]

Thanks,

-Jeff

Comment 14 Sergei LITVINENKO 2014-03-10 19:43:21 UTC
Fedora-19, 32bit, PAE

After upgrading the host kernel to 3.13.x it is not possible to run a Fedora 20 32-bit guest. Even updating the kernel to vmlinuz-3.13.5-103.fc19.i686.PAE does not help.

No issue with guest if host kernel is 3.12.11-201.fc19.i686.PAE

---

Mar 10 20:04:27 homedesk kernel: [  228.040001] BUG: soft lockup - CPU#1 stuck for 22s! [qemu-system-x86:9462]
Mar 10 20:04:27 homedesk kernel: [  228.040001] Modules linked in: vhost_net vhost macvtap macvlan bnep bluetooth fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle tun bridge stp llc adt7475 hwmon_vid arc4 raid0 gpio_ich iTCO_wdt iTCO_vendor_support snd_emu10k1 coretemp kvm_intel snd_util_mem snd_hwdep snd_rawmidi kvm snd_ac97_codec ac97_bus snd_seq snd_seq_device microcode snd_pcm nvidia(POF) serio_raw rtl8187 i2c_i801 eeprom_93cx6 uvcvideo mac80211 videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media snd_page_alloc cfg80211 rfkill snd_timer snd soundcore lpc_ich drm r8169 mii emu10k1_gp gameport mfd_core sky2 i2c_core asus_atk0110 acpi_cpufreq uinput binfmt_misc
Mar 10 20:04:27 homedesk kernel: [  228.040001] CPU: 1 PID: 9462 Comm: qemu-system-x86 Tainted: PF          O 3.13.5-103.fc19.i686.PAE #1
Mar 10 20:04:27 homedesk kernel: [  228.040001] Hardware name: System manufacturer P5K Deluxe/P5K Deluxe, BIOS 0902    06/19/2008
Mar 10 20:04:27 homedesk kernel: [  228.040001] task: ea7f9680 ti: e70f8000 task.ti: e70f8000
Mar 10 20:04:27 homedesk kernel: [  228.040001] EIP: 0060:[<f82e90cd>] EFLAGS: 00000202 CPU: 1
Mar 10 20:04:27 homedesk kernel: [  228.040001] EIP is at kvm_arch_vcpu_ioctl_run+0x24d/0x10c0 [kvm]
Mar 10 20:04:27 homedesk kernel: [  228.040001] EAX: 00000088 EBX: e6d08000 ECX: 0000401c EDX: 00000001
Mar 10 20:04:27 homedesk kernel: [  228.040001] ESI: 00000000 EDI: 00000000 EBP: e70f9eb0 ESP: e70f9e18
Mar 10 20:04:27 homedesk kernel: [  228.040001]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Mar 10 20:04:27 homedesk kernel: [  228.040001] CR0: 80050033 CR2: 098dc478 CR3: 26c52000 CR4: 000027f0
Mar 10 20:04:27 homedesk kernel: [  228.040001] Stack:
Mar 10 20:04:27 homedesk kernel: [  228.040001]  00000007 00000000 00000002 0000000c 00000000 ea7f9680 ea7f9680 0000000c
Mar 10 20:04:27 homedesk kernel: [  228.040001]  00000001 f82d9360 00000001 0000000c 00000000 0000000e 0048fce0 ea54e024
Mar 10 20:04:27 homedesk kernel: [  228.040001]  7ffbfeff fffffffe c047f301 ea7fde80 00000286 e70f9e94 c0480e6e e70f8000
Mar 10 20:04:27 homedesk kernel: [  228.040001] Call Trace:
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<f82d9360>] ? kvm_set_ioapic_irq+0x30/0x30 [kvm]
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<c047f301>] ? ttwu_do_activate.constprop.87+0x51/0x60
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<c0480e6e>] ? try_to_wake_up+0x12e/0x210
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<f82e5d7f>] ? kvm_arch_vcpu_load+0x4f/0x200 [kvm]
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<f82d5d04>] ? vcpu_load+0x44/0x70 [kvm]
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<f82d616e>] kvm_vcpu_ioctl+0x40e/0x4b0 [kvm]
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<c04b7ac4>] ? do_futex+0xf4/0xa90
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<c040f908>] ? __switch_to+0xb8/0x350
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<f82d5d60>] ? vcpu_put+0x30/0x30 [kvm]
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<c0578212>] do_vfs_ioctl+0x2e2/0x4c0
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<c06429bf>] ? file_has_perm+0x7f/0x90
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<c06433ac>] ? selinux_file_ioctl+0x4c/0xf0
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<c0578450>] SyS_ioctl+0x60/0x80
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<c09cffcd>] sysenter_do_call+0x12/0x28
Mar 10 20:04:27 homedesk kernel: [  228.040001]  [<c09c0000>] ? wait_noreap_copyout.isra.7+0x96/0xb1
Mar 10 20:04:27 homedesk kernel: [  228.040001] Code: 00 83 c0 24 e8 d5 d8 1b c8 fa 66 66 90 66 90 83 7b 1c 02 74 0b 8b 43 20 85 c0 0f 84 86 08 00 00 c7 43 1c 00 00 00 00 fb 66 66 90 <66> 90 8b 03 be 01 00 00 00 83 c0 24 e8 52 d8 1b c8 89 43 18 8b

Comment 15 Marcus Asshauer 2014-03-24 18:54:37 UTC
Same here with 3.13.6-200.fc20.i686

Comment 16 aillescastecdes 2014-03-26 16:41:18 UTC
Same problem

kernel 3.13.6-100.fc19.i686.PAE, Mate-compis Mate 1.6.2
Nvidia driver 331.49 (GeForce GT610) Core2 Quad ver 6.7.10
kmod-nvidia-3.13.6-100.fc19.i686.PAE-331.49-2.fc19.i686


This is intermittent and does not always occur under the same conditions.

qemu-system-x86-1.4.2-15.fc19.i686
qemu-kvm-1.4.2-15.fc19.i686
qemu-guest-agent-1.4.2-15.fc19.i686
ipxe-roms-qemu-20130517-2.gitc4bce43.fc19.noarch
qemu-kvm-tools-1.4.2-15.fc19.i686
qemu-common-1.4.2-15.fc19.i686
qemu-img-1.4.2-15.fc19.i686

Another bug related to the kernel update:

glibc crashed,ld-2.17.so killed by SIGSEGV,glibc-2.17-20.fc19


syslog for kernel CPU stuck:

Message from syslogd@localhost at Mar 26 10:16:57 ...
 kernel:[174484.048001] BUG: soft lockup - CPU#2 stuck for 22s! [qemu-system-x86:26935]
 kernel:[174484.048001] CPU: 2 PID: 26935 Comm: qemu-system-x86 Tainted: PF          O 3.13.6-100.fc19.i686.PAE #1
 kernel:[174484.048001] Hardware name:                  /DG35EC, BIOS ECG3510M.86A.0112.2009.0203.1136 02/03/2009
 kernel:[174484.048001] task: e5aee780 ti: f6cb0000 task.ti: f6cb0000
 kernel:[174484.048001] Stack:
 kernel:[174484.048001] Call Trace:
 kernel:[174484.048001] Code: 00 83 c0 24 e8 15 c9 29 c8 fa 66 66 90 66 90 83 7b 1c 02 74 0b 8b 43 20 85 c0 0f 84 86 08 00 00 c7 43 1c 00 00 00 00 fb 66 66 90 <66> 90 8b 03 be 01 00 00 00 83 c0 24 e8 92 c8 29 c8 89 43 18 8b

Comment 17 Marcus Asshauer 2014-03-31 19:11:43 UTC
Same with 3.13.7-200.fc20.i686

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 Duo CPU     T7100  @ 1.80GHz
stepping        : 13
microcode       : 0xa4
cpu MHz         : 1800.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dtherm tpr_shadow vnmi flexpriority
bogomips        : 3591.20
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 Duo CPU     T7100  @ 1.80GHz
stepping        : 13
microcode       : 0xa4
cpu MHz         : 800.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fdiv_bug        : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dtherm tpr_shadow vnmi flexpriority
bogomips        : 3591.20
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Comment 18 Josh Boyer 2014-04-01 17:22:16 UTC
*** Bug 1082268 has been marked as a duplicate of this bug. ***

Comment 19 Josh Boyer 2014-04-01 17:22:18 UTC
*** Bug 1082272 has been marked as a duplicate of this bug. ***

Comment 20 dave.harley 2014-04-02 09:57:42 UTC
Hi, 
Affecting me also. 
I'm stuck back on 3.12.11-201 to keep my VMs operational. 

Checked after last kernel update and it is still broken on Kernel 3.13.7-200 with the same dmesg and stdout messages as detailed in other reports. 

CPU in question is:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
stepping	: 10
microcode	: 0xa07

Regards
Dave

Comment 21 dave.harley 2014-04-10 10:42:30 UTC
Quick update: the problem is still present in kernel 3.13.9-200.

Regards, 
Dave Harley.

Comment 22 Sergei LITVINENKO 2014-04-11 21:06:11 UTC
It looks like only 32-bit PAE is affected.
After migrating to F20 64-bit, the issue is gone.

Comment 23 H.-P. Sorge 2014-04-12 21:17:11 UTC
Shortly after the upgrade to F20 64-bit, the same problem is still there.
Sometimes CPU#3 gets stuck.

But to dig deeper into it I first have to fix the "Oh no! ..... " problem.

Comment 24 Marcin Trendota 2014-04-16 13:19:57 UTC
I've noticed that the lockup is more likely to occur under heavy disk usage (a system upgrade, for example).

Comment 25 H.-P. Sorge 2014-04-24 14:19:19 UTC
Seems to be solved with 3.13.10-200.fc20.x86_64.

I have powered off from local and remote session several times.

It is always a clean poweroff - no excessive waits, no complaining messages.

Thank you.

Comment 26 Masao Takahashi 2014-04-25 03:29:30 UTC
(In reply to H.-P. Sorge from comment #25)
> Seems to be solved with 3.13.10-200.fc20.x86_64.
> 
> I have powered off from local and remote session several times.
> 
> It is always a clean poweroff - no excessive waits, no complaining messages.
> 
> Thank you.

I tested as described below, but the problem still exists.

1. kernel
   linux-3.13.11-200.fc19.i686
2. cpuinfo
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 107
model name	: AMD Athlon(tm) 64 X2 Dual Core Processor 5600+
stepping	: 2
cpu MHz		: 1000.000
cache size	: 512 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch lbrv
bogomips	: 2004.11
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

processor	: 1
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 107
model name	: AMD Athlon(tm) 64 X2 Dual Core Processor 5600+
stepping	: 2
cpu MHz		: 1000.000
cache size	: 512 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fdiv_bug	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch lbrv
bogomips	: 2004.11
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

Comment 27 Jeff Buhrt 2014-04-29 01:09:07 UTC
I also confirmed the 32bit PAE host kernel-PAE-3.13.10-200.fc20.i686 still fails when running a 32bit guest kernel. XPee Pro SP3 guest seems fine.

kernel-PAE-3.12.7-300.fc20.i686 is mostly stable and only ~monthly will crash all the guests and run slow until rebooted. 3.12.7 seems to only do this under heavy guest I/O, but not that different of a pattern from other loads at least at a high level (Munin graphs pre-crash, etc.)

Comment 28 Masao Takahashi 2014-05-16 06:58:59 UTC
I have tried a new kernel as described below.
With it, the soft lockup problem disappeared.

1. kernel
   linux-3.15.0-0.rc5.git2.8.fc21.i686
2. Patch applied
  kvm/linux-next
  here is a link.
 https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=0091d63ea508b831b63dbb3e23e204da51ce1521

Could anyone else please try this?

Comment 29 dave.harley 2014-06-09 10:37:36 UTC
I've tried 3.14.5-200.fc20.i686 and it seems to be resolved.
Thanks.

Comment 30 Josh Boyer 2014-06-09 12:32:33 UTC
Seems fixed by:

[jwboyer@vader linux]$ git log v3.14.4..v3.14.5 arch/x86/kvm/
commit 8c02c2a4f89a4eda43b4679e8f0e170edeebc85f
Author: Marcelo Tosatti <mtosatti>
Date:   Thu Apr 10 18:19:12 2014 -0300

    KVM: x86: remove WARN_ON from get_kernel_ns()
    
    commit b351c39cc9e0151cee9b8d52a1e714928faabb38 upstream.
    
    Function and callers can be preempted.
    
    https://bugzilla.kernel.org/show_bug.cgi?id=73721
    
    Signed-off-by: Marcelo Tosatti <mtosatti>
    Reviewed-by: Paolo Bonzini <pbonzini>
    Signed-off-by: Greg Kroah-Hartman <gregkh>

Comment 31 Masao Takahashi 2014-06-09 22:33:57 UTC
(In reply to Josh Boyer from comment #30)
> Seems fixed by:
> 
> [jwboyer@vader linux]$ git log v3.14.4..v3.14.5 arch/x86/kvm/
> commit 8c02c2a4f89a4eda43b4679e8f0e170edeebc85f
> Author: Marcelo Tosatti <mtosatti>
> Date:   Thu Apr 10 18:19:12 2014 -0300
> 
>     KVM: x86: remove WARN_ON from get_kernel_ns()
>     
>     commit b351c39cc9e0151cee9b8d52a1e714928faabb38 upstream.
>     
>     Function and callers can be preempted.
>     
>     https://bugzilla.kernel.org/show_bug.cgi?id=73721
>     
>     Signed-off-by: Marcelo Tosatti <mtosatti>
>     Reviewed-by: Paolo Bonzini <pbonzini>
>     Signed-off-by: Greg Kroah-Hartman <gregkh>

OK, I have tested linux-3.14.6-200.fc20.i686.
This issue is resolved.

