Bug 457649

Summary: Kernel double faults when nxclient tries to connect
Product: [Fedora] Fedora
Reporter: Chris Underhill <redhat-bugzilla>
Component: kvm
Assignee: Glauber Costa <gcosta>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 9
CC: berrange, clalance, gcosta, kernel-maint
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-11-05 23:10:14 EST

Description Chris Underhill 2008-08-02 07:03:22 EDT
Description of problem:

I have an up-to-date virtual machine running Fedora 8, which I'm running through
the virtual machine manager. It is using qemu-kvm. The host operating system is
Fedora 9, again fully updated. In both cases, the architecture is x86_64. 

Whenever I try to connect to this virtual machine using the free NoMachine NX
client, the kernel in the virtual machine hits a double fault and writes an oops
message to the log file. I am accessing the virtual machine over a virtual
bridge (br0 interface), and I'm using KDE in the virtual machine through the NX
client configuration.

Version-Release number of selected component (if applicable):

Host - Fedora 9, fully updated:
qemu-0.9.1-6.fc9.x86_64
kmod-kqemu-2.6.25.11-97.fc9.x86_64-1.3.0-0.37.lvn9.x86_64
kqemu-1.3.0-0.7.pre11.lvn9.noarch
qemu-img-0.9.1-6.fc9.x86_64
kmod-kqemu-1.3.0-0.37.lvn9.x86_64
kvm-65-7.fc9.x86_64
kernel-2.6.25.11-97.fc9.x86_64
/proc/cpuinfo contains:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
stepping	: 6
cpu MHz		: 1600.000
cache size	: 4096 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc
arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr
lahf_lm
bogomips	: 4803.18
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
stepping	: 6
cpu MHz		: 1600.000
cache size	: 4096 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc
arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr
lahf_lm
bogomips	: 4799.91
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:


Virtual machine - Fedora 8, fully updated.
kernel-2.6.25.11-97.fc9.x86_64

/proc/cpuinfo contains:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 2
model name	: QEMU Virtual CPU version 0.9.1
stepping	: 3
cpu MHz		: 2400.067
cache size	: 2048 KB
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush mmx fxsr sse sse2 syscall nx lm rep_good pni
bogomips	: 4805.21
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:


How reproducible:

Every time.

Steps to Reproduce:
1. Install Fedora 8 in a qemu-kvm virtual machine
2. Try to connect using nxclient from a Fedora 9 box
3. Virtual machine oopses with double fault and nxclient fails to connect.
  
Actual results:

Kernel double faults:

double fault: 0000 [1] SMP 
CPU 0 
Modules linked in: rfcomm l2cap bluetooth autofs4 fuse sunrpc nf_conntrack_ipv4
ipt_REJECT iptable_filter ip_tables nf_conntrack_ipv6 xt_state nf_conntrack
xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter ip6_tables x_tables
sha256_generic aes_x86_64 aes_generic cbc dm_crypt crypto_blkcipher dm_mirror
dm_multipath dm_mod ipv6 floppy 8139cp pcspkr 8139too mii i2c_piix4 i2c_core
button sr_mod cdrom sg pata_acpi ata_piix ata_generic libata sd_mod scsi_mod
ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
Pid: 2712, comm: nxserver Not tainted 2.6.25.11-60.fc8 #1
RIP: 0010:[<0000000081023990>]  [<0000000081023990>]
RSP: 0018:0000000000000000  EFLAGS: 00010012
RAX: 000000000000002d RBX: 0000000000000000 RCX: 0000000000caaff4
RDX: 0000000000000000 RSI: 0000000000cacacc RDI: 0000000000021000
RBP: 00000000ffd55638 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff813f2000(0063) knlGS:00000000f7f366c0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000081023990 CR3: 0000000015875000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nxserver (pid: 2712, threadinfo ffff81000d102000, task ffff81000d0f0000)
Stack:  ffffffff814bce48 0000000081023965 ffffffff814bcf58 000000000000002b
 0000000000000040 0000000081023990 ffffffff814bceb8 ffffffff8100d5ba
 0000000000000000 0000000000000000 ffffffff8129d544 ffffffff814bcf58
Call Trace:
 <#DF>  [<ffffffff8100d5ba>] ? show_registers+0xbc/0x211
 [<ffffffff8128c7ee>] ? __die+0x94/0xfa
 [<ffffffff8100d82f>] ? die+0x42/0x66
 [<ffffffff8100d941>] ? do_double_fault+0x63/0x65
 [<ffffffff8100ce4d>] ? double_fault+0x7d/0x90
 <<EOE>> 

Code:  Bad RIP value.
RIP  [<0000000081023990>]
 RSP <0000000000000000>
---[ end trace c72d62afd986f164 ]---
BUG: sleeping function called from invalid context at kernel/rwsem.c:21
in_atomic():0, irqs_disabled():1
Pid: 2712, comm: nxserver Tainted: G      D  2.6.25.11-60.fc8 #1

Call Trace:
 <#DF>  [<ffffffff8104887f>] ? lock_hrtimer_base+0x25/0x4b
 [<ffffffff81029242>] __might_sleep+0xb5/0xb7
 [<ffffffff8128af1f>] down_read+0x1d/0x2e
 [<ffffffff8105c0fd>] acct_collect+0x4c/0x199
 [<ffffffff81035fe1>] do_exit+0x21b/0x65c
 [<ffffffff8128c51e>] oops_begin+0x0/0x96
 [<ffffffff8100d84a>] die+0x5d/0x66
 [<ffffffff8100d941>] do_double_fault+0x63/0x65
 [<ffffffff8100ce4d>] double_fault+0x7d/0x90
 <<EOE>> 


Expected results:

NX client connects and I can run applications etc.

Additional info:

Connecting from my host to CentOS 5.2, Fedora 7 or Fedora 9 virtual machines
works fine.
Comment 1 Chris Underhill 2008-08-02 07:05:07 EDT
Oops - cut-and-paste error. The kernel in the Fedora 8 VM is 

kernel-2.6.25.11-60.fc8
Comment 2 Chuck Ebbert 2008-08-11 20:21:49 EDT
Are there any messages in the host's log when that happens?
Comment 3 Chris Underhill 2008-08-12 13:59:45 EDT
(In reply to comment #2)

> Are there any messages in the host's log when that happens?

The only messages are in my .xsession-errors file, where I see:

Qt: Locales not supported on X server
Window manager warning: Invalid WM_TRANSIENT_FOR window 0x4ac specified for 0x5600006 ( NX - fc8v).

but I get these as the NX logon window appears, i.e., before connecting, regardless of which VM (or remote machine) I connect to, so I doubt they're relevant.

There's nothing reported in dmesg, /var/log/messages or /var/log/Xorg.0.log.
Comment 4 Chuck Ebbert 2008-09-22 20:20:32 EDT
Is the nxserver program 32-bit or 64-bit?
Comment 5 Chuck Ebbert 2008-09-22 20:21:42 EDT
Also, is this still happening with 2.6.26.3 kernels?
Comment 6 Chris Underhill 2008-09-28 18:21:41 EDT
NX software on the Fedora 8 virtual machine consists of:

nxnode-3.0.0-93.i386
nxserver-3.0.0-79.i386
nxclient-3.0.0-89.i386

I've updated the vm to use kernel-2.6.26.3-14.fc8.x86_64 and the double fault is still present:

Sep 28 23:10:51 fc8v1 kernel: double fault: 0000 [1] SMP 
Sep 28 23:10:51 fc8v1 kernel: CPU 0 
Sep 28 23:10:51 fc8v1 kernel: Modules linked in: rfcomm l2cap bluetooth autofs4 fuse sunrpc nf_conntrack_ipv4 ipt_REJECT iptable_filter ip_tables nf_conntrack_ipv6 xt_state nf_conntrack xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter ip6_tables x_tables sha256_generic aes_x86_64 aes_generic cbc dm_crypt crypto_blkcipher dm_mirror dm_log dm_multipath dm_mod ipv6 floppy 8139cp pcspkr 8139too mii i2c_piix4 i2c_core sr_mod cdrom sg pata_acpi ata_piix ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
Sep 28 23:10:51 fc8v1 kernel: Pid: 2479, comm: nxserver Not tainted 2.6.26.3-14.fc8 #1
Sep 28 23:10:51 fc8v1 kernel: RIP: 0010:[<0000000081026580>]  [<0000000081026580>]
Sep 28 23:10:51 fc8v1 kernel: RSP: 0018:0000000000000000  EFLAGS: 00010012
Sep 28 23:10:51 fc8v1 kernel: RAX: 000000000000002d RBX: 0000000000000000 RCX: 0000000000caaff4
Sep 28 23:10:51 fc8v1 kernel: RDX: 0000000000000000 RSI: 0000000000cacacc RDI: 0000000000021000
Sep 28 23:10:51 fc8v1 kernel: RBP: 00000000fff10b48 R08: 0000000000000000 R09: 0000000000000000
Sep 28 23:10:51 fc8v1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Sep 28 23:10:51 fc8v1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Sep 28 23:10:51 fc8v1 kernel: FS:  0000000000000000(0000) GS:ffffffff8141a000(0063) knlGS:00000000f7ef16c0
Sep 28 23:10:51 fc8v1 kernel: CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
Sep 28 23:10:51 fc8v1 kernel: CR2: 0000000081026580 CR3: 0000000004625000 CR4: 00000000000006e0
Sep 28 23:10:51 fc8v1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 28 23:10:51 fc8v1 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 28 23:10:51 fc8v1 kernel: Process nxserver (pid: 2479, threadinfo ffff8100060e4000, task ffff81000b538000)
Sep 28 23:10:51 fc8v1 kernel: Stack:  ffffffff8158fe48 0000000081026555 ffffffff8158ff58 000000000000002b
Sep 28 23:10:51 fc8v1 kernel:  0000000000000040 0000000081026580 ffffffff8158feb8 ffffffff8100daca
Sep 28 23:10:51 fc8v1 kernel:  0000000000000000 0000000000000000 ffffffff812aa564 ffffffff8158ff58
Sep 28 23:10:51 fc8v1 kernel: Call Trace:
Sep 28 23:10:51 fc8v1 kernel:  <#DF>  [<ffffffff8100daca>] ? show_registers+0xbc/0x211
Sep 28 23:10:51 fc8v1 kernel:  [<ffffffff81299548>] ? __die+0x94/0xfa
Sep 28 23:10:51 fc8v1 kernel:  [<ffffffff8100dd3f>] ? die+0x42/0x66
Sep 28 23:10:51 fc8v1 kernel:  [<ffffffff8100de51>] ? do_double_fault+0x63/0x65
Sep 28 23:10:51 fc8v1 kernel:  [<ffffffff8100d339>] ? double_fault+0x89/0xa0
Sep 28 23:10:51 fc8v1 kernel:  <<EOE>> 
Sep 28 23:10:51 fc8v1 kernel: 
Sep 28 23:10:51 fc8v1 kernel: Code:  Bad RIP value.
Sep 28 23:10:51 fc8v1 kernel: RIP  [<0000000081026580>]
Sep 28 23:10:51 fc8v1 kernel:  RSP <0000000000000000>
Sep 28 23:10:51 fc8v1 kernel: ---[ end trace e74e5e35529f801d ]---
Sep 28 23:10:51 fc8v1 kernel: BUG: sleeping function called from invalid context at kernel/rwsem.c:21
Sep 28 23:10:51 fc8v1 kernel: in_atomic():0, irqs_disabled():1
Sep 28 23:10:51 fc8v1 kernel: Pid: 2479, comm: nxserver Tainted: G      D   2.6.26.3-14.fc8 #1
Sep 28 23:10:51 fc8v1 kernel: 
Sep 28 23:10:51 fc8v1 kernel: Call Trace:
Sep 28 23:10:51 fc8v1 kernel:  <#DF>  [<ffffffff8102bfbb>] __might_sleep+0xd5/0xd9
Sep 28 23:10:51 fc8v1 kernel:  [<ffffffff81297c2b>] down_read+0x1d/0x2e
Sep 28 23:10:51 fc8v1 kernel:  [<ffffffff8105f30c>] acct_collect+0x4c/0x1a0
Sep 28 23:10:51 fc8v1 kernel:  [<ffffffff8103946a>] do_exit+0x213/0x83d
Sep 28 23:10:51 fc8v1 kernel:  [<ffffffff81299237>] oops_begin+0x0/0xa6
Sep 28 23:10:51 fc8v1 kernel:  [<ffffffff8100dd5a>] die+0x5d/0x66
Sep 28 23:10:51 fc8v1 kernel:  [<ffffffff8100de51>] do_double_fault+0x63/0x65
Sep 28 23:10:51 fc8v1 kernel:  [<ffffffff8100d339>] double_fault+0x89/0xa0
Sep 28 23:10:51 fc8v1 kernel:  <<EOE>>
Comment 7 Chuck Ebbert 2008-10-03 21:57:25 EDT
from System.map:
ffffffff81026580 T ia32_sysenter_target

The reported oops address in 2.6.26.3-14 is <0000000081026580>

The report on 2.6.25.11-60 is consistent with this one.

Somehow the top half of the address got zeroed??
Comment 8 Chuck Ebbert 2008-10-03 22:34:22 EDT
This is apparently a bug in kvm, fixed in kvm-74.

Alexander's patch is at:
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/20012/focus=20235

Or to be more specific:

diff --git a/qemu/target-i386/cpu.h b/qemu/target-i386/cpu.h
index 7e95900..61c39d4 100644
--- a/qemu/target-i386/cpu.h
+++ b/qemu/target-i386/cpu.h
@@ -542,8 +542,8 @@ typedef struct CPUX86State {

     /* sysenter registers */
     uint32_t sysenter_cs;
-    uint32_t sysenter_esp;
-    uint32_t sysenter_eip;
+    uint64_t sysenter_esp;
+    uint64_t sysenter_eip;
     uint64_t efer;
     uint64_t star;
Comment 9 Fedora Update System 2008-10-13 13:23:33 EDT
kvm-65-10.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kvm-65-10.fc9
Comment 10 Fedora Update System 2008-10-15 22:08:46 EDT
kvm-65-10.fc9 has been pushed to the Fedora 9 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with:
 su -c 'yum --enablerepo=updates-testing update kvm'
 You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-8846
Comment 11 Chris Underhill 2008-10-16 03:12:49 EDT
Updated to kvm-65-10.fc9 and I can successfully connect to my vm using nxclient with no kernel faults. 

As far as I'm concerned, the updated rpm fixes this bug - Thanks!
Comment 12 Fedora Update System 2008-11-05 23:10:10 EST
kvm-65-10.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.