Description of problem:

kernel-3.20.0-0.rc0.git7.3.bz1193875.fc23.aarch64 cannot run KVM guests (sometimes anyway). It looks like it's missing some emulation from kvm.ko:

[ 66.722766] kvm [1301]: load/store instruction decoding not implemented

It happens when the guest inserts a kernel module. It may be `crc32-arm64.ko' but I cannot be completely certain about that, since it may be inserting another module but not getting the debug message out.

Version-Release number of selected component (if applicable):

kernel-3.20.0-0.rc0.git7.3.bz1193875.fc23.aarch64
qemu-2.2.0-5.fc22.aarch64

How reproducible:

100%

Steps to Reproduce:

1. Run: libguestfs-test-tool
I should note that what specifically happens is the guest starts up, and then suddenly aborts (when inserting the guest kernel module).
In the test above, I was using guest kernel == host kernel == 3.20.0-0.rc0.git7.3.bz1193875.fc23.

I tried this again, with:

guest kernel == 3.20.0-0.rc0.git7.3.bz1193875.fc23
host kernel == 3.19.0-0.rc7.git1.1.fc22

This *also* crashes when loading the kernel module in the guest. So it seems as if the problem is that some new processor instruction is used in a kernel module (possibly crc32-arm64.ko), which KVM is unable to emulate.
A couple of other observations:

(1) Doesn't fail with host == guest == 3.19.0-0.rc7.git1.1.fc22.aarch64

(2) I'm very certain the troublesome kernel module is either `crc32-arm64.ko' or `crc32.ko', and I'm about 90% certain it is `crc32-arm64.ko'.
Created attachment 993700 [details] crc32-arm64.ko

It occurred to me that maybe people wouldn't be able to download the suspected modules, so I'll attach them here.
Created attachment 993701 [details] crc32.ko
Here is a diff of the instructions used in crc32-arm64 3.19 vs 3.20.

 ldrh
 mov
 mvn
+nop
 orr
 ret

Seems strange. I checked the 3.20 module and there is a nop instruction inserted after every ret or unconditional branch. I have no idea if nop would be a problem on aarch64. Maybe this is a wild goose chase.
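For anyone wanting to repeat the comparison, here is a minimal sketch of how such a mnemonic diff could be produced. It assumes the input is the text output of `objdump -d` for each module, in the usual "address: encoding mnemonic operands" layout; the function names are hypothetical and this is not necessarily how the diff above was generated.

```python
import re

# An objdump -d instruction line: "<addr>:  <8 hex digit encoding>  <mnemonic> ..."
INSN_RE = re.compile(r"^\s*[0-9a-f]+:\s+[0-9a-f]{8}\s+([a-z][a-z0-9.]*)")


def mnemonics(disassembly):
    """Return the set of instruction mnemonics found in objdump -d text."""
    found = set()
    for line in disassembly.splitlines():
        match = INSN_RE.match(line)
        if match:
            found.add(match.group(1))
    return found


def mnemonic_diff(old, new):
    """Return (added, removed) mnemonics going from old to new, both sorted."""
    old_set, new_set = mnemonics(old), mnemonics(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)
```

Comparing sets rather than full listings deliberately ignores addresses and operands, so only a genuinely new instruction type (such as nop here) shows up.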
I noticed that qemu dumps the registers on stderr before exiting:

error: kvm run failed Function not implemented
PC=fffffe000046cf4c SP=fffffe0028383ba0
X00=fffffdfffaa20020 X01=fffffe0028383c00 X02=fffffffffffffffc X03=00000000d503201f
X04=fffffdfffaa20024 X05=ffffffffffffffff X06=0000000000000bb0 X07=fffffe0001a3c3b8
X08=fffffe0028380000 X09=fffffe0000f91000 X10=fffffe0001cfc000 X11=fffffe000123b000
X12=0000000000000000 X13=fffffe0001a3b808 X14=ffff000000000000 X15=ffffffffffffffff
X16=fffffe0000165898 X17=0000000000000001 X18=0000000000000d71 X19=0000040000000000
X20=fffffdfffc000020 X21=0000000000000140 X22=fffffe0029471180 X23=fffffe0000f218e8
X24=0000000000000000 X25=0000000000000000 X26=fffffe0001d6d000 X27=fffffdfffc0007a8
X28=fffffe0029660000 X29=fffffe0028383ba0 X30=fffffe00001e1bb0
PSTATE=600001c5 (flags -ZC-)

Not very helpful without knowing the address space layout of the guest kernel.
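To make a dump like this easier to work with, a small parser can turn it into a dictionary of integers. This is only a sketch: it assumes the `NAME=hex` layout shown above, and `parse_qemu_regs` is a hypothetical helper name, not part of qemu.

```python
import re

# Matches PC, SP, PSTATE and X00..X30 followed by "=" and a hex value.
REG_RE = re.compile(r"\b(PC|SP|PSTATE|X\d{2})=([0-9a-fA-F]+)")


def parse_qemu_regs(dump):
    """Map register names to integer values from a qemu register dump."""
    return {name: int(value, 16) for name, value in REG_RE.findall(dump)}
```

With the registers in a dictionary, the PC value can then be looked up against the guest kernel's symbols.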
I resolved PC against the symbol table, and it happens in the guest kernel function '__copy_to_user', at the place marked with <<< below:

fffffe00003e3040 <__copy_to_user>:
fffffe00003e3040: 8b020004 add x4, x0, x2
fffffe00003e3044: f1002042 subs x2, x2, #0x8
fffffe00003e3048: 540000a4 b.mi fffffe00003e305c <__copy_to_user+0x1c>
fffffe00003e304c: f8408423 ldr x3, [x1],#8
fffffe00003e3050: f1002042 subs x2, x2, #0x8
fffffe00003e3054: f8008403 str x3, [x0],#8
fffffe00003e3058: 54ffffa5 b.pl fffffe00003e304c <__copy_to_user+0xc>
fffffe00003e305c: b1001042 adds x2, x2, #0x4
fffffe00003e3060: 54000084 b.mi fffffe00003e3070 <__copy_to_user+0x30>
fffffe00003e3064: b8404423 ldr w3, [x1],#4
fffffe00003e3068: d1001042 sub x2, x2, #0x4
fffffe00003e306c: b8004403 str w3, [x0],#4 <<<<<<<
fffffe00003e3070: b1000842 adds x2, x2, #0x2
fffffe00003e3074: 54000084 b.mi fffffe00003e3084 <__copy_to_user+0x44>
fffffe00003e3078: 78402423 ldrh w3, [x1],#2
fffffe00003e307c: d1000842 sub x2, x2, #0x2
fffffe00003e3080: 78002403 strh w3, [x0],#2
fffffe00003e3084: b1000442 adds x2, x2, #0x1
fffffe00003e3088: 54000064 b.mi fffffe00003e3094 <__copy_to_user+0x54>
fffffe00003e308c: 39400023 ldrb w3, [x1]
fffffe00003e3090: 39000003 strb w3, [x0]
fffffe00003e3094: d2800000 mov x0, #0x0 // #0
fffffe00003e3098: d65f03c0 ret

Unfortunately qemu doesn't dump a stack trace before it exits. I will try to attach gdb to see if that gives any extra information.
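Resolving a PC against a symbol table amounts to finding the last symbol whose address is not greater than the PC. A minimal sketch, assuming a System.map-style listing with one "address type name" entry per line; the helper names are hypothetical and any neighbouring symbols you feed it for testing are up to you.

```python
import bisect


def parse_symbols(lines):
    """Parse 'address type name' lines into a list of (address, name), sorted."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            address = int(parts[0], 16)
        except ValueError:
            continue  # skip lines that do not start with a hex address
        symbols.append((address, parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, pc):
    """Return (name, offset) of the symbol containing pc, or None if below all."""
    addresses = [address for address, _ in symbols]
    index = bisect.bisect_right(addresses, pc) - 1
    if index < 0:
        return None
    address, name = symbols[index]
    return name, pc - address
```

Note that this only finds the nearest preceding symbol. Without symbol sizes it cannot tell whether the PC lies past the end of that function, so the result is worth checking against a disassembly as done above.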
gdb gives this stack trace, which looks bogus to me:

Program received signal SIGABRT, Aborted.
__copy_to_user () at arch/arm64/lib/copy_to_user.S:43
43 USER(9f, str w3, [x0], #4 )
(gdb) bt
#0 __copy_to_user () at arch/arm64/lib/copy_to_user.S:43
#1 0xfffffe00001a6558 in __probe_kernel_write (dst=<optimized out>, src=<optimized out>, size=<optimized out>) at mm/maccess.c:56
#2 0x0000000000000000 in ?? ()

More gdb information:

(gdb) info registers
x0 0xfffffdfffaa20020 -2199113301984
x1 0xfffffe0028343c20 -2198348743648
x2 0xfffffffffffffffc -4
x3 0xd503201f 3573751839
x4 0xfffffdfffaa20024 -2199113301980
x5 0xffffffffffffffff -1
x6 0xfffffe0000a1b588 -2199012657784
x7 0xfffffe0000a1b570 -2199012657808
x8 0xfffffe0000a1b558 -2199012657832
x9 0xfffffdfee01a4480 -2203853372288
x10 0x101010101010101 72340172838076673
x11 0x6 6
x12 0x0 0
x13 0xffffffffffffffff -1
x14 0xffff000000000000 -281474976710656
x15 0xffffffffffffffff -1
x16 0xfffffe000013a5e0 -2199021967904
x17 0x1 1
x18 0x0 0
x19 0x40000000000 4398046511104
x20 0xfffffdfffc000020 -2199090364384
x21 0x140 320
x22 0x0 0
x23 0xfffffe0000dc17d8 -2199008831528
x24 0xfffffe000009c5b0 -2199022615120
x25 0xfffffe0000f65000 -2199007113216
x26 0x0 0
x27 0x0 0
x28 0xfffffe0029120000 -2198334210048
x29 0xfffffe0028343bc0 -2198348743744
x30 0xfffffe00001a6558 -2199021525672
sp 0xfffffe0028343bc0 0xfffffe0028343bc0
pc 0xfffffe00003e306c 0xfffffe00003e306c <__copy_to_user+44>
cpsr 0x600001c5 1610613189
fpsr 0x0 0
fpcr 0x0 0
(gdb) frame 1
#1 0xfffffe00001a6558 in __probe_kernel_write (dst=<optimized out>, src=<optimized out>, size=<optimized out>) at mm/maccess.c:56
56 ret = __copy_to_user_inatomic((__force void __user *)dst, src, size);
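One detail in the registers: x3 holds 0xd503201f, which is the A64 encoding of nop, so the faulting `str w3, [x0],#4` appears to be writing a nop instruction somewhere. The sketch below just makes that check explicit. The helper names are hypothetical, and the check alone does not explain why KVM could not handle the store.

```python
A64_NOP = 0xD503201F  # A64 encoding of NOP (HINT #0)


def stored_word(x_register_value):
    """Value written by a 32-bit `str wN`: the low 32 bits of the X register."""
    return x_register_value & 0xFFFFFFFF


def is_a64_nop(word):
    """True if the 32-bit word is the A64 NOP instruction."""
    return word == A64_NOP
```

This lines up with the nop instructions noticed in the 3.20 module earlier: something in the guest kernel looks to be patching code through __probe_kernel_write.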
Thread on kvmarm: https://lists.cs.columbia.edu/pipermail/kvmarm/2015-February/thread.html#13632
ftrace is implicated: https://lists.cs.columbia.edu/pipermail/kvmarm/2015-February/013652.html
Marc Zyngier posted a patch here which works for me: http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/325445.html I intend to add this to the kernel package in Rawhide unless someone gets there first.
Similar new bug in 4.2.0: https://bugzilla.redhat.com/show_bug.cgi?id=1269779