Bug 1213478

Summary: QEMU/KVM for 32-bit ARM fails to boot on AMD Seattle AArch64 host
Product: [Fedora] Fedora
Reporter: D. Marlin <dmarlin>
Component: qemu
Assignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 21
CC: amit.shah, berrange, cfergeau, crobinso, drjones, dwmw2, gansalmon, itamar, jonathan, kernel-maint, lersek, madhu.chinakonda, mchehab, msalter, pbonzini, pbrobinson, rjones, scottt.tw, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: aarch64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2015-04-28 17:07:51 UTC
Type: Bug
Bug Blocks: 245418, 922257    
Attachments:
  inverse bisection log (flags: none)

Description D. Marlin 2015-04-20 16:24:35 UTC
Description of problem:

QEMU/KVM for 32-bit ARM fails to boot on an AMD Seattle AArch64 host.  This does seem to be specific to the AMD Seattle hardware [see Additional Info: 1].


Version-Release number of selected component (if applicable):

on both host and guest:
- Fedora 21
- kernel-3.19.3-200.fc21 (aarch64 on the host, armv7hl in the guest)

on the host:
- qemu-system-aarch64-2.3.0-0.2.rc1.fc22.aarch64
- libvirt-daemon-1.2.13-2.fc22.aarch64


How reproducible:

reliably, but not 100% of the time [see Additional Info: 1]


Steps to Reproduce:
1. Set up a Fedora 21 AArch64 host with the F22 version of QEMU and its dependencies.
2. Put an F21 disk image and the matching kernel and initrd in a working directory.
3. Start the guest:

  qemu-system-aarch64 \
    -cpu host,aarch64=off -M virt \
    -m 1024 \
    -nographic \
    -enable-kvm \
    -kernel ./vmlinuz-3.19.3-200.fc21.armv7hl \
    -initrd initramfs-3.19.3-200.fc21.armv7hl.img \
    -append 'root=/dev/vda3 rw rootwait console=ttyAMA0,38400n8' \
    -drive if=none,id=hd0,file=fedora-21.img,format=raw \
    -device virtio-blk-device,drive=hd0 \
    -netdev user,id=user0 -device virtio-net-device,netdev=user0


Actual results:

Guest hangs after displaying:

      :
[  OK  ] Reached target System Initialization.
         Starting Show Plymouth Boot Screen...
error: kvm run failed Function not implemented
R00=c0de26cc R01=0000002f R02=00000000 R03=f0198c00
R04=ee565418 R05=0000001e R06=ee565428 R07=00000000
R08=c0dd9a58 R09=0000001f R10=00000001 R11=ee5fe464
R12=ffffffff R13=00000000 R14=fee0fff4 R15=c08a026c
PSR=60010190 -ZC- A usr32

kvm [30401]: load/store instruction decoding not implemented (HSR: 0x92000046, IPA: 0x3efffff4)
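
For readers decoding the kernel message above, here is a minimal Python sketch. It assumes the standard ARM syndrome register layout for data aborts (EC in bits 31:26, IL in bit 25, ISV in bit 24, WnR in bit 6, fault status code in bits 5:0). The function name decode_hsr and the dictionary keys are illustrative only and are not part of any KVM or QEMU API.

```python
def decode_hsr(hsr):
    """Split a data-abort syndrome value into its main fields.

    Field positions are an assumption based on the architectural
    HSR / ESR_EL2 layout; names are illustrative.
    """
    return {
        "ec": (hsr >> 26) & 0x3F,   # exception class
        "il": (hsr >> 25) & 0x1,    # instruction length (1 = 32-bit)
        "isv": (hsr >> 24) & 0x1,   # instruction syndrome valid
        "wnr": (hsr >> 6) & 0x1,    # write (1) or read (0)
        "dfsc": hsr & 0x3F,         # data fault status code
    }


# Value taken from the kvm message in this report.
fields = decode_hsr(0x92000046)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

Under that layout, EC 0x24 is a data abort taken from the guest, WnR = 1 marks a write, and DFSC 0x06 is a level 2 translation fault. ISV = 0 means the hardware supplied no instruction syndrome, so KVM would have to decode the faulting instruction itself, which is consistent with the "load/store instruction decoding not implemented" message.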


Expected results:

Guest boots and runs without errors.


Additional info:

1) On rare occasions the guest boots to a login prompt with no errors.  This has been observed only twice during testing.

2) Running the same setup and command on an APM Mustang (AArch64) results in a functional F21 ARMv7 (32-bit ARM) guest.

3) This was also tested using the kernel-3.17.4-302.fc21.aarch64 on the host and guest, and the result was the same (except for the values in the register dump).

Comment 1 Andrew Jones 2015-04-20 17:23:26 UTC
(In reply to D. Marlin from comment #0)
> 
> 3) This was also tested using the kernel-3.17.4-302.fc21.aarch64 on the host
> and guest, and the result was the same (except for the values in the
> register dump).

This is interesting. I take it aarch64=off was also removed from the qemu command line. Can you paste the output of the 64-on-64 register dump here?

Comment 2 D. Marlin 2015-04-20 19:34:34 UTC
(In reply to Andrew Jones from comment #1)
> (In reply to D. Marlin from comment #0)
> > 
> > 3) This was also tested using the kernel-3.17.4-302.fc21.aarch64 on the host
> > and guest, and the result was the same (except for the values in the
> > register dump).
> 
> This is interesting. I take it aarch64=off was also removed from the qemu
> command line. Can you paste the output of the 64-on-64 register dump here?

Sorry, I meant to say using kernel-3.17 on the host (64-bit) and guest (32-bit).  The host was running:

  Linux seattle-02.farm.hsv.redhat.com 3.17.4-302.fc21.aarch64 #1 SMP Mon Dec 8 12:40:24 UTC 2014 aarch64 aarch64 aarch64 GNU/Linux

The command I used was:

qemu-system-aarch64 \
    -cpu host,aarch64=off -M virt \
    -m 1024 \
    -nographic \
    -enable-kvm \
    -initrd ./initramfs-3.17.4-301.fc21.armv7hl+lpae.img \
    -kernel ./vmlinuz-3.17.4-301.fc21.armv7hl+lpae \
    -append 'root=/dev/vda3 rw' \
    -drive if=none,id=hd0,file=fedora-21.img,format=raw \
    -device virtio-blk-device,drive=hd0 \
    -netdev user,id=user0 -device virtio-net-device,netdev=user0
	:

error: kvm run failed Function not implemented
R00=c0a2e34c R01=0000002f R02=00000000 R03=f0152c00
R04=c856b018 R05=0000001e R06=c856b020 R07=00000000
R08=c0a26dc0 R09=0000001f R10=00000001 R11=c85b2664
R12=ffffffff R13=00000000 R14=fee0fff4 R15=c062126c
PSR=60010190 -ZC- A usr32
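
The PSR line in these dumps can be unpacked with a short Python sketch. It assumes the architectural AArch32 CPSR layout (N, Z, C, V in bits 31:28, T in bit 5, mode in bits 4:0) and that the single letter after the flags is QEMU's ARM/Thumb indicator. decode_psr and the MODES table are illustrative names, and the table lists only the common modes.

```python
# Common AArch32 mode encodings (CPSR bits 4:0); not exhaustive.
MODES = {
    0x10: "usr", 0x11: "fiq", 0x12: "irq", 0x13: "svc",
    0x17: "abt", 0x1B: "und", 0x1F: "sys",
}


def decode_psr(psr):
    """Return (condition flags, instruction state, mode name)."""
    flags = "".join(
        letter if psr & (1 << bit) else "-"
        for letter, bit in (("N", 31), ("Z", 30), ("C", 29), ("V", 28))
    )
    state = "T" if psr & (1 << 5) else "A"   # Thumb or ARM
    mode = MODES.get(psr & 0x1F, "???")
    return flags, state, mode


# Value taken from the register dumps in this report.
decoded = decode_psr(0x60010190)
print(decoded)
```

For the value in this report the sketch yields the flags "-ZC-", the state "A" and the mode "usr", which agrees with the text QEMU printed next to the raw value.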


However, when trying to reproduce this I made an interesting mistake.  I copied the exact command I used more recently on APM Mustang to enable SMP:

qemu-system-aarch64 \
    -cpu host,aarch64=off -M virt \
    -smp cpus=8 \
    -m 8192 \
    -nographic \
    -enable-kvm \
    -initrd ./initramfs-3.17.4-301.fc21.armv7hl+lpae.img \
    -kernel ./vmlinuz-3.17.4-301.fc21.armv7hl+lpae \
    -append 'root=/dev/vda3 rw' \
    -drive if=none,id=hd0,file=fedora-21.img,format=raw \
    -device virtio-blk-device,drive=hd0 \
    -netdev user,id=user0 -device virtio-net-device,netdev=user0


The only difference is in the number of VCPUs requested and the amount of memory.  Mustang supports 8 CPUs, but Seattle does not.  This time I received the following:

Warning: Number of SMP cpus requested (8) exceeds the recommended cpus supported by KVM (6)
Warning: Number of hotpluggable cpus requested (8) exceeds the recommended cpus supported by KVM (6)
	:

Fedora release 21 (Twenty One)
Kernel 3.17.4-301.fc21.armv7hl+lpae on an armv7l (ttyAMA0)

localhost login:


It booted successfully.  I have been able to successfully boot with other values for 'cpus=' and '-m' as well, but it still 'sometimes' fails, and it seems to consistently fail with the first example used (above).

Comment 3 Laszlo Ersek 2015-04-21 12:11:52 UTC
For now,

host hardware  host OS    qemu                 result   tested by
-------------  ---------  -------------------  -------  ---------
Mustang        RHELSA     Rawhide              N/A      N/A
Mustang        RHELSA     upstream v2.3.0-rc3  success  lersek
Mustang        Rawhide    Rawhide              success  dmarlin
Mustang        Rawhide    upstream v2.3.0-rc3  ?        ?
Seattle        RHELSA     Rawhide              N/A      N/A
Seattle        RHELSA     upstream v2.3.0-rc3  success  lersek
Seattle        Rawhide    Rawhide              failure  dmarlin
Seattle        Rawhide    upstream v2.3.0-rc3  ?        ?

Comment 4 D. Marlin 2015-04-21 18:50:48 UTC
Adjusted for my actual testing:

host hardware  host OS    qemu                 result   tested by
-------------  ---------  -------------------  -------  ---------
Mustang        RHELSA     Rawhide              N/A      N/A
Mustang        RHELSA     upstream v2.3.0-rc3  success  lersek
Mustang        Fedora 21  2.3.0-0.2.rc1.fc22   success  dmarlin
Mustang        Rawhide    upstream v2.3.0-rc3  ?        ?
Seattle        RHELSA     Rawhide              N/A      N/A
Seattle        RHELSA     upstream v2.3.0-rc3  success  lersek
Seattle        Fedora 21  2.3.0-0.2.rc1.fc22   failure* dmarlin
Seattle        Rawhide    upstream v2.3.0-rc3  ?        ?

The host kernel I used on Mustang and Seattle is:

  kernel-3.19.3-200.fc21.aarch64

* - I have been able to start a 32-bit guest on Seattle using different options on the QEMU command line, at least *some* of the time.  The original command line still fails consistently.  One of the command lines that has worked is:

qemu-system-aarch64 \
    -cpu host,aarch64=off -M virt \
    -smp cpus=6 \
    -m 6144 \
    -nographic \
    -enable-kvm \
    -initrd ./initramfs-3.17.4-301.fc21.armv7hl+lpae.img \
    -kernel ./vmlinuz-3.17.4-301.fc21.armv7hl+lpae \
    -append 'root=/dev/vda3 rw' \
    -drive if=none,id=hd0,file=fedora-21.img,format=raw \
    -device virtio-blk-device,drive=hd0 \
    -netdev user,id=user0 -device virtio-net-device,netdev=user0
           :

Fedora release 21 (Twenty One)
Kernel 3.17.4-301.fc21.armv7hl+lpae on an armv7l (ttyAMA0)

Comment 5 Laszlo Ersek 2015-04-21 22:39:16 UTC
Created attachment 1017161 [details]
inverse bisection log

Okay, so this is indeed a qemu bug that has been fixed between v2.3.0-rc1 and v2.3.0-rc3. I bisected this range, with the "good" and "bad" meanings reversed for the git bisect command (and consequently, the bisection log, which I'm attaching). The fix is this commit:

commit 25b9fb107bc1f6735fdb3fce537792f5db95f78d
Author: Alex Bennée <alex.bennee>
Date:   Wed Apr 1 17:57:30 2015 +0100

    target-arm: kvm64 fix save/restore of SPSR regs
    
    The current code was negatively indexing the cpu state array and not
    synchronizing banked spsr register state with the current mode's spsr
    state, causing occasional failures with migration.
    
    Some munging is done to take care of the aarch64 mapping and also to
    ensure the most current value of the spsr is updated to the banked
    registers (relevant for KVM<->TCG migration).
    
    Signed-off-by: Alex Bennée <alex.bennee>
    Signed-off-by: Peter Maydell <peter.maydell>

 target-arm/kvm64.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

Note that Cole created builds of v2.3.0-rc3 in Koji for several Fedora releases (e.g. <http://arm.koji.fedoraproject.org/koji/buildinfo?buildID=273272>), so this can be verified independently just by installing one of those packages.

Indeed, I tested qemu-system-aarch64-2.3.0-0.5.rc3.fc22.aarch64, and it seems to be working fine.
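
The idea behind an inverse bisection can be sketched in Python. This is a simplified model, not what git does internally: it assumes a linear history, ordered oldest to newest, in which the bug is present up to some commit and absent from then on. The commit labels and the boots predicate are hypothetical stand-ins for building QEMU at a commit and booting the guest.

```python
def first_fixed(commits, boots):
    """Return the first commit for which boots(commit) is True.

    Assumes boots() is False for commits[0], True for commits[-1],
    and flips from False to True exactly once in between.
    """
    lo, hi = 0, len(commits) - 1   # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots(commits[mid]):
            hi = mid               # the fix is at mid or earlier
        else:
            lo = mid               # the fix is after mid
    return commits[hi]


# Hypothetical history in which the fix lands in "c5".
history = ["rc1", "c2", "c3", "c4", "c5", "c6", "rc3"]
fixed_from = history.index("c5")
result = first_fixed(history, lambda c: history.index(c) >= fixed_from)
print(result)
```

With git bisect, the same search is obtained by marking the older, broken end as "good" and the newer, fixed end as "bad", so that the commit reported as "first bad" is the one that introduced the fix.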

Comment 6 D. Marlin 2015-04-28 17:01:11 UTC
Confirmed, qemu-system-aarch64-2.3.0-0.5.rc3.fc22.aarch64 is working for me as well.  I think we can close this one as fixed in version 2.3.0-0.5.rc3.fc22.