Bug 1784961

Summary:	libguestfs failing on power9 images
Product:	[Fedora] Fedora	Reporter:	Kevin Fenzi <kevin>
Component:	qemu	Assignee:	Fedora Virtualization Maintainers <virt-maint>
Status:	CLOSED EOL	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	32	CC:	amit, awilliam, berrange, cfergeau, crobinso, dan, dgibson, dwmw2, gustavold, hannsj_uhl, itamar, jcajka, lvivier, normand, pbonzini, rjones, virt-maint
Target Milestone:	---
Target Release:	---
Hardware:	ppc64le
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-05-25 15:14:31 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1071880

Description Kevin Fenzi 2019-12-18 20:03:32 UTC

Since we moved our builders to Fedora 31, and even after bug #1769600 was fixed, we are still seeing libguestfs fail in composes. It's used in the Cloud and Container images to make modifications after the image is created. 

F31 cloud: https://koji.fedoraproject.org/koji/taskinfo?taskID=39714385

...

Exception encountered in _build_image_from_template thread
guestfs_launch failed.
This usually means the libguestfs appliance failed to start or crashed.
Do:
  export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1
and run the command again.  For further information, read:
  http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs
You can also run 'libguestfs-test-tool' and post the *complete* output
into a bug report or message to the libguestfs mailing list.
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/imgfac/Builder.py", line 132, in _build_image_from_template
    self.os_plugin.create_base_image(self, template, parameters)
  File "/usr/lib/python3.7/site-packages/imagefactory_plugins/TinMan/TinMan.py", line 354, in create_base_image
    gfs = launch_inspect_and_mount(self.image, readonly=True)
  File "/usr/lib/python3.7/site-packages/imgfac/FactoryUtils.py", line 25, in launch_inspect_and_mount
    g.launch()
  File "/usr/lib64/python3.7/site-packages/guestfs.py", line 5872, in launch
    r = libguestfsmod.launch(self._o)
RuntimeError: guestfs_launch failed.
This usually means the libguestfs appliance failed to start or crashed.
Do:
  export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1
and run the command again.  For further information, read:
  http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs
You can also run 'libguestfs-test-tool' and post the *complete* output
into a bug report or message to the libguestfs mailing list.
ABORT called in TinMan plugin
Domain not found: no domain with matching name 'factory-build-39b625f4-da5d-459e-8693-463de8a82dc3'
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/imagefactory_plugins/TinMan/TinMan.py", line 243, in abort
    guest_dom = self.guest.libvirt_conn.lookupByName(self.tdlobj.name)
  File "/usr/lib64/python3.7/site-packages/libvirt.py", line 4364, in lookupByName
    if ret is None:raise libvirtError('virDomainLookupByName() failed', conn=self)
libvirt.libvirtError: Domain not found: no domain with matching name 'factory-build-39b625f4-da5d-459e-8693-463de8a82dc3'
No Oz VM found with name (factory-build-39b625f4-da5d-459e-8693-463de8a82dc3) - nothing to do
This likely means the local VM has already been destroyed or never started
Resetting dropped connection: koji.fedoraproject.org
https://koji.fedoraproject.org:443 "POST /kojihub?session-id=92178375&session-key=3682-PpaGAeidVV93gCOchWf&callnum=9 HTTP/1.1" 200 114
...

F30 container: https://koji.fedoraproject.org/koji/taskinfo?taskID=39713832

Note that these are on a Fedora 31 power9 machine with Fedora 31 guests (the task runs on the buildvm). 

libguestfs-test-tool gives:

     ************************************************************
     *                    IMPORTANT NOTICE
     *
     * When reporting bugs, include the COMPLETE, UNEDITED
     * output below in your bug report.
     *
     ************************************************************
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
XDG_RUNTIME_DIR=/run/user/0
SELinux: Permissive
guestfs_get_append: (null)
guestfs_get_autosync: 1
guestfs_get_backend: libvirt
guestfs_get_backend_settings: []
guestfs_get_cachedir: /var/tmp
guestfs_get_hv: /usr/bin/qemu-system-ppc64
guestfs_get_memsize: 1024
guestfs_get_network: 0
guestfs_get_path: /usr/lib64/guestfs
guestfs_get_pgroup: 0
guestfs_get_program: libguestfs-test-tool
guestfs_get_recovery_proc: 1
guestfs_get_smp: 1
guestfs_get_sockdir: /tmp
guestfs_get_tmpdir: /tmp
guestfs_get_trace: 0
guestfs_get_verbose: 1
host_cpu: powerpc64le
Launching appliance, timeout set to 600 seconds.
libguestfs: launch: program=libguestfs-test-tool
libguestfs: launch: version=1.40.2fedora=31,release=8.fc31,libvirt
libguestfs: launch: backend registered: unix
libguestfs: launch: backend registered: uml
libguestfs: launch: backend registered: libvirt
libguestfs: launch: backend registered: direct
libguestfs: launch: backend=libvirt
libguestfs: launch: tmpdir=/tmp/libguestfsI8FPWT
libguestfs: launch: umask=0022
libguestfs: launch: euid=0
libguestfs: libvirt version = 5006000 (5.6.0)
libguestfs: guest random name = guestfs-z2tu3i19vx35na9x
libguestfs: connect to libvirt
libguestfs: opening libvirt handle: URI = qemu:///system, auth = default+wrapper, flags = 0
libguestfs: successfully opened libvirt handle: conn = 0x11b29cae0
libguestfs: qemu version (reported by libvirt) = 4001001 (4.1.1)
libguestfs: get libvirt capabilities
libguestfs: parsing capabilities XML
libguestfs: build appliance
libguestfs: begin building supermin appliance
libguestfs: run supermin
libguestfs: command: run: /usr/bin/supermin
libguestfs: command: run: \ --build
libguestfs: command: run: \ --verbose
libguestfs: command: run: \ --if-newer
libguestfs: command: run: \ --lock /var/tmp/.guestfs-0/lock
libguestfs: command: run: \ --copy-kernel
libguestfs: command: run: \ -f ext2
libguestfs: command: run: \ --host-cpu powerpc64le
libguestfs: command: run: \ /usr/lib64/guestfs/supermin.d
libguestfs: command: run: \ -o /var/tmp/.guestfs-0/appliance.d
supermin: version: 5.1.20
supermin: rpm: detected RPM version 4.15
supermin: package handler: fedora/rpm
supermin: acquiring lock on /var/tmp/.guestfs-0/lock
supermin: if-newer: output does not need rebuilding
libguestfs: finished building supermin appliance
libguestfs: command: run: qemu-img
libguestfs: command: run: \ create
libguestfs: command: run: \ -f qcow2
libguestfs: command: run: \ -o backing_file=/var/tmp/.guestfs-0/appliance.d/root,backing_fmt=raw
libguestfs: command: run: \ /tmp/libguestfsI8FPWT/overlay2.qcow2
Formatting '/tmp/libguestfsI8FPWT/overlay2.qcow2', fmt=qcow2 size=4294967296 backing_file=/var/tmp/.guestfs-0/appliance.d/root backing_fmt=raw cluster_size=65536 lazy_refcounts=off refcount_bits=16
libguestfs: create libvirt XML
libguestfs: libvirt XML:\n<?xml version="1.0"?>\n<domain type="kvm" xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0">\n  <name>guestfs-z2tu3i19vx35na9x</name>\n  <memory unit="MiB">1024</memory>\n  <currentMemory unit="MiB">1024</currentMemory>\n  <vcpu>1</vcpu>\n  <clock offset="utc">\n    <timer name="rtc" tickpolicy="catchup"/>\n    <timer name="pit" tickpolicy="delay"/>\n  </clock>\n  <os>\n    <type machine="pseries">hvm</type>\n    <kernel>/var/tmp/.guestfs-0/appliance.d/kernel</kernel>\n    <initrd>/var/tmp/.guestfs-0/appliance.d/initrd</initrd>\n    <cmdline>panic=1 console=hvc0 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=screen</cmdline>\n  </os>\n  <on_reboot>destroy</on_reboot>\n  <devices>\n    <rng model="virtio">\n      <backend model="random">/dev/urandom</backend>\n    </rng>\n    <controller type="scsi" index="0" model="virtio-scsi"/>\n    <disk device="disk" type="file">\n      <source file="/tmp/libguestfsI8FPWT/scratch1.img"/>\n      <target dev="sda" bus="scsi"/>\n      <driver name="qemu" type="raw" cache="unsafe"/>\n      <address type="drive" controller="0" bus="0" target="0" unit="0"/>\n    </disk>\n    <disk type="file" device="disk">\n      <source file="/tmp/libguestfsI8FPWT/overlay2.qcow2"/>\n      <target dev="sdb" bus="scsi"/>\n      <driver name="qemu" type="qcow2" cache="unsafe"/>\n      <address type="drive" controller="0" bus="0" target="1" unit="0"/>\n    </disk>\n    <serial type="unix">\n      <source mode="connect" path="/tmp/libguestfsSJfQVT/console.sock"/>\n      <target port="0"/>\n    </serial>\n    <channel type="unix">\n      <source mode="connect" path="/tmp/libguestfsSJfQVT/guestfsd.sock"/>\n      <target type="virtio" name="org.libguestfs.channel.0"/>\n    </channel>\n    <controller type="usb" model="none"/>\n    <memballoon model="none"/>\n  </devices>\n  <qemu:commandline>\n    <qemu:env name="TMPDIR" value="/var/tmp"/>\n  </qemu:commandline>\n</domain>\n
libguestfs: command: run: ls
libguestfs: command: run: \ -a
libguestfs: command: run: \ -l
libguestfs: command: run: \ -R
libguestfs: command: run: \ -Z /var/tmp/.guestfs-0
libguestfs: /var/tmp/.guestfs-0:
libguestfs: total 184
libguestfs: drwxr-xr-x. 3 root root unconfined_u:object_r:user_tmp_t:s0   4096 Dec 18 19:56 .
libguestfs: drwxrwxrwt. 9 root root system_u:object_r:tmp_t:s0            4096 Dec 18 19:56 ..
libguestfs: drwxr-xr-x. 2 root root unconfined_u:object_r:user_tmp_t:s0   4096 Dec 17 23:15 appliance.d
libguestfs: -rw-r--r--. 1 root root unconfined_u:object_r:user_tmp_t:s0      0 Dec  6 23:04 lock
libguestfs: -rw-r--r--. 1 root root unconfined_u:object_r:user_tmp_t:s0  11104 Dec  7 21:02 qemu-16765032-1573858419.devices
libguestfs: -rw-r--r--. 1 root root unconfined_u:object_r:user_tmp_t:s0  26890 Dec  7 21:02 qemu-16765032-1573858419.help
libguestfs: -rw-r--r--. 1 root root unconfined_u:object_r:user_tmp_t:s0 124801 Dec  7 21:02 qemu-16765032-1573858419.qmp-schema
libguestfs: -rw-r--r--. 1 root root unconfined_u:object_r:user_tmp_t:s0     48 Dec  7 21:02 qemu-16765032-1573858419.query-kvm
libguestfs: -rw-r--r--. 1 root root unconfined_u:object_r:user_tmp_t:s0     49 Dec  7 21:02 qemu-16765032-1573858419.stat
libguestfs: 
libguestfs: /var/tmp/.guestfs-0/appliance.d:
libguestfs: total 446932
libguestfs: drwxr-xr-x. 2 root root unconfined_u:object_r:user_tmp_t:s0       4096 Dec 17 23:15 .
libguestfs: drwxr-xr-x. 3 root root unconfined_u:object_r:user_tmp_t:s0       4096 Dec 18 19:56 ..
libguestfs: -rw-r--r--. 1 qemu qemu unconfined_u:object_r:user_tmp_t:s0    2116096 Dec 18 19:56 initrd
libguestfs: -rwxr-xr-x. 1 qemu qemu unconfined_u:object_r:user_tmp_t:s0   25101936 Dec 18 19:56 kernel
libguestfs: -rw-r--r--. 1 qemu qemu system_u:object_r:virt_content_t:s0 4294967296 Dec 18 19:56 root
libguestfs: command: run: ls
libguestfs: command: run: \ -a
libguestfs: command: run: \ -l
libguestfs: command: run: \ -Z /tmp/libguestfsSJfQVT
libguestfs: total 8
libguestfs: drwxr-xr-x. 2 root root unconfined_u:object_r:user_tmp_t:s0 4096 Dec 18 19:56 .
libguestfs: drwxrwxrwt. 8 root root system_u:object_r:tmp_t:s0          4096 Dec 18 19:56 ..
libguestfs: srw-rw----. 1 root qemu unconfined_u:object_r:user_tmp_t:s0    0 Dec 18 19:56 console.sock
libguestfs: srw-rw----. 1 root qemu unconfined_u:object_r:user_tmp_t:s0    0 Dec 18 19:56 guestfsd.sock
libguestfs: launch libvirt guest


SLOF **********************************************************************
QEMU Starting
 Build Date = Jul 24 2019 00:00:00
 FW Version = mockbuild@ release 20190114
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@30000000
Populating /vdevice/nvram@71000000
Populating /pci@800000020000000
                     00 0800 (D) : 1af4 1004    virtio [ scsi ]
Populating /pci@800000020000000/scsi@1
       SCSI: Looking for devices
          100000000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          101000000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
                     00 1000 (D) : 1af4 1003    virtio [ serial ]
                     00 1800 (D) : 1af4 1005    unknown-legacy-device*
No NVRAM common partition, re-initializing...
Scanning USB 
Using default console: /vdevice/vty@30000000
Detected RAM kernel at 400000 (1a5b040 bytes) 
     
  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php

Booting from memory...
OF stdout device is: /vdevice/vty@30000000
Preparing to boot Linux version 5.3.16-300.fc31.ppc64le (mockbuild.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Fri Dec 13 17:59:56 UTC 2019
Detected machine type: 0000000000000101
command line: panic=1 console=hvc0 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=screen
Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
Calling ibm,client-architecture-support...libguestfs: error: appliance closed the connection unexpectedly, see earlier error messages
libguestfs: child_cleanup: 0x11b29a290: child process died
libguestfs: error: guestfs_launch failed, see earlier error messages
libguestfs: closing guestfs handle 0x11b29a290 (state 0)
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsI8FPWT
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsSJfQVT
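
The appliance XML in the log above contains <on_reboot>destroy</on_reboot>, and libvirt turns that policy into qemu's -no-reboot flag (it appears in the qemu command line quoted in comment 2). With that flag, a guest-initiated reboot such as the CAS reboot makes qemu exit, which libguestfs reports as "appliance closed the connection unexpectedly". The short sketch below uses only the Python standard library and an abridged copy of that XML (devices omitted); the helper names are hypothetical, and it simply shows how to check a domain definition for this reboot policy.

```python
import xml.etree.ElementTree as ET

# Abridged from the libvirt XML in the log above (devices and the qemu
# namespace omitted); only the parts relevant to rebooting are kept.
APPLIANCE_XML = """<domain type="kvm">
  <name>guestfs-z2tu3i19vx35na9x</name>
  <os>
    <type machine="pseries">hvm</type>
  </os>
  <on_reboot>destroy</on_reboot>
</domain>
"""


def reboot_policy(domain_xml: str) -> str:
    """Return the <on_reboot> action, or libvirt's default 'restart' if absent."""
    root = ET.fromstring(domain_xml)
    elem = root.find("on_reboot")
    if elem is not None and elem.text:
        return elem.text.strip()
    return "restart"


def qemu_exits_on_guest_reboot(domain_xml: str) -> bool:
    """'destroy' is the policy that shows up as -no-reboot on the qemu command line."""
    return reboot_policy(domain_xml) == "destroy"


if __name__ == "__main__":
    print(reboot_policy(APPLIANCE_XML))               # destroy
    print(qemu_exits_on_guest_reboot(APPLIANCE_XML))  # True
```

The same check applied to a domain without an <on_reboot> element returns "restart", so an ordinary VM would simply reboot and continue, which matches the manual qemu run below.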

If I manually start a VM via qemu, it works, but it restarts in the middle of booting:

[root@buildvm-ppc64le-09 tmp][PROD]# qemu-system-ppc64 -m 4096 -boot d -enable-kvm -smp 4 -net nic -net user -hda test.img -cdrom Fedora-Everything-netinst-ppc64le-Rawhide-20191217.n.0.iso -nographic


SLOF **********************************************************************
QEMU Starting
 Build Date = Jul 24 2019 00:00:00
 FW Version = mockbuild@ release 20190114
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /vdevice/l-lan@71000002
Populating /vdevice/v-scsi@71000003
       SCSI: Looking for devices
          8000000000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          8200000000000000 CD-ROM   : "QEMU     QEMU CD-ROM      2.5+"
Populating /pci@800000020000000
                     00 0000 (D) : 1234 1111    qemu vga
                     00 0800 (D) : 1033 0194    serial bus [ usb-xhci ]
No NVRAM common partition, re-initializing...
Installing QEMU fb



Scanning USB 
  XHCI: Initializing
    USB Keyboard 
    USB mouse 
No console specified using screen & keyboard
     


  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php


Trying to load:  from: /vdevice/v-scsi@71000003/disk@8200000000000000 ...   Successfully loaded
qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off


SLOF **********************************************************************
QEMU Starting
 Build Date = Jul 24 2019 00:00:00
 FW Version = mockbuild@ release 20190114
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /vdevice/l-lan@71000002
Populating /vdevice/v-scsi@71000003
       SCSI: Looking for devices
          8000000000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
          8200000000000000 CD-ROM   : "QEMU     QEMU CD-ROM      2.5+"
Populating /pci@800000020000000
                     00 0000 (D) : 1234 1111    qemu vga
                     00 0800 (D) : 1033 0194    serial bus [ usb-xhci ]
Installing QEMU fb



Scanning USB 
  XHCI: Initializing
    USB Keyboard 
    USB mouse 
No console specified using screen & keyboard
     


  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php


Trying to load:  from: /vdevice/v-scsi@71000003/disk@8200000000000000 ...   Successfully loaded
Linux ppc64le
#1 SMP Mon Dec 1

(it then puts the anaconda prompt inside the already rendered screen)

Perhaps the restart is confusing libguestfs?

Comment 1 Richard W.M. Jones 2019-12-18 21:58:49 UTC

There may be more information from the libguestfs-test-tool run if you
look in /var/log/libvirt/qemu/guestfs-z2tu3i19vx35na9x.log.  Also if
qemu segfaulted then abrt/coredumpctl may have captured a core dump.

However the basic problem is that qemu is crashing, so this is most
likely to be a qemu (or possibly kernel/firmware) problem.

Comment 2 Kevin Fenzi 2019-12-18 22:52:13 UTC

The /var/log/libvirt/qemu/guestfs*.log: 

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-1-guestfs-z2tu3i19vx35 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-guestfs-z2tu3i19vx35/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-guestfs-z2tu3i19vx35/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-guestfs-z2tu3i19vx35/.config \
QEMU_AUDIO_DRV=none \
TMPDIR=/var/tmp \
/usr/bin/qemu-system-ppc64 \
-name guest=guestfs-z2tu3i19vx35na9x,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-guestfs-z2tu3i19vx35/master-key.aes \
-machine pseries-4.1,accel=kvm,usb=off,dump-guest-core=off \
-m 1024 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 0024f525-c02c-42c5-9326-0bfa6acf4a9a \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=32,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-no-reboot \
-boot strict=on \
-kernel /var/tmp/.guestfs-0/appliance.d/kernel \
-initrd /var/tmp/.guestfs-0/appliance.d/initrd \
-append 'panic=1 console=hvc0 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=screen' \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x1 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x2 \
-drive file=/tmp/libguestfsI8FPWT/scratch1.img,format=raw,if=none,id=drive-scsi0-0-0-0,cache=unsafe \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on \
-drive file=/tmp/libguestfsI8FPWT/overlay2.qcow2,format=qcow2,if=none,id=drive-scsi0-0-1-0,cache=unsafe \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=1,lun=0,device_id=drive-scsi0-0-1-0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0,write-cache=on \
-chardev socket,id=charserial0,path=/tmp/libguestfsSJfQVT/console.sock \
-device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 \
-chardev socket,id=charchannel0,path=/tmp/libguestfsSJfQVT/guestfsd.sock \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.libguestfs.channel.0 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x3 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2019-12-18 19:56:10.134+0000: Domain id=1 is tainted: custom-argv
2019-12-18 19:56:14.213+0000: shutting down, reason=shutdown

There's no crash or core...
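
The last line of that log, "shutting down, reason=shutdown", is consistent with this: qemu exited on its own rather than crashing. A small sketch for pulling that reason out of a libvirt qemu log (normally /var/log/libvirt/qemu/<guest-name>.log, as mentioned in comment 1) follows. It assumes libvirt would write reason=crashed for a genuine qemu crash; the function names are illustrative only.

```python
import re
from typing import Optional

# libvirt appends lines such as
#   2019-12-18 19:56:14.213+0000: shutting down, reason=shutdown
# to the per-guest qemu log when the qemu process goes away.
SHUTDOWN_RE = re.compile(r"shutting down, reason=(\S+)")


def libvirt_qemu_log_path(guest_name: str) -> str:
    """Default location of the per-guest qemu log written by libvirt."""
    return f"/var/log/libvirt/qemu/{guest_name}.log"


def last_shutdown_reason(log_text: str) -> Optional[str]:
    """Return the reason from the last 'shutting down' line, or None."""
    reasons = SHUTDOWN_RE.findall(log_text)
    return reasons[-1] if reasons else None


def looks_like_qemu_crash(log_text: str) -> bool:
    """Assumes a real qemu crash is logged as reason=crashed."""
    return last_shutdown_reason(log_text) == "crashed"


SAMPLE = (
    "2019-12-18 19:56:10.134+0000: Domain id=1 is tainted: custom-argv\n"
    "2019-12-18 19:56:14.213+0000: shutting down, reason=shutdown\n"
)

if __name__ == "__main__":
    print(libvirt_qemu_log_path("guestfs-z2tu3i19vx35na9x"))
    print(last_shutdown_reason(SAMPLE))   # shutdown
    print(looks_like_qemu_crash(SAMPLE))  # False
```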

Comment 3 Kevin Fenzi 2020-01-07 19:19:42 UTC

Any ideas or news here?

Still happening. ;(

Comment 4 Dan Horák 2020-01-10 12:55:45 UTC

For the record, I have already seen the "restarting" VM when playing with RHEL 8 cloud images (and perhaps with others too). The symptom was similar: the boot starts and writes the grub boot menu, but instead of booting the selected OS, it boots to the grub menu again. On the second grub run, Linux boots.

Comment 5 Gustavo Luiz Duarte 2020-01-10 16:10:33 UTC

Also, have you tried passing the same number of threads as the host? I mean "-smp 1,sockets=1,cores=1,threads=1" (P9 usually has 4 threads per core). See Bug 1789199.

[]'s
Gustavo

Comment 6 Gustavo Luiz Duarte 2020-01-10 16:12:17 UTC

I meant to write "-smp 1,sockets=1,cores=1,threads=4"

Comment 7 Dan Horák 2020-01-10 16:56:14 UTC

should be "-smp 4,sockets=1,cores=1,threads=4" :-) But I see no change. Kevin's command line gives a good reproducer, so let's switch to qemu or start a new bug against qemu.
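
Why the first suggestion needed correcting: assuming qemu's -smp rule in the 4.x series, where sockets * cores * threads must equal maxcpus and maxcpus defaults to the leading CPU count, "-smp 1,sockets=1,cores=1,threads=4" describes an inconsistent topology, while "-smp 4,sockets=1,cores=1,threads=4" is consistent. A hypothetical Python helper that builds and checks such values:

```python
def smp_value(sockets: int, cores: int, threads: int) -> str:
    """Build an -smp value whose CPU count equals sockets * cores * threads."""
    if min(sockets, cores, threads) < 1:
        raise ValueError("sockets, cores and threads must be positive")
    cpus = sockets * cores * threads
    return f"{cpus},sockets={sockets},cores={cores},threads={threads}"


def smp_is_consistent(value: str) -> bool:
    """Check that the leading CPU count matches the topology product.

    Only handles the 'N,sockets=S,cores=C,threads=T' form used in this bug
    (no maxcpus=), and assumes maxcpus defaults to N.
    """
    count, *opts = value.split(",")
    topo = dict(opt.split("=", 1) for opt in opts)
    product = int(topo["sockets"]) * int(topo["cores"]) * int(topo["threads"])
    return int(count) == product


if __name__ == "__main__":
    print(smp_value(1, 1, 4))                                  # 4,sockets=1,cores=1,threads=4
    print(smp_is_consistent("1,sockets=1,cores=1,threads=4"))  # False
    print(smp_is_consistent("4,sockets=1,cores=1,threads=4"))  # True
```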

Comment 8 Dan Horák 2020-01-10 17:08:58 UTC

it could be related to this warning
qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off
because it appears when the Linux kernel is loaded/booted for the first time. It's missing on the second boot.

and I think there is already a bug for it

Comment 9 Dan Horák 2020-01-10 17:56:46 UTC

Seems it's the machine type problem again, using -M pseries-4.0 makes the "double boot" problem in qemu go away.

Comment 10 Laurent Vivier 2020-01-13 08:37:15 UTC

(In reply to Dan Horák from comment #9)
> Seems it's the machine type problem again, using -M pseries-4.0 makes the
> "double boot" problem in qemu go away.

This has been fixed upstream by

8deb8019d696 ("spapr: Don't trigger a CAS reboot for XICS/XIVE mode changeover")

Comment 11 Dan Horák 2020-01-13 08:53:48 UTC

Laurent, could it be backported to qemu 4.2? Are there any prerequisite patches? Cedric has already explained the "double boot" to me in https://bugzilla.redhat.com/show_bug.cgi?id=1769600#c34, so I'm wondering which route we should take. Running regular VMs is OK, but libguestfs can't deal with the reboot. libguestfs would allow us to override the machine parameters via http://libguestfs.org/guestfs.3.html#qemu-wrappers, but using a patched qemu would be easier.
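
For reference, a qemu wrapper in the sense of that libguestfs documentation could apply the pseries-4.0 workaround from comment 9. The sketch below is hypothetical and untested: it is written in Python rather than the shell used in the libguestfs docs, the binary path and machine names are taken from this bug, and it assumes that the libvirt backend actually runs the wrapper (for example when pointed at it with LIBGUESTFS_HV) and that SELinux and the qemu sandbox allow it. None of that was verified here.

```python
#!/usr/bin/env python3
"""Hypothetical qemu wrapper: swap pseries-4.1 for pseries-4.0 (untested sketch)."""
import os
import sys
from typing import List

REAL_QEMU = "/usr/bin/qemu-system-ppc64"  # path from the libguestfs-test-tool output
BAD_MACHINE = "pseries-4.1"               # default machine type seen in comment 2
GOOD_MACHINE = "pseries-4.0"              # avoids the double boot per comment 9


def rewrite_machine(argv: List[str]) -> List[str]:
    """Return a copy of argv with a -machine/-M pseries-4.1 value changed to pseries-4.0."""
    out = list(argv)
    for i, arg in enumerate(argv[:-1]):
        if arg in ("-machine", "-M"):
            # libvirt passes e.g. "pseries-4.1,accel=kvm,usb=off,dump-guest-core=off"
            name, sep, rest = out[i + 1].partition(",")
            if name == BAD_MACHINE:
                out[i + 1] = GOOD_MACHINE + sep + rest
    return out


def main() -> None:
    args = rewrite_machine(sys.argv[1:])
    # Replace this process with the real qemu, keeping every other argument as-is.
    os.execv(REAL_QEMU, [REAL_QEMU] + args)


# libvirt always invokes qemu with arguments; skipping the exec when there are
# none keeps the file harmless to run on its own.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Every argument other than the machine type is passed through untouched, since libvirt presumably also runs the same binary to probe its capabilities. As comment 21 notes, image-factory may offer no way to point libguestfs at such a wrapper, which is one reason a patched qemu was preferred.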

Comment 12 Laurent Vivier 2020-01-13 09:59:22 UTC

(In reply to Dan Horák from comment #11)
> Laurent, could it be backported to qemu 4.2?

Yes, and it's straightforward. No prerequisite patches.

Comment 13 Dan Horák 2020-01-13 11:22:30 UTC

(In reply to Laurent Vivier from comment #12)
> (In reply to Dan Horák from comment #11)
> > Laurent, could it be backported to qemu 4.2?
> 
> Yes, and it's straightforward. No prerequisite patches.

and what about qemu 4.1? That's the version in F-31, which is installed on the Fedora builders.

Comment 14 Laurent Vivier 2020-01-13 11:59:41 UTC

(In reply to Dan Horák from comment #13)
> (In reply to Laurent Vivier from comment #12)
> > (In reply to Dan Horák from comment #11)
> > > Laurent, could it be backported to qemu 4.2?
> > 
> > Yes, and it's straightforward. No prerequisite patches.
> 
> and what about qemu 4.1? That's the version in F-31, which is installed on
> the Fedora builders.

It's more complicated: 4.1 needs more patches to regenerate the device tree when CAS is called, to add a code path to activate and deactivate interrupt controllers, and a new version of SLOF.

dgibson knows the list of needed patches better than I do.

Comment 15 Dan Horák 2020-01-13 12:22:39 UTC

OK, then it would be better to use the virt stack from the virt-preview repo which has 4.2 for F-31 and F-30

Comment 16 Dan Horák 2020-01-13 17:07:32 UTC

And I can confirm that the "double boot" goes away when I use qemu 4.2 with the 8deb8019d696 patch applied. I'll now check if oz/image-factory works too.

Comment 17 Dan Horák 2020-01-13 18:08:59 UTC

And I got an image created

============ Final Image Details ============
UUID: 48b2d109-d0ef-4f0b-8e20-4a62424ea25b
Type: base_image
Image filename: /var/lib/imagefactory/storage/48b2d109-d0ef-4f0b-8e20-4a62424ea25b.body
Image build completed SUCCESSFULLY!


So the main question is about the next steps. Can infra use qemu from virt-preview? How do we best integrate the patch into our qemu package?

Comment 18 Kevin Fenzi 2020-01-13 18:19:22 UTC

So to clarify, this is in the guest right? 

We could do a qemu build + the patch in our infra repo... that would upgrade all the builders, but I guess that might be ok?

Comment 19 Laurent Vivier 2020-01-13 18:21:08 UTC

Dan,

There is perhaps another way to fix the problem easily if the problem is with the no-reboot parameter and not with the double-boot.

The commit 9146206eb26c ("spapr: Use SHUTDOWN_CAUSE_SUBSYSTEM_RESET for CAS reboots") allows the guest to reboot for the CAS negotiation even if the --no-reboot parameter is provided.

It is already included in 4.2 and is easy to backport to 4.1.

Comment 20 Dan Horák 2020-01-13 18:59:02 UTC

(In reply to Kevin Fenzi from comment #18)
> So to clarify, this is in the guest right? 

yes, qemu in the builder VM needs the update
 
> We could do a qemu build + the patch in our infra repo... that would upgrade
> all the builders, but I guess that might be ok?

yes, it should be OK, you could give the other arches some testing in staging env first

Comment 21 Dan Horák 2020-01-13 19:13:59 UTC

(In reply to Laurent Vivier from comment #19)
> Dan,
> 
> There is perhaps another way to fix the problem easily if the problem is
> with the no-reboot parameter and not with the double-boot.
> 
> The commit 9146206eb26c ("spapr: Use SHUTDOWN_CAUSE_SUBSYSTEM_RESET for CAS
> reboots") allows the guest to reboot for the CAS negotiation even if the
> --no-reboot parameter is provided.
> 
> It is already included in 4.2 and is easy to backport to 4.1.

I think it's a question for the libguestfs guys and how libguestfs communicates with qemu. Also, I think we have no way to pass additional parameters to the domain XML or qemu, so qemu 4.2 + the patch looks like a good solution to me :-)

Comment 22 Richard W.M. Jones 2020-01-14 09:11:36 UTC

Not really sure of the question but maybe this diagram helps?

http://libguestfs.org/guestfs-internals.1.html#architecture

We try not to need custom tweaking for each architecture.  If qemu
can't by default boot a kernel image then we usually think of that
architecture as needing to be fixed.  I even have a tool to test this:

https://people.redhat.com/~rjones/qemu-sanity-check/

Comment 23 David Gibson 2020-01-20 03:01:15 UTC

Applying 8deb8019d696 to 4.1 will require a *lot* of preliminary patches.

However, for the problem you have specifically with libguestfs (which occurs because you use -no-reboot), it should only be necessary to use the stopgap fix in 9146206eb26c1436c80a7c2ca1e4c5f86b27179d "spapr: Use SHUTDOWN_CAUSE_SUBSYSTEM_RESET for CAS reboots".

That one should apply to 4.1 much more easily.
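
To illustrate the difference between the two commits, here is a deliberately simplified Python model, not qemu's code, of how a reset request interacts with -no-reboot. It assumes, based on the commit title and the comment above, that before 9146206eb26c a CAS reboot was handled like an ordinary guest reset, and that afterwards it is tagged as a subsystem reset, which -no-reboot no longer turns into a shutdown. All names are hypothetical.

```python
from enum import Enum, auto


class ResetCause(Enum):
    GUEST_RESET = auto()      # an ordinary reboot requested by the guest
    SUBSYSTEM_RESET = auto()  # how 9146206eb26c tags CAS reboots


def handle_reset_request(cause: ResetCause, no_reboot: bool) -> str:
    """Simplified: -no-reboot turns resets into shutdowns, except subsystem resets."""
    if no_reboot and cause is not ResetCause.SUBSYSTEM_RESET:
        return "shutdown"  # qemu exits; libguestfs sees the appliance die
    return "reset"         # the guest reboots and carries on


def cas_reboot_outcome(no_reboot: bool, has_stopgap_fix: bool) -> str:
    cause = ResetCause.SUBSYSTEM_RESET if has_stopgap_fix else ResetCause.GUEST_RESET
    return handle_reset_request(cause, no_reboot)


if __name__ == "__main__":
    # The libguestfs appliance in this bug runs with -no-reboot (comment 2).
    print(cas_reboot_outcome(no_reboot=True, has_stopgap_fix=False))  # shutdown
    print(cas_reboot_outcome(no_reboot=True, has_stopgap_fix=True))   # reset
```

In this model the stopgap still reboots the guest, so anything typed at the first boot (such as the kernel parameters openQA enters) would be lost; only 8deb8019d696 avoids the CAS reboot for the XICS/XIVE changeover altogether.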

Comment 24 Adam Williamson 2020-01-20 11:11:17 UTC

The reboots turned out to actually be a problem for openQA as well, it was just a bit more subtle than I first realized. We have some tests which are set to specify kernel parameters, by typing them into the bootloader when the VM boots. But those tests are broken by the reboot behaviour, because the test only types the parameters on the *first* boot, then the reboot happens and they are effectively lost. Rewriting the test code to handle the VM spontaneously rebooting like this would be a bit awkward.

For now, Kevin has done a backport of qemu 4.2 in the infra repo, and I have that deployed on the openQA VMs as well; with a newer SLOF it seems to be working OK.

Comment 25 David Gibson 2020-01-21 01:43:33 UTC

Ah, right.  9146206eb26c1436c80a7c2ca1e4c5f86b27179d alone won't fix that kernel parameters problem.  For that you will need 8deb8019d696 and all its preliminaries.

Comment 26 Dan Horák 2020-02-14 10:33:25 UTC

Switching to qemu to build an update with the mentioned patch applied.

Cole (or another qemu maintainer), could you build new qemu 4.2 (f32 + rawhide + virt-preview) with
 8deb8019d696 ("spapr: Don't trigger a CAS reboot for XICS/XIVE mode changeover")
applied?

Thanks, Dan.

Comment 27 Cole Robinson 2020-02-18 17:20:52 UTC

Patches pushed and f32 build is done, rawhide qemu build is failing due to some kernel headers breakage though: https://bugzilla.redhat.com/show_bug.cgi?id=1804330

Comment 28 Fedora Program Management 2021-04-29 16:01:17 UTC

This message is a reminder that Fedora 32 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 32 on 2021-05-25.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '32'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 32 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 29 Ben Cotton 2021-05-25 15:14:31 UTC

Fedora 32 changed to end-of-life (EOL) status on 2021-05-25. Fedora 32 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.