Created attachment 1679676
qemu run log

Description of problem:
QEMU from f32 fails to run (it hangs) in an f32 container guest (5.6.2-301.fc32.ppc64le). The container runs on an f31 host (Power9, 5.5.10-200.fc31.ppc64le) during the build of Fedora CoreOS.

Version-Release number of selected component (if applicable):
2:4.2.0-7.fc32.ppc64le

How reproducible:
Always

Steps to Reproduce:
1. Build Fedora CoreOS on ppc64le (Power9 host). I hope to be able to provide a reduced reproducer.

Actual results:
See the attached file.

Expected results:
The build of CoreOS succeeds.

Additional info:
I'm still trying to reproduce this outside of the FCOS build. For the record, everything works if the rawhide qemu is used.
I forgot to mention the NVR of the rawhide version that works for me: qemu-5.0.0-0.1.rc0.fc31.ppc64le.rpm, rebuilt from rawhide dist-git along with other packages, mimicking the "virt-preview" COPR repo.
For the record, a rebuild of the latest rawhide (qemu-2:5.0.0-0.3.rc3.fc31.ppc64le) is also OK for me. Rebuilding the f32 qemu in mock (f32 BR) on that Power9 host fails for me (it hangs, to be specific) in the tests with:

. . . .
PASS 8 test-hmp /ppc/hmp/40p
PASS 9 test-hmp /ppc/hmp/ref405ep
PASS 10 test-hmp /ppc/hmp/g3beige
PASS 11 test-hmp /ppc/hmp/mpc8544ds
PASS 12 test-hmp /ppc/hmp/taihu
PASS 13 test-hmp /ppc/hmp/none+2MB
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=ppc-softmmu/qemu-system-ppc QTEST_QEMU_IMG=qemu-img tests/qos-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="qos-test"
SKIP
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 QTEST_QEMU_IMG=qemu-img tests/endianness-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="endianness-test"
PASS 1 endianness-test /ppc64/endianness/mac99
PASS 2 endianness-test /ppc64/endianness/pseries
PASS 3 endianness-test /ppc64/endianness/pseries-2.7
PASS 4 endianness-test /ppc64/endianness/split/mac99
PASS 5 endianness-test /ppc64/endianness/split/pseries
PASS 6 endianness-test /ppc64/endianness/split/pseries-2.7
PASS 7 endianness-test /ppc64/endianness/combine/mac99
PASS 8 endianness-test /ppc64/endianness/combine/pseries
PASS 9 endianness-test /ppc64/endianness/combine/pseries-2.7
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 QTEST_QEMU_IMG=qemu-img tests/boot-order-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="boot-order-test"
PASS 1 boot-order-test /ppc64/boot-order/prep
PASS 2 boot-order-test /ppc64/boot-order/pmac_oldworld
PASS 3 boot-order-test /ppc64/boot-order/pmac_newworld
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 QTEST_QEMU_IMG=qemu-img tests/prom-env-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="prom-env-test"
PASS 1 prom-env-test /ppc64/prom-env/mac99
PASS 2 prom-env-test /ppc64/prom-env/g3beige
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 QTEST_QEMU_IMG=qemu-img tests/drive_del-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="drive_del-test"
PASS 1 drive_del-test /ppc64/drive_del/without-dev
PASS 2 drive_del-test /ppc64/drive_del/after_failed_device_add
PASS 3 drive_del-test /ppc64/blockdev/drive_del_device_del
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 QTEST_QEMU_IMG=qemu-img tests/boot-serial-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="boot-serial-test"
PASS 1 boot-serial-test /ppc64/boot-serial/ppce500
PASS 2 boot-serial-test /ppc64/boot-serial/40p
PASS 3 boot-serial-test /ppc64/boot-serial/mac99
PASS 4 boot-serial-test /ppc64/boot-serial/pseries
PASS 5 boot-serial-test /ppc64/boot-serial/powernv8
PASS 6 boot-serial-test /ppc64/boot-serial/powernv9
PASS 7 boot-serial-test /ppc64/boot-serial/sam460ex
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 QTEST_QEMU_IMG=qemu-img tests/m48t59-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="m48t59-test"
PASS 1 m48t59-test /ppc64/rtc/fuzz-registers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 QTEST_QEMU_IMG=qemu-img tests/device-plug-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="device-plug-test"
PASS 1 device-plug-test /ppc64/device-plug/pci-unplug-request
PASS 2 device-plug-test /ppc64/device-plug/spapr-cpu-unplug-request
PASS 3 device-plug-test /ppc64/device-plug/spapr-memory-unplug-request
PASS 4 device-plug-test /ppc64/device-plug/spapr-phb-unplug-request
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 QTEST_QEMU_IMG=qemu-img tests/pnv-xscom-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="pnv-xscom-test"
PASS 1 pnv-xscom-test /ppc64/pnv-xscom/cfam_id/POWER8
PASS 2 pnv-xscom-test /ppc64/pnv-xscom/cfam_id/POWER8NVL
PASS 3 pnv-xscom-test /ppc64/pnv-xscom/cfam_id/POWER9
PASS 4 pnv-xscom-test /ppc64/pnv-xscom/core/POWER8
PASS 5 pnv-xscom-test /ppc64/pnv-xscom/core/POWER8NVL
PASS 6 pnv-xscom-test /ppc64/pnv-xscom/core/POWER9
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 QTEST_QEMU_IMG=qemu-img tests/migration-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="migration-test"
PASS 1 migration-test /ppc64/migration/deprecated
PASS 2 migration-test /ppc64/migration/bad_dest
Could not access KVM kernel module: No such file or directory
qemu-system-ppc64: failed to initialize KVM: No such file or directory
qemu-system-ppc64: Back to tcg accelerator
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-cfpc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-sbbc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ibs=workaround
Could not access KVM kernel module: No such file or directory
qemu-system-ppc64: failed to initialize KVM: No such file or directory
qemu-system-ppc64: Back to tcg accelerator
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-cfpc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-sbbc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ibs=workaround
PASS 3 migration-test /ppc64/migration/fd_proto
Could not access KVM kernel module: No such file or directory
qemu-system-ppc64: failed to initialize KVM: No such file or directory
qemu-system-ppc64: Back to tcg accelerator
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-cfpc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-sbbc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ibs=workaround
Could not access KVM kernel module: No such file or directory
qemu-system-ppc64: failed to initialize KVM: No such file or directory
qemu-system-ppc64: Back to tcg accelerator
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-cfpc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-sbbc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ibs=workaround
PASS 4 migration-test /ppc64/migration/validate_uuid
PASS 5 migration-test /ppc64/migration/validate_uuid_error
PASS 6 migration-test /ppc64/migration/validate_uuid_src_not_set
PASS 7 migration-test /ppc64/migration/validate_uuid_dst_not_set
Could not access KVM kernel module: No such file or directory
qemu-system-ppc64: failed to initialize KVM: No such file or directory
qemu-system-ppc64: Back to tcg accelerator
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-cfpc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-sbbc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ibs=workaround
Could not access KVM kernel module: No such file or directory
qemu-system-ppc64: failed to initialize KVM: No such file or directory
qemu-system-ppc64: Back to tcg accelerator
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-cfpc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-sbbc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ibs=workaround
qemu: fatal: Trying to deliver HV exception (MSR) 69 with no HV support

NIP 000000000dbf4274 LR 000000000dbf33bc CTR 000000000dbf4df8 XER 0000000000000000 CPU#0
MSR 8000000000000000 HID0 0000000000000000 HF 8000000000000000 iidx 3 didx 3
TB 00000002 9745122511 DECR 18446744063964429014
GPR00 000000000dbf1618 000000000e67afe0 000000000dc21900 ffffffffffffffb0
GPR04 000000000e477008 000000000dc5f000 000000000e477008 000000000dc5f008
GPR08 000000000dc73218 0000000400000150 000000000dc5f000 00000000049ee000
GPR12 0000000020000008 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 000000000e477010 000000000dc1c258
GPR20 0000000000008000 000000000000f003 0000000000000006 000000000e67b050
GPR24 000000000dc17600 000000000dc1c088 0000000000000003 000000000000f001
GPR28 000000000e67b060 ffffffffffffffff 000000000dbf4df8 000000000dc1cab8
CR 20000004 [ E - - - - - - G ] RES ffffffffffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 0000000000000000
SRR0 000000000dbf06b0 SRR1 8000000000000000 PVR 00000000004e1200 VRSAVE 0000000000000000
SPRG0 0000000000000000 SPRG1 000000000000bf10 SPRG2 0000000000000000 SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000
HSRR0 000000000dbf4274 HSRR1 8000000000000000
CFAR 000000000dbf3b64
LPCR 000000000403f008
PTCR 0000000000000000 DAR 0000000000000000 DSISR 0000000000000000
CCing some PPC virt guys. See the error at the end of Comment #2, any guesses?
(In reply to Cole Robinson from comment #3)
> CCing some PPC virt guys. See the error at the end of Comment #2, any
> guesses?

There is a race condition with TCG, and these tests (migration-tests) should not be run with TCG.

We have an explicit check in the test to avoid that:

tests/qtest/migration-test.c:

    /*
     * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
     * is touchy due to race conditions on dirty bits (especially on PPC for
     * some reason)
     */
    if (g_str_equal(qtest_get_arch(), "ppc64") &&
        (access("/sys/module/kvm_hv", F_OK) ||
         access("/dev/kvm", R_OK | W_OK))) {
        g_test_message("Skipping test: kvm_hv not available");
        return g_test_run();
    }

So the question is why the special files are present and accessible but we can't use them:

    Could not access KVM kernel module: No such file or directory
    qemu-system-ppc64: failed to initialize KVM: No such file or directory

It looks like a system configuration problem.
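The two-part check quoted above can be repeated by hand inside the build environment. Below is a minimal sketch in POSIX shell, not taken from the QEMU tree: the function name `kvm_hv_usable` is hypothetical, and the paths are passed as arguments only so the function can be pointed at scratch files. It mirrors the existence and permission tests only; it cannot tell whether KVM actually initializes, which is the discrepancy raised in this comment.

```shell
#!/bin/sh
# kvm_hv_usable MODULE_DIR DEVICE_NODE
# Hypothetical helper mirroring the access() calls in the quoted test.
# Returns 0 when both checks pass, 1 otherwise.
kvm_hv_usable() {
    module_dir=$1
    device_node=$2
    # access(path, F_OK): the path only has to exist
    [ -e "$module_dir" ] || return 1
    # access(path, R_OK | W_OK): readable and writable by the caller
    [ -r "$device_node" ] && [ -w "$device_node" ] || return 1
    return 0
}

if kvm_hv_usable /sys/module/kvm_hv /dev/kvm; then
    echo "kvm_hv looks usable: the check would not skip the tests"
else
    echo "kvm_hv not available: the check would skip the tests"
fi
```

Running it both on the host and inside the mock chroot may show whether the two environments see the same files.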
Sorry Laurent, I should have looked closer before CCing. The build failure in comment #2 was a side report. The main issue that spawned the bug is a qemu ppc runtime issue hit when building Fedora CoreOS. The log is here: https://bugzilla.redhat.com/attachment.cgi?id=1679676

There are lots of these messages from the guest and from the qemu side, interspersed:

[ 698.405545] icp_hv_set_qirr: bad return code qirr cpu=11 hw_cpu=11 mfrr=0x4 returned -1
qemu-system-ppc64: pseries: h_ipi must only be called for emulated XICS

There is some more error output in the attachment.
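To get a feel for how often the two kinds of message occur, they can be counted in a saved copy of the log. This is a minimal sketch in POSIX shell; the function names are hypothetical, and the patterns are simply taken from the two sample lines quoted in this comment, so other variants in the attachment may need their own patterns.

```shell
#!/bin/sh
# count_xics_rejections LOGFILE
# Counts qemu-side lines that reject an XICS hypercall.
count_xics_rejections() {
    grep -c 'must only be called for emulated XICS' "$1"
}

# count_guest_icp_errors LOGFILE
# Counts guest kernel lines reporting a failed icp_hv_set_* call.
count_guest_icp_errors() {
    grep -c 'icp_hv_set_[a-z]*: bad return code' "$1"
}
```

For example, `count_xics_rejections qemu-run.log` prints a single number, where `qemu-run.log` stands for wherever the attachment was saved.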
Weird. It looks like qemu and the guest disagree on which of the two possible interrupt controller models is in use.

First, this message:

qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off

indicates that qemu thinks the XIVE intc is in use (and also that KVM doesn't support it, but there are a few probable reasons that could happen). qemu is correctly falling back to the emulated irq chip, but..

qemu-system-ppc64: pseries: h_cppr must only be called for emulated XICS
[ 0.000000] icp_hv_set_cppr: bad return code cppr cppr=0xff returned -1

these messages indicate that the guest is attempting to make hypercalls for the XICS interrupt controller.

Ah.... and I suspect I know why:

SLOF **********************************************************************
QEMU Starting
 Build Date = Jan 28 2020 00:00:00
 FW Version = mockbuild@ release 20191022
                                 ^^^^^^^^

During the qemu-5.0 cycle there were a number of changes in the pseries machine which required matching changes (and then a bunch of bugfixes) in the guest firmware, SLOF. It looks like this is running an old SLOF. I'm assuming Fedora, like RHEL, builds the firmware as a separate package rather than using the binary from the qemu tree, so it's pretty easy to see how that would happen.

So, it looks like you need to rebase SLOF to 20200327 and update the version dependency for the qemu package.
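The firmware release can be pulled out of a saved boot log and compared against the suggested 20200327. This is a minimal sketch in POSIX shell with hypothetical function names; it assumes the banner always carries an eight-digit release stamp after the word "release", as in the banner quoted above, and that these stamps compare as plain numbers.

```shell
#!/bin/sh
# slof_release LOGFILE
# Prints the eight-digit release stamp from the first SLOF banner found.
slof_release() {
    sed -n 's/.*FW Version = .* release \([0-9]\{8\}\).*/\1/p' "$1" | head -n 1
}

# slof_is_at_least LOGFILE MINIMUM
# Succeeds when a stamp was found and it is not older than MINIMUM.
slof_is_at_least() {
    found=$(slof_release "$1")
    [ -n "$found" ] && [ "$found" -ge "$2" ]
}

# Demonstration with the banner line quoted in this comment.
banner=$(mktemp)
printf '%s\n' ' FW Version = mockbuild@ release 20191022' > "$banner"
if slof_is_at_least "$banner" 20200327; then
    echo "SLOF is new enough"
else
    echo "SLOF $(slof_release "$banner") is older than 20200327"
fi
rm -f "$banner"
```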
(In reply to Laurent Vivier from comment #4)
> (In reply to Cole Robinson from comment #3)
> > CCing some PPC virt guys. See the error at the end of Comment #2, any
> > guesses?
>
> There is a race condition with TCG and these tests (migration-tests) should
> not be run with TCG.
>
> We have an explicit checking in the test to avoid that:
>
> tests/qtest/migration-test.c:
>
>     /*
>      * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
>      * is touchy due to race conditions on dirty bits (especially on PPC for
>      * some reason)
>      */
>     if (g_str_equal(qtest_get_arch(), "ppc64") &&
>         (access("/sys/module/kvm_hv", F_OK) ||
>          access("/dev/kvm", R_OK | W_OK))) {
>         g_test_message("Skipping test: kvm_hv not available");
>         return g_test_run();
>     }
>
> So the question is why the special files are present and accessible but we
> can't use them:
>
> Could not access KVM kernel module: No such file or directory
> qemu-system-ppc64: failed to initialize KVM: No such file or directory
>
> It looks like a system configuration problem.

Do you have any tips on what to check or look for? For the record, the build logs (tests) of the rawhide qemu are also riddled with those messages (qemu-system-*: -accel kvm: invalid accelerator kvm), so I doubt that is the culprit here; this has really been a side track in this BZ. I was just surprised that I'm not able to build the f32 qemu.
(In reply to David Gibson from comment #6)
> Weird.
.
.
.
> So, looks like you need to rebase SLOF to 20200327 and update the version
> dependency for the qemu package.

Seems so. I have just tested running with SLOF-0.1.git20200327-1.fc33 on top of the f32 virt-stack and all seems good on my side. Switching component to SLOF.

Can we get a rebase to SLOF-0.1.git20200327 for f32?
(In reply to Jakub Čajka from comment #7)
> (In reply to Laurent Vivier from comment #4)
> > (In reply to Cole Robinson from comment #3)
> > > CCing some PPC virt guys. See the error at the end of Comment #2, any
> > > guesses?
> >
> > There is a race condition with TCG and these tests (migration-tests) should
> > not be run with TCG.
> >
> > We have an explicit checking in the test to avoid that:
> >
> > tests/qtest/migration-test.c:
> >
> >     /*
> >      * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
> >      * is touchy due to race conditions on dirty bits (especially on PPC for
> >      * some reason)
> >      */
> >     if (g_str_equal(qtest_get_arch(), "ppc64") &&
> >         (access("/sys/module/kvm_hv", F_OK) ||
> >          access("/dev/kvm", R_OK | W_OK))) {
> >         g_test_message("Skipping test: kvm_hv not available");
> >         return g_test_run();
> >     }
> >
> > So the question is why the special files are present and accessible but we
> > can't use them:
> >
> > Could not access KVM kernel module: No such file or directory
> > qemu-system-ppc64: failed to initialize KVM: No such file or directory
> >
> > It looks like a system configuration problem.
>
> Do you have any tips what to check, look for?

So you are building qemu and running the tests in a virtual machine?

I think you should check on the host that nested kvm is allowed:

$ cat /sys/module/kvm_hv/parameters/nested
Y

and add "-machine pseries,cap-nested-hv=true" to your qemu command line.

Or, to avoid the problem:

- either on the host: echo N > /sys/module/kvm_hv/parameters/nested
- or in the VM: add kvm, kvm_pr and kvm_hv to the modules blacklist.
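The host-side check above can be wrapped so that a missing parameter file (kvm_hv not loaded) is reported separately from a disabled setting. This is a minimal sketch in POSIX shell; the function name `nested_hv_state` is hypothetical, and treating `Y`, `y` and `1` as enabled is an assumption about how the parameter is printed.

```shell
#!/bin/sh
# nested_hv_state PARAM_FILE
# Prints "enabled", "disabled", or "absent" for the given parameter file.
nested_hv_state() {
    if [ ! -r "$1" ]; then
        # No readable file: the module is probably not loaded here
        echo absent
        return
    fi
    case $(cat "$1") in
        Y|y|1) echo enabled ;;
        *)     echo disabled ;;
    esac
}

nested_hv_state /sys/module/kvm_hv/parameters/nested
```

Run on the host and again in the guest (or chroot), the two answers show whether both sides agree on the nested setting.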
(In reply to Laurent Vivier from comment #9)
> (In reply to Jakub Čajka from comment #7)
> > (In reply to Laurent Vivier from comment #4)
> > > (In reply to Cole Robinson from comment #3)
> > > > CCing some PPC virt guys. See the error at the end of Comment #2, any
> > > > guesses?
> > >
> > > There is a race condition with TCG and these tests (migration-tests) should
> > > not be run with TCG.
> > >
> > > We have an explicit checking in the test to avoid that:
> > >
> > > tests/qtest/migration-test.c:
> > >
> > >     /*
> > >      * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
> > >      * is touchy due to race conditions on dirty bits (especially on PPC for
> > >      * some reason)
> > >      */
> > >     if (g_str_equal(qtest_get_arch(), "ppc64") &&
> > >         (access("/sys/module/kvm_hv", F_OK) ||
> > >          access("/dev/kvm", R_OK | W_OK))) {
> > >         g_test_message("Skipping test: kvm_hv not available");
> > >         return g_test_run();
> > >     }
> > >
> > > So the question is why the special files are present and accessible but we
> > > can't use them:
> > >
> > > Could not access KVM kernel module: No such file or directory
> > > qemu-system-ppc64: failed to initialize KVM: No such file or directory
> > >
> > > It looks like a system configuration problem.
> >
> > Do you have any tips what to check, look for?
>
> So you are building qemu and running test in a virtual machine?
>
> I think you should check on the host that nested kvm is allowed:
>
> $ cat /sys/module/kvm_hv/parameters/nested
> Y
>
> and add "-machine pseries,cap-nested-hv=true" to your qemu command line".
>
> or to avoid the problem:
>
> - either on the host: echo N > /sys/module/kvm_hv/parameters/nested
> - or in the VM: add kvm, kvm_pr and kvm_hv in the modules blacklist.

That is on a bare-metal Power9 Linux box (no PowerVM/LPAR involved). kvm_hv is loaded and `cat /sys/module/kvm_hv/parameters/nested` returns `Y`. The build of qemu is run in mock.
(In reply to Jakub Čajka from comment #10)
> (In reply to Laurent Vivier from comment #9)
> > (In reply to Jakub Čajka from comment #7)
> > > (In reply to Laurent Vivier from comment #4)
> > > > (In reply to Cole Robinson from comment #3)
> > > > > CCing some PPC virt guys. See the error at the end of Comment #2, any
> > > > > guesses?
> > > >
> > > > There is a race condition with TCG and these tests (migration-tests) should
> > > > not be run with TCG.
> > > >
> > > > We have an explicit checking in the test to avoid that:
> > > >
> > > > tests/qtest/migration-test.c:
> > > >
> > > >     /*
> > > >      * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
> > > >      * is touchy due to race conditions on dirty bits (especially on PPC for
> > > >      * some reason)
> > > >      */
> > > >     if (g_str_equal(qtest_get_arch(), "ppc64") &&
> > > >         (access("/sys/module/kvm_hv", F_OK) ||
> > > >          access("/dev/kvm", R_OK | W_OK))) {
> > > >         g_test_message("Skipping test: kvm_hv not available");
> > > >         return g_test_run();
> > > >     }
> > > >
> > > > So the question is why the special files are present and accessible but we
> > > > can't use them:
> > > >
> > > > Could not access KVM kernel module: No such file or directory
> > > > qemu-system-ppc64: failed to initialize KVM: No such file or directory
> > > >
> > > > It looks like a system configuration problem.
> > >
> > > Do you have any tips what to check, look for?
> >
> > So you are building qemu and running test in a virtual machine?
> >
> > I think you should check on the host that nested kvm is allowed:
> >
> > $ cat /sys/module/kvm_hv/parameters/nested
> > Y
> >
> > and add "-machine pseries,cap-nested-hv=true" to your qemu command line".
> >
> > or to avoid the problem:
> >
> > - either on the host: echo N > /sys/module/kvm_hv/parameters/nested
> > - or in the VM: add kvm, kvm_pr and kvm_hv in the modules blacklist.
>
> That is on bare-metal Power9 linux box(no PowerVM/LPAR involved). kvm_hv is
> loaded and `cat /sys/module/kvm_hv/parameters/nested` returns `Y`. Build of
> qemu is run in mock.

In the guest, could you check kvm_hv is not loaded (as nested is disabled) and /sys/module/kvm_hv doesn't exist?
(In reply to Jakub Čajka from comment #8)
> (In reply to David Gibson from comment #6)
> > Weird.
> .
> .
> .
> > So, looks like you need to rebase SLOF to 20200327 and update the version
> > dependency for the qemu package.
>
> Seems so. I have just tested running with SLOF-0.1.git20200327-1.fc33 on top
> of the f32 virt-stack and all seems good on my side. Switching component to
> SLOF.
>
> Can we get re-base to SLOF-0.1.git20200327 for f32?

It should be in updates now. Can we close this? (It's not clear to me whether other issues are being discussed.)