Bug 1948363
| Summary: | RHEL9 ppc64le guests should work under RHEL8 KVM ppc64le hosts | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | David Gibson <dgibson> |
| Component: | kernel | Assignee: | Daniel Henrique Barboza (IBM) <dbarboza> |
| kernel sub component: | Platform Enablement | QA Contact: | Eirik Fuller <efuller> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | bstinson, carl, dbarboza, efuller, jwboyer, mdeng, mrezanin, ngu, qzhang, rvr, xuma |
| Version: | CentOS Stream | Keywords: | TestOnly, Triaged |
| Target Milestone: | beta | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | ppc64le | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-01-12 11:26:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1959652 | ||
| Bug Blocks: | |||
|
Description
David Gibson
2021-04-12 04:16:26 UTC
Initial test results (in https://beaker.engineering.redhat.com/jobs/5336764) are encouraging. A Witherspoon system running kernel-4.18.0-305.el8 (RHEL-8.4.0-20210503.1) hosted kernel-5.12.0-0.rc8.193.el9 (RHEL-9.0.0-20210428.3) on a couple of KVM guests, with assorted failures in the PowerPC selftests and a couple of perf testsuites. The only untracked PowerPC selftest failure was 'powerpc/security: spectre_v2' (I'll open a bug if I don't find an existing bug which tracks that failure). I'll check the perf test failures to see if any of those need new bugs, and I'll add more tasks in subsequent runs.

Eirik, what KVM userspace (qemu, libvirt) are you using for those tests?

The host recipe (https://beaker.engineering.redhat.com/recipes/9935001) of the job linked in comment 1 uses /distribution/virt/install and /distribution/virt/start; the former installs libvirt (though it was already installed due to a package tag in the job XML; perhaps that part of the job XML is redundant). In short, the tests described in comment 1 used libvirt, not qemu. If I find a way to test with qemu, I'll log the results here.

You're always using qemu, even if you're doing it via libvirt. What I'm after is the versions of libvirt and qemu that were actually installed in this test run. I'm suspecting it's the RHEL-8.4 versions rather than the RHEL-AV-8.4 versions, and the latest Spectre mitigations might only be in the latter.

http://download.eng.bos.redhat.com/rhel-8/rel-eng/RHEL-8/RHEL-8.4.0-20210503.1/compose/AppStream/ppc64le/os/Packages/ (used in the comment 1 Beaker host recipe) has libvirt-6.0.0-35.module+el8.4.0+10230+7a9b21e4

And more importantly qemu-kvm-4.2.0-48.module+el8.4.0+10368+630e803b.ppc64le.rpm.

Hmm.. can you check the contents of /sys/devices/system/cpu/vulnerabilities/spectre_v2 on both the RHEL8 host and the RHEL9 guest?
https://beaker.engineering.redhat.com/recipes/9941957/tasks/125542992/logs/taskout.log (from the /kernel/security/vulnerabilities task on the kernel-4.18.0-305.el8 Witherspoon host) includes the following output.

    :: [ 23:18:03 ] :: [ BEGIN ] :: Log vulnerabilities status :: actually running 'grep . * | sed 's/:/^/' | column -t -s^ | tee -a /mnt/testarea/tmp.x3aorO'
    itlb_multihit      Not affected
    l1tf               Not affected
    mds                Not affected
    meltdown           Mitigation: RFI Flush, L1D private per thread
    spec_store_bypass  Mitigation: Kernel entry/exit barrier (eieio)
    spectre_v1         Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
    spectre_v2         Mitigation: Software count cache flush (hardware accelerated), Software link stack flush
    srbds              Not affected
    tsx_async_abort    Not affected
    :: [ 23:18:03 ] :: [ PASS ] :: Log vulnerabilities status (Expected 0, got 0)

https://beaker.engineering.redhat.com/recipes/9941958/tasks/125543026/logs/taskout.log (from the /kernel/security/vulnerabilities task on the kernel-5.12.0-1.el9 KVM guest) includes the following output.

    :: [ 23:34:33 ] :: [ BEGIN ] :: Log vulnerabilities status :: actually running 'grep . * | sed 's/:/^/' | column -t -s^ | tee -a /mnt/testarea/tmp.hKWQDP'
    itlb_multihit      Not affected
    l1tf               Mitigation: RFI Flush, L1D private per thread
    mds                Not affected
    meltdown           Mitigation: RFI Flush, L1D private per thread
    spec_store_bypass  Mitigation: Kernel entry/exit barrier (eieio)
    spectre_v1         Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
    spectre_v2         Mitigation: Software count cache flush (hardware accelerated), Software link stack flush
    srbds              Not affected
    tsx_async_abort    Not affected
    :: [ 23:34:33 ] :: [ PASS ] :: Log vulnerabilities status (Expected 0, got 0)

Both host and guest report a mitigation for spectre_v2, so it seems either the mitigation is ineffective, or the spectre_v2 selftest is flawed. Either way the spectre_v2 selftest failure does not seem like the result of a qemu issue.
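For anyone repeating the host/guest comparison by hand, here is a minimal Python sketch. The helper names (`read_vulns`, `parse_listing`, `diff_vulns`) are made up for illustration and are not part of the Beaker task; the two listings are copied from the task output quoted above. `read_vulns` reads the same sysfs directory that the task greps, and `parse_listing` handles the column-formatted text the task logs.

```python
from pathlib import Path

# Standard sysfs location for CPU vulnerability status files.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulns(directory=VULN_DIR):
    """Return {name: status} for every file in the vulnerabilities directory."""
    return {p.name: p.read_text().strip()
            for p in sorted(directory.iterdir()) if p.is_file()}


def parse_listing(text):
    """Parse 'name  status' lines (as logged by the task) into a dict."""
    result = {}
    for line in text.strip().splitlines():
        name, _, status = line.strip().partition(" ")
        result[name] = status.strip()
    return result


def diff_vulns(host, guest):
    """Return {name: (host_status, guest_status)} for entries that differ."""
    names = sorted(set(host) | set(guest))
    return {n: (host.get(n), guest.get(n))
            for n in names if host.get(n) != guest.get(n)}


# Listings copied from the two taskout.log excerpts in this bug.
HOST_LOG = """
itlb_multihit      Not affected
l1tf               Not affected
mds                Not affected
meltdown           Mitigation: RFI Flush, L1D private per thread
spec_store_bypass  Mitigation: Kernel entry/exit barrier (eieio)
spectre_v1         Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
spectre_v2         Mitigation: Software count cache flush (hardware accelerated), Software link stack flush
srbds              Not affected
tsx_async_abort    Not affected
"""

GUEST_LOG = HOST_LOG.replace(
    "l1tf               Not affected",
    "l1tf               Mitigation: RFI Flush, L1D private per thread",
)

if __name__ == "__main__":
    for name, (h, g) in diff_vulns(parse_listing(HOST_LOG),
                                   parse_listing(GUEST_LOG)).items():
        print(f"{name}: host={h!r} guest={g!r}")
```

With the listings above, the spectre_v2 entries are identical on host and guest; the only entry that differs in the pasted output is l1tf.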
It's possible the spectre_v2 selftest failure does not occur on all Power9 hosts. I'll check on a Boston host.

https://beaker.engineering.redhat.com/jobs/5341726 is still running as I type this, but it has finished the PowerPC selftests in both recipes. Results from the spectre_v2 selftest on the kernel-4.18.0-305.el8 Boston host follow.

    # selftests: powerpc/security: spectre_v2
    # test: spectre_v2
    # tags: git_version:unknown
    # sysfs reports: 'Mitigation: Indirect branch cache disabled, Software link stack flush'
    # PM_BR_PRED_CCACHE: result 2147483649 running/enabled 20192793932
    # PM_BR_MPRED_CCACHE: result 2147483649 running/enabled 20192790826
    # PM_BR_PRED_PCACHE: result 0 running/enabled 20192789488
    # PM_BR_MPRED_PCACHE: result 0 running/enabled 20192788160
    # Miss percent 100 %
    # OK - Measured branch prediction rates match reported spectre v2 mitigation.
    # success: spectre_v2
    ok 3 selftests: powerpc/security: spectre_v2

Results from the spectre_v2 selftest on the kernel-5.12.0-1.el9 KVM guest follow.

    # selftests: powerpc/security: spectre_v2
    # test: spectre_v2
    # tags: git_version:unknown
    # sysfs reports: 'Mitigation: Software count cache flush (hardware accelerated), Software link stack flush'
    # PM_BR_PRED_CCACHE: result 2147483649 running/enabled 19321183346
    # PM_BR_MPRED_CCACHE: result 2147483649 running/enabled 19321178802
    # PM_BR_PRED_PCACHE: result 0 running/enabled 19321177702
    # PM_BR_MPRED_PCACHE: result 0 running/enabled 19321176632
    # Miss percent 100 %
    # Branch misses > 15% unexpected in this configuration!
    # Possible mis-match between reported & actual mitigation
    # failure: spectre_v2
    not ok 3 selftests: powerpc/security: spectre_v2 # exit=1

The KVM guest selftest failure is based on a possible mismatch between the reported and actual mitigation. The measured results are the same on host and guest (Miss percent 100 %), even though the guest reports a different mitigation than the host, suggesting that the host mitigation takes effect even in the guest.
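The difference between the two verdicts can be illustrated with a short Python sketch. This is a simplified reconstruction inferred from the output above, not the selftest's actual source. The 15% limit comes from the guest's failure message. The 95% floor for the "cache disabled" case and the assumption that the miss percentage is the ratio of the two CCACHE counters are guesses made for illustration.

```python
def miss_percent(pred, mpred):
    """Mispredicted count-cache branches as a percentage of predicted ones.

    Assumption: the selftest's 'Miss percent' is PM_BR_MPRED_CCACHE
    divided by PM_BR_PRED_CCACHE; this is consistent with the logs
    (equal counters, 100 %), but is not confirmed by them.
    """
    return 100.0 * mpred / pred


def matches_reported_mitigation(sysfs_state, pct):
    """Return True if the measured miss rate fits the reported mitigation."""
    if "Indirect branch cache disabled" in sysfs_state:
        # Assumed threshold: with the cache disabled, nearly every
        # indirect branch should be mispredicted.
        return pct > 95
    if "count cache flush" in sysfs_state:
        # From the guest log: "Branch misses > 15% unexpected".
        return pct <= 15
    raise ValueError(f"state not covered by this sketch: {sysfs_state!r}")


# sysfs strings and counter values copied from the two selftest runs.
HOST_STATE = "Mitigation: Indirect branch cache disabled, Software link stack flush"
GUEST_STATE = ("Mitigation: Software count cache flush (hardware accelerated), "
               "Software link stack flush")

if __name__ == "__main__":
    pct = miss_percent(2147483649, 2147483649)
    print("host ok:", matches_reported_mitigation(HOST_STATE, pct))
    print("guest ok:", matches_reported_mitigation(GUEST_STATE, pct))
```

Under these assumptions the same 100 % measurement passes on the host, which reports the cache as disabled, and fails in the guest, which reports a count cache flush. That fits the reading that the guest is measuring the host's mitigation rather than its own.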
It's not clear whether this selftest reaches an erroneous conclusion in a KVM guest, or qemu somehow needs a better emulation of this vulnerability and its mitigation. My offhand guess is that this selftest does not meaningfully assess the vulnerability (I see no mention of an actual exploit).

The /sys/devices/system/cpu/vulnerabilities/spectre_v2 contents reported by the /kernel/security/vulnerabilities task in each recipe match the 'sysfs reports' in the selftest task output.

I still need to open a bug for this selftest, if only to track its failures in the Beaker task.

Bug 1959652 is the bug mentioned in comment 8.

I've talked a bit with Mike Ellerman about this issue and he gave me a few ideas of what might be going wrong. I'll take a look.

Daniel, the bug specifically for the spectre problem is bug 1959652; this one is more of an overall tracking bug.

Since this is TestOnly, moving to ON_QA.

https://beaker.engineering.redhat.com/jobs/5787084 ran regression tests on a RHEL9 KVM guest with kernel-5.14.0-1.el9 hosted by a RHEL8 Boston system with kernel-4.18.0-339.el8. The only failure among those regression tests generated the following output.
    >>> 566:mbind01 FAIL <<<
    ###############################################################################
    # Test Num : 566 #
    # Test Case : mbind01 #
    # Test Result : FAIL #
    ###############################################################################
    <<<test_start>>>
    tag=mbind01 stime=1631151913
    cmdline="mbind01"
    contacts=""
    analysis=exit
    <<<test_output>>>
    tst_test.c:1311: TINFO: Timeout per run is 0h 05m 00s
    mbind01.c:169: TINFO: case MPOL_DEFAULT
    mbind01.c:216: TPASS: Test passed
    mbind01.c:169: TINFO: case MPOL_DEFAULT (target exists)
    mbind01.c:216: TPASS: Test passed
    mbind01.c:169: TINFO: case MPOL_BIND (no target)
    mbind01.c:216: TPASS: Test passed
    mbind01.c:169: TINFO: case MPOL_BIND
    mbind01.c:216: TPASS: Test passed
    mbind01.c:169: TINFO: case MPOL_INTERLEAVE (no target)
    mbind01.c:216: TPASS: Test passed
    mbind01.c:169: TINFO: case MPOL_INTERLEAVE
    mbind01.c:216: TPASS: Test passed
    mbind01.c:169: TINFO: case MPOL_PREFERRED (no target)
    mbind01.c:187: TFAIL: Wrong policy: 1, expected: 4
    mbind01.c:169: TINFO: case MPOL_PREFERRED
    mbind01.c:216: TPASS: Test passed
    mbind01.c:169: TINFO: case UNKNOWN_POLICY
    mbind01.c:216: TPASS: Test passed
    mbind01.c:169: TINFO: case MPOL_DEFAULT (invalid flags)
    mbind01.c:216: TPASS: Test passed
    mbind01.c:169: TINFO: case MPOL_PREFERRED (invalid nodemask)
    mbind01.c:216: TPASS: Test passed

    Summary:
    passed 10
    failed 1
    broken 0
    skipped 0
    warnings 0
    <<<execution_status>>>
    initiation_status="ok"
    duration=0 termination_type=exited termination_id=1 corefile=no
    cutime=0 cstime=1
    <<<test_end>>>

The absence of https://github.com/linux-test-project/ltp/commit/d96afad72f2c in the most recent LTP release explains this failure, which should not occur after the Beaker task uses a release with that commit (such a release does not yet exist at this writing).

Moving to VERIFIED based on these results. |
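The numeric policies in the mbind01 TFAIL line above can be decoded with a small Python sketch. The mode values are the ones defined in the kernel's uapi `linux/mempolicy.h`; the `Mpol` enum and `decode_tfail` helper are illustrative names only and are not part of LTP.

```python
import re
from enum import IntEnum


class Mpol(IntEnum):
    """NUMA memory policy modes, numbered as in linux/mempolicy.h."""
    MPOL_DEFAULT = 0
    MPOL_PREFERRED = 1
    MPOL_BIND = 2
    MPOL_INTERLEAVE = 3
    MPOL_LOCAL = 4


# Matches the failure format seen in the mbind01 output in this bug.
TFAIL_RE = re.compile(r"TFAIL: Wrong policy: (\d+), expected: (\d+)")


def decode_tfail(line):
    """Return (got, expected) policy names, or None if the line does not match."""
    match = TFAIL_RE.search(line)
    if match is None:
        return None
    got, expected = (Mpol(int(value)).name for value in match.groups())
    return got, expected


if __name__ == "__main__":
    print(decode_tfail("mbind01.c:187: TFAIL: Wrong policy: 1, expected: 4"))
```

For the logged line this gives MPOL_PREFERRED as the policy read back and MPOL_LOCAL as the policy the test expected, for the "MPOL_PREFERRED (no target)" case. The sketch only names the two values; per the comment above, the expectation itself is what the referenced LTP commit addresses.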