Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1948363

Summary:	RHEL9 ppc64le guests should work under RHEL8 KVM ppc64le hosts
Product:	Red Hat Enterprise Linux 9	Reporter:	David Gibson <dgibson>
Component:	kernel	Assignee:	Daniel Henrique Barboza (IBM) <dbarboza>
kernel sub component:	Platform Enablement	QA Contact:	Eirik Fuller <efuller>
Status:	CLOSED CURRENTRELEASE	Docs Contact:
Severity:	high
Priority:	high	CC:	bstinson, carl, dbarboza, efuller, jwboyer, mdeng, mrezanin, ngu, qzhang, rvr, xuma
Version:	CentOS Stream	Keywords:	TestOnly, Triaged
Target Milestone:	beta	Flags:	pm-rhel: mirror+
Target Release:	---
Hardware:	ppc64le
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-01-12 11:26:32 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1959652
Bug Blocks:

Description David Gibson 2021-04-12 04:16:26 UTC

Description of problem:

Although we're dropping support for KVM on ppc64le in RHEL9, RHEL9 guests should still work under existing RHEL8 KVM hosts.


Additional info:

In theory this shouldn't require anything: the guest environment for RHEL8 KVM hosts should be almost the same as for PowerVM LPARs, which is our primary target.  We just need to verify and address any KVM specific issues that arise.

Comment 1 Eirik Fuller 2021-05-04 01:46:04 UTC

Initial test results (in https://beaker.engineering.redhat.com/jobs/5336764) are encouraging.

A Witherspoon system running kernel-4.18.0-305.el8 (RHEL-8.4.0-20210503.1) hosted kernel-5.12.0-0.rc8.193.el9 (RHEL-9.0.0-20210428.3) on a couple of KVM guests, with assorted failures in the PowerPC selftests and a couple of perf testsuites.

The only untracked PowerPC selftest failure was 'powerpc/security: spectre_v2' (I'll open a bug if I don't find an existing bug which tracks that failure). I'll check the perf test failures to see if any of those need new bugs, and I'll add more tasks in subsequent runs.

Comment 2 David Gibson 2021-05-04 05:17:32 UTC

Eirik, what KVM userspace (qemu, libvirt) are you using for those tests?

Comment 3 Eirik Fuller 2021-05-04 13:03:48 UTC

The host recipe (https://beaker.engineering.redhat.com/recipes/9935001) of the job linked in comment 1 uses /distribution/virt/install and /distribution/virt/start; the former installs libvirt (though it was already installed due to a package tag in the job XML; perhaps that part of the job XML is redundant).

In short, the tests described in comment 1 used libvirt, not qemu. If I find a way to test with qemu, I'll log the results here.

Comment 4 David Gibson 2021-05-05 01:06:40 UTC

You're always using qemu, even if you're doing it via libvirt.  What I'm after is the versions of libvirt and qemu that were actually installed in this test run.

I'm suspecting it's the RHEL-8.4 versions rather than the RHEL-AV-8.4 versions, and the latest Spectre mitigations might only be in the latter.

Comment 5 Eirik Fuller 2021-05-05 01:48:36 UTC

http://download.eng.bos.redhat.com/rhel-8/rel-eng/RHEL-8/RHEL-8.4.0-20210503.1/compose/AppStream/ppc64le/os/Packages/ (used in the comment 1 Beaker host recipe) has libvirt-6.0.0-35.module+el8.4.0+10230+7a9b21e4

Comment 6 David Gibson 2021-05-05 03:27:28 UTC

And more importantly qemu-kvm-4.2.0-48.module+el8.4.0+10368+630e803b.ppc64le.rpm.

Hmm.. can you check the contents of /sys/devices/system/cpu/vulnerabilities/spectre_v2 on both the RHEL8 host and the RHEL9 guest?

Comment 7 Eirik Fuller 2021-05-05 12:44:41 UTC

https://beaker.engineering.redhat.com/recipes/9941957/tasks/125542992/logs/taskout.log (from the /kernel/security/vulnerabilities task on the kernel-4.18.0-305.el8 Witherspoon host) includes the following output.


:: [ 23:18:03 ] :: [  BEGIN   ] :: Log vulnerabilities status :: actually running 'grep . * | sed 's/:/^/' | column -t -s^ | tee -a /mnt/testarea/tmp.x3aorO'
itlb_multihit      Not affected
l1tf               Not affected
mds                Not affected
meltdown           Mitigation: RFI Flush, L1D private per thread
spec_store_bypass  Mitigation: Kernel entry/exit barrier (eieio)
spectre_v1         Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
spectre_v2         Mitigation: Software count cache flush (hardware accelerated), Software link stack flush
srbds              Not affected
tsx_async_abort    Not affected
:: [ 23:18:03 ] :: [   PASS   ] :: Log vulnerabilities status (Expected 0, got 0)


https://beaker.engineering.redhat.com/recipes/9941958/tasks/125543026/logs/taskout.log (from the /kernel/security/vulnerabilities task on the kernel-5.12.0-1.el9 KVM guest) includes the following output.


:: [ 23:34:33 ] :: [  BEGIN   ] :: Log vulnerabilities status :: actually running 'grep . * | sed 's/:/^/' | column -t -s^ | tee -a /mnt/testarea/tmp.hKWQDP'
itlb_multihit      Not affected
l1tf               Mitigation: RFI Flush, L1D private per thread
mds                Not affected
meltdown           Mitigation: RFI Flush, L1D private per thread
spec_store_bypass  Mitigation: Kernel entry/exit barrier (eieio)
spectre_v1         Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
spectre_v2         Mitigation: Software count cache flush (hardware accelerated), Software link stack flush
srbds              Not affected
tsx_async_abort    Not affected
:: [ 23:34:33 ] :: [   PASS   ] :: Log vulnerabilities status (Expected 0, got 0)


Both host and guest report a mitigation for spectre_v2, so it seems either the mitigation is ineffective, or the spectre_v2 selftest is flawed. Either way the spectre_v2 selftest failure does not seem like the result of a qemu issue.

It's possible the spectre_v2 selftest failure does not occur on all Power9 hosts. I'll check on a Boston host.

Comment 8 Eirik Fuller 2021-05-05 16:54:21 UTC

https://beaker.engineering.redhat.com/jobs/5341726 is still running as I type this, but it has finished the PowerPC selftests in both recipes.

Results from the spectre_v2 selftest on the kernel-4.18.0-305.el8 Boston host follow.


# selftests: powerpc/security: spectre_v2
# test: spectre_v2
# tags: git_version:unknown
# sysfs reports: 'Mitigation: Indirect branch cache disabled, Software link stack flush'
#  PM_BR_PRED_CCACHE: result 2147483649 running/enabled 20192793932
# PM_BR_MPRED_CCACHE: result 2147483649 running/enabled 20192790826
#  PM_BR_PRED_PCACHE: result          0 running/enabled 20192789488
# PM_BR_MPRED_PCACHE: result          0 running/enabled 20192788160
# Miss percent 100 %
# OK - Measured branch prediction rates match reported spectre v2 mitigation.
# success: spectre_v2
ok 3 selftests: powerpc/security: spectre_v2


Results from the spectre_v2 selftest on the kernel-5.12.0-1.el9 KVM guest follows.


# selftests: powerpc/security: spectre_v2
# test: spectre_v2
# tags: git_version:unknown
# sysfs reports: 'Mitigation: Software count cache flush (hardware accelerated), Software link stack flush'
#  PM_BR_PRED_CCACHE: result 2147483649 running/enabled 19321183346
# PM_BR_MPRED_CCACHE: result 2147483649 running/enabled 19321178802
#  PM_BR_PRED_PCACHE: result          0 running/enabled 19321177702
# PM_BR_MPRED_PCACHE: result          0 running/enabled 19321176632
# Miss percent 100 %
# Branch misses > 15% unexpected in this configuration!
# Possible mis-match between reported & actual mitigation
# failure: spectre_v2
not ok 3 selftests: powerpc/security: spectre_v2 # exit=1


The KVM selftest failure is based on a possible mismatch between the reported and actual mitigation. The results are similar across host and guest (Miss percent 100 %), suggesting that the host mitigation takes effect even in the guest.

It's not clear whether this selftest reaches an erroneous conclusion in a KVM guest, or qemu somehow needs a better emulation of this vulnerability and its mitigation. My offhand guess is that this selftest does not meaningfully assess the vulnerability (I see no mention of an actual exploit).

The /sys/devices/system/cpu/vulnerabilities/spectre_v2 contents reported by the /kernel/security/vulnerabilities task in each recipe match the 'sysfs reports' in the selftest task output.

I still need to open a bug for this selftest, if only to track its failures in the Beaker task.

Comment 9 Eirik Fuller 2021-05-12 03:00:47 UTC

Bug 1959652 is the bug mentioned in comment 8.

Comment 10 Daniel Henrique Barboza (IBM) 2021-05-30 23:36:40 UTC

I've talked a bit with Mike Ellerman about this issue and he gave me a few ideas of
what might be going wrong. I'll take a look.

Comment 11 David Gibson 2021-06-01 02:12:09 UTC

Daniel, the bug specifically for the spectre problem is bug 1959652, this is more an overall tracking one.

Comment 12 David Gibson 2021-07-12 05:14:13 UTC

Since this is TestOnly, moving to ON_QA.

Comment 13 Eirik Fuller 2021-09-09 14:05:43 UTC

https://beaker.engineering.redhat.com/jobs/5787084 ran regression tests on a RHEL9 KVM guest with kernel-5.14.0-1.el9 hosted by a RHEL8 Boston system with kernel-4.18.0-339.el8.

The only failure among those regression tests generated the following output.


>>> 566:mbind01 FAIL <<<
###############################################################################
# Test Num    : 566                                                           #
# Test Case   : mbind01                                                       #
# Test Result : FAIL                                                          #
###############################################################################

     1	<<<test_start>>>
     2	tag=mbind01 stime=1631151913
     3	cmdline="mbind01"
     4	contacts=""
     5	analysis=exit
     6	<<<test_output>>>
     7	tst_test.c:1311: TINFO: Timeout per run is 0h 05m 00s
     8	mbind01.c:169: TINFO: case MPOL_DEFAULT
     9	mbind01.c:216: TPASS: Test passed
    10	mbind01.c:169: TINFO: case MPOL_DEFAULT (target exists)
    11	mbind01.c:216: TPASS: Test passed
    12	mbind01.c:169: TINFO: case MPOL_BIND (no target)
    13	mbind01.c:216: TPASS: Test passed
    14	mbind01.c:169: TINFO: case MPOL_BIND
    15	mbind01.c:216: TPASS: Test passed
    16	mbind01.c:169: TINFO: case MPOL_INTERLEAVE (no target)
    17	mbind01.c:216: TPASS: Test passed
    18	mbind01.c:169: TINFO: case MPOL_INTERLEAVE
    19	mbind01.c:216: TPASS: Test passed
    20	mbind01.c:169: TINFO: case MPOL_PREFERRED (no target)
    21	mbind01.c:187: TFAIL: Wrong policy: 1, expected: 4
    22	mbind01.c:169: TINFO: case MPOL_PREFERRED
    23	mbind01.c:216: TPASS: Test passed
    24	mbind01.c:169: TINFO: case UNKNOWN_POLICY
    25	mbind01.c:216: TPASS: Test passed
    26	mbind01.c:169: TINFO: case MPOL_DEFAULT (invalid flags)
    27	mbind01.c:216: TPASS: Test passed
    28	mbind01.c:169: TINFO: case MPOL_PREFERRED (invalid nodemask)
    29	mbind01.c:216: TPASS: Test passed
    30	
    31	Summary:
    32	passed   10
    33	failed   1
    34	broken   0
    35	skipped  0
    36	warnings 0
    37	<<<execution_status>>>
    38	initiation_status="ok"
    39	duration=0 termination_type=exited termination_id=1 corefile=no
    40	cutime=0 cstime=1
    41	<<<test_end>>>


The absence of https://github.com/linux-test-project/ltp/commit/d96afad72f2c in the most recent LTP release explains this failure, which should not occur after the Beaker task uses a release with that commit (such a release does not yet exist at this writing).

Moving to VERIFIED based on these results.