RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1948363 - RHEL9 ppc64le guests should work under RHEL8 KVM ppc64le hosts
Summary: RHEL9 ppc64le guests should work under RHEL8 KVM ppc64le hosts
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: CentOS Stream
Hardware: ppc64le
OS: Unspecified
high
high
Target Milestone: beta
: ---
Assignee: Daniel Henrique Barboza (IBM)
QA Contact: Eirik Fuller
URL:
Whiteboard:
Depends On: 1959652
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-04-12 04:16 UTC by David Gibson
Modified: 2022-01-12 11:41 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-12 11:26:32 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description David Gibson 2021-04-12 04:16:26 UTC
Description of problem:

Although we're dropping support for KVM on ppc64le in RHEL9, RHEL9 guests should still work under existing RHEL8 KVM hosts.


Additional info:

In theory this shouldn't require anything: the guest environment for RHEL8 KVM hosts should be almost the same as for PowerVM LPARs, which is our primary target.  We just need to verify and address any KVM specific issues that arise.

Comment 1 Eirik Fuller 2021-05-04 01:46:04 UTC
Initial test results (in https://beaker.engineering.redhat.com/jobs/5336764) are encouraging.

A Witherspoon system running kernel-4.18.0-305.el8 (RHEL-8.4.0-20210503.1) hosted kernel-5.12.0-0.rc8.193.el9 (RHEL-9.0.0-20210428.3) on a couple of KVM guests, with assorted failures in the PowerPC selftests and a couple of perf testsuites.

The only untracked PowerPC selftest failure was 'powerpc/security: spectre_v2' (I'll open a bug if I don't find an existing bug which tracks that failure). I'll check the perf test failures to see if any of those need new bugs, and I'll add more tasks in subsequent runs.

Comment 2 David Gibson 2021-05-04 05:17:32 UTC
Eirik, what KVM userspace (qemu, libvirt) are you using for those tests?

Comment 3 Eirik Fuller 2021-05-04 13:03:48 UTC
The host recipe (https://beaker.engineering.redhat.com/recipes/9935001) of the job linked in comment 1 uses /distribution/virt/install and /distribution/virt/start; the former installs libvirt (though it was already installed due to a package tag in the job XML; perhaps that part of the job XML is redundant).

In short, the tests described in comment 1 used libvirt, not qemu. If I find a way to test with qemu, I'll log the results here.

Comment 4 David Gibson 2021-05-05 01:06:40 UTC
You're always using qemu, even if you're doing it via libvirt.  What I'm after is the versions of libvirt and qemu that were actually installed in this test run.

I'm suspecting it's the RHEL-8.4 versions rather than the RHEL-AV-8.4 versions, and the latest Spectre mitigations might only be in the latter.

Comment 5 Eirik Fuller 2021-05-05 01:48:36 UTC
http://download.eng.bos.redhat.com/rhel-8/rel-eng/RHEL-8/RHEL-8.4.0-20210503.1/compose/AppStream/ppc64le/os/Packages/ (used in the comment 1 Beaker host recipe) has libvirt-6.0.0-35.module+el8.4.0+10230+7a9b21e4

Comment 6 David Gibson 2021-05-05 03:27:28 UTC
And more importantly qemu-kvm-4.2.0-48.module+el8.4.0+10368+630e803b.ppc64le.rpm.

Hmm.. can you check the contents of /sys/devices/system/cpu/vulnerabilities/spectre_v2 on both the RHEL8 host and the RHEL9 guest?

Comment 7 Eirik Fuller 2021-05-05 12:44:41 UTC
https://beaker.engineering.redhat.com/recipes/9941957/tasks/125542992/logs/taskout.log (from the /kernel/security/vulnerabilities task on the kernel-4.18.0-305.el8 Witherspoon host) includes the following output.


:: [ 23:18:03 ] :: [  BEGIN   ] :: Log vulnerabilities status :: actually running 'grep . * | sed 's/:/^/' | column -t -s^ | tee -a /mnt/testarea/tmp.x3aorO'
itlb_multihit      Not affected
l1tf               Not affected
mds                Not affected
meltdown           Mitigation: RFI Flush, L1D private per thread
spec_store_bypass  Mitigation: Kernel entry/exit barrier (eieio)
spectre_v1         Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
spectre_v2         Mitigation: Software count cache flush (hardware accelerated), Software link stack flush
srbds              Not affected
tsx_async_abort    Not affected
:: [ 23:18:03 ] :: [   PASS   ] :: Log vulnerabilities status (Expected 0, got 0)


https://beaker.engineering.redhat.com/recipes/9941958/tasks/125543026/logs/taskout.log (from the /kernel/security/vulnerabilities task on the kernel-5.12.0-1.el9 KVM guest) includes the following output.


:: [ 23:34:33 ] :: [  BEGIN   ] :: Log vulnerabilities status :: actually running 'grep . * | sed 's/:/^/' | column -t -s^ | tee -a /mnt/testarea/tmp.hKWQDP'
itlb_multihit      Not affected
l1tf               Mitigation: RFI Flush, L1D private per thread
mds                Not affected
meltdown           Mitigation: RFI Flush, L1D private per thread
spec_store_bypass  Mitigation: Kernel entry/exit barrier (eieio)
spectre_v1         Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
spectre_v2         Mitigation: Software count cache flush (hardware accelerated), Software link stack flush
srbds              Not affected
tsx_async_abort    Not affected
:: [ 23:34:33 ] :: [   PASS   ] :: Log vulnerabilities status (Expected 0, got 0)


Both host and guest report a mitigation for spectre_v2, so it seems either the mitigation is ineffective, or the spectre_v2 selftest is flawed. Either way the spectre_v2 selftest failure does not seem like the result of a qemu issue.

It's possible the spectre_v2 selftest failure does not occur on all Power9 hosts. I'll check on a Boston host.

Comment 8 Eirik Fuller 2021-05-05 16:54:21 UTC
https://beaker.engineering.redhat.com/jobs/5341726 is still running as I type this, but it has finished the PowerPC selftests in both recipes.

Results from the spectre_v2 selftest on the kernel-4.18.0-305.el8 Boston host follow.


# selftests: powerpc/security: spectre_v2
# test: spectre_v2
# tags: git_version:unknown
# sysfs reports: 'Mitigation: Indirect branch cache disabled, Software link stack flush'
#  PM_BR_PRED_CCACHE: result 2147483649 running/enabled 20192793932
# PM_BR_MPRED_CCACHE: result 2147483649 running/enabled 20192790826
#  PM_BR_PRED_PCACHE: result          0 running/enabled 20192789488
# PM_BR_MPRED_PCACHE: result          0 running/enabled 20192788160
# Miss percent 100 %
# OK - Measured branch prediction rates match reported spectre v2 mitigation.
# success: spectre_v2
ok 3 selftests: powerpc/security: spectre_v2


Results from the spectre_v2 selftest on the kernel-5.12.0-1.el9 KVM guest follows.


# selftests: powerpc/security: spectre_v2
# test: spectre_v2
# tags: git_version:unknown
# sysfs reports: 'Mitigation: Software count cache flush (hardware accelerated), Software link stack flush'
#  PM_BR_PRED_CCACHE: result 2147483649 running/enabled 19321183346
# PM_BR_MPRED_CCACHE: result 2147483649 running/enabled 19321178802
#  PM_BR_PRED_PCACHE: result          0 running/enabled 19321177702
# PM_BR_MPRED_PCACHE: result          0 running/enabled 19321176632
# Miss percent 100 %
# Branch misses > 15% unexpected in this configuration!
# Possible mis-match between reported & actual mitigation
# failure: spectre_v2
not ok 3 selftests: powerpc/security: spectre_v2 # exit=1


The KVM selftest failure is based on a possible mismatch between the reported and actual mitigation. The results are similar across host and guest (Miss percent 100 %), suggesting that the host mitigation takes effect even in the guest.

It's not clear whether this selftest reaches an erroneous conclusion in a KVM guest, or qemu somehow needs a better emulation of this vulnerability and its mitigation. My offhand guess is that this selftest does not meaningfully assess the vulnerability (I see no mention of an actual exploit).

The /sys/devices/system/cpu/vulnerabilities/spectre_v2 contents reported by the /kernel/security/vulnerabilities task in each recipe match the 'sysfs reports' in the selftest task output.

I still need to open a bug for this selftest, if only to track its failures in the Beaker task.

Comment 9 Eirik Fuller 2021-05-12 03:00:47 UTC
Bug 1959652 is the bug mentioned in comment 8.

Comment 10 Daniel Henrique Barboza (IBM) 2021-05-30 23:36:40 UTC
I've talked a bit with Mike Ellerman about this issue and he gave me a few ideas of
what might be going wrong. I'll take a look.

Comment 11 David Gibson 2021-06-01 02:12:09 UTC
Daniel, the bug specifically for the spectre problem is bug 1959652, this is more an overall tracking one.

Comment 12 David Gibson 2021-07-12 05:14:13 UTC
Since this is TestOnly, moving to ON_QA.

Comment 13 Eirik Fuller 2021-09-09 14:05:43 UTC
https://beaker.engineering.redhat.com/jobs/5787084 ran regression tests on a RHEL9 KVM guest with kernel-5.14.0-1.el9 hosted by a RHEL8 Boston system with kernel-4.18.0-339.el8.

The only failure among those regression tests generated the following output.


>>> 566:mbind01 FAIL <<<
###############################################################################
# Test Num    : 566                                                           #
# Test Case   : mbind01                                                       #
# Test Result : FAIL                                                          #
###############################################################################

     1	<<<test_start>>>
     2	tag=mbind01 stime=1631151913
     3	cmdline="mbind01"
     4	contacts=""
     5	analysis=exit
     6	<<<test_output>>>
     7	tst_test.c:1311: TINFO: Timeout per run is 0h 05m 00s
     8	mbind01.c:169: TINFO: case MPOL_DEFAULT
     9	mbind01.c:216: TPASS: Test passed
    10	mbind01.c:169: TINFO: case MPOL_DEFAULT (target exists)
    11	mbind01.c:216: TPASS: Test passed
    12	mbind01.c:169: TINFO: case MPOL_BIND (no target)
    13	mbind01.c:216: TPASS: Test passed
    14	mbind01.c:169: TINFO: case MPOL_BIND
    15	mbind01.c:216: TPASS: Test passed
    16	mbind01.c:169: TINFO: case MPOL_INTERLEAVE (no target)
    17	mbind01.c:216: TPASS: Test passed
    18	mbind01.c:169: TINFO: case MPOL_INTERLEAVE
    19	mbind01.c:216: TPASS: Test passed
    20	mbind01.c:169: TINFO: case MPOL_PREFERRED (no target)
    21	mbind01.c:187: TFAIL: Wrong policy: 1, expected: 4
    22	mbind01.c:169: TINFO: case MPOL_PREFERRED
    23	mbind01.c:216: TPASS: Test passed
    24	mbind01.c:169: TINFO: case UNKNOWN_POLICY
    25	mbind01.c:216: TPASS: Test passed
    26	mbind01.c:169: TINFO: case MPOL_DEFAULT (invalid flags)
    27	mbind01.c:216: TPASS: Test passed
    28	mbind01.c:169: TINFO: case MPOL_PREFERRED (invalid nodemask)
    29	mbind01.c:216: TPASS: Test passed
    30	
    31	Summary:
    32	passed   10
    33	failed   1
    34	broken   0
    35	skipped  0
    36	warnings 0
    37	<<<execution_status>>>
    38	initiation_status="ok"
    39	duration=0 termination_type=exited termination_id=1 corefile=no
    40	cutime=0 cstime=1
    41	<<<test_end>>>


The absence of https://github.com/linux-test-project/ltp/commit/d96afad72f2c in the most recent LTP release explains this failure, which should not occur after the Beaker task uses a release with that commit (such a release does not yet exist at this writing).

Moving to VERIFIED based on these results.


Note You need to log in before you can comment on or make changes to this bug.