Bug 1532553
Summary: libvirt cmt/mbmt/mbml perf events cannot be set on hosts which should support it
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.5
Status: CLOSED CANTFIX
Severity: high
Priority: high
Reporter: yalzhang <yalzhang>
Assignee: Libvirt Maintainers <libvirt-maint>
QA Contact: yalzhang <yalzhang>
Docs Contact: Jiri Herrmann <jherrman>
CC: acme, dzheng, jdenemar, jherrman, jinqi, jishao, lhuang, mcermak, rbalakri, skozina, xuzhang, yafu
Target Milestone: rc
Target Release: ---
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
Doc Type: Known Issue
Doc Text:
Guests reporting `cmt`, `mbmt`, or `mbml` perf events fail to boot.
If a guest virtual machine is set to report `cmt`, `mbmt`, or `mbml` perf events, it is unable to boot after the host is upgraded to Red Hat Enterprise Linux 7.5. To work around this problem, disable this setting by removing the lines that contain "event name='cmt'", "event name='mbmt'", or "event name='mbml'" from the `<perf>` section of the domain XML configuration file.
Clones: 1542901 (view as bug list)
Last Closed: 2018-02-07 13:19:10 UTC
Type: Bug
Bug Blocks: 1539427, 1542901
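The workaround in the Doc Text can be scripted as a small shell helper. This is a sketch, not an official tool: `strip_cqm_events` is an illustrative name, and in practice you would run it between `virsh dumpxml --inactive <domain>` and `virsh define`.

```shell
# Sketch of the documented workaround: delete the cmt/mbmt/mbml <event>
# lines from a saved domain XML file. strip_cqm_events is an illustrative
# helper name, not part of libvirt.
strip_cqm_events() {
    sed -i -e "/event name='cmt'/d" \
           -e "/event name='mbmt'/d" \
           -e "/event name='mbml'/d" "$1"
}

# Typical use (virsh steps shown for illustration):
#   virsh dumpxml --inactive instance-00000008 > /tmp/instance.xml
#   strip_cqm_events /tmp/instance.xml
#   virsh define /tmp/instance.xml
```

Other `<perf>` events in the XML are left untouched; only the three events whose kernel interface was removed are dropped.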
Description (yalzhang@redhat.com, 2018-01-09 10:23:27 UTC)
Clearly this works with kernel 3.10.0-693.el7, while it does not work with 3.10.0-823.el7, and /sys/devices/intel_cqm/type is missing on that host.

I think this will break basic VM functionality on RHOS 11 + RHEL 7.5 hosts and RHOS 12 + RHEL 7.5 hosts, since the perf event is added to the guest XML if libvirt reports cmt/mbml/mbmt in the supported CPU event list in the host capabilities. If customers update a nova compute node to RHEL 7.5, guests that have the cmt/mbml/mbmt events enabled will fail to start/restore/migrate after the upgrade.

We are preparing an RHOS test environment to try to reproduce this issue on RHOS, and we will update with the result after testing finishes.

Updating the priority and severity to high/high based on the above comments.

I tried this with RHOS 12 + a RHEL 7.5 host (3.10.0-842.el7.x86_64); the instance can be started successfully. RHOS does not enable perf events by default. If they are enabled in /etc/nova/nova.conf with "enabled_perf_events = cmt", the instance can also be started successfully.

# openstack server list
+--------------------------------------+-------------+--------+--------------------+----------+--------+
| ID                                   | Name        | Status | Networks           | Image    | Flavor |
+--------------------------------------+-------------+--------+--------------------+----------+--------+
| c64e8620-0b55-46e3-97e4-546c73701123 | vm-r7-qcow2 | ACTIVE | net1=192.168.32.10 | r7-qcow2 | m2     |
+--------------------------------------+-------------+--------+--------------------+----------+--------+

Check the nova-compute.log:

2018-02-01 22:26:02.450 114776 WARNING nova.virt.libvirt.driver [-] Host does not support event type cmt.

After communicating with yalzhang, we suppose that if the guest was created in an environment with RHOS and RHEL 7.4, and the kernel is then updated to RHEL 7.5, it may hit the error the description shows.

I also tried RHOS 12 + a RHEL 7.5 host (3.10.0-693.el7.x86_64) and created an instance with cmt successfully.
# openstack server list
+--------------------------------------+-------------+--------+-------------------+----------+--------+
| ID                                   | Name        | Status | Networks          | Image    | Flavor |
+--------------------------------------+-------------+--------+-------------------+----------+--------+
| 4fe10c00-0abe-466e-aeb8-f1b304135c32 | vm-r7-qcow2 | ACTIVE | net1=192.168.32.4 | r7-qcow2 | m2     |
+--------------------------------------+-------------+--------+-------------------+----------+--------+

# virsh dumpxml instance-00000008 | grep perf -B5
  <perf>
    <event name='cmt' enabled='yes'/>
  </perf>

I shut down the instance, updated the kernel to 3.10.0-843.el7.x86_64 along with the libvirt and qemu-kvm-rhev versions, and then checked the perf info. It still exists in the XML of the instance; when I try to start the instance, it fails. But after restarting openstack-nova-compute.service, the perf info is deleted from the XML of the instance and it can be started successfully.

# virsh dumpxml instance-00000008 | grep perf -B10
#

(In reply to Jingjing Shao from comment #7)
> I also try RHOS 12 + RHEL7.5 host(3.10.0-693.el7.x86_64) and create instance
> with cmt successfully.

Sorry, I made a mistake; this was a RHOS 12 + RHEL 7.4 host (3.10.0-693.el7.x86_64).

> But when restart openstack-nova-compute.service, the info about perf will
> delete from the xml of instance and can be started successfully.

But the perf functionality will then be disabled.

The perf cqm interface was removed in 7.5 via BZ1457533. Attached is the reason for this from the RHEL7 commit:

f6d296556b98 [x86] perf/cqm: Wipe out perf based cqm

There is a new interface, RDT, backported to RHEL7, which provides the same functionality correctly; please refer to the following kernel doc (search for "monitor"): Documentation/x86/intel_rdt_ui.txt

jirka

---

'perf cqm' never worked due to the incompatibility between the perf infrastructure and cqm hardware support. The hardware uses RMIDs to track the LLC occupancy of tasks, and these RMIDs are per package. This makes monitoring a hierarchy like a cgroup along with monitoring tasks separately difficult, and several patches sent to lkml to fix this were NACKed. Furthermore, the following issues in the current perf cqm make it almost unusable:

1. No support to monitor the same group of tasks for which we do allocation using resctrl.

2. It gives random and inaccurate data (mostly 0s) once we run out of RMIDs, due to issues in recycling.

3. Recycling results in inaccuracy of data, because we cannot guarantee that the RMID was stolen from a task when it was not pulling data into cache, or even when it pulled the least data. Also, for monitoring llc_occupancy, if we stop using an RMID_x and then start using an RMID_y after we reclaim an RMID from another event, we miss accounting all the occupancy that was tagged to RMID_x at a later perf_count.

4. Recycling code makes the monitoring code complex, including scheduling, because the event can lose its RMID at any time. Since MBM counters count bandwidth over a period of time by taking snapshots of total bytes at two different times, recycling complicates the way we count MBM in a hierarchy. Also, we need a spin lock while we do the processing to account for MBM counter overflow, and we currently use a spin lock in scheduling to prevent the RMID from being taken away.

5. Lack of support when we run different kinds of events, like task, system-wide, and cgroup events, together. Data mostly prints 0s. This is also because we can have only one RMID tied to a CPU, as defined by the cqm hardware, but perf can tie multiple events during one sched_in.

6. No support for monitoring a group of tasks. There is partial support for cgroups, but it does not work once there is a hierarchy of cgroups, or if we want to monitor a task in a cgroup and the cgroup itself.

7. No support for monitoring tasks for their lifetime without perf overhead.

8. It reported the aggregate cache occupancy or memory bandwidth over all sockets, but most cloud and VMM-based use cases want to know the individual per-socket usage.

There is nothing we can do about this if the kernel interface was removed. We can only document the dropped functionality as a known issue in the release notes.
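The kernel-side change described above is visible from userspace: pre-7.5 kernels expose the perf-based interface under /sys/devices/intel_cqm, while 7.5 kernels expose the replacement RDT interface under /sys/fs/resctrl. A minimal detection sketch follows; the function name and the overridable root argument are illustrative, added so the check can be exercised against a test directory.

```shell
# Report which cache-monitoring interface a kernel exposes. The optional
# first argument overrides the filesystem root (useful for testing);
# by default the real / is inspected.
cqm_interface() {
    root="${1:-}"
    if [ -e "$root/sys/devices/intel_cqm/type" ]; then
        echo perf-cqm      # old perf-based interface (removed in RHEL 7.5)
    elif [ -d "$root/sys/fs/resctrl" ]; then
        echo resctrl       # replacement RDT interface (see intel_rdt_ui.txt)
    else
        echo none
    fi
}
```

A management layer could use a check like this to decide whether cmt/mbmt/mbml events in a domain XML can still be honored before attempting to start the guest.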