Bug 1532553
Summary: libvirt cmt/mbmt/mbml perf events cannot be set on hosts which should support it
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.5
Status: CLOSED CANTFIX
Severity: high
Priority: high
Reporter: yalzhang <yalzhang>
Assignee: Libvirt Maintainers <libvirt-maint>
QA Contact: yalzhang <yalzhang>
Docs Contact: Jiri Herrmann <jherrman>
CC: acme, dzheng, jdenemar, jherrman, jinqi, jishao, lhuang, mcermak, rbalakri, skozina, xuzhang, yafu
Target Milestone: rc
Target Release: ---
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
Doc Type: Known Issue
Doc Text:
Guests reporting `cmt`, `mbmt`, or `mbml` perf events fail to boot.
If a guest virtual machine is set to report `cmt`, `mbmt`, or `mbml` perf events, it is unable to boot after the host is upgraded to Red Hat Enterprise Linux 7.5. To work around this problem, disable this setting by removing the lines that contain "event name='cmt'", "event name='mbmt'", or "event name='mbml'" from the `<perf>` section of the domain XML configuration file.
Clones: 1542901 (view as bug list)
Last Closed: 2018-02-07 13:19:10 UTC
Type: Bug
Bug Blocks: 1539427, 1542901
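The workaround in the Doc Text can be scripted as a small shell helper. This is a sketch, not an official tool: `strip_cqm_events` is an illustrative name, and in practice you would run it between `virsh dumpxml --inactive <domain>` and `virsh define`.

```shell
# Sketch of the documented workaround: delete the cmt/mbmt/mbml <event>
# lines from a saved domain XML file. strip_cqm_events is an illustrative
# helper name, not part of libvirt.
strip_cqm_events() {
    sed -i -e "/event name='cmt'/d" \
           -e "/event name='mbmt'/d" \
           -e "/event name='mbml'/d" "$1"
}

# Typical use (virsh steps shown for illustration):
#   virsh dumpxml --inactive instance-00000008 > /tmp/instance.xml
#   strip_cqm_events /tmp/instance.xml
#   virsh define /tmp/instance.xml
```

Other `<perf>` events in the XML are left untouched; only the three events whose kernel interface was removed are dropped.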
Description (yalzhang@redhat.com, 2018-01-09 10:23:27 UTC)
Clearly this works with kernel 3.10.0-693.el7, while it does not work with 3.10.0-823.el7, and /sys/devices/intel_cqm/type is missing on that host.

I think this will break basic VM functionality on RHOS 11 + RHEL 7.5 hosts and RHOS 12 + RHEL 7.5 hosts, since the perf event is added to the guest XML if libvirt reports cmt/mbml/mbmt in the supported CPU event list in the host capabilities. If customers update a nova compute node to RHEL 7.5, guests that have the cmt/mbml/mbmt events enabled will fail to start/restore/migrate after the upgrade.

We are preparing an RHOS test environment to try to reproduce this issue on RHOS, and we will update with the result after testing finishes.

Updating the priority and severity to high/high based on the above comments.

I tried this with RHOS 12 + a RHEL 7.5 host (3.10.0-842.el7.x86_64); the instance can be started successfully. RHOS does not enable perf events by default. If they are enabled in /etc/nova/nova.conf with "enabled_perf_events = cmt", the instance can also be started successfully.

# openstack server list
+--------------------------------------+-------------+--------+--------------------+----------+--------+
| ID                                   | Name        | Status | Networks           | Image    | Flavor |
+--------------------------------------+-------------+--------+--------------------+----------+--------+
| c64e8620-0b55-46e3-97e4-546c73701123 | vm-r7-qcow2 | ACTIVE | net1=192.168.32.10 | r7-qcow2 | m2     |
+--------------------------------------+-------------+--------+--------------------+----------+--------+

Check the nova-compute.log:

2018-02-01 22:26:02.450 114776 WARNING nova.virt.libvirt.driver [-] Host does not support event type cmt.

After communicating with yalzhang, we suppose that if the guest was created in an environment with RHOS and RHEL 7.4, and the kernel is then updated to RHEL 7.5, it may hit the error the description shows.

I also tried RHOS 12 + a RHEL 7.5 host (3.10.0-693.el7.x86_64) and created an instance with cmt successfully.
# openstack server list
+--------------------------------------+-------------+--------+-------------------+----------+--------+
| ID                                   | Name        | Status | Networks          | Image    | Flavor |
+--------------------------------------+-------------+--------+-------------------+----------+--------+
| 4fe10c00-0abe-466e-aeb8-f1b304135c32 | vm-r7-qcow2 | ACTIVE | net1=192.168.32.4 | r7-qcow2 | m2     |
+--------------------------------------+-------------+--------+-------------------+----------+--------+

# virsh dumpxml instance-00000008 | grep perf -B5
  <perf>
    <event name='cmt' enabled='yes'/>
  </perf>

I shut down the instance, updated the kernel to 3.10.0-843.el7.x86_64 along with the libvirt and qemu-kvm-rhev versions, and then checked the perf info. It still exists in the XML of the instance; when I try to start the instance, it fails. But after restarting openstack-nova-compute.service, the perf info is deleted from the XML of the instance and it can be started successfully.

# virsh dumpxml instance-00000008 | grep perf -B10
#

(In reply to Jingjing Shao from comment #7)
> I also try RHOS 12 + RHEL7.5 host(3.10.0-693.el7.x86_64) and create instance
> with cmt successfully.

Sorry, I made a mistake; this was a RHOS 12 + RHEL 7.4 host (3.10.0-693.el7.x86_64).

> But when restart openstack-nova-compute.service, the info about perf will
> delete from the xml of instance and can be started successfully.

But the perf functionality will then be disabled.

The perf cqm interface was removed in 7.5 via BZ1457533. Attached is the reason for this from the RHEL7 commit:

f6d296556b98 [x86] perf/cqm: Wipe out perf based cqm

There is a new interface, RDT, backported to RHEL7, which provides the same functionality correctly; please refer to the following kernel doc (search for "monitor"): Documentation/x86/intel_rdt_ui.txt

jirka

---

'perf cqm' never worked due to the incompatibility between the perf infrastructure and cqm hardware support. The hardware uses RMIDs to track the LLC occupancy of tasks, and these RMIDs are per package. This makes monitoring a hierarchy like a cgroup along with monitoring tasks separately difficult, and several patches sent to lkml to fix this were NACKed. Furthermore, the following issues in the current perf cqm make it almost unusable:

1. No support to monitor the same group of tasks for which we do allocation using resctrl.

2. It gives random and inaccurate data (mostly 0s) once we run out of RMIDs, due to issues in recycling.

3. Recycling results in inaccuracy of data, because we cannot guarantee that the RMID was stolen from a task when it was not pulling data into cache, or even when it pulled the least data. Also, for monitoring llc_occupancy, if we stop using an RMID_x and then start using an RMID_y after we reclaim an RMID from another event, we miss accounting all the occupancy that was tagged to RMID_x at a later perf_count.

4. Recycling code makes the monitoring code complex, including scheduling, because the event can lose its RMID at any time. Since MBM counters count bandwidth over a period of time by taking snapshots of total bytes at two different times, recycling complicates the way we count MBM in a hierarchy. Also, we need a spin lock while we do the processing to account for MBM counter overflow, and we currently use a spin lock in scheduling to prevent the RMID from being taken away.

5. Lack of support when we run different kinds of events, like task, system-wide, and cgroup events, together. Data mostly prints 0s. This is also because we can have only one RMID tied to a CPU, as defined by the cqm hardware, but perf can tie multiple events during one sched_in.

6. No support for monitoring a group of tasks. There is partial support for cgroups, but it does not work once there is a hierarchy of cgroups, or if we want to monitor a task in a cgroup and the cgroup itself.

7. No support for monitoring tasks for their lifetime without perf overhead.

8. It reported the aggregate cache occupancy or memory bandwidth over all sockets, but most cloud and VMM-based use cases want to know the individual per-socket usage.

There is nothing we can do about this if the kernel interface was removed. We can only document the dropped functionality as a known issue in the release notes.
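The kernel-side change described above is visible from userspace: pre-7.5 kernels expose the perf-based interface under /sys/devices/intel_cqm, while 7.5 kernels expose the replacement RDT interface under /sys/fs/resctrl. A minimal detection sketch follows; the function name and the overridable root argument are illustrative, added so the check can be exercised against a test directory.

```shell
# Report which cache-monitoring interface a kernel exposes. The optional
# first argument overrides the filesystem root (useful for testing);
# by default the real / is inspected.
cqm_interface() {
    root="${1:-}"
    if [ -e "$root/sys/devices/intel_cqm/type" ]; then
        echo perf-cqm      # old perf-based interface (removed in RHEL 7.5)
    elif [ -d "$root/sys/fs/resctrl" ]; then
        echo resctrl       # replacement RDT interface (see intel_rdt_ui.txt)
    else
        echo none
    fi
}
```

A management layer could use a check like this to decide whether cmt/mbmt/mbml events in a domain XML can still be honored before attempting to start the guest.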