Bug 1388528
| Summary: | KVM-RT: halting and starting guests cause latency spikes [rhel-rt-7.3.z] | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Marcel Kolaja <mkolaja> |
| Component: | kernel-rt | Assignee: | Clark Williams <williams> |
| kernel-rt sub component: | KVM | QA Contact: | Pei Zhang <pezhang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | bhu, chayang, ggopinat, hhuang, jen, jshortt, juzhang, lcapitulino, mkolaja, mst, pagupta, pbonzini, pezhang, riel, sgordon, sherold, snagar, srostedt, virt-maint, williams, xfu |
| Version: | 7.2 | Keywords: | ZStream |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: |
Cause: synchronize_rcu_expedited() is a call used upstream to speed up RCU synchronization by expediting the grace period.
Consequence: Calling it may hold off realtime operations and cause latency spikes.
Fix: Make the call to synchronize_rcu_expedited() conditional on not running an RT kernel.
Result: No latency spikes are caused by the expedited RCU call.
|
| Story Points: | --- | | |
| Clone Of: | 1378172 | Environment: | |
| Last Closed: | 2016-12-06 17:10:29 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1378172 | ||
| Bug Blocks: | 1353018 | ||
Description
Marcel Kolaja
2016-10-25 14:53:24 UTC
Hi Clark,

QE failed to reproduce this issue with the RHEL 7.3 GA version. The scenarios below were tested but still failed to reproduce it; no spikes occurred during testing (max latencies < 20 µs):

1. Run cyclictest on vm1 for 15 min; reboot/halt/shut down vm2 5 min later.
2. Run cyclictest on vm1, vm2, and vm3 for 15 min; reboot vm2 several times.

However, QE can reproduce this issue with the RHEL 7.2.z version (3.10.0-327.36.1.rt56.237.el7.x86_64). We also tested this bug's fixed version, kernel-rt-3.10.0-514.1.1.rt56.422.el7, and no spike occurred.

Could you give QE some suggestions about this bug? Thanks.

Best Regards,
-Pei

The RHEL 7.3 GA version we tested: 3.10.0-514.rt56.420.el7.x86_64

The RHEL 7.3 GA kernel does have the bug; I think it's just a matter of trying harder to reproduce it. What you could do is:

1. Run cyclictest for longer (e.g. 1 hour).
2. Keep the second VM rebooting in a loop while cyclictest runs on the other VM.

Another note: make sure that the VM that reboots is a "standard" VM, meaning that it has a network NIC, etc. The best way is probably to install it with virt-install and not change the XML.

I talked to Pei Zhang today on IRC, and I think we have found out why the problem is not reproducing. As it turns out, the bug reproduces on halt and restart, not on reboot (as I mentioned in bug 1378172 comment 22). Sorry for having forgotten about that. The reproducer I've been using is:

1. Install a "standard" VM with virt-manager (that is, don't change the XML).
2. In the VM, add "halt -p" to /etc/rc.d/rc.local (save a snapshot before doing this if you plan to use the VM afterwards).
3. On the host, write a script that runs "virsh start VM" every few seconds in a loop.

Then, while this is running, run the cyclictest test case in the RT VM.

Thanks Luiz for providing the detailed reproduction method.

==Reproduce==
Versions: RHEL 7.3 GA: 3.10.0-514.rt56.420.el7.x86_64
Steps: Same as Comment 9, and run cyclictest in the RT VM for 1 hour.
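Step 3 of the reproducer above only says to script "virsh start VM" in a loop. The following is a minimal Python sketch of such a host-side loop, assuming a guest that powers itself off via "halt -p" in rc.local as in step 2. The function name start_loop, its parameters, the 5-second default interval, and the domain name "vm2" are illustrative assumptions; only the virsh start command comes from the report. The runner and sleep functions are injectable so the loop can be exercised without libvirt.

```python
import subprocess
import time
from typing import Callable, Optional


def start_loop(
    domain: str,
    interval_s: float = 5.0,
    iterations: Optional[int] = None,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run "virsh start <domain>" every interval_s seconds (hypothetical helper).

    Loops forever when iterations is None; otherwise stops after that many
    attempts. Returns the number of start attempts made.
    """
    attempts = 0
    while iterations is None or attempts < iterations:
        # The guest halts itself on boot ("halt -p" in rc.local), so each
        # successful start produces one boot/power-off cycle.
        result = run(
            ["virsh", "start", domain],
            capture_output=True,
            text=True,
            check=False,
        )
        attempts += 1
        if result.returncode != 0:
            # Typically the domain is still active because the previous boot
            # has not finished powering off; just retry on the next tick.
            print(f"start #{attempts} failed: {result.stderr.strip()}")
        sleep(interval_s)
    return attempts
```

On the host, start_loop("vm2") would keep cycling the guest until interrupted (Ctrl-C) while cyclictest runs in the RT VM; a failed start is only logged, since a guest that has not finished halting cannot be started again yet.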
Results: # Min Latencies: 00003; # Avg Latencies: 00005; # Max Latencies: 00033

The max latency, 33 µs, exceeds 20 µs, so this bug has been reproduced.

==Verification==
Versions: 3.10.0-514.1.1.rt56.422.el7.x86_64
Steps: Same as the reproduction.
Results: # Min Latencies: 00003; # Avg Latencies: 00005; # Max Latencies: 00011

The max latency, 11 µs, stays below 20 µs, so this bug has been fixed. Setting this bug to VERIFIED per Comment 10.

Thanks for insisting on having a reproducer, Pei Zhang!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2883.html
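QE's pass/fail rule in this bug is a simple threshold: a max latency above 20 µs counts as a spike (33 µs reproduced the bug; 11 µs verified the fix). A minimal Python sketch of that check, assuming the cyclictest summary lines look like the ones quoted above ("# Max Latencies: 00033", possibly with one value per measured thread). The names parse_summary, has_spike, and LIMIT_US are illustrative assumptions, not part of cyclictest or of QE's tooling.

```python
import re
from typing import Dict, List

# cyclictest reports latencies in microseconds. This regex matches summary
# lines in the form quoted in this bug, e.g. "# Max Latencies: 00033",
# allowing several zero-padded values (one per thread) on the same line.
_SUMMARY = re.compile(
    r"^#[ \t]*(Min|Avg|Max) Latencies:[ \t]*(\d+(?:[ \t]+\d+)*)[ \t]*$",
    re.MULTILINE,
)

# Pass/fail bound QE used in this bug: max latency must stay under 20 us.
LIMIT_US = 20


def parse_summary(text: str) -> Dict[str, List[int]]:
    """Map "Min"/"Avg"/"Max" to per-thread values in microseconds."""
    return {kind: [int(v) for v in values.split()]
            for kind, values in _SUMMARY.findall(text)}


def has_spike(text: str, limit_us: int = LIMIT_US) -> bool:
    """Return True if any thread's max latency exceeds limit_us."""
    return any(v > limit_us for v in parse_summary(text).get("Max", []))
```

For example, has_spike(open("cyclictest.log").read()) on a saved run (the file name is hypothetical) would return True for the reproduction numbers above and False for the verification numbers.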