Bug 1292117

Summary: tuned: race condition when generating modprobe file
Product: Red Hat Enterprise Linux 7 Reporter: Luiz Capitulino <lcapitulino>
Component: tunedAssignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED ERRATA QA Contact: Tereza Cerna <tcerna>
Severity: unspecified Docs Contact:
Priority: high    
Version: 7.3CC: bhu, hhuang, jeder, jen, jskarvad, juzhang, peterx, tcerna
Target Milestone: rcKeywords: Patch, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tuned-2.5.1-6.el7 Doc Type: Bug Fix
Doc Text:
Cause: Previously there existed race condition in realtime-virtual-host Tuned profile. Consequence: The modprobe configuration file for KVM RT could be generated after the KVM modules are loaded causing the modules options to be not used. This can result in latency spike. The modprobe configuration file was (re)generated on each Tuned start (machine boot) and profile activation which increased the probability that the race condition will be triggered. Fix: Tuned code was extended to provide hint to profiles why it executes "stop" callback and the realtime-virtual-host profile was updated to use this hint and conditionally generate the modprobe configuration file only when the profile is activated (selected). Result: Now the modprobe configuration file is generated only when the realtime-virtual-host profile is activated (selected). The profile activation needs to be followed by machine reboot. During the reboot the modprobe configuration file is not (re)generated again which removes the race condition.
Story Points: ---
Clone Of:
: 1298204 (view as bug list) Environment:
Last Closed: 2016-11-04 07:26:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1240765, 1273048, 1298204    

Description Luiz Capitulino 2015-12-16 14:14:13 UTC
Description of problem:

We are able to reproduce a latency spike with real-time KVM which was debugged down to the vCPU thread scheduling into idle during real-time operation. This causes a latency spike:

qemu-kvm-3317  [011] d...2.. 3912711163038 (+179556): kvm_exit: reason EXTERNAL_INTERRUPT rip 0x7f965efdaa09 info 0 800000fd
qemu-kvm-3317  [011] d...311 3912711174912 (+11874): sched_switch: prev_comm=qemu-kvm prev_pid=3317 prev_prio=98 prev_state=D ==> next_comm=swapper/11 next_pid=0 next_prio=120
  <idle>-0     [011] d...3.. 3912711185082 (+10170): sched_switch: prev_comm=swapper/11 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=qemu-kvm next_pid=3317 next_prio=98
qemu-kvm-3317  [011] d...2.. 3912711191572 (+6490): kvm_entry: vcpu 7

Version-Release number of selected component (if applicable): 3.10.0-327.4.1.rt56.206.el7_2.x86_64

How reproducible:

This issue is very hard to reproduce. We can reproduce it on a system fully setup for real-time KVM operation using a guest with 6 real-time CPUs. We run cyclictest in the guest and get a spike of max > 20us. Originally, this issue was reproduced in 24 hours. With high load in the host, it may be possible to trigger it in less than 12 hours.

Comment 1 Luiz Capitulino 2015-12-16 16:47:12 UTC
I've debugged this down to KVM_REQ_CLOCK_UPDATE being set on vmexit, which causes kvmclock_update_fn() to be called, which in turn will grab a spinlock.

However, kvmclock_update_fn() should never run when the kvm module is loaded with kvmclock_periodic_sync=0. This is ensured by the realtime-virtual-host tuned profile, which generates the kvm.rt.tuned.conf file:

[root@virtlab412 modprobe.d]# cat kvm.rt.tuned.conf
options kvm kvmclock_periodic_sync=0
options kvm_intel ple_gap=0
[root@virtlab412 modprobe.d]# 

As it turns out, this file is generated at every boot. It seems that there is nothing that guarantees it will be generated *before* the module is loaded. In other words, a race is possible where the kvm module is loaded before the file is generated, which will cause it to be loaded with kvmclock_periodic_sync=1 (the default setting).

I wrote a script to reboot a KVM-RT host in a loop, and I can trig this race in the 7th reboot.

Comment 2 Luiz Capitulino 2016-01-06 21:29:05 UTC
This turns out to be very specific to KVM-RT's profile setup code, so I'm taking it.

Comment 6 Tereza Cerna 2016-08-15 12:05:52 UTC
Verified in tuned-2.7.1-2.el7 - PASS

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   PASS   ] :: Command 'systemctl restart tuned' (Expected 0, got 0)
:: [   PASS   ] :: Command 'journalctl -b | grep -q "profile_switch"' (Expected 1, got 1)
:: [   PASS   ] :: Command 'tuned-adm profile test' (Expected 0, got 0)
:: [   PASS   ] :: Command 'journalctl -b | grep -q "profile_switch"' (Expected 0, got 0)
:: [   LOG    ] :: Duration: 13s
:: [   LOG    ] :: Assertions: 4 good, 0 bad
:: [   PASS   ] :: RESULT: Test

_______________________________________________________________

Reproduced in tuned-2.5.1-4.el7 - FAIL

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   PASS   ] :: Command 'systemctl restart tuned' (Expected 0, got 0)
:: [   PASS   ] :: Command 'journalctl -b | grep -q "profile_switch"' (Expected 1, got 1)
:: [   PASS   ] :: Command 'tuned-adm profile test' (Expected 0, got 0)
:: [   FAIL   ] :: Command 'journalctl -b | grep -q "profile_switch"' (Expected 0, got 1)
:: [   LOG    ] :: Duration: 12s
:: [   LOG    ] :: Assertions: 3 good, 1 bad
:: [   FAIL   ] :: RESULT: Test

Comment 8 errata-xmlrpc 2016-11-04 07:26:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2479.html