Bug 1468217
Summary:          KVM: paravirt raw_spinlock priority bump for housekeeping vcpus
Product:          Red Hat Enterprise Linux 7
Component:        kernel-rt (sub component: KVM)
Version:          7.4
Status:           CLOSED WONTFIX
Reporter:         Marcelo Tosatti <mtosatti>
Assignee:         Marcelo Tosatti <mtosatti>
QA Contact:       Pei Zhang <pezhang>
Severity:         unspecified
Priority:         unspecified
Keywords:         FutureFeature
Target Milestone: rc
Target Release:   ---
Hardware:         Unspecified
OS:               Unspecified
Type:             Bug
Last Closed:      2017-09-25 21:36:19 UTC
CC:               bhu, chayang, daolivei, jkastner, juzhang, knoel, lcapitulino, lgoncalv, michen, mtosatti, pagupta, pbonzini, pezhang, riel, virt-maint, williams, xiywang
Description    Marcelo Tosatti    2017-07-06 11:21:41 UTC
Marcelo asked me to describe the reproducer for this issue here. The reproducer is just to execute a KVM-RT test case with vcpu0 pinned to a non-isolated core. However, it's been years since we last did that, so I'd like to go back and reproduce it again before giving further details. This may take a few days as I am very busy working on another problem.

Btw, something that occurred to me is whether skew_tick=1 could fix this: the tick would fire at different times for each vcpu, so there should not be contention on this raw spinlock (at least not for the particular spinlock that caused the problem; I can't tell whether there could be more).

Comment 26 (Paolo Bonzini):

Sorry if I'm confused, but... shouldn't emulator threads at least theoretically have a _higher_ FIFO priority than the vCPUs? Emulator threads can interrupt the vCPU, so if their priority is lower you deadlock, as Luiz said in comment 14.

Comment 27 (Pei Zhang):

Update:

1. With only one emulator CPU, shared with vCPU0, the guest fails to boot.

    <vcpu placement='static'>2</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='19'/>
      <vcpupin vcpu='1' cpuset='18'/>
      <emulatorpin cpuset='19'/>
      <vcpusched vcpus='0' scheduler='fifo' priority='1'/>
      <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
    </cputune>

2. With 2 emulator CPUs, the guest can boot.

    <cputune>
      <vcpupin vcpu='0' cpuset='19'/>
      <vcpupin vcpu='1' cpuset='18'/>
      <emulatorpin cpuset='1,19'/>
      <vcpusched vcpus='0' scheduler='fifo' priority='1'/>
      <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
    </cputune>

With this configuration, running the latency test with rteval for 3 hours, both host and guest work well and the latency values look good:

    Test started at Thu Jul 27 14:47:41 CST 2017
    Test duration: 3h
    Run rteval: y
    Run stress: y
    Isolated CPUs: 1
    Kernel: 3.10.0-693.rt56.617.el7.x86_64
    Kernel cmd-line: BOOT_IMAGE=/vmlinuz-3.10.0-693.rt56.617.el7.x86_64 root=/dev/mapper/rhel_bootp--73--75--90-root ro console=tty0 console=ttyS0,115200n8 biosdevname=0 crashkernel=auto rd.lvm.lv=rhel_bootp-73-75-90/root rd.lvm.lv=rhel_bootp-73-75-90/swap rhgb quiet default_hugepagesz=1G iommu=pt intel_iommu=on isolcpus=1 intel_pstate=disable nosoftlockup skew_tick=1 nohz=on nohz_full=1 rcu_nocbs=1
    Machine: bootp-73-75-90.lab.eng.pek2.redhat.com
    CPU: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
    Results dir: /home/log_nfv-virt-rt-kvm

    running stress
    taskset -c 1 /home/nfv-virt-rt-kvm/tests/stress --cpu 1
    running rteval
    rteval --onlyload --duration=3h --verbose
    starting Thu Jul 27 14:47:46 CST 2017
    taskset -c 1 cyclictest -m -n -q -p95 -D 3h -h60 -i 200 -t 1 -a 1
    ended Thu Jul 27 17:47:48 CST 2017
    Test ended at Thu Jul 27 17:47:49 CST 2017

    Latency testing results:
    # Min Latencies: 00005
    # Avg Latencies: 00006
    # Max Latencies: 00011

3. I'm running a 12-hour test now and will update the results once it finishes.

Comment 28 (Luiz Capitulino):

Pei, are CPUs 19 and 1 isolated? Because they shouldn't be for this test.

Also, you should try the following test case:

1. Shut down the guest.
2. Run a kernel build on the host and record how long it took:

    # cd linux-$ver; make mrproper; make allyesconfig; time make -jTWICE_AS_MANY_CPUS

3. When the kernel build finishes, just start it again, and in parallel perform steps 4 and 5 below.
4. Start the guest.
5. Run cyclictest with rteval in the guest for a duration which is some hours longer than the kernel build duration from item 2.

If the kernel build is able to finish in about the same time as in item 2, then this test case is a PASS. If the kernel build is only able to finish after rteval is killed in the guest, or if you get any other hangs in the system or in the guest, this test case is a FAIL.
Btw, I don't know how serious item 1 from comment 27 is. We'd have to check whether it is at all possible for OpenStack to be set up in this manner (that is, having only a single pCPU for the emulator thread and vcpu0).

Comment 29 (Pei Zhang):

Created attachment 1305893 [details]
Call trace info when the host hangs

(In reply to Luiz Capitulino from comment #28)
> Pei, are CPUs 19 and 1 isolated? Because they shouldn't be for this test.

Hi Luiz, yes, CPU 19 is isolated. If I understand correctly, the testing below should pin vCPU0 to a non-isolated pCPU. Please correct me if I'm wrong.

> Also, you should try the following test case:

This test case: FAIL.

4 issues observed in the testing:

(1) The kernel build stalls when the guest is booted.

    # time make -j40

(2) The guest fails to boot.

(3) The host hangs after a few minutes (about 10 minutes), not immediately.

(4) Call traces like the following appear on the host:

    [ 1616.284164] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=60002 jiffies, g=242521, c=242520, q=110856)
    [ 1616.284165] All QSes seen, last rcu_preempt kthread activity 59999 (4296283308-4296223309), jiffies_till_next_fqs=3
    [ 1616.284167] swapper/0       R  running task    0     0      0 0x00080000
    [ 1616.284169] ffffffff81a02480 bd007fd2bcff1381 ffff88085e603dd0 ffffffff810be946
    [ 1616.284170] ffff88085e612080 ffffffff81a48a00 ffff88085e603e38 ffffffff8113b29d
    [ 1616.284171] 0000000000000000 ffff88085e612080 000000000001b108 0000000000000000
    [ 1616.284171] Call Trace:
    [ 1616.284178] <IRQ> [<ffffffff810be946>] sched_show_task+0xb6/0x120
    [ 1616.284182] [<ffffffff8113b29d>] rcu_check_callbacks+0x83d/0x860
    [ 1616.284186] [<ffffffff81091ed1>] update_process_times+0x41/0x70
    [ 1616.284189] [<ffffffff810ee720>] tick_sched_handle+0x30/0x70
    [ 1616.284191] [<ffffffff810eeb49>] tick_sched_timer+0x39/0x80
    [ 1616.284193] [<ffffffff810adef4>] __run_hrtimer+0xc4/0x2c0
    [ 1616.284195] [<ffffffff810eeb10>] ? tick_sched_do_timer+0x50/0x50
    [ 1616.284196] [<ffffffff810aee20>] hrtimer_interrupt+0x130/0x350
    [ 1616.284200] [<ffffffff81047405>] local_apic_timer_interrupt+0x35/0x60
    [ 1616.284204] [<ffffffff816bc61d>] smp_apic_timer_interrupt+0x3d/0x50
    [ 1616.284205] [<ffffffff816bad9d>] apic_timer_interrupt+0x6d/0x80
    [ 1616.284209] <EOI> [<ffffffff81527cac>] ? cpuidle_enter_state+0x5c/0xd0
    [ 1616.284211] [<ffffffff81527c98>] ? cpuidle_enter_state+0x48/0xd0
    [ 1616.284212] [<ffffffff81527dff>] cpuidle_idle_call+0xdf/0x2b0
    [ 1616.284215] [<ffffffff810270be>] arch_cpu_idle+0xe/0x40
    [ 1616.284217] [<ffffffff810e2dcc>] cpu_startup_entry+0x14c/0x1d0
    [ 1616.284220] [<ffffffff81699e94>] rest_init+0x84/0x90
    [ 1616.284223] [<ffffffff81b80040>] start_kernel+0x427/0x448
    [ 1616.284224] [<ffffffff81b7fa22>] ? repair_env_string+0x5c/0x5c
    [ 1616.284226] [<ffffffff81b7f120>] ? early_idt_handler_array+0x120/0x120
    [ 1616.284227] [<ffffffff81b7f5e3>] x86_64_start_reservations+0x24/0x26
    [ 1616.284228] [<ffffffff81b7f732>] x86_64_start_kernel+0x14d/0x170

(This kind of call trace shows up repeatedly in dmesg. The full dmesg log is attached to this comment.)

Testing environment:

Host kernel line:

    # cat /proc/cmdline
    BOOT_IMAGE=/vmlinuz-3.10.0-693.rt56.617.el7.x86_64 root=/dev/mapper/rhel_dell--per430--09-root ro crashkernel=auto rd.lvm.lv=rhel_dell-per430-09/root rd.lvm.lv=rhel_dell-per430-09/swap console=ttyS0,115200n81 default_hugepagesz=1G iommu=pt intel_iommu=on isolcpus=2,4,6,8,10,12,14,16,18,19,17,15,13 intel_pstate=disable nosoftlockup skew_tick=1 nohz=on nohz_full=2,4,6,8,10,12,14,16,18,19,17,15,13 rcu_nocbs=2,4,6,8,10,12,14,16,18,19,17,15,13

    # lscpu | grep NUMA
    NUMA node(s):          2
    NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18
    NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19

Configuration:

    <vcpu placement='static'>2</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='3'/>
      <vcpupin vcpu='1' cpuset='18'/>
      <emulatorpin cpuset='1,3,5'/>
      <vcpusched vcpus='0' scheduler='fifo' priority='1'/>
      <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
    </cputune>

Here vCPU0 uses pCPU3, which is not isolated. When vCPU0 is pinned to a non-isolated pCPU, at least 3 CPUs are needed for the emulator; otherwise the guest cannot boot.

> Btw, I don't know how serious item 1 from comment 27 is. We'd have to check
> whether it is at all possible for OpenStack to be set up in this manner
> (that is, having only a single pCPU for the emulator thread and vcpu0).

I'd like to confirm the configuration in OpenStack next week.

Best Regards,
Pei

Comment 30 (Luiz Capitulino):

(In reply to Pei Zhang from comment #29)
> If I understand correctly, the testing below should pin vCPU0 to a
> non-isolated pCPU. Please correct me if I'm wrong.

You are correct. But non-isolated CPUs for vcpu0 and the emulator threads should also have been used for the testing done in comment 27. In any case, I think the testing you did in comment 29 shows that having vcpu0 running with fifo priority on non-isolated CPUs won't work, IMO.
(In reply to Luiz Capitulino from comment #30)
> In any case, I think the testing you did in comment 29 shows that having
> vcpu0 running with fifo priority on non-isolated CPUs won't work, IMO.

So the configuration should look like:

    <vcpu placement='static'>2</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='3'/>
      <vcpupin vcpu='1' cpuset='18'/>
      <emulatorpin cpuset='3'/>
      <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
    </cputune>

(pCPU3 is not isolated and vCPU0 is not given fifo:1 priority.)

The test below was done with this configuration.

Testing results for the kernel build test case from comment 28:

Steps 1-2: Shut down the guest and run the kernel build on the host; it takes about 15 minutes (3 runs).

    run 1:
    real    15m20.880s
    user    86m6.193s
    sys     13m51.351s

    run 2:
    real    15m22.770s
    user    86m15.038s
    sys     13m54.911s

    run 3:
    real    15m22.042s
    user    86m5.641s
    sys     13m56.440s

Steps 3-5: While the kernel build is running, start the guest and run cyclictest in the guest.

- The kernel build finishes whether or not rteval is killed.
- Both host and guest work well. No errors in dmesg.
- The kernel build takes about 17 minutes, roughly 2 minutes longer than the 15 minutes above.
- I set the latency testing time to 15 minutes. In one of the three runs, the max latency is as high as 53 us. For the full log, please refer to [1]:

    # Min Latencies: 00005
    # Avg Latencies: 00006
    # Max Latencies: 00053

Note: the host environment is the same as in comment 29.

Reference:
[1] full log of steps 3-5: http://pastebin.test.redhat.com/504796

12-hour latency testing results with the configuration below: no stress on the host, only latency testing with rteval in the guest. Both host and guest work well, but the max latency value is very high.

Configuration:

    <vcpu placement='static'>2</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='3'/>
      <vcpupin vcpu='1' cpuset='18'/>
      <emulatorpin cpuset='3'/>
      <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
    </cputune>

12-hour latency testing results:

    # Min Latencies: 00004
    # Avg Latencies: 00006
    # Max Latencies: 05166

For the full testing log, see: http://pastebin.test.redhat.com/504992

Host kernel line:

    # cat /proc/cmdline
    BOOT_IMAGE=/vmlinuz-3.10.0-693.rt56.617.el7.x86_64 root=/dev/mapper/rhel_dell--per430--09-root ro crashkernel=auto rd.lvm.lv=rhel_dell-per430-09/root rd.lvm.lv=rhel_dell-per430-09/swap console=ttyS0,115200n81 default_hugepagesz=1G iommu=pt intel_iommu=on isolcpus=2,4,6,8,10,12,14,16,18,19,17,15,13 intel_pstate=disable nosoftlockup skew_tick=1 nohz=on nohz_full=2,4,6,8,10,12,14,16,18,19,17,15,13 rcu_nocbs=2,4,6,8,10,12,14,16,18,19,17,15,13

    # lscpu | grep NUMA
    NUMA node(s):          2
    NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18
    NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19

(In reply to Pei Zhang from comment #29)
> Here vCPU0 uses pCPU3, which is not isolated. When vCPU0 is pinned to a
> non-isolated pCPU, at least 3 CPUs are needed for the emulator; otherwise
> the guest cannot boot.

Ok, thanks Pei, my configuration was incorrect. I'll reproduce your test.

We need a hypercall to change the priority to FIFO once boot is finished; writing that now.

Comment 34 (Marcelo Tosatti):

(In reply to Paolo Bonzini from comment #26)
> Sorry if I'm confused, but... shouldn't emulator threads at least
> theoretically have a _higher_ FIFO priority than the vCPUs? Emulator
> threads can interrupt the vCPU, so if their priority is lower you deadlock,
> as Luiz said in comment 14.

Damn, good point. The following can cause IO to never be processed:

1) Submit IO.
2) Busy-spin in some non-important program on vcpu0.

The IO will only interrupt the CPU when the non-important program HLTs, which might be never.

So better to change the code. I'm currently using a hypercall to switch to SCHED_OTHER at:
* BIOS initialization.
* System shutdown.
and to switch to SCHED_FIFO at:
* System startup.

But it should instead be:

1) hypercall to switch to FIFO:1
2) spin_lock_irqsave(spinlock_shared_with_vcpu1)
3) spin_unlock_irqrestore(spinlock_shared_with_vcpu1)
4) hypercall to switch to SCHED_OTHER
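As a rough sketch only (not part of any posted patch), the guest side of steps 1) and 4) above could be wrapped in a pair of helpers like the following. KVM_HC_RT_PRIO and its argument values are assumed, hypothetical hypercall numbers, not an existing upstream interface; only kvm_para_available() and kvm_hypercall1() are real guest APIs.

    /*
     * Hypothetical guest-side helpers for steps 1) and 4) above.
     * KVM_HC_RT_PRIO is an assumed hypercall number; the host would
     * have to map it to a scheduling-policy change on the vCPU thread.
     */
    #include <linux/kvm_para.h>

    #define KVM_HC_RT_PRIO        14   /* assumption: unused hypercall nr */
    #define KVM_RT_PRIO_FIFO_1     1   /* request SCHED_FIFO, priority 1  */
    #define KVM_RT_PRIO_NORMAL     0   /* request SCHED_OTHER             */

    static inline void kvm_rt_prio_boost(void)
    {
            if (kvm_para_available())
                    kvm_hypercall1(KVM_HC_RT_PRIO, KVM_RT_PRIO_FIFO_1);
    }

    static inline void kvm_rt_prio_unboost(void)
    {
            if (kvm_para_available())
                    kvm_hypercall1(KVM_HC_RT_PRIO, KVM_RT_PRIO_NORMAL);
    }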
(In reply to Marcelo Tosatti from comment #34)

Paolo, Luiz,

Unfortunately there are about 250 functions that call raw_spinlocks, so adding this at each call site is not a practical option. I can't see any better solution than:

    void raw_spin_lock(spinlock_t lock)
    {
            if (cpu->isolated == false)
                    hypercall1(ENABLE_FIFO_PRIO);
            __raw_spin_lock(lock);
            if (cpu->isolated == true)
                    hypercall1(DISABLE_FIFO_PRIO);
    }

which might incur some overhead, and therefore upstream might not accept it (however, for the NFV workloads it's fine, since the hot path does not use spinlocks at all).

Do you guys have any better ideas?
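For illustration, a minimal sketch of how that wrapper could be split into a lock/unlock pair following the boost-before-lock, unboost-after-unlock sequence from comment 34. vcpu_is_isolated() and the kvm_rt_prio_*() helpers are assumed names, not existing kernel interfaces, and this is not the code from the attached patch.

    /*
     * Sketch only: paired wrappers matching the 4-step sequence in
     * comment 34 (boost before taking the raw spinlock, unboost only
     * after releasing it).  vcpu_is_isolated() and the kvm_rt_prio_*()
     * helpers are assumptions.
     */
    static inline void rt_boost_raw_spin_lock(raw_spinlock_t *lock)
    {
            if (!vcpu_is_isolated())        /* housekeeping vCPU only */
                    kvm_rt_prio_boost();    /* hypercall: SCHED_FIFO:1 */
            raw_spin_lock(lock);
    }

    static inline void rt_boost_raw_spin_unlock(raw_spinlock_t *lock)
    {
            raw_spin_unlock(lock);
            if (!vcpu_is_isolated())
                    kvm_rt_prio_unboost();  /* hypercall: SCHED_OTHER */
    }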
That idea was the best one, I can't think of anything else :(

Created attachment 1316490 [details]
kvm hypercalls to switch to/from FIFO prio around raw_spinlocks
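The attachment itself is not reproduced here. As a rough illustration of the idea, and under the same assumed KVM_HC_RT_PRIO hypercall as in the sketch above, the host side might boil down to changing the scheduling policy of the vCPU thread that issued the hypercall; the actual attached patch may be implemented quite differently.

    /*
     * Illustrative host-side handler for an assumed KVM_HC_RT_PRIO
     * hypercall.  "current" is the vCPU thread while it handles the
     * hypercall, so sched_setscheduler() acts on the right task.
     */
    static int kvm_rt_prio_hypercall(struct kvm_vcpu *vcpu, unsigned long fifo)
    {
            struct sched_param param = {
                    .sched_priority = fifo ? 1 : 0,
            };

            return sched_setscheduler(current,
                                      fifo ? SCHED_FIFO : SCHED_NORMAL,
                                      &param);
    }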