Bug 1921601 - [RHEL8.2rt] sosreport hang in rt-kvm if the isolated vCPU of Guest fully utilised 100%
Summary: [RHEL8.2rt] sosreport hang in rt-kvm if the isolated vCPU of Guest fully util...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel-rt
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: 8.4
Assignee: Virtualization Maintenance
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On: 1983167
Blocks: 1825271
 
Reported: 2021-01-28 09:46 UTC by Pradipta Kumar Sahoo
Modified: 2022-03-30 06:10 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-23 07:26:58 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
VM XML in OSP 16.1 (8.24 KB, text/plain)
2021-03-31 14:41 UTC, Pei Zhang

Description Pradipta Kumar Sahoo 2021-01-28 09:46:39 UTC
Description of problem:
In the OSP 16.1 realtime kernel (RHEL 8.2), the "sosreport" command hangs if the isolated vCPUs of the guest are fully utilised by the "DPDK-TestPMD" application. As soon as the testpmd process is terminated, the sosreport task completes.
This issue was not seen with the RHEL 8.2 non-realtime kernel.

Version-Release number of selected component (if applicable):
# rpm -qa *kernel*
kernel-tools-4.18.0-193.29.1.el8_2.x86_64
kernel-tools-libs-4.18.0-193.29.1.el8_2.x86_64
kernel-rt-modules-4.18.0-193.28.1.rt13.77.el8_2.x86_64
kernel-rt-4.18.0-193.28.1.rt13.77.el8_2.x86_64
kernel-rt-core-4.18.0-193.28.1.rt13.77.el8_2.x86_64
kernel-rt-kvm-4.18.0-193.28.1.rt13.77.el8_2.x86_64


How reproducible: 100% reproducible in nfv perf lab


Steps to Reproduce:
1. Deployed environment with realtime kernel. Reference kernel boot parameter

    KVM Host:
    # cat /proc/cmdline 
    BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.28.1.rt13.77.el8_2.x86_64 root=UUID=3092e7a5-c4de-4f9e-a24d-3da66c16e958 ro console=ttyS0 console=ttyS0,115200n81 no_timer_check crashkernel=auto rhgb quiet default_hugepagesz=1GB hugepagesz=1G hugepages=70 iommu=pt intel_iommu=on isolcpus=6-35 mitigations=off l1tf=off kvm-intel.vmentry_l1d_flush=never clocksource=tsc powernow-k8.tscsync=1 skew_tick=1 isolcpus=6-35 intel_pstate=disable nosoftlockup tsc=nowatchdog nohz=on nohz_full=6-35 rcu_nocbs=6-35 skew_tick=1 isolcpus=6-35 intel_pstate=disable nosoftlockup tsc=nowatchdog nohz=on nohz_full=6-35 rcu_nocbs=6-35

    Guest:
    # cat /proc/cmdline 
    BOOT_IMAGE=(hd0,gpt3)/boot/vmlinuz-4.18.0-193.28.1.rt13.77.el8_2.x86_64 root=UUID=c25c9e79-d48e-44ed-9da6-f033a9ad7a10 ro console=ttyS0 console=ttyS0,115200n8 no_timer_check net.ifnames=0 crashkernel=auto skew_tick=1 isolcpus=managed_irq,domain,3-13 intel_pstate=disable nosoftlockup tsc=nowatchdog nohz=on nohz_full=3-13 rcu_nocbs=3-13 default_hugepagesz=1GB hugepagesz=1G hugepages=10 iommu=pt intel_iommu=on isolcpus=3-13 mitigations=off nopti spectre_v2=off spec_store_bypass_disable=off ipv6.disable=1

2. Guest application:
    testpmd -l 3,4,5 -n 4 --huge-dir=/dev/hugepages -w 0000:00:05.0 -w 0000:00:06.0 --socket-mem 10000,0 -- --stats-period=0 --nb-cores=2 --forward-mode=io --rxd=2048 --txd=2048 --rxq=1 --txq=1 --burst=64 --mbcache=512 -a --rss-udp --disable-crc-strip --stats-period 1

3. Process Utilisation:
    Host:
Tasks: 898 total,   1 running, 897 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.6 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem : 191861.5 total, 108112.3 free,  75481.9 used,   8267.3 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 115303.8 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                                
  34383 qemu      20   0   25.7g 498524  46728 S 206.3   0.3 461:13.41 qemu-kvm                                                                                                                                                               
 411788 root      20   0       0      0      0 I  16.2   0.0   0:13.63 kworker/0:1-events_power_efficient                                                                                                                                     
   2068 openvsw+  10 -10  565660  70576  34632 S   3.6   0.0  49:09.99 ovs-vswitchd                                                                                                                                                           
  26350 42436     20   0 2207356 169472  39664 S   1.3   0.1   7:48.59 nova-compute                                                                                                                                                           
 412373 root      20   0   68340   5708   4088 R   0.7   0.0   0:00.07 top                                                                                                                                                                    
     39 root      -5   0       0      0      0 S   0.3   0.0   0:11.08 rcuc/3   
                                       
    Guest
Tasks: 309 total,   1 running, 308 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  19901.5 total,   7947.9 free,  11097.7 used,    855.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   8594.1 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                                
  32191 root      20   0  128.3g  50936  11228 S 204.7   0.2   8:02.81 testpmd                                                                                                                                                                
      1 root      20   0  245088  13968   9176 S   0.0   0.1   0:03.97 systemd                                                                                                                                                                
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.02 kthreadd                                                                                                                                                               
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                                                                                                 
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                                                                                                             
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-kblockd                               

4. In this state, if we run the "sosreport --batch" command, the compute node hangs and sometimes the sosreport process gets stuck.

Actual results: sosreport failed


Expected results: sosreport should collect system data even when the vCPUs are busy with the application, since housekeeping cores are available on the KVM host.

Comment 2 Nitesh Narayan Lal 2021-01-28 15:29:59 UTC
Hi Pradipta,

Based on the guest kernel cmdline, I am assuming you are using vIOMMU and
hence userspace IOAPIC.

Can you please confirm if you are setting the emulator thread sched
priority to fifo:1?

Thanks
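
For reference, a hedged way to check this on the compute node (a sketch only; the domain name is the one that appears later in this bug and may differ in other environments):

~~~
# Sketch: look for emulator/vCPU scheduler settings in the live libvirt XML.
# A fifo:1 emulator thread typically shows up as something like
# <emulatorsched scheduler='fifo' priority='1'/> under <cputune>.
virsh dumpxml instance-00000002 | grep -iE 'emulatorsched|vcpusched|emulatorpin'
~~~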

Comment 3 Nitesh Narayan Lal 2021-01-28 18:56:36 UTC
Pradipta, another thing that I missed is the kernel version.
It looks like you are running an older 8.2.z kernel.

Please upgrade the kernel to batch#5 or the latest candidate as some
important RT-specific kernel fixes were included in batch#5.

Comment 4 Pei Zhang 2021-01-29 07:53:37 UTC
I tried to reproduce this issue on rhel8.4 (without osp16.1), but could not trigger it:

Versions:
4.18.0-279.rt7.44.el8.x86_64
openvswitch2.13-2.13.0-82.el8fdp.x86_64
tuned-2.15.0-1.el8.noarch
qemu-kvm-5.2.0-3.scrmod+el8.4.0+9533+d0097f55.wrb210120.x86_64
libvirt-7.0.0-2.scrmod+el8.4.0+9533+d0097f55.x86_64
dpdk-20.11-1.el8.x86_64

No hang when running "sosreport --batch" in the RT guest which is running testpmd.

Comment 11 Nitesh Narayan Lal 2021-02-03 16:45:10 UTC
Somehow a needinfo got set for Pei which is not required, so I am clearing it.
Thanks

Comment 12 Pradipta Kumar Sahoo 2021-02-09 06:00:03 UTC
Hi Nitesh,

As suggested in comment #10, I updated the kernel in both the compute node (rt-kvm) and the guest (rt-kernel), but no luck. The sosreport issue is still reproducible while the vCPUs are engaged with the PMD cycle.
Please let me know if you would like to take a look at my setup for further investigation.

DUT(rt-kvm):
# uname -r
4.18.0-240.14.1.rt7.68.el8_3.x86_64

Guest:
# uname -r
4.18.0-240.14.1.rt7.68.el8_3.x86_64

Comment 15 Nitesh Narayan Lal 2021-02-11 16:27:05 UTC
Hi Pradipta,

Just wanted to follow up: are you still triggering the issue?
If so, can you please share the access details of your setup for investigation?

Thanks

Comment 16 Pradipta Kumar Sahoo 2021-02-16 04:58:42 UTC
Hi Nitesh,

Sorry for my late response; I was busy with pbench tests. I have reproduced the issue with the RHEL 8.2, 8.3, and 8.4 real-time kernels.

# uname -r
4.18.0-287.rt7.52.el8.x86_64

This issue affects our pbench tool's collection of system SOS data. Below is the reference sos command that got stuck on the RT-KVM host while the vCPUs inside the rt-guest were busy with testpmd.

root      173230  3.1  0.0 1023012 43784 ?       Sl   04:38   0:07 /usr/libexec/platform-python -s /usr/sbin/sosreport -o date -o host -o release -o kernel -o filesys -o devicemapper -o system -o memory -o hardware -o networking -o block -o processor -o collectd -o tuned --batch --quiet --tmp-dir=/var/lib/pbench-agent/trafficgen_RHOS-16.2-RHEL-8.4Beta-KERNEL-4.18.0-287.rt7.52.el8.x86_64-SRIOV-1Q-PVP-Loss-LAT_tg:trex-profile_pf:profile.json_ml:0.0005_tt:os_2021-02-16T04:38:34/sysinfo/beg/nfv-compute-rt-offload-0 --name pbench-nfv-compute-rt-offload-0.localdomain
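
A hedged way to see where such a stuck sosreport process is blocked in the kernel (a sketch; PID 173230 is the one from the ps line above, and reading /proc/<pid>/stack needs root):

~~~
# Sketch: show the wait channel and the kernel stack of the stuck process.
ps -o pid,stat,wchan:32,cmd -p 173230
cat /proc/173230/stack
~~~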

Although this issue is not a blocker for our tests, it would be nice if we could find the root cause and then a solution/workaround. I am going to schedule a 30-minute meeting with you for further investigation.

-- Pradipta

Comment 17 Nitesh Narayan Lal 2021-02-16 13:46:43 UTC
(In reply to Pradipta Kumar Sahoo from comment #16)
> Hi Nilesh,
> 
> Sorry for my late response as I was busy with pbench test. I have reproduced
> the issue in RHEL 8.2, 8.3 and 8.4 real-time kernels.

That's perfectly fine, thank you for testing. 

The 8.4 testing was also done in an OSP environment, is that right?

I am assuming that the reason why Pei has not been able to reproduce this
with 8.4 is that she was not testing on an OSP-based deployment.

> 
> # uname -r
> 4.18.0-287.rt7.52.el8.x86_64
> 
> This issue affecting our pbench tool to collect the system  SOS data. Here
> is the below reference sos command which has stuck in RT-KVM host where vCPU
> inside the rt-guest busy with testpmd.
> 
> root      173230  3.1  0.0 1023012 43784 ?       Sl   04:38   0:07
> /usr/libexec/platform-python -s /usr/sbin/sosreport -o date -o host -o
> release -o kernel -o filesys -o devicemapper -o system -o memory -o hardware
> -o networking -o block -o processor -o collectd -o tuned --batch --quiet
> --tmp-dir=/var/lib/pbench-agent/trafficgen_RHOS-16.2-RHEL-8.4Beta-KERNEL-4.
> 18.0-287.rt7.52.el8.x86_64-SRIOV-1Q-PVP-Loss-LAT_tg:trex-profile_pf:profile.
> json_ml:0.0005_tt:os_2021-02-16T04:38:34/sysinfo/beg/nfv-compute-rt-offload-
> 0 --name pbench-nfv-compute-rt-offload-0.localdomain
> 

Can you also share the testpmd cmdline that you use to run in the KVM-RT
guest?

> Although this issue is not a blocker for our test, it would nice we could
> find out the root cause for further solution/workaround. I am going to
> schedule a 30min meeting with you for further investigation.

I am assuming this meeting is about walking through the steps required to
reproduce the issue.
For the investigation, I would still need access to your machine at least
for the next few days.

If you are using this box for some other testing, then maybe you can
reproduce this on some other machine from Beaker.

Comment 18 Nitesh Narayan Lal 2021-02-17 13:30:05 UTC
I synced up with Pradipta yesterday and he is going to reproduce the issue
on another Beaker machine and will share access for further investigation.

@Pradipta, in case you are overloaded with other testing efforts, please let
us know; we can check if Pei has the bandwidth to reproduce this in her OSP
environment. Although that may take some time, as she is currently busy
with some other activities.

Thanks

Comment 19 Pradipta Kumar Sahoo 2021-02-22 04:01:11 UTC
Hi Nitesh,

Building another OSP rt-kernel environment took more time, as the NIC templates and configurations are not identical between the Beaker and other scale-lab servers.
I plan to share my environment after the latency test. I will share the environment details with you once the current tests are completed (~1 week).

BR,
Pradipta

Comment 20 Nitesh Narayan Lal 2021-02-22 13:46:00 UTC
Hi Pradipta,

Thanks for the update.

Comment 24 Nitesh Narayan Lal 2021-03-05 16:44:42 UTC
Hi Pradipta,

Did you complete your testing in the environment?

Thanks

Comment 25 Nitesh Narayan Lal 2021-03-17 18:01:42 UTC
Hi Pradipta,

Any new update on the environment or the reproducibility of the issue?

Thanks

Comment 26 Pradipta Kumar Sahoo 2021-03-23 15:36:10 UTC
Hi Nitesh,

I am occupied with the latency measurement test. We are hitting higher packet latency in the same environment where we run the latest RHEL 8.4 rt-kernel. Please allow us some time; the issue is still reproducible in my environment irrespective of the rt-kernel version.

Please let me know if you need any specific information for further investigation so that I can collect it. Otherwise, I can schedule a meeting with you for a further sync-up to reproduce the issue.

BR,
Pradipta

Comment 27 Nitesh Narayan Lal 2021-03-23 15:51:52 UTC
Hi Pradipta,

It would be faster if I could get access to the environment for a few days
for further investigation of the issue.
One thing that you can check, if you have not done it already, is dmesg after
the sosreport command gets stuck.
If there is an RCU stall, then we can capture a vmcore for further
investigation.

A couple of other experiments that we can do are:

- Try bumping up the priority of the sosreport command/process by using chrt

- Using stalld

Have you tried any of those?
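
A rough sketch of the first experiment (the PID lookup and priority value are illustrative; the stalld install/enable commands are spelled out in comment 39 below):

~~~
# Sketch: bump an already-running sosreport to SCHED_FIFO priority 1 so it
# is not starved by the FIFO vCPU thread.
SOS_PID=$(pgrep -of sosreport)   # oldest matching process; lookup is illustrative
chrt -f -p 1 "$SOS_PID"
chrt -p "$SOS_PID"               # verify the new policy/priority
~~~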

Since you are occupied, let's also check with Pei if she was able to reproduce
it at her end.

@pezhang did you try reproducing this issue with OSP deployment?

Thanks
Nitesh

Comment 28 Pei Zhang 2021-03-31 14:29:18 UTC
(In reply to Nitesh Narayan Lal from comment #27)
> Hi Pradipta,
> 
> It would be faster if I can get access to the environment for a few days
> for further investigation of the issue.
> One thing that you can check if you have not done it already is dmesg after
> the sosreport command stuck.
> If there is an RCU stall then we can capture vmcore for further
> investigation.
> 
> A couple of other experiments that we can do are:
> 
> - Try bumping up the priority of the sosreport command/process by using chrt
> 
> - Using stalld
> 
> Have you tried any of those?
> 
> Since you are occupied, let's also check with Pei if she was able to
> reproduce
> it at her end.
> 
> @pezhang did you try reproducing this issue with OSP deployment?

Hello Nitesh,

I have the OSP 16.1 + rhel8.2.z deployment ready. However, I still failed to reproduce this issue. The compute node does not hang when running "sosreport --batch" while the VM is running testpmd.

In VM:

Port statistics ====================================
  ######################## NIC statistics for port 0  ########################
  RX-packets: 11029174   RX-missed: 0          RX-bytes:  772042576
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 11029617   TX-errors: 0          TX-bytes:  772073190

  Throughput (since last show)
  Rx-pps:        75711          Rx-bps:     42398264
  Tx-pps:        75712          Tx-bps:     42398824
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 11029675   RX-missed: 0          RX-bytes:  772077250
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 11029220   TX-errors: 0          TX-bytes:  772045400

  Throughput (since last show)
  Rx-pps:        75723          Rx-bps:     42405368
  Tx-pps:        75721          Tx-bps:     42404248
  ############################################################################


[heat-admin@overcloud-computerealtime-0 ~]$ top

top - 14:26:28 up  6:25,  2 users,  load average: 4.71, 2.17, 1.29
Tasks: 813 total,   3 running, 810 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.5 us,  4.6 sy,  0.0 ni, 88.7 id,  0.0 wa,  0.0 hi,  1.2 si,  0.0 st
MiB Mem :  63939.1 total,  20074.4 free,  43139.2 used,    725.6 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  20514.0 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                
  72870 qemu      20   0   11.0g 321640  46672 S 209.3   0.5  13:32.92 qemu-kvm                                                                                                               
  99320 root      20   0       0      0      0 R  80.8   0.0   0:20.16 vhost-72870                                                                                                            
  99582 root      20   0       0      0      0 R  79.5   0.0   0:19.93 vhost-72870                                                                                                            
   2096 openvsw+  10 -10  565160  66240  34636 S  12.3   0.1   0:33.77 ovs-vswitchd                                                                                                           
   4509 root      20   0 1770624  33396  19332 S   2.0   0.1   0:48.23 libvirtd                                                                                                               
  14807 heat-ad+  20   0  161180   4916   3572 S   1.3   0.0   0:03.09 sshd                                                                                                                   
  96081 root      20   0  257816  15724  13884 S   1.3   0.0   0:02.61 virsh                                                                                                                  
   4371 root      20   0  177028  15512  13812 S   1.0   0.0   0:04.81 virtlogd                                                                                                               
 100241 heat-ad+  20   0   68228   5668   4140 R   0.7   0.0   0:00.79 top                                                                                                                    
     11 root      -2   0       0      0      0 I   0.3   0.0   0:03.74 rcu_preempt                                                                                                            
     65 root      -5   0       0      0      0 S   0.3   0.0   0:02.78 rcuc/6                                                                                                                 
     82 root      -5   0       0      0      0 S   0.3   0.0   0:02.87 rcuc/8                                                                                                                 
     83 root      -3   0       0      0      0 S   0.3   0.0   0:15.73 ksoftirqd/8                                                                                                            
   2215 root     -51   0       0      0      0 S   0.3   0.0   0:03.95 irq/85-ens3f0-T                                                                                                        
   3123 root      20   0  101120  14588   8860 S   0.3   0.0   0:23.75 snmpd                                                                                                                  
   4174 42435     20   0  311792  99236  14780 S   0.3   0.2   0:08.24 /usr/bin/python                                                                                                        
  87615 root      20   0       0      0      0 I   0.3   0.0   0:00.76 kworker/u64:0-events_unbound                                                                                           
  99699 root      20   0       0      0      0 I   0.3   0.0   0:00.24 kworker/u64:2-ixgbe                                                                                                    
      1 root      20   0  245968  13524   7968 S   0.0   0.0   0:23.12 systemd                                                                                                                
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.03 kthreadd                                                                                                               
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                                                 
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                                                             
      9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                                                           
     10 root      -3   0       0      0      0 S   0.0   0.0   0:06.80 ksoftirqd/0                                                                                                            
     12 root      -5   0       0      0      0 S   0.0   0.0   0:00.00 rcub/1                                                                                                                 
     13 root      -5   0       0      0      0 S   0.0   0.0   0:02.32 rcuc/0                                                                                                                 
     14 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 posixcputmr/0                                                                                                          
     15 root      rt   0       0      0      0 S   0.0   0.0   0:00.01 migration/0                                                                                                            
     16 root      20   0       0      0      0 P   0.0   0.0   0:00.00 watchdog/0                                                                                                             
     17 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/0                                                                                                                
     18 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/1                                                                                                                
     19 root      20   0       0      0      0 P   0.0   0.0   0:00.00 watchdog/1                                                                                                             
     20 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/1                                                                                                            
     21 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 posixcputmr/1                                                                                                          
     22 root      -5   0       0      0      0 S   0.0   0.0   0:00.00 rcuc/1                                                                                                                 
     23 root      -3   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/1                                                                                                            
     24 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0-mm_percpu_wq                                                                                               
     25 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0H     


[heat-admin@overcloud-computerealtime-0 ~]$ sosreport --batch

sosreport (version 3.8)

WARNING: unable to set option for disabled or non-existing plugin (process)
WARNING: unable to set option for disabled or non-existing plugin (process)


This command will collect diagnostic and configuration information from
this Red Hat Enterprise Linux system and installed applications.

An archive containing the collected information will be generated in
/var/tmp/sos.avq7srox and may be provided to a Red Hat support
representative.

Any information provided to Red Hat will be treated in accordance with
the published support policies at:

  https://access.redhat.com/support/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.


 Setting up archive ...
 Setting up plugins ...
Not all environment variables set. Source the environment file for the user intended to connect to the OpenStack environment.
 Running plugins. Please wait ...

  Starting 1/1   openstack_aodh  [Running: openstack_aodh]

  Finished running plugins

Creating compressed archive...

Your sosreport has been generated and saved in:
  /var/tmp/sosreport-overcloud-computerealtime-0-2021-03-31-zjlcmzb.tar.xz

The checksum is: 57022b4d60d5eb108064510ed11cf809

Please send this file to your support representative.

Comment 30 Pei Zhang 2021-03-31 14:41:09 UTC
Created attachment 1768106 [details]
VM XML in OSP 16.1

Comment 31 Nitesh Narayan Lal 2021-03-31 14:46:44 UTC
Thank you Pei for testing.

Pradipta, since Pei is unable to reproduce the issue, I am wondering whether
some misconfiguration is causing this issue in your environment.

Can you please share your KVM-RT XML, so that we can compare it with Pei's
configuration?

Thanks

Comment 38 Nitesh Narayan Lal 2021-04-15 17:09:50 UTC
Thanks, Pei, for reproducing the issue and for sharing the environment.

So, as I suspected, this is a typical case of starvation caused by the testpmd
application, which occupies 100% of a guest vCPU running with SCHED_FIFO.
Although I am not fully aware of all the actions performed by the sosreport
command, I would assume it involves syscalls, kernel workers, etc. that can be
blocked/starved when they try to run on isolated CPUs occupied by the testpmd
application.
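
A hedged way to check that picture on the compute node (a sketch; it assumes a single qemu-kvm instance, and the sysrq dump needs root with kernel.sysrq allowing 'w'):

~~~
# Sketch: list qemu-kvm threads with scheduling class, RT priority and the
# CPU they last ran on; the vCPU threads ("CPU N/KVM") should show class FF
# (SCHED_FIFO) on the isolated CPUs.
ps -eLo pid,tid,class,rtprio,psr,comm | grep -E 'qemu|KVM'

# Sketch: dump blocked (uninterruptible) tasks to dmesg; this is the same
# hung-task style information quoted below.
echo w > /proc/sysrq-trigger
dmesg | tail -n 60
~~~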

A piece of clear evidence for this is in the dmesg on both non-debug
and the debug rt-kernel:


Non-debug-kernel:
----------------
[169350.668608] INFO: task sysctl:2345101 blocked for more than 600 seconds.
[169350.676180]       Not tainted 4.18.0-304.rt7.71.el8.x86_64 #1
[169350.682686] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[169350.691519] sysctl          D    0 2345101 2345099 0x00084084
[169350.698026] Call Trace:
[169350.700850]  __schedule+0x34d/0x880
[169350.704838]  schedule+0x56/0xe0
[169350.708437]  schedule_timeout+0x250/0x400
[169350.713005]  ? preempt_count_add+0x68/0xa0
[169350.717670]  ? _raw_spin_lock+0x13/0x40
[169350.722043]  ? _raw_spin_unlock_irqrestore+0x20/0x60
[169350.727679]  wait_for_completion+0x85/0xe0
[169350.732344]  flush_work+0x123/0x1d0
[169350.736331]  ? flush_workqueue_prep_pwqs+0x130/0x130
[169350.741966]  ? __queue_work+0x1aa/0x590
[169350.746340]  schedule_on_each_cpu+0xa7/0xe0
[169350.751103]  vmstat_refresh+0x21/0xa0
[169350.755283]  proc_sys_call_handler+0x1ad/0x1c0
[169350.760336]  vfs_read+0x91/0x140
[169350.764031]  ksys_read+0x52/0xc0
[169350.767728]  do_syscall_64+0x87/0x1a0
[169350.771908]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[169350.777639] RIP: 0033:0x7f900bca85a5
[169350.781723] Code: Unable to access opcode bytes at RIP 0x7f900bca857b.
[169350.789102] RSP: 002b:00007fff3275fc18 EFLAGS: 00000246 ORIG_RAX: 0000000000000000


Debug-kernel:
------------
[  840.792750] Using feature eBPF/rawtrace.
[  941.338161] megaraid_sas 0000:01:00.0: Firmware crash dump is not available
[  945.692390] megaraid_sas 0000:01:00.0: Firmware crash dump is not available
[ 1872.664007] INFO: task sosreport:7544 blocked for more than 600 seconds.
[ 1872.671491]       Tainted: G        W        --------- -  - 4.18.0-304.rt7.71.el8.x86_64+debug #1
[ 1872.681393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1872.690132] sosreport       D23504  7544   2765 0x00084080
[ 1872.696266] Call Trace:
[ 1872.699004]  __schedule+0x91d/0x2320
[ 1872.702999]  ? __sched_text_start+0x8/0x8
[ 1872.707477]  ? kernel_text_address+0x125/0x140
[ 1872.712437]  ? __kernel_text_address+0xe/0x30
[ 1872.717301]  ? unwind_get_return_address+0x56/0xa0
[ 1872.722642]  ? stack_trace_save+0xb0/0xb0
[ 1872.727179]  schedule+0x101/0x290
[ 1872.730882]  schedule_timeout+0x6f6/0xc90
[ 1872.735360]  ? usleep_range+0x130/0x130
[ 1872.739645]  ? do_profile_hits.isra.5.cold.13+0x2d/0x2d
[ 1872.745482]  ? lock_release+0x4a1/0xb60
[ 1872.749763]  ? lock_release+0x4a1/0xb60
....



The simple solution here is to use stalld, as I suggested in comment 27.
Stalld allows runnable tasks that are waiting/blocked behind a thread occupying
100% of a CPU to make progress with a small SCHED_DEADLINE runtime/period.

We can see stalld boosting the tasks in its logs:

Apr 15 13:06:11 dell-per430-11.lab.eng.pek2.redhat.com stalld[12425]:          cpu 14 had 301682 idle time, and now has 302182
Apr 15 13:06:11 dell-per430-11.lab.eng.pek2.redhat.com stalld[12425]:          cpu 15 had 301640 idle time, and now has 302140
Apr 15 13:06:11 dell-per430-11.lab.eng.pek2.redhat.com stalld[12425]:          cpu 16 had 294651 idle time, and now has 295142
Apr 15 13:06:11 dell-per430-11.lab.eng.pek2.redhat.com stalld[12425]:          cpu 17 had 301678 idle time, and now has 302179
Apr 15 13:06:11 dell-per430-11.lab.eng.pek2.redhat.com stalld[12425]:          cpu 18 had 299678 idle time, and now has 300177
Apr 15 13:06:11 dell-per430-11.lab.eng.pek2.redhat.com stalld[12425]:          cpu 19 had 301677 idle time, and now has 302177
Apr 15 13:06:13 dell-per430-11.lab.eng.pek2.redhat.com stalld[12425]: stalld: turbostat-21492 starved on CPU 1 for 33 seconds
Apr 15 13:06:13 dell-per430-11.lab.eng.pek2.redhat.com stalld[12425]: stalld: boosted pid 21492 using SCHED_DEADLINE
Apr 15 13:06:13 dell-per430-11.lab.eng.pek2.redhat.com stalld[12425]: turbostat-21492 starved on CPU 1 for 33 seconds
Apr 15 13:06:13 dell-per430-11.lab.eng.pek2.redhat.com stalld[12425]: boosted pid 21492 using SCHED_DEADLINE


Note: CPU 1 is the CPU to which the guest vCPU running the testpmd thread is pinned.

With stalld, I didn't see a hang in Pei's environment in 5/5 runs.

@Pradipta can you please try to reproduce the issue in your environment with stalld?

Comment 39 Nitesh Narayan Lal 2021-04-15 17:13:16 UTC
Pradipta to install and enable stalld on 8.4, please do the following:

# yum install -y stalld

# systemctl enable stalld.service

# systemctl start stalld.service

Then try running the testpmd and sosreport --batch command.

Thanks

Comment 42 Nitesh Narayan Lal 2021-04-16 22:43:28 UTC
As per comment 16 from Pradipta, this issue is not a blocker.
Hence, reducing the priority and severity of the Bug.

I will still be looking into it to identify the exact process/thread
that is trying to enqueue jobs on the isolated CPUs, resulting in
this hang, and why it happens only when we are using a PF.

Thanks

Comment 43 Marcelo Tosatti 2021-04-16 23:36:53 UTC
(In reply to Nitesh Narayan Lal from comment #42)
> As per comment 16 from Pradipta, this issue is not a blocker.
> Hence, reducing the priority and severity of the Bug.
> 
> I will still be looking into it to identify the exact process/thread
> that is trying to enqueue the jobs on the isolated CPUs resulting in
> this hang, that too only when we are using PF.
> 
> Thanks

Nitesh, 

I think it's easy to reproduce the problem as follows:

1) Boot host (or guest VM on host without realtime...) with isolcpus=CPULIST.
2) Run oslat on some CPU(s) of CPULIST, with FIFO priority.
3) echo 1 > /proc/sys/vm/stat_refresh

(no need to investigate the PF/non-PF issue, IMHO).

Agree it's low priority.
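
A rough translation of those steps into commands (a sketch; the CPU number and duration are placeholders, oslat comes from the rt-tests package, and its option spellings may vary by version):

~~~
# 1) The host (or guest) is assumed to be booted with isolcpus=6-35, as in
#    the cmdlines earlier in this bug.
# 2) Run oslat pinned to one isolated CPU under SCHED_FIFO priority 1.
chrt -f 1 oslat -c 6 -D 600 &
# 3) Force a vmstat refresh; this schedules work on every CPU and waits for
#    it, which is the path seen hung in the traces in comment 38.
echo 1 > /proc/sys/vm/stat_refresh
~~~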

Comment 44 Nitesh Narayan Lal 2021-04-17 02:43:11 UTC
(In reply to Marcelo Tosatti from comment #43)
> (In reply to Nitesh Narayan Lal from comment #42)
> > As per comment 16 from Pradipta, this issue is not a blocker.
> > Hence, reducing the priority and severity of the Bug.
> > 
> > I will still be looking into it to identify the exact process/thread
> > that is trying to enqueue the jobs on the isolated CPUs resulting in
> > this hang, that too only when we are using PF.
> > 
> > Thanks
> 
> Nitesh, 
> 
> I think its easy to reproduce the problem as follows:
> 
> 1) Boot host (or guest VM on host without realtime...) with isolcpus=CPULIST.
> 2) Run oslat on some CPU(s) of CPULIST, with FIFO priority.
> 3) echo 1 > /proc/sys/vm/stat_refresh
> 

Agree, vmstat_refresh is pretty easy to reproduce.

> (no need to investigate the PF/non-PF issue, IMHO).

I think it is good to at least understand what is happening there with
PF and without PF so that we don't ignore any issue.

Comment 48 Pradipta Kumar Sahoo 2021-04-23 04:02:44 UTC
(In reply to Nitesh Narayan Lal from comment #39)
> Pradipta to install and enable stalld on 8.4, please do the following:
> 
> # yum install -y stalld
> 
> # systemctl enable stalld.service
> 
> # systemctl start stalld.service
> 
> Then try running the testpmd and sosreport --batch command.
> 
> Thanks

HI Nitesh/Pei,

Thank you so much for managing to reproduce the issue. As my environment is broken due to a network issue, please allow me some time to bring it up. I will install the stalld service and rerun sosreport during the workload. I will update you with my findings soon.

BR,
Pradipta

Comment 49 Pradipta Kumar Sahoo 2021-05-06 05:49:59 UTC
Hi Nitesh,

Installed the stalld package and started the service on the OSP compute node.

Intermittently reproduced the issue in scenario 2.

Scenario 1:
1. Run testpmd inside the rt-guest VM.
2. Run "sosreport --batch" on the compute node.
3. It took ~4-5 min to finish all the sos plugins and the sosreport was generated.


Scenario 2:
1. Run testpmd inside the rt-guest VM and pass traffic through it.
2. Run "sosreport --batch --all-logs" on the compute node.
3. The 1st sample test failed with a processor plugin timeout and the SSH connection broke.

~~~
  Starting 129/129 yum             [Running: logs processor xfs yum]
  Finishing plugins              [Running: logs processor yum]
  Finishing plugins              [Running: logs processor]
  Finishing plugins              [Running: processor]

 Plugin processor timed out


client_loop: send disconnect: Broken pipe
~~~

4. In the 2nd sample we still noticed the processor plugin timeout, but this time it created the sosreport archive.
~~~
  Starting 129/129 yum             [Running: logs processor systemd yum]
  Finishing plugins              [Running: logs processor yum]
  Finishing plugins              [Running: logs processor]
  Finishing plugins              [Running: processor]

 Plugin processor timed out


Creating compressed archive...

Your sosreport has been generated and saved in:
        /var/tmp/sosreport-nfv-compute-rt-offload-0-2021-05-06-duotxba.tar.xz

 Size   556.60MiB
 Owner  root
 md5    077b195b6e2072f87a2353bd23ea4aa7

Please send this file to your support representative.
~~~

5. The stalld log shows the starved tasks being boosted, and I didn't see any issue there. But is there any further tuning required in stalld to avoid the plugin timeout in sosreport?
~~~
May  6 05:41:42 nfv-compute-rt-offload-0 stalld[838604]: turbostat-885072 starved on CPU 8 for 30 seconds
May  6 05:41:42 nfv-compute-rt-offload-0 stalld[838604]: boosted pid 885072 using SCHED_DEADLINE
May  6 05:41:45 nfv-compute-rt-offload-0 stalld[838604]: turbostat-892562 starved on CPU 8 for 33 seconds
May  6 05:41:45 nfv-compute-rt-offload-0 stalld[838604]: boosted pid 892562 using SCHED_DEADLINE
May  6 05:41:48 nfv-compute-rt-offload-0 stalld[838604]: turbostat-897929 starved on CPU 8 for 36 seconds
May  6 05:41:48 nfv-compute-rt-offload-0 stalld[838604]: boosted pid 897929 using SCHED_DEADLINE
May  6 05:42:07 nfv-compute-rt-offload-0 stalld[838604]: turbostat-897929 might starve on CPU 34 (waiting for 15 seconds)
May  6 05:42:22 nfv-compute-rt-offload-0 stalld[838604]: turbostat-897929 starved on CPU 34 for 30 seconds
May  6 05:42:22 nfv-compute-rt-offload-0 stalld[838604]: boosted pid 897929 using SCHED_DEADLINE
~~~

Comment 50 Nitesh Narayan Lal 2021-05-06 13:53:58 UTC
(In reply to Pradipta Kumar Sahoo from comment #49)
> Hi Nitesh,
> 

Hi Pradipta,

Thank you for testing.

> Installed stalld package and start the service in OSP compute node. 
> 
> Interminetly reproduced the issue in scenario2.
> 

<snip>

> Your sosreport has been generated and saved in:
>         /var/tmp/sosreport-nfv-compute-rt-offload-0-2021-05-06-duotxba.tar.xz
> 
>  Size   556.60MiB
>  Owner  root
>  md5    077b195b6e2072f87a2353bd23ea4aa7
> 
> Please send this file to your support representative.
> ~~~
> 
> 5. Stalld log starve the CPU which I didn't find any issue. But is there any
> further tunning required in stalld to avoid plugin timeout in sosreport.
> ~~~
> May  6 05:41:42 nfv-compute-rt-offload-0 stalld[838604]: turbostat-885072
> starved on CPU 8 for 30 seconds
> May  6 05:41:42 nfv-compute-rt-offload-0 stalld[838604]: boosted pid 885072
> using SCHED_DEADLINE
> May  6 05:41:45 nfv-compute-rt-offload-0 stalld[838604]: turbostat-892562
> starved on CPU 8 for 33 seconds
> May  6 05:41:45 nfv-compute-rt-offload-0 stalld[838604]: boosted pid 892562
> using SCHED_DEADLINE
> May  6 05:41:48 nfv-compute-rt-offload-0 stalld[838604]: turbostat-897929
> starved on CPU 8 for 36 seconds
> May  6 05:41:48 nfv-compute-rt-offload-0 stalld[838604]: boosted pid 897929
> using SCHED_DEADLINE
> May  6 05:42:07 nfv-compute-rt-offload-0 stalld[838604]: turbostat-897929
> might starve on CPU 34 (waiting for 15 seconds)
> May  6 05:42:22 nfv-compute-rt-offload-0 stalld[838604]: turbostat-897929
> starved on CPU 34 for 30 seconds
> May  6 05:42:22 nfv-compute-rt-offload-0 stalld[838604]: boosted pid 897929
> using SCHED_DEADLINE
> ~~~

We can try a few things with the stalld configuration; however, before doing
that, can you tell me what kind of impact you observed on the testpmd
performance numbers with stalld enabled?

Having said that, scenario 2 does indicate that fixing these issues at the source
is actually the better way to do things.

Comment 51 Pradipta Kumar Sahoo 2021-05-26 04:56:09 UTC
Hi Nitesh,

Apology for my late response.

Our performance tool "pbench-trafficgen" usually collects the SOS report from the required nodes before the beginning of the network test, so earlier our tool got stuck while collecting the sosreport of the rt-compute node where pCPUs were reserved for the PMD threads inside the guest.

After enabling stalld on the compute node (DUT), we noticed the trafficgen tool no longer gets stuck at sosreport collection. Also, we received similar performance reports with and without stalld on the compute node.
Please find the results below and share any best practices for using stalld in FDP performance tests with the rt-kernel.

VM(TestPMD) information:
------------------------
# virsh vcpupin instance-00000002
 VCPU   CPU Affinity
----------------------
 0      26
 1      32
 2      22
 3      28
 4      34 --->> PMD Core1
 5      8  --->> PMD Core2
 6      30
 7      10
 8      16
 9      6
 10     12
 11     18
 12     24
 13     14

testpmd -l 3,4,5 -n 4 --huge-dir=/dev/hugepages -w 0000:00:05.0 -w 0000:00:06.0 --socket-mem 10000,0 -- --nb-cores=2 --forward-mode=io --rxd=2048 --txd=2048 --rxq=1 --txq=1 --burst=64 --mbcache=512 -a --rss-udp --disable-crc-strip --stats-period 1



Network Performance Test with Stalld Service:
---------------------------------------------
1. Started the stalld service on the DUT
@nfv-compute-rt-offload-0 ~]# systemctl status stalld
● stalld.service - Stall Monitor
   Loaded: loaded (/usr/lib/systemd/system/stalld.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-05-24 06:04:12 UTC; 1 day 21h ago
 Main PID: 1480 (stalld)
    Tasks: 2 (limit: 768443)
   Memory: 1.6M
   CGroup: /system.slice/stalld.service
           └─1480 /usr/bin/stalld --systemd -p 1000000000 -r 20000 -d 3 -t 30 --foreground --pidfile /run/stalld.pid

2. Ran the trafficgen test with sosreport collection enabled for all the nodes, including the DUT and TestPMD VMs. During the execution, it successfully collected the sosreports before starting the trafficgen test.

3. It looks like the stalld service keeps monitoring and continuously takes actions (turbostat starved, boosted, kworker starved) on the pCPUs that are pinned to the PMD processes inside the guest.

# grep stalld /var/log/messages
May 26 02:52:03 nfv-compute-rt-offload-0 stalld[1480]: turbostat-476138 might starve on CPU 8 (waiting for 15 seconds)
May 26 02:52:18 nfv-compute-rt-offload-0 stalld[1480]: turbostat-476138 starved on CPU 8 for 30 seconds
May 26 02:52:18 nfv-compute-rt-offload-0 stalld[1480]: boosted pid 476138 using SCHED_DEADLINE
May 26 02:52:56 nfv-compute-rt-offload-0 stalld[1480]: turbostat-476138 starved on CPU 8 for 30 seconds
May 26 02:52:56 nfv-compute-rt-offload-0 stalld[1480]: boosted pid 476138 using SCHED_DEADLINE
May 26 02:53:34 nfv-compute-rt-offload-0 stalld[1480]: turbostat-476138 starved on CPU 8 for 30 seconds
May 26 02:53:34 nfv-compute-rt-offload-0 stalld[1480]: boosted pid 476138 using SCHED_DEADLINE
May 26 02:53:52 nfv-compute-rt-offload-0 stalld[1480]: turbostat-479184 starved on CPU 8 for 33 seconds
May 26 02:53:52 nfv-compute-rt-offload-0 stalld[1480]: boosted pid 479184 using SCHED_DEADLINE
May 26 02:54:13 nfv-compute-rt-offload-0 stalld[1480]: turbostat-479184 might starve on CPU 34 (waiting for 15 seconds)
May 26 02:54:15 nfv-compute-rt-offload-0 stalld[1480]: kworker/8:1-719 starved on CPU 8 for 33 seconds
May 26 02:54:15 nfv-compute-rt-offload-0 stalld[1480]: boosted pid 719 using SCHED_DEADLINE
May 26 02:54:18 nfv-compute-rt-offload-0 stalld[1480]: turbostat-476138 starved on CPU 8 for 36 seconds
May 26 02:54:18 nfv-compute-rt-offload-0 stalld[1480]: boosted pid 476138 using SCHED_DEADLINE
May 26 02:54:28 nfv-compute-rt-offload-0 stalld[1480]: turbostat-479184 starved on CPU 34 for 30 seconds
May 26 02:54:28 nfv-compute-rt-offload-0 stalld[1480]: boosted pid 479184 using SCHED_DEADLINE
May 26 02:54:56 nfv-compute-rt-offload-0 stalld[1480]: turbostat-476138 starved on CPU 8 for 30 seconds
May 26 02:54:56 nfv-compute-rt-offload-0 stalld[1480]: boosted pid 476138 using SCHED_DEADLINE

4. We ran the network test with stalld and received a 42Mpps line rate. Please find the result details below for further reference, including the sosreport.
http://pbench.perf.lab.eng.bos.redhat.com/results/perf122.perf.lab.eng.bos.redhat.com/nfv-osp16.2-rhel8.4rt-trial/trafficgen_RHOS-16.2-RHEL-8.4-KERNEL-4.18.0-296.rt7.63.el8.x86_64-OVS-HW-OFFLOAD-1Q-PVP-BS_tg:trex_r:none_fs:64_nf:1024_fm:si_td:bi_ml:0.002_tt:bs_2021-05-26T02:47:07/


Network Performance Test without Stalld Service:
------------------------------------------------
1. Stopped the stalld service on the DUT.
[root@nfv-compute-rt-offload-0 ~]# systemctl status stalld
● stalld.service - Stall Monitor
   Loaded: loaded (/usr/lib/systemd/system/stalld.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2021-05-26 04:03:47 UTC; 3s ago
  Process: 550006 ExecStopPost=/usr/bin/throttlectl on (code=exited, status=0/SUCCESS)
  Process: 1480 ExecStart=/usr/bin/stalld --systemd $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING $FG $PF (code=exited, status=0/SUCCESS)
 Main PID: 1480 (code=exited, status=0/SUCCESS)

2. Ran the trafficgen test with sosreport collection disabled, since the sosreport command is expected to hang while testpmd is running inside the guest.

3. We also received a 42Mpps traffic rate without the stalld service on the DUT.
http://pbench.perf.lab.eng.bos.redhat.com/results/perf122.perf.lab.eng.bos.redhat.com/nfv-osp16.2-rhel8.4rt-trial/trafficgen_RHOS-16.2-RHEL-8.4-KERNEL-4.18.0-296.rt7.63.el8.x86_64-OVS-HW-OFFLOAD-1Q-PVP-BS_tg:trex_r:none_fs:64_nf:1024_fm:si_td:bi_ml:0.002_tt:bs_2021-05-26T04:06:44/

Hence, so far we have not noticed any performance impact from using the stalld service on the rt-compute node.


Regards,
Pradipta

Comment 52 Nitesh Narayan Lal 2021-05-26 13:54:01 UTC
Hi Pradipta,

Thank you for sharing a detailed report.

Yes, stalld remains active in the background. Whenever it finds a
starving thread in the run queue, it allows that thread to make progress
by giving it a few time slices as per the configuration in /etc/sysconfig/stalld.

By default, the boost run time (BR) is set to 10us.
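
(A hedged aside: one way to see the values actually in effect on a node is to dump the unit file and the sysconfig file it reads, e.g.:)

~~~
# Sketch: the unit's ExecStart expands $BR and friends from the sysconfig file.
systemctl cat stalld.service
grep -v '^#' /etc/sysconfig/stalld
~~~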

Ideally, we would want these issues to be fixed in the kernel so that these
threads never try to run anything on isolated CPUs in the first place. This is
because having stalld enabled for latency-sensitive workload adds overhead.

Marcelo is already working on one of the issues, but it will take time to first
upstream it and then get that fix downstream. In the meantime, for this
particular use case, I think we can use stalld as an acceptable solution,
specifically because we are not seeing any impact on the performance.

@Pradipta do you agree with the above?

In terms of best practices, I think stalld configuration may change for
different workloads. But as long as the default configuration is not
impacting the results we should stick with it (IMHO).

@Clark do you have any suggestion about the best practices while enabling
stalld for FDP performance test with rt-kernel?


Thanks
Nitesh

Comment 53 Clark Williams 2021-05-27 15:29:51 UTC
(In reply to Nitesh Narayan Lal from comment #52)
> 
> @Clark do you have any suggestion about the best practices while enabling
> stalld for FDP performance test with rt-kernel?

Nitesh, my only concern currently is that dropping the boost runtime to 10us may
push us into a diminishing returns situation, where the overhead of switching
cuts into the runtime so that the boosted task only gets a few cycles to make
progress. A boosted thread gets 3 boosts, so with the current settings that means
10us every second for 30us total boost time. If the thread doesn't complete its
work in that period it will be stalled again and have to repeat the detection
and boost phases from stalld.

I'd say monitor the stalld activity and if we see it boosting the same thread all
the time, change the boost runtime to 15us and see if that helps.

Comment 54 Nitesh Narayan Lal 2021-05-27 16:09:02 UTC
(In reply to Clark Williams from comment #53)
> (In reply to Nitesh Narayan Lal from comment #52)
> > 
> > @Clark do you have any suggestion about the best practices while enabling
> > stalld for FDP performance test with rt-kernel?
> 
> Nitesh, my only concern currently is that dropping the boost runtime to 10us
> may
> push us into a diminishing returns situation, where the overhead of switching
> cuts into the runtime so that the boosted task only gets a few cycles to make
> progress. A boosted thread gets 3 boosts, so with the current settings that
> means
> 10us every second for 30us total boost time. If the thread doesn't complete
> it's
> work in that period it will be stalled again and have to repeat the detection
> and boost phases from stalld. 
> 
> I'd say monitor the stalld activity and if we see it boosting the same
> thread all
> the time, change the boost runtime to 15us and see if that helps.

Ah, that's an excellent point, and looking at the log it does look like
stalld is boosting turbostat multiple times.

@Pradipta, what we can try is to set the BR value in /etc/sysconfig/stalld
to 15000ns and see if that helps in reducing the number of times stalld is
boosting the turbostat process.
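
A hedged sketch of that change (the BR="-r <ns>" format is an assumption based on the unit file shown in comment 51, where $BR is passed straight to stalld's command line):

~~~
# Sketch: raise the boost runtime to 15000 ns in /etc/sysconfig/stalld and
# restart stalld so the new value takes effect.
# (assumes the sysconfig file already carries a BR= line)
sed -i 's/^BR=.*/BR="-r 15000"/' /etc/sysconfig/stalld
systemctl restart stalld.service
~~~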

Also, please see if there is a performance degradation because of this in
your test-case.

There are a few possibilities here:
- If there is no performance degradation with 15us BR and the starving
  threads are not getting boosted multiple times then we can use that
  stalld configuration in your testing.

- If somehow there is a performance impact because of it then perhaps we
  will have to stick with 10us

- If there is no performance degradation but the threads are still getting
  boosted multiple times, I would say to also try with 20us (20000ns) while
  you are at it.

Please let me know what you think.

Thanks

Comment 56 Pradipta Kumar Sahoo 2021-06-03 11:43:17 UTC
Hi Nitesh/Clark,


Regarding dropping the boost time to 10us (highlighted in comment #53), I believe it is important for us to revalidate the traffic test with rt-kvm using the stalld service.
In my last test, I executed the traffic test for a 10-minute duration, which is not sufficient to validate traffic performance with rt-kvm using stalld.

Also, I recently upgraded my lab environment with the latest 8.4 kernel and have yet to redeploy the environment with the latest rt-kernel.
Due to other time-bound activities, we will re-execute the test as soon as the lab is upgraded with the latest rt-kernel. Please let me know if there is a specific stable rt-kernel and stalld version we should use for the test, so I can customise the image accordingly.

Also, I will sync up with Andrew and our NFV DFG on whether the stalld service is recommended for use in OpenStack rt-kvm or not. Please bear with me for some time and I will update you soon.

-- Pradipta

Comment 57 Nitesh Narayan Lal 2021-06-03 13:42:35 UTC
Hi Pradipta,

Thanks, and please take your time in performing the tests. Just a note: we
are actually increasing the BR from 10us to 15us.

It is a good idea to discuss this with Andrew, but please note having stalld
enabled in general for all KVM-RT deployments is **not recommended** yet as
it incurs latency overhead. The enablement is a work in progress as a part
of Bug 1905799.

This is precisely why we are monitoring the impact on the performance of the
workload that you are testing. 

Thanks

Comment 58 Nitesh Narayan Lal 2021-06-16 21:16:09 UTC
Hi Pradipta,

Just wanted to follow up to see if you got a chance to try the experiments Clark suggested.

Thanks
Nitesh

Comment 68 Nitesh Narayan Lal 2021-07-20 18:44:17 UTC
I had a discussion with Luiz today about this issue and we have concluded
the following:

There are two ways to fix this issue:

1. With the kernel fixes that Marcelo is working on

2. By enabling stalld in KVM-RT environments

Getting the kernel fixes merged upstream and then getting them backported
downstream is going to take some time, so a more straightforward way to fix
this issue, at least for the time being, is via stalld.

The work for enabling stalld for the KVM-RT environment is still underway
as a part of BZ 1905799, and once that completes, the issue reported
in this BZ will also be resolved.

Hence, I am closing this bug as a DUP of BZ 1905799.
Please feel free to comment if there are any concerns.

Thanks

*** This bug has been marked as a duplicate of bug 1905799 ***

Comment 69 Marcelo Tosatti 2021-07-23 12:27:09 UTC
Reopening to track turbostat changes.

Comment 72 RHEL Program Management 2022-01-23 07:26:58 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 73 Pei Zhang 2022-03-01 08:52:43 UTC
Hello Luiz,

As Nitesh explained in Comment 68, there are 2 ways to solve this bug. 

Seems item 2(BZ 1905799 has been closed as WONTFIX) is not a valid solution now.

Do we still plan to do item 1? Do we plan to fix this bug? What do you think?

Thanks.

Best regards,

Pei

Comment 74 Luiz Capitulino 2022-03-01 15:59:54 UTC
(In reply to Pei Zhang from comment #73)
> Hello Luiz,
> 
> As Nitesh explained in Comment 68, there are 2 ways to solve this bug. 
> 
> Seems item 2(BZ 1905799 has been closed as WONTFIX) is not a valid solution
> now.
> 
> Do we still plan to do item 1? Do we plan to fix this bug? What do you think?

@Nitesh, can you remind us which of Marcelo's fixes you are referring to in comment 68?

My recollection from this BZ is that we deprioritized it since there are possible workarounds.

Comment 75 Nitesh Narayan Lal 2022-03-01 16:18:19 UTC
(In reply to Luiz Capitulino from comment #74)
> (In reply to Pei Zhang from comment #73)
> > Hello Luiz,
> > 
> > As Nitesh explained in Comment 68, there are 2 ways to solve this bug. 
> > 
> > Seems item 2(BZ 1905799 has been closed as WONTFIX) is not a valid solution
> > now.
> > 
> > Do we still plan to do item 1? Do we plan to fix this bug? What do you think?
> 
> @Nitesh, can you remind us what's Marcelo fixes you are referring in comment
> 68?

It is the task isolation work that Marcelo is doing upstream.

Comment 76 Luiz Capitulino 2022-03-01 19:52:31 UTC
(In reply to Nitesh Narayan Lal from comment #75)
> (In reply to Luiz Capitulino from comment #74)
> > (In reply to Pei Zhang from comment #73)
> > > Hello Luiz,
> > > 
> > > As Nitesh explained in Comment 68, there are 2 ways to solve this bug. 
> > > 
> > > Seems item 2(BZ 1905799 has been closed as WONTFIX) is not a valid solution
> > > now.
> > > 
> > > Do we still plan to do item 1? Do we plan to fix this bug? What do you think?
> > 
> > @Nitesh, can you remind us what's Marcelo fixes you are referring in comment
> > 68?
> 
> It is the task isolation that Marcelo is working upstream.

OK, so that's long term.

Pei, I'll stick to the deprioritization of this BZ. Maybe Marcelo can comment once he's back from PTO.

