Bug 1684745
| Summary: | VM hangs on RHEL rt-kernel and OSP 13 | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Yichen Wang <yicwang> |
| Component: | kernel-rt | Assignee: | Daniel Bristot de Oliveira <daolivei> |
| kernel-rt sub component: | KVM | QA Contact: | Pei Zhang <pezhang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | atheurer, bhu, chayang, chhudson, daolivei, dhoward, egallen, hhuang, ikulkarn, jianzzha, jraju, juzhang, knoel, krister, lcapitulino, lgoncalv, mmilgram, mtosatti, pagupta, pbonzini, peterx, pezhang, rkrcmar, rt-maint, snagar, virt-maint, vkuznets, williams, wlehman, yicwang |
| Version: | 7.5 | Keywords: | Regression, ZStream |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | 3.10.0-1019.rt56.978 | Doc Type: | Bug Fix |
| Doc Text: | VMs that use real-time priority for their vCPUs on the kernel-rt cannot boot: the VM starts but does not finish booting. The problem was a regression introduced by a patch that made the SynIC timer update happen on guest entry only. It was fixed by making the SynIC timer update conditional. | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| | 1687556 1688673 (view as bug list) | Environment: | |
| Last Closed: | 2019-08-06 12:36:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1655694, 1672377, 1687556, 1688673, 1707454 | | |
Description
Yichen Wang
2019-03-02 03:11:14 UTC
Yichen,

I'm under the impression that the VM's threads (vCPUs and/or iothreads) are deadlocking. You may be right that this could be a kernel issue, but before debugging the kernel we have to double-check the host configuration.

Are you using the realtime-virtual-host profile in the host and the realtime-virtual-guest profile in the guest? What are the contents of /etc/tuned/realtime-virtual-host-variables.conf in the host and /etc/tuned/realtime-virtual-guest-variables.conf in the guest?

Erwan, Andrew, can you help double-check the OSP configuration?

Adding Jianzhu to also help review the OSP configuration.

Yichen, for the flavor setting, rather than using hw:cpu_realtime_mask="^0", say if you have 8 vCPUs, can you use hw:cpu_realtime_mask="0-7" instead? Thanks

(In reply to Luiz Capitulino from comment #2)
> Yichen,
>
> I'm under the impression that the VM's threads (vCPUs and/or iothreads) are
> deadlocking. You may be right that this could be a kernel issue, but before
> debugging the kernel we have to double-check the host configuration.
>
> Are you using the realtime-virtual-host profile in the host and the
> realtime-virtual-guest profile in the guest? What are the contents of
> /etc/tuned/realtime-virtual-host-variables.conf in the host and
> /etc/tuned/realtime-virtual-guest-variables.conf in the guest?

We tried both with and without tuned, same result. When using tuned, we are using realtime-virtual-host, and the contents are just one line with "isolated_cores=1-19,21-39". Also, this is purely a host-level thing, so I don't think it has anything to do with the guest.

> Erwan, Andrew, can you help double-check the OSP configuration?

(In reply to jianzzha from comment #4)
> Yichen, for the flavor setting, rather than using hw:cpu_realtime_mask="^0",
> say if you have 8 vCPUs, can you use hw:cpu_realtime_mask="0-7" instead?
> Thanks

If I am understanding correctly, by doing 0-8 I am making all my vCPU worker threads RT. By doing this, I see the VM comes up fine! But why do I need all my VM cores to be RT? That is a bug, right?
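As a side note, the effect of the flavor mask can be seen directly in the libvirt domain that Nova generates: with hw:cpu_realtime_mask="^0", only vCPUs 1-8 get a fifo <vcpusched> entry and vCPU0 stays SCHED_OTHER. A minimal check on the compute host, assuming the instance name instance-0000002c taken from the XML dumped later in this bug:

# Show which vCPUs were given real-time (fifo) scheduling by Nova/libvirt.
# With hw:cpu_realtime_mask="^0" there is no <vcpusched> entry for vcpus='0'.
virsh dumpxml instance-0000002c | grep vcpusched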
(In reply to Yichen Wang from comment #5)
> (In reply to Luiz Capitulino from comment #2)
> > Yichen,
> >
> > I'm under the impression that the VM's threads (vCPUs and/or iothreads) are
> > deadlocking. You may be right that this could be a kernel issue, but before
> > debugging the kernel we have to double-check the host configuration.
> >
> > Are you using the realtime-virtual-host profile in the host and the
> > realtime-virtual-guest profile in the guest? What are the contents of
> > /etc/tuned/realtime-virtual-host-variables.conf in the host and
> > /etc/tuned/realtime-virtual-guest-variables.conf in the guest?
>
> We tried both with and without tuned, same result. When using tuned, we are
> using realtime-virtual-host, and the contents are just one line with
> "isolated_cores=1-19,21-39". Also, this is purely a host-level thing, so I
> don't think it has anything to do with the guest.

OK, the configuration looks correct. But there are two important points to mention. The first one is, even if the tuned profile didn't make an apparent difference, it is still required, as we discussed in bug 1678810. The second one is, when you get a guest hang as you did, we don't know right away whether it's a host or a guest issue. Actually, in this case it seems that it indeed was a host issue (more on this below).

> > Erwan, Andrew, can you help double-check the OSP configuration?
>
> (In reply to jianzzha from comment #4)
> > Yichen, for the flavor setting, rather than using hw:cpu_realtime_mask="^0",
> > say if you have 8 vCPUs, can you use hw:cpu_realtime_mask="0-7" instead?
> > Thanks
>
> If I am understanding correctly, by doing 0-8 I am making all my vCPU worker
> threads RT. By doing this, I see the VM comes up fine! But why do I need all
> my VM cores to be RT? That is a bug, right?

It is not a bug. All VM vCPU threads need to be RT so that they are scheduled to run when they have to. Even if the other vCPUs are only spinning, the housekeeping vCPU0 might hold kernel locks they need, so it has to be able to run when required or it will delay the other vCPU threads.

Btw, if the issue is now fixed, may we close this BZ?
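A quick way to verify on the host which vCPU threads actually ended up with real-time priority is to inspect the scheduling class of the qemu-kvm threads. This is only a sketch: it assumes a single qemu-kvm process on the host, and the TID passed to chrt is a placeholder.

# CLS shows the scheduling class per thread: TS = SCHED_OTHER, FF = SCHED_FIFO.
ps -T -p "$(pidof qemu-kvm)" -o tid,class,rtprio,psr,comm

# For a quick manual experiment, a vCPU thread that is still TS can be
# switched to fifo:1 by hand (12345 is a placeholder thread ID):
chrt -f -p 1 12345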
(In reply to Luiz Capitulino from comment #6)
> (In reply to Yichen Wang from comment #5)
> > (In reply to Luiz Capitulino from comment #2)
> > > Yichen,
> > >
> > > I'm under the impression that the VM's threads (vCPUs and/or iothreads)
> > > are deadlocking. You may be right that this could be a kernel issue, but
> > > before debugging the kernel we have to double-check the host configuration.
> > >
> > > Are you using the realtime-virtual-host profile in the host and the
> > > realtime-virtual-guest profile in the guest? What are the contents of
> > > /etc/tuned/realtime-virtual-host-variables.conf in the host and
> > > /etc/tuned/realtime-virtual-guest-variables.conf in the guest?
> >
> > We tried both with and without tuned, same result. When using tuned, we are
> > using realtime-virtual-host, and the contents are just one line with
> > "isolated_cores=1-19,21-39". Also, this is purely a host-level thing, so I
> > don't think it has anything to do with the guest.
>
> OK, the configuration looks correct. But there are two important points to
> mention. The first one is, even if the tuned profile didn't make an apparent
> difference, it is still required, as we discussed in bug 1678810. The second
> one is, when you get a guest hang as you did, we don't know right away
> whether it's a host or a guest issue. Actually, in this case it seems that it
> indeed was a host issue (more on this below).
>
> > > Erwan, Andrew, can you help double-check the OSP configuration?
> >
> > (In reply to jianzzha from comment #4)
> > > Yichen, for the flavor setting, rather than using
> > > hw:cpu_realtime_mask="^0", say if you have 8 vCPUs, can you use
> > > hw:cpu_realtime_mask="0-7" instead? Thanks
> >
> > If I am understanding correctly, by doing 0-8 I am making all my vCPU worker
> > threads RT. By doing this, I see the VM comes up fine! But why do I need all
> > my VM cores to be RT? That is a bug, right?
>
> It is not a bug. All VM vCPU threads need to be RT so that they are scheduled
> to run when they have to. Even if the other vCPUs are only spinning, the
> housekeeping vCPU0 might hold kernel locks they need, so it has to be able to
> run when required or it will delay the other vCPU threads.

OK, if the above is true, this is a behavior change, if you don't want to call it a regression or a bug, compared to an older working kernel-rt version. Also, if what you said is true, there is no point in having "hw:cpu_realtime_mask", as it will always be all vCPUs.

> Btw, if the issue is now fixed, may we close this BZ?

I would want to get better clarification on this, as this change is relatively new. We have to handle the existing VMs during infrastructure updates, and there will be potential breakage if existing VMs are already configured with "hw:cpu_realtime_mask=^0". I am still not very convinced that this is not a bug; please help me understand the logic behind why the new kernel-rt changed this behavior. Thanks very much!

Yichen,

As I explained, per the required KVM-RT configuration, all VM vCPUs must have real-time priority. By setting vCPU0 without real-time priority, you will run into spikes, since that vCPU may be holding a lock shared with other vCPUs, which in turn may cause the other vCPUs to spin on the lock. Also, if you can't reproduce the hang when all vCPUs have real-time priority, then there's no regression in KVM-RT, since that's the required configuration.

Having said that, I can try to confirm whether or not there's a behavior change for the non-KVM-RT configuration you seem to be using. Could you please provide the following information for the configuration that reproduces the bug:

1. Host's lscpu and a list of isolated cores in the host
2. The guest's XML file
3. The kernel version that works and the one that doesn't work

Also, for item 3, I assume you change the kernel version in the host, right?

(In reply to Luiz Capitulino from comment #8)
> Yichen,
>
> As I explained, per the required KVM-RT configuration, all VM vCPUs must
> have real-time priority. By setting vCPU0 without real-time priority, you
> will run into spikes, since that vCPU may be holding a lock shared with
> other vCPUs, which in turn may cause the other vCPUs to spin on the lock.
> Also, if you can't reproduce the hang when all vCPUs have real-time priority,
> then there's no regression in KVM-RT, since that's the required configuration.

Ok, what you said above is new to me; I don't find any documentation saying that all my VM cores have to be RT. If you Google it, people are all using the mask. Also, https://docs.openstack.org/nova/rocky/user/flavors.html makes no mention of this either. Same points:

(1) If all VM cores need to be RT, what is the point of hw:cpu_realtime_mask?
(2) Why does it work on the previous kernel version but not on the new one? Without proper documentation I would mark it as a behavior change or regression. This leads to my third point.
(3) For customers running VMs with hw:cpu_realtime_mask=^0: after a kernel update and a VM reboot, the VM will never come up. This is a blocker for people who want to update kernels. Unless we patch the libvirt XML, I don't see any better solution. Please advise.

> Having said that, I can try to confirm whether or not there's a behavior
> change for the non-KVM-RT configuration you seem to be using. Could you
> please provide the following information for the configuration that
> reproduces the bug:
>
> 1.
Host's lscpu and a list of isolated cores in the host [root@quincy-compute-2 ~]# lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 40 On-line CPU(s) list: 0-39 Thread(s) per core: 1 Core(s) per socket: 20 Socket(s): 2 NUMA node(s): 2 Vendor ID: GenuineIntel CPU family: 6 Model: 85 Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz Stepping: 4 CPU MHz: 1601.000 CPU max MHz: 1601.0000 CPU min MHz: 1000.0000 BogoMIPS: 3200.00 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 1024K L3 cache: 28160K NUMA node0 CPU(s): 0-19 NUMA node1 CPU(s): 20-39 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke spec_ctrl intel_stibp flush_l1d isolated cores are: 1-19, 21-39 > 2. The guest's XML file # virsh dumpxml 6 <domain type='kvm' id='6'> <name>instance-0000002c</name> <uuid>50210f2e-4ff9-4d95-ac56-655db6b84631</uuid> <metadata> <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0"> <nova:package version="17.0.7-5.cisco.2.el7"/> <nova:name>myvm</nova:name> <nova:creationTime>2019-03-05 20:04:00</nova:creationTime> <nova:flavor name="m1.medium.noht"> <nova:memory>4096</nova:memory> <nova:disk>40</nova:disk> <nova:swap>0</nova:swap> <nova:ephemeral>0</nova:ephemeral> <nova:vcpus>9</nova:vcpus> </nova:flavor> <nova:owner> <nova:user uuid="a7bc657862cb40728afc943c8aa57e3e">admin</nova:user> <nova:project uuid="88316df5962845d58ac840f0e46869af">admin</nova:project> </nova:owner> <nova:root type="image" uuid="dcf6bdf2-cef6-487a-87f3-72657b86680b"/> </nova:instance> </metadata> <memory unit='KiB'>4194304</memory> <currentMemory unit='KiB'>4194304</currentMemory> <memoryBacking> <hugepages> <page size='1048576' unit='KiB' nodeset='0'/> </hugepages> <nosharepages/> <locked/> </memoryBacking> <vcpu placement='static'>9</vcpu> <cputune> <shares>9216</shares> <vcpupin vcpu='0' cpuset='15'/> <vcpupin vcpu='1' cpuset='2'/> <vcpupin vcpu='2' cpuset='8'/> <vcpupin vcpu='3' cpuset='3'/> <vcpupin vcpu='4' cpuset='9'/> <vcpupin vcpu='5' cpuset='4'/> <vcpupin vcpu='6' cpuset='10'/> <vcpupin vcpu='7' cpuset='5'/> <vcpupin vcpu='8' cpuset='16'/> <emulatorpin cpuset='1'/> <vcpusched vcpus='1' scheduler='fifo' priority='1'/> <vcpusched vcpus='2' scheduler='fifo' priority='1'/> <vcpusched vcpus='3' scheduler='fifo' priority='1'/> <vcpusched vcpus='4' scheduler='fifo' priority='1'/> <vcpusched vcpus='5' scheduler='fifo' priority='1'/> <vcpusched vcpus='6' scheduler='fifo' priority='1'/> <vcpusched vcpus='7' scheduler='fifo' priority='1'/> <vcpusched vcpus='8' scheduler='fifo' priority='1'/> </cputune> <numatune> <memory mode='strict' nodeset='0'/> <memnode cellid='0' mode='strict' nodeset='0'/> </numatune> <resource> <partition>/machine</partition> </resource> <sysinfo type='smbios'> <system> <entry 
name='manufacturer'>Red Hat</entry> <entry name='product'>OpenStack Compute</entry> <entry name='version'>17.0.7-5.cisco.2.el7</entry> <entry name='serial'>329497fe-667a-11e8-88af-d8c49789492d</entry> <entry name='uuid'>50210f2e-4ff9-4d95-ac56-655db6b84631</entry> <entry name='family'>Virtual Machine</entry> </system> </sysinfo> <os> <type arch='x86_64' machine='pc-i440fx-rhel7.6.0'>hvm</type> <boot dev='hd'/> <smbios mode='sysinfo'/> </os> <features> <acpi/> <apic/> <pmu state='off'/> </features> <cpu mode='host-passthrough' check='none'> <topology sockets='9' cores='1' threads='1'/> <feature policy='require' name='tsc-deadline'/> <numa> <cell id='0' cpus='0-8' memory='4194304' unit='KiB' memAccess='shared'/> </numa> </cpu> <clock offset='utc'> <timer name='pit' tickpolicy='delay'/> <timer name='rtc' tickpolicy='catchup'/> <timer name='hpet' present='no'/> </clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> <devices> <emulator>/usr/libexec/qemu-kvm</emulator> <disk type='file' device='disk'> <driver name='qemu' type='qcow2' cache='none'/> <source file='/var/lib/nova/instances/50210f2e-4ff9-4d95-ac56-655db6b84631/disk'/> <backingStore type='file' index='1'> <format type='raw'/> <source file='/var/lib/nova/instances/_base/39924454a2f83b902c560386f4aae35eca3a6575'/> <backingStore/> </backingStore> <target dev='vda' bus='virtio'/> <alias name='virtio-disk0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </disk> <controller type='usb' index='0' model='piix3-uhci'> <alias name='usb'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> </controller> <controller type='pci' index='0' model='pci-root'> <alias name='pci.0'/> </controller> <interface type='bridge'> <mac address='fa:16:3e:dc:73:75'/> <source bridge='qbr5695ac7e-db'/> <target dev='tap5695ac7e-db'/> <model type='virtio'/> <driver name='vhost' rx_queue_size='1024'/> <mtu size='9000'/> <alias name='net0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </interface> <serial type='pty'> <source path='/dev/pts/2'/> <log file='/var/lib/nova/instances/50210f2e-4ff9-4d95-ac56-655db6b84631/console.log' append='off'/> <target type='isa-serial' port='0'> <model name='isa-serial'/> </target> <alias name='serial0'/> </serial> <console type='pty' tty='/dev/pts/2'> <source path='/dev/pts/2'/> <log file='/var/lib/nova/instances/50210f2e-4ff9-4d95-ac56-655db6b84631/console.log' append='off'/> <target type='serial' port='0'/> <alias name='serial0'/> </console> <input type='tablet' bus='usb'> <alias name='input0'/> <address type='usb' bus='0' port='1'/> </input> <input type='mouse' bus='ps2'> <alias name='input1'/> </input> <input type='keyboard' bus='ps2'> <alias name='input2'/> </input> <graphics type='vnc' port='5900' autoport='yes' listen='172.29.86.247' keymap='en-us'> <listen type='address' address='172.29.86.247'/> </graphics> <video> <model type='cirrus' vram='16384' heads='1' primary='yes'/> <alias name='video0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </video> <memballoon model='virtio'> <stats period='10'/> <alias name='balloon0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </memballoon> </devices> <seclabel type='dynamic' model='dac' relabel='yes'> <label>+2012:+2012</label> <imagelabel>+2012:+2012</imagelabel> </seclabel> </domain> > 3. 
The kernel version that works and the one that doesn't work:

Working old version: 3.10.0-957.1.3.rt56.913.el7.x86_64
Non-working new version: 3.10.0-957.5.1.rt56.916.el7.x86_64

> Also, for item 3, I assume you change the kernel version in the host, right?

Yes

(In reply to Yichen Wang from comment #9)
> (In reply to Luiz Capitulino from comment #8)
> > Yichen,
> >
> > As I explained, per the required KVM-RT configuration, all VM vCPUs must
> > have real-time priority. By setting vCPU0 without real-time priority, you
> > will run into spikes, since that vCPU may be holding a lock shared with
> > other vCPUs, which in turn may cause the other vCPUs to spin on the lock.
> > Also, if you can't reproduce the hang when all vCPUs have real-time priority,
> > then there's no regression in KVM-RT, since that's the required configuration.
>
> Ok, what you said above is new to me; I don't find any documentation saying
> that all my VM cores have to be RT. If you Google it, people are all using
> the mask. Also, https://docs.openstack.org/nova/rocky/user/flavors.html makes
> no mention of this either.

KVM-RT is very new, so I wouldn't expect all details like this to be fully documented yet. Also, as it turns out, different kernel versions may require different tunings, since different kernels may have different interrupt sources. So not all tunings apply to all kernels equally.

> Same points:
> (1) If all VM cores need to be RT, what is the point of hw:cpu_realtime_mask?

I think this makes sense for full flexibility and not to tie the OpenStack configuration to a particular kernel version. But for more info, we'd need to talk to OpenStack developers (adding Erwan to the CC list).

> (2) Why does it work on the previous kernel version but not on the new one?
> Without proper documentation I would mark it as a behavior change or
> regression. This leads to my third point.
> (3) For customers running VMs with hw:cpu_realtime_mask=^0: after a kernel
> update and a VM reboot, the VM will never come up. This is a blocker for
> people who want to update kernels. Unless we patch the libvirt XML, I don't
> see any better solution. Please advise.

I'll try to reproduce this issue to see if there's a kernel regression or behavior change. But for KVM-RT, the required configuration is that all vCPU threads must have real-time priority.

I was able to reproduce this issue without OpenStack. Actually, there are two ways to reproduce it:

1. The simplest way is to just reboot a KVM-RT guest from within the guest. If the bug reproduces, and it reproduces 100% of the time for me, the guest will get stuck with most vCPU threads taking 3% userspace and 18% kernelspace.

2. In the guest XML, configure all vCPUs but vCPU0 to have fifo:1 priority (that is, vCPU0 will have SCHED_OTHER priority). Try to start the guest with virsh. virsh itself will hang and the guest will hang like item 1.

Here's all that I know so far:

o This is a regression. The first bad kernel is kernel-rt-3.10.0-957.3.1.rt56.914.el7.x86_64. The last good kernel is kernel-rt-3.10.0-957.2.1.rt56.913.el7.x86_64.

o The latest RHEL7.7 kernel 3.10.0-1014.rt56.972.el7.x86_64 is also affected.

o The non-RT kernel kernel-3.10.0-957.3.1.el7.x86_64 DOES NOT reproduce the issue, which makes me think this is an RT-kernel-only issue.

o When tracing, I observe that the vCPU threads are "spinning" on this lock from the signal code: tsk->sighand->siglock. Any code path leading to this spinlock will cause the thread to "spin", for example:

  sigprocmask()               /* from vcpu run code in KVM */
    __set_current_blocked()
      migrate_disable()
        pin_current_cpu()
          spin_lock_irq(&tsk->sighand->siglock)

I'm saying "spinning" because in the RT kernel, spinlock contention causes the thread to go to sleep for a while, wake up, see the lock still taken, and go back to sleep.
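For reference, a rough way to observe this state on a live host; 12345 is a placeholder for the TID of one of the stuck vCPU threads:

# The kernel stack of the stuck thread should show it sleeping on its way
# into spin_lock_irq(&tsk->sighand->siglock).
cat /proc/12345/stack

# Watching the thread repeatedly sleep and wake while contending for the
# lock (RT spinlocks sleep under contention):
trace-cmd record -P 12345 -e sched:sched_switch -e sched:sched_wakeup sleep 5
trace-cmd report | head -n 40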
The first bad kernel (kernel-rt-3.10.0-957.3.1.rt56.914.el7.x86_64) has a relatively big KVM update, mostly relating to HyperV. So I think I'll try to revert this series. Otherwise, I have two hypotheses for this issue:

1. There's something in the HyperV update "conflicting" with the RT kernel code (say, bad lock semantics)
2. There's a bad conflict resolution

PS: I'm clearing the NEEDINFO request for Erwan, since I think that's a bit irrelevant now

Yichen,

Thanks a lot for finding this one. It is a true bug and we're working on it.

I reverted the HyperV patches in the latest 7.6.z, and it works fine. Here is the brew build:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=20482594

What are the next steps? Make a hotfix kernel, deliver it to the customer, and continue working on the real solution? -- Daniel

We are being delayed by this RPM signature issue, which still hasn't been resolved yet... I just installed the RPM anyway and gave it a shot. The original issue I saw on this thread has been fixed, and the VM is launching fine. However, I still want to keep this open, as we are seeing other abnormal behavior with this new version under wider testing. To better expose it, we configured our testbed with a mix of HT and non-HT deployments. On the HT-enabled nodes the issue is 100% reproducible (4/4); on the non-HT nodes it is 30% reproducible (1/3). Please find the "lscpu" output above for the hardware information.

[root@quincy-compute-4 ~]# top
top - 13:03:08 up 13:39, 1 user, load average: 149.30, 149.47, 149.73
Tasks: 1622 total, 9 running, 1613 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.7 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 39469430+total, 19495484+free, 19490510+used, 4834360 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 19361452+avail Mem

   PID USER     PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 15856 root     39  19       0      0      0 S  13.9  0.0  77:30.36 kipmi0
   610 root     -4   0       0      0      0 S  11.7  0.0  74:54.14 ktimersoftd/60
    23 root     -3   0       0      0      0 S   4.5  0.0  14:18.30 ksoftirqd/1
   213 root     -4   0       0      0      0 S   4.5  0.0  33:38.03 ktimersoftd/20
226553 root     20   0  163648   3976   1592 R   1.9  0.0   0:00.38 top

On the non-HT testbed, top shows:

[root@quincy-control-2 ~]# top
top - 13:03:43 up 13:40, 1 user, load average: 73.13, 73.89, 73.84
Tasks: 1133 total, 2 running, 1131 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.3 us, 1.6 sy, 0.0 ni, 97.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 19652601+total, 91847016 free, 96707744 used, 7971260 buff/cache
KiB Swap: 33554428 total, 33554428 free, 0 used. 92710192 avail Mem

   PID USER     PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
183236 rabbitmq 20   0 3574340 107592   2804 S  28.3  0.1   0:02.50 beam.smp
 54714 rabbitmq 20   0 7835916 268384   4348 S  27.7  0.1 224:43.84 beam.smp
     4 root     -4   0       0      0      0 S   2.9  0.0  27:16.72 ktimersoftd/0
   213 root     -4   0       0      0      0 S   2.6  0.0  21:12.23 ktimersoftd/20
 71258 cloudpu+ 20   0  298596  83080   7696 S   2.0  0.0  14:13.75 cloudpulse-serv
 53655 mysql    20   0   15.6g 531176 146124 S   1.6  0.3  10:44.52 mysqld
183237 root     20   0  163116   3476   1592 R   1.6  0.0   0:00.20 top

You see the load average is about 150 with HT and 74 without HT.
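One note on interpreting these numbers: the Linux load average counts uninterruptible (D state) tasks as well as runnable ones, so it can be high even while the CPUs are mostly idle. A quick, illustrative way to see which threads are contributing:

# Count threads in R (runnable) or D (uninterruptible) state by command name.
ps -eLo state,comm | awk '$1 == "R" || $1 == "D"' | sort | uniq -c | sort -rn | head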
Given this is a 20 * 2 = 40-core system, if it were really behaving the way "top" says, it should be heavily overloaded. However, things are still fine; the top output is just scaring people away.

The issue is confirmed to be related to tuned, on both the new and the old kernel versions. I tried the realtime tuned profile: same thing. Then I tried realtime's parent profile, network-latency, and the issue is not seen. So clearly something is wrong when the realtime tuned profiles are applied. When it reproduces, it happens 100% of the time. I would like some help understanding how to debug this... Thanks very much!

Regards,
Yichen

(In reply to Yichen Wang from comment #54)
> We are being delayed by this RPM signature issue, which still hasn't been
> resolved yet... I just installed the RPM anyway and gave it a shot. The
> original issue I saw on this thread has been fixed, and the VM is launching
> fine. However, I still want to keep this open, as we are seeing other
> abnormal behavior with this new version under wider testing.

Yichen,

Would you mind opening a new BZ for this issue? It doesn't seem related to the vCPU hang issue, so it's better for us to track it in a different BZ.

Thanks!

(In reply to Luiz Capitulino from comment #61)
> (In reply to Yichen Wang from comment #54)
> > We are being delayed by this RPM signature issue, which still hasn't been
> > resolved yet... I just installed the RPM anyway and gave it a shot. The
> > original issue I saw on this thread has been fixed, and the VM is launching
> > fine. However, I still want to keep this open, as we are seeing other
> > abnormal behavior with this new version under wider testing.
>
> Yichen,
>
> Would you mind opening a new BZ for this issue? It doesn't seem related to
> the vCPU hang issue, so it's better for us to track it in a different BZ.
>
> Thanks!

No problem, Luiz. Please feel free to close this BZ. A new BZ has been opened for the tuned issue I am seeing:

https://bugzilla.redhat.com/show_bug.cgi?id=1694877

Thanks very much!

Regards,
Yichen

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2043