Bug 1679122 - Automatically set in engine the following flags for High Performance VMs types: invtsc cpu flag and also the tsc frequency flag for supporting migration
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ovirt-4.4.0
Assignee: Tomasz Barański
QA Contact: Polina
Blocks: 1723583
Reported: 2019-02-20 11:51 UTC by Sharon Gratch
Modified: 2023-09-14 05:24 UTC (History)

Last Closed: 2020-05-20 20:03:49 UTC
oVirt Team: Virt
pm-rhel: ovirt-4.4+


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 100661 0 'None' MERGED virt: Add TSC frequency to host's capabilities 2021-02-17 09:03:40 UTC
oVirt gerrit 100905 0 'None' MERGED core: Store TSC frequency in VDS dynamic 2021-02-17 09:03:40 UTC
oVirt gerrit 101095 0 'None' MERGED core: Add TSC frequency requirement for HP VMs 2021-02-17 09:03:40 UTC
oVirt gerrit 101267 0 'None' MERGED core: Filter migration target by TSC Frequency 2021-02-17 09:03:40 UTC
oVirt gerrit 101769 0 'None' MERGED webadmin: Add checkbox to control TSC Frequency 2021-02-17 09:03:40 UTC
oVirt gerrit 104077 0 'None' MERGED core: Fix NPE in TSC frequency 2021-02-17 09:03:40 UTC

Description Sharon Gratch 2019-02-20 11:51:47 UTC
Description of problem:

Current cpuflags hook [1] which is used for SAP HANA, sets the following cpu flags: rdtscp, invtsc.

The problem is that once the cpuflags hook is installed and the invtsc flag is set, migration fails because the frequency value on the target host does not match.
Since HP VMs have supported migration since oVirt 4.3, this issue should be fixed to support migration of SAP HANA VMs.

To fix that, we need to automatically set the invtsc flag (moving it from the cpuflags hook to the engine) and also set the tsc frequency flag to the lowest supported value, as described in libvirt BZ [2]:
<clock>
    <timer name='tsc' frequency='1234567890'/>
</clock>
 
We can support that by setting the VM to the frequency supported by the host the VM is currently running on.

[1] https://github.com/oVirt/vdsm/blob/master/vdsm_hooks/cpuflags/before_vm_start.py
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1410225

Issues that still need to be considered/examined:
We still need to examine whether to add it to HP VMs by default regardless of migration mode, or only to migratable VMs (in which case it cannot be changed for running VMs); what the limitations and the effects of setting the frequency are; how to handle a current host that supports a very low frequency compared to other hosts; etc.

Comment 1 Steven Rosenberg 2019-03-12 18:08:39 UTC
I did some investigation and testing on this issue. 

I tested launching a VM on a Host with the following CPU attributes:

"cpuSpeed": "2660.045",
"cpuModel": "Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz", 


I set the following sections in the LibvirtVmXmlBuilder.java module, writeClock() function:

        writer.writeStartElement("timer");
        writer.writeAttributeString("name", "tsc");
        writer.writeAttributeString("frequency", "2660045000");
        writer.writeEndElement();

When launching the VM, the functionality failed with the following error:

2019-03-12 19:10:56,460+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-4) [] EVENT_ID: VM_DOWN_ERROR(119), VM VMTwoClusters is down with error. Exit message: internal error: qemu unexpectedly closed the monitor: 2019-03-12T17:10:55.150496Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2019-03-12T17:10:55.151567Z qemu-kvm: warning: TSC frequency mismatch between VM (2660045 kHz) and host (2659983 kHz), and TSC scaling unavailable
2019-03-12T17:10:55.151654Z qemu-kvm: kvm_init_vcpu failed: Operation not supported.


When changing the values to the following it succeeded:

        writer.writeStartElement("timer");
        writer.writeAttributeString("name", "tsc");
        writer.writeAttributeString("frequency", "2659983000");
        writer.writeEndElement();

When using a lower value (minimum?) such as 1234567890, it also fails:

2019-03-12 18:54:24,013+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_DOWN_ERROR(119), VM VMTwoClusters is down with error. Exit message: internal error: qemu unexpectedly closed the monitor: 2019-03-12T16:54:22.758274Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2019-03-12T16:54:22.759327Z qemu-kvm: warning: TSC frequency mismatch between VM (1234567 kHz) and host (2659983 kHz), and TSC scaling unavailable
2019-03-12T16:54:22.759381Z qemu-kvm: kvm_init_vcpu failed: Operation not supported.

It seems one needs the exact value, but it is not clear where to get this value from.

We have the CPU Speed, but it changes on some Hosts and is not always the same as the frequency included in the Model.

For example one of my Hosts has the following model:

"cpuModel": "Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz"


but sometimes the speed was:

"cpuSpeed": "1599.975"

Other times the speed was:


"cpuSpeed": "1946.948"

It seems we need a consistent way to obtain the proper frequency.


Please advise as well what NUMA configurations if any may be required.

Comment 2 Ryan Barry 2019-03-12 20:34:45 UTC
The CPU frequency scales based on the states. The servers may or may not have an ondemand governor.

However, the current frequency is always available from vdsm.cpuinfo.frequency, which is pushed back as cpuSpeedMh in VdsDynamic. Instead of hardcoding, does it work if this value is used?

Comment 3 Steven Rosenberg 2019-03-13 11:57:58 UTC
I checked the cpuSpeedMh value in VdsDynamic and, as expected, it is the same as cpuSpeed: 2660.0 when the host TSC is 2659983 kHz. In fact it is also rounded to MHz, which would be another issue. The problem is that an exact value is expected, and it seems we do not have that specific value.
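
To illustrate the rounding issue with the numbers already quoted in this bug, here is a minimal sketch. The helper name mhz_to_khz is hypothetical and only shows the arithmetic; it is not engine or VDSM code.

```python
def mhz_to_khz(mhz: float) -> int:
    """Convert a MHz reading to whole kHz, the unit QEMU prints in its warning."""
    return round(mhz * 1000)


# Values quoted in comments 1 and 3 of this bug.
host_tsc_khz = 2659983      # host frequency reported by qemu-kvm
cpu_speed_mhz = 2660.0      # rounded value stored as cpuSpeedMh

candidate_khz = mhz_to_khz(cpu_speed_mhz)
# The rounded value is 17 kHz off, so an exact-match check would reject it.
print(candidate_khz, candidate_khz - host_tsc_khz, candidate_khz == host_tsc_khz)
```

Any value derived from the rounded CPU speed would therefore fail on a host that needs an exact match.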

Comment 4 Michal Skrivanek 2019-03-13 12:19:40 UTC
"Intel(R) Core(TM)2" is not a supported CPU, not even before we dropped the "old cpus" recently. You need to test on something actually supporting TSC scaling (and supported in general)

Comment 5 Steven Rosenberg 2019-03-13 15:46:08 UTC
The cpu Flags do include tsc:

    "cpuFlags": "fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,lm,constant_tsc,arch_perfmon,pebs,bts,rep_good,nopl,aperfmperf,eagerfpu,pni,dtes64,monitor,ds_cpl,vmx,est,tm2,ssse3,cx16,xtpr,pdcm,sse4_1,xsave,lahf_lm,tpr_shadow,vnmi,flexpriority,dtherm,model_Opteron_G2,model_kvm32,model_coreduo,model_Conroe,model_Opteron_G1,model_core2duo,model_qemu32,model_Penryn,model_pentium2,model_pentium3,model_qemu64,model_kvm64,model_pentium,model_486",

also in the "Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz":

"cpuFlags": "fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,rep_good,nopl,xtopology,nonstop_tsc,aperfmperf,eagerfpu,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,smx,est,tm2,ssse3,cx16,xtpr,pdcm,pcid,sse4_1,sse4_2,x2apic,popcnt,tsc_deadline_timer,aes,xsave,avx,lahf_lm,epb,ssbd,ibrs,ibpb,stibp,tpr_shadow,vnmi,flexpriority,ept,vpid,xsaveopt,dtherm,ida,arat,pln,pts,spec_ctrl,intel_stibp,flush_l1d,model_Opteron_G2,model_kvm32,model_kvm64,model_coreduo,model_SandyBridge-IBRS,model_Conroe,model_Nehalem,model_Westmere-IBRS,model_Opteron_G1,model_core2duo,model_Nehalem-IBRS,model_qemu32,model_Penryn,model_pentium2,model_pentium3,model_qemu64,model_Westmere,model_SandyBridge,model_pentium,model_486",

Comment 6 Ryan Barry 2019-03-13 16:08:59 UTC
The warning is a warning. What's in the actual qemu logs?

Comment 7 Ryan Barry 2019-03-13 17:36:53 UTC
Jiri/Martin -

This actually appears to require an exact frequency, even on hosts which support TSC. Surely there must be something we're missing

https://github.com/qemu/qemu/blob/master/target/i386/kvm.c#L673-L677

Do we actually need to pass in the exact frequency?

Comment 8 Steven Rosenberg 2019-03-13 17:46:43 UTC
The logging is reflected in the engine. 

The cpu flags for both Hosts support tsc.

This issue was retested on the "Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz" Host as well.

These are the results:


Fails when the values are lower:

writer.writeStartElement("timer");
writer.writeAttributeString("name", "tsc");
writer.writeAttributeString("frequency", "1234567890");
writer.writeEndElement();

2019-03-13 17:57:34,021+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-13) [] EVENT_ID: VM_DOWN_ERROR(119), VM VMTwoClusters is down with error. Exit message: internal error: qemu unexpectedly closed the monitor: 2019-03-13T15:57:33.254520Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2019-03-13T15:57:33.255335Z qemu-kvm: warning: TSC frequency mismatch between VM (1234567 kHz) and host (3392293 kHz), and TSC scaling unavailable
2019-03-13T15:57:33.255367Z qemu-kvm: kvm_init_vcpu failed: Operation not supported.

And when higher:

writer.writeStartElement("timer");
writer.writeAttributeString("name", "tsc");
writer.writeAttributeString("frequency", "3400000000");
writer.writeEndElement();

2019-03-13 19:11:09,030+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_DOWN_ERROR(119), VM VMTwoClusters is down with error. Exit message: internal error: qemu unexpectedly closed the monitor: 2019-03-13T17:11:08.220017Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2019-03-13T17:11:08.220749Z qemu-kvm: warning: TSC frequency mismatch between VM (3400000 kHz) and host (3392293 kHz), and TSC scaling unavailable
2019-03-13T17:11:08.220775Z qemu-kvm: kvm_init_vcpu failed: Operation not supported.

Succeeds when they match:

writer.writeStartElement("timer");
writer.writeAttributeString("name", "tsc");
writer.writeAttributeString("frequency", "3392293000");
writer.writeEndElement();

QEMU code: Fails if the frequencies are not equal as per the diagnostics (prints cur_freq = 3392293 so it is > 0):

if (cur_freq <= 0 || cur_freq != env->tsc_khz) {
        warn_report("TSC frequency mismatch between "
                    "VM (%" PRId64 " kHz) and host (%d kHz), "
                    "and TSC scaling unavailable",
                    env->tsc_khz, cur_freq);

See:

https://github.com/qemu/qemu/blob/master/target/i386/kvm.c#L673-L677


QEMU obtains the value via the following ioctl:

#define KVM_GET_TSC_KHZ           _IO(KVMIO,  0xa3)

The question is whether qemu should have a larger resolution or if the exact value is required. It seems contrary to the description of finding the "lowest supported value". Otherwise the VDSM would have to read the value and send it to the engine.

Please advise accordingly concerning the intended design.

Comment 9 Jiri Denemark 2019-03-14 10:46:55 UTC
AFAIK the frequencies have to match exactly unless the CPU supports TSC
scaling. In this case it looks like one of the hosts allows TSC to be
explicitly set (you can change 3400000 to something else and the domain
starts), but the other one doesn't, i.e., it can only start a domain if the
frequency is set to 3392293.

Looking at https://github.com/qemu/qemu/blob/master/target/i386/kvm.c#L663 you
can see QEMU tries to set the TSC frequency and only reports an error if the
frequency cannot be set and it does not match the one used by the current CPU.
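
The behaviour described above can be modelled roughly as follows. This is a simplified sketch of the decision only, not QEMU code: the function name vcpu_init_ok and the can_set_tsc parameter (standing in for "the host accepts an explicit TSC frequency") are assumptions made for illustration.

```python
def vcpu_init_ok(requested_hz: int, host_khz: int, can_set_tsc: bool) -> bool:
    """Model of the check: fail only if the frequency cannot be set AND differs."""
    # libvirt takes the frequency in Hz; the QEMU warning reports it in kHz.
    requested_khz = requested_hz // 1000
    if can_set_tsc:
        # The host accepted the requested frequency, so no comparison is needed.
        return True
    # Without that ability, the request must equal the host's own frequency.
    return host_khz > 0 and requested_khz == host_khz


# Values taken from the i7-2600 test in comment 8.
print(vcpu_init_ok(3392293000, 3392293, can_set_tsc=False))  # exact match
print(vcpu_init_ok(3400000000, 3392293, can_set_tsc=False))  # mismatch
```

Under this model, the three test cases in comment 8 behave as observed: only the exact value starts on a host that cannot set the TSC frequency.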

Comment 10 Steven Rosenberg 2019-03-14 13:38:43 UTC
So is this feature only for CPUs that support TSC scaling, or do we also support CPUs that do not support TSC scaling? The latter will require the VDSM to obtain the exact frequency and send it to the engine. For the former (TSC scaling only), it looks like the VDSM will have to check whether TSC scaling is supported before adding invtsc to the cpu flags.

Please clarify.

Comment 11 Steven Rosenberg 2019-03-14 17:26:48 UTC
Another question, do we need to support this in older Cluster Compatibility versions such as 4.1 or just in say 4.2 and higher?

Comment 12 Ryan Barry 2019-03-14 18:56:26 UTC
No more 4.1 releases. If you can make the patch backportable to 4.2, that would be ideal. I don't have anything older than Haswell to check against, but haswell also has tsc_adjust.

Please check against actual guests, since the ones in my lab do NOT have tsc_adjust set in the guest, but scaling works in testing.

However, this feature is specifically required for SAP HANA migrations, hence the priority. If tsc_adjust || tsc_scale are not in the available CPU flags, either block migrations or block it in the hook if sap_agent is set and the flag is not available (I'd expect all SAP HANA instances to be on CPUs recent enough to support these)

Comment 13 Steven Rosenberg 2019-03-18 08:26:30 UTC
Investigating this issue, the following was found[1]:

AMD SVM Feature Identification, CPUID level 0x8000000a (edx)
tsc_scale: AMD TSC scaling support

Intel-defined CPU features, CPUID level 0x00000007:0 (ebx)
tsc_adjust: TSC adjustment MSR


It seems only tsc_scale actually states that it is for TSC scaling support.

When testing with the following Host, which includes tsc_adjust, QEMU still fails when the values are not equal:


from cpuinfo:
model name      : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts spec_ctrl intel_stibp flush_l1d
cpu MHz         : 3899.291

Error:

2019-03-17 20:42:58,896+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-4) [] EVENT_ID: VM_DOWN_ERROR(119), VM VM2 is down with error. Exit message: internal error: process exited while connecting to monitor: 2019-03-17T18:42:58.124056Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2019-03-17T18:42:58.124842Z qemu-kvm: warning: TSC frequency mismatch between VM (3888000 kHz) and host (3392160 kHz), and TSC scaling unavailable
2019-03-17T18:42:58.124860Z qemu-kvm: kvm_init_vcpu failed: Operation not supported.

It seems both TSC scaling and TSC offsetting need to be enabled. [2]


Maybe we should add better diagnostics and test the qemu code to see why the set actually fails. 

In the meantime, more investigation should be performed.

[1] https://unix.stackexchange.com/questions/43539/what-do-the-flags-in-proc-cpuinfo-mean/43540
[2] https://www.intel.com/content/dam/www/public/us/n/documents/white-papers/timestamp-counter-scaling-virtualization-white-paper.pdf

Comment 14 Ryan Barry 2019-03-18 11:23:22 UTC
Before we go down the rabbit hole of qemu code itself, let's try the following:

Ensure that tsc_adjust is actually exposed to the guest as part of the CPU configuration in the libvirt XML

Check with Jiri to see if there's something we're missing in the XML

Comment 15 Steven Rosenberg 2019-03-18 12:22:42 UTC
Please advise concerning comments #13 and 14.

Thank you in advance for your help.

Comment 16 Ryan Barry 2019-03-18 12:52:27 UTC
Please provide the libvirt XML the engine is sending across first, so we have as much information as possible

Comment 17 Sharon Gratch 2019-03-18 14:26:26 UTC
(In reply to Ryan Barry from comment #16)
> Please provide the libvirt XML the engine is sending across first, so we
> have as much information as possible

The engine sends the list of cpu flags only if cpu type is not host-model or hostPassthrough, see [1].
The cpu flags list is available in case of cpupassthrough via vm.getCpuName(), although we don't send them to vdsm since it is replaced with mode="host-passthrough".

Steven, 
-did you try your tests with cpu pass through enabled? 
-did you try to play with the timer's "mode" attribute and set it to one of the following: "auto", "native", "emulate", "paravirt", or "smpsafe"? (please see [2] for details) 
e.g.: 
<timer name='tsc' frequency='NNN' mode='auto|native|emulate|smpsafe'/>

mode attribute values
Value	  Description
-----     -----------
auto	  Native if TSC is unstable, otherwise allow native TSC access.
native	  Always allow native TSC access.
emulate	  Always emulate TSC.
smpsafe	  Always emulate TSC and interlock SMP

[1] https://github.com/oVirt/ovirt-engine/blob/66fbbfcbc717c33c0cfb1732e2f322d9b473733d/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java#L393
[2]  https://libvirt.org/formatdomain.html ("Timer" section)

Comment 18 Steven Rosenberg 2019-03-18 15:04:34 UTC
xml from engine log:


9a-47da-8b23-35b01dee2a13] VM <?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
  <name>VM2</name>
  <uuid>509756f0-e37f-4e24-bef9-225f0b1ff112</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <iothreads>1</iothreads>
  <maxMemory slots="16">4194304</maxMemory>
  <vcpu current="1">16</vcpu>
  <sysinfo type="smbios">
    <system>
      <entry name="manufacturer">oVirt</entry>
      <entry name="product">OS-NAME:</entry>
      <entry name="version">OS-VERSION:</entry>
      <entry name="serial">HOST-SERIAL:</entry>
      <entry name="uuid">509756f0-e37f-4e24-bef9-225f0b1ff112</entry>
    </system>
  </sysinfo>
  <clock offset="variable" adjustment="0">
    <timer name="tsc" frequency="3888000000"/>
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
  </clock>
  <features>
    <acpi/>
  </features>
  <cpu match="exact">
    <model>Nehalem</model>
    <topology cores="1" threads="1" sockets="16"/>
    <numa>
      <cell id="0" cpus="0" memory="1048576"/>
    </numa>
  </cpu>
  <cputune/>
  <devices>
    <input type="mouse" bus="ps2"/>
    <channel type="unix">
      <target type="virtio" name="ovirt-guest-agent.0"/>
      <source mode="bind" path="/var/lib/libvirt/qemu/channels/509756f0-e37f-4e24-bef9-225f0b1ff112.ovirt-guest-agent.0"/>
    </channel>
    <channel type="unix">
      <target type="virtio" name="org.qemu.guest_agent.0"/>
      <source mode="bind" path="/var/lib/libvirt/qemu/channels/509756f0-e37f-4e24-bef9-225f0b1ff112.org.qemu.guest_agent.0"/>
    </channel>
    <controller type="virtio-serial" index="0" ports="16">
      <alias name="ua-396c697d-904e-4065-9b7f-bd1721c4f4e8"/>
    </controller>
    <controller type="usb" model="piix3-uhci" index="0"/>
    <controller type="scsi" model="virtio-scsi" index="0">
      <driver iothread="1"/>
      <alias name="ua-46fab565-2ce9-472d-ac03-6fd3f59f82e8"/>
    </controller>
    <memballoon model="virtio">
      <stats period="5"/>
      <alias name="ua-62f47cf4-ba03-423a-b892-52c253a0f989"/>
    </memballoon>
    <rng model="virtio">
      <backend model="random">/dev/urandom</backend>
      <alias name="ua-8fadd1de-6517-4190-b82a-29bc15d0ac2d"/>
    </rng>
    <video>
      <model type="qxl" vram="8192" heads="1" ram="65536" vgamem="16384"/>
      <alias name="ua-b5345491-68bf-4ce0-adcd-e973c9e01434"/>
    </video>
    <graphics type="spice" port="-1" autoport="yes" passwd="*****" passwdValidTo="1970-01-01T00:00:01" tlsPort="-1">
      <channel name="main" mode="secure"/>
      <channel name="inputs" mode="secure"/>
      <channel name="cursor" mode="secure"/>
      <channel name="playback" mode="secure"/>
      <channel name="record" mode="secure"/>
      <channel name="display" mode="secure"/>
      <channel name="smartcard" mode="secure"/>
      <channel name="usbredir" mode="secure"/>
      <listen type="network" network="vdsm-ovirtmgmt"/>
    </graphics>
    <channel type="spicevmc">
      <target type="virtio" name="com.redhat.spice.0"/>
    </channel>
    <interface type="bridge">
      <model type="virtio"/>
      <link state="up"/>
      <source bridge="ovirtmgmt"/>
      <alias name="ua-50aa551a-7441-404c-8273-ee39c4b0bec7"/>
      <boot order="1"/>
      <mac address="00:1a:4a:16:01:04"/>
      <mtu size="1500"/>
      <filterref filter="vdsm-no-mac-spoofing"/>
      <bandwidth/>
    </interface>
    <disk type="file" device="cdrom" snapshot="no">
      <driver name="qemu" type="raw" error_policy="report"/>
      <source file="" startupPolicy="optional">
        <seclabel model="dac" type="none" relabel="no"/>
      </source>
      <target dev="hdc" bus="ide"/>
      <readonly/>
      <alias name="ua-dcdacc99-c448-489a-bf7a-d5ee2785a16f"/>
    </disk>
  </devices>
  <pm>
    <suspend-to-disk enabled="no"/>
    <suspend-to-mem enabled="no"/>
  </pm>
  <os>
    <type arch="x86_64" machine="pc-i440fx-rhel7.3.0">hvm</type>
    <smbios mode="sysinfo"/>
    <bootmenu enable="yes" timeout="30000"/>
  </os>
  <metadata>
    <ovirt-tune:qos/>
    <ovirt-vm:vm>
      <ovirt-vm:minGuaranteedMemoryMb type="int">1024</ovirt-vm:minGuaranteedMemoryMb>
      <ovirt-vm:clusterVersion>4.2</ovirt-vm:clusterVersion>
      <ovirt-vm:custom/>
      <ovirt-vm:device mac_address="00:1a:4a:16:01:04">
        <ovirt-vm:custom/>
      </ovirt-vm:device>
      <ovirt-vm:launchPaused>false</ovirt-vm:launchPaused>
      <ovirt-vm:resumeBehavior>auto_resume</ovirt-vm:resumeBehavior>
    </ovirt-vm:vm>
  </metadata>
</domain>

Comment 19 Sharon Gratch 2019-03-18 15:16:18 UTC
Steven, as I mentioned in bug title and in comment #0, the domain xml should also include invtsc enabling:
<cpu mode='host-passthrough' check='none'>
   <topology sockets='16' cores='3' threads='2'/>
   <feature policy='require' name='invtsc'/>
...
</cpu>

This was added by sap_agent VDSM hook up till now, but for this solution it should be added to domain xml by the engine.

Comment 20 Sharon Gratch 2019-03-18 15:21:14 UTC
I advise checking this with an HP VM (CPU passthrough is enabled by default for an HP VM) and adding the following to the domain xml under the 'cpu' element:
<cpu...>
"<feature policy='require' name='invtsc'/>"
...
</cpu>
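
As a rough illustration of the two XML changes discussed so far (the invtsc feature under 'cpu' and the tsc timer under 'clock'), here is a minimal Python sketch. The function name add_invtsc_and_tsc_timer and the tiny input document are hypothetical; the real change belongs in the engine's LibvirtVmXmlBuilder, which is Java.

```python
import xml.etree.ElementTree as ET


def add_invtsc_and_tsc_timer(domain: ET.Element, tsc_hz: int) -> None:
    """Add <feature name='invtsc'> and <timer name='tsc'> to a domain element."""
    cpu = domain.find("cpu")
    if cpu is None:
        cpu = ET.SubElement(domain, "cpu")
    # HP VMs use CPU passthrough, as noted in comment 20.
    cpu.set("mode", "host-passthrough")
    ET.SubElement(cpu, "feature", policy="require", name="invtsc")

    clock = domain.find("clock")
    if clock is None:
        clock = ET.SubElement(domain, "clock")
    # The frequency attribute is given in Hz.
    ET.SubElement(clock, "timer", name="tsc", frequency=str(tsc_hz))


# Stripped-down domain for demonstration; a real one has many more elements.
domain = ET.fromstring("<domain type='kvm'><clock offset='variable'/><cpu/></domain>")
add_invtsc_and_tsc_timer(domain, 3392293000)
print(ET.tostring(domain, encoding="unicode"))
```

The frequency used here is the one that started successfully on the i7-2600 host in comment 8; in practice it would have to come from the host the VM runs on.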

Comment 21 Steven Rosenberg 2019-03-19 17:16:55 UTC
This was tested by setting the VM to High Performance and adding the feature section to the passthrough code.

The xml looks like this:

  <clock offset="variable" adjustment="0">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="tsc" frequency="1234567890" mode="emulate"/>
    <timer name="hpet" present="no"/>
  </clock>
  <features>
    <acpi/>
  </features>
  <cpu match="exact" mode="host-passthrough">
    <feature name="invtsc" policy="require"/>
    <topology cores="1" threads="1" sockets="16"/>
    <numa>
      <cell id="0" cpus="0" memory="1048576"/>
    </numa>
  </cpu>

The process fails with the following errors and warnings:

2019-03-19 19:08:39,354+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_DOWN_ERROR(119), VM VM2 is down with error. Exit message: internal error: process exited while connecting to monitor: 2019-03-19T17:08:38.520959Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2019-03-19T17:08:38.521596Z qemu-kvm: warning: TSC frequency mismatch between VM (1234567 kHz) and host (3392160 kHz), and TSC scaling unavailable
2019-03-19T17:08:38.521715Z qemu-kvm: kvm_init_vcpu failed: Operation not supported.

It is still failing to set the frequency.


It may be nice to obtain the errno to understand why the ioctl fails in the QEMU libvirt code.

Comment 22 Ryan Barry 2019-03-19 20:43:34 UTC
I did some digging in the kernel code, and all of this is completely unnecessary.

Invariant TSC frequency specifically means "the TSC frequency does not scale with C-states". No checking is needed after boot time, because it does not change. Of course, the host being migrated to must have the same TSC frequency (see https://www.redhat.com/archives/libvir-list/2017-January/msg00092.html), but we can document this as a limitation, and I'd expect SAP HANA hosts (and most clusters) to be comprised of identical hardware anyway.

The exact TSC frequency is available in dmesg:
[    1.470305] tsc: Refined TSC clocksource calibration: 3392.086 MHz

Slice out the \. and pad it with three zeros, and this works exactly as expected. VMs with the invtsc flag can be started (with an exact TSC frequency) and they can be migrated between hosts which have the same TSC frequency.
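
The dmesg conversion described above could be sketched like this. It assumes the line always has the format shown, with exactly three digits after the decimal point; the function name tsc_hz_from_dmesg is hypothetical and this is not VDSM code.

```python
import re
from typing import Optional

# Matches the kernel line quoted above, e.g. "... calibration: 3392.086 MHz".
_TSC_LINE = re.compile(r"tsc: Refined TSC clocksource calibration: (\d+)\.(\d{3}) MHz")


def tsc_hz_from_dmesg(text: str) -> Optional[int]:
    """Return the TSC frequency in Hz, or None if the line is not present."""
    match = _TSC_LINE.search(text)
    if match is None:
        return None
    # Drop the decimal point and pad with three zeros: MHz.kHz -> Hz.
    return int(match.group(1) + match.group(2) + "000")


line = "[    1.470305] tsc: Refined TSC clocksource calibration: 3392.086 MHz"
print(tsc_hz_from_dmesg(line))
```

The result can be used directly as the frequency attribute of the tsc timer, which libvirt expects in Hz.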

Comment 23 Steven Rosenberg 2019-03-20 10:44:34 UTC
According to the link Ryan provided in comment 22, the suggestion is to set the tsc-frequency explicitly by "management software":  

"I suggest we allow migration with invtsc if and only if
tsc-frequency is set explicitly by management software. In other
words, apply only patches 1/4 and 2/4 from this series. After
that, we will need libvirt support for configuring tsc-frequency"

The testing shows that qemu fails to set the frequency. 

The engine only has the cpu speed which is different from the specific value which is why the failure occurs as per the example in comment 13 which uses the cpu speed value:

https://bugzilla.redhat.com/show_bug.cgi?id=1679122#c13

According to the link again:

> > > We can't allow migration unconditionally because we don't know if
> > > the destination is a QEMU version that is really going to ensure
> > > there's no TSC frequency mismatch. To ensure we are migrating to
> > > a destination that won't ignore SET_TSC_KHZ errors, allow invtsc
> > > migration only on pc-*-2.9 and newer.

Which states that the goal is "to ensure there's no TSC frequency mismatch" 

If we are sure the set is failing in QEMU because one cannot change the value and that mismatches will fail, 
then maybe the idea is to compare the TSC frequency of the source and destination when specifying High Performance (host-passthrough) "so migration is
aborted earlier" as per the discussion in the link, which would be at the "management software" level, specifically at the engine.

If this is the case, passing the actual frequency from the vdsm to the engine via the get Capabilities message and aborting the migrate when the tsc frequencies for the source and destination hosts do not match before migration occurs may be what we are actually looking to accomplish.

If so it is a matter of the vdsm obtaining that value and then sending it with the CPU speed.

Comment 24 Steven Rosenberg 2019-03-20 10:50:10 UTC
Another clarification: invtsc is not a CPU flag, but a QEMU feature.

Comment 25 Ryan Barry 2019-03-20 11:14:43 UTC
(In reply to Steven Rosenberg from comment #23)
> According to the link Ryan provided in comment 22, the suggestion is to set
> the tsc-frequency explicitly by "management software":  
> 

Right, that "management software" would be RHV.

> The testing shows that qemu fails to set the frequency. 
>

Those other patches may not be present yet, and the original patch (plus testing) shows that an explicit frequency obtained from dmesg with +invtsc added works, so let's go with that.

Even on a physical Skylake host with all the flags exposed, we get the same exception when starting a VM this way, but adding +invtsc to the VM is enough to start it.
 
> The engine only has the cpu speed which is different from the specific value
> which is why the failure occurs as per the example in comment 13 which uses
> the cpu speed value:

The engine only has the CPU speed because we've never needed to pass in the TSC frequency before, but it's extremely easy to get from dmesg (and vdsm has suid binaries, so we know that isn't a problem), and easy to pass back to the engine. This is a solvable problem. 

> Which states that the goal is "to ensure there's no TSC frequency mismatch"

Yes, because +invtsc is used, and it won't ignore SET_TSC_KHZ, meaning that there's no risk of a sudden TSC adjust if it's migrated to another host, not because we're assuming qemu is going to do this for us.
  
> If we are sure the set is failing in QEMU because one cannot change the
> value and that mismatches will fail,

We're not, unless we want to trace down the ioctls. I'd suggest we don't since we have a path forward.
 
> then maybe the idea is to compare the TSC frequency of the source and
> destination when specifying High Performance (host-passthrough) "so
> migration is
> aborted earlier" as per the discussion in the link, which would be at the
> "management software" level, specifically at the engine.

The idea is to compare the TSC frequency of the host (needed with invtsc) and set it in the guest along with invtsc. To be honest, this is probably going to be easier unless there's a strong technical reason to actually move this to the engine, since the vdsm host hook can easily get (and manipulate) all the required information. Sharon?

> If this is the case, passing the actual frequency from the vdsm to the
> engine via the get Capabilities message and aborting the migrate when the
> tsc frequencies for the source and destination hosts do not match before
> migration occurs may be what we are actually looking to accomplish.

With +invtsc, we should require it to start the VM at all, not just migrations, at least in my testing. If the complexity of blocking the migration due to this (or other scheduler changes) is too high, we can document this limitation and at least move forward with migrations allowed between _some_ (identical CPU) hosts for HANA.

Comment 26 Sharon Gratch 2019-03-31 15:45:06 UTC
(In reply to Ryan Barry from comment #25)
> The idea is to compare the TSC frequency of the host (needed with invtsc)
> and set it in the guest along with invtsc. To be honest, this is probably
> going to be easier unless there's a strong technical reason to actually move
> this to the engine, since the vdsm host hook can easily get (and manipulate)
> all the required information. Sharon?

It seems that "TSC frequency" can be set as part of the sap_agent hook, so that once this hook is installed both invtsc and "TSC frequency" (taken from the vdsm host) are set for the running VM. 

The only problem is that we still need somehow to filter incompatible destination hosts (either non identical ones or just the ones that support lower frequency than the VM is running with). So we still need to pass that info to engine for each active VDS and running VM so that the scheduler will be able to filter those hosts prior to migration (the same as done for VdsDynamic.cpuflags for example). 

Did we test the live migration of VMs set with invtsc and "TSC frequency" to see what is supported exactly? We still need to decide if it's ok to migrate to any host that supports at least the VM's frequency or should the source and destination be identical so that we'll know how to handle that in scheduler...
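
For discussion, the scheduler-side filtering could look roughly like the sketch below. It takes the strict reading from comment 9 (frequencies must match exactly unless the destination supports TSC scaling); whether "at least the VM's frequency" would also be acceptable is still an open question here. The Host class, its fields and migration_targets are hypothetical names, not engine code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    tsc_khz: Optional[int]      # None if the host did not report a TSC frequency
    tsc_scaling: bool = False   # True if the host can scale the guest TSC


def migration_targets(vm_tsc_khz: int, hosts: List[Host]) -> List[Host]:
    """Keep hosts whose TSC frequency equals the VM's, or that support scaling."""
    return [
        host
        for host in hosts
        # Hosts without a reported frequency are skipped rather than guessed at.
        if host.tsc_khz is not None
        and (host.tsc_khz == vm_tsc_khz or host.tsc_scaling)
    ]


hosts = [
    Host("host-a", 3392293),
    Host("host-b", 2659983),
    Host("host-c", 2100000, tsc_scaling=True),
    Host("host-d", None),
]
print([h.name for h in migration_targets(3392293, hosts)])
```

If testing shows that scaling hosts should be excluded, or that only identical hardware is safe, the condition reduces to the equality check alone.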

Comment 29 Jiri Denemark 2019-06-07 14:43:30 UTC
BTW, libvirt gained support for reporting TSC frequency and scaling support in
the host capabilities (for bug 1641702):

    <capabilities>
      <host>
        ...
        <cpu>
          ...
          <counter name='tsc' frequency='N' scaling='on|off'/>
          ...
        </cpu>
        ...
      </host>
      ...
    </capabilities>
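
Reading that element could be sketched as follows. The sample document and the function name host_tsc_counter are made up for illustration. The scaling attribute is shown above as 'on|off'; accepting 'yes' as well is an assumption, in case the actual libvirt output differs from the placeholder.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def host_tsc_counter(capabilities_xml: str) -> Optional[Tuple[int, bool]]:
    """Return (frequency in Hz, scaling supported) from host capabilities XML."""
    root = ET.fromstring(capabilities_xml)
    counter = root.find("./host/cpu/counter[@name='tsc']")
    if counter is None or counter.get("frequency") is None:
        # Older libvirt versions do not report the counter at all.
        return None
    # 'yes' is accepted in addition to 'on' as an assumption about real output.
    scaling = counter.get("scaling") in ("on", "yes")
    return int(counter.get("frequency")), scaling


# Hypothetical capabilities document, trimmed to the relevant element.
sample = """
<capabilities>
  <host>
    <cpu>
      <counter name='tsc' frequency='3392293000' scaling='off'/>
    </cpu>
  </host>
</capabilities>
"""
print(host_tsc_counter(sample))
```

This would give the host side an exact frequency and a scaling indicator without parsing dmesg.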

Comment 34 Michal Skrivanek 2019-06-28 09:15:32 UTC
*** Bug 1723583 has been marked as a duplicate of this bug. ***

Comment 35 Ryan Barry 2019-10-24 10:57:41 UTC
It can be tested on el7 by creating an HP VM, then checking whether the tsc cpu flag is present and that the VM can be migrated.

Comment 39 Polina 2019-11-27 12:27:30 UTC
verified according to the #c36 and #c38. 

the new bug https://bugzilla.redhat.com/show_bug.cgi?id=1777325

Comment 40 Sandro Bonazzola 2020-05-20 20:03:49 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 41 Red Hat Bugzilla 2023-09-14 05:24:06 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

