Bug 1411783

Summary: Update of the HE VM does not work
Product: [oVirt] ovirt-hosted-engine-ha Reporter: Artyom <alukiano>
Component: Agent Assignee: Jenny Tokar <jtokar>
Status: CLOSED CURRENTRELEASE QA Contact: Artyom <alukiano>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 2.1.0 CC: alukiano, bugs, dfediuck, mavital, msivak
Target Milestone: ovirt-4.1.0-rc Keywords: Regression, Triaged
Target Release: 2.1.0.1 Flags: rule-engine: ovirt-4.1+
rule-engine: blocker+
rule-engine: planning_ack+
msivak: devel_ack+
mavital: testing_ack+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-15 14:51:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs(you can start looking from the date "Tue Jan 10 15:15:00") none

Description Artyom 2017-01-10 13:26:32 UTC
Created attachment 1239107 [details]
logs(you can start looking from the date "Tue Jan 10 15:15:00")

Description of problem:
Update of the HE VM does not work

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.1.0-0.0.master.20170105095417.20170105095414.git017505b.el7.centos.noarch
ovirt-hosted-engine-setup-2.1.0-0.0.master.20170104124556.git776e0f1.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy HE and add a master storage domain to the engine to initiate the auto-import process (the HE VM has 4GB)
2. Enable global maintenance
3. Update OvfUpdateIntervalInMinutes to 1 minute (# engine-config -s OvfUpdateIntervalInMinutes=1 && systemctl restart ovirt-engine)
4. Update the HE VM memory to 6GB
5. Wait 5 minutes (to be sure that the OVF is updated)
6. Restart the HE VM (on the host # hosted-engine --vm-poweroff && hosted-engine --vm-start)
7. Check the amount of memory on the HE VM guest OS
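
For step 7, a minimal sketch of one way to read the guest memory, assuming the check is done by parsing /proc/meminfo inside the guest OS. The helper names and the sample MemTotal values are illustrative and not taken from the attached logs. MemTotal is always somewhat lower than the configured VM memory, so the sketch picks the closest configured size instead of comparing for equality.

```python
def mem_total_mib(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:  5946680 kB".
            return int(line.split()[1]) // 1024
    raise ValueError("MemTotal not found")


def closest_configured(mem_mib, candidates=(4096, 6144)):
    """Pick the configured size (MiB) nearest to the observed MemTotal."""
    return min(candidates, key=lambda c: abs(c - mem_mib))


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        observed = mem_total_mib(f.read())
    print("MemTotal: %d MiB, closest configured size: %d MiB"
          % (observed, closest_configured(observed)))
```

If the update was applied, the closest configured size should be 6144; with this bug it stays at 4096.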

Actual results:
The guest OS has 4GB of memory

Expected results:
The guest OS has 6GB of memory

Additional info:
I also tried to reduce the number of CPUs, and that does not work either

Comment 1 Doron Fediuck 2017-01-11 11:33:13 UTC
The agent log below shows an overrun.
Martin, care to review?

1. 4GiB before the change:

MainThread::DEBUG::2017-01-10 15:16:30,408::config::448::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_file_content_from_shared_storage) Reading 'vm.conf' from '/rhev/data-center/mnt/10.35.110.11:_Compute__NFS_alukiano_he__2/c33f3f2a-22ec-4355-aa1c-978579065b02/images/59951756-4398-46aa-92aa-dcb441dae05e/53bfd738-0e61-4779-b04a-2675b490c181'
MainThread::DEBUG::2017-01-10 15:16:30,409::heconflib::73::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_dd_pipe_tar) executing: 'dd if=/rhev/data-center/mnt/10.35.110.11:_Compute__NFS_alukiano_he__2/c33f3f2a-22ec-4355-aa1c-978579065b02/images/59951756-4398-46aa-92aa-dcb441dae05e/53bfd738-0e61-4779-b04a-2675b490c181 bs=4k'
...
MainThread::DEBUG::2017-01-10 15:16:30,425::heconflib::74::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_dd_pipe_tar) executing: 'tar -xOf - vm.conf'
MainThread::DEBUG::2017-01-10 15:16:30,438::heconflib::92::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_dd_pipe_tar) stdout: vmId=d53e5737-22ac-44f9-bb10-3006dee22b05
memSize=4096

2. Then we see the update:

MainThread::INFO::2017-01-10 15:16:44,004::config::409::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::DEBUG::2017-01-10 15:16:44,009::ovf2VmParams::243::root::(confFromOvf) conf is cpuType=Conroe
emulatedMachine=pc-i440fx-rhel7.3.0
...
memSize=6144


3. Then it's back to 4GiB:

MainThread::DEBUG::2017-01-10 15:16:44,025::heconflib::142::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(extractConfFile) extracting 'vm.conf' from '/rhev/data-center/mnt/10.35.110.11:_Compute__NFS_alukiano_he__2/c33f3f2a-22ec-4355-aa1c-978579065b02/images/59951756-4398-46aa-92aa-dcb441dae05e/53bfd738-0e61-4779-b04a-2675b490c181'
...
MainThread::DEBUG::2017-01-10 15:16:44,025::heconflib::74::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_dd_pipe_tar) executing: 'tar -xOf - vm.conf'
MainThread::DEBUG::2017-01-10 15:16:44,038::heconflib::92::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_dd_pipe_tar) stdout: vmId=d53e5737-22ac-44f9-bb10-3006dee22b05
memSize=4096
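
The three readings above can be compared mechanically. This is a minimal sketch, assuming the dumped configuration is plain key=value text as in the excerpts; parse_conf is an illustrative helper, not part of ovirt-hosted-engine-ha.

```python
def parse_conf(text):
    """Parse key=value lines into a dict, ignoring lines without '='."""
    conf = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


# Fragments copied from the log excerpts above (elided parts left out).
before = "vmId=d53e5737-22ac-44f9-bb10-3006dee22b05\nmemSize=4096"
from_ovf = "cpuType=Conroe\nemulatedMachine=pc-i440fx-rhel7.3.0\nmemSize=6144"
after = "vmId=d53e5737-22ac-44f9-bb10-3006dee22b05\nmemSize=4096"

for label, text in (("before", before), ("ovf", from_ovf), ("after", after)):
    print(label, parse_conf(text)["memSize"])
```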

Comment 2 Doron Fediuck 2017-01-11 11:40:29 UTC
Artyom,
looking at the time diff (0.021 sec) suggests it may be a race.
What is the frequency of reproduction?
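
The 0.021 sec figure can be reproduced from the log timestamps. A small sketch, assuming the agent log fields are separated by '::' with the timestamp in the third field, as in the lines quoted in comment 1; the log lines are truncated to their prefixes here.

```python
from datetime import datetime


def parse_ts(log_line):
    """Extract the timestamp (third '::'-separated field) from an agent log line."""
    stamp = log_line.split("::")[2]
    # Milliseconds follow a comma, e.g. "2017-01-10 15:16:44,004".
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


ovf_found = "MainThread::INFO::2017-01-10 15:16:44,004::config::409::"
conf_extract = "MainThread::DEBUG::2017-01-10 15:16:44,025::heconflib::142::"

delta = parse_ts(conf_extract) - parse_ts(ovf_found)
print(delta.total_seconds())
```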

Comment 3 Martin Sivák 2017-01-11 11:54:30 UTC
Artyom, you do not need to do step 3 anymore:

3. Update OvfUpdateIntervalInMinutes to 1 minute (# engine-config -s OvfUpdateIntervalInMinutes=1 && systemctl restart ovirt-engine)

But even if you do, what do you see in the webadmin UI? And does this happen when you leave the update interval set to 60s?

Comment 4 Artyom 2017-01-11 13:54:34 UTC
To Doron:
I tried it 3 times on different setups and hit the same issue every time, so it reproduces 100% of the time

To Martin:
1. In the UI I can see that the HE VM has the updated configuration (memory equal to 6GB)
2. I first tried it with the default OVF update interval; when that did not work, I changed the OVF update interval to 1 minute.

Comment 5 Martin Sivák 2017-01-12 12:35:37 UTC
Doron, your comment talks about two different files: vm.conf and the OVF store. We have both in the shared storage and only the OVF store is updated (vm.conf is the original configuration as used by setup).

The OVF store contains the proper value since the update and never reverts back.

So the bug is probably only in the part that decides which file will be used to start the VM.
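
The intended selection can be sketched roughly as follows. This is an illustration only, not the agent's code: choose_vm_conf and read_fallback are hypothetical names, and the sketch assumes the OVF-derived content is passed in as a string (empty or None when no OVF was found).

```python
def choose_vm_conf(ovf_content, read_fallback):
    """Prefer the configuration converted from the OVF store.

    read_fallback is a callable that reads the original vm.conf; it is
    only invoked when no OVF-derived content is available.
    """
    if ovf_content:
        return ovf_content, "OVF_STORE"
    return read_fallback(), "vm.conf"


content, source = choose_vm_conf("memSize=6144", lambda: "memSize=4096")
print(source, content)
```

Passing the fallback as a callable keeps the vm.conf read from happening at all when the OVF content is usable, which makes an accidental override easier to spot.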

Comment 6 Martin Sivák 2017-01-12 13:00:10 UTC
And that is here:

MainThread::INFO::2017-01-10 15:16:44,009::config::414::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::DEBUG::2017-01-10 15:16:44,010::config::448::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_file_content_from_shared_storage) Reading 'vm.conf' from '/rhev/data-center/mnt/10.35.110.11:_Compute__NFS_alukiano_he__2/c33f3f2a-22ec-4355-aa1c-978579065b02/images/59951756-4398-46aa-92aa-dcb441dae05e/53bfd738-0e61-4779-b04a-2675b490c181'

After the first line we know that the OVF value was properly extracted. And yet the 'if not content:' check evaluates to true and the agent continues to read the fallback config file, as the second line shows.

ovirt_hosted_engine_ha/env/config.py:239

Comment 7 Martin Sivák 2017-01-12 13:33:08 UTC
The code I cited was a bit old, but the new one had the issue slightly better hidden. We properly used the OVF content, but then rewrote it when we tried to publish it to the /var/run cache file.
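
A rough sketch of a publish step that avoids this kind of overwrite, under the assumption that the cache is a single file and that the already selected content is handed to the publisher as a string. This is not the actual patch; publish_vm_conf and cached_matches are hypothetical helpers.

```python
import os
import tempfile


def publish_vm_conf(content, cache_path):
    """Write exactly the given content to cache_path, replacing it atomically."""
    directory = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(content)
    # Rename over the old cache file so readers never see a partial write.
    os.replace(tmp_path, cache_path)


def cached_matches(content, cache_path):
    """Return True if the cache file holds the content that was selected."""
    with open(cache_path) as cache:
        return cache.read() == content
```

The point of the second helper is a cheap regression check: after publishing, the cached file should still contain the OVF-derived values (for example memSize=6144) and not the original vm.conf.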

Comment 8 Artyom 2017-01-24 11:02:23 UTC
The problem still exists under ovirt-hosted-engine-ha-2.1.0-1.el7ev.noarch

Comment 9 Red Hat Bugzilla Rules Engine 2017-01-24 11:02:29 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 10 Artyom 2017-02-05 14:30:54 UTC
Verified on ovirt-hosted-engine-ha-2.1.0.1-1.el7ev.noarch

1) Memory update - PASS
2) CPU update - PASS
3) Add additional nic - PASS