Bug 1147411
| Summary: | can't start hosted engine VM in cluster with 3+ hosts | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | rhev-integ | ||||||
| Component: | ovirt-hosted-engine-ha | Assignee: | Jiri Moskovcak <jmoskovc> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Nikolai Sednev <nsednev> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 3.5.0 | CC: | dfediuck, ecohen, gklein, iheim, jmoskovc, juwu, lsurette, mavital, rbalakri, sbonazzo, yeylon | ||||||
| Target Milestone: | --- | Keywords: | ZStream | ||||||
| Target Release: | 3.4.3 | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | sla | ||||||||
| Fixed In Version: | ovirt-hosted-engine-ha-1.1.6-1.el6ev | Doc Type: | Bug Fix | ||||||
| Doc Text: |
Cause:
The HA agent expected the engine virtual machine to be in the up state immediately after it was started, without giving it enough time to actually boot and start the engine.
Consequence:
As a result, the agent wrongly determined the state of the engine and penalized the host by giving it a score of 0. Other hosts with a higher score then became better targets for running the engine virtual machine, so the VM was killed on the current host and started on a host with a better score, where the situation repeated.
Fix:
Change the logic to take the "powering up" phase into consideration when checking the engine state: do not penalize the host while the engine is powering up, and wait until it is fully started.
Result:
The engine is started properly, and the host score is not penalized while the engine VM is powering up.
|
Story Points: | --- | ||||||
| Clone Of: | 1130173 | Environment: | |||||||
| Last Closed: | 2014-10-27 22:47:09 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1097767 | ||||||||
| Attachments: |
|
||||||||
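The fix described in the Doc Text above can be sketched roughly as follows. This is only an illustration, not the actual ovirt-hosted-engine-ha code: the function names `engine_state` and `host_score`, the `grace_for_powering_up` flag, and the assumption that a booting VM reports `"detail": "powering up"` are all hypothetical. The `health`/`vm`/`detail` keys are taken from the engine status shown in this report.

```python
# Hypothetical sketch of the scoring change; names and values are illustrative.

def engine_state(status, grace_for_powering_up=True):
    """Classify the engine VM as 'up', 'starting' or 'failed'.

    `status` is a dict such as {"health": "good", "vm": "up", "detail": "up"}.
    """
    if status.get("vm") == "up" and status.get("health") == "good":
        return "up"
    # Assumed behavior after the fix: a VM that is still booting is not a failure.
    if grace_for_powering_up and status.get("detail") == "powering up":
        return "starting"
    return "failed"


def host_score(base_score, status, grace_for_powering_up=True):
    """Return 0 only when the engine is considered failed, else the base score."""
    if engine_state(status, grace_for_powering_up) == "failed":
        return 0
    return base_score


booting = {"health": "bad", "vm": "up", "detail": "powering up"}
print(host_score(2400, booting, grace_for_powering_up=False))  # behavior before the fix
print(host_score(2400, booting))                               # behavior after the fix
```

With the grace flag disabled, the booting host drops to 0 and another host takes over, which is the restart loop described under Consequence.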
Created attachment 946350 [details]
answers.conf
Components:
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
ovirt-hosted-engine-setup-1.2.1-1.el6ev.noarch
libvirt-0.10.2-46.el6.x86_64
sanlock-2.8-1.el6.x86_64
vdsm-4.16.6-1.el6ev.x86_64
ovirt-hosted-engine-ha-1.2.2-2.el6ev.noarch

Created attachment 946351 [details]
vdsm and supervdsm logs

The above failure is due to a deployment issue and has nothing to do with this BZ. Moving to on_qa.

After powering off the HE VM via halt -p and then running hosted-engine --vm-start on the same host on which it ran before, the engine does not start on that particular host. Instead it starts on the third host, which is seen as stale from the host on which the VM start was attempted:
--== Host 4 status ==--
Status up-to-date : False
Hostname : 10.35.117.26
Host ID : 4
Engine status : unknown stale-data
Score : 2400
Local maintenance : False
Host timestamp : 1413953568
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1413953568 (Wed Oct 22 07:52:48 2014)
host-id=4
score=2400
maintenance=False
state=EngineUp
When logging in to the host on which the VM is running (10.35.117.26, the same host that was reported as unknown stale-data), the VM is shown as running on it:
--== Host 4 status ==--
Status up-to-date : True
Hostname : 10.35.117.26
Host ID : 4
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 2400
Local maintenance : False
Host timestamp : 1413953494
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1413953494 (Wed Oct 22 07:51:34 2014)
host-id=4
score=2400
maintenance=False
state=EngineUp
The issue is not that the HE VM fails to start at all: it does start, but not on the requested host, and the third host is incorrectly shown as stale.
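For comparing the two views of host 4, a small helper that reads one status block can be handy. The sketch below assumes the `key : value` layout shown in the status output above; the function names `parse_host_status` and `is_engine_up` are hypothetical and not part of any oVirt tool.

```python
import json


def parse_host_status(text):
    """Parse one 'key : value' status block into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip headers and key=value metadata lines
            fields[key.strip()] = value.strip()
    return fields


def is_engine_up(fields):
    """True only when 'Engine status' is JSON reporting a healthy, running VM."""
    try:
        status = json.loads(fields.get("Engine status", ""))
    except ValueError:  # e.g. "unknown stale-data" is not JSON
        return False
    return (isinstance(status, dict)
            and status.get("vm") == "up"
            and status.get("health") == "good")


remote_view = """--== Host 4 status ==--
Status up-to-date : False
Hostname : 10.35.117.26
Engine status : unknown stale-data
Score : 2400"""

local_view = """--== Host 4 status ==--
Status up-to-date : True
Hostname : 10.35.117.26
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 2400"""

for name, block in (("remote", remote_view), ("local", local_view)):
    fields = parse_host_status(block)
    print(name, fields["Status up-to-date"], is_engine_up(fields))
```

Run against the two blocks above, the remote view is reported as not up-to-date with no usable engine status, while the local view shows the VM up.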
Checked using these components:
libvirt-0.10.2-46.el6.x86_64
ovirt-hosted-engine-ha-1.1.6-3.el6ev.noarch
ovirt-host-deploy-1.2.3-1.el6ev.noarch
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
vdsm-4.14.17-1.el6ev.x86_64
ovirt-hosted-engine-setup-1.1.5-1.el6ev.noarch
sanlock-2.8-1.el6.x86_64
rhevm-3.4.3-1.2.el6ev.noarch

It's expected behavior: when you killed the engine with halt -p, the host running the engine VM got score 0 because of that unexpected shutdown. So when you tried to start it on the same host, the agent detected that there are hosts with a better score and immediately restarted the engine VM on a host with a better score. And even if this were a problem, it's definitely not connected with this bug, so I don't understand why you marked it as FailedQA.

(In reply to Jiri Moskovcak from comment #9)
> It's expected behavior, when you killed the engine by halt -p the host
> running the engine VM got score 0 because of that unexpected shutdown, so
> when you tried to start it on the same host the agent detects there are
> hosts with better score and immediately re-starts the engine VM on the host
> with better score. And even if this would be a problem it's definitely not
> connected with this bug so I don't understand why you marked it as FailedQA.

The reason I reopened it is that the host on which the VM was eventually powered up was seen by the 2 other hosts as being in a stale state, although it was running the VM. Additionally, the VM was first started on one host, then brought down, and then brought up again, instead of being started once. I'll verify this one and open 2 more bugs for these issues, as the root cause was fixed by you.
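
The behavior described in comment #9 (a host with score 0 losing the engine VM to a better-scored host) can be illustrated with a simplified sketch. The function name `pick_engine_host`, the host IDs other than 4, and the plain "strictly higher score wins" rule are assumptions for illustration; the real agent may apply additional thresholds or conditions.

```python
# Hypothetical sketch of score-based host selection; not the real agent logic.

def pick_engine_host(local_host_id, scores):
    """Return the host that should run the engine VM.

    `scores` maps host ID to its current score. The VM stays on the local
    host unless some other host has a strictly higher score.
    """
    best = max(scores, key=scores.get)
    if scores[best] > scores[local_host_id]:
        return best
    return local_host_id


# Host 1 was just penalized to 0 after the unexpected halt -p shutdown.
scores = {1: 0, 2: 2400, 4: 2400}
print(pick_engine_host(1, scores))  # VM is moved away from host 1
print(pick_engine_host(4, scores))  # host 4 keeps the VM
```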

(In reply to Nikolai Sednev from comment #10)
> (In reply to Jiri Moskovcak from comment #9)
> > It's expected behavior, when you killed the engine by halt -p the host
> > running the engine VM got score 0 because of that unexpected shutdown, so
> > when you tried to start it on the same host the agent detects there are
> > hosts with better score and immediately re-starts the engine VM on the host
> > with better score. And even if this would be a problem it's definitely not
> > connected with this bug so I don't understand why you marked it as FailedQA.
>
> The reason I re-opened is because host on which VM was eventually powered-up
> was seen by 2 others as in stale state, although it was running the VM,
> additionally VM first was started on one host, then brought down and then up
> again, instead of doing it once, I'll verify this one and open 2 more on
> this issue, as root cause was fixed by you.

Is this test run by some script? The stale data might just mean that the agents on the other hosts hadn't been running long enough; it takes time to synchronize.

Hi Jiri,
Please provide the doc text or set require_doc_text flag to -.
Many thanks,
Julie

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2014-1722.html

I was unable to complete the deployment of the engine:

[root@blue-vdsc ~]# hosted-engine --deploy
[ INFO ] Stage: Initializing
Continuing will configure this host for serving as hypervisor and create a VM where you have to install oVirt Engine afterwards.
Are you sure you want to continue? (Yes, No)[Yes]:
It has been detected that this program is executed through an SSH connection without using screen.
Continuing with the installation may lead to broken installation if the network connection fails.
It is highly recommended to abort the installation and run it inside a screen session using command "screen".
Do you want to continue anyway? (Yes, No)[No]: yes
[ INFO ] Generating a temporary VNC password.
[ INFO ] Stage: Environment setup
Configuration files: []
Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20141013142058-lt2k1s.log
Version: otopi-1.3.0 (otopi-1.3.0-1.el6ev)
[ INFO ] Hardware supports virtualization
[ INFO ] Stage: Environment packages setup
[ INFO ] Stage: Programs detection
[ INFO ] Stage: Environment setup
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Generating libvirt-spice certificates
[ INFO ] Stage: Environment customization

--== STORAGE CONFIGURATION ==--

During customization use CTRL-D to abort.
Please specify the storage you would like to use (iscsi, nfs3, nfs4)[nfs3]:
Please specify the full shared storage connection path to use (example: host:/path): 10.35.160.108:/RHEV/nsednev_HE_3_5
[ INFO ] Installing on first host
Please provide storage domain name. [hosted_storage]:
Local storage datacenter name is an internal name and currently will not be shown in engine's admin UI.
Please enter local datacenter name [hosted_datacenter]:

--== SYSTEM CONFIGURATION ==--

--== NETWORK CONFIGURATION ==--

Please indicate a nic to set rhevm bridge on: (eth1, eth0) [eth1]: eth0
iptables was detected on your computer, do you wish setup to configure it? (Yes, No)[Yes]:
Please indicate a pingable gateway IP address [10.35.103.254]:

--== VM CONFIGURATION ==--

Please specify the device to boot the VM from (cdrom, disk, pxe) [cdrom]: pxe
The following CPU types are supported by this host:
- model_Conroe: Intel Conroe Family
Please specify the CPU type to be used by the VM [model_Conroe]:
Please specify the number of virtual CPUs for the VM [Defaults to minimum requirement: 2]:
Please specify the disk size of the VM in GB [Defaults to minimum requirement: 25]:
You may specify a unicast MAC address for the VM or accept a randomly generated default [00:16:3e:7c:64:66]: 00:16:3E:7B:B8:53
Please specify the memory size of the VM in MB [Defaults to minimum requirement: 4096]:
Please specify the console type you would like to use to connect to the VM (vnc, spice) [vnc]:

--== HOSTED ENGINE CONFIGURATION ==--

Enter the name which will be used to identify this host inside the Administrator Portal [hosted_engine_1]:
Enter 'admin@internal' user password that will be used for accessing the Administrator Portal:
Confirm 'admin@internal' user password:
Please provide the FQDN for the engine you would like to use.
This needs to match the FQDN that you will use for the engine installation within the VM.
Note: This will be the FQDN of the VM you are now going to create, it should not point to the base host or to any other existing machine.
Engine FQDN: nsednev-he-1.qa.lab.tlv.redhat.com
Please provide the name of the SMTP server through which we will send notifications [localhost]:
Please provide the TCP port number of the SMTP server [25]:
Please provide the email address from which notifications will be sent [root@localhost]:
Please provide a comma-separated list of email addresses which will get notifications [root@localhost]:
[ INFO ] Stage: Setup validation

--== CONFIGURATION PREVIEW ==--

Bridge interface : eth0
Engine FQDN : nsednev-he-1.qa.lab.tlv.redhat.com
Bridge name : rhevm
SSH daemon port : 22
Firewall manager : iptables
Gateway address : 10.35.103.254
Host name for web application : hosted_engine_1
Host ID : 1
Image size GB : 25
Storage connection : 10.35.160.108:/RHEV/nsednev_HE_3_5
Console type : vnc
Memory size MB : 4096
MAC address : 00:16:3E:7B:B8:53
Boot type : pxe
Number of CPUs : 2
CPU Type : model_Conroe

Please confirm installation settings (Yes, No)[Yes]:
[ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO ] Stage: Transaction setup
[ INFO ] Stage: Misc configuration
[ INFO ] Stage: Package installation
[ INFO ] Stage: Misc configuration
[ INFO ] Configuring libvirt
[ INFO ] Configuring VDSM
[ INFO ] Starting vdsmd
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Configuring the management bridge
[ INFO ] Creating Storage Domain
[ INFO ] Creating Storage Pool
[ INFO ] Connecting Storage Pool
[ INFO ] Verifying sanlock lockspace initialization
[ INFO ] Creating VM Image
[ INFO ] Disconnecting Storage Pool
[ INFO ] Start monitoring domain
[ INFO ] Configuring VM
[ INFO ] Updating hosted-engine configuration
[ INFO ] Stage: Transaction commit
[ INFO ] Stage: Closing up
[ INFO ] Creating VM
[ ERROR ] Failed to execute stage 'Closing up': Cannot set temporary password for console connection. The VM may not have been created: please check VDSM logs
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO ] Answer file '/etc/ovirt-hosted-engine/answers.conf' has been updated
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination