Bug 1147411
Summary:            can't start hosted engine VM in cluster with 3+ hosts

Product:            Red Hat Enterprise Virtualization Manager
Component:          ovirt-hosted-engine-ha
Status:             CLOSED ERRATA
Severity:           high
Priority:           high
Version:            3.5.0
Target Milestone:   ---
Target Release:     3.4.3
Hardware:           Unspecified
OS:                 Unspecified
Whiteboard:         sla
Fixed In Version:   ovirt-hosted-engine-ha-1.1.6-1.el6ev
Reporter:           rhev-integ
Assignee:           Jiri Moskovcak <jmoskovc>
QA Contact:         Nikolai Sednev <nsednev>
Docs Contact:
CC:                 dfediuck, ecohen, gklein, iheim, jmoskovc, juwu, lsurette, mavital, rbalakri, sbonazzo, yeylon
Keywords:           ZStream
Doc Type:           Bug Fix
Doc Text:
Cause:
The HA agent expected the engine virtual machine to be in the "up" state immediately after it was started, without giving it enough time to actually boot and start the engine.
Consequence:
The agent wrongly determined the state of the engine and penalized the host with a score of 0. Other hosts with a higher score then became better targets for running the engine virtual machine, so the VM was killed on the current host and started on a host with a better score, where the situation repeated.
Fix:
The logic was changed to take the "powering up" phase into consideration when checking the engine state (see the sketch after the metadata fields below): the host is not penalized while the engine is powering up, and the agent waits until it is fully started.
Result:
The engine is properly started and the host score is not penalized while the engine VM is powering up.
Story Points:       ---
Clone Of:           1130173
Environment:
Last Closed:        2014-10-27 22:47:09 UTC
Type:               ---
Regression:         ---
Mount Type:         ---
Documentation:      ---
CRM:
Verified Versions:
Category:           ---
oVirt Team:         SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:    ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:         1097767
Attachments:
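The Doc Text fix above can be illustrated with a short sketch. This is not the actual ovirt-hosted-engine-ha code: the names EngineStatus, POWERUP_GRACE_SECS, and evaluate_host_score are hypothetical, and the real agent derives the VM state from VDSM. The point is only the shape of the fixed logic: "powering up" is no longer treated as a failure.

import time
from enum import Enum

class EngineStatus(Enum):
    # Hypothetical states; the real agent reads the VM state from VDSM.
    UP = "up"
    POWERING_UP = "powering up"
    DOWN = "down"

BASE_SCORE = 2400          # the full score, as seen in the status output below
POWERUP_GRACE_SECS = 300   # hypothetical grace period for the engine to boot

def evaluate_host_score(status, vm_started_at):
    # Before the fix: anything other than UP was penalized immediately,
    # so a still-booting engine VM dropped the host score to 0 and the VM
    # was killed and restarted on a host with a better score, repeatedly.
    if status is EngineStatus.UP:
        return BASE_SCORE
    if (status is EngineStatus.POWERING_UP
            and time.time() - vm_started_at < POWERUP_GRACE_SECS):
        # After the fix: the engine VM is still booting, so the host
        # keeps its score and the agent waits for the engine to come up.
        return BASE_SCORE
    # Engine is down or failed to boot within the grace period.
    return 0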
Comment 2
Nikolai Sednev
2014-10-13 11:51:16 UTC
Created attachment 946350 [details]
answers.conf
Components:
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
ovirt-hosted-engine-setup-1.2.1-1.el6ev.noarch
libvirt-0.10.2-46.el6.x86_64
sanlock-2.8-1.el6.x86_64
vdsm-4.16.6-1.el6ev.x86_64
ovirt-hosted-engine-ha-1.2.2-2.el6ev.noarch

Created attachment 946351 [details]
vdsm and supervdsm logs
The above failure is due to a deployment issue and has nothing to do with this BZ. Moving to ON_QA.

Nikolai Sednev

After putting the HE VM into power-off via halt -p and then running the command hosted-engine --vm-start on the same host on which it ran before, the engine doesn't start on that particular host, but on a third host, which is seen as stale from the host on which the VM start was attempted:

--== Host 4 status ==--

Status up-to-date                  : False
Hostname                           : 10.35.117.26
Host ID                            : 4
Engine status                      : unknown stale-data
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 1413953568
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=1413953568 (Wed Oct 22 07:52:48 2014)
        host-id=4
        score=2400
        maintenance=False
        state=EngineUp

When logging in to the host on which the VM is running (the same one reported as unknown stale-data, 10.35.117.26), the VM is shown as running on it:

--== Host 4 status ==--

Status up-to-date                  : True
Hostname                           : 10.35.117.26
Host ID                            : 4
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 1413953494
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=1413953494 (Wed Oct 22 07:51:34 2014)
        host-id=4
        score=2400
        maintenance=False
        state=EngineUp

The issue is not that the HE VM doesn't start at all; it does start, but not on the requested host, and the third host is incorrectly shown as stale.

Checked using these components:
libvirt-0.10.2-46.el6.x86_64
ovirt-hosted-engine-ha-1.1.6-3.el6ev.noarch
ovirt-host-deploy-1.2.3-1.el6ev.noarch
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
vdsm-4.14.17-1.el6ev.x86_64
ovirt-hosted-engine-setup-1.1.5-1.el6ev.noarch
sanlock-2.8-1.el6.x86_64
rhevm-3.4.3-1.2.el6ev.noarch

Comment 9
Jiri Moskovcak

It's expected behavior: when you killed the engine with halt -p, the host running the engine VM got score 0 because of that unexpected shutdown, so when you tried to start it on the same host, the agent detected that there were hosts with a better score and immediately restarted the engine VM on the host with the better score. And even if this were a problem, it's definitely not connected with this bug, so I don't understand why you marked it as FailedQA.

Comment 10
Nikolai Sednev

(In reply to Jiri Moskovcak from comment #9)
> It's expected behavior: when you killed the engine with halt -p, the host
> running the engine VM got score 0 because of that unexpected shutdown, so
> when you tried to start it on the same host, the agent detected that there
> were hosts with a better score and immediately restarted the engine VM on
> the host with the better score. And even if this were a problem, it's
> definitely not connected with this bug, so I don't understand why you
> marked it as FailedQA.

The reason I re-opened is that the host on which the VM was eventually powered up was seen by the two others as being in a stale state, although it was running the VM. Additionally, the VM was first started on one host, then brought down and then up again, instead of being started once. I'll verify this one and open two more bugs on this issue, as the root cause was fixed by you.

Jiri Moskovcak

(In reply to Nikolai Sednev from comment #10)
> (In reply to Jiri Moskovcak from comment #9)
> > It's expected behavior: when you killed the engine with halt -p, the host
> > running the engine VM got score 0 because of that unexpected shutdown, so
> > when you tried to start it on the same host, the agent detected that
> > there were hosts with a better score and immediately restarted the engine
> > VM on the host with the better score. And even if this were a problem,
> > it's definitely not connected with this bug, so I don't understand why
> > you marked it as FailedQA.
>
> The reason I re-opened is that the host on which the VM was eventually
> powered up was seen by the two others as being in a stale state, although
> it was running the VM. Additionally, the VM was first started on one host,
> then brought down and then up again, instead of being started once. I'll
> verify this one and open two more bugs on this issue, as the root cause
> was fixed by you.

Is this test run by some script? The stale data might just mean that the agents on the other hosts hadn't been running long enough; it takes time to synchronize.
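Jiri's synchronization point can be made concrete: each agent periodically publishes the metadata shown in the status blocks above (timestamp, score, state) to shared storage, and a peer whose timestamp has not been refreshed recently is reported as unknown stale-data. Below is a minimal illustrative sketch of that staleness check in Python; the helper names (parse_extra_metadata, host_report) and the STALE_SECS threshold are hypothetical, not the actual ovirt-hosted-engine-ha code.

import time

STALE_SECS = 60  # hypothetical threshold; the real agent uses its own timeout

def parse_extra_metadata(text):
    # Parse the key=value lines of an "Extra metadata" block into a dict.
    meta = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition("=")
        meta[key] = value.split()[0]  # drop the "(Wed Oct 22 ...)" suffix
    return meta

def host_report(meta, now=None):
    # Classify a peer host from the metadata it last published.
    now = time.time() if now is None else now
    if now - int(meta["timestamp"]) > STALE_SECS:
        return "unknown stale-data"  # no recent refresh from that agent
    return meta["state"]             # e.g. "EngineUp"

# Example using the metadata from the first status block above:
sample = """metadata_parse_version=1
metadata_feature_version=1
timestamp=1413953568 (Wed Oct 22 07:52:48 2014)
host-id=4
score=2400
maintenance=False
state=EngineUp"""

meta = parse_extra_metadata(sample)
print(host_report(meta, now=1413953568 + 300))  # unknown stale-data
print(host_report(meta, now=1413953568 + 10))   # EngineUp

Under this model, a freshly restarted agent looks stale to its peers until its next metadata refresh lands, which matches the behavior Nikolai observed.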
Hi Jiri,

Please provide the doc text or set the require_doc_text flag to "-".

Many thanks,
Julie

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2014-1722.html