Bug 1416023 - Engine setup got stuck on the appliance
Summary: Engine setup got stuck on the appliance
Keywords:
Status: CLOSED DUPLICATE of bug 1409203
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 4.1.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.1.0-rc
Assignee: Simone Tiraboschi
QA Contact: meital avital
URL:
Whiteboard:
Depends On: 1372260
Blocks: 1361511 1379405 1402435 1403903
 
Reported: 2017-01-24 11:41 UTC by Nikolai Sednev
Modified: 2019-04-28 14:10 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-26 16:41:09 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:


Attachments
sosreport-alma04.qa.lab.tlv.redhat.com-20170124134123.tar.xz (8.54 MB, application/x-xz)
2017-01-24 11:45 UTC, Nikolai Sednev
sosreport-alma03.qa.lab.tlv.redhat.com-20170124155705.tar.xz (8.47 MB, application/x-xz)
2017-01-24 14:00 UTC, Nikolai Sednev
cloud-init issue (35.62 KB, image/png)
2017-01-24 14:46 UTC, Simone Tiraboschi
sosreport from the enginevm (7.85 MB, application/x-xz)
2017-01-24 14:59 UTC, Simone Tiraboschi


Links
System ID  Status  Summary  Last Updated
oVirt gerrit 71205  MERGED  cloud-init: restarting sshd in background  2021-02-10 13:06:50 UTC
oVirt gerrit 71206  MERGED  cloud-init: restarting sshd in background  2021-02-10 13:06:50 UTC
oVirt gerrit 71230  MERGED  cloud-init: add a comment to explain how to avoid 1372260  2021-02-10 13:06:50 UTC
oVirt gerrit 71232  MERGED  cloud-init: add a comment to explain how to avoid 1372260  2021-02-10 13:06:50 UTC

Description Nikolai Sednev 2017-01-24 11:41:05 UTC
Description of problem:
[ INFO  ] Running engine-setup on the appliance
[ ERROR ] Engine setup got stuck on the appliance
[ ERROR ] Failed to execute stage 'Closing up': Engine setup is stalled on the appliance since 1800 seconds ago. Please check its log on the appliance. 
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20170124120451.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20170124111743-t9fclj.log
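For reference, when the setup reports that engine-setup stalled, its log can be checked from the appliance itself; a minimal sketch, assuming the engine VM is still up and engine-setup wrote its log to the usual location:

  # from the host, open the engine VM console
  hosted-engine --console
  # on the appliance, follow the engine-setup log
  tail -f /var/log/ovirt-engine/setup/ovirt-engine-setup-*.log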

This happens because the wrong, old component rhevm-appliance.noarch exists within the repositories:
yum list | grep appliance
rhvm-appliance.noarch               1:4.1.20170119.1-1.el7ev
rhevm-appliance.noarch              20161214.0-1.el7ev     rhv-multi-product-channel

I manually installed rhevm-appliance.noarch as usual, and the deployment got stuck because no root access was created on the appliance, although I implicitly answered "yes" to "Do you want to enable ssh access for the root user (yes, no, without-password) [yes]: "
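As a side note, a quick way to confirm which appliance packages are actually installed on the host versus merely advertised by the repos (plain rpm/yum queries, nothing specific to this bug):

  # installed on the host
  rpm -qa | grep appliance
  # available in the enabled repositories
  yum list available | grep appliance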

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.On clean host run these:
yum clean all && yum install screen -y && screen yum update -y && yum install -y ovirt-hosted-engine-setup && yum install -y rhevm-appliance.noarch && systemctl stop NetworkManager && chkconfig NetworkManager off && systemctl is-enabled NetworkManager && systemctl status NetworkManager
2.Deploy hosted engine "hosted-engine --deploy".
3.

Actual results:
No ssh access is available for root on the appliance during deployment, the wrong name is given for the appliance, and two appliance packages exist within the repos.

Expected results:
Only the latest rhevm-appliance.noarch should exist within the repos, and hosted-engine deployment should succeed.

Additional info:
sosreport is attached.

Comment 1 Nikolai Sednev 2017-01-24 11:45:16 UTC
Created attachment 1243894 [details]
sosreport-alma04.qa.lab.tlv.redhat.com-20170124134123.tar.xz

Comment 3 Sandro Bonazzola 2017-01-24 12:03:42 UTC
Simone, can you please check?

Comment 5 Nikolai Sednev 2017-01-24 13:12:10 UTC
The workaround is to manually install
rhvm-appliance.noarch 1:4.1.20170119.1-1.el7ev.
If no appliance is installed prior to hosted-engine deployment, then rhvm-appliance-4.1.20170119.1-1.el7ev.noarch.rpm is installed during deployment, as shown below:

[ ERROR ] No engine appliance image is available on your system.
          The oVirt engine appliance is now required to deploy hosted-engine.
          You could get oVirt engine appliance installing ovirt-engine-appliance rpm.
          Do you want to install ovirt-engine-appliance rpm? (Yes, No) [Yes]: 
[ INFO  ] Stage: Environment packages setup
[ INFO  ] Installing the oVirt engine appliance
[ INFO  ] Yum Status: Downloading Packages
[ INFO  ] Yum Downloading: rhvm-appliance-4.1.20170119.1-1.el7ev.noarch.rpm 938 M(57%)

Then deployment succeeds.
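In other words, a minimal sketch of the workaround described above, using the package version quoted in this comment:

  # on the host, pre-install the renamed appliance before deploying
  yum install -y rhvm-appliance-4.1.20170119.1-1.el7ev
  hosted-engine --deploy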

The issue here is that the rhevm name was changed to rhvm (the "e" was dropped), and for some reason I saw both rhvm-appliance.noarch 1:4.1.20170119.1-1.el7ev and rhevm-appliance.noarch 20161214.0-1.el7ev (from rhv-multi-product-channel) at the same time in the host's repositories; now I can see only rhvm-appliance-4.1.20170119.1-1.el7ev.noarch, which is fine.

Comment 6 Nikolai Sednev 2017-01-24 13:57:01 UTC
Deployment has failed:
# hosted-engine --deploy
[ INFO  ] Stage: Initializing
[ INFO  ] Generating a temporary VNC password.
[ INFO  ] Stage: Environment setup
          During customization use CTRL-D to abort.
          Continuing will configure this host for serving as hypervisor and create a VM where you have to install the engine afterwards.
          Are you sure you want to continue? (Yes, No)[Yes]: 
          It has been detected that this program is executed through an SSH connection without using screen.
          Continuing with the installation may lead to broken installation if the network connection fails.
          It is highly recommended to abort the installation and run it inside a screen session using command "screen".
          Do you want to continue anyway? (Yes, No)[No]: yes
[ INFO  ] Hardware supports virtualization
          Configuration files: []
          Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20170124142226-7iqzlz.log
          Version: otopi-1.6.0 (otopi-1.6.0-1.el7ev)
[ INFO  ] Detecting available oVirt engine appliances
[ ERROR ] No engine appliance image is available on your system.
          The oVirt engine appliance is now required to deploy hosted-engine.
          You could get oVirt engine appliance installing ovirt-engine-appliance rpm.
          Do you want to install ovirt-engine-appliance rpm? (Yes, No) [Yes]: 
[ INFO  ] Stage: Environment packages setup
[ INFO  ] Installing the oVirt engine appliance
[ INFO  ] Yum Status: Downloading Packages
[ INFO  ] Yum Downloading: rhvm-appliance-4.1.20170119.1-1.el7ev.noarch.rpm 938 M(57%)
[ INFO  ] Yum Download/Verify: 1:rhvm-appliance-4.1.20170119.1-1.el7ev.noarch
[ INFO  ] Yum Status: Check Package Signatures
[ INFO  ] Yum Status: Running Test Transaction
[ INFO  ] Yum Status: Running Transaction
[ INFO  ] Yum install: 1/1: 1:rhvm-appliance-4.1.20170119.1-1.el7ev.noarch
[ INFO  ] Yum Verify: 1/1: rhvm-appliance.noarch 1:4.1.20170119.1-1.el7ev - u
[ INFO  ] Stage: Programs detection
[ INFO  ] Stage: Environment setup
[ INFO  ] Generating libvirt-spice certificates
[WARNING] Cannot locate gluster packages, Hyper Converged setup support will be disabled.
[ INFO  ] Please abort the setup and install vdsm-gluster, glusterfs-server >= 3.7.2 and restart vdsmd service in order to gain Hyper Converged setup support.
[ INFO  ] Stage: Environment customization
         
          --== STORAGE CONFIGURATION ==--
         
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]: 
          Please specify the full shared storage connection path to use (example: host:/path): 10.35.110.11:/Compute_NFS/nsednev_he_1
         
          --== HOST NETWORK CONFIGURATION ==--
         
          iptables was detected on your computer, do you wish setup to configure it? (Yes, No)[Yes]: 
          Please indicate a pingable gateway IP address [10.35.117.254]: 
          Please indicate a nic to set ovirtmgmt bridge on: (eno1, eno2, enp5s0f0, enp5s0f1) [eno1]: enp5s0f0
         
          --== VM CONFIGURATION ==--
         
          The following appliance have been found on your system:
                [1] - The RHEV-M Appliance image (OVA) - 4.1.20170119.1-1.el7ev
                [2] - Directly select an OVA file
          Please select an appliance (1, 2) [1]: 
[ INFO  ] Verifying its sha1sum
[ INFO  ] Checking OVF archive content (could take a few minutes depending on archive size)
[ INFO  ] Checking OVF XML content (could take a few minutes depending on archive size)
          Please specify the console type you would like to use to connect to the VM (vnc, spice) [vnc]: 
[ INFO  ] Detecting host timezone.
          Would you like to use cloud-init to customize the appliance on the first boot (Yes, No)[Yes]? 
          Would you like to generate on-fly a cloud-init ISO image (of no-cloud type)
          or do you have an existing one (Generate, Existing)[Generate]? 
          Please provide the FQDN you would like to use for the engine appliance.
          Note: This will be the FQDN of the engine VM you are now going to launch,
          it should not point to the base host or to any other existing machine.
          Engine VM FQDN: (leave it empty to skip):  []: nsednev-he-1.qa.lab.tlv.redhat.com
          Please provide the domain name you would like touse for the engine appliance.
          Engine VM domain: [qa.lab.tlv.redhat.com]
          Automatically execute engine-setup on the engine appliance on first boot (Yes, No)[Yes]? 
          Automatically restart the engine VM as a monitored service after engine-setup (Yes, No)[Yes]? 
          Enter root password that will be used for the engine appliance (leave it empty to skip): 
          Confirm appliance root password: 
          Enter ssh public key for the root user that will be used for the engine appliance (leave it empty to skip): 
[WARNING] Skipping appliance root ssh public key
          Do you want to enable ssh access for the root user (yes, no, without-password) [yes]: 
          Please specify the size of the VM disk in GB: [50]: 150
          Please specify the memory size of the VM in MB (Defaults to appliance OVF value): [4096]: 16384
          The following CPU types are supported by this host:
                 - model_SandyBridge: Intel SandyBridge Family
                 - model_Westmere: Intel Westmere Family
                 - model_Nehalem: Intel Nehalem Family
                 - model_Penryn: Intel Penryn Family
                 - model_Conroe: Intel Conroe Family
          Please specify the CPU type to be used by the VM [model_SandyBridge]: 
          Please specify the number of virtual CPUs for the VM (Defaults to appliance OVF value): [2]: 4
          You may specify a unicast MAC address for the VM or accept a randomly generated default [00:16:3e:71:fd:19]: 00:16:3E:7B:B8:53
          How should the engine VM network be configured (DHCP, Static)[DHCP]? 
          Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?
          Note: ensuring that this host could resolve the engine VM hostname is still up to you
          (Yes, No)[No] yes
         
          --== HOSTED ENGINE CONFIGURATION ==--
         
          Enter engine admin password: 
          Confirm engine admin password: 
          Please provide the name of the SMTP server through which we will send notifications [localhost]: 
          Please provide the TCP port number of the SMTP server [25]: 
          Please provide the email address from which notifications will be sent [root@localhost]: 
          Please provide a comma-separated list of email addresses which will get notifications [root@localhost]: 
[ INFO  ] Stage: Setup validation
         
          --== CONFIGURATION PREVIEW ==--
         
          Bridge interface                   : enp5s0f0
          Engine FQDN                        : nsednev-he-1.qa.lab.tlv.redhat.com
          Bridge name                        : ovirtmgmt
          Host address                       : alma03.qa.lab.tlv.redhat.com
          SSH daemon port                    : 22
          Firewall manager                   : iptables
          Gateway address                    : 10.35.117.254
          Storage Domain type                : nfs3
          Image size GB                      : 150
          Host ID                            : 1
          Storage connection                 : 10.35.110.11:/Compute_NFS/nsednev_he_1
          Console type                       : vnc
          Memory size MB                     : 16384
          MAC address                        : 00:16:3E:7B:B8:53
          Number of CPUs                     : 4
          OVF archive (for disk boot)        : /usr/share/ovirt-engine-appliance/rhvm-appliance-4.1.20170119.1-1.el7ev.ova
          Appliance version                  : 4.1.20170119.1-1.el7ev
          Restart engine VM after engine-setup: True
          Engine VM timezone                 : Asia/Jerusalem
          CPU Type                           : model_SandyBridge
         
          Please confirm installation settings (Yes, No)[Yes]: 
[ INFO  ] Stage: Transaction setup
[ INFO  ] Stage: Misc configuration
[ INFO  ] Stage: Package installation
[ INFO  ] Stage: Misc configuration
[ INFO  ] Configuring libvirt
[ INFO  ] Configuring VDSM
[ INFO  ] Starting vdsmd
[ INFO  ] Configuring the management bridge
[ INFO  ] Creating Storage Domain
[ INFO  ] Creating Storage Pool
[ INFO  ] Connecting Storage Pool
[ INFO  ] Verifying sanlock lockspace initialization
[ INFO  ] Creating Image for 'hosted-engine.lockspace' ...
[ INFO  ] Image for 'hosted-engine.lockspace' created successfully
[ INFO  ] Creating Image for 'hosted-engine.metadata' ...
[ INFO  ] Image for 'hosted-engine.metadata' created successfully
[ INFO  ] Creating VM Image
[ INFO  ] Extracting disk image from OVF archive (could take a few minutes depending on archive size)
[ INFO  ] Validating pre-allocated volume size
[ INFO  ] Uploading volume to data domain (could take a few minutes depending on archive size)
[ INFO  ] Image successfully imported from OVF
[ INFO  ] Destroying Storage Pool
[ INFO  ] Start monitoring domain
[ INFO  ] Configuring VM
[ INFO  ] Updating hosted-engine configuration
[ INFO  ] Stage: Transaction commit
[ INFO  ] Stage: Closing up
[ INFO  ] Creating VM
          You can now connect to the VM with the following command:
                hosted-engine --console
          You can also graphically connect to the VM from your system with the following command:
                remote-viewer vnc://alma03.qa.lab.tlv.redhat.com:5900
          Use temporary password "1023hYtq" to connect to vnc console.
          Please ensure that your Guest OS is properly configured to support serial console according to your distro documentation.
          Follow http://www.ovirt.org/Serial_Console_Setup#I_need_to_access_the_console_the_old_way for more info.
          If you need to reboot the VM you will need to start it manually using the command:
          hosted-engine --vm-start
          You can then set a temporary password using the command:
          hosted-engine --add-console-password
[ INFO  ] Running engine-setup on the appliance
[ ERROR ] Engine setup got stuck on the appliance
[ ERROR ] Failed to execute stage 'Closing up': Engine setup is stalled on the appliance since 1800 seconds ago. Please check its log on the appliance. 
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20170124154208.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20170124142226-7iqzlz.log
[root@alma03 ~]# rpm -qa | grep appliance
rhvm-appliance-4.1.20170119.1-1.el7ev.noarch

Comment 7 Nikolai Sednev 2017-01-24 14:00:05 UTC
Created attachment 1243922 [details]
sosreport-alma03.qa.lab.tlv.redhat.com-20170124155705.tar.xz

Comment 8 Simone Tiraboschi 2017-01-24 14:46:33 UTC
Created attachment 1243944 [details]
cloud-init issue

Comment 9 Simone Tiraboschi 2017-01-24 14:59:27 UTC
Created attachment 1243946 [details]
sosreport from the enginevm

Comment 10 Nikolai Sednev 2017-01-24 16:49:39 UTC
I've also tried to deploy using the latest rhvm-appliance-4.1.20170123.0-1.el7ev.noarch. The result was the same.

Comment 11 Simone Tiraboschi 2017-01-24 17:00:36 UTC
The point is that for some reason the cloud-init script fails to run on the appliance; I double-checked the script and it seems fine.

Comment 12 Nikolai Sednev 2017-01-25 08:41:28 UTC
I was using slightly different deployment parameters than the defaults, but they do not seem to be a factor, as they have worked just fine before.
I used a 150 GB disk, 16384 MB of RAM, and 4 CPUs for the HE VM.

Comment 13 Nikolai Sednev 2017-01-25 14:44:27 UTC
I tried to deploy with the default settings, and that deployment also failed.

Comment 14 Simone Tiraboschi 2017-01-25 17:14:07 UTC
Probably due to https://bugzilla.redhat.com/1372260, 'systemctl restart sshd' got stuck forever, which also stalled our cloud-init script until hosted-engine-setup killed the engine VM when its timeout expired.

Comment 15 Nikolai Sednev 2017-01-25 17:31:23 UTC
Once Simone applied https://gerrit.ovirt.org/#/c/71205/ on the host, the deployment was successful.

Comment 16 Sandro Bonazzola 2017-01-26 16:41:09 UTC
Closing as duplicate of bug #1409203.
The root cause is the same: a regression in the kernel package, which has been worked around in the cloud-init script executed by ovirt-hosted-engine-setup.

*** This bug has been marked as a duplicate of bug 1409203 ***

Comment 17 Nikolai Sednev 2017-01-26 18:06:46 UTC
Please provide the workaround.
I don't see any workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1409203.

Comment 18 Simone Tiraboschi 2017-01-27 09:14:22 UTC
Patch https://gerrit.ovirt.org/#/c/71205/ modifies our cloud-init script to introduce a workaround: moving 'systemctl restart sshd' to a later stage, for some not entirely clear reason, seems to be enough to avoid BZ#1372260; beyond that, in case that alone is not enough, we now execute it in the background to be sure we can continue with the setup process.
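For illustration only (the authoritative change is the patch itself at https://gerrit.ovirt.org/#/c/71205/, not reproduced here), the shape of the workaround in the cloud-init script is roughly:

  # before: a blocking restart that could hang forever due to BZ#1372260,
  # stalling the whole cloud-init run and, with it, engine-setup
  systemctl restart sshd

  # after: the restart is moved to a later stage and backgrounded, so the
  # setup can continue even if the restart itself gets stuck
  systemctl restart sshd &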

