Bug 1671912 - Hosted engine deploy failed via ansible playbook with ovirt-ansible packages
Summary: Hosted engine deploy failed via ansible playbook with ovirt-ansible packages
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: cockpit-ovirt
Classification: oVirt
Component: Hosted Engine
Version: 0.12.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.3.3
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Wei Wang
URL:
Whiteboard:
Depends On: 1693560
Blocks:
Reported: 2019-02-02 05:49 UTC by Wei Wang
Modified: 2019-04-16 13:58 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-04-16 13:58:33 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.3+
cshao: testing_ack+


Attachments:
Log file (1.02 MB, application/gzip)
2019-02-02 05:49 UTC, Wei Wang
New log file (6.38 MB, application/gzip)
2019-03-20 09:41 UTC, Wei Wang

Description Wei Wang 2019-02-02 05:49:45 UTC
Created attachment 1526175
Log file

Description of problem:
Hosted engine deployment via an Ansible playbook with the ovirt-ansible packages fails:
TASK [ovirt.hosted_engine_setup : Wait for the host to be up] *****************************************************************************************************************************************************
FAILED - RETRYING: Wait for the host to be up (120 retries left).
FAILED - RETRYING: Wait for the host to be up (119 retries left).
FAILED - RETRYING: Wait for the host to be up (118 retries left).
FAILED - RETRYING: Wait for the host to be up (117 retries left).
FAILED - RETRYING: Wait for the host to be up (116 retries left).
FAILED - RETRYING: Wait for the host to be up (115 retries left).
... ...
TASK [ovirt.hosted_engine_setup : Notify the user about a failure] ************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
	to retry, use: --limit @/root/hosted_engine_deploy.retry

PLAY RECAP ********************************************************************************************************************************************************************************************************
localhost                  : ok=199  changed=60   unreachable=0    failed=2   
rhevh-hostedengine-vm-06.lab.eng.pek2.redhat.com : ok=29   changed=11   unreachable=0    failed=0   


Version-Release number of selected component (if applicable):
RHVH-4.3-20190201.0-RHVH-x86_64-dvd1.iso
ovirt-ansible-engine-setup-1.1.7-1.el7ev.noarch
ovirt-ansible-hosted-engine-setup-1.0.8-1.el7ev.noarch
ovirt-ansible-repositories-1.1.4-2.el7ev.noarch
rhvm-appliance-4.3-20190129.0.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Clean install RHVH-4.3-20190201.0-RHVH-x86_64-dvd1.iso
2. Create a playbook according to https://github.com/oVirt/ovirt-ansible-hosted-engine-setup#example-playbook
3. Deploy the hosted engine via the Ansible playbook (https://raw.githubusercontent.com/oVirt/ovirt-ansible-hosted-engine-setup/master/examples/hosted_engine_deploy_localhost.yml) with the ovirt-ansible packages.

Actual results:
Hosted engine deployment via the Ansible playbook with the ovirt-ansible packages fails.

Expected results:
Hosted engine deployment via the Ansible playbook with the ovirt-ansible packages succeeds.

Additional info:
Hosted engine deployment via the cockpit UI succeeds.

Comment 1 Simone Tiraboschi 2019-02-04 13:20:19 UTC
According to engine.log, the engine rejected the host name:

2019-02-02 10:45:47,701+08 WARN  [org.ovirt.engine.core.bll.hostdeploy.AddVdsCommand] (default task-1) [a40d4701-fc79-40bd-8cb1-a27f862524c8] Validation of action 'AddVds' failed for user admin@internal-authz. Reasons: VAR__ACTION__ADD,VAR__TYPE__HOST,$server {'skip_reason': 'Conditional result was False', 'skipped': True, 'changed': False},VALIDATION_VDS_NAME_INVALID,$groups [Ljava.lang.Class;@3ad6c71a,$message VALIDATION_VDS_NAME_INVALID,$payload [Ljava.lang.Class;@40b9f1b1,ACTION_TYPE_FAILED_ATTRIBUTE_PATH,$path vdsStatic.name,$validatedValue {'skip_reason': 'Conditional result was False', 'skipped': True, 'changed': False},VALIDATION_VDS_HOSTNAME_HOSTNAME_OR_IP,$groups [Ljava.lang.Class;@2e4c2283,$message VALIDATION_VDS_HOSTNAME_HOSTNAME_OR_IP,$payload [Ljava.lang.Class;@515256cd,ACTION_TYPE_FAILED_ATTRIBUTE_PATH,$path vdsStatic.hostName,$validatedValue {'skip_reason': 'Conditional result was False', 'skipped': True, 'changed': False}
2019-02-02 10:45:47,726+08 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-1) [] Operation Failed: [Host name must be formed of alphanumeric characters, numbers or "-_.", Attribute: vdsStatic.name, Host address must be a FQDN or a valid IP address, Attribute: vdsStatic.name]
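
The `$validatedValue` in the warning above is the dictionary Ansible records for a skipped task, not a host name, which suggests that an unset variable was passed through to the `AddVds` call. A minimal Python sketch of why such a value is rejected follows. The regular expression is an assumption derived only from the wording of the error message, not the engine's actual validator, and `looks_like_host_name` and `host-01.example.com` are hypothetical names used for illustration.

```python
import re

# Assumed pattern, derived from the error message wording only
# ('alphanumeric characters, numbers or "-_."'); the engine's real
# validation rule may differ.
HOST_NAME_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def looks_like_host_name(value):
    """Return True if value is a string made only of the allowed characters."""
    return isinstance(value, str) and HOST_NAME_PATTERN.fullmatch(value) is not None


# The value engine.log shows as $validatedValue: a skipped-task result.
skipped_result = {
    "skip_reason": "Conditional result was False",
    "skipped": True,
    "changed": False,
}

# A hypothetical, well-formed host name passes the check.
print(looks_like_host_name("host-01.example.com"))
# The stringified skipped-task result contains braces, quotes and spaces.
print(looks_like_host_name(str(skipped_result)))
```

A check of this kind, run on the playbook variables before the host is added, would surface the unset value earlier than the engine-side validation does.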

Comment 3 Sandro Bonazzola 2019-02-18 07:54:59 UTC
Moving to 4.3.2, since this has not been identified as a blocker for 4.3.1.

Comment 4 Sandro Bonazzola 2019-03-20 08:21:18 UTC
Can you please try to reproduce with the latest build?

Comment 5 Wei Wang 2019-03-20 09:39:26 UTC
Tested with RHVH-4.3-20190313.3-RHVH-x86_64-dvd1.iso.
The deployment failed as follows:
TASK [ovirt.hosted_engine_setup : Check for the local bootstrap VM] *******************************************************************************************************************************************
ok: [localhost]

TASK [ovirt.hosted_engine_setup : Make the engine aware that the external VM is stopped] **********************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: Error: Fault reason is "Operation Failed". Fault detail is "[Desktop does not exist]". HTTP response code is 400.
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Desktop does not exist]\". HTTP response code is 400."}
...ignoring

Steps:
Following the docs at https://github.com/oVirt/ovirt-ansible-hosted-engine-setup:
1. Install the two roles with ansible-galaxy.
2. Create hosted_engine_deploy_localhost.yml, passwords.yml, and he_deployment.json.
3. Encrypt passwords.yml with ansible-vault.
4. Run "ansible-playbook hosted_engine_deploy_localhost.yml --extra-vars='@he_deployment.json' --extra-vars='@passwords.yml' --ask-vault-pass"
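
For repeated test runs, the command in step 4 can be assembled programmatically. The following is a small Python sketch, not part of the role or its documentation; `build_deploy_command` is a hypothetical helper, and the file names are the ones listed in the steps above.

```python
import shlex


def build_deploy_command(playbook, vars_files, ask_vault_pass=True):
    """Build the ansible-playbook argument list used in step 4."""
    argv = ["ansible-playbook", playbook]
    for path in vars_files:
        # A leading "@" tells ansible-playbook to read the variables from a file.
        argv.append("--extra-vars=@" + path)
    if ask_vault_pass:
        # Needed because passwords.yml was encrypted with ansible-vault.
        argv.append("--ask-vault-pass")
    return argv


argv = build_deploy_command(
    "hosted_engine_deploy_localhost.yml",
    ["he_deployment.json", "passwords.yml"],
)
# Print a shell-safe rendering of the command (shlex.join needs Python 3.8+).
print(shlex.join(argv))
```

The list form can be passed directly to `subprocess.run(argv)`, which avoids shell quoting issues with the `@` file references.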

Result:
The deployment failed.

Comment 6 Wei Wang 2019-03-20 09:41:05 UTC
Created attachment 1545964
New log file

Comment 7 Simone Tiraboschi 2019-03-20 09:51:01 UTC
(In reply to Wei Wang from comment #5)
> Test With RHVH-4.3-20190313.3-RHVH-x86_64-dvd1.iso
> Failed as below:
> TASK [ovirt.hosted_engine_setup : Check for the local bootstrap VM]
> *****************************************************************************
> **************************************************************
> ok: [localhost]
> 
> TASK [ovirt.hosted_engine_setup : Make the engine aware that the external VM
> is stopped]
> *****************************************************************************
> *****************************************
> An exception occurred during task execution. To see the full traceback, use
> -vvv. The error was: Error: Fault reason is "Operation Failed". Fault detail
> is "[Desktop does not exist]". HTTP response code is 400.
> fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is
> \"Operation Failed\". Fault detail is \"[Desktop does not exist]\". HTTP
> response code is 400."}
> ...ignoring


This is a duplicate of a vdsm bug: https://bugzilla.redhat.com/show_bug.cgi?id=1690301#c3

Can you please retest once that one is fixed?

Comment 8 Wei Wang 2019-03-20 10:03:07 UTC
(In reply to Simone Tiraboschi from comment #7)
> This is a duplicate of a vdsm bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1690301#c3
> 
> Can you please retest once that one is fixed?

Yes, I will retest it once the vdsm bug is fixed.

Comment 9 Sandro Bonazzola 2019-03-26 13:54:01 UTC
To be tested with the 4.3.3 RC1 compose.

Comment 10 Wei Wang 2019-03-28 04:37:14 UTC
Test Version:
rhvh-4.3.0.5-0.20190327.0+1
ovirt-ansible-repositories-1.1.5-1.el7ev.noarch
ovirt-ansible-engine-setup-1.1.9-1.el7ev.noarch
ovirt-ansible-hosted-engine-setup-1.0.14-1.el7ev.noarch
rhvm-appliance-4.3-20190327.0.el7.x86_64

Steps:
Following the docs at https://github.com/oVirt/ovirt-ansible-hosted-engine-setup:
1. Install the two roles with ansible-galaxy.
2. Create hosted_engine_deploy_localhost.yml, passwords.yml, and he_deployment.json.
3. Encrypt passwords.yml with ansible-vault.
4. Run "ansible-playbook hosted_engine_deploy_localhost.yml --extra-vars='@he_deployment.json' --extra-vars='@passwords.yml' --ask-vault-pass"

Result:
The deployment failed with the following issue:
TASK [ovirt.hosted_engine_setup : Wait for the host to be up] *************************************************************************************************************************************************
FAILED - RETRYING: Wait for the host to be up (120 retries left).
FAILED - RETRYING: Wait for the host to be up (119 retries left).
....
FAILED - RETRYING: Wait for the host to be up (1 retries left).
fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "hp-dl388g9-04.lab.eng.pek2.redhat.com", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "lab.eng.pek2.redhat.com", "subject": "O=lab.eng.pek2.redhat.com,CN=hp-dl388g9-04.lab.eng.pek2.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/b886db90-5104-11e9-baa3-5254003404b0", "id": "b886db90-5104-11e9-baa3-5254003404b0"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/44ad9074-1f95-4446-a4e8-ba2dd226a872", "id": "44ad9074-1f95-4446-a4e8-ba2dd226a872", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "hp-dl388g9-04.lab.eng.pek2.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:/+437R+N0iIA6tU08WHTIqK4xBXOcqnWm++Nyc6pp+o", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "ovirt_node", "unmanaged_networks": [], "update_available": false, "vgpu_placement": "consolidated"}]}, "attempts": 120, "changed": false}

TASK [ovirt.hosted_engine_setup : Fetch logs from the engine VM] **********************************************************************************************************************************************
included: /usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/fetch_engine_logs.yml for localhost
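
The failed task output above shows the host in status "install_failed" after all 120 attempts were used. As a rough Python sketch of how that facts payload can be inspected, the function below classifies each host by its `status` field. `classify_hosts` is a hypothetical helper, and treating only "install_failed" as a terminal failure is an assumption based on the single state seen in this report; the role's own wait logic may behave differently.

```python
# States treated as terminal failures. Only "install_failed" appears in this
# report; the set is an assumption for illustration and may be incomplete.
TERMINAL_FAILURE_STATES = {"install_failed"}


def classify_hosts(facts):
    """Map each host name to 'up', 'failed', or 'pending' based on its status."""
    verdicts = {}
    for host in facts.get("ovirt_hosts", []):
        status = host.get("status")
        if status == "up":
            verdicts[host["name"]] = "up"
        elif status in TERMINAL_FAILURE_STATES:
            verdicts[host["name"]] = "failed"
        else:
            verdicts[host["name"]] = "pending"
    return verdicts


# A trimmed-down version of the ansible_facts shown in the task output.
facts = {
    "ovirt_hosts": [
        {
            "name": "hp-dl388g9-04.lab.eng.pek2.redhat.com",
            "status": "install_failed",
        }
    ]
}
print(classify_hosts(facts))
```

Distinguishing "failed" from "pending" makes it possible to stop waiting as soon as the installation is known to have failed, instead of exhausting the retries.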

ovirt-engine/engine.log
2019-03-28 10:59:06,078+08 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [6305c0d5] Host installation failed for host '44ad9074-1f95-4446-a4e8-ba2dd226a872', 'hp-dl388g9-04.lab.eng.pek2.redhat.com': Failed to execute Ansible host-deploy role. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20190328105638-hp-dl388g9-04.lab.eng.pek2.redhat.com-6305c0d5.log
2019-03-28 10:59:06,081+08 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [6305c0d5] START, SetVdsStatusVDSCommand(HostName = hp-dl388g9-04.lab.eng.pek2.redhat.com, SetVdsStatusVDSCommandParameters:{hostId='44ad9074-1f95-4446-a4e8-ba2dd226a872', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 6d0d66


/var/log/ovirt-hosted-engine-setup/engine-logs-2019-03-28T02:51:06Z/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20190328105638-hp-dl388g9-04.lab.eng.pek2.redhat.com-6305c0d5.log:
2019-03-28 10:59:06,007 p=5835 u=ovirt |  TASK [oVirt.metrics/roles/oVirt.initial-validations : Set fluentd_base_packages_available fact] ***
2019-03-28 10:59:06,030 p=5835 u=ovirt |  fatal: [hp-dl388g9-04.lab.eng.pek2.redhat.com]: FAILED! => {}

MSG:

The conditional check 'item.results == []' failed. The error was: error while evaluating conditional (item.results == []): 'dict object' has no attribute 'results'

The error appears to have been in '/usr/share/ansible/roles/oVirt.metrics/roles/oVirt.initial-validations/tasks/check_logging_collectors.yml': line 60, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- name: Set fluentd_base_packages_available fact
  ^ here
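
The message "'dict object' has no attribute 'results'" means the conditional was evaluated against an item that carries no `results` key. One way this can happen is when the item comes from a task that was skipped; whether that is what happened here is not established in this report. The Python sketch below mirrors the two behaviours with plain dictionaries. The item shapes and both helper functions are illustrative assumptions, not code from the oVirt.metrics role.

```python
# Two loop items as a later task might see them: one from a task that ran and
# one from a task that was skipped. The shapes are illustrative assumptions.
ran_item = {"changed": False, "results": []}
skipped_item = {
    "changed": False,
    "skipped": True,
    "skip_reason": "Conditional result was False",
}


def strict_check(item):
    """Mirror `item.results == []`: raises KeyError when the key is absent."""
    return item["results"] == []


def guarded_check(item):
    """Treat a missing "results" key as a non-match instead of an error."""
    return "results" in item and item["results"] == []


for label, item in (("ran", ran_item), ("skipped", skipped_item)):
    try:
        print(label, "strict:", strict_check(item))
    except KeyError as exc:
        print(label, "strict: missing key", exc)
    print(label, "guarded:", guarded_check(item))
```

The guarded form corresponds to checking that the attribute is defined before comparing it, which keeps a skipped item from aborting the whole play.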


Changing the status to "ASSIGNED".

Comment 11 Simone Tiraboschi 2019-03-28 09:25:44 UTC
Moving back to ON_QA, please retest once https://bugzilla.redhat.com/show_bug.cgi?id=1693560 is fixed.

Comment 12 Wei Wang 2019-03-29 01:49:33 UTC
Test Version:
RHVH-4.3-20190328.0-RHVH-x86_64-dvd1.iso
ovirt-ansible-repositories-1.1.5-1.el7ev.noarch
ovirt-ansible-engine-setup-1.1.9-1.el7ev.noarch
ovirt-ansible-hosted-engine-setup-1.0.14-1.el7ev.noarch
rhvm-appliance-4.3-20190328.1.el7.rpm

Test Steps:
Same as in comment 5.

Result:
The hosted engine deployed successfully.

The bug is fixed; changing the status to "VERIFIED".

Comment 13 Sandro Bonazzola 2019-04-16 13:58:33 UTC
The fix for this bug is included in the oVirt 4.3.3 release, published on April 16th, 2019.

Since the problem described in this bug report should be
resolved in the oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

