Bug 1565516
| Field | Value |
|---|---|
| Summary | Deploy HE failed after checking host result up 120 times via cockpit based ansible deployment. |
| Product | [oVirt] cockpit-ovirt |
| Reporter | Wei Wang <weiwang> |
| Component | Hosted Engine |
| Assignee | Phillip Bailey <phbailey> |
| Status | CLOSED CURRENTRELEASE |
| QA Contact | Wei Wang <weiwang> |
| Severity | high |
| Priority | unspecified |
| Version | 0.11.20 |
| CC | bugs, cshao, david, huzhao, jiaczhan, mike, qiyuan, rbarry, stirabos, yaniwang, ycui, ylavi, yzhao |
| Target Milestone | ovirt-4.2.3 |
| Flags | rule-engine: ovirt-4.2+, ylavi: exception+, cshao: testing_ack+ |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Last Closed | 2018-04-20 09:18:13 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | Integration |
| Cloudforms Team | --- |
Created attachment 1419727: Log files
I cannot reproduce this issue; please see https://bugzilla.redhat.com/show_bug.cgi?id=1562011#c8.

I cannot find the previous bug about this issue. I think it is fixed in ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch.

(In reply to Yihui Zhao from comment #2)
> I cannot reproduce this issue; please see
> https://bugzilla.redhat.com/show_bug.cgi?id=1562011#c8.
>
> I cannot find the previous bug about this issue. I think it is fixed in
> ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch.

Yes, maybe it is not 100% reproducible.

Tested with the new version RHVH-4.2-20180410.1-RHVH-x86_64-dvd1.iso: in 5 test runs, the deployment failed once.

[ INFO ] TASK [Wait for SSH to restart on the local VM]
[ ERROR ] fatal: [localhost -> localhost]: FAILED! => {"changed": false, "elapsed": 301, "msg": "Timeout when waiting for rhevh-hostedengine-vm-06.lab.eng.pek2.redhat.com:22"}
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}

From the output above, this failure differs from the original one (log attached), but the deployment still fails intermittently. That is not a good experience for the customer.

Created attachment 1423318: HE deploy fail logs
Is this reproducible on the CLI?

Also, if this is being tested in a VM, please try on physical hardware.

(In reply to Ryan Barry from comment #6)
> Is this reproducible on the CLI?
>
> Also, if this is being tested in a VM, please try on physical hardware.

I retested 8 times using the CLI and could not reproduce the issue. QE uses physical hardware for testing, with both cockpit and the CLI.

I tried reproducing it 4 times in a row from cockpit, and it worked as expected. We need a reproducer to be able to do something about this.

Setting conditional NAK on reproducer.

Closing as WORKSFORME; please reopen if we find a reproducer.

I don't know how to reproduce this, but I found a workaround.

It happened to me with oVirt Node 4.2.3:

ovirt-hosted-engine-setup-2.2.20-1.el7.centos.noarch

May 28 14:14:59 ovn-1.vm-net2 ovirt-vmconsole-host-sshd[1629]: /usr/share/ovirt-vmconsole/ovirt-vmconsole-host/ovirt-vmconsole-host-sshd/sshd_config line 23: Deprecated option RSAAuthentication
May 28 14:15:00 ovn-1.vm-net2 ovirt-vmconsole-host-sshd[1629]: Could not load host key: /etc/pki/ovirt-vmconsole/host-ssh_host_rsa
May 28 14:15:00 ovn-1.vm-net2 ovirt-vmconsole-host-sshd[1629]: sshd: no hostkeys available -- exiting.

The above service fails to start because for some reason the SSH host key isn't generated.

When I used `ssh-keygen` to generate the host key at that path, started and enabled ovirt-vmconsole-host-sshd, and re-deployed, it got past that error.

ssh-keygen -h -t rsa /etc/pki/ovirt-vmconsole/host-ssh_host_rsa

The problem seems to occur if you have dns before files in your nsswitch.conf. When the setup script modifies the hosts file and tries to SSH into the engine, it cannot, because DNS points it to the IP that is meant to be set up, not the one the bridge interface sets up initially.

(In reply to Mike Goodwin from comment #11)
> I don't know how to reproduce this but I found a workaround
>
> It happened to me with oVirt Node 4.2.3
>
> ovirt-hosted-engine-setup-2.2.20-1.el7.centos.noarch
>
>
> May 28 14:14:59 ovn-1.vm-net2 ovirt-vmconsole-host-sshd[1629]:
> /usr/share/ovirt-vmconsole/ovirt-vmconsole-host/ovirt-vmconsole-host-sshd/
> sshd_config line 23: Deprecated option RSAAuthentication
> May 28 14:15:00 ovn-1.vm-net2 ovirt-vmconsole-host-sshd[1629]: Could not
> load host key: /etc/pki/ovirt-vmconsole/host-ssh_host_rsa
> May 28 14:15:00 ovn-1.vm-net2 ovirt-vmconsole-host-sshd[1629]: sshd: no
> hostkeys available -- exiting.
>
>
> The above service fails to start because for some reason the SSH host key
> isn't generated.
>
> When I used `ssh-keygen` to generate the host key at that path, and
> started/enabled ovirt-vmconsole-host-sshd, and re-deployed, it got past that
> error.
>
> ssh-keygen -h -t rsa /etc/pki/ovirt-vmconsole/host-ssh_host_rsa

I think you will need -f if you specify the location:

ssh-keygen -h -t rsa -f /etc/pki/ovirt-vmconsole/host-ssh_host_rsa
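
The two observations in the comments above (a missing ovirt-vmconsole host key, and dns listed before files in nsswitch.conf) can be checked on a host before re-deploying. The following is only a minimal sketch, assuming a POSIX shell with awk and ssh-keygen available. The function names hosts_sources, dns_before_files, and ensure_host_key are hypothetical helpers, not part of ovirt-hosted-engine-setup, and the ownership and permissions that ovirt-vmconsole may require on the key are not handled here.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not from any oVirt package.

# Print the sources listed on the "hosts:" line of an nsswitch.conf-style file.
hosts_sources() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }' "$1"
}

# Succeed (exit 0) when "dns" appears ahead of "files" on the hosts line.
dns_before_files() {
    for src in $(hosts_sources "$1"); do
        case "$src" in
            dns)   return 0 ;;
            files) return 1 ;;
        esac
    done
    return 1
}

# Generate an RSA key at the given path only if no file exists there yet.
# -f names the output file, -N "" sets an empty passphrase, -q is quiet.
ensure_host_key() {
    if [ -f "$1" ]; then
        return 0
    fi
    ssh-keygen -q -t rsa -N "" -f "$1"
}
```

For example, `dns_before_files /etc/nsswitch.conf && echo "dns precedes files"` reports the ordering described in the thread, and `ensure_host_key /etc/pki/ovirt-vmconsole/host-ssh_host_rsa` followed by `systemctl enable --now ovirt-vmconsole-host-sshd` roughly mirrors the reported workaround. The sketch omits the -h flag because, per the ssh-keygen manual, -h applies when signing a certificate rather than when generating a key.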
Created attachment 1419726: deployment fail picture

Description of problem:
Deploy HE failed after checking host result up 120 times via cockpit based ansible deployment.

[ INFO ] TASK [Add host]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Wait for the host to be up]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": []}, "attempts": 120, "changed": false}
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}

Version-Release number of selected component (if applicable):
RHVH-4.2-20180408.0-RHVH-x86_64-dvd1.iso
cockpit-bridge-160-3.el7.x86_64
cockpit-160-3.el7.x86_64
cockpit-ws-160-3.el7.x86_64
cockpit-system-160-3.el7.noarch
cockpit-ovirt-dashboard-0.11.20-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.9-1.el7ev.noarch
rhvm-appliance-4.2-20180404.0.el7.4.2.rpm

How reproducible:
100%

Steps to Reproduce:
1. Clean install RHVH-4.2-20180408.0-RHVH-x86_64-dvd1.iso with anaconda
2. Deploy hosted-engine via cockpit based ansible deployment.

Actual results:
Deploy HE failed after checking host result up 120 times

Expected results:
Deploy HE successful without any error.

Additional info:
This issue cannot be reproduced with CLI ansible deployment.