Bug 1706275
| Summary: | HE deployment fails trying to bootstrap it from a slow USB device | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-ansible-collection | Reporter: | Chris Kuperstein <ckuperst> |
| Component: | hosted-engine-setup | Assignee: | Ido Rosenzwig <irosenzw> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Wei Wang <weiwang> |
| Severity: | medium | Docs Contact: | Tahlia Richardson <trichard> |
| Priority: | unspecified | | |
| Version: | unspecified | CC: | bugs, dholler, stirabos |
| Target Milestone: | ovirt-4.3.4 | Keywords: | ZStream |
| Target Release: | 1.0.18 | Flags: | sbonazzo: ovirt-4.3? sbonazzo: planning_ack? sbonazzo: devel_ack+ sbonazzo: testing_ack? |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-ansible-hosted-engine-setup-1.0.18 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-11 06:24:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1709969, 1710352 | | |
| Attachments: | | | |

Created attachment 1562865 [details]
output from cockpit
Created attachment 1562866 [details]
virsh output (post-attempt)
this is the state of virsh during and after the attempt.
Created attachment 1562867 [details]
ip a (output)
Chris, can you please attach the file ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-*.log again? The currently attached file seems to be broken.
Can you please add /var/log/messages and yum.log, too?

Probably not related:
Please note that teaming is not supported, use bonding instead.
If the bond is not required during the install, it is recommended to create the bond after the installation.

Created attachment 1564589 [details]
ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20194318536-6vruip.log
Created attachment 1564590 [details]
yum.log
Created attachment 1564591 [details]
/var/log/messages
Actions:
- # ovirt-hosted-engine-cleanup
- removed team0.2, team0, and p5p1+p5p2 interface configurations
- configured bond0 and vlan interface bond0.2 with "mode=4 miimon=100"
- restarted network.service
- attempted hosted engine deployment again

no success. I will attempt no bond as well, with just a standard access port and no VLAN interface on the host machine's native 1Gbit eth interfaces (em1)

standard host network interface without bonds or VLANs is a no-go.

Chris, deploying over a teamed device is not supported and it will fail for sure.
Unfortunately we cannot easily identify team interfaces due to an issue in ansible facts module: it's tracked here:
https://github.com/ansible/ansible/issues/43129

Bonds, vlans and vlans over bonds (bond0.2) are instead supported.

Chris, can you please attach the logs for the attempt mentioned in comments 8 and 9?

Simone,

unfortunately I went forward with a full wipe of the host to proceed with a different storage configuration, so can't provide the logs from attempts in comments 8 and 9. I suspected the limited IO on the internal USB device where the root fs was mounted was inhibiting the local deployment of the engine appliance.

I reconfigured the host like so:

(Dell Perc H710p mini mono controller):
4x SSD in hardware RAID 10 (/dev/sda)
2x HDD in hardware RAID 1 (/dev/sdb)

[root@vhost0 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 930.5G 0 disk
├─sda1 8:1 0 2.8G 0 part /boot/efi
├─sda2 8:2 0 1.9G 0 part /boot
└─sda3 8:3 0 925.9G 0 part
  ├─rhel-root 253:0 0 93.1G 0 lvm /
  ├─rhel-swap 253:1 0 7.5G 0 lvm [SWAP]
  └─rhel-usr_share_ssd 253:2 0 825.3G 0 lvm /usr/share/ssd
sdb 8:16 0 4.6T 0 disk
└─sdb1 8:17 0 4.6T 0 part
  └─data-hdd 253:3 0 4.6T 0 lvm /usr/share/hdd

with the root fs residing on a 93.1G partition existing on the SSDs, polling for the VM IP only took about a minute or so. I suspect this was a race condition against the timeout for the appliance deployment on local storage.
In the future, perhaps there is a better way to deploy the local engine appliance straight to a selected storage pool rather than deploying the local VM then choosing a storage pool to migrate to after the fact?

(In reply to Chris Kuperstein from comment #11)
> Simone,
>
> unfortunately I went forward with a full wipe of the host to proceed with a
> different storage configuration, so can't provide the logs from attempts in
> comments 8 and 9.

So did it finally work? Can we close this?

> I suspected the limited IO on the internal USB device
> where the root fs was mounted was inhibiting the local deployment of the
> engine appliance.

We are polling 50 times with a 10-second delay.
500 seconds definitely seems a reasonable amount of time to bootstrap a VM and have it get an address from an internal DHCP server.

> with the root fs residing on a 93.1G partition existing on the SSDs, polling
> for the VM IP only took about a minute or so. I suspect this was a race
> condition against the timeout for the appliance deployment on local storage.

For the deployment we need about 3 GB under /var/tmp.

> In the future, perhaps there is a better way to deploy the local engine
> appliance straight to a selected storage pool rather than deploying the
> local VM then choosing a storage pool to migrate to after the fact?

The whole point of this flow is to bootstrap a VM with a locally running engine as quickly as possible in order to use that engine (via ansible modules) to do everything else (configuring the storage and the network, creating disks, a VM...) using standard and well tested engine code instead of duplicating it.

This is confirmed working, and I suspect it would be okay when using a SATA DOM for internal host storage, but even if you have a sufficiently sized internal USB device (which is extremely common on commodity hardware ~5+ years old), deployment of the hosted engine simply will not work.
It looks like this may be limited by read/write speed on the device which /var/tmp is mounted on. I understand the reuse of modules present in the engine appliance itself, but this is an (albeit minor) disadvantage in comparison to ESXi/VMWare VCSA, which uses an out of band delivery method (OVFTool) for management.

Let's raise the timeout; not sure this is a really common use case.

Discuss with DEV, then test this issue with RHVH-4.3-20190516.1-RHVH-x86_64-dvd1.iso

Version:
RHVH-4.3-20190516.1-RHVH-x86_64-dvd1.iso
cockpit-ovirt-dashboard-0.12.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.8-1.el7ev.noarch
ovirt-hosted-engine-ha-2.3.1-1.el7ev.noarch

Steps:
1. Clean install RHVH-4.3-20190516.1-RHVH-x86_64-dvd1.iso
2. Setting network to bond+vlan
3. Deploy Hosted engine (CLI and Cockpit UI)

Result:
Deployment successful without error under bond+vlan network.

Bug is fixed, change status to "VERIFIED"

This bugzilla is included in oVirt 4.3.4 release, published on June 11th 2019.

Since the problem described in this bug report should be resolved in oVirt 4.3.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
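
For reference, the polling budget quoted in the comments above (50 attempts with a 10-second delay, roughly 500 seconds) amounts to a bounded retry loop. The following is only an illustrative Python sketch of that idea, not the actual Ansible task from ovirt-ansible-hosted-engine-setup; `poll_until`, `probe`, `nominal_budget` and the injectable `sleep` are hypothetical names introduced here.

```python
import time
from typing import Callable, Optional


def poll_until(probe: Callable[[], Optional[str]],
               retries: int = 50,
               delay: float = 10.0,
               sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call probe() up to `retries` times, pausing `delay` seconds between tries.

    Returns the first non-empty result (for example an IP address string),
    or None once every attempt has been used up.
    """
    for attempt in range(1, retries + 1):
        result = probe()
        if result:
            return result
        # No point in sleeping after the final failed attempt.
        if attempt < retries:
            sleep(delay)
    return None


def nominal_budget(retries: int = 50, delay: float = 10.0) -> float:
    """Nominal waiting budget in seconds (retries x delay), as quoted in the comments."""
    return retries * delay
```

On slow storage the local VM may simply need longer than this budget to boot and obtain a lease, which is why raising `retries` or `delay` is the natural knob in a loop of this shape.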
Created attachment 1562864 [details]
tarball of /var/logs/ovirt-hosted-engine-setup

Description of problem:
During deployment of the hosted engine, gathering the local VM IP does not succeed and installation fails.

Version-Release number of selected component (if applicable):
4.2

How reproducible:

Steps to Reproduce:
1. Install RHEL 7.6 Minimal
2. Configure 2 10 Gigabit interfaces in a single LACP Team (team0)
3. Configure 1 VLAN interface on single team (team0)
4. run hosted-engine --deploy
OR:
5. firewall-cmd --permanent --add-port=9090/tcp
6. install cockpit and run hosted engine installer

Actual results:
during deployment, the Ansible playbook fails to retrieve the local VM IP from the HostedEngineLocal instance, and the installer fails. The local VM instance is not properly terminated or cleaned up during the installer cleanup, and the network interface on the local VM does not bind an IP.

Expected results:
Installer to complete deploying local engine VM and proceed to the storage domain configuration phase for VM migration.

Additional info:
Dell R720
Asset Tag# J71P5X1
128GB RAM
Intel X520-DA2 10Gbit NIC
Internal Storage:
32GB Sandisk Cruzer Fit (OS)
4x Samsung EVO 860 512GB SSD (LVM RAID10 + XFS)
2x Seagate Barracuda 5TB HDD (LVM RAID1 + XFS)
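
Since the discussion points at /var/tmp (about 3 GB needed there, and a suspicion that the slow USB device was the limiting factor), a pre-flight check along the following lines could be run before deploying. This is a hypothetical Python helper, not part of hosted-engine-setup; the function names, the 3 GiB threshold and the 64 MB sample size are assumptions made for illustration, and the write rate it reports is only a rough indication.

```python
import os
import shutil
import tempfile
import time

# "About 3 GB" comes from the comments; the exact byte count is an assumption.
REQUIRED_BYTES = 3 * 1024**3


def has_enough_space(path: str = "/var/tmp", required: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `required` bytes free."""
    return shutil.disk_usage(path).free >= required


def rough_write_rate(path: str = "/var/tmp", size_mb: int = 64) -> float:
    """Write `size_mb` MB to a temporary file under `path` and return MB/s.

    The file is flushed and fsync'ed so the page cache does not hide a slow device.
    """
    chunk = b"\0" * (1024 * 1024)
    with tempfile.NamedTemporaryFile(dir=path) as tmp:
        start = time.monotonic()
        for _ in range(size_mb):
            tmp.write(chunk)
        tmp.flush()
        os.fsync(tmp.fileno())
        elapsed = time.monotonic() - start
    return size_mb / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    print("enough space under /var/tmp:", has_enough_space())
    print("approx. write rate (MB/s):", round(rough_write_rate(), 1))
```

A very low write rate on the device backing /var/tmp would be consistent with the behaviour reported here, although this sketch cannot confirm the cause on its own.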