Description of problem:
After rebooting the HE VM, the newly added vNIC comes up unplugged.

Version-Release number of selected component (if applicable):
- 3.6.5
- Hosted Engine VM built based on the RHEV-M appliance.

How reproducible:
100%

Steps to Reproduce:
1. Get RHEVM 3.6.5 in a Hosted Engine environment.
2. Edit the HE VM (aka manager) through the Admin Portal and add an additional vNIC to it.
3. The vNIC is added successfully; the guest OS sees it and all is good.
4. Reboot the guest. (I did a cold reboot: turned the machine off and then started it.)

Actual results:
The second vNIC appears unplugged.

Expected results:
The second vNIC should be plugged.

Additional info:
I can provide logs, but the issue is very easily reproducible, so I'll skip that step. Let me know if you need logs.
Marina, you probably hit Bug 1311151, which is verified in 3.6.6. Let me know if anything you see is different; otherwise I'll close it as a dup.
Hello Roy. Marina is out today, so I took a look at this case. It does seem like the same issue; however, please do not mark it a duplicate yet until I confirm with Marina. Also, is there a workaround available for this, or does this customer need to wait for 3.6.6? Regards, Frank
Hi Roy,

It is a different bug. Here, the NIC is not present in the guest at all. In the guest:

# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:16:3e:59:a6:03 brd ff:ff:ff:ff:ff:ff
    inet 10.10.176.36/21 brd 10.10.183.255 scope global eth0
    inet6 2620:52:0:ab0:216:3eff:fe59:a603/64 scope global dynamic
       valid_lft 2591965sec preferred_lft 604765sec
    inet6 fe80::216:3eff:fe59:a603/64 scope link
       valid_lft forever preferred_lft forever

No mention of eth1 at all. Same here:

# ll /etc/sysconfig/network-scripts/ifcfg-*
-rw-r--r--. 1 root root 126 Mar  1 07:27 /etc/sysconfig/network-scripts/ifcfg-eth0
-rw-r--r--. 1 root root 254 Apr 12 05:14 /etc/sysconfig/network-scripts/ifcfg-lo

# ethtool eth1
Settings for eth1:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device
No data available

# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
00:04.0 SCSI storage controller: Red Hat, Inc Virtio SCSI
00:05.0 Communication controller: Red Hat, Inc Virtio console
00:06.0 SCSI storage controller: Red Hat, Inc Virtio block device

And after modifying the NIC in the Admin Portal to plug it in, it appears on the guest:

# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
00:04.0 SCSI storage controller: Red Hat, Inc Virtio SCSI
00:05.0 Communication controller: Red Hat, Inc Virtio console
00:06.0 SCSI storage controller: Red Hat, Inc Virtio block device
00:07.0 Ethernet controller: Red Hat, Inc Virtio network device  <=== additional network device showed up

It is present in the ip output:

# ip a s eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:1a:4a:16:01:54 brd ff:ff:ff:ff:ff:ff
..
I see that vm.conf does not change after I modify the VM in the Admin Portal to plug the NIC in. Not sure whether that is by design of hosted-engine or not; I would expect vm.conf to contain all the information about this VM. This is especially important in scenarios where this additional network serves as the display network. I am going to modify the vm.conf on the shared storage and see if it has any effect. Hopefully that will be the right way to work around it, rather than modifying the database. (If the HE VM first starts and then reads the DB, I would assume vm.conf contains all that information, rather than the DB.) Thinking about this more, maybe it is something related to the OVF updater? That it didn't have time to update the OVF store with the new config? Probably not: the VM has been up long enough (an hour+) for it to get updated.
I succeeded in modifying vm.conf using this solution: https://access.redhat.com/solutions/2209751

At step 4, when vm.conf should be modified, here is the line to add:

devices={nicModel:pv,macAddr:<MAC_from_Admin_Portal>,linkActive:true,network:<logical_net_name>,deviceId:<device_id from vm_devices>,address:None,device:bridge,type:interface}

The only question is why the device is always "bridge". I guess it is OK, a leftover from the times when every logical network had to be a bridge, and we just don't have a non-bridge network definition. I don't know.
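For clarity, the devices= line above can be assembled mechanically. The helper below is a hypothetical illustration (not part of oVirt or the linked solution), assuming you have the MAC from the Admin Portal, the logical network name, and the device ID from the vm_devices table:

```python
# Hypothetical helper (not oVirt code) that builds the vm.conf "devices="
# line for a NIC, as described in the workaround above. The field order
# matches the line shown in the comment.
def nic_device_line(mac, network, device_id, link_active=True):
    fields = [
        "nicModel:pv",
        "macAddr:%s" % mac,
        "linkActive:%s" % ("true" if link_active else "false"),
        "network:%s" % network,
        "deviceId:%s" % device_id,
        "address:None",   # no PCI address assigned yet; libvirt picks one
        "device:bridge",
        "type:interface",
    ]
    return "devices={%s}" % ",".join(fields)

print(nic_device_line("00:1a:4a:16:01:54", "ovirtmgmt",
                      "a65ffa7a-d182-4d73-967f-93eae6a755f8"))
```

The resulting string is what gets appended to the copied vm.conf before starting the VM with it.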
I don't understand this: the HE VM boots correctly with the NIC up; however, /var/run/ovirt-hosted-engine-ha/vm.conf still does not contain the updated information. What does that mean? That from now on it will always be required to start the VM with that modified vm.conf? What is the way to update vm.conf on the shared storage? I could not find it anywhere on the hosted storage domain.
Editing a custom vm.conf is just a recovery action for when something went wrong with a previous edit attempt and the system is not able to start the engine VM. Local modifications to it are not persisted in any way: vm.conf gets periodically regenerated from the OVF description written by the engine to the OVF_STORE. So the only way to have modifications to the engine VM correctly persisted is to modify the VM on the engine side.
Created attachment 1162586 [details] HE VM from OVF
It seems like the OVF file is not being updated with the other NICs; see the attachment from comment 11. I added 3 additional NICs, with and without AllowEditingHostedEngine, and waited more than OvfUpdateIntervalInMinutes to make sure the OVF file was updated. However, in the OVF store, there is only 1 interface defined. Logs are available, but this is very easily reproducible, so let me know. Interestingly enough, there is now no way to remove all those NICs from the UI. :-) I should probably open a bug on that too.
Marina, this is a clone of Bug 1327267.

Please do open a bug if you can't remove NICs (and state there that we should make sure we don't leave the engine with no NICs at all).
(In reply to Roy Golan from comment #13)
> Marina this is a clone of Bug 1327267.
>
> Please do open a bug if you can't remove nics (and state there that we
> should make sure we don't leave the engine with no nics at all)

Roy, maybe. I agree to have this bug as TestOnly. As for the NIC removal: once the NIC is unplugged, I can remove it, so we are good here.
I've received this error on the host:

# hosted-engine --vm-status
Unable to read vm.conf, please check ovirt-ha-agent logs

MainThread::ERROR::2016-06-02 16:02:14,018::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Connection to storage server failed' - trying to restart agent
MainThread::WARNING::2016-06-02 16:02:19,023::agent::208::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '3'
MainThread::INFO::2016-06-02 16:02:19,045::hosted_engine::244::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: alma04.qa.lab.tlv.redhat.com
MainThread::INFO::2016-06-02 16:02:19,046::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2016-06-02 16:02:19,096::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2016-06-02 16:02:19,096::storage_server::218::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2016-06-02 16:02:19,334::storage_server::222::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::ERROR::2016-06-02 16:02:19,373::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Connection to storage server failed' - trying to restart agent

# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2016-06-02 16:03:45 IDT; 39s ago
 Main PID: 19178 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─19178 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Jun 02 16:04:18 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[19178]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
Jun 02 16:04:18 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[19178]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent
Jun 02 16:04:23 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[19178]: WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '6'
Jun 02 16:04:23 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[19178]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma04.qa.lab.tlv.redhat.com
Jun 02 16:04:23 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[19178]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
Jun 02 16:04:23 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[19178]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
Jun 02 16:04:23 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[19178]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Jun 02 16:04:23 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[19178]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Jun 02 16:04:24 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[19178]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
Jun 02 16:04:24 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[19178]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent

# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2016-06-02 15:29:04 IDT; 35min ago
 Main PID: 855 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─855 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jun 02 15:29:09 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[855]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor ping
Jun 02 15:29:09 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[855]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor mem-load
Jun 02 15:29:09 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[855]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor engine-health
Jun 02 15:29:09 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[855]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor engine-health
Jun 02 15:29:09 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[855]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor mem-free
Jun 02 15:29:09 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[855]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor cpu-load
Jun 02 15:29:09 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[855]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor cpu-load-no-engine
Jun 02 15:29:09 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[855]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Finished loading submonitors
Jun 02 15:29:09 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[855]: INFO:ovirt_hosted_engine_ha.broker.listener.Listener:Initializing SocketServer
Jun 02 15:29:09 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[855]: INFO:ovirt_hosted_engine_ha.broker.listener.Listener:SocketServer ready

Thread-2552::DEBUG::2016-06-02 16:05:23,671::mount::229::Storage.Misc.excCmd::(_runcmd) /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /usr/bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3,None 10.35.64.11:/vol/RHEV/Virt/nsednev_3_6_HE_2 /rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__2 (cwd None)
Thread-2552::ERROR::2016-06-02 16:05:23,694::hsm::2473::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2470, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 455, in connect
    return self._mountCon.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 237, in connect
    six.reraise(t, v, tb)
  File "/usr/share/vdsm/storage/storageServer.py", line 229, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/share/vdsm/storage/mount.py", line 225, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 241, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ';mount.nfs: an incorrect mount option was specified\n')
Thread-2555::ERROR::2016-06-02 16:05:28,995::sdc::139::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 88615f12-cb6e-4e8f-acbd-fe149c289726
Thread-2555::ERROR::2016-06-02 16:05:28,995::sdc::156::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 88615f12-cb6e-4e8f-acbd-fe149c289726
Thread-2555::DEBUG::2016-06-02 16:05:28,996::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-2555::DEBUG::2016-06-02 16:05:28,996::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 88615f12-cb6e-4e8f-acbd-fe149c289726 (cwd None)

Note the mount options string in the failing mount command ends with ",None" ("nfsvers=3,None"), which is exactly what mount.nfs rejects as "an incorrect mount option".
Sosreport from the host is attached.

How I've got to this state:
1) Got el6.8 HE-3.6.7 deployed on el7.2.
2) Added a data storage domain to get the HE SD auto-imported.
3) Edited the HE VM via the Web UI and added an additional vNIC to it, which was also seen from the CLI on the HE VM, all this without removing the already existing management vNIC.
4) Rebooted the host.
5) After reboot, I've logged in to the host and ran the "hosted-engine --vm-status" command.

Components on host:
ovirt-vmconsole-1.0.2-2.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
mom-0.5.3-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
vdsm-4.17.29-0.el7ev.noarch
rhevm-sdk-python-3.6.5.1-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-hosted-engine-ha-1.3.5.6-1.el7ev.noarch
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
rhevm-appliance-20160515.0-1.el7ev.noarch
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux version 3.10.0-327.22.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon May 16 13:31:48 EDT 2016
Linux 3.10.0-327.22.1.el7.x86_64 #1 SMP Mon May 16 13:31:48 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
How is this failure related to the bug?
Providing an external link to the logs here, as the file size is much bigger than supported by Bugzilla:
https://drive.google.com/a/redhat.com/file/d/0B85BEaDBcF88TEZqNHFDMDBVMTg/view?usp=sharing

(In reply to Yaniv Dary from comment #17)
> How is this failure related to the bug?

It's related, as I followed exactly the same scenario as described by Marina and got to a situation in which the engine can't even start. Until this issue is resolved, this very bug can't be ON_QA.
See comment 13 and the depends field. This BZ should be verified after Bug 1327267.
Followed these steps:
1. Get RHEVM 3.6.5 in a Hosted Engine environment.
2. Edit the HE VM (aka manager) through the Admin Portal and add an additional vNIC to it.
3. The vNIC is added successfully; the guest OS sees it and all is good.
4. Reboot the host.

As a result, when the host booted up, the engine recognized the newly added NIC before it was restarted:

eth0      Link encap:Ethernet  HWaddr 00:16:3E:7B:BB:BB
          inet addr:10.35.97.250  Bcast:10.35.97.255  Mask:255.255.255.0
          inet6 addr: fe80::216:3eff:fe7b:bbbb/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:591028 errors:0 dropped:0 overruns:0 frame:0
          TX packets:79145 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4227960859 (3.9 GiB)  TX bytes:9473511 (9.0 MiB)

eth1      Link encap:Ethernet  HWaddr 00:1A:4A:16:01:51
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:249837 errors:0 dropped:0 overruns:0 frame:0
          TX packets:249837 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:212690803 (202.8 MiB)  TX bytes:212690803 (202.8 MiB)

After the host restarted, I've got the agent in a failed state:

# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-06-08 15:20:52 IDT; 33s ago
 Main PID: 12152 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─12152 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Jun 08 15:21:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[12152]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
Jun 08 15:21:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[12152]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent
Jun 08 15:21:25 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[12152]: WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '5'
Jun 08 15:21:25 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[12152]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma03.qa.lab.tlv.redhat.com
Jun 08 15:21:25 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[12152]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
Jun 08 15:21:25 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[12152]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
Jun 08 15:21:25 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[12152]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Jun 08 15:21:25 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[12152]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Jun 08 15:21:25 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[12152]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
Jun 08 15:21:25 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[12152]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent

The engine failed to start at all.
The components on the host:
qemu-kvm-rhev-2.3.0-31.el7_2.15.x86_64
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
vdsm-4.17.30-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-hosted-engine-ha-1.3.5.7-1.el7ev.noarch
ovirt-vmconsole-1.0.2-2.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.1-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
mom-0.5.3-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux version 3.10.0-327.22.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon May 16 13:31:48 EDT 2016
Linux 3.10.0-327.22.1.el7.x86_64 #1 SMP Mon May 16 13:31:48 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
1) I've deployed a clean HE VM on a single host over NFS.
2) Added a data storage domain, so that the HE SD could be auto-imported.
3) Waited until the HE SD was successfully auto-imported.
4) Added an empty vNIC to the engine VM via the Web UI.
5) Set hosted-engine into global maintenance.
6) Powered off the HE VM via the hosted-engine CLI option.
7) Ran hosted-engine --vm-start.
8) Set maintenance to none.
9) The engine failed to start, and for more than 30 minutes I see "INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Current state EngineUnexpectedlyDown (score: 0)".

Host:
qemu-kvm-rhev-2.3.0-31.el7_2.15.x86_64
mom-0.5.4-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.7-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.2-1.el7ev.noarch
ovirt-vmconsole-1.0.2-2.el7ev.noarch
vdsm-4.17.31-0.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux alma03.qa.lab.tlv.redhat.com 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
rhevm-dwh-setup-3.6.6-1.el6ev.noarch
rhevm-webadmin-portal-3.6.7.3-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-setup-3.6.7.3-0.1.el6.noarch
rhevm-tools-backup-3.6.7.3-0.1.el6.noarch
rhevm-doc-3.6.7-1.el6eng.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
rhevm-setup-base-3.6.7.3-0.1.el6.noarch
rhevm-backend-3.6.7.3-0.1.el6.noarch
rhevm-dbscripts-3.6.7.3-0.1.el6.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-sdk-python-3.6.7.0-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.7.3-0.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.7.3-0.1.el6.noarch
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-restapi-3.6.7.3-0.1.el6.noarch
rhevm-3.6.7.3-0.1.el6.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.7.3-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.7.3-0.1.el6.noarch
rhevm-websocket-proxy-3.6.7.3-0.1.el6.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhevm-tools-3.6.7.3-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.7.3-0.1.el6.noarch
rhevm-dwh-3.6.6-1.el6ev.noarch
rhevm-userportal-3.6.7.3-0.1.el6.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-lib-3.6.7.3-0.1.el6.noarch
rhevm-cli-3.6.2.1-1.el6ev.noarch
rhevm-setup-plugin-websocket-proxy-3.6.7.3-0.1.el6.noarch
Linux version 2.6.32-642.1.1.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Fri May 6 14:54:05 EDT 2016
Linux nsednev-he-2.qa.lab.tlv.redhat.com 2.6.32-642.1.1.el6.x86_64 #1 SMP Fri May 6 14:54:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)
Created attachment 1167540 [details] sosreport from host alma03
Thread-18503::INFO::2016-06-13 18:03:34,675::vm::1932::virt.vm::(_run) vmId=`3721aa2b-67d8-47ad-bfac-460a2a089710`::<?xml version="1.0" encoding="utf-8"?><domain type="kvm" xmlns:ovirt="http://ovirt.org/vm/tune/1.0"> <name>HostedEngine</name> <uuid>3721aa2b-67d8-47ad-bfac-460a2a089710</uuid> <memory>4194304</memory> <currentMemory>4194304</currentMemory> <vcpu current="4">4</vcpu> <devices> <channel type="unix"> <target name="com.redhat.rhevm.vdsm" type="virtio"/> <source mode="bind" path="/var/lib/libvirt/qemu/channels/3721aa2b-67d8-47ad-bfac-460a2a089710.com.redhat.rhevm.vdsm"/> </channel> <channel type="unix"> <target name="org.qemu.guest_agent.0" type="virtio"/> <source mode="bind" path="/var/lib/libvirt/qemu/channels/3721aa2b-67d8-47ad-bfac-460a2a089710.org.qemu.guest_agent.0"/> </channel> <input bus="ps2" type="mouse"/> <memballoon model="none"/> <controller index="0" ports="16" type="virtio-serial"> <address bus="0x00" domain="0x0000" function="0x0" slot="0x05" type="pci"/> </controller> <controller type="ide"> <address bus="0x00" domain="0x0000" function="0x1" slot="0x01" type="pci"/> </controller> <controller model="virtio-scsi" type="scsi"> <address bus="0x00" domain="0x0000" function="0x0" slot="0x04" type="pci"/> </controller> <controller type="usb"> <address bus="0x00" domain="0x0000" function="0x2" slot="0x01" type="pci"/> </controller> <video> <address bus="0x00" domain="0x0000" function="0x0" slot="0x02" type="pci"/> <model heads="1" type="cirrus" vram="32768"/> </video> <graphics autoport="yes" defaultMode="secure" listen="0" passwd="*****" passwdValidTo="1970-01-01T00:00:01" port="-1" type="vnc"/> <interface type="bridge"> <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/> <mac address="00:16:3E:7B:BB:BB"/> <model type="virtio"/> <source bridge="ovirtmgmt"/> <link state="up"/> </interface> <interface address="None" type="bridge"> <mac address="00:1a:4a:16:01:51"/> <model type="virtio"/> <source bridge="None"/> <link 
state="up"/> </interface> <disk device="disk" snapshot="no" type="file"> <address bus="0x00" domain="0x0000" function="0x0" slot="0x06" type="pci"/> <source file="/var/run/vdsm/storage/a0903254-f31e-4798-be1b-e5f6150bcdec/501bc9ac-0463-4c8b-a23b-efd6cd1c523b/7f04725e-1926-48b6-bb0f-d14f62555b6b"/> <target bus="virtio" dev="vda"/> <serial>501bc9ac-0463-4c8b-a23b-efd6cd1c523b</serial> <boot order="1"/> <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/> </disk> <disk device="cdrom" snapshot="no" type="file"> <address bus="1" controller="0" target="0" type="drive" unit="0"/> <source file="" startupPolicy="optional"/> <target bus="ide" dev="hdc"/> <readonly/> </disk> <lease> <key>7f04725e-1926-48b6-bb0f-d14f62555b6b</key> <lockspace>a0903254-f31e-4798-be1b-e5f6150bcdec</lockspace> <target offset="0" path="/rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__2/a0903254-f31e-4798-be1b-e5f6150bcdec/images/501bc9ac-0463-4c8b-a23b-efd6cd1c523b/7f04725e-1926-48b6-bb0f-d14f62555b6b.lease"/> </lease> <channel type="unix"><target name="org.ovirt.hosted-engine-setup.0" type="virtio"/><source mode="bind" path="/var/lib/libvirt/qemu/channels/3721aa2b-67d8-47ad-bfac-460a2a089710.org.ovirt.hosted-engine-setup.0"/></channel></devices> <metadata> <ovirt:qos/> </metadata> <os> <type arch="x86_64" machine="rhel6.5.0">hvm</type> <smbios mode="sysinfo"/> </os> <sysinfo type="smbios"> <system> <entry name="manufacturer">Red Hat</entry> <entry name="product">RHEV Hypervisor</entry> <entry name="version">7.2-13.0.el7ev</entry> <entry name="serial">4C4C4544-0059-4410-8053-B7C04F573032</entry> <entry name="uuid">3721aa2b-67d8-47ad-bfac-460a2a089710</entry> </system> </sysinfo> <clock adjustment="0" offset="variable"> <timer name="rtc" tickpolicy="catchup"/> <timer name="pit" tickpolicy="delay"/> <timer name="hpet" present="no"/> </clock> <features> <acpi/> </features> <cpu match="exact"> <model>qemu64</model> <feature name="svm" policy="disable"/> 
</cpu> <on_poweroff>destroy</on_poweroff><on_reboot>destroy</on_reboot><on_crash>destroy</on_crash></domain>

mailbox.SPMMonitor::DEBUG::2016-06-13 18:03:34,679::storage_mailbox::735::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0450888 s, 22.7 MB/s\n'; <rc> = 0
Thread-18503::ERROR::2016-06-13 18:03:34,821::vm::759::virt.vm::(_startUnderlyingVm) vmId=`3721aa2b-67d8-47ad-bfac-460a2a089710`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 703, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1941, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3611, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Cannot get interface MTU on 'None': No such device
Thread-18503::INFO::2016-06-13 18:03:34,823::vm::1330::virt.vm::(setDownStatus) vmId=`3721aa2b-67d8-47ad-bfac-460a2a089710`::Changed state to Down: Cannot get interface MTU on 'None': No such device (code=1)
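The libvirt error happens because the second <interface> in the domain XML above carries <source bridge="None"/>: libvirt tries to look up the MTU of a host bridge literally named "None", which does not exist. A minimal illustrative check (not oVirt code; the simplified XML below is a stripped-down stand-in for the real domxml) can surface such interfaces before the VM is started:

```python
import xml.etree.ElementTree as ET

# Illustrative-only validation pass: find interfaces whose source bridge
# is missing or is the literal string "None", and report their MACs.
def broken_bridges(domxml):
    bad = []
    for iface in ET.fromstring(domxml).iter("interface"):
        src = iface.find("source")
        if src is not None and src.get("bridge") in (None, "None", ""):
            bad.append(iface.find("mac").get("address"))
    return bad

# Stripped-down stand-in for the domain XML quoted in this comment.
domxml = """<domain type="kvm"><devices>
  <interface type="bridge"><mac address="00:16:3E:7B:BB:BB"/>
    <source bridge="ovirtmgmt"/></interface>
  <interface type="bridge"><mac address="00:1a:4a:16:01:51"/>
    <source bridge="None"/></interface>
</devices></domain>"""

print(broken_bridges(domxml))  # -> ['00:1a:4a:16:01:51']
```

This matches the failure: the NIC with MAC 00:1a:4a:16:01:51 is the one added without a network assigned.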
[root@alma03 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf
emulatedMachine=rhel6.5.0
vmId=3721aa2b-67d8-47ad-bfac-460a2a089710
smp=4
memSize=4096
spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
vmName=HostedEngine
display=vnc
devices={index:0,iface:virtio,format:raw,bootOrder:1,address:{slot:0x06,bus:0x00,domain:0x0000,type:pci,function:0x0},volumeID:7f04725e-1926-48b6-bb0f-d14f62555b6b,imageID:501bc9ac-0463-4c8b-a23b-efd6cd1c523b,readonly:false,domainID:a0903254-f31e-4798-be1b-e5f6150bcdec,deviceId:501bc9ac-0463-4c8b-a23b-efd6cd1c523b,poolID:00000000-0000-0000-0000-000000000000,device:disk,shared:exclusive,propagateErrors:off,type:disk}
devices={nicModel:pv,macAddr:00:16:3E:7B:BB:BB,linkActive:true,network:ovirtmgmt,deviceId:be052cf9-8747-4982-805e-b3bcaa55b8d5,address:{slot:0x03,bus:0x00,domain:0x0000,type:pci,function:0x0},device:bridge,type:interface}
devices={nicModel:pv,macAddr:00:1a:4a:16:01:51,linkActive:true,network:None,deviceId:a65ffa7a-d182-4d73-967f-93eae6a755f8,address:None,device:bridge,type:interface}
devices={index:2,iface:ide,shared:false,readonly:true,deviceId:8c3179ac-b322-4f5c-9449-c52e3665e0ae,address:{controller:0,target:0,unit:0,bus:1,type:drive},device:cdrom,path:,type:disk}
devices={device:virtio-serial,type:controller,deviceId:aafc77c0-4a8d-4ced-8467-f92cd1868d90,address:{slot:0x05,bus:0x00,domain:0x0000,type:pci,function:0x0}}
devices={device:cirrus,alias:video0,type:video,deviceId:d691d9a5-36fb-490d-9c21-53eca8b02fce,address:{slot:0x02,bus:0x00,domain:0x0000,type:pci,function:0x0}}
devices={device:ide,type:controller,deviceId:25156b8d-9962-4b45-967f-8e9b26931425,address:{slot:0x01,bus:0x00,domain:0x0000,type:pci,function:0x1}}
devices={device:scsi,model:virtio-scsi,type:controller,deviceId:67589c37-bf92-4f1e-865d-b1b46d6c770f,address:{slot:0x04,bus:0x00,domain:0x0000,type:pci,function:0x0}}
devices={device:usb,type:controller,deviceId:72a87009-523f-40a9-b861-c61dc3a57c95,address:{slot:0x01,bus:0x00,domain:0x0000,type:pci,function:0x2}}
This is what I saw on the engine right after the new, empty interface was created:
[root@nsednev-he-2 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:16:3E:7B:BB:BB
          inet addr:10.35.97.250  Bcast:10.35.97.255  Mask:255.255.255.0
          inet6 addr: fe80::216:3eff:fe7b:bbbb/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1287636 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8362 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:832955848 (794.3 MiB)  TX bytes:4428613 (4.2 MiB)

eth1      Link encap:Ethernet  HWaddr 00:1A:4A:16:01:51
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:122762 errors:0 dropped:0 overruns:0 frame:0
          TX packets:122762 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:48337150 (46.0 MiB)  TX bytes:48337150 (46.0 MiB)

So eth1 was the newly added interface, and it had the default Ethernet MTU of 1500. Reassigning the bug.
In vm.conf we have two interfaces, but on the new one the whole address value is missing:
devices={nicModel:pv,macAddr:00:16:3E:7B:BB:BB,linkActive:true,network:ovirtmgmt,deviceId:be052cf9-8747-4982-805e-b3bcaa55b8d5,address:{slot:0x03,bus:0x00,domain:0x0000,type:pci,function:0x0},device:bridge,type:interface}
devices={nicModel:pv,macAddr:00:1a:4a:16:01:51,linkActive:true,network:None,deviceId:a65ffa7a-d182-4d73-967f-93eae6a755f8,address:None,device:bridge,type:interface}
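For illustration, the flat devices={...} entries can be pulled apart with a small sketch parser (an assumption on my side, not the actual ovirt-hosted-engine-ha parsing code), which makes the broken fields stand out:

```python
import re

def parse_device_line(line):
    """Split a flat vm.conf devices={...} entry into key/value pairs.
    Nested {...} values (e.g. a PCI address) are kept as raw strings.
    Illustrative only -- not the real ovirt-hosted-engine-ha parser."""
    body = re.match(r'devices=\{(.*)\}$', line).group(1)
    fields = {}
    # Split on commas that are not inside a nested {...} block.
    for part in re.split(r',(?![^{]*\})', body):
        key, value = part.split(':', 1)
        fields[key] = value
    return fields

good = parse_device_line(
    'devices={nicModel:pv,macAddr:00:16:3E:7B:BB:BB,linkActive:true,'
    'network:ovirtmgmt,address:{slot:0x03,bus:0x00,domain:0x0000,'
    'type:pci,function:0x0},device:bridge,type:interface}')
bad = parse_device_line(
    'devices={nicModel:pv,macAddr:00:1a:4a:16:01:51,linkActive:true,'
    'network:None,address:None,device:bridge,type:interface}')

# The healthy nic carries a full PCI address; the new one carries the
# literal string "None" for both its network and its address.
assert good['address'].startswith('{slot:0x03')
assert bad['network'] == 'None' and bad['address'] == 'None'
```

So whatever writes this entry serializes Python's None straight into the config instead of omitting the field or generating a real value.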
In the OVF_STORE we have: <?xml version='1.0' encoding='UTF-8'?><ovf:Envelope xmlns:ovf="http://schemas.dmtf.org/ovf/envelope/1/" xmlns:rasd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData" xmlns:vssd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingData" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ovf:version="3.6.0.0"><References><File ovf:href="501bc9ac-0463-4c8b-a23b-efd6cd1c523b/7f04725e-1926-48b6-bb0f-d14f62555b6b" ovf:id="7f04725e-1926-48b6-bb0f-d14f62555b6b" ovf:size="53687091200" ovf:description="Active VM" ovf:disk_storage_type="IMAGE" ovf:cinder_volume_type=""/><Nic ovf:id="be052cf9-8747-4982-805e-b3bcaa55b8d5"/><Nic ovf:id="a65ffa7a-d182-4d73-967f-93eae6a755f8"/></References><Section xsi:type="ovf:NetworkSection_Type"><Info>List of networks</Info><Network ovf:name="Network 1"/></Section><Section xsi:type="ovf:DiskSection_Type"><Info>List of Virtual Disks</Info><Disk ovf:diskId="7f04725e-1926-48b6-bb0f-d14f62555b6b" ovf:size="50" ovf:actual_size="6" ovf:vm_snapshot_id="d276a54e-be19-4902-89a3-68d78876241e" ovf:parentRef="" ovf:fileRef="501bc9ac-0463-4c8b-a23b-efd6cd1c523b/7f04725e-1926-48b6-bb0f-d14f62555b6b" ovf:format="http://www.vmware.com/specifications/vmdk.html#sparse" ovf:volume-format="RAW" ovf:volume-type="Preallocated" ovf:disk-interface="VirtIO" ovf:boot="false" ovf:disk-alias="virtio-disk0" ovf:wipe-after-delete="false"/></Section><Content ovf:id="out" xsi:type="ovf:VirtualSystem_Type"><CreationDate>2016/06/13 13:13:54</CreationDate><ExportDate>2016/06/13 
14:40:17</ExportDate><DeleteProtected>false</DeleteProtected><SsoMethod>guest_agent</SsoMethod><IsSmartcardEnabled>false</IsSmartcardEnabled><TimeZone>Etc/GMT</TimeZone><default_boot_sequence>0</default_boot_sequence><Generation>7</Generation><VmType>0</VmType><ClusterCompatibilityVersion>3.6</ClusterCompatibilityVersion><MinAllocatedMem>4096</MinAllocatedMem><IsStateless>false</IsStateless><IsRunAndPause>false</IsRunAndPause><AutoStartup>false</AutoStartup><Priority>1</Priority><MigrationSupport>1</MigrationSupport><IsBootMenuEnabled>false</IsBootMenuEnabled><IsSpiceFileTransferEnabled>true</IsSpiceFileTransferEnabled><IsSpiceCopyPasteEnabled>true</IsSpiceCopyPasteEnabled><CustomEmulatedMachine>rhel6.5.0</CustomEmulatedMachine><CustomCpuName></CustomCpuName><PredefinedProperties></PredefinedProperties><UserDefinedProperties></UserDefinedProperties><Name>HostedEngine</Name><TemplateId>00000000-0000-0000-0000-000000000000</TemplateId><TemplateName>Blank</TemplateName><IsInitilized>false</IsInitilized><Origin>6</Origin><app_list>kernel-2.6.32-642.el6,kernel-2.6.32-642.1.1.el6,cloud-init-0.7.5-5.el6,rhevm-guest-agent-common-1.0.11-6.el6ev</app_list><DefaultDisplayType>0</DefaultDisplayType><TrustedService>false</TrustedService><UseHostCpu>false</UseHostCpu><UseLatestVersion>false</UseLatestVersion><Section ovf:id="3721aa2b-67d8-47ad-bfac-460a2a089710" ovf:required="false" xsi:type="ovf:OperatingSystemSection_Type"><Info>Guest Operating System</Info><Description>other_linux</Description></Section><Section xsi:type="ovf:VirtualHardwareSection_Type"><Info>4 CPU, 4096 Memeory</Info><System><vssd:VirtualSystemType>ENGINE 3.6.0.0</vssd:VirtualSystemType></System><Item><rasd:Caption>4 virtual cpu</rasd:Caption><rasd:Description>Number of virtual 
CPU</rasd:Description><rasd:InstanceId>1</rasd:InstanceId><rasd:ResourceType>3</rasd:ResourceType><rasd:num_of_sockets>4</rasd:num_of_sockets><rasd:cpu_per_socket>1</rasd:cpu_per_socket><rasd:threads_per_cpu>1</rasd:threads_per_cpu></Item><Item><rasd:Caption>4096 MB of memory</rasd:Caption><rasd:Description>Memory Size</rasd:Description><rasd:InstanceId>2</rasd:InstanceId><rasd:ResourceType>4</rasd:ResourceType><rasd:AllocationUnits>MegaBytes</rasd:AllocationUnits><rasd:VirtualQuantity>4096</rasd:VirtualQuantity></Item><Item><rasd:Caption>virtio-disk0</rasd:Caption><rasd:InstanceId>7f04725e-1926-48b6-bb0f-d14f62555b6b</rasd:InstanceId><rasd:ResourceType>17</rasd:ResourceType><rasd:HostResource>501bc9ac-0463-4c8b-a23b-efd6cd1c523b/7f04725e-1926-48b6-bb0f-d14f62555b6b</rasd:HostResource><rasd:Parent>00000000-0000-0000-0000-000000000000</rasd:Parent><rasd:Template>00000000-0000-0000-0000-000000000000</rasd:Template><rasd:ApplicationList></rasd:ApplicationList><rasd:StorageId>a0903254-f31e-4798-be1b-e5f6150bcdec</rasd:StorageId><rasd:StoragePoolId>00000001-0001-0001-0001-00000000017e</rasd:StoragePoolId><rasd:CreationDate>2016/06/13 13:13:55</rasd:CreationDate><rasd:LastModified>1970/01/01 00:00:00</rasd:LastModified><rasd:last_modified_date>2016/06/13 14:40:17</rasd:last_modified_date><Type>disk</Type><Device>disk</Device><rasd:Address>{slot=0x06, bus=0x00, domain=0x0000, type=pci, function=0x0}</rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias>virtio-disk0</Alias></Item><Item><rasd:Caption>Ethernet adapter on 
ovirtmgmt</rasd:Caption><rasd:InstanceId>be052cf9-8747-4982-805e-b3bcaa55b8d5</rasd:InstanceId><rasd:ResourceType>10</rasd:ResourceType><rasd:OtherResourceType>ovirtmgmt</rasd:OtherResourceType><rasd:ResourceSubType>3</rasd:ResourceSubType><rasd:Connection>ovirtmgmt</rasd:Connection><rasd:Linked>true</rasd:Linked><rasd:Name>vnet0</rasd:Name><rasd:MACAddress>00:16:3E:7B:BB:BB</rasd:MACAddress><rasd:speed>1000</rasd:speed><Type>interface</Type><Device>bridge</Device><rasd:Address>{slot=0x03, bus=0x00, domain=0x0000, type=pci, function=0x0}</rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias>net0</Alias></Item><Item><rasd:Caption>Ethernet adapter on [No Network]</rasd:Caption><rasd:InstanceId>a65ffa7a-d182-4d73-967f-93eae6a755f8</rasd:InstanceId><rasd:ResourceType>10</rasd:ResourceType><rasd:OtherResourceType/><rasd:ResourceSubType>3</rasd:ResourceSubType><rasd:Connection/><rasd:Linked>true</rasd:Linked><rasd:Name>nic1</rasd:Name><rasd:MACAddress>00:1a:4a:16:01:51</rasd:MACAddress><rasd:speed>1000</rasd:speed><Type>interface</Type><Device>bridge</Device><rasd:Address></rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias></Alias></Item><Item><rasd:Caption>USB Controller</rasd:Caption><rasd:InstanceId>3</rasd:InstanceId><rasd:ResourceType>23</rasd:ResourceType><rasd:UsbPolicy>DISABLED</rasd:UsbPolicy></Item><Item><rasd:Caption>Graphical 
Framebuffer</rasd:Caption><rasd:InstanceId>6c3528ac-6357-430d-9824-7acadca0e0e2</rasd:InstanceId><rasd:ResourceType>26</rasd:ResourceType><Type>graphics</Type><Device>vnc</Device><rasd:Address></rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias></Alias></Item><Item><rasd:Caption>CDROM</rasd:Caption><rasd:InstanceId>4c8813c9-e228-484c-8a13-377c5f8dae50</rasd:InstanceId><rasd:ResourceType>15</rasd:ResourceType><Type>disk</Type><Device>cdrom</Device><rasd:Address></rasd:Address><BootOrder>0</BootOrder><IsPlugged>false</IsPlugged><IsReadOnly>true</IsReadOnly><Alias></Alias><SpecParams><path></path></SpecParams></Item><Item><rasd:ResourceType>0</rasd:ResourceType><rasd:InstanceId>aafc77c0-4a8d-4ced-8467-f92cd1868d90</rasd:InstanceId><Type>controller</Type><Device>virtio-serial</Device><rasd:Address>{slot=0x05, bus=0x00, domain=0x0000, type=pci, function=0x0}</rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias>virtio-serial0</Alias></Item><Item><rasd:ResourceType>0</rasd:ResourceType><rasd:InstanceId>d691d9a5-36fb-490d-9c21-53eca8b02fce</rasd:InstanceId><Type>video</Type><Device>cirrus</Device><rasd:Address>{slot=0x02, bus=0x00, domain=0x0000, type=pci, function=0x0}</rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias>video0</Alias></Item><Item><rasd:ResourceType>0</rasd:ResourceType><rasd:InstanceId>0bc293b9-907d-4ac0-81a8-123c6cd77e60</rasd:InstanceId><Type>channel</Type><Device>unix</Device><rasd:Address>{bus=0, controller=0, type=virtio-serial, port=2}</rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias>channel1</Alias></Item><Item><rasd:ResourceType>0</rasd:ResourceType><rasd:InstanceId>25156b8d-9962-4b45-967f-8e9b26931425</rasd:InstanceId><Type>controller</Type><Device>ide</Device><rasd:Address>{slot=0x01, bus=0x00, domain=0x0000, type=pci, 
function=0x1}</rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias>ide</Alias></Item><Item><rasd:ResourceType>0</rasd:ResourceType><rasd:InstanceId>3ab4d2ac-7c67-41cc-bbed-5bdc059ab303</rasd:InstanceId><Type>channel</Type><Device>unix</Device><rasd:Address>{bus=0, controller=0, type=virtio-serial, port=3}</rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias>channel2</Alias></Item><Item><rasd:ResourceType>0</rasd:ResourceType><rasd:InstanceId>617be138-60bd-458d-b7d2-de9ae5fc01ea</rasd:InstanceId><Type>channel</Type><Device>unix</Device><rasd:Address>{bus=0, controller=0, type=virtio-serial, port=1}</rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias>channel0</Alias></Item><Item><rasd:ResourceType>0</rasd:ResourceType><rasd:InstanceId>638c4da0-fbb0-480f-8b0a-4dd9fe5be8f8</rasd:InstanceId><Type>disk</Type><Device>cdrom</Device><rasd:Address>{bus=1, controller=0, type=drive, target=0, unit=0}</rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias>ide0-1-0</Alias></Item><Item><rasd:ResourceType>0</rasd:ResourceType><rasd:InstanceId>67589c37-bf92-4f1e-865d-b1b46d6c770f</rasd:InstanceId><Type>controller</Type><Device>scsi</Device><rasd:Address>{slot=0x04, bus=0x00, domain=0x0000, type=pci, function=0x0}</rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias>scsi0</Alias></Item><Item><rasd:ResourceType>0</rasd:ResourceType><rasd:InstanceId>72a87009-523f-40a9-b861-c61dc3a57c95</rasd:InstanceId><Type>controller</Type><Device>usb</Device><rasd:Address>{slot=0x01, bus=0x00, domain=0x0000, type=pci, 
function=0x2}</rasd:Address><BootOrder>0</BootOrder><IsPlugged>true</IsPlugged><IsReadOnly>false</IsReadOnly><Alias>usb</Alias></Item><Item><rasd:ResourceType>0</rasd:ResourceType><rasd:InstanceId>94570f3f-3274-4acf-86a6-d85ab48df6ef</rasd:InstanceId><Type>sound</Type><Device>ich6</Device><rasd:Address></rasd:Address><BootOrder>0</BootOrder><IsPlugged>false</IsPlugged><IsReadOnly>true</IsReadOnly><Alias></Alias></Item></Section><Section xsi:type="ovf:SnapshotsSection_Type"><Snapshot ovf:id="d276a54e-be19-4902-89a3-68d78876241e"><Type>ACTIVE</Type><Description>Active VM</Description><CreationDate>2016/06/13 13:13:55</CreationDate></Snapshot></Section></Content></ovf:Envelope>
Is there a chance that https://bugzilla.redhat.com/show_bug.cgi?id=1340628 could somehow be related to this bug? I reproduced this bug without changing the default OVF update timing, i.e. I did not run "engine-config -s OvfUpdateIntervalInMinutes=1" before restarting the engine. Could this be the root cause?
Yes, you have to change that value and restart the engine VM. Otherwise, after any modification to the engine VM, you have to wait at least one hour before restarting it.
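To illustrate the timing constraint (a toy model of my own, not engine code): the engine flushes HE VM configuration changes to the OVF_STORE only once per OvfUpdateIntervalInMinutes, so a reboot that happens before the next flush picks up a stale vm.conf:

```python
def change_survives_reboot(change_minute, reboot_minute, interval=60):
    """Toy model: OVF_STORE flushes happen at multiples of `interval`
    minutes. A config change survives a reboot only if at least one
    flush falls between the change and the reboot."""
    next_flush = ((change_minute // interval) + 1) * interval
    return next_flush <= reboot_minute

# With the default 60-minute interval, rebooting 10 minutes after
# adding the vNIC misses the flush and the change is lost:
assert not change_survives_reboot(change_minute=5, reboot_minute=15)
# With OvfUpdateIntervalInMinutes=1 the change is flushed within a minute:
assert change_survives_reboot(change_minute=5, reboot_minute=15, interval=1)
```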
I've set "engine-config -s OvfUpdateIntervalInMinutes=1" and see the same symptoms as before: I powered the engine off and on and got the same results as described in comment #26. Then I also restarted the hosted-engine host and again hit https://bugzilla.redhat.com/show_bug.cgi?id=1343005: the engine fails to start.
I've tried to make "warm-restart" of HE-VM using these steps on host first: hosted-engine --set-maintenance --mode=global hosted-engine --vm-poweroff hosted-engine --vm-start hosted-engine --set-maintenance --mode=none hosted-engine --vm-status hosted-engine --set-maintenance --mode=none hosted-engine --vm-status [root@alma03 ~]# systemctl status ovirt-ha-agent -l ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled) Active: active (running) since Sun 2016-06-19 20:53:10 IDT; 27min ago Main PID: 2916 (ovirt-ha-agent) CGroup: /system.slice/ovirt-ha-agent.service └─2916 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon Jun 19 21:20:26 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images Jun 19 21:20:31 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain Jun 19 21:20:31 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE Jun 19 21:20:37 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:2474f383-99a7-418e-8ca7-d24c8336e363, volUUID:4b5556d0-f693-45e1-b5fb-4da06e91d190 Jun 19 21:20:37 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:7fc06428-2493-4f23-a58f-f534a33283f2, volUUID:d9a5c8ff-9235-4bd4-9b61-ac22fb36ada8 Jun 19 21:20:37 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE Jun 19 21:20:37 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: 
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__2/b9b588f4-9c91-4784-a0b8-242b3e034bd6/images/7fc06428-2493-4f23-a58f-f534a33283f2/d9a5c8ff-9235-4bd4-9b61-ac22fb36ada8 Jun 19 21:20:37 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Found an OVF for HE VM, trying to convert Jun 19 21:20:37 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Got vm.conf from OVF_STORE Jun 19 21:20:37 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Current state EngineUnexpectedlyDown (score: 0) [root@alma03 ~]# hosted-engine --vm-status --== Host 1 status ==-- Status up-to-date : True Hostname : alma03.qa.lab.tlv.redhat.com Host ID : 1 Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"} Score : 0 stopped : False Local maintenance : False crc32 : adfb9a85 Host timestamp : 1728 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=1728 (Sun Jun 19 21:20:11 2016) host-id=1 score=0 maintenance=False state=EngineUnexpectedlyDown stopped=False timeout=Thu Jan 1 02:35:46 1970 Then I tried these steps: [root@alma03 ~]# systemctl restart ovirt-ha-broker && systemctl restart ovirt-ha-agent As a result score changed to 3400 and engine was starting, but eventually failed with: hosted-engine --vm-status --== Host 1 status ==-- Status up-to-date : True Hostname : alma03.qa.lab.tlv.redhat.com Host ID : 1 Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"} Score : 0 stopped : False Local maintenance : False crc32 : f6866174 Host timestamp : 2101 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=2101 (Sun Jun 19 21:26:24 2016) 
host-id=1 score=0 maintenance=False state=EngineUnexpectedlyDown stopped=False [root@alma03 ~]# systemctl status ovirt-ha-agent -l ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled) Active: active (running) since Sun 2016-06-19 21:22:42 IDT; 4min 41s ago Main PID: 10420 (ovirt-ha-agent) CGroup: /system.slice/ovirt-ha-agent.service └─10420 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon Jun 19 21:27:03 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Timeout set to Sun Jun 19 21:37:03 2016 while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineUnexpectedlyDown'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineUnexpectedlyDown'> Jun 19 21:27:03 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM Jun 19 21:27:08 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage Jun 19 21:27:08 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server Jun 19 21:27:18 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server Jun 19 21:27:18 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain Jun 19 21:27:18 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images Jun 19 21:27:18 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images Jun 19 21:27:23 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: 
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain Jun 19 21:27:23 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE Then I've waited for more than 15 minutes to see if score would rise to 3400 on my host, and did so: [root@alma03 ~]# systemctl status ovirt-ha-agent -l ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled) Active: active (running) since Sun 2016-06-19 21:22:42 IDT; 15min ago Main PID: 10420 (ovirt-ha-agent) CGroup: /system.slice/ovirt-ha-agent.service └─10420 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon Jun 19 21:38:04 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Engine down and local host has best score (3400), attempting to start engine VM Jun 19 21:38:04 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Trying: notify time=1466361484.36 type=state_transition detail=EngineDown-EngineStart hostname='alma03.qa.lab.tlv.redhat.com' Jun 19 21:38:06 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, was notification of state_transition (EngineDown-EngineStart) sent? 
sent Jun 19 21:38:06 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM Jun 19 21:38:11 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage Jun 19 21:38:11 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server Jun 19 21:38:21 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server Jun 19 21:38:21 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain Jun 19 21:38:21 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images Jun 19 21:38:21 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images [root@alma03 ~]# hosted-engine --vm-status --== Host 1 status ==-- Status up-to-date : True Hostname : alma03.qa.lab.tlv.redhat.com Host ID : 1 Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"} Score : 3400 stopped : False Local maintenance : False crc32 : 00e62eca Host timestamp : 2801 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=2801 (Sun Jun 19 21:38:04 2016) host-id=1 score=3400 maintenance=False state=EngineStart stopped=False But engine failed to start and score was zeroed again: [root@alma03 ~]# hosted-engine --vm-status --== Host 1 status ==-- Status up-to-date : True Hostname : alma03.qa.lab.tlv.redhat.com Host ID : 1 Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"} Score : 0 stopped : False Local maintenance : False crc32 : e47afaba Host timestamp : 2884 Extra metadata (valid at 
timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=2884 (Sun Jun 19 21:39:27 2016) host-id=1 score=0 maintenance=False state=EngineUnexpectedlyDown stopped=False [root@alma03 ~]# systemctl status ovirt-ha-agent -l ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled) Active: active (running) since Sun 2016-06-19 21:22:42 IDT; 18min ago Main PID: 10420 (ovirt-ha-agent) CGroup: /system.slice/ovirt-ha-agent.service └─10420 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon Jun 19 21:40:26 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE Jun 19 21:40:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:2474f383-99a7-418e-8ca7-d24c8336e363, volUUID:4b5556d0-f693-45e1-b5fb-4da06e91d190 Jun 19 21:40:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:7fc06428-2493-4f23-a58f-f534a33283f2, volUUID:d9a5c8ff-9235-4bd4-9b61-ac22fb36ada8 Jun 19 21:40:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE Jun 19 21:40:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__2/b9b588f4-9c91-4784-a0b8-242b3e034bd6/images/7fc06428-2493-4f23-a58f-f534a33283f2/d9a5c8ff-9235-4bd4-9b61-ac22fb36ada8 Jun 19 21:40:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Found an OVF for HE VM, trying to convert Jun 
19 21:40:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Got vm.conf from OVF_STORE Jun 19 21:40:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Score is 0 due to unexpected vm shutdown at Sun Jun 19 21:40:06 2016 Jun 19 21:40:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Current state EngineUnexpectedlyDown (score: 0) Jun 19 21:40:42 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[10420]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
I've also tried to power-cycle the host, and the result was the same: the engine could not start.
Created attachment 1169587 [details] sosreport from host alma03
Adding components from host: ovirt-iso-uploader-4.0.0-1.el7ev.noarch ovirt-engine-restapi-4.0.0.5-0.1.el7ev.noarch ovirt-vmconsole-1.0.3-1.el7ev.noarch vdsm-4.18.3-0.el7ev.x86_64 ovirt-host-deploy-1.5.0-1.el7ev.noarch ovirt-vmconsole-proxy-1.0.3-1.el7ev.noarch ovirt-engine-extension-aaa-jdbc-1.1.0-1.el7ev.noarch ovirt-engine-dwh-4.0.0-2.el7ev.noarch ovirt-engine-setup-4.0.0.5-0.1.el7ev.noarch sanlock-3.2.4-2.el7_2.x86_64 ovirt-engine-cli-3.6.7.0-1.el7ev.noarch ovirt-engine-websocket-proxy-4.0.0.5-0.1.el7ev.noarch ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch libvirt-client-1.2.17-13.el7_2.5.x86_64 ovirt-image-uploader-4.0.0-1.el7ev.noarch ovirt-engine-userportal-4.0.0.5-0.1.el7ev.noarch ovirt-engine-webadmin-portal-4.0.0.5-0.1.el7ev.noarch ovirt-engine-tools-4.0.0.5-0.1.el7ev.noarch ovirt-engine-setup-plugin-websocket-proxy-4.0.0.5-0.1.el7ev.noarch ovirt-engine-backend-4.0.0.5-0.1.el7ev.noarch ovirt-engine-vmconsole-proxy-helper-4.0.0.5-0.1.el7ev.noarch ovirt-engine-dbscripts-4.0.0.5-0.1.el7ev.noarch ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch ovirt-engine-lib-4.0.0.5-0.1.el7ev.noarch ovirt-engine-setup-plugin-ovirt-engine-common-4.0.0.5-0.1.el7ev.noarch ovirt-engine-extensions-api-impl-4.0.0.5-0.1.el7ev.noarch ovirt-host-deploy-java-1.5.0-1.el7ev.noarch ovirt-engine-dwh-setup-4.0.0-2.el7ev.noarch ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.0.0.5-0.1.el7ev.noarch ovirt-engine-4.0.0.5-0.1.el7ev.noarch qemu-kvm-rhev-2.3.0-31.el7_2.16.x86_64 mom-0.5.4-1.el7ev.noarch ovirt-engine-tools-backup-4.0.0.5-0.1.el7ev.noarch ovirt-log-collector-4.0.0-1.el7ev.noarch ovirt-engine-dashboard-1.0.0-20160615git43298a4.el7ev.x86_64 ovirt-engine-setup-plugin-ovirt-engine-4.0.0.5-0.1.el7ev.noarch ovirt-vmconsole-host-1.0.3-1.el7ev.noarch ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch ovirt-setup-lib-1.0.2-1.el7ev.noarch ovirt-engine-setup-base-4.0.0.5-0.1.el7ev.noarch Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 
4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016 Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux Red Hat Enterprise Linux Server release 7.2 (Maipo)
I do see this in /var/run/ovirt-hosted-engine-ha/vm.conf:
[root@alma03 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf
emulatedMachine=rhel6.5.0
vmId=df291a06-fa08-4e7a-8e7c-32a2903de512
smp=4
memSize=4096
spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
vmName=HostedEngine
display=vnc
devices={index:0,iface:virtio,format:raw,bootOrder:1,address:{slot:0x06,bus:0x00,domain:0x0000,type:pci,function:0x0},volumeID:dd85c3e1-cbb0-4719-bc24-7b7c058510a4,imageID:fb42d97d-8b85-42ca-a55c-dcdc48c15850,readonly:false,domainID:b9b588f4-9c91-4784-a0b8-242b3e034bd6,deviceId:fb42d97d-8b85-42ca-a55c-dcdc48c15850,poolID:00000000-0000-0000-0000-000000000000,device:disk,shared:exclusive,propagateErrors:off,type:disk}
devices={nicModel:pv,macAddr:00:16:3E:7B:BB:BB,linkActive:true,network:ovirtmgmt,deviceId:c53ae1e8-3dad-4659-993a-1e2ff0b3b7d4,address:{slot:0x03,bus:0x00,domain:0x0000,type:pci,function:0x0},device:bridge,type:interface}
devices={nicModel:pv,macAddr:00:1a:4a:16:01:51,linkActive:true,network:None,deviceId:59d8c675-b7c5-4bf1-a716-91a9263b1d5e,address:{slot:0x07,bus:0x00,domain:0x0000,type:pci,function:0x0},device:bridge,type:interface}
devices={index:2,iface:ide,shared:false,readonly:true,deviceId:8c3179ac-b322-4f5c-9449-c52e3665e0ae,address:{controller:0,target:0,unit:0,bus:1,type:drive},device:cdrom,path:,type:disk}
devices={device:vga,alias:video0,type:video,deviceId:4d4f1a6c-db8e-4855-9d03-f8c34ee81135,address:{slot:0x02,bus:0x00,domain:0x0000,type:pci,function:0x0}}
devices={device:virtio-serial,type:controller,deviceId:81a03974-39fa-4140-b827-4b4b941c6006,address:{slot:0x05,bus:0x00,domain:0x0000,type:pci,function:0x0}}
devices={device:usb,type:controller,deviceId:ec58c2e6-3f86-44c6-8627-8025be9fd509,address:{slot:0x01,bus:0x00,domain:0x0000,type:pci,function:0x2}}
devices={device:scsi,model:virtio-scsi,type:controller,deviceId:fe7eb17d-56c7-4cf7-9db1-631ef165d4f9,address:{slot:0x04,bus:0x00,domain:0x0000,type:pci,function:0x0}}
devices={device:ide,type:controller,deviceId:3a9be9c2-9f12-4cd7-ace7-a26edeb45dcd,address:{slot:0x01,bus:0x00,domain:0x0000,type:pci,function:0x1}}
I think the root cause of the engine failing to start is this:
Thread-27987::DEBUG::2016-06-20 12:04:43,927::bindingxmlrpc::1242::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'exitMessage': "Cannot get interface MTU on 'None': No such device", 'statusTime': '4349468610', 'vmId': 'df291a06-fa08-4e7a-8e7c-32a2903de512', 'exitReason': 1, 'exitCode': 1}]}
When an empty interface is created for the engine VM, for some reason its type is 'None' instead of 'Dummy' and its MTU is empty, which prevents the engine VM from starting. If an empty vNIC is added to a regular guest VM, its type is dummy and the VM starts without any problem.
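A minimal sketch of the failure mode (assumed behaviour on my side, not the actual vdsm code): if the vm.conf network field holds the literal string 'None' and is used verbatim as the bridge name in the generated domain XML, libvirt ends up looking for a host bridge called 'None' and fails exactly as in the log above:

```python
import xml.etree.ElementTree as ET

def interface_xml(nic):
    """Build an <interface> element the way a vdsm-like layer might:
    the nic's network name is used verbatim as the bridge name."""
    iface = ET.Element('interface', type='bridge')
    ET.SubElement(iface, 'source', bridge=nic['network'])
    ET.SubElement(iface, 'mac', address=nic['macAddr'])
    ET.SubElement(iface, 'model', type='virtio')
    return ET.tostring(iface, encoding='unicode')

xml = interface_xml({'network': 'None', 'macAddr': '00:1a:4a:16:01:51'})
# No host bridge named "None" exists, so libvirt cannot query its MTU
# and virDomainCreateXML() aborts the VM start.
assert 'bridge="None"' in xml
```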
Can you please provide more information about the customer's vNIC? Was it assigned to an existing network, or was it simply created empty?
The customer never shared that exact information with me. In my tests, I didn't create any additional logical network; I defined the second vNIC on the ovirtmgmt logical network. I never got into a state where the engine would not start. But I also never saw the OVF store updated with the nic, and consequently I didn't see vm.conf modified. However, let's have a clear line to start from, because the code has changed since the customer and I tested this. How I see the use cases:
Case #1:
- Deploy the HE VM with its network on the ovirtmgmt logical network (the default/only option).
- Modify the VM in the UI and add an additional vNIC on a different logical network than ovirtmgmt; let's call it the display network. Mark it as a display network in the UI as well.
- Reboot the HE VM.
- The second vNIC, attached to the display logical network, is up and active.
Case #2: Same as above, but instead of rebooting, test live migration of the HE VM. Make sure that after the migration to another host the second vNIC is still plugged and active.
Case #3 (customer case): Same as #1, with one difference: ovirtmgmt is on a private network and the display network is the public network. To access the HE VM Admin Portal you either need to be on that private network or add the display network to the HE VM. Reboot and make sure you can still access the VM/Admin Portal from the public network.
Case #4 (my testing environment, not sure if it is a valid use case though): Same as #1, with the difference that the second vNIC is on the same logical network as the first vNIC, ovirtmgmt.
Followed the reproduction steps as follows:
1) Ran "engine-config -s OvfUpdateIntervalInMinutes=1" on the engine.
2) Restarted the engine.
3) Added a new network to the host and attached it to the host's NIC.
4) Added a new vNIC to the HE VM and assigned it to the new network.
5) Set the host into global maintenance: "hosted-engine --set-maintenance --mode=global".
6) On the host, ran "hosted-engine --vm-shutdown".
7) On the host, ran "cat /var/run/ovirt-hosted-engine-ha/vm.conf"; the newly added vNIC was listed for the HE VM while, on the host, the VM was still powering down.
8) Started the HE VM on the host: "hosted-engine --vm-start".
9) On the host, ran "hosted-engine --set-maintenance --mode=none".
10) The engine booted up and the newly added vNIC remained connected to it.

Host: libvirt-client-1.2.17-13.el7_2.5.x86_64 ovirt-vmconsole-1.0.3-1.el7ev.noarch ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch vdsm-4.18.3-0.el7ev.x86_64 ovirt-setup-lib-1.0.2-1.el7ev.noarch ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch mom-0.5.4-1.el7ev.noarch sanlock-3.2.4-2.el7_2.x86_64 ovirt-host-deploy-1.5.0-1.el7ev.noarch qemu-kvm-rhev-2.3.0-31.el7_2.16.x86_64 ovirt-vmconsole-host-1.0.3-1.el7ev.noarch ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild@mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine: rhevm-doc-4.0.0-2.el7ev.noarch rhev-release-4.0.0-18-001.noarch rhevm-setup-plugins-4.0.0.1-1.el7ev.noarch rhevm-spice-client-x64-msi-4.0-2.el7ev.noarch rhevm-branding-rhev-4.0.0-1.el7ev.noarch rhevm-4.0.0.6-0.1.el7ev.noarch rhevm-guest-agent-common-1.0.12-2.el7ev.noarch rhevm-dependencies-4.0.0-1.el7ev.noarch rhevm-spice-client-x86-msi-4.0-2.el7ev.noarch rhev-guest-tools-iso-4.0-2.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild@mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)
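The check in step 7 can be automated. Below is a small sketch; the sample vm.conf text is hypothetical, and the key:value devices-line format is an assumption modeled on typical /var/run/ovirt-hosted-engine-ha/vm.conf contents:

```python
# Sketch: list the network-interface device lines in a hosted-engine
# vm.conf. The sample text below is invented for illustration and only
# mimics the devices={...} key:value format of the real file.

def interface_devices(vm_conf_text):
    """Return the devices= lines that describe network interfaces."""
    return [line for line in vm_conf_text.splitlines()
            if line.startswith('devices=') and 'type:interface' in line]

sample = """vmId=df291a06-fa08-4e7a-8e7c-32a2903de512
devices={nicModel:pv,macAddr:00:16:3e:59:a6:03,network:ovirtmgmt,device:bridge,type:interface}
devices={nicModel:pv,macAddr:00:16:3e:59:a6:04,network:Monitoring1,device:bridge,type:interface}
devices={index:0,iface:virtio,device:disk,type:disk}
"""

nics = interface_devices(sample)
# After the OVF store is refreshed, the newly added vNIC should show up
# as a second type:interface entry.
print(len(nics))
```

In a real check you would read the file from the host instead of using the sample string.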
Attaching a link to a screen capture of what I've been doing: https://drive.google.com/a/redhat.com/file/d/0B85BEaDBcF88a09QZDE4WkVjLW8/view?usp=sharing
If the host is power-cycled and reproduction is done as described in Comment 45, everything works fine too. Migration also passed OK. Case #4 is not valid, as only one vNIC can be on the management network while the HE VM is running; since the first vNIC is already on that network, I can't add another vNIC to the same mgmt network. Case #3 is a bit problematic to reproduce in my environment; I'd better ask the network team for help here. I'll split this bug into two possible options:
1) The customer did as described in 1348640.
2) The customer did not let the shared storage get its OVF updated with the new vNIC parameters, which by default happens every 1 hour (the customer could shorten the timer by running "engine-config -s OvfUpdateIntervalInMinutes=1" on the engine and then restarting it), before the HE VM was migrated/powered off and on, or the host was power-cycled, causing a cold restart of the HE VM.
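To illustrate the timing window in option 2: the new vNIC survives a cold restart only if at least one OVF update interval elapsed between adding it and shutting the VM down. A minimal sketch, with the interval values taken from the text above (60 minutes by default, 1 minute after the engine-config tweak) and all timestamps invented for illustration:

```python
from datetime import datetime, timedelta

# Sketch: could the OVF store have been refreshed between adding the
# vNIC and cold-restarting the HE VM? This is a simplification: it
# only checks whether a full update interval fits in the window.

def ovf_likely_updated(nic_added_at, vm_restarted_at, interval_minutes=60):
    return vm_restarted_at - nic_added_at >= timedelta(minutes=interval_minutes)

added = datetime(2016, 6, 20, 12, 0)      # hypothetical: vNIC added
restarted = datetime(2016, 6, 20, 12, 30)  # hypothetical: cold restart

print(ovf_likely_updated(added, restarted))     # default 60-minute interval
print(ovf_likely_updated(added, restarted, 1))  # OvfUpdateIntervalInMinutes=1
```

With the default interval the 30-minute window is too short, which matches option 2: the restart happens before vm.conf picks up the new vNIC.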
Here is what I tried. As a precondition, I ran "engine-config -s OvfUpdateIntervalInMinutes=1" on the engine and restarted the ovirt-engine service on the HE VM.
1) Added a network named "Monitoring1" in the networks tab.
2) Associated "Monitoring1" on both hosts with the additional physical interface "enp3s0f1"; the interface was unplugged from the network, so it had no link and no assigned IP address.
3) Added a new vNIC to the engine and associated it with the "Monitoring1" profile.
4) Migrated the HE VM back and forth between the hosts.
5) Set the host running the HE VM into maintenance and saw the HE VM migrate to the other active host.
6) Put one host into local maintenance and then power-cycled the second host running the HE VM; once the host came online again, the HE VM powered up successfully without losing the new vNIC.
During these tests the vNIC did not disappear from the HE VM. Works for me on these components:

Hosts: ovirt-setup-lib-1.0.2-1.el7ev.noarch qemu-kvm-rhev-2.3.0-31.el7_2.16.x86_64 mom-0.5.4-1.el7ev.noarch rhev-release-4.0.0-19-001.noarch vdsm-4.18.4-2.el7ev.x86_64 ovirt-vmconsole-host-1.0.3-1.el7ev.noarch ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch libvirt-client-1.2.17-13.el7_2.5.x86_64 ovirt-host-deploy-1.5.0-1.el7ev.noarch ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch sanlock-3.2.4-2.el7_2.x86_64 ovirt-vmconsole-1.0.3-1.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild@mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine: rhevm-doc-4.0.0-2.el7ev.noarch rhevm-setup-plugins-4.0.0.1-1.el7ev.noarch rhevm-spice-client-x64-msi-4.0-2.el7ev.noarch rhevm-4.0.0.6-0.1.el7ev.noarch rhev-release-4.0.0-19-001.noarch rhevm-guest-agent-common-1.0.12-2.el7ev.noarch rhevm-dependencies-4.0.0-1.el7ev.noarch rhevm-branding-rhev-4.0.0-2.el7ev.noarch rhevm-spice-client-x86-msi-4.0-2.el7ev.noarch rhev-guest-tools-iso-4.0-2.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild@mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-1743.html