Description of problem:

3-node HCI cluster on CentOS 8, converted to CentOS Stream. After the conversion the hosted engine no longer boots. All three nodes show the same cycling events about a Python library error (see below).

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Upgrade from CentOS 8 to CentOS Stream
2. Update packages and reboot
3. The HCI Gluster volumes mount, but the engine no longer starts

Actual results:

[root@thor ~]# systemctl enable ovirt-ha-agent
[root@thor ~]# systemctl start ovirt-ha-agent
[root@thor ~]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2021-01-01 09:33:34 EST; 6s ago
  Process: 427064 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
 Main PID: 427064 (code=exited, status=157)
    Tasks: 0 (limit: 1235320)
   Memory: 0B
   CGroup: /system.slice/ovirt-ha-agent.service
[root@thor ~]#
[root@thor ~]#
[root@thor ~]# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2021-01-01 09:34:06 EST; 3s ago
  Process: 427376 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
 Main PID: 427376 (code=exited, status=157)
    Tasks: 0 (limit: 1235320)
   Memory: 0B
   CGroup: /system.slice/ovirt-ha-agent.service
[root@thor ~]#

Expected results:
Nodes boot the hosted engine, which then launches the VMs.

Additional info:

Jan  1 09:35:28 thor journal[428169]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.broker.Broker ERROR Failed initializing the broker: [Errno 107] Transport endpoint is not connected: '/rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_engine/3afc47ba-afb9-413f-8de5-8d9a2f45ecde/ha_agent/hosted-engine.metadata'
Jan  1 09:35:28 thor journal[428169]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.broker.Broker ERROR Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 64, in run
    self._storage_broker_instance = self._get_storage_broker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 143, in _get_storage_broker
    return storage_broker.StorageBroker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 97, in __init__
    self._backend.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 408, in connect
    self._check_symlinks(self._storage_path, volume.path, service_link)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 105, in _check_symlinks
    os.unlink(service_link)
OSError: [Errno 107] Transport endpoint is not connected: '/rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_engine/3afc47ba-afb9-413f-8de5-8d9a2f45ecde/ha_agent/hosted-engine.metadata'
Jan  1 09:35:28 thor journal[428169]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.broker.Broker ERROR Trying to restart the broker
Jan  1 09:35:28 thor platform-python[428169]: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan  1 09:35:28 thor systemd[1]: ovirt-ha-broker.service: Main process exited, code=exited, status=1/FAILURE
Jan  1 09:35:28 thor systemd[1]: ovirt-ha-broker.service: Failed with result 'exit-code'.
Jan  1 09:35:28 thor abrt-server[428202]: Deleting problem directory Python3-2021-01-01-09:35:28-428169 (dup of Python3-2020-09-18-14:25:13-1363)
Jan  1 09:35:28 thor systemd[1]: ovirt-ha-broker.service: Service RestartSec=100ms expired, scheduling restart.
Jan  1 09:35:28 thor systemd[1]: ovirt-ha-broker.service: Scheduled restart job, restart counter is at 8235.
Jan  1 09:35:28 thor systemd[1]: Stopped oVirt Hosted Engine High Availability Communications Broker.
Jan  1 09:35:28 thor systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jan  1 09:35:28 thor abrt-server[428202]: /bin/sh: reporter-systemd-journal: command not found
Jan  1 09:35:29 thor systemd[1]: ovirt-ha-agent.service: Service RestartSec=10s expired, scheduling restart.
Jan  1 09:35:29 thor systemd[1]: ovirt-ha-agent.service: Scheduled restart job, restart counter is at 3962.
Jan  1 09:35:29 thor systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
Jan  1 09:35:29 thor systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Jan  1 09:35:30 thor journal[428219]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start necessary monitors
Jan  1 09:35:30 thor journal[428219]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 85, in start_monitor
    response = self._proxy.start_monitor(type, options)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.6/http/client.py", line 1264, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 978, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
    self.sock.connect(base64.b16decode(self.host))
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 561, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 91, in start_monitor
    ).format(t=type, o=options, e=e)
ovirt_hosted_engine_ha.lib.exceptions.RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'addr': '172.16.100.1', 'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}]
Jan  1 09:35:30 thor journal[428219]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
Jan  1 09:35:30 thor systemd[1]: ovirt-ha-agent.service: Main process exited, code=exited, status=157/n/a
Jan  1 09:35:30 thor systemd[1]: ovirt-ha-agent.service: Failed with result 'exit-code'.
Jan  1 09:35:30 thor systemd[1]: Started Session c8243 of user root.
Jan  1 09:35:30 thor systemd[1]: session-c8243.scope: Succeeded.
Jan  1 09:35:31 thor vdsm[7851]: WARN Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished?
Created attachment 1744219 [details]
Logs from hosted-engine broker

Something in the update broke how the gluster mount point for the engine is referenced. Just trying to start my VMs back up :) I am about 90% certain nothing is wrong with the volumes themselves; something that got patched broke how the HA services start the engine.

###
[root@medusa ~]# mount | grep gluster
/dev/mapper/vdo_0900 on /gluster_bricks/gv0 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota,_netdev,x-systemd.requires=vdo.service)
/dev/mapper/gluster_vg_sdb-gluster_lv_engine on /gluster_bricks/engine type xfs (rw,noatime,nodiratime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota,_netdev,x-systemd.requires=vdo.service)
/dev/mapper/gluster_vg_sdb-gluster_lv_vmstore on /gluster_bricks/vmstore type xfs (rw,noatime,nodiratime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota,_netdev,x-systemd.requires=vdo.service)
/dev/mapper/gluster_vg_sdb-gluster_lv_data on /gluster_bricks/data type xfs (rw,noatime,nodiratime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota,_netdev,x-systemd.requires=vdo.service)
thorst.penguinpages.local:/engine on /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev,x-systemd.device-timeout=0)
medusast.penguinpages.local:/data on /media/data type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
medusast.penguinpages.local:/engine on /media/engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
medusast.penguinpages.local:/vmstore on /media/vmstore type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
medusast.penguinpages.local:/gv0 on /media/gv0 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)

[root@medusa ~]# df -h
Filesystem                                      Size  Used  Avail  Use%  Mounted on
devtmpfs                                         32G     0    32G    0%  /dev
tmpfs                                            32G   16K    32G    1%  /dev/shm
tmpfs                                            32G  1.2G    31G    4%  /run
tmpfs                                            32G     0    32G    0%  /sys/fs/cgroup
/dev/mapper/cl-root                              50G  9.4G    41G   19%  /
/dev/mapper/cl-home                             163G  1.2G   162G    1%  /home
/dev/sda2                                       976M  252M   658M   28%  /boot
/dev/sda1                                       599M  6.9M   592M    2%  /boot/efi
/dev/mapper/vdo_0900                            4.0T   29G   4.0T    1%  /gluster_bricks/gv0
/dev/mapper/gluster_vg_sdb-gluster_lv_engine    100G  9.0G    91G    9%  /gluster_bricks/engine
/dev/mapper/gluster_vg_sdb-gluster_lv_vmstore  1000G  249G   752G   25%  /gluster_bricks/vmstore
/dev/mapper/gluster_vg_sdb-gluster_lv_data     1000G  769G   232G   77%  /gluster_bricks/data
thorst.penguinpages.local:/engine               100G   10G    90G   10%  /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_engine
tmpfs                                           6.3G     0   6.3G    0%  /run/user/0
medusast.penguinpages.local:/data              1000G  779G   222G   78%  /media/data
medusast.penguinpages.local:/engine             100G   10G    90G   10%  /media/engine
medusast.penguinpages.local:/vmstore           1000G  259G   742G   26%  /media/vmstore
medusast.penguinpages.local:/gv0                4.0T   70G   4.0T    2%  /media/gv0

[root@medusa ~]# vdostats --human-readable
Device                 Size    Used  Available  Use%  Space saving%
/dev/mapper/vdo_sdb   476.9G  397.3G   79.7G     83%     72%
/dev/mapper/vdo_0900  931.5G    4.8G  926.7G      0%     68%
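To back up the claim that the volumes themselves are fine, a quick health check of the engine volume could look like the following (standard gluster CLI run on any of the three nodes; the volume name "engine" is taken from the mount output above, and this is only a suggested sanity check, not part of the original report):

# All bricks and self-heal daemons for the engine volume should be online
gluster volume status engine

# Pending heals or split-brain entries would point at a real storage problem
gluster volume heal engine info
gluster volume heal engine info split-brain

# All three peers should be in "Peer in Cluster (Connected)" state
gluster peer status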
Questions:
1) Is there any way to manually start VMs while the "engine" is offline?
2) I think this ticket is noting a fix coming in 4.4+?

[root@odin ~]# rpm -qa | grep ovirt
ovirt-imageio-client-2.1.1-1.el8.x86_64
ovirt-vmconsole-1.0.9-1.el8.noarch
ovirt-openvswitch-2.11-0.2020061801.el8.noarch
ovirt-provider-ovn-driver-1.2.33-1.el8.noarch
ovirt-engine-appliance-4.4-20201221110111.1.el8.x86_64
ovirt-host-dependencies-4.4.1-4.el8.x86_64
ovirt-python-openvswitch-2.11-0.2020061801.el8.noarch
ovirt-hosted-engine-ha-2.4.5-1.el8.noarch
ovirt-ansible-collection-1.2.4-1.el8.noarch
ovirt-openvswitch-ovn-common-2.11-0.2020061801.el8.noarch
ovirt-release44-4.4.4-1.el8.noarch
ovirt-openvswitch-ovn-2.11-0.2020061801.el8.noarch
ovirt-openvswitch-ovn-host-2.11-0.2020061801.el8.noarch
python3-ovirt-setup-lib-1.3.2-1.el8.noarch
ovirt-vmconsole-host-1.0.9-1.el8.noarch
ovirt-imageio-common-2.1.1-1.el8.x86_64
ovirt-imageio-daemon-2.1.1-1.el8.x86_64
python3-ovirt-engine-sdk4-4.4.8-1.el8.x86_64

As noted, I am on 4.4-2. I just need to start my VMs again. After three weeks I now have to decide whether to give up on debugging and wipe, because there is no ETA on a fix.
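On question 1: while the engine is down the regular guest VMs cannot be started through the normal API, but the hosted engine VM itself can usually be started by hand from one of the hosts once the HA services and storage are back. A rough sketch of what that attempt looks like, assuming the underlying mount problem has been addressed first (hedged suggestion, not a confirmed procedure for this specific failure):

# Clear global maintenance if it was ever set
hosted-engine --set-maintenance --mode=none

# See what the HA agent currently thinks about the engine VM
hosted-engine --vm-status

# Try to start the engine VM directly on this host
hosted-engine --vm-start

# Once the engine is up it restarts the highly available guests itself;
# a read-only view of what is running locally in the meantime:
virsh -r list --all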
Learning a bit more... I think the root cause is a package conflict:

###########
[root@thor ~]# dnf update
Last metadata expiration check: 2:39:36 ago on Fri 15 Jan 2021 01:17:50 PM EST.
Error:
 Problem 1: package ovirt-host-4.4.1-4.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package cockpit-bridge-234-1.el8.x86_64 conflicts with cockpit-dashboard < 233 provided by cockpit-dashboard-217-1.el8.noarch
  - cannot install the best update candidate for package ovirt-host-4.4.1-4.el8.x86_64
  - cannot install the best update candidate for package cockpit-bridge-217-1.el8.x86_64
 Problem 2: problem with installed package ovirt-host-4.4.1-4.el8.x86_64
  - package ovirt-host-4.4.1-4.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package cockpit-system-234-1.el8.noarch obsoletes cockpit-dashboard provided by cockpit-dashboard-217-1.el8.noarch
  - cannot install the best update candidate for package cockpit-dashboard-217-1.el8.noarch
 Problem 3: package ovirt-hosted-engine-setup-2.4.9-1.el8.noarch requires ovirt-host >= 4.4.0, but none of the providers can be installed
  - package ovirt-host-4.4.1-4.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package ovirt-host-4.4.1-1.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package ovirt-host-4.4.1-2.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package ovirt-host-4.4.1-3.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package cockpit-system-234-1.el8.noarch obsoletes cockpit-dashboard provided by cockpit-dashboard-217-1.el8.noarch
  - cannot install the best update candidate for package ovirt-hosted-engine-setup-2.4.9-1.el8.noarch
  - cannot install the best update candidate for package cockpit-system-217-1.el8.noarch
(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
[root@thor ~]# dnf update --allowerasing
Last metadata expiration check: 2:39:45 ago on Fri 15 Jan 2021 01:17:50 PM EST.
Dependencies resolved.
=========================================================================================
 Package                      Architecture   Version          Repository        Size
=========================================================================================
Upgrading:
 cockpit-bridge               x86_64         234-1.el8        baseos           597 k
 cockpit-system               noarch         234-1.el8        baseos           3.1 M
     replacing  cockpit-dashboard.noarch 217-1.el8
Removing dependent packages:
 cockpit-ovirt-dashboard      noarch         0.14.17-1.el8    @ovirt-4.4        16 M
 ovirt-host                   x86_64         4.4.1-4.el8      @ovirt-4.4        11 k
 ovirt-hosted-engine-setup    noarch         2.4.9-1.el8      @ovirt-4.4       1.3 M

Transaction Summary
=========================================================================================
Upgrade  2 Packages
Remove   3 Packages

#####################
[root@odin ~]# systemctl status ovirt-ha-agent.service
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
    Active: inactive (dead)
 Condition: start condition failed at Fri 2021-01-15 16:02:53 EST; 7s ago
            └─ ConditionFileNotEmpty=/etc/ovirt-hosted-engine/hosted-engine.conf was not met
[root@odin ~]#

If you do force the upgrade, the never-ending log rotation of errors goes away.
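Rather than letting --allowerasing remove ovirt-host and ovirt-hosted-engine-setup as in the transaction above, one workaround is to hold the conflicting cockpit packages back until the packaging conflict is resolved. A sketch using standard dnf options (suggestion only, the package names to exclude depend on what your repos actually ship):

# Update everything except the cockpit packages that obsolete cockpit-dashboard
dnf update --exclude='cockpit*'

# Or make the exclusion persistent until the ovirt-host packaging is fixed
echo 'exclude=cockpit-bridge cockpit-system' >> /etc/dnf/dnf.conf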
Closing as a duplicate of bug #1917011. Please wait before switching to CentOS Stream; compatibility with it is still in tech preview.

*** This bug has been marked as a duplicate of bug 1917011 ***