Bug 1911910
| Summary: | ovirt-ha-broker - Failing to start Post upgrade to streams | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | penguinpages <jeremey.wise> |
| Component: | Python Library | Assignee: | Asaf Rachmani <arachman> |
| Status: | CLOSED DUPLICATE | QA Contact: | meital avital <mavital> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | CC: | ahadas, bugs |
| Version: | 4.4.4.1 | Flags: | pm-rhel: ovirt-4.4+ |
| Target Milestone: | ovirt-4.4.5 | Target Release: | --- |
| Hardware: | x86_64 | OS: | Linux |
| Whiteboard: | | Fixed In Version: | |
| Doc Type: | If docs needed, set a value | Doc Text: | |
| Story Points: | --- | Clone Of: | |
| Environment: | | Last Closed: | 2021-01-18 09:37:57 UTC |
| Type: | Bug | Regression: | --- |
| Mount Type: | --- | Documentation: | --- |
| CRM: | | Verified Versions: | |
| Category: | --- | oVirt Team: | Integration |
| RHEL 7.3 requirements from Atomic Host: | | Cloudforms Team: | --- |
| Target Upstream Version: | | Embargoed: | |
| Attachments: | | | |
The documentation text flag should only be set after the 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Created attachment 1744219 [details]
Logs from hosted-engine broker
Something is wrong with the update referencing the gluster mount point for the engine.

Just trying to start my VMs back up :)

I am about 90% certain nothing is wrong with the volumes. Something patched and broke how HA starts the engine services.
```
[root@medusa ~]# mount | grep gluster
/dev/mapper/vdo_0900 on /gluster_bricks/gv0 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota,_netdev,x-systemd.requires=vdo.service)
/dev/mapper/gluster_vg_sdb-gluster_lv_engine on /gluster_bricks/engine type xfs (rw,noatime,nodiratime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota,_netdev,x-systemd.requires=vdo.service)
/dev/mapper/gluster_vg_sdb-gluster_lv_vmstore on /gluster_bricks/vmstore type xfs (rw,noatime,nodiratime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota,_netdev,x-systemd.requires=vdo.service)
/dev/mapper/gluster_vg_sdb-gluster_lv_data on /gluster_bricks/data type xfs (rw,noatime,nodiratime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota,_netdev,x-systemd.requires=vdo.service)
thorst.penguinpages.local:/engine on /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev,x-systemd.device-timeout=0)
medusast.penguinpages.local:/data on /media/data type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
medusast.penguinpages.local:/engine on /media/engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
medusast.penguinpages.local:/vmstore on /media/vmstore type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
medusast.penguinpages.local:/gv0 on /media/gv0 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
[root@medusa ~]# df -h
Filesystem                                     Size  Used Avail Use% Mounted on
devtmpfs                                        32G     0   32G   0% /dev
tmpfs                                           32G   16K   32G   1% /dev/shm
tmpfs                                           32G  1.2G   31G   4% /run
tmpfs                                           32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/cl-root                             50G  9.4G   41G  19% /
/dev/mapper/cl-home                            163G  1.2G  162G   1% /home
/dev/sda2                                      976M  252M  658M  28% /boot
/dev/sda1                                      599M  6.9M  592M   2% /boot/efi
/dev/mapper/vdo_0900                           4.0T   29G  4.0T   1% /gluster_bricks/gv0
/dev/mapper/gluster_vg_sdb-gluster_lv_engine   100G  9.0G   91G   9% /gluster_bricks/engine
/dev/mapper/gluster_vg_sdb-gluster_lv_vmstore 1000G  249G  752G  25% /gluster_bricks/vmstore
/dev/mapper/gluster_vg_sdb-gluster_lv_data    1000G  769G  232G  77% /gluster_bricks/data
thorst.penguinpages.local:/engine              100G   10G   90G  10% /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_engine
tmpfs                                          6.3G     0  6.3G   0% /run/user/0
medusast.penguinpages.local:/data             1000G  779G  222G  78% /media/data
medusast.penguinpages.local:/engine            100G   10G   90G  10% /media/engine
medusast.penguinpages.local:/vmstore          1000G  259G  742G  26% /media/vmstore
medusast.penguinpages.local:/gv0               4.0T   70G  4.0T   2% /media/gv0
[root@medusa ~]# vdostats --human-readable
Device               Size   Used  Available  Use%  Space saving%
/dev/mapper/vdo_sdb  476.9G 397.3G  79.7G    83%   72%
/dev/mapper/vdo_0900 931.5G   4.8G 926.7G     0%   68%
```
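The `Transport endpoint is not connected` errors later in this report correspond to errno 107 (`ENOTCONN`), which is what a stat on a dead glusterfs FUSE mount returns. As a minimal sketch (not oVirt code; `mount_is_healthy` is a hypothetical helper), a mount point such as the `/rhev/data-center/mnt/glusterSD/...` path above can be probed like this:

```python
import errno
import os

def mount_is_healthy(path):
    """Return True if `path` can be stat()ed, False if the mount is
    dead (ENOTCONN, i.e. 'Transport endpoint is not connected').
    Any other OSError is re-raised."""
    try:
        os.stat(path)
        return True
    except OSError as e:
        if e.errno == errno.ENOTCONN:
            return False
        raise

# A healthy filesystem path stats fine:
print(mount_is_healthy("/"))
```

If this returns False for a gluster mount, a lazy unmount and remount of the FUSE mount is the usual recovery step, independent of the oVirt services on top of it.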
Questions:

1) Is there any way to manually start VMs while the "engine" is offline?
2) I think this ticket is noting a fix to be coming in 4.4+?

```
[root@odin ~]# rpm -qa | grep ovirt
ovirt-imageio-client-2.1.1-1.el8.x86_64
ovirt-vmconsole-1.0.9-1.el8.noarch
ovirt-openvswitch-2.11-0.2020061801.el8.noarch
ovirt-provider-ovn-driver-1.2.33-1.el8.noarch
ovirt-engine-appliance-4.4-20201221110111.1.el8.x86_64
ovirt-host-dependencies-4.4.1-4.el8.x86_64
ovirt-python-openvswitch-2.11-0.2020061801.el8.noarch
ovirt-hosted-engine-ha-2.4.5-1.el8.noarch
ovirt-ansible-collection-1.2.4-1.el8.noarch
ovirt-openvswitch-ovn-common-2.11-0.2020061801.el8.noarch
ovirt-release44-4.4.4-1.el8.noarch
ovirt-openvswitch-ovn-2.11-0.2020061801.el8.noarch
ovirt-openvswitch-ovn-host-2.11-0.2020061801.el8.noarch
python3-ovirt-setup-lib-1.3.2-1.el8.noarch
ovirt-vmconsole-host-1.0.9-1.el8.noarch
ovirt-imageio-common-2.1.1-1.el8.x86_64
ovirt-imageio-daemon-2.1.1-1.el8.x86_64
python3-ovirt-engine-sdk4-4.4.8-1.el8.x86_64
```

As noted, I am on 4.4-2. I just need to start my VMs again, and after three weeks I now have to decide whether to give up debugging and wipe, because there is no ETA on a fix.
Learning a bit more... I think the root cause is package conflict issues:
```
[root@thor ~]# dnf update
Last metadata expiration check: 2:39:36 ago on Fri 15 Jan 2021 01:17:50 PM EST.
Error:
 Problem 1: package ovirt-host-4.4.1-4.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package cockpit-bridge-234-1.el8.x86_64 conflicts with cockpit-dashboard < 233 provided by cockpit-dashboard-217-1.el8.noarch
  - cannot install the best update candidate for package ovirt-host-4.4.1-4.el8.x86_64
  - cannot install the best update candidate for package cockpit-bridge-217-1.el8.x86_64
 Problem 2: problem with installed package ovirt-host-4.4.1-4.el8.x86_64
  - package ovirt-host-4.4.1-4.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package cockpit-system-234-1.el8.noarch obsoletes cockpit-dashboard provided by cockpit-dashboard-217-1.el8.noarch
  - cannot install the best update candidate for package cockpit-dashboard-217-1.el8.noarch
 Problem 3: package ovirt-hosted-engine-setup-2.4.9-1.el8.noarch requires ovirt-host >= 4.4.0, but none of the providers can be installed
  - package ovirt-host-4.4.1-4.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package ovirt-host-4.4.1-1.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package ovirt-host-4.4.1-2.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package ovirt-host-4.4.1-3.el8.x86_64 requires cockpit-dashboard, but none of the providers can be installed
  - package cockpit-system-234-1.el8.noarch obsoletes cockpit-dashboard provided by cockpit-dashboard-217-1.el8.noarch
  - cannot install the best update candidate for package ovirt-hosted-engine-setup-2.4.9-1.el8.noarch
  - cannot install the best update candidate for package cockpit-system-217-1.el8.noarch
(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
[root@thor ~]# dnf update --allowerasing
Last metadata expiration check: 2:39:45 ago on Fri 15 Jan 2021 01:17:50 PM EST.
Dependencies resolved.
================================================================================
 Package                     Architecture  Version          Repository     Size
================================================================================
Upgrading:
 cockpit-bridge              x86_64        234-1.el8        baseos        597 k
 cockpit-system              noarch        234-1.el8        baseos        3.1 M
     replacing  cockpit-dashboard.noarch 217-1.el8
Removing dependent packages:
 cockpit-ovirt-dashboard     noarch        0.14.17-1.el8    @ovirt-4.4     16 M
 ovirt-host                  x86_64        4.4.1-4.el8      @ovirt-4.4     11 k
 ovirt-hosted-engine-setup   noarch        2.4.9-1.el8      @ovirt-4.4    1.3 M

Transaction Summary
================================================================================
Upgrade  2 Packages
Remove   3 Packages
```
```
[root@odin ~]# systemctl status ovirt-ha-agent.service
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: inactive (dead)
Condition: start condition failed at Fri 2021-01-15 16:02:53 EST; 7s ago
           └─ ConditionFileNotEmpty=/etc/ovirt-hosted-engine/hosted-engine.conf was not met
[root@odin ~]#
```
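The `ConditionFileNotEmpty=` line above means systemd declined to start the unit at all, rather than the agent crashing. A rough Python equivalent of the check systemd performs (`condition_file_not_empty` is an illustrative helper, not oVirt or systemd code):

```python
import os

def condition_file_not_empty(path):
    """Mirror systemd's ConditionFileNotEmpty=: the path must exist,
    be a regular file, and have a size greater than zero."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

# systemd skips starting ovirt-ha-agent while this is False for
# /etc/ovirt-hosted-engine/hosted-engine.conf on the host.
print(condition_file_not_empty("/etc/ovirt-hosted-engine/hosted-engine.conf"))
```

So on this host the removal of the ovirt packages appears to have left `/etc/ovirt-hosted-engine/hosted-engine.conf` missing or empty, and the service will stay `inactive (dead)` until that file is restored.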
If you do force the upgrade, the never-ending log rotation of errors goes away.
Closing as duplicate of bug #1917011. Please wait before switching to CentOS Stream; compatibility for it is still in tech preview.

*** This bug has been marked as a duplicate of bug 1917011 ***
Description of problem:

3-node HCI cluster on CentOS 8. Converted to CentOS Stream; now the engine no longer boots. All three nodes show the same cycling events about a python library error, per below.

Version-Release number of selected component (if applicable):

How reproducible: 100%

Steps to Reproduce:
1. Upgrade from CentOS 8 to Stream
2. Update packages. Reboot
3. HCI with Gluster volumes are mounting but engine no longer starts

Actual results:

```
[root@thor ~]# systemctl enable ovirt-ha-agent
[root@thor ~]# systemctl start ovirt-ha-agent
[root@thor ~]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2021-01-01 09:33:34 EST; 6s ago
  Process: 427064 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
 Main PID: 427064 (code=exited, status=157)
    Tasks: 0 (limit: 1235320)
   Memory: 0B
   CGroup: /system.slice/ovirt-ha-agent.service
[root@thor ~]#
[root@thor ~]# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2021-01-01 09:34:06 EST; 3s ago
  Process: 427376 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
 Main PID: 427376 (code=exited, status=157)
    Tasks: 0 (limit: 1235320)
   Memory: 0B
   CGroup: /system.slice/ovirt-ha-agent.service
[root@thor ~]#
```

Expected results:

Nodes boot the HCI engine, which launches the VMs.
Additional info:

```
Jan  1 09:35:28 thor journal[428169]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.broker.Broker ERROR Failed initializing the broker: [Errno 107] Transport endpoint is not connected: '/rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_engine/3afc47ba-afb9-413f-8de5-8d9a2f45ecde/ha_agent/hosted-engine.metadata'
Jan  1 09:35:28 thor journal[428169]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.broker.Broker ERROR Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 64, in run
    self._storage_broker_instance = self._get_storage_broker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 143, in _get_storage_broker
    return storage_broker.StorageBroker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 97, in __init__
    self._backend.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 408, in connect
    self._check_symlinks(self._storage_path, volume.path, service_link)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 105, in _check_symlinks
    os.unlink(service_link)
OSError: [Errno 107] Transport endpoint is not connected: '/rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_engine/3afc47ba-afb9-413f-8de5-8d9a2f45ecde/ha_agent/hosted-engine.metadata'
Jan  1 09:35:28 thor journal[428169]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.broker.Broker ERROR Trying to restart the broker
Jan  1 09:35:28 thor platform-python[428169]: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan  1 09:35:28 thor systemd[1]: ovirt-ha-broker.service: Main process exited, code=exited, status=1/FAILURE
Jan  1 09:35:28 thor systemd[1]: ovirt-ha-broker.service: Failed with result 'exit-code'.
Jan  1 09:35:28 thor abrt-server[428202]: Deleting problem directory Python3-2021-01-01-09:35:28-428169 (dup of Python3-2020-09-18-14:25:13-1363)
Jan  1 09:35:28 thor systemd[1]: ovirt-ha-broker.service: Service RestartSec=100ms expired, scheduling restart.
Jan  1 09:35:28 thor systemd[1]: ovirt-ha-broker.service: Scheduled restart job, restart counter is at 8235.
Jan  1 09:35:28 thor systemd[1]: Stopped oVirt Hosted Engine High Availability Communications Broker.
Jan  1 09:35:28 thor systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jan  1 09:35:28 thor abrt-server[428202]: /bin/sh: reporter-systemd-journal: command not found
Jan  1 09:35:29 thor systemd[1]: ovirt-ha-agent.service: Service RestartSec=10s expired, scheduling restart.
Jan  1 09:35:29 thor systemd[1]: ovirt-ha-agent.service: Scheduled restart job, restart counter is at 3962.
Jan  1 09:35:29 thor systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
Jan  1 09:35:29 thor systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Jan  1 09:35:30 thor journal[428219]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start necessary monitors
Jan  1 09:35:30 thor journal[428219]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 85, in start_monitor
    response = self._proxy.start_monitor(type, options)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.6/http/client.py", line 1264, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 978, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
    self.sock.connect(base64.b16decode(self.host))
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 561, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 91, in start_monitor
    ).format(t=type, o=options, e=e)
ovirt_hosted_engine_ha.lib.exceptions.RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'addr': '172.16.100.1', 'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}]
Jan  1 09:35:30 thor journal[428219]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
Jan  1 09:35:30 thor systemd[1]: ovirt-ha-agent.service: Main process exited, code=exited, status=157/n/a
Jan  1 09:35:30 thor systemd[1]: ovirt-ha-agent.service: Failed with result 'exit-code'.
Jan  1 09:35:30 thor systemd[1]: Started Session c8243 of user root.
Jan  1 09:35:30 thor systemd[1]: session-c8243.scope: Succeeded.
Jan  1 09:35:31 thor vdsm[7851]: WARN Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished?
```