Description of problem:
- Deployed 4.3.0 RC3, ovirt-imageio-daemon-1.4.6-1.el7.noarch
- Rebooted the host.

The imageio daemon fails to start with:

2019-01-29 09:44:37,425 INFO (MainThread) [server] Starting (pid=22613, version=1.4.6)
2019-01-29 09:44:37,427 ERROR (MainThread) [server] Service failed (remote_service=<ovirt_imageio_daemon.server.RemoteService object at 0x7fb406e25050>, local_service=<ovirt_imageio_daemon.server.LocalService object at 0x7fb405bc1810>, control_service=None, running=True)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 58, in main
    start(config)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 99, in start
    control_service = ControlService(config)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 206, in __init__
    config.tickets.socket, uhttp.UnixWSGIRequestHandler)
  File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
    self.server_bind()
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/uhttp.py", line 79, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory

The error gives no indication of which file or directory is missing.
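The traceback ends in `socket.bind()`. Binding an `AF_UNIX` socket fails with `[Errno 2]` when the socket's parent directory does not exist, and the socket module does not attach the path to the error. A minimal Python 3 sketch of a wrapper that re-raises that case with the socket path included; `bind_unix_socket` is a hypothetical helper, not part of ovirt-imageio, and it does not know the daemon's configured `config.tickets.socket` value.

```python
import errno
import os
import socket


def bind_unix_socket(path):
    """Create a Unix stream socket bound to `path` (hypothetical helper).

    socket.bind() reports a missing parent directory as a bare
    "[Errno 2] No such file or directory" with no filename, so this
    wrapper re-raises that case with the path attached.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError as e:
        sock.close()
        if e.errno == errno.ENOENT:
            # OSError(errno, strerror, filename) returns a FileNotFoundError
            # whose str() includes the offending socket path.
            raise OSError(
                e.errno,
                "Cannot bind Unix socket, directory %r does not exist"
                % os.path.dirname(path),
                path,
            ) from e
        raise
    return sock
```

With this wrapper, the log line would name the socket path (and therefore the missing directory) instead of only the errno.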
This happened on only one of the two hosts in my data center, so it is not 100% reproducible.
Sandro, can you share the logs from the host with this issue? I think the output of "journalctl -b failing-boot-id" will be useful. This smells like a duplicate of bug 1639667, but we need the logs to be sure.
Created attachment 1524528: journalctl -b output
Created attachment 1524541: sosreport -o ovirt_imageio,vdsm
In journal.log (attachment 1524528) we see:

gen 29 09:36:31 minidell.home systemd[1]: ovirt-imageio-daemon.service: main process exited, code=exited, status=1/FAILURE
gen 29 09:36:31 minidell.home systemd[1]: Failed to start oVirt ImageIO Daemon.
gen 29 09:36:31 minidell.home systemd[1]: Unit ovirt-imageio-daemon.service entered failed state.
gen 29 09:36:31 minidell.home systemd[1]: ovirt-imageio-daemon.service failed.
...
gen 29 09:36:35 minidell.home vdsm-tool[4930]: Traceback (most recent call last):
gen 29 09:36:35 minidell.home vdsm-tool[4930]: File "/usr/bin/vdsm-tool", line 220, in main
gen 29 09:36:35 minidell.home vdsm-tool[4930]: return tool_command[cmd]["command"](*args)
gen 29 09:36:35 minidell.home vdsm-tool[4930]: File "/usr/lib/python2.7/site-packages/vdsm/tool/network.py", line 96, in dump_bonding_options
gen 29 09:36:35 minidell.home vdsm-tool[4930]: sysfs_options_mapper.dump_bonding_options()
gen 29 09:36:35 minidell.home vdsm-tool[4930]: File "/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py", line 48, in dump_bonding_options
gen 29 09:36:35 minidell.home vdsm-tool[4930]: with open(sysfs_options.BONDING_DEFAULTS, 'w') as f:
gen 29 09:36:35 minidell.home vdsm-tool[4930]: IOError: [Errno 2] No such file or directory: '/var/run/vdsm/bonding-defaults.json'
gen 29 09:36:35 minidell.home systemd[1]: vdsm-network-init.service: main process exited, code=exited, status=1/FAILURE
...
gen 29 09:36:49 minidell.home ovirt-ha-agent[5662]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Service vdsmd is not running and the admin is responsible for starting it.

How is ovirt-imageio-daemon started before vdsm, and before the vdsm run directory is created?
The service should be disabled; it starts when vdsm starts because vdsm wants it:

$ grep ovirt-imageio-daemon static/usr/lib/systemd/system/vdsmd.service
Wants=mom-vdsm.service ovirt-imageio-daemon.service abrtd.service \

Sandro, is ovirt-imageio-daemon.service enabled on this setup? Please share the output of "systemctl show ovirt-imageio-daemon.service".

This looks like a duplicate of bug 1639667, but this is on a hosted engine setup.
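The questions above (is the unit enabled, and what pulls it in) can be answered from `systemctl show`, which prints one `Key=Value` property per line. A minimal Python 3.7+ sketch that runs it with `-p` for selected properties and parses the result; `unit_properties` and `summarize` are hypothetical helpers, while `UnitFileState`, `WantedBy` and `After` are standard unit properties. For the "enabled?" question alone, `systemctl is-enabled ovirt-imageio-daemon.service` is simpler.

```python
import subprocess


def parse_systemctl_show(text):
    """Parse `systemctl show` output (one Key=Value per line) into a dict.

    Values may themselves contain '=', so only the first '=' splits.
    """
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def unit_properties(unit, names):
    """Run `systemctl show UNIT -p NAME ...` and return the parsed properties."""
    cmd = ["systemctl", "show", unit]
    for name in names:
        # -p may be given more than once to select several properties.
        cmd += ["-p", name]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_systemctl_show(out)


def summarize(props):
    """Is the unit file enabled, who wants it, and what is it ordered after?"""
    return {
        "enabled": props.get("UnitFileState") == "enabled",
        "wanted_by": props.get("WantedBy", "").split(),
        "after": props.get("After", "").split(),
    }
```

For example, `summarize(unit_properties("ovirt-imageio-daemon.service", ["UnitFileState", "WantedBy", "After"]))` reports whether the unit file is enabled, plus the units that want it and the units it is ordered after.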
Created attachment 1524624: systemctl show ovirt-imageio-daemon.service
Since you mentioned bug 1639667:

# ls -ld /var/run/vdsm
drwxr-xr-x. 4 vdsm kvm 220 29 gen 10.57 /var/run/vdsm
# ls -lR /var/run/vdsm
/var/run/vdsm:
totale 20
-rw-r--r--. 1 root root 9918 29 gen 10.57 bonding-defaults.json
-rw-r--r--. 1 root root 5742 29 gen 10.57 bonding-name2numeric.json
drwxr-xr-x. 2 root root 40 29 gen 10.57 lvm
srwxr-xr-x. 1 vdsm kvm 0 29 gen 10.57 mom-vdsm.sock
-rw-r--r--. 1 root root 0 29 gen 10.57 nets_restored
drwxr-xr-x. 3 vdsm kvm 60 29 gen 10.57 storage
-rwxr-xr-x. 1 root root 0 29 gen 10.57 supervdsmd.lock
srwxr-xr-x. 1 vdsm kvm 0 29 gen 10.57 svdsm.sock
-rwxr-xr-x. 1 vdsm kvm 0 29 gen 10.57 vdsmd.lock

/var/run/vdsm/lvm:
totale 0

/var/run/vdsm/storage:
totale 0
drwxr-xr-x. 2 vdsm kvm 140 29 gen 11.04 f4764a60-5b3a-4fe1-857b-82a6e3610f54

/var/run/vdsm/storage/f4764a60-5b3a-4fe1-857b-82a6e3610f54:
totale 20
lrwxrwxrwx. 1 vdsm kvm 129 29 gen 11.03 0f8bceaf-23dc-41f2-95d5-b6dddd460154 -> /rhev/data-center/mnt/minidell.home:_home_hosted/f4764a60-5b3a-4fe1-857b-82a6e3610f54/images/0f8bceaf-23dc-41f2-95d5-b6dddd460154
lrwxrwxrwx. 1 vdsm kvm 129 29 gen 11.03 9603469e-a848-4bfb-962d-4642459845ab -> /rhev/data-center/mnt/minidell.home:_home_hosted/f4764a60-5b3a-4fe1-857b-82a6e3610f54/images/9603469e-a848-4bfb-962d-4642459845ab
lrwxrwxrwx. 1 vdsm kvm 129 29 gen 10.57 9ca65d99-11f4-469b-b9bf-3b0dfd314187 -> /rhev/data-center/mnt/minidell.home:_home_hosted/f4764a60-5b3a-4fe1-857b-82a6e3610f54/images/9ca65d99-11f4-469b-b9bf-3b0dfd314187
lrwxrwxrwx. 1 vdsm kvm 129 29 gen 10.57 9eec7b7a-4d7f-4def-85f5-f2b7a6c0dcbe -> /rhev/data-center/mnt/minidell.home:_home_hosted/f4764a60-5b3a-4fe1-857b-82a6e3610f54/images/9eec7b7a-4d7f-4def-85f5-f2b7a6c0dcbe
lrwxrwxrwx. 1 vdsm kvm 129 29 gen 11.03 b12c2e16-8400-4f2c-b4c0-f645110472fa -> /rhev/data-center/mnt/minidell.home:_home_hosted/f4764a60-5b3a-4fe1-857b-82a6e3610f54/images/b12c2e16-8400-4f2c-b4c0-f645110472fa
[root@minidell vdsm]#
I pushed a suggested fix to ensure that the daemon runs after the vdsm service:
https://gerrit.ovirt.org/#/c/97407/

It may be hard to verify manually, but we'll release a new imageio version and see whether the issue reproduces.
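The distinction behind an ordering fix is that systemd's `Wants=` only pulls a unit into the same start transaction; it does not order it, so without an `After=` dependency the wanted unit may start in parallel with the units that prepare its environment. A toy Python 3.9+ model of that rule (not systemd itself) groups a transaction into start "waves" using only `After=` edges. `rundir-setup.service` is a hypothetical stand-in for whatever creates `/var/run/vdsm`, the dependencies are simplified rather than copied from the real unit files, and the `After=vdsmd.service` variant is an assumption about the shape of the change, not the patch itself.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def transaction(units, target):
    """Units pulled in by starting `target`, following Wants= transitively."""
    pending, jobs = [target], set()
    while pending:
        name = pending.pop()
        if name not in jobs:
            jobs.add(name)
            pending.extend(units[name].get("wants", []))
    return jobs


def start_waves(units, target):
    """Group the transaction into waves that may start concurrently.

    Only After= creates ordering; Wants= alone leaves units unordered.
    After= on a unit outside the transaction is ignored.
    """
    jobs = transaction(units, target)
    ts = TopologicalSorter()
    for name in jobs:
        ts.add(name, *(d for d in units[name].get("after", []) if d in jobs))
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves


# Hypothetical, simplified units (not the real vdsm/imageio unit files).
WANTS_ONLY = {
    "vdsmd.service": {
        "wants": ["rundir-setup.service", "ovirt-imageio-daemon.service"],
        "after": ["rundir-setup.service"],
    },
    "rundir-setup.service": {},
    "ovirt-imageio-daemon.service": {},
}

# Same units, with the daemon additionally ordered After=vdsmd.service.
WITH_AFTER = dict(WANTS_ONLY)
WITH_AFTER["ovirt-imageio-daemon.service"] = {"after": ["vdsmd.service"]}
```

In this model, `start_waves(WANTS_ONLY, "vdsmd.service")` puts the daemon in the same first wave as the unit that creates the run directory, which is consistent with an intermittent failure; `start_waves(WITH_AFTER, "vdsmd.service")` starts it only after `vdsmd.service`. Real systemd also weighs `Requires=`, `Before=` and job completion semantics that this sketch ignores.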
The referenced build was included in oVirt 4.3.2; moving to QA.
Verified on ovirt-imageio-daemon-1.5.1-0.
This bugzilla is included in oVirt 4.3.2 release, published on March 19th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.