Bug 1670312 - ovirt-imageio-daemon fails to start after reboot
Summary: ovirt-imageio-daemon fails to start after reboot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-imageio
Classification: oVirt
Component: Daemon
Version: 1.4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.3.2
Target Release: ---
Assignee: Daniel Erez
QA Contact: Evelina Shames
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-01-29 08:49 UTC by Sandro Bonazzola
Modified: 2019-03-26 07:20 UTC (History)

Fixed In Version: v1.5.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-26 07:20:51 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.3+


Attachments
journalctl -b output (244.70 KB, application/x-xz)
2019-01-29 09:58 UTC, Sandro Bonazzola
sosreport -o ovirt_imageio,vdsm (5.72 MB, application/x-xz)
2019-01-29 10:01 UTC, Sandro Bonazzola
systemctl show ovirt-imageio-daemon.service (1.44 KB, application/x-xz)
2019-01-29 13:51 UTC, Sandro Bonazzola


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 97407 | 0 | 'None' | MERGED | daemon: run service after vdsm | 2020-11-26 18:39:59 UTC

Description Sandro Bonazzola 2019-01-29 08:49:09 UTC
Description of problem:
- Deployed 4.3.0 RC3, ovirt-imageio-daemon-1.4.6-1.el7.noarch
- Rebooted the host.

The imageio daemon fails to start with:
2019-01-29 09:44:37,425 INFO    (MainThread) [server] Starting (pid=22613, version=1.4.6)
2019-01-29 09:44:37,427 ERROR   (MainThread) [server] Service failed (remote_service=<ovirt_imageio_daemon.server.RemoteService object at 0x7fb406e25050>, local_service=<ovirt_imageio_daemon.server.LocalService object at 0x7fb405bc1810>, control_service=None, running=True)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 58, in main
    start(config)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 99, in start
    control_service = ControlService(config)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 206, in __init__
    config.tickets.socket, uhttp.UnixWSGIRequestHandler)
  File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
    self.server_bind()
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/uhttp.py", line 79, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory

The error gives no indication of which file or directory is missing.
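
The missing path can be recovered from the traceback: the failure is in socket.bind() for the control service's Unix socket (config.tickets.socket), and bind() errors do not include the path. A minimal sketch of how a bind helper could report the path, written for Python 3 rather than the daemon's Python 2.7; bind_unix_socket, demo, and the socket path are hypothetical and not taken from the imageio code:

```python
import errno
import os
import socket
import tempfile


def bind_unix_socket(path):
    """Bind a Unix socket, adding the path to any bind() error."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError as e:
        sock.close()
        # bind() errors do not carry the path, so add it explicitly.
        raise OSError(e.errno, "Cannot bind %r: %s" % (path, e.strerror))
    return sock


def demo():
    # Hypothetical path: the parent directory is intentionally missing,
    # like a run directory that was not created yet after a reboot.
    base = tempfile.mkdtemp()
    path = os.path.join(base, "missing", "daemon.sock")
    try:
        bind_unix_socket(path)
    except OSError as e:
        return e.errno, str(e)
    finally:
        os.rmdir(base)
```

Calling demo() returns the ENOENT error number together with a message that names the socket path, which is the detail missing from the log above.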

Comment 1 Sandro Bonazzola 2019-01-29 08:57:09 UTC
This happened on only one of the 2 hosts in my datacenter, so it is not 100% reproducible.

Comment 2 Nir Soffer 2019-01-29 09:45:59 UTC
Sandro, can you share the logs from the host with this issue?

I think the output of "journalctl -b failing-boot-id" will be useful.

This smells like a duplicate of bug 1639667, but we need the logs
to be sure.

Comment 3 Sandro Bonazzola 2019-01-29 09:58:18 UTC
Created attachment 1524528 [details]
journalctl -b output

Comment 4 Sandro Bonazzola 2019-01-29 10:01:35 UTC
Created attachment 1524541 [details]
sosreport -o ovirt_imageio,vdsm

Comment 5 Nir Soffer 2019-01-29 10:17:16 UTC
In journal.log (attachment 1524528 [details]) we see:

gen 29 09:36:31 minidell.home systemd[1]: ovirt-imageio-daemon.service: main process exited, code=exited, status=1/FAILURE
gen 29 09:36:31 minidell.home systemd[1]: Failed to start oVirt ImageIO Daemon.
gen 29 09:36:31 minidell.home systemd[1]: Unit ovirt-imageio-daemon.service entered failed state.
gen 29 09:36:31 minidell.home systemd[1]: ovirt-imageio-daemon.service failed.
...
gen 29 09:36:35 minidell.home vdsm-tool[4930]: Traceback (most recent call last):
gen 29 09:36:35 minidell.home vdsm-tool[4930]: File "/usr/bin/vdsm-tool", line 220, in main
gen 29 09:36:35 minidell.home vdsm-tool[4930]: return tool_command[cmd]["command"](*args)
gen 29 09:36:35 minidell.home vdsm-tool[4930]: File "/usr/lib/python2.7/site-packages/vdsm/tool/network.py", line 96, in dump_bonding_options
gen 29 09:36:35 minidell.home vdsm-tool[4930]: sysfs_options_mapper.dump_bonding_options()
gen 29 09:36:35 minidell.home vdsm-tool[4930]: File "/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py", line 48, in dump_bonding_options
gen 29 09:36:35 minidell.home vdsm-tool[4930]: with open(sysfs_options.BONDING_DEFAULTS, 'w') as f:
gen 29 09:36:35 minidell.home vdsm-tool[4930]: IOError: [Errno 2] No such file or directory: '/var/run/vdsm/bonding-defaults.json'
gen 29 09:36:35 minidell.home systemd[1]: vdsm-network-init.service: main process exited, code=exited, status=1/FAILURE
...
gen 29 09:36:49 minidell.home ovirt-ha-agent[5662]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Service vdsmd is not running and the admin is responsible for starting it.

How was ovirt-imageio-daemon started before vdsm, and before the vdsm run
directory was created? The service should be disabled, so that it starts only
when vdsm starts, because vdsm wants it:

$ grep ovirt-imageio-daemon static/usr/lib/systemd/system/vdsmd.service
Wants=mom-vdsm.service ovirt-imageio-daemon.service abrtd.service \

Sandro, is ovirt-imageio-daemon.service enabled on this setup?

Please share the output of "systemctl show ovirt-imageio-daemon.service"

This looks like a duplicate of bug 1639667, but this is on hosted engine.
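
For reading the requested output, a small sketch may help. "systemctl show" prints one KEY=VALUE property per line, and the UnitFileState and After properties answer the two questions above. The helper names and the sample text are hypothetical; the sample is not taken from the attached output:

```python
def parse_systemctl_show(text):
    """Parse KEY=VALUE lines printed by "systemctl show" into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def started_independently(props):
    """Return True if the unit is enabled to start on its own at boot."""
    return props.get("UnitFileState") == "enabled"


def ordered_after(props, unit):
    """Return True if the unit is ordered after the given unit."""
    return unit in props.get("After", "").split()


# Hypothetical sample, NOT taken from the attachment in this bug.
SAMPLE = """\
Id=ovirt-imageio-daemon.service
UnitFileState=enabled
After=basic.target system.slice systemd-journald.socket
"""

props = parse_systemctl_show(SAMPLE)
```

With this sample, started_independently(props) is true and ordered_after(props, "vdsmd.service") is false, which is the combination that would let the daemon start before vdsm creates its run directory.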

Comment 6 Sandro Bonazzola 2019-01-29 13:51:54 UTC
Created attachment 1524624 [details]
systemctl show ovirt-imageio-daemon.service

Comment 7 Sandro Bonazzola 2019-01-29 13:54:17 UTC
Since you mentioned bug #1639667, here is the current state of /var/run/vdsm:
 # ls -ld /var/run/vdsm
 drwxr-xr-x. 4 vdsm kvm 220 29 gen 10.57 /var/run/vdsm

# ls -lR /var/run/vdsm
/var/run/vdsm:
totale 20
-rw-r--r--. 1 root root 9918 29 gen 10.57 bonding-defaults.json
-rw-r--r--. 1 root root 5742 29 gen 10.57 bonding-name2numeric.json
drwxr-xr-x. 2 root root   40 29 gen 10.57 lvm
srwxr-xr-x. 1 vdsm kvm     0 29 gen 10.57 mom-vdsm.sock
-rw-r--r--. 1 root root    0 29 gen 10.57 nets_restored
drwxr-xr-x. 3 vdsm kvm    60 29 gen 10.57 storage
-rwxr-xr-x. 1 root root    0 29 gen 10.57 supervdsmd.lock
srwxr-xr-x. 1 vdsm kvm     0 29 gen 10.57 svdsm.sock
-rwxr-xr-x. 1 vdsm kvm     0 29 gen 10.57 vdsmd.lock

/var/run/vdsm/lvm:
totale 0

/var/run/vdsm/storage:
totale 0
drwxr-xr-x. 2 vdsm kvm 140 29 gen 11.04 f4764a60-5b3a-4fe1-857b-82a6e3610f54

/var/run/vdsm/storage/f4764a60-5b3a-4fe1-857b-82a6e3610f54:
totale 20
lrwxrwxrwx. 1 vdsm kvm 129 29 gen 11.03 0f8bceaf-23dc-41f2-95d5-b6dddd460154 -> /rhev/data-center/mnt/minidell.home:_home_hosted/f4764a60-5b3a-4fe1-857b-82a6e3610f54/images/0f8bceaf-23dc-41f2-95d5-b6dddd460154
lrwxrwxrwx. 1 vdsm kvm 129 29 gen 11.03 9603469e-a848-4bfb-962d-4642459845ab -> /rhev/data-center/mnt/minidell.home:_home_hosted/f4764a60-5b3a-4fe1-857b-82a6e3610f54/images/9603469e-a848-4bfb-962d-4642459845ab
lrwxrwxrwx. 1 vdsm kvm 129 29 gen 10.57 9ca65d99-11f4-469b-b9bf-3b0dfd314187 -> /rhev/data-center/mnt/minidell.home:_home_hosted/f4764a60-5b3a-4fe1-857b-82a6e3610f54/images/9ca65d99-11f4-469b-b9bf-3b0dfd314187
lrwxrwxrwx. 1 vdsm kvm 129 29 gen 10.57 9eec7b7a-4d7f-4def-85f5-f2b7a6c0dcbe -> /rhev/data-center/mnt/minidell.home:_home_hosted/f4764a60-5b3a-4fe1-857b-82a6e3610f54/images/9eec7b7a-4d7f-4def-85f5-f2b7a6c0dcbe
lrwxrwxrwx. 1 vdsm kvm 129 29 gen 11.03 b12c2e16-8400-4f2c-b4c0-f645110472fa -> /rhev/data-center/mnt/minidell.home:_home_hosted/f4764a60-5b3a-4fe1-857b-82a6e3610f54/images/b12c2e16-8400-4f2c-b4c0-f645110472fa
[root@minidell vdsm]#

Comment 8 Daniel Erez 2019-01-29 15:51:40 UTC
Pushed a suggested fix to ensure that the daemon starts after the vdsm service:
https://gerrit.ovirt.org/#/c/97407/

This could be hard to verify manually, but we'll release a new imageio version and see whether the issue reproduces.
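
One partial check that does not require reproducing the race is to inspect the installed unit file for an ordering dependency. In systemd, Wants= only pulls a unit in and does not order it; ordering needs After= or Before=. The sketch below assumes the patch adds After=vdsmd.service to the daemon's unit file; the patch title only says "run service after vdsm", so the exact directive is an assumption, as are the function name and the sample unit text:

```python
def ordering_dependencies(unit_text):
    """Return unit names listed in After= lines of the [Unit] section.

    Simplified: ignores line continuations and empty "After=" resets.
    """
    units = []
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if sep and section == "Unit" and key.strip() == "After":
            units.extend(value.split())
    return units


# Hypothetical unit text; the After= line is an assumed form of the fix.
SAMPLE_UNIT = """\
[Unit]
Description=oVirt ImageIO Daemon
After=vdsmd.service
"""

after = ordering_dependencies(SAMPLE_UNIT)
```

If "vdsmd.service" appears in the returned list for the real unit file, systemd will not start the daemon until vdsmd has been started, whenever both are part of the same boot transaction.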

Comment 11 Sandro Bonazzola 2019-03-22 11:48:24 UTC
Referenced build was included in oVirt 4.3.2, moving to QA

Comment 14 Evelina Shames 2019-03-24 11:46:26 UTC
Verified on ovirt-imageio-daemon-1.5.1-0.

Comment 15 Sandro Bonazzola 2019-03-26 07:20:51 UTC
This bug is included in the oVirt 4.3.2 release, published on March 19th 2019.

Since the problem described in this bug report should be
resolved in the oVirt 4.3.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

