Bug 1221566 - [ovirt-node] node 3.6 (0.999.201505122017.el7.centos) does not work - vdsmd.service: main process exited, code=exited, status=1/FAILURE
Summary: [ovirt-node] node 3.6 (0.999.201505122017.el7.centos) does not work - vdsmd.s...
Keywords:
Status: CLOSED DUPLICATE of bug 1224400
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.6
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.6.0
Assignee: Yaniv Bronhaim
QA Contact: Gil Klein
URL:
Whiteboard: infra
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-05-14 11:37 UTC by Jiri Belka
Modified: 2016-02-10 19:30 UTC (History)
13 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-06-15 15:17:18 UTC
oVirt Team: Infra
Embargoed:


Attachments (Terms of Use)
/var/log (336.90 KB, application/x-gzip)
2015-05-14 11:37 UTC, Jiri Belka
no flags Details

Description Jiri Belka 2015-05-14 11:37:26 UTC
Created attachment 1025359 [details]
/var/log

Description of problem:

i could not add ovirt node into ovirt 3.6:

vdsmd doesn't run, it fails all the time and it is restarted (and fails again), i see no bridge configured

[root@dell-r210ii-13 ~]# systemctl status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: deactivating (stop-post) (Result: exit-code) since Thu 2015-05-14 10:27:47 UTC; 326ms ago
  Process: 20840 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm (code=exited, status=1/FAILURE)
  Process: 20748 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 20840 (code=exited, status=1/FAILURE);         : 20850 (vdsmd_init_comm)
   CGroup: /system.slice/vdsmd.service
           └─control
             ├─20850 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --post-stop
             └─20851 /usr/bin/python /usr/share/vdsm/hooks.pyc after_vdsm_stop

May 14 10:27:45 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running syslog_available
May 14 10:27:45 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running nwfilter
May 14 10:27:46 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running dummybr
May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running load_needed_modules
May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running tune_system
May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running test_space
May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running test_lo
May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com systemd[1]: Started Virtual Desktop Server Manager.
May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com systemd[1]: vdsmd.service: main process exited, code=exited, status=1/FAILURE
May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20850]: vdsm: Running run_final_hooks
[root@dell-r210ii-13 ~]# systemctl status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: activating (start-pre) since Thu 2015-05-14 10:27:48 UTC; 417ms ago
  Process: 20850 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 20840 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm (code=exited, status=1/FAILURE)
 Main PID: 20840 (code=exited, status=1/FAILURE);         : 20861 (vdsmd_init_comm)
   CGroup: /system.slice/vdsmd.service
           └─control
             ├─20861 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
             └─20885 /usr/bin/python /usr/share/vdsm/hooks.pyc before_vdsm_start

May 14 10:27:48 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20861]: vdsm: Running mkdirs
May 14 10:27:48 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20861]: vdsm: Running configure_coredump
May 14 10:27:48 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20861]: vdsm: Running configure_vdsm_logs
May 14 10:27:48 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20861]: vdsm: Running wait_for_network
May 14 10:27:48 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20861]: vdsm: Running run_init_hooks

It seems problem is in /usr/share/vdsm/vdsm python script as it returns 1.

(imo somebody should think how to make these python spaghetti more talkative, no verbose, no debug...)

Version-Release number of selected component (if applicable):
vdsm-4.17.0-783.git79781a1.el7.x86_64 / node 3.6 (0.999.201505122017.el7.centos)

How reproducible:
100%

Steps to Reproduce:
1. try to add this node to ovirt env
2.
3.

Actual results:
non-operational

Expected results:
should work

Additional info:

Comment 1 Fabian Deutsch 2015-05-19 12:29:05 UTC
I'm seeing in messages:
May 14 11:25:09 localhost systemd: Stopping Virtual Desktop Server Manager...
May 14 11:25:09 localhost systemd: Starting Virtual Desktop Server Manager...
May 14 11:25:09 localhost vdsmd_init_common.sh: vdsm: Running mkdirs
May 14 11:25:09 localhost vdsmd_init_common.sh: vdsm: Running configure_coredump
May 14 11:25:09 localhost vdsmd_init_common.sh: vdsm: Running configure_vdsm_logs
May 14 11:25:09 localhost vdsmd_init_common.sh: vdsm: Running wait_for_network
May 14 11:25:09 localhost vdsmd_init_common.sh: vdsm: Running run_init_hooks
May 14 11:25:10 localhost vdsmd_init_common.sh: vdsm: Running upgraded_version_check
May 14 11:25:10 localhost vdsmd_init_common.sh: vdsm: Running check_is_configured
May 14 11:25:11 localhost vdsmd_init_common.sh: vdsm: Running validate_configuration
May 14 11:25:12 localhost vdsmd_init_common.sh: vdsm: Running prepare_transient_repository
May 14 11:25:14 localhost vdsmd_init_common.sh: vdsm: Running syslog_available
May 14 11:25:14 localhost vdsmd_init_common.sh: vdsm: Running nwfilter
May 14 11:25:15 localhost vdsmd_init_common.sh: vdsm: Running dummybr
May 14 11:25:16 localhost vdsmd_init_common.sh: vdsm: Running load_needed_modules
May 14 11:25:16 localhost vdsmd_init_common.sh: vdsm: Running tune_system
May 14 11:25:16 localhost vdsmd_init_common.sh: vdsm: Running test_space
May 14 11:25:16 localhost vdsmd_init_common.sh: vdsm: Running test_lo
May 14 11:25:16 localhost systemd: Started Virtual Desktop Server Manager.
May 14 11:25:16 localhost systemd: vdsmd.service: main process exited, code=exited, status=1/FAILURE

and in supervdsmd log:
MainThread::DEBUG::2015-05-14 10:22:52,859::__init__::47::blivet::(register_device_format) registered device format class USBFS as usbfs
MainThread::DEBUG::2015-05-14 10:22:52,887::storage_log::69::blivet::(log_exception_info) IGNORED:        Caught exception, continuing.
MainThread::DEBUG::2015-05-14 10:22:52,887::storage_log::72::blivet::(log_exception_info) IGNORED:        Problem description: failed to get initiator name from iscsi firmware
MainThread::DEBUG::2015-05-14 10:22:52,887::storage_log::73::blivet::(log_exception_info) IGNORED:        Begin exception details.
MainThread::DEBUG::2015-05-14 10:22:52,888::storage_log::76::blivet::(log_exception_info) IGNORED:            Traceback (most recent call last):
MainThread::DEBUG::2015-05-14 10:22:52,888::storage_log::76::blivet::(log_exception_info) IGNORED:              File "/usr/lib/python2.7/site-packages/blivet/iscsi.py", line 87, in __init__
MainThread::DEBUG::2015-05-14 10:22:52,888::storage_log::76::blivet::(log_exception_info) IGNORED:            IOError: Unknown error
MainThread::DEBUG::2015-05-14 10:22:52,888::storage_log::77::blivet::(log_exception_info) IGNORED:        End exception details.
MainThread::DEBUG::2015-05-14 10:22:52,893::supervdsmServer::473::SuperVdsm.Server::(main) Making sure I'm root - SuperVdsm
MainThread::DEBUG::2015-05-14 10:22:52,893::supervdsmServer::482::SuperVdsm.Server::(main) Parsing cmd args

The iscsi problems looks like the cause for this.

Moving this to vdsm.

Comment 2 Yaniv Bronhaim 2015-06-11 20:35:37 UTC
Jiri, I didn't try to reproduce it and I will - can you point me to the node iso?

this also looks suspicious

May 14 07:49:54 localhost systemd: Started Virtual Desktop Server Manager network restoration.
May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/ppp_a
sync.ko: No such file or directory
May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/ppp_g
eneric.ko: No such file or directory
May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/bsd_c
omp.ko: No such file or directory
May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/ppp_d
eflate.ko: No such file or directory
May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/ppp_m
ppe.ko: No such file or directory
May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/pppox
.ko: No such file or directory
May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/ppp_s
ynctty.ko: No such file or directory
May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/pptp.
ko: No such file or directory
May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/pppoe
.ko: No such file or directory

Comment 4 Yaniv Bronhaim 2015-06-15 12:31:13 UTC
Those the rpms you used but what ovirt node iso did you install, nothing in http://resources.ovirt.org/pub/ovirt-master-snapshot/iso/ovirt-node-iso/ .. if you have this host still up please ping me in irc to connect and see it .
from the logs I can't find what causes vdsmd to stop immediately

Comment 5 Yaniv Bronhaim 2015-06-15 15:03:37 UTC
Sorry, missed the iso rpm there.

I successfully reproduced it, then I switched to vdsm user (su - vdsm -s /bin/sh) then ran /usr/share/vdsm/vdsm and see that it fails while accessing /var/log/ovirt-node.log which is owned by root - I changed it to be owned by vdsm:kvm and then it failed on "Permission denied: '/var/log/ovirt.log'" - changed it also and now it runs successfully 

We should create those two log files during vdsm installation and give them vdsm ownership - otherwise some node scripts write first and create them as root. 

See the following backtrace - we didn't change the imports since 11/2014 - so the regression is in node code which now loads the logger globally so the import  in vdsm/utils.py triggers it in vdsm context ..

Douglas, Fabian - let me know how do you prefer to fix it (in vdsm spec which might be the wrong way, or not loading the load as part of importing ovirtfunctions - which is in node border)

-sh-4.2$ /usr/share/vdsm/vdsm
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 35, in <module>
    from vdsm import utils
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 62, in <module>
    from ovirtnode import ovirtfunctions
  File "/usr/lib/python2.7/site-packages/ovirtnode/ovirtfunctions.py", line 41, in <module>
    from ovirt.node.config import defaults
  File "/usr/lib/python2.7/site-packages/ovirt/node/config/defaults.py", line 22, in <module>
    from ovirt.node.config.network import NicConfig
  File "/usr/lib/python2.7/site-packages/ovirt/node/config/network.py", line 22, in <module>
    from ovirt.node.utils import AugeasWrapper as Augeas, fs, is_fileobj
  File "/usr/lib/python2.7/site-packages/ovirt/node/utils/fs/__init__.py", line 40, in <module>
    LOGGER = log.getLogger(__name__)
  File "/usr/lib/python2.7/site-packages/ovirt/node/log.py", line 77, in getLogger
    configure_logging()
  File "/usr/lib/python2.7/site-packages/ovirt/node/log.py", line 71, in configure_logging
    logging.config.fileConfig(mixedfile, disable_existing_loggers=False)
  File "/usr/lib64/python2.7/logging/config.py", line 78, in fileConfig
    handlers = _install_handlers(cp, formatters)
  File "/usr/lib64/python2.7/logging/config.py", line 156, in _install_handlers
    h = klass(*args)
  File "/usr/lib64/python2.7/logging/handlers.py", line 392, in __init__
    logging.FileHandler.__init__(self, filename, mode, encoding, delay)
  File "/usr/lib64/python2.7/logging/__init__.py", line 902, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib64/python2.7/logging/__init__.py", line 925, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 13] Permission denied: '/var/log/ovirt-node.log'

Comment 6 Douglas Schilling Landgraf 2015-06-15 15:17:18 UTC

*** This bug has been marked as a duplicate of bug 1224400 ***


Note You need to log in before you can comment on or make changes to this bug.