Created attachment 1025359 [details] /var/log Description of problem: i could not add ovirt node into ovirt 3.6: vdsmd doesn't run, it fails all the time and it is restarted (and fails again), i see no bridge configured [root@dell-r210ii-13 ~]# systemctl status vdsmd vdsmd.service - Virtual Desktop Server Manager Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled) Active: deactivating (stop-post) (Result: exit-code) since Thu 2015-05-14 10:27:47 UTC; 326ms ago Process: 20840 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm (code=exited, status=1/FAILURE) Process: 20748 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS) Main PID: 20840 (code=exited, status=1/FAILURE); : 20850 (vdsmd_init_comm) CGroup: /system.slice/vdsmd.service └─control ├─20850 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --post-stop └─20851 /usr/bin/python /usr/share/vdsm/hooks.pyc after_vdsm_stop May 14 10:27:45 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running syslog_available May 14 10:27:45 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running nwfilter May 14 10:27:46 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running dummybr May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running load_needed_modules May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running tune_system May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running test_space May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20748]: vdsm: Running test_lo May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com systemd[1]: Started Virtual Desktop Server Manager. May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com systemd[1]: vdsmd.service: main process exited, code=exited, status=1/FAILURE May 14 10:27:47 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20850]: vdsm: Running run_final_hooks [root@dell-r210ii-13 ~]# systemctl status vdsmd vdsmd.service - Virtual Desktop Server Manager Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled) Active: activating (start-pre) since Thu 2015-05-14 10:27:48 UTC; 417ms ago Process: 20850 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS) Process: 20840 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm (code=exited, status=1/FAILURE) Main PID: 20840 (code=exited, status=1/FAILURE); : 20861 (vdsmd_init_comm) CGroup: /system.slice/vdsmd.service └─control ├─20861 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start └─20885 /usr/bin/python /usr/share/vdsm/hooks.pyc before_vdsm_start May 14 10:27:48 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20861]: vdsm: Running mkdirs May 14 10:27:48 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20861]: vdsm: Running configure_coredump May 14 10:27:48 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20861]: vdsm: Running configure_vdsm_logs May 14 10:27:48 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20861]: vdsm: Running wait_for_network May 14 10:27:48 dell-r210ii-13.rhev.lab.eng.brq.redhat.com vdsmd_init_common.sh[20861]: vdsm: Running run_init_hooks It seems problem is in /usr/share/vdsm/vdsm python script as it returns 1. (imo somebody should think how to make these python spaghetti more talkative, no verbose, no debug...) Version-Release number of selected component (if applicable): vdsm-4.17.0-783.git79781a1.el7.x86_64 / node 3.6 (0.999.201505122017.el7.centos) How reproducible: 100% Steps to Reproduce: 1. try to add this node to ovirt env 2. 3. Actual results: non-operational Expected results: should work Additional info:
I'm seeing in messages: May 14 11:25:09 localhost systemd: Stopping Virtual Desktop Server Manager... May 14 11:25:09 localhost systemd: Starting Virtual Desktop Server Manager... May 14 11:25:09 localhost vdsmd_init_common.sh: vdsm: Running mkdirs May 14 11:25:09 localhost vdsmd_init_common.sh: vdsm: Running configure_coredump May 14 11:25:09 localhost vdsmd_init_common.sh: vdsm: Running configure_vdsm_logs May 14 11:25:09 localhost vdsmd_init_common.sh: vdsm: Running wait_for_network May 14 11:25:09 localhost vdsmd_init_common.sh: vdsm: Running run_init_hooks May 14 11:25:10 localhost vdsmd_init_common.sh: vdsm: Running upgraded_version_check May 14 11:25:10 localhost vdsmd_init_common.sh: vdsm: Running check_is_configured May 14 11:25:11 localhost vdsmd_init_common.sh: vdsm: Running validate_configuration May 14 11:25:12 localhost vdsmd_init_common.sh: vdsm: Running prepare_transient_repository May 14 11:25:14 localhost vdsmd_init_common.sh: vdsm: Running syslog_available May 14 11:25:14 localhost vdsmd_init_common.sh: vdsm: Running nwfilter May 14 11:25:15 localhost vdsmd_init_common.sh: vdsm: Running dummybr May 14 11:25:16 localhost vdsmd_init_common.sh: vdsm: Running load_needed_modules May 14 11:25:16 localhost vdsmd_init_common.sh: vdsm: Running tune_system May 14 11:25:16 localhost vdsmd_init_common.sh: vdsm: Running test_space May 14 11:25:16 localhost vdsmd_init_common.sh: vdsm: Running test_lo May 14 11:25:16 localhost systemd: Started Virtual Desktop Server Manager. May 14 11:25:16 localhost systemd: vdsmd.service: main process exited, code=exited, status=1/FAILURE and in supervdsmd log: MainThread::DEBUG::2015-05-14 10:22:52,859::__init__::47::blivet::(register_device_format) registered device format class USBFS as usbfs MainThread::DEBUG::2015-05-14 10:22:52,887::storage_log::69::blivet::(log_exception_info) IGNORED: Caught exception, continuing. MainThread::DEBUG::2015-05-14 10:22:52,887::storage_log::72::blivet::(log_exception_info) IGNORED: Problem description: failed to get initiator name from iscsi firmware MainThread::DEBUG::2015-05-14 10:22:52,887::storage_log::73::blivet::(log_exception_info) IGNORED: Begin exception details. MainThread::DEBUG::2015-05-14 10:22:52,888::storage_log::76::blivet::(log_exception_info) IGNORED: Traceback (most recent call last): MainThread::DEBUG::2015-05-14 10:22:52,888::storage_log::76::blivet::(log_exception_info) IGNORED: File "/usr/lib/python2.7/site-packages/blivet/iscsi.py", line 87, in __init__ MainThread::DEBUG::2015-05-14 10:22:52,888::storage_log::76::blivet::(log_exception_info) IGNORED: IOError: Unknown error MainThread::DEBUG::2015-05-14 10:22:52,888::storage_log::77::blivet::(log_exception_info) IGNORED: End exception details. MainThread::DEBUG::2015-05-14 10:22:52,893::supervdsmServer::473::SuperVdsm.Server::(main) Making sure I'm root - SuperVdsm MainThread::DEBUG::2015-05-14 10:22:52,893::supervdsmServer::482::SuperVdsm.Server::(main) Parsing cmd args The iscsi problems looks like the cause for this. Moving this to vdsm.
Jiri, I didn't try to reproduce it and I will - can you point me to the node iso? this also looks suspicious May 14 07:49:54 localhost systemd: Started Virtual Desktop Server Manager network restoration. May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/ppp_a sync.ko: No such file or directory May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/ppp_g eneric.ko: No such file or directory May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/bsd_c omp.ko: No such file or directory May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/ppp_d eflate.ko: No such file or directory May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/ppp_m ppe.ko: No such file or directory May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/pppox .ko: No such file or directory May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/ppp_s ynctty.ko: No such file or directory May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/pptp. ko: No such file or directory May 14 07:49:54 localhost kdumpctl: /usr/lib/dracut/modules.d/40network/module-setup.sh: line 36: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/net/ppp/pppoe .ko: No such file or directory
http://bob.eng.lab.tlv.redhat.com/builds/3.6/3.6.0/3.6.0-1/snapshot/rpm/el7/noarch/ or newer... http://resources.ovirt.org/pub/ovirt-master-snapshot/rpm/el7/noarch/
Those the rpms you used but what ovirt node iso did you install, nothing in http://resources.ovirt.org/pub/ovirt-master-snapshot/iso/ovirt-node-iso/ .. if you have this host still up please ping me in irc to connect and see it . from the logs I can't find what causes vdsmd to stop immediately
Sorry, missed the iso rpm there. I successfully reproduced it, then I switched to vdsm user (su - vdsm -s /bin/sh) then ran /usr/share/vdsm/vdsm and see that it fails while accessing /var/log/ovirt-node.log which is owned by root - I changed it to be owned by vdsm:kvm and then it failed on "Permission denied: '/var/log/ovirt.log'" - changed it also and now it runs successfully We should create those two log files during vdsm installation and give them vdsm ownership - otherwise some node scripts write first and create them as root. See the following backtrace - we didn't change the imports since 11/2014 - so the regression is in node code which now loads the logger globally so the import in vdsm/utils.py triggers it in vdsm context .. Douglas, Fabian - let me know how do you prefer to fix it (in vdsm spec which might be the wrong way, or not loading the load as part of importing ovirtfunctions - which is in node border) -sh-4.2$ /usr/share/vdsm/vdsm Traceback (most recent call last): File "/usr/share/vdsm/vdsm", line 35, in <module> from vdsm import utils File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 62, in <module> from ovirtnode import ovirtfunctions File "/usr/lib/python2.7/site-packages/ovirtnode/ovirtfunctions.py", line 41, in <module> from ovirt.node.config import defaults File "/usr/lib/python2.7/site-packages/ovirt/node/config/defaults.py", line 22, in <module> from ovirt.node.config.network import NicConfig File "/usr/lib/python2.7/site-packages/ovirt/node/config/network.py", line 22, in <module> from ovirt.node.utils import AugeasWrapper as Augeas, fs, is_fileobj File "/usr/lib/python2.7/site-packages/ovirt/node/utils/fs/__init__.py", line 40, in <module> LOGGER = log.getLogger(__name__) File "/usr/lib/python2.7/site-packages/ovirt/node/log.py", line 77, in getLogger configure_logging() File "/usr/lib/python2.7/site-packages/ovirt/node/log.py", line 71, in configure_logging logging.config.fileConfig(mixedfile, disable_existing_loggers=False) File "/usr/lib64/python2.7/logging/config.py", line 78, in fileConfig handlers = _install_handlers(cp, formatters) File "/usr/lib64/python2.7/logging/config.py", line 156, in _install_handlers h = klass(*args) File "/usr/lib64/python2.7/logging/handlers.py", line 392, in __init__ logging.FileHandler.__init__(self, filename, mode, encoding, delay) File "/usr/lib64/python2.7/logging/__init__.py", line 902, in __init__ StreamHandler.__init__(self, self._open()) File "/usr/lib64/python2.7/logging/__init__.py", line 925, in _open stream = open(self.baseFilename, self.mode) IOError: [Errno 13] Permission denied: '/var/log/ovirt-node.log'
*** This bug has been marked as a duplicate of bug 1224400 ***