Description of problem:
When vdsm is started after being removed and reinstalled, it fails to connect to libvirtd.

Version-Release number of selected component (if applicable):
vdsm: 5ffd93d758c3c66

How reproducible:
Always

Steps to Reproduce:
1. Start with previous vdsm build installed and running.
2. yum remove vdsm*
3. yum install --enablerepo=ovirt-beta x86_64/* noarch/vdsm-xml* noarch/vdsm-cli*
4. service vdsmd start

Actual results:
Service fails

Expected results:
Service starts normally

Additional info:
The second attempt to start vdsmd succeeds.

Example shell output:

Dependencies Resolved

=========================================================================================================
 Package              Arch     Version                     Repository                                             Size
=========================================================================================================
Installing:
 vdsm                 x86_64   4.12.0-159.git5ffd93d.el6   /vdsm-4.12.0-159.git5ffd93d.el6.x86_64                 4.0 M
 vdsm-cli             noarch   4.12.0-159.git5ffd93d.el6   /vdsm-cli-4.12.0-159.git5ffd93d.el6.noarch             362 k
 vdsm-python          x86_64   4.12.0-159.git5ffd93d.el6   /vdsm-python-4.12.0-159.git5ffd93d.el6.x86_64          440 k
 vdsm-python-cpopen   x86_64   4.12.0-159.git5ffd93d.el6   /vdsm-python-cpopen-4.12.0-159.git5ffd93d.el6.x86_64   34 k
 vdsm-xmlrpc          noarch   4.12.0-159.git5ffd93d.el6   /vdsm-xmlrpc-4.12.0-159.git5ffd93d.el6.noarch          123 k

Transaction Summary
=========================================================================================================
Install 5 Package(s)

Total size: 5.0 M
Installed size: 5.0 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : vdsm-python-4.12.0-159.git5ffd93d.el6.x86_64 1/5
  Installing : vdsm-python-cpopen-4.12.0-159.git5ffd93d.el6.x86_64 2/5
  Installing : vdsm-xmlrpc-4.12.0-159.git5ffd93d.el6.noarch 3/5
  Installing : vdsm-4.12.0-159.git5ffd93d.el6.x86_64 4/5
  Installing : vdsm-cli-4.12.0-159.git5ffd93d.el6.noarch 5/5
  Verifying : vdsm-python-4.12.0-159.git5ffd93d.el6.x86_64 1/5
  Verifying : vdsm-python-cpopen-4.12.0-159.git5ffd93d.el6.x86_64 2/5
  Verifying : vdsm-4.12.0-159.git5ffd93d.el6.x86_64 3/5
  Verifying : vdsm-cli-4.12.0-159.git5ffd93d.el6.noarch 4/5
  Verifying : vdsm-xmlrpc-4.12.0-159.git5ffd93d.el6.noarch 5/5

Installed:
  vdsm.x86_64 0:4.12.0-159.git5ffd93d.el6
  vdsm-cli.noarch 0:4.12.0-159.git5ffd93d.el6
  vdsm-python.x86_64 0:4.12.0-159.git5ffd93d.el6
  vdsm-python-cpopen.x86_64 0:4.12.0-159.git5ffd93d.el6
  vdsm-xmlrpc.noarch 0:4.12.0-159.git5ffd93d.el6

Complete!
[root@dhcp-2-233 RPMS]# service vdsmd stop; echo $?
Shutting down vdsm daemon:
vdsm watchdog stop [ OK ]
vdsm: not running [FAILED]
vdsm: Running run_final_hooks
vdsm stop [ OK ]
0
[root@dhcp-2-233 RPMS]# service vdsmd start
supervdsm start [ OK ]
vdsm: Running run_init_hooks
vdsm: Running gencerts
vdsm: Running reconfigure_sanlock
vdsm: Running reconfigure_libvirt
Stopping libvirtd daemon: [ OK ]
Reconfiguration of libvirt is done.

To start working with the new configuration, execute:
'vdsm-tool libvirt-configure-services-restart'
This will manage restarting of the following services:
libvirtd, supervdsmd

diff: /etc/init/libvirtd.conf: No such file or directory
vdsm: Running syslog_available
vdsm: Running nwfilter
libvir: XML-RPC error : Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
libvir: XML-RPC error : Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
... (the same line repeats dozens of times before the traceback)
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 143, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 140, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/nwfilter.py", line 35, in main
    conn = libvirtconnection.get(None, False)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 125, in get
    conn = utils.retry(libvirtOpenAuth, timeout=10, sleep=0.2)
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 903, in retry
    return func()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth
    if ret is None:raise libvirtError('virConnectOpenAuth() failed')
libvirt.libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
vdsm: failed to execute nwfilter, error code 1

Second run:

Starting up vdsm daemon:
vdsm start [ OK ]
[root@dhcp-2-233 RPMS]# service vdsmd stop
Shutting down vdsm daemon:
vdsm watchdog stop [ OK ]
vdsm: Running run_final_hooks [ OK ]
vdsm stop [ OK ]
[root@dhcp-2-233 RPMS]# service vdsmd start
libvirtd start/running, process 25651
vdsm: Running run_init_hooks
vdsm: Running gencerts
vdsm: Running reconfigure_sanlock
vdsm: Running reconfigure_libvirt
libvirt is already configured for vdsm
vdsm: Running syslog_available
vdsm: Running nwfilter
libvir: Network Filter Driver error : Requested operation is not valid: nwfilter is in use
vdsm: Running dummybr
vdsm: Running load_needed_modules
vdsm: Running tune_system
vdsm: Running mkdirs
vdsm: Running test_space
vdsm: Running test_lo
vdsm: Running test_conflicting_conf
SUCCESS: ssl configured to true. No conflicts
Starting up vdsm daemon:
vdsm start [ OK ]
alonbl notes that restarting services in general is not acceptable, since vdsm also runs under systemd, which takes care of dependencies between services. He also mentions that during bootstrap we call libvirt-reconfigure, and that it is valid to fail vdsmd start if libvirt-reconfigure was not performed before the start. The bug description here is about installing and starting the service without the bootstrap flow, so if we decide to remove the reconfigure from pre-start, the expected result will be a failed start with a message to run libvirt-reconfigure first. Any comments on that?
So what is the missing command ensuring the bootstrap flow?
After installing Vdsm manually, one should currently run
# vdsm-tool libvirt-configure
# vdsm-tool libvirt-configure-services-restart
btw, it is unfortunate that we do not have an easy way to tell whether `vdsm-tool libvirt-configure` bailed out early because libvirt and co. were already configured. In such a case, `vdsm-tool libvirt-configure-services-restart` would most likely be unnecessary.
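One way to picture the missing piece is a configure step that reports whether it changed anything, so the caller restarts services only when needed. This is a minimal sketch under stated assumptions: `configure`, `configure_and_maybe_restart` and the dict-based settings are hypothetical illustrations and are not part of vdsm-tool.

```python
def configure(current, desired):
    """Return (new_config, changed).

    `changed` is False when every desired setting was already in place,
    i.e. the "bailed out early" case. Both arguments are plain dicts
    (a hypothetical stand-in for the real configuration files).
    """
    changed = any(k not in current or current[k] != v
                  for k, v in desired.items())
    if not changed:
        return current, False
    merged = dict(current)
    merged.update(desired)
    return merged, True


def configure_and_maybe_restart(current, desired, restart):
    """Apply the desired settings and call restart() only if needed."""
    new_config, changed = configure(current, desired)
    if changed:
        restart()
    return new_config, changed
```

With such a return value, the second command would only be needed when `changed` is true.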
I posted an alternative patch: http://gerrit.ovirt.org/#/c/19761/
As it stands, this is a documentation issue, fixed now in 4a02e8744ae2e8baa. The way vdsm fails in this case is rather messy - I'll open a new improvement bug for this.
The patch 19761 is not enough. First, Dan, I'm not sure I followed your comment 5: what easy way are you looking for? A way to avoid the restarts? Anyhow, we need to add a condition that prints such instructions also if the user tries to start the vdsmd service without performing libvirt-configure first; otherwise the actual results stay the same:

[root@dhcp-2-233 RPMS]# service vdsmd start
supervdsm start [ OK ]
vdsm: Running run_init_hooks
vdsm: Running gencerts
vdsm: Running reconfigure_sanlock
vdsm: Running reconfigure_libvirt
Stopping libvirtd daemon: [ OK ]
Reconfiguration of libvirt is done.

To start working with the new configuration, execute:
'vdsm-tool libvirt-configure-services-restart'
This will manage restarting of the following services:
libvirtd, supervdsmd

diff: /etc/init/libvirtd.conf: No such file or directory
vdsm: Running syslog_available
vdsm: Running nwfilter
libvir: XML-RPC error : Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
libvir: XML-RPC error : Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
...
(In reply to Yaniv Bronhaim from comment #8)
> The patch 19761 is not enough.

It is enough to avoid this issue.

> Anyhow, we need to add a condition that prints such instructions also if
> user tries to start vdsmd service without performing the libvirt-configure
> first before, otherwise the actual results stay the same:
>
> [root@dhcp-2-233 RPMS]# service vdsmd start
> supervdsm start [ OK ]
> vdsm: Running run_init_hooks
> vdsm: Running gencerts
> vdsm: Running reconfigure_sanlock
> vdsm: Running reconfigure_libvirt
> Stopping libvirtd daemon: [ OK ]
> Reconfiguration of libvirt is done.
>
> To start working with the new configuration, execute:
> 'vdsm-tool libvirt-configure-services-restart'
> This will manage restarting of the following services:
> libvirtd, supervdsmd
>
> diff: /etc/init/libvirtd.conf: No such file or directory
> vdsm: Running syslog_available
> vdsm: Running nwfilter
> libvir: XML-RPC error : Failed to connect socket to
> '/var/run/libvirt/libvirt-sock': No such file or directory
> libvir: XML-RPC error : Failed to connect socket to
> '/var/run/libvirt/libvirt-sock': No such file or directory
> ...

I think we can deal with this in another bug - fail in a useful manner when libvirt is not configured.
This bug should be targeted to 3.4, or at least after the stable ovirt-3.3 branch. In addition to adding the README print, we should omit the call to libvirt_reconfigure from --pre-start, and consider having a condition with an appropriate message to use when starting vdsmd before reconfiguring. Any objections?
You may be right, Yaniv. The vdsmd startup script should not go ahead if it finds that a libvirt restart is required. It should fail with an appropriate error message.

We may consider putting the text in vdsm's %post script, but that is frowned upon in the rpm world. Another option is to actually RUN libvirt-reconfigure in the %post script, followed by a conditional restart of libvirtd. In the past this was impossible since we did not want to change the configuration when the ovirt-node image was built, but now that we have ovirt-node-plugin-vdsm, we can assume that whoever installed it wants vdsm running.
When vdsm fails to start (because of libvirt configuration or other issues), it should fail in a clean way, for example:

[root@dhcp-2-233 RPMS]# service vdsmd start
supervdsm start [ OK ]
vdsm: Running run_init_hooks
vdsm: Running gencerts
vdsm: Running reconfigure_sanlock
vdsm: Running reconfigure_libvirt
Stopping libvirtd daemon: [ OK ]
Reconfiguration of libvirt is done.

To start working with the new configuration, execute:
'vdsm-tool libvirt-configure-services-restart'
This will manage restarting of the following services:
libvirtd, supervdsmd

diff: /etc/init/libvirtd.conf: No such file or directory
vdsm: Running reconfigure_libvirt
Stopping libvirtd daemon: [ OK ]
Reconfiguration of libvirt is done.

To start working with the new configuration, execute:
'vdsm-tool libvirt-configure-services-restart'
This will manage restarting of the following services:
libvirtd, supervdsmd

diff: /etc/init/libvirtd.conf: No such file or directory
vdsm: Running syslog_available
vdsm: Running nwfilter
libvir: XML-RPC error : Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

Error: Error connecting to libvirt
Please restart libvirtd service...

Notes:
1. Do not log all connection attempts. We can retry multiple times, but can log one line after all retries failed.
2. Show a clear error message, separated from the noise.
3. Provide recovery information if possible.
4. Traceback should *not* be displayed for expected errors - it does not help anyone.
5. Is "diff: /etc/init/libvirtd.conf: No such file or directory" an expected condition or an error? If it is an error, we probably should stop right after it. If it is expected, is it interesting enough to log?
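Notes 1 to 4 above could be sketched roughly as follows. This is only an illustration under stated assumptions: `retry_quietly` and `start` are hypothetical names, `OSError` stands in for the libvirt connection error, and none of this is vdsm's actual `utils.retry`.

```python
import sys
import time


def retry_quietly(func, expected, attempts=50, sleep=0.2, wait=time.sleep):
    """Call func up to `attempts` times, staying silent between failures.

    Returns func()'s result on success; re-raises the last `expected`
    exception once every attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except expected:
            if attempt == attempts:
                raise
            wait(sleep)


def start(connect, out=sys.stderr, wait=time.sleep):
    """Return a process exit code: 0 on success, 1 on an expected failure."""
    try:
        retry_quietly(connect, OSError, wait=wait)
    except OSError as e:
        # One summary line plus a recovery hint; no traceback.
        out.write("Error: Error connecting to libvirt: %s\n" % e)
        out.write("Please restart libvirtd service...\n")
        return 1
    return 0
```

Only the final failure is reported, so fifty failed attempts produce two lines of output instead of fifty.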
(In reply to Dan Kenigsberg from comment #11)
> You may be right Yaniv. The vdsmd startup script should not go ahead if it
> finds that a libvirt restart is required. It should fail with an appropriate
> error message.
>
> We may consider putting the text on vdsm's %post script, but that is frowned
> upon in rpm world; Another option is to actually RUN libvirt-reconfigure on
> the %post script, followed by a conditional restart of libvirtd. In the past
> this was impossible since we did not want to change the configuration when
> ovirt-node image was built, but now that we have ovirt-node-plugin-vdsm, we
> can assume that whomever installed it, wants vdsm running.

I would like to see a cleaner solution, with no spec file handling at all, meaning:
- no conditional restarts of any service
- no libvirt reconfigure

So in real life:
- When one starts vdsm and libvirt is not configured ... we should give an appropriate message and exit (vdsm-tool ...).
- When one starts vdsm and vdsm identifies that libvirt needs to restart.

In both cases vdsm should not come up.

The problem does not exist with host deploy, as it takes care of the whole sequence. The only remaining issue is the documentation on how to upgrade a hypervisor without host deploy ... and given the messages described above it should be self-explanatory.

This is certainly not something we want to do for 3.3.1.
Yes, let's target this to 3.4.0. Our biggest issue with failing vdsmd gracefully is that we have legacy bootstrap scripts out there which expect vdsm to fix everything on its first start.
Well, so how do we plan to work with such legacy scripts if we omit the automatic reconfigure part and replace it with a message? Under systemd we have a workaround for this in init/systemd/systemd-vdsmd.in, and now with this change we will need another ugly hack for sysv, right?
Oh thanks for reminding me of init/systemd/systemd-vdsmd.in: legacy bootstrap calls `service vdsmd reconfigure` explicitly on el6 and this funny script on Fedora. So I see no problem with ditching auto-configure ASAP.
Right.. forgot about the call to vdsmd reconfigure explicitly. Great, posting patch for that.
The sanlock reconfigure should also be detached from the --pre-start part and run manually. Should we discuss it in the scope of this bz, or create another one for that issue?
Currently when one of the pre-start parts fails we stop the start flow.
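The stop-on-first-failure behaviour described here can be pictured with a short sketch. It assumes each pre-start task is a callable returning an exit code; `run_pre_start` is a hypothetical name, and the real init script is not written this way.

```python
import sys


def run_pre_start(tasks, out=sys.stdout):
    """Run (name, callable) tasks in order; stop at the first failure.

    Returns 0 if every task succeeded, otherwise the exit code of the
    failing task. Tasks after the failing one are never executed.
    """
    for name, task in tasks:
        out.write("vdsm: Running %s\n" % name)
        rc = task()
        if rc != 0:
            out.write("vdsm: stopped during execute %s task "
                      "(task returned with error code %d).\n" % (name, rc))
            return rc
    return 0
```

A non-zero return from `run_pre_start` would then be the signal not to start the daemon at all.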
oVirt 3.4.0 alpha has been released including the fix for this issue.
What to test here? What should be seen when starting vdsmd on a clean box which is not part of any setup (thus not configured)?
Seeing traceback output is not OK, IMHO.

av2.1/vdsm-4.14.2-0.3.el6ev.x86_64 (the newest version we have, as the ovirt beta3 provided by CI is older)

# service vdsmd start
Starting multipathd daemon: [ OK ]
Starting rpcbind: [ OK ]
Starting wdmd: [ OK ]
Starting sanlock: [ OK ]
Starting libvirtd daemon: [ OK ]
supervdsm start [ OK ]
Starting iscsid: [ OK ]
vdsm: Running mkdirs
vdsm: Running configure_coredump
vdsm: Running run_init_hooks
vdsm: Running gencerts
Configuring a self-signed VDSM host certificatevdsm: Running check_is_configured
libvirt is not configured for vdsm yet
sanlock service requires restart
Modules libvirt,sanlock are not configured
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 145, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 142, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/configurator.py", line 265, in isconfigured
    raise RuntimeError(msg)
RuntimeError:

One of the modules is not configured to work with VDSM.
To configure the module use the following:
'vdsm-tool configure [module_name]'.

If all modules are not configured try to use:
'vdsm-tool configure --force'
(The force flag will stop the module's service and start it
afterwards automatically to load the new configuration.)

vdsm: stopped during execute check_is_configured task (task returned with error code 1).
vdsm start
Vdsm cannot be run properly without libvirt being configured by it. The error message says that, and gives advice on how to fix the issue.

"""
One of the modules is not configured to work with VDSM.
To configure the module use the following:
'vdsm-tool configure [module_name]'.
"""

So the failure is no longer messy as reported by Nir; it is quite orderly and as planned.
(In reply to Dan Kenigsberg from comment #25)
> So the failure is no longer messy as reported by Nir, it is quite orderly
> and as planned.

But the error message still shows a traceback, although the error is expected and well understood. This is not as messy as it was, but it should be fixed as well.

So we can close this as the big mess is fixed and open a new bug for removing the traceback, or fix it now.
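For illustration, removing the traceback for an expected error could look something like the sketch below. It is not the actual vdsm-tool code: `NotConfiguredError`, `check_is_configured` and `run_command` are hypothetical names, and only the advice text is taken from the output quoted earlier in this bug.

```python
import sys


class NotConfiguredError(Exception):
    """Hypothetical marker for an expected, user-fixable condition."""


def check_is_configured(unconfigured):
    """Raise NotConfiguredError if any module name is listed."""
    if unconfigured:
        raise NotConfiguredError(
            "Modules %s are not configured. To configure the module use "
            "the following: 'vdsm-tool configure [module_name]'."
            % ",".join(unconfigured))


def run_command(command, err=sys.stderr):
    """Run one tool command and return its exit code.

    Expected errors become a one-line message. Any other exception still
    propagates with a full traceback, since that would be a real bug.
    """
    try:
        command()
    except NotConfiguredError as e:
        err.write("Error: %s\n" % e)
        return 1
    return 0
```

The dedicated exception type is the design choice here: it lets the top-level handler tell a known condition apart from a genuine bug without parsing messages.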
ok, for traceback i created separate BZ1076371
this is an automated message: moving to Closed CURRENT RELEASE since oVirt 3.4.0 has been released