Description of problem:
case (history):
After vdsm started configuring libvirt to run under initctl (upstart), we repeatedly hit a case where vdsm fails to connect to the libvirt socket and the system fails to initialize.
How it happens:
The system is alive after vdsm has started libvirtd via initctl. A user then restarts libvirt through the SysV init script (service libvirtd restart). Under some conditions this creates a state that looks like two running libvirt daemons: in fact there is one running libvirt process, while the upstart watchdog keeps trying to start another. Using 'watch', I see lots of PIDs changing (processes probably starting and dying in a loop).
Repro steps:
[root@nott-vds1 core]# initctl stop libvirtd
initctl: Unknown instance:
[root@nott-vds1 core]# initctl start libvirtd
libvirtd start/running, process 16772
[root@nott-vds1 core]# /etc/init.d/libvirtd start
Starting libvirtd daemon:
Since we are moving forward to beta2 and RC, I think we should find a better solution, or at least protect against such cases on the libvirt side: if a libvirt process is already running, do not try to start another one.
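The "do not start a second daemon" protection could be as simple as a pidfile liveness check before forking. This is only a minimal sketch of the idea, not libvirt's actual code; the function name and pidfile path are illustrative assumptions.

```shell
# Sketch: refuse to start when a previous instance still holds the pidfile.
# already_running is a hypothetical helper; /var/run/libvirtd.pid is the
# conventional location but is an assumption here.
already_running() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1     # no pidfile -> not running
    pid=$(cat "$pidfile")
    # kill -0 probes for process existence without sending a signal
    kill -0 "$pid" 2>/dev/null
}

# In a start path, something like:
#   if already_running /var/run/libvirtd.pid; then
#       echo "libvirtd already running; not starting another instance."
#       exit 0
#   fi
```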
vdsm error log for the failed socket connection:
clientIFinit::ERROR::2011-08-07 11:20:43,811::clientIF::933::vds::(_recoverExistingVms) Vm's recovery failed
Traceback (most recent call last):
File "/usr/share/vdsm/clientIF.py", line 898, in _recoverExistingVms
vdsmVms = self.getVDSMVms()
File "/usr/share/vdsm/clientIF.py", line 959, in getVDSMVms
conn = libvirtconnection.get(self)
File "/usr/share/vdsm/libvirtconnection.py", line 94, in get
conn = libvirt.openAuth('qemu:///system', auth, 0)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth
if ret is None:raise libvirtError('virConnectOpenAuth() failed')
libvirtError: Cannot recv data: Connection reset by peer
libvirt log when the watchdog tries to start another process:
11:21:36.806: 19298: error : virNetSocketNewListenTCP:281 : Unable to bind to port: Address already in use
11:21:36.820: 19310: info : libvirt version: 0.9.4, package: 0rc1.2.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2011-08-01-23:37:12, x86-003.build.bos.redhat.com)
11:21:36.820: 19310: debug : virRegisterNetworkDriver:584 : registering Network as network driver 3
11:21:36.820: 19310: debug : virRegisterInterfaceDriver:617 : registering Interface as interface driver 3
11:21:36.820: 19310: debug : virRegisterStorageDriver:650 : registering storage as storage driver 3
11:21:36.820: 19310: debug : virRegisterDeviceMonitor:683 : registering udevDeviceMonitor as device driver 3
11:21:36.820: 19310: debug : virRegisterSecretDriver:716 : registering secret as secret driver 3
11:21:36.820: 19310: debug : virRegisterNWFilterDriver:749 : registering nwfilter as network filter driver 3
11:21:36.820: 19310: debug : virRegisterDriver:767 : driver=0x71d500 name=QEMU
11:21:36.820: 19310: debug : virRegisterDriver:791 : registering QEMU as driver 3
11:21:36.820: 19310: debug : virRegisterDriver:767 : driver=0x71db20 name=LXC
11:21:36.820: 19310: debug : virRegisterDriver:791 : registering LXC as driver 4
11:21:36.821: 19310: debug : virHookCheck:115 : No hook script /etc/libvirt/hooks/daemon
11:21:36.821: 19310: debug : virHookCheck:115 : No hook script /etc/libvirt/hooks/qemu
11:21:36.821: 19310: debug : virHookCheck:115 : No hook script /etc/libvirt/hooks/lxc
11:21:37.160: 19310: error : virNetSocketNewListenTCP:281 : Unable to bind to port: Address already in use
Hi Haim, it seems like this could be solved with the same change as BZ 728153, having the init script check to see if libvirtd is being managed by upstart or systemd, and if so, notifying the user and exiting. If that works for you, I'll close this one as a dup and we can track the work through 728158.
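The check suggested above could look roughly like the following in the SysV init script. This is a hedged sketch, assuming upstart's initctl is available; the function name and the exact output parsing are illustrative, not the actual patch from the referenced bug.

```shell
# Sketch: detect whether upstart already manages a running libvirtd.
# upstart prints e.g. "libvirtd start/running, process 16772" when it owns
# a running instance; other output (or "initctl: Unknown instance:") means
# it does not.
libvirtd_upstart_running() {
    case "$1" in
        *start/running*) return 0 ;;
        *)               return 1 ;;
    esac
}

# In the init script's start() path, something like:
#   if libvirtd_upstart_running "$(initctl status libvirtd 2>/dev/null)"; then
#       echo "libvirtd is managed by upstart; refusing to start a second copy."
#       exit 1
#   fi
```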