Description of problem:
During installation of OpenStack with packstack, the nova-compute process can sometimes(!) die because its socket connection to libvirt is broken while libvirt is being restarted. As a result, after the packstack run ends "successfully", the nova-compute service is dead. This was spotted only in installations with qpid, though it does not have to be limited to qpid.

Version-Release number of selected component (if applicable):
> openstack-nova-compute.noarch 2014.1-7.el7ost
> libvirt-daemon.x86_64 (and other libvirt-* pkgs) 1.1.1-29.el7
> python-nova.noarch 2014.1-7.el7ost
> python-novaclient.noarch 1:2.17.0-2.el7ost
> openstack-packstack.noarch (and ..-puppet) 2014.1.1-0.28.dev1194.el7ost
> openstack-puppet-modules.noarch 2014.1-18.el7ost
> python-qpid.noarch 0.18-12.el7
> qpid-cpp-client.x86_64 0.18-25.el7
> qpid-cpp-server.x86_64 0.18-25.el7

Steps to Reproduce:
1. Install with packstack (answerfile provided, +qpid?) a few times (not rerun on the same machine)
2. After packstack ends, check the status of the service (service/systemctl/...; for example, openstack-status shows 'inactive' but without "disabled on boot")

Actual results:
openstack-nova-compute service is dead
nova-compute.log contains
2014-06-30 06:17:02.716 11723 ERROR nova.openstack.common.threadgroup [-] Connection to the hypervisor is broken on host: jenkins-298aae9c-112.novalocal

Expected results:
nova-compute is running and working

Additional info:
In /var/log/messages the following appears:
> Jun 30 06:17:02 jenkins-298aae9c-112 systemd: Stopping Virtualization daemon...
> Jun 30 06:17:02 jenkins-298aae9c-112 systemd: Starting Virtualization daemon...
> Jun 30 06:17:02 jenkins-298aae9c-112 nova-compute: libvirt: XML-RPC error : Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
> Jun 30 06:17:02 jenkins-298aae9c-112 nova-compute: Traceback (most recent call last):
> Jun 30 06:17:02 jenkins-298aae9c-112 nova-compute: File "/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 97, in wait

The timing matches the packstack run (/var/tmp/packstack/*/openstack-setup.log):
> 2014-06-30 06:10:57::DEBUG::run_setup::409::root:: no post condition check for group PUPPET
> 2014-06-30 06:10:57::DEBUG::run_setup::596::root:: {'CONFIG_RH_USER': '', 'CONFIG_REPO': '', 'CONFIG_AMQP_ENABLE_SSL': 'n', 'CONFIG_RH_OPTIONAL': 'y', 'CONFIG_CINDER_KS_PW': '********', 'CONF
> 2014-06-30 06:10:57::DEBUG::sequences::93::root:: Running sequence Clean Up.
> 2014-06-30 06:10:57::DEBUG::sequences::34::root:: Running step Clean Up.
> 2014-06-30 06:10:57::INFO::shell::81::root:: [localhost] Executing script:
> rm -rf /var/tmp/packstack/20140630-061057-ZBuGR4/manifests/*pp
...
> ======== END OF STDOUT ========
> 2014-06-30 06:21:21::DEBUG::run_setup::575::root:: *** The following params were used as user input:
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: ssh-public-key: /root/.ssh/id_rsa.pub
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: mysql-install: y
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: os-glance-install: y
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: os-cinder-install: y
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: os-nova-install: y
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: os-neutron-install: n

While this could also be solved by compute being able to reconnect/survive libvirt restarts (bug #1092820), packstack or the puppet modules could make sure to restart libvirt before nova is started, or (re)start nova after libvirt, or at the end of installation, so as not to end up with a dead service.

Will attach full /var/{log/messages,packstack/*}.
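One way the installer might avoid starting nova-compute against a libvirtd that is still restarting is to wait until the libvirt socket from the log above exists again. The following is only a minimal shell sketch of that idea: `wait_for_path` is a hypothetical helper, not part of packstack or the puppet modules, and the timeout value is an arbitrary assumption.

```shell
#!/bin/sh
# wait_for_path PATH [TIMEOUT_SECONDS]
# Hypothetical helper: polls once per second until PATH exists or the
# timeout (default 30 seconds, an arbitrary choice) runs out.
# Returns 0 if PATH exists, non-zero otherwise.
wait_for_path() {
    path=$1
    remaining=${2:-30}
    while [ "$remaining" -gt 0 ]; do
        [ -e "$path" ] && return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    # Final check after the timeout has expired.
    [ -e "$path" ]
}
```

On an affected host this could be used as `wait_for_path /var/run/libvirt/libvirt-sock 30 && systemctl start openstack-nova-compute`, run after the libvirtd restart. It only narrows the race window; it does not make nova-compute survive later libvirtd restarts.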
Created attachment 913480
messages, nova and packstack logs

Attaching collected /var/log/{messages,nova} and /var/tmp/packstack/*the-one-who-installed*/.
*** Bug 1115735 has been marked as a duplicate of this bug. ***
This appears to be a race between libvirtd and nova-compute restarting.
This could also be a nova bug. The systemd unit file for nova-compute does not have an explicit dependency on libvirtd, despite ALWAYS requiring it on RHEL7 installations. Perhaps simply adding the following to the nova-compute unit file would fix it?

After=network.target
Requires=libvirtd.service
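Note that in systemd, Requires= by itself does not order startup; ordering against libvirtd needs After=libvirtd.service as well. Requires= also means that an explicit stop or restart of libvirtd is propagated to the dependent unit. As a sketch of how such a dependency could be tried without editing the packaged unit file, the snippet below writes a drop-in. It assumes the unit is named openstack-nova-compute.service; the file name `libvirtd.conf` and the `DROPIN_DIR` variable are hypothetical, and by default the file goes to a scratch directory rather than /etc.

```shell
#!/bin/sh
# Sketch: generate a systemd drop-in that makes nova-compute depend on
# libvirtd and start after it.
# DROPIN_DIR is a hypothetical override. On a real host it would be
# /etc/systemd/system/openstack-nova-compute.service.d; the default here
# is a scratch directory so the sketch can be run safely.
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"

write_dropin() {
    mkdir -p "$DROPIN_DIR" || return 1
    # libvirtd.conf is an arbitrary (hypothetical) drop-in file name.
    cat > "$DROPIN_DIR/libvirtd.conf" <<'EOF'
[Unit]
# Requires= pulls libvirtd in; After= provides the startup ordering.
Requires=libvirtd.service
After=libvirtd.service
EOF
}

write_dropin
echo "drop-in written to $DROPIN_DIR/libvirtd.conf"
# On a real host, follow with: systemctl daemon-reload
```

Whether restart propagation is desirable during a packstack run is a separate question: it would restart nova-compute along with libvirtd instead of leaving it dead, but it does not remove the need for the restart itself.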
All my tests were using RabbitMQ.
This bug is a side effect of the fix for another bug, which requires restarting libvirtd in order to pick up new network filters. Another possible solution here is to SIGHUP libvirtd instead of restarting it. Dan, do you think this is feasible?
A SIGHUP makes libvirt reload its XML files, but not its primary config files. So it depends on whether any other change was made to qemu.conf or libvirtd.conf; if so, that requires a full restart.
It should just be the network XML filters.
[root@localhost ~]# virsh nwfilter-list
UUID                                  Name
----------------------------------------------------------------

[root@localhost ~]# service libvirtd status
Redirecting to /bin/systemctl status libvirtd.service
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Thu 2014-07-03 09:16:29 EDT; 2min 49s ago
 Main PID: 1207 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           └─1207 /usr/sbin/libvirtd

Jul 03 09:16:29 localhost.localdomain libvirtd[1207]: libvirt version: 1.1.1,...
Jul 03 09:16:29 localhost.localdomain libvirtd[1207]: Module /usr/lib64/libvi...
Jul 03 09:16:29 localhost.localdomain systemd[1]: Started Virtualization daemon.
Hint: Some lines were ellipsized, use -l to show in full.
[root@localhost ~]# killall -HUP libvirtd
[root@localhost ~]# virsh nwfilter-list
UUID                                  Name
----------------------------------------------------------------
11e8a452-fb59-4efb-b424-e01200e6b443  allow-arp
24ac7d37-8674-4edd-8eed-d2e97a27d296  allow-dhcp
015c14f4-4dce-49cd-b06e-1ee3dd70a2a7  allow-dhcp-server
395590a6-6400-4a7f-9e71-e08ebaade117  allow-incoming-ipv4
3306bac1-80f5-42a6-a6ae-3c550023b485  allow-ipv4
e9c89db4-561b-448e-8261-5719045836b6  clean-traffic
6558df19-8e12-47db-8bc4-e2333d61188b  no-arp-ip-spoofing
e7b06988-9e9a-4a61-a452-e5007e062f95  no-arp-mac-spoofing
70226ae4-4dee-482b-8a51-5fae1970d104  no-arp-spoofing
0caf8e39-ff1a-41a1-992c-58cd0d7ebcee  no-ip-multicast
5e1deb99-a661-4e06-8ed4-bc6a97e7e7c2  no-ip-spoofing
5a3f518e-481f-4000-9e8a-a02dbe380645  no-mac-broadcast
0cd64b01-2883-471b-bcb2-58232e9bacd0  no-mac-spoofing
a4345dd4-9a14-4699-82e3-c110c215e6ee  no-other-l2-traffic
14cc0036-08b7-44a3-9e96-9a9e76abbcbe  no-other-rarp-traffic
fc3f51cd-6fec-40bc-ad2a-528663245af8  qemu-announce-self
e2517154-8deb-4130-b62f-858bf5074b71  qemu-announce-self-rarp

It appears that changing packstack to send libvirtd a SIGHUP instead of restarting it (and there's probably a systemctl command to do this cleanly) will address the previous issue without restarting libvirtd, so this bug would not appear.
'service libvirtd reload' also works, see https://bugzilla.redhat.com/show_bug.cgi?id=1109362#c22
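Combining this with the earlier comment that a SIGHUP reloads XML files but not qemu.conf or libvirtd.conf, the choice between reload and restart could be sketched as below. `choose_libvirtd_action` is a hypothetical helper, not something packstack provides, and the nwfilter path is only an example input. The sketch prints the command instead of running it.

```shell
#!/bin/sh
# choose_libvirtd_action FILE...
# Hypothetical helper: given the list of changed files, print "restart"
# if a primary config file (qemu.conf or libvirtd.conf) is among them,
# otherwise print "reload" (i.e. SIGHUP is enough).
choose_libvirtd_action() {
    action=reload
    for f in "$@"; do
        case "$f" in
            */qemu.conf|*/libvirtd.conf) action=restart ;;
        esac
    done
    echo "$action"
}

# Example: only a network filter XML file changed, so a reload suffices.
action=$(choose_libvirtd_action /etc/libvirt/nwfilter/clean-traffic.xml)
echo "service libvirtd $action"   # prints: service libvirtd reload
```

With only the network filter XML changing, as stated above, this always resolves to a reload, which leaves the libvirt socket in place and so avoids the race with nova-compute.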
*** This bug has been marked as a duplicate of bug 1109362 ***