Bug 1114690 - during packstack installation nova-compute may die while libvirt is restarted

Summary: during packstack installation nova-compute may die while libvirt is restarted

Keywords:
Status:	CLOSED DUPLICATE of bug 1109362
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-packstack
Sub Component:
Version:	5.0 (RHEL 7)
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	rc
Target Release:	5.0 (RHEL 7)
Assignee:	Martin Magr
QA Contact:	Ami Jeain
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1115735
Depends On:	1109362
Blocks:

Reported:	2014-06-30 16:26 UTC by Pavel Sedlák
Modified:	2023-09-18 09:58 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-07-03 17:49:45 UTC
Target Upstream Version:
Embargoed:

Attachments
messages, nova and packstack logs (1.23 MB, application/x-bzip2) 2014-06-30 16:34 UTC, Pavel Sedlák	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenStack gerrit	104560	0	None	None	None	Never

Description Pavel Sedlák 2014-06-30 16:26:37 UTC

Description of problem:
During installation of OpenStack with packstack, the nova-compute process can sometimes(!) die because its socket connection to libvirt breaks while libvirt is being restarted. As a result, after the packstack run ends "successfully", the nova-compute service is dead.
This has only been observed in installations using qpid, though it is not necessarily limited to qpid.

Version-Release number of selected component (if applicable):
> openstack-nova-compute.noarch                     2014.1-7.el7ost
> libvirt-daemon.x86_64 (and other libvirt-* pkgs)  1.1.1-29.el7
> python-nova.noarch                                2014.1-7.el7ost            
> python-novaclient.noarch                          1:2.17.0-2.el7ost
> openstack-packstack.noarch (and ..-puppet)        2014.1.1-0.28.dev1194.el7ost
> openstack-puppet-modules.noarch                   2014.1-18.el7ost
> python-qpid.noarch                                0.18-12.el7
> qpid-cpp-client.x86_64                            0.18-25.el7
> qpid-cpp-server.x86_64                            0.18-25.el7

Steps to Reproduce:
1. Install with packstack a few times (answer file provided, possibly with qpid), each time on a fresh machine rather than rerunning on the same one.
2. After packstack ends, check the status of the service (service, systemctl, ...; for example, openstack-status shows 'inactive' but without "disabled on boot").

Actual results:
openstack-nova-compute service is dead
nova-compute.log contains:
2014-06-30 06:17:02.716 11723 ERROR nova.openstack.common.threadgroup [-] Connection to the hypervisor is broken on host: jenkins-298aae9c-112.novalocal

Expected results:
nova-compute is running and working
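When reproducing this over repeated installs, the log symptom above can be checked mechanically. The following is a minimal POSIX shell sketch; the function name is hypothetical, and the default log path /var/log/nova/nova-compute.log is an assumption based on the attached /var/log/nova directory.

```shell
#!/bin/sh
# Hypothetical helper: report whether a nova-compute log contains the
# "hypervisor connection broken" error quoted in this bug.
# Usage: nova_lost_hypervisor [LOGFILE]
# Returns 0 if the message is found, non-zero otherwise (including
# when the log file does not exist).
nova_lost_hypervisor() {
    # Default path is an assumption; adjust for your layout.
    log=${1:-/var/log/nova/nova-compute.log}
    # -F: fixed-string match; -q: exit status only, no output.
    grep -qF 'Connection to the hypervisor is broken' "$log" 2>/dev/null
}
```

Combined with `systemctl is-active --quiet openstack-nova-compute.service`, which exits non-zero when the unit is not active, this covers both symptoms listed under Actual results.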

Additional info:
The following appears in /var/log/messages:

> Jun 30 06:17:02 jenkins-298aae9c-112 systemd: Stopping Virtualization daemon...           
> Jun 30 06:17:02 jenkins-298aae9c-112 systemd: Starting Virtualization daemon...           
> Jun 30 06:17:02 jenkins-298aae9c-112 nova-compute: libvirt: XML-RPC error : Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
> Jun 30 06:17:02 jenkins-298aae9c-112 nova-compute: Traceback (most recent call last):     
> Jun 30 06:17:02 jenkins-298aae9c-112 nova-compute: File "/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 97, in wait

The timing matches the packstack run (/var/tmp/packstack/*/openstack-setup.log):
> 2014-06-30 06:10:57::DEBUG::run_setup::409::root:: no post condition check for group PUPPET
> 2014-06-30 06:10:57::DEBUG::run_setup::596::root:: {'CONFIG_RH_USER': '', 'CONFIG_REPO': '', 'CONFIG_AMQP_ENABLE_SSL': 'n', 'CONFIG_RH_OPTIONAL': 'y', 'CONFIG_CINDER_KS_PW': '********', 'CONF
> 2014-06-30 06:10:57::DEBUG::sequences::93::root:: Running sequence Clean Up.              
> 2014-06-30 06:10:57::DEBUG::sequences::34::root:: Running step Clean Up.                  
> 2014-06-30 06:10:57::INFO::shell::81::root:: [localhost] Executing script:                
> rm -rf /var/tmp/packstack/20140630-061057-ZBuGR4/manifests/*pp                            
...
> ======== END OF STDOUT ========                                                           
> 2014-06-30 06:21:21::DEBUG::run_setup::575::root:: *** The following params were used as user input:
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: ssh-public-key: /root/.ssh/id_rsa.pub  
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: mysql-install: y                       
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: os-glance-install: y                   
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: os-cinder-install: y                   
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: os-nova-install: y                     
> 2014-06-30 06:21:21::DEBUG::run_setup::580::root:: os-neutron-install: n                  

While this could also be solved by nova-compute being able to reconnect to libvirt and survive its restarts (bug #1092820),
packstack or its Puppet modules could make sure that libvirt is restarted before nova-compute is started,
or (re)start nova-compute after libvirt, or at the end of the installation, so as not to end up with a dead service.
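The option of (re)starting nova-compute only after libvirtd is back can be sketched in POSIX shell as follows. This assumes the unit names libvirtd.service and openstack-nova-compute.service and the socket path /var/run/libvirt/libvirt-sock from the log above; the function names and the 30-second timeout are hypothetical choices, not what packstack or its Puppet modules actually do.

```shell
#!/bin/sh
# Hypothetical helper: poll until SOCK exists and is a UNIX socket,
# giving up after TIMEOUT seconds (default 30).
# Returns 0 once the socket exists, 1 on timeout.
wait_for_socket() {
    sock=$1
    timeout=${2:-30}
    waited=0
    while [ ! -S "$sock" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical helper: restart libvirtd, wait for its socket to
# reappear, and only then restart nova-compute, so nova-compute does
# not try to connect while libvirtd is down.
restart_libvirt_then_nova() {
    systemctl restart libvirtd.service || return 1
    wait_for_socket /var/run/libvirt/libvirt-sock 30 || return 1
    systemctl restart openstack-nova-compute.service
}
```

The existence of the socket file is only a heuristic for "libvirtd is accepting connections"; a stronger probe would be to retry an actual client call such as `virsh -c qemu:///system version` until it succeeds.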

Will attach full /var/{log/messages,packstack/*}.

Comment 1 Pavel Sedlák 2014-06-30 16:34:35 UTC

Created attachment 913480
messages, nova and packstack logs

Attaching collected /var/log/{messages,nova} and /var/tmp/packstack/*the-one-who-installed*/.

Comment 3 Lon Hohberger 2014-07-03 01:37:58 UTC

*** Bug 1115735 has been marked as a duplicate of this bug. ***

Comment 4 Lon Hohberger 2014-07-03 01:39:18 UTC

This appears to be a race between libvirtd and nova-compute restarting.

Comment 5 Lon Hohberger 2014-07-03 01:47:23 UTC

This could also be a nova bug.

The systemd unit file for nova-compute does not have an explicit dependency on libvirtd, even though nova-compute ALWAYS requires it on RHEL7 installations.

Perhaps simply adding the following to the nova-compute unit file would fix it?

  After=network.target
  Requires=libvirtd.service
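One caveat on the suggestion above: according to systemd.unit(5), Requires= on its own does not order start-up, so without After=libvirtd.service the two units may still be started in parallel. A minimal sketch that adds both directives as a drop-in, rather than editing the packaged unit file, is shown below. The drop-in file name libvirtd-dependency.conf and the function name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that makes nova-compute
# require libvirtd and start after it. ROOT defaults to the standard
# administrator unit directory; pass another directory to test.
write_nova_libvirt_dropin() {
    root=${1:-/etc/systemd/system}
    dir="$root/openstack-nova-compute.service.d"
    mkdir -p "$dir" || return 1
    # Requires= pulls libvirtd in; After= orders start-up, which
    # Requires= alone does not do.
    cat > "$dir/libvirtd-dependency.conf" <<'EOF'
[Unit]
Requires=libvirtd.service
After=libvirtd.service
EOF
}
```

After writing the drop-in, `systemctl daemon-reload` makes systemd pick it up. Current systemd.unit(5) also documents that an explicit restart of a Requires= dependency restarts the dependent unit, which would cover the libvirtd restart done by packstack; whether the systemd shipped with RHEL 7.0 behaves the same way would need to be checked.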

Comment 6 Lon Hohberger 2014-07-03 11:47:57 UTC

All my tests were using RabbitMQ.

Comment 7 Lon Hohberger 2014-07-03 11:51:34 UTC

This bug is caused by the fix for another bug, where we have to restart libvirtd in order to pick up new network filters. Perhaps another possible solution here is to send libvirtd a SIGHUP instead of restarting it.

Dan, do you think this is feasible?

Comment 8 Daniel Berrangé 2014-07-03 12:06:10 UTC

A SIGHUP makes libvirt reload its XML files, but not its primary config files. So it depends on whether any other change was made to qemu.conf or libvirtd.conf; if so, that requires a full restart.
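The rule above can be expressed as a small decision helper, for example for an installer that tracks which libvirt files it changed. This is a hypothetical sketch: the function name is invented, and as a conservative assumption it treats only XML files as safe for a reload, so qemu.conf, libvirtd.conf and any other non-XML file force a restart.

```shell
#!/bin/sh
# Hypothetical helper: given the paths of libvirt files that were
# changed, print "reload" if a SIGHUP should be enough, "restart" if a
# full restart is needed, or "none" if nothing changed.
libvirt_refresh_action() {
    if [ "$#" -eq 0 ]; then
        echo none
        return 0
    fi
    action=reload
    for f in "$@"; do
        case $f in
            *.xml) ;;            # XML definitions (e.g. nwfilters): reload is enough
            *) action=restart ;; # qemu.conf, libvirtd.conf, or anything else
        esac
    done
    echo "$action"
}
```

When the output is reload or restart, it can be passed directly as the systemctl verb, e.g. `systemctl "$action" libvirtd.service`.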

Comment 9 Lon Hohberger 2014-07-03 13:13:07 UTC

It should just be the network XML filters.

Comment 10 Lon Hohberger 2014-07-03 13:22:31 UTC

[root@localhost ~]# virsh nwfilter-list
UUID                                  Name                 
----------------------------------------------------------------

[root@localhost ~]# service libvirtd status
Redirecting to /bin/systemctl status  libvirtd.service
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Thu 2014-07-03 09:16:29 EDT; 2min 49s ago
 Main PID: 1207 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           └─1207 /usr/sbin/libvirtd

Jul 03 09:16:29 localhost.localdomain libvirtd[1207]: libvirt version: 1.1.1,...
Jul 03 09:16:29 localhost.localdomain libvirtd[1207]: Module /usr/lib64/libvi...
Jul 03 09:16:29 localhost.localdomain systemd[1]: Started Virtualization daemon.
Hint: Some lines were ellipsized, use -l to show in full.
[root@localhost ~]# killall -HUP libvirtd
[root@localhost ~]# virsh nwfilter-list
UUID                                  Name                 
----------------------------------------------------------------
11e8a452-fb59-4efb-b424-e01200e6b443  allow-arp           
24ac7d37-8674-4edd-8eed-d2e97a27d296  allow-dhcp          
015c14f4-4dce-49cd-b06e-1ee3dd70a2a7  allow-dhcp-server   
395590a6-6400-4a7f-9e71-e08ebaade117  allow-incoming-ipv4 
3306bac1-80f5-42a6-a6ae-3c550023b485  allow-ipv4          
e9c89db4-561b-448e-8261-5719045836b6  clean-traffic       
6558df19-8e12-47db-8bc4-e2333d61188b  no-arp-ip-spoofing  
e7b06988-9e9a-4a61-a452-e5007e062f95  no-arp-mac-spoofing 
70226ae4-4dee-482b-8a51-5fae1970d104  no-arp-spoofing     
0caf8e39-ff1a-41a1-992c-58cd0d7ebcee  no-ip-multicast     
5e1deb99-a661-4e06-8ed4-bc6a97e7e7c2  no-ip-spoofing      
5a3f518e-481f-4000-9e8a-a02dbe380645  no-mac-broadcast    
0cd64b01-2883-471b-bcb2-58232e9bacd0  no-mac-spoofing     
a4345dd4-9a14-4699-82e3-c110c215e6ee  no-other-l2-traffic 
14cc0036-08b7-44a3-9e96-9a9e76abbcbe  no-other-rarp-traffic
fc3f51cd-6fec-40bc-ad2a-528663245af8  qemu-announce-self  
e2517154-8deb-4130-b62f-858bf5074b71  qemu-announce-self-rarp


It appears that changing packstack to send libvirtd a SIGHUP instead of restarting it (there is probably a systemctl command to do this cleanly) will address the previous issue without restarting libvirtd, so this bug would not appear.

Comment 11 Lon Hohberger 2014-07-03 13:27:38 UTC

'service libvirtd reload' also works, see https://bugzilla.redhat.com/show_bug.cgi?id=1109362#c22
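The check from comment 10 (an empty virsh nwfilter-list before the SIGHUP, a populated one after) could serve as a post-condition for the reload step. Below is a hypothetical sketch that counts the filter rows in virsh nwfilter-list output read from standard input; it assumes the header-plus-dashed-separator format shown above, and the function name is invented.

```shell
#!/bin/sh
# Hypothetical helper: count filter entries in `virsh nwfilter-list`
# output read from stdin. Rows after the dashed separator line with at
# least two fields (UUID and name) are counted; blank lines are ignored.
count_nwfilters() {
    awk 'seen && NF >= 2 { n++ }
         /^-+ *$/ { seen = 1 }
         END { print n + 0 }'
}
```

For the output in comment 10, this yields 0 before the SIGHUP and 17 after it, so a step such as `systemctl reload libvirtd.service && [ "$(virsh nwfilter-list | count_nwfilters)" -gt 0 ]` would confirm that the filters were picked up without a restart.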

Comment 12 Alan Pevec 2014-07-03 17:49:45 UTC


*** This bug has been marked as a duplicate of bug 1109362 ***
