Bug 857747

Summary: nova-network crashes if launched with nova-cert
Product: [Fedora] Fedora Reporter: Zane Bitter <zbitter>
Component: openstack-novaAssignee: Pádraig Brady <p>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 17CC: akscram, alexander.sakhnov, apevec, asalkeld, bfilippov, breu, Jan.van.Eldik, jonathansteffan, jose.castro.leon, markmc, matt_domsch, mlvov, pbrady, p, rbryant, rkukura
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-08-01 17:57:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Zane Bitter 2012-09-16 18:42:23 UTC
Description of problem:
When starting nova-network immediately before nova-cert:

    systemctl start openstack-nova-network.service openstack-nova-cert.service

nova-network crashes. When started in the opposite order:

    systemctl start openstack-nova-cert.service openstack-nova-network.service

everything works fine. Everything also works when any other nova-* service is substituted for nova-cert in the first command above.

Log:

Sep 16 14:36:18 mrg-04 nova-network[17166]: Traceback (most recent call last):
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 336, in fire_timers
Sep 16 14:36:18 mrg-04 nova-network[17166]: timer()
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 56, in __call__
Sep 16 14:36:18 mrg-04 nova-network[17166]: cb(*args, **kw)
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 192, in main
Sep 16 14:36:18 mrg-04 nova-network[17166]: result = function(*args, **kwargs)
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/nova/service.py", line 101, in run_server
Sep 16 14:36:18 mrg-04 nova-network[17166]: server.start()
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/nova/service.py", line 162, in start
Sep 16 14:36:18 mrg-04 nova-network[17166]: self.manager.init_host()
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/nova/network/manager.py", line 1702, in init_host
Sep 16 14:36:18 mrg-04 nova-network[17166]: self.l3driver.initialize()
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/nova/network/l3.py", line 84, in initialize
Sep 16 14:36:18 mrg-04 nova-network[17166]: linux_net.init_host()
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/nova/network/linux_net.py", line 442, in init_host
Sep 16 14:36:18 mrg-04 nova-network[17166]: add_snat_rule(ip_range)
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/nova/network/linux_net.py", line 432, in add_snat_rule
Sep 16 14:36:18 mrg-04 nova-network[17166]: iptables_manager.apply()
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/nova/utils.py", line 944, in inner
Sep 16 14:36:18 mrg-04 nova-network[17166]: retval = f(*args, **kwargs)
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/lockfile/__init__.py", line 233, in __exit__
Sep 16 14:36:18 mrg-04 nova-network[17166]: self.release()
Sep 16 14:36:18 mrg-04 nova-network[17166]: File "/usr/lib/python2.7/site-packages/lockfile/linklockfile.py", line 52, in release
Sep 16 14:36:18 mrg-04 nova-network[17166]: raise NotLocked
Sep 16 14:36:18 mrg-04 nova-network[17166]: NotLocked
Sep 16 14:36:18 mrg-04 nova-network[17166]: 2012-09-16 14:36:18 CRITICAL nova [-]
Sep 16 14:36:18 mrg-04 systemd[1]: openstack-nova-network.service: main process exited, code=exited, status=1
Sep 16 14:36:18 mrg-04 systemd[1]: Unit openstack-nova-network.service entered failed state.

The cause is the SELinux policy for nova-cert. This prevents nova-cert from determining if nova-network is running by doing kill(pid, 0) in nova.utils.cleanup_file_locks(). The nova utils code interprets the denial of permission as meaning that the process is not running, and consequently deletes any locks that are in use. When nova-network, which is indeed still running, goes to delete its lock and finds it gone, it raises an exception.

Audit log:

type=AVC msg=audit(1347820578.676:7219): avc:  denied  { signull } for  pid=1716
7 comm="nova-cert" scontext=system_u:system_r:nova_cert_t:s0 tcontext=system_u:s
ystem_r:nova_network_t:s0 tclass=process

Version-Release number of selected component (if applicable):
openstack-nova-2012.1.1-15.fc17
openstack-nova-network-2012.1.1-15.fc17
openstack-nova-cert-2012.1.1-15.fc17
selinux-policy-3.10.0-146.fc17

How reproducible:
100% on a very fast machine in the lab. The timing may be different on other machines.

Steps to Reproduce:
1. systemctl start openstack-nova-network.service openstack-nova-cert.service
2. ...
3. Profit!
  
Actual results:
# systemctl status openstack-nova-network.serviceopenstack-nova-network.service - OpenStack Nova Network Server
	  Loaded: loaded (/usr/lib/systemd/system/openstack-nova-network.service; disabled)
	  Active: failed (Result: exit-code) since Sun, 16 Sep 2012 14:36:18 -0400; 7s ago
	 Process: 17166 ExecStart=/usr/bin/nova-network --config-file /etc/nova/nova.conf --logfile /var/log/nova/network.log (code=exited, status=1/FAILURE)
	  CGroup: name=systemd:/system/openstack-nova-network.service

Sep 16 14:36:18 mrg-04.mpc.lab.eng.bos.redhat.com nova-network[17166]: File "...
Sep 16 14:36:18 mrg-04.mpc.lab.eng.bos.redhat.com nova-network[17166]: iptabl...
Sep 16 14:36:18 mrg-04.mpc.lab.eng.bos.redhat.com nova-network[17166]: File "...
Sep 16 14:36:18 mrg-04.mpc.lab.eng.bos.redhat.com nova-network[17166]: retval...
Sep 16 14:36:18 mrg-04.mpc.lab.eng.bos.redhat.com nova-network[17166]: File "...
Sep 16 14:36:18 mrg-04.mpc.lab.eng.bos.redhat.com nova-network[17166]: self.r...
Sep 16 14:36:18 mrg-04.mpc.lab.eng.bos.redhat.com nova-network[17166]: File "...
Sep 16 14:36:18 mrg-04.mpc.lab.eng.bos.redhat.com nova-network[17166]: raise ...
Sep 16 14:36:18 mrg-04.mpc.lab.eng.bos.redhat.com nova-network[17166]: NotLocked
Sep 16 14:36:18 mrg-04.mpc.lab.eng.bos.redhat.com nova-network[17166]: 2012-0...

Expected results:
# systemctl status openstack-nova-network.serviceopenstack-nova-network.service - OpenStack Nova Network Server
	  Loaded: loaded (/usr/lib/systemd/system/openstack-nova-network.service; disabled)
	  Active: active (running) since Sun, 16 Sep 2012 14:39:46 -0400; 3s ago
	Main PID: 17224 (nova-network)
	  CGroup: name=systemd:/system/openstack-nova-network.service
		  └ 17224 /usr/bin/python /usr/bin/nova-network --config-fil...

Sep 16 14:39:46 mrg-04.mpc.lab.eng.bos.redhat.com sudo[17241]: nova : TTY=unk...
Sep 16 14:39:46 mrg-04.mpc.lab.eng.bos.redhat.com sudo[17244]: nova : TTY=unk...
Sep 16 14:39:47 mrg-04.mpc.lab.eng.bos.redhat.com sudo[17247]: nova : TTY=unk...
Sep 16 14:39:47 mrg-04.mpc.lab.eng.bos.redhat.com sudo[17250]: nova : TTY=unk...
Sep 16 14:39:47 mrg-04.mpc.lab.eng.bos.redhat.com sudo[17253]: nova : TTY=unk...
Sep 16 14:39:47 mrg-04.mpc.lab.eng.bos.redhat.com sudo[17256]: nova : TTY=unk...
Sep 16 14:39:47 mrg-04.mpc.lab.eng.bos.redhat.com sudo[17259]: nova : TTY=unk...
Sep 16 14:39:47 mrg-04.mpc.lab.eng.bos.redhat.com sudo[17262]: nova : TTY=unk...
Sep 16 14:39:47 mrg-04.mpc.lab.eng.bos.redhat.com sudo[17265]: nova : TTY=unk...
Sep 16 14:39:47 mrg-04.mpc.lab.eng.bos.redhat.com nova-network[17224]: 2012-0...

Additional info:
It seems odd that SELinux denies permission to do kill(pid, 0), since this does not actually send a signal. However, another potential fix would be to check the errno of the OSError exception raised in nova.utils.cleanup_file_locks(), rather than just assuming that an exception means the process is not running.

Comment 1 Pádraig Brady 2012-09-17 12:31:09 UTC
Thanks for the excellent analysis Zane.
Note we got rid of cleanup_file_locks completely in
the OpenStack Folsom release.

It's tempting to backport that patch,
but I've proposed a simpler fix at:
https://review.openstack.org/13106

thanks!

Comment 2 Pádraig Brady 2012-09-27 08:52:23 UTC
This is now upstream and should be released very soon for F17

Comment 3 Fedora End Of Life 2013-07-04 07:41:21 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like  to see this bug fixed and are able to reproduce it 
against a later version  of Fedora, you are encouraged  change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Fedora End Of Life 2013-08-01 17:57:49 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.