Bug 971082

Summary: Instance boot fails when security groups enabled and no network created
Product: Red Hat OpenStack
Reporter: Daniel Berrangé <berrange>
Component: openstack-nova
Assignee: Brent Eagles <beagles>
Status: CLOSED ERRATA
QA Contact: Ami Jeain <ajeain>
Severity: high
Docs Contact:
Priority: high
Version: unspecified
CC: afazekas, apevec, beagles, bperkins, chrisw, dallan, jkt, jliberma, jturner, mlopes, ndipanov, sclewis, xqueralt, yeylon
Target Milestone: Upstream M3   
Target Release: 4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-nova-2013.2-0.23.rc1.el6ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, security group checks run against instances without networks would result in 0 matches and rejection. As a result, booting an instance would fail if there were no configured networks. With this update, security group checks do not run if there are no configured networks, and it is now possible to create instances without networks.
Story Points: ---
Clone Of:
: 981028
Environment:
Last Closed: 2013-12-20 00:04:43 UTC
Type: Bug
Bug Depends On:    
Bug Blocks: 981028    

Description Daniel Berrangé 2013-06-05 15:53:24 UTC
Description of problem:
I provisioned a brand new RHEL-6.4 server and installed OpenStack using 'packstack --allinone'.

After uploading an image to glance, I am unable to start a VM. Nova compute logs an error:

2013-06-05 16:38:28.815 ERROR nova.compute.manager [req-8cd60823-a03a-40b0-9b8e-89ca1cbfb5d7 71418ad737d6448dbfbf10bbbd00c703 556b397dac274d52b143895000d6ac6d] [instance: 26d34be9-df57-4902-ad7c-1ee001ea321a] Instance failed network setup
2013-06-05 16:38:28.815 26078 TRACE nova.compute.manager [instance: 26d34be9-df57-4902-ad7c-1ee001ea321a] Traceback (most recent call last):
2013-06-05 16:38:28.815 26078 TRACE nova.compute.manager [instance: 26d34be9-df57-4902-ad7c-1ee001ea321a]   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1071, in _allocate_network
2013-06-05 16:38:28.815 26078 TRACE nova.compute.manager [instance: 26d34be9-df57-4902-ad7c-1ee001ea321a]     security_groups=security_groups)
2013-06-05 16:38:28.815 26078 TRACE nova.compute.manager [instance: 26d34be9-df57-4902-ad7c-1ee001ea321a]   File "/usr/lib/python2.6/site-packages/nova/network/api.py", line 46, in wrapper
2013-06-05 16:38:28.815 26078 TRACE nova.compute.manager [instance: 26d34be9-df57-4902-ad7c-1ee001ea321a]     res = f(self, context, *args, **kwargs)
2013-06-05 16:38:28.815 26078 TRACE nova.compute.manager [instance: 26d34be9-df57-4902-ad7c-1ee001ea321a]   File "/usr/lib/python2.6/site-packages/nova/network/quantumv2/api.py", line 213, in allocate_for_instance
2013-06-05 16:38:28.815 26078 TRACE nova.compute.manager [instance: 26d34be9-df57-4902-ad7c-1ee001ea321a]     security_group_id=security_group)
2013-06-05 16:38:28.815 26078 TRACE nova.compute.manager [instance: 26d34be9-df57-4902-ad7c-1ee001ea321a] SecurityGroupNotFound: Security group default not found.
2013-06-05 16:38:28.815 26078 TRACE nova.compute.manager [instance: 26d34be9-df57-4902-ad7c-1ee001ea321a] 
2013-06-05 16:38:29.008 AUDIT nova.compute.manager [req-8cd60823-a03a-40b0-9b8e-89ca1cbfb5d7 71418ad737d6448dbfbf10bbbd00c703 556b397dac274d52b143895000d6ac6d] [instance: 26d34be9-df57-4902-ad7c-1ee001ea321a] Terminating instance

The quantum CLI lists zero security groups:

# quantum security-group-list 

but strangely I am unable to create a group called 'default'

# quantum security-group-create default
Default security group already exists.


If I create a group with a different name, it works:

# quantum security-group-create wibble
Created a new security_group:
...snip...


and magically the 'default' security group is now visible too

# quantum security-group-list
+--------------------------------------+---------+-------------+
| id                                   | name    | description |
+--------------------------------------+---------+-------------+
| ad82dd22-727d-4a2e-b45b-3ad073616a39 | default | default     |
| e62f8d30-7554-48aa-8da3-a69949ec3b81 | wibble  |             |
+--------------------------------------+---------+-------------+


So for some reason the 'default' security group exists, but it is not visible or accessible to nova or the CLI tools until a second security group is created.

Version-Release number of selected component (if applicable):
openstack-packstack-2013.1.1-0.8.dev601.el6ost.noarch
openstack-nova-compute-2013.1.1-4.el6ost.noarch
openstack-quantum-2013.1.1-10.el6ost.noarch


How reproducible:
Only attempted once

Steps to Reproduce:
1. Provision bare RHEL 6.4 server
2. Add openstack yum repo
3. Install packstack
4. Run  packstack --allinone
5. Upload an image to glance
6. Boot a VM from the image

Actual results:
VM fails to boot with missing 'default' security group

Expected results:
VM boots

Additional info:

Comment 2 Alan Pevec 2013-06-05 17:39:56 UTC
Reproduced on a fresh install, after reboot:

DEBUG: quantumclient.client 
REQ: curl -i http://192.168.129.3:9696/v2.0/security-groups.json -X GET -H "User-Agent: python-quantumclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: c44d9611508b435ea4f570253eeb27bf"

DEBUG: quantumclient.client RESP:{'date': 'Wed, 05 Jun 2013 17:37:59 GMT', 'status': '200', 'content-length': '23', 'content-type': 'application/json; charset=UTF-8', 'content-location': u'http://192.168.129.3:9696/v2.0/security-groups.json'} {"security_groups": []}
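
For anyone re-checking this, the response body above can be parsed directly. This is only a small sketch: the body string is copied from the debug output, and the variable names are made up for illustration.

```python
import json

# Response body copied verbatim from the quantumclient debug output above.
body = '{"security_groups": []}'

# Parse the JSON and pull out the list of security groups.
groups = json.loads(body)["security_groups"]

# The body length agrees with the 'content-length': '23' header,
# and the tenant really has no security groups at this point.
print(len(body), groups)
```

So the API is returning a well-formed, genuinely empty list rather than an error or a truncated reply.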

Comment 3 Alan Pevec 2013-06-05 17:48:53 UTC
In ovs_quantum db:
mysql> select * from securitygroups;
Empty set (0.00 sec)

When is the default group created then?

Comment 4 Chris Wright 2013-06-05 18:09:57 UTC
Alan answered the following questions:

- nova.conf security_group_api = quantum (not nova)
- nova secgroup-list returns an empty list, like quantum security-group-list
- the db is truly empty (no default security group)

Some relevant code snippets:

quantum/plugins/openvswitch/ovs_quantum_plugin.py::create_network()
...
        self._ensure_default_security_group(context, tenant_id)

The default security group is created lazily.

And in response to #c1, Dan, the default security group is also built when a new security group is created:

quantum/db/securitygroups_db.py::create_security_group()
...
        if not default_sg:
            self._ensure_default_security_group(context, tenant_id)

So I believe the issue is at launch time, in nova.  Nova should not look for a security group if there is no network.
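
To make the lazy behaviour concrete, here is a toy model of it, together with the kind of guard suggested above. It is a sketch under assumptions, not Quantum or Nova code: ToyQuantum and groups_to_apply are hypothetical names, and only the method name _ensure_default_security_group is borrowed from the snippets above (its body here is invented).

```python
class ToyQuantum:
    """Toy stand-in for the lazy default-group behaviour; not real Quantum code."""

    def __init__(self):
        self.groups = {}  # tenant_id -> list of security group names

    def _ensure_default_security_group(self, tenant_id):
        # Create 'default' for the tenant only if it is not there yet.
        names = self.groups.setdefault(tenant_id, [])
        if "default" not in names:
            names.append("default")

    def create_network(self, tenant_id, name):
        # Creating a network is one of the two triggers for the default group.
        self._ensure_default_security_group(tenant_id)
        return name

    def create_security_group(self, tenant_id, name):
        # Creating any other group is the second trigger.
        self._ensure_default_security_group(tenant_id)
        self.groups[tenant_id].append(name)

    def list_security_groups(self, tenant_id):
        # Listing never creates anything, so a fresh tenant sees an empty list.
        return list(self.groups.get(tenant_id, []))


def groups_to_apply(quantum, tenant_id, networks, requested=("default",)):
    """Hypothetical nova-side guard: skip the lookup when there is no network."""
    if not networks:
        return []
    available = quantum.list_security_groups(tenant_id)
    for name in requested:
        if name not in available:
            raise LookupError("Security group %s not found." % name)
    return list(requested)


q = ToyQuantum()
print(q.list_security_groups("t1"))    # fresh tenant: nothing exists yet
print(groups_to_apply(q, "t1", []))    # no network, so no lookup and no error
q.create_security_group("t1", "wibble")
print(q.list_security_groups("t1"))    # 'default' appears alongside 'wibble'
```

In this model the failure in comment 0 corresponds to looking up 'default' for a tenant that has triggered neither creation path; the guard avoids the lookup entirely when the instance has no network.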

Comment 5 Alan Pevec 2013-06-05 20:27:38 UTC
(In reply to Chris Wright from comment #4)
> So I believe the issue is at launch time, in nova.  Nova should not look for
> a security group if there is no network.

Should this be moved to openstack-nova then?

Comment 6 Alan Pevec 2013-06-05 21:19:17 UTC
(In reply to Daniel Berrange from comment #0)
> After uploading an image to glance, I am unable to start a VM. Nova compute
> logs an error

What were your steps exactly?
Docs[1] actually assume a network is created first, and quantum net-create would trigger creation of the default security group.

[1] http://docs.openstack.org/trunk/openstack-network/admin/content/basic_workflow_with_nova.html

Comment 7 Daniel Berrangé 2013-06-06 07:38:26 UTC
The exact set of commands was:

# packstack --allinone
# . keystonerc_admin 
# glance image-create --name f17 --disk-format qcow2 --container-format bare --file /root/f17-x86_64-openstack-sda.qcow2 --is-public True
# nova boot --flavor m1.tiny --image f17 f17demo1

I did not attempt to create any network, nor specify any --nic when booting the guest.

Comment 8 Gary Kotton 2013-06-06 15:25:26 UTC
The Quantum security groups are for quantum ports. A port can only be part of a Quantum network. 
When a nic is added there should be a security group attached to the VM.
Can you please clarify why this is a problem?

Comment 9 Daniel Berrangé 2013-06-06 15:29:25 UTC
(In reply to Gary Kotton from comment #8)
> The Quantum security groups are for quantum ports. A port can only be part
> of a Quantum network. 
> When a nic is added there should be a security group attached to the VM.
> Can you please clarify why this is a problem?

The above commands & stack trace show the problem.  Booting an instance with the above sequence of commands shouldn't result in a stack trace about a missing security group.

Comment 10 Stephen Gordon 2013-06-07 18:16:59 UTC
*** Bug 967283 has been marked as a duplicate of this bug. ***

Comment 11 Perry Myers 2013-06-08 12:50:39 UTC
Reading through the thread, and based on the analysis by cdub in comment #4, this looks like an issue in openstack-nova.

Moving to that component and assigning to a nova engineer to look at it more closely.

In short...  If a quantum network is created and you launch an instance on that network, the default security group should be created lazily.

But if you launch an instance without a network, the default security group is not created by quantum.  In this case, nova should probably either not launch the VM at all (because what use is a VM without a network?) or just skip associating the VM with a security group completely.

Comment 12 Perry Myers 2013-06-08 12:52:27 UTC
Also, if this only happens when launching an instance before a quantum network is created, I think it can be easily worked around by telling users "prior to launching an instance, please create a quantum network for the instance to use".  This can be part of the user's guide.

If that workaround is accurate, this is not a RHOS 3.0 blocker and we can push it off.  I'd like confirmation from cdub and danpb before we do that though.

Comment 13 jliberma@redhat.com 2013-06-10 13:48:18 UTC
I don't think the proposed workaround is sufficient.

I also see this error when booting an instance in a second tenant.

I can reproduce this error by:

1. installing packstack
2. creating two tenants, each with a user and role
3. importing an image into glance
4. as the first tenant user, creating a network and booting an instance (works, no errors)
5. as the second tenant user, creating a network and booting an instance --> FAILS with "Security group default not found."

Comment 14 Perry Myers 2013-06-10 13:54:49 UTC
> 1. installing packstack
> 2. create two tenants, each with a user and role
> 3. import an image into glance
> 4. as first tenant user, create a network and boot an instance (works no
> errors)
> 5. as second tenant user, create a network and boot an instance --> FAILs
> with "Security group default not found."

OK, so the lazy creation of the default security group is only happening for the first quantum network or tenant, and not for the second.  That makes this bug a bit higher in severity.

Comment 16 Brent Eagles 2013-06-11 16:12:39 UTC
Informational note:

The check that is causing the issue is called from _validate_and_provision_instance in compute/api.py. If no security group or default security group is specified, the code throws this exception simply because it checks for the security group before it does the network check. So the lazy initialization makes the order of checks a little more fragile.

Personal observations:
I'm mulling this one over... a hack might be to "eat the exception" if the security group is default. Sure does feel like a hack. You could also reverse the checks. Would that be bad? I don't think security groups are meant to affect the ability to check for the existence of a network. As long as it does not violate the notion of security group checks in this context, that's preferable to swallowing the exception.

Comment 17 Brent Eagles 2013-06-11 18:22:08 UTC
Comment 16 is not valid for this case: the API handles the "default" security group case, and the exception is thrown further on.

Comment 19 Brent Eagles 2013-06-14 18:17:47 UTC
I was thinking about my patch and got to wondering if it was actually a little silly. It still will throw an exception and the VM will not be created, but the error information in the nova show output will be "NoValidHost"... which I think is similar to what happens if there is no network available when using nova-network (not 100% sure but I vaguely recall something like that). While it is better than throwing a bogus exception, does this actually fix the bug in the minds of all?

Comment 24 Ami Jeain 2013-10-11 05:16:35 UTC
verified:
1. created an image
2. neutron net-list

[root@cougar14 ~(keystone_admin)]# 
3. booted that image (saw that there is only a default secgroup, and didn't create any network)

4. no error during the boot:
# nova boot --flavor m1.tiny --image f18 f18demo1
+--------------------------------------+--------------------------------------+
| Property                             | Value                                |
+--------------------------------------+--------------------------------------+
| OS-EXT-STS:task_state                | scheduling                           |
| image                                | f18                                  |
| OS-EXT-STS:vm_state                  | building                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000001                    |
| OS-SRV-USG:launched_at               | None                                 |
| flavor                               | m1.tiny                              |
| id                                   | 9e7e0929-e02e-429c-8125-445b4d0556a4 |
| security_groups                      | [{u'name': u'default'}]              |
| user_id                              | c6311b5310ce4306831c4673b79a452a     |
| OS-DCF:diskConfig                    | MANUAL                               |
| accessIPv4                           |                                      |
| accessIPv6                           |                                      |
| progress                             | 0                                    |
| OS-EXT-STS:power_state               | 0                                    |
| OS-EXT-AZ:availability_zone          | nova                                 |
| config_drive                         |                                      |
| status                               | BUILD                                |
| updated                              | 2013-10-11T05:13:52Z                 |
| hostId                               |                                      |
| OS-EXT-SRV-ATTR:host                 | None                                 |
| OS-SRV-USG:terminated_at             | None                                 |
| key_name                             | None                                 |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | None                                 |
| name                                 | f18demo1                             |
| adminPass                            | aiDMWTLn8Psa                         |
| tenant_id                            | a20e6029a126406ca896fbdf0ac7a353     |
| created                              | 2013-10-11T05:13:51Z                 |
| os-extended-volumes:volumes_attached | []                                   |
| metadata                             | {}                                   |
+--------------------------------------+--------------------------------------+

Comment 27 errata-xmlrpc 2013-12-20 00:04:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html