Bug 1049985 - RDO: Instance hangs in 'Deleting' state ... multi node (using GRE tenant networks)
Summary: RDO: Instance hangs in 'Deleting' state ... multi node (using GRE tenant networks)
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: RDO
Classification: Community
Component: openstack-nova
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
: Icehouse
Assignee: RHOS Maint
QA Contact: Gabriel Szasz
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-01-08 14:57 UTC by Ronelle Landy
Modified: 2016-04-26 18:06 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-20 15:59:31 UTC
Embargoed:


Attachments
api.log (18.59 MB, text/x-log), attached 2014-01-09 09:15 UTC by Ronelle Landy
compute.log (18.61 MB, application/x-tar), attached 2014-01-09 09:20 UTC by Ronelle Landy

Description Ronelle Landy 2014-01-08 14:57:48 UTC
Description of problem:

Found while running test setups on RDO Icehouse Test day ...

On Fedora 20, I followed these steps: http://openstack.redhat.com/Using_GRE_Tenant_Networks and used packstack to install RDO. I then ran the steps in 'Next Steps' (http://openstack.redhat.com/Quickstart#Step_2:_Install_Packstack_Installer) to launch an instance, configure a floating IP range, and add a second compute node.

After that, with the instance still running, I ran the steps listed at http://openstack.redhat.com/Using_VXLAN_Tenant_Networks to test out VXLAN tunnels. During these steps, I needed to delete and recreate the network. I tried to delete the network and got an error on delete. I assumed the error might be due to the running instance still being associated with the network. I then tried to delete the running instance from Horizon. The instance is just stuck in the 'Deleting' state. Issuing another delete does not help.

This error shows up repeatedly in /var/log/nova/compute.log:

leted. (/builddir/build/BUILD/qpid-0.24/cpp/src/qpid/broker/Queue.cpp:1499)(408)
2014-01-08 09:12:18.095 14695 TRACE root SessionError: Queue compute has been deleted. (/builddir/build/BUILD/qpid-0.24/cpp/src/qpid/broker/Queue.cpp:1499)(408)
2014-01-08 09:13:19.115 14695 TRACE root SessionError: Queue compute has been deleted. (/builddir/build/BUILD/qpid-0.24/cpp/src/qpid/broker/Queue.cpp:1499)(408)

A similar report was logged here: http://openstack.redhat.com/forum/discussion/comment/1702

Version-Release number of selected component (if applicable):

>> rpm -qa |grep openstack
openstack-nova-compute-2014.1-0.5.b1.fc21.noarch
openstack-nova-novncproxy-2014.1-0.5.b1.fc21.noarch
openstack-ceilometer-alarm-2014.1-0.2.b1.fc21.noarch
openstack-keystone-2014.1-0.2.b1.fc21.noarch
openstack-utils-2013.2-2.fc20.noarch
openstack-ceilometer-common-2014.1-0.2.b1.fc21.noarch
openstack-packstack-2013.2.1-0.27.dev936.fc21.noarch
openstack-nova-common-2014.1-0.5.b1.fc21.noarch
openstack-nova-cert-2014.1-0.5.b1.fc21.noarch
openstack-ceilometer-central-2014.1-0.2.b1.fc21.noarch
openstack-ceilometer-compute-2014.1-0.2.b1.fc21.noarch
openstack-neutron-2014.1-0.1.b1.fc21.noarch
openstack-ceilometer-collector-2014.1-0.2.b1.fc21.noarch
python-django-openstack-auth-1.1.3-1.fc20.noarch
openstack-nova-console-2014.1-0.5.b1.fc21.noarch
openstack-ceilometer-api-2014.1-0.2.b1.fc21.noarch
openstack-dashboard-2014.1-0.1b1.fc21.noarch
openstack-glance-2014.1-0.1.b1.fc21.noarch
openstack-cinder-2014.1-0.2.b1.fc21.noarch
openstack-nova-conductor-2014.1-0.5.b1.fc21.noarch
openstack-nova-scheduler-2014.1-0.5.b1.fc21.noarch
openstack-neutron-openvswitch-2014.1-0.1.b1.fc21.noarch
openstack-nova-api-2014.1-0.5.b1.fc21.noarch

How reproducible:

At least twice so far


Steps to Reproduce:
1. Follow these steps: http://openstack.redhat.com/Using_GRE_Tenant_Networks and use packstack to install RDO.
2. Create an instance, add a network, and add a second compute node (as per the description and links above).
3. (Unsure if the VXLAN step above is even required.)
4. Try to delete the network via Horizon.
5. If an error is returned, try to delete the running instance (see the CLI sketch below).
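
For reference, steps 4 and 5 map to roughly the following CLI calls (a hedged sketch using the Icehouse-era neutron/nova clients; the network and instance names are placeholders, not taken from this report):

  # Deleting the tenant network is expected to fail while an instance
  # still has a port on it.
  $ neutron net-delete private

  # Delete the instance attached to that network, then watch its state.
  $ nova delete test-instance
  $ nova list    # in this bug the instance stays in the 'Deleting' state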

Actual results:

The instance hangs in 'Deleting' state

Expected results:

The instance, and then the network, should be deleted

Additional info:

Comment 1 Ronelle Landy 2014-01-08 16:02:11 UTC
Additional note:

The instance was created with: m1.medium | 4GB RAM | 2 VCPU | 40.0GB Disk.

There was a second report from a test day tester who was not able to delete an m1.medium instance (m1.tiny worked).
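
To compare the two flavors directly, something like the following should exercise both cases (a sketch only; the image and instance names are placeholders):

  $ nova boot --flavor m1.tiny   --image <some-image> tiny-test
  $ nova boot --flavor m1.medium --image <some-image> medium-test
  $ nova delete tiny-test medium-test
  $ nova list    # per the reports, only the m1.medium instance hangs in 'Deleting'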

Comment 2 Kashyap Chamarthy 2014-01-09 07:52:08 UTC
Ronelle, can you please also attach the Nova api.log from the controller node and the Nova compute.log from your compute node?

Comment 3 Ronelle Landy 2014-01-09 09:15:09 UTC
Created attachment 847541 [details]
api.log

Comment 4 Ronelle Landy 2014-01-09 09:20:52 UTC
Created attachment 847543 [details]
compute.log

Comment 5 Kashyap Chamarthy 2014-09-01 13:19:27 UTC
A gentle note for next time: please upload contextual plain text logs when you're hitting the errors (instead of more than 15 MB of log files). Something like:

  (a) Empty out the relevant log files, e.g. $ > /var/log/nova/api.log
  (b) Perform the offending test
  (c) Upload the (plain text) log files, which would capture 
      just enough context.
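
A minimal shell sketch of that workflow (log paths as used elsewhere in this bug; run the truncation as root or via sudo):

  # (a) truncate the relevant logs so they only capture the failing run
  $ > /var/log/nova/api.log          # on the controller node
  $ > /var/log/nova/compute.log      # on the compute node

  # (b) perform the offending test (delete the network, then the instance)

  # (c) attach the now-small, plain text log files to the bug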


That said, I took a look at the api.log, and I see the following:
-----------------
2014-01-08 14:34:53.860 3534 DEBUG keystoneclient.middleware.auth_token [-] Token validation failure. _validate_user_token /usr/lib/python2.7/site-packages/keystoneclient/middleware/auth_token.py:820
2014-01-08 14:34:53.860 3534 TRACE keystoneclient.middleware.auth_token Traceback (most recent call last):
2014-01-08 14:34:53.860 3534 TRACE keystoneclient.middleware.auth_token   File "/usr/lib/python2.7/site-packages/keystoneclient/middleware/auth_token.py", line 812, in _validate_user_token
2014-01-08 14:34:53.860 3534 TRACE keystoneclient.middleware.auth_token     expires = confirm_token_not_expired(data)
2014-01-08 14:34:53.860 3534 TRACE keystoneclient.middleware.auth_token   File "/usr/lib/python2.7/site-packages/keystoneclient/middleware/auth_token.py", line 333, in confirm_token_not_expired
2014-01-08 14:34:53.860 3534 TRACE keystoneclient.middleware.auth_token     raise InvalidUserToken('Token authorization failed')
2014-01-08 14:34:53.860 3534 TRACE keystoneclient.middleware.auth_token InvalidUserToken: Token authorization failed
2014-01-08 14:34:53.860 3534 TRACE keystoneclient.middleware.auth_token 
2014-01-08 14:34:53.861 3534 DEBUG keystoneclient.middleware.auth_token [-] Marking token 6d5b434481287a617655c46c5b7f7c7d as unauthorized in memcache _cache_store_invalid /usr/lib/python2.7/site-packages/keystoneclient/middleware/auth_token.py:1068
2014-01-08 14:34:53.861 3534 WARNING keystoneclient.middleware.auth_token [-] Authorization failed for token 6d5b434481287a617655c46c5b7f7c7d
2014-01-08 14:34:53.861 3534 INFO keystoneclient.middleware.auth_token [-] Invalid user token - rejecting request
2014-01-08 14:34:53.862 3534 INFO nova.osapi_compute.wsgi.server [req-aa58d3d9-6311-4503-adc5-f8ef31a1f88d 434c50d000f4418ea67c6e5a36681e84 b98a2045e57f4160946583856f762ae4] 10.16.96.113 "GET /v2/b98a2045e57f4160946583856f762ae4/servers/detail?host=cloud-qe-2.idm.lab.bos.redhat.com&all_tenants=True HTTP/1.1" status: 401 len: 195 time: 0.0410540
-----------------

But I couldn't extract your compute.tar: when I untarred it, the resulting compute.log also turned out to be a tarball, and when I tried to untar that again, it looked corrupted. (A nice demonstration of why plain text files with contextual logs are more helpful.)

* * *

And, it's been more than six months since this was tested with Icehouse milestone-1 packages. I wonder if you're still seeing this behavior with the current stable Icehouse packages on Fedora? (latest stable update packages: 2014.1.2)
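
For anyone re-testing, the installed Nova packages can be checked with the same rpm query used above, and the stable updates pulled in with yum (the package glob is an assumption; adjust to your repository setup):

  $ rpm -qa | grep openstack-nova
  $ yum update 'openstack-*'    # 2014.1.2 was the current stable Icehouse update at the time of this comment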

Comment 6 Kashyap Chamarthy 2015-02-18 10:42:49 UTC
[Ping, bug triaging here. It's been in NEEDINFO for 5 months. If there's no response from reporter in two weeks, this bug will be closed with INSUFFICIENT_DATA. But, the reporter can reopen the bug if the bug is reproducible again.]

Comment 7 Ronelle Landy 2015-08-10 14:24:16 UTC
(In reply to Kashyap Chamarthy from comment #6)
> [Ping, bug triaging here. It's been in NEEDINFO for 5 months. If there's no
> response from reporter in two weeks, this bug will be closed with
> INSUFFICIENT_DATA. But, the reporter can reopen the bug if the bug is
> reproducible again.]

This BZ was closed correctly. Apologies for the late reply. This works in the current release.

