Bug 1267013 - Instances stuck in build and deleting state
Status: CLOSED NOTABUG
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 6.0 (Juno)
Assigned To: Sylvain Bauza
QA Contact: nlevinki
Keywords: Unconfirmed, ZStream
Depends On:
Blocks:
Reported: 2015-09-28 15:47 EDT by Jeremy
Modified: 2015-10-27 16:40 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-27 16:40:33 EDT
Type: Bug


Description Jeremy 2015-09-28 15:47:01 EDT
Description of problem:
Instances stuck in build and deleting state


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
 [root@osp1-controller01 ~(openstack_admin)]# nova show a1b32b1b-7cff-43c9-9ace-9070e815f6b6                                                    
+--------------------------------------+--------------------------------------------------------------+
| Property                             | Value                                                        |
+--------------------------------------+--------------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                         |
| OS-EXT-AZ:availability_zone          | nova                                                         |
| OS-EXT-SRV-ATTR:host                 | osp1-compute04.osp.poc                                       |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | osp1-compute04.osp.poc                                       |
| OS-EXT-SRV-ATTR:instance_name        | instance-000063fc                                            |
| OS-EXT-STS:power_state               | 1                                                            |
| OS-EXT-STS:task_state                | deleting                                                     |
| OS-EXT-STS:vm_state                  | deleted                                                      |
| OS-SRV-USG:launched_at               | 2015-09-28T12:34:10.000000                                   |
| OS-SRV-USG:terminated_at             | 2015-09-28T15:34:49.000000                                   |
| accessIPv4                           |                                                              |
| accessIPv6                           |                                                              |
| config_drive                         |                                                              |
| created                              | 2015-09-28T12:33:39Z                                         |
| flavor                               | m1.small (2)                                                 |
| heanet-default network               | 192.168.0.215                                                |
| hostId                               | 2bbb88190c64b3f52ab02591cd9c7ac7cf589412dc6f4f00382f85a4     |
| id                                   | a1b32b1b-7cff-43c9-9ace-9070e815f6b6                         |
| image                                | au-file-download-test (d9369a32-4eb9-4758-8379-7b42ff27391a) |
| key_name                             | au-default                                                   |
| metadata                             | {}                                                           |
| name                                 | au-kernel-test-a1b32b1b-7cff-43c9-9ace-9070e815f6b6          |
| os-extended-volumes:volumes_attached | []                                                           |
| status                               | DELETED                                                      |
| tenant_id                            | dbc9c4efbd364687bb6e8ab4fe7b2fd2                             |
| updated                              | 2015-09-28T15:35:35Z                                         |
| user_id                              | 1a25f8911ce048c1afc855a15c7cdcf5                             |
+--------------------------------------+--------------------------------------------------------------+

Expected results:


Additional info:
Comment 9 John Eckersberg 2015-10-01 10:22:04 EDT
Check the output from...

# cat /proc/sys/net/ipv4/tcp_keepalive_*

and

# grep keepalive /etc/rabbitmq/rabbitmq.config

I'm assuming TCP keepalives are not enabled and the connections are just timing out due to inactivity.
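
To put numbers on that check, here is a minimal sketch (not part of the original report) that reads the same three kernel tunables and estimates how long an idle connection to a dead peer would linger. The helper names `read_keepalive_settings` and `dead_peer_detection_seconds` are made up for illustration, and the estimate assumes the usual Linux behaviour: the first probe is sent after `tcp_keepalive_time` idle seconds, then `tcp_keepalive_probes` probes follow at `tcp_keepalive_intvl` second intervals.

```python
from pathlib import Path

# Standard Linux location of the kernel-wide TCP keepalive tunables.
PROC_DIR = Path("/proc/sys/net/ipv4")
NAMES = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive_settings(base=PROC_DIR):
    """Return the three keepalive tunables as a dict of ints (hypothetical helper)."""
    return {name: int((Path(base) / name).read_text().strip()) for name in NAMES}


def dead_peer_detection_seconds(settings):
    """Rough time before an idle connection to a dead peer is dropped."""
    return (settings["tcp_keepalive_time"]
            + settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"])


if __name__ == "__main__":
    try:
        current = read_keepalive_settings()
    except FileNotFoundError:
        print("keepalive tunables not found (not a Linux host?)")
    else:
        for name, value in current.items():
            print(f"{name} = {value}")
        print("dead peer detected after about",
              dead_peer_detection_seconds(current), "seconds")
```

With the stock Linux values (7200, 75, 9) this comes to 7875 seconds, a little over two hours, so any firewall or load balancer with a shorter idle timeout would silently drop the connection first. These kernel values only apply to sockets that have SO_KEEPALIVE enabled, which is why the rabbitmq.config check matters as well.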
Comment 19 Andrew Beekhof 2015-10-04 20:07:35 EDT
Pacemaker's not doing jack.

The new logs start at Sep 29 17:18:59 and we don't do anything until one of the galera instances fails almost a day later (which is well after the connection errors discussed in comment #17):

Sep 30 16:39:12 osp1-controller01 pengine[5416]: warning: unpack_rsc_op_failure: Processing failed op monitor for galera:1 on osp1-controller01: not running (7)
Sep 30 16:39:12 osp1-controller01 pengine[5416]: notice: LogActions: Recover galera:1	(Master osp1-controller01)

What makes you think pacemaker is bouncing resources around?
Comment 27 Dave Maley 2015-10-27 16:40:33 EDT
As we have not heard anything back from the customer for nearly 3 weeks, I'm closing this bug. Please re-open if the customer comes back with the requested information.
