Bug 1751595 - Unable to connect to AMQP server on HOSTNAME:5672 after None tries: 'NoneType' object has no attribute '__getitem__'
Summary: Unable to connect to AMQP server on HOSTNAME:5672 after None tries: 'NoneType...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-amqp
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: z8
Target Release: 14.0 (Rocky)
Assignee: RHOS Maint
QA Contact: pkomarov
URL:
Whiteboard:
Depends On: 1747226
Blocks:
Reported: 2019-09-12 08:34 UTC by Hervé Beraud
Modified: 2020-12-21 19:34 UTC (History)
CC List: 15 users

Fixed In Version: python-amqp-2.3.2-5.el7ost
Doc Type: No Doc Update
Doc Text:
Clone Of: 1747226
Environment:
Last Closed: 2019-11-06 16:53:57 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1747226 0 urgent CLOSED Unable to connect to AMQP server on HOSTNAME:5672 after None tries: 'NoneType' object has no attribute '__getitem__' 2023-09-18 00:17:15 UTC
Red Hat Product Errata RHBA-2019:3747 0 None None None 2019-11-06 16:54:14 UTC

Description Hervé Beraud 2019-09-12 08:34:26 UTC
Description of problem:
While trying to create a large number of VMs, some of them get stuck in BUILD state. The only relevant error we can see comes from nova-conductor:

Unable to connect to AMQP server on HOSTNAME:5672 after None tries: 'NoneType' object has no attribute '__getitem__'

Restarting all Nova services seemed to resolve the issue. However, it recurred overnight, and we saw the same errors in heat-engine.log.
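For triage across several service logs, the error line quoted above can be matched with a regular expression. This is only a sketch: the pattern is derived from the single message shown in this report, and the function name `find_amqp_errors` is hypothetical, not part of any OpenStack tool.

```python
import re

# Pattern built from the one error message quoted in this report;
# other releases may word the message differently.
AMQP_ERROR = re.compile(
    r"Unable to connect to AMQP server on (?P<host>\S+):(?P<port>\d+) "
    r"after (?P<tries>\S+) tries: (?P<reason>.*)"
)


def find_amqp_errors(lines):
    """Return (host, port, reason) for every line carrying the error."""
    hits = []
    for line in lines:
        match = AMQP_ERROR.search(line)
        if match:
            hits.append(
                (match.group("host"), int(match.group("port")),
                 match.group("reason"))
            )
    return hits
```

Feeding it the lines of nova-conductor.log or heat-engine.log would list which broker endpoints were affected and with which underlying exception text.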

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 13.0.7 (Queens)
python2-oslo-messaging-5.35.4-1.el7ost.noarch

How reproducible:
Tough to reproduce.

Steps to Reproduce:
 Initially, before restarting Nova, we could reproduce it by running:

for i in {10..60}; do openstack server create --image a3806e23-1df5-40b0-baf3-f78ef7b1bf4c --nic net-id=54d8b69b-e8a0-4ce7-87ca-37b34fc368a2 --flavor VM1-flavor --availability-zone nova:compute-$i; done

Adding --wait to the command made it work much better, but we still observed one or two failures that way.

Actual results:
Some instances get stuck in BUILD state and never leave it.

Expected results:
Instances either complete or go to ERROR state.

Additional info:

After restarting all nova_* containers, we were able to execute:

for i in {10..100}; do openstack server create --image a3806e23-1df5-40b0-baf3-f78ef7b1bf4c --nic net-id=54d8b69b-e8a0-4ce7-87ca-37b34fc368a2 --flavor VM1-flavor; done

And have all instances build successfully.

Comment 1 Hervé Beraud 2019-09-12 08:43:13 UTC
This BZ is a clone of https://bugzilla.redhat.com/show_bug.cgi?id=1747226

I created this one so that the packages can be cross-tagged between OSP13 and OSP14, as is already the case for python-amqp.

For further reading about this bug, please see the comments on https://bugzilla.redhat.com/show_bug.cgi?id=1747226

My py-amqp fix is now merged [1].

python-amqp for OSP13 and OSP14 is cross-tagged, so I am duplicating this BZ to backport my fix via OSP14 and then cross-tag that version with OSP13 afterwards.

As the title says, this BZ addresses the python-amqp issue fixed by my patch [1]. For the SSL part, I already described how to hotfix it in my previous comments (cf. https://bugzilla.redhat.com/show_bug.cgi?id=1747226#c16 and https://bugzilla.redhat.com/show_bug.cgi?id=1733930#c29).

If you want to hotfix the customer environment with my py-amqp patch without bumping the package version, follow the instructions below.

How to hotfix the "NoneType __getitem__" part
=============================================

Apply the following patch to `/usr/lib/python2.7/site-packages/amqp/connection.py`:

```
500,502c500,504
<         return self.channels[channel_id].dispatch_method(
<             method_sig, payload, content,
<         )
---
>         if self.channels is not None:
>             return self.channels[channel_id].dispatch_method(
>                 method_sig, payload, content,
>             )
>         raise RecoverableConnectionError('Connection already closed')
```

After applying the patch your file should be similar to:

```
$ vi /usr/lib/python2.7/site-packages/amqp/connection.py +499
    def on_inbound_method(self, channel_id, method_sig, payload, content):
        if self.channels is not None:
            return self.channels[channel_id].dispatch_method(
                method_sig, payload, content,
            )
        raise RecoverableConnectionError('Connection already closed')
```
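To illustrate what the guard changes, here is a minimal self-contained sketch. It does not import py-amqp: `FakeConnection`, `FakeChannel`, the `close()` behavior, and the local `RecoverableConnectionError` class are all hypothetical stand-ins, and the assumption that the channel table becomes `None` after teardown is taken from the error message in the title rather than from the library source.

```python
class RecoverableConnectionError(Exception):
    """Local stand-in for the exception named in the patch."""


class FakeChannel(object):
    """Hypothetical channel that only reports what it was asked to dispatch."""

    def dispatch_method(self, method_sig, payload, content):
        return ('dispatched', method_sig)


class FakeConnection(object):
    """Hypothetical connection reduced to its channel table."""

    def __init__(self):
        self.channels = {1: FakeChannel()}

    def close(self):
        # Assumption for this sketch: teardown sets the table to None.
        self.channels = None

    def on_inbound_method_unpatched(self, channel_id, method_sig,
                                    payload, content):
        # Subscripting None raises TypeError once the table is gone.
        return self.channels[channel_id].dispatch_method(
            method_sig, payload, content,
        )

    def on_inbound_method(self, channel_id, method_sig, payload, content):
        # Same guard as the hotfix above.
        if self.channels is not None:
            return self.channels[channel_id].dispatch_method(
                method_sig, payload, content,
            )
        raise RecoverableConnectionError('Connection already closed')


def demo():
    """Return the outcome before close, then unpatched and patched after."""
    conn = FakeConnection()
    before = conn.on_inbound_method(1, (60, 60), None, None)
    conn.close()
    try:
        conn.on_inbound_method_unpatched(1, (60, 60), None, None)
        unpatched = None
    except TypeError as exc:
        unpatched = type(exc).__name__
    try:
        conn.on_inbound_method(1, (60, 60), None, None)
        patched = None
    except RecoverableConnectionError as exc:
        patched = str(exc)
    return before, unpatched, patched
```

The point of the design is that callers see a connection-level error they can treat as "reconnect and retry" instead of a bare `TypeError`; how the calling services actually react to that exception is not shown here.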

To apply this patch to all your containers, you can follow the same approach that I described in my comment https://bugzilla.redhat.com/show_bug.cgi?id=1747226#c16 (cf. the docker part, etc.).

I think you need to patch every service that uses python-amqp on the controllers and computes, since this kind of issue can occur after a network problem, so all of those services can be affected (cf. your logs for nova, heat, neutron, etc.).
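Since the hotfix has to be applied in many places, a quick check that a given copy of `connection.py` already carries the guard may help. The following is a rough sketch only: `has_channels_guard` is a hypothetical helper, and it does a plain text search, so it would miss a guard that is written differently from the one in this comment.

```python
def has_channels_guard(source_text):
    """Return True if on_inbound_method tests self.channels against None."""
    start = source_text.find('def on_inbound_method(')
    if start == -1:
        return False
    # Limit the search to this method: stop at the next method definition
    # at the same indentation, or at the end of the file.
    end = source_text.find('\n    def ', start + 1)
    body = source_text[start:end] if end != -1 else source_text[start:]
    return ('self.channels is not None' in body
            or 'self.channels is None' in body)
```

It can be run against the text of `/usr/lib/python2.7/site-packages/amqp/connection.py` read from each container; reading the file is left out so the function stays easy to test.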

[1] https://github.com/celery/py-amqp/pull/289

Comment 2 Hervé Beraud 2019-09-12 09:11:43 UTC
Fixed in version python-amqp-2.3.2-5.el7ost
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=23466544

Comment 6 pkomarov 2019-10-23 23:06:37 UTC
Verified.

(undercloud) [stack@undercloud-0 ~]$ rhos-release -L
Installed repositories (rhel-7.7):
  14
  ceph-3
  ceph-osd-3
  rhel-7.7
(undercloud) [stack@undercloud-0 ~]$ cat core_puddle_version
2019-10-21.1
(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep amqp
python2-amqp-2.3.2-5.el7ost.noarch

More than 60 VMs were created, the computes were hard rebooted while
the VMs were being created, and the environment did not show the bug's symptoms:

(overcloud) [stack@undercloud-0 ~]$ ansible controller -mshell -b -a'grep -ir NoneType /var/log/containers/nova/||echo "no NoneType errors found in nova"'
 [WARNING]: Found both group and host with same name: undercloud

controller-1 | SUCCESS | rc=0 >>
no NoneType errors found in nova

controller-2 | SUCCESS | rc=0 >>
no NoneType errors found in nova

controller-0 | SUCCESS | rc=0 >>
no NoneType errors found in nova

# Test instance creation with disruptions
$ for i in {1..86}; do openstack server create --image cirros --flavor m1.nano compute-$i; done 

Meanwhile, ssh to the computes and run echo b >/proc/sysrq-trigger

Check the number of created instances:
(overcloud) [stack@undercloud-0 ~]$ nova list|wc -l
86

(overcloud) [stack@undercloud-0 ~]$ nova list|grep BUILD||echo "no instances stuck in BUILD mode"
no instances stuck in BUILD mode

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+------------+--------+------------+-------------+----------+
| ID                                   | Name       | Status | Task State | Power State | Networks |
+--------------------------------------+------------+--------+------------+-------------+----------+
| 0a25b50c-9482-42fa-92e2-01fc3939e83a | compute-10 | ACTIVE | -          | Running     |          |
| 2f99780e-aba7-43b0-950e-c865b9830bb2 | compute-10 | ACTIVE | -          | Running     |          |
| 7dd82f33-4e9e-4427-acbe-2ba4708322e0 | compute-10 | ACTIVE | -          | Running     |          |
| e9046cec-55b7-44fe-a429-b900502b97c9 | compute-10 | ACTIVE | -          | Running     |          |
| 22db217d-0f02-4b6e-8fa7-4f0af999717b | compute-11 | ACTIVE | -          | Running     |          |
| 8d5fcc35-6d96-45e8-8ac4-87940235eb8e | compute-11 | ACTIVE | -          | Running     |          |
| 9972df49-bf73-4bb0-861a-fc26d0409cfb | compute-11 | ERROR  | -          | NOSTATE     |          |
| b1b7fffb-7494-4e03-b076-0e52646b6fe0 | compute-11 | ACTIVE | -          | Running     |          |
| 0e439eae-5207-48ab-9b10-2f68c05fa6fc | compute-12 | ACTIVE | -          | Running     |          |
| 682d8532-e7ba-455f-9172-5e74946876d0 | compute-12 | ACTIVE | -          | Running     |          |
| aaeb6364-e75e-439c-8839-93044141f82f | compute-12 | ACTIVE | -          | Running     |          |
| d01d3f3b-4ccf-4af3-b400-31e91f10987a | compute-12 | ACTIVE | -          | Running     |          |
| 2f69751f-723d-4300-a2a2-2bf17c568931 | compute-13 | ACTIVE | -          | Running     |          |
| 5cd5c640-19a1-4f9d-ae78-99be667c6cf1 | compute-13 | ACTIVE | -          | Running     |          |
| 90b2f0c2-d767-4bfa-b6f8-05ba269a3d82 | compute-13 | ACTIVE | -          | Running     |          |
| cf5088cb-2eed-40a8-adec-0596092d3583 | compute-13 | ACTIVE | -          | Running     |          |
| 2b9ce7bb-ac0a-447c-8741-bca3e7363820 | compute-14 | ACTIVE | -          | Running     |          |
| 85ad57af-531d-4b26-805f-0133344fd870 | compute-14 | ACTIVE | -          | Running     |          |
| a447621d-8f6b-47b0-935e-4001b82fcd9c | compute-14 | ACTIVE | -          | Running     |          |
| eeea258c-3b05-4b25-8db4-e1912bcbf234 | compute-14 | ACTIVE | -          | Running     |          |
| 07db7ca9-0f13-4cc7-91ba-f3d198632fae | compute-15 | ACTIVE | -          | Running     |          |
| a550d027-c3d0-46ca-aac6-f54c36156acb | compute-15 | ACTIVE | -          | Running     |          |
| b0002e0b-97d3-44df-93ec-5012a15d06e3 | compute-15 | ACTIVE | -          | Running     |          |
| b24b6faa-5f54-4133-aa3b-043133df8afe | compute-15 | ACTIVE | -          | Running     |          |
| 120361e6-8729-484f-bbff-c32668f26402 | compute-16 | ACTIVE | -          | Running     |          |
| 2ae4d1c5-976e-424d-a1fd-0c32da007184 | compute-16 | ACTIVE | -          | Running     |          |
| 8c157eff-1cd1-4d3a-937b-7fef62c10502 | compute-16 | ACTIVE | -          | Running     |          |
| aa15b5ad-6185-4986-866c-c808c19b741c | compute-16 | ACTIVE | -          | Running     |          |
| 89f68275-3099-4bda-bf17-b01b4104cd85 | compute-17 | ACTIVE | -          | Running     |          |
| 8f8fb2b7-a927-4058-90ac-c1c9e5e01cc3 | compute-17 | ACTIVE | -          | Running     |          |
| b2cfad18-5fc2-46d0-8739-c5eabb756115 | compute-17 | ACTIVE | -          | Running     |          |
| e40f14fb-5ee1-491a-957a-1e3e2cf17e73 | compute-17 | ACTIVE | -          | Running     |          |
| 21a3f47b-87e7-4e5b-b8a3-d4f529eefdf6 | compute-18 | ACTIVE | -          | Running     |          |
| 2c44ce44-edc8-4f3d-9b9f-30d8c9fc0125 | compute-18 | ACTIVE | -          | Running     |          |
| 45240480-1c8a-450c-974a-0647dc6ef9ff | compute-18 | ACTIVE | -          | Running     |          |
| 4df70bf2-6e40-49fc-8bce-d929059e6837 | compute-18 | ACTIVE | -          | Running     |          |
| 37c3b44a-4be4-475f-a28c-52d81c224299 | compute-19 | ACTIVE | -          | Running     |          |
| e6d43b7c-e0d3-43bd-a0d5-556c4773f51d | compute-19 | ACTIVE | -          | Running     |          |
| f0f0a66e-0153-42b5-b3fa-7c976a78d8f7 | compute-19 | ACTIVE | -          | Running     |          |
| f725b389-00c0-4189-aa92-0bb0b7a3817b | compute-19 | ACTIVE | -          | Running     |          |
| 290430f6-6147-4ae9-97a8-1438627a6769 | compute-20 | ACTIVE | -          | Running     |          |
| 5024727c-9e58-40ec-924d-d50af8a4a42b | compute-20 | ACTIVE | -          | Running     |          |
| 9fbfaed3-ad8d-4580-9d34-ab40f9f66188 | compute-21 | ACTIVE | -          | Running     |          |
| bd893c95-4922-467f-a976-88ec37e08773 | compute-21 | ACTIVE | -          | Running     |          |
| 8ed1dd80-310d-4560-99d2-01876c9bc1b8 | compute-22 | ACTIVE | -          | Running     |          |
| cfa2a0a2-5e64-41bf-af88-f4587fb6ec3e | compute-22 | ACTIVE | -          | Running     |          |
| 4f487283-7c5f-41d8-a859-897ebcfeb05d | compute-23 | ACTIVE | -          | Running     |          |
| 6db33352-b91b-4128-8bb3-b77f1d7590f9 | compute-23 | ACTIVE | -          | Running     |          |
| 941b56e3-75e6-4fe4-a490-888307ddfe1d | compute-24 | ACTIVE | -          | Running     |          |
| bccc0273-8395-4624-a226-294dacae2a39 | compute-24 | ACTIVE | -          | Running     |          |
| 39a8d9bb-e5ec-4f45-8031-373256d3289c | compute-25 | ACTIVE | -          | Running     |          |
| fe771534-8bf0-4da8-9060-048f3177c09c | compute-25 | ACTIVE | -          | Running     |          |
| 097ed0e9-021d-47c3-ae33-7b12c47ff93f | compute-26 | ACTIVE | -          | Running     |          |
| ec0b2521-d740-4e37-9925-de6314f37b18 | compute-26 | ACTIVE | -          | Running     |          |
| 47643512-4f96-43e1-9d3f-b5591285ed50 | compute-27 | ACTIVE | -          | Running     |          |
| ec0bb841-c526-4bd6-9d37-94fda5daa822 | compute-27 | ACTIVE | -          | Running     |          |
| 07d1dce9-5b68-43f5-8b28-b10aa75f9ff4 | compute-28 | ACTIVE | -          | Running     |          |
| 6459bbc2-ad4f-4ea6-9dc8-3e3f13e73755 | compute-28 | ACTIVE | -          | Running     |          |
| 6413ec10-6364-47cf-8a01-2957cfc6f446 | compute-29 | ACTIVE | -          | Running     |          |
| 87c0ef18-f758-405b-b4a9-5ae6c7abb2ee | compute-29 | ACTIVE | -          | Running     |          |
| 94868ba0-296b-4ef4-86f9-f8ff9bffbc31 | compute-30 | ACTIVE | -          | Running     |          |
| ec3b7fdb-614e-4f23-bc4b-eba57d176e3d | compute-30 | ACTIVE | -          | Running     |          |
| 47ada1bb-84e7-47ef-9655-e41c721edcea | compute-31 | ACTIVE | -          | Running     |          |
| 9c7d7efe-6326-4f34-9c94-3f89cc05cdae | compute-31 | ACTIVE | -          | Running     |          |
| 58b66775-9d3a-43e4-a981-6dc3e82714ef | compute-32 | ACTIVE | -          | Running     |          |
| 6175d213-8a7e-40a3-8fa1-3a0727e19a9f | compute-32 | ACTIVE | -          | Running     |          |
| 4b1c4c94-dba1-4cf9-801a-da781967ad57 | compute-33 | ACTIVE | -          | Running     |          |
| c5784af8-b2ca-4f23-9d0b-4ba162cf725c | compute-33 | ACTIVE | -          | Running     |          |
| 242c1e08-5e00-465a-b369-5e801246f4da | compute-34 | ACTIVE | -          | Running     |          |
| e7062ec4-d16e-445c-9daa-1d628a49d87b | compute-34 | ACTIVE | -          | Running     |          |
| 4556ba64-7853-4b8b-99cb-3b84f521375a | compute-35 | ACTIVE | -          | Running     |          |
| 5ba6f46f-3b87-4512-80b4-0a289e19dbe3 | compute-35 | ACTIVE | -          | Running     |          |
| 6e337292-7db6-40f9-b2fe-f8a9364cd8ae | compute-36 | ACTIVE | -          | Running     |          |
| 88f39712-9fd1-4424-abed-af37997c2557 | compute-36 | ACTIVE | -          | Running     |          |
| b4dbcfae-ac32-4c71-8552-fb572235c34e | compute-37 | ACTIVE | -          | Running     |          |
| da00a078-2cbf-4744-a45e-98c2feaef70d | compute-37 | ACTIVE | -          | Running     |          |
| 255db0e3-85e7-41e5-a8a1-b4b330111664 | compute-38 | ACTIVE | -          | Running     |          |
| 4e7386e2-4bfa-41a9-b05a-5497ace37219 | compute-38 | ACTIVE | -          | Running     |          |
| 8d2c9691-8e93-4b06-91ed-71498e3a38e4 | compute-39 | ACTIVE | -          | Running     |          |
| da2e98aa-fb63-49d9-8933-2b69a290663a | compute-39 | ACTIVE | -          | Running     |          |
| 17b7b048-1cf3-46ba-87d7-de14d8c80420 | compute-40 | ACTIVE | -          | Running     |          |
| 910910c6-64c8-402b-a91a-4a6266a923e5 | compute-40 | ACTIVE | -          | Running     |          |
+--------------------------------------+------------+--------+------------+-------------+----------+
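Note that `nova list|wc -l` also counts the header and border lines of the table. To tally instances per status instead, the table text can be parsed directly. This is a sketch under the assumption that the layout is the one shown above (ID, Name, Status as the first three columns); `count_statuses` is a hypothetical helper, not a nova client function.

```python
from collections import Counter


def count_statuses(table_text):
    """Count rows per Status value in `nova list` style table output."""
    counts = Counter()
    for line in table_text.splitlines():
        # Data and header rows start with '|'; border rows start with '+'.
        if not line.startswith('|'):
            continue
        cells = [cell.strip() for cell in line.strip().strip('|').split('|')]
        # Skip the header row and anything too short to hold a status.
        if len(cells) < 3 or cells[0] == 'ID':
            continue
        counts[cells[2]] += 1
    return counts
```

A non-zero count under BUILD after the run has settled would indicate the symptom from the original description; ERROR rows, such as the one in the listing above, are the expected outcome for instances hit by a compute reboot.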

Comment 8 errata-xmlrpc 2019-11-06 16:53:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3747

