Bug 1631334

Summary: WSGI workers not started again after SIGHUP signal
Product: Red Hat OpenStack
Reporter: Slawek Kaplonski <skaplons>
Component: openstack-neutron
Assignee: Assaf Muller <amuller>
Status: CLOSED DUPLICATE
QA Contact: Roee Agiman <ragiman>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 14.0 (Rocky)
CC: amuller, chrisw, nyechiel
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-27 14:13:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Slawek Kaplonski 2018-09-20 11:19:00 UTC
From upstream bug report:


On the Rocky promotion pipeline, the FS020 Rocky periodic job fails consistently while running the neutron tempest tests and returns the following errors:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/undercloud/home/zuul/tempest.log.txt.gz#_2018-09-20_05_02_51

{1} neutron_tempest_plugin.api.test_security_groups.SecGroupProtocolIPv6Test.test_create_security_group_rule_with_ipv6_protocol_integers [120.552979s] ... FAILED
2018-09-20 05:02:51 |
2018-09-20 05:02:51 | Captured traceback:
2018-09-20 05:02:51 | ~~~~~~~~~~~~~~~~~~~
2018-09-20 05:02:51 | Traceback (most recent call last):
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/api/test_security_groups.py", line 118, in test_create_security_group_rule_with_ipv6_protocol_integers
2018-09-20 05:02:51 | group_create_body, _ = self._create_security_group()
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/api/base_security_groups.py", line 69, in _create_security_group
2018-09-20 05:02:51 | **kwargs)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/services/network/json/network_client.py", line 150, in _create
2018-09-20 05:02:51 | resp, body = self.post(uri, post_data)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 279, in post
2018-09-20 05:02:51 | return self.request('POST', url, extra_headers, headers, body, chunked)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 670, in request
2018-09-20 05:02:51 | self._error_checker(resp, resp_body)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 851, in _error_checker
2018-09-20 05:02:51 | resp=resp)
2018-09-20 05:02:51 | tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received
2018-09-20 05:02:51 | Details: 504
2018-09-20 05:02:51 |
2018-09-20 05:02:51 |
2018-09-20 05:02:51 | Captured pythonlogging:
2018-09-20 05:02:51 | ~~~~~~~~~~~~~~~~~~~~~~~
2018-09-20 05:02:51 | 2018-09-20 05:00:51,976 75 INFO [tempest.lib.common.rest_client] Request (SecGroupProtocolIPv6Test:test_create_security_group_rule_with_ipv6_protocol_integers): 201 POST http://192.168.24.12:5000/v3/auth/tokens
2018-09-20 05:02:51 | 2018-09-20 05:00:51,976 75 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json'}
2018-09-20 05:02:51 | Body: <omitted>
2018-09-20 05:02:51 | Response - Headers: {'status': '201', u'content-length': '9380', 'content-location': 'http://192.168.24.12:5000/v3/auth/tokens', u'x-subject-token': '<omitted>', u'vary': 'X-Auth-Token', u'server': 'Apache', u'connection': 'close', u'date': 'Thu, 20 Sep 2018 05:00:51 GMT', u'content-type': 'application/json', u'x-openstack-request-id': 'req-bc763536-0c22-4767-ae59-2cbb82bf6261'}
2018-09-20 05:02:51 | Body: {"token": {"is_domain": false, "methods": ["password"], "roles": [{"id": "c3fb420a9a5b414db4346d945db0809e", "name": "reader"}, {"id": "01f6da1f673442388a7f8879aef1f020", "name": "member"}], "expires_at": "2018-09-20T06:00:51.000000Z", "project": {"domain": {"id": "default", "name": "Default"}, "id": "fbc37901ca074cd98064d1e8227f01ea", "name": "tempest-SecGroupProtocolIPv6Test-1358316931"}, "catalog": [{"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8041", "region": "regionOne", "interface": "public", "id": "7b0a60207f00400d81022894c33a63a3"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8041", "region": "regionOne", "interface": "admin", "id": "e1ff1d23b17a4e279d1c057522b6d311"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8041", "region": "regionOne", "interface": "internal", "id": "e3f618b7aeb048ca9d61ff0d5238b557"}], "type": "metric", "id": "1a267574906c46a7a380cccaa20923fe", "name": "gnocchi"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8778/placement", "region": "regionOne", "interface": "internal", "id": "094c337206274871b09a665e0462a86c"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8778/placement", "region": "regionOne", "interface": "admin", "id": "8196b8acabb045f88121fc2c30690c37"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8778/placement", "region": "regionOne", "interface": "public", "id": "e12fbffa549d4be28ff97e45d817caec"}], "type": "placement", "id": "2ec14d7bdc3049efba0d3f03577c0341", "name": "placement"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8004/v1/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "internal", "id": "18465ccc67a840ecbd348db31b40432a"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8004/v1/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "admin", "id": "a98972e7761c480c80c2fcd70da447d9"}, {"region_id": "regionOne", 
"url": "http://192.168.24.12:8004/v1/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "public", "id": "f431475df19e4eae9f6ce334c646b3dc"}], "type": "orchestration", "id": "3c5e1617f4b74013a8582c0e57fda411", "name": "heat"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8977", "region": "regionOne", "interface": "admin", "id": "206dfb1b711d44c1ad31fa73fbd53153"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8977", "region": "regionOne", "interface": "internal", "id": "81a80af1789d4a20894acd94422272fd"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8977", "region": "regionOne", "interface": "public", "id": "fddfc3aa28ac43cd8ddd21c5f2427f97"}], "type": "event", "id": "3e58c2e452e643f2a5ec348558f37b01", "name": "panko"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:5000", "region": "regionOne", "interface": "internal", "id": "69f09773cc9d468d9470815ecbc59fa0"}, {"region_id": "regionOne", "url": "http://192.168.24.12:5000", "region": "regionOne", "interface": "public", "id": "c76cc752ef0d458195066335b3145934"}, {"region_id": "regionOne", "url": "http://192.168.24.12:35357", "region": "regionOne", "interface": "admin", "id": "f18ba99d410f49388cbb6a20c42dcedd"}], "type": "identity", "id": "3f6498f7065a458fbeb3e41fcc1f7015", "name": "keystone"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8042", "region": "regionOne", "interface": "admin", "id": "2a08cd22eb9143fd9e3026f19975e62a"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8042", "region": "regionOne", "interface": "public", "id": "d0b2cbf0338c43a990108df0280cab67"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8042", "region": "regionOne", "interface": "internal", "id": "d6bff66ad4db4228b9b3562cf69f54f8"}], "type": "alarming", "id": "41467ff4ee7342dea2e8991aede793ac", "name": "aodh"}, {"endpoints": [{"region_id": "regionOne", "url": 
"http://192.168.24.12:8776/v2/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "internal", "id": "0bf9791e02864d53be0a7ced90b184b7"}, {"region_id": "regio
2018-09-20 05:02:51 | 2018-09-20 05:02:51,981 75 INFO [tempest.lib.common.rest_client] Request (SecGroupProtocolIPv6Test:test_create_security_group_rule_with_ipv6_protocol_integers): 504 POST http://192.168.24.12:9696/v2.0/security-groups 120.004s
2018-09-20 05:02:51 | 2018-09-20 05:02:51,982 75 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
2018-09-20 05:02:51 | Body: {"security_group": {"name": "tempest-secgroup--1489112465"}}
2018-09-20 05:02:51 | Response - Headers: {'status': '504', u'connection': 'close', u'content-type': 'text/html', 'content-location': 'http://192.168.24.12:9696/v2.0/security-groups', u'cache-control': 'no-cache'}
2018-09-20 05:02:51 | Body: <html><body><h1>504 Gateway Time-out</h1>
2018-09-20 05:02:51 | The server didn't respond in time.
2018-09-20 05:02:51 | </body></html>
2018-09-20 05:02:51 |

Almost all of the neutron API tests are failing with the same issue.
I tried to find the cause but had no luck; it needs investigation.


----------------------------

I checked the neutron logs from this job, and it looks to me like a SIGHUP was sent to neutron-server:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz#_2018-09-20_05_00_50_628

After that, the WSGI workers were killed, for example:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz#_2018-09-20_05_00_51_880

The RPC workers were later restarted: https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz#_2018-09-20_05_00_54_899

neutron-server then started processing RPC messages again, but the WSGI workers were not started again, so there are no HTTP requests in the logs after this SIGHUP.
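
The check described above (no HTTP requests logged after the SIGHUP) could be automated roughly as follows. This is only a sketch: the function name and both marker substrings ("SIGHUP" and "neutron.wsgi") are assumptions chosen for illustration, not a confirmed neutron log format, so adjust them to whatever the real server.log lines contain.

```python
from typing import Iterable


def requests_after_last_sighup(
    lines: Iterable[str],
    sighup_marker: str = "SIGHUP",         # assumed substring marking the signal
    request_marker: str = "neutron.wsgi",  # assumed substring on WSGI request lines
) -> int:
    """Count request lines that appear after the most recent SIGHUP line.

    Returns -1 when no SIGHUP line is present at all.
    """
    count = -1
    for line in lines:
        if sighup_marker in line:
            # Only requests after the latest SIGHUP are of interest.
            count = 0
        elif count >= 0 and request_marker in line:
            count += 1
    return count
```

A result of 0 on a log that still shows RPC activity after the signal would match the symptom reported here. The function accepts any iterable of strings, so a downloaded server.log.txt.gz can be passed in via gzip.open(path, "rt").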

Comment 1 Bernard Cafarelli 2018-09-27 14:13:00 UTC
Duplicated at creation, I guess; keeping the one with more content open.

*** This bug has been marked as a duplicate of bug 1631335 ***