From the upstream bug report: on the Rocky promotion pipeline, the FS020 Rocky periodic job is failing constantly while running neutron tempest tests, returning the following errors:

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/undercloud/home/zuul/tempest.log.txt.gz#_2018-09-20_05_02_51

{1} neutron_tempest_plugin.api.test_security_groups.SecGroupProtocolIPv6Test.test_create_security_group_rule_with_ipv6_protocol_integers [120.552979s] ... FAILED
2018-09-20 05:02:51 |
2018-09-20 05:02:51 | Captured traceback:
2018-09-20 05:02:51 | ~~~~~~~~~~~~~~~~~~~
2018-09-20 05:02:51 | Traceback (most recent call last):
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/api/test_security_groups.py", line 118, in test_create_security_group_rule_with_ipv6_protocol_integers
2018-09-20 05:02:51 | group_create_body, _ = self._create_security_group()
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/api/base_security_groups.py", line 69, in _create_security_group
2018-09-20 05:02:51 | **kwargs)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/services/network/json/network_client.py", line 150, in _create
2018-09-20 05:02:51 | resp, body = self.post(uri, post_data)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 279, in post
2018-09-20 05:02:51 | return self.request('POST', url, extra_headers, headers, body, chunked)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 670, in request
2018-09-20 05:02:51 | self._error_checker(resp, resp_body)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 851, in _error_checker
2018-09-20 05:02:51 | resp=resp)
2018-09-20 05:02:51 |
tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received 2018-09-20 05:02:51 | Details: 504 2018-09-20 05:02:51 | 2018-09-20 05:02:51 | 2018-09-20 05:02:51 | Captured pythonlogging: 2018-09-20 05:02:51 | ~~~~~~~~~~~~~~~~~~~~~~~ 2018-09-20 05:02:51 | 2018-09-20 05:00:51,976 75 INFO [tempest.lib.common.rest_client] Request (SecGroupProtocolIPv6Test:test_create_security_group_rule_with_ipv6_protocol_integers): 201 POST http://192.168.24.12:5000/v3/auth/tokens 2018-09-20 05:02:51 | 2018-09-20 05:00:51,976 75 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json'} 2018-09-20 05:02:51 | Body: <omitted> 2018-09-20 05:02:51 | Response - Headers: {'status': '201', u'content-length': '9380', 'content-location': 'http://192.168.24.12:5000/v3/auth/tokens', u'x-subject-token': '<omitted>', u'vary': 'X-Auth-Token', u'server': 'Apache', u'connection': 'close', u'date': 'Thu, 20 Sep 2018 05:00:51 GMT', u'content-type': 'application/json', u'x-openstack-request-id': 'req-bc763536-0c22-4767-ae59-2cbb82bf6261'} 2018-09-20 05:02:51 | Body: {"token": {"is_domain": false, "methods": ["password"], "roles": [{"id": "c3fb420a9a5b414db4346d945db0809e", "name": "reader"}, {"id": "01f6da1f673442388a7f8879aef1f020", "name": "member"}], "expires_at": "2018-09-20T06:00:51.000000Z", "project": {"domain": {"id": "default", "name": "Default"}, "id": "fbc37901ca074cd98064d1e8227f01ea", "name": "tempest-SecGroupProtocolIPv6Test-1358316931"}, "catalog": [{"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8041", "region": "regionOne", "interface": "public", "id": "7b0a60207f00400d81022894c33a63a3"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8041", "region": "regionOne", "interface": "admin", "id": "e1ff1d23b17a4e279d1c057522b6d311"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8041", "region": "regionOne", "interface": "internal", "id": 
"e3f618b7aeb048ca9d61ff0d5238b557"}], "type": "metric", "id": "1a267574906c46a7a380cccaa20923fe", "name": "gnocchi"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8778/placement", "region": "regionOne", "interface": "internal", "id": "094c337206274871b09a665e0462a86c"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8778/placement", "region": "regionOne", "interface": "admin", "id": "8196b8acabb045f88121fc2c30690c37"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8778/placement", "region": "regionOne", "interface": "public", "id": "e12fbffa549d4be28ff97e45d817caec"}], "type": "placement", "id": "2ec14d7bdc3049efba0d3f03577c0341", "name": "placement"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8004/v1/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "internal", "id": "18465ccc67a840ecbd348db31b40432a"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8004/v1/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "admin", "id": "a98972e7761c480c80c2fcd70da447d9"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8004/v1/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "public", "id": "f431475df19e4eae9f6ce334c646b3dc"}], "type": "orchestration", "id": "3c5e1617f4b74013a8582c0e57fda411", "name": "heat"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8977", "region": "regionOne", "interface": "admin", "id": "206dfb1b711d44c1ad31fa73fbd53153"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8977", "region": "regionOne", "interface": "internal", "id": "81a80af1789d4a20894acd94422272fd"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8977", "region": "regionOne", "interface": "public", "id": "fddfc3aa28ac43cd8ddd21c5f2427f97"}], "type": "event", "id": "3e58c2e452e643f2a5ec348558f37b01", "name": "panko"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:5000", 
"region": "regionOne", "interface": "internal", "id": "69f09773cc9d468d9470815ecbc59fa0"}, {"region_id": "regionOne", "url": "http://192.168.24.12:5000", "region": "regionOne", "interface": "public", "id": "c76cc752ef0d458195066335b3145934"}, {"region_id": "regionOne", "url": "http://192.168.24.12:35357", "region": "regionOne", "interface": "admin", "id": "f18ba99d410f49388cbb6a20c42dcedd"}], "type": "identity", "id": "3f6498f7065a458fbeb3e41fcc1f7015", "name": "keystone"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8042", "region": "regionOne", "interface": "admin", "id": "2a08cd22eb9143fd9e3026f19975e62a"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8042", "region": "regionOne", "interface": "public", "id": "d0b2cbf0338c43a990108df0280cab67"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8042", "region": "regionOne", "interface": "internal", "id": "d6bff66ad4db4228b9b3562cf69f54f8"}], "type": "alarming", "id": "41467ff4ee7342dea2e8991aede793ac", "name": "aodh"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8776/v2/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "internal", "id": "0bf9791e02864d53be0a7ced90b184b7"}, {"region_id": "regio 2018-09-20 05:02:51 | 2018-09-20 05:02:51,981 75 INFO [tempest.lib.common.rest_client] Request (SecGroupProtocolIPv6Test:test_create_security_group_rule_with_ipv6_protocol_integers): 504 POST http://192.168.24.12:9696/v2.0/security-groups 120.004s 2018-09-20 05:02:51 | 2018-09-20 05:02:51,982 75 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'} 2018-09-20 05:02:51 | Body: {"security_group": {"name": "tempest-secgroup--1489112465"}} 2018-09-20 05:02:51 | Response - Headers: {'status': '504', u'connection': 'close', u'content-type': 'text/html', 'content-location': 'http://192.168.24.12:9696/v2.0/security-groups', u'cache-control': 
'no-cache'}
2018-09-20 05:02:51 | Body: <html><body><h1>504 Gateway Time-out</h1>
2018-09-20 05:02:51 | The server didn't respond in time.
2018-09-20 05:02:51 | </body></html>
2018-09-20 05:02:51 |

Almost all of the neutron API tests are failing with the same issue. I tried to find the cause but had no luck; it needs investigation.

----------------------------

I was checking the neutron logs from this job, and it looks to me like a SIGHUP was sent to neutron-server:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz#_2018-09-20_05_00_50_628
After that, the wsgi workers were killed, for example:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz#_2018-09-20_05_00_51_880
The rpc workers were later restarted:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz#_2018-09-20_05_00_54_899
and neutron-server started processing rpc messages again, but the wsgi workers were not started again, so there are no HTTP requests in the logs after this SIGHUP.
There is an upstream change for SIGHUP handling in containers that we should get downstream; I have linked the rocky backport. Additional work to make neutron-server handle SIGHUP better is still WIP. Bernard will continue working on that.
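For readers unfamiliar with why a service receives SIGHUP at all, here is a minimal sketch of the usual pattern: a log rotation step renames the log file and signals the process, which is expected to reopen its log. This is a generic illustration only, not neutron-server's or oslo's actual implementation; the `ReopeningLog` class, `demo_sighup_reopen` function, and file names are hypothetical.

```python
import os
import signal
import tempfile
import time


class ReopeningLog:
    """Hypothetical log writer that reopens its file after SIGHUP."""

    def __init__(self, path):
        self.path = path
        self.stream = open(path, "a")
        self.reopen_requested = False

    def request_reopen(self, signum, frame):
        # Keep the signal handler tiny: only record that a reopen is needed.
        self.reopen_requested = True

    def write(self, line):
        # Reopen lazily, outside the signal handler, before the next write.
        if self.reopen_requested:
            self.stream.close()
            self.stream = open(self.path, "a")
            self.reopen_requested = False
        self.stream.write(line + "\n")
        self.stream.flush()


def demo_sighup_reopen():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "server.log")
    log = ReopeningLog(path)
    previous = signal.signal(signal.SIGHUP, log.request_reopen)
    try:
        log.write("before rotation")
        # Rotation step: rename the file, then signal the writer.
        os.rename(path, path + ".1")
        os.kill(os.getpid(), signal.SIGHUP)
        # Give the interpreter a moment to run the Python-level handler.
        deadline = time.monotonic() + 5
        while not log.reopen_requested and time.monotonic() < deadline:
            time.sleep(0.01)
        log.write("after rotation")
        log.stream.close()
    finally:
        signal.signal(signal.SIGHUP, previous)
    with open(path + ".1") as rotated, open(path) as current:
        return rotated.read(), current.read()
```

The point of the sketch is that this scheme only works if every process that receives the signal handles it gracefully; a service that restarts or drops workers on SIGHUP, as described above, is disrupted by each rotation.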
Merged in master, rocky backport in progress:
https://review.openstack.org/#/c/604006/
Not merged yet in master:
https://review.openstack.org/#/c/596274/
https://review.openstack.org/#/c/596275/
https://review.openstack.org/#/c/604006/ merged in rocky
https://review.openstack.org/#/c/596274/ and https://review.openstack.org/#/c/596275/ both have W+1, waiting on upstream CI issues
https://review.openstack.org/#/c/604006/ merged in rocky
https://review.openstack.org/#/c/596274/ merged in master, backport created at https://review.openstack.org/#/c/605349/
https://review.openstack.org/#/c/596275/ W+1 but failed gate, recheck in progress
*** Bug 1631334 has been marked as a duplicate of this bug. ***
https://review.openstack.org/#/c/604006/ and https://review.openstack.org/#/c/605349/ merged in rocky
https://review.openstack.org/#/c/596275/ is W+1 but still fails gate, rechecks in progress
https://review.openstack.org/#/c/596275/ merged in master, backport created at https://review.openstack.org/#/c/606864/
*** Bug 1629659 has been marked as a duplicate of this bug. ***
The last patch was reverted in master with https://review.openstack.org/#/c/607052/ as it caused failures in CI. See https://bugs.launchpad.net/tripleo/+bug/1795411 for details. Updating the status to reflect that.
Long story short: SIGHUP breaks mistral-engine and heat-engine, which are key deployment framework components. The WIP patches fall back to copytruncate instead of sending SIGHUP signals.
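To illustrate why copytruncate avoids the signal entirely, here is a minimal sketch of its semantics: the log is copied aside and then truncated in place, so the writing process keeps its open file handle and never needs to be notified. This mimics the behaviour in plain Python for illustration and is not logrotate's own code; the `copytruncate` and `demo_copytruncate` helpers and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def copytruncate(path, rotated_path):
    """Copy the live log aside, then empty the original in place."""
    shutil.copyfile(path, rotated_path)
    # The file (inode) stays the same, so existing writers are unaffected.
    os.truncate(path, 0)


def demo_copytruncate():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "engine.log")

    # Stand-in for a long-running service holding its log open in append mode.
    writer = open(path, "a")
    writer.write("first\n")
    writer.flush()

    copytruncate(path, path + ".1")

    # No signal and no reopen: the same handle keeps working after rotation.
    writer.write("second\n")
    writer.flush()
    writer.close()

    with open(path + ".1") as rotated, open(path) as current:
        return rotated.read(), current.read()
```

The trade-off is that anything written between the copy and the truncate can be lost, which is the usual caveat of this rotation mode.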
*** Bug 1631490 has been marked as a duplicate of this bug. ***
I removed the links to the previous patches and now point to the currently proposed patch (copytruncate) in master.
*** Bug 1636034 has been marked as a duplicate of this bug. ***
stable/rocky backport merged
*** Bug 1640780 has been marked as a duplicate of this bug. ***
*** Bug 1640419 has been marked as a duplicate of this bug. ***
*** Bug 1641175 has been marked as a duplicate of this bug. ***
*** Bug 1636070 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045