Bug 1631335 - WSGI workers not started again after SIGHUP signal
Summary: WSGI workers not started again after SIGHUP signal
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 14.0 (Rocky)
Assignee: Bernard Cafarelli
QA Contact: Roee Agiman
URL:
Whiteboard:
Duplicates: 1629659 1631334 1631490 1636034 1636070 1640419 1640780
Depends On:
Blocks: 1629449
Reported: 2018-09-20 11:19 UTC by Slawek Kaplonski
Modified: 2022-03-13 15:44 UTC (History)

Fixed In Version: puppet-tripleo-9.3.1-0.20181001112252.a6eaab1.el7ost
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-11 11:53:11 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1793482 | 0 | None | None | None | 2018-09-20 11:19:21 UTC
OpenStack gerrit | 608594 | 0 | None | MERGED | Copytruncate containerized logrotate configuration | 2021-01-25 11:40:24 UTC
Red Hat Issue Tracker | OSP-13658 | 0 | None | None | None | 2022-03-13 15:44:07 UTC
Red Hat Product Errata | RHEA-2019:0045 | 0 | None | None | None | 2019-01-11 11:53:20 UTC

Internal Links: 1792820

Description Slawek Kaplonski 2018-09-20 11:19:22 UTC
From upstream bug report:


On the Rocky promotion pipeline, the FS020 Rocky periodic job fails consistently while running the neutron tempest tests, returning the following errors:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/undercloud/home/zuul/tempest.log.txt.gz#_2018-09-20_05_02_51

{1} neutron_tempest_plugin.api.test_security_groups.SecGroupProtocolIPv6Test.test_create_security_group_rule_with_ipv6_protocol_integers [120.552979s] ... FAILED
2018-09-20 05:02:51 |
2018-09-20 05:02:51 | Captured traceback:
2018-09-20 05:02:51 | ~~~~~~~~~~~~~~~~~~~
2018-09-20 05:02:51 | Traceback (most recent call last):
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/api/test_security_groups.py", line 118, in test_create_security_group_rule_with_ipv6_protocol_integers
2018-09-20 05:02:51 | group_create_body, _ = self._create_security_group()
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/api/base_security_groups.py", line 69, in _create_security_group
2018-09-20 05:02:51 | **kwargs)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/services/network/json/network_client.py", line 150, in _create
2018-09-20 05:02:51 | resp, body = self.post(uri, post_data)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 279, in post
2018-09-20 05:02:51 | return self.request('POST', url, extra_headers, headers, body, chunked)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 670, in request
2018-09-20 05:02:51 | self._error_checker(resp, resp_body)
2018-09-20 05:02:51 | File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 851, in _error_checker
2018-09-20 05:02:51 | resp=resp)
2018-09-20 05:02:51 | tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received
2018-09-20 05:02:51 | Details: 504
2018-09-20 05:02:51 |
2018-09-20 05:02:51 |
2018-09-20 05:02:51 | Captured pythonlogging:
2018-09-20 05:02:51 | ~~~~~~~~~~~~~~~~~~~~~~~
2018-09-20 05:02:51 | 2018-09-20 05:00:51,976 75 INFO [tempest.lib.common.rest_client] Request (SecGroupProtocolIPv6Test:test_create_security_group_rule_with_ipv6_protocol_integers): 201 POST http://192.168.24.12:5000/v3/auth/tokens
2018-09-20 05:02:51 | 2018-09-20 05:00:51,976 75 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json'}
2018-09-20 05:02:51 | Body: <omitted>
2018-09-20 05:02:51 | Response - Headers: {'status': '201', u'content-length': '9380', 'content-location': 'http://192.168.24.12:5000/v3/auth/tokens', u'x-subject-token': '<omitted>', u'vary': 'X-Auth-Token', u'server': 'Apache', u'connection': 'close', u'date': 'Thu, 20 Sep 2018 05:00:51 GMT', u'content-type': 'application/json', u'x-openstack-request-id': 'req-bc763536-0c22-4767-ae59-2cbb82bf6261'}
2018-09-20 05:02:51 | Body: {"token": {"is_domain": false, "methods": ["password"], "roles": [{"id": "c3fb420a9a5b414db4346d945db0809e", "name": "reader"}, {"id": "01f6da1f673442388a7f8879aef1f020", "name": "member"}], "expires_at": "2018-09-20T06:00:51.000000Z", "project": {"domain": {"id": "default", "name": "Default"}, "id": "fbc37901ca074cd98064d1e8227f01ea", "name": "tempest-SecGroupProtocolIPv6Test-1358316931"}, "catalog": [{"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8041", "region": "regionOne", "interface": "public", "id": "7b0a60207f00400d81022894c33a63a3"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8041", "region": "regionOne", "interface": "admin", "id": "e1ff1d23b17a4e279d1c057522b6d311"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8041", "region": "regionOne", "interface": "internal", "id": "e3f618b7aeb048ca9d61ff0d5238b557"}], "type": "metric", "id": "1a267574906c46a7a380cccaa20923fe", "name": "gnocchi"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8778/placement", "region": "regionOne", "interface": "internal", "id": "094c337206274871b09a665e0462a86c"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8778/placement", "region": "regionOne", "interface": "admin", "id": "8196b8acabb045f88121fc2c30690c37"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8778/placement", "region": "regionOne", "interface": "public", "id": "e12fbffa549d4be28ff97e45d817caec"}], "type": "placement", "id": "2ec14d7bdc3049efba0d3f03577c0341", "name": "placement"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8004/v1/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "internal", "id": "18465ccc67a840ecbd348db31b40432a"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8004/v1/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "admin", "id": "a98972e7761c480c80c2fcd70da447d9"}, {"region_id": "regionOne", 
"url": "http://192.168.24.12:8004/v1/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "public", "id": "f431475df19e4eae9f6ce334c646b3dc"}], "type": "orchestration", "id": "3c5e1617f4b74013a8582c0e57fda411", "name": "heat"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8977", "region": "regionOne", "interface": "admin", "id": "206dfb1b711d44c1ad31fa73fbd53153"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8977", "region": "regionOne", "interface": "internal", "id": "81a80af1789d4a20894acd94422272fd"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8977", "region": "regionOne", "interface": "public", "id": "fddfc3aa28ac43cd8ddd21c5f2427f97"}], "type": "event", "id": "3e58c2e452e643f2a5ec348558f37b01", "name": "panko"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:5000", "region": "regionOne", "interface": "internal", "id": "69f09773cc9d468d9470815ecbc59fa0"}, {"region_id": "regionOne", "url": "http://192.168.24.12:5000", "region": "regionOne", "interface": "public", "id": "c76cc752ef0d458195066335b3145934"}, {"region_id": "regionOne", "url": "http://192.168.24.12:35357", "region": "regionOne", "interface": "admin", "id": "f18ba99d410f49388cbb6a20c42dcedd"}], "type": "identity", "id": "3f6498f7065a458fbeb3e41fcc1f7015", "name": "keystone"}, {"endpoints": [{"region_id": "regionOne", "url": "http://192.168.24.12:8042", "region": "regionOne", "interface": "admin", "id": "2a08cd22eb9143fd9e3026f19975e62a"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8042", "region": "regionOne", "interface": "public", "id": "d0b2cbf0338c43a990108df0280cab67"}, {"region_id": "regionOne", "url": "http://192.168.24.12:8042", "region": "regionOne", "interface": "internal", "id": "d6bff66ad4db4228b9b3562cf69f54f8"}], "type": "alarming", "id": "41467ff4ee7342dea2e8991aede793ac", "name": "aodh"}, {"endpoints": [{"region_id": "regionOne", "url": 
"http://192.168.24.12:8776/v2/fbc37901ca074cd98064d1e8227f01ea", "region": "regionOne", "interface": "internal", "id": "0bf9791e02864d53be0a7ced90b184b7"}, {"region_id": "regio
2018-09-20 05:02:51 | 2018-09-20 05:02:51,981 75 INFO [tempest.lib.common.rest_client] Request (SecGroupProtocolIPv6Test:test_create_security_group_rule_with_ipv6_protocol_integers): 504 POST http://192.168.24.12:9696/v2.0/security-groups 120.004s
2018-09-20 05:02:51 | 2018-09-20 05:02:51,982 75 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
2018-09-20 05:02:51 | Body: {"security_group": {"name": "tempest-secgroup--1489112465"}}
2018-09-20 05:02:51 | Response - Headers: {'status': '504', u'connection': 'close', u'content-type': 'text/html', 'content-location': 'http://192.168.24.12:9696/v2.0/security-groups', u'cache-control': 'no-cache'}
2018-09-20 05:02:51 | Body: <html><body><h1>504 Gateway Time-out</h1>
2018-09-20 05:02:51 | The server didn't respond in time.
2018-09-20 05:02:51 | </body></html>
2018-09-20 05:02:51 |

Almost all of the neutron API tests are failing with the same issue.
I tried to find the cause but had no luck; it needs investigation.
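
For triage, the failing requests can be pulled out of tempest.log with a short script. This is only a sketch and not part of tempest: the function name find_gateway_timeouts and the regular expression are assumptions, inferred solely from the rest_client request lines quoted above.

```python
import re

# Pattern inferred from the rest_client "Request (...)" lines quoted in this
# report; it is an assumption, not a format documented by tempest.
REQUEST_RE = re.compile(
    r"Request \((?P<test>[^)]*)\): "
    r"(?P<status>\d{3}) (?P<method>[A-Z]+) (?P<url>\S+)"
    r"(?: (?P<secs>\d+(?:\.\d+)?)s)?"
)


def find_gateway_timeouts(lines):
    """Return one dict per logged request that ended with HTTP 504."""
    hits = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None or match.group("status") != "504":
            continue
        secs = match.group("secs")
        hits.append({
            "test": match.group("test"),
            "method": match.group("method"),
            "url": match.group("url"),
            # The duration suffix is optional in the quoted lines.
            "seconds": float(secs) if secs is not None else None,
        })
    return hits
```

Fed the request lines from the traceback above, this should keep only the POST to port 9696 that ran for 120.004s and skip the 201 returned by keystone on port 5000.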


----------------------------

I checked the neutron logs from this job, and it looks to me like a SIGHUP was sent to neutron-server:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz#_2018-09-20_05_00_50_628

After that, the WSGI workers were killed, for example:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz#_2018-09-20_05_00_51_880

The RPC workers were later restarted: https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/8a335fb/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz#_2018-09-20_05_00_54_899

neutron-server then started processing RPC messages again, but the WSGI workers were never restarted, so there are no HTTP requests in the logs after this SIGHUP.

Comment 3 Brian Haley 2018-09-20 14:02:37 UTC
There is an upstream change for SIGHUP handling in containers that we should get downstream; I have linked the Rocky backport.

Additional work to make neutron-server handle SIGHUP better is in progress (WIP). Bernard will continue working on that.

Comment 5 Bernard Cafarelli 2018-09-20 14:18:07 UTC
Merged in master, rocky backport in progress:
https://review.openstack.org/#/c/604006/

Not merged yet in master:
https://review.openstack.org/#/c/596274/
https://review.openstack.org/#/c/596275/

Comment 6 Bernard Cafarelli 2018-09-25 15:39:16 UTC
https://review.openstack.org/#/c/604006/ merged in rocky

https://review.openstack.org/#/c/596274/ and https://review.openstack.org/#/c/596275/ both have W+1, waiting on upstream CI issues

Comment 8 Bernard Cafarelli 2018-09-26 15:03:12 UTC
https://review.openstack.org/#/c/604006/ merged in rocky

https://review.openstack.org/#/c/596274/ merged in master, backport created at https://review.openstack.org/#/c/605349/

https://review.openstack.org/#/c/596275/ W+1 but failed gate, recheck in progress

Comment 9 Bernard Cafarelli 2018-09-27 14:13:00 UTC
*** Bug 1631334 has been marked as a duplicate of this bug. ***

Comment 10 Bernard Cafarelli 2018-09-28 08:50:56 UTC
https://review.openstack.org/#/c/604006/ and https://review.openstack.org/#/c/605349/ merged in rocky

https://review.openstack.org/#/c/596275/ is W+1 but still fails gate, rechecks in progress

Comment 11 Bernard Cafarelli 2018-09-30 21:41:21 UTC
https://review.openstack.org/#/c/596275/ merged in master, backport created at https://review.openstack.org/#/c/606864/

Comment 13 Bernard Cafarelli 2018-10-01 14:06:39 UTC
*** Bug 1629659 has been marked as a duplicate of this bug. ***

Comment 15 Bernard Cafarelli 2018-10-02 06:53:09 UTC
The last patch was reverted in master with https://review.openstack.org/#/c/607052/ because it caused CI failures. See https://bugs.launchpad.net/tripleo/+bug/1795411 for details.

Updating status to reflect that

Comment 16 Bogdan Dobrelya 2018-10-03 11:30:43 UTC
Long story short: SIGHUP breaks mistral-engine and heat-engine, which are key deployment framework components. The WIP patches fall back to copytruncate instead of sending SIGHUP signals.
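
To illustrate why copytruncate avoids the signal: the log is copied aside and the original is truncated in place, so the service keeps writing to the same open file and never has to be told to reopen it. Below is a minimal Python sketch of that idea. It is not logrotate's implementation, and copytruncate(), demo() and the file names are hypothetical. Note that the logrotate man page warns that data written between the copy and the truncate can be lost.

```python
import os
import shutil
import tempfile


def copytruncate(log_path, rotated_path):
    """Copy the live log aside, then truncate the original in place."""
    shutil.copyfile(log_path, rotated_path)
    # Truncating keeps the same file, so a writer's open descriptor stays
    # valid and no SIGHUP is needed to make the service reopen its log.
    os.truncate(log_path, 0)


def demo():
    workdir = tempfile.mkdtemp()
    log_path = os.path.join(workdir, "server.log")  # hypothetical name
    rotated_path = log_path + ".1"

    # Stand-in for a long-running service holding its log open in append mode.
    writer = open(log_path, "a")
    writer.write("before rotation\n")
    writer.flush()
    inode_before = os.stat(log_path).st_ino

    copytruncate(log_path, rotated_path)

    # The "service" keeps logging through the descriptor it already had.
    writer.write("after rotation\n")
    writer.flush()
    writer.close()

    with open(rotated_path) as rotated_file:
        rotated = rotated_file.read()
    with open(log_path) as live_file:
        live = live_file.read()
    same_inode = os.stat(log_path).st_ino == inode_before
    shutil.rmtree(workdir)
    return rotated, live, same_inode
```

The writer is opened in append mode on purpose: after the truncate, its next write lands at the new end of the file rather than at the old offset.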

Comment 17 Waldemar Znoinski 2018-10-04 10:39:48 UTC
*** Bug 1631490 has been marked as a duplicate of this bug. ***

Comment 18 Bernard Cafarelli 2018-10-04 13:42:03 UTC
I removed the links to the previous patches and now point to the currently proposed patch (copytruncate) in master.

Comment 19 Waldemar Znoinski 2018-10-10 14:00:04 UTC
*** Bug 1636034 has been marked as a duplicate of this bug. ***

Comment 20 Bernard Cafarelli 2018-10-15 08:26:05 UTC
stable/rocky backport merged

Comment 23 Bernard Cafarelli 2018-10-19 09:24:05 UTC
*** Bug 1640780 has been marked as a duplicate of this bug. ***

Comment 24 Bernard Cafarelli 2018-10-23 13:43:32 UTC
*** Bug 1640419 has been marked as a duplicate of this bug. ***

Comment 26 Bernard Cafarelli 2018-10-25 15:36:21 UTC
*** Bug 1641175 has been marked as a duplicate of this bug. ***

Comment 28 Noam Manos 2018-10-31 10:31:31 UTC
*** Bug 1636070 has been marked as a duplicate of this bug. ***

Comment 31 errata-xmlrpc 2019-01-11 11:53:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

