Bug 2124872 - After controller hard reboot heat api is unavailable
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 17.0 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: OSP Team
QA Contact: David Rosenfeld
URL:
Whiteboard:
Duplicates: 2128008 2208577 2222610
Depends On:
Blocks:
Reported: 2022-09-07 10:37 UTC by dabarzil
Modified: 2023-07-17 15:27 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-18587 0 None None None 2022-09-07 11:00:46 UTC

Description dabarzil 2022-09-07 10:37:01 UTC
Description of problem:
The test was run on a composable-roles deployment: all nodes are hard reset, and the test then waits for SSH to come back up on each of them.
After the reboot, the heat API is unavailable and requests fail with a 503 error code, as seen in controller-0/var/log/containers/haproxy/haproxy.log.gz:
Sep  3 06:51:53 controller-1 haproxy[7]: 10.0.0.72:50700 [03/Sep/2022:06:51:53.532] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep  3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:03.193] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/38/38 200 682 - - ---- 3/1/0/0/0 0/0 "GET /v2.1/os-services HTTP/1.1"
Sep  3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:40818 [03/Sep/2022:06:52:03.242] neutron neutron/controller-2.internalapi.redhat.local 0/0/0/13/13 200 3716 - - ---- 3/1/0/0/0 0/0 "GET /v2.0/agents HTTP/1.1"
Sep  3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:03.259] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/340/340 200 2065 - - ---- 2/1/0/0/0 0/0 "GET /v2.1/servers/detail HTTP/1.1"
Sep  3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:34260 [03/Sep/2022:06:52:03.611] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep  3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:34260 [03/Sep/2022:06:52:03.615] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep  3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:13.325] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/30/30 200 677 - - ---- 2/1/0/0/0 0/0 "GET /v2.1/os-services HTTP/1.1"
Sep  3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:49282 [03/Sep/2022:06:52:13.365] neutron neutron/controller-1.internalapi.redhat.local 0/0/0/14/14 200 3716 - - ---- 3/1/0/0/0 0/0 "GET /v2.0/agents HTTP/1.1"
Sep  3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:13.382] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/249/249 200 2065 - - ---- 2/1/0/0/0 0/0 "GET /v2.1/servers/detail HTTP/1.1"
Sep  3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:46138 [03/Sep/2022:06:52:13.643] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep  3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:46138 [03/Sep/2022:06:52:13.647] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"

It is also noticeable that logging to controller-0/var/log/containers/heat/heat_api.log.gz
stops at the time of the reboot.
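As a quick way to confirm the pattern in the excerpt above, the haproxy log can be summarized per backend and HTTP status with a short awk pipeline. This is only a sketch: two sample lines are inlined here so it runs as-is, while on a controller you would feed it the real haproxy.log instead.

```shell
# Count haproxy log lines per (backend, HTTP status). In the haproxy HTTP log
# format shown above, field 9 is the backend/server pair and field 11 is the
# status code. The here-document holds sample lines; on a controller, replace
# it with the real /var/log/containers/haproxy/haproxy.log.
awk '{ split($9, b, "/"); print b[1], $11 }' <<'EOF' | sort | uniq -c
Sep  3 06:51:53 controller-1 haproxy[7]: 10.0.0.72:50700 [03/Sep/2022:06:51:53.532] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/... HTTP/1.1"
Sep  3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:03.193] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/38/38 200 682 - - ---- 3/1/0/0/0 0/0 "GET /v2.1/os-services HTTP/1.1"
EOF
```

The `<NOSRV>` marker in the 503 lines is the key symptom: haproxy had no live heat_api backend to route the request to.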

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 8 dabarzil 2022-09-07 12:41:44 UTC
And it is indeed the command you pointed out

Comment 9 Luca Miccini 2022-09-07 13:22:03 UTC
fwiw I can't reproduce this issue by running the commands outside of tobiko or even crashing/destroying the controllers via virsh.

In the logs I see the pid file gets overwritten:

/var/log/containers/httpd/glance/error_log:[Wed Sep 07 12:53:04.164744 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/heat-api/error_log:[Wed Sep 07 12:53:05.061322 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/heat-api-cfn/error_log:[Wed Sep 07 12:53:04.449985 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/neutron-api/error_log:[Wed Sep 07 12:52:59.423087 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/nova-api/error_log:[Wed Sep 07 12:53:03.194573 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/nova-metadata/error_log:[Wed Sep 07 12:53:05.486070 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/placement/error_log:[Wed Sep 07 12:53:03.060705 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?

and all the containers are healthy.
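The grep output above can be reduced to just the names of the affected services with a small sed filter. This is a sketch: one sample line is inlined so the command runs as-is, and the path pattern assumes the /var/log/containers/httpd/<service>/error_log layout shown in the grep results.

```shell
# Extract the service directory name from grep-style "path:message" lines that
# carry the AH00098 unclean-shutdown warning. One sample line is inlined; on a
# controller you would instead pipe in the output of:
#   grep AH00098 /var/log/containers/httpd/*/error_log
sed -n 's|^/var/log/containers/httpd/\([^/]*\)/error_log:.*AH00098.*|\1|p' <<'EOF'
/var/log/containers/httpd/heat-api/error_log:[Wed Sep 07 12:53:05.061322 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
EOF
```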

Comment 15 Julia Marciano 2022-09-21 12:17:51 UTC
*** Bug 2128008 has been marked as a duplicate of this bug. ***

Comment 20 Takashi Kajinami 2023-05-29 15:44:58 UTC
*** Bug 2208577 has been marked as a duplicate of this bug. ***

Comment 21 Yatin Karel 2023-07-17 15:27:20 UTC
*** Bug 2222610 has been marked as a duplicate of this bug. ***

