Bug 2124872

Summary: After controller hard reboot, the heat API is unavailable
Product: Red Hat OpenStack
Component: rhosp-director
Status: NEW
Severity: high
Priority: high
Version: 17.0 (Wallaby)
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Keywords: Triaged
Reporter: dabarzil
Assignee: OSP Team <rhos-maint>
QA Contact: David Rosenfeld <drosenfe>
CC: bdobreli, chjones, dalvarez, eolivare, jeckersb, jmarcian, jniu, lmartins, lmiccini, mburns, morazi, ramishra, spower, ykarel

Description dabarzil 2022-09-07 10:37:01 UTC
Description of problem:
The test runs against a composable-roles deployment: all nodes are hard reset, and we then wait for SSH to come back up on them.
After the reboot the heat API is unavailable and heatclient requests fail with a 503 error code.
In controller-0/var/log/containers/haproxy/haproxy.log.gz:
Sep  3 06:51:53 controller-1 haproxy[7]: 10.0.0.72:50700 [03/Sep/2022:06:51:53.532] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep  3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:03.193] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/38/38 200 682 - - ---- 3/1/0/0/0 0/0 "GET /v2.1/os-services HTTP/1.1"
Sep  3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:40818 [03/Sep/2022:06:52:03.242] neutron neutron/controller-2.internalapi.redhat.local 0/0/0/13/13 200 3716 - - ---- 3/1/0/0/0 0/0 "GET /v2.0/agents HTTP/1.1"
Sep  3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:03.259] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/340/340 200 2065 - - ---- 2/1/0/0/0 0/0 "GET /v2.1/servers/detail HTTP/1.1"
Sep  3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:34260 [03/Sep/2022:06:52:03.611] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep  3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:34260 [03/Sep/2022:06:52:03.615] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep  3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:13.325] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/30/30 200 677 - - ---- 2/1/0/0/0 0/0 "GET /v2.1/os-services HTTP/1.1"
Sep  3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:49282 [03/Sep/2022:06:52:13.365] neutron neutron/controller-1.internalapi.redhat.local 0/0/0/14/14 200 3716 - - ---- 3/1/0/0/0 0/0 "GET /v2.0/agents HTTP/1.1"
Sep  3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:13.382] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/249/249 200 2065 - - ---- 2/1/0/0/0 0/0 "GET /v2.1/servers/detail HTTP/1.1"
Sep  3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:46138 [03/Sep/2022:06:52:13.643] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep  3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:46138 [03/Sep/2022:06:52:13.647] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
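
In these entries <NOSRV> means haproxy could not pick any server from the heat_api backend (none were up), and the SC-- termination state plus the haproxy-generated 503 indicate the session failed during the server-connection phase. One way to confirm the backend state from the controller (the container name and stats socket path are assumptions based on the usual TripleO haproxy layout, and socat must be available inside the container):

    $ sudo podman exec haproxy-bundle-podman-0 sh -c \
          'echo "show stat" | socat stdio /var/lib/haproxy/stats' | grep heat
    # heat_api servers should show status UP; here they are expected to be DOWN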

It is also noticeable that controller-0/var/log/containers/heat/heat_api.log.gz
stops logging after the reboot.
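
To check whether the heat_api container came back at all after the reboot, something like the following should help (the container name is an assumption based on the default TripleO naming; on OSP 17 the API containers are managed by podman):

    $ sudo podman ps -a --filter name=heat_api --format '{{.Names}} {{.Status}}'
    $ sudo podman logs --tail 50 heat_api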

Version-Release number of selected component (if applicable):
OSP 17.0 (Wallaby)

How reproducible:
Reproduced in the composable-roles hard-reboot job (and in the duplicate bugs below); could not be reproduced manually so far (see comment 9).

Steps to Reproduce:
1. Deploy OSP 17.0 (Wallaby) with composable roles.
2. Hard reset all nodes and wait for SSH to come back up on them.
3. Query the heat API (see the example below).

Actual results:
heat API requests fail with HTTP 503; haproxy logs <NOSRV> for the heat_api backend, and heat_api.log stops after the reboot.

Expected results:
The heat API becomes available again once the controllers are back up.

Additional info:
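A hedged way to trigger the failing request from a client host (admin credentials sourced; the stack name is taken from the haproxy log excerpt above and is specific to that tobiko run):

    $ openstack stack show tobiko.openstack.tests._nova.TestServerCreationStack-207439-0
    # fails with "503 Service Unavailable" while the heat_api backend is down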

Comment 8 dabarzil 2022-09-07 12:41:44 UTC
And it is indeed the command you pointed out.

Comment 9 Luca Miccini 2022-09-07 13:22:03 UTC
FWIW, I can't reproduce this issue by running the commands outside of tobiko, or even by crashing/destroying the controllers via virsh.
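
For reference, a minimal sketch of that manual hard-reset attempt (the libvirt domain names are assumptions; adjust to the actual hypervisor environment):

    for n in controller-0 controller-1 controller-2; do
        virsh destroy "$n"   # hard power-off, equivalent to pulling the plug
    done
    for n in controller-0 controller-1 controller-2; do
        virsh start "$n"
    done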

In the logs I see the pid file gets overwritten:

/var/log/containers/httpd/glance/error_log:[Wed Sep 07 12:53:04.164744 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/heat-api/error_log:[Wed Sep 07 12:53:05.061322 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/heat-api-cfn/error_log:[Wed Sep 07 12:53:04.449985 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/neutron-api/error_log:[Wed Sep 07 12:52:59.423087 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/nova-api/error_log:[Wed Sep 07 12:53:03.194573 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/nova-metadata/error_log:[Wed Sep 07 12:53:05.486070 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/placement/error_log:[Wed Sep 07 12:53:03.060705 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?

and all the containers are healthy.
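
The AH00098 warning itself is expected after an unclean shutdown: the stale /etc/httpd/run/httpd.pid survives the hard reset and the next httpd start simply overwrites it. A quick way to confirm that the containers really are healthy and that heat-api answers locally (the container name is an assumption; 8004 is heat-api's default port, and curl must be available in the container):

    $ sudo podman healthcheck run heat_api && echo healthy
    $ sudo podman exec heat_api curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8004/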

Comment 15 Julia Marciano 2022-09-21 12:17:51 UTC
*** Bug 2128008 has been marked as a duplicate of this bug. ***

Comment 20 Takashi Kajinami 2023-05-29 15:44:58 UTC
*** Bug 2208577 has been marked as a duplicate of this bug. ***

Comment 21 Yatin Karel 2023-07-17 15:27:20 UTC
*** Bug 2222610 has been marked as a duplicate of this bug. ***