Description of problem:
A test on a composable-roles deployment hard resets all nodes and then waits for SSH to come up on each of them. Afterwards, the Heat API is unavailable: heatclient requests get an HTTP 503 error. From controller-0/var/log/containers/haproxy/haproxy.log.gz:

Sep 3 06:51:53 controller-1 haproxy[7]: 10.0.0.72:50700 [03/Sep/2022:06:51:53.532] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep 3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:03.193] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/38/38 200 682 - - ---- 3/1/0/0/0 0/0 "GET /v2.1/os-services HTTP/1.1"
Sep 3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:40818 [03/Sep/2022:06:52:03.242] neutron neutron/controller-2.internalapi.redhat.local 0/0/0/13/13 200 3716 - - ---- 3/1/0/0/0 0/0 "GET /v2.0/agents HTTP/1.1"
Sep 3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:03.259] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/340/340 200 2065 - - ---- 2/1/0/0/0 0/0 "GET /v2.1/servers/detail HTTP/1.1"
Sep 3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:34260 [03/Sep/2022:06:52:03.611] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep 3 06:52:03 controller-1 haproxy[7]: 10.0.0.72:34260 [03/Sep/2022:06:52:03.615] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep 3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:13.325] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/30/30 200 677 - - ---- 2/1/0/0/0 0/0 "GET /v2.1/os-services HTTP/1.1"
Sep 3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:49282 [03/Sep/2022:06:52:13.365] neutron neutron/controller-1.internalapi.redhat.local 0/0/0/14/14 200 3716 - - ---- 3/1/0/0/0 0/0 "GET /v2.0/agents HTTP/1.1"
Sep 3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:43948 [03/Sep/2022:06:52:13.382] nova_osapi nova_osapi/controller-1.internalapi.redhat.local 0/0/0/249/249 200 2065 - - ---- 2/1/0/0/0 0/0 "GET /v2.1/servers/detail HTTP/1.1"
Sep 3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:46138 [03/Sep/2022:06:52:13.643] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"
Sep 3 06:52:13 controller-1 haproxy[7]: 10.0.0.72:46138 [03/Sep/2022:06:52:13.647] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 3/1/0/0/0 0/0 "GET /v1/6d533fc9824147bb8e2ea3590f40b850/stacks/tobiko.openstack.tests._nova.TestServerCreationStack-207439-0?resolve_outputs=False HTTP/1.1"

Note also that controller-0/var/log/containers/heat/heat_api.log.gz stops at the reboot; nothing is logged after it.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
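In the haproxy lines above, every heat_api request is logged with server `<NOSRV>` and status 503, while nova_osapi and neutron keep answering 200 through a real backend server. In the HAProxy HTTP log format, `<NOSRV>` means the request never reached a backend server. A minimal Python sketch to count this pattern per backend in a full haproxy.log(.gz) is shown below. It assumes the default HAProxy HTTP log format as it appears in these lines; the regex and the helper names `read_log` and `summarize` are illustrative only, not part of tobiko or TripleO.

```python
import gzip
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Matches HAProxy HTTP log lines like the ones quoted in this report, e.g.
#   ... 10.0.0.72:50700 [03/Sep/2022:06:51:53.532] heat_api heat_api/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- ...
# Only the fields used below are captured; re.search skips the syslog prefix.
HAPROXY_HTTP = re.compile(
    r"(?P<client>\d+\.\d+\.\d+\.\d+:\d+)\s+"
    r"\[(?P<accept_date>[^\]]+)\]\s+"
    r"(?P<frontend>\S+)\s+"
    r"(?P<backend>[^/\s]+)/(?P<server>\S+)\s+"
    r"(?P<timers>\S+)\s+"
    r"(?P<status>\d{3})\s+"
    r"(?P<bytes>\d+)\s+"
    r"\S+\s+\S+\s+"              # captured request/response cookies ("- -")
    r"(?P<term_state>\S{4})"     # termination state, e.g. "SC--" or "----"
)


def read_log(path: str) -> Iterator[str]:
    """Yield lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        yield from fh


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (requests per (backend, status), <NOSRV> responses per backend)."""
    by_status: Counter = Counter()
    no_server: Counter = Counter()
    for line in lines:
        m = HAPROXY_HTTP.search(line)
        if m is None:
            continue  # not an HTTP log line (startup, health-check messages, ...)
        by_status[(m["backend"], m["status"])] += 1
        if m["server"] == "<NOSRV>":
            no_server[m["backend"]] += 1
    return by_status, no_server
```

For example, `summarize(read_log("controller-0/var/log/containers/haproxy/haproxy.log.gz"))` returns both counters; for the excerpt quoted above, the second counter contains only heat_api, which points at the heat_api backends being down rather than at haproxy itself.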
Sure. It's the test_hard_reboot_controllers_recovery method in: https://opendev.org/x/tobiko/src/branch/master/tobiko/tests/faults/ha/test_cloud_recovery.py#:~:text=def%20test_hard_reboot_controllers_recovery(self)%3A
And it is indeed the command you pointed out.
FWIW, I can't reproduce this issue by running the commands outside of tobiko, or even by crashing/destroying the controllers via virsh. In the logs I see that the pid file gets overwritten:
/var/log/containers/httpd/glance/error_log:[Wed Sep 07 12:53:04.164744 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/heat-api/error_log:[Wed Sep 07 12:53:05.061322 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/heat-api-cfn/error_log:[Wed Sep 07 12:53:04.449985 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/neutron-api/error_log:[Wed Sep 07 12:52:59.423087 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/nova-api/error_log:[Wed Sep 07 12:53:03.194573 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/nova-metadata/error_log:[Wed Sep 07 12:53:05.486070 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
/var/log/containers/httpd/placement/error_log:[Wed Sep 07 12:53:03.060705 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
All the containers are healthy.
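The AH00098 warnings above are Apache noting that it found, and overwrote, a stale pid file left by the previous (uncleanly stopped) httpd run; httpd carries on starting after logging it, so on its own the warning does not mean the API failed. To compare runs quickly, a minimal Python sketch that lists these warnings per httpd-based service might look like the following. It assumes the /var/log/containers/httpd/<service>/error_log layout seen in the paths above; `scan_lines` and `find_unclean_shutdowns` are hypothetical helper names.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Matches Apache's AH00098 warning, e.g.
#   [Wed Sep 07 12:53:04.164744 2022] [core:warn] [pid 2:tid 2] AH00098: pid file /etc/httpd/run/httpd.pid overwritten -- ...
# Not anchored at line start, so it also matches grep-style lines prefixed with the file name.
AH00098 = re.compile(
    r"\[(?P<when>[^\]]+)\] \[core:warn\] .*?AH00098: pid file (?P<pidfile>\S+) overwritten"
)

Finding = Tuple[str, str, str]  # (service, timestamp, pid file)


def scan_lines(service: str, lines: Iterable[str]) -> Iterator[Finding]:
    """Yield one finding per AH00098 warning found in the given lines."""
    for line in lines:
        m = AH00098.search(line)
        if m:
            yield service, m["when"], m["pidfile"]


def find_unclean_shutdowns(log_root: Path) -> Iterator[Finding]:
    """Scan <log_root>/<service>/error_log, e.g. /var/log/containers/httpd/heat-api/error_log."""
    for error_log in sorted(log_root.glob("*/error_log")):
        with error_log.open(errors="replace") as fh:
            # The directory name identifies the service (glance, heat-api, nova-api, ...).
            yield from scan_lines(error_log.parent.name, fh)
```

For example, iterating over `find_unclean_shutdowns(Path("/var/log/containers/httpd"))` on a controller yields one (service, timestamp, pid file) tuple per warning, which makes it easy to check whether heat-api restarted at the same time as the other services after the hard reset.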
*** Bug 2128008 has been marked as a duplicate of this bug. ***
*** Bug 2208577 has been marked as a duplicate of this bug. ***
*** Bug 2222610 has been marked as a duplicate of this bug. ***