Bug 1691285 - Undercloud containers didn't survive reboot
Summary: Undercloud containers didn't survive reboot
Keywords:
Status: CLOSED DUPLICATE of bug 1685658
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: ---
Assignee: Emilien Macchi
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-21 10:46 UTC by Sasha Smolyak
Modified: 2019-04-11 17:47 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-11 17:47:55 UTC
Target Upstream Version:
Embargoed:



Description Sasha Smolyak 2019-03-21 10:46:13 UTC
Description of problem:
Deployed the undercloud and ran the healthchecks on the containers; aside from the known issues, the healthchecks passed.
Rebooted the undercloud node and re-ran the healthchecks. Everything failed except the swift and mysql containers (status 0) and rabbitmq, which exited with an unclear status of 137.
Here is the list of containers running after reboot:

nova_compute
ironic_inspector_dnsmasq
ironic_inspector
ironic_pxe_http
ironic_pxe_tftp
ironic_neutron_agent
ironic_conductor
mistral_api
ironic_api
swift_proxy
nova_metadata
nova_api
glance_api
nova_placement
zaqar_websocket
zaqar
swift_rsync
swift_object_updater
swift_object_server
swift_object_expirer
swift_container_updater
swift_container_server
swift_account_server
swift_account_reaper
nova_scheduler
nova_conductor
nova_api_cron
neutron_api
mistral_executor
mistral_event_engine
mistral_engine
logrotate_crond
heat_engine
heat_api_cron
heat_api_cfn
heat_api
keystone_cron
keystone
iscsid
mysql
rabbitmq
haproxy
memcached
keepalived

Here is the result of their healthcheck:

nova_compute Mar 21 10:11:35 undercloud-0.redhat.local podman[175827]: exit status 1
nova_compute status=1/FAILURE
ironic_inspector_dnsmasq status=0/SUCCESS
ironic_inspector Mar 21 10:11:22 undercloud-0.redhat.local podman[173901]: exit status 1
ironic_inspector status=1/FAILURE
Unit tripleo_ironic_pxe_http_healthcheck.service could not be found.
ironic_pxe_tftp Mar 21 10:11:11 undercloud-0.redhat.local podman[171913]: exit status 1
ironic_pxe_tftp status=1/FAILURE
Unit tripleo_ironic_neutron_agent_healthcheck.service could not be found.
ironic_conductor Mar 21 10:11:28 undercloud-0.redhat.local podman[174851]: exit status 1
ironic_conductor status=1/FAILURE
mistral_api Mar 21 10:12:26 undercloud-0.redhat.local podman[180908]: exit status 1
mistral_api status=1/FAILURE
ironic_api Mar 21 10:11:51 undercloud-0.redhat.local podman[177492]: exit status 1
ironic_api status=1/FAILURE
swift_proxy status=0/SUCCESS
nova_metadata Mar 21 10:12:25 undercloud-0.redhat.local podman[181124]: exit status 1
nova_metadata status=1/FAILURE
nova_api Mar 21 10:12:12 undercloud-0.redhat.local podman[179651]: exit status 1
nova_api status=1/FAILURE
glance_api Mar 21 10:12:05 undercloud-0.redhat.local podman[178872]: exit status 1
glance_api status=1/FAILURE
nova_placement Mar 21 10:11:33 undercloud-0.redhat.local podman[175170]: exit status 1
nova_placement status=1/FAILURE
Unit tripleo_zaqar_websocket_healthcheck.service could not be found.
Unit tripleo_zaqar_healthcheck.service could not be found.
Unit tripleo_swift_rsync_healthcheck.service could not be found.
Unit tripleo_swift_object_updater_healthcheck.service could not be found.
swift_object_server status=0/SUCCESS
Unit tripleo_swift_object_expirer_healthcheck.service could not be found.
Unit tripleo_swift_container_updater_healthcheck.service could not be found.
swift_container_server status=0/SUCCESS
swift_account_server status=0/SUCCESS
Unit tripleo_swift_account_reaper_healthcheck.service could not be found.
nova_scheduler Mar 21 10:11:53 undercloud-0.redhat.local podman[177856]: exit status 1
nova_scheduler status=1/FAILURE
nova_conductor Mar 21 10:11:00 undercloud-0.redhat.local podman[171784]: exit status 1
nova_conductor status=1/FAILURE
Unit tripleo_nova_api_cron_healthcheck.service could not be found.
neutron_api Mar 21 10:12:19 undercloud-0.redhat.local podman[180455]: exit status 1
neutron_api status=1/FAILURE
mistral_executor Mar 21 10:11:23 undercloud-0.redhat.local podman[174413]: exit status 1
mistral_executor status=1/FAILURE
mistral_event_engine Mar 21 10:12:05 undercloud-0.redhat.local podman[178932]: exit status 1
mistral_event_engine status=1/FAILURE
mistral_engine Mar 21 10:11:41 undercloud-0.redhat.local podman[176528]: exit status 1
mistral_engine status=1/FAILURE
Unit tripleo_logrotate_crond_healthcheck.service could not be found.
heat_engine Mar 21 10:11:14 undercloud-0.redhat.local podman[173358]: exit status 1
heat_engine status=1/FAILURE
Unit tripleo_heat_api_cron_healthcheck.service could not be found.
heat_api_cfn Mar 21 10:12:29 undercloud-0.redhat.local podman[181571]: exit status 1
heat_api_cfn status=1/FAILURE
heat_api Mar 21 10:11:36 undercloud-0.redhat.local podman[175498]: exit status 1
heat_api status=1/FAILURE
Unit tripleo_keystone_cron_healthcheck.service could not be found.
keystone Mar 21 10:11:02 undercloud-0.redhat.local podman[171864]: exit status 1
keystone status=1/FAILURE
iscsid status=0/SUCCESS
mysql status=0/SUCCESS
rabbitmq Mar 21 10:11:33 undercloud-0.redhat.local podman[175269]: exit status 137
rabbitmq status=137/n/a
Unit tripleo_haproxy_healthcheck.service could not be found.
memcached Mar 21 10:11:42 undercloud-0.redhat.local podman[176430]: exit status 1
memcached status=1/FAILURE
Unit tripleo_keepalived_healthcheck.service could not be found.
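
For quick triage, status lines in the format pasted above can be tallied with standard text tools. A minimal sketch (the sample log is a hypothetical excerpt mirroring the `status=<code>/<result>` format shown above):

```shell
#!/bin/sh
# Tally healthcheck outcomes from systemd-style status lines.
# The sample below mirrors the format of the output above.
log='nova_compute status=1/FAILURE
swift_proxy status=0/SUCCESS
rabbitmq status=137/n/a
mysql status=0/SUCCESS'

# Count lines reporting success, and everything else as a failure.
passed=$(printf '%s\n' "$log" | grep -c 'status=0/SUCCESS')
failed=$(printf '%s\n' "$log" | grep -c -v 'status=0/SUCCESS')
echo "passed=$passed failed=$failed"   # prints passed=2 failed=2
```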


Version-Release number of selected component (if applicable):
RHOS_TRUNK-15.0-RHEL-8-20190314.n.0

How reproducible:
100% after every reboot

Steps to Reproduce:
1. Deploy undercloud
2. Observe the containers (sudo podman ps) and their healthchecks
   (sudo systemctl status tripleo_<container>_healthcheck.service)
3. Reboot the undercloud
4. After 5 minutes, observe the healthchecks once again
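
The observation step can be scripted. A minimal sketch, assuming the tripleo_<container>_healthcheck.service naming convention seen in the log output above; the systemctl loop is commented out because it only applies on the undercloud node itself:

```shell
#!/bin/sh
# Derive the healthcheck unit name for a container, following the
# tripleo_<container>_healthcheck.service convention from the logs above.
unit_for() {
    printf 'tripleo_%s_healthcheck.service' "$1"
}

# On the undercloud node, one could iterate over the running containers:
#   for c in $(sudo podman ps --format '{{.Names}}'); do
#       sudo systemctl status "$(unit_for "$c")" --no-pager || true
#   done

unit_for nova_api   # prints tripleo_nova_api_healthcheck.service
```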

Actual results:
Most of the healthchecks fail

Expected results:
Every healthcheck passes

Additional info:

Comment 1 Emilien Macchi 2019-04-11 17:47:55 UTC
Duplicate of:

1) OVS does not start at reboot: https://bugzilla.redhat.com/show_bug.cgi?id=1685658
2) br-ctlplane does not start: https://bugzilla.redhat.com/show_bug.cgi?id=1666387
3) iptables does not start: https://review.rdoproject.org/r/19994

Please re-open if needed.

*** This bug has been marked as a duplicate of bug 1685658 ***

