Bug 1795956 - Octavia API and driver agent containers fail to start on node reboot
Summary: Octavia API and driver agent containers fail to start on node reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: z2
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Carlos Goncalves
QA Contact: Arieh Maron
URL:
Whiteboard:
Duplicates: 1809482 1840907
Depends On:
Blocks:
Reported: 2020-01-29 10:23 UTC by Carlos Goncalves
Modified: 2023-10-06 19:05 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200218132857.ab1079e.el8ost
Doc Type: Known Issue
Doc Text:
There is a known issue for the Red Hat OpenStack Platform Load-balancing service: the containers octavia_api and octavia_driver_agent fail to start after a node is rebooted. The cause is that the directory /var/run/octavia no longer exists after the reboot. To fix this issue, add the following line to the file /etc/tmpfiles.d/var-run-octavia.conf:
----
d /var/run/octavia 0755 root root - -
----
Clone Of:
Environment:
Last Closed: 2020-05-14 12:15:29 UTC
Target Upstream Version:
Embargoed:


Links
System                             ID              Private  Priority  Status  Summary                                         Last Updated
Launchpad                          1861267         0        None      None    None                                            2020-01-29 10:34:34 UTC
OpenStack gerrit                   704773          0        None      MERGED  Ensure /var/run/octavia is present upon reboot  2020-12-07 04:47:58 UTC
Red Hat Issue Tracker              OSP-15511       0        None      None    None                                            2022-06-02 15:34:03 UTC
Red Hat Knowledge Base (Solution)  5121991         0        None      None    None                                            2020-06-01 23:24:48 UTC
Red Hat Product Errata             RHBA-2020:2114  0        None      None    None                                            2020-05-14 12:15:55 UTC

Internal Links: 1840907

Description Carlos Goncalves 2020-01-29 10:23:43 UTC
Containers octavia_api and octavia_driver_agent fail to start upon node reboot. The root cause is that /var/run/octavia is created only at deployment time and, because /var/run is ephemeral, the directory is gone after a reboot. A workaround is to create the file /etc/tmpfiles.d/var-run-octavia.conf (see "Expected results" below).


Steps to Reproduce:
1. deploy OSP 16 (ML2/OVN, Octavia)
2. check octavia_api and octavia_driver_agent containers are running
3. reboot a controller node
4. check octavia_api and octavia_driver_agent containers are *not* running
5. also check directory controller@/var/run/octavia does not exist
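
The container checks in steps 2 and 4 can be scripted. The following is a minimal sketch, not part of the original report: the helper name missing_octavia_containers is hypothetical, and it assumes the list of running container names is piped in, one per line.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): reads running container
# names on stdin, one per line, and prints whichever of the two Octavia
# containers named in this bug are absent from that list.
missing_octavia_containers() {
    moc_running="$(cat)"
    for moc_name in octavia_api octavia_driver_agent; do
        # -x matches the whole line, so "octavia_api_foo" does not count.
        printf '%s\n' "$moc_running" | grep -qx -- "$moc_name" \
            || printf '%s\n' "$moc_name"
    done
}

# Usage on a controller node (assumes podman is the container runtime,
# as in the journal output of this report):
#   sudo podman ps --format '{{.Names}}' | missing_octavia_containers
#   test -d /var/run/octavia || echo "/var/run/octavia is missing"
```

Empty output means both containers are running. After the reboot in step 3, an affected node is expected to list both names.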

Actual results:

$ sudo journalctl -u tripleo_octavia_api.service

Jan 28 15:55:08 standalone-1.localdomain systemd[1]: Started octavia_api container.
Jan 28 17:06:54 standalone-1.localdomain systemd[1]: Stopping octavia_api container...
Jan 28 17:06:56 standalone-1.localdomain podman[523069]: 2020-01-28 17:06:56.548720359 +0000 UTC m=+1.692333091 container died 31814cca73fdf7fe4df52e399a2c3c99ebd0f9ddc4798780948b58cfb9f9eac6 (image=docker.io/tripleomaster/centos-binary-octavia-api:current-tripleo, name=octavia_api)
Jan 28 17:06:56 standalone-1.localdomain podman[523069]: 2020-01-28 17:06:56.564271477 +0000 UTC m=+1.707884196 container stop 31814cca73fdf7fe4df52e399a2c3c99ebd0f9ddc4798780948b58cfb9f9eac6 (image=docker.io/tripleomaster/centos-binary-octavia-api:current-tripleo, name=octavia_api)
Jan 28 17:06:56 standalone-1.localdomain podman[523069]: 31814cca73fdf7fe4df52e399a2c3c99ebd0f9ddc4798780948b58cfb9f9eac6
Jan 28 17:06:56 standalone-1.localdomain systemd[1]: Stopped octavia_api container.
-- Reboot --
Jan 29 09:37:07 standalone-1.localdomain systemd[1]: Starting octavia_api container...
Jan 29 09:37:11 standalone-1.localdomain podman[1955]: Error: unable to start container "octavia_api": container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"rootfs_linux.go:58: mounting \\\"/var/run/octavia\\\" to rootfs \\\"/var/lib/containers/storage/overlay/20dab5709562147d93cfaec5de165ff4224d1d75dd6c1ee8c8a104aaf088e40d/merged\\\" at \\\"/var/run/octavia\\\" caused \\\"stat /var/run/octavia: no such file or directory\\\"\"": OCI runtime error
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service: control process exited, code=exited status=125
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: Failed to start octavia_api container.
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: Unit tripleo_octavia_api.service entered failed state.
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service failed.
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service holdoff time over, scheduling restart.
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: Stopped octavia_api container.
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: Starting octavia_api container...
Jan 29 09:37:13 standalone-1.localdomain podman[3876]: Error: unable to start container "octavia_api": error reading container (probably exited) json message: EOF
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service: control process exited, code=exited status=125
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: Failed to start octavia_api container.
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: Unit tripleo_octavia_api.service entered failed state.
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service failed.
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service holdoff time over, scheduling restart.
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: Stopped octavia_api container.
[...]
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Starting octavia_api container...
Jan 29 09:37:15 standalone-1.localdomain podman[5432]: Error: unable to start container "octavia_api": error reading container (probably exited) json message: EOF
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service: control process exited, code=exited status=125
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Failed to start octavia_api container.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Unit tripleo_octavia_api.service entered failed state.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service failed.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service holdoff time over, scheduling restart.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Stopped octavia_api container.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: start request repeated too quickly for tripleo_octavia_api.service
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Failed to start octavia_api container.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Unit tripleo_octavia_api.service entered failed state.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service failed.
Jan 29 09:37:16 standalone-1.localdomain podman[5523]: 2020-01-29 09:37:16.027283872 +0000 UTC m=+0.405000065 container cleanup 31814cca73fdf7fe4df52e399a2c3c99ebd0f9ddc4798780948b58cfb9f9eac6 (image=docker.io/tripleomaster/centos-binary-octavia-api:current-tripleo, name=octavia_api)
Jan 29 09:39:27 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:40:37 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:41:57 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:42:57 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:44:07 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:45:18 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:47:08 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:48:48 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:49:58 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:51:38 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:52:48 standalone-1.localdomain systemd[1]: octavia_api container is not active.
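
Only the first start attempt after the reboot reports the missing directory; later attempts fail with a generic EOF error. To tell this bug apart from other start failures, the journal can be searched for the stat error. This is a minimal sketch, and the helper name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): reads journal text on
# stdin and succeeds only if it contains the failed stat of
# /var/run/octavia that identifies this bug.
is_missing_octavia_rundir_failure() {
    grep -q 'stat /var/run/octavia: no such file or directory'
}

# Usage on an affected node (-b limits output to the current boot):
#   sudo journalctl -b -u tripleo_octavia_api.service \
#       | is_missing_octavia_rundir_failure && echo "matches bug 1795956"
```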

Expected results:

1. octavia_api and octavia_driver_agent containers should be running after the node reboots
2. directory controller@/var/run/octavia should exist
3. File controller@/etc/tmpfiles.d/var-run-octavia.conf should exist and contain the following content:
   $ cat /etc/tmpfiles.d/var-run-octavia.conf 
     d /var/run/octavia 0755 root root - -
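
On a node that does not yet have the fixed package, the workaround file can be written by hand. The following is a minimal sketch, not the change that was merged in tripleo-heat-templates: the helper name is hypothetical, and only the file path and the entry line come from this report.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): writes the tmpfiles.d
# entry quoted above. The optional first argument overrides the
# destination path, which is useful for a dry run.
write_octavia_tmpfiles_conf() {
    wotc_conf="${1:-/etc/tmpfiles.d/var-run-octavia.conf}"
    printf '%s\n' 'd /var/run/octavia 0755 root root - -' > "$wotc_conf"
}

# Usage on an affected controller, as root:
#   write_octavia_tmpfiles_conf
#   # Create the directory now instead of waiting for the next boot.
#   systemd-tmpfiles --create /etc/tmpfiles.d/var-run-octavia.conf
#   # Clear the start-rate limit seen in the journal, then start again.
#   systemctl reset-failed tripleo_octavia_api.service
#   systemctl start tripleo_octavia_api.service
```

The octavia_driver_agent container needs the same restart; its systemd unit name does not appear in this report, so it is left out of the sketch.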

Comment 10 Carlos Goncalves 2020-03-04 15:59:30 UTC
*** Bug 1809482 has been marked as a duplicate of this bug. ***

Comment 13 Arieh Maron 2020-04-22 08:14:58 UTC
The expected behavior has now (2020-04-22) been observed and confirmed with version openstack-tripleo-heat-templates-11.3.2-0.20200405044623.ec9970c.el8ost.noarch.rpm.

Comment 17 errata-xmlrpc 2020-05-14 12:15:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2114

Comment 18 ldenny 2020-06-01 22:13:32 UTC
*** Bug 1840907 has been marked as a duplicate of this bug. ***

