Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1795956

Summary: Octavia API and driver agent containers fail to start on node reboot
Product: Red Hat OpenStack Reporter: Carlos Goncalves <cgoncalves>
Component: openstack-tripleo-heat-templates Assignee: Carlos Goncalves <cgoncalves>
Status: CLOSED ERRATA QA Contact: Arieh Maron <amaron>
Severity: high Docs Contact:
Priority: urgent    
Version: 16.0 (Train) CC: ahasson, bbonguar, cjanisze, gregraka, jamsmith, ldenny, mburns, njohnston, sputhenp, tfreger
Target Milestone: z2 Keywords: Regression, Triaged
Target Release: 16.0 (Train on RHEL 8.1)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200218132857.ab1079e.el8ost Doc Type: Known Issue
Doc Text:
There is a known issue for the Red Hat OpenStack Platform Load-balancing service: the containers octavia_api and octavia_driver_agent fail to start after a node is rebooted. The cause of this issue is that the directory /var/run/octavia does not exist after the reboot. To fix this issue, add the following line to the file /etc/tmpfiles.d/var-run-octavia.conf:
----
d /var/run/octavia 0755 root root - -
----
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-14 12:15:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Carlos Goncalves 2020-01-29 10:23:43 UTC
Containers octavia_api and octavia_driver_agent fail to start upon node reboot. The root cause is that /var/run/octavia is created only at deployment time and, because the contents of /var/run do not persist across reboots, the directory no longer exists after the node restarts. A workaround is to create the file /etc/tmpfiles.d/var-run-octavia.conf (see "Expected results" below).


Steps to Reproduce:
1. deploy OSP 16 (ML2/OVN, Octavia)
2. check octavia_api and octavia_driver_agent containers are running
3. reboot a controller node
4. check octavia_api and octavia_driver_agent containers are *not* running
5. also check directory controller@/var/run/octavia does not exist
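
The checks in steps 2, 4 and 5 can be scripted. The following is a minimal sketch, not part of the original report: the function names `running_containers` and `check_node` are hypothetical, and it assumes `podman` is on the PATH and that the script runs as root on the controller (the containers in this report belong to root).

```python
import os
import subprocess

# Names taken from the report; RUNTIME_DIR is the bind-mount source that
# goes missing after a reboot.
CONTAINERS = ("octavia_api", "octavia_driver_agent")
RUNTIME_DIR = "/var/run/octavia"


def running_containers(podman="podman"):
    """Return the set of names of currently running containers.

    Assumption: run as root, otherwise podman lists only rootless containers.
    """
    out = subprocess.check_output(
        [podman, "ps", "--format", "{{.Names}}"],
        universal_newlines=True,
    )
    return set(out.split())


def check_node(running, runtime_dir_exists):
    """Map each container name and the runtime directory to True or False."""
    state = {name: name in running for name in CONTAINERS}
    state[RUNTIME_DIR] = runtime_dir_exists
    return state
```

On a controller, `check_node(running_containers(), os.path.isdir(RUNTIME_DIR))` should report everything as True before the reboot (step 2) and, on an affected node, everything as False afterwards (steps 4 and 5).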

Actual results:

$ sudo journalctl -u tripleo_octavia_api.service

Jan 28 15:55:08 standalone-1.localdomain systemd[1]: Started octavia_api container.
Jan 28 17:06:54 standalone-1.localdomain systemd[1]: Stopping octavia_api container...
Jan 28 17:06:56 standalone-1.localdomain podman[523069]: 2020-01-28 17:06:56.548720359 +0000 UTC m=+1.692333091 container died 31814cca73fdf7fe4df52e399a2c3c99ebd0f9ddc4798780948b58cfb9f9eac6 (image=docker.io/tripleomaster/centos-binary-octavia-api:current-tripleo, name=octavia_api)
Jan 28 17:06:56 standalone-1.localdomain podman[523069]: 2020-01-28 17:06:56.564271477 +0000 UTC m=+1.707884196 container stop 31814cca73fdf7fe4df52e399a2c3c99ebd0f9ddc4798780948b58cfb9f9eac6 (image=docker.io/tripleomaster/centos-binary-octavia-api:current-tripleo, name=octavia_api)
Jan 28 17:06:56 standalone-1.localdomain podman[523069]: 31814cca73fdf7fe4df52e399a2c3c99ebd0f9ddc4798780948b58cfb9f9eac6
Jan 28 17:06:56 standalone-1.localdomain systemd[1]: Stopped octavia_api container.
-- Reboot --
Jan 29 09:37:07 standalone-1.localdomain systemd[1]: Starting octavia_api container...
Jan 29 09:37:11 standalone-1.localdomain podman[1955]: Error: unable to start container "octavia_api": container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"rootfs_linux.go:58: mounting \\\"/var/run/octavia\\\" to rootfs \\\"/var/lib/containers/storage/overlay/20dab5709562147d93cfaec5de165ff4224d1d75dd6c1ee8c8a104aaf088e40d/merged\\\" at \\\"/var/run/octavia\\\" caused \\\"stat /var/run/octavia: no such file or directory\\\"\"": OCI runtime error
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service: control process exited, code=exited status=125
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: Failed to start octavia_api container.
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: Unit tripleo_octavia_api.service entered failed state.
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service failed.
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service holdoff time over, scheduling restart.
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: Stopped octavia_api container.
Jan 29 09:37:11 standalone-1.localdomain systemd[1]: Starting octavia_api container...
Jan 29 09:37:13 standalone-1.localdomain podman[3876]: Error: unable to start container "octavia_api": error reading container (probably exited) json message: EOF
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service: control process exited, code=exited status=125
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: Failed to start octavia_api container.
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: Unit tripleo_octavia_api.service entered failed state.
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service failed.
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service holdoff time over, scheduling restart.
Jan 29 09:37:13 standalone-1.localdomain systemd[1]: Stopped octavia_api container.
[...]
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Starting octavia_api container...
Jan 29 09:37:15 standalone-1.localdomain podman[5432]: Error: unable to start container "octavia_api": error reading container (probably exited) json message: EOF
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service: control process exited, code=exited status=125
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Failed to start octavia_api container.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Unit tripleo_octavia_api.service entered failed state.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service failed.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service holdoff time over, scheduling restart.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Stopped octavia_api container.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: start request repeated too quickly for tripleo_octavia_api.service
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Failed to start octavia_api container.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: Unit tripleo_octavia_api.service entered failed state.
Jan 29 09:37:15 standalone-1.localdomain systemd[1]: tripleo_octavia_api.service failed.
Jan 29 09:37:16 standalone-1.localdomain podman[5523]: 2020-01-29 09:37:16.027283872 +0000 UTC m=+0.405000065 container cleanup 31814cca73fdf7fe4df52e399a2c3c99ebd0f9ddc4798780948b58cfb9f9eac6 (image=docker.io/tripleomaster/centos-binary-octavia-api:current-tripleo, name=octavia_api)
Jan 29 09:39:27 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:40:37 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:41:57 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:42:57 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:44:07 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:45:18 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:47:08 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:48:48 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:49:58 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:51:38 standalone-1.localdomain systemd[1]: octavia_api container is not active.
Jan 29 09:52:48 standalone-1.localdomain systemd[1]: octavia_api container is not active.
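
The decisive line in the journal output above is the OCI runtime error, which names the bind-mount source that could not be found. As an illustration only, the sketch below pulls such paths out of saved journal text; the function name `missing_mount_sources` and the regular expression are assumptions based on the message format shown in this report, not a stable podman interface.

```python
import re

# Matches the "stat <path>: no such file or directory" fragment seen in the
# OCI runtime error above. The message format is taken from this report only.
MISSING_PATH = re.compile(r"stat (\S+): no such file or directory")


def missing_mount_sources(journal_text):
    """Return the distinct paths the runtime failed to stat, in order seen."""
    seen = []
    for match in MISSING_PATH.finditer(journal_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen
```

Fed the output of `journalctl -u tripleo_octavia_api.service`, this would return the missing directory, which points directly at the root cause rather than at the later, less informative "json message: EOF" retries.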

Expected results:

1. octavia_api and octavia_driver_agent containers should be running after the node reboots
2. directory controller@/var/run/octavia should exist
3. File controller@/etc/tmpfiles.d/var-run-octavia.conf should exist and contain the following content:
   $ cat /etc/tmpfiles.d/var-run-octavia.conf 
     d /var/run/octavia 0755 root root - -
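
Until a fixed package is installed, the file in item 3 can be created by hand. The following is a minimal sketch of that manual workaround, not the change shipped in the erratum; the function name `write_tmpfiles_entry` and its `conf_dir` parameter are hypothetical, and writing to /etc/tmpfiles.d requires root.

```python
import os

# File name and entry are copied from the expected results above.
CONF_NAME = "var-run-octavia.conf"
ENTRY = "d /var/run/octavia 0755 root root - -\n"


def write_tmpfiles_entry(conf_dir="/etc/tmpfiles.d"):
    """Write the tmpfiles.d entry and return the path of the file written.

    conf_dir is a parameter only so the function can be tried against a
    scratch directory; on a real node the default is the intended target.
    """
    path = os.path.join(conf_dir, CONF_NAME)
    with open(path, "w") as handle:
        handle.write(ENTRY)
    return path
```

systemd processes tmpfiles.d entries at boot, so the directory is recreated on the next reboot. To create it immediately, run `systemd-tmpfiles --create /etc/tmpfiles.d/var-run-octavia.conf` after writing the file, then restart the failed services.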

Comment 10 Carlos Goncalves 2020-03-04 15:59:30 UTC
*** Bug 1809482 has been marked as a duplicate of this bug. ***

Comment 13 Arieh Maron 2020-04-22 08:14:58 UTC
The expected behavior has now (2020-04-22) been observed and confirmed with version openstack-tripleo-heat-templates-11.3.2-0.20200405044623.ec9970c.el8ost.noarch.rpm

Comment 17 errata-xmlrpc 2020-05-14 12:15:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2114

Comment 18 ldenny 2020-06-01 22:13:32 UTC
*** Bug 1840907 has been marked as a duplicate of this bug. ***