Bug 1782533

Summary: ironic neutron agent healthcheck service fails on the undercloud
Product: Red Hat OpenStack
Reporter: Alex Schultz <aschultz>
Component: openstack-containers
Assignee: Bob Fournier <bfournie>
Status: CLOSED ERRATA
QA Contact: Alexander Chuzhoy <sasha>
Severity: high
Docs Contact:
Priority: medium
Version: 16.0 (Train)
CC: bfournie, jschluet, m.andre, mburns, slinaber
Target Milestone: ga
Keywords: Triaged
Target Release: 16.0 (Train on RHEL 8.1)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-ironic-neutron-agent-container-16.0-80
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-03 22:12:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Alex Schultz 2019-12-11 19:31:20 UTC
Description of problem:

We're seeing this failure in our tests:

tripleo_ironic_neutron_agent_healthcheck.service - ironic_neutron_agent healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_ironic_neutron_agent_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2019-12-06 18:28:05 UTC; 4s ago
  Process: 275678 ExecStart=/usr/bin/podman exec ironic_neutron_agent /openstack/healthcheck 5672 (code=exited, status=1/FAILURE)
 Main PID: 275678 (code=exited, status=1/FAILURE)
Dec 06 18:28:05 undercloud-0.redhat.local systemd[1]: Starting ironic_neutron_agent healthcheck...
Dec 06 18:28:05 undercloud-0.redhat.local podman[275678]: exec failed: container_linux.go:345: starting container process caused "exec: \"/openstack/healthcheck\": stat /openstack/healthcheck: no such file or directory"
Dec 06 18:28:05 undercloud-0.redhat.local podman[275678]: time="2019-12-06T18:28:05Z" level=error msg="Error removing exit file for container af5ba0b398aef2b4cc19bc96c167571673fd09818d0e304d3e1cb3e321fe8a8b exec session 2839de7b0cdd77e150e070e0bce4730748158e8416dcf6a2828a62df49f91073: remove /var/run/containers/storage/overlay-containers/af5ba0b398aef2b4cc19bc96c167571673fd09818d0e304d3e1cb3e321fe8a8b/userdata/exec_pid_2839de7b0cdd77e150e070e0bce4730748158e8416dcf6a2828a62df49f91073: no such file or directory"
Dec 06 18:28:05 undercloud-0.redhat.local podman[275678]: Error: exit status 1
Dec 06 18:28:05 undercloud-0.redhat.local systemd[1]: tripleo_ironic_neutron_agent_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Dec 06 18:28:05 undercloud-0.redhat.local systemd[1]: tripleo_ironic_neutron_agent_healthcheck.service: Failed with result 'exit-code'.
Dec 06 18:28:05 undercloud-0.redhat.local systemd[1]: Failed to start ironic_neutron_agent healthcheck.


Version-Release number of selected component (if applicable):

Container is: rhosp16-openstack-ironic-neutron-agent:20191202.1


Comment 2 Alex Schultz 2019-12-11 19:49:55 UTC
This is probably a downstream container issue, since upstream tripleo-common has a block that creates /openstack/healthcheck for this container:

{% block ironic_neutron_agent_footer %}
RUN mkdir -p /openstack && \
    ln -s /usr/share/openstack-tripleo-common/healthcheck/ironic-neutron-agent /openstack/healthcheck && \
    chmod a+rx /openstack/healthcheck
{% endblock %}
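
As an illustration of what the block above is expected to guarantee, here is a minimal Python sketch that classifies the healthcheck entry under an extracted image root directory. The function name healthcheck_status and the idea of checking an unpacked image root are assumptions made for this example; they are not part of tripleo-common. Only one level of symlink is resolved, which is enough for the link created above.

```python
import os


def healthcheck_status(root, link="openstack/healthcheck"):
    """Classify the healthcheck entry under an extracted image root.

    Hypothetical helper: returns "missing" if the path does not exist,
    "dangling" if it is a symlink whose target is absent, "not-executable"
    if the target exists without execute permission, and "ok" otherwise.
    """
    path = os.path.join(root, link)
    if not os.path.lexists(path):
        return "missing"

    target = path
    if os.path.islink(path):
        dest = os.readlink(path)
        if os.path.isabs(dest):
            # Absolute link targets must be resolved inside the image root,
            # not against the host filesystem.
            target = os.path.join(root, dest.lstrip("/"))
        else:
            target = os.path.join(os.path.dirname(path), dest)

    if not os.path.exists(target):
        return "dangling"
    if not os.access(target, os.X_OK):
        return "not-executable"
    return "ok"
```

With the container image from the description, such a check would presumably report "missing", matching the "stat /openstack/healthcheck: no such file or directory" error.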

Comment 3 Alex Schultz 2019-12-11 19:50:55 UTC
https://review.opendev.org/#/c/693899/

Comment 5 Bob Fournier 2020-01-14 21:28:54 UTC
I see that the patch https://review.opendev.org/#/c/698580/ is installed in RHOS_TRUNK-16.0-RHEL-8-20200113.n.0; for example, it's in:
/var/lib/containers/storage/overlay/3b6e177aadc2704ead5708dc50b882c7d638beb296839ebafa8042a0574b34cc/diff/usr/share/openstack-tripleo-common-containers/container-images/tripleo_kolla_template_overrides.j2

However, I'm seeing the following in /var/log/messages:
Jan 14 18:27:41 undercloud-0 systemd[1]: Starting ironic_neutron_agent healthcheck...
Jan 14 18:27:41 undercloud-0 podman[79336]: exec failed: container_linux.go:345: starting container process caused "exec: \"/openstack/healthcheck\": stat /openstack/healthcheck: no such file or directory"
Jan 14 18:27:41 undercloud-0 podman[79336]: time="2020-01-14T18:27:41Z" level=error msg="Error removing exit file for container e82b0e8eb016d90dd60eb60b61dacb04cb8eff9b3c4b471c546b6363c1581d3f exec session 59103bca15cbb84fc30808137cd3ac682b020a6535d158fbecb384a94780e4e9: remove /var/run/containers/storage/overlay-containers/e82b0e8eb016d90dd60eb60b61dacb04cb8eff9b3c4b471c546b6363c1581d3f/userdata/exec_pid_59103bca15cbb84fc30808137cd3ac682b020a6535d158fbecb384a94780e4e9: no such file or directory"
Jan 14 18:27:41 undercloud-0 podman[79336]: Error: exit status 1
Jan 14 18:27:41 undercloud-0 systemd[1]: tripleo_ironic_neutron_agent_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Jan 14 18:27:41 undercloud-0 systemd[1]: tripleo_ironic_neutron_agent_healthcheck.service: Failed with result 'exit-code'.
Jan 14 18:27:41 undercloud-0 systemd[1]: Failed to start ironic_neutron_agent healthcheck.
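
For triage across many hosts, a log excerpt like the one above can be scanned for the path that podman could not find. The following is a minimal Python sketch under the assumption that the failure line keeps the "exec failed: ... stat <path>: no such file or directory" shape seen here; the function name missing_exec_paths is hypothetical.

```python
import re

# Matches the podman "exec failed" line and captures the path that the
# runtime could not stat. The later "Error removing exit file" line uses
# "remove", not "stat", and does not start with "exec failed", so it is
# ignored.
_MISSING_EXEC = re.compile(
    r"exec failed: .*stat (\S+): no such file or directory"
)


def missing_exec_paths(log_text):
    """Return the distinct missing executable paths, in order of appearance."""
    paths = []
    for line in log_text.splitlines():
        match = _MISSING_EXEC.search(line)
        if match and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths
```

Run against the messages above, this should yield only /openstack/healthcheck.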

Comment 6 Bob Fournier 2020-01-31 18:39:11 UTC
Moving back to ASSIGNED, as we need this fix in the downstream container.

Comment 7 Bob Fournier 2020-01-31 19:47:55 UTC
Verified that the healthcheck starts:
Jan 31 14:46:13 hardprov-dl360-g9-01 systemd[1]: Started ironic_neutron_agent healthcheck.

and that there are no errors in /var/log/messages.

Comment 11 errata-xmlrpc 2020-03-03 22:12:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0659