Bug 1782533 - ironic neutron agent healthcheck service fails on the undercloud
Summary: ironic neutron agent healthcheck service fails on the undercloud
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-containers
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ga
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Bob Fournier
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-12-11 19:31 UTC by Alex Schultz
Modified: 2020-03-03 22:12 UTC (History)

Fixed In Version: openstack-ironic-neutron-agent-container-16.0-80
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-03 22:12:18 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1848618 0 None None None 2019-12-11 20:10:16 UTC
OpenStack gerrit 698580 0 None MERGED kolla/overrides: add missing healthcheck for ironic-neutron-agent 2020-02-17 21:49:21 UTC
Red Hat Product Errata RHBA-2020:0659 0 None None None 2020-03-03 22:12:21 UTC

Description Alex Schultz 2019-12-11 19:31:20 UTC
Description of problem:

We're seeing this failure in our tests:

tripleo_ironic_neutron_agent_healthcheck.service - ironic_neutron_agent healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_ironic_neutron_agent_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2019-12-06 18:28:05 UTC; 4s ago
  Process: 275678 ExecStart=/usr/bin/podman exec ironic_neutron_agent /openstack/healthcheck 5672 (code=exited, status=1/FAILURE)
 Main PID: 275678 (code=exited, status=1/FAILURE)
Dec 06 18:28:05 undercloud-0.redhat.local systemd[1]: Starting ironic_neutron_agent healthcheck...
Dec 06 18:28:05 undercloud-0.redhat.local podman[275678]: exec failed: container_linux.go:345: starting container process caused "exec: \"/openstack/healthcheck\": stat /openstack/healthcheck: no such file or directory"
Dec 06 18:28:05 undercloud-0.redhat.local podman[275678]: time="2019-12-06T18:28:05Z" level=error msg="Error removing exit file for container af5ba0b398aef2b4cc19bc96c167571673fd09818d0e304d3e1cb3e321fe8a8b exec session 2839de7b0cdd77e150e070e0bce4730748158e8416dcf6a2828a62df49f91073: remove /var/run/containers/storage/overlay-containers/af5ba0b398aef2b4cc19bc96c167571673fd09818d0e304d3e1cb3e321fe8a8b/userdata/exec_pid_2839de7b0cdd77e150e070e0bce4730748158e8416dcf6a2828a62df49f91073: no such file or directory"
Dec 06 18:28:05 undercloud-0.redhat.local podman[275678]: Error: exit status 1
Dec 06 18:28:05 undercloud-0.redhat.local systemd[1]: tripleo_ironic_neutron_agent_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Dec 06 18:28:05 undercloud-0.redhat.local systemd[1]: tripleo_ironic_neutron_agent_healthcheck.service: Failed with result 'exit-code'.
Dec 06 18:28:05 undercloud-0.redhat.local systemd[1]: Failed to start ironic_neutron_agent healthcheck.


Version-Release number of selected component (if applicable):

Container is: rhosp16-openstack-ironic-neutron-agent:20191202.1

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Alex Schultz 2019-12-11 19:49:55 UTC
This is probably a downstream container issue, since upstream tripleo-common already has a block that sets up the healthcheck for this container:

{% block ironic_neutron_agent_footer %}
RUN mkdir -p /openstack && \
    ln -s /usr/share/openstack-tripleo-common/healthcheck/ironic-neutron-agent /openstack/healthcheck && \
    chmod a+rx /openstack/healthcheck
{% endblock %}
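
To confirm whether a given image actually picked up this footer block, one option is to test for the script inside the running container. The following is a minimal sketch, not part of the original report: the function name healthcheck_present and the injectable run parameter are hypothetical, while the container name and path are the ones from the failing unit above. It assumes podman is on the PATH and the container is running.

```python
import subprocess


def healthcheck_present(container="ironic_neutron_agent",
                        path="/openstack/healthcheck",
                        run=subprocess.run):
    """Return True if `path` exists and is executable inside `container`.

    `run` is injectable (hypothetical convenience) so the check can be
    exercised without podman; by default it is subprocess.run.
    """
    # `test -x` follows symlinks, so a dangling /openstack/healthcheck
    # link is reported as missing as well.
    cmd = ["podman", "exec", container, "test", "-x", path]
    result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


if __name__ == "__main__":
    print("healthcheck present:", healthcheck_present())
```

A False result on the undercloud would be consistent with the "stat /openstack/healthcheck: no such file or directory" error in the description.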

Comment 3 Alex Schultz 2019-12-11 19:50:55 UTC
https://review.opendev.org/#/c/693899/

Comment 5 Bob Fournier 2020-01-14 21:28:54 UTC
I see the patch https://review.opendev.org/#/c/698580/ is installed in RHOS_TRUNK-16.0-RHEL-8-20200113.n.0; for example, it's in:
/var/lib/containers/storage/overlay/3b6e177aadc2704ead5708dc50b882c7d638beb296839ebafa8042a0574b34cc/diff/usr/share/openstack-tripleo-common-containers/container-images/tripleo_kolla_template_overrides.j2

However, I'm seeing this in /var/log/messages:
Jan 14 18:27:41 undercloud-0 systemd[1]: Starting ironic_neutron_agent healthcheck...
Jan 14 18:27:41 undercloud-0 podman[79336]: exec failed: container_linux.go:345: starting container process caused "exec: \"/openstack/healthcheck\": stat /openstack/healthcheck: no such file or directory"
Jan 14 18:27:41 undercloud-0 podman[79336]: time="2020-01-14T18:27:41Z" level=error msg="Error removing exit file for container e82b0e8eb016d90dd60eb60b61dacb04cb8eff9b3c4b471c546b6363c1581d3f exec session 59103bca15cbb84fc30808137cd3ac682b020a6535d158fbecb384a94780e4e9: remove /var/run/containers/storage/overlay-containers/e82b0e8eb016d90dd60eb60b61dacb04cb8eff9b3c4b471c546b6363c1581d3f/userdata/exec_pid_59103bca15cbb84fc30808137cd3ac682b020a6535d158fbecb384a94780e4e9: no such file or directory"
Jan 14 18:27:41 undercloud-0 podman[79336]: Error: exit status 1
Jan 14 18:27:41 undercloud-0 systemd[1]: tripleo_ironic_neutron_agent_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Jan 14 18:27:41 undercloud-0 systemd[1]: tripleo_ironic_neutron_agent_healthcheck.service: Failed with result 'exit-code'.
Jan 14 18:27:41 undercloud-0 systemd[1]: Failed to start ironic_neutron_agent healthcheck.

Comment 6 Bob Fournier 2020-01-31 18:39:11 UTC
Moving back to ASSIGNED, as we need this in the downstream container.

Comment 7 Bob Fournier 2020-01-31 19:47:55 UTC
Verified that the healthcheck now starts:
Jan 31 14:46:13 hardprov-dl360-g9-01 systemd[1]: Started ironic_neutron_agent healthcheck.

and there are no errors in /var/log/messages.
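
The "no errors" check can also be scripted. The sketch below is an illustration only and was not used for the verification above: the helper name missing_healthcheck_paths is hypothetical, and it assumes the failure always appears in the same "exec failed ... stat <path>: no such file or directory" form as the podman lines quoted earlier in this bug.

```python
import re

# Matches the runtime's complaint about a missing executable, as seen
# in the podman log lines quoted in this bug.
MISSING_SCRIPT = re.compile(r"stat (\S+): no such file or directory")


def missing_healthcheck_paths(lines):
    """Return the paths that `podman exec` could not find, in log order."""
    paths = []
    for line in lines:
        # Only consider exec failures; other "no such file" messages
        # (such as the exit-file cleanup error) are ignored.
        if "exec failed" not in line:
            continue
        match = MISSING_SCRIPT.search(line)
        if match:
            paths.append(match.group(1))
    return paths


if __name__ == "__main__":
    with open("/var/log/messages", errors="replace") as log:
        for path in missing_healthcheck_paths(log):
            print("missing healthcheck script:", path)
```

An empty result for the period after the updated container was deployed would match the outcome reported in this comment.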

Comment 11 errata-xmlrpc 2020-03-03 22:12:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0659

