Bug 1814230 - [OSP13] neutron_ovs_agent container is unhealthy with no errors in log
Keywords:
Status: CLOSED DUPLICATE of bug 1813758
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Open vSwitch development team
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-17 12:28 UTC by Vadim Khitrin
Modified: 2020-03-18 07:51 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-18 07:51:32 UTC
Target Upstream Version:
Embargoed:



Description Vadim Khitrin 2020-03-17 12:28:50 UTC
Description of problem:
The neutron_ovs_agent container on controller and compute nodes is in an unhealthy state.
[root@controller-0 ~]# docker ps | grep neutron_ovs_agent
37961fdd3cf5        192.0.60.1:8787/rh-osbs/rhosp13-openstack-neutron-openvswitch-agent:20200303.1   "dumb-init --singl..."   24 hours ago        Up 24 hours (unhealthy)                       neutron_ovs_agent

This was observed on DPDK-enabled setups (on both the ComputeOvsDpdkSriov and ComputeHCIOvsDpdk roles), but it might be unrelated to the roles.

Running the health check script manually prints:
[root@computehciovsdpdk-0 ~]# docker exec -it neutron_ovs_agent /openstack/healthcheck
There is no neutron-openvsw process with opened RabbitMQ ports (5671,5672) running in the container

Output of the ss filter from inside the container on controller-0, where the container is unhealthy:
()[neutron@controller-0 /]$ ss -ntp | grep -E ":($ports).*,pid=($pids)."
ESTAB      0      0      10.10.116.107:59460              10.10.116.105:5672                users:(("/usr/bin/python",pid=138071,fd=24))
ESTAB      0      0      10.10.116.107:59168              10.10.116.105:5672                users:(("/usr/bin/python",pid=137524,fd=7))
ESTAB      0      0      10.10.116.107:45170              10.10.116.106:5672                users:(("/usr/bin/python",pid=138071,fd=23))
ESTAB      0      0      10.10.116.107:35534              10.10.116.107:5672                users:(("neutron-server:",pid=136363,fd=15))
ESTAB      0      0      10.10.116.107:35670              10.10.116.102:3306                users:(("neutron-server:",pid=136373,fd=13))
ESTAB      0      0      10.10.116.107:48994              10.10.116.102:3306                users:(("neutron-server:",pid=136363,fd=18))
ESTAB      0      0      10.10.116.107:59454              10.10.116.105:5672                users:(("/usr/bin/python",pid=138071,fd=21))
ESTAB      0      0      10.10.116.107:32812              10.10.116.105:5672                users:(("neutron-server:",pid=136364,fd=14))
ESTAB      0      0      10.10.116.107:44740              10.10.116.106:5672                users:(("/usr/bin/python",pid=136996,fd=8))
ESTAB      0      0      10.10.116.107:35933              10.10.116.102:3306                users:(("neutron-server:",pid=136364,fd=13))
ESTAB      0      0      10.10.116.107:41165              10.10.116.102:3306                users:(("neutron-server:",pid=136373,fd=15))
ESTAB      0      0      10.10.116.107:35532              10.10.116.107:5672                users:(("neutron-server:",pid=136360,fd=15))
ESTAB      0      0      10.10.116.107:35486              10.10.116.107:5672                users:(("neutron-server:",pid=136360,fd=13))
ESTAB      0      0      10.10.116.107:46152              10.10.116.105:5672                users:(("neutron-server:",pid=136343,fd=16))
ESTAB      0      0      127.0.0.1:60492              127.0.0.1:6640                users:(("/usr/bin/python",pid=136996,fd=18))
ESTAB      0      0      127.0.0.1:33872              127.0.0.1:6640                users:(("ovsdb-client",pid=138757,fd=3))
ESTAB      0      0      10.10.116.107:51860              10.10.116.105:5672                users:(("neutron-server:",pid=136373,fd=14))
ESTAB      0      0      10.10.116.107:36390              10.10.116.107:5672                users:(("neutron-server:",pid=136362,fd=17))
ESTAB      0      0      10.10.116.107:52079              10.10.116.102:3306                users:(("neutron-server:",pid=136360,fd=18))
ESTAB      0      0      127.0.0.1:6633               127.0.0.1:44272               users:(("/usr/bin/python",pid=138071,fd=18))
ESTAB      0      0      10.10.116.107:36324              10.10.116.107:5672                users:(("/usr/bin/python",pid=138071,fd=15))
ESTAB      0      0      127.0.0.1:33232              127.0.0.1:6640                users:(("/usr/bin/python",pid=137524,fd=19))
ESTAB      0      0      10.10.116.107:36410              10.10.116.107:5672                users:(("/usr/bin/python",pid=138071,fd=25))
ESTAB      0      0      10.10.116.107:45192              10.10.116.106:5672                users:(("/usr/bin/python",pid=138071,fd=27))
ESTAB      0      0      10.10.116.107:34399              10.10.116.102:3306                users:(("neutron-server:",pid=136362,fd=18))
ESTAB      0      0      10.10.116.107:44718              10.10.116.106:5672                users:(("/usr/bin/python",pid=136996,fd=6))
ESTAB      0      0      10.10.116.107:59072              10.10.116.105:5672                users:(("/usr/bin/python",pid=137140,fd=8))
ESTAB      0      0      10.10.116.107:60080              10.10.116.106:5672                users:(("neutron-server:",pid=136343,fd=14))
ESTAB      0      0      127.0.0.1:6633               127.0.0.1:44216               users:(("/usr/bin/python",pid=138071,fd=12))
ESTAB      0      0      10.10.116.107:59000              10.10.116.105:5672                users:(("/usr/bin/python",pid=136996,fd=5))
ESTAB      0      0      10.10.116.107:35456              10.10.116.107:5672                users:(("neutron-server:",pid=136360,fd=12))
ESTAB      0      0      10.10.116.107:50232              10.10.116.102:3306                users:(("neutron-server:",pid=136363,fd=16))
ESTAB      0      0      10.10.116.107:36104              10.10.116.107:5672                users:(("/usr/bin/python",pid=137524,fd=6))
ESTAB      0      0      10.10.116.107:36038              10.10.116.107:5672                users:(("/usr/bin/python",pid=137140,fd=9))
ESTAB      0      0      10.10.116.107:45098              10.10.116.106:5672                users:(("/usr/bin/python",pid=138071,fd=16))
ESTAB      0      0      10.10.116.107:45160              10.10.116.106:5672                users:(("neutron-server:",pid=136360,fd=17))
ESTAB      0      0      10.10.116.107:44280              10.10.116.106:5672                users:(("neutron-server:",pid=136363,fd=14))
ESTAB      8      0      10.10.116.107:45758              10.10.116.105:5672                users:(("neutron-server:",pid=136348,fd=16))
ESTAB      0      0      10.10.116.107:50377              10.10.116.102:3306                users:(("neutron-server:",pid=136360,fd=16))
ESTAB      0      0      10.10.116.107:45210              10.10.116.106:5672                users:(("/usr/bin/python",pid=138071,fd=29))
ESTAB      0      0      10.10.116.107:44284              10.10.116.106:5672                users:(("neutron-server:",pid=136360,fd=14))
ESTAB      0      0      10.10.116.107:44924              10.10.116.106:5672                users:(("/usr/bin/python",pid=137524,fd=9))
ESTAB      0      0      10.10.116.107:35466              10.10.116.107:5672                users:(("neutron-server:",pid=136363,fd=12))
ESTAB      0      0      10.10.116.107:38342              10.10.116.106:5672                users:(("neutron-server:",pid=136345,fd=16))
ESTAB      8      0      10.10.116.107:50910              10.10.116.107:5672                users:(("neutron-server:",pid=136348,fd=14))
ESTAB      0      0      127.0.0.1:6633               127.0.0.1:44206               users:(("/usr/bin/python",pid=138071,fd=11))
ESTAB      0      0      10.10.116.107:36416              10.10.116.107:5672                users:(("/usr/bin/python",pid=138071,fd=26))
ESTAB      0      0      127.0.0.1:33862              127.0.0.1:6640                users:(("ovsdb-client",pid=138755,fd=3))
ESTAB      0      0      10.10.116.107:57766              10.10.116.107:5672                users:(("neutron-server:",pid=136345,fd=15))
ESTAB      0      0      10.10.116.107:44860              10.10.116.106:5672                users:(("/usr/bin/python",pid=137524,fd=5))
ESTAB      0      0      10.10.116.107:48206              10.10.116.102:3306                users:(("neutron-server:",pid=136362,fd=16))
ESTAB      0      0      127.0.0.1:33684              127.0.0.1:6640                users:(("/usr/bin/python",pid=138071,fd=8))
ESTAB      0      0      10.10.116.107:58546              10.10.116.105:5672                users:(("neutron-server:",pid=136362,fd=13))
ESTAB      0      0      10.10.116.107:45116              10.10.116.106:5672                users:(("/usr/bin/python",pid=138071,fd=17))
ESTAB      0      0      10.10.116.107:44720              10.10.116.106:5672                users:(("/usr/bin/python",pid=136996,fd=7))
ESTAB      0      0      10.10.116.107:58516              10.10.116.105:5672                users:(("neutron-server:",pid=136362,fd=12))
ESTAB      0      0      10.10.116.107:59372              10.10.116.105:5672                users:(("/usr/bin/python",pid=138071,fd=14))
ESTAB      0      0      10.10.116.107:35510              10.10.116.107:5672                users:(("neutron-server:",pid=136362,fd=14))
ESTAB      0      0      10.10.116.107:32821              10.10.116.102:3306                users:(("neutron-server:",pid=136373,fd=16))
ESTAB      0      0      10.10.116.107:44160              10.10.116.106:5672                users:(("neutron-server:",pid=136373,fd=6),("neutron-server:",pid=136364,fd=6),("neutron-server:",pid=136363,fd=6),("neutron-server:",pid=136362,fd=6),("neutron-server:",pid=136360,fd=6),("neutron-server:",pid=136348,fd=6),("neutron-server:",pid=136345,fd=6),("neutron-server:",pid=136343,fd=6),("/usr/bin/python",pid=135432,fd=6))
ESTAB      0      0      10.10.116.107:59488              10.10.116.105:5672                users:(("/usr/bin/python",pid=138071,fd=28))
ESTAB      0      0      10.10.116.107:45072              10.10.116.106:5672                users:(("/usr/bin/python",pid=138071,fd=13))
ESTAB      0      0      10.10.116.107:44226              10.10.116.106:5672                users:(("neutron-server:",pid=136364,fd=12))
ESTAB      0      0      10.10.116.107:45186              10.10.116.106:5672                users:(("neutron-server:",pid=136363,fd=17))
ESTAB      0      0      10.10.116.107:35530              10.10.116.107:5672                users:(("neutron-server:",pid=136362,fd=15))
ESTAB      0      0      10.10.116.107:58548              10.10.116.105:5672                users:(("neutron-server:",pid=136363,fd=13))
ESTAB      0      0      10.10.116.107:44232              10.10.116.106:5672                users:(("neutron-server:",pid=136373,fd=12))

Output from an OSP13 z10 (13z10) deployment, where the container is healthy:
()[root@controller-0 /]# ss -ntp | grep -E ":($ports).*,pid=($pids),"
ESTAB      0      0      10.10.131.106:39016              10.10.131.112:5672                users:(("neutron-openvsw",pid=784124,fd=26))
ESTAB      0      0      10.10.131.106:39138              10.10.131.112:5672                users:(("neutron-openvsw",pid=784124,fd=29))
ESTAB      0      0      10.10.131.106:34084              10.10.131.100:5672                users:(("neutron-openvsw",pid=784124,fd=14))
ESTAB      0      0      10.10.131.106:34600              10.10.131.100:5672                users:(("neutron-openvsw",pid=784124,fd=22))
ESTAB      0      0      10.10.131.106:34202              10.10.131.100:5672                users:(("neutron-openvsw",pid=784124,fd=15))
ESTAB      0      0      10.10.131.106:34836              10.10.131.100:5672                users:(("neutron-openvsw",pid=784124,fd=28))
ESTAB      0      0      10.10.131.106:34336              10.10.131.100:5672                users:(("neutron-openvsw",pid=784124,fd=16))
ESTAB      0      0      10.10.131.106:34582              10.10.131.100:5672                users:(("neutron-openvsw",pid=784124,fd=18))
ESTAB      0      0      10.10.131.106:34598              10.10.131.100:5672                users:(("neutron-openvsw",pid=784124,fd=24))
ESTAB      0      0      10.10.131.106:34462              10.10.131.100:5672                users:(("neutron-openvsw",pid=784124,fd=17))
ESTAB      0      0      127.0.0.1:33528              127.0.0.1:6640                users:(("neutron-openvsw",pid=784124,fd=8))
ESTAB      0      0      10.10.131.106:34840              10.10.131.100:5672                users:(("neutron-openvsw",pid=784124,fd=30))
ESTAB      0      0      127.0.0.1:6633               127.0.0.1:34810               users:(("neutron-openvsw",pid=784124,fd=12))
ESTAB      0      0      127.0.0.1:6633               127.0.0.1:34808               users:(("neutron-openvsw",pid=784124,fd=11))
ESTAB      0      0      10.10.131.106:34714              10.10.131.100:5672                users:(("neutron-openvsw",pid=784124,fd=25))
ESTAB      0      0      127.0.0.1:6633               127.0.0.1:34812               users:(("neutron-openvsw",pid=784124,fd=13))
ESTAB      0      0      10.10.131.106:39134              10.10.131.112:5672                users:(("neutron-openvsw",pid=784124,fd=27))
ESTAB      0      0      127.0.0.1:6633               127.0.0.1:34806               users:(("neutron-openvsw",pid=784124,fd=9))
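
For illustration, the difference between the two listings above can be shown with a small name-based filter. This is only a sketch: count_matches is a hypothetical helper written for this report, not the shipped /openstack/healthcheck script, and it assumes the check matches on the process name that ss prints. In the unhealthy listing the agent's sockets appear under "/usr/bin/python", while the healthy listing shows "neutron-openvsw", so a filter keyed on "neutron-openvsw" would find nothing in the first case.

```shell
#!/usr/bin/env bash
# Sketch only: count_matches is a hypothetical helper, not the shipped healthcheck.
# Reads `ss -ntp` style lines on stdin and counts those whose peer port is one
# of $1 (a '|'-separated list) and whose first listed process name equals $2.
count_matches() {
    local ports="$1" name="$2"
    # grep -c exits non-zero when the count is 0, so keep the pipeline alive.
    grep -cE ":(${ports}) +users:\(\(\"${name}\"," || true
}

# One line copied from each listing in this report.
unhealthy_line='ESTAB      0      0      10.10.116.107:59460              10.10.116.105:5672                users:(("/usr/bin/python",pid=138071,fd=24))'
healthy_line='ESTAB      0      0      10.10.131.106:39016              10.10.131.112:5672                users:(("neutron-openvsw",pid=784124,fd=26))'

printf '%s\n' "$unhealthy_line" | count_matches '5671|5672' 'neutron-openvsw'
printf '%s\n' "$healthy_line"   | count_matches '5671|5672' 'neutron-openvsw'
```

On these two sample lines the helper prints 0 for the unhealthy listing and 1 for the healthy one. Whether the real health check fails for the same reason is not confirmed here; see bug 1813758 for the actual analysis.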


Version-Release number of selected component (if applicable):
container: rh-osbs/rhosp13-openstack-neutron-openvswitch-agent:20200303.1
puddle: 2020-03-10.1

How reproducible:
Always

Steps to Reproduce:
1. Deploy an overcloud
2. Log in to the controller and compute nodes and check the state of the neutron_ovs_agent container

Actual results:
The container is unhealthy although there are no errors or warnings in the logs

Expected results:
The container is healthy when there are no errors or warnings in the logs

Additional info:
Will provide an SOS report in a comment

Comment 2 Saravanan KR 2020-03-18 04:51:38 UTC
I was debugging the issue on Vadim's cluster before this bz was raised. I think it is related to https://bugs.launchpad.net/tripleo/+bug/1821856, specifically this fix: https://review.opendev.org/#/c/648027/6/healthcheck/common.sh@23

I haven't analyzed how this regression was introduced. Looping in Tengu for his comments.

Comment 3 Cédric Jeanneret 2020-03-18 07:38:08 UTC
Hello,

This is probably a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1813758

Care to confirm?

Comment 4 Saravanan KR 2020-03-18 07:51:32 UTC
Yes it is, marking as duplicate.

*** This bug has been marked as a duplicate of bug 1813758 ***

