Description of problem:
When trying to deploy a new overcloud node after an undercloud upgrade (Newton -> Ocata), the deployed server does not get a DHCP response.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install a Newton undercloud and overcloud with 2 compute nodes
2. Upgrade the undercloud to Ocata
3. Try to scale out (run the same commands as in step 1 with compute-scale 3)

Actual results:
The new node fails to get an IP from the DHCP server.

Expected results:
Node deployed.

Additional info:
* This bug has some similarity to BZ #1432028
* Running tcpdump on the undercloud shows the DHCP requests but no response:

08:27:38.956868 52:54:00:0a:25:27 > Broadcast, ethertype IPv4 (0x0800), length 438: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:0a:25:27, length 396
08:27:46.866607 52:54:00:0a:25:27 > Broadcast, ethertype IPv4 (0x0800), length 438: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:0a:25:27, length 396
08:28:02.684609 52:54:00:0a:25:27 > Broadcast, ethertype IPv4 (0x0800), length 438: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:0a:25:27, length 396
08:59:16.975626 52:54:00:0a:25:27 > Broadcast, ethertype IPv4 (0x0800), length 438: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:0a:25:27, length 396
08:59:24.884914 52:54:00:0a:25:27 > Broadcast, ethertype IPv4 (0x0800), length 438: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:0a:25:27, length 396
08:59:40.703398 52:54:00:0a:25:27 > Broadcast, ethertype IPv4 (0x0800), length 438: 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:0a:25:27, length 396
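For reference, the request/reply asymmetry above can be checked mechanically from a capture; a small Python sketch, assuming the tcpdump output format shown (`dhcp_summary` is a hypothetical helper, not part of any tool mentioned here):

```python
# Hypothetical helper: count DHCP requests vs. replies in tcpdump output
# lines of the form shown above, to confirm the server never answers.
def dhcp_summary(lines):
    requests = sum("BOOTP/DHCP, Request" in line for line in lines)
    replies = sum("BOOTP/DHCP, Reply" in line for line in lines)
    return requests, replies

# Two sample lines taken from the capture in this report ("..." elides
# the ethertype/length fields for brevity):
capture = [
    "08:27:38.956868 52:54:00:0a:25:27 > Broadcast ... BOOTP/DHCP, Request from 52:54:00:0a:25:27, length 396",
    "08:27:46.866607 52:54:00:0a:25:27 > Broadcast ... BOOTP/DHCP, Request from 52:54:00:0a:25:27, length 396",
]
print(dhcp_summary(capture))  # (2, 0): requests seen, zero replies
```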
Created attachment 1267167 [details] neutron/dhcp-agent.log
Created attachment 1267168 [details] neutron/openvswitch-agent.log
As with https://bugzilla.redhat.com/show_bug.cgi?id=1432028, it looks like the neutron DHCP server is not responding because of a neutron port binding failure due to a dead neutron agent.

** dhcp-agent.log **

Here the neutron port corresponding to MAC 52:54:00:0a:25:27 is being added to the DHCP agent but the vif_type is "binding_failed":

2017-03-28 10:02:07.706 19566 INFO neutron.agent.dhcp.agent [req-31a21181-c1b0-42ed-9763-693c4cfe8dbe a41c516915bd49b8b58573b9d81a3a23 f8fcf891cea7471e957d2dda4a009380 - - -] Trigger reload_allocations for port admin_state_up=True, allowed_address_pairs=[], binding:host_id=undercloud-0.redhat.local, binding:profile=, binding:vif_details=, binding:vif_type=binding_failed, binding:vnic_type=normal, created_at=2017-03-28T14:02:00Z, description=, device_id=0e368cf7-21b2-43b8-8a84-82eea663bd9a, device_owner=compute:None, extra_dhcp_opts=[{u'opt_value': u'192.168.24.1', u'ip_version': 4, u'opt_name': u'server-ip-address'}, {u'opt_value': u'undionly.kpxe', u'ip_version': 4, u'opt_name': u'tag:!ipxe,bootfile-name'}, {u'opt_value': u'http://192.168.24.1:8088/boot.ipxe', u'ip_version': 4, u'opt_name': u'tag:ipxe,bootfile-name'}, {u'opt_value': u'192.168.24.1', u'ip_version': 4, u'opt_name': u'tftp-server'}], fixed_ips=[{u'subnet_id': u'0c7877b2-b7d6-44f7-acc0-37c23386ccb7', u'ip_address': u'192.168.24.23'}], id=048b10ab-fbbe-403b-b61b-6c2768e8fe6c, mac_address=52:54:00:0a:25:27, name=, network_id=d69795b7-5e8d-4e5c-af76-a2b5c8ba4572, project_id=7d637d45ea47471080085a245f85ad83, revision_number=11, security_groups=[u'a5775727-aed9-4ac3-b4af-9dae5c1923de'], status=DOWN, tags=[], tenant_id=7d637d45ea47471080085a245f85ad83, updated_at=2017-03-28T14:02:07Z

In the log above the port id is 048b10ab-fbbe-403b-b61b-6c2768e8fe6c.
** neutron/server.log **

We can see the port bind failure occurred because of the dead neutron agent:

2017-03-28 10:02:06.840 19532 WARNING neutron.plugins.ml2.drivers.mech_agent [req-92a14e4a-8b4b-4813-88c5-e7af3226721b a41c516915bd49b8b58573b9d81a3a23 f8fcf891cea7471e957d2dda4a009380 - - -] Refusing to bind port 048b10ab-fbbe-403b-b61b-6c2768e8fe6c to dead agent: {'binary': u'neutron-openvswitch-agent', 'description': None, 'admin_state_up': True, 'heartbeat_timestamp': datetime.datetime(2017, 3, 28, 9, 33, 28), 'availability_zone': None, 'alive': False, 'topic': u'N/A', 'host': u'undercloud-0.redhat.local', 'agent_type': u'Open vSwitch agent', 'resource_versions': {u'Subnet': u'1.0', u'Network': u'1.0', u'SubPort': u'1.0', u'SecurityGroup': u'1.0', u'SecurityGroupRule': u'1.0', u'Trunk': u'1.1', u'QosPolicy': u'1.4', u'Port': u'1.0'}, 'created_at': datetime.datetime(2017, 3, 27, 14, 14, 45), 'started_at': datetime.datetime(2017, 3, 28, 9, 32, 58), 'id': u'25693163-c59b-4b67-8fd1-ee306ac8ed8e', 'configurations': {u'ovs_hybrid_plug': True, u'in_distributed_mode': False, u'datapath_type': u'system', u'vhostuser_socket_dir': u'/var/run/openvswitch', u'tunneling_ip': None, u'arp_responder_enabled': False, u'devices': 1, u'ovs_capabilities': {u'datapath_types': [u'netdev', u'system'], u'iface_types': [u'geneve', u'gre', u'internal', u'ipsec_gre', u'lisp', u'patch', u'stt', u'system', u'tap', u'vxlan']}, u'log_agent_heartbeats': False, u'l2_population': False, u'tunnel_types': [], u'extensions': [], u'enable_distributed_routing': False, u'bridge_mappings': {u'ctlplane': u'br-ctlplane'}}}

There are many of these "dead agent" log messages starting at 2017-03-28 05:34:49.937:

2017-03-28 05:34:49.937 19537 WARNING neutron.db.agents_db [req-1c2f1800-dae7-4511-935b-da84fbb33408 - - - - -] Agent healthcheck: found 1 dead agents out of 2:
2017-03-28 05:35:26.968 19537 WARNING neutron.db.agents_db [req-1c2f1800-dae7-4511-935b-da84fbb33408 - - - - -] Agent healthcheck: found 1 dead agents out of 2:
2017-03-28 05:36:03.996 19537 WARNING neutron.db.agents_db [req-1c2f1800-dae7-4511-935b-da84fbb33408 - - -

** neutron/openvswitch-agent.log **

It looks like the agent died here:

2017-03-28 05:15:20.123 23628 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor Interface name,ofport,external_ids --format=json]: None _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:233
2017-03-28 05:15:20.124 23628 ERROR neutron.agent.linux.async_process [-] Error received from [ovsdb-client monitor Interface name,ofport,external_ids --format=json]: None
2017-03-28 05:15:20.124 23628 ERROR neutron.agent.linux.async_process [-] Process [ovsdb-client monitor Interface name,ofport,external_ids --format=json] dies due to the error: None
2017-03-28 05:15:20.134 23628 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 545, in close
    self.uninstantiate(app_name)
  File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 528, in uninstantiate
    app = self.applications.pop(name)
KeyError: 'ofctl_service'
2017-03-28 05:15:20.197 23628 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=23708
2017-03-28 05:20:55.303 6015 INFO neutron.common.config [-] Logging enabled!
2017-03-28 05:20:55.304 6015 INFO neutron.common.config [-] /usr/bin/neutron-openvswitch-agent version 10.0.0

I'm attaching the relevant neutron logs. I want to see if someone from the Neutron team can take a look at why this agent died after the upgrade.
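The "Refusing to bind port ... to dead agent" warning comes from ml2's agent-liveness check before binding. A toy reconstruction (not neutron's actual code) of that filter, using the heartbeat and request timestamps from the logs above; the 75-second `agent_down_time` default is an assumption taken from neutron.conf defaults:

```python
# Toy model of the liveness check ml2 performs before binding a port:
# an agent whose last heartbeat is older than agent_down_time is 'dead',
# and ports on its host are refused binding. Timestamps mirror the logs.
from datetime import datetime, timedelta

AGENT_DOWN_TIME = timedelta(seconds=75)  # assumed neutron.conf default

def is_alive(heartbeat, now):
    return now - heartbeat <= AGENT_DOWN_TIME

def bindable_hosts(agents, now):
    return {a['host'] for a in agents if is_alive(a['heartbeat_timestamp'], now)}

agents = [
    {'host': 'undercloud-0.redhat.local',
     'binary': 'neutron-openvswitch-agent',
     'heartbeat_timestamp': datetime(2017, 3, 28, 9, 33, 28)},
]
now = datetime(2017, 3, 28, 10, 2, 6)  # time of the refused bind attempt
print(bindable_hosts(agents, now))  # empty set: the heartbeat is ~29 min stale
```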
This looks familiar; we've seen a similar ryu error while investigating bug 1425507. It also seems that it's not always reproducible, which could explain why I wasn't able to reproduce bug 1432028.
Thanks Marius. Yeah this comment - https://bugzilla.redhat.com/show_bug.cgi?id=1425507#c13 refers to the ryu bug that was being hit on upgrade which has the same signature as above: https://bugs.launchpad.net/neutron/+bug/1589746. It might make sense to make this a duplicate of bug 1425507.
Hi Raviv, could you confirm whether this is a duplicate of bugzilla 1425507 and, if it is, mark it accordingly? Many thanks.
This bug was discovered on the undercloud while BZ 1425507 was discovered on an overcloud node (compute). As the upgrade procedure is the same for the undercloud and the overcloud, we can safely assume it is the same bug and the fix will work in both cases. *** This bug has been marked as a duplicate of bug 1425507 ***
I'm reopening this bug: BZ 1425507 is fixed and can no longer be reproduced, but this bug still happens. Attached are new neutron logs and system info.
rpm info:

[root@undercloud-0 neutron]# rpm -qa | grep neutron
openstack-neutron-ml2-10.0.0-18.el7ost.noarch
openstack-neutron-10.0.0-18.el7ost.noarch
python-neutron-10.0.0-18.el7ost.noarch
python-neutron-lib-1.1.0-1.el7ost.noarch
openstack-neutron-openvswitch-10.0.0-18.el7ost.noarch
puppet-neutron-10.3.0-2.el7ost.noarch
python-neutronclient-6.1.0-1.el7ost.noarch
openstack-neutron-common-10.0.0-18.el7ost.noarch
Created attachment 1272998 [details] process list
Created attachment 1273002 [details] server log
Created attachment 1273004 [details] openswitch-agent
Hi,

here's what happened:

- We stop the neutron and openstack services and upgrade some packages as part of the undercloud upgrade procedure:

 22  sudo yum -y update instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
 23  sudo systemctl stop 'openstack-*'
 24  sudo systemctl stop 'neutron-*'
 25  sudo systemctl stop httpd

NOTE: Raviv, this is not the same order as documented at https://gitlab.cee.redhat.com/mandreou/OSP10-OSP11-Upgrade/#undercloud-upgrade ; I don't know if it's relevant, but it would be worth running those commands in the order documented there.

- Then we hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1425507 and after that neutron doesn't recover.

Here is all the detail from the platform. The TZ differs between logs: EDT for some, UTC for others. For reference, EDT = UTC-4.

===> yum.log: EDT => initial installation

Apr 20 04:13:34 Installed: openstack-tripleo-heat-templates-5.2.0-9.el7ost.noarch
Apr 20 04:16:29 Installed: 1:openstack-neutron-common-9.2.0-8.el7ost.noarch
Apr 20 04:16:29 Installed: 1:openstack-neutron-9.2.0-8.el7ost.noarch

===> /var/log/openvswitch/ovs-vswitchd.log-20170421.gz: UTC => everything looks good

2017-04-20T08:25:47.089Z|00020|bridge|INFO|bridge br-int: added interface br-int on port 65534
2017-04-20T08:25:47.320Z|00025|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connecting...
2017-04-20T08:25:47.327Z|00026|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connected
2017-04-20T08:26:57.535Z|00040|connmgr|INFO|br-int<->tcp:127.0.0.1:6633: 6 flow_mods 50 s ago (1 adds, 5 deletes)

===> yum.log: EDT => as part of the undercloud upgrade we upgrade some packages:

Apr 20 06:52:55 Updated: instack-undercloud-6.0.0-6.el7ost.noarch

===> history (inferred from yum.log): so we know that those commands took place at ~06:52 EDT.
 22  sudo yum -y update instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
 23  sudo systemctl stop 'openstack-*'
 24  sudo systemctl stop 'neutron-*'
 25  sudo systemctl stop httpd

===> /var/log/neutron/openvswitch-agent.log-20170421.gz: EDT : bz 1425507

2017-04-20 06:54:39.385 23058 ERROR neutron.agent.linux.async_process [-] Process [ovsdb-client monitor Interface name,ofport,external_ids --format=json] dies due to the error: None
2017-04-20 06:54:39.386 23058 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 545, in close
    self.uninstantiate(app_name)
  File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 528, in uninstantiate
    app = self.applications.pop(name)
KeyError: 'ofctl_service'
2017-04-20 06:54:39.457 23058 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=23141

===> /var/log/openvswitch/ovs-vswitchd.log-20170421.gz: UTC : it will never work again :)

2017-04-20T10:54:39.536Z|00042|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connection closed by peer
2017-04-20T10:54:40.192Z|00044|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connecting...
2017-04-20T10:54:40.192Z|00045|rconn|WARN|br-int<->tcp:127.0.0.1:6633: connection failed (Connection refused)
2017-04-20T10:54:40.192Z|00046|rconn|INFO|br-int<->tcp:127.0.0.1:6633: waiting 2 seconds before reconnect

===> yum.log: EDT : the yum upgrade command takes place as part of the undercloud upgrade; it's already too late (those are the versions where the fix for bz 1425507 is in)

Apr 20 07:56:52 Updated: 1:openstack-neutron-10.0.0-18.el7ost.noarch
Apr 20 08:00:09 Updated: openstack-tripleo-heat-templates-6.0.0-5.el7ost.noarch
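Since yum.log and the neutron logs above are in EDT while ovs-vswitchd.log is in UTC, correlating them takes a fixed-offset shift (EDT = UTC-4); a quick sketch:

```python
# Normalize an EDT timestamp to UTC so the two logs can be correlated.
# EDT = UTC-4, so adding 4 hours converts EDT to UTC.
from datetime import datetime, timedelta

EDT_TO_UTC = timedelta(hours=4)

def edt_to_utc(ts):
    return ts + EDT_TO_UTC

# The agent crash in openvswitch-agent.log (EDT)...
crash_edt = datetime(2017, 4, 20, 6, 54, 39)
# ...lines up with the first refused reconnect in ovs-vswitchd.log (UTC):
print(edt_to_utc(crash_edt))  # 2017-04-20 10:54:39
```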
Asking Jacob to have a look as he was assigned bz 1425507, and adjusting the DFG.
Hi Sofer, DFG shows as DFG:HardProv in the internal whiteboard, shouldn't it be DFG:Upgrades or DFG:Networking?
Yeah, I thought I had modified that, but I had a mid-air collision...
The issue in bug 1425507 was that the rootwrap daemon wasn't killed properly in some cases, leaving the socket bound to the port. The upgrade scriptlet of the openstack-neutron-openvswitch package should kill the orphaned rootwrap daemon. Can you please paste here which process is still bound to port 6633? Is that the orphaned 23439 ?
I'm setting needinfo also on Sofer, as the original bug was reported before bug 1425507 was fixed.

> ===> yum.log: EDT : the yum upgrade command takes place as part of the undercloud upgrade, it's already too late

The orphaned rootwrap is killed by the upgrade of the openstack-neutron-openvswitch package. I don't understand why "it's already too late".
Hi Jakub, it's already too late because the agent was restarted and failed to do so earlier, at 2017-04-20 06:54:39.385 (EDT), while the upgrade of the package with the fix happened at Apr 20 08:00:09 (EDT).
Hi,

so on the environment, after the upgrade of the undercloud we have this:

$ neutron agent-list
+--------------------------------------+--------------------+---------------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host                      | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+---------------------------+-------------------+-------+----------------+---------------------------+
| 0b5342a6-f9ea-4018-a8b7-b6d7b4c18178 | Open vSwitch agent | undercloud-0.redhat.local |                   | xxx   | True           | neutron-openvswitch-agent |
| 50011aa0-6230-4246-8e25-4c61f8e8f3db | DHCP agent         | undercloud-0.redhat.local | nova              | :-)   | True           | neutron-dhcp-agent        |
+--------------------------------------+--------------------+---------------------------+-------------------+-------+----------------+---------------------------+

If one restarts ovs-vswitchd:

sudo systemctl restart ovs-vswitchd

then the Newton overcloud deployment works:

./overcloud_deploy_compat_non_ha.sh
...
2017-04-22 16:16:53Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully
Stack overcloud CREATE_COMPLETE

and the stack is there:

$ nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 985564d7-3814-4160-a431-9ed394609f8d | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
| b18fceba-8241-49bf-9096-f6e1e25ec437 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.14 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+

Unfortunately compute-0 is accessible, but controller-0 is not (ping 192.168.24.14 doesn't work).
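As a side note, the dead agent can be spotted programmatically from the `neutron agent-list` table; a hedged sketch, assuming the 7-column ASCII layout shown above:

```python
def dead_agents(table_text):
    """Return binaries of agents whose 'alive' column is not ':-)' in the
    ASCII table printed by `neutron agent-list` (7-column layout assumed)."""
    dead = []
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.split('|')[1:-1]]
        if len(cells) != 7 or cells[0] == 'id':
            continue  # skip border lines and the header row
        alive, binary = cells[4], cells[6]
        if alive != ':-)':
            dead.append(binary)
    return dead

# Minimal reproduction of the table above (ids shortened):
table = """\
| id | agent_type | host | availability_zone | alive | admin_state_up | binary |
| 0b5342a6 | Open vSwitch agent | undercloud-0.redhat.local |  | xxx | True | neutron-openvswitch-agent |
| 50011aa0 | DHCP agent | undercloud-0.redhat.local | nova | :-) | True | neutron-dhcp-agent |
"""
print(dead_agents(table))  # ['neutron-openvswitch-agent']
```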
So we first have a problem during the undercloud upgrade, and we should get this out of the way as it is unrelated to mixed versions. It would be nice to rerun this test using the exact sequence highlighted in https://gitlab.cee.redhat.com/mandreou/OSP10-OSP11-Upgrade/#undercloud-upgrade for the undercloud upgrade, and then check the status of ovs-vswitchd with neutron agent-list just after the undercloud upgrade. Once this is sorted out, we will check further why controller-0 is created but not responding.
The same problem happens again when running the procedure from the document line by line.
Hi, just to be precise: the problem we reproduce here is that the upgrade of the undercloud hits the non-reaping of the neutron rootwrap daemon and thus has a non ovs-vswitchd daemon. So, this bz is not yet related to the mixed-version problem but to an *undercloud upgrade* problem and openvswitch.
So, confirmation of the non-reaped rootwrap:

[root@undercloud-0 ~]# netstat -pant | grep 6633 | grep LIST
tcp       51      0 127.0.0.1:6633          0.0.0.0:*               LISTEN      4870/sudo

[root@undercloud-0 ~]# ps fauxww | grep 4870
root     24358  0.0  0.0 112648   964 pts/1    S+   05:43   0:00          \_ grep --color=auto 4870
root      4870  0.0  0.0 193392  2812 ?        S    04:47   0:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

[root@undercloud-0 ~]# systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: active (running) since Mon 2017-04-24 04:47:41 EDT; 56min ago
 Main PID: 4750 (ovs-vswitchd)
   CGroup: /system.slice/ovs-vswitchd.service
           └─4750 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach

Apr 24 04:47:41 undercloud-0.redhat.local systemd[1]: Starting Open vSwitch Forwarding Unit...
Apr 24 04:47:41 undercloud-0.redhat.local ovs-ctl[4726]: Starting ovs-vswitchd [  OK  ]
Apr 24 04:47:41 undercloud-0.redhat.local ovs-ctl[4726]: Enabling remote OVSDB managers [  OK  ]
Apr 24 04:47:41 undercloud-0.redhat.local systemd[1]: Started Open vSwitch Forwarding Unit.
Apr 24 05:00:33 undercloud-0.redhat.local ovs-vswitchd[4750]: ovs|00052|rconn|ERR|br-int<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 seconds, disconnecting
Apr 24 05:00:34 undercloud-0.redhat.local ovs-vswitchd[4750]: ovs|00054|rconn|ERR|br-ctlplane<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 seconds, disconnecting

The packages used:

Apr 23 08:51:35 Installed: openvswitch-2.5.0-14.git20160727.el7fdp.x86_64
Apr 23 08:58:29 Installed: python-openvswitch-2.5.0-14.git20160727.el7fdp.noarch
Apr 23 08:58:29 Installed: 1:openstack-neutron-openvswitch-9.2.0-8.el7ost.noarch
Apr 24 04:43:06 Updated: python-openvswitch-2.6.1-10.git20161206.el7fdp.noarch
Apr 24 04:43:26 Updated: openvswitch-2.6.1-10.git20161206.el7fdp.x86_64
Apr 24 04:43:39 Updated: 1:openstack-neutron-openvswitch-10.0.1-1.el7ost.noarch
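To answer the question above about which process is bound to port 6633, the PID can be read straight from the `netstat -pant` output; a small sketch assuming that column layout (`pid_listening_on` is an illustrative helper, not an existing tool):

```python
def pid_listening_on(netstat_output, port):
    """Parse `netstat -pant` output and return the PID of the process
    LISTENing on the given port, or None. State is column 6 and
    PID/Program name is column 7 in that layout."""
    for line in netstat_output.splitlines():
        fields = line.split()
        if (len(fields) >= 7 and fields[5] == 'LISTEN'
                and fields[3].endswith(':%d' % port)):
            return int(fields[6].split('/')[0])  # e.g. '4870/sudo' -> 4870
    return None

# Sample line taken from the netstat output in this comment:
sample = "tcp       51      0 127.0.0.1:6633          0.0.0.0:*               LISTEN      4870/sudo"
print(pid_listening_on(sample, 6633))  # 4870: the sudo wrapper of the orphaned rootwrap daemon
```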
(In reply to Sofer Athlan-Guyot from comment #24)
> Hi,
> just as a precision, the problem we reproduce here is that the upgrade of
> the undercloud hits the non ripping of neutron rootwrap daemon and thus has
> a non ovs-vswitchd daemon.

What is a 'non ovs-vswitchd daemon'? How does ovs-vswitchd relate to the rootwrap daemon? ovs-vswitchd sets up the datapath based on ovsdb content; it should be running during the upgrade the whole time. The culprit must be in the scriptlet of the openstack-neutron-openvswitch package and the way it kills the orphaned neutron-rootwrap-daemon, if you see the orphaned process after the undercloud upgrade.
This bug is a duplicate of BZ 1425507. A new bug about the dead neutron agent after the undercloud upgrade will be opened, regardless of the mixed-version and overcloud networking issues. *** This bug has been marked as a duplicate of bug 1425507 ***