Bug 1467496 - zero byte ifcfg files after overcloud deployment in Ravello
Summary: zero byte ifcfg files after overcloud deployment in Ravello
Keywords:
Status: CLOSED DUPLICATE of bug 1450223
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Assaf Muller
QA Contact: Toni Freger
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-07-04 04:23 UTC by Michael Jarrett
Modified: 2017-07-20 13:14 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-20 13:14:03 UTC
Target Upstream Version:
Embargoed:


Attachments:
openvswitch_agent.log with ryu exception (24.84 KB, text/plain)
2017-07-19 22:36 UTC, Bob Fournier

Description Michael Jarrett 2017-07-04 04:23:35 UTC
Description of problem:


After deploying the overcloud in Ravello, the ifcfg files are zero bytes long. This issue is not present with the ILT implementation. When this happens on compute1, removing ifcfg-eth0 and rebooting causes compute1 to receive the proper ifcfg-eth0 file.


How reproducible:


Steps to Reproduce:
1. Deploy the overcloud 
 [stack@director ~]$ openstack overcloud deploy \
--templates ~/templates --compute-scale 2 \
--environment-directory ~/templates/cl210-environment
2. Attempt to SSH into compute1 
3. Console into compute1 and cat the ifcfg-eth0 file.

Actual results:
Cannot SSH into compute1. After accessing the VM using the console, the ifcfg-eth0 file is zero byte.

Expected results:
Access compute1 using SSH. The ifcfg-eth0 file is populated correctly.
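
The check in step 3 can also be scripted from the console. The following is a minimal sketch, not part of the original report: the function name find_empty_ifcfg is hypothetical, and the default directory is an assumption based on the usual RHEL 7 location for ifcfg files.

```python
from pathlib import Path


def find_empty_ifcfg(directory="/etc/sysconfig/network-scripts"):
    """Return the ifcfg-* files under `directory` whose size is zero bytes.

    The default directory is an assumption (the usual RHEL 7 location);
    pass a different path if the files live elsewhere.
    """
    root = Path(directory)
    return sorted(
        path
        for path in root.glob("ifcfg-*")
        if path.is_file() and path.stat().st_size == 0
    )
```

On an affected node, find_empty_ifcfg() would be expected to list ifcfg-eth0; an empty list means every ifcfg file has content.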

Additional info:
cloud-init.log before manual reboot:
Jul  3 23:34:57 localhost cloud-init: Cloud-init v. 0.7.6 running 'init-local' at Tue, 04 Jul 2017 03:34:57 +0000. Up 10.69 seconds.
Jul  3 23:35:06 localhost cloud-init: Cloud-init v. 0.7.6 running 'init' at Tue, 04 Jul 2017 03:35:06 +0000. Up 19.94 seconds.
Jul  3 23:35:06 localhost cloud-init: 2017-07-03 23:35:06,643 - util.py[WARNING]: Route info failed: Unexpected error while running command.
Jul  3 23:35:06 localhost cloud-init: Command: ['netstat', '-rn']
Jul  3 23:35:06 localhost cloud-init: Exit code: 1
Jul  3 23:35:06 localhost cloud-init: Reason: -
Jul  3 23:35:06 localhost cloud-init: Stdout: 'Kernel IP routing table\nDestination     Gateway         Genmask         Flags   MSS Window  irtt Iface\n'
Jul  3 23:35:06 localhost cloud-init: Stderr: ''
Jul  3 23:35:06 localhost cloud-init: ci-info: +++++++++++++++++++++++Net device info+++++++++++++++++++++++
Jul  3 23:35:06 localhost cloud-init: ci-info: +--------+------+-----------+-----------+-------------------+
Jul  3 23:35:06 localhost cloud-init: ci-info: | Device |  Up  |  Address  |    Mask   |     Hw-Address    |
Jul  3 23:35:06 localhost cloud-init: ci-info: +--------+------+-----------+-----------+-------------------+
Jul  3 23:35:06 localhost cloud-init: ci-info: |  lo:   | True | 127.0.0.1 | 255.0.0.0 |         .         |
Jul  3 23:35:06 localhost cloud-init: ci-info: | eth1:  | True |     .     |     .     | 52:54:00:01:00:0c |
Jul  3 23:35:06 localhost cloud-init: ci-info: | eth2:  | True |     .     |     .     | 52:54:00:02:fa:0c |
Jul  3 23:35:06 localhost cloud-init: ci-info: | eth0:  | True |     .     |     .     | 52:54:00:00:f9:0c |
Jul  3 23:35:06 localhost cloud-init: ci-info: +--------+------+-----------+-----------+-------------------+
Jul  3 23:35:06 localhost cloud-init: ci-info: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Route info failed!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Jul  3 23:35:13 localhost cloud-init: Cloud-init v. 0.7.6 running 'modules:config' at Tue, 04 Jul 2017 03:35:13 +0000. Up 26.77 seconds.
Jul  3 23:35:15 localhost cloud-init: Cloud-init v. 0.7.6 running 'modules:final' at Tue, 04 Jul 2017 03:35:15 +0000. Up 28.98 seconds.
Jul  3 23:35:15 localhost cloud-init: Cloud-init v. 0.7.6 finished at Tue, 04 Jul 2017 03:35:15 +0000. Datasource DataSourceConfigDriveNet [net,ver=2][source=/dev/vda1].  Up 29.41 seconds
[heat-admin@overcloud-compute-1 ~]$ 

cloud-init.log after manual reboot:
Jul  3 23:40:24 overcloud-compute-1 cloud-init: Cloud-init v. 0.7.6 running 'init-local' at Tue, 04 Jul 2017 03:40:24 +0000. Up 10.17 seconds.
Jul  3 23:40:39 overcloud-compute-1 cloud-init: Cloud-init v. 0.7.6 running 'init' at Tue, 04 Jul 2017 03:40:39 +0000. Up 24.88 seconds.
Jul  3 23:40:39 overcloud-compute-1 cloud-init: ci-info: +++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++
Jul  3 23:40:39 overcloud-compute-1 cloud-init: ci-info: +--------+------+---------------+---------------+-------------------+
Jul  3 23:40:39 overcloud-compute-1 cloud-init: ci-info: | Device |  Up  |    Address    |      Mask     |     Hw-Address    |
Jul  3 23:40:39 overcloud-compute-1 cloud-init: ci-info: +--------+------+---------------+---------------+-------------------+
Jul  3 23:40:39 overcloud-compute-1 cloud-init: ci-info: |  lo:   | True |   127.0.0.1   |   255.0.0.0   |         .         |
Jul  3 23:40:39 overcloud-compute-1 cloud-init: ci-info: | eth1:  | True |       .       |       .       | 52:54:00:01:00:0c |
Jul  3 23:40:39 overcloud-compute-1 cloud-init: ci-info: | eth2:  | True |       .       |       .       | 52:54:00:02:fa:0c |
Jul  3 23:40:39 overcloud-compute-1 cloud-init: ci-info: | eth0:  | True | 172.25.249.52 | 255.255.255.0 | 52:54:00:00:f9:0c |
Jul  3 23:40:39 overcloud-compute-1 cloud-init: ci-info: +--------+------+---------------+---------------+-------------------+
Jul  3 23:40:39 overcloud-compute-1 cloud-init: ci-info: ++++++++++++++++++++++++++++++++++++Route info++++++++++++++++++++++++++++++++++++
Jul  3 23:40:40 overcloud-compute-1 cloud-init: ci-info: +-------+-----------------+----------------+-----------------+-----------+-------+
Jul  3 23:40:40 overcloud-compute-1 cloud-init: ci-info: | Route |   Destination   |    Gateway     |     Genmask     | Interface | Flags |
Jul  3 23:40:40 overcloud-compute-1 cloud-init: ci-info: +-------+-----------------+----------------+-----------------+-----------+-------+
Jul  3 23:40:40 overcloud-compute-1 cloud-init: ci-info: |   0   |     0.0.0.0     | 172.25.249.200 |     0.0.0.0     |    eth0   |   UG  |
Jul  3 23:40:40 overcloud-compute-1 cloud-init: ci-info: |   1   | 169.254.169.254 | 172.25.249.200 | 255.255.255.255 |    eth0   |  UGH  |
Jul  3 23:40:40 overcloud-compute-1 cloud-init: ci-info: |   2   | 169.254.169.254 | 172.25.249.51  | 255.255.255.255 |    eth0   |  UGH  |
Jul  3 23:40:40 overcloud-compute-1 cloud-init: ci-info: |   3   |   172.25.249.0  |    0.0.0.0     |  255.255.255.0  |    eth0   |   U   |
Jul  3 23:40:40 overcloud-compute-1 cloud-init: ci-info: +-------+-----------------+----------------+-----------------+-----------+-------+
Jul  3 23:40:43 overcloud-compute-1 cloud-init: Cloud-init v. 0.7.6 running 'modules:config' at Tue, 04 Jul 2017 03:40:43 +0000. Up 29.07 seconds.
Jul  3 23:40:45 overcloud-compute-1 cloud-init: Cloud-init v. 0.7.6 running 'modules:final' at Tue, 04 Jul 2017 03:40:44 +0000. Up 30.23 seconds.
Jul  3 23:40:45 overcloud-compute-1 cloud-init: Cloud-init v. 0.7.6 finished at Tue, 04 Jul 2017 03:40:45 +0000. Datasource DataSourceConfigDriveNet [net,ver=2][source=/dev/vda1].  Up 30.41 seconds
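
The visible difference between the two logs is the Address column for eth0 in the "Net device info" table. As a rough sketch of how that comparison could be automated, the snippet below lists the non-loopback devices that cloud-init reports without an address. The names ROW and devices_without_address are hypothetical, and the sketch assumes each log entry sits on a single, unwrapped line.

```python
import re

# One row of cloud-init's "Net device info" table, for example:
# "ci-info: | eth0:  | True | 172.25.249.52 | 255.255.255.0 | 52:54:00:00:f9:0c |"
# Assumption: the row is on one line (not wrapped by a terminal or pager).
ROW = re.compile(
    r"ci-info: \|\s*(?P<device>\S+):\s*\|\s*(?P<up>True|False)\s*"
    r"\|\s*(?P<address>\S+)\s*\|\s*(?P<mask>\S+)\s*\|\s*(?P<mac>\S+)\s*\|"
)


def devices_without_address(log_lines):
    """Return names of non-loopback devices whose Address column is '.'."""
    missing = []
    for line in log_lines:
        match = ROW.search(line)
        if match and match["device"] != "lo" and match["address"] == ".":
            missing.append(match["device"])
    return missing
```

Applied to the two logs above, eth0 should appear in the result only for the log taken before the manual reboot.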

Comment 2 Red Hat Bugzilla Rules Engine 2017-07-07 18:11:58 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 3 Bob Fournier 2017-07-18 11:11:48 UTC
Michael or Forrest - can you provide a sosreport when this occurs and/or set me up in an environment where I can duplicate this?  SSO ID is bfournie.  Thanks.

Comment 5 Bob Fournier 2017-07-19 22:35:23 UTC
I was able to reproduce this bug doing the training, specifically in Chapter 6 - Scaling Overcloud Nodes. The steps described in the video cause the failure as described on compute1:
- ifcfg files are empty
- "ip a" shows no address for eth0 or any other interfaces

It appears that the deployment to add an overcloud node has not completed successfully.

Looking at the logs on the undercloud, this appears to be a Neutron bug that has been fixed.
I see this in /var/log/neutron/openvswitch-agent.log:
2017-07-01 08:07:47.039 24375 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 545, in close
    self.uninstantiate(app_name)
  File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 528, in uninstantiate
    app = self.applications.pop(name)
KeyError: 'ofctl_service'

2017-07-01 08:07:47.203 24375 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=24451
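
For anyone checking another undercloud for the same signature, here is a minimal sketch that scans agent log lines for a ryu.lib.hub "uncaught exception" entry whose traceback ends in KeyError: 'ofctl_service'. The names TIMESTAMPED and has_ofctl_keyerror are hypothetical, and the timestamp pattern is an assumption taken from the log excerpt above.

```python
import re

# Log entries in the excerpt start with "YYYY-MM-DD HH:MM:SS"; traceback
# continuation lines do not. This format is assumed from the excerpt.
TIMESTAMPED = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def has_ofctl_keyerror(log_lines):
    """Return True if a ryu hub traceback ends in KeyError: 'ofctl_service'."""
    in_ryu_trace = False
    for line in log_lines:
        if TIMESTAMPED.match(line):
            # A new log entry starts; track whether it opens a ryu hub traceback.
            in_ryu_trace = "ryu.lib.hub" in line and "uncaught exception" in line
        elif in_ryu_trace and line.strip() == "KeyError: 'ofctl_service'":
            return True
    return False
```

A call such as has_ofctl_keyerror(open("/var/log/neutron/openvswitch-agent.log")) would work, since a file object iterates over its lines.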

The bug is https://bugzilla.redhat.com/show_bug.cgi?id=1425507

See https://bugzilla.redhat.com/show_bug.cgi?id=1425507#c15
The stack trace in that comment matches this trace exactly, and the comment describes the issue seen here:
"Moreover this issue appears to be affecting not only overcloud upgrades but undercloud upgrade as well, preventing operations such as adding overcloud nodes."

Packages installed on the undercloud:
[stack@director log]$ rpm -aq | grep openvswitch
openstack-neutron-openvswitch-9.2.0-2.el7ost.noarch
openvswitch-2.5.0-14.git20160727.el7fdp.x86_64
python-openvswitch-2.5.0-14.git20160727.el7fdp.noarch
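
To compare these builds against a fixed release, it helps to split each line into its fields. The snippet below is only a sketch of that step: Package and parse_nvra are hypothetical names, it assumes the name-version-release.arch layout shown in the rpm output above, and it does not encode which version contains the fix, since that is not stated here.

```python
from typing import NamedTuple


class Package(NamedTuple):
    name: str
    version: str
    release: str
    arch: str


def parse_nvra(line):
    """Split 'name-version-release.arch' (as printed by rpm -qa) into fields."""
    # The architecture follows the last dot; version and release are the
    # last two dash-separated fields, so dashes inside the name are kept.
    nvr, arch = line.strip().rsplit(".", 1)
    name, version, release = nvr.rsplit("-", 2)
    return Package(name, version, release, arch)
```

For example, parse_nvra("openstack-neutron-openvswitch-9.2.0-2.el7ost.noarch").version gives "9.2.0".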

I'm attaching the openvswitch_agent.log file that shows the error.

Comment 6 Bob Fournier 2017-07-19 22:36:48 UTC
Created attachment 1301448 [details]
openvswitch_agent.log with ryu exception

Comment 7 Bob Fournier 2017-07-19 22:44:28 UTC
I recommend upgrading the training system to latest OSP-11 build to pick up this bug fix.  If that's not possible for this training setup there may be workarounds available.  I'm going to add some people who were involved in https://bugzilla.redhat.com/show_bug.cgi?id=1425507 in case they can recommend a workaround.

I will leave this bug open for a short while before closing it as a duplicate.

Comment 8 Robert Locke 2017-07-19 23:24:29 UTC
So, upgrading to OSP 11 is not an option for our training environment. We have made a commitment to the "extended release" versions so need to stay on OSP 10 (course will likely be revised for OSP 13).

The current classroom environment is running OSP 10.0.2. Has this been resolved in 10.0.4 or a subsequent maintenance release of 10.0.z?

Comment 9 Bob Fournier 2017-07-19 23:56:47 UTC
>So, upgrading to OSP 11 is not an option for our training environment. We have made a commitment to the "extended release" versions so need to stay on OSP 10 (course will likely be revised for OSP 13).

I understand.

>The current classroom environment is running OSP 10.0.2. Has this been resolved in 10.0.4 or a subsequent maintenance release of 10.0.z?

Marius or Jakub - can you indicate if there is a patch in OSP 10 for this openvswitch ryu issue? Thanks.

Comment 10 Jakub Libosvar 2017-07-20 11:19:13 UTC
(In reply to Bob Fournier from comment #9)
> >So, upgrading to OSP 11 is not an option for our training environment. We have made a commitment to the "extended release" versions so need to stay on OSP 10 (course will likely be revised for OSP 13).
> 
> I understand.
> 
> >The current classroom environment is running OSP 10.0.2. Has this been resolved in 10.0.4 or a subsequent maintenance release of 10.0.z?
> 
> Marius or Jakub - can you indicate if there is a patch in OSP 10 for this
> openvswitch ryu issue? Thanks.

There is a solved bug for OSP10 overcloud [1], RDO patch backported to OSP10 is here [2]. For undercloud, reboot of the node is required as per [3].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1450223
[2] https://review.rdoproject.org/r/#/c/6648
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1444883#c14

Comment 11 Bob Fournier 2017-07-20 13:14:03 UTC
>There is a solved bug for OSP10 overcloud [1], RDO patch backported to OSP10 is here 
>[2]. For undercloud, reboot of the node is required as per [3].

>[1] https://bugzilla.redhat.com/show_bug.cgi?id=1450223
>[2] https://review.rdoproject.org/r/#/c/6648
>[3] https://bugzilla.redhat.com/show_bug.cgi?id=1444883#c14

Thanks a lot Jakub.

I'm closing this as a duplicate. Note that according to [1], the fix is in 10.0.z3.
Please pick up that release and reboot the undercloud.

*** This bug has been marked as a duplicate of bug 1450223 ***

