Bug 1294680 - os-net-config adds provisioning interface to linux bond
Summary: os-net-config adds provisioning interface to linux bond
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: RHOS Maint
QA Contact: Udi Shkalim
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-12-29 15:54 UTC by Dimitri Savineau
Modified: 2019-10-10 10:47 UTC (History)

Fixed In Version: os-net-config-0.2.3-2.el7ost.noarch
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-27 21:27:47 UTC
Target Upstream Version:


Attachments
controller.yaml (5.75 KB, text/plain), 2016-05-03 12:04 UTC, Udi Shkalim
network-environment.yaml (1.51 KB, text/plain), 2016-05-03 12:05 UTC, Udi Shkalim
network configuration applied on the controller (3.17 KB, text/plain), 2016-05-03 12:07 UTC, Udi Shkalim

Description Dimitri Savineau 2015-12-29 15:54:55 UTC
Description of problem:
os-net-config adds the provisioning/ctrlplane interface to the linux bond.
The first run of os-net-config creates all the bonds, bridges, and VLANs as expected. During the second run, however, os-net-config tries to add the last interface (provisioning/ctrlplane) to the linux bond, resulting in a loss of connectivity.

Version-Release number of selected component (if applicable):
RHEL 7.2 / OSP-d 7.2 / OSP 7.0.3
os-net-config-0.1.4-6.el7ost.noarch

How reproducible:
Deploy an overcloud with a linux bond and a dedicated interface for provisioning.

Actual results:
* 1 linux bond (bond0) with 3 interfaces (eth{0,1,2})
* 1 provisioning interface DOWN (eth4)

Expected results:
* 1 linux bond (bond0) with 3 interfaces (eth{0,1,2})
* 1 provisioning interface UP (eth4)
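
One way to compare the actual and expected results after each os-net-config run is to list the interfaces the kernel reports as enslaved to the bond. The following is a minimal sketch that parses text in the layout of /proc/net/bonding/bond0; the function names and the sample text are illustrative assumptions and are not taken from this report.

```python
def bond_slaves(status_text):
    """Return the interface names listed as slaves in bonding status text."""
    prefix = "Slave Interface:"
    slaves = []
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            slaves.append(line[len(prefix):].strip())
    return slaves


def unexpected_slaves(status_text, expected):
    """Return enslaved interfaces that are not in the expected set."""
    return [name for name in bond_slaves(status_text) if name not in expected]


# Illustrative sample shaped like /proc/net/bonding/bond0 (not from the report).
SAMPLE = """\
Bonding Mode: load balancing (xor)
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: up

Slave Interface: eth2
MII Status: up
"""
```

On a node, the text could be read with `open("/proc/net/bonding/bond0").read()` and checked against `{"eth0", "eth1", "eth2"}`; a non-empty result from `unexpected_slaves` would indicate that an extra interface has been enslaved.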

Additional info:

In OSP-d 7.1 the same setup but with ovs bond (balance-tcp/lacp) instead of linux bond was working without issues.

Comment 6 Dan Sneddon 2016-01-29 15:34:49 UTC
I'm not sure why os-net-config is being run twice; I thought it was only run once during deployment. It appears to be failing on the second run, though, and not the first.

This problem can probably be worked around by using the real NIC names in the templates rather than the NIC abstractions (so use "eth3" instead of "nic4" for provisioning).
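
To illustrate why real NIC names can be more stable than the abstractions, here is a rough sketch of a numbering scheme in which "nicN" is resolved against the list of interfaces seen as active at the time of the run. This is an assumption made for illustration: the function names are hypothetical, this is not os-net-config's actual code, and the report does not establish that this is what happened here.

```python
import re


def natural_key(name):
    """Sort key that orders eth2 before eth10 by comparing digits as numbers."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def number_nics(active_names):
    """Map nic1, nic2, ... onto the active interface names in natural order."""
    ordered = sorted(active_names, key=natural_key)
    return {"nic%d" % (index + 1): name for index, name in enumerate(ordered)}


# Hypothetical first run: four interfaces are active, so nic4 resolves to eth3.
first_run = number_nics(["eth0", "eth1", "eth2", "eth3"])

# Hypothetical later run: eth0 is not seen as active, so every number shifts
# and a bond member declared as nic3 now resolves to eth3.
later_run = number_nics(["eth1", "eth2", "eth3"])
```

Under this assumed scheme, a template that hard-codes "eth3" resolves to the same device on every run, whereas "nic4" depends on which interfaces are counted at that moment.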

I'm still confused about why os-net-config was called twice, however. Was there anything out of the ordinary about this deployment?

Comment 7 Dimitri Savineau 2016-01-29 16:00:47 UTC
I can try to use real NIC names but I don't think it will change anything.

When I use ovs bond instead of linux bond with the same configuration I don't have any errors.

The diff between the two configurations is:

- type: ovs_bond
+ type: linux_bond

- ovs_options: {get_param: BondInterfaceOvsOptions}
+ bonding_options: {get_param: BondInterfaceOvsOptions}

- BondInterfaceOvsOptions: "bond_mod=active-backup lacp=active other-config:lacp-fallback-ab=true"
+ BondInterfaceOvsOptions: "mode=balance-xor xmit_hash_policy=layer2+3 miimon=100"
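
Because the same BondInterfaceOvsOptions parameter carries two different option vocabularies in this diff, it can help to split the string into key/value pairs before comparing them. A minimal sketch follows; the helper name is hypothetical and the parsing rule (whitespace-separated key=value tokens) is an assumption based only on the two strings shown above.

```python
def parse_bonding_options(options):
    """Split a whitespace-separated 'key=value' string into a dict."""
    result = {}
    for token in options.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % token)
        result[key] = value
    return result


# The linux_bond option string from the diff above.
LINUX_BOND_OPTIONS = "mode=balance-xor xmit_hash_policy=layer2+3 miimon=100"
parsed = parse_bonding_options(LINUX_BOND_OPTIONS)
```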

When ovs bond is used in a deployment I also have multiple runs of os-net-config (more than 2) :

* Controllers :
$ journalctl -u os-collect-config|grep -c "os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes"
15
* Computes :
$ journalctl -u os-collect-config|grep -c "os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes"
6
* Ceph :
$ journalctl -u os-collect-config|grep -c "os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes"
8
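
The same count can be taken from saved journal output when the logs have been collected off the node. This is a small sketch that mirrors `grep -c` by counting matching lines; the function name and the sample log lines are illustrative assumptions, while the marker string is the command quoted above.

```python
RUN_MARKER = ("os-net-config -c /etc/os-net-config/config.json "
              "-v --detailed-exit-codes")


def count_runs(journal_text, marker=RUN_MARKER):
    """Count the lines that contain the os-net-config invocation."""
    return sum(1 for line in journal_text.splitlines() if marker in line)


# Illustrative journal excerpt (the surrounding text is made up).
SAMPLE_JOURNAL = "\n".join([
    "os-collect-config: + " + RUN_MARKER,
    "os-collect-config: some unrelated line",
    "os-collect-config: + " + RUN_MARKER,
])
```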

Comment 8 Dimitri Savineau 2016-01-29 20:59:08 UTC
Ok using the real name fixed the issue.

So the problem is in the interfaces abstractions ?

Comment 9 Dan Sneddon 2016-01-29 21:31:49 UTC
(In reply to Dimitri Savineau from comment #8)
> Ok using the real name fixed the issue.
> 
> So the problem is in the interfaces abstractions ?

Yes and no. We definitely have problems with consistency using the net abstractions sometimes. In this case, it looks like os-net-config was run twice, though, which may indicate a problem. Either way, this workaround should work in this case.

Comment 10 Dan Prince 2016-01-29 21:37:32 UTC
Is the machine in question using biosdevname?

In TripleO this would mean your overcloud image was built using the stable-interface-names element.

Comment 12 Dan Sneddon 2016-02-03 14:20:31 UTC
Can you please answer Dan Prince's question above?

Comment 13 Dimitri Savineau 2016-02-04 14:18:46 UTC
I did not build the images. I use prebuilt images from the CDN (7.2.0-46).

There is no reference to biosdevname or net.ifnames in the kernel boot options / grub configuration.

Comment 14 Dan Sneddon 2016-02-04 15:51:02 UTC
(In reply to Dimitri Savineau from comment #13)
> I did not build the images. I use prebuilt images from the CDN (7.2.0-46).
> 
> There is no reference to biosdevname or net.ifnames in the kernel boot
> options / grub configuration.

That's fine, you can add the options to grub in the prebuilt images.

So, for instance, if I have the following in /etc/grub2.cfg:

linux16 /boot/vmlinuz-3.10.0-327.4.5.el7.x86_64 root=UUID=d43fb280-ab24-4996-9af0-440e4601c988 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet LANG=en_US.UTF-8

I can just add the options to the boot string:

linux16 /boot/vmlinuz-3.10.0-327.4.5.el7.x86_64 root=UUID=d43fb280-ab24-4996-9af0-440e4601c988 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet LANG=en_US.UTF-8 net.ifnames=0 biosdevname=0
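
The edit amounts to appending the two options to each linux16 line that does not already have them. A minimal sketch of that transformation on the text of a grub configuration is shown here; the function name is hypothetical, and it only touches lines beginning with `linux16`, as in the example above.

```python
def add_kernel_args(grub_text, args=("net.ifnames=0", "biosdevname=0")):
    """Append any missing kernel arguments to each linux16 line."""
    out = []
    for line in grub_text.splitlines():
        if line.lstrip().startswith("linux16 "):
            present = line.split()
            missing = [arg for arg in args if arg not in present]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if grub_text.endswith("\n") else "")


# Shortened, illustrative boot entry (not the full line from the comment).
BEFORE = "linux16 /boot/vmlinuz ro crashkernel=auto rhgb quiet\n"
AFTER = add_kernel_args(BEFORE)
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply more than once.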

Here is the procedure to edit grub2.cfg in guestfish:

sudo yum install -y guestfish
guestfish
add overcloud-full.qcow2
run
mount /dev/sda /
vi /etc/grub2.cfg   # edit the linux boot string
grub2-install /dev/sda
grub-mkconfig -o /boot/grub/grub.cfg
exit

Comment 15 Dimitri Savineau 2016-02-04 18:13:25 UTC
I have exactly the same issue using net.ifnames=0 biosdevname=0 options and interface abstraction.

$ virt-customize -a overcloud-full.qcow2 --run-command "sed -i -e 's/crashkernel=auto/crashkernel=auto net.ifnames=0 biosdevname=0/' /etc/default/grub /root/anaconda-ks.cfg"

// on overcloud nodes
$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-3.10.0-327.3.1.el7.x86_64 root=UUID=63014910-5a3c-4c2a-aa85-eca64c45c3e3 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto net.ifnames=0 biosdevname=0 rhgb quiet
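
Whether a node actually booted with both options can be checked programmatically from the kernel command line. The sketch below is a simple token check; the function name is hypothetical, and the sample string is the /proc/cmdline output quoted above.

```python
def legacy_names_requested(cmdline):
    """Return True if both naming options are present on the command line."""
    tokens = cmdline.split()
    return "net.ifnames=0" in tokens and "biosdevname=0" in tokens


# The /proc/cmdline content reported in this comment.
REPORTED_CMDLINE = (
    "BOOT_IMAGE=/boot/vmlinuz-3.10.0-327.3.1.el7.x86_64 "
    "root=UUID=63014910-5a3c-4c2a-aa85-eca64c45c3e3 ro console=tty0 "
    "console=ttyS0,115200n8 crashkernel=auto net.ifnames=0 biosdevname=0 "
    "rhgb quiet"
)
```

On a node, the input would come from `open("/proc/cmdline").read()`.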

Comment 16 Dimitri Savineau 2016-02-29 16:05:12 UTC
FYI this issue is also present in the last release (7.3)

RHEL 7.2 / OSP-d 7.3 / OSP 7.0.4
# rpm -qa os-net-config
os-net-config-0.1.4-7.el7ost.noarch

Comment 17 Mike Burns 2016-04-07 21:03:37 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 18 Dan Sneddon 2016-04-07 21:05:24 UTC
(In reply to Dimitri Savineau from comment #16)
> FYI this issue is also present in the last release (7.3)
> 
> RHEL 7.2 / OSP-d 7.3 / OSP 7.0.4
> # rpm -qa os-net-config
> os-net-config-0.1.4-7.el7ost.noarch

Dimitri, thanks for your work on testing this bug earlier this year. I believe the problem was fixed in the latest os-net-config, which is going to be released with OSP 8. If you get a chance to test this bug with OSP 8, please let me know whether it is fixed or still broken.

Comment 19 Dimitri Savineau 2016-04-08 10:36:19 UTC
Dan, I can confirm that the bug is not present with the GA puddle 8.0

# rpm -qa os-net-config
os-net-config-0.2.3-2.el7ost.noarch
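
To check a fleet of nodes against the "Fixed In Version" listed for this bug, the `rpm -qa` output can be compared with os-net-config-0.2.3-2.el7ost.noarch. This is a simplified sketch: it compares only the numeric version and the leading release number, which is an assumption that holds for the package strings in this report but is not rpm's full version comparison algorithm. The function names are hypothetical.

```python
import re

FIXED_IN = "os-net-config-0.2.3-2.el7ost.noarch"


def package_version(nvr, name="os-net-config"):
    """Extract ((major, minor, ...), release) from a name-version-release string."""
    match = re.match(re.escape(name) + r"-(\d+(?:\.\d+)*)-(\d+)", nvr)
    if match is None:
        raise ValueError("unrecognized package string: %r" % nvr)
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


def has_fix(nvr, fixed=FIXED_IN):
    """Return True if nvr is at least the fixed version (simplified compare)."""
    return package_version(nvr) >= package_version(fixed)
```

With the versions mentioned in this report, `has_fix` is false for the 0.1.4-6 and 0.1.4-7 builds and true for 0.2.3-2.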

Comment 23 Udi Shkalim 2016-05-02 15:10:13 UTC
Hey, can you share the exact YAML file used for the verification steps? I tried with the one attached, but the deployment failed.
thanks

Comment 25 Dimitri Savineau 2016-05-02 16:02:41 UTC
Created attachment 1152999
Controller nic template

@ushkalim you can find the controller NIC template in the attachment.

Comment 28 Udi Shkalim 2016-05-03 08:11:43 UTC
If that is the case, then my deployment failed using the attached template. Do you want to have a look at the environment?

Comment 29 Udi Shkalim 2016-05-03 12:04:13 UTC
Created attachment 1153371
controller.yaml

Comment 30 Udi Shkalim 2016-05-03 12:05:20 UTC
Created attachment 1153372
network-environment.yaml

Comment 31 Udi Shkalim 2016-05-03 12:07:55 UTC
Created attachment 1153373
network configuration applied on the controller

Comment 32 Udi Shkalim 2016-05-03 12:09:32 UTC
Verified on: os-net-config-0.2.3-2.el7ost.noarch
All files used are attached.

