Bug 1294680 - os-net-config adds provisioning interface to linux bond
Status: CLOSED CURRENTRELEASE
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 7.0 (Kilo)
Hardware: x86_64 Linux
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assigned To: RHOS Maint
QA Contact: Udi Shkalim
Keywords: Reopened, TestOnly
 
Reported: 2015-12-29 10:54 EST by Dimitri Savineau
Modified: 2016-07-27 17:27 EDT
Fixed In Version: os-net-config-0.2.3-2.el7ost.noarch
Doc Type: Bug Fix
Last Closed: 2016-07-27 17:27:47 EDT
Type: Bug


Attachments:
controller.yaml (5.75 KB, text/plain), 2016-05-03 08:04 EDT, Udi Shkalim
network-environment.yaml (1.51 KB, text/plain), 2016-05-03 08:05 EDT, Udi Shkalim
network configuration applied on the controller (3.17 KB, text/plain), 2016-05-03 08:07 EDT, Udi Shkalim

Description Dimitri Savineau 2015-12-29 10:54:55 EST
Description of problem:
os-net-config adds the provisioning/ctrlplane interface to the linux bond.
The first run of os-net-config creates all the bonds, bridges and VLANs as expected. During the second run, however, os-net-config tries to add the last interface (provisioning/ctrlplane) to the linux bond, which results in a loss of connectivity.

Version-Release number of selected component (if applicable):
RHEL 7.2 / OSP-d 7.2 / OSP 7.0.3
os-net-config-0.1.4-6.el7ost.noarch

How reproducible:
Deploy an overcloud with a linux bond and a dedicated interface for provisioning.

Actual results:
* 1 linux bond (bond0) with 3 interfaces (eth{0,1,2})
* 1 provisioning interface DOWN (eth4)

Expected results:
* 1 linux bond (bond0) with 3 interfaces (eth{0,1,2})
* 1 provisioning interface UP (eth4)
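
To check the actual state against the expected state on a node, the bond's slave list can be read from sysfs. This is only a sketch: it assumes the standard Linux bonding sysfs layout (/sys/class/net/<bond>/bonding/slaves) and the interface names from this report (bond0, eth4). The function name and the SYSFS_NET override are made up for illustration and are not part of os-net-config.

```shell
#!/bin/sh
# Hypothetical helper (not part of os-net-config): report whether an
# interface is currently enslaved to a given linux bond.
# SYSFS_NET is an assumed override for testing; it defaults to the real path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

is_bond_member() {
    bond="$1"
    iface="$2"
    slaves_file="$SYSFS_NET/$bond/bonding/slaves"
    # Return 2 when the bond does not exist or cannot be read.
    [ -r "$slaves_file" ] || return 2
    # The slaves file holds a space-separated list of interface names.
    for s in $(cat "$slaves_file"); do
        [ "$s" = "$iface" ] && return 0
    done
    return 1
}
```

For example, `is_bond_member bond0 eth4 && echo "eth4 is enslaved to bond0"` should print nothing on a correctly configured node from this report.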

Additional info:

In OSP-d 7.1, the same setup with an ovs bond (balance-tcp/lacp) instead of a linux bond worked without issues.
Comment 6 Dan Sneddon 2016-01-29 10:34:49 EST
I'm not sure why os-net-config is being run twice; I thought it was only run once during deployment. It appears to be failing the second time, though, and not the first.

This problem can probably be worked around by using the real NIC names in the templates rather than the NIC abstractions (so use "eth3" instead of "nic4" for provisioning).

I'm still confused about why os-net-config was called twice, however. Was there anything out of the ordinary about this deployment?
Comment 7 Dimitri Savineau 2016-01-29 11:00:47 EST
I can try to use real NIC names, but I don't think it will change anything.

When I use an ovs bond instead of a linux bond with the same configuration, I don't get any errors.

The diff between the two configurations is:

- type: ovs_bond
+ type: linux_bond

- ovs_options: {get_param: BondInterfaceOvsOptions}
+ bonding_options: {get_param: BondInterfaceOvsOptions}

- BondInterfaceOvsOptions: "bond_mod=active-backup lacp=active other-config:lacp-fallback-ab=true"
+ BondInterfaceOvsOptions: "mode=balance-xor xmit_hash_policy=layer2+3 miimon=100"

When an ovs bond is used in a deployment, I also see multiple runs of os-net-config (more than 2):

* Controllers:
$ journalctl -u os-collect-config|grep -c "os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes"
15
* Computes:
$ journalctl -u os-collect-config|grep -c "os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes"
6
* Ceph:
$ journalctl -u os-collect-config|grep -c "os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes"
8
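
To repeat this count on several nodes, the grep above can be wrapped in a small function that reads log text from stdin. This is a sketch only; the function name is made up, and the match string is simply the command-line prefix shown in the journal output above.

```shell
#!/bin/sh
# Hypothetical helper: count os-net-config invocations in log text on stdin.
count_os_net_config_runs() {
    # -c prints the number of matching lines, -F matches the string literally.
    # grep exits non-zero when the count is 0, so "|| true" masks that status.
    grep -cF "os-net-config -c /etc/os-net-config/config.json" || true
}
```

Usage on a node would be `journalctl -u os-collect-config | count_os_net_config_runs`.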
Comment 8 Dimitri Savineau 2016-01-29 15:59:08 EST
OK, using the real names fixed the issue.

So the problem is in the interface abstractions?
Comment 9 Dan Sneddon 2016-01-29 16:31:49 EST
(In reply to Dimitri Savineau from comment #8)
> OK, using the real names fixed the issue.
> 
> So the problem is in the interface abstractions?

Yes and no. We definitely have problems with consistency using the net abstractions sometimes. In this case, it looks like os-net-config was run twice, though, which may indicate a problem. Either way, this workaround should work in this case.
Comment 10 Dan Prince 2016-01-29 16:37:32 EST
Is the machine in question using biosdevname?

In TripleO, this would mean your overcloud image was built using the stable-interface-names element.
Comment 12 Dan Sneddon 2016-02-03 09:20:31 EST
Can you please answer Dan Prince's question above?
Comment 13 Dimitri Savineau 2016-02-04 09:18:46 EST
I didn't build the images. I use the pre-built images from the CDN (7.2.0-46).

There is no reference to biosdevname or net.ifnames in the kernel boot options / grub configuration.
Comment 14 Dan Sneddon 2016-02-04 10:51:02 EST
(In reply to Dimitri Savineau from comment #13)
> I didn't build the images. I use the pre-built images from the CDN (7.2.0-46).
> 
> There is no reference to biosdevname or net.ifnames in the kernel boot
> options / grub configuration.

That's fine, you can add the options to grub in the prebuilt images.

So, for instance, if I have the following in /etc/grub2.cfg:

linux16 /boot/vmlinuz-3.10.0-327.4.5.el7.x86_64 root=UUID=d43fb280-ab24-4996-9af0-440e4601c988 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet LANG=en_US.UTF-8

I can just add the options to the boot string:

linux16 /boot/vmlinuz-3.10.0-327.4.5.el7.x86_64 root=UUID=d43fb280-ab24-4996-9af0-440e4601c988 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet LANG=en_US.UTF-8 net.ifnames=0 biosdevname=0

Here is the procedure to edit grub2.cfg with guestfish:

sudo yum install -y guestfish
guestfish
add overcloud-full.qcow2
run
mount /dev/sda /
vi /etc/grub2.cfg   # edit the linux boot string
grub2-install /dev/sda
grub-mkconfig -o /boot/grub/grub.cfg
exit
Comment 15 Dimitri Savineau 2016-02-04 13:13:25 EST
I have exactly the same issue when using the net.ifnames=0 biosdevname=0 options together with the interface abstractions.

$ virt-customize -a overcloud-full.qcow2 --run-command "sed -i -e 's/crashkernel=auto/crashkernel=auto net.ifnames=0 biosdevname=0/' /etc/default/grub /root/anaconda-ks.cfg"

# on overcloud nodes
$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-3.10.0-327.3.1.el7.x86_64 root=UUID=63014910-5a3c-4c2a-aa85-eca64c45c3e3 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto net.ifnames=0 biosdevname=0 rhgb quiet
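
To confirm on each node that both options really ended up on the kernel command line, a check like the one below could be used. It is a sketch: the function name is made up, and it only tests for the two exact tokens discussed in this bug.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given kernel command line
# contains both net.ifnames=0 and biosdevname=0 as whole words.
has_legacy_nic_naming() {
    cmdline="$1"
    # Pad with spaces so options at the start or end still match as words.
    case " $cmdline " in
        *" net.ifnames=0 "*) ;;
        *) return 1 ;;
    esac
    case " $cmdline " in
        *" biosdevname=0 "*) ;;
        *) return 1 ;;
    esac
    return 0
}
```

Usage on a node: `has_legacy_nic_naming "$(cat /proc/cmdline)" && echo "legacy NIC naming enabled"`.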
Comment 16 Dimitri Savineau 2016-02-29 11:05:12 EST
FYI, this issue is also present in the latest release (7.3).

RHEL 7.2 / OSP-d 7.3 / OSP 7.0.4
# rpm -qa os-net-config
os-net-config-0.1.4-7.el7ost.noarch
Comment 17 Mike Burns 2016-04-07 17:03:37 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 18 Dan Sneddon 2016-04-07 17:05:24 EDT
(In reply to Dimitri Savineau from comment #16)
> FYI, this issue is also present in the latest release (7.3).
> 
> RHEL 7.2 / OSP-d 7.3 / OSP 7.0.4
> # rpm -qa os-net-config
> os-net-config-0.1.4-7.el7ost.noarch

Dimitri, thanks for your work on testing this bug earlier this year. I believe the problem was fixed in the latest os-net-config, which is going to be released with OSP 8. If you get a chance to test this bug with OSP 8, please let me know whether it is fixed or still broken.
Comment 19 Dimitri Savineau 2016-04-08 06:36:19 EDT
Dan, I can confirm that the bug is not present with the GA puddle 8.0

# rpm -qa os-net-config
os-net-config-0.2.3-2.el7ost.noarch
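
To check whether a node already carries at least the build reported good here (0.2.3-2.el7ost), the installed version-release string can be compared against it. This is a rough sketch: it uses the version ordering of GNU `sort -V`, which is only an approximation of RPM's own comparison rules, and the function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: true when version string $1 is >= $2, using the
# version ordering of GNU sort (an approximation of RPM's comparison).
version_ge() {
    # The smaller of the two sorts first; $1 >= $2 exactly when that is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage on a node: `version_ge "$(rpm -q --qf '%{VERSION}-%{RELEASE}' os-net-config)" "0.2.3-2.el7ost" && echo "has the fixed build"`.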
Comment 23 Udi Shkalim 2016-05-02 11:10:13 EDT
Hey, can you share the exact YAML file used for the verification steps? I tried with the one attached, but the deployment failed.
Thanks
Comment 25 Dimitri Savineau 2016-05-02 12:02 EDT
Created attachment 1152999
Controller nic template

@ushkalim, you can find the controller NIC template in the attachment.
Comment 28 Udi Shkalim 2016-05-03 04:11:43 EDT
If that is the case, then my deployment failed using the attached template. Do you want to have a look at the environment?
Comment 29 Udi Shkalim 2016-05-03 08:04 EDT
Created attachment 1153371
controller.yaml
Comment 30 Udi Shkalim 2016-05-03 08:05 EDT
Created attachment 1153372
network-environment.yaml
Comment 31 Udi Shkalim 2016-05-03 08:07 EDT
Created attachment 1153373
network configuration applied on the controller
Comment 32 Udi Shkalim 2016-05-03 08:09:32 EDT
Verified on: os-net-config-0.2.3-2.el7ost.noarch
All files used are attached.
