Description of problem:
os-net-config adds the provisioning/ctrlplane interface to the Linux bond. The first run of os-net-config creates all the bonds, bridges, and VLANs as expected. During the second run, however, os-net-config tries to add the last interface (provisioning/ctrlplane) to the Linux bond, resulting in a loss of connectivity.

Version-Release number of selected component (if applicable):
RHEL 7.2 / OSP-d 7.2 / OSP 7.0.3
os-net-config-0.1.4-6.el7ost.noarch

How reproducible:
Deploy an overcloud with a Linux bond and a dedicated interface for provisioning.

Actual results:
* 1 Linux bond (bond0) with 3 interfaces (eth{0,1,2})
* 1 provisioning interface DOWN (eth4)

Expected results:
* 1 Linux bond (bond0) with 3 interfaces (eth{0,1,2})
* 1 provisioning interface UP (eth4)

Additional info:
In OSP-d 7.1, the same setup with an OVS bond (balance-tcp/lacp) instead of a Linux bond worked without issues.
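To confirm the symptom on an affected node, one can check whether the provisioning interface ended up as a member of the bond. The following is a minimal sketch, assuming the standard Linux bonding sysfs file /sys/class/net/<bond>/bonding/slaves, which lists member interfaces separated by whitespace. The helper names are hypothetical, and the defaults bond0 and eth4 are simply the names used in this report.

```python
from pathlib import Path


def parse_bond_slaves(text):
    """Return the member interface names from the content of a bonding 'slaves' file."""
    return text.split()


def is_enslaved(iface, bond="bond0", sysfs_root="/sys/class/net"):
    """Return True if iface is listed as a member of bond (hypothetical helper)."""
    slaves_file = Path(sysfs_root) / bond / "bonding" / "slaves"
    return iface in parse_bond_slaves(slaves_file.read_text())
```

On a healthy node, is_enslaved("eth4") would be expected to return False; on a node showing this bug it would presumably return True after the second os-net-config run.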
I'm not sure why os-net-config is being run twice; I thought it was only run once during deployment. It appears to be failing the second time, though, and not the first. This problem can probably be worked around by using the real NIC names in the templates rather than the NIC abstractions (so use "eth3" instead of "nic4" for provisioning). I'm still confused about why os-net-config was called twice. Was there anything out of the ordinary about this deployment?
I can try to use real NIC names but I don't think it will change anything. When I use an OVS bond instead of a Linux bond with the same configuration, I don't get any errors.

The diff between the two configurations is:

- type: ovs_bond
+ type: linux_bond

- ovs_options: {get_param: BondInterfaceOvsOptions}
+ bonding_options: {get_param: BondInterfaceOvsOptions}

- BondInterfaceOvsOptions: "bond_mod=active-backup lacp=active other-config:lacp-fallback-ab=true"
+ BondInterfaceOvsOptions: "mode=balance-xor xmit_hash_policy=layer2+3 miimon=100"

When an OVS bond is used in a deployment, I also see multiple runs of os-net-config (more than 2):

* Controllers:
$ journalctl -u os-collect-config|grep -c "os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes"
15

* Computes:
$ journalctl -u os-collect-config|grep -c "os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes"
6

* Ceph:
$ journalctl -u os-collect-config|grep -c "os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes"
8
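The run counts above can also be collected from a script, which is handy when comparing several nodes. This is a rough sketch of the same journalctl/grep -c check; the function names are hypothetical, and the marker string is the command line quoted in this comment.

```python
import subprocess

# Command line searched for in the journal, as quoted in this comment.
MARKER = "os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes"


def count_runs(journal_lines, marker=MARKER):
    """Count the lines that contain the os-net-config invocation, like grep -c."""
    return sum(1 for line in journal_lines if marker in line)


def count_runs_from_journal(unit="os-collect-config"):
    """Read the unit's journal via journalctl and count the invocations."""
    out = subprocess.run(
        ["journalctl", "-u", unit, "--no-pager"],
        check=True, capture_output=True, text=True,
    ).stdout
    return count_runs(out.splitlines())
```

count_runs works on any iterable of log lines, so it can also be applied to a saved journal export rather than a live node.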
OK, using the real name fixed the issue. So the problem is in the interface abstractions?
(In reply to Dimitri Savineau from comment #8)
> Ok using the real name fixed the issue.
>
> So the problem is in the interfaces abstractions ?

Yes and no. We definitely have problems with consistency using the net abstractions sometimes. In this case, it looks like os-net-config was run twice, though, which may indicate a problem. Either way, this workaround should work in this case.
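The kind of inconsistency meant here can be illustrated with a toy model. This is NOT os-net-config's actual mapping code and is not a confirmed root cause for this bug. It assumes, purely for illustration, that "nicN" resolves to the N-th entry of a sorted list of candidate interfaces and that this list can differ between two runs. Under that assumption an abstract name can move to another device, while a real device name stays fixed.

```python
def resolve(name, candidates):
    """Resolve 'nicN' to the N-th candidate (1-based); return real names unchanged.

    Hypothetical model for illustration only.
    """
    if name.startswith("nic") and name[3:].isdigit():
        index = int(name[3:]) - 1
        ordered = sorted(candidates)
        return ordered[index] if index < len(ordered) else None
    return name


first_run = ["eth0", "eth1", "eth2", "eth3"]
# Assumed second-run candidate list: one interface is no longer reported.
second_run = ["eth0", "eth1", "eth3"]

print(resolve("nic3", first_run))   # eth2
print(resolve("nic3", second_run))  # eth3
print(resolve("eth3", second_run))  # eth3
```

In this model the template entry written as a real name resolves identically on both runs, which is consistent with the workaround suggested above.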
Is the machine in question using biosdevname? In TripleO this would mean your overcloud image was built using the stable-interface-names element.
Can you please answer Dan Prince's question above?
I did not build the images. I use the prebuilt images from the CDN (7.2.0-46). There is no reference to biosdevname or net.ifnames in the kernel boot options / grub configuration.
(In reply to Dimitri Savineau from comment #13)
> I don't have built the images. I use pre-build images from CDN (7.2.0-46).
>
> There is no reference to biosdevname or net.ifnames in the kernel boot
> options / grub configuration.

That's fine, you can add the options to grub in the prebuilt images. So, for instance, if I have the following in /etc/grub2.cfg:

linux16 /boot/vmlinuz-3.10.0-327.4.5.el7.x86_64 root=UUID=d43fb280-ab24-4996-9af0-440e4601c988 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet LANG=en_US.UTF-8

I can just add the options to the boot string:

linux16 /boot/vmlinuz-3.10.0-327.4.5.el7.x86_64 root=UUID=d43fb280-ab24-4996-9af0-440e4601c988 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet LANG=en_US.UTF-8 net.ifnames=0 biosdevname=0

Here is the procedure to edit grub2.cfg in guestfish:

sudo yum install -y guestfish
guestfish
add overcloud-full.qcow2
run
mount /dev/sda /
vi /etc/grub2.cfg # edit the linux boot string
grub2-install /dev/sda
grub-mkconfig -o /boot/grub/grub.cfg
exit
I have exactly the same issue when using the net.ifnames=0 biosdevname=0 options together with the interface abstraction.

$ virt-customize -a overcloud-full.qcow2 --run-command "sed -i -e 's/crashkernel=auto/crashkernel=auto net.ifnames=0 biosdevname=0/' /etc/default/grub /root/anaconda-ks.cfg"

// on overcloud nodes
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-327.3.1.el7.x86_64 root=UUID=63014910-5a3c-4c2a-aa85-eca64c45c3e3 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto net.ifnames=0 biosdevname=0 rhgb quiet
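When repeating this test on several overcloud nodes, the /proc/cmdline check can be automated. A minimal sketch, with hypothetical function names, that reports which of the two options are missing from a kernel command line:

```python
def missing_kernel_args(cmdline, required=("net.ifnames=0", "biosdevname=0")):
    """Return the required arguments that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [arg for arg in required if arg not in present]


def check_running_kernel(path="/proc/cmdline"):
    """Check the command line of the running kernel (reads /proc/cmdline by default)."""
    with open(path) as handle:
        return missing_kernel_args(handle.read())
```

An empty list from check_running_kernel() means both options are active on that node, as in the output shown above.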
FYI, this issue is also present in the latest release (7.3):

RHEL 7.2 / OSP-d 7.3 / OSP 7.0.4
# rpm -qa os-net-config
os-net-config-0.1.4-7.el7ost.noarch
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
(In reply to Dimitri Savineau from comment #16)
> FYI this issue is also present in the last release (7.3)
>
> RHEL 7.2 / OSP-d 7.3 / OSP 7.0.4
> # rpm -qa os-net-config
> os-net-config-0.1.4-7.el7ost.noarch

Dimitri, thanks for your work on testing this bug earlier this year. I believe the problem was fixed with the latest os-net-config that is going to be released with OSP 8. If you get a chance to test this bug with OSP 8, please let me know whether it is fixed or still broken.
Dan, I can confirm that the bug is not present with the GA puddle 8.0:

# rpm -qa os-net-config
os-net-config-0.2.3-2.el7ost.noarch
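For anyone checking many nodes, the rpm -qa output can be compared against the build reported good here. This is only a sketch: the function name and the KNOWN_GOOD constant are hypothetical, and the thread establishes just three data points (0.1.4-6 and 0.1.4-7 affected, 0.2.3-2 not affected), so versions in between are untested.

```python
import re

# Version reported as not affected in this comment (os-net-config-0.2.3-2).
KNOWN_GOOD = (0, 2, 3)


def package_version(nvra, name="os-net-config"):
    """Extract the version tuple from a 'name-version-release.arch' string."""
    pattern = re.escape(name) + r"-(\d+(?:\.\d+)*)-(.+)"
    match = re.fullmatch(pattern, nvra.strip())
    if match is None:
        raise ValueError("unexpected package string: %r" % nvra)
    return tuple(int(part) for part in match.group(1).split("."))


def at_least_known_good(nvra):
    """Return True if the package version is at or above the version reported good."""
    return package_version(nvra) >= KNOWN_GOOD
```

For example, at_least_known_good("os-net-config-0.1.4-7.el7ost.noarch") is False, matching the earlier report that 7.3 is still affected.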
Hey, can you share the exact YAML file used for the verification steps? I tried with the one attached, but the deployment failed. Thanks.
Created attachment 1152999 [details]
Controller nic template

@ushkalim you can find the controller nic template in the attachment.
If that is the case, then my deployment failed using the attached template. Do you want to have a look at the environment?
Created attachment 1153371 [details] controller.yaml
Created attachment 1153372 [details] network-environment.yaml
Created attachment 1153373 [details] network configuration applied on the controller
Verified on: os-net-config-0.2.3-2.el7ost.noarch

All files used are attached.