Description of problem:
RHVH can't obtain an IP over a bond+vlan network after an Anaconda interactive installation.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Anaconda interactive install of RHVH via ISO (with the default ks).
2. Enter the network page.
3. Add a bond network -> save.
4. Enter the bond connection editing page -> bond tab, click the add button.
5. Choose a connection type -> VLAN -> select a vlan NIC -> set vlan name and id 20.
6. Click the add button again, choose a connection type -> VLAN -> select another vlan NIC -> set vlan name and id 20.
7. Set bond mode -> active-backup.
8. Save.
9. Bond+vlan can obtain a vlan IP.
10. Continue the installation.
11. Reboot and log in to RHVH.
12. ip addr

Actual results:
1. After step 9, the bond+vlan network can obtain a vlan IP.
2. After step 12, RHVH can't obtain an IP after reboot.

Expected results:
RHVH can obtain an IP over the bond+vlan network after reboot.

Additional info:
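The check in step 12 amounts to looking for an IPv4 address on the vlan interface in the `ip addr` output. A minimal sketch of that check, run here against a captured sample snippet rather than a live host (the interface name bond0.20 and the address are illustrative assumptions, not taken from the attached logs):

```shell
# Hypothetical snippet of `ip addr` output; on the real host you would pipe
# `ip -4 addr show` instead. Interface name and address are made up.
sample='3: bond0.20@bond0: <BROADCAST,MULTICAST,UP> mtu 1500
    inet 192.168.1.10/24 brd 192.168.1.255 scope global dynamic bond0.20'

# Pass if the vlan interface carries an inet (IPv4) address.
if printf '%s\n' "$sample" | grep -q 'inet .*bond0\.20'; then
    echo "vlan interface has an IPv4 address"
else
    echo "no IPv4 address on vlan interface"
fi
```

On the failing host the same grep against live `ip -4 addr show` output finds no match after the reboot, which is the reported symptom.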
Created attachment 1179875 [details] bond_vlan.png
Created attachment 1179876 [details] all_log_info
Does this mean bz#1355678 can be closed? It looks like the same configuration. Can you please get a sosreport?
(In reply to Ryan Barry from comment #3)
> Does this mean bz#1355678 can be closed? It looks like the same
> configuration.

Yes.

> Can you please get a sosreport?

A sosreport has been included in #c2.
Sorry, I must have missed it --

ethtool shows p3p1 and p4p1 (the bond slaves) as having links.

However, NetworkManager shows them as disconnected (possibly because ONBOOT=no).

I'll try to reproduce, but can you please try "sed -e 's/ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-p?p1", then reboot? This is not a permanent solution, but I'd like to confirm that this is the problem in case I can't reproduce.
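As written, that sed invocation only prints the transformed files to stdout; to actually change the ifcfg files before rebooting it needs the -i (in-place) flag, assuming GNU sed. A small demonstration on a throwaway copy rather than the real network-scripts directory:

```shell
# Stand-in for one ifcfg file; only the ONBOOT key matters for this demo.
f=$(mktemp)
printf 'NAME=p3p1\nONBOOT=no\n' > "$f"

# Without -i the file would be untouched; with -i the edit lands on disk.
sed -i -e 's/ONBOOT=no/ONBOOT=yes/' "$f"
grep ONBOOT "$f"    # prints: ONBOOT=yes

rm -f "$f"
```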
(In reply to Ryan Barry from comment #5)
> I'll try to reproduce, but can you please try "sed -e
> 's/ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-p?p1", then
> rebooting?

The bond+vlan network still can't come up after a reboot; new log attached.

# sed -e 's/ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-p?p1
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=p3p1
UUID=29db3899-e8e0-4f46-98fb-3639107a2726
DEVICE=p3p1
ONBOOT=yes

TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=p4p1
UUID=9a0f4af5-983f-4472-a0b3-7ea0868bcaa5
DEVICE=p4p1
ONBOOT=yes

> This is not a permanent solution, but I'd like to ensure that this is the
> problem in case I can't reproduce.
Created attachment 1180023 [details] new log after run sed and reboot
Can you please leave a test system up at the end of your day? This appears to be a networkmanager problem, but I need to investigate, and I don't have an appropriate testing environment in my lab right now.
(In reply to Ryan Barry from comment #8)
> Can you please leave a test system up at the end of your day? This appears
> to be a networkmanager problem, but I need to investigate, and I don't have
> an appropriate testing environment in my lab right now.

Sure, I will leave the env up for you at the end of today.
Hi Ryan, I have sent you a mail with the test env.
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.
Is this reproducible on RHEL? This appears to be a bug in NetworkManager (or in the way Anaconda writes NetworkManager files) -- this is not something which RHVH touches.

# nmcli c up "bond0 slave 1"
Error: Connection activation failed: Master device bond0 unmanaged or not available for activation

[root@dell-op790-01 ~]# nmcli d
DEVICE  TYPE      STATE         CONNECTION
em1     ethernet  connected     em1
p3p1    ethernet  disconnected  --
p4p1    ethernet  disconnected  --
p4p2    ethernet  disconnected  --
11      vlan      disconnected  --
22      vlan      disconnected  --
bond0   bond      unmanaged     --
lo      loopback  unmanaged     --

The slaves aren't up, either. However:

[root@dell-op790-01 ~]# systemctl stop NetworkManager.service; systemctl start network.service; systemctl stop network.service; systemctl start NetworkManager.service && nmcli d
[root@dell-op790-01 ~]# nmcli d
DEVICE  TYPE      STATE                                  CONNECTION
em1     ethernet  connected                              em1
11      vlan      connected                              bond0 slave 1
22      vlan      connected                              bond0 slave 2
bond0   bond      connecting (getting IP configuration)  Bond connection 1
p3p1    ethernet  disconnected                           --
p4p1    ethernet  disconnected                           --
p4p2    ethernet  disconnected                           --
lo      loopback  unmanaged                              --

ifcfg files and journalctl -u NetworkManager.service are attached.

thaller, what could be happening here?
Created attachment 1181751 [details] ifcfg files and networkmanager log
(In reply to Ryan Barry from comment #12)
> ifcfg files and journalctl -u NetworkManager.service are attached.
>
> thaller, what could be happening here?

The configuration in the ifcfg files seems correct. Can you please set level=DEBUG in the [logging] section of /etc/NetworkManager/NetworkManager.conf, reproduce the issue and attach the NM logs? Thanks.
Thanks Beniamino - I don't actually have a test environment in my lab, so we'll both need to wait for QE.
(In reply to Beniamino Galvani from comment #14)
> The configuration in the ifcfg files seems correct. Can you please set
> level=DEBUG in the [logging] section of
> /etc/NetworkManager/NetworkManager.conf, reproduce the issue and attach NM
> logs? Thanks.

I can still reproduce this issue on another test env.

1. During the Anaconda interactive install of RHVH via ISO (with defaults), set level=DEBUG in the [logging] section of /etc/NetworkManager/NetworkManager.conf.
2. Reproduce the issue.

Actually there is no NetworkManager.log; you can find the NetworkManager logs in /var/log/syslog, which acts as a catch-all for log messages.

Meanwhile, I have sent a mail containing the reproduced env to both of you, and the env will be kept for 2 days. Thanks.
Created attachment 1182732 [details] new_bond_vlan_log_with_NM
I've analyzed the logs and the problem is that there is an externally-created bond0 present when NM starts, and the device is down. NM doesn't manage software interfaces with link down, to avoid situations like [1]. There are two possible workarounds:
(1) don't create the interface before NM starts, or
(2) at least bring it up so that NM will manage it and activate the existing connection.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1030947
(In reply to Beniamino Galvani from comment #18)
> I've analyzed the logs and the problem is that there is an
> externally-created bond0 when NM starts and the device is down. NM
> doesn't manage software interfaces with link down to avoid situations
> like [1]. There are two possible workarounds:
> (1) don't create the interface before NM starts or

Does "don't create the interface" mean don't create the bond over the vlan network, am I right? If so, is just creating a vlan NIC enough? I did a test creating only one vlan network; it can obtain an IP before and after reboot.

> (2) at least bring it up so that NM will manage it and activate the existing
> connection.

Actually the NIC over bond+vlan was up before the reboot, and it could obtain an IP (192.168.xx.xx); please see attachment "bond_vlan-1.png" for more details. The IP is just lost after a reboot.

> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1030947
(In reply to shaochen from comment #19)
> Don't create the interface means don't create bond over vlan network, am I
> right?

Correct, you should not manually create bond0 at boot. NM will do so when it starts, if there is the "Bond connection 1" connection referring to bond0.

> If so, just create a vlan nic is enough?

The point is that you don't need to manually create interfaces.

> Actually the nic which over bond + vlan was up status before reboot, the nic
> can obtain IP(192.168.xx.xx), please see attachment "bond_vlan-1.png" for
> more details. The ip just lost after a reboot.

I see, the reason is the one explained above and in comment 18.
Considering comment 18 and comment 20, this sounds like something to improve on the vdsm side.
(In reply to Beniamino Galvani from comment #20)
> (In reply to shaochen from comment #19)
> > (In reply to Beniamino Galvani from comment #18)
> > > I've analyzed the logs and the problem is that there is an
> > > externally-created bond0 when NM starts and the device is down.

The externally-created bond0 is likely user error on my part -- I don't often use nmcli, and never before with names like "Bond_Connection_1".

If ONBOOT=yes is set for em1, the provided test system can easily be rebooted and configurations with user error tested. But the steps to reproduce are essentially:

1. Create a bond on top of 2 VLAN devices in Anaconda.
2. Finish the install.
3. The bond doesn't work.

The provided test system shows this configuration. Any messages about manual creation should be seen as user error from me looking at the system...

> Correct, you should not manually create bond0 at boot. NM will do so when it
> starts, if there is the "Bond connection 1" connection referring to bond0.
>
> The point is that you don't need to manually create interfaces.

I believe this refers to creating the interface through Anaconda's NetworkManager abstraction.

> I see, the reason is the one explained above and in comment 18.

By "before reboot", I believe this refers to "during Anaconda".
I think I understand what's happening. As said before, the root cause of the failure of the bond activation is the presence of a bond0 interface at NM startup. This happens because the file /etc/modules-load.d/vdsm.conf loads the bonding module, and doing so automatically generates a bond0 interface.

To avoid such a problem in NM, we pass the max_bonds=0 option to the bonding module upon load, so that the initial interface is not created. I suppose the simplest solution would be to add the line:

options bonding max_bonds=0

to the vdsm.conf file. I haven't tried it, but I think it should work.
(In reply to Beniamino Galvani from comment #24)
> To avoid such problem in NM we pass the max_bonds=0 option to the
> bonding module upon load, so that the initial interface is not
> created. I suppose the simplest solution would be to add the line:
>
> options bonding max_bonds=0
>
> to the vdsm.conf file. I haven't tried it, but I think it should work.

The nic over bond_vlan still can up after append "options bonding max_bonds=0" to /etc/modules-load.d/vdsm.conf.
(In reply to shaochen from comment #25)
> The nic over bond_vlan still can up after append "options bonding
> max_bonds=0" to /etc/modules-load.d/vdsm.conf.

Typo: The nic over bond_vlan still can't come up after appending "options bonding max_bonds=0" to /etc/modules-load.d/vdsm.conf.
It is not possible to set module options in /etc/modules-load.d/vdsm.conf; you have to write them into /etc/modprobe.d/vdsm.conf.

[root@vm_fc ~]# cat /etc/modprobe.d/vdsm-bonding-modprobe.conf
# VDSM bonding modprobe configuration
options bonding max_bonds=0

I created a patch which should fix this problem, but I don't know how to verify it with an Anaconda installation. Could you help me? I can give you RPMs; just tell me for which system and which version of VDSM I should build them.
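For reference, the split between the two drop-in directories matters here: /etc/modules-load.d/*.conf files only list module names to load at boot, while module parameters belong in /etc/modprobe.d/*.conf. A sketch of the resulting pair of files (the modules-load.d content is an assumption based on the description in comment 24; only the modprobe.d file is quoted in this thread):

```
# /etc/modules-load.d/vdsm.conf -- module names only; options here are ignored
bonding

# /etc/modprobe.d/vdsm-bonding-modprobe.conf -- parameters applied on module load
options bonding max_bonds=0
```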
Petr, if your fix works on RHEL-H then it should very likely also work on NGN. Did you verify it on RHEL?
I built and installed it on Fedora 23; it does not create bond0 on boot anymore. On CentOS 7 I just tried whether the config files work; they do the same as on Fedora 23 (at least for me -- edwardh reported that for him it still creates bond0 even with changed bonding module options). I have not tried it together with Anaconda. Is it possible to install VDSM via CentOS 7 Anaconda?
(In reply to Beniamino Galvani from comment #24)
> To avoid such problem in NM we pass the max_bonds=0 option to the
> bonding module upon load, so that the initial interface is not
> created. I suppose the simplest solution would be to add the line:
>
> options bonding max_bonds=0
>
> to the vdsm.conf file. I haven't tried it, but I think it should work.

Unfortunately, this solution is not consistent: the /etc/modprobe.d/* files may be loaded after the kernel module is, making them irrelevant for the existing state. I would suggest avoiding the use of bond0, as it collides with NM.
*** Bug 1364476 has been marked as a duplicate of this bug. ***
Pasting here the comment from the gerrit patch. It refers to the reason the bonding module may be loaded even though modprobe.d has been updated.

For the patch to be helpful, it will require a 'dracut -f' command to update the initrd. If the initrd used is an old one, the following scenario will occur:

- Install VDSM.
- Upgrade the kernel (initrd is created, taking the /etc/modules-load.d/vdsm.conf config).
  [From this point on, bonding is loaded by the initrd]
- Upgrade VDSM with a build that includes this patch: no effect on boot.

As NGN updates the initrd on each boot, this solution will work smoothly.
Test version:
redhat-virtualization-host-4.0-20160803.3
imgbased-0.7.4-0.1.el7ev.noarch
vdsm-4.18.10-1.el7ev.x86_64

I have to move this bug back to ASSIGNED because I can still reproduce the issue with the latest RHVH build: RHVH still can't obtain an IP after reboot.

I also notice that all the related patches belong to the vdsm component; should we move the bug to the VDSM component?
Is this the bond0 problem or something new? Please confirm if /etc/modprobe.d/vdsm-bonding-modprobe.conf exist and that after boot, bond0 is not defined.
(In reply to Edward Haas from comment #34)
> Is this the bond0 problem or something new?

It seems it is still the bond0 (bond+vlan network) problem; please see the attachment for more details.

> Please confirm if /etc/modprobe.d/vdsm-bonding-modprobe.conf exist and that
> after boot, bond0 is not defined.

# cat /etc/modprobe.d/vdsm-bonding-modprobe.conf
# VDSM bonding modprobe configuration
options bonding max_bonds=0
Created attachment 1189074 [details] bond_vlan_new_log_0803
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
OK, then we should add --append="max_bond=0" to the bootloader command in the ISO ks file.
We should not have custom installation arguments for NGN. We need to understand why the modprobe.d approach is not working (having the config in a file in modprobe.d).
I had not noticed that we are talking about having vlan interfaces slaved under a bond. That option is not supported by RHEL as far as I know (https://access.redhat.com/solutions/483803), and it is surely not supported by VDSM.

Please retest without the vlan slaves (I guess moving the vlan on top of the bond).
(In reply to Edward Haas from comment #40)
> Please retest without the vlan slaves (I guess moving the vlan on top of the
> bond).

Test result without the vlan slaves:

Test version:
redhat-virtualization-host-4.0-20160810.1
imgbased-0.8.3-0.1.el7ev.noarch
redhat-release-virtualization-host-4.0-0.29.el7.x86_64
vdsm-4.18.11-1.el7ev.x86_64

Test steps:
1. Anaconda interactive install of RHVH via PXE.
2. Enter the network page.
3. Add a bond network -> save.
4. Enter the bond connection editing page -> bond tab, click the add button.
5. Choose a connection type -> Eth -> select a NIC.
6. Click the add button again, choose a connection type -> Eth -> select another NIC.
7. Set bond mode -> active-backup.
8. Save.
9. Bond can obtain an IP.
10. Continue the installation.
11. Reboot and log in to RHVH.
12. ip addr

Test result:
RHVH can still obtain an IP over the bond network after reboot.

So the bug is fixed; changing bug status to VERIFIED.
Thanks Edward! I've opened bug 1366298 to prevent the creation of this setup in NetworkManager.