Description of problem:

UPI on vSphere with OVN: after install, br-ex gets a new IP that is not the same as the one on ens192 (the default interface).

[core@control-plane-1 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-ens192
TYPE=Ethernet
BOOTPROTO=none
NAME=ens192
DEVICE=ens192
ONBOOT=yes
IPADDR=139.178.76.12
PREFIX=26
GATEWAY=139.178.76.1

[core@control-plane-1 ~]$ nmcli d show br-ex
GENERAL.DEVICE:       br-ex
GENERAL.TYPE:         ovs-interface
GENERAL.HWADDR:       (unknown)
GENERAL.MTU:          1500
GENERAL.STATE:        100 (connected)
GENERAL.CONNECTION:   ovs-if-br-ex
GENERAL.CON-PATH:     /org/freedesktop/NetworkManager/ActiveConnection/5
IP4.ADDRESS[1]:       139.178.76.43/26
IP4.GATEWAY:          139.178.76.1
IP4.ROUTE[1]:         dst = 0.0.0.0/0, nh = 139.178.76.1, mt = 800
IP4.ROUTE[2]:         dst = 139.178.76.0/26, nh = 0.0.0.0, mt = 800
IP4.DNS[1]:           139.178.76.62
IP4.DOMAIN[1]:        internal.example.com
IP6.ADDRESS[1]:       fe80::f964:d219:d681:d154/64
IP6.GATEWAY:          --
IP6.ROUTE[1]:         dst = fe80::/64, nh = ::, mt = 800
IP6.ROUTE[2]:         dst = ff00::/8, nh = ::, mt = 256, table=255

Version-Release number of selected component (if applicable):
4.6

How reproducible:
always

Steps to Reproduce:
1. Set up the cluster with UPI on vSphere with OVN
2.
3.

Actual results:
Cluster setup failed. The br-ex IP is not the same as the original ens192 IP.

Expected results:
The br-ex interface needs to be considered unmanaged by NM.

Additional info:
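A quick way to confirm what is happening on an affected node, as a hedged aside (not part of the original report): compare the IPv4 method of the static ens192 profile with the ovs-if-br-ex profile that the OVN setup creates. The connection names below are taken from the output above; adjust them if they differ on your node. Per comment 1 below, the expectation is that the first shows "manual" while the second shows "auto" (DHCP), which would explain why br-ex picks up a different lease.

# Hedged diagnostic sketch only
$ nmcli -g ipv4.method,ipv4.addresses connection show ens192
$ nmcli -g ipv4.method,ipv4.addresses connection show ovs-if-br-ex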
> The br-ex interface needs to be considered unmanaged by NM.

It's actually created by NM.

When "eth0" is using DHCP, we need br-ex to run DHCP as well, to keep the lease active.

I guess the fix is that if "eth0" has ipv4.method=manual / ipv6.method=manual, we need to copy that to br-ex too, so it _doesn't_ run DHCP.
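A rough sketch of what that kind of fix could look like in shell, not the actual patch: it assumes the profile of the interface being moved into the bridge is available as $old_conn and that the bridge interface profile is named ovs-if-br-ex (the latter matches the nmcli output above). IPv4 only and a single address/DNS server are assumed for brevity.

# Hedged sketch, not the real MCO change: if the original connection uses a
# manual IPv4 method, copy its addressing to ovs-if-br-ex so it doesn't DHCP.
method=$(nmcli -g ipv4.method connection show "$old_conn")
if [ "$method" = "manual" ]; then
  addrs=$(nmcli -g ipv4.addresses connection show "$old_conn")
  gw=$(nmcli -g ipv4.gateway connection show "$old_conn")
  dns=$(nmcli -g ipv4.dns connection show "$old_conn")
  nmcli connection modify ovs-if-br-ex ipv4.method manual \
    ipv4.addresses "$addrs" ipv4.gateway "$gw" ipv4.dns "$dns"
  nmcli connection up ovs-if-br-ex
fi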
(In reply to Dan Winship from comment #1)
> > The br-ex interface needs to be considered unmanaged by NM.
>
> It's actually created by NM.
>
> When "eth0" is using DHCP, we need br-ex to run DHCP as well, to keep the
> lease active.
>
> I guess the fix is that if "eth0" has ipv4.method=manual /
> ipv6.method=manual, we need to copy that to br-ex too, so it _doesn't_ run
> DHCP.

I thought our agreement was that if you want to do manual interface config on the box, then you need to provide Ignition NetworkManager keyfiles to specify your manual config. Here is an example you can use:

https://gist.github.com/trozet/8cd5da0ce872b2f8d3952bf83279af71
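For illustration only, and explicitly not the contents of the linked gist: a minimal NetworkManager keyfile with a static IPv4 address, the kind of file that could be shipped to a node (e.g. under /etc/NetworkManager/system-connections/, mode 0600) via an Ignition or MachineConfig entry. The addresses reuse the values from this bug's description; a full config in the style Tim describes would also have to cover the bridge topology up to br-ex.

# Hedged keyfile sketch, not the gist's contents
[connection]
id=ens192
type=ethernet
interface-name=ens192

[ipv4]
method=manual
address1=139.178.76.12/26,139.178.76.1
dns=139.178.76.62;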
OK, apparently there was some miscommunication.

I thought we had agreed that the user was responsible for ensuring the *underlying host network* was set up the way they wanted (eg, "default behavior" / "use a static IP" / "bond eth0 and eth1 and then make a vlan interface off the bond and use that as the default interface" / etc), and then the ovn-kubernetes setup script would take whatever interface had the default route, and move it to br-ex.

Tim thought we had agreed that if the user wanted anything other than 100% default networking (DHCP on the primary ethernet interface) then they would have to provide NM config files for the entire network, *up to and including br-ex*.

It turns out that my interpretation is gibberish, because if you want an OVS bridge with a bonded interface inside it, you can't configure that by taking a kernel-level bonded interface and moving it into the bridge; you have to configure the bonding at the OVS level. Which Tim assumed I knew, and hence he assumed I was saying his version, since that's the only thing that makes sense in that case.

I don't like the current situation because:

- While I think it's reasonable to make life difficult for people who want "really complicated" network configurations, I don't think "I need a static IP instead of DHCP" should count as "really complicated".

- This would mean that the sdn-to-ovn migration tool would not work for users with static IPs. (Given that we don't currently have a declarative spec for host networking config, it's not reasonable to expect the migration tool to be able to fix up their cluster configuration automatically.)

- Requiring the user to specify a configuration for br-ex exposes too much of the low-level details of ovn-kubernetes shared gateway mode. What if we realize later that we need to configure the shared gateway bridge somewhat differently? What do we do with people's explicit br-ex configurations then?

So I feel like, if you aren't using bonds and vlans (as in the case here in this bug), then the setup script needs to just cope. (Or maybe the case in this bug is the _only_ other case that gets the "simple" treatment; you get automatic handling for single-ethernet-with-DHCP, and automatic handling for single-ethernet-with-static-IP, but everything else needs the more complicated config?)

For the users who are doing "complicated" stuff, I'm not sure. The nmstate-like approach is definitely better than writing out NetworkManager configs (other than the parsing issues), but I still don't like the fact that it requires specifying the _exact_ topology including the ovn-kubernetes-specific parts, rather than only specifying the "raw" host network config. It may be that there's nothing we can do about that. I guess at least with the nmstate-like file, if we do need to change the topology in the future, it would be much easier to tweak the provided config to match what we need than it would be in the NetworkManager config case.
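To make that last distinction concrete, here is a hedged sketch (not from this bug or from any agreed design) of what a "raw host network config only" input could look like in an nmstate-style YAML file, reusing the static addressing from the bug description. Note that it says nothing about br-ex or any other ovn-kubernetes-specific topology; that is exactly the part the current approach still requires the user to spell out.

# Hedged nmstate-style sketch, hypothetical input format
interfaces:
  - name: ens192
    type: ethernet
    state: up
    ipv4:
      enabled: true
      dhcp: false
      address:
        - ip: 139.178.76.12
          prefix-length: 26
routes:
  config:
    - destination: 0.0.0.0/0
      next-hop-address: 139.178.76.1
      next-hop-interface: ens192
dns-resolver:
  config:
    server:
      - 139.178.76.62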
@Dan, do you mean we can use an Ignition file for br-ex to give br-ex a static IP? If so, how do we set that up? Could you provide the detailed steps? That would be very helpful. Thanks.
I didn't mean that, although I guess you could, at the moment. The eventual fix is that you should not have to do anything different than you were doing before. I think if you just edit the existing br-ex definition on the node using nmcli or nmtui to set a static IP, then it should work.
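For the record, a hedged sketch of what that manual edit could look like with nmcli, using the connection name and addresses from the original report (not an official procedure; adjust to your environment):

# Hedged workaround sketch: give the existing ovs-if-br-ex profile the static
# addressing that ens192 had, then re-activate it.
$ sudo nmcli connection modify ovs-if-br-ex ipv4.method manual \
      ipv4.addresses 139.178.76.12/26 ipv4.gateway 139.178.76.1 \
      ipv4.dns 139.178.76.62
$ sudo nmcli connection up ovs-if-br-ex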
After some discussion, we have agreed to allow a static IP to be detected and moved over to the OVS interface. However, all other manual configuration, including bonds or VLANs, will require configuring the host manually.
Is there any way you can test out https://github.com/openshift/machine-config-operator/pull/2018 and see if it fixes the problem? You can try changing the script on the host and rebooting the node if you don't want to build a custom MCO image. Or, if you can give me access to a node that fails, I can modify it and test it out.
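A hedged sketch of that manual test, assuming the script the MCO ships lands at /usr/local/bin/configure-ovs.sh on the node (the path is an assumption here, not taken from the PR): edit it in place to match the PR, then reboot so the configuration service re-runs it.

# Hedged sketch only; script path is an assumption
$ sudo vi /usr/local/bin/configure-ovs.sh
$ sudo systemctl reboot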
You can tell cluster-bot "build openshift/machine-config-operator#2018" and it will build a release image containing that PR for you. Then follow the link it gives you and look in the build-log.txt; near the end you'll see:

2020/07/18 14:50:09 Create release image registry.svc.ci.openshift.org/ci-ln-zn97t2k/release:latest

Then run "oc adm release extract --command=openshift-install THAT-IMAGE-NAME" to get an appropriate openshift-install binary that will use that image.
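Putting those steps together (the release image name below is the example pullspec from the build log above; substitute the one cluster-bot gives you):

$ oc adm release extract --command=openshift-install \
      registry.svc.ci.openshift.org/ci-ln-zn97t2k/release:latest
$ ./openshift-install version    # confirm the extracted binary points at that release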
(In reply to Dan Winship from comment #8)
> You can tell cluster-bot "build openshift/machine-config-operator#2018" and
> it will build a release image containing that PR for you. Then follow the
> link it gives you and look in the build-log.txt; near the end you'll see:
>
> 2020/07/18 14:50:09 Create release image
> registry.svc.ci.openshift.org/ci-ln-zn97t2k/release:latest
>
> Then run "oc adm release extract --command=openshift-install
> THAT-IMAGE-NAME" to get an appropriate openshift-install binary that will
> use that image.

@Dan, I guess cluster-bot only supports IPI? This is UPI, where the nodes will have static IPs.
Ah, sorry, I misunderstood the difference between "launch" and "build"... heh.
Update: Shared the cluster with Tim today, with web console access.
Thanks, Anurag! Verified that, with some tweaks, the fix will work for vSphere. It is under review now.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196