@etamir My understanding was that ODF doesn't support multus on the public network, which is what I see configured in this section of the description:

```
spec:
  network:
    provider: multus
    selectors:
      public: openshift-storage/ocs-public
```

Since we are close to dev freeze, I want to make sure we aren't blocking the release trying to fix a feature that isn't provided for customers.
@sagrawal I'd like to gather some follow-up info that is partly related to my previous comment. Is this an automated test or a manual test? Is it a new test for 4.12, or did it also exist in 4.11?

----------

I'm still looking into the must-gather, but from what I can tell after an initial pass, the multus interfaces are being successfully attached to the mon pods. PVCs are also reportedly attached to the mon pods. Something is preventing the pods from running, and I will continue evaluating it.

mon-a has reported a failure to schedule due to anti-affinity rules, but I can't see a reason why. The nodes are labeled correctly, and the anti-affinities should also be working afaict. mon-b's pod seems as though it should have started, but none of the containers are running or attempting to run. I haven't seen a reason why yet.
This is a regression for a supported feature that must be fixed in 4.12.
@sagrawal if this test is re-run, does it pass? The issue where pod containers don't seem to be starting is strange, and I don't see any indication of a related error in the kubelet logs. I wonder if there could have been an issue with the vmware nodes during the test run, and it would be good to rule that out.

The multus-related features seem to be working as expected, so I don't currently think multus itself is causing issues. I will continue to look into the must-gather to see if I can find other telling details.
From what I can tell, Kubelet is not able to create the "sandbox" for the Pod because multus isn't able to configure whereabouts to use the interface. I believe the error message is indicating that the network interface is busy.

Is it possible that there is a limit on the number of connections that can be bridged to ens192? This could be on the host system or in multus's config itself.

One thing I notice from the pod describe output: pods that started 44m or 43m ago all got a multus whereabouts connection successfully. Of the pods that started 42m ago, some succeeded, but this is where pods begin to fail to have networks attached. Anything more recent than 42m ago seems to be consistently failing.

I believe this has to be some sort of configuration problem with the VM or with multus.
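For anyone who wants to re-check the age-versus-attachment pattern described above on another run, here is a minimal sketch. It assumes the pod describe output has already been reduced by hand to (name, age in minutes, attached) records. The function names and the sample records are hypothetical and are not taken from the must-gather.

```python
from collections import defaultdict


def attach_rate_by_age(pods):
    """Group (name, age_minutes, attached) records by age and count successes."""
    buckets = defaultdict(lambda: [0, 0])  # age -> [attached, total]
    for _name, age, attached in pods:
        buckets[age][1] += 1
        if attached:
            buckets[age][0] += 1
    # Oldest pods first, so the output reads in the order the pods started.
    return [(age, ok, total)
            for age, (ok, total) in sorted(buckets.items(), reverse=True)]


def first_failing_age(pods):
    """Return the age (in minutes) of the oldest pod that failed to attach, or None."""
    failed = [age for _name, age, attached in pods if not attached]
    return max(failed) if failed else None


# Hypothetical sample records, NOT taken from the must-gather.
sample = [
    ("pod-a", 44, True),
    ("pod-b", 43, True),
    ("pod-c", 42, True),
    ("pod-d", 42, False),
    ("pod-e", 41, False),
]

for age, ok, total in attach_rate_by_age(sample):
    print(f"{age}m ago: {ok}/{total} attached")
print("oldest failure:", first_failing_age(sample))
```

If the oldest failure lines up with a single point in time across nodes, that would point toward something changing on the infra side at that moment rather than a per-pod problem, though that is only a heuristic.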
(In reply to Blaine Gardner from comment #11)
> From what I can tell, Kubelet is not able to create the "sandbox" for the
> Pod because multus isn't able to configure whereabouts to use the interface.
> I believe the error message is indicating that the network interface is
> busy.
>
> Is it possible that there is a limit on the number of connections that can
> be bridged to ens192? This could be on the host system or on multus's config
> itself.
>
> One thing I notice from the pod describe output. Pods that started 44m or
> 43m ago all got a multus whereabouts connections successfully. Pods that
> started 42m ago, some succeed, but this is where pods begin to fail to have
> networks attached. Anything more recent than 42m ago seems to be
> consistently failing.
>
> I believe this has to be some sort of configuration problem with the VM or
> with multus.

There is no connection limit on the interface, but we identified a problem with the infra, and there is also an issue in ocs-ci.

Tested the deployment with OCP 4.11 + ODF 4.12 here: https://url.corp.redhat.com/729196c . The deployment is successful but failed in the verification stage (mostly changes needed in ocs-ci).

For OCP 4.12 + ODF 4.12, changes are needed in ocs-ci, which I will be working on.
(In reply to Vijay Avuthu from comment #14)
> There is no connection limit on the interface, but we identified a problem
> with the infra, and there is also an issue in ocs-ci.
>
> Tested the deployment with OCP 4.11 + ODF 4.12 here:
> https://url.corp.redhat.com/729196c . The deployment is successful but failed
> in the verification stage (mostly changes needed in ocs-ci).
>
> For OCP 4.12 + ODF 4.12, changes are needed in ocs-ci, which I will be
> working on.

For OCP 4.12 + ODF 4.12, deployment passed here: https://url.corp.redhat.com/2be14fb
(In reply to Vijay Avuthu from comment #15)
> For OCP 4.12 + ODF 4.12, deployment passed here:
> https://url.corp.redhat.com/2be14fb

must gather: https://url.corp.redhat.com/5b0a70c

Job with a new NIC added (ens224): https://url.corp.redhat.com/cdc3e70
must gather: https://url.corp.redhat.com/08868d7
Based on Vijay's comments, this doesn't look like a blocker. Keeping it open until we have more test results.