Bug 2142617 - [Multus][VMware] ODF deployment with multus unsuccessful
Summary: [Multus][VMware] ODF deployment with multus unsuccessful
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
: ---
Assignee: Blaine Gardner
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks: 1979561 2133550
Reported: 2022-11-14 15:38 UTC by Sidhant Agrawal
Modified: 2023-08-09 17:03 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-06 05:47:32 UTC
Embargoed:



Comment 4 Blaine Gardner 2022-11-14 23:32:12 UTC
@etamir My understanding was that ODF doesn't support multus on the public network, which is what I see configured in this section of the description:

```
  spec:
    network:
      provider: multus
      selectors:
        public: openshift-storage/ocs-public
```

Since we are close to dev freeze, I want to make sure we aren't blocking the release trying to fix a feature that isn't provided for customers.
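
For context on what "public network" refers to here: Rook's network spec accepts both a `public` and a `cluster` selector key. The fragment below is only a sketch for comparison. The `ocs-cluster` name is hypothetical and does not come from this bug's description, and the fragment says nothing about which selectors ODF supports in a given release.

```yaml
spec:
  network:
    provider: multus
    selectors:
      # Taken from the description: attaches Ceph's public (client-facing) network.
      public: openshift-storage/ocs-public
      # Hypothetical name: would attach Ceph's cluster (replication) network.
      cluster: openshift-storage/ocs-cluster
```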

Comment 5 Blaine Gardner 2022-11-15 00:07:08 UTC
@sagrawal I'd like to gather some follow-up info that is partly related to my previous comment.

Is this an automated test or a manual test?

Is it a new test for 4.12, or did it also exist in 4.11?

----------

I'm still looking into the must-gather, but from what I can tell after an initial pass, the multus interfaces are being successfully attached to the mon pods. PVCs are also reportedly attached to the mon pods.

Something is preventing the pods from running, and I will continue evaluating it. mon-a has reported a failure to schedule due to anti-affinity rules, but I can't see a reason why. The nodes are labeled correctly, and the anti-affinities should also be working afaict.

Mon-b's pod seems as though it should have started, but none of the containers are running or attempting to run. I haven't seen a reason why yet.

Comment 8 Elad 2022-11-15 13:07:58 UTC
This is a regression for a supported feature that must be fixed in 4.12

Comment 9 Blaine Gardner 2022-11-15 15:27:18 UTC
@sagrawal if this test is re-run, does it pass? The issue where pod containers don't seem to be starting is strange, and I don't see any indication of a related error in the kubelet logs. I wonder if there could have been an issue with the VMware nodes during the test run, and it would be good to rule that out.

The multus-related features seem to be working as expected, so I don't currently think multus itself is causing issues. I will continue to look into the must-gather to see if I can find other telling details.

Comment 11 Blaine Gardner 2022-11-17 20:19:23 UTC
From what I can tell, Kubelet is not able to create the "sandbox" for the Pod because multus isn't able to configure whereabouts to use the interface. I believe the error message is indicating that the network interface is busy. 

Is it possible that there is a limit on the number of connections that can be bridged to ens192? This could be on the host system or on multus's config itself.

One thing I notice from the pod describe output: pods that started 44m or 43m ago all got multus whereabouts connections successfully. Among pods that started 42m ago, some succeeded, but this is where pods begin to fail to have networks attached. Anything more recent than 42m ago seems to be consistently failing.

I believe this has to be some sort of configuration problem with the VM or with multus.
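
For reference when checking multus's config for such a limit, here is a minimal sketch of what a NetworkAttachmentDefinition for `openshift-storage/ocs-public` could look like. It assumes a macvlan attachment on ens192 with whereabouts IPAM. The CNI type, mode, and address range are illustrative assumptions, not values taken from the must-gather.

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ocs-public
  namespace: openshift-storage
spec:
  # The CNI config is embedded as a JSON string.
  # ASSUMPTIONS: "macvlan", "bridge" mode, and the range are illustrative only.
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens192",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.20.0/24"
      }
    }'
```

In a config of this shape, `master` names the host interface that pods attach to, and the whereabouts `range` bounds how many addresses can be handed out, so those are two fields worth comparing against the number of pods requesting the network.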

Comment 14 Vijay Avuthu 2022-11-23 15:47:59 UTC
(In reply to Blaine Gardner from comment #11)

There is no connection limit on the interface, but we identified a problem with the infra, and there is also an issue in ocs-ci.

Tested the deployment with OCP 4.11 + ODF 4.12 here: https://url.corp.redhat.com/729196c . Deployment is successful but failed in the verification stage (mostly changes needed in ocs-ci).

For OCP 4.12 + ODF 4.12, changes are needed in ocs-ci, which I will be working on.

Comment 15 Vijay Avuthu 2022-11-29 15:32:51 UTC
(In reply to Vijay Avuthu from comment #14)

For OCP 4.12 + ODF 4.12, deployment passed here: https://url.corp.redhat.com/2be14fb

Comment 16 Vijay Avuthu 2022-11-29 15:45:04 UTC
(In reply to Vijay Avuthu from comment #15)
Must gather: https://url.corp.redhat.com/5b0a70c

Job with a new NIC (ens224) added: https://url.corp.redhat.com/cdc3e70
Must gather: https://url.corp.redhat.com/08868d7

Comment 17 Mudit Agarwal 2022-12-05 14:51:04 UTC
Based on Vijay's comments, this doesn't look like a blocker; keeping it open until we have more test results.

