Bug 1956640

Summary: HyperConverged deployment should not require "multinetwork functionality" from install-config.yaml
Product: Container Native Virtualization (CNV) Reporter: Patrik Martinsson <martinsson.patrik>
Component: Networking    Assignee: Petr Horáček <phoracek>
Status: CLOSED NOTABUG QA Contact: Meni Yakove <myakove>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 2.6.0    CC: aos-bugs, cnv-qe-bugs, eparis, fdeutsch, jokerman, phoracek, rgarcia, stirabos
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-06-16 12:22:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Patrik Martinsson 2021-05-04 06:37:09 UTC
Description of problem:

When creating a "HyperConverged" deployment from the "OpenShift Virtualization" operator, one is required to have "disableMultiNetwork: false" set in the install-config.yaml.
This in turn enables the Multus deployment, which allows pods to have multiple networks attached. If you have "disableMultiNetwork: true" in the install-config.yaml, the installation of "HyperConverged" will fail with a message telling you that you need to "enable multinetwork functionality".
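
For reference, on a running cluster the setting ends up in the Cluster Network Operator configuration and can be inspected roughly like this (a sketch from memory; the exact field placement may differ between OpenShift versions):

# Inspect the Cluster Network Operator configuration; spec.disableMultiNetwork
# controls whether Multus (and with it the "multinetwork functionality") is deployed.
oc get networks.operator.openshift.io cluster -o yaml
# ...
# spec:
#   disableMultiNetwork: true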

I don't see why this should be a requirement; we are running "OpenShift Virtualization", have successfully created a "HyperConverged" deployment, and can successfully run VMs without the "multinetwork functionality" (i.e. we have "disableMultiNetwork: true" in our install-config.yaml). We did have to temporarily enable it to get the installation of HyperConverged to continue, but once the installation finished, we removed it again.

In other words, it seems to me that one *does not need* a Multus deployment (disableMultiNetwork: false) to successfully run VMs, and hence it should not be required. 

Version-Release number of selected component (if applicable):
OpenShift                : 4.7.6
OpenShift Virtualization : 2.6.1 

How reproducible:
Always.


Steps to Reproduce:
1. Install an OpenShift 4.7 cluster with "disableMultiNetwork: true" (one way to do this is sketched after these steps)

2. Install the Virtualization Operator

3. Create a hyperconverged deployment. 
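
One way to get "disableMultiNetwork: true" in place for step 1 (a sketch, and not necessarily exactly how we did it, since in recent OpenShift releases the setting is expressed through a Cluster Network Operator manifest rather than install-config.yaml itself):

# install-config.yaml is already present in ./mycluster
openshift-install create manifests --dir=./mycluster
# Add a network operator config that disables the multinetwork functionality.
cat <<EOF > ./mycluster/manifests/cluster-network-03-config.yml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  disableMultiNetwork: true
EOF
# Continue the installation with the custom manifest in place.
openshift-install create cluster --dir=./mycluster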


Actual results:
Deployment of hyperconverged fails with a message similar to "you need to enable multinetwork functionality to be able to install hyperconverged".


Expected results:
Deployment of hyperconverged should succeed.


Additional info:
In order to change an installation from "disableMultiNetwork: true" to "disableMultiNetwork: false", one needs to "trick" the network operator by changing its applied state; otherwise the operator will tell you that you can't change the parameter after install. This is of course unsupported, but that's not the point here. 
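
Roughly what I mean by "trick", for illustration only (unsupported, and the exact object names below are from memory, so treat them as an assumption):

# Flip the setting on the live Cluster Network Operator configuration.
oc edit networks.operator.openshift.io cluster   # change spec.disableMultiNetwork
# If the operator refuses because the new value differs from the previously
# applied state, that state is recorded in a ConfigMap along the lines of:
oc -n openshift-network-operator get configmap applied-cluster -o yaml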

The point is simply that a working hyperconverged deployment doesn't seem to need "disableMultiNetwork: false", and hence it should not be required at installation. 

I hope I made sense; please tell me if I'm missing something.


Ps. 
I'm not able to create a bug under the product "Container Native Virtualization (CNV)" (this product is simply not shown in the list of all products for me). And when trying to clone an existing one, i.e. https://bugzilla.redhat.com/show_bug.cgi?id=1952509, I get a page saying "Sorry, either the product Container Native Virtualization (CNV) does not exist or you aren't authorized to enter a bug into it.". That's why I needed to create it here; maybe there is something wrong with the permissions for that product? I'm logged in as a regular user. 


Best regards,
Patrik,
Sweden

Comment 1 Petr Horáček 2021-05-13 11:09:49 UTC
Hello Patrik, thanks for reporting this. You are of course right that Multus is redundant in your case. We are trying to keep the set of deployed components consistent across different deployments, which is why we keep even optional components installed. It would be very helpful to understand your motivation to run without Multus. Are you merely trying to save resources, do you run a different meta CNI or something else? Once we understand this, we could try to figure out a solution that would help you run your workload.

Comment 2 Patrik Martinsson 2021-05-13 11:35:30 UTC
(In reply to Petr Horáček from comment #1)

Hi Pete,


> Hello Patrik, thanks for reporting this. You are of course right that
> Multus is redundant in your case. We are trying to keep the set of deployed
> components consistent across different deployments, which is why we keep even
> optional components installed. 

Ok, I see. So it's to keep deployments consistent, rather than a "hard requirement" then. I can buy that, even if I don't necessarily agree. One more component, like Multus, adds unnecessary complexity and more things that could go wrong. But nonetheless, I can see where you are coming from.

> It would be very helpful to understand your motivation
> to run without Multus. Are you merely trying to save resources, do you run a
> different meta CNI or something else? Once we understand this, we could try
> to figure out a solution that would help you run your workload.

Oh, it's nothing like that. It's as simple as that I didn't understand why we needed it in the first place, since hyperconverged works without it. The whole reason I went down the rabbit hole was that the Multus deployment didn't work in our installation, since we use the 'cisco aci cni' and that version didn't have support for multinetwork installations (I think their newer releases have support for it).

Anyway, I don't have an issue with this, but it would be beneficial to us if we could scale down the Multus pods to 0 since we don't need them. Is there a workaround to do this somehow?
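
For context, this is what I'm looking at (assuming the default layout, where Multus runs as a DaemonSet managed by the network operator, so there is no replica count to scale down and manual changes would presumably be reverted):

# List the Multus DaemonSet deployed by the Cluster Network Operator.
oc -n openshift-multus get daemonset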

Best regards,
Patrik,
Sweden

Comment 3 Petr Horáček 2021-05-18 14:02:04 UTC
(In reply to Patrik Martinsson from comment #2)
> (In reply to Petr Horáček from comment #1)
> 
> Hi Pete,
> 
> 
> > Hello Patrik, thanks for reporting this. You are of course right that
> > Multus is redundant in your case. We are trying to keep the set of deployed
> > components consistent across different deployments, which is why we keep even
> > optional components installed. 
> 
> Ok, I see. So it's to keep deployments consistent, rather than a "hard
> requirement" then. I can buy that, even if I don't necessarily agree. One more
> component, like Multus, adds unnecessary complexity and more things that
> could go wrong. But nonetheless, I can see where you are coming from.
> 
> > It would be very helpful to understand your motivation

This is mostly to reduce the complexity of the system and the scope of tests. This does not apply only to this feature but to many others that could be optional. If we allowed Multus and other components to be "opt-in", each of them would add one dimension to the test matrix and it would become difficult for us to maintain proper test coverage. It would also open new "ifs" in our UI and documentation that we would need to keep considering (right now I can rely on Multus being available on all deployments; if it became optional, I would need to condition all features depending on it). Being opinionated is just simpler.

I hear where you're coming from; installing only what you use is indeed cleaner. We would be open to making some features optional, but there needs to be a very good motivation that would balance the cost.

> > to run without Multus. Are you merely trying to save resources, do you run a
> > different meta CNI or something else? Once we understand this, we could try
> > to figure out a solution that would help you run your workload.
> 
> Oh, it's nothing like that. It's as simple as that I didn't understand why
> we needed it in the first place, since hyperconverged works without it. The
> whole reason I went down the rabbit hole was that the Multus deployment
> didn't work in our installation, since we use the 'cisco aci cni' and that
> version didn't have support for multinetwork installations (I think their
> newer releases have support for it).

Oh I see, thanks, this is useful information. Note that we haven't certified OpenShift Virtualization with ACI yet. Out of curiosity, except for the Multus annoyance, is it working fine with OpenShift VMs?

> 
> Anyway, I don't have an issue with this, but it would be beneficial to us if
> we could scale down the Multus pods to 0 since we don't need them. Is there
> a workaround to do this somehow?

I'm afraid there is no (good) workaround. One thing you may do is to deploy OCP without Multus and ignore the error reported by HCO. The system should be functional (except for features based on Multus). It may be a little annoying but quite safe.
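
If it helps, the error should be visible in the status of the HyperConverged resource, with something like this (assuming the default namespace and resource name; adjust to your installation):

# Inspect the status conditions; with Multus disabled you should see the
# complaint about the multinetwork functionality there, and it can be ignored.
oc -n openshift-cnv get hyperconverged kubevirt-hyperconverged -o yaml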

> 
> Best regards,
> Patrik,
> Sweden

I hope this makes the motivation a little more transparent. Please let me know if you have any further questions; it is interesting to hear a user's take on this.