Bug 1897073
Summary: | [OCP 4.5] wrong netid assigned to OpenShift projects/namespaces | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Angelo Gabrieli <agabriel> |
Component: | Networking | Assignee: | Juan Luis de Sousa-Valadas <jdesousa> |
Networking sub component: | openshift-sdn | QA Contact: | Arti Sood <asood> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | |
Priority: | low | CC: | anusaxen, bbennett, javier.ordax, jdesousa, jechen |
Version: | 4.5 | |
Target Milestone: | --- | |
Target Release: | 4.7.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: |
Cause:
N/A
Consequence:
OpenShift SDN incorrectly logged "unable to allocate netid 1: provided netid is not in the valid range" for namespaces with netid 1.
Fix:
Don't log anything for netids below 10 (the reserved VNID range).
Result:
The line is no longer logged.
|
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2021-02-24 15:32:41 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Angelo Gabrieli
2020-11-12 08:54:56 UTC
> The cluster looks to be installed with the default NetworkPolicy plugin and with no NetworkPolicy rules:

No, it really looks like it's installed with openshift-sdn in Multitenant mode, in which case this behavior is expected.

> 2020-09-10T10:37:48.024219635Z E0910 10:37:48.024139       1 vnids.go:292] unable to allocate netid 1: provided netid is not in the valid range

(We should fix this error message, though. It shouldn't be logging that.)

Hi Dan,

Exactly my thoughts. I spoke with Angelo about this and it is effectively Multitenant:

```yaml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2020-02-20T15:21:10Z"
  generation: 1
  name: cluster
  resourceVersion: "430"
  selfLink: /apis/operator.openshift.io/v1/networks/cluster
  uid: 9fd5b0f2-53f4-11ea-9964-005056ba4c5e
spec:
  clusterNetwork:
  - cidr: 150.128.0.0/14
    hostPrefix: 23
  defaultNetwork:
    openshiftSDNConfig:
      mode: Multitenant
      mtu: 1450
      vxlanPort: 4789
    type: OpenShiftSDN
  serviceNetwork:
  - 140.30.0.0/16
```

I have a PR to fix the error message.

Angelo, regarding how to solve the communication issues, see:
https://docs.openshift.com/container-platform/4.5/networking/openshift_sdn/multitenant-isolation.html#nw-multitenant-joining_multitenant-isolation

Except for the error message, everything is expected behavior.

Hi @jdesousa, where did you get that network configuration from? During the live session we had with the customer, support asked the customer to gather the network configuration, and Multitenant mode is not there. The terminal output was exported and is attached to the Red Hat case.

```console
oc get network cluster -o yaml
```

```yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2020-02-20T15:21:10Z"
  generation: 2
  name: cluster
  resourceVersion: "1827"
  selfLink: /apis/config.openshift.io/v1/networks/cluster
  uid: 9fb72218-53f4-11ea-9964-005056ba4c5e
spec:
  clusterNetwork:
  - cidr: 150.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy: {}
  networkType: OpenShiftSDN
  serviceNetwork:
  - 140.30.0.0/16
status:
  clusterNetwork:
  - cidr: 150.128.0.0/14
    hostPrefix: 23
  clusterNetworkMTU: 1450
  networkType: OpenShiftSDN
  serviceNetwork:
  - 140.30.0.0/16
```

Sorry, I have just realized we are not checking the same object. I will review again.

That YAML output was provided by Angelo Gabrieli in a private conversation in Slack about 2 hours ago.

Steps to reproduce:

1. Create a 4.5.22 cluster in Multitenant mode with the flexy template
   https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_5/ipi-on-aws/versioned-installer-multitenant-ci

   ```console
   oc version
   Client Version: 4.6.6
   Server Version: 4.5.22
   Kubernetes Version: v1.18.3+616db59

   oc get clusternetwork
   NAME      CLUSTER NETWORK   SERVICE NETWORK   PLUGIN NAME
   default   10.128.0.0/14     172.30.0.0/16     redhat/openshift-ovs-multitenant

   oc get network.operator -o yaml | grep mode
         f:mode: {}
       mode: Multitenant
   ```

2. Create a new project and a pod in the new project:

   ```console
   oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/pod-for-ping.json
   ```

3. Check the logs of all the sdn-controller-* pods in the openshift-sdn project for the error messages (a helper loop is sketched below).
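For step 3, a loop like the following checks every controller pod at once. This is a convenience sketch, not part of the original report; it assumes the sdn-controller pods carry the `app=sdn-controller` label. The reporter's actual output follows.

```sh
# Sketch: grep each sdn-controller pod's log for the spurious error.
# Assumes the pods are labeled app=sdn-controller; verify with
# `oc -n openshift-sdn get pods --show-labels` first.
for pod in $(oc -n openshift-sdn get pods -l app=sdn-controller -o name); do
  echo "== ${pod}"
  oc -n openshift-sdn logs "${pod}" | grep 'unable to allocate'
done
```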
```console
oc logs sdn-controller-7tbwk
oc logs sdn-controller-tmq6x | grep 'unable to allocate'
E1209 13:52:24.752288       1 vnids.go:298] unable to allocate netid 1: provided netid is not in the valid range
E1209 13:52:24.752315       1 vnids.go:298] unable to allocate netid 1: provided netid is not in the valid range
E1209 13:52:24.753518       1 vnids.go:298] unable to allocate netid 1: provided netid is not in the valid range
E1209 13:52:24.754645       1 vnids.go:298] unable to allocate netid 1: provided netid is not in the valid range
E1209 13:52:24.755819       1 vnids.go:298] unable to allocate netid 1: provided netid is not in the valid range
E1209 13:52:24.756887       1 vnids.go:298] unable to allocate netid 1: provided netid is not in the valid range
E1209 13:52:24.758044       1 vnids.go:298] unable to allocate netid 1: provided netid is not in the valid range
E1209 13:52:24.759192       1 vnids.go:298] unable to allocate netid 1: provided netid is not in the valid range
E1209 13:52:24.760360       1 vnids.go:298] unable to allocate netid 1: provided netid is not in the valid range
```

On 4.7 (registry.svc.ci.openshift.org/ocp/release:4.7.0-0.nightly-2020-12-09-112139) I am unable to reproduce the errors from the sdn-controller pods.

Using the steps above, asood was able to reproduce the bug on 4.5, and then I was able to use the same steps to verify that the bug is no longer present in 4.7.

```console
❯ oc project openshift-sdn
Now using project "openshift-sdn" on server "https://api.dbrahane-4-7-1209-multitenant.qe.devcluster.openshift.com:6443".
❯ oc get pods
NAME                   READY   STATUS    RESTARTS   AGE
sdn-controller-7r85b   1/1     Running   0          164m
sdn-controller-b6n95   1/1     Running   0          164m
sdn-controller-rcwql   1/1     Running   0          164m
❯ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/pod-for-ping.json
❯ oc logs sdn-controller-b6n95 | grep "unable to allocate"
❯ oc logs sdn-controller-rcwql | grep "unable to allocate"
❯ oc logs sdn-controller-7r85b | grep "unable to allocate"
```

Hi, the target release for this bug is 4.7, so I think this should be verified. The fix was merged in master last week, so it is expected that the problem is still reproducible in 4.5 and 4.6.

Comment #8 is the reproduction on 4.5.22 and comment #9 is the verification on 4.7. Still waiting for the fix in 4.5 and 4.6.

Marking it verified as per comment #9.

Hi Arti,

We normally don't backport anything unless it has an impact. Is there any real impact? Has any customer requested it? Don't get me wrong: if someone needs the backport I am happy to do it, but I don't think it has any actual impact, because the only customer that I know has complained about it is already aware that they can ignore it.

Hi Juan,

I discussed this with Anurag, and as per your comment there is no actual impact known for now, but we are not sure about the performance impact of this logging in a large-scale cluster at this point. We can mark it verified for now and open a new bug for prior releases if necessary.

Hi Arti,

It's just printing 7 lines (once per netnamespace with netid 1) every time a project is created and when the cluster is started. I don't think it's worth investigating. Anyway, if you still think it may be a problem, I'll do an automatic cherry-pick; the change is not complex enough to be worth anybody's time checking whether it can actually decrease performance.

Hi Juan,

Let's do it. As I found out, there may be some customers who continue to use 4.5 and 4.6. Thank you!

Thanks for the clarification, Juan, and thanks for verifying it, Arti and Dan.
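As context for the "7 lines (once per netnamespace with netid 1)" estimate above: on an affected Multitenant cluster you can list exactly which NetNamespaces carry netid 1, and hence how many of these lines to expect. This is a sketch, assuming the default `oc get netnamespaces` column layout (NAME, NETID, EGRESS IPS):

```sh
# Print NetNamespaces whose NETID column is 1; each one produces one of
# the spurious "unable to allocate netid 1" lines at controller startup
# and whenever a project is created.
oc get netnamespaces --no-headers | awk '$2 == 1 {print $1}'
```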
Agree with Juan: this is not recurring; it's just 7 lines at the start, when the namespaces with netid 1 get created during cluster bring-up (in Multitenant mode). Juan, I am not sure whether customers like Verizon are using Multitenant mode in 4.6; if not, we may not need the backport.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633