Bug 2023295
Summary: | Must-gather tool gathering data from custom namespaces. | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Utkarsh Wagh <uwagh> |
Component: | Networking | Assignee: | Nadia Pinaeva <npinaeva> |
Networking sub component: | openshift-sdn | QA Contact: | huirwang |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | aos-bugs, eparis, ffernand, huirwang, maszulik, mdulko, mfojtik, npinaeva |
Version: | 4.8 | Keywords: | Triaged |
Target Milestone: | --- | ||
Target Release: | 4.11.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-08-10 10:39:46 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2084591 |
Description
Utkarsh Wagh
2021-11-15 11:53:56 UTC
Scraping all network policies was added in https://github.com/openshift/must-gather/pull/201, so I'm moving this over to the network team.

This seems to be true for multus scripts too [1]. Are these wrong too, or are we okay with these in particular? What's the policy here?

[1] https://github.com/openshift/must-gather/blob/2cff95604c6874d2b5504478c4ac0948a9e19f45/collection-scripts/gather_network_logs#L7-L16

(In reply to Michał Dulko from comment #2)
> This seems to be true for multus scripts too [1]. Are these wrong too, or
> are we okay with these in particular? What's the policy here?
>
> [1] https://github.com/openshift/must-gather/blob/2cff95604c6874d2b5504478c4ac0948a9e19f45/collection-scripts/gather_network_logs#L7-L16

I'd reach out to the authors with that question. The general guidance is to gather the data you require for debugging your component. Whether that means all namespaces will greatly depend on your own use case.

Alright, I guess we could restrict that to gathering only the KuryrNetworkPolicy CRDs instead.

I just realized that the Kuryr script does not gather all the network policies, only the kuryrnetworkpolicy CRDs. I was able to reproduce the behavior in the latest 4.10 by running `oc adm must-gather`. Because the network policy in question lands in the /namespaces/<ns-name> directory, it cannot come from the gather_network_logs script, whose outputs are placed in /network_logs. I'm moving this back to the oc component. Meanwhile, I'm trying to reproduce the behavior using the `oc inspect` commands taken from the `gather` script [1].
[1] https://github.com/openshift/must-gather/blob/5a8d5089d194c6604496e6940e6531c39aac58a8/collection-scripts/gather#L44-L46

Yup, this did the trick and gathered that NP:

    mdulko:openshift-clusters/ $ echo $group_resources_text
    clusterversion,clusteroperators,certificatesigningrequests,nodes,storageclasses,persistentvolumes,volumeattachments,csidrivers,csinodes,volumesnapshotclasses,volumesnapshotcontents,imagecontentsourcepolicies.operator.openshift.io,networks.operator.openshift.io
    mdulko:openshift-clusters/ $ oc adm inspect --dest-dir must-gather --rotated-pod-logs "${group_resources_text}"

At a glance, it seems like the clusteroperators resource is responsible for that.

I investigated the issue further:

    $ oc get co network -oyaml
    apiVersion: config.openshift.io/v1
    kind: ClusterOperator
    metadata:
      annotations:
        include.release.openshift.io/ibm-cloud-managed: "true"
        include.release.openshift.io/self-managed-high-availability: "true"
        include.release.openshift.io/single-node-developer: "true"
        network.operator.openshift.io/last-seen-state: '{"DaemonsetStates":[],"DeploymentStates":[],"StatefulsetStates":[]}'
        network.operator.openshift.io/relatedClusterObjects: ""
      creationTimestamp: "2022-04-27T08:44:00Z"
      generation: 1
      name: network
      ownerReferences:
      - apiVersion: config.openshift.io/v1
        kind: ClusterVersion
        name: version
      resourceVersion: "23064"
    spec: {}
    ...
      relatedObjects:
      ...
      - group: rbac.authorization.k8s.io
        name: openshift-network-public-role-binding
        namespace: openshift-config-managed
        resource: rolebindings
      - group: ""
        name: openshift-network-operator
        resource: namespaces
      - group: operator.openshift.io
        name: cluster
        resource: networks
      - group: networking.k8s.io
        name: ""
        resource: NetworkPolicy

The NetworkPolicy entry in the relatedObjects field does not include a namespace (and its name is empty). Therefore, must-gather lists all network policies in the cluster, and in our case it found one, "deny-by-default". But that does not mean that must-gather collects the whole custom namespace that policy lives in.
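The resolution behavior described above can be modeled in a few lines. This is a hypothetical Python model, not the actual implementation of `oc adm inspect`: it assumes that an empty `namespace` in a related-object reference means "all namespaces" and an empty `name` means "all objects", and the `my-app` namespace is made up. It shows why the namespace-less NetworkPolicy entry matches a policy in a user namespace, while the same entry scoped to a namespace would not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelatedObject:
    """One entry of a ClusterOperator's .status.relatedObjects list."""
    group: str
    resource: str
    name: str = ""       # "" = no name given
    namespace: str = ""  # "" = no namespace given


def resolve(ref, cluster):
    """Return the (namespace, name) pairs a gatherer would fetch for ref.

    Hypothetical model, not oc's real code: an empty namespace is treated
    as "all namespaces" and an empty name as "all objects".
    `cluster` maps a resource name to the (namespace, name) objects that exist.
    """
    matches = []
    for ns, name in cluster.get(ref.resource, []):
        if ref.namespace and ns != ref.namespace:
            continue  # scoped reference: skip objects in other namespaces
        if ref.name and name != ref.name:
            continue  # named reference: skip other objects
        matches.append((ns, name))
    return matches


# Made-up cluster contents: one policy in a hypothetical user namespace.
CLUSTER = {
    "NetworkPolicy": [("my-app", "deny-by-default")],
}

# The entry as it appears in the ClusterOperator status (no name, no namespace).
unscoped = RelatedObject(group="networking.k8s.io", resource="NetworkPolicy")
# The same entry restricted to the operator's own namespace.
scoped = RelatedObject(group="networking.k8s.io", resource="NetworkPolicy",
                       namespace="openshift-network-operator")

if __name__ == "__main__":
    print(resolve(unscoped, CLUSTER))  # [('my-app', 'deny-by-default')]
    print(resolve(scoped, CLUSTER))    # []
```

In this model only the policy object itself is returned; nothing else from `my-app` is gathered, which matches the observation that the custom namespace as a whole is not collected.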
In practice:

    $ oc adm inspect --dest-dir /tmp/mg-network-policy --rotated-pod-logs clusteroperators -v=3
    I0427 13:32:37.318657 115786 resource.go:168] Gathering related object reference information for "NetworkPolicy/deny-by-default"...
    I0427 13:32:37.318704 115786 resource.go:172] "NetworkPolicy/deny-by-default" does not contain .status.relatedObjects

Thus, must-gather writes its content into a file and continues with the other objects. I think the question is whether NetworkPolicy in the relatedObjects field is deliberately set without a namespace. I'm moving this to the networking team to confirm whether a NetworkPolicy entry without any namespace is correct. If it is correct, you can move it back to oc.

NetworkPolicy is not the only resource that can cause this issue. Some resources were added as relatedObjects for cluster-network-operator without a name and namespace, just so they could be easily collected from the whole cluster. Two of them are cluster-scoped and two are namespace-scoped. They are being removed from relatedObjects [1] and added to must-gather [2] and to namespace gathering in `oc adm inspect` [3].

[1] https://github.com/openshift/cluster-network-operator/pull/1432
[2] https://github.com/openshift/must-gather/pull/300
[3] https://github.com/openshift/oc/pull/1128

Verification steps, with an OVN cluster:
1. Create the cluster-scoped resources EgressIP and CloudPrivateIPConfig, and verify that they are collected by the default must-gather.
2. Create the namespace-scoped resources NetworkPolicy and EgressFirewall, and verify that a custom namespace containing these resources is not collected by the default must-gather, but is collected with `oc adm inspect <ns>`.

Also check that other networking plugins don't break must-gather and inspect (they should just omit resources that are not registered, such as EgressFirewall and egressips.k8s.ovn.org).

@huirwang Could you help verify this bug?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
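The pattern removed by the cluster-network-operator change (PR 1432), a namespace-scoped resource listed in relatedObjects without a namespace, can be spotted mechanically. The following is a minimal Python sketch under stated assumptions: the function name and the `NAMESPACED` scope table are illustrative only (they are not part of any OpenShift tool), and the input is the relatedObjects tail from the `oc get co network -oyaml` output above. In practice, the same list could be taken from `.status.relatedObjects` in `oc get co network -o json`.

```python
def unscoped_namespaced_refs(related_objects, namespaced_resources):
    """Return relatedObjects entries for namespace-scoped resources that lack a namespace.

    Such entries make a gatherer sweep every namespace for that resource.
    `namespaced_resources` is a caller-supplied, lower-cased set of resource
    names; this sketch does not query the API server for the real scopes.
    """
    return [
        obj for obj in related_objects
        if obj.get("resource", "").lower() in namespaced_resources
        and not obj.get("namespace")
    ]


# Tail of .status.relatedObjects from the `oc get co network -oyaml` output above.
RELATED_OBJECTS = [
    {"group": "rbac.authorization.k8s.io", "name": "openshift-network-public-role-binding",
     "namespace": "openshift-config-managed", "resource": "rolebindings"},
    {"group": "", "name": "openshift-network-operator", "resource": "namespaces"},
    {"group": "operator.openshift.io", "name": "cluster", "resource": "networks"},
    {"group": "networking.k8s.io", "name": "", "resource": "NetworkPolicy"},
]

# Assumed scope table: only the namespace-scoped resources needed for this example.
NAMESPACED = {"rolebindings", "networkpolicy", "networkpolicies",
              "egressfirewall", "egressfirewalls"}

if __name__ == "__main__":
    for obj in unscoped_namespaced_refs(RELATED_OBJECTS, NAMESPACED):
        print(obj["group"], obj["resource"])  # networking.k8s.io NetworkPolicy
```

The rolebinding entry is not flagged because it names its namespace, and the namespaces and networks entries are not flagged because they are not in the assumed namespace-scoped set.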