Description:
Must-gather tool gathering data from custom namespaces.

Version-Release number of selected component (if applicable):
OCP v4.8.13

How reproducible:
We created a custom namespace on RHOCP v4.8.13 with custom content (e.g. deployment, network policy). After running `$ oc adm must-gather`, we can see that the must-gather tool collects data from custom namespaces as well. Ideally, it should collect data only for ***openshift-*** prefixed namespaces.

Steps to Reproduce:
1. Create a custom namespace with the command: `$ oc new-project <project-name>`
2. In that custom namespace, create a custom network policy that denies ingress traffic from all namespaces, using the YAML below.
~~~
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-by-default
spec:
  podSelector: {}
  ingress: []
~~~
3. Run the command to create the custom network policy: `$ oc create -f networkpolicy.yaml`
4. Once the network policy is successfully created, initiate collection of the must-gather report with the command: `$ oc adm must-gather`

Actual results:
Must-gather captures data from the custom namespaces created. We can verify this in two ways:

[i] From the output on the CLI (here `utkarsh` is the custom namespace).
~~~
[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/
[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/networkpolicies/
[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/networkpolicies/deny-by-default.yaml
~~~

[ii] In the must-gather tar: when we manually browse the must-gather, we find that the custom namespace is captured and the network policy resource is present. To verify this, we go to the directory location below.
~~~
/home/quicklab/must-gather.local.8851143450465617044/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-fdafc98a8cf6223d6e578c8af5daa387686b0e10792643fd4c6ae896f2afb6ab/namespaces/utkarsh/networking.k8s.io/networkpolicies
~~~

Expected results:
The must-gather tar file generated at the end should not contain data from the custom namespace.

Additional info:
I have reproduced the above on a ***quicklab*** cluster, version ***4.8.13***. The results were the same as those the customer reported in their OCP ***4.8.13*** environment.
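The manual check in [ii] can be scripted. The following is a minimal sketch, not part of must-gather itself: the function name `list_non_openshift_namespaces` is hypothetical, and it assumes the argument is the image directory inside `must-gather.local.*` (the one that contains `namespaces/`, as in the path above). Other system namespaces may legitimately appear in the output and would need their own exclusions.

```shell
#!/bin/sh
# Hypothetical helper: print every namespace directory in an unpacked
# must-gather whose name does not start with "openshift-".
# $1: the directory that contains namespaces/ (assumed layout, see above).
list_non_openshift_namespaces() {
    mg_dir=$1
    for ns_path in "$mg_dir"/namespaces/*/; do
        # Skip the literal, unmatched glob when namespaces/ is missing or empty.
        [ -d "$ns_path" ] || continue
        ns=$(basename "$ns_path")
        case $ns in
            openshift-*) ;;                 # expected, ignore
            *) printf '%s\n' "$ns" ;;       # anything else is reported
        esac
    done
}
```

Calling it with the directory from [ii], cut off before `/namespaces`, would be expected to print `utkarsh` on an affected cluster.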
Scraping all network policies was added in https://github.com/openshift/must-gather/pull/201. Moving this over to the network team.
This seems to be true for multus scripts too [1]. Are these wrong too, or are we okay with these in particular? What's the policy here?

[1] https://github.com/openshift/must-gather/blob/2cff95604c6874d2b5504478c4ac0948a9e19f45/collection-scripts/gather_network_logs#L7-L16
(In reply to Michał Dulko from comment #2)
> This seems to be true for multus scripts too [1]. Are these wrong too, or
> are we okay with these in particular? What's the policy here?
>
> [1]
> https://github.com/openshift/must-gather/blob/2cff95604c6874d2b5504478c4ac0948a9e19f45/collection-scripts/gather_network_logs#L7-L16

I'd reach out to the authors with that question. The general guidance is to gather the data you require for debugging your component; whether that means all namespaces depends on your own use case.
Alright, I guess we could refer to gathering only the KuryrNetworkPolicy CRDs instead.
I just realized that the Kuryr script does not gather all the networkpolicies, but rather the kuryrnetworkpolicy CRDs.

I was able to reproduce the behavior on the latest 4.10 by running `oc adm must-gather`. As the network policy in question lands in the /namespaces/<ns-name> directory, it cannot come from the gather_network_logs script, because that script's output is placed in /network_logs. I'm moving this back to the oc component. Meanwhile, I'm trying to reproduce the behavior using the `oc inspect` commands taken from the `gather` script [1].

[1] https://github.com/openshift/must-gather/blob/5a8d5089d194c6604496e6940e6531c39aac58a8/collection-scripts/gather#L44-L46
Yup, this did the trick and gathered that NP:

mdulko:openshift-clusters/ $ echo $group_resources_text
clusterversion,clusteroperators,certificatesigningrequests,nodes,storageclasses,persistentvolumes,volumeattachments,csidrivers,csinodes,volumesnapshotclasses,volumesnapshotcontents,imagecontentsourcepolicies.operator.openshift.io,networks.operator.openshift.io
mdulko:openshift-clusters/ $ oc adm inspect --dest-dir must-gather --rotated-pod-logs "${group_resources_text}"

At a glance, it seems the clusteroperators resource is responsible for that.
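One way to confirm which entry in that list pulls in the custom namespace is to inspect each resource into its own directory and see which output contains it. The sketch below only prints the commands (a dry run); the function name `print_inspect_commands` and the destination root are illustrative assumptions, while the `oc adm inspect` flags are the ones used in the comment above.

```shell
#!/bin/sh
# Hypothetical helper: print one `oc adm inspect` command per resource in a
# comma-separated list, each with its own destination directory.
# $1: comma-separated resource list, $2: root directory for the outputs.
print_inspect_commands() {
    resources=$1
    dest_root=$2
    old_ifs=$IFS
    IFS=,                       # split the list on commas
    for res in $resources; do
        printf 'oc adm inspect --dest-dir %s/%s --rotated-pod-logs %s\n' \
            "$dest_root" "$res" "$res"
    done
    IFS=$old_ifs                # restore the caller's field separator
}
```

For example, `print_inspect_commands "$group_resources_text" /tmp/mg-bisect | sh` would run the commands one by one; the directory that ends up with `namespaces/<custom-ns>` points at the responsible resource.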
I investigated the issue further:

- $ oc get co network -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    network.operator.openshift.io/last-seen-state: '{"DaemonsetStates":[],"DeploymentStates":[],"StatefulsetStates":[]}'
    network.operator.openshift.io/relatedClusterObjects: ""
  creationTimestamp: "2022-04-27T08:44:00Z"
  generation: 1
  name: network
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
  resourceVersion: "23064"
spec: {}
...
relatedObjects:
...
- group: rbac.authorization.k8s.io
  name: openshift-network-public-role-binding
  namespace: openshift-config-managed
  resource: rolebindings
- group: ""
  name: openshift-network-operator
  resource: namespaces
- group: operator.openshift.io
  name: cluster
  resource: networks
- group: networking.k8s.io
  name: ""
  resource: NetworkPolicy

The NetworkPolicy entry in the relatedObjects field does not include a namespace. Therefore, must-gather lists all network policies, and in our case it found one, "deny-by-default". That does not mean it runs must-gather for that resource's custom namespace. In reality:

- $ oc adm inspect --dest-dir /tmp/mg-network-policy --rotated-pod-logs clusteroperators -v=3
I0427 13:32:37.318657 115786 resource.go:168] Gathering related object reference information for "NetworkPolicy/deny-by-default"...
I0427 13:32:37.318704 115786 resource.go:172] "NetworkPolicy/deny-by-default" does not contain .status.relatedObjects

Thus, must-gather writes its content into a file and continues with the other objects.

I think the question is whether NetworkPolicy in the relatedObjects field is deliberately set without a namespace.
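To illustrate why the missing name and namespace matter, here is a rough model of how one relatedObjects entry could translate into a lookup. This is an assumption-laden sketch, not the actual `oc adm inspect` code: the function `related_object_lookup` is hypothetical, it only prints an equivalent `oc get` command, and the examples use the plural resource name `networkpolicies` rather than the `NetworkPolicy` kind shown in the output above.

```shell
#!/bin/sh
# Hypothetical model (NOT the real oc implementation): print the `oc get`
# command that corresponds to one relatedObjects entry.
# Args: group resource name namespace (an empty string means "not set").
related_object_lookup() {
    group=$1
    resource=$2
    name=$3
    namespace=$4

    target=$resource
    if [ -n "$group" ]; then
        target="$resource.$group"       # fully qualified resource.group
    fi

    cmd="oc get $target"
    if [ -n "$name" ]; then
        cmd="$cmd $name"                # a single named object
    fi
    if [ -n "$namespace" ]; then
        cmd="$cmd -n $namespace"        # scoped to one namespace
    else
        cmd="$cmd --all-namespaces"     # nothing narrows the search
    fi
    printf '%s\n' "$cmd"
}
```

With an empty name and namespace, as in the last entry above, the model yields `oc get networkpolicies.networking.k8s.io --all-namespaces`, which matches the observed behavior of picking up `deny-by-default` from a custom namespace. The rolebindings entry, by contrast, resolves to a single object in `openshift-config-managed`.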
I'm moving this to the networking team to confirm that setting NetworkPolicy without any namespace is correct. If it is, you can move it back to oc.
NetworkPolicy is not the only resource that can cause this issue. Some resources were added as relatedObjects for cluster-network-operator without a name or namespace, just so they could be easily collected from the whole cluster. Two of these resources are cluster-scoped and two are namespace-scoped. They are being removed from relatedObjects [1] and added to must-gather [2] and to oc adm inspect namespace [3].

[1] https://github.com/openshift/cluster-network-operator/pull/1432
[2] https://github.com/openshift/must-gather/pull/300
[3] https://github.com/openshift/oc/pull/1128
Verification steps (with an OVN cluster):
1. Create cluster-scoped resources, EgressIP and CloudPrivateIPConfig, and verify they are collected by the default must-gather.
2. Create namespace-scoped resources, NetworkPolicy and EgressFirewall, and verify that the custom namespace with these resources is not collected by the default must-gather, but is collected by oc adm inspect <ns>.

Also check that other networking plugins don't break must-gather and inspect (they should just omit resources that are not registered, such as EgressFirewall and egressips.k8s.ovn.org).
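The namespace-scoped part of step 2 could be checked with a small script once both outputs are on disk. This is a sketch under assumptions: the function name `verify_namespace_scoping` is hypothetical, and it assumes both the must-gather image directory and the `oc adm inspect ns/<name> --dest-dir <dir>` output place namespace data under `namespaces/<name>`.

```shell
#!/bin/sh
# Hypothetical check for verification step 2.
# $1: must-gather image directory (the one that contains namespaces/)
# $2: destination directory of `oc adm inspect` for the namespace
# $3: name of the custom namespace
verify_namespace_scoping() {
    mg_dir=$1
    inspect_dir=$2
    ns=$3
    if [ -e "$mg_dir/namespaces/$ns" ]; then
        echo "FAIL: $ns was collected by must-gather"
        return 1
    fi
    if [ ! -d "$inspect_dir/namespaces/$ns" ]; then
        echo "FAIL: $ns is missing from the oc adm inspect output"
        return 1
    fi
    echo "OK"
}
```

It only looks at directory presence; confirming that the NetworkPolicy and EgressFirewall files themselves are present in the inspect output would still need a manual look or a further check.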
@huirwang Could you help verify this bug?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069