Bug 2023295

Summary: Must-gather tool gathering data from custom namespaces.
Product: OpenShift Container Platform
Reporter: Utkarsh Wagh <uwagh>
Component: Networking
Assignee: Nadia Pinaeva <npinaeva>
Networking sub component: openshift-sdn
QA Contact: huirwang
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: aos-bugs, eparis, ffernand, huirwang, maszulik, mdulko, mfojtik, npinaeva
Version: 4.8
Keywords: Triaged
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-10 10:39:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2084591

Description Utkarsh Wagh 2021-11-15 11:53:56 UTC
Description: Must-gather tool gathering data from custom namespaces.


Version-Release number of selected component (if applicable): OCP v4.8.13


How reproducible: We created a custom namespace on RHOCP v4.8.13 with custom content (e.g. a deployment and a network policy). After running `$ oc adm must-gather`,
we can see that the must-gather tool collects data from custom namespaces as well. Ideally, it should collect data only from ***openshift-*** prefixed namespaces.


Steps to Reproduce:
1. Create a custom namespace with the command: `$ oc new-project <project-name>`

2. In that custom namespace, create a custom network policy that denies ingress traffic from all namespaces, using the YAML below.
~~~
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-by-default
spec:
  podSelector: {}
  ingress: []
~~~

3. Run the command to create the custom network policy: `$ oc create -f networkpolicy.yaml`

4. Once the network policy is successfully created, initiate collection of the must-gather report with the command: `$ oc adm must-gather`


Actual results: must-gather captures data from the custom namespace that was created. We can verify this in two ways:
[i] From the output on the CLI (here `utkarsh` is the custom namespace).

~~~
[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/
[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/networkpolicies/
[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/networkpolicies/deny-by-default.yaml
~~~

[ii] In the must-gather tar: when we browse the extracted must-gather manually, we find that the custom namespace is captured and contains the network policy resource. To verify this, go to the directory below.

~~~
/home/quicklab/must-gather.local.8851143450465617044/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-fdafc98a8cf6223d6e578c8af5daa387686b0e10792643fd4c6ae896f2afb6ab/namespaces/utkarsh/networking.k8s.io/networkpolicies
~~~

Expected results: The must-gather tar file generated at the end should not capture data from the custom namespace.
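To make check [i] repeatable, here is a minimal Python sketch that scans saved `oc adm must-gather` console output for namespace paths lacking an expected prefix. The function name and the `allowed_prefixes` parameter are hypothetical, and the `openshift-sdn` sample line is made up for illustration. A real run may also legitimately contain namespaces such as `default` or `kube-system`, so adjust the prefixes to taste.

```python
import re

# Matches lines such as:
#   [must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/
OUT_LINE = re.compile(r"^\[must-gather[^\]]*\]\s+OUT\s+namespaces/([^/\s]+)/")


def custom_namespaces_in_output(lines, allowed_prefixes=("openshift-",)):
    """Return the sorted namespaces seen in the output that lack an allowed prefix."""
    found = set()
    for line in lines:
        match = OUT_LINE.match(line.strip())
        if match and not match.group(1).startswith(tuple(allowed_prefixes)):
            found.add(match.group(1))
    return sorted(found)


sample = [
    "[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/",
    "[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/networkpolicies/",
    "[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/networkpolicies/deny-by-default.yaml",
    # Hypothetical line, not taken from the report:
    "[must-gather-nd7qm] OUT namespaces/openshift-sdn/pods/",
]

print(custom_namespaces_in_output(sample))
```

With the sample lines above, only `utkarsh` is reported.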



Additional info: I have reproduced the above on a ***quicklab*** cluster, version ***4.8.13***. The results were the same as those the customer described for their OCP ***4.8.13*** environment.

Comment 1 Maciej Szulik 2021-11-22 16:00:20 UTC
Scraping all network policies was added in https://github.com/openshift/must-gather/pull/201
moving this over to network team.

Comment 2 Michał Dulko 2021-12-08 10:51:31 UTC
This seems to be true for multus scripts too [1]. Are these wrong too, or are we okay with these in particular? What's the policy here?

[1] https://github.com/openshift/must-gather/blob/2cff95604c6874d2b5504478c4ac0948a9e19f45/collection-scripts/gather_network_logs#L7-L16

Comment 3 Maciej Szulik 2021-12-21 12:48:58 UTC
(In reply to Michał Dulko from comment #2)
> This seems to be true for multus scripts too [1]. Are these wrong too, or
> are we okay with these in particular? What's the policy here?
> 
> [1] https://github.com/openshift/must-gather/blob/2cff95604c6874d2b5504478c4ac0948a9e19f45/collection-scripts/gather_network_logs#L7-L16

I'd reach out to the authors with that question. The general guidance is to gather the data you require for debugging your component; whether that means all namespaces will greatly depend on your own use case.

Comment 4 Michał Dulko 2021-12-22 13:01:44 UTC
Alright, I guess we could refer to gathering only the KuryrNetworkPolicy CRDs instead.

Comment 6 Michał Dulko 2022-01-25 18:00:58 UTC
I just realized that the Kuryr script does not gather all the networkpolicies, but rather the kuryrnetworkpolicy CRDs. I was able to reproduce the behavior in the latest 4.10 by running `oc adm must-gather`. As the network policy in question lands in the /namespaces/<ns-name> directory, it cannot come from the gather_network_logs script, as that script's outputs are placed in /network_logs.

I'm moving this back to the oc component. Meanwhile, I'm trying to reproduce the behavior using the `oc inspect` commands taken from the `gather` script [1].

[1] https://github.com/openshift/must-gather/blob/5a8d5089d194c6604496e6940e6531c39aac58a8/collection-scripts/gather#L44-L46

Comment 7 Michał Dulko 2022-01-25 18:34:55 UTC
Yup, this did the trick and gathered that NP:

mdulko:openshift-clusters/ $ echo $group_resources_text
clusterversion,clusteroperators,certificatesigningrequests,nodes,storageclasses,persistentvolumes,volumeattachments,csidrivers,csinodes,volumesnapshotclasses,volumesnapshotcontents,imagecontentsourcepolicies.operator.openshift.io,networks.operator.openshift.io
mdulko:openshift-clusters/ $ oc adm inspect --dest-dir must-gather --rotated-pod-logs "${group_resources_text}"

At a glance, it seems the clusteroperators resource is the one responsible for that.

Comment 8 Arda Guclu 2022-04-27 11:14:05 UTC
I investigated the issue further:

- $ oc get co network -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    network.operator.openshift.io/last-seen-state: '{"DaemonsetStates":[],"DeploymentStates":[],"StatefulsetStates":[]}'
    network.operator.openshift.io/relatedClusterObjects: ""
  creationTimestamp: "2022-04-27T08:44:00Z"
  generation: 1
  name: network
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
  resourceVersion: "23064"
spec: {}
...
relatedObjects:
...
  - group: rbac.authorization.k8s.io
    name: openshift-network-public-role-binding
    namespace: openshift-config-managed
    resource: rolebindings
  - group: ""
    name: openshift-network-operator
    resource: namespaces
  - group: operator.openshift.io
    name: cluster
    resource: networks
  - group: networking.k8s.io
    name: ""
    resource: NetworkPolicy

The NetworkPolicy entry in the relatedObjects field does not include a namespace. Therefore, must-gather lists all network policies, and in our case it found one, "deny-by-default".

But that does not mean that it executes must-gather for that resource in the custom namespace.

In reality:

- $ oc adm inspect --dest-dir /tmp/mg-network-policy --rotated-pod-logs clusteroperators -v=3
I0427 13:32:37.318657  115786 resource.go:168] Gathering related object reference information for "NetworkPolicy/deny-by-default"...
I0427 13:32:37.318704  115786 resource.go:172] "NetworkPolicy/deny-by-default" does not contain .status.relatedObjects

Thus, must-gather writes its content into a file and continues with the others.

I think the question is: was NetworkPolicy in the relatedObjects field deliberately set without a namespace?
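To illustrate the behavior described above, here is a small Python model of how a relatedObjects entry with an empty name and no namespace ends up selecting every object of that type. This is only a sketch of the observed effect, not the actual `oc adm inspect` code; the function names and the `openshift-example` object are hypothetical.

```python
def matches(ref, obj):
    """True if a cluster object is selected by one relatedObjects entry.

    Assumption for this model: an empty or missing name/namespace in the
    entry acts as a wildcard, which is the effect observed in this bug.
    """
    if ref["group"] != obj["group"] or ref["resource"] != obj["resource"]:
        return False
    for field in ("namespace", "name"):
        wanted = ref.get(field, "")
        if wanted and wanted != obj.get(field, ""):
            return False
    return True


def expand_related_object(ref, cluster_objects):
    """Return every cluster object selected by the relatedObjects entry."""
    return [obj for obj in cluster_objects if matches(ref, obj)]


# The entry as it appears in the ClusterOperator above: no namespace, empty name.
ref = {"group": "networking.k8s.io", "name": "", "resource": "NetworkPolicy"}

cluster_objects = [
    {"group": "networking.k8s.io", "resource": "NetworkPolicy",
     "namespace": "utkarsh", "name": "deny-by-default"},
    # Hypothetical object, added only for contrast:
    {"group": "networking.k8s.io", "resource": "NetworkPolicy",
     "namespace": "openshift-example", "name": "allow-example"},
]

unscoped = expand_related_object(ref, cluster_objects)
scoped = expand_related_object({**ref, "namespace": "openshift-example"}, cluster_objects)

print([o["namespace"] for o in unscoped])
print([o["namespace"] for o in scoped])
```

In this model the unscoped entry selects the policy in the custom namespace as well, while the entry with an explicit namespace does not.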

Comment 9 Arda Guclu 2022-04-27 11:21:10 UTC
I'm moving this to the networking team to confirm whether NetworkPolicy without any namespace is set correctly. If it is correct, you can move it back to oc.

Comment 10 Nadia Pinaeva 2022-05-11 11:55:34 UTC
NetworkPolicy is not the only resource that can cause this issue. Some resources were added as relatedObjects for cluster-network-operator without a name and namespace, just so they could be easily collected from the whole cluster.
Two of these resources are cluster-scoped and two are namespace-scoped. They are being removed from relatedObjects [1] and added to must-gather [2] and oc adm inspect namespace [3].

[1] https://github.com/openshift/cluster-network-operator/pull/1432
[2] https://github.com/openshift/must-gather/pull/300
[3] https://github.com/openshift/oc/pull/1128
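For anyone who wants to spot such entries on their own cluster, here is a rough Python sketch that scans a ClusterOperator object (for example, parsed from `oc get co network -o json`) for relatedObjects entries with an empty name. The helper name is hypothetical, and the sample dict reproduces only the entries quoted in comment 8, assuming the list sits under `.status.relatedObjects`.

```python
def unscoped_related_objects(cluster_operator):
    """Return (group, resource) pairs for relatedObjects entries without a name.

    Such entries are not pinned to a single object, so collection tooling
    ends up gathering every instance of that resource in the cluster.
    """
    related = cluster_operator.get("status", {}).get("relatedObjects", [])
    return [(ref.get("group", ""), ref["resource"])
            for ref in related if not ref.get("name")]


# Trimmed sample built from the entries shown in comment 8.
network_co = {
    "kind": "ClusterOperator",
    "metadata": {"name": "network"},
    "status": {
        "relatedObjects": [
            {"group": "rbac.authorization.k8s.io",
             "name": "openshift-network-public-role-binding",
             "namespace": "openshift-config-managed",
             "resource": "rolebindings"},
            {"group": "", "name": "openshift-network-operator",
             "resource": "namespaces"},
            {"group": "operator.openshift.io", "name": "cluster",
             "resource": "networks"},
            {"group": "networking.k8s.io", "name": "",
             "resource": "NetworkPolicy"},
        ]
    },
}

print(unscoped_related_objects(network_co))
```

On the trimmed sample, only the NetworkPolicy entry is flagged.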

Comment 13 Nadia Pinaeva 2022-05-18 08:54:59 UTC
Verification steps (with an OVN cluster):
1. Create cluster-scoped resources, EgressIP and CloudPrivateIPConfig, and verify they are collected by the default must-gather.
2. Create namespace-scoped resources, NetworkPolicy and EgressFirewall, and verify that the custom namespace with these resources is not collected by the default must-gather, but is collected with oc adm inspect <ns>.

Also check that other networking plugins don't break must-gather and inspect (they should just omit resources that are not registered, like EgressFirewall and egressips.k8s.ovn.org).
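The namespace part of the verification can be scripted against an extracted dump. Below is a minimal Python sketch that checks whether a given namespace directory exists under any `namespaces/` directory of a must-gather or `oc adm inspect` output tree. The function names are hypothetical, and the layout (`<dump>/.../namespaces/<ns>/`) is assumed from the path shown in the description.

```python
import tempfile
from pathlib import Path


def collected_namespaces(dump_root):
    """Names of namespace directories found under any `namespaces/` dir in the dump."""
    names = set()
    for ns_dir in Path(dump_root).rglob("namespaces"):
        if ns_dir.is_dir():
            # Only directories count; plain files are not per-namespace dumps.
            names.update(child.name for child in ns_dir.iterdir() if child.is_dir())
    return names


def namespace_was_collected(dump_root, namespace):
    """True if the dump contains a directory for the given namespace."""
    return namespace in collected_namespaces(dump_root)


# Usage against a throwaway directory that mimics the layout from the description.
with tempfile.TemporaryDirectory() as tmp:
    policies = (Path(tmp) / "image-dir" / "namespaces" / "utkarsh"
                / "networking.k8s.io" / "networkpolicies")
    policies.mkdir(parents=True)
    print(namespace_was_collected(tmp, "utkarsh"))
    print(namespace_was_collected(tmp, "some-other-ns"))
```

After the fix, the expectation would be that the custom namespace is absent from a default must-gather dump and present in the output of an explicit inspect of that namespace.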

Comment 14 zhaozhanqi 2022-05-18 09:05:11 UTC
@huirwang Could you help verify this bug?

Comment 22 errata-xmlrpc 2022-08-10 10:39:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069