Bug 2023295 - Must-gather tool gathering data from custom namespaces.
Summary: Must-gather tool gathering data from custom namespaces.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Nadia Pinaeva
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks: 2084591
 
Reported: 2021-11-15 11:53 UTC by Utkarsh Wagh
Modified: 2022-08-10 10:40 UTC
CC List: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:39:46 UTC
Target Upstream Version:
Embargoed:




Links
GitHub openshift/cluster-network-operator pull 1432 (open): Bug 2023295: Cleanup CNO relatedObjects (last updated 2022-05-09 15:51:09 UTC)
GitHub openshift/must-gather pull 300 (open): Bug 2023295: Add networking resources (last updated 2022-05-11 11:50:12 UTC)
GitHub openshift/must-gather pull 303 (open): Bug 2023295: fix getting deployment name (last updated 2022-05-19 09:15:41 UTC)
GitHub openshift/oc pull 1128 (open): Bug 2023295: [inspect] Add namespace-scoped networking resources to inspect (last updated 2022-05-11 11:49:55 UTC)
Red Hat Product Errata RHSA-2022:5069 (last updated 2022-08-10 10:40:12 UTC)

Description Utkarsh Wagh 2021-11-15 11:53:56 UTC
Description: The must-gather tool gathers data from custom namespaces.


Version-Release number of selected component (if applicable): OCP v4.8.13


How reproducible: We created a custom namespace on RHOCP v4.8.13 with custom content (e.g. a deployment and a network policy). After running `$ oc adm must-gather`,
we can see that the must-gather tool also collects data from the custom namespace. Ideally, it should collect data only from ***openshift-*** prefixed namespaces.


Steps to Reproduce:
1. Create a custom namespace with the command: `$ oc new-project <project-name>`

2. In that namespace, create a network policy that denies all ingress traffic (from every namespace) to the pods in this namespace, using the YAML below:
~~~
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-by-default
spec:
  podSelector: {}
  ingress: []
~~~

3. Save it as `networkpolicy.yaml` and create the network policy: `$ oc create -f networkpolicy.yaml`

4. Once the network policy has been created, start collecting a must-gather report with: `$ oc adm must-gather`


Actual results: must-gather captures data from the custom namespace that was created. This can be verified in two ways:
[i] From the CLI output (here `utkarsh` is the custom namespace):

~~~
[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/
[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/networkpolicies/
[must-gather-nd7qm] OUT namespaces/utkarsh/networking.k8s.io/networkpolicies/deny-by-default.yaml
~~~

[ii] In the must-gather archive: browsing the collected directory shows that the custom namespace was captured and contains the network policy resource, at the following location:

~~~
/home/quicklab/must-gather.local.8851143450465617044/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-fdafc98a8cf6223d6e578c8af5daa387686b0e10792643fd4c6ae896f2afb6ab/namespaces/utkarsh/networking.k8s.io/networkpolicies
~~~
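Check [ii] can also be scripted. The following is a minimal Python sketch, assuming the directory layout shown above (`<must-gather dir>/<image directory>/namespaces/<namespace>/...`); the function name is hypothetical. It lists collected namespaces that lack the `openshift-` prefix, which is the rule stated in this report, so treat its output as a list to review rather than a verdict.

```python
from pathlib import Path


def non_openshift_namespaces(must_gather_dir: str) -> list:
    """Return sorted names of collected namespaces that lack the 'openshift-' prefix.

    Assumes the layout seen in this report:
    <must_gather_dir>/<image directory>/namespaces/<namespace>/...
    """
    found = set()
    # The first level under the must-gather root is one directory per gather image
    for ns_root in Path(must_gather_dir).glob("*/namespaces"):
        for ns_dir in ns_root.iterdir():
            # Only directories are namespaces; skip any stray files
            if ns_dir.is_dir() and not ns_dir.name.startswith("openshift-"):
                found.add(ns_dir.name)
    return sorted(found)
```

Called on the directory from [ii], e.g. `non_openshift_namespaces("/home/quicklab/must-gather.local.8851143450465617044")`, it would be expected to include `utkarsh` on an affected version.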

Expected results: The must-gather archive generated at the end should not contain data from the custom namespace.



Additional info: I reproduced the above on a ***quicklab*** cluster running version ***4.8.13***. The results match what the customer reported in their OCP ***4.8.13*** environment.

Comment 1 Maciej Szulik 2021-11-22 16:00:20 UTC
Scraping all network policies was added in https://github.com/openshift/must-gather/pull/201; moving this over to the network team.

Comment 2 Michał Dulko 2021-12-08 10:51:31 UTC
This seems to be true for multus scripts too [1]. Are these wrong too, or are we okay with these in particular? What's the policy here?

[1] https://github.com/openshift/must-gather/blob/2cff95604c6874d2b5504478c4ac0948a9e19f45/collection-scripts/gather_network_logs#L7-L16

Comment 3 Maciej Szulik 2021-12-21 12:48:58 UTC
(In reply to Michał Dulko from comment #2)
> This seems to be true for multus scripts too [1]. Are these wrong too, or
> are we okay with these in particular? What's the policy here?
> 
> [1]
> https://github.com/openshift/must-gather/blob/
> 2cff95604c6874d2b5504478c4ac0948a9e19f45/collection-scripts/
> gather_network_logs#L7-L16

I'd reach out to the authors with that question. The general guidance is to gather the data you need to debug your component; whether that means all namespaces depends greatly on your own use case.

Comment 4 Michał Dulko 2021-12-22 13:01:44 UTC
Alright, I guess we could switch to gathering only the KuryrNetworkPolicy CRDs instead.

Comment 6 Michał Dulko 2022-01-25 18:00:58 UTC
I just realized that the Kuryr script does not gather all NetworkPolicies, but rather KuryrNetworkPolicy CRs. I was able to reproduce the behavior in the latest 4.10 by running `oc adm must-gather`. Because the network policy in question lands in the /namespaces/<ns-name> directory, it cannot come from the gather_network_logs script, whose output is placed in /network_logs.

I'm moving this back to the oc component. Meanwhile, I'm trying to reproduce the behavior using the `oc inspect` commands taken from the `gather` script [1].

[1] https://github.com/openshift/must-gather/blob/5a8d5089d194c6604496e6940e6531c39aac58a8/collection-scripts/gather#L44-L46

Comment 7 Michał Dulko 2022-01-25 18:34:55 UTC
Yup, this did the trick and gathered that NP:

mdulko:openshift-clusters/ $ echo $group_resources_text
clusterversion,clusteroperators,certificatesigningrequests,nodes,storageclasses,persistentvolumes,volumeattachments,csidrivers,csinodes,volumesnapshotclasses,volumesnapshotcontents,imagecontentsourcepolicies.operator.openshift.io,networks.operator.openshift.io
mdulko:openshift-clusters/ $ oc adm inspect --dest-dir must-gather --rotated-pod-logs "${group_resources_text}"

At a glance, it seems like the clusteroperators resource is responsible for that.

Comment 8 Arda Guclu 2022-04-27 11:14:05 UTC
I investigated the issue further:

- $ oc get co network -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    network.operator.openshift.io/last-seen-state: '{"DaemonsetStates":[],"DeploymentStates":[],"StatefulsetStates":[]}'
    network.operator.openshift.io/relatedClusterObjects: ""
  creationTimestamp: "2022-04-27T08:44:00Z"
  generation: 1
  name: network
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
  resourceVersion: "23064"
spec: {}
...
relatedObjects:
...
  - group: rbac.authorization.k8s.io
    name: openshift-network-public-role-binding
    namespace: openshift-config-managed
    resource: rolebindings
  - group: ""
    name: openshift-network-operator
    resource: namespaces
  - group: operator.openshift.io
    name: cluster
    resource: networks
  - group: networking.k8s.io
    name: ""
    resource: NetworkPolicy

The NetworkPolicy entry in the relatedObjects field has an empty name and no namespace. Therefore, must-gather lists all network policies in the cluster, and in our case it found one: "deny-by-default".

But that does not mean that must-gather inspects the whole custom namespace.

In reality:

- $ oc adm inspect --dest-dir /tmp/mg-network-policy --rotated-pod-logs clusteroperators -v=3
I0427 13:32:37.318657  115786 resource.go:168] Gathering related object reference information for "NetworkPolicy/deny-by-default"...
I0427 13:32:37.318704  115786 resource.go:172] "NetworkPolicy/deny-by-default" does not contain .status.relatedObjects

Thus, must-gather writes its content into a file and continues with the other objects.
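The behavior described in this comment can be modeled with a small Python sketch. This is a hypothetical simplification, not oc's actual Go code in resource.go: an empty `name` or `namespace` in a relatedObjects entry acts as a wildcard, every matched object is written out, and the object's own `.status.relatedObjects` (absent for a NetworkPolicy) is followed in turn. The class and function names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class RelatedObject:
    """One .status.relatedObjects entry (group is kept but unused in this simplified match)."""
    group: str
    resource: str
    name: str = ""
    namespace: str = ""


@dataclass
class ClusterObject:
    """A hypothetical stand-in for an object that exists in the cluster."""
    resource: str
    name: str
    namespace: str = ""
    # The object's own .status.relatedObjects; empty for a NetworkPolicy
    related: list = field(default_factory=list)


def resolve(ref: RelatedObject, cluster: list) -> list:
    """Expand one reference: an empty name or namespace matches anything."""
    return [
        obj
        for obj in cluster
        if obj.resource == ref.resource
        and (not ref.name or obj.name == ref.name)
        and (not ref.namespace or obj.namespace == ref.namespace)
    ]


def gather(refs: list, cluster: list) -> list:
    """Follow relatedObjects breadth-first; return (resource, namespace, name) of written objects."""
    written, seen, queue = [], set(), list(refs)
    while queue:
        for obj in resolve(queue.pop(0), cluster):
            key = (obj.resource, obj.namespace, obj.name)
            if key in seen:
                continue
            seen.add(key)
            # Write the object out, then follow its own relatedObjects (if any)
            written.append(key)
            queue.extend(obj.related)
    return written
```

In this model, a NetworkPolicy reference with an empty name and namespace matches `deny-by-default` in `utkarsh`, while pinning the namespace would not; the fix in comment 10 instead removes such wildcard entries from relatedObjects.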

I think the question is: is the NetworkPolicy entry in the relatedObjects field deliberately set without a namespace?

Comment 9 Arda Guclu 2022-04-27 11:21:10 UTC
I'm moving this to the networking team to confirm whether the NetworkPolicy entry without a namespace is set correctly. If it is, you can move it back to oc.

Comment 10 Nadia Pinaeva 2022-05-11 11:55:34 UTC
NetworkPolicy is not the only resource that can cause this issue. Some resources were added to the cluster-network-operator relatedObjects without a name and namespace, just so they would be easily collected from the whole cluster.
Two of these resources are cluster-scoped and two are namespace-scoped. They are being removed from relatedObjects [1] and added to must-gather [2] and to `oc adm inspect` for namespaces [3].

[1] https://github.com/openshift/cluster-network-operator/pull/1432
[2] https://github.com/openshift/must-gather/pull/300
[3] https://github.com/openshift/oc/pull/1128
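Entries of the kind described in this comment can be spotted with a short Python sketch that scans a ClusterOperator object, for example the parsed JSON from `oc get co network -o json`, for relatedObjects entries with an empty `name`. The function name and the empty-name heuristic are assumptions for illustration; the sketch does not check whether a resource is namespace-scoped, which would require API discovery.

```python
import json


def wildcard_related_objects(cluster_operator: dict) -> list:
    """Return (group, resource) for relatedObjects entries whose name is empty.

    Such entries carry no name to narrow them down, so an inspection that
    follows them would collect every object of that resource.
    """
    refs = cluster_operator.get("status", {}).get("relatedObjects", [])
    return [(ref.get("group", ""), ref["resource"]) for ref in refs if not ref.get("name")]


def wildcard_related_objects_from_json(text: str) -> list:
    """Convenience wrapper for raw JSON text, e.g. saved `oc get co network -o json` output."""
    return wildcard_related_objects(json.loads(text))
```

Applied to the `network` ClusterOperator shown in comment 8, the NetworkPolicy entry (group `networking.k8s.io`, empty name) would be flagged, while the named `namespaces` and `networks` entries would not.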

Comment 13 Nadia Pinaeva 2022-05-18 08:54:59 UTC
Verification steps (on an OVN cluster):
1. Create cluster-scoped resources (EgressIP and CloudPrivateIPConfig) and verify that they are collected by the default must-gather.
2. Create namespace-scoped resources (NetworkPolicy and EgressFirewall) and verify that the custom namespace containing them is not collected by the default must-gather, but is collected by `oc adm inspect ns/<ns>`.

Check that other networking plugins don't break must-gather and inspect (they should simply omit resources that are not registered, such as EgressFirewall and egressips.k8s.ovn.org).

Comment 14 zhaozhanqi 2022-05-18 09:05:11 UTC
@huirwang Could you help verify this bug?

Comment 22 errata-xmlrpc 2022-08-10 10:39:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

