Bug 1967933 - Network-Tools debug scripts not working as expected
Summary: Network-Tools debug scripts not working as expected
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Surya Seetharaman
QA Contact: Dan Brahaney
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-06-04 13:08 UTC by Matthew Robson
Modified: 2021-07-27 23:11 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:11:38 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 1126 | 0 | None | closed | Bug 1967933: Add network-tools to image-stream | 2021-06-10 16:55:37 UTC
Github | openshift network-tools pull 45 | 0 | None | closed | Bug 1967933: Replace the image with the official openshift quay one | 2021-06-08 05:39:05 UTC
Github | openshift network-tools pull 46 | 0 | None | closed | Bug 1967933: Add proper directory path to which output will be saved | 2021-06-10 04:43:58 UTC
Github | openshift network-tools pull 47 | 0 | None | closed | Bug 1967933: Add entry point for must-gather tooling | 2021-06-11 08:55:19 UTC
Github | openshift network-tools pull 48 | 0 | None | open | Bug 1967933: Save logs to file as well in addition to stdouting | 2021-06-11 14:58:29 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:11:55 UTC

Description Matthew Robson 2021-06-04 13:08:48 UTC
Description of problem:

The Network-Tools debug scripts for OpenShift SDN do not work as expected when tested on 4.8.

Running the script from a debug pod does not work at all due to RBAC issues:

oc debug node/ocp48ipiathon-jjllv-worker-0-2t6lv --image=quay.io/openshift/origin-network-tools:latest
Creating debug namespace/openshift-debug-node-7sw4x ...
Starting pod/ocp48ipiathon-jjllv-worker-0-2t6lv-debug ...
To use host binaries, run `chroot /host`

Pod IP: 10.0.0.74
If you don't see a command prompt, try pressing enter.

sh-4.4# sdn_pod_to_pod_connectivity
Error from server (Forbidden): nodes is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot list resource "nodes" in API group "" at the cluster scope
error: resource name may not be empty
INFO: Scheduling network-tools-debug-pod-lrx2w on
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot create resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
Error from server (Forbidden): pods "network-tools-debug-pod-lrx2w" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
error: resource name may not be empty
Error from server (Forbidden): pods "network-tools-debug-pod-lrx2w" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
Error from server (Forbidden): nodes is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot list resource "nodes" in API group "" at the cluster scope
error: resource name may not be empty
INFO: Scheduling network-tools-debug-pod-7v24a on
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot create resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
Error from server (Forbidden): pods "network-tools-debug-pod-7v24a" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
error: resource name may not be empty
Error from server (Forbidden): pods "network-tools-debug-pod-7v24a" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
Error from server (Forbidden): pods "network-tools-debug-pod-lrx2w" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
Error from server (Forbidden): pods "network-tools-debug-pod-7v24a" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
INFO: IP of client pod network-tools-debug-pod-lrx2w:  and IP of server pod network-tools-debug-pod-7v24a:

INFO: Running ping  -c 1 -W 2 in the netns of pod network-tools-debug-pod-lrx2w
Error from server (Forbidden): pods "network-tools-debug-pod-lrx2w" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
error: arguments in resource/name form must have a single resource and name
INFO:
FAILURE: Pod network-tools-debug-pod-lrx2w unable to establish an ICMP connection against network-tools-debug-pod-7v24a
sh-4.4# sdn_pod_to_svc_connectivity
Error from server (Forbidden): nodes is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot list resource "nodes" in API group "" at the cluster scope
error: resource name may not be empty
INFO: Scheduling network-tools-debug-pod-cq2gm on
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot create resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
Error from server (Forbidden): pods "network-tools-debug-pod-cq2gm" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
error: resource name may not be empty
Error from server (Forbidden): pods "network-tools-debug-pod-cq2gm" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
Error from server (Forbidden): nodes is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot list resource "nodes" in API group "" at the cluster scope
error: resource name may not be empty
INFO: Scheduling network-tools-debug-svc-85tfb on
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot create resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
Error from server (Forbidden): pods "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
error: resource name may not be empty
INFO: Creating a ClusterIP service: network-tools-debug-svc-85tfb
Error from server (Forbidden): pods "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
Error from server (Forbidden): pods "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "pods" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
Error from server (Forbidden): endpoints "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "endpoints" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
waiting for svc
Error from server (Forbidden): endpoints "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "endpoints" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
waiting for svc
Error from server (Forbidden): endpoints "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "endpoints" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
waiting for svc
Error from server (Forbidden): endpoints "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "endpoints" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
waiting for svc
Error from server (Forbidden): endpoints "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "endpoints" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
waiting for svc
Error from server (Forbidden): endpoints "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "endpoints" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
waiting for svc
Error from server (Forbidden): endpoints "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "endpoints" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
waiting for svc
Error from server (Forbidden): endpoints "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "endpoints" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
waiting for svc
Error from server (Forbidden): endpoints "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "endpoints" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
waiting for svc
Error from server (Forbidden): endpoints "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "endpoints" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
waiting for svc
Error from server (Forbidden): endpoints "network-tools-debug-svc-85tfb" is forbidden: User "system:serviceaccount:openshift-debug-node-7sw4x2pn4h:default" cannot get resource "endpoints" in API group "" in the namespace "openshift-debug-node-7sw4x2pn4h"
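
The log above repeats the same few denials many times. As a rough aid for reading it, the sketch below reduces saved script output to the distinct verb/resource pairs the API server rejected. The function name summarize_forbidden and the file name debug-output.log are hypothetical and not part of network-tools; the pattern assumes the "cannot <verb> resource "<name>" in API group" wording shown in the errors above.

```shell
# Hypothetical helper (not part of network-tools): read saved script output
# on stdin and print each distinct "<verb> <resource>" pair that was denied.
# Lines that are not Forbidden errors are ignored.
summarize_forbidden() {
  sed -n 's/.*cannot \([a-z]*\) resource "\([^"]*\)" in API group.*/\1 \2/p' | sort -u
}

# Example usage, assuming the output was saved to a file first:
#   summarize_forbidden < debug-output.log
```

For the output in this report, the pairs would be create pods, get endpoints, get pods and list nodes, all denied to the default service account of the debug namespace.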

When run via must-gather, the script runs successfully, but no data is captured in the must-gather directory:

oc adm must-gather --image=quay.io/openshift/origin-network-tools:latest -- sdn_pod_to_pod_connectivity
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-network-tools:latest
[must-gather      ] OUT namespace/openshift-must-gather-nrjxx created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-s8wtl created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/origin-network-tools:latest created
[must-gather-zgn84] POD node/ocp48ipiathon-jjllv-worker-0-2t6lv labeled
[must-gather-zgn84] POD INFO: Scheduling network-tools-debug-pod-1dgwj on ocp48ipiathon-jjllv-worker-0-2t6lv
[must-gather-zgn84] POD pod/network-tools-debug-pod-1dgwj created
[must-gather-zgn84] POD pod/network-tools-debug-pod-1dgwj condition met
[must-gather-zgn84] POD node/ocp48ipiathon-jjllv-worker-0-2t6lv labeled
[must-gather-zgn84] POD node/ocp48ipiathon-jjllv-worker-0-2t6lv labeled
[must-gather-zgn84] POD INFO: Scheduling network-tools-debug-pod-t1clt on ocp48ipiathon-jjllv-worker-0-2t6lv
[must-gather-zgn84] POD pod/network-tools-debug-pod-t1clt created
[must-gather-zgn84] POD pod/network-tools-debug-pod-t1clt condition met
[must-gather-zgn84] POD node/ocp48ipiathon-jjllv-worker-0-2t6lv labeled
[must-gather-zgn84] POD INFO: IP of client pod network-tools-debug-pod-1dgwj: 10.131.0.19 and IP of server pod network-tools-debug-pod-t1clt: 10.131.0.20
[must-gather-zgn84] POD
[must-gather-zgn84] POD INFO: Running ping 10.131.0.20 -c 1 -W 2 in the netns of pod network-tools-debug-pod-1dgwj
[must-gather-zgn84] POD Starting pod/ocp48ipiathon-jjllv-worker-0-2t6lv-debug ...
[must-gather-zgn84] POD To use host binaries, run `chroot /host`
[must-gather-zgn84] POD
[must-gather-zgn84] POD Removing debug pod ...
[must-gather-zgn84] POD INFO: PING 10.131.0.20 (10.131.0.20) 56(84) bytes of data.
[must-gather-zgn84] POD 64 bytes from 10.131.0.20: icmp_seq=1 ttl=64 time=1.37 ms
[must-gather-zgn84] POD
[must-gather-zgn84] POD --- 10.131.0.20 ping statistics ---
[must-gather-zgn84] POD 1 packets transmitted, 1 received, 0% packet loss, time 0ms
[must-gather-zgn84] POD rtt min/avg/max/mdev = 1.371/1.371/1.371/0.000 ms
[must-gather-zgn84] POD SUCCESS: Pod network-tools-debug-pod-1dgwj established an ICMP connection successfully against network-tools-debug-pod-t1clt
[must-gather-zgn84] OUT waiting for gather to complete
[must-gather-zgn84] OUT downloading gather output
[must-gather-zgn84] OUT receiving file list ... done
[must-gather-zgn84] OUT ./
[must-gather-zgn84] OUT
[must-gather-zgn84] OUT sent 22 bytes  received 54 bytes  21.71 bytes/sec
[must-gather-zgn84] OUT total size is 0  speedup is 0.00
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-s8wtl deleted
[must-gather      ] OUT namespace/openshift-must-gather-nrjxx deleted

oc adm must-gather --image=quay.io/openshift/origin-network-tools:latest -- sdn_node_and_cluster_info
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-network-tools:latest
[must-gather      ] OUT namespace/openshift-must-gather-9km8w created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-jtwm6 created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/origin-network-tools:latest created
[must-gather-wrldq] POD /bin/bash: sdn_node_and_cluster_info: command not found
[must-gather-wrldq] OUT waiting for gather to complete
[must-gather-wrldq] OUT downloading gather output
[must-gather-wrldq] OUT receiving file list ... done
[must-gather-wrldq] OUT ./
[must-gather-wrldq] OUT
[must-gather-wrldq] OUT sent 22 bytes  received 50 bytes  20.57 bytes/sec
[must-gather-wrldq] OUT total size is 0  speedup is 0.00
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-jtwm6 deleted
[must-gather      ] OUT namespace/openshift-must-gather-9km8w deleted


No data is collected in the folder:
> pwd
/Users/matthewrobson/must-gather.local.2561360630462638539/quay-io-openshift-origin-network-tools-sha256-893717e20c6852e6759596890a515bfc21cd9218ea83233bfa47f313b594cb67

> ll -a
total 0
drwxrwxrwx  2 matthewrobson  staff    64B  4 Jun 08:56 ./
drwxr-xr-x  5 matthewrobson  staff   160B  4 Jun 08:56 ../
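
The empty listing above can also be checked from a script. This is a minimal sketch, assuming only that the local must-gather output is an ordinary directory tree; the function name gather_has_files and the example path are hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if the given directory contains at
# least one regular file at any depth, fail (exit 1) if it contains none,
# and return 2 if the path is not a directory.
gather_has_files() {
  dir=$1
  [ -d "$dir" ] || return 2
  [ -n "$(find "$dir" -type f -print | head -n 1)" ]
}

# Example usage with a placeholder path:
#   gather_has_files ./must-gather.local.example || echo "no files collected"
```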


Version-Release number of selected component (if applicable):
4.8 fc7

How reproducible:
Always

Steps to Reproduce:
1. Run using quay image

Actual results:
Does not work with a debug pod
Using must gather, no data is collected and saved

Expected results:


Additional info:

Comment 10 Surya Seetharaman 2021-06-08 07:50:05 UTC
@mrobson@redhat.com: The script name is sdn_cluster_and_node_info not sdn_node_and_cluster_info

oc adm must-gather --image=quay.io/openshift/origin-network-tools:latest -- sdn_cluster_and_node_info
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-network-tools:latest
[must-gather      ] OUT namespace/openshift-must-gather-v85kl created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-btm8g created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/origin-network-tools:latest created
[must-gather-g5n55] POD INFO: Gathering cluster wide info like nodes, pods, svc, eps, routes, hostsubnets, netns
[must-gather-g5n55] POD W0608 05:59:28.085695      86 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
[must-gather-g5n55] POD INFO: Gathering node wise info
[must-gather-g5n55] POD INFO: User did not provide node name in input. Selecting all nodes.
[must-gather-g5n55] POD namespace/openshift-network-tools-6b62j created
[must-gather-g5n55] POD Now using project "openshift-network-tools-6b62j" on server "https://172.30.0.1:443".
[must-gather-g5n55] POD INFO: Creating host-network-pod ci-ln-z01i9mt-f76d1-kffvh-master-0-debug on node ci-ln-z01i9mt-f76d1-kffvh-master-0 to gather information
[must-gather-g5n55] POD INFO: Scheduling ci-ln-z01i9mt-f76d1-kffvh-master-0-debug on ci-ln-z01i9mt-f76d1-kffvh-master-0
[must-gather-g5n55] POD pod/ci-ln-z01i9mt-f76d1-kffvh-master-0-debug condition met
[must-gather-g5n55] POD INFO: Gathering nmcli --nocheck -f all dev show from node ci-ln-z01i9mt-f76d1-kffvh-master-0
[must-gather-g5n55] POD INFO: Gathering nmcli -- nocheck -f all con show from node ci-ln-z01i9mt-f76d1-kffvh-master-0
[must-gather-g5n55] POD INFO: Gathering ip addr show from node ci-ln-z01i9mt-f76d1-kffvh-master-0
[must-gather-g5n55] POD INFO: Gathering ip route show from node ci-ln-z01i9mt-f76d1-kffvh-master-0
[must-gather-g5n55] POD INFO: Gathering ip -s neighbor show from node ci-ln-z01i9mt-f76d1-kffvh-master-0
[must-gather-g5n55] POD INFO: Gathering iptables-save from node ci-ln-z01i9mt-f76d1-kffvh-master-0
[must-gather-g5n55] POD INFO: Gathering cat /etc/hosts from node ci-ln-z01i9mt-f76d1-kffvh-master-0
[must-gather-g5n55] POD INFO: Gathering cat /etc/resolv.conf from node ci-ln-z01i9mt-f76d1-kffvh-master-0
......

PR https://github.com/openshift/network-tools/pull/46 fixes the problem of the output not being saved to the must-gather directory for download.

Please run the following command and check the result:

oc adm must-gather --image=quay.io/openshift/origin-network-tools:latest --source-dir="network-tools" -- sdn_cluster_and_node_info

I just tried this and it works for me.

Comment 13 Dan Brahaney 2021-06-15 14:10:17 UTC
I'm not seeing the label issue either, and the other issues reported in this bug are not actual issues, as Surya has already described. The image seems to work just fine when run via must-gather. I think the confusion comes from the fact that this image does not actually download any files, so on a healthy cluster it will show no output.

Marking verified.

Comment 16 errata-xmlrpc 2021-07-27 23:11:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

