Bug 1726137 - must-gather gather_network_logs script does not save output under /must-gather
Summary: must-gather gather_network_logs script does not save output under /must-gather
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.2.0
Assignee: Ricardo Carrillo Cruz
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1726144
 
Reported: 2019-07-02 08:13 UTC by Ricardo Carrillo Cruz
Modified: 2019-10-16 06:32 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1726144 (view as bug list)
Environment:
Last Closed: 2019-10-16 06:32:48 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:32:59 UTC

Description Ricardo Carrillo Cruz 2019-07-02 08:13:06 UTC
Description of problem:

must-gather gather_network_logs does not set the log path to /must-gather

Version-Release number of selected component (if applicable):

4.2.0

Comment 1 Eric Rich 2019-07-02 12:32:10 UTC
https://github.com/openshift/must-gather/pull/108 is the PR for this issue.

Comment 2 Mike Fiedler 2019-07-16 11:59:45 UTC
This is available to test in current nightly builds.

Comment 3 zhaozhanqi 2019-07-18 05:59:12 UTC
hi, Ricardo Carrillo Cruz
As here said: https://jira.coreos.com/browse/SDN-428?focusedCommentId=101714&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-101714 

How can we collect the network logs? They do not seem to be included in `oc adm must-gather`.

When I run `oc adm must-gather`, I do not find a `network_log` folder either.

Comment 4 Ricardo Carrillo Cruz 2019-07-18 15:32:26 UTC
gather_network_logs is a collection script that is not meant to be run by default,
which is why it is not present in the 'gather' collection script that 'oc adm must-gather' invokes.

To test that this does the right thing, you can try something like this:

1. podman pull quay.io/openshift/origin-must-gather
2. mkdir /tmp/kube
3. cp <your cluster kubeconfig> kube/
4. mkdir /tmp/must-gather
5. podman run -v /tmp/test/kube:/root/.kube -v /tmp/test/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather gather_network_logs

The command should succeed and you should see the network logs under your /tmp/test/must-gather, which confirms the /must-gather folder is created within the container.
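Consolidated, the steps above look roughly like the following sketch. The /tmp/test working directory is an assumption chosen to match the -v mounts in step 5, and it presumes podman is installed and a cluster kubeconfig is at hand:

```shell
#!/bin/sh
# Hedged sketch of the verification steps above; /tmp/test is an assumed
# working directory matching the -v mounts in step 5.
set -eu
mkdir -p /tmp/test/kube /tmp/test/must-gather
# cp <your cluster kubeconfig> /tmp/test/kube/config

podman pull quay.io/openshift/origin-must-gather
podman run -v /tmp/test/kube:/root/.kube \
           -v /tmp/test/must-gather:/must-gather \
           -it quay.io/openshift/origin-must-gather gather_network_logs

# The bind mount surfaces the container's /must-gather on the host:
ls /tmp/test/must-gather/network_logs
```

Because the script runs against a live cluster, the final `ls` is the actual check: if the fix works, the collected logs appear on the host through the bind mount.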

Comment 5 Eric Rich 2019-07-18 18:44:21 UTC
(In reply to Ricardo Carrillo Cruz from comment #4)
> gather_network_logs is a collection script that is not meant to be run by
> default,
> that's why is not present in 'gather' collection script, which is what 'oc
> adm must-gather' invokes.
> 
> To test this does the right thing, you can try something like this:

$ oc adm must-gather -- /usr/bin/gather_audit_logs
OR
$ oc adm must-gather -- /usr/bin/gather_audit_logs NODENAME [NODENAME]

What you suggest below tests the bash script running in a container, not the CLI running a pod (image) that runs the bash script.
- tl;dr: you are skipping some key pieces of the puzzle.

> 1. podman pull quay.io/openshift/origin-must-gather
> 2. mkdir /tmp/kube
> 3. cp <your cluster kubeconfig> kube/
> 4. mkdir /tmp/must-gather
> 5. podman run -v /tmp/test/kube:/root/.kube -v
> /tmp/test/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather
> gather_network_logs
> 
> Command should succeed and you should see the network logs under your
> /test/must-gather, which then confirms the /must-gather folder is created
> within the container.

This should also cover the commands I have above.

Comment 6 Ricardo Carrillo Cruz 2019-07-19 10:05:35 UTC
Right, but since running arbitrary commands (i.e. 'oc adm must-gather -- <command>') does not seem to work,
my suggestion was to test what the fix is actually about: that gather_network_logs
puts the collected logs under /must-gather.

For reference, running your example on a 4.2 cluster:

<snip>
[ricky@ricky-laptop test]$ oc adm must-gather -- /usr/bin/gather_audit_logs
namespace/openshift-must-gather-9d785 created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-mj2lp created
WARNING: cannot use rsync: rsync not available in container
WARNING: cannot use tar: tar not available in container
clusterrolebinding.rbac.authorization.k8s.io/must-gather-mj2lp deleted
namespace/openshift-must-gather-9d785 deleted
error: No available strategies to copy.
[ricky@ricky-laptop test]$ ls
</snip>

Execution fails, and no must-gather folder is created on my laptop.

Comment 7 Ricardo Carrillo Cruz 2019-07-19 10:10:18 UTC
Sorry, I truncated one line from the earlier output;
the listing below shows that no must-gather folder is created on my laptop:

<snip>
[ricky@ricky-laptop test]$ ls
config  kube
</snip>

Comment 8 Ricardo Carrillo Cruz 2019-07-19 10:25:15 UTC
I opened https://bugzilla.redhat.com/show_bug.cgi?id=1731394 to handle
oc adm must-gather's inability to run arbitrary commands, since that is separate
from what we are addressing here.

Comment 10 Anurag saxena 2019-07-24 20:21:21 UTC
@Ricardo, any idea why I am getting this error, or is it a bug in runc or something?

1. podman pull quay.io/openshift/origin-must-gather
2. mkdir /tmp/kube
3. cp ~/.kube/config /tmp/kube/
4. mkdir /tmp/must-gather


$ podman run -v /tmp/kube:/root/.kube -v /tmp/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather gather_network_logs
Error: cannot specify gid= mount options for unmapped gid in rootless containers
: OCI runtime error

Also, I am not sure how a customer would collect these logs, as the process for gathering network logs is apparently not straightforward per comment 4. We thought 'oc adm must-gather' would invoke it?

Comment 11 Ricardo Carrillo Cruz 2019-07-25 10:42:48 UTC
I just tested with a cluster from clusterbot and it works for me:

<snip>
[ricky@ricky-laptop ~]$ vi /tmp/kubeconfig
[ricky@ricky-laptop ~]$ export KUBECONFIG=/tmp/kubeconfig
[ricky@ricky-laptop ~]$ oc get nodes
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-130-65.ec2.internal    Ready    master   15m     v1.14.0+bd34733a7
ip-10-0-135-112.ec2.internal   Ready    master   15m     v1.14.0+bd34733a7
ip-10-0-135-249.ec2.internal   Ready    worker   7m29s   v1.14.0+bd34733a7
ip-10-0-143-32.ec2.internal    Ready    worker   7m31s   v1.14.0+bd34733a7
ip-10-0-145-65.ec2.internal    Ready    worker   7m31s   v1.14.0+bd34733a7
ip-10-0-148-189.ec2.internal   Ready    master   15m     v1.14.0+bd34733a7
[ricky@ricky-laptop ~]$ podman pull quay.io/openshift/origin-must-gather
Trying to pull docker://quay.io/openshift/origin-must-gather...Getting image source signatures
Copying blob 5dfdb3f0bcc0 skipped: already exists
Copying blob 806a9a74c184 skipped: already exists
Copying blob 34999dd9efc2 skipped: already exists
Copying blob 99f178453a43 skipped: already exists
Copying blob 1bbd7989ec96 done
Copying blob 2baeb8593e81 done
Copying config 1d9d780fec done
Writing manifest to image destination
Storing signatures
1d9d780fec83cf21bf1dcf1e2b94431113c6b6855c684b5cf93740531f777474
[ricky@ricky-laptop ~]$ mkdir /tmp/kube
[ricky@ricky-laptop ~]$ cp /tmp/kubeconfig /tmp/kube/config
[ricky@ricky-laptop ~]$ mkdir /tmp/must-gather
[ricky@ricky-laptop ~]$ podman run -v /tmp/kube:/root/.kube -v /tmp/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather gather_network_logs
WARNING: Collecting network logs on ALL nodes in your cluster. This could take a large amount of time.
Error from server: error dialing backend: remote error: tls: internal error
Error from server: error dialing backend: remote error: tls: internal error
Error from server: error dialing backend: remote error: tls: internal error
Error from server: error dialing backend: remote error: tls: internal error
INFO: Waiting for node network log collection to complete ...
INFO: Node network log collection to complete.
[ricky@ricky-laptop ~]$ ls /tmp/must-gather/network_logs/
ip-10-0-130-65.ec2.internal_iptables   ip-10-0-135-249.ec2.internal_iptables  ip-10-0-145-65.ec2.internal_iptables   ovs-8ksbl_ovsdb_log     ovs-dg4q7_ovsdb_log     ovs-k4brh_ovsdb_log
ip-10-0-130-65.ec2.internal_ovs_dump   ip-10-0-135-249.ec2.internal_ovs_dump  ip-10-0-145-65.ec2.internal_ovs_dump   ovs-8ksbl_vswitchd_log  ovs-dg4q7_vswitchd_log  ovs-k4brh_vswitchd_log
ip-10-0-135-112.ec2.internal_iptables  ip-10-0-143-32.ec2.internal_iptables   ip-10-0-148-189.ec2.internal_iptables  ovs-8rv8x_ovsdb_log     ovs-f8cdr_ovsdb_log     ovs-tzsv8_ovsdb_log
ip-10-0-135-112.ec2.internal_ovs_dump  ip-10-0-143-32.ec2.internal_ovs_dump   ip-10-0-148-189.ec2.internal_ovs_dump  ovs-8rv8x_vswitchd_log  ovs-f8cdr_vswitchd_log  ovs-tzsv8_vswitchd_log
</snip>

Maybe the earlier steps had a bogus step, but basically it comes down to having a kubeconfig for your cluster copied over as /tmp/kube/config .

And to be clear, I'm not implying a customer should use these steps to gather network logs.
As stated in a previous comment, there's a bug in oc adm must-gather that prevents it from running arbitrary commands, which is why I opened https://bugzilla.redhat.com/show_bug.cgi?id=1731394 .
I put the 'podman' steps forward as a method for verifying that this particular fix works, namely that the gather_network_logs script does save logs under /must-gather.
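The property this fix guarantees can be expressed as a small host-side check. The sketch below exercises it against a throwaway fixture directory rather than a real cluster, so the directory and file names are assumptions standing in for the host path mounted at /must-gather:

```shell
#!/bin/sh
# Check "logs land under the mounted must-gather dir" as a shell predicate.
# OUT stands in for the host directory mounted at /must-gather; the fixture
# file simulates one collected log so the check runs without a cluster.
set -eu
OUT=$(mktemp -d)
mkdir -p "$OUT/network_logs"
touch "$OUT/network_logs/node1_iptables"   # simulated collected log

if [ -n "$(ls -A "$OUT/network_logs")" ]; then
  echo "PASS: network logs present under $OUT/network_logs"
else
  echo "FAIL: no logs under $OUT/network_logs"
fi
```

Against a real run, point OUT at the directory passed to `-v ...:/must-gather` instead of the fixture.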

Comment 12 Anurag saxena 2019-07-26 21:37:02 UTC
Thanks @ricardo for trying this on your side. I am still seeing the same error when trying it on an actual AWS cluster following the same steps. Not sure yet how to verify this fix.

$ podman run -v /tmp/kube:/root/.kube -v /tmp/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather gather_network_logs
Error: cannot specify gid= mount options for unmapped gid in rootless containers
: OCI runtime error

Comment 13 Ricardo Carrillo Cruz 2019-07-29 09:43:04 UTC
Are you running a recent runc/podman version?

https://github.com/containers/libpod/issues/1147
https://github.com/containers/libpod/issues/3541

Comment 14 Anurag saxena 2019-07-30 19:56:36 UTC
(In reply to Ricardo Carrillo Cruz from comment #13)
> Are you running a recent runc/podman version?
> 
> https://github.com/containers/libpod/issues/1147
> https://github.com/containers/libpod/issues/3541

Hi Ricardo,

My versions are following and i am using latest 4.2 nightly build

[core@ip-]$ runc --version
runc version spec: 1.0.0

[core@ip-]$ podman --version
podman version 1.4.2

Comment 15 Ricardo Carrillo Cruz 2019-07-31 11:37:31 UTC
These are my versions:

[ricky@ricky-laptop openshift-ci-prom]$ runc --version
runc version 1.0.0-rc6+dev
commit: a5dee658ceacfe758cf47ba4ff4319278e58760c
spec: 1.0.1-dev
[ricky@ricky-laptop openshift-ci-prom]$ podman --version
podman version 1.2.0

Comment 16 Anurag saxena 2019-07-31 13:37:49 UTC
(In reply to Ricardo Carrillo Cruz from comment #15)
> These are my versions:
> 
> [ricky@ricky-laptop openshift-ci-prom]$ runc --version
> runc version 1.0.0-rc6+dev
> commit: a5dee658ceacfe758cf47ba4ff4319278e58760c
> spec: 1.0.1-dev
> [ricky@ricky-laptop openshift-ci-prom]$ podman --version
> podman version 1.2.0

Hmm, I need to find a way to upgrade my runc to match your version; the podman version looks okay. I am not sure right now how runc can be upgraded, but I will dig into this, or let me know if you have any suggestions. Thanks.

Comment 17 Ricardo Carrillo Cruz 2019-07-31 13:50:32 UTC
Another alternative could be to just use docker run to get into the must-gather container.
Then create a KUBECONFIG file pointing to a cluster, export the envvar, and verify that gather_network_logs does indeed save its contents under /must-gather.
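That alternative might look like the sketch below; only the image name comes from the earlier comments, while the entrypoint override and the in-container steps are assumptions to be adapted:

```shell
# Get an interactive shell inside the must-gather image (the /bin/sh
# entrypoint override is an assumption; adjust if the image differs).
docker run -it --entrypoint /bin/sh quay.io/openshift/origin-must-gather

# Then, inside the container:
#   export KUBECONFIG=/tmp/kubeconfig   # after copying in a kubeconfig
#   gather_network_logs
#   ls /must-gather/network_logs        # confirms output lands in /must-gather
```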

Comment 18 Anurag saxena 2019-07-31 14:33:04 UTC
Thanks Ricardo. I am going to try this today and will keep you apprised

Comment 19 Anurag saxena 2019-07-31 21:44:19 UTC
Alright, this looks good to me after updating runc to the latest version. podman run also complained about the OCI runtime error, so I had no option left but to try updating runc.
However, the BZ 1731394 fix would give a better picture once it lands.

[core@ip-10-0-141-209 ~]$ runc --version
runc version spec: 1.0.0-rc6+dev

[core@ip-10-0-141-209 ~]$ podman --version
podman version 1.4.2

[core@ip-10-0-141-209 ~]$ vi /tmp/kubeconfig
[core@ip-10-0-141-209 ~]$ export KUBECONFIG=/tmp/kubeconfig
[core@ip-10-0-141-209 ~]$ oc get nodes
NAME                                              STATUS   ROLES    AGE   VERSION
ip-10-0-131-242.ap-northeast-1.compute.internal   Ready    worker   29h   v1.14.0+2e9d4a117
ip-10-0-141-209.ap-northeast-1.compute.internal   Ready    master   29h   v1.14.0+2e9d4a117
ip-10-0-146-149.ap-northeast-1.compute.internal   Ready    master   29h   v1.14.0+2e9d4a117
ip-10-0-156-230.ap-northeast-1.compute.internal   Ready    worker   29h   v1.14.0+2e9d4a117
ip-10-0-167-156.ap-northeast-1.compute.internal   Ready    master   29h   v1.14.0+2e9d4a117

[core@ip-10-0-141-209 ~]$ podman pull quay.io/openshift/origin-must-gather
Trying to pull docker://quay.io/openshift/origin-must-gather...Getting image source signatures
Copying blob 7dfdb3f0bcc0 skipped: already exists
Copying blob 506a9a74c184 skipped: already exists
Copying blob 84999dd9efc2 skipped: already exists
Copying blob 49f178453a43 skipped: already exists
Copying blob 3bbd7989ec96 done
Copying blob 9baeb8593e81 done
Copying config 6d9d780fec done
Writing manifest to image destination
Storing signatures
4d9d780fec83cf21bf1dcf1e2b94431113c6b6855c684b5cf93740531f777474
[core@ip-10-0-141-209 ~]$ mkdir /tmp/kube
[core@ip-10-0-141-209 ~]$ cp /tmp/kubeconfig /tmp/kube/config
[core@ip-10-0-141-209 ~]$ mkdir /tmp/must-gather
[core@ip-10-0-141-209 ~]$ podman run -v /tmp/kube:/root/.kube -v /tmp/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather gather_network_logs
WARNING: Collecting network logs on ALL nodes in your cluster. This could take a large amount of time.
INFO: Waiting for node network log collection to complete ...
INFO: Node network log collection to complete.
[core@ip-10-0-141-209 ~]$ ls /tmp/must-gather/network_logs/
ip-10-0-131-242.ec2.internal_iptables   ip-10-0-146-149.ec2.internal_iptables  ip-10-0-167-156.ec2.internal_iptables   ovs-7ksbl_ovsdb_log     ovs-eg4q7_ovsdb_log     ovs-h4brh_ovsdb_log
ip-10-0-131-242.ec2.internal_ovs_dump   ip-10-0-146-149.ec2.internal_ovs_dump  ip-10-0-167-156.ec2.internal_ovs_dump   ovs-7ksbl_vswitchd_log  ovs-eg4q7_vswitchd_log  ovs-h4brh_vswitchd_log
ip-10-0-141-209.ec2.internal_iptables  ip-10-0-156-230.ec2.internal_iptables   ip-10-0-146-149.ec2.internal_iptables  ovs-7rv8x_ovsdb_log     ovs-g8cdr_ovsdb_log     ovs-nzsv8_ovsdb_log
ip-10-0-141-209.ec2.internal_ovs_dump  ip-10-0-156-230.ec2.internal_ovs_dump   ip-10-0-146-149.ec2.internal_ovs_dump  ovs-7rv8x_vswitchd_log  ovs-g8cdr_vswitchd_log  ovs-nzsv8_vswitchd_log

Comment 20 errata-xmlrpc 2019-10-16 06:32:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

