Description of problem: must-gather's gather_network_logs does not set the log path to /must-gather.

Version-Release number of selected component (if applicable): 4.2.0
https://github.com/openshift/must-gather/pull/108 is the PR for this issue.
This is available to test in current nightly builds.
Hi Ricardo Carrillo Cruz, as said here: https://jira.coreos.com/browse/SDN-428?focusedCommentId=101714&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-101714

How can we collect the network logs if they are not included in `oc adm must-gather`? I'm using `oc adm must-gather`, but I also do not find a `network_log` folder.
gather_network_logs is a collection script that is not meant to be run by default; that's why it is not present in the 'gather' collection script, which is what 'oc adm must-gather' invokes.

To test this does the right thing, you can try something like this:

1. podman pull quay.io/openshift/origin-must-gather
2. mkdir /tmp/kube
3. cp <your cluster kubeconfig> kube/
4. mkdir /tmp/must-gather
5. podman run -v /tmp/test/kube:/root/.kube -v /tmp/test/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather gather_network_logs

The command should succeed and you should see the network logs under your /test/must-gather, which confirms the /must-gather folder is created within the container.
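The steps above can be consolidated into one sketch. Note that the mounted paths in step 5 (/tmp/test/...) do not match the directories created in steps 2 and 4 (/tmp/...); the sketch below uses /tmp/test consistently. Whether podman is available and where your kubeconfig lives are assumptions; adjust before running.

```shell
# Hedged, consolidated version of the manual test steps; only creates
# local directories unless both podman and a kubeconfig are present.
BASE=/tmp/test
IMAGE=quay.io/openshift/origin-must-gather
mkdir -p "$BASE/kube" "$BASE/must-gather"

# Assumption: your cluster kubeconfig is at $KUBECONFIG or ~/.kube/config.
KCFG="${KUBECONFIG:-$HOME/.kube/config}"
if [ -f "$KCFG" ]; then
  cp "$KCFG" "$BASE/kube/config"
fi

if command -v podman >/dev/null 2>&1 && [ -f "$BASE/kube/config" ]; then
  podman pull "$IMAGE"
  podman run -v "$BASE/kube:/root/.kube" -v "$BASE/must-gather:/must-gather" \
    -it "$IMAGE" gather_network_logs
  # Logs appearing here on the host confirm /must-gather exists in-container:
  ls "$BASE/must-gather"
else
  echo "podman or kubeconfig missing; run the numbered steps manually"
fi
```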
(In reply to Ricardo Carrillo Cruz from comment #4)
> gather_network_logs is a collection script that is not meant to be run by
> default, that's why is not present in 'gather' collection script, which is
> what 'oc adm must-gather' invokes.
>
> To test this does the right thing, you can try something like this:

$ oc adm must-gather -- /usr/bin/gather_audit_logs
OR
$ oc adm must-gather -- /usr/bin/gather_audit_logs NODENAME [NODENAME]

What you suggest below tests the bash script running in a pod, not the CLI running a pod (image) that runs the bash script. tl;dr: you're skipping some key pieces of the puzzle.

> 1. podman pull quay.io/openshift/origin-must-gather
> 2. mkdir /tmp/kube
> 3. cp <your cluster kubeconfig> kube/
> 4. mkdir /tmp/must-gather
> 5. podman run -v /tmp/test/kube:/root/.kube -v
> /tmp/test/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather
> gather_network_logs
>
> Command should succeed and you should see the network logs under your
> /test/must-gather, which then confirms the /must-gather folder is created
> within the container.

This should apply to the commands I have above too.
Right, but since it seems running arbitrary commands (i.e. 'oc adm must-gather -- <command>') does not work, my suggestion was to test what the fix is all about: that gather_network_logs does put the collected logs under /must-gather.

For reference, running your example on a 4.2 cluster:

<snip>
[ricky@ricky-laptop test]$ oc adm must-gather -- /usr/bin/gather_audit_logs
namespace/openshift-must-gather-9d785 created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-mj2lp created
WARNING: cannot use rsync: rsync not available in container
WARNING: cannot use tar: tar not available in container
clusterrolebinding.rbac.authorization.k8s.io/must-gather-mj2lp deleted
namespace/openshift-must-gather-9d785 deleted
error: No available strategies to copy.
[ricky@ricky-laptop test]$ ls
</snip>

Execution fails, and no must-gather folder is created on my laptop.
Sorry, I truncated one line from the earlier output; below shows no must-gather folder is created on my laptop:

<snip>
[ricky@ricky-laptop test]$ ls
config  kube
</snip>
I opened https://bugzilla.redhat.com/show_bug.cgi?id=1731394 to handle oc adm must-gather's inability to run arbitrary commands, since it's different from what we are addressing here.
@Ricardo, any idea why I am getting this error? Or is it a bug in runc or something?

1. podman pull quay.io/openshift/origin-must-gather
2. mkdir /tmp/kube
3. cp ~/.kube/config /tmp/kube/
4. mkdir /tmp/must-gather

$ podman run -v /tmp/kube:/root/.kube -v /tmp/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather gather_network_logs
Error: cannot specify gid= mount options for unmapped gid in rootless containers : OCI runtime error

Also, I am not sure how the customer would collect these logs, as apparently the process for gathering network logs is not straightforward per comment 4. We thought 'oc adm must-gather' would invoke it?
I just tested with a cluster from clusterbot and it works for me:

<snip>
[ricky@ricky-laptop ~]$ vi /tmp/kubeconfig
[ricky@ricky-laptop ~]$ export KUBECONFIG=/tmp/kubeconfig
[ricky@ricky-laptop ~]$ oc get nodes
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-130-65.ec2.internal    Ready    master   15m     v1.14.0+bd34733a7
ip-10-0-135-112.ec2.internal   Ready    master   15m     v1.14.0+bd34733a7
ip-10-0-135-249.ec2.internal   Ready    worker   7m29s   v1.14.0+bd34733a7
ip-10-0-143-32.ec2.internal    Ready    worker   7m31s   v1.14.0+bd34733a7
ip-10-0-145-65.ec2.internal    Ready    worker   7m31s   v1.14.0+bd34733a7
ip-10-0-148-189.ec2.internal   Ready    master   15m     v1.14.0+bd34733a7
[ricky@ricky-laptop ~]$ podman pull quay.io/openshift/origin-must-gather
Trying to pull docker://quay.io/openshift/origin-must-gather...
Getting image source signatures
Copying blob 5dfdb3f0bcc0 skipped: already exists
Copying blob 806a9a74c184 skipped: already exists
Copying blob 34999dd9efc2 skipped: already exists
Copying blob 99f178453a43 skipped: already exists
Copying blob 1bbd7989ec96 done
Copying blob 2baeb8593e81 done
Copying config 1d9d780fec done
Writing manifest to image destination
Storing signatures
1d9d780fec83cf21bf1dcf1e2b94431113c6b6855c684b5cf93740531f777474
[ricky@ricky-laptop ~]$ mkdir /tmp/kube
[ricky@ricky-laptop ~]$ cp /tmp/kubeconfig /tmp/kube/config
[ricky@ricky-laptop ~]$ mkdir /tmp/must-gather
[ricky@ricky-laptop ~]$ podman run -v /tmp/kube:/root/.kube -v /tmp/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather gather_network_logs
WARNING: Collecting network logs on ALL nodes in your cluster. This could take a large amount of time.
Error from server: error dialing backend: remote error: tls: internal error
Error from server: error dialing backend: remote error: tls: internal error
Error from server: error dialing backend: remote error: tls: internal error
Error from server: error dialing backend: remote error: tls: internal error
INFO: Waiting for node network log collection to complete ...
INFO: Node network log collection to complete.
[ricky@ricky-laptop ~]$ ls /tmp/must-gather/network_logs/
ip-10-0-130-65.ec2.internal_iptables   ip-10-0-135-249.ec2.internal_iptables  ip-10-0-145-65.ec2.internal_iptables   ovs-8ksbl_ovsdb_log     ovs-dg4q7_ovsdb_log     ovs-k4brh_ovsdb_log
ip-10-0-130-65.ec2.internal_ovs_dump   ip-10-0-135-249.ec2.internal_ovs_dump  ip-10-0-145-65.ec2.internal_ovs_dump   ovs-8ksbl_vswitchd_log  ovs-dg4q7_vswitchd_log  ovs-k4brh_vswitchd_log
ip-10-0-135-112.ec2.internal_iptables  ip-10-0-143-32.ec2.internal_iptables   ip-10-0-148-189.ec2.internal_iptables  ovs-8rv8x_ovsdb_log     ovs-f8cdr_ovsdb_log     ovs-tzsv8_ovsdb_log
ip-10-0-135-112.ec2.internal_ovs_dump  ip-10-0-143-32.ec2.internal_ovs_dump   ip-10-0-148-189.ec2.internal_ovs_dump  ovs-8rv8x_vswitchd_log  ovs-f8cdr_vswitchd_log  ovs-tzsv8_vswitchd_log
</snip>

Maybe the earlier steps had a bogus step, but basically it comes down to having a kubeconfig for your cluster copied over as /tmp/kube/config.

And yes, I'm not implying a customer should use these steps to gather network logs. As stated in a previous comment, there's a bug in oc adm must-gather that prevents it from running arbitrary commands, which is why I opened https://bugzilla.redhat.com/show_bug.cgi?id=1731394. I put the 'podman' steps as a method for testing that the whole purpose of this particular fix works: the gather_network_logs script does save logs to /must-gather.
Thanks @ricardo for trying on your side. I am still seeing the same error trying it on an actual AWS cluster following the same steps. Not sure yet how to verify this fix.

$ podman run -v /tmp/kube:/root/.kube -v /tmp/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather gather_network_logs
Error: cannot specify gid= mount options for unmapped gid in rootless containers : OCI runtime error
Are you running a recent runc/podman version? https://github.com/containers/libpod/issues/1147 https://github.com/containers/libpod/issues/3541
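Taking stock of the local versions before comparing against those issues can be sketched like this; it is only a quick inventory (tool names from the thread), not an authoritative check of which versions carry the fix:

```shell
# Print the versions of the rootless container stack implicated in the
# gid-mount error; compare the output against the libpod issues above.
for tool in podman runc; do
  if command -v "$tool" >/dev/null 2>&1; then
    "$tool" --version
  else
    echo "$tool: not installed"
  fi
done
```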
(In reply to Ricardo Carrillo Cruz from comment #13)
> Are you running a recent runc/podman version?
>
> https://github.com/containers/libpod/issues/1147
> https://github.com/containers/libpod/issues/3541

Hi Ricardo,

My versions are the following, and I am using the latest 4.2 nightly build:

[core@ip-]$ runc --version
runc version spec: 1.0.0
[core@ip-]$ podman --version
podman version 1.4.2
These are my versions:

[ricky@ricky-laptop openshift-ci-prom]$ runc --version
runc version 1.0.0-rc6+dev
commit: a5dee658ceacfe758cf47ba4ff4319278e58760c
spec: 1.0.1-dev
[ricky@ricky-laptop openshift-ci-prom]$ podman --version
podman version 1.2.0
(In reply to Ricardo Carrillo Cruz from comment #15)
> These are my versions:
>
> [ricky@ricky-laptop openshift-ci-prom]$ runc --version
> runc version 1.0.0-rc6+dev
> commit: a5dee658ceacfe758cf47ba4ff4319278e58760c
> spec: 1.0.1-dev
> [ricky@ricky-laptop openshift-ci-prom]$ podman --version
> podman version 1.2.0

Hmm, I need to find a way to upgrade my runc to match your version; the podman version looks okay. I'm not sure right now how runc can be upgraded, but I will dig into this, and any suggestions are welcome. Thanks.
Another alternative would be to just use docker run to get into the must-gather container, then create a KUBECONFIG file pointing to a cluster, export the env var, and verify that gather_network_logs does indeed save its output to /must-gather.
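That docker-based alternative might look something like the sketch below. The image name comes from earlier in the thread; the mount paths and kubeconfig location are assumptions, and the container is only started when docker and a kubeconfig are actually present:

```shell
# Hedged sketch: run gather_network_logs via docker instead of podman,
# passing the kubeconfig through an env var rather than /root/.kube.
IMAGE=quay.io/openshift/origin-must-gather
KCFG=/tmp/kubeconfig   # assumption: cluster kubeconfig saved here

# Assemble the invocation first so it can be inspected before running.
DOCKER_CMD="docker run --rm -v $KCFG:$KCFG -e KUBECONFIG=$KCFG -v /tmp/must-gather:/must-gather $IMAGE gather_network_logs"
echo "$DOCKER_CMD"

# Only launch when the prerequisites exist on this machine:
if command -v docker >/dev/null 2>&1 && [ -f "$KCFG" ]; then
  mkdir -p /tmp/must-gather
  $DOCKER_CMD
  # Collected logs here confirm /must-gather exists inside the container:
  ls /tmp/must-gather
fi
```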
Thanks Ricardo. I am going to try this today and will keep you apprised
Alright, this seems good to me after updating runc to the latest version. podman run also complained about the OCI runtime error, so I had no option left but to try updating runc. However, the BZ 1731394 fix would give a better picture once it lands.

[core@ip-10-0-141-209 ~]$ runc --version
runc version spec: 1.0.0-rc6+dev
[core@ip-10-0-141-209 ~]$ podman --version
podman version 1.4.2
[core@ip-10-0-141-209 ~]$ vi /tmp/kubeconfig
[core@ip-10-0-141-209 ~]$ export KUBECONFIG=/tmp/kubeconfig
[core@ip-10-0-141-209 ~]$ oc get nodes
NAME                                              STATUS   ROLES    AGE   VERSION
ip-10-0-131-242.ap-northeast-1.compute.internal   Ready    worker   29h   v1.14.0+2e9d4a117
ip-10-0-141-209.ap-northeast-1.compute.internal   Ready    master   29h   v1.14.0+2e9d4a117
ip-10-0-146-149.ap-northeast-1.compute.internal   Ready    master   29h   v1.14.0+2e9d4a117
ip-10-0-156-230.ap-northeast-1.compute.internal   Ready    worker   29h   v1.14.0+2e9d4a117
ip-10-0-167-156.ap-northeast-1.compute.internal   Ready    master   29h   v1.14.0+2e9d4a117
[core@ip-10-0-141-209 ~]$ podman pull quay.io/openshift/origin-must-gather
Trying to pull docker://quay.io/openshift/origin-must-gather...
Getting image source signatures
Copying blob 7dfdb3f0bcc0 skipped: already exists
Copying blob 506a9a74c184 skipped: already exists
Copying blob 84999dd9efc2 skipped: already exists
Copying blob 49f178453a43 skipped: already exists
Copying blob 3bbd7989ec96 done
Copying blob 9baeb8593e81 done
Copying config 6d9d780fec done
Writing manifest to image destination
Storing signatures
4d9d780fec83cf21bf1dcf1e2b94431113c6b6855c684b5cf93740531f777474
[core@ip-10-0-141-209 ~]$ mkdir /tmp/kube
[core@ip-10-0-141-209 ~]$ cp /tmp/kubeconfig /tmp/kube/config
[core@ip-10-0-141-209 ~]$ mkdir /tmp/must-gather
[core@ip-10-0-141-209 ~]$ podman run -v /tmp/kube:/root/.kube -v /tmp/must-gather/:/must-gather -it quay.io/openshift/origin-must-gather gather_network_logs
WARNING: Collecting network logs on ALL nodes in your cluster. This could take a large amount of time.
INFO: Waiting for node network log collection to complete ...
INFO: Node network log collection to complete.
[core@ip-10-0-141-209 ~]$ ls /tmp/must-gather/network_logs/
ip-10-0-131-242.ec2.internal_iptables  ip-10-0-146-149.ec2.internal_iptables  ip-10-0-167-156.ec2.internal_iptables  ovs-7ksbl_ovsdb_log     ovs-eg4q7_ovsdb_log     ovs-h4brh_ovsdb_log
ip-10-0-131-242.ec2.internal_ovs_dump  ip-10-0-146-149.ec2.internal_ovs_dump  ip-10-0-167-156.ec2.internal_ovs_dump  ovs-7ksbl_vswitchd_log  ovs-eg4q7_vswitchd_log  ovs-h4brh_vswitchd_log
ip-10-0-141-209.ec2.internal_iptables  ip-10-0-156-230.ec2.internal_iptables  ip-10-0-146-149.ec2.internal_iptables  ovs-7rv8x_ovsdb_log     ovs-g8cdr_ovsdb_log     ovs-nzsv8_ovsdb_log
ip-10-0-141-209.ec2.internal_ovs_dump  ip-10-0-156-230.ec2.internal_ovs_dump  ip-10-0-146-149.ec2.internal_ovs_dump  ovs-7rv8x_vswitchd_log  ovs-g8cdr_vswitchd_log  ovs-nzsv8_vswitchd_log
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922