Bug 1877428

Summary: Update Must Gather to pull raw core dumps off nodes
Product: OpenShift Container Platform
Reporter: Andrew Stoycos <astoycos>
Component: Networking
Assignee: Andrew Stoycos <astoycos>
Networking sub component: ovn-kubernetes
QA Contact: Ross Brattain <rbrattai>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: unspecified
CC: anusaxen, jtanenba
Version: 4.6
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-09-22 17:16:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1887446

Description Andrew Stoycos 2020-09-09 15:28:17 UTC
Description of problem:

Currently, Must Gather only displays some information about core dumps stored on nodes (i.e., the output of coredumpctl info). We need to extract all raw core dumps off each node for some applications and some CI tests.

Comment 3 Anurag saxena 2020-09-10 14:55:25 UTC
@Ross, can you help verify this? Thanks

Comment 8 Ross Brattain 2020-09-10 23:46:12 UTC
Forgot to mention that until 1728135 is fixed, the oc debug image might not always be available on disconnected clusters.  

See https://bugzilla.redhat.com/show_bug.cgi?id=1728135

For QE automation we hack around this by using oc debug --image with the multus image, since that image always exists on every node.

oc debug node/"${NODE}" --image=$(oc get pod -n openshift-multus -l app=multus -o jsonpath='{.items[0].spec.containers[?(@.name=="kube-multus")].image}')
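The one-liner above can be split into two steps so that an empty image lookup fails early instead of passing an empty --image value to oc debug. This is only a sketch: the function names multus_image and debug_node are hypothetical, and the jsonpath query is copied unchanged from the command above.

```shell
# Hypothetical helper: look up the image used by the kube-multus container.
# The namespace, label, and jsonpath query come from the command above.
multus_image() {
  oc get pod -n openshift-multus -l app=multus \
    -o jsonpath='{.items[0].spec.containers[?(@.name=="kube-multus")].image}'
}

# Hypothetical helper: start a debug pod on a node using the multus image.
# Any extra arguments are passed through to `oc debug`.
debug_node() {
  node="$1"
  shift
  image="$(multus_image)"
  if [ -z "$image" ]; then
    echo "could not resolve the multus image" >&2
    return 1
  fi
  oc debug "node/${node}" --image="${image}" "$@"
}
```

With these defined, `debug_node "${NODE}"` should behave like the original command, assuming a multus pod is running in the cluster.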

Comment 10 Anurag saxena 2020-09-15 20:01:47 UTC
@Ross, can you help verify it?

Comment 12 Ross Brattain 2020-09-15 21:33:33 UTC
Maybe this  

https://github.com/openshift/oc/blob/bfd07f8816d45f76181412fe67c919dcfba5d55f/pkg/cli/debug/debug.go#L380

	generateName := names.SimpleNameGenerator.GenerateName("openshift-debug-node-")

Comment 13 Ross Brattain 2020-09-15 21:35:31 UTC
So it looks like we create a temp namespace in some cases, and the sed is matching `namespace/.*` instead of `pod/.*`.

Comment 14 Andrew Stoycos 2020-09-16 12:52:32 UTC
Ah, OK, that's definitely unexpected behavior... I wish we just labeled this pod somehow; it would make getting the pod name much less hackish. In those instances where we make a tmp namespace, do we still make a pod as well? I could ensure we match on `pod/.*` rather than just `/`, but it still might run into issues.
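The difference between the two match patterns can be shown with a small sketch. The sample text below is an assumption that only illustrates the shape of the problem (one line containing `namespace/...` ahead of one containing `pod/...`); it is not captured oc debug output, and the sed expressions are not the ones used in the must-gather script.

```shell
# Illustrative input only: a namespace line followed by a pod line.
sample_output='Creating debug namespace/openshift-debug-node-abcde ...
Starting pod/worker-0-debug ...'

# Matching on a bare "/" also matches the namespace line, so the first
# hit is the namespace name rather than the pod name.
loose=$(printf '%s\n' "$sample_output" | sed -n 's|.*/\([^ ]*\).*|\1|p' | head -n 1)

# Anchoring on "pod/" skips the namespace line entirely.
strict=$(printf '%s\n' "$sample_output" | sed -n 's|.*pod/\([^ ]*\).*|\1|p' | head -n 1)

echo "loose=$loose"
echo "strict=$strict"
```

The anchored pattern still depends on the wording of the command output, which is why a label on the debug pod would be a more robust way to find it.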

Comment 17 Ross Brattain 2020-09-22 17:15:55 UTC
Verified on 4.6.0-0.nightly-2020-09-22-073212


[must-gather-5nqlx] POD WARNING: Collecting network logs on ALL linux nodes in your cluster. This could take a long time.
[must-gather-5nqlx] POD INFO: Waiting for node core dump collection to complete ...
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-master-1copenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-worker-c-8h6mgcopenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-worker-c-8h6mg.c.openshift-qe.internal
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-master-1.c.openshift-qe.internal
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-worker-a-ssmq9copenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-worker-a-ssmq9.c.openshift-qe.internal
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-master-2copenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-master-2.c.openshift-qe.internal
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-worker-b-mwtsbcopenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-worker-b-mwtsb.c.openshift-qe.internal
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-master-0copenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-master-0.c.openshift-qe.internal
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-worker-c-8h6mgcopenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-worker-a-ssmq9copenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-worker-b-mwtsbcopenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-master-1copenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-master-2copenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-master-0copenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD INFO: Node core dump collection to complete.
[must-gather-5nqlx] OUT waiting for gather to complete
[must-gather-5nqlx] OUT downloading gather output
[must-gather-5nqlx] OUT receiving incremental file list
[must-gather-5nqlx] OUT ./
[must-gather-5nqlx] OUT node_core_dumps/
[must-gather-5nqlx] OUT node_core_dumps/qe46g23-s8c8l-master-0.c.openshift-qe.internal_core_dump/
[must-gather-5nqlx] OUT node_core_dumps/qe46g23-s8c8l-master-0.c.openshift-qe.internal_core_dump/core.1234
[must-gather-5nqlx] OUT node_core_dumps/qe46g23-s8c8l-master-1.c.openshift-qe.internal_core_dump/