Bug 1877428 - Update Must Gather to pull raw core dumps off nodes
Summary: Update Must Gather to pull raw core dumps off nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Andrew Stoycos
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks: 1887446
Reported: 2020-09-09 15:28 UTC by Andrew Stoycos
Modified: 2020-10-12 14:05 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-22 17:16:50 UTC
Target Upstream Version:
Embargoed:


Links
- GitHub openshift/must-gather pull 173 (closed): Bug 1877428: Update gather_core_dumps to get raw coredump files (last updated 2021-02-18 18:36:30 UTC)
- GitHub openshift/must-gather pull 174 (closed): Bug 1877428: Update gather_core_dumps collection of debug pod's name, clean up resources (last updated 2021-02-18 18:36:30 UTC)
- GitHub openshift/must-gather pull 177 (closed): Bug 1877428: Increase gather_core_dump robustness after seeing some CI issues (last updated 2021-02-18 18:36:31 UTC)
- GitHub openshift/must-gather pull 179 (closed): Bug 1877428: Update `gather_core_dumps` to fix unexpected behavior in CI (last updated 2021-02-18 18:36:31 UTC)

Description Andrew Stoycos 2020-09-09 15:28:17 UTC
Description of problem:

Currently, Must Gather only collects summary information about the core dumps stored on nodes (i.e. the output of `coredumpctl info`). For some applications and some CI tests, we need to extract the raw core dump files from each node.

Comment 3 Anurag saxena 2020-09-10 14:55:25 UTC
@Ross, can you help verify this? Thanks.

Comment 8 Ross Brattain 2020-09-10 23:46:12 UTC
Forgot to mention: until bug 1728135 is fixed, the `oc debug` image might not always be available on disconnected clusters.

See https://bugzilla.redhat.com/show_bug.cgi?id=1728135

For QE automation, we work around this with a hack: we pass the multus image to `oc debug --image`, since that image always exists on every node.

oc debug node/"${NODE}" --image=$(oc get pod -n openshift-multus -l app=multus -o jsonpath='{.items[0].spec.containers[?(@.name=="kube-multus")].image}')

Comment 10 Anurag saxena 2020-09-15 20:01:47 UTC
@Ross, can you help verify it?

Comment 12 Ross Brattain 2020-09-15 21:33:33 UTC
Maybe this is the cause:

https://github.com/openshift/oc/blob/bfd07f8816d45f76181412fe67c919dcfba5d55f/pkg/cli/debug/debug.go#L380

	generateName := names.SimpleNameGenerator.GenerateName("openshift-debug-node-")

Comment 13 Ross Brattain 2020-09-15 21:35:31 UTC
So it looks like we create a temporary namespace in some cases, and the sed expression matches `namespace/.*` instead of `pod/.*`.

Comment 14 Andrew Stoycos 2020-09-16 12:52:32 UTC
Ah, OK, that's definitely unexpected behavior. I wish we labeled this pod somehow; that would make getting the pod name much less hackish. In the cases where we create a temporary namespace, do we still create a pod as well? I could make sure we match on `pod/.*` rather than just `/`, but that might still run into issues.
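A minimal sketch of the stricter match discussed in Comments 13 and 14, assuming `oc debug` announces the debug pod on a line containing `pod/<name>` and, when it creates one, the temporary namespace on a line containing `namespace/<name>`. The exact message wording is an assumption, and `extract_debug_pod_name` is a hypothetical helper, not the code merged into gather_core_dumps:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the code merged into gather_core_dumps.
# Reads `oc debug` status messages on stdin and prints the first debug pod name.
# Assumption: the pod is announced as "pod/<name>" and a temporary namespace,
# when one is created, as "namespace/<name>".
extract_debug_pod_name() {
  # Require the literal "pod/" token at line start or after whitespace, so a
  # "namespace/<name>" line cannot match the way a bare "/" pattern would.
  sed -nE 's@(^|.*[[:space:]])pod/([^[:space:]]+).*@\2@p' | head -n 1
}

# Example input; the message wording is assumed for illustration only.
printf '%s\n' \
  'Creating debug namespace/openshift-debug-node-abcde ...' \
  'Starting pod/node1-debug ...' | extract_debug_pod_name
# prints: node1-debug
```

Anchoring on `pod/` only narrows the match; as Comment 14 notes, parsing status text remains fragile, which is why a label on the debug pod would be the more robust option.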

Comment 17 Ross Brattain 2020-09-22 17:15:55 UTC
Verified on 4.6.0-0.nightly-2020-09-22-073212


[must-gather-5nqlx] POD WARNING: Collecting network logs on ALL linux nodes in your cluster. This could take a long time.
[must-gather-5nqlx] POD INFO: Waiting for node core dump collection to complete ...
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-master-1copenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-worker-c-8h6mgcopenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-worker-c-8h6mg.c.openshift-qe.internal
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-master-1.c.openshift-qe.internal
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-worker-a-ssmq9copenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-worker-a-ssmq9.c.openshift-qe.internal
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-master-2copenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-master-2.c.openshift-qe.internal
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-worker-b-mwtsbcopenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-worker-b-mwtsb.c.openshift-qe.internal
[must-gather-5nqlx] POD pod/qe46g23-s8c8l-master-0copenshift-qeinternal-debug condition met
[must-gather-5nqlx] POD Copying core dumps on node qe46g23-s8c8l-master-0.c.openshift-qe.internal
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-worker-c-8h6mgcopenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-worker-a-ssmq9copenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-worker-b-mwtsbcopenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-master-1copenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-master-2copenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD pod "qe46g23-s8c8l-master-0copenshift-qeinternal-debug" deleted
[must-gather-5nqlx] POD INFO: Node core dump collection to complete.
[must-gather-5nqlx] OUT waiting for gather to complete
[must-gather-5nqlx] OUT downloading gather output
[must-gather-5nqlx] OUT receiving incremental file list
[must-gather-5nqlx] OUT ./
[must-gather-5nqlx] OUT node_core_dumps/
[must-gather-5nqlx] OUT node_core_dumps/qe46g23-s8c8l-master-0.c.openshift-qe.internal_core_dump/
[must-gather-5nqlx] OUT node_core_dumps/qe46g23-s8c8l-master-0.c.openshift-qe.internal_core_dump/core.1234
[must-gather-5nqlx] OUT node_core_dumps/qe46g23-s8c8l-master-1.c.openshift-qe.internal_core_dump/
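The debug pod names in the log above follow a visible pattern: each one is the node name with its dots removed, plus a `-debug` suffix. For example, node `qe46g23-s8c8l-master-1.c.openshift-qe.internal` maps to pod `qe46g23-s8c8l-master-1copenshift-qeinternal-debug`. A sketch of computing the expected name instead of parsing `oc debug` output, assuming that pattern holds in general (it is inferred from this one run, not from the `oc` source); `expected_debug_pod_name` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: predicts a node's debug pod name from the pattern seen in
# the verification log (dots removed from the node name, "-debug" appended).
# This convention is inferred from one run, not taken from the oc source.
expected_debug_pod_name() {
  # tr -d '.' deletes every dot; printf appends the "-debug" suffix.
  printf '%s-debug\n' "$(printf '%s' "$1" | tr -d '.')"
}

expected_debug_pod_name qe46g23-s8c8l-master-1.c.openshift-qe.internal
# prints: qe46g23-s8c8l-master-1copenshift-qeinternal-debug
```

This sidesteps the `namespace/` vs `pod/` parsing problem from Comment 13, at the cost of depending on a naming convention that nothing in this bug confirms is stable across `oc` versions.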

