Bug 1893611 - Skip ceph commands collection attempt if must-gather helper pod is not created
Summary: Skip ceph commands collection attempt if must-gather helper pod is not created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: must-gather
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: Pulkit Kundra
QA Contact: Oded
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-11-02 06:48 UTC by Neha Berry
Modified: 2021-06-01 08:51 UTC (History)

Fixed In Version: 4.7.0-262.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 09:16:13 UTC
Embargoed:


Attachments
terminal log (124.09 KB, text/plain)
2020-11-02 06:48 UTC, Neha Berry


Links
System ID Private Priority Status Summary Last Updated
Github openshift ocs-operator pull 1018 0 None closed Must-gather: skip the ceph collection 2021-02-13 16:24:30 UTC
Github openshift ocs-operator pull 1064 0 None closed Bug 1893611: [release-4.7] Must-gather: skip the ceph collection 2021-02-17 17:40:53 UTC
Red Hat Product Errata RHSA-2021:2041 0 None None None 2021-05-19 09:16:51 UTC

Description Neha Berry 2020-11-02 06:48:39 UTC
Created attachment 1725680 [details]
terminal log

Description of problem:
---------------------------------
Currently, the OCS 4.6 must-gather script waits for a helper pod to come up before collecting ceph command outputs into text files (for internal mode).

But in some situations, such as when the storage cluster has already been deleted or the helper pod stays in Pending state due to resource unavailability, the script does the following:

a) Makes 50 retries to bring up the helper pod
b) Even if the pod still doesn't come up, attempts to collect must-gather outputs anyway; some of the failures seen on the terminal are listed in the Additional info section

---snip---

[must-gather-gn4xm] POD Error from server (NotFound): pods "must-gather-gn4xm-helper" not found
[must-gather-gn4xm] POD collecting snapshot info for ceph rbd volumes 
[must-gather-gn4xm] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-gn4xm] POD 	template was:
[must-gather-gn4xm] POD 		{range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-gn4xm] POD 	object given to jsonpath engine was:
[must-gather-gn4xm] POD 		map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-gn4xm] POD 




Version-Release number of selected component (if applicable):
------------------------------------------------------------------
All OCS versions

How reproducible:
========================
Always.

Steps to Reproduce:
--------------------------
1. Create a situation in which the must-gather helper pod doesn't come up, e.g. cordon the node the moment the helper pod would be created or, for this particular case, delete the storagecluster to initiate uninstall
2. Start ocs must-gather when the storagecluster is deleted but the namespace and other resources still exist
3. Check the terminal log of the collection and confirm that a few errors are thrown when it tries to collect ceph commands (ceph is already deleted)

Actual results:
-----------------------
At least in the above scenario, where the ceph cluster was already deleted, the helper pod failed to come up but ceph command collection was still attempted, which threw error messages for a few specific commands only.

Expected results:
------------------------
If the helper pod is not up, there is no use in attempting to collect must-gather outputs. The skip should be handled with a proper message.

The reason for the skip should also be added to the output.
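
The expected behaviour could be sketched roughly as below. This is only an illustration, not the actual must-gather code: helper_pod_ready, wait_for_helper_pod, collect_ceph_commands and gather_ceph are hypothetical names, and the retry interval is an assumed value. Only the count of 50 retries comes from the terminal log.

```shell
# Sketch only: the function names here are hypothetical and are not taken
# from the actual must-gather scripts.

MAX_RETRIES="${MAX_RETRIES:-50}"      # the terminal log shows 50 retries
RETRY_INTERVAL="${RETRY_INTERVAL:-5}" # seconds between checks (assumed value)

# Hypothetical readiness probe: succeeds when the pod phase is Running.
helper_pod_ready() {
    [ "$(oc get pod "$1" -n "$2" -o jsonpath='{.status.phase}' 2>/dev/null)" = "Running" ]
}

# Returns 0 once the helper pod is ready, 1 after all retries are used up.
wait_for_helper_pod() {
    local pod="$1" ns="$2" attempt=1
    while [ "$attempt" -le "$MAX_RETRIES" ]; do
        if helper_pod_ready "$pod" "$ns"; then
            return 0
        fi
        echo "waiting for helper pod to come up in ${ns} namespace. Retrying ${attempt}"
        sleep "$RETRY_INTERVAL"
        attempt=$((attempt + 1))
    done
    return 1
}

# Hypothetical stand-in for the loop that runs the ceph commands.
collect_ceph_commands() {
    echo "collecting command output for: ceph status"
}

# Skips ceph collection, with a stated reason, when the helper pod never came up.
gather_ceph() {
    local pod="$1" ns="$2"
    if ! wait_for_helper_pod "$pod" "$ns"; then
        echo "skipping ceph command collection: helper pod ${pod} did not come up after ${MAX_RETRIES} retries"
        return 0
    fi
    collect_ceph_commands "$pod" "$ns"
}
```

With this shape, a call such as gather_ceph must-gather-z87pt-helper openshift-storage would print one skip message instead of a series of NotFound and jsonpath errors.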

Additional info:
-----------------------

snip

[must-gather-z87pt] POD waiting for helper pod to come up in openshift-storage namespace. Retrying 49
[must-gather-z87pt] POD Error from server (NotFound): pods "must-gather-z87pt-helper" not found
[must-gather-z87pt] POD waiting for helper pod to come up in openshift-storage namespace. Retrying 50
[must-gather-z87pt] POD Error from server (NotFound): pods "must-gather-z87pt-helper" not found
[must-gather-z87pt] POD Error from server (NotFound): pods "must-gather-z87pt-helper" not found
[must-gather-z87pt] POD collecting command output for: ceph auth list
[must-gather-z87pt] POD collecting command output for: ceph balancer dump
[must-gather-z87pt] POD collecting command output for: ceph balancer pool ls
[must-gather-z87pt] POD collecting command output for: ceph balancer status
[must-gather-z87pt] POD collecting command output for: ceph config dump



[must-gather-z87pt] POD Error from server (NotFound): pods "must-gather-z87pt-helper" not found
[must-gather-z87pt] POD collecting snapshot info for ceph rbd volumes 
[must-gather-z87pt] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-z87pt] POD 	template was:
[must-gather-z87pt] POD 		{range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-z87pt] POD 	object given to jsonpath engine was:
[must-gather-z87pt] POD 		map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-z87pt] POD 
[must-gather-z87pt] POD 
[must-gather-z87pt] POD collecting snapshot info for ceph subvolumes 
[must-gather-z87pt] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-z87pt] POD 	template was:
[must-gather-z87pt] POD 		{range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-z87pt] POD 	object given to jsonpath engine was:
[must-gather-z87pt] POD 		map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-z87pt] POD 
[must-gather-z87pt] POD 
[must-gather-z87pt] POD collecting command output for: ceph-volume lvm list
[must-gather-z87pt] POD No resources found in openshift-storage namespace.
[must-gather-z87pt] POD collecting command output for: ceph-volume raw list
[must-gather-z87pt] POD No resources found in openshift-storage namespace.
[must-gather-z87pt] POD collecting prepare volume logs from node compute-0 
[must-gather-z87pt] POD collecting prepare volume logs from node compute-1 
[must-gather-z87pt] POD collecting prepare volume logs from node compute-2 
[must-gather-z87pt] POD Error from server (NotFound): pods "must-gather-z87pt-helper" not found
[must-gather-z87pt] POD error: the path "pod_helper.yaml" does not exist
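
In the jsonpath errors above, the object given to the jsonpath engine has an empty "items" list, and the range template fails on it. One possible way to avoid that noise, sketched here under assumptions, is to list names with "-o name" and skip the step when nothing comes back. list_names and dump_resources are hypothetical helper names, not functions from the must-gather scripts.

```shell
# Sketch only: list_names and dump_resources are hypothetical helper names.

# Prints one resource name per line and nothing when the list is empty.
# Uses "-o name" (kind/name output) instead of a jsonpath range template.
list_names() {
    local kind="$1" ns="$2"
    oc get "$kind" -n "$ns" -o name 2>/dev/null | sed 's|^.*/||'
}

# Skips the dump with an explicit message when there is nothing to iterate over.
dump_resources() {
    local kind="$1" ns="$2" names name
    names="$(list_names "$kind" "$ns")"
    if [ -z "$names" ]; then
        echo "skipping ${kind}: no resources found in ${ns} namespace"
        return 0
    fi
    for name in $names; do
        echo "collecting dump of ${kind} ${name}"
    done
}
```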

Comment 2 Neha Berry 2020-11-02 06:51:57 UTC
As discussed with Pulkit, it seems we do not skip ceph command collection even when all re-tries for creating must-gather-helper pod are exhausted.

must-gather Logs and terminal logs copied here - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bug-1893611/

Comment 3 Neha Berry 2020-11-02 07:09:50 UTC
Also, some of the error messages seen (both for internal and external) when 


>> 1. During [must-gather-gn4xm] POD collecting dump cephobjectstores



[must-gather-gn4xm] POD collecting dump cephobjectstoreusers
[must-gather-gn4xm] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-gn4xm] POD         template was:
[must-gather-gn4xm] POD                 {range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-gn4xm] POD         object given to jsonpath engine was:
[must-gather-gn4xm] POD                 map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-gn4xm] POD
[must-gather-gn4xm] POD

>> 2. For a few ceph commands

[must-gather-gn4xm] POD Error from server (NotFound): pods "must-gather-gn4xm-helper" not found
[must-gather-gn4xm] POD collecting snapshot info for ceph rbd volumes
[must-gather-gn4xm] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-gn4xm] POD         template was:
[must-gather-gn4xm] POD                 {range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-gn4xm] POD         object given to jsonpath engine was:
[must-gather-gn4xm] POD                 map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-gn4xm] POD
[must-gather-gn4xm] POD
[must-gather-gn4xm] POD collecting snapshot info for ceph subvolumes
[must-gather-gn4xm] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-gn4xm] POD         template was:
[must-gather-gn4xm] POD                 {range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-gn4xm] POD         object given to jsonpath engine was:
[must-gather-gn4xm] POD                 map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-gn4xm] POD
[must-gather-gn4xm] POD
[must-gather-gn4xm] POD collecting command output for: ceph-volume lvm list
[must-gather-gn4xm] POD No resources found in openshift-storage namespace.
[must-gather-gn4xm] POD collecting command output for: ceph-volume raw list
[must-gather-gn4xm] POD No resources found in openshift-storage namespace.
[must-gather-gn4xm] POD No resources found
[must-gather-gn4xm] POD error: the path "pod_helper.yaml" does not exist
[must-gather-gn4xm] POD Error from server (NotFound): pods "must-gather-gn4xm-helper" not found
[must-gather-gn4xm] POD No resources found

Comment 4 Neha Berry 2020-11-02 07:31:14 UTC
(In reply to Neha Berry from comment #0)
> Created attachment 1725680 [details]
> terminal log
> 

> 
> Steps to Reproduce:
> --------------------------
> 1. Create a situation that the must-gather helper pod doesn't come up, e.g.
> cordon node the moment the helper pod would be created or for this
> particular case - delete the storagecluster to initiate uninstall
> 2. Start ocs must-gather, when storagecluster is deleted but namespace and
> other resources still exist
> 3.Check the terminal log collection and confirm that there are a few errors
> thrownin when it tries to collect ceph commands(ceph is already deleted)
> 

One more way to reproduce this is:

a) Install the OCS operator but do not create a StorageCluster
b) Initiate must-gather collection

We will see the error messages and helper pod retries, followed by attempts to collect ceph outputs.
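
For this second scenario, one conceivable guard is to check for a StorageCluster before trying to bring up the helper pod at all, so that the 50 retries are not spent. This is only a sketch of the idea and not necessarily how the fix was implemented; storagecluster_exists and maybe_start_helper are hypothetical names.

```shell
# Sketch only: storagecluster_exists and maybe_start_helper are hypothetical names.

# Succeeds when at least one StorageCluster exists in the namespace.
storagecluster_exists() {
    local ns="$1" found
    found="$(oc get storagecluster -n "$ns" -o name 2>/dev/null)"
    [ -n "$found" ]
}

# Reports why the helper pod is skipped instead of retrying when no
# StorageCluster is present.
maybe_start_helper() {
    local ns="$1"
    if ! storagecluster_exists "$ns"; then
        echo "skipping helper pod creation: no storagecluster found in ${ns} namespace"
        return 1
    fi
    echo "creating helper pod in ${ns} namespace"
}
```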

Comment 11 errata-xmlrpc 2021-05-19 09:16:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041

