Created attachment 1725680 [details]
terminal log

Description of problem:
---------------------------------
Currently, the OCS 4.6 must-gather script waits for a helper pod to come up before collecting ceph command outputs into text files (for internal mode). However, in some situations, such as when the storage cluster has already been deleted or the helper pod stays in Pending state due to resource unavailability, the script does the following:

a) Retries 50 times to bring up the helper pod
b) Even if the pod still does not come up, attempts to collect must-gather outputs anyway; some of the resulting failures seen on the terminal are included in the Additional info section

---snip---
[must-gather-gn4xm] POD Error from server (NotFound): pods "must-gather-gn4xm-helper" not found
[must-gather-gn4xm] POD collecting snapshot info for ceph rbd volumes
[must-gather-gn4xm] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-gn4xm] POD template was:
[must-gather-gn4xm] POD {range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-gn4xm] POD object given to jsonpath engine was:
[must-gather-gn4xm] POD map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-gn4xm] POD
---snip---

Version-Release number of selected component (if applicable):
------------------------------------------------------------------
All OCS versions

How reproducible:
========================
Always.

Steps to Reproduce:
--------------------------
1. Create a situation in which the must-gather helper pod does not come up, e.g. cordon the node at the moment the helper pod would be created, or, for this particular case, delete the StorageCluster to initiate uninstall.
2. Start OCS must-gather while the StorageCluster is deleted but the namespace and other resources still exist.
3. Check the terminal log of the collection and confirm that a few errors are thrown when it tries to collect ceph command outputs (ceph is already deleted).

Actual results:
-----------------------
At least for the above scenario, when the ceph cluster was already deleted, the helper pod failed to come up, yet ceph command collection was still attempted, which threw error messages for a few specific commands.

Expected results:
------------------------
If the helper pod is not up, there is no point in attempting to collect the ceph must-gather outputs. The failure should be handled with a proper message, and the reason for skipping the collection should be logged.

Additional info:
-----------------------
---snip---
[must-gather-z87pt] POD waiting for helper pod to come up in openshift-storage namespace. Retrying 49
[must-gather-z87pt] POD Error from server (NotFound): pods "must-gather-z87pt-helper" not found
[must-gather-z87pt] POD waiting for helper pod to come up in openshift-storage namespace. Retrying 50
[must-gather-z87pt] POD Error from server (NotFound): pods "must-gather-z87pt-helper" not found
[must-gather-z87pt] POD Error from server (NotFound): pods "must-gather-z87pt-helper" not found
[must-gather-z87pt] POD collecting command output for: ceph auth list
[must-gather-z87pt] POD collecting command output for: ceph balancer dump
[must-gather-z87pt] POD collecting command output for: ceph balancer pool ls
[must-gather-z87pt] POD collecting command output for: ceph balancer status
[must-gather-z87pt] POD collecting command output for: ceph config dump
[must-gather-z87pt] POD Error from server (NotFound): pods "must-gather-z87pt-helper" not found
[must-gather-z87pt] POD collecting snapshot info for ceph rbd volumes
[must-gather-z87pt] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-z87pt] POD template was:
[must-gather-z87pt] POD {range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-z87pt] POD object given to jsonpath engine was:
[must-gather-z87pt] POD map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-z87pt] POD
[must-gather-z87pt] POD
[must-gather-z87pt] POD collecting snapshot info for ceph subvolumes
[must-gather-z87pt] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-z87pt] POD template was:
[must-gather-z87pt] POD {range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-z87pt] POD object given to jsonpath engine was:
[must-gather-z87pt] POD map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-z87pt] POD
[must-gather-z87pt] POD
[must-gather-z87pt] POD collecting command output for: ceph-volume lvm list
[must-gather-z87pt] POD No resources found in openshift-storage namespace.
[must-gather-z87pt] POD collecting command output for: ceph-volume raw list
[must-gather-z87pt] POD No resources found in openshift-storage namespace.
[must-gather-z87pt] POD collecting prepare volume logs from node compute-0
[must-gather-z87pt] POD collecting prepare volume logs from node compute-1
[must-gather-z87pt] POD collecting prepare volume logs from node compute-2
[must-gather-z87pt] POD Error from server (NotFound): pods "must-gather-z87pt-helper" not found
[must-gather-z87pt] POD error: the path "pod_helper.yaml" does not exist
---snip---
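The expected behavior could be sketched roughly as follows. This is an illustrative outline only, not the actual must-gather source: the function names (`pod_phase`, `wait_for_helper_pod`, `collect_ceph_outputs`) and the `HELPER_POD` variable are assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical sketch: gate ceph command collection on the helper pod
# actually reaching Running, and log an explicit skip reason otherwise.

MAX_RETRIES=50

# Illustrative status check; the real script would query the cluster,
# e.g. via `oc get pod ... -o jsonpath='{.status.phase}'`.
pod_phase() {
    oc get pod "${HELPER_POD}" -n openshift-storage \
        -o jsonpath='{.status.phase}' 2>/dev/null
}

wait_for_helper_pod() {
    local i
    for ((i = 1; i <= MAX_RETRIES; i++)); do
        [ "$(pod_phase)" = "Running" ] && return 0
        echo "waiting for helper pod to come up in openshift-storage namespace. Retrying ${i}"
    done
    return 1
}

collect_ceph_outputs() {
    echo "collecting command output for: ceph status"   # placeholder only
}

if wait_for_helper_pod >/dev/null; then
    collect_ceph_outputs
else
    # Skip collection entirely and say why, instead of running every ceph
    # command against a pod that was never created.
    echo "skipping ceph command collection: helper pod not Running after ${MAX_RETRIES} retries"
fi
```

The point of the sketch is the final `if`/`else`: the retry loop's exit status decides whether collection runs at all, which is exactly what the Expected results section asks for.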
As discussed with Pulkit, it appears we do not skip ceph command collection even when all retries for creating the must-gather helper pod are exhausted.

must-gather logs and terminal logs are copied here: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bug-1893611/
Also, some of the error messages seen (for both internal and external mode):

>> 1. While collecting dumps:

[must-gather-gn4xm] POD collecting dump cephobjectstores
[must-gather-gn4xm] POD collecting dump cephobjectstoreusers
[must-gather-gn4xm] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-gn4xm] POD template was:
[must-gather-gn4xm] POD {range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-gn4xm] POD object given to jsonpath engine was:
[must-gather-gn4xm] POD map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-gn4xm] POD
[must-gather-gn4xm] POD

>> 2. A few ceph commands:

[must-gather-gn4xm] POD Error from server (NotFound): pods "must-gather-gn4xm-helper" not found
[must-gather-gn4xm] POD collecting snapshot info for ceph rbd volumes
[must-gather-gn4xm] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-gn4xm] POD template was:
[must-gather-gn4xm] POD {range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-gn4xm] POD object given to jsonpath engine was:
[must-gather-gn4xm] POD map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-gn4xm] POD
[must-gather-gn4xm] POD
[must-gather-gn4xm] POD collecting snapshot info for ceph subvolumes
[must-gather-gn4xm] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-gn4xm] POD template was:
[must-gather-gn4xm] POD {range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-gn4xm] POD object given to jsonpath engine was:
[must-gather-gn4xm] POD map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-gn4xm] POD
[must-gather-gn4xm] POD
[must-gather-gn4xm] POD collecting command output for: ceph-volume lvm list
[must-gather-gn4xm] POD No resources found in openshift-storage namespace.
[must-gather-gn4xm] POD collecting command output for: ceph-volume raw list
[must-gather-gn4xm] POD No resources found in openshift-storage namespace.
[must-gather-gn4xm] POD No resources found
[must-gather-gn4xm] POD error: the path "pod_helper.yaml" does not exist
[must-gather-gn4xm] POD Error from server (NotFound): pods "must-gather-gn4xm-helper" not found
[must-gather-gn4xm] POD No resources found
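The repeated jsonpath failures above occur because a `{range .items[*]}…{end}` template is executed against an empty item list, which `oc` reports as "not in range, nothing to end". A hedged sketch of one way the script could avoid the noisy error (the `list_names` helper and the `pvc` resource kind are illustrative assumptions): use `oc get -o name`, which prints nothing for an empty list instead of failing.

```shell
#!/bin/bash
# Hypothetical sketch: enumerate resource names without triggering the
# jsonpath "not in range, nothing to end" error on empty results.

list_names() {
    # `oc get -o name` prints one "<kind>/<name>" per line and prints
    # nothing at all for an empty list, so the caller's loop is skipped.
    oc get "$1" -n openshift-storage -o name 2>/dev/null
}

for vol in $(list_names pvc); do
    echo "collecting snapshot info for ${vol}"
done
```

With an empty namespace the loop body simply never runs, which is quieter than the template-debug dump shown in the logs above.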
(In reply to Neha Berry from comment #0)
> Created attachment 1725680 [details]
> terminal log
>
> Steps to Reproduce:
> --------------------------
> 1. Create a situation that the must-gather helper pod doesn't come up, e.g.
> cordon node the moment the helper pod would be created or for this
> particular case - delete the storagecluster to initiate uninstall
> 2. Start ocs must-gather, when storagecluster is deleted but namespace and
> other resources still exist
> 3. Check the terminal log collection and confirm that there are a few errors
> thrown when it tries to collect ceph commands (ceph is already deleted)

One more way to reproduce this is:
a) Install the OCS operator but do not create a StorageCluster.
b) Initiate must-gather collection.

We will see the error messages and the helper pod retries, followed by attempts to collect ceph outputs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041