Bug 2035774 - Must Gather, Ceph files do not exist on MG directory
Summary: Must Gather, Ceph files do not exist on MG directory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: must-gather
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.10.0
Assignee: Subham Rai
QA Contact: Oded
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-27 13:48 UTC by Oded
Modified: 2023-08-09 16:35 UTC
CC List: 10 users

Fixed In Version: 4.10.0-113
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-13 18:50:46 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage ocs-operator pull 1451 0 None open must-gather: stop using tini in must gather helper pod 2022-01-17 04:16:37 UTC
Red Hat Product Errata RHSA-2022:1372 0 None None None 2022-04-13 18:53:23 UTC

Description Oded 2021-12-27 13:48:04 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
Must Gather: Ceph files do not exist in the MG directory.

Version of all relevant components (if applicable):
OCP Version:4.10.0-0.nightly-2021-12-23-153012
ODF Version:full_version=4.10.0-50
Platform: Vmware
ceph versions:
sh-4.4$ ceph versions
{
    "mon": {
        "ceph version 16.2.7-8.el8cp (342facd49bf8e908c5105a56bf7e7e6041643258) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.7-8.el8cp (342facd49bf8e908c5105a56bf7e7e6041643258) pacific (stable)": 1
    },
    "osd": {
        "ceph version 16.2.7-8.el8cp (342facd49bf8e908c5105a56bf7e7e6041643258) pacific (stable)": 3
    },
    "mds": {
        "ceph version 16.2.7-8.el8cp (342facd49bf8e908c5105a56bf7e7e6041643258) pacific (stable)": 2
    },
    "rgw": {
        "ceph version 16.2.7-8.el8cp (342facd49bf8e908c5105a56bf7e7e6041643258) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.7-8.el8cp (342facd49bf8e908c5105a56bf7e7e6041643258) pacific (stable)": 10
    }
}

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Run the mg command:
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.10

2. Check the content of the mg dir (see the verification sketch after these steps):
E           Exception: Files don't exist:
E           ['ceph_auth_list_--format_json-pretty', 'ceph_balancer_pool_ls_--format_json-pretty', 'ceph_balancer_status_--format_json-pretty', 'ceph_config-key_ls_--format_json-pretty', 'ceph_config_dump_--format_json-pretty', 'ceph_crash_ls_--format_json-pretty', 'ceph_crash_stat_--format_json-pretty', 'ceph_device_ls_--format_json-pretty', 'ceph_fs_dump_--format_json-pretty', 'ceph_fs_ls_--format_json-pretty', 'ceph_fs_status_--format_json-pretty', 'ceph_fs_subvolumegroup_ls_ocs-storagecluster-cephfilesystem_--format_json-pretty', 'ceph_health_detail_--format_json-pretty', 'ceph_mds_stat_--format_json-pretty', 'ceph_mgr_dump_--format_json-pretty', 'ceph_mgr_module_ls_--format_json-pretty', 'ceph_mgr_services_--format_json-pretty', 'ceph_mon_dump_--format_json-pretty', 'ceph_mon_stat_--format_json-pretty', 'ceph_osd_blacklist_ls_--format_json-pretty', 'ceph_osd_blocked-by_--format_json-pretty', 'ceph_osd_crush_class_ls_--format_json-pretty', 'ceph_osd_crush_dump_--format_json-pretty', 'ceph_osd_crush_rule_dump_--format_json-pretty', 'ceph_osd_crush_rule_ls_--format_json-pretty', 'ceph_osd_crush_show-tunables_--format_json-pretty', 'ceph_osd_crush_weight-set_dump_--format_json-pretty', 'ceph_osd_crush_weight-set_ls_--format_json-pretty', 'ceph_osd_df_--format_json-pretty', 'ceph_osd_df_tree_--format_json-pretty', 'ceph_osd_dump_--format_json-pretty', 'ceph_osd_getmaxosd_--format_json-pretty', 'ceph_osd_lspools_--format_json-pretty', 'ceph_osd_numa-status_--format_json-pretty', 'ceph_osd_perf_--format_json-pretty', 'ceph_osd_pool_ls_detail_--format_json-pretty', 'ceph_osd_stat_--format_json-pretty', 'ceph_osd_tree_--format_json-pretty', 'ceph_osd_utilization_--format_json-pretty', 'ceph_pg_dump_--format_json-pretty', 'ceph_pg_stat_--format_json-pretty', 'ceph_progress_--format_json-pretty', 'ceph_progress_json', 'ceph_progress_json_--format_json-pretty', 'ceph_quorum_status_--format_json-pretty', 'ceph_report_--format_json-pretty', 'ceph_service_dump_--format_json-pretty', 'ceph_status_--format_json-pretty', 'ceph_time-sync-status_--format_json-pretty', 'ceph_versions_--format_json-pretty', 'ceph_df_detail_--format_json-pretty']

E           Exception: Files don't exist:
E           ['ceph-volume_raw_list', 'ceph_auth_list', 'ceph_balancer_status', 'ceph_config-key_ls', 'ceph_config_dump', 'ceph_crash_stat', 'ceph_device_ls', 'ceph_fs_dump', 'ceph_fs_ls', 'ceph_fs_status', 'ceph_fs_subvolumegroup_ls_ocs-storagecluster-cephfilesystem', 'ceph_health_detail', 'ceph_mds_stat', 'ceph_mgr_dump', 'ceph_mgr_module_ls', 'ceph_mgr_services', 'ceph_mon_dump', 'ceph_mon_stat', 'ceph_osd_blocked-by', 'ceph_osd_crush_class_ls', 'ceph_osd_crush_dump', 'ceph_osd_crush_rule_dump', 'ceph_osd_crush_rule_ls', 'ceph_osd_crush_show-tunables', 'ceph_osd_crush_weight-set_dump', 'ceph_osd_df', 'ceph_osd_df_tree', 'ceph_osd_dump', 'ceph_osd_getmaxosd', 'ceph_osd_lspools', 'ceph_osd_numa-status', 'ceph_osd_perf', 'ceph_osd_pool_ls_detail', 'ceph_osd_stat', 'ceph_osd_tree', 'ceph_osd_utilization', 'ceph_pg_dump', 'ceph_pg_stat', 'ceph_quorum_status', 'ceph_report', 'ceph_service_dump', 'ceph_status', 'ceph_time-sync-status', 'ceph_versions', 'ceph_df_detail']

E           Exception: Files don't exist:
E           ['pools_rbd_ocs-storagecluster-cephblockpool']


3. Check gather-debug.log:
collecting prepare volume logs from node compute-2 
ceph core dump collection completed
***skipping the ceph collection********
total time taken by collection was 319 seconds 

4. Check the helper pod status:
namespaces/openshift-storage/oc_output/pods_-owide:
must-gather-jh7kt-helper                 0/1     CreateContainerError   0          4m30s   10.128.2.43   compute-0   <none>           <none>


mg:
http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-2035774
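
(Editorial sketch, not part of the original report.) A minimal shell check for the missing files, assuming the Ceph command outputs land in a single directory inside the must-gather output; the layout and file list are illustrative assumptions only.
```
#!/usr/bin/env bash
# Check that a few of the expected Ceph command outputs exist in the
# must-gather output. Pass the directory that should contain them
# (assumed layout; adjust to the actual ocs-must-gather tree).
MG_DIR="$1"
for f in ceph_status ceph_health_detail ceph_osd_tree ceph_versions ceph_df_detail; do
  [ -e "${MG_DIR}/${f}" ] || echo "missing: ${f}"
done
```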

Actual results:


Expected results:


Additional info:

Comment 2 Mudit Agarwal 2022-01-12 10:51:31 UTC
The helper pod is failing to create its container.
If the helper pod is not up, then the missing Ceph files are expected.

The reason why the must-gather-helper pod is not up:

 Warning  Failed          4m30s (x2 over 4m30s)   kubelet            Error: container create failed: time="2021-12-27T13:19:21Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          4m29s                   kubelet            Error: container create failed: time="2021-12-27T13:19:22Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          4m14s                   kubelet            Error: container create failed: time="2021-12-27T13:19:37Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"

Sebastien, must-gather is still using "tini": https://github.com/red-hat-storage/ocs-operator/blob/d38316a811f30bffb3ce535bc6dda4ab5ee1dc3b/must-gather/templates/pod.template#L18
We already removed it via https://github.com/red-hat-storage/ocs-operator/pull/1406 for the toolbox pod; we need to do the same for the must-gather pod.
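
(Editorial sketch.) Roughly, the change implied by the linked PR is to drop the /tini entrypoint from the helper pod template; the replacement command below is an assumption that mirrors the toolbox-pod fix, since only the removal of tini is confirmed by the PR title.
```
# must-gather/templates/pod.template (sketch, not the actual patch)
containers:
  - name: must-gather-helper
    # before: started via tini, which no longer exists in the rook-ceph operator image
    #   command: ["/tini"]
    #   args: ["-g", "--", "/usr/local/bin/toolbox.sh"]
    # after (assumed, mirroring PR 1406 for the toolbox pod):
    command: ["/usr/local/bin/toolbox.sh"]
```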

Comment 3 Sébastien Han 2022-01-12 13:50:08 UTC
Indeed, Subham PTAL.

Comment 4 Subham Rai 2022-01-12 15:02:44 UTC
The main branch is still using tini: https://github.com/red-hat-storage/ocs-operator/blob/main/must-gather/templates/pod.template#L18. I'll make the changes. Assigning this to myself.

Comment 7 Oded 2022-01-24 19:34:46 UTC
Bug reproduced: the must-gather-helper pod is stuck in the CreateContainerError state.

SetUp:
OCP Version:4.10.0-0.nightly-2022-01-24-020644
ODF Version:full_version=4.10.0-115
Platform: Vmware
Ceph versions:
sh-4.4$ ceph versions
{
    "mon": {
        "ceph version 16.2.7-32.el8cp (34a1b8b0c674a15f06e190b3f9c91ab84fd79cc6) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.7-32.el8cp (34a1b8b0c674a15f06e190b3f9c91ab84fd79cc6) pacific (stable)": 1
    },
    "osd": {
        "ceph version 16.2.7-32.el8cp (34a1b8b0c674a15f06e190b3f9c91ab84fd79cc6) pacific (stable)": 3
    },
    "mds": {
        "ceph version 16.2.7-32.el8cp (34a1b8b0c674a15f06e190b3f9c91ab84fd79cc6) pacific (stable)": 2
    },
    "rgw": {
        "ceph version 16.2.7-32.el8cp (34a1b8b0c674a15f06e190b3f9c91ab84fd79cc6) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.7-32.el8cp (34a1b8b0c674a15f06e190b3f9c91ab84fd79cc6) pacific (stable)": 10
    }
}

Test Process:
1. Run the MG command:
$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.10

2. Check the mg-helper pod status:
$ oc get pods | grep helpe
must-gather-m7s2j-helper                                          0/1     CreateContainerError   0          2m13s

3. Check the mg dir content:
Files do not exist:
['ceph-volume_raw_list', 'ceph_auth_list', 'ceph_balancer_status', 'ceph_config-key_ls', 'ceph_config_dump', 'ceph_crash_stat', 'ceph_device_ls', 'ceph_fs_dump', 'ceph_fs_ls', 'ceph_fs_status', 'ceph_fs_subvolumegroup_ls_ocs-storagecluster-cephfilesystem', 'ceph_health_detail', 'ceph_mds_stat', 'ceph_mgr_dump', 'ceph_mgr_module_ls', 'ceph_mgr_services', 'ceph_mon_dump', 'ceph_mon_stat', 'ceph_osd_blocked-by', 'ceph_osd_crush_class_ls', 'ceph_osd_crush_dump', 'ceph_osd_crush_rule_dump', 'ceph_osd_crush_rule_ls', 'ceph_osd_crush_show-tunables', 'ceph_osd_crush_weight-set_dump', 'ceph_osd_df', 'ceph_osd_df_tree', 'ceph_osd_dump', 'ceph_osd_getmaxosd', 'ceph_osd_lspools', 'ceph_osd_numa-status', 'ceph_osd_perf', 'ceph_osd_pool_ls_detail', 'ceph_osd_stat', 'ceph_osd_tree', 'ceph_osd_utilization', 'ceph_pg_dump', 'ceph_pg_stat', 'ceph_quorum_status', 'ceph_report', 'ceph_service_dump', 'ceph_status', 'ceph_time-sync-status', 'ceph_versions', 'ceph_df_detail']

['ceph_auth_list_--format_json-pretty', 'ceph_balancer_pool_ls_--format_json-pretty', 'ceph_balancer_status_--format_json-pretty', 'ceph_config-key_ls_--format_json-pretty', 'ceph_config_dump_--format_json-pretty', 'ceph_crash_ls_--format_json-pretty', 'ceph_crash_stat_--format_json-pretty', 'ceph_device_ls_--format_json-pretty', 'ceph_fs_dump_--format_json-pretty', 'ceph_fs_ls_--format_json-pretty', 'ceph_fs_status_--format_json-pretty', 'ceph_fs_subvolumegroup_ls_ocs-storagecluster-cephfilesystem_--format_json-pretty', 'ceph_health_detail_--format_json-pretty', 'ceph_mds_stat_--format_json-pretty', 'ceph_mgr_dump_--format_json-pretty', 'ceph_mgr_module_ls_--format_json-pretty', 'ceph_mgr_services_--format_json-pretty', 'ceph_mon_dump_--format_json-pretty', 'ceph_mon_stat_--format_json-pretty', 'ceph_osd_blacklist_ls_--format_json-pretty', 'ceph_osd_blocked-by_--format_json-pretty', 'ceph_osd_crush_class_ls_--format_json-pretty', 'ceph_osd_crush_dump_--format_json-pretty', 'ceph_osd_crush_rule_dump_--format_json-pretty', 'ceph_osd_crush_rule_ls_--format_json-pretty', 'ceph_osd_crush_show-tunables_--format_json-pretty', 'ceph_osd_crush_weight-set_dump_--format_json-pretty', 'ceph_osd_crush_weight-set_ls_--format_json-pretty', 'ceph_osd_df_--format_json-pretty', 'ceph_osd_df_tree_--format_json-pretty', 'ceph_osd_dump_--format_json-pretty', 'ceph_osd_getmaxosd_--format_json-pretty', 'ceph_osd_lspools_--format_json-pretty', 'ceph_osd_numa-status_--format_json-pretty', 'ceph_osd_perf_--format_json-pretty', 'ceph_osd_pool_ls_detail_--format_json-pretty', 'ceph_osd_stat_--format_json-pretty', 'ceph_osd_tree_--format_json-pretty', 'ceph_osd_utilization_--format_json-pretty', 'ceph_pg_dump_--format_json-pretty', 'ceph_pg_stat_--format_json-pretty', 'ceph_progress_--format_json-pretty', 'ceph_progress_json', 'ceph_progress_json_--format_json-pretty', 'ceph_quorum_status_--format_json-pretty', 'ceph_report_--format_json-pretty', 'ceph_service_dump_--format_json-pretty', 'ceph_status_--format_json-pretty', 'ceph_time-sync-status_--format_json-pretty', 'ceph_versions_--format_json-pretty', 'ceph_df_detail_--format_json-pretty']

Comment 9 Mudit Agarwal 2022-01-25 05:05:32 UTC
Oded, can you help with the describe output of the helper pod or the complete must-gather?

Comment 10 Oded 2022-01-25 13:14:20 UTC
MG:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/dnd-004ai1c33-s/dnd-004ai1c33-s_20220124T035144/logs/failed_testcase_ocs_logs_1642999729/test_multiple_pvc_creation_deletion_scale%5bReadWriteMany-cephfs%5d_ocs_logs/ocs_must_gather/

Describe helper pod:
Events:
  Type     Reason          Age                     From               Message
  ----     ------          ----                    ----               -------
  Normal   Scheduled       4m52s                   default-scheduler  Successfully assigned openshift-storage/must-gather-f5nh9-helper to compute-0
  Normal   AddedInterface  4m50s                   multus             Add eth0 [10.131.0.191/23] from openshift-sdn
  Warning  Failed          4m50s                   kubelet            Error: container create failed: time="2022-01-25T13:08:09Z" level=error msg="runc create failed: unable to start container process: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          4m49s                   kubelet            Error: container create failed: time="2022-01-25T13:08:10Z" level=error msg="runc create failed: unable to start container process: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          4m48s                   kubelet            Error: container create failed: time="2022-01-25T13:08:11Z" level=error msg="runc create failed: unable to start container process: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          4m34s                   kubelet            Error: container create failed: time="2022-01-25T13:08:25Z" level=error msg="runc create failed: unable to start container process: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          4m23s                   kubelet            Error: container create failed: time="2022-01-25T13:08:36Z" level=error msg="runc create failed: unable to start container process: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          4m8s                    kubelet            Error: container create failed: time="2022-01-25T13:08:51Z" level=error msg="runc create failed: unable to start container process: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          3m53s                   kubelet            Error: container create failed: time="2022-01-25T13:09:06Z" level=error msg="runc create failed: unable to start container process: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          3m39s                   kubelet            Error: container create failed: time="2022-01-25T13:09:20Z" level=error msg="runc create failed: unable to start container process: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          3m24s                   kubelet            Error: container create failed: time="2022-01-25T13:09:35Z" level=error msg="runc create failed: unable to start container process: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          2m42s (x3 over 3m9s)    kubelet            (combined from similar events): Error: container create failed: time="2022-01-25T13:10:17Z" level=error msg="runc create failed: unable to start container process: exec: \"/tini\": stat /tini: no such file or directory"
  Normal   Pulled          2m27s (x13 over 4m50s)  kubelet            Container image "quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:deffe459757e10072fdec52c73534af903fa2815370d20bb777d3dd8a074e166" already present on machine


Comment 12 Mudit Agarwal 2022-01-25 13:54:35 UTC
It looks like it is using the last saved configuration:

http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/dnd-004ai1c33-s/dnd-004ai1c33-s_20220124T035144/logs/failed_testcase_ocs_logs_1642999729/test_multiple_pvc_creation_deletion_scale%5bReadWriteMany-cephfs%5d_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-128c52cbe4a2f7fe58ed16ea2a2de72a79534add1c7bac6f769397f62b5ab165/namespaces/openshift-storage/pods/must-gather-zswsq-helper/must-gather-zswsq-helper.yaml 

 kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"must-gather-zswsq-helper","namespace":"openshift-storage"},"spec":{"containers":[{"args":["-g","--","/usr/local/bin/toolbox.sh"],"command":["/tini"],"env":[{"name":"ROOK_CEPH_USERNAME","valueFrom":{"secretKeyRef":{"key":"ceph-username","name":"rook-ceph-mon"}}},{"name":"ROOK_CEPH_SECRET","valueFrom":{"secretKeyRef":{"key":"ceph-secret","name":"rook-ceph-mon"}}}],"image":"quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:deffe459757e10072fdec52c73534af903fa2815370d20bb777d3dd8a074e166","imagePullPolicy":"IfNotPresent","name":"must-gather-helper","securityContext":{"privileged":true},"volumeMounts":[{"mountPath":"/dev","name":"dev"},{"mountPath":"/sys/bus","name":"sysbus"},{"mountPath":"/lib/modules","name":"libmodules"},{"mountPath":"/etc/rook","name":"mon-endpoint-volume"}]}],"tolerations":[{"effect":"NoSchedule","key":"node.ocs.openshift.io/storage","operator":"Equal","value":"true"}],"volumes":[{"hostPath":{"path":"/dev"},"name":"dev"},{"hostPath":{"path":"/sys/bus"},"name":"sysbus"},{"hostPath":{"path":"/lib/modules"},"name":"libmodules"},{"configMap":{"items":[{"key":"data","path":"mon-endpoints"}],"name":"rook-ceph-mon-endpoints"},"name":"mon-endpoint-volume"}]}}

How do we make sure it picks up the latest configuration?

Comment 13 Subham Rai 2022-01-25 13:57:57 UTC
(In reply to Mudit Agarwal from comment #12)
> Looks like it is taking that last saved configuration

Right, it looks like it is picking up the old one:
```
containers:
  - args:
    - -g
    - --
    - /usr/local/bin/toolbox.sh
    command:
    - /tini
    env:
```

My PR removed tini.

Comment 14 Subham Rai 2022-01-25 14:13:04 UTC
Try removing the quay.io/rhceph-dev/ocs-must-gather:latest-4.10 image and testing again to make sure it is picking up the latest build. Thanks.
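
(Editorial sketch.) One way to clear the cached image on the node so the next helper pod pulls a fresh copy, assuming access to the node via oc debug; the node name and image reference are taken from this bug as examples, not a prescription.
```
# List cached images on the node that last ran the helper pod, then remove
# the stale one so the next helper pod pull comes from quay.
oc debug node/compute-0 -- chroot /host crictl images | grep -E 'rook-ceph|must-gather'
oc debug node/compute-0 -- chroot /host crictl rmi \
  quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:deffe459757e10072fdec52c73534af903fa2815370d20bb777d3dd8a074e166
```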

Comment 15 Oded 2022-01-30 12:02:47 UTC
Bug not fixed

Setup:
OCP version:4.10.0-0.nightly-2022-01-29-215708
ODF Version:4.10.0-128
Provider:Vmware


Test Process:
1. Run MG:
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.10 --dest-dir=/tmp/tmpbnvcqsja_ocs_logs/ocs_must_gather

2. Check gather-debug.log:
waiting for 1436 1437 1450 1451 1476 1477 to terminate
collecting crash core dump from node compute-2 
collecting prepare volume logs from node compute-2 
ceph core dump collection completed
skipping the ceph collection

3. Check the MG content:
Exception: Files don't exist:
['ceph_auth_list', 'ceph_balancer_status', 'ceph_config-key_ls', 'ceph_config_dump', 'ceph_crash_stat', 'ceph_device_ls', 'ceph_fs_dump', 'ceph_fs_ls', 'ceph_fs_status', 'ceph_fs_subvolumegroup_ls_ocs-storagecluster-cephfilesystem', 'ceph_health_detail', 'ceph_mds_stat', 'ceph_mgr_dump', 'ceph_mgr_module_ls', 'ceph_mgr_services', 'ceph_mon_dump', 'ceph_mon_stat', 'ceph_osd_blocked-by', 'ceph_osd_crush_class_ls', 'ceph_osd_crush_dump', 'ceph_osd_crush_rule_dump', 'ceph_osd_crush_rule_ls', 'ceph_osd_crush_show-tunables', 'ceph_osd_crush_weight-set_dump', 'ceph_osd_df', 'ceph_osd_df_tree', 'ceph_osd_dump', 'ceph_osd_getmaxosd', 'ceph_osd_lspools', 'ceph_osd_numa-status', 'ceph_osd_perf', 'ceph_osd_pool_ls_detail', 'ceph_osd_stat', 'ceph_osd_tree', 'ceph_osd_utilization', 'ceph_pg_dump', 'ceph_pg_stat', 'ceph_quorum_status', 'ceph_report', 'ceph_service_dump', 'ceph_status', 'ceph_time-sync-status', 'ceph_versions', 'ceph_df_detail']
['ceph_auth_list_--format_json-pretty', 'ceph_balancer_pool_ls_--format_json-pretty', 'ceph_balancer_status_--format_json-pretty', 'ceph_config-key_ls_--format_json-pretty', 'ceph_config_dump_--format_json-pretty', 'ceph_crash_ls_--format_json-pretty', 'ceph_crash_stat_--format_json-pretty', 'ceph_device_ls_--format_json-pretty', 'ceph_fs_dump_--format_json-pretty', 'ceph_fs_ls_--format_json-pretty', 'ceph_fs_status_--format_json-pretty', 'ceph_fs_subvolumegroup_ls_ocs-storagecluster-cephfilesystem_--format_json-pretty', 'ceph_health_detail_--format_json-pretty', 'ceph_mds_stat_--format_json-pretty', 'ceph_mgr_dump_--format_json-pretty', 'ceph_mgr_module_ls_--format_json-pretty', 'ceph_mgr_services_--format_json-pretty', 'ceph_mon_dump_--format_json-pretty', 'ceph_mon_stat_--format_json-pretty', 'ceph_osd_blacklist_ls_--format_json-pretty', 'ceph_osd_blocked-by_--format_json-pretty', 'ceph_osd_crush_class_ls_--format_json-pretty', 'ceph_osd_crush_dump_--format_json-pretty', 'ceph_osd_crush_rule_dump_--format_json-pretty', 'ceph_osd_crush_rule_ls_--format_json-pretty', 'ceph_osd_crush_show-tunables_--format_json-pretty', 'ceph_osd_crush_weight-set_dump_--format_json-pretty', 'ceph_osd_crush_weight-set_ls_--format_json-pretty', 'ceph_osd_df_--format_json-pretty', 'ceph_osd_df_tree_--format_json-pretty', 'ceph_osd_dump_--format_json-pretty', 'ceph_osd_getmaxosd_--format_json-pretty', 'ceph_osd_lspools_--format_json-pretty', 'ceph_osd_numa-status_--format_json-pretty', 'ceph_osd_perf_--format_json-pretty', 'ceph_osd_pool_ls_detail_--format_json-pretty', 'ceph_osd_stat_--format_json-pretty', 'ceph_osd_tree_--format_json-pretty', 'ceph_osd_utilization_--format_json-pretty', 'ceph_pg_dump_--format_json-pretty', 'ceph_pg_stat_--format_json-pretty', 'ceph_progress_--format_json-pretty', 'ceph_progress_json', 'ceph_progress_json_--format_json-pretty', 'ceph_quorum_status_--format_json-pretty', 'ceph_report_--format_json-pretty', 'ceph_service_dump_--format_json-pretty', 'ceph_status_--format_json-pretty', 'ceph_time-sync-status_--format_json-pretty', 'ceph_versions_--format_json-pretty', 'ceph_df_detail_--format_json-pretty']

4. Get the MG helper pod status:
$ oc get pods | grep helper
must-gather-7s4xl-helper                                          0/1     CreateContainerError   0          10s

Events:
  Type     Reason          Age                   From               Message
  ----     ------          ----                  ----               -------
  Normal   Scheduled       3m20s                 default-scheduler  Successfully assigned openshift-storage/must-gather-7s4xl-helper to compute-0
  Normal   AddedInterface  3m18s                 multus             Add eth0 [10.129.2.49/23] from openshift-sdn
  Warning  Failed          3m18s                 kubelet            Error: container create failed: time="2022-01-30T11:55:34Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          3m17s                 kubelet            Error: container create failed: time="2022-01-30T11:55:35Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          3m3s                  kubelet            Error: container create failed: time="2022-01-30T11:55:49Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          2m50s                 kubelet            Error: container create failed: time="2022-01-30T11:56:02Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          2m35s                 kubelet            Error: container create failed: time="2022-01-30T11:56:17Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          2m20s                 kubelet            Error: container create failed: time="2022-01-30T11:56:32Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          2m5s                  kubelet            Error: container create failed: time="2022-01-30T11:56:47Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          114s                  kubelet            Error: container create failed: time="2022-01-30T11:56:58Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          103s                  kubelet            Error: container create failed: time="2022-01-30T11:57:09Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Warning  Failed          60s (x3 over 88s)     kubelet            (combined from similar events): Error: container create failed: time="2022-01-30T11:57:52Z" level=error msg="container_linux.go:380: starting container process caused: exec: \"/tini\": stat /tini: no such file or directory"
  Normal   Pulled          47s (x13 over 3m18s)  kubelet            Container image "quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:553b332e4ae53869f99593621d5e25c889f1907cc7babb11dca6ad61701499c5" already present on machine
  
  
MG dir:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-034ai3c33-s/j-034ai3c33-s_20220126T021925/logs/failed_testcase_ocs_logs_1643167244/test_multiple_pvc_creation_deletion_scale%5bReadWriteMany-CephBlockPool%5d_ocs_logs/ocs_must_gather/

Comment 16 Mudit Agarwal 2022-01-30 16:33:42 UTC
Did you follow the instructions provided in https://bugzilla.redhat.com/show_bug.cgi?id=2035774#c14?
Did you remove the must-gather image from your system and confirm that it is pulling it from quay?

I guess not; it looks like it used the image already present on the machine. See this message:

>> Container image "quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:553b332e4ae53869f99593621d5e25c889f1907cc7babb11dca6ad61701499c5" already present on machine

Please remove the must-gather image from your system, make sure that the helper pod pulls it from the quay repo when it is created, and then, if you still see the issue, move the bug back to ASSIGNED.
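
(Editorial sketch.) To confirm whether the helper pod actually pulled a fresh build rather than reusing the cached image, compare the digest it is running against the one published on quay; the pod name below is the example from this bug.
```
# Digest the helper pod is actually running
oc -n openshift-storage get pod must-gather-7s4xl-helper \
  -o jsonpath='{.status.containerStatuses[0].imageID}{"\n"}'
```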

Comment 17 Oded 2022-01-31 09:11:14 UTC
The ocs-must-gather:latest-4.10 image was last modified 2 months ago:
https://quay.io/repository/rhceph-dev/ocs-must-gather?tab=tags

We need to update the image.
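
(Editorial sketch.) The tag's build date can be checked without pulling the image, for example with skopeo; the Created and Digest fields are standard skopeo inspect output.
```
skopeo inspect docker://quay.io/rhceph-dev/ocs-must-gather:latest-4.10 | jq '{Created, Digest}'
```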

Comment 18 Oded 2022-02-06 15:40:06 UTC
Bug fixed.

SetUp:
ODF Version:4.10.0-143
OCP Version:4.10.0-0.nightly-2022-02-02-220834
Platform:AWS

Test Process:
1. Run the mg command:
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.10

2. Check the content of the mg dir: the previously missing Ceph files are now collected.

Comment 24 errata-xmlrpc 2022-04-13 18:50:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372


