Bug 2209695 - When collecting Must-gather logs shows /usr/bin/gather_ceph_resources: line 341: jq: command not found
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: must-gather
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.13.0
Assignee: yati padia
QA Contact: Pratik Surve
URL:
Whiteboard:
Depends On:
Blocks: 2210475
Reported: 2023-05-24 14:25 UTC by Pratik Surve
Modified: 2023-08-09 16:35 UTC (History)

Fixed In Version: 4.13.0-207
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2210475
Environment:
Last Closed: 2023-06-21 15:25:39 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | red-hat-storage odf-must-gather pull 25 | 0 | None | open | cleanup to resolve jq: command not found error | 2023-05-25 14:59:06 UTC
Github | red-hat-storage odf-must-gather pull 26 | 0 | None | Merged | Bug 2209695: [release-4.13] cleanup to resolve jq: command not found error | 2023-05-25 15:28:42 UTC
Red Hat Product Errata | RHBA-2023:3742 | 0 | None | None | None | 2023-06-21 15:25:52 UTC

Description Pratik Surve 2023-05-24 14:25:54 UTC
Description of problem (please be detailed as possible and provide log
snippets):

Collecting must-gather logs prints the error: /usr/bin/gather_ceph_resources: line 341: jq: command not found



Version of all relevant components (if applicable):

OCP version:- 4.13.0-0.nightly-2023-05-22-040653
ODF version:- 4.13.0-203
CEPH version:- ceph version 17.2.6-50.el9cp (c202ddb5589554af0ce43432ff07cd7ce8f35243) quincy (stable)
ACM version:- 2.8.0-180
SUBMARINER version:- v0.15.0
VOLSYNC version:- volsync-product.v0.7.1

oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather@sha256:10071ddc29383af01d60eadfa4d6f2bd631cfd4c06fcdf7efdb655a84b13a4f1


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Run must-gather on an ODF cluster.


Actual results:

[must-gather-gctbc] POD 2023-05-24T14:10:31.584612326Z collecting snapshot info for cephFS subvolumes 
[must-gather-gctbc] POD 2023-05-24T14:10:31.586366724Z /usr/bin/gather_ceph_resources: line 341: jq: command not found


Expected results:


Additional info:

I tried creating a pod from the must-gather image referenced above, and the jq package is not present in it:

# jq -r
bash: jq: command not found

Comment 5 Mudit Agarwal 2023-05-25 08:26:52 UTC
Since OCS 4.8, "jq" was added to the downstream via build, see http://pkgs.devel.redhat.com/cgit/containers/rook-ceph/commit/?h=ocs-4.8-rhel-8

Boris, has something changed in 4.13?

Comment 6 Boris Ranto 2023-05-25 10:23:16 UTC
I don't see any change here around jq really.

I tried comparing rook-ceph 4.13 and 4.9, both have jq binary in them, same version (1.6) and in the exact same location (/usr/bin/jq).

I can confirm that there was no jq binary in ocs-must-gather in e.g. ODF 4.9 either so no change there either.

My guess would be that the script is not running the jq binary in the rook-ceph pod anymore for some reason? It could be somehow related to the rhceph image using ubi-minimal as a base nowadays maybe?

Comment 7 Boris Ranto 2023-05-25 12:10:16 UTC
I was looking at the script and it looks like I'm right. The error is coming from this line:

            subvolgrp_names=$(timeout 120 oc -n "${ns}" exec "${HOSTNAME}"-helper -- bash -c "${ceph_command}"| jq --raw-output '.[].name')

and the quoting is wrong there: the pipe sits outside the quoted command string, so jq is run in the must-gather container, where it is not available. The line should look like this instead:

            subvolgrp_names=$(timeout 120 oc -n "${ns}" exec "${HOSTNAME}"-helper -- bash -c "${ceph_command} | jq --raw-output '.[].name'")
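
The difference between the two lines can be reproduced without a cluster. The following is only a minimal sketch under stated assumptions: `run_remote` is a hypothetical stand-in for `oc exec ... -- bash -c`, and `fake_jq_upcase` is a made-up filter that plays the role of jq by existing only on the "remote" PATH. Neither name comes from odf-must-gather.

```shell
#!/usr/bin/env bash
# Illustrative only: run_remote and fake_jq_upcase are hypothetical stand-ins,
# not part of odf-must-gather or the oc CLI.

# Create a filter that exists only on the "remote" PATH.
remote_dir=$(mktemp -d)
trap 'rm -rf "${remote_dir}"' EXIT
cat > "${remote_dir}/fake_jq_upcase" <<'EOF'
#!/bin/sh
tr 'a-z' 'A-Z'
EOF
chmod +x "${remote_dir}/fake_jq_upcase"

# Stand-in for `oc exec <pod> -- bash -c "<command string>"`:
# the command string runs in a child shell that can see the extra tool.
run_remote() {
    PATH="${remote_dir}:${PATH}" bash -c "$1"
}

cmd='echo hello'

# Pipe outside the quotes: the calling shell has to find the filter itself and fails.
broken=$( { run_remote "${cmd}" | fake_jq_upcase; } 2>/dev/null )
broken_status=$?

# Pipe inside the quotes: the whole pipeline runs where the filter is installed.
fixed=$(run_remote "${cmd} | fake_jq_upcase")

echo "broken: '${broken}' (exit ${broken_status})"
echo "fixed:  '${fixed}'"
```

In the first form the shell splits the pipeline before anything is handed to the remote side, which matches the "command not found" message coming from the must-gather container rather than from the helper pod.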

Comment 8 Mudit Agarwal 2023-05-25 14:27:38 UTC
Thanks Boris!

Yati, please send a patch asap.

Comment 9 yati padia 2023-05-25 15:00:30 UTC
Added the link to the patch, will update once merged.

Comment 10 Mudit Agarwal 2023-05-25 15:24:41 UTC
This has existed since 4.12 (commit https://github.com/red-hat-storage/ocs-operator/commit/b58ba9b8a8d6f5220842e44c210a6b42f2a6466a)

Yati, please clone this bug to 4.12 also. We need to fix it there as well, I don't know why this was never discovered.

Comment 11 yati padia 2023-05-27 18:46:07 UTC
Yeah sure, will do that.

Comment 17 errata-xmlrpc 2023-06-21 15:25:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742

