Bug 2209695

Summary: When collecting Must-gather logs shows /usr/bin/gather_ceph_resources: line 341: jq: command not found
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Pratik Surve <prsurve>
Component: must-gather
Assignee: yati padia <ypadia>
Status: CLOSED ERRATA
QA Contact: Pratik Surve <prsurve>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.13
CC: branto, muagarwa, ocs-bugs, odf-bz-bot, ypadia
Target Milestone: ---
Target Release: ODF 4.13.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.13.0-207
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2210475 (view as bug list)
Environment:
Last Closed: 2023-06-21 15:25:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2210475

Description Pratik Surve 2023-05-24 14:25:54 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When collecting must-gather logs, the output shows /usr/bin/gather_ceph_resources: line 341: jq: command not found



Version of all relevant components (if applicable):

OCP version:- 4.13.0-0.nightly-2023-05-22-040653
ODF version:- 4.13.0-203
CEPH version:- ceph version 17.2.6-50.el9cp (c202ddb5589554af0ce43432ff07cd7ce8f35243) quincy (stable)
ACM version:- 2.8.0-180
SUBMARINER version:- v0.15.0
VOLSYNC version:- volsync-product.v0.7.1

oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather@sha256:10071ddc29383af01d60eadfa4d6f2bd631cfd4c06fcdf7efdb655a84b13a4f1


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Run must-gather on an ODF cluster


Actual results:

[must-gather-gctbc] POD 2023-05-24T14:10:31.584612326Z collecting snapshot info for cephFS subvolumes 
[must-gather-gctbc] POD 2023-05-24T14:10:31.586366724Z /usr/bin/gather_ceph_resources: line 341: jq: command not found


Expected results:


Additional info:

I tried creating a pod with the must-gather image referenced above, and I don't see the jq package in it:

# jq -r
bash: jq: command not found
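The absence check above can be done portably with `command -v`, which reports whether a name resolves in the current PATH; the same one-liner works inside a debug pod started from the must-gather image. A minimal sketch (the missing-tool name `definitely-not-jq-xyz` is a hypothetical stand-in for a binary that is not installed):

```shell
#!/usr/bin/env bash
# Report whether a command is resolvable in the current PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: present"
    else
        echo "$1: missing"
    fi
}

check_cmd sh                       # present in any POSIX environment
check_cmd definitely-not-jq-xyz    # hypothetical name, expected missing
```

Running `command -v jq` inside a pod based on the must-gather image reproduces the "missing" result shown in the transcript above.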

Comment 5 Mudit Agarwal 2023-05-25 08:26:52 UTC
Since OCS 4.8, "jq" was added to the downstream via build, see http://pkgs.devel.redhat.com/cgit/containers/rook-ceph/commit/?h=ocs-4.8-rhel-8

Boris, has something changed in 4.13?

Comment 6 Boris Ranto 2023-05-25 10:23:16 UTC
I don't see any change here around jq really.

I tried comparing rook-ceph 4.13 and 4.9, both have jq binary in them, same version (1.6) and in the exact same location (/usr/bin/jq).

I can confirm that there was no jq binary in ocs-must-gather in e.g. ODF 4.9 either, so no change there.

My guess would be that, for some reason, the script is no longer running the jq binary in the rook-ceph pod. It might be related to the rhceph image now using ubi-minimal as a base.

Comment 7 Boris Ranto 2023-05-25 12:10:16 UTC
I was looking at the script and it looks like I'm right. The error is coming from this line:

            subvolgrp_names=$(timeout 120 oc -n "${ns}" exec "${HOSTNAME}"-helper -- bash -c "${ceph_command}"| jq --raw-output '.[].name')

and the quoting is wrong there: the pipe sits outside the quoted string, so jq runs in the must-gather container, where it is not available. The line should look like this instead:

            subvolgrp_names=$(timeout 120 oc -n "${ns}" exec "${HOSTNAME}"-helper -- bash -c "${ceph_command} | jq --raw-output '.[].name'")
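The failure mode generalizes: with a remote-exec wrapper such as `oc exec ... -- bash -c "..."`, only what is inside the quoted string runs on the remote side; a pipe placed outside the quotes is executed by the local shell. A minimal local sketch of both forms, using plain `bash -c` as a stand-in for the `oc exec` wrapper and a hypothetical `fakejq` tool that exists only on the "remote" side's PATH:

```shell
#!/usr/bin/env bash
# Simulate a tool (fakejq) that exists only in the "remote" environment
# by installing it in a directory that only the inner shell's PATH sees.
remote_bin=$(mktemp -d)
printf '#!/bin/sh\ntr a-z A-Z\n' > "$remote_bin/fakejq"
chmod +x "$remote_bin/fakejq"

# Broken form (as on the original line 341): the pipe sits OUTSIDE the
# quoted string, so fakejq is resolved by the LOCAL shell and fails with
# "command not found" -- the same error must-gather printed.
broken_rc=0
bash -c "PATH=$remote_bin:\$PATH; echo hello" | fakejq 2>/dev/null || broken_rc=$?

# Fixed form (as on the corrected line): the whole pipeline is INSIDE the
# quotes, so fakejq is resolved by the inner ("remote") shell's PATH.
fixed=$(bash -c "PATH=$remote_bin:\$PATH; echo hello | fakejq")

echo "broken_rc=$broken_rc"   # 127: command not found in the local shell
echo "fixed=$fixed"           # HELLO
```

The fix in the patch follows the second form: the `| jq --raw-output '.[].name'` is moved inside the double-quoted string passed to `bash -c`, so jq executes in the helper pod where the binary exists.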

Comment 8 Mudit Agarwal 2023-05-25 14:27:38 UTC
Thanks Boris!

Yati, please send a patch asap.

Comment 9 yati padia 2023-05-25 15:00:30 UTC
Added the link to the patch, will update once merged.

Comment 10 Mudit Agarwal 2023-05-25 15:24:41 UTC
This exists since 4.12 (commit https://github.com/red-hat-storage/ocs-operator/commit/b58ba9b8a8d6f5220842e44c210a6b42f2a6466a)

Yati, please clone this bug to 4.12 also. We need to fix it there as well, I don't know why this was never discovered.

Comment 11 yati padia 2023-05-27 18:46:07 UTC
Yeah sure, will do that.

Comment 17 errata-xmlrpc 2023-06-21 15:25:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742