Bug 2011845 - Oc exec command returning inconsistent output when printing huge amount of data
Summary: Oc exec command returning inconsistent output when printing huge amount of data
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arda Guclu
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-07 13:59 UTC by Petr Balogh
Modified: 2022-10-11 10:36 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-11 10:36:53 UTC
Target Upstream Version:
Embargoed:


Attachments:

Description Petr Balogh 2021-10-07 13:59:52 UTC
Description of problem:
When running the command:
oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli

which is supposed to write a 57M binary file to the local file ./mcg-cli, I see inconsistent output with the 4.8 client.

In our CI we basically need to download this binary file from the pod to the local machine. We cannot use the rsync command because of packages missing in the pod we are downloading from, so we came up with the command above, but in recent 4.8 runs it started failing a lot.
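For reference, a minimal sketch of the checksum-verified retry wrapper our CI effectively needs around this command (same pod and paths as above; the retry count of 5 is illustrative):

remote_md5=$(oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- md5sum /usr/local/bin/noobaa-operator | awk '{print $1}')
for attempt in 1 2 3 4 5; do
    # Download over the exec stream, then compare the local checksum with the in-pod one.
    oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli
    local_md5=$(md5sum ./mcg-cli | awk '{print $1}')
    [ "$local_md5" = "$remote_md5" ] && { echo "download OK on attempt $attempt"; break; }
    echo "checksum mismatch on attempt $attempt, retrying" >&2
done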


Here is more info about the different client versions I've tried. You can see it's OK with 4.6 and 4.7; with 4.8 we see the most problems; with 4.9 I reproduced it once out of 5 attempts:
----------------------------
4.8 client - 2 failure
----------------------------

test2 $ oc version
Client Version: 4.8.0-0.nightly-2021-10-05-155111
Server Version: 4.8.0-0.nightly-2021-10-06-224928
Kubernetes Version: v1.21.1+a620f50
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.8-1
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.8-2
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.8-3
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.8-4
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.8-5
test2 $ ll -h
total 1150208
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:05 mcg-cli-new-4.8-1
-rw-r--r--  1 pbalogh  staff    55M Oct  7 15:05 mcg-cli-new-4.8-2
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:05 mcg-cli-new-4.8-3
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:05 mcg-cli-new-4.8-4
-rw-r--r--  1 pbalogh  staff    53M Oct  7 15:05 mcg-cli-new-4.8-5
test2 $ md5sum *
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.8-1
ee141a29c27e3199a9958dad48d6486d  mcg-cli-new-4.8-2
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.8-3
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.8-4
a6bfd8d36c512f9ae634cc39fb6466ed  mcg-cli-new-4.8-5


----------------------------
4.6 client - 0 failure
----------------------------

test2 $ oc version
Client Version: 4.6.45
Server Version: 4.8.0-0.nightly-2021-10-06-224928
Kubernetes Version: v1.21.1+a620f50
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.6-1
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.6-2
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.6-3
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.6-4
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.6-5
test2 $ ll -h
total 1734528
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:08 mcg-cli-new-4.6-1
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:08 mcg-cli-new-4.6-2
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:08 mcg-cli-new-4.6-3
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:08 mcg-cli-new-4.6-4
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:09 mcg-cli-new-4.6-5
test2 $ md5sum *
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.6-1
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.6-2
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.6-3
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.6-4
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.6-5

----------------------------
4.7 client - 0 failure
----------------------------

test2 $ oc version
Client Version: 4.7.0-0.ci-2021-10-04-223908
Server Version: 4.8.0-0.nightly-2021-10-06-224928
Kubernetes Version: v1.21.1+a620f50

test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.7-1
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.7-2
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.7-3
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.7-4
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.7-5
test2 $ ll -h
total 2313848
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:41 mcg-cli-new-4.7-1
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:42 mcg-cli-new-4.7-2
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:42 mcg-cli-new-4.7-3
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:42 mcg-cli-new-4.7-4
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:42 mcg-cli-new-4.7-5
test2 $ md5sum *
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.7-1
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.7-2
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.7-3
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.7-4
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.7-5
----------------------------
4.9 client - 1 failure
----------------------------

test2 $ oc version
Client Version: 4.9.0-rc.5
Server Version: 4.8.0-0.nightly-2021-10-06-224928
Kubernetes Version: v1.21.1+a620f50
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.9-1
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.9-2
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.9-3
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.9-4
test2 $ oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /usr/local/bin/noobaa-operator > ./mcg-cli-new-4.9-5
test2 $ ll -h
total 578176
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:01 mcg-cli-new-4.9-1
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:01 mcg-cli-new-4.9-2
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:02 mcg-cli-new-4.9-3
-rw-r--r--  1 pbalogh  staff    54M Oct  7 15:02 mcg-cli-new-4.9-4
-rw-r--r--  1 pbalogh  staff    57M Oct  7 15:02 mcg-cli-new-4.9-5
test2 $ md5sum *
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.9-1
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.9-2
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.9-3
43b567a5d9833e0d5a4145898d912847  mcg-cli-new-4.9-4
935288bbb13e8bd8ca04f55f5e56a9e9  mcg-cli-new-4.9-5


Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-10-05-155111

How reproducible:
Try to download the content of a large file (> 50M) multiple times.

Steps to Reproduce:
1. Run command:
oc exec -n openshift-storage pod-name -- cat /path/to/big/file > ./local-file
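2. Compare the checksum of the file inside the pod with the local copy; a mismatch means the output was truncated. A minimal sketch (md5sum is available in the noobaa-operator pod, other images may differ):
oc exec -n openshift-storage pod-name -- md5sum /path/to/big/file
md5sum ./local-file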


Actual results:
We see inconsistent output from the oc exec command; it is worst with the 4.8 client.

Expected results:
The output should be consistent, as it is with the 4.6 and 4.7 clients.

Comment 1 Petr Balogh 2021-10-11 08:04:54 UTC
BTW, the tar binary was also not available in the container, so we could not use the oc cp command. But now when I look at the container it already has tar in it, at least in the OCS 4.8 version. I will also check whether that is the case for OCS 4.6 and 4.7, and if tar is there we can probably change the logic in our code to use oc cp instead.
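If tar really is present in all the relevant OCS versions, the change would presumably boil down to something like this (sketch only, using the pod and path from the description; oc cp requires tar inside the container):
oc cp openshift-storage/noobaa-operator-574cf9c9cf-6xbrl:/usr/local/bin/noobaa-operator ./mcg-cli
md5sum ./mcg-cli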

Comment 2 Maciej Szulik 2021-10-11 14:57:03 UTC
I'm lowering the priority since it looks like there are alternatives and this problem appears to happen less than 20% of the time.

Comment 3 Michal Fojtik 2021-12-17 18:34:04 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 4 Petr Balogh 2021-12-21 12:06:33 UTC
Hello, I haven't tested it recently, but I don't think this issue got fixed, so it's still valid.
We started using a different approach for getting data out of the pod, but some regression was clearly introduced between the mentioned versions, because we saw more inconsistency in the data returned by that exec command. I'm not sure why it happens with large amounts of data; maybe if you create a 50 MB file full of zeroes or ones and cat its output, you will spot errors. I haven't tested that, so it may also be related to some byte pattern in the binary file.

Currently, since we changed our approach to gathering the data, I don't personally need a fix, but I'm wondering whether it can affect customers, because some regression was definitely introduced in that area.
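For example, the all-zeroes test could look roughly like this (untested sketch; it assumes the pod has dd and a writable /tmp, and /tmp/zeros is a hypothetical path):
oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- dd if=/dev/zero of=/tmp/zeros bs=1M count=50
oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- cat /tmp/zeros > ./zeros-local
oc exec -n openshift-storage noobaa-operator-574cf9c9cf-6xbrl -- md5sum /tmp/zeros
md5sum ./zeros-local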

Comment 5 Petr Balogh 2022-01-05 17:26:37 UTC
On IBM Cloud we also see the inconsistency when using the oc cp command.
https://github.com/red-hat-storage/ocs-ci/pull/4958/files#diff-1d32a061b0f98ee283c52d1f7e9ca822177982cee47f7daa76904f5d8e1c3bfdR985

https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/2772/consoleFull

2022-01-05 15:25:44  14:25:31 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage rsh noobaa-operator-6d97549cc6-ngwwg bash -c "md5sum /usr/local/bin/noobaa-operator"
2022-01-05 15:25:44  14:25:32 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /usr/local/bin/noobaa-operator: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:25:44  14:25:32 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Remote noobaa cli md5 hash: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:25:44  14:25:32 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Local noobaa cli md5 hash: ed6f3d31987a18ebb731a67a8cf076e6
2022-01-05 15:25:44  14:25:32 - MainThread - ocs_ci.utility.retry - WARNING - Binary hash doesn't match the one on the operator pod, Retrying in 15 seconds...
2022-01-05 15:25:48  14:25:47 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod noobaa-operator-6d97549cc6-ngwwg -n openshift-storage -o yaml
2022-01-05 15:25:48  14:25:48 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage rsh noobaa-operator-6d97549cc6-ngwwg bash -c "md5sum /usr/local/bin/noobaa-operator"
2022-01-05 15:25:50  14:25:50 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /usr/local/bin/noobaa-operator: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:25:50  14:25:50 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Remote noobaa cli md5 hash: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:25:50  14:25:50 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Local noobaa cli md5 hash: ed6f3d31987a18ebb731a67a8cf076e6
2022-01-05 15:38:44  14:38:32 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod noobaa-operator-6d97549cc6-ngwwg -n openshift-storage -o yaml
2022-01-05 15:38:44  14:38:34 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage rsh noobaa-operator-6d97549cc6-ngwwg bash -c "md5sum /usr/local/bin/noobaa-operator"
2022-01-05 15:38:44  14:38:36 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /usr/local/bin/noobaa-operator: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:38:44  14:38:36 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Remote noobaa cli md5 hash: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:38:44  14:38:36 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Local noobaa cli md5 hash: 73d3fe6d2f58e61c0e0361cfb30d5675
2022-01-05 15:38:44  14:38:36 - MainThread - ocs_ci.utility.retry - WARNING - Binary hash doesn't match the one on the operator pod, Retrying in 15 seconds...
2022-01-05 15:38:53  14:38:51 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod noobaa-operator-6d97549cc6-ngwwg -n openshift-storage -o yaml
2022-01-05 15:38:53  14:38:52 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage rsh noobaa-operator-6d97549cc6-ngwwg bash -c "md5sum /usr/local/bin/noobaa-operator"
2022-01-05 15:38:53  14:38:53 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /usr/local/bin/noobaa-operator: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:38:53  14:38:53 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Remote noobaa cli md5 hash: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:38:53  14:38:53 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Local noobaa cli md5 hash: 73d3fe6d2f58e61c0e0361cfb30d5675
2022-01-05 15:39:06  14:39:05 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod noobaa-operator-6d97549cc6-ngwwg -n openshift-storage -o yaml
2022-01-05 15:39:06  14:39:06 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage rsh noobaa-operator-6d97549cc6-ngwwg bash -c "md5sum /usr/local/bin/noobaa-operator"
2022-01-05 15:39:08  14:39:07 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /usr/local/bin/noobaa-operator: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:39:08  14:39:07 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Remote noobaa cli md5 hash: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:39:08  14:39:08 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Local noobaa cli md5 hash: 84135e3ece427cf19817a68a782e0ba3
2022-01-05 15:39:08  14:39:08 - MainThread - ocs_ci.utility.retry - WARNING - Binary hash doesn't match the one on the operator pod, Retrying in 15 seconds...
2022-01-05 15:39:23  14:39:23 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod noobaa-operator-6d97549cc6-ngwwg -n openshift-storage -o yaml
2022-01-05 15:39:24  14:39:23 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage rsh noobaa-operator-6d97549cc6-ngwwg bash -c "md5sum /usr/local/bin/noobaa-operator"
2022-01-05 15:39:26  14:39:25 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /usr/local/bin/noobaa-operator: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:39:26  14:39:25 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Remote noobaa cli md5 hash: 2e5a528a73ecf8b15e8a14105be0f82a
2022-01-05 15:39:26  14:39:25 - MainThread - /home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/resources/mcg.py - INFO - Local noobaa cli md5 hash: 84135e3ece427cf19817a68a782e0ba3

You can see that the local noobaa CLI md5 hash differs across several retry attempts.

I am running once again here:
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-trigger-ibmcloud-managed-1az-rhel-3w-tier1/59/

I will be able to provide you with a cluster for investigation.
Client used in the job above:
4.9.0-0.nightly-2022-01-05-035431

The underlying cluster is IBM Cloud ROKS 4.9.8.

Must gather data:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-058icm1r3-t1/j-058icm1r3-t1_20220105T130458/logs/failed_testcase_ocs_logs_1641388138/test_deployment_ocs_logs/

Comment 7 Petr Balogh 2022-01-06 14:34:43 UTC
I see the issue is also discussed here:
https://github.com/kubernetes/kubernetes/issues/60140

It looks like an identical problem.

When using `oc rsync` I don't see the issue.
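For reference, the rsync-based download that works for me looks roughly like this (sketch, using the pod from comment 5; oc rsync copies directories rather than single files and needs rsync or tar in the container):
mkdir -p ./noobaa-bin
oc -n openshift-storage rsync noobaa-operator-6d97549cc6-ngwwg:/usr/local/bin/ ./noobaa-bin/
md5sum ./noobaa-bin/noobaa-operator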

Comment 9 Michal Fojtik 2022-02-07 07:23:22 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 10 Petr Balogh 2022-02-07 08:51:16 UTC
I think this is still relevant.

Comment 11 Arda Guclu 2022-10-11 10:36:53 UTC
As stated in https://github.com/kubernetes/kubernetes/issues/60140, this is a well-known upstream issue. I'd prefer to close this bug as WONTFIX because we need to wait for an upstream fix in any case. Feel free to re-open it if you think otherwise.

