Bug 1795177 - OC status command displays - panic: runtime error: invalid memory address or nil pointer dereference
Summary: OC status command displays - panic: runtime error: invalid memory address or nil pointer dereference
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Multi-Arch
Version: 4.2.z
Hardware: s390x
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Carvel Baus
QA Contact: Barry Donahue
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-27 11:23 UTC by Lakshmi Ravichandran
Modified: 2020-06-29 17:46 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-29 17:46:05 UTC
Target Upstream Version:
Embargoed:


Attachments
SIGSEGV: segmentation error (535.21 KB, image/png)
2020-01-27 11:23 UTC, Lakshmi Ravichandran


Links
Red Hat Bugzilla 1810136 (unspecified priority, CLOSED): [4.2] A pod that gradually leaks memory causes node to become unreachable for 10 minutes (last updated 2021-02-22 00:41:40 UTC)

Description Lakshmi Ravichandran 2020-01-27 11:23:25 UTC
Created attachment 1655647 [details]
SIGSEGV: segmentation error

Description of problem:
"oc status" command of OC CLI displays - 
panic: runtime error: invalid memory address or nil pointer dereference
[signam SIGSEGV: segmentation violation code=0x1 ...]

Version-Release number of selected component (if applicable):
Client Version: openshift-clients-4.2.2-201910250432-12-g72076900
Server Version: 4.2.12-s390x
Kubernetes Version: v1.14.6+32dc4a0

How reproducible:
Introduce 100% disk stress on one of the worker nodes in the OCP cluster using the filebench command.
The OCP console stops responding and the worker node goes down.
On opening the OC CLI and executing the command "oc status", the output is
"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 ...]
"
rather than a proper error message.


Steps to Reproduce:
1. Introduce a 100% disk utilization workload on one of the worker nodes (an illustrative sketch of such a workload is shown after this list)
2. Observe that the worker node goes to the "Not Ready" state and the OCP console stops responding
3. Log in to the bastion and run "oc status"; the command gives the SIGSEGV error
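
For reference, below is a minimal Go sketch of one way to generate sustained disk write pressure on a node. It is only an illustrative stand-in: the reporter used filebench (the exact workload and arguments are not recorded in this report, see Comment 1), and the target path in the sketch is hypothetical.

// diskstress.go: illustrative stand-in for the disk-stress workload
// (hypothetical; not the filebench invocation the reporter used).
// It appends 1 MiB blocks to a file and syncs after every write until
// the disk fills up or an I/O error occurs.
package main

import (
	"fmt"
	"os"
)

func main() {
	// Hypothetical target path on the stressed worker node.
	f, err := os.OpenFile("/var/tmp/stressfile", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open:", err)
		os.Exit(1)
	}
	defer f.Close()

	block := make([]byte, 1<<20) // 1 MiB of zeroes per write
	for {
		if _, err := f.Write(block); err != nil {
			// Stops on ENOSPC (disk full) or any other I/O error.
			fmt.Fprintln(os.Stderr, "write:", err)
			return
		}
		// Sync so the load shows up as device utilization rather than
		// staying in the page cache.
		if err := f.Sync(); err != nil {
			fmt.Fprintln(os.Stderr, "sync:", err)
			return
		}
	}
}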

Actual results:
Segmentation error given by the oc status command


Expected results:
The "oc status" command should display a proper error message for this scenario.
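
For illustration, here is a minimal, self-contained Go sketch of the failure mode. It is not the actual oc source; the type and function names are hypothetical. It shows how dereferencing a nil pointer produces exactly this runtime panic ("invalid memory address or nil pointer dereference", SIGSEGV) and how an explicit nil check would let the command return a readable error message instead.

package main

import (
	"errors"
	"fmt"
)

// nodeStatus is a hypothetical stand-in for a status object that may be
// missing when a node is NotReady or unreachable.
type nodeStatus struct {
	Ready bool
}

// describeUnsafe dereferences status without checking for nil, which is the
// kind of code path that produces the panic seen in this bug.
func describeUnsafe(status *nodeStatus) string {
	return fmt.Sprintf("ready=%v", status.Ready) // nil dereference -> SIGSEGV panic
}

// describeSafe guards against nil and returns a proper error instead.
func describeSafe(status *nodeStatus) (string, error) {
	if status == nil {
		return "", errors.New("node status unavailable (node may be NotReady or unreachable)")
	}
	return fmt.Sprintf("ready=%v", status.Ready), nil
}

func main() {
	var missing *nodeStatus // nil, as when the node data cannot be fetched

	// Expected behaviour: a readable error message.
	if msg, err := describeSafe(missing); err != nil {
		fmt.Println("error:", err)
	} else {
		fmt.Println(msg)
	}

	// Actual behaviour: panics with the runtime nil pointer dereference.
	fmt.Println(describeUnsafe(missing))
}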

Additional info:
Only the worker node on which the stress was applied went to the "Not Ready" state; the other master and worker nodes in the cluster remained in the "Ready" state.

Comment 1 Carvel Baus 2020-02-04 21:08:42 UTC
Can you provide more specific information about "Introduce 100% disk utilization workload on one of the worker nodes"?

filebench does not appear to be included as part of RHCOS. Also, could you please provide the exact command used to run filebench, including its arguments?

Comment 2 Lakshmi Ravichandran 2020-02-13 12:32:10 UTC
During the bugzappers call, it was discussed that this bug will be followed up together with https://bugzilla.redhat.com/show_bug.cgi?id=1795185.

Comment 3 Carvel Baus 2020-06-10 19:14:13 UTC
A possible fix for this landed in the latest 4.2 nightly. Can you re-test this and see if it can be reproduced?

Comment 4 Lakshmi Ravichandran 2020-06-23 18:10:51 UTC
The scenario was tested on OCP version
Client Version: 4.4.0-0.nightly-s390x-2020-06-17-185805
Server Version: 4.4.0-0.nightly-s390x-2020-06-17-185805
Kubernetes Version: v1.17.1+912792b

and the reported behaviour was not reproducible, so the fix appears to have landed.

Can someone please help me understand whether the scenario still has to be tested on the latest 4.2 nightly as well?
I suppose it does not, since the versions of CRI-O and the other components carrying the OOM fixes are already present and more advanced in 4.4.0-0.nightly-s390x-2020-06-17-185805; please correct me if I am wrong.

Comment 5 Lakshmi Ravichandran 2020-06-24 16:18:38 UTC
Tested the bug scenario on OCP 4.2.34 and the reported behaviour was not observed.

oc version
-----------
Client Version: 4.4.0-0.nightly-s390x-2020-06-12-154108
Server Version: 4.2.34
Kubernetes Version: v1.14.6+20b13ba

