Bug 1795177

Summary: OC status command displays - panic: runtime error: invalid memory address or nil pointer dereference
Product: OpenShift Container Platform Reporter: Lakshmi Ravichandran <lakshmi.ravichandran1>
Component: Multi-Arch Assignee: Carvel Baus <cbaus>
Status: CLOSED CURRENTRELEASE QA Contact: Barry Donahue <bdonahue>
Severity: low Docs Contact:
Priority: low    
Version: 4.2.z CC: cbaus, dorzel, Holger.Wolf, hwolf, nbziouec
Target Milestone: ---   
Target Release: ---   
Hardware: s390x   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-06-29 17:46:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
SIGSEGV: segmentation error none

Description Lakshmi Ravichandran 2020-01-27 11:23:25 UTC
Created attachment 1655647 [details]
SIGSEGV: segmentation error

Description of problem:
The "oc status" command of the OC CLI panics with:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 ...]

Version-Release number of selected component (if applicable):
Client Version: openshift-clients-4.2.2-201910250432-12-g72076900
Server Version: 4.2.12-s390x
Kubernetes Version: v1.14.6+32dc4a0

How reproducible:
Introduce 100% disk stress on one of the worker nodes in the OCP cluster using the filebench command.
The OCP console stops responding and the worker node goes down.
Running "oc status" from the OC CLI then displays
"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 ...]
"
rather than a proper error message.
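
For reference, the message above is the text the Go runtime emits whenever a nil pointer is dereferenced. The sketch below reproduces that text in isolation. It is not taken from the oc source: the clusterInfo type and the serverOf function are hypothetical names used only for illustration.

```go
package main

import "fmt"

// clusterInfo is a hypothetical type used for illustration only;
// it is not part of the oc code base.
type clusterInfo struct {
	Server string
}

// serverOf reads a field through a pointer without checking it for nil.
// The deferred function converts the resulting runtime panic into an
// error so that the panic text can be inspected instead of crashing.
func serverOf(info *clusterInfo) (server string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return info.Server, nil // panics when info is nil
}

func main() {
	_, err := serverOf(nil)
	fmt.Println(err)
}
```

Without the deferred recover, the same dereference would terminate the process with the panic and SIGSEGV lines quoted in this report.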


Steps to Reproduce:
1. Introduce a 100% disk utilization workload on one of the worker nodes.
2. Observe that the worker node goes to the "Not ready" state and the OCP console stops responding.
3. Log in to the bastion and run the oc status command; it fails with a SIGSEGV error.

Actual results:
Segmentation error given by the oc status command


Expected results:
The oc status command should display a proper error message for the scenario.

Additional info:
Only the worker node on which the stress was applied went to the "Not ready" state; the other master and worker nodes in the cluster were in the "Ready" state.

Comment 1 Carvel Baus 2020-02-04 21:08:42 UTC
Can you provide more specific information about "Introduce 100% disk utilization workload on one of the worker nodes"?

filebench does not appear to be included as part of RHCOS. Also, could you please provide the exact command used to run filebench, including arguments?

Comment 2 Lakshmi Ravichandran 2020-02-13 12:32:10 UTC
During the bugzappers call, it was discussed that this bug should be followed up together with (https://bugzilla.redhat.com/show_bug.cgi?id=1795185)

Comment 3 Carvel Baus 2020-06-10 19:14:13 UTC
A possible fix for this landed in the latest 4.2 nightly. Can you re-test this and see if it can be reproduced?

Comment 4 Lakshmi Ravichandran 2020-06-23 18:10:51 UTC
The scenario was tested on OCP version
Client Version: 4.4.0-0.nightly-s390x-2020-06-17-185805
Server Version: 4.4.0-0.nightly-s390x-2020-06-17-185805
Kubernetes Version: v1.17.1+912792b

and the reported behaviour was not reproducible, so the fix appears to have landed.

Can someone please help me understand whether the scenario still has to be tested on the latest 4.2 nightly as well?
I would assume not, since the versions of CRI-O and the other components carrying the OOM fixes are more recent in 4.4.0-0.nightly-s390x-2020-06-17-185805; please correct me if I am wrong.

Comment 5 Lakshmi Ravichandran 2020-06-24 16:18:38 UTC
Tested the bug scenario on OCP 4.2.34; the reported behaviour was not observed.

oc version
-----------
Client Version: 4.4.0-0.nightly-s390x-2020-06-12-154108
Server Version: 4.2.34
Kubernetes Version: v1.14.6+20b13ba