Created attachment 1655647 [details]
SIGSEGV: segmentation error

Description of problem:
The "oc status" command of the OC CLI displays "panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 ...]".

Version-Release number of selected component (if applicable):
Client Version: openshift-clients-4.2.2-201910250432-12-g72076900
Server Version: 4.2.12-s390x
Kubernetes Version: v1.14.6+32dc4a0

How reproducible:
Introduce 100% disk stress on one of the worker nodes in the OCP cluster using the filebench command. The OCP console stops responding and the worker node goes down. Opening the OC CLI and executing "oc status" then displays "panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 ...]" rather than a proper error message.

Steps to Reproduce:
1. Introduce a 100% disk utilization workload on one of the worker nodes.
2. Observe that the worker node goes to the "Not ready" state and the OCP console stops responding.
3. Log in to the bastion and run "oc status"; it fails with the SIGSEGV error.

Actual results:
The "oc status" command panics with a segmentation violation.

Expected results:
"oc status" should display a proper error message for this scenario.

Additional info:
Only the worker node under stress went to the "Not ready" state; the other master and worker nodes in the cluster remained "Ready".
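For illustration, the class of bug reported here is a nil dereference in Go when the API server is unreachable. This is a minimal sketch, not the actual oc code path: the `clusterStatus` type and `describe` function are hypothetical, and they only show the pattern of guarding a possibly-nil response and returning an error instead of panicking, which is the behaviour the Expected results ask for.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// clusterStatus is a hypothetical stand-in for a server response that a
// status command dereferences; nil models an unreachable/unresponsive node.
type clusterStatus struct {
	message string
}

// describe checks for nil before dereferencing, so an unreachable cluster
// yields a readable error rather than a SIGSEGV panic.
func describe(status *clusterStatus) (string, error) {
	if status == nil {
		return "", errors.New("unable to retrieve cluster status: server did not respond")
	}
	return status.message, nil
}

func main() {
	// Simulate the failure scenario from the bug: no status available.
	if _, err := describe(nil); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
}
```

Without the nil check, `status.message` on a nil pointer would reproduce the "invalid memory address or nil pointer dereference" panic seen in the report.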
Can you provide more specific information about "Introduce 100% disk utilization workload on one of the worker nodes"? filebench does not appear to be included as part of RHCOS. Also, could you please provide the exact command used to run filebench, including arguments?
During the bugzappers call, it was decided that this bug will be followed up together with bug 1795185 (https://bugzilla.redhat.com/show_bug.cgi?id=1795185).
A possible fix for this landed in the latest 4.2 nightly. Can you re-test this and see if it can be reproduced?
The scenario was tested on OCP with:

Client Version: 4.4.0-0.nightly-s390x-2020-06-17-185805
Server Version: 4.4.0-0.nightly-s390x-2020-06-17-185805
Kubernetes Version: v1.17.1+912792b

The reported behaviour was not reproducible, so the fix appears to have landed. Can someone please help me understand whether the scenario still has to be tested on the latest 4.2 nightly as well? I assume that the versions of CRI-O and the other components carrying the OOM fixes in 4.4.0-0.nightly-s390x-2020-06-17-185805 are recent enough; please correct me if I am wrong.
Tested the bug scenario on OCP 4.2.34 and the reported behaviour is not observed.

oc version
-----------
Client Version: 4.4.0-0.nightly-s390x-2020-06-12-154108
Server Version: 4.2.34
Kubernetes Version: v1.14.6+20b13ba