Bug 1869864

Summary: Unable to view deployment config after enabling auto scaling on it
Product: OpenShift Container Platform
Component: oc
Version: 4.5
Hardware: s390x
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Target Milestone: ---
Target Release: 4.6.0
Reporter: Vijay Bhadriraju <vbhadrir>
Assignee: Mike Dame <mdame>
QA Contact: zhou ying <yinzhou>
CC: alegrand, anpicker, aos-bugs, dslavens, erooth, jokerman, kakkoyun, lcosic, maszulik, mfojtik, mloibl, pkrupa, surbania
Keywords: UpcomingSprint
Whiteboard: multi-arch
Type: Bug
Last Closed: 2020-09-09 12:48:43 UTC

Description Vijay Bhadriraju 2020-08-18 20:00:54 UTC
Description of problem:

"oc describe dc dc_name" after enabling horizontal pod autoscaler

Version-Release number of selected component (if applicable):

4.5

How reproducible:

Enable the autoscaler on a dc and run "oc describe dc dc_name".

Steps to Reproduce:
1. Create a deployment config (dc).
2. Enable the horizontal pod autoscaler on the dc.
3. Run "oc describe dc dc_name".

Actual results:

root:scripts#.oc describe dc megaweb
panic: converting (v1.HorizontalPodAutoscaler).MaxReplicas to (autoscaling.HorizontalPodAutoscaler).MaxReplicas: Metrics not present in src

goroutine 1 [running]:
github.com/openshift/oc/pkg/helpers/describe.printAutoscalingInfo(0xc00137cd90, 0x2, 0x2, 0xc00161a9c0, 0x8, 0xc00161a9b0, 0x7, 0x33c8ba0, 0xc0004f2c80, 0xc000de9ad0)
	/go/src/github.com/openshift/oc/pkg/helpers/describe/deployments.go:346 +0xa42
github.com/openshift/oc/pkg/helpers/describe.printDeploymentConfigSpec(0x33c8ba0, 0xc0004f2c80, 0x0, 0x0, 0x0, 0x0, 0xc00161a9b0, 0x7, 0x0, 0x0, ...)
	/go/src/github.com/openshift/oc/pkg/helpers/describe/deployments.go:299 +0x314
github.com/openshift/oc/pkg/helpers/describe.(*DeploymentConfigDescriber).Describe.func1(0xc000de9ad0, 0x32da460, 0xc001232060)
	/go/src/github.com/openshift/oc/pkg/helpers/describe/deployments.go:101 +0x1ed
github.com/openshift/oc/pkg/helpers/describe.tabbedString(0xc00137da20, 0x7ffef54d084c, 0x7, 0x0, 0x0)
	/go/src/github.com/openshift/oc/pkg/helpers/describe/helpers.go:37 +0xb0
github.com/openshift/oc/pkg/helpers/describe.(*DeploymentConfigDescriber).Describe(0xc001863a40, 0xc000e4e720, 0x8, 0x7ffef54d084c, 0x7, 0x1, 0x203000, 0x0, 0x0, 0x0)
	/go/src/github.com/openshift/oc/pkg/helpers/describe/deployments.go:79 +0xa9
github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/describe.(*DescribeOptions).Run(0xc001193f40, 0x0, 0x2ed1ea0)
	/go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/describe/describe.go:190 +0x4ca
github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/describe.NewCmdDescribe.func1(0xc001194a00, 0xc0004819a0, 0x2, 0x2)
	/go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/describe/describe.go:111 +0xa0
github.com/openshift/oc/vendor/github.com/spf13/cobra.(*Command).execute(0xc001194a00, 0xc000481940, 0x2, 0x2, 0xc001194a00, 0xc000481940)
	/go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:830 +0x2ae
github.com/openshift/oc/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc000b11900, 0x2, 0xc000b11900, 0x2)
	/go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:914 +0x2fc
github.com/openshift/oc/vendor/github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:864
main.main()
	/go/src/github.com/openshift/oc/cmd/oc/oc.go:107 +0x835
root:scripts#.



Expected results:

Expect to see the description of the deployment config.


Additional info:

Comment 1 Pawel Krupa 2020-08-19 09:23:28 UTC
It seems like the issue is in the `oc` client or in the autoscaler. Reassigning to the `oc` component for further investigation.

Comment 2 Maciej Szulik 2020-09-08 11:20:39 UTC
Mike, can you look into it?

Comment 3 Mike Dame 2020-09-08 20:33:44 UTC
> /go/src/github.com/openshift/oc/pkg/helpers/describe/deployments.go:346 +0xa42

This is a weird error, because the master and release-4.5 branches of oc don't have anything at that line that would cause a panic: https://github.com/openshift/oc/blob/release-4.5/pkg/helpers/describe/deployments.go#L346 (it's literally just a closing bracket)

I also attempted to reproduce this on a 4.5 cluster with the latest CI 4.5 oc and didn't have any issue (repro steps below).

Are you using a version of oc built from an older codebase? I see an older commit did have an Fprint() statement there: https://github.com/openshift/oc/blob/f0e69f31d2754f69833d91d6208a0dace672fd19/pkg/helpers/describe/deployments.go#L346
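
For reference, a quick way to check which commit an oc binary was built from is the structured version output (a sketch; output abbreviated, and the commit shown here is just the older one linked above, for illustration):

$ oc version -o yaml
clientVersion:
  ...
  gitCommit: f0e69f31d2754f69833d91d6208a0dace672fd19
  ...

Comparing that gitCommit against the openshift/oc history would pinpoint the exact source behind the panicking line number.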

(Repro steps for this):

$ ./oc version
Client Version: 4.5.0-0.ci-2020-09-08-123649
Server Version: 4.5.8
Kubernetes Version: v1.18.3+6c42de8

$ cat dc.yaml 
kind: "DeploymentConfig"
apiVersion: "v1"
metadata:
  name: "frontend"
spec:
  template: 
    metadata:
      labels:
        name: "frontend"
    spec:
      containers:
        - name: "helloworld"
          image: "openshift/origin-ruby-sample"
          ports:
            - containerPort: 8080
              protocol: "TCP"
  replicas: 5 
  triggers:
    - type: "ConfigChange" 
    - type: "ImageChange" 
      imageChangeParams:
        automatic: true
        containerNames:
          - "helloworld"
        from:
          kind: "ImageStreamTag"
          name: "origin-ruby-sample:latest"
  strategy: 
    type: "Rolling"
  paused: false 
  revisionHistoryLimit: 2 
  minReadySeconds: 0 


$ ./oc create -f dc.yaml 
deploymentconfig.apps.openshift.io/frontend created

$ oc autoscale dc/frontend --min 1 --max 10 --cpu-percent=80
horizontalpodautoscaler.autoscaling/frontend autoscaled

$ ./oc describe dc frontend
Name:		frontend
Namespace:	autoscale
Created:	22 seconds ago
Labels:		<none>
Annotations:	<none>
Latest Version:	Not deployed
Selector:	name=frontend
Replicas:	5
Autoscaling:	between 1 and 10 replicas targeting 80% CPU over all the pods
Triggers:	Config, Image(origin-ruby-sample@latest, auto=true)
Strategy:	Rolling
Template:
Pod Template:
  Labels:	name=frontend
  Containers:
   helloworld:
    Image:		openshift/origin-ruby-sample
    Port:		8080/TCP
    Host Port:		0/TCP
    Environment:	<none>
    Mounts:		<none>
  Volumes:		<none>

Latest Deployment:	<none>

Events:	<none>

Comment 4 Maciej Szulik 2020-09-09 11:10:39 UTC
Mike, the only suspect I can think of is hpa.Spec.MinReplicas in https://github.com/openshift/oc/blob/master/pkg/helpers/describe/deployments.go#L341,
which is a pointer: we access its value without checking whether it's set. I'd fix that for starters, then eventually try looking
into other possible nil elements in that area.
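
A minimal sketch of that guard, using the public autoscaling/v1 types for illustration (the describer actually operates on the internal autoscaling types shown in the stack trace, and the function name here is hypothetical):

package describe

import (
	"fmt"
	"io"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
)

// printAutoscalingLine guards the *int32 MinReplicas field before
// dereferencing it, falling back to the documented API default of 1
// when the field is unset.
func printAutoscalingLine(w io.Writer, hpa autoscalingv1.HorizontalPodAutoscaler) {
	minReplicas := int32(1) // the HPA API defaults MinReplicas to 1 when nil
	if hpa.Spec.MinReplicas != nil {
		minReplicas = *hpa.Spec.MinReplicas
	}
	fmt.Fprintf(w, "Autoscaling:\tbetween %d and %d replicas\n",
		minReplicas, hpa.Spec.MaxReplicas)
}

The same pattern would apply to any other pointer fields printed in that block.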

Comment 5 Maciej Szulik 2020-09-09 11:11:21 UTC
Vijay, can you provide us with the full YAML of both the HPA and the deployment config?

Comment 6 Mike Dame 2020-09-09 12:48:43 UTC
This bug appears to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1785513 (fixed in https://github.com/openshift/oc/pull/227, 4.4)

*** This bug has been marked as a duplicate of bug 1785513 ***

Comment 7 Vijay Bhadriraju 2020-09-09 14:16:47 UTC
The OCP cluster where this problem occurred stopped responding: I was unable to access it via the oc CLI or the web console. I had to tear down the cluster and rebuild it, so I no longer have the YAML of the HPA or the deployment. I don't know whether enabling autoscaling somehow cratered my cluster, so I have not tried to re-enable HPA on the rebuilt cluster. Deploying a new OCP cluster using UPI is very tedious, so I have not dared to re-enable HPA for fear of losing the cluster again.

Comment 8 Mike Dame 2020-09-09 14:31:02 UTC
This error would be originating from the oc client, so I don't think it would be dependent on the cluster. Could you provide an `oc version`?

Comment 9 Vijay Bhadriraju 2020-09-21 00:27:22 UTC
root:scripts#.oc version
Client Version: 4.3.1
Server Version: 4.5.5
Kubernetes Version: v1.18.3+08c38ef

Comment 10 Vijay Bhadriraju 2020-09-21 00:35:08 UTC
I re-downloaded the oc client from the 4.5.5 cluster's download link and found that the cluster is pointing to the same oc client version, 4.3.1.

Comment 11 Vijay Bhadriraju 2020-09-21 01:52:08 UTC
The oc client version downloaded from the link provided by the cluster is actually different, as seen below:
root:home#.oc version
Client Version: openshift-clients-4.5.0-202006231303.p0-4-gb66f2d3a6
Server Version: 4.5.5
Kubernetes Version: v1.18.3+08c38ef
root:home#.

After using this version of the oc client, I was able to run oc describe dc/dcname on a deployment config with autoscaling enabled, and it returned a valid dc description. I think the problem was caused by using a back-level oc CLI tool.

root:home#.oc describe dc/megaweb
Name:		megaweb
Namespace:	megabank
Created:	3 weeks ago
Labels:		app=megaweb
		app.kubernetes.io/component=megaweb
		app.kubernetes.io/instance=megaweb
Annotations:	openshift.io/generated-by=OpenShiftNewApp
Latest Version:	3
Selector:	deploymentconfig=megaweb
Replicas:	0
Autoscaling:	between 1 and 4 replicas targeting 70% CPU over all the pods
Triggers:	Config, Image(megaweb@latest, auto=true)
Strategy:	Rolling
Template:
Pod Template:
  Labels:	deploymentconfig=megaweb
  Annotations:	openshift.io/generated-by: OpenShiftNewApp
  Containers:
   megaweb:
    Image:	docker.io/vbhadrir/mbweb-z@sha256:7e619f30122596f631ef803a41709533436dd672042e6b663741f6f97838a675
    Ports:	9080/TCP, 9443/TCP
    Host Ports:	0/TCP, 0/TCP
    Limits:
      cpu:	1
      memory:	4000Mi
    Requests:
      cpu:	100m
      memory:	1000Mi
    Environment:
      ACCIDHISTORY_SVC_HOST:	$(MBSVC7_SERVICE_HOST):9080
      ACCOUNTS_SVC_HOST:	$(MBSVC8_SERVICE_HOST):9080
      CUSTOMER_SVC_HOST:	$(MBSVC2_SERVICE_HOST):9080
      DEPOSIT_SVC_HOST:		$(MBSVC3_SERVICE_HOST):9080
      HISTORY_SVC_HOST:		$(MBSVC6_SERVICE_HOST):9080
      LOGIN_SVC_HOST:		$(MBSVC1_SERVICE_HOST):9080
      LOGOUT_SVC_HOST:		$(MBSVC9_SERVICE_HOST):9080
      MegaBankComponentClass:	com.ibm.cpo.mb.MegaBankJDBCWS
      TRANSFER_SVC_HOST:	$(MBSVC5_SERVICE_HOST):9080
      WITHDRAW_SVC_HOST:	$(MBSVC4_SERVICE_HOST):9080
    Mounts:			<none>
  Volumes:			<none>

Deployment #3 (latest):
	Name:		megaweb-3
	Created:	2 hours ago
	Status:		Complete
	Replicas:	0 current / 0 desired
	Selector:	deployment=megaweb-3,deploymentconfig=megaweb
	Labels:		app.kubernetes.io/component=megaweb,app.kubernetes.io/instance=megaweb,app=megaweb,openshift.io/deployment-config.name=megaweb
	Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Deployment #2:
	Created:	2 hours ago
	Status:		Complete
	Replicas:	0 current / 0 desired
Deployment #1:
	Created:	3 weeks ago
	Status:		Complete
	Replicas:	0 current / 0 desired

Events:
  Type		Reason				Age	From				Message
  ----		------				----	----				-------
  Normal	DeploymentCreated		110m	deploymentconfig-controller	Created new replication controller "megaweb-2" for version 2
  Normal	DeploymentCreated		109m	deploymentconfig-controller	Created new replication controller "megaweb-3" for version 3
  Normal	ReplicationControllerScaled	108m	deploymentconfig-controller	Scaled replication controller "megaweb-3" from 4 to 0