Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1525162

Summary: [free-int] unable to start atomic-openshift-master-api due to admission plugin marshaling error
Product: OpenShift Container Platform
Reporter: Justin Pierce <jupierce>
Component: Master
Assignee: Michal Fojtik <mfojtik>
Status: CLOSED CURRENTRELEASE
QA Contact: ge liu <geliu>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.8.0
CC: aos-bugs, eparis, gpei, jokerman, mifiedle, mmccomas, sdodson, wmeng
Target Milestone: ---
Keywords: TestBlocker
Target Release: 3.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-21 18:38:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Description: Excerpt from master-config.yml
  Flags: none

Description Justin Pierce 2017-12-12 17:00:42 UTC
Created attachment 1366815: Excerpt from master-config.yml

Description of problem:
During an upgrade of free-int from v3.7 to v3.8, the openshift-ansible installer timed out waiting for a master to come back online. I ssh'd into the master and found that the atomic-openshift-master-api server was failing repeatedly with:
"cannot unmarshal string into Go struct field ProjectLimitBySelector.maxProjects of type int"

Version-Release number of selected component (if applicable):
v3.8.18

How reproducible:
100%

Comment 1 Justin Pierce 2017-12-12 17:05:59 UTC
I took the obvious approach of changing this field from a string to an int, which got me as far as the next error:

Couldn't init admission plugin "RunOnceDuration": json: cannot unmarshal string into Go struct field RunOnceDurationConfig.activeDeadlineSecondsOverride of type int64

Fixing that led to:
Couldn't init admission plugin "ClusterResourceOverride": json: cannot unmarshal string into Go struct field ClusterResourceOverrideConfig.cpuRequestToLimitPercent of type int64

I continued to replace '1234' with 1234 for the remaining occurrences in master-config.yml in order to make progress with v3.8 testing.
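
The manual search for quoted numbers described in this comment can be roughed out in a few lines. This is a minimal sketch, not the openshift-ansible task or `oc adm diagnostics`: the function name `find_quoted_ints`, the regular expression, and the sample values are all assumptions made for illustration, and only the field names come from the errors above. It scans text line by line rather than parsing YAML, so it will also flag fields that are legitimately strings; each hit still needs a human decision.

```python
import re

# Matches lines such as `maxProjects: '10'` or `- containerPort: "80"`:
# a key followed by an integer wrapped in single or double quotes.
QUOTED_INT = re.compile(r"""^\s*(?:-\s+)?([\w.-]+):\s*(['"])(-?[0-9]+)\2\s*$""")


def find_quoted_ints(text):
    """Return (line_number, key, value) for each quoted integer scalar."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = QUOTED_INT.match(line)
        if match:
            hits.append((lineno, match.group(1), match.group(3)))
    return hits


if __name__ == "__main__":
    # Made-up values; only the field names come from the errors above.
    sample = (
        "maxProjects: '10'\n"
        'activeDeadlineSecondsOverride: "3600"\n'
        "cpuRequestToLimitPercent: 25\n"
    )
    for lineno, key, value in find_quoted_ints(sample):
        print(f"line {lineno}: {key} is quoted ({value!r})")
```

Run against a copy of the config, this only reports candidate lines; it does not rewrite anything.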

Comment 2 Jordan Liggitt 2017-12-12 18:32:14 UTC
maxProjects and activeDeadlineSecondsOverride are both numeric fields and must be specified as numbers, not strings

Scott, do we have the ability to run `oc adm diagnostics` with the MasterConfigCheck and NodeConfigCheck diagnostics with 3.8 as a pre-upgrade check? That would flag these for fixing prior to upgrade, rather than failing to come up after upgrade.

Comment 3 Jordan Liggitt 2017-12-12 18:33:16 UTC
> maxProjects and activeDeadlineSecondsOverride are both numeric fields and must be specified as numbers, not strings

(as is cpuRequestToLimitPercent)

Comment 4 Scott Dodson 2017-12-12 18:36:02 UTC
(In reply to Jordan Liggitt from comment #2)
> maxProjects and activeDeadlineSecondsOverride are both numeric fields and
> must be specified as numbers, not strings
> 
> scott, do we have the ability to run `oc adm diagnostics` with the
> MasterConfigCheck and NodeConfigCheck  diagnostics with 3.8 as a pre-upgrade
> check? That would flag these for fixing prior to upgrade, rather than
> failing to come up after upgrade.

We could run it inside a container. If we were to do anything else, we'd have to upgrade the package first, which would mean that if the master were restarted via other means, it would immediately start failing.

Comment 6 Jordan Liggitt 2017-12-12 19:22:46 UTC
> if the installer attempts to correct the config issue before performing the upgrade

I would not expect the installer to modify config it did not generate.

I think the best we can do for config we don't own is to detect the problem by running diagnostics prior to upgrade.

Comment 7 Scott Dodson 2017-12-12 21:02:49 UTC
I guess this belongs to the upgrade component then. :-=(

We'll add a task that validates the current config against the upgrade target version via a container.

Comment 8 Eric Paris 2017-12-13 17:54:22 UTC
Does this parser change also break API object definitions that might have been working? Could I have created an rc with size "2" that works on 3.7 but not on 3.8?

Comment 9 Eric Paris 2017-12-13 18:11:14 UTC
Confirmed. This ReplicaSet works fine on 3.7 but does not work at all on 3.8. (Both containerPort and replicas are quoted strings.)

# oc create -f /tmp/nginx.yaml -n eparis
Error from server (BadRequest): error when creating "/tmp/nginx.yaml": ReplicaSet in version "v1beta1" cannot be handled as a ReplicaSet: json: cannot unmarshal string into Go struct field ReplicaSetSpec.replicas of type int32

# cat /tmp/nginx.yaml 
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: "1"
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:latest
        imagePullPolicy: Always
        name: nginx
        ports:
        - containerPort: "80"
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
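
To illustrate why the quoted `replicas: "1"` is rejected while an unquoted `replicas: 1` would not be, here is a minimal Python sketch of a strictly typed read of that one field. It is an analogy only: the real decoding happens in the Go API server, and the helper name `strict_replicas` and its error text are hypothetical.

```python
import json


def strict_replicas(manifest_json):
    """Read spec.replicas, rejecting anything that is not already an integer."""
    value = json.loads(manifest_json)["spec"]["replicas"]
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(
            f"cannot decode {type(value).__name__} into integer field spec.replicas"
        )
    return value


if __name__ == "__main__":
    print(strict_replicas('{"spec": {"replicas": 1}}'))
    try:
        strict_replicas('{"spec": {"replicas": "1"}}')
    except TypeError as err:
        print(f"rejected: {err}")
```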

Comment 11 Jordan Liggitt 2017-12-13 22:35:27 UTC
Opened PRs to restore type coercion:

https://github.com/openshift/origin/pull/17764 (3.8)
https://github.com/openshift/origin/pull/17768 (3.9)
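
For readers unfamiliar with the term, "type coercion" here means accepting a string such as "1" where an integer is expected. The sketch below shows the general idea in Python under that assumption; it is not the code from the PRs above, and the helper name `coerce_int` is hypothetical.

```python
import re


def coerce_int(value, field):
    """Accept an int, or a string holding a base-10 integer; reject the rest."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; treat it as a type error here.
        raise TypeError(f"{field}: expected integer, got bool")
    if isinstance(value, int):
        return value
    if isinstance(value, str) and re.fullmatch(r"-?[0-9]+", value):
        # The lenient path: a quoted number is converted instead of rejected.
        return int(value)
    raise TypeError(f"{field}: cannot coerce {value!r} to integer")


if __name__ == "__main__":
    print(coerce_int("1", "replicas"))
    print(coerce_int(80, "containerPort"))
```

With this behaviour, a manifest that quotes `replicas` or `containerPort` is accepted, while a non-numeric string is still refused.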

Comment 12 Scott Dodson 2017-12-14 14:54:32 UTC
*** Bug 1525828 has been marked as a duplicate of this bug. ***

Comment 14 Jordan Liggitt 2018-01-05 13:43:12 UTC
Fixes are merged in 3.8.19+