Bug 1691660 - CVO upgrade did not overwrite OSImageURL set by user
Summary: CVO upgrade did not overwrite OSImageURL set by user
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Antonio Murdaca
QA Contact: Siva Reddy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-22 07:55 UTC by Johnny Liu
Modified: 2019-06-04 10:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:46:21 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:46:28 UTC

Description Johnny Liu 2019-03-22 07:55:23 UTC
Description of problem:

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:
Always

Steps to Reproduce:
1. set up a cluster with 4.0.0-0.nightly-2019-03-19-004004 payload

2. log into machine, check rhcos version
[core@ip-10-0-136-62 ~]$ rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-03-19-004004@sha256:65406dd82ead5a7cc6bd34f9c8e49b6212a7ab1db9cc9d33ba14613719e3771f
              CustomOrigin: Managed by pivot tool
                   Version: 410.8.20190315.0 (2019-03-15T13:32:33Z)

  pivot://docker-registry-default.cloud.registry.upshift.redhat.com/redhat-coreos/maipo@sha256:c09f455cc09673a1a13ae7b54cc4348cda0411e06dfa79ecd0130b35d62e8670
              CustomOrigin: Provisioned from oscontainer
                   Version: 400.7.20190306.0 (2019-03-06T22:16:26Z)

3. change default OSImageURL via user customized machineconfig.
# cat ~/master-os-update.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 0-master-os-01
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 2.2.0
    networkd: {}
    passwd: {}
    storage: {}
    systemd: {}
  osImageURL: "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181"
# oc create -f ~/master-os-update.yaml
# oc get machineconfig
NAME                                                        GENERATEDBYCONTROLLER       IGNITIONVERSION   CREATED
0-master-os-01                                                                          2.2.0             3m19s
00-master                                                   4.0.22-201903181722-dirty   2.2.0             44h
00-master-ssh                                               4.0.22-201903181722-dirty   2.2.0             44h
00-worker                                                   4.0.22-201903181722-dirty   2.2.0             44h
00-worker-ssh                                               4.0.22-201903181722-dirty   2.2.0             44h
01-master-container-runtime                                 4.0.22-201903181722-dirty   2.2.0             44h
01-master-kubelet                                           4.0.22-201903181722-dirty   2.2.0             44h
01-worker-container-runtime                                 4.0.22-201903181722-dirty   2.2.0             44h
01-worker-kubelet                                           4.0.22-201903181722-dirty   2.2.0             44h
99-master-4f75c9ab-4ae1-11e9-91fa-06b0504a45fe-registries   4.0.22-201903181722-dirty   2.2.0             44h
99-worker-4f76d90c-4ae1-11e9-91fa-06b0504a45fe-registries   4.0.22-201903181722-dirty   2.2.0             44h
master-419a0d921d5f348740605c2f198fe4d4                     4.0.22-201903181722-dirty   2.2.0             44h
master-af21f7284bb0dfd003ef17cbeabd95bc                     4.0.22-201903181722-dirty   2.2.0             3m14s
worker-7a222c854cc1d2ecc25d9cdcd80537c0                     4.0.22-201903181722-dirty   2.2.0             44h

4. After the new machineconfig is applied, log into the machine and check the rhcos version again.
[core@ip-10-0-136-62 ~]$ rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
              CustomOrigin: Managed by pivot tool
                   Version: 47.330 (2019-02-23T04:17:13Z)

  pivot://registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-03-19-004004@sha256:65406dd82ead5a7cc6bd34f9c8e49b6212a7ab1db9cc9d33ba14613719e3771f
              CustomOrigin: Managed by pivot tool
                   Version: 410.8.20190315.0 (2019-03-15T13:32:33Z)

5. Trigger an upgrade to '4.0.0-0.nightly-2019-03-20-153904'; it succeeds.

6. Check the machineconfigs and the machine's rhcos version again.

Actual results:
Two new machineconfigs (master-c3fcb0712f17ba7b1c94e3bd1e0d0443 and worker-7750b88f2147f3d8325b44403417a5df) are listed there.
# oc get machineconfig
NAME                                                        GENERATEDBYCONTROLLER       IGNITIONVERSION   CREATED
0-master-os-01                                                                          2.2.0             165m
00-master                                                   4.0.22-201903191645-dirty   2.2.0             47h
00-master-ssh                                               4.0.22-201903191645-dirty   2.2.0             47h
00-worker                                                   4.0.22-201903191645-dirty   2.2.0             47h
00-worker-ssh                                               4.0.22-201903191645-dirty   2.2.0             47h
01-master-container-runtime                                 4.0.22-201903191645-dirty   2.2.0             47h
01-master-kubelet                                           4.0.22-201903191645-dirty   2.2.0             47h
01-worker-container-runtime                                 4.0.22-201903191645-dirty   2.2.0             47h
01-worker-kubelet                                           4.0.22-201903191645-dirty   2.2.0             47h
99-master-4f75c9ab-4ae1-11e9-91fa-06b0504a45fe-registries   4.0.22-201903191645-dirty   2.2.0             47h
99-worker-4f76d90c-4ae1-11e9-91fa-06b0504a45fe-registries   4.0.22-201903191645-dirty   2.2.0             47h
master-419a0d921d5f348740605c2f198fe4d4                     4.0.22-201903181722-dirty   2.2.0             47h
master-af21f7284bb0dfd003ef17cbeabd95bc                     4.0.22-201903181722-dirty   2.2.0             165m
master-c3fcb0712f17ba7b1c94e3bd1e0d0443                     4.0.22-201903191645-dirty   2.2.0             59m
worker-7750b88f2147f3d8325b44403417a5df                     4.0.22-201903191645-dirty   2.2.0             59m
worker-7a222c854cc1d2ecc25d9cdcd80537c0                     4.0.22-201903181722-dirty   2.2.0             47h

But the machine is still running the rhcos version from the user-customized osImageURL:
[core@ip-10-0-136-62 ~]$ rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
              CustomOrigin: Managed by pivot tool
                   Version: 47.330 (2019-02-23T04:17:13Z)

  pivot://registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-03-19-004004@sha256:65406dd82ead5a7cc6bd34f9c8e49b6212a7ab1db9cc9d33ba14613719e3771f
              CustomOrigin: Managed by pivot tool
                   Version: 410.8.20190315.0 (2019-03-15T13:32:33Z)

# oc get machineconfig master-c3fcb0712f17ba7b1c94e3bd1e0d0443 -o yaml|grep osImageURL
  osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
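
The comparison made above (booted deployment versus the osImageURL of the rendered machineconfig) can be sketched as a small shell helper. This is only an illustration: the function names booted_image and image_matches are hypothetical, and the parsing assumes the `rpm-ostree status` layout shown in this report, where the booted deployment is marked with "●" and carries a pivot:// prefix.

```shell
# Print the image reference of the booted deployment (the line marked "●")
# from `rpm-ostree status` text read on stdin, without the "pivot://" prefix.
booted_image() {
  awk '$1 == "●" { sub(/^pivot:\/\//, "", $2); print $2; exit }'
}

# Exit 0 when the booted image on stdin equals the expected osImageURL ($1),
# non-zero otherwise.
image_matches() {
  [ "$(booted_image)" = "$1" ]
}
```

On a node, `rpm-ostree status | booted_image` would print the image currently booted; piping the same output into `image_matches` with the osImageURL expected from the payload gives a pass/fail result. In this report the check would fail after the upgrade, because the booted image is still the user-supplied one.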


Expected results:
Customers should be prevented or restricted from customizing OSImageURL.

Additional info:
1. Allowing users to set OSImageURL via a customized machineconfig is really convenient for QE and dev when testing os updates.
2. We do not expect customers to do this, but there seems to be no obvious warning to stop them, so a customer can still set it.
3. Once a customer sets it, the machine's rhcos version is no longer controlled by CVO upgrades.
4. https://github.com/openshift/machine-config-operator/issues/465 seems to discuss how to prevent this. If it is prevented, will there still be another way for QE or dev to set OSImageURL for os update testing?

Comment 1 Antonio Murdaca 2019-03-23 18:03:54 UTC
This is the PR to fix this https://github.com/openshift/machine-config-operator/pull/475

I do not feel QE or anyone else should test os upgrades through osImageURL. The expected way to do this is always through the payload. If we start testing in another, not supported way, what's the point of the test? The upgrade testing should be exercised through only machine-os-content in the payload.

The PR I linked should take care of this BZ by dropping the ability to use osImageURL for testing as well - to reiterate, QE should test os upgrades through a payload which overrides machine-os-content, not by creating a machine-config with osImageURL

Comment 2 Johnny Liu 2019-03-25 02:49:53 UTC
(In reply to Antonio Murdaca from comment #1)
> This is the PR to fix this
> https://github.com/openshift/machine-config-operator/pull/475
> 
> I do not feel QE or anyone else should test os upgrades through osImageURL.
> The expected way to do this is always through the payload. If we start
> testing in another, not supported way, what's the point of the test? The
> upgrade testing should be exercised through only machine-os-content in the
> payload.
> 
> The PR I linked should take care of this BZ by dropping the ability to use
> osImageURL for testing as well - to reiterate, QE should test os upgrades
> through a payload which overrides machine-os-content, not by creating a
> machine-config with osImageURL

Sometimes QE is asked to do exploratory testing against an RHCOS version
that is not yet included in any payload. We customized osImageURL via a
machineconfig to perform the os upgrade manually. I totally agree with dropping
the ability to use osImageURL; that would leave a single entry point for os
upgrades for all audiences, whether customer, dev, or QE.

One more question: once the PR has landed, could QE follow [1] to override
machine-os-content in the payload for an os upgrade (not a whole-cluster
upgrade)?

[1]: https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusterversion.md#setting-objects-unmanaged

Comment 3 Antonio Murdaca 2019-03-25 12:03:52 UTC
(In reply to Johnny Liu from comment #2)
> (In reply to Antonio Murdaca from comment #1)
> > This is the PR to fix this
> > https://github.com/openshift/machine-config-operator/pull/475
> > 
> > I do not feel QE or anyone else should test os upgrades through osImageURL.
> > The expected way to do this is always through the payload. If we start
> > testing in another, not supported way, what's the point of the test? The
> > upgrade testing should be exercised through only machine-os-content in the
> > payload.
> > 
> > The PR I linked should take care of this BZ by dropping the ability to use
> > osImageURL for testing as well - to reiterate, QE should test os upgrades
> > through a payload which overrides machine-os-content, not by creating a
> > machine-config with osImageURL
> 
> Sometimes QE is asked to do exploratory testing against an RHCOS version
> that is not yet included in any payload. We customized osImageURL via a
> machineconfig to perform the os upgrade manually. I totally agree with dropping
> the ability to use osImageURL; that would leave a single entry point for os
> upgrades for all audiences, whether customer, dev, or QE.
> 
> One more question: once the PR has landed, could QE follow [1] to override
> machine-os-content in the payload for an os upgrade (not a whole-cluster
> upgrade)?
> 
> [1]: https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusterversion.md#setting-objects-unmanaged

Once that PR merges, I believe the correct way to test machine-os-content is to build a payload overriding just machine-os-content. You can follow this guide https://github.com/openshift/machine-config-operator/blob/master/docs/HACKING.md#build-a-custom-release-payload and just override "machine-os-content". Ping me if issues arise following that.
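
As a rough sketch of what "a payload overriding just machine-os-content" could look like on the command line, the helper below only prints an `oc adm release new` invocation instead of running it. The function name print_release_cmd and all three pull specs are placeholders, not values from this bug, and the flags are assumed to match the installed `oc` version; the HACKING.md guide linked above remains the reference.

```shell
# Print (do not run) an `oc adm release new` command that starts from an
# existing release payload and replaces only the machine-os-content image.
# All arguments are caller-supplied placeholders.
print_release_cmd() {
  from_release="$1"   # existing release payload to start from
  os_content="$2"     # machine-os-content image under test
  to_image="$3"       # where the custom payload would be pushed
  printf 'oc adm release new --from-release=%s --to-image=%s machine-os-content=%s\n' \
    "$from_release" "$to_image" "$os_content"
}
```

Printing the command first makes it easy to review the pull specs before anything is pushed; the cluster would then be upgraded to the resulting payload in the usual way.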

Comment 4 Antonio Murdaca 2019-03-25 12:41:43 UTC
PR has been merged, moving to MODIFIED for QE

Comment 6 Johnny Liu 2019-03-26 07:30:44 UTC
The PR has not landed in the OCP nightly build yet.

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-25-180911   True        False         174m    Cluster version is 4.0.0-0.nightly-2019-03-25-180911

# oc adm release info --commits|grep machine-config-operator
  machine-config-controller                     https://github.com/openshift/machine-config-operator                       72a74aa98c29ee3a71460dc8795dd7181ce20ab0
  machine-config-daemon                         https://github.com/openshift/machine-config-operator                       72a74aa98c29ee3a71460dc8795dd7181ce20ab0
  machine-config-operator                       https://github.com/openshift/machine-config-operator                       72a74aa98c29ee3a71460dc8795dd7181ce20ab0
  machine-config-server                         https://github.com/openshift/machine-config-operator                       72a74aa98c29ee3a71460dc8795dd7181ce20ab0
  setup-etcd-environment                        https://github.com/openshift/machine-config-operator                       72a74aa98c29ee3a71460dc8795dd7181ce20ab0

[jialiu@dhcp-141-223 machine-config-operator]$ git log --first-parent --format='%ad %h %d %s' --date=iso 72a74aa98c29ee3a71460dc8795dd7181ce20ab0^..origin/master | cat
2019-03-25 17:37:43 -0700 dc9b354  (HEAD -> master, origin/release-4.0, origin/master, origin/HEAD) Merge pull request #573 from runcom/add-retrying
2019-03-25 15:35:06 -0700 c83a2df  Merge pull request #575 from cgwalters/pool-subsumes
2019-03-25 14:04:07 -0700 4b62b08  Merge pull request #574 from rphillips/fixes/add_feature_gates_permissions
2019-03-25 11:20:56 -0700 7add825  Merge pull request #490 from runcom/crc-race
2019-03-25 09:41:57 -0700 7952b20  Merge pull request #572 from runcom/get-cc-directly
2019-03-25 05:23:18 -0700 31f4139  Merge pull request #475 from runcom/no-override-osimageurl
2019-03-22 19:18:38 -0700 72a74aa  Merge pull request #553 from rphillips/feat/kubelet_config_features_fixed
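
The first half of the check above (finding which commit a payload was built from) can be sketched as a small filter over the `oc adm release info --commits` output. The function name component_commit is hypothetical, and the parsing assumes the three-column layout shown in this comment (component, repository URL, commit).

```shell
# Read `oc adm release info --commits` text on stdin and print the commit
# recorded for the component named in $1; prints nothing if it is absent.
component_commit() {
  awk -v name="$1" '$1 == name { print $3; exit }'
}
```

With that commit in hand, `git merge-base --is-ancestor <pr-merge-commit> <payload-commit>` run in a machine-config-operator checkout exits 0 only if the PR merge is contained in the payload's commit. In the log above, the merge of pull request #475 (31f4139) comes after 72a74aa, so the fix is not yet in this nightly.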

Comment 7 Wei Sun 2019-04-10 02:44:38 UTC
Hi Siva,
Could it be verified now? If the PR is still not in the latest payload, please move the bug to MODIFIED status.

Thanks!

Comment 8 Siva Reddy 2019-04-10 04:10:13 UTC
  The osImageURL can no longer be set by a user through a machineconfig. Even after changing the osImageURL via a configmap, when the cluster is
upgraded using a payload that overrides machine-os-content, the os image version gets updated.
    Hence moving this to verified.

Versions used:
  upgrade from 4.0.0-0.9 to 4.0.0-0.10
  machine-os-content used:
     Version: 410.8.20190322.0 (quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41973eb774db51c505f91d9a9428de4a578ffe5b8d9a7a48333300862f11af7f)
     Version: 410.8.20190329.0 (quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d762ceee9f46a141f54cc4dc9689fa19048c1df9b26aae5d8016d6d44995a08d)

Steps to reproduce:
  1. Install a cluster.
  2. Try to update the machine-os-content by creating a machineconfig.
  3. Note that no new rendered machineconfig is created and the update is not picked up.
  4. After disabling the CVO, update the config map for the os image url in the openshift-machine-config-operator namespace.
  5. Note that the changes get picked up and the machine-os-content gets upgraded on all the machines.
  6. Now enable the CVO and upgrade the cluster with a newer payload using the CVO.
  7. Note that the machines get upgraded to the machine-os-content in the payload.

Comment 10 errata-xmlrpc 2019-06-04 10:46:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

