Bug 1908266 - RHOSP 16.1 minor update fails because of release lock enforcement on Ceph nodes
Summary: RHOSP 16.1 minor update fails because of release lock enforcement on Ceph nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: z6
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Sofer Athlan-Guyot
QA Contact: Jason Grosso
Docs Contact: Andy Stillman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-16 09:25 UTC by Bernd Zehrfuchs
Modified: 2024-06-13 23:44 UTC
CC List: 17 users

Fixed In Version: tripleo-ansible-0.5.1-1.20210310113105.902c3c8.el8ost openstack-tripleo-heat-templates-11.3.2-1.20210310113344.29a02c1.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-26 11:43:47 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1912512 | 0 | None | None | None | 2021-01-20 17:22:18 UTC
OpenStack gerrit | 771671 | 0 | None | MERGED | Allow rhsm subscription test to be overrided in RHOSP deployment. | 2021-05-03 16:12:50 UTC
OpenStack gerrit | 771676 | 0 | None | MERGED | Add a new role parameter rhsm_enforce. | 2021-05-03 16:12:51 UTC
Red Hat Issue Tracker | OSP-440 | 0 | None | None | None | 2022-08-02 14:02:06 UTC
Red Hat Issue Tracker | RHOSPDOC-712 | 0 | Urgent | Peer Review | [RFE][Updates] RHOSP 16.1 minor update fails because of release lock enforcement on Ceph nodes | 2021-05-05 16:48:29 UTC
Red Hat Product Errata | RHSA-2021:2119 | 0 | None | None | None | 2021-05-26 11:44:24 UTC

Description Bernd Zehrfuchs 2020-12-16 09:25:38 UTC
Description of problem:

The minor update of an RHOSP 16.1 overcloud with a containerized Red Hat Ceph cluster fails at the Ceph stage because the Ansible role enforces the RHEL 8.2 release lock on all nodes.
According to the documentation, this lock is only required for the undercloud, Controller, and Compute nodes, not for Ceph nodes.

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/keeping_red_hat_openstack_platform_updated/preparing-for-a-minor-update#locking-the-environment-to-a-red-hat-enterprise-linux-release_keeping-updated
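For reference, the lock described in that section amounts to running subscription-manager release --set=8.2 on the nodes that need it. A minimal Ansible playbook sketch of that step, assuming inventory group names undercloud, Controller and Compute (these names are an assumption; adjust them to your own inventory), which pins only those nodes and leaves Ceph nodes untouched:

```yaml
# Sketch only: the host group names are assumptions about your inventory.
- hosts: undercloud:Controller:Compute
  gather_facts: false
  tasks:
    - name: Lock RHEL release to 8.2
      become: true
      command: subscription-manager release --set=8.2
```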

Error message:
 fatal: [ceph01]: FAILED! => {"changed": false, "msg": "OSP16.1 is only supported with Red Hat 8.2. Please make sure to pin rhel to 8.2 using: subscription-manager release --set=8.2. You can then proceed with the update."}

Ansible role: tripleo-redhat-enforce

Version-Release number of selected component (if applicable):

RHOSP 16.1
RHEL 8.2
containerized Red Hat Ceph 4 cluster deployed via OSP director


How reproducible:
Deploy an OSP 16.1 stack and run the minor update according to the documentation.

Steps to Reproduce:
  1. deploy OSP 16.1 director
  2. deploy OSP 16.1 overcloud with containerized Red Hat Ceph cluster
  3. run minor update according to https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/keeping_red_hat_openstack_platform_updated

Actual results:

  Update fails with:
   fatal: [ceph01]: FAILED! => {"changed": false, "msg": "OSP16.1 is only supported with Red Hat 8.2. Please make sure to pin rhel to 8.2 using: subscription-manager release --set=8.2. You can then proceed with the update."}

Expected results:

  The update process will complete successfully.


Additional info:

Ansible role defaults file: /usr/share/ansible/roles/tripleo-redhat-enforce/defaults/main.yml
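The failure comes from a release check in that role. Below is a hypothetical sketch of how such a check can be written in Ansible. The task names are taken from the log in comment 34, but the variable name rhsm_release and the exact conditions are assumptions, not the actual tripleo-redhat-enforce code:

```yaml
# Hypothetical sketch, not the real tripleo-redhat-enforce tasks.
- name: get current release settings
  become: true
  command: subscription-manager release --show
  register: rhsm_release   # hypothetical variable name
  ignore_errors: true      # an unregistered node exits with rc=1

- name: fails if not registered
  fail:
    msg: "Your environment is not subscribed!"
  when: rhsm_release.rc != 0

- name: fails if release is not pinned to 8.2
  fail:
    msg: "OSP16.1 is only supported with Red Hat 8.2."
  when: "'8.2' not in rhsm_release.stdout"
```

With a check of this shape, a subscribed Ceph node that reports "Release not set" fails the last task, which matches the behavior reported above.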

Comment 1 Sofer Athlan-Guyot 2020-12-17 12:18:38 UTC
Hi,

I think this is a documentation issue, as I don't think Ceph OSD nodes on OSP 16.1 should run anything other than RHEL 8.2 (see [1]).

I'm asking the Ceph team and the rhos-delivery team for confirmation.

Teams: is there any reason why we shouldn't run "subscription-manager release --set=8.2" on ceph-osd nodes for OSP 16.1?

RHOS-DELIVERY: more generally, is there a specific subscription for ceph-osd nodes, and if so, which ones?

Note that there are many ways to get past this error, as the check is a configuration option that can be deactivated, but we should get confirmation that ceph-osd nodes need a different RHEL release pinned. If 8.2 is indeed needed (as I think it is), then omitting the pin could lead to problems.
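The configuration option in question is the SkipRhelEnforcement parameter (it appears in comment 34). A minimal environment-file sketch that turns the check off for every overcloud node; the file name skip-rhel-enforcement.yaml is hypothetical, and the file would be passed to the deploy command with -e:

```yaml
# skip-rhel-enforcement.yaml (hypothetical file name)
# Disables the RHEL/OSP release check on all overcloud nodes.
parameter_defaults:
  SkipRhelEnforcement: true
```

Because this switch is global, it also removes the check from Controller and Compute nodes, where the 8.2 lock does matter.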

Setting this to urgent, as the answer should be straightforward and the documentation should be updated as quickly as possible if there's an issue.

[1]  https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/keeping_red_hat_openstack_platform_updated

Comment 9 Sofer Athlan-Guyot 2021-01-13 14:56:30 UTC
Hi,

So, thanks to the explanation on the internal mailing list, I think we do have an issue here. Ceph nodes are not bound to the EUS constraint and can run any version of RHEL [1].

This means that the assumption that all overcloud nodes must follow the EUS stream constraints for every OSP 16 release is wrong.

We should be able to compose roles in which this check is disabled.

Please let me know if these assertions are correct.

Thanks,

[1] referring to that https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/installation_guide/requirements-for-installing-rhcs#enabling-the-red-hat-ceph-storage-repositories-install

Comment 11 Sofer Athlan-Guyot 2021-01-20 17:22:19 UTC
Hi,

I started implementing a solution for this, where enforcement can be disabled on a per-role basis by adding "rhsm_enforce: false" to the role definition.
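A minimal sketch of what this looks like in a role definition, based on the CephStorage entry shown in comment 34 and trimmed to the relevant keys:

```yaml
# Excerpt of a roles_data.yaml entry; other keys (networks,
# update_serial, and so on) are omitted for brevity.
- name: CephStorage
  description: |
    Ceph OSD Storage node role
  # Skip the RHEL/OSP release check for nodes of this role only.
  rhsm_enforce: False
```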

Comment 12 Sofer Athlan-Guyot 2021-02-18 17:59:02 UTC
Hi @dmcphers ,

This is a new role-definition parameter called "rhsm_enforce", which should be set to false in the definition of any role that must skip the check. It is useful for Ceph roles that use the overcloud-minimal image, where RHEL is not necessarily pinned to a specific version.

Do you think it's worth adding some documentation to the specific Ceph/OSP guides?

Comment 16 Sofer Athlan-Guyot 2021-04-07 14:37:05 UTC
Hi @dmacpher ,

The way I see it, there should be a warning here:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/keeping_red_hat_openstack_platform_updated/assembly-updating_the_overcloud#running-the-overcloud-update-preparation_keeping-updated

It should go just before the overcloud update prepare command.

The warning section should point to this https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/deploying_an_overcloud_with_containerized_red_hat_ceph/index#using-the-overcloud-minimal-image-to-avoid-using-a-Red-Hat-subscription-entitlement

This is where we have to add that the role file should set rhsm_enforce to false when using the overcloud-minimal image.

Basically: every Ceph OSD role that uses the overcloud-minimal image should set rhsm_enforce to false, to skip the RHOSP version enforcement check.
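A sketch of the environment-file side of that setup. It assumes the default CephStorage role name, so that the image parameter is CephStorageImage (a custom role would typically use its own <RoleName>Image parameter); the role definition itself still needs rhsm_enforce set to false:

```yaml
# Hypothetical environment file for Ceph nodes built from overcloud-minimal.
parameter_defaults:
  # Assumes the default CephStorage role name.
  CephStorageImage: overcloud-minimal
```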

Comment 25 Vlada Grosu 2021-04-27 16:21:52 UTC
Hi Sofer,

I can see that, by default, the roles_data.yaml file sets rhsm_enforce: False [1].

Is the expectation that they've unset that parameter? 
What actions are required to ensure that this parameter is set to false? Is it to inspect the roles_data.yaml file and edit it if needed?


Also, if we're making a change in the Deploying an overcloud with containerized Red Hat Ceph guide, section 2.6. Using the overcloud-minimal image to avoid using a Red Hat subscription entitlement [2], does setting this parameter to false affect any scenarios other than updates/upgrades?

I will create a draft for these changes shortly and update this ticket with the details.

Many thanks,
Vlada


[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/train/roles_data.yaml

[2] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/deploying_an_overcloud_with_containerized_red_hat_ceph/index#using-the-overcloud-minimal-image-to-avoid-using-a-Red-Hat-subscription-entitlement


(In reply to Sofer Athlan-Guyot from comment #16)
> Hi @dmacpher ,
> 
> so the way I see it should be a warning there :
> 
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/keeping_red_hat_openstack_platform_updated/assembly-updating_the_overcloud#running-the-overcloud-update-preparation_keeping-updated
> 
> Just before we run the overcloud update prepare command.
> 
> The warning section should point to this
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/deploying_an_overcloud_with_containerized_red_hat_ceph/index#using-the-overcloud-minimal-image-to-avoid-using-a-Red-Hat-subscription-entitlement
> 
> This is where we have to add that the role file should set the rhsm_enforce
> to false when using overcloud-minimal-image.
> 
> Basically, 'for all Ceph osd "role" which are using overcloud-minimal image,
> their role should have rhsm_enforce set to false' to avoid checking rhosp
> version enforcement.

Comment 34 Sofer Athlan-Guyot 2021-05-05 13:46:53 UTC
Hi,

I made sure that the RHEL check was being enforced:

(undercloud) [stack@undercloud-0 ~]$ openstack stack environment show qe-Cloud-0 > env.txt
(undercloud) [stack@undercloud-0 ~]$ grep Enforce env.txt
  SkipRhelEnforcement: false

The role data for CephStorage has rhsm_enforce set to false:

- name: CephStorage
  description: |
    Ceph OSD Storage node role
  networks:
    Storage:
      subnet: storage_subnet
    StorageMgmt:
      subnet: storage_mgmt_subnet
  uses_deprecated_params: False
  deprecated_nic_config_name: 'ceph-storage.yaml'
  # CephOSD present so serial has to be 1
  update_serial: 1
  rhsm_enforce: False
...

ceph-1 is subscribed, but no release is set:

[root@ceph-1 ~]# sudo subscription-manager release --show
Release not set

compute-0 has no subscription at all (used to prove that the check is indeed enabled there):

[heat-admin@compute-0 ~]$ sudo subscription-manager release --show
This system is not yet registered. Try 'subscription-manager register --help' for more information.

Now, when updating ceph-1, the check is skipped:

TASK [tripleo-redhat-enforce : Enforce RHEL/OSP version pair] ******************
Wednesday 05 May 2021  13:20:09 +0000 (0:00:00.069)       0:00:17.324 *********
skipping: [ceph-1] => {"changed": false, "skip_reason": "Conditional result was False"}


while for compute-0:

TASK [tripleo-redhat-enforce : Enforce RHEL/OSP version pair] ******************
Wednesday 05 May 2021  13:34:12 +0000 (0:00:00.068)       0:00:17.875 *********
included: /usr/share/ansible/roles/tripleo-redhat-enforce/tasks/enforce_release.yml for compute-0

TASK [tripleo-redhat-enforce : get current release settings] *******************
Wednesday 05 May 2021  13:34:12 +0000 (0:00:00.088)       0:00:17.964 *********
fatal: [compute-0]: FAILED! => {"attempts": 1, "changed": true, "cmd": ["subscription-manager", "release", "--show"], "delta": "0:00:01.162262", "end": "2021-05-05 13:34:14.583573", "msg": "non-zero return code", "rc": 1, "start": "2021-05-05 13:34:13.421311", "stderr": "This system is not yet registered. Try 'subscription-manager register --help' for more information.", "stderr_lines": ["This system is not yet registered. Try 'subscription-manager register --help' for more information."], "stdout": "", "stdout_lines": []}
...ignoring

TASK [tripleo-redhat-enforce : fails if not registered] ************************
Wednesday 05 May 2021  13:34:14 +0000 (0:00:01.701)       0:00:19.666 *********
fatal: [compute-0]: FAILED! => {"changed": false, "msg": "Your environment is not subscribed! If it is expected, please set SkipRhelEnforcement to true. For Director the documentation is there https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html-single/ director_installation_and_usage/index#configuring-the-undercloud-with-environment-files, for the Overcloud you need to add a new parameter file to your deploy command with that parameter set. You can also disable it in the role, see rhsm_enforce role parameter. If this is unexpected, you have to subscribe this node and ensure that RHEL is pinned to 8.2 as this is the only version supported for 16.1."}

Failure happens as expected.
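The difference between the two nodes comes from a conditional on the include task. A hypothetical sketch of such gating: the task name and the included file come from the log above, while the variable names skip_rhel_enforcement and rhsm_enforce, as seen from Ansible, are assumptions:

```yaml
# Hypothetical sketch: the variable names are assumptions.
- name: Enforce RHEL/OSP version pair
  include_tasks: enforce_release.yml
  when:
    # Global switch, driven by the SkipRhelEnforcement parameter
    - not (skip_rhel_enforcement | default(false) | bool)
    # Per-role switch; the CephStorage role sets rhsm_enforce: False
    - rhsm_enforce | default(true) | bool
```

Both conditions must hold for the check to run, so on ceph-1 the second one evaluates to false and Ansible reports "Conditional result was False", while on compute-0 both hold and enforce_release.yml is included.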


Verified.

Comment 35 Sofer Athlan-Guyot 2021-05-05 13:51:36 UTC
(In reply to Vlada Grosu from comment #25)
> Hi Sofer,
> 
> I can see that by default the roles_data.yaml file has set rhsm_enforce:
> False [1]. 
> 
> Is the expectation that they've unset that parameter? 

roles_data.yaml is an "example", and if you look closely, rhsm_enforce is set to false only for the CephStorage role, which is where this parameter makes sense.

> What actions are we required to ensure that this parameter is set to false?
> Is it to inspect the roles_data.yaml file and edit it if needed?

Yes.

> 
> 
> Also, if we're making a change in the Deploying an overcloud with
> containerized Red Hat Ceph guide, section 2.6. Using the overcloud-minimal
> image to avoid using a Red Hat subscription entitlement [2], does having
> this parameter set to false affect any other scenarios except for
> updates/upgrades?

The check runs only during updates, but it would be a good idea to set this parameter correctly from the initial deployment.

> 
> I will create a draft for these changes shortly and update this ticket with
> the details.
> 

Thanks.

> Many thanks,
> Vlada
> 
> 
> [1] https://github.com/openstack/tripleo-heat-templates/blob/stable/train/roles_data.yaml
> 
> [2] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/deploying_an_overcloud_with_containerized_red_hat_ceph/index#using-the-overcloud-minimal-image-to-avoid-using-a-Red-Hat-subscription-entitlement
> 

Comment 36 Vlada Grosu 2021-05-05 16:52:42 UTC
Thanks, Sofer! 

The changes will be reflected in Director installation and usage guide, Deploying an overcloud with containerized Red Hat Ceph, and Keeping Red Hat OpenStack Platform updated. They will be published for 16.1.6. 

I've added the docs Jira tracker for this.  

Thank you!

Comment 44 errata-xmlrpc 2021-05-26 11:43:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenStack Platform 16.1.6 (tripleo-ansible) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2119

