Bug 1894774

Summary: [DOCS] How to configure the worker and infra nodes to be distributed across hosts (with soft-anti-affinity) on IPI in OpenShift Platform 4.5 on top of Red Hat OpenStack Platform
Product: OpenShift Container Platform Reporter: Hideshi Fukumoto <hfukumot>
Component: Documentation Assignee: Max Bridges <mbridges>
Status: CLOSED CURRENTRELEASE QA Contact: rlobillo
Severity: medium Docs Contact: Vikram Goyal <vigoyal>
Priority: medium    
Version: 4.5 CC: aos-bugs, atragler, eduen, jokerman, mbridges, pprinett, rlobillo, sdodson, vigoyal, wsun, xtian
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-12-10 15:21:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 6 Pierre Prinetti 2021-01-21 17:30:43 UTC
Hi Hideshi,
The information you seek is technically not an installation step, but rather a "machine management" step: adding a MachineSet can be done as a day-2 operation.

Here is the relevant documentation: https://docs.openshift.com/container-platform/4.6/machine_management/creating_machinesets/creating-machineset-osp.html#machineset-yaml-osp_creating-machineset-osp

The note "4" (serverGroupID) points to the OpenStack documentation about creating a server group.

Is that enough, or is there still some missing or misplaced information?

Comment 12 Pierre Prinetti 2021-02-24 12:56:05 UTC
(In reply to Hideshi Fukumoto from comment #10)
> Hi Pierre,
> 
> Thank you for your comment.
> 
> > Your description looks accurate, although I haven't had the chance to test it yet.
> > 
> > I like the idea of modifying the Machineset before triggering the install step,
> > however I unfortunately have no insight over the supportability of such workflow.
> > In other words: I suppose that customers are allowed to change the manifests
> > before `create cluster`, but I am not sure.
> 
>   I don't know what to do in such a case. [...]

Following an internal discussion, I am now fairly convinced that modifying manifests is legitimate and does not break the supportability of the installation. That said, I suggest you ask a Product person for a final answer on the question of supportability.

Comment 13 Pierre Prinetti 2021-02-25 08:34:24 UTC
In the attached PR[1], I have added a description of how a user can set an affinity policy on workers at install-time.
There is probably zero "new" content, but rather a "vertical" collection of information aimed at the specific purpose of setting affinity for workers.

Is this the kind of content you expected, Hideshi?

[1]: github.com/openshift/installer/pull/4687

Comment 16 Pierre Prinetti 2021-03-09 10:58:51 UTC
Re:
> could you please backport

We don't generally backport documentation patches. Do you think you can point the customer to the document on the master branch, until it's properly published somewhere downstream?

Comment 19 Pierre Prinetti 2021-03-17 09:51:28 UTC
Max,
Do you think that we can find a place for this content in the downstream docs?

Comment 20 Max Bridges 2021-03-17 14:41:26 UTC
@pprinett Assuming this is QEed, yeah, I think so.

Comment 29 rlobillo 2021-06-16 13:08:40 UTC
Procedure https://github.com/openshift/installer/blob/master/docs/user/openstack/affinity.md verified on 4.8.0-0.nightly-2021-06-14-145150 over OSP16.1 (RHOS-16.1-RHEL-8-20210506.n.1)


Given an OSP installation with 3 computes:

$ . overcloudrc && openstack host list
+---------------------------+-----------+----------+
| Host Name                 | Service   | Zone     |
+---------------------------+-----------+----------+
| controller-1.redhat.local | conductor | internal |
| controller-2.redhat.local | conductor | internal |
| controller-0.redhat.local | conductor | internal |
| controller-1.redhat.local | scheduler | internal |
| controller-2.redhat.local | scheduler | internal |
| controller-0.redhat.local | scheduler | internal |
| compute-0.redhat.local    | compute   | nova     |
| compute-1.redhat.local    | compute   | nova     |
| compute-2.redhat.local    | compute   | nova     |
+---------------------------+-----------+----------+


and an install-config.yaml with the following section:

compute:
- name: worker
  platform:
    openstack:
      zones: []
      additionalNetworkIDs: ['2c7cfcdd-ab1d-4e6c-bf5b-293674aaa3ae']
  replicas: 3
controlPlane:
  name: master
  platform:
    openstack:
      zones: []
  replicas: 3

The server group is created with the anti-affinity policy:

$ openstack \
> --os-compute-api-version=2.15 \
> server group create \
> --policy anti-affinity \
> my-openshift-worker-group
+------------+--------------------------------------+
| Field      | Value                                |
+------------+--------------------------------------+
| id         | 5fff614c-4a3f-4a46-a91b-345a91818f08 |
| members    |                                      |
| name       | my-openshift-worker-group            |
| policies   | anti-affinity                        |
| project_id | 081f736edb604e889a864777936df531     |
| user_id    | f5aa933c3ffe46b59fe7dc8bfe14e30d     |
+------------+--------------------------------------+
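
The server group ID printed above is the value that later goes into the MachineSet manifest. A minimal sketch of pulling it out of the table output, assuming the two-column Field/Value layout shown here; the helper name table_field is hypothetical. With a real OpenStack client, the standard formatter options (-f value -c id) would probably avoid parsing the table at all.

```shell
# Hypothetical helper: read OpenStack CLI table output on stdin and print
# the Value cell of the row whose Field cell equals $1.
table_field() {
  awk -F'|' -v field="$1" '
    NF >= 3 {                                  # skip +----+ border lines
      key = $2; val = $3
      gsub(/^[ \t]+|[ \t]+$/, "", key)         # trim padding around the cells
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == field) { print val; exit }
    }'
}

# Usage (not executed here; assumes the table output was saved to a file):
#   server_group_id=$(table_field id < server-group.txt)
```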

The manifest is then modified as explained in https://github.com/openshift/installer/blob/master/docs/user/openstack/affinity.md

$ openshift-install create manifests --dir ostest
INFO Credentials loaded from file "/home/stack/clouds.yaml"
INFO Consuming Install Config from target directory
INFO Manifests created in: ostest/manifests and ostest/openshift


$ cd ostest/

$ cat openshift/99_openshift-cluster-api_worker-machineset-0.yaml | yq .spec.template.spec.providerSpec.value.serverGroupID
"5fff614c-4a3f-4a46-a91b-345a91818f08"

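For reference, the field queried by the yq command sits at the following path in the worker MachineSet manifest. This is only a fragment reconstructed from that query path and the server group ID created earlier; every other field of the generated manifest is omitted and should be left as the installer wrote it.

```yaml
# Fragment of openshift/99_openshift-cluster-api_worker-machineset-0.yaml
# (all sibling fields omitted)
spec:
  template:
    spec:
      providerSpec:
        value:
          serverGroupID: 5fff614c-4a3f-4a46-a91b-345a91818f08
```
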
Running IPI installation:

$ openshift-install create cluster --dir ostest --log-level debug
[...]
DEBUG Time elapsed per stage:
DEBUG     Infrastructure: 2m38s
DEBUG Bootstrap Complete: 16m8s
DEBUG                API: 7m3s
DEBUG  Bootstrap Destroy: 39s
DEBUG  Cluster Operators: 24m54s
INFO Time elapsed: 45m12s


$ openstack server list --long --all-projects -c Name -c Host
+-----------------------------+------------------------+
| Name                        | Host                   |
+-----------------------------+------------------------+
| ostest-rclbg-worker-0-h9dvn | compute-0.redhat.local |
| ostest-rclbg-worker-0-6thfm | compute-2.redhat.local |
| ostest-rclbg-worker-0-pt45f | compute-1.redhat.local |
| ostest-rclbg-master-2       | compute-0.redhat.local |
| ostest-rclbg-master-1       | compute-2.redhat.local |
| ostest-rclbg-master-0       | compute-1.redhat.local |
+-----------------------------+------------------------+
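
The listing above shows each worker on a different compute host. A small sketch of checking that automatically, assuming the two-column Name/Host table layout shown here; the function name hosts_are_distinct is hypothetical. Such a check is mostly of interest with a soft-anti-affinity policy, where co-location is allowed when no other host is available.

```shell
# Hypothetical helper: read a Name/Host table on stdin and return success
# only if all servers whose name contains $1 run on different hosts.
hosts_are_distinct() {
  awk -F'|' -v pattern="$1" '
    BEGIN { dup = 0 }
    NF >= 3 {                                  # skip +----+ border lines
      name = $2; host = $3
      gsub(/^[ \t]+|[ \t]+$/, "", name)        # trim padding around the cells
      gsub(/^[ \t]+|[ \t]+$/, "", host)
      # seen[host]++ is 0 (false) the first time a host appears
      if (index(name, pattern) > 0 && seen[host]++) { dup = 1 }
    }
    END { exit dup }'
}

# Usage (not executed here; assumes the table output was saved to a file):
#   hosts_are_distinct "-worker-" < server-list.txt && echo "workers are spread"
```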

$ oc get machineset/ostest-rclbg-worker-0 -n openshift-machine-api -o json | jq .spec.template.spec.providerSpec.value.serverGroupID
"5fff614c-4a3f-4a46-a91b-345a91818f08"

Comment 33 Max Bridges 2021-06-30 13:47:30 UTC
@rlobillo I'm noticing this warning in some of the 4.7 installation docs:

"Modifying the OpenShift Container Platform manifest files created by the installation program is not supported. Applying a manifest file that you create, as in the following procedure, is supported."

Is this no longer the case? Makes me a little nervous, as the procedure that's tested for this bug relies on modifying generated manifest files.

Comment 34 rlobillo 2021-07-07 08:14:07 UTC
Hello Max.

I am removing the 'needinfo' flag as this topic has been already discussed internally.

Please let us know if you need anything else.

Comment 43 Max Bridges 2021-08-16 13:41:30 UTC
Looks like a change to the PR just needs a +1/-1 from QE: https://github.com/openshift/openshift-docs/pull/34157

FYI @rlobillo

Comment 44 rlobillo 2021-08-23 06:41:28 UTC
lgtm, thanks.

Comment 45 Max Bridges 2021-08-23 12:45:11 UTC
TY!

Comment 46 Max Bridges 2021-08-30 17:22:24 UTC
Eric + Anita--any concerns about backporting this through 4.6?

Comment 48 Max Bridges 2021-10-06 13:08:50 UTC
I am still looking for +1s from the ShiftStack team. Will check in with them. 

CCS does +1 this back to 4.6, as that is what QE verified.

Comment 49 Pierre Prinetti 2021-10-06 20:13:59 UTC
Please note that manifest modification is no longer necessary to apply strict anti-affinity to masters or workers in 4.10: https://issues.redhat.com/browse/OSASINFRA-2573

Comment 57 Max Bridges 2021-12-06 19:59:00 UTC
Proposed changes announced.

Comment 58 Max Bridges 2021-12-10 14:50:14 UTC
Changes merged. They should appear on prod today.

Comment 59 Max Bridges 2021-12-10 15:21:48 UTC
I see this reflected on docs.openshift.com now. Closing.