Description of problem: When adding new values to a MachineSet's `.spec.selector.matchLabels` new Machines are created for the MachineSet but the old Machines become orphaned from the MachineSet. How reproducible: always Steps to Reproduce: 1. Create a new cluster 2. Edit one of the worker MachineSets to add a new label in the `.spec.selector.matchLabels`, eg `test-label: ""` 3. Observe the Machines and MachineSet replica counts Actual results: a new Machine is created which takes the new selector match label, and the old Machine becomes orphaned from the MachineSet. the old Machine still contains the annotations, labels, and ownerRef that make it appear as part of the MachineSet, but the MachineSet no longer lists it. Expected results: I expected the MachineSet to accept the new selector without issue. If a new Machine should be created to fit the matchLabel, then the old Machine should be cleaned up. Additional info: I'm not sure what the best solution to this issue should be, we have several options: 1. don't allow updating of selectors on MachineSets 2. warn the user about the update and orphan Machine(s) 3. automatically delete the orphaned Machine(s) 4. update the Machine(s) in the MachineSet to take the new matchLabel There is another effect from this that could be deleterious for users, aside from the orphaned Machines, when the user deletes the MachineSet it will also delete the orphaned Machines. This also could lead to unintentional effects.
My vote for resolving this issue would be to work on adding immutability support for CRDs and then making the selector for a MachineSet immutable. The other options have technical challenges that make them too complicated to implement, and going this route matches that of deployments/daemonsets/statefulsets, so should be familiar to users. In the mean time, we could use webhooks to prevent users from changing this, that would be quick and easy, but not an ideal solution as webhooks can in theory fail. I'd be tempted to mirror this into Jira for a long term fix and make the webhook fix as the short term fix, and also document in the Machine docs that the selector is expected to be immutable.
(In reply to Joel Speed from comment #1) > My vote for resolving this issue would be to work on adding immutability > support for CRDs and then making the selector for a MachineSet immutable. > The other options have technical challenges that make them too complicated > to implement, and going this route matches that of > deployments/daemonsets/statefulsets, so should be familiar to users. > > In the mean time, we could use webhooks to prevent users from changing this, > that would be quick and easy, but not an ideal solution as webhooks can in > theory fail. > +1, i think this is a reasonable short term fix > I'd be tempted to mirror this into Jira for a long term fix and make the > webhook fix as the short term fix, and also document in the Machine docs > that the selector is expected to be immutable. +1, that will help us scope it. thanks
I created https://issues.redhat.com/browse/OCPCLOUD-1345 so that we don't forget to remove this in the future once we have immutability support
verified clusterversion: 4.10.0-0.nightly-2021-11-03-181048 update the selector on machineset will be denied. No new machine will be created. $ oc edit machineset zhsun114-7k4tl-worker-us-east-2c [16:47:41] error: machinesets.machine.openshift.io "zhsun114-7k4tl-worker-us-east-2c" could not be patched: admission webhook "validation.machineset.machine.openshift.io" denied the request: [spec.selector: Forbidden: selector is immutable, spec.template.metadata.labels: Invalid value: map[string]string{"machine.openshift.io/cluster-api-cluster":"zhsun114-7k4tl", "machine.openshift.io/cluster-api-machine-role":"worker", "machine.openshift.io/cluster-api-machine-type":"worker", "machine.openshift.io/cluster-api-machineset":"zhsun114-7k4tl-worker-us-east-2c"}: `selector` does not match template `labels`]
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056