In ceph-ansible 3.1, the client role tasks are executed serially, one client node at a time. This makes operations such as scale-up or upgrade slow in proportion to the number of client nodes using the cluster. It should instead be possible to execute the client role tasks in parallel on all nodes at the same time.
We should provide a set of new playbooks that try something like:

    serial: "{{ ((groups['<group>'] | length) * 0.2) | round(0,'ceil') | int }}"

or even run fully in parallel on clients, since it makes no sense to containerize and upgrade nodes one by one when you have hundreds of them.
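For illustration, a minimal sketch of what such a play could look like. The group name `clients` and the role name are assumptions made for the example, not a copy of the actual playbooks; the `serial` expression is the one proposed above.

```yaml
# Sketch only: the 'clients' group and the role name are assumptions.
- name: upgrade ceph client nodes in batches
  hosts: clients
  become: true
  # Batch size: 20% of the group, rounded up, cast to an integer.
  # With 12 clients: 12 * 0.2 = 2.4, rounded up to 3 hosts per batch.
  serial: "{{ ((groups['clients'] | length) * 0.2) | round(0, 'ceil') | int }}"
  roles:
    - ceph-client
```

Ansible also accepts a percentage string for this keyword, e.g. `serial: "20%"`, which may be enough if exact control over the rounding is not needed.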
- This is about rolling update; we see "serial: 1" here: https://github.com/ceph/ceph-ansible/blob/master/infrastructure-playbooks/rolling_update.yml#L798-L803
- Can we override this value for the client role, e.g. by customizing the inventory?
As per a conversation with Seb:
- the ceph-ansible team will remove "serial: 1" from the rolling_update.yml playbook for the clients
- it will be put into 3.2
Looks like the fix introduced a new issue [1]

2019-01-16 17:32:34,485 p=28500 u=mistral | ERROR! The field 'serial' has an invalid value, which includes an undefined variable. The error was: 'ansible_forks' is undefined

The error appears to have been in '/usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml': line 737, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: upgrade ceph client node
  ^ here

exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'ansible_forks' is undefined

1. https://paste.fedoraproject.org/paste/QV7A0FhAl4t4uQMjXUK3Tw
(In reply to Giulio Fidente from comment #12)
> Looks like the fix introduced a new issue [1]
>
> 2019-01-16 17:32:34,485 p=28500 u=mistral | ERROR! The field 'serial' has
> an invalid value, which includes an undefined variable. The error was:
> 'ansible_forks' is undefined

I think the problem is that ansible_forks was added in ansible 2.5.

Lukas, do you know what version of ansible was installed on the undercloud when the run failed?
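As a hedged illustration of how a play can tolerate a missing `ansible_forks`, the sketch below uses the Jinja2 `default` filter. This is not the fix that shipped; the `clients` group, the placeholder task, and the fallback value of 5 (chosen to mirror Ansible's default forks setting) are assumptions.

```yaml
# Sketch only: not the shipped fix. Group name, placeholder task and the
# fallback value of 5 are assumptions made for this example.
- name: upgrade ceph client node
  hosts: clients
  # Use ansible_forks when the running Ansible defines it; otherwise fall
  # back to a fixed batch size instead of failing on an undefined variable.
  serial: "{{ ansible_forks | default(5) }}"
  tasks:
    - name: placeholder for the client upgrade tasks
      debug:
        msg: "client upgrade tasks would run here"
```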
Hi Tejas, yes, the default behaviour is to update clients in parallel.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0223