Bug 1650184

Summary:	[RFE] Do not execute client role tasks serially but in parallel
Product:	[Red Hat Storage] Red Hat Ceph Storage	Reporter:	Giulio Fidente <gfidente>
Component:	Ceph-Ansible	Assignee:	Guillaume Abrioux <gabrioux>
Status:	CLOSED ERRATA	QA Contact:	ceph-qe-bugs <ceph-qe-bugs>
Severity:	high	Docs Contact:
Priority:	high
Version:	3.1	CC:	anharris, aschoen, augol, ceph-eng-bugs, ceph-qe-bugs, edonnell, gabrioux, gfidente, gmeno, hnallurv, johfulto, lbezdick, nthomas, sankarshan, tchandra, tserlin, vashastr, yprokule
Target Milestone:	z1	Keywords:	FutureFeature
Target Release:	3.2
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	RHEL: ceph-ansible-3.2.4-1.el7cp Ubuntu: ceph-ansible_3.2.4-2redhat1	Doc Type:	Enhancement
Doc Text:	Previously, the `rolling_update.yml` playbook executed the client roles one node at a time. With this update, users can specify the number of nodes to be processed in one batch using the new variable `client_update_batch`. This makes the upgrade process for client nodes much faster. If no value is passed, it defaults to the value of the `ansible_forks` variable, which is `5` by default.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-01-31 10:36:36 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1578730

Description Giulio Fidente 2018-11-15 14:35:21 UTC

In ceph-ansible 3.1 the client role tasks are executed serially, one client node at a time.

This causes operations such as scale-up or upgrade to take a long time, growing with the number of client nodes using the cluster. It should instead be possible to execute the client role tasks in parallel on all nodes at the same time.

Comment 3 Lukas Bezdicka 2018-11-15 15:15:59 UTC

We should provide a set of new playbooks in which we try something like:
  serial: "{{ ((groups['<group>'] | length)  * 0.2) | round(0,'ceil') | int }}"
We could even run fully in parallel on clients, since it makes no sense to containerize and upgrade nodes one at a time when you have hundreds of them.
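
To make the expression above concrete, here is a minimal sketch of a play that uses it. The group name "clients" and the role name "ceph-client" are placeholders chosen for illustration and are not taken from any actual playbook:

```yaml
# Hypothetical play; group and role names are illustrative assumptions.
- name: upgrade ceph client nodes in batches of roughly 20%
  hosts: clients
  # Batch size is 20% of the group size, rounded up.
  # For example, 12 hosts gives 2.4, which rounds up to 3 hosts per batch.
  serial: "{{ ((groups['clients'] | length) * 0.2) | round(0, 'ceil') | int }}"
  roles:
    - ceph-client
```

Ansible's serial keyword also accepts a percentage string such as "20%", which may be a simpler option if its rounding behaviour is acceptable.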

Comment 4 John Fulton 2018-12-20 15:40:01 UTC

- This is about rolling update; we see "serial: 1" here: https://github.com/ceph/ceph-ansible/blob/master/infrastructure-playbooks/rolling_update.yml#L798-L803
- Can we override this value for the client role, e.g. by customizing the inventory?

Comment 5 John Fulton 2019-01-02 14:18:05 UTC

As per a conversation with Seb: 

- the ceph-ansible team will remove "serial: 1" from rolling_update.yml playbook for the clients
- it will be put into 3.2
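
For reference, the behaviour described in the Doc Text (a `client_update_batch` variable that falls back to `ansible_forks`) could look roughly like the fragment below. This is a sketch under that assumption, not a copy of the shipped playbook; the `client_group_name` variable and its 'clients' default are assumptions as well:

```yaml
# Sketch only; not the actual rolling_update.yml content.
- name: upgrade ceph client node
  # client_group_name and its 'clients' fallback are assumed here.
  hosts: "{{ client_group_name | default('clients') }}"
  # Process client_update_batch hosts at a time; when it is unset,
  # fall back to the number of forks Ansible runs with.
  serial: "{{ client_update_batch | default(ansible_forks) }}"
```

One way to set the batch size would be an extra variable on the command line, for example `ansible-playbook rolling_update.yml -e client_update_batch=10`. Whether an inventory variable is visible when `serial` is templated has not been checked here.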

Comment 12 Giulio Fidente 2019-01-16 16:38:56 UTC

Looks like the fix introduced a new issue [1]

2019-01-16 17:32:34,485 p=28500 u=mistral |  ERROR! The field 'serial' has an invalid value, which includes an undefined variable. The error was: 'ansible_forks' is undefined

The error appears to have been in '/usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml': line 737, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: upgrade ceph client node
  ^ here

exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'ansible_forks' is undefined

1. https://paste.fedoraproject.org/paste/QV7A0FhAl4t4uQMjXUK3Tw

Comment 14 Giulio Fidente 2019-01-17 17:36:14 UTC

(In reply to Giulio Fidente from comment #12)
> Looks like the fix introduced a new issue [1]
>
> 2019-01-16 17:32:34,485 p=28500 u=mistral |  ERROR! The field 'serial' has an invalid value, which includes an undefined variable. The error was: 'ansible_forks' is undefined
>
> The error appears to have been in '/usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml': line 737, column 3, but may
> be elsewhere in the file depending on the exact syntax problem.
>
> The offending line appears to be:
>
> - name: upgrade ceph client node
>   ^ here
>
> exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
> exception: 'ansible_forks' is undefined
>
> 1. https://paste.fedoraproject.org/paste/QV7A0FhAl4t4uQMjXUK3Tw

I think the problem is that ansible_forks was only added in Ansible 2.5, so older versions leave it undefined.

Lukas, do you know what version of ansible was installed on the undercloud when the run failed?
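
If the undefined variable is indeed the cause, one possible guard is a nested default, so that the expression still resolves on Ansible versions that do not define ansible_forks. This is a sketch of that idea and not necessarily the fix that shipped; the literal 5 mirrors Ansible's default fork count:

```yaml
# Sketch of a defensive fallback; not necessarily the shipped fix.
- name: upgrade ceph client node
  hosts: clients   # placeholder group name
  # Use client_update_batch if set, else ansible_forks if defined,
  # else 5 (Ansible's default number of forks).
  serial: "{{ client_update_batch | default(ansible_forks | default(5)) }}"
```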

Comment 27 Guillaume Abrioux 2019-01-22 15:02:38 UTC

Hi Tejas,

Yes, the default behaviour is to update clients in parallel.

Comment 30 errata-xmlrpc 2019-01-31 10:36:36 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0223