Bug 1395820 - rolling_upgrade restarts services regardless of version
Summary: rolling_upgrade restarts services regardless of version
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 3.0
Assignee: Sébastien Han
QA Contact: Vasishta
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-16 18:33 UTC by Christina Meno
Modified: 2022-02-21 18:06 UTC (History)
12 users

Fixed In Version: ceph-ansible-2.1.9-1.el7scon
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-29 14:06:02 UTC
Embargoed:


Attachments (Terms of Use)
Terminal log of rolling update (171.41 KB, text/plain)
2017-05-17 12:15 UTC, Vasishta

Description Christina Meno 2016-11-16 18:33:35 UTC
Description of problem:
In some cases rolling_update fails, e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1366808. In such cases we'd like to be able to recommend taking some corrective action and re-running rolling_update.

Right now the task that restarts OSDs runs regardless of whether the service version changed. This causes unnecessary load on the cluster AND increases the probability that timing-dependent issues, like the dmcrypt race above, prevent the cluster from reaching active and clean.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create a cluster using ceph version A
2. Upgrade one OSD host to ceph version B
3. Run rolling_update

Actual results:
All OSDs were restarted.

Expected results:
Only services whose version has changed should be restarted.
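
The expected behaviour could be expressed as a version guard on the restart task. The following is a hypothetical sketch only, not the actual rolling_update.yml: the variable names (osd_ids), the version commands, and the comparison are all assumptions for illustration.

```yaml
# Hypothetical sketch -- not the real ceph-ansible tasks.
- name: get installed ceph version
  command: ceph --version
  register: installed_ceph
  changed_when: false

- name: get version reported by each running osd daemon
  command: ceph daemon osd.{{ item }} version
  register: running_ceph
  changed_when: false
  with_items: "{{ osd_ids }}"

- name: restart ceph osds only when the version changed
  service:
    name: "ceph-osd@{{ item.item }}"
    state: restarted
  with_items: "{{ running_ceph.results }}"
  # Illustrative comparison only; a real task would need to normalize
  # the two version strings before comparing them.
  when: installed_ceph.stdout not in item.stdout
```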

Comment 2 seb 2016-11-17 09:24:16 UTC
Upstream seems to have another approach now: it first stops services, applies the roles, and then makes sure the process is started. This makes sense since the package upgrade plus the role will start the daemon already.

There are no more restart calls.
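
The stop / apply-role / ensure-started flow described above could look roughly like this. A sketch under assumed names only (the role name, handler-free layout, and osd_ids variable are illustrative, not taken from the real playbook):

```yaml
# Rough sketch of the upstream flow; names are illustrative.
- name: stop ceph osds with systemd
  service:
    name: "ceph-osd@{{ item }}"
    state: stopped
  with_items: "{{ osd_ids }}"

- name: apply the osd role (the package upgrade starts the daemon)
  include_role:
    name: ceph-osd

- name: make sure ceph osds are started
  service:
    name: "ceph-osd@{{ item }}"
    state: started
  with_items: "{{ osd_ids }}"
```

Note there is no explicit restart task in this flow, matching "There are no more restart calls" above.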

Comment 3 Ken Dreyer (Red Hat) 2017-02-15 19:33:52 UTC
Sebastien, is this bug fixed as of v2.1.9?

Comment 4 seb 2017-02-17 09:53:57 UTC
Yes Ken.

Comment 8 Vasishta 2017-05-09 14:50:47 UTC
Hi Ken,Seb,

   Has any change gone into this fix? From comment #2 there is a new upstream approach, so could you let us know what the expected behaviour is now?

Thanks

Comment 9 seb 2017-05-11 09:09:12 UTC
What do you mean?
There is nothing new to this BZ, IMHO this was fixed a couple of months ago.

Comment 10 Vasishta 2017-05-11 10:10:42 UTC
Hi,

As per my understanding, with the new upstream approach, ONLY services that undergo a version change will be stopped and started later, and the other osd services are untouched. Is this the actual case? Is my understanding correct?

I got confused about whether this particular fix followed the above-mentioned upstream approach or just skipped restarting services which didn't get upgraded.


Regards,
Vasishta

Comment 11 seb 2017-05-12 13:36:17 UTC
Your understanding is correct, only selected services will be updated and thus restarted.

Comment 12 Vasishta 2017-05-17 12:15:22 UTC
Created attachment 1279679 [details]
Terminal log of rolling update

Hi all,

I could observe that osd services were 'stopped and started' even though there was no version change.
The attached file contains the full terminal log.
(All are Ubuntu nodes, and only the repos on node 106 were not replaced.)

TASK [stop ceph osds with systemd] *********************************************
changed: [magna106] => (item=2)
changed: [magna106] => (item=5)

TASK [start ceph osds with systemd] ********************************************
ok: [magna106] => (item=2)
ok: [magna106] => (item=5)


$ for i in {068,071,106};do ssh magna$i 'ceph -v';done
ceph version 10.2.7-20redhat1xenial (8b2e41c074ec6b5053c9838b5e21239ba5d63443)
ceph version 10.2.7-20redhat1xenial (8b2e41c074ec6b5053c9838b5e21239ba5d63443)
ceph version 10.2.5-28redhat1xenial (033f137cde8573cfc5a4662b4ed6a63b8a8d1464)

Changing the status back to ASSIGNED as the service was stopped and started regardless of version change.
Please let me know if there are any concerns.

Regards,
Vasishta
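
The `ceph -v` output above is enough to tell which host is behind. A hedged shell sketch of that check; in practice the version lines would come from `ssh $host ceph -v` as in the loop above, but here they are inlined from this comment so only the comparison logic is shown:

```shell
# Target is the version the upgraded hosts report.
target="10.2.7-20redhat1xenial"

# Extract the version field from a "ceph -v" line.
parse_version() {
    echo "$1" | awk '{print $3}'
}

needs_restart=""
for entry in \
    "magna068:ceph version 10.2.7-20redhat1xenial (8b2e41c074ec6b5053c9838b5e21239ba5d63443)" \
    "magna071:ceph version 10.2.7-20redhat1xenial (8b2e41c074ec6b5053c9838b5e21239ba5d63443)" \
    "magna106:ceph version 10.2.5-28redhat1xenial (033f137cde8573cfc5a4662b4ed6a63b8a8d1464)"
do
    host=${entry%%:*}
    line=${entry#*:}
    if [ "$(parse_version "$line")" != "$target" ]; then
        needs_restart="$needs_restart $host"
    fi
done
echo "hosts needing upgrade/restart:$needs_restart"
```

With the data from this comment, only magna106 is reported, which is exactly the host whose repos were not replaced.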

Comment 13 Vasishta 2017-05-17 12:38:07 UTC
Hi,

Again tried a dry run of rolling update and observed that services were stopped and started even though there was no version change. Moving back to ASSIGNED state as mentioned in Comment 12.

$ ps aux |grep ceph
ceph       31780  0.2  0.1 891124 41224 ?        Ssl  10:55   0:15 /usr/bin/ceph-osd -f --cluster temp --id 2 --setuser ceph --setgroup ceph
ceph       31920  0.2  0.1 890048 41488 ?        Ssl  10:55   0:15 /usr/bin/ceph-osd -f --cluster temp --id 5 --setuser ceph --setgroup ceph
ubuntu     32486  0.0  0.0  16572  2148 pts/1    S+   12:29   0:00 grep --color=auto ceph
ubuntu@magna106:~$ ps aux |grep ceph
ubuntu     32832  0.0  0.0  16572  2208 pts/1    S+   12:31   0:00 grep --color=auto ceph
$ ps aux| grep ceph
ceph       34537  0.5  0.1 883068 36892 ?        Ssl  12:31   0:00 /usr/bin/ceph-osd -f --cluster temp --id 2 --setuser ceph --setgroup ceph
ceph       34675  0.5  0.1 884052 36908 ?        Ssl  12:31   0:00 /usr/bin/ceph-osd -f --cluster temp --id 5 --setuser ceph --setgroup ceph
ubuntu     34965  0.0  0.0  16572  2196 pts/1    S+   12:33   0:00 grep --color=auto ceph


Regards,
Vasishta

Comment 14 seb 2017-05-17 15:11:08 UTC
I'd say that if you run rolling_update, you're expecting to get a new version; even if you don't, it will just assume there is a new version available.

I'd like to close this as "won't fix" since the behavior is expected.
Do not run the playbook if there is nothing to update.


Does that sound reasonable?

Comment 15 John Poelstra 2017-05-17 15:20:25 UTC
Discussed at program meeting; does not meet blocker criteria, moving to next release.

