Bug 1739209

Summary: [ceph-ansible] - rolling-update of containerized cluster from 2.x to 3.x failed trying to run systemd-device-to-id.sh saying no such file
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vasishta <vashastr>
Component: Ceph-Ansible
Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.3
CC: aschoen, ceph-eng-bugs, gabrioux, gmeno, nthomas, sankarshan, tchandra, tserlin
Target Milestone: rc
Keywords: Regression
Target Release: 3.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.2.24-1.el7cp Ubuntu: ceph-ansible_3.2.24-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-21 15:11:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                   Flags
File contains playbook log    none
File contains playbook log    none

Description Vasishta 2019-08-08 18:34:41 UTC
Created attachment 1601912 [details]
File contains playbook log

Description of problem:
The rolling update from ceph-ansible 2.x to 3.x failed in the task "ceph-osd : run the systemd-device-to-id.sh script" with the error "No such file or directory".

It seems that the task which copies the script was not delegated to the other nodes.

Version-Release number of selected component (if applicable):
ceph-ansible-3.2.22-1.el7cp.noarch

How reproducible:
Always (1/1)

Steps to Reproduce:
1. Get an RHCS 2.x containerized cluster (with OSDs that have the device name in their service name)
2. Try to upgrade it to 3.3

Actual results:
"bash: /tmp/systemd-device-to-id.sh: No such file or directory"

Expected results:
The rolling update must complete successfully.

Additional info:

Comment 1 Vasishta 2019-08-08 19:57:59 UTC
Created attachment 1601932 [details]
File contains playbook log

I think the following lines from start_osds.yml need to be removed:

https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-osd/tasks/start_osds.yml#L131-L132

Removing them seemed to work for me: the cluster got updated and all new OSD services are up.

But the old services (services with a device name) were still present and flapping on the nodes where the script had not run during my first attempt (the log of run 1 is in the previous attachment).

Regards,
Vasishta Shastry
QE, Ceph
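
Editorial sketch (not part of ceph-ansible or of the original comment): to spot leftover device-named OSD services like those described above, unit names can be split into ID-named and device-named groups. This assumes OSD units follow the pattern "ceph-osd@<instance>.service", where <instance> is a device name such as "sdb" before the migration and a numeric OSD id afterwards. The helper name split_osd_units is hypothetical.

```python
import re

# Assumption: OSD units are named "ceph-osd@<instance>.service", where
# <instance> is a device name (e.g. "sdb") on the old layout and a
# numeric OSD id (e.g. "3") on the new layout.
UNIT_RE = re.compile(r"^ceph-osd@(?P<instance>[^.]+)\.service$")


def split_osd_units(unit_names):
    """Return (id_named, device_named) lists of OSD unit names.

    Names that do not match the assumed OSD unit pattern are ignored.
    """
    id_named, device_named = [], []
    for raw in unit_names:
        name = raw.strip()
        match = UNIT_RE.match(name)
        if match is None:
            continue  # not an OSD unit
        if match.group("instance").isdigit():
            id_named.append(name)
        else:
            device_named.append(name)
    return id_named, device_named
```

Given the unit names collected on one OSD node, any entries in the second list would be candidates for the old, flapping services.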

Comment 12 Vasishta 2019-08-13 17:43:13 UTC
Working fine with ceph-ansible-3.2.24-1.el7cp.noarch
Moving to VERIFIED state

Comment 14 errata-xmlrpc 2019-08-21 15:11:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2538