Bug 1392270

Summary: upgrade of ceph in containerized environment fails from 1.3.2 to 1.3.3
Product: Red Hat Ceph Storage
Component: Container
Version: 1.3.3
Severity: high
Priority: unspecified
Status: CLOSED NOTABUG
Reporter: krishnaram Karthick <kramdoss>
Assignee: Sébastien Han <shan>
QA Contact: Anoop <annair>
CC: dang, hchen, ifont, jim.curtis, kdreyer, kramdoss, pprakash, rcyriac, seb
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Target Release: 2.3
Type: Bug
Last Closed: 2017-04-05 08:55:15 UTC
Bug Blocks: 1384662

Description krishnaram Karthick 2016-11-07 04:18:46 UTC
Description of problem:

Upgrade of Ceph from 1.3.2 to 1.3.3 in a containerized environment wasn't successful. The MON container was re-spun successfully; however, re-spinning the OSD containers fails.

Version-Release number of selected component (if applicable):
ceph 1.3.3 container image

How reproducible:
always

Steps to Reproduce:
1) Stopped the MON container running 1.3.2.
2) Spun up a new MON container with the 1.3.3 image - this was successful; ceph health commands returned proper output.
3) Stopped all OSD containers on one of the three nodes.
4) Tried re-spinning the OSDs with the existing disks (without zapping). This wasn't successful; the containers stopped immediately (see the sketch below).
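
For reference, the OSD re-spin in step 4 was of roughly the following form (image ID and device are placeholders; the concrete flags used in this environment appear in the upgrade steps in comment 8), and docker logs on the exited container shows why it stopped:
# docker run -d --net=host --pid=host --privileged=true -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /dev/:/dev/ -e OSD_DEVICE=<osd-device> -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE <1.3.3-image-id>
# docker logs <exited-osd-container-id>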

Actual results:
OSD containers failed to spin up

Expected results:
OSD containers come up and the cluster rebuilds

Additional info:

Comment 2 seb 2016-11-08 11:13:43 UTC
logs?

Comment 8 krishnaram Karthick 2016-11-24 08:48:16 UTC
With the latest image provided, upgrade from ceph 1.3.2 to 1.3.3 was successful.

steps followed to perform the upgrade:
======================================
1) Created a containerized ceph cluster with the 1.3.2 image (current image in registry.access.redhat.com/rhceph/rhceph-1.3-rhel7, image id: 8d6844d4fb9d), with 3 MONs and 3 OSDs

2) Created an RBD device and mapped it to a libvirt hypervisor using librbd (a sketch of the assumed commands is shown below)
[ref link: http://docs.ceph.com/docs/jewel/rbd/libvirt]
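
As a rough sketch of this step, assuming the standard setup from the linked doc (pool name, image name and size are hypothetical, not taken from this report):
# ceph osd pool create libvirt-pool 64 64
# rbd create libvirt-pool/os-image --size 10240
# virsh edit <guest-domain> --> add an rbd-backed <disk type='network' device='disk'> element as described in the linked doc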

3) Installed an operating system on top of the rbd device

4) On the first MON node, stopped the ceph container and spun up a container using the new image:

# docker load -i docker-image-bc55b7a663275c02c214c7f2221223273301db00f84f42c05e18f15b872dd3a2.x86_64.tar.gz
# docker stop 0f6dab202c33 [container running 1.3.2]
# docker run -d --net=host -v /etc/ceph:/etc/ceph:z  -v /var/lib/ceph/:/var/lib/ceph/:z -e CEPH_DAEMON=MON -e MON_IP=10.70.43.185 -e CEPH_PUBLIC_NETWORK=10.70.40.0/22 1c4f8e54c782
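
Before repeating this on the next MON node, the upgraded monitor can be checked from the new container (an assumed verification step; the container ID is a placeholder):
# docker exec -it <new-mon-container> ceph -s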

5) Repeated step 4 on other nodes running MON
6) Stopped the containers running the OSDs:
# docker stop ab7c448b6eda --> OSD running on /dev/vdb
# docker stop 09338f65c84f --> OSD running on /dev/vdc

7) Started the OSD containers with the new image:

# docker run -d --net=host --pid=host --privileged=true -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdb -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE 1c4f8e54c782
# docker run -d --net=host --pid=host --privileged=true -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdc -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE 1c4f8e54c782
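
As an assumed intermediate check (container ID is a placeholder), the re-spun OSDs can be confirmed as up/in from one of the MON containers before waiting on the health in step 8:
# docker exec -it <mon-container> ceph osd tree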

8) Allowed the rebuilds to complete and the health to return to 'ok'; checked the health from one of the MONs:
# docker exec -it a50f90c73d2c ceph -w

9) Repeated steps 6, 7 and 8 on the other OSD nodes.

10) The containerized ceph cluster is now upgraded to 1.3.3:
# docker exec -it a50f90c73d2c ceph -v
ceph version 0.94.9-3.el7cp (7358f71bebe44c463df4d91c2770149e812bbeaa)

Comment 9 Ken Dreyer (Red Hat) 2017-01-17 03:19:14 UTC
Did this fix ever ship to customers? What is the next step?

Comment 10 seb 2017-01-17 09:14:04 UTC
I think we can close this one, according to Ivan the fix was already upstream so he just built a new image.

Comment 11 Ivan Font 2017-01-17 17:11:09 UTC
(In reply to seb from comment #10)
> I think we can close this one, according to Ivan the fix was already
> upstream so he just built a new image.

That's correct. I just back-ported the fix and built a new image.

Comment 12 krishnaram Karthick 2017-02-27 03:49:42 UTC
There is nothing that needs to be done with this bug, this can be closed.

Comment 13 seb 2017-04-05 08:55:15 UTC
As per Karthick's comment, I'm closing this bug since the fix is part of the new image.