Bug 1392270 - upgrade of ceph in containerized environment fails from 1.3.2 to 1.3.3
Summary: upgrade of ceph in containerized environment fails from 1.3.2 to 1.3.3
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Container
Version: 1.3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 2.3
Assignee: Sébastien Han
QA Contact: Anoop
URL:
Whiteboard:
Depends On:
Blocks: 1384662
 
Reported: 2016-11-07 04:18 UTC by krishnaram Karthick
Modified: 2017-04-05 08:55 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-05 08:55:15 UTC
Target Upstream Version:



Description krishnaram Karthick 2016-11-07 04:18:46 UTC
Description of problem:

Upgrade of Ceph from 1.3.2 to 1.3.3 in a containerized environment wasn't successful. The MON container was re-spun successfully; however, re-spinning the OSD containers fails.

Version-Release number of selected component (if applicable):
ceph 1.3.3 container image

How reproducible:
always

Steps to Reproduce:
1) Stopped the MON container running the 1.3.2 image.
2) Spun up a new MON container with the 1.3.3 image - this was successful, and ceph health commands returned proper output.
3) Stopped all OSD containers on one of the three nodes.
4) Tried re-spinning the OSDs with the existing disks (without zapping). This wasn't successful; the containers stopped immediately (see the command sketch below).
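
A sketch of the commands behind these steps; the container IDs, image ID, device, and network values below are placeholders and will differ per environment:

# docker stop <mon-container-id>    [MON running the 1.3.2 image]
# docker run -d --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -e CEPH_DAEMON=MON -e MON_IP=<mon-ip> -e CEPH_PUBLIC_NETWORK=<public-network> <1.3.3-image-id>
# docker stop <osd-container-id>    [repeat for each OSD container on the node]
# docker run -d --net=host --pid=host --privileged=true -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /dev/:/dev/ -e OSD_DEVICE=/dev/<disk> -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE <1.3.3-image-id>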

Actual results:
OSD containers failed to spin up

Expected results:
OSD containers come up and the OSDs rebuild.

Additional info:
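
For reference, a failed container's startup output can typically be gathered with standard Docker commands (the container ID below is a placeholder):

# docker ps -a | grep ceph
# docker logs <failed-osd-container-id>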

Comment 2 seb 2016-11-08 11:13:43 UTC
logs?

Comment 8 krishnaram Karthick 2016-11-24 08:48:16 UTC
With the latest image provided, upgrade from ceph 1.3.2 to 1.3.3 was successful.

steps followed to perform the upgrade:
======================================
1) Created a containerized Ceph cluster with the 1.3.2 image (current image in registry.access.redhat.com/rhceph/rhceph-1.3-rhel7, image ID: 8d6844d4fb9d), with 3 MONs and 3 OSDs.

2) Created an RBD device and mapped it to a libvirt hypervisor using librbd (see the sketch below).
[ref link: http://docs.ceph.com/docs/jewel/rbd/libvirt]
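
A sketch of this step based on the referenced guide, assuming the client commands are run from inside one of the MON containers as in step 8; the pool name, image name, and size are example values only:

# docker exec -it <mon-container-id> ceph osd pool create libvirt-pool 64 64
# docker exec -it <mon-container-id> rbd create libvirt-pool/os-disk --size 20480

The image is then attached to the guest through the libvirt domain XML as a network disk with protocol 'rbd', as described in the guide above.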

3) Installed an operating system on top of the rbd device

4) On the first MON node, stopped the Ceph container and spun up a container using the new image.

# docker load -i docker-image-bc55b7a663275c02c214c7f2221223273301db00f84f42c05e18f15b872dd3a2.x86_64.tar.gz
# docker stop 0f6dab202c33 [container running 1.3.2]
# docker run -d --net=host -v /etc/ceph:/etc/ceph:z  -v /var/lib/ceph/:/var/lib/ceph/:z -e CEPH_DAEMON=MON -e MON_IP=10.70.43.185 -e CEPH_PUBLIC_NETWORK=10.70.40.0/22 1c4f8e54c782

5) Repeated step 4 on other nodes running MON
6) Stopped the containers running the OSDs.
# docker stop ab7c448b6eda --> OSD running on /dev/vdb
# docker stop 09338f65c84f --> OSD running on /dev/vdc

7) Started the OSD containers with the new image:

# docker run -d --net=host --pid=host --privileged=true -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdb -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE 1c4f8e54c782
# docker run -d --net=host --pid=host --privileged=true -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdc -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE 1c4f8e54c782

8) Allowed the rebuilds to complete and the health to return to 'OK'; checked the health from one of the MONs:
# docker exec -it a50f90c73d2c ceph -w

9) Repeated steps 6, 7, and 8 on the other OSD nodes.

10) The containerized Ceph cluster is now upgraded to 1.3.3:
# docker exec -it a50f90c73d2c ceph -v
ceph version 0.94.9-3.el7cp (7358f71bebe44c463df4d91c2770149e812bbeaa)

Comment 9 Ken Dreyer (Red Hat) 2017-01-17 03:19:14 UTC
Did this fix ever ship to customers? What is the next step?

Comment 10 seb 2017-01-17 09:14:04 UTC
I think we can close this one, according to Ivan the fix was already upstream so he just built a new image.

Comment 11 Ivan Font 2017-01-17 17:11:09 UTC
(In reply to seb from comment #10)
> I think we can close this one, according to Ivan the fix was already
> upstream so he just built a new image.

That's correct. I just back-ported the fix and built a new image.

Comment 12 krishnaram Karthick 2017-02-27 03:49:42 UTC
There is nothing that needs to be done with this bug; it can be closed.

Comment 13 seb 2017-04-05 08:55:15 UTC
As per Karthick's comment, I'm closing this bug since the fix is part of the new image.

