Description of problem:
Upgrade of Ceph 1.3.2 to 1.3.3 in a containerized environment was not successful. The MON container was re-spun successfully; however, the OSD re-spin fails.

Version-Release number of selected component (if applicable):
ceph 1.3.3 container image

How reproducible:
Always

Steps to Reproduce:
1) Stopped the MON container running 1.3.2.
2) Spun up a new MON container with the 1.3.3 image - this was successful, and ceph health commands returned proper output.
3) Stopped all OSD containers on one of the three nodes.
4) Tried re-spinning the OSDs with the existing disks (without zapping). This was not successful; the containers stopped immediately.

Actual results:
OSD containers failed to spin up.

Expected results:
OSD containers come up and rebuild.

Additional info:
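A minimal triage sketch (not part of the original report) of how the failing OSD container's exit status and output could be captured, assuming the stopped container is still listed and its ID (a placeholder below) is known:

# docker ps -a                                              --> list containers, including ones that exited immediately
# docker inspect --format '{{.State.ExitCode}}' <osd_container_id>
# docker logs <osd_container_id>                            --> dumps the OSD entrypoint errors printed before the container stopped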
logs?
With the latest image provided, the upgrade from ceph 1.3.2 to 1.3.3 was successful.

Steps followed to perform the upgrade:
======================================
1) Created a containerized ceph cluster with the 1.3.2 image (current image in registry.access.redhat.com/rhceph/rhceph-1.3-rhel7, image id: 8d6844d4fb9d) with 3 MONs and 3 OSDs.
2) Created an rbd device and mapped it to a libvirt hypervisor using librbd [ref link: http://docs.ceph.com/docs/jewel/rbd/libvirt].
3) Installed an operating system on top of the rbd device.
4) On the first MON node, stopped the ceph container and spun up a container using the new image.
# docker load -i docker-image-bc55b7a663275c02c214c7f2221223273301db00f84f42c05e18f15b872dd3a2.x86_64.tar.gz
# docker stop 0f6dab202c33                                  --> container running 1.3.2
# docker run -d --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -e CEPH_DAEMON=MON -e MON_IP=10.70.43.185 -e CEPH_PUBLIC_NETWORK=10.70.40.0/22 1c4f8e54c782
5) Repeated step 4 on the other nodes running MONs.
6) Stopped the containers running OSDs.
# docker stop ab7c448b6eda                                  --> osd running on /dev/vdb
# docker stop 09338f65c84f                                  --> osd running on /dev/vdc
7) Started the OSD containers with the new image.
# docker run -d --net=host --pid=host --privileged=true -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdb -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE 1c4f8e54c782
# docker run -d --net=host --pid=host --privileged=true -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdc -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE 1c4f8e54c782
8) Allowed the rebuilds to complete and the health to return to 'ok'; from one of the MONs, check the health:
# docker exec -it a50f90c73d2c ceph -w
9) Repeated steps 6, 7 and 8 on the other OSD nodes.
10) Containerized ceph cluster upgraded to 1.3.3:
# docker exec -it a50f90c73d2c ceph -v
ceph version 0.94.9-3.el7cp (7358f71bebe44c463df4d91c2770149e812bbeaa)
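A hedged verification sketch (not part of the walkthrough above), reusing the MON container ID a50f90c73d2c from step 8. These are standard ceph commands for confirming that the re-spun OSDs rejoined the cluster and report the upgraded version:

# docker exec -it a50f90c73d2c ceph -s                      --> overall health plus OSD up/in counts
# docker exec -it a50f90c73d2c ceph osd tree                --> per-host view showing each re-spun OSD is up
# docker exec -it a50f90c73d2c ceph tell osd.* version      --> every OSD should report 0.94.9-3.el7cp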
Did this fix ever ship to customers? What is the next step?
I think we can close this one, according to Ivan the fix was already upstream so he just built a new image.
(In reply to seb from comment #10)
> I think we can close this one, according to Ivan the fix was already
> upstream so he just built a new image.

That's correct. I just back-ported the fix and built a new image.
There is nothing that needs to be done with this bug, this can be closed.
As per Karthick's comment, I'm closing this bug since the fix is part of the new image.