Discussed at program meeting. Root cause analysis needed.
Where did you update the tag number?
(In reply to seb from comment #3)
> Where did you update the tag number?

Hi Seb,

I followed https://access.redhat.com/articles/2789521 for upgrading the container builds; it doesn't have any step to update the tag in any config file. These are actually manual steps for the upgrade.

Am I missing something?

Thanks,
Shylesh
(In reply to shylesh from comment #4)
> (In reply to seb from comment #3)
> > Where did you update the tag number?
>
> Hi Seb,
>
> I followed https://access.redhat.com/articles/2789521 for upgrading the
> container builds; it doesn't have any step to update the tag in any config
> file. These are actually manual steps for the upgrade.
>
> Am I missing something?
>
> Thanks,
> Shylesh

Hi Seb,

I followed the doc https://access.redhat.com/articles/2789521 for upgrading. Here are the steps according to the doc:

a) For containers that were deployed by using the Ansible automation application, stop the daemon:

systemctl stop <daemon>@<ID>.service

b) Pull the updated Red Hat Ceph Storage container image:

docker pull registry.access.redhat.com/rhceph/<image_name>

c) Start the daemon again (for containers that were deployed by using the Ansible automation application):

systemctl start <daemon>@<ID>.service

After this step the container starts with the "old" image, not the new image that we pulled. Does anything need to be added?

Thanks,
Shylesh
Hi Shylesh,

Sorry about that, there is actually a missing step in the documentation. The sequence should be:

systemctl stop <daemon>@<ID>.service
docker rm -f <container id>
systemctl start <daemon>@<ID>.service

Can you try and let me know?

Thanks!
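For example, for a mon container (the daemon ID, container name, and container ID below are placeholders for this setup):

# Stop the daemon through systemd.
systemctl stop ceph-mon@<ID>.service
# Find the stale container that was created from the old image...
docker ps -a --filter name=ceph-mon
# ...remove it, then let systemd start a fresh container from the newly pulled image.
docker rm -f <container id from the previous command>
systemctl start ceph-mon@<ID>.service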
Update the docs to include the above steps. Seb, please provide a redlined article to Anjana.
(Sebastien, can we have the systemd unit files do that automatically?)
Alright, this is funny: I forgot that we actually do this inside the systemd unit file itself. So when we stop/start we actually:

- stop the container
- remove the container
- start the container

Can you run "docker inspect <container id>" on the container while it is running, look for the image ID hash, then stop/start and check again?

Thanks!
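For example, a quick way to compare the image ID hash before and after (the --format flag just extracts the hash; the container name below assumes the ceph-<daemon>-<hostname> naming used on this setup):

# Record the image ID hash of the running container.
docker inspect --format '{{ .Image }}' ceph-mon-<hostname>
# Stop/start through systemd (the unit removes and recreates the container).
systemctl stop ceph-mon@<ID>.service
systemctl start ceph-mon@<ID>.service
# Check again and compare the two hashes to see whether the new image was picked up.
docker inspect --format '{{ .Image }}' ceph-mon-<hostname>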
Can I access this setup to do further debugging? Thanks.
Alright, actually what you're experiencing is normal. Since your images have 2 different names, if you don't update ceph-ansible to reflect this new image version you will never get the systemd unit file updated with the new image name. Please change the "ceph_docker_image_tag" value to the new tag and re-run ansible.
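A minimal sketch of what that looks like (the group_vars location and the site-docker.yml playbook name are assumptions based on a standard containerized ceph-ansible setup; the tag and inventory are placeholders):

# Edit group_vars/all.yml and point the tag at the newly pulled image, e.g.:
#   ceph_docker_image_tag: <new tag>
# Then re-run the containerized playbook so the systemd unit files are regenerated
# with the new image name.
ansible-playbook -i <inventory> site-docker.yml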
Is mon_containerized_deployment enabled for all the roles?
Hi Rachana,

The current behavior is correct; it's just that we are referencing the wrong container name. The command is being executed on a monitor, which is what we want; the only problem is that the container name is wrong.

There is a PR upstream, do you mind testing it?
https://github.com/ceph/ceph-ansible/pull/1555

Thanks!
Yes Bara, we need to advise that mon_containerized_deployment should be set in group_vars/all.yml.
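A quick way to double-check, assuming the standard ceph-ansible layout (the value shown is what a containerized deployment would need, so treat it as an assumption):

grep -n 'mon_containerized_deployment' group_vars/all.yml
# Expected output, something like:
#   <line number>:mon_containerized_deployment: true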
(In reply to seb from comment #22)
> Hi Rachana,
>
> The current behavior is correct; it's just that we are referencing the wrong
> container name. The command is being executed on a monitor, which is what we
> want; the only problem is that the container name is wrong.
>
> There is a PR upstream, do you mind testing it?
> https://github.com/ceph/ceph-ansible/pull/1555
>
> Thanks!

@Seb, which downstream build contains the code fix for this defect? Can you please update the "Fixed In Version" field in this bug accordingly? We would like to test the code fix in the downstream version.

Thanks,
Harish
Backport here: https://github.com/ceph/ceph-ansible/pull/1566

It will merge today, and then we will get a new package to test this.
We have a new tag upstream. Ken, did we get a new build from it? Thanks.
Thomas is handling the remaining work for 2.3. Thomas, want to pull in ceph-ansible v2.2.7 here with rdopkg new-version?
Rachana, it seems you're having a different problem now. As far as I can see, you successfully got the new version of the image, so technically this bug should move to VERIFIED. What's broken now is the upgrade. I also have one question:

* Why do you stop all the services before the upgrade? That's **not** how an upgrade should be performed. Just run the rolling_update.yml playbook.

What's actually not working is when we experience a complete shutdown. I think I know what's wrong, and normally this should have been fixed by https://github.com/ceph/ceph-docker/pull/654, which solves https://bugzilla.redhat.com/show_bug.cgi?id=1455357. The corresponding container image is ceph-2-rhel-7-docker-candidate-20170530112243.

In the meantime, please give me the output of:

journalctl -xn -f -u ceph-mon@ceph-mon-magna029

Thanks.
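For reference, a sketch of how the rolling upgrade would be kicked off (the inventory is a placeholder and the playbook path can vary by ceph-ansible version, so treat it as an assumption; the playbook itself handles stopping, removing, and restarting each container):

# Run from the ceph-ansible directory used to deploy the cluster.
ansible-playbook -i <inventory> rolling_update.yml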
Need info provided on IRC.
Oops, moving this back to ON_QA.
Rachana, are all the mons started during the upgrade? There is something wrong that I don't understand. Please give me access to the machines. Thanks. Ping me on IRC.
I think I've found the issue, I'm working on a fix.
New commit, please re-test:

remote: *** Checking commit aee9726f6457dcbef8ef633c21c704111f3d1dfc
remote: *** Resolves:
remote: *** Approved:
remote: *** rhbz#1455357 (pm_ack+)
remote: *** Commit aee9726f6457dcbef8ef633c21c704111f3d1dfc allowed
I think this one should go to VERIFIED, as the initial issue "docker instances of osd and mons will not be spun with new image after a docker pull update" has been fixed by the doc update. QE just encountered this bug while trying to perform an upgrade.

Either open a new BZ to track the rolling update issue or address it in https://bugzilla.redhat.com/show_bug.cgi?id=1455357, which is similar, as the rolling update process restarts containers. So the root cause is identical.

Thanks.
(In reply to seb from comment #50)
> I think this one should go to VERIFIED, as the initial issue "docker
> instances of osd and mons will not be spun with new image after a docker
> pull update" has been fixed by the doc update. QE just encountered this bug
> while trying to perform an upgrade.
>
> Either open a new BZ to track the rolling update issue or address it in
> https://bugzilla.redhat.com/show_bug.cgi?id=1455357, which is similar, as
> the rolling update process restarts containers. So the root cause is
> identical.

Opened a new BZ - Bug 1458024 - [ceph-ansible] [ceph-container] : upgrade of containerized cluster fails

Let's track upgrade (using ceph-ansible) failures in that bug, and this bug is moving to VERIFIED for the initial pull-image problem.

> Thanks.
Closing this bug as the documentation has been published with the release.