Bug 1449159 - [ceph container]:- docker instances of osd and mons will not be spun with new image after a docker pull update
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Documentation
Version: 2.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 2.3
Assignee: Bara Ancincova
QA Contact: shylesh
URL:
Whiteboard:
Depends On:
Blocks: 1437905
 
Reported: 2017-05-09 10:17 UTC by shylesh
Modified: 2017-06-20 18:23 UTC
CC: 15 users

Fixed In Version: ceph-2-rhel-7-docker-candidate-20170525181418
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-20 18:23:00 UTC
Embargoed:



Comment 2 John Poelstra 2017-05-10 15:45:30 UTC
Discussed at program meeting. Root cause analysis needed

Comment 3 seb 2017-05-12 11:53:58 UTC
Where did you update the tag number?

Comment 4 shylesh 2017-05-12 18:11:37 UTC
(In reply to seb from comment #3)
> Where did you update the tag number?

Hi Seb,

I followed https://access.redhat.com/articles/2789521 for upgrading the container builds; it doesn't include any steps to update the tag in a config file. These are the manual upgrade steps.

Am I missing something?

Thanks,
Shylesh

Comment 5 shylesh 2017-05-15 07:41:59 UTC
(In reply to shylesh from comment #4)
> (In reply to seb from comment #3)
> > Where did you update the tag number?
> 
> Hi Seb,
> 
> I followed https://access.redhat.com/articles/2789521 for upgrading the
> container builds; it doesn't include any steps to update the tag in a config
> file. These are the manual upgrade steps.
> 
> Am I missing something?
> 
> Thanks,
> Shylesh

Hi Seb,

I followed the doc https://access.redhat.com/articles/2789521 for the upgrade. Here are the steps according to the doc:

a) For containers that were deployed by using the Ansible automation application, stop the daemon:

    systemctl stop <daemon>@<ID>.service

b) Pull the updated Red Hat Ceph Storage container image:
 
    docker pull registry.access.redhat.com/rhceph/<image_name>


c) Start the daemon again. For containers that were deployed by using the Ansible automation application:

    systemctl start <daemon>@<ID>.service


After this step, the container starts with the "old" image, not the new image that we pulled.

Does anything need to be added?

Thanks,
Shylesh

Comment 6 seb 2017-05-16 09:01:28 UTC
Hi Shylesh,

Sorry about that; there is actually a missing step in the documentation.
The sequence should be:

systemctl stop <daemon>@<ID>.service
docker rm -f <container id>
systemctl start <daemon>@<ID>.service
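For example, for the monitor unit that appears later in this bug (the container name here is illustrative and may differ on your setup):

    systemctl stop ceph-mon@ceph-mon-magna029.service
    docker rm -f ceph-mon-magna029
    systemctl start ceph-mon@ceph-mon-magna029.service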


Can you try and let me know?
Thanks!

Comment 7 Ian Colle 2017-05-16 16:31:03 UTC
Update the docs to include the above steps. Seb, please provide a redlined article to Anjana.

Comment 8 Ken Dreyer (Red Hat) 2017-05-16 16:34:09 UTC
(Sebastien, can we have the systemd unit files do that automatically?)

Comment 12 seb 2017-05-17 13:56:14 UTC
Alright, this is funny: I forgot that we actually do this inside the systemd unit file itself. So when we stop/start we actually (see the sketch after this list):

- stop the container
- remove the container
- start the container
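
A minimal sketch of that unit file pattern (illustrative only, not the exact unit we ship; %i is the systemd instance name and a leading "-" tells systemd to ignore failures of that command):

    [Service]
    ExecStartPre=-/usr/bin/docker stop ceph-mon-%i
    ExecStartPre=-/usr/bin/docker rm ceph-mon-%i
    ExecStart=/usr/bin/docker run --name=ceph-mon-%i <image>
    ExecStop=-/usr/bin/docker stop ceph-mon-%i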

Can you run "docker inspect <container id>" on the container while it is running and look for the image ID hash, then stop/start and check again? Thanks!
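
For example, one way to pull out just the image hash is docker inspect's --format filter:

    docker inspect --format '{{ .Image }}' <container id>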

Comment 14 seb 2017-05-18 08:59:48 UTC
Can I access this setup to do further debugging?
Thanks.

Comment 16 seb 2017-05-18 09:38:03 UTC
Alright, actually what you're experiencing is normal. Since your images have two different names, if you don't update ceph-ansible to reflect this new image version, you will never get the systemd unit file updated with the new image name.

Please change the "ceph_docker_image_tag" value to the new tag and run Ansible.
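
For example, in group_vars/all.yml (the image name and tag values here are illustrative):

    ceph_docker_image: rhceph/rhceph-2-rhel7
    ceph_docker_image_tag: <new_tag>

then re-run the playbook, e.g. "ansible-playbook site-docker.yml" (the playbook name can vary between ceph-ansible versions).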

Comment 18 seb 2017-05-18 12:52:51 UTC
Is mon_containerized_deployment enabled for all the roles?

Comment 22 seb 2017-05-22 07:44:19 UTC
Hi Rachana,

The current behavior is correct, it's just that we are referencing the wrong container name. The command is being executed on a monitor, which is what we want; the only problem is that the container name is wrong.

There is a PR upstream; do you mind testing it? https://github.com/ceph/ceph-ansible/pull/1555
Thanks!

Comment 24 seb 2017-05-22 09:44:10 UTC
Yes, Bara, we need to advise that mon_containerized_deployment should be used in group_vars/all.yml.
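
For example:

    mon_containerized_deployment: true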

Comment 26 Harish NV Rao 2017-05-24 10:25:26 UTC
(In reply to seb from comment #22)
> Hi Rachana,
> 
> The current behavior is correct, it's just that we are referencing the wrong
> container name. The command is being executed on a monitor, which is what we
> want; the only problem is that the container name is wrong.
> 
> There is a PR upstream; do you mind testing it?
> https://github.com/ceph/ceph-ansible/pull/1555
> Thanks!

@Seb, which downstream build contains the code fix for this defect?
Can you please update the "Fixed In Version" field of this bug accordingly? We would like to test the code fix in the downstream version.

Thanks,
Harish

Comment 27 seb 2017-05-24 13:27:42 UTC
Backport here: https://github.com/ceph/ceph-ansible/pull/1566
It will merge today, and then we will get a new package to test this.

Comment 28 seb 2017-05-29 09:00:34 UTC
We have a new tag upstream. Ken, did we get a new build from it?
Thanks

Comment 29 Ken Dreyer (Red Hat) 2017-05-30 17:10:30 UTC
Thomas is handling the remaining work for 2.3.

Thomas, want to pull in ceph-ansible v2.2.7 here with rdopkg new-version?
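
For reference, a sketch of that command (assuming new-version takes the version as an argument):

    rdopkg new-version 2.2.7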

Comment 39 seb 2017-05-31 08:03:13 UTC
Rachana, it seems you're having a different problem now. As far as I can see, you successfully got the new version of the image, so technically this bug should go to VERIFIED.

What's broken now is the upgrade. I also have one question:

* Why do you stop all the services before the upgrade? That's **not** how an upgrade should be performed. Just run the rolling_update.yml playbook (see the example below).
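
For reference, that is something like the following, run from the ceph-ansible directory (the playbook's location can vary between versions):

    ansible-playbook rolling_update.yml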

What's actually not working is the case where we experience a complete shutdown. I think I know what's wrong, and normally this should have been fixed by https://github.com/ceph/ceph-docker/pull/654, which solves https://bugzilla.redhat.com/show_bug.cgi?id=1455357

The corresponding container image is ceph-2-rhel-7-docker-candidate-20170530112243.

In the meantime, please give me the output of:

journalctl -xn -f -u ceph-mon@ceph-mon-magna029

Thanks.

Comment 41 seb 2017-05-31 08:29:33 UTC
Needinfo was provided on IRC.

Comment 42 seb 2017-05-31 09:21:21 UTC
Oops, moving this back to ON_QA.

Comment 47 seb 2017-06-01 12:18:46 UTC
Rachana, are all the mons started during the upgrade?
There is something wrong that I don't understand.

Please give me access to the machines. Thanks.
Ping me on IRC.

Comment 48 seb 2017-06-01 13:14:38 UTC
I think I've found the issue, I'm working on a fix.

Comment 49 seb 2017-06-01 14:22:05 UTC
New commit, please re-test:

remote: *** Checking commit aee9726f6457dcbef8ef633c21c704111f3d1dfc
remote: *** Resolves:
remote: ***   Approved:
remote: ***     rhbz#1455357 (pm_ack+)
remote: *** Commit aee9726f6457dcbef8ef633c21c704111f3d1dfc allowed

Comment 50 seb 2017-06-01 15:21:40 UTC
I think this one should go to VERIFIED, as the initial issue "docker instances of osd and mons will not be spun with new image after a docker pull update" has been fixed by the documentation. QE just encountered this bug while trying to perform an upgrade.

Either open a new BZ to track the rolling update issue or address it in https://bugzilla.redhat.com/show_bug.cgi?id=1455357, which is similar, as the rolling update process restarts containers. So the root cause is identical.

Thanks.

Comment 51 Rachana Patel 2017-06-01 19:29:13 UTC
(In reply to seb from comment #50)
> I think this one should go to VERIFIED, as the initial issue "docker
> instances of osd and mons will not be spun with new image after a docker
> pull update" has been fixed by the documentation. QE just encountered this
> bug while trying to perform an upgrade.
> 
> Either open a new BZ to track the rolling update issue or address it in
> https://bugzilla.redhat.com/show_bug.cgi?id=1455357, which is similar, as the
> rolling update process restarts containers. So the root cause is identical.
> 

Opened a new BZ - Bug 1458024 - [ceph-ansible] [ceph-container] : upgrade of containerized cluster fails

Let's track upgrade (using ceph-ansible) failures in that bug; I'm moving this bug to VERIFIED for the initial pull-image problem.


> Thanks.

Comment 53 John Poelstra 2017-06-20 18:23:00 UTC
Closing this bug as the documentation has been published with the release.

