Created attachment 1336802 [details]
rolling update log

Description of problem:
RHCS 2.4 -> 3.0 rolling update on containers fails with "client.admin authentication error (1) Operation not permitted" because the admin keyring on the node being updated is altered.

Version-Release number of selected component (if applicable):
ceph version: 12.2.1-10.el7cp
ceph-ansible: 3.0.0-0.1.rc19.el7cp

How reproducible:

Steps to Reproduce:
1. Install RHCS 2.4 and upgrade to RHCS 3.0 (rolling update) on containers.

Actual results:
failed: [magna035 -> magna035] (item=magna035) => {
    "changed": true,
    "cmd": [
        "/tmp/restart_mon_daemon.sh"
    ],
    "delta": "0:01:03.195151",
    "end": "2017-10-10 15:40:32.327364",
    "failed": true,
    "invocation": {
        "module_args": {
            "_raw_params": "/tmp/restart_mon_daemon.sh",
            "_uses_shell": false,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "warn": true
        }
    },
    "item": "magna035",
    "rc": 1,
    "start": "2017-10-10 15:39:29.132213",
    "stderr": "Error response from daemon: No such container: ceph-mon-magna035\n2017-10-10 15:39:41.250336 7efe3ba61700 0 librados: client.admin authentication error (1) Operation not permitted\n[errno 1] error connecting to the cluster\n2017-10-10 15:39:51.484886 7f0680f31700 0 librados: client.admin authentication error (1) Operation not permitted\n[errno 1] error connecting to the cluster\n2017-10-10 15:40:01.690645 7f1079095700 0 librados: client.admin authentication error (1) Operation not permitted\n[errno 1] error connecting to the cluster\n2017-10-10 15:40:11.891639 7fe8ca47f700 0 librados: client.admin authentication error (1) Operation not permitted\n[errno 1] error connecting to the cluster\n2017-10-10 15:40:22.109292 7f92d1ce6700 0 librados: client.admin authentication error (1) Operation not permitted\n[errno 1] error connecting to the cluster\n2017-10-10 15:40:32.311750 7f3c7c333700 0 librados: client.admin authentication error (1) Operation not permitted\n[errno 1] error connecting to the cluster",
    "stderr_lines": [
        "Error response from daemon: No such container: ceph-mon-magna035",
        "2017-10-10 15:39:41.250336 7efe3ba61700 0 librados: client.admin authentication error (1) Operation not permitted",
        "[errno 1] error connecting to the cluster",
        "2017-10-10 15:39:51.484886 7f0680f31700 0 librados: client.admin authentication error (1) Operation not permitted",
        "[errno 1] error connecting to the cluster",
        "2017-10-10 15:40:01.690645 7f1079095700 0 librados: client.admin authentication error (1) Operation not permitted",
        "[errno 1] error connecting to the cluster",
        "2017-10-10 15:40:11.891639 7fe8ca47f700 0 librados: client.admin authentication error (1) Operation not permitted",
        "[errno 1] error connecting to the cluster",
        "2017-10-10 15:40:22.109292 7f92d1ce6700 0 librados: client.admin authentication error (1) Operation not permitted",
        "[errno 1] error connecting to the cluster",
        "2017-10-10 15:40:32.311750 7f3c7c333700 0 librados: client.admin authentication error (1) Operation not permitted",
        "[errno 1] error connecting to the cluster"
    ],
    "stdout": "Error with quorum.\ncluster status:",
    "stdout_lines": [
        "Error with quorum.",
        "cluster status:"
    ]
}

The admin keyring on the monitor being upgraded is altered.

Admin keyring on mon1 (which failed):
[client.admin]
    key = AQAlE9tZV1FlLxAA6fyOfYRJS0VS5KlRhU1CAw==

Admin keyring on the rest of the mons:
# cat /etc/ceph/slave.client.admin.keyring
[client.admin]
    key = AQDajtxZ/M/mIBAA6+9fzxec1EfJvaeWgD5Dig==
    auid = 0
    caps mds = "allow"
    caps mon = "allow *"
    caps osd = "allow *"

I have attached complete upgrade logs.
Expected results:

Additional info:
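A quick way to confirm the mismatch (a hedged diagnostic sketch, not from the original run; the non-failing mon hostnames below are placeholders) is to compare the admin key each mon has on disk with the key the cluster actually holds:

# Key each mon has on disk (hostnames other than magna035 are hypothetical):
for host in magna035 magna046 magna052; do
    echo "--- $host"
    ssh "$host" 'grep "key = " /etc/ceph/*.client.admin.keyring'
done

# Key the cluster actually holds; run on a mon that can still authenticate.
# The container name follows the ceph-mon-$HOSTNAME convention seen in the log:
docker exec "ceph-mon-$(hostname -s)" ceph auth get-key client.admin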
Looking at your logs, I see that this key: /home/ubuntu/2/bb1353bb-5b01-4664-97b6-7eda6645d54f//etc/ceph/slave.client.admin.keyring was copied to the node. Is this the key from the initial play? Is the content of your fetch directory (/home/ubuntu/2/bb1353bb-5b01-4664-97b6-7eda6645d54f/) correct? Isn't this a leftover? Thanks.
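In case it helps, this is roughly how I'd inspect it (a sketch; the path is the one from your log):

# List what ceph-ansible fetched for this cluster; the keyring here is the
# one that gets copied back out to the nodes during the play.
ls -lR /home/ubuntu/2/bb1353bb-5b01-4664-97b6-7eda6645d54f/
cat /home/ubuntu/2/bb1353bb-5b01-4664-97b6-7eda6645d54f/etc/ceph/slave.client.admin.keyring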
I created and maintained the same fetch directory (/home/ubuntu/2) while installing (2.4) and upgrading to 3.0.
Not sure how we can take it from here. Can you provide an env, or perhaps the env where you saw the issue? It'll be easier for me to diagnose. Thanks.
Thanks!
Created attachment 1339667 [details]
upgrade log
Parikshith, thanks for trying again; this really looks like a timing issue to me. During the first command the container appears not to be present, but it should be, since the previous task shows all the mons in quorum. So the first command of the loop (set osd flags) fails but the next two work. As agreed on IRC, let's get on a BlueJeans call and redo the procedure together.
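For reference, the restart script basically has to wait until the mon container is back before issuing any ceph command; something along these lines (a simplified sketch of the idea only, not the actual /tmp/restart_mon_daemon.sh; retry counts and sleep values are illustrative):

# Wait for the mon container to reappear before touching the cluster.
RETRIES=5
until docker ps --format '{{.Names}}' | grep -q "^ceph-mon-$(hostname -s)$"; do
    [ "$RETRIES" -eq 0 ] && { echo "mon container never came up" >&2; exit 1; }
    RETRIES=$((RETRIES - 1))
    sleep 10
done
# Only then is it safe to check quorum / cluster status:
docker exec "ceph-mon-$(hostname -s)" ceph -s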
Thanks Parikshith, work is in progress.
Created attachment 1340849 [details]
ceph-ansible-container-update-2.4-3.0
This work is almost complete; I've attached the logs of a successful upgrade. I had trouble with the MGR since the firewall wasn't open. From now on, please make sure port 6800 is open on the monitor machines. Thanks.
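For example, with firewalld (a hedged example assuming the default zone; adjust to your setup):

# Open the ceph-mgr port on each monitor machine, then reload the rules:
firewall-cmd --add-port=6800/tcp --permanent
firewall-cmd --reload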
Created attachment 1341215 [details]
upgrade-2.4-to-3.0-container
The patch has merged upstream and the fix will be included in v3.0.4, which I just tagged; see: https://github.com/ceph/ceph-ansible/releases/tag/v3.0.4 Thanks.
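If you consume ceph-ansible from git rather than a package, the tag can be checked out directly (a hedged example, assuming an existing clone):

# Fetch the new tag and switch to it:
cd ceph-ansible
git fetch --tags
git checkout v3.0.4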
Btw, I'm still borrowing your machines for some testing. You can re-use them from next Monday. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387