Bug 1600943 - [cee/sd] upgrade RHCS 2 -> RHCS 3 will fail if the cluster still has nibblewise sorting set (sortbitwise unset)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 3.1
Assignee: Sébastien Han
QA Contact: subhash
Docs Contact: John Brier
URL:
Whiteboard:
Depends On:
Blocks: 1584264
 
Reported: 2018-07-13 12:54 UTC by Tomas Petr
Modified: 2021-09-09 15:02 UTC
CC: 14 users

Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc21.el7cp Ubuntu: ceph-ansible_3.1.0~rc21-2redhat1
Doc Type: Bug Fix
Doc Text:
.Upgrading {product} 2 to version 3 will set the `sortbitwise` option properly
Previously, a rolling upgrade from {product} 2 to {product} 3 would fail because the OSDs would never initialize. This was because `sortbitwise` was not properly set by Ceph Ansible. With this release, Ceph Ansible sets `sortbitwise` properly, so the OSDs can start.
Clone Of:
Environment:
Last Closed: 2018-09-26 18:22:32 UTC
Embargoed:




Links
Github ceph/ceph-ansible pull 2914 - last updated 2018-07-23 12:58:21 UTC
Github ceph/ceph-ansible pull 3047 - last updated 2018-08-21 09:18:42 UTC
Red Hat Issue Tracker RHCEPH-1636 - last updated 2021-09-09 15:02:54 UTC
Red Hat Product Errata RHBA-2018:2819 - last updated 2018-09-26 18:23:15 UTC

Description Tomas Petr 2018-07-13 12:54:26 UTC
Description of problem:
An upgrade from RHCS 2 to RHCS 3 will fail if the cluster still has nibblewise sorting set (sortbitwise unset);
it stays stuck on "TASK [waiting for clean pgs...]" because RHCS 3 OSDs will not start while nibblewise sorting is in effect.

running "ceph osd set sortbitwise" will fix this.

The ceph-ansible playbook could check for this and fail during the prerequisites check, or set the flag itself.
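
For reference, the manual workaround is a single monitor command. A minimal sketch, assuming the default cluster name "ceph" and admin keyring access on a monitor node:

  # Check the current OSD map flags; without the flag the partially upgraded cluster
  # reports "no legacy OSD present but 'sortbitwise' flag is not set" in ceph health.
  ceph osd dump | grep flags

  # Set the flag so the RHCS 3 (Luminous) OSDs can start.
  ceph osd set sortbitwise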

Version-Release number of selected component (if applicable):
RHCS 3
ceph-ansible-3.0.39

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Sébastien Han 2018-08-08 15:05:31 UTC
In https://github.com/ceph/ceph-ansible/releases/tag/v3.0.41

Comment 4 Sébastien Han 2018-08-08 15:06:09 UTC
Since v3.0.40 actually, sorry.

Comment 5 Sébastien Han 2018-08-08 15:07:36 UTC
and v3.1.0rc14

Comment 10 subhash 2018-08-21 07:26:33 UTC
Verified the issue with the following steps:

1. Deployed a Ceph 2.5 cluster (10.2.10-28) and unset the sortbitwise flag (see the commands sketched below)
2. Upgraded ceph-ansible to 3.1 and upgraded the cluster through rolling_update.yml
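
The pre-upgrade state from step 1 can be reproduced with something like the following on the RHCS 2 (Jewel) cluster (a sketch, assuming admin access on a monitor node):

  # Clear the flag before running rolling_update.yml.
  ceph osd unset sortbitwise

  # Confirm it no longer appears in the OSD map flags.
  ceph osd dump | grep flags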

rolling_update.yml fails at the task below, so moving the BZ back to ASSIGNED:
TASK [waiting for clean pgs...] ******************************************************************************************
task path: /usr/share/ceph-ansible/rolling_update.yml:411
Tuesday 21 August 2018  06:42:51 +0000 (0:00:00.655)       0:12:40.336 ******** 
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
<magna021> ESTABLISH SSH CONNECTION FOR USER: None
<magna021> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/%h-%r-%p magna021 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-dbyoistppgehsreyyylnqrtbmvbmpcsb; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<magna021> (0, '\n{"changed": true, "end": "2018-08-21 06:42:51.835425", "stdout": "\\n{\\"fsid\\":\\"eb52f84f-c6eb-4820-8a0a-4f9cc84e20ae\\",\\"health\\":{\\"summary\\":[{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"64 pgs degraded\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"64 pgs stuck degraded\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"64 pgs stuck unclean\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"64 pgs stuck undersized\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"64 pgs undersized\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"1 host (3 osds) down\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"3 osds down\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"noout,noscrub,nodeep-scrub flag(s) set\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"no legacy OSD present but \'sortbitwise\' flag is not set\\"}],\\"overall_status\\":\\"HEALTH_WARN\\",\\"detail\\":[]},\\"election_epoch\\":7,\\"quorum\\":[0],\\"quorum_names\\":[\\"magna021\\"],\\"monmap\\":{\\"epoch\\":2,\\"fsid\\":\\"eb52f84f-c6eb-4820-8a0a-4f9cc84e20ae\\",\\"modified\\":\\"2018-08-21 06:34:29.260101\\",\\"created\\":\\"2018-08-21 05:43:32.108976\\",\\"features\\":{\\"persistent\\":[\\"kraken\\",\\"luminous\\"],\\"optional\\":[]},\\"mons\\":[{\\"rank\\":0,\\"name\\":\\"magna021\\",\\"addr\\":\\"10.8.128.21:6789/0\\",\\"public_addr\\":\\"10.8.128.21:6789/0\\"}]},\\"osdmap\\":{\\"osdmap\\":{\\"epoch\\":44,\\"num_osds\\":9,\\"num_up_osds\\":6,\\"num_in_osds\\":9,\\"full\\":false,\\"nearfull\\":false,\\"num_remapped_pgs\\":0}},\\"pgmap\\":{\\"pgs_by_state\\":[{\\"state_name\\":\\"active+undersized+degraded\\",\\"count\\":64}],\\"num_pgs\\":64,\\"num_pools\\":1,\\"num_objects\\":0,\\"data_bytes\\":0,\\"bytes_used\\":1018404864,\\"bytes_avail\\":8948088016896,\\"bytes_total\\":8949106421760},\\"fsmap\\":{\\"epoch\\":1,\\"by_rank\\":[]},\\"mgrmap\\":{\\"epoch\\":60,\\"active_gid\\":14113,\\"active_name\\":\\"magna021\\",\\"active_addr\\":\\"10.8.128.21:6812/185260\\",\\"available\\":true,\\"standbys\\":[],\\"modules\\":[\\"status\\"],\\"available_modules\\":[\\"balancer\\",\\"dashboard\\",\\"influx\\",\\"localpool\\",\\"prometheus\\",\\"restful\\",\\"selftest\\",\\"status\\",\\"zabbix\\"],\\"services\\":{}},\\"servicemap\\":{\\"epoch\\":1,\\"modified\\":\\"0.000000\\",\\"services\\":{}}}", "cmd": ["ceph", "--cluster", "ceph", "-s", "--format", "json"], "rc": 0, "start": "2018-08-21 06:42:51.484577", "stderr": "", "delta": "0:00:00.350848", "invocation": {"module_args": {"warn": true, "executable": null, "_uses_shell": false, "_raw_params": " ceph --cluster ceph -s --format json", "removes": null, "creates": null, "chdir": null, "stdin": null}}}\n', 'OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017\r\ndebug1: Reading configuration data /root/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 8: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 167277\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n')
FAILED - RETRYING: waiting for clean pgs... (40 retries left).Result was: {
    "attempts": 1, 
    "changed": true, 
    "cmd": [
        "ceph", 
        "--cluster", 
        "ceph", 
        "-s", 
        "--format", 
        "json"
    ], 
    "delta": "0:00:00.350848", 
    "end": "2018-08-21 06:42:51.835425", 
    "failed": false, 
    "invocation": {
        "module_args": {
            "_raw_params": " ceph --cluster ceph -s --format json", 
            "_uses_shell": false, 
            "chdir": null, 
            "creates": null, 
            "executable": null, 
            "removes": null, 
            "stdin": null, 
            "warn": true
        }
    }, 
    "rc": 0, 
    "retries": 41, 
    "start": "2018-08-21 06:42:51.484577", 
    "stderr": "", 
    "stderr_lines": [], 
 "stdout": "\n{\"fsid\":\"eb52f84f-c6eb-4820-8a0a-4f9cc84e20ae\",\"health\":{\"summary\":[{\"severity\":\"HEALTH_WARN\",\"summary\":\"64 pgs degraded\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"64 pgs stuck degraded\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"64 pgs stuck unclean\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"64 pgs stuck undersized\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"64 pgs undersized\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"1 host (3 osds) down\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"3 osds down\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"noout,noscrub,nodeep-scrub flag(s) set\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"no legacy OSD present but 'sortbitwise' flag is not set\"}],\"overall_status\":\"HEALTH_WARN\",\"detail\":[]},\"election_epoch\":7,\"quorum\":[0],\"quorum_names\":[\"magna021\"],\"monmap\":{\"epoch\":2,\"fsid\":\"eb52f84f-c6eb-4820-8a0a-4f9cc84e20ae\",\"modified\":\"2018-08-21 06:34:29.260101\",\"created\":\"2018-08-21 05:43:32.108976\",\"features\":{\"persistent\":[\"kraken\",\"luminous\"],\"optional\":[]},\"mons\":[{\"rank\":0,\"name\":\"magna021\",\"addr\":\"10.8.128.21:6789/0\",\"public_addr\":\"10.8.128.21:6789/0\"}]},\"osdmap\":{\"osdmap\":{\"epoch\":44,\"num_osds\":9,\"num_up_osds\":6,\"num_in_osds\":9,\"full\":false,\"nearfull\":false,\"num_remapped_pgs\":0}},\"pgmap\":{\"pgs_by_state\":[{\"state_name\":\"active+undersized+degraded\",\"count\":64}],\"num_pgs\":64,\"num_pools\":1,\"num_objects\":0,\"data_bytes\":0,\"bytes_used\":1018404864,\"bytes_avail\":8948088016896,\"bytes_total\":8949106421760},\"fsmap\":{\"epoch\":1,\"by_rank\":[]},\"mgrmap\":{\"epoch\":60,\"active_gid\":14113,\"active_name\":\"magna021\",\"active_addr\":\"10.8.128.21:6812/185260\",\"available\":true,\"standbys\":[],\"modules\":[\"status\"],\"available_modules\":[\"balancer\",\"dashboard\",\"influx\",\"localpool\",\"prometheus\",\"restful\",\"selftest\",\"status\",\"zabbix\"],\"services\":{}},\"servicemap\":{\"epoch\":1,\"modified\":\"0.000000\",\"services\":{}}}",

Comment 12 Sébastien Han 2018-08-21 09:13:34 UTC
Can you tell me why your OSDs did not get updated?
It seems they got the right version of the package, but they still report running ceph version 10.2.10-28.el7cp after the restart.

Can I access the env and run one test?
I believe the OSDs cannot start if the 'sortbitwise' flag is not set, and thus they won't report their newer version.

I just need to trigger the command manually and see if the OSDs start reporting their new version after that, because at the moment they still report the old version.
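
Concretely, the manual test would be along these lines (a sketch, run from a monitor node with the admin keyring):

  # Set the flag that is blocking the restarted OSDs.
  ceph osd set sortbitwise

  # Check whether the OSDs now come up and report the new (Luminous) version.
  ceph osd tree
  ceph tell osd.* version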

Thanks.

Comment 14 Ken Dreyer (Red Hat) 2018-08-21 21:23:13 UTC
https://github.com/ceph/ceph-ansible/pull/3047 was backported to the stable-3.1 branch today, and we need an upstream tag for stable-3.1 with this change.

Comment 19 subhash 2018-08-23 11:25:22 UTC
Verified with the following version: ceph-ansible-3.1.0-0.1.rc21.el7cp

Steps:

1. Deployed a Ceph 2.5 cluster (10.2.10-28) and unset the sortbitwise flag
2. Upgraded ceph-ansible to 3.1 and upgraded the cluster through rolling_update.yml

The upgrade playbook ran fine and the PGs are in active+clean state. Moving to VERIFIED.
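
For reference, the final state can be confirmed with something like:

  # All PGs should be active+clean and 'sortbitwise' should now appear in the flags.
  ceph -s
  ceph osd dump | grep flags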

Comment 22 errata-xmlrpc 2018-09-26 18:22:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819

