Bug 1600943 - [cee/sd] upgrade RHCS 2 -> RHCS 3 will fail if cluster has still set sortnibblewise
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 3.1
Assigned To: leseb
QA Contact: subhash
Docs Contact: John Brier
Depends On:
Blocks: 1584264
Reported: 2018-07-13 08:54 EDT by Tomas Petr
Modified: 2018-09-26 14:23 EDT
CC List: 14 users

See Also:
Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc21.el7cp Ubuntu: ceph-ansible_3.1.0~rc21-2redhat1
Doc Type: Bug Fix
Doc Text:
.Upgrading {product} 2 to version 3 will set the `sortbitwise` option properly
Previously, a rolling upgrade from {product} 2 to {product} 3 would fail because the OSDs would never initialize. This is because `sortbitwise` was not properly set by Ceph Ansible. With this release, Ceph Ansible sets `sortbitwise` properly, so the OSDs can start.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-26 14:22:32 EDT
Type: Bug




External Trackers
Tracker ID Priority Status Summary Last Updated
Github ceph/ceph-ansible/pull/2914 None None None 2018-07-23 08:58 EDT
Github ceph/ceph-ansible/pull/3047 None None None 2018-08-21 05:18 EDT
Red Hat Product Errata RHBA-2018:2819 None None None 2018-09-26 14:23 EDT

Description Tomas Petr 2018-07-13 08:54:26 EDT
Description of problem:
An upgrade from RHCS 2 to RHCS 3 will fail if the cluster still uses nibblewise sort order (that is, the sortbitwise flag is not set). The upgrade stays stuck on "TASK [waiting for clean pgs...]", because RHCS 3 OSDs will not start while nibblewise ordering is in effect.

running "ceph osd set sortbitwise" will fix this.

The ceph-ansible playbook could check for this and either fail during the prerequisites check or set the flag itself, as sketched below.
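
A minimal sketch of such a pre-flight check as plain shell (the flag name and the "ceph osd dump" output are standard Ceph; the script itself is illustrative, not the actual ceph-ansible task):

    # Check whether the sortbitwise flag is set on the cluster.
    if ! ceph osd dump | grep -q '^flags.*sortbitwise'; then
        echo "sortbitwise is not set; RHCS 3 OSDs will not start" >&2
        # Option A: fail early, as a prerequisites check would:
        exit 1
        # Option B: set the flag instead of failing:
        # ceph osd set sortbitwise
    fi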

Version-Release number of selected component (if applicable):
RHCS 3
ceph-ansible-3.0.39

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 4 leseb 2018-08-08 11:06:09 EDT
Since ceph-ansible 3.0.40, sorry.
Comment 5 leseb 2018-08-08 11:07:36 EDT
And in v3.1.0rc14.
Comment 10 subhash 2018-08-21 03:26:33 EDT
Verified the issue with the following steps:

1. Deployed a Ceph 2.5 cluster (10.2.10-28) and unset the sortbitwise flag
2. Upgraded ceph-ansible to 3.1 and upgraded the cluster through rolling_update.yml (roughly equivalent shell steps are sketched below)
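
For reference, the steps above as plain shell commands (inventory and extra vars are environment-specific and omitted; the playbook path matches the task path in the log below):

    # 1. On the RHCS 2.5 cluster, clear the sortbitwise flag:
    ceph osd unset sortbitwise

    # 2. Update ceph-ansible to 3.1, then run the rolling upgrade:
    yum update ceph-ansible
    cd /usr/share/ceph-ansible
    ansible-playbook rolling_update.yml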

rolling_update.yml fails at the task below; moving the BZ to ASSIGNED.
TASK [waiting for clean pgs...] ******************************************************************************************
task path: /usr/share/ceph-ansible/rolling_update.yml:411
Tuesday 21 August 2018  06:42:51 +0000 (0:00:00.655)       0:12:40.336 ******** 
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
<magna021> ESTABLISH SSH CONNECTION FOR USER: None
<magna021> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/%h-%r-%p magna021 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-dbyoistppgehsreyyylnqrtbmvbmpcsb; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<magna021> (0, '\n{"changed": true, "end": "2018-08-21 06:42:51.835425", "stdout": "\\n{\\"fsid\\":\\"eb52f84f-c6eb-4820-8a0a-4f9cc84e20ae\\",\\"health\\":{\\"summary\\":[{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"64 pgs degraded\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"64 pgs stuck degraded\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"64 pgs stuck unclean\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"64 pgs stuck undersized\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"64 pgs undersized\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"1 host (3 osds) down\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"3 osds down\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"noout,noscrub,nodeep-scrub flag(s) set\\"},{\\"severity\\":\\"HEALTH_WARN\\",\\"summary\\":\\"no legacy OSD present but \'sortbitwise\' flag is not set\\"}],\\"overall_status\\":\\"HEALTH_WARN\\",\\"detail\\":[]},\\"election_epoch\\":7,\\"quorum\\":[0],\\"quorum_names\\":[\\"magna021\\"],\\"monmap\\":{\\"epoch\\":2,\\"fsid\\":\\"eb52f84f-c6eb-4820-8a0a-4f9cc84e20ae\\",\\"modified\\":\\"2018-08-21 06:34:29.260101\\",\\"created\\":\\"2018-08-21 05:43:32.108976\\",\\"features\\":{\\"persistent\\":[\\"kraken\\",\\"luminous\\"],\\"optional\\":[]},\\"mons\\":[{\\"rank\\":0,\\"name\\":\\"magna021\\",\\"addr\\":\\"10.8.128.21:6789/0\\",\\"public_addr\\":\\"10.8.128.21:6789/0\\"}]},\\"osdmap\\":{\\"osdmap\\":{\\"epoch\\":44,\\"num_osds\\":9,\\"num_up_osds\\":6,\\"num_in_osds\\":9,\\"full\\":false,\\"nearfull\\":false,\\"num_remapped_pgs\\":0}},\\"pgmap\\":{\\"pgs_by_state\\":[{\\"state_name\\":\\"active+undersized+degraded\\",\\"count\\":64}],\\"num_pgs\\":64,\\"num_pools\\":1,\\"num_objects\\":0,\\"data_bytes\\":0,\\"bytes_used\\":1018404864,\\"bytes_avail\\":8948088016896,\\"bytes_total\\":8949106421760},\\"fsmap\\":{\\"epoch\\":1,\\"by_rank\\":[]},\\"mgrmap\\":{\\"epoch\\":60,\\"active_gid\\":14113,\\"active_name\\":\\"magna021\\",\\"active_addr\\":\\"10.8.128.21:6812/185260\\",\\"available\\":true,\\"standbys\\":[],\\"modules\\":[\\"status\\"],\\"available_modules\\":[\\"balancer\\",\\"dashboard\\",\\"influx\\",\\"localpool\\",\\"prometheus\\",\\"restful\\",\\"selftest\\",\\"status\\",\\"zabbix\\"],\\"services\\":{}},\\"servicemap\\":{\\"epoch\\":1,\\"modified\\":\\"0.000000\\",\\"services\\":{}}}", "cmd": ["ceph", "--cluster", "ceph", "-s", "--format", "json"], "rc": 0, "start": "2018-08-21 06:42:51.484577", "stderr": "", "delta": "0:00:00.350848", "invocation": {"module_args": {"warn": true, "executable": null, "_uses_shell": false, "_raw_params": " ceph --cluster ceph -s --format json", "removes": null, "creates": null, "chdir": null, "stdin": null}}}\n', 'OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017\r\ndebug1: Reading configuration data /root/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 8: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 167277\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n')
FAILED - RETRYING: waiting for clean pgs... (40 retries left).Result was: {
    "attempts": 1, 
    "changed": true, 
    "cmd": [
        "ceph", 
        "--cluster", 
        "ceph", 
        "-s", 
        "--format", 
        "json"
    ], 
    "delta": "0:00:00.350848", 
    "end": "2018-08-21 06:42:51.835425", 
    "failed": false, 
    "invocation": {
        "module_args": {
            "_raw_params": " ceph --cluster ceph -s --format json", 
            "_uses_shell": false, 
            "chdir": null, 
            "creates": null, 
            "executable": null, 
            "removes": null, 
            "stdin": null, 
            "warn": true
        }
    }, 
    "rc": 0, 
    "retries": 41, 
    "start": "2018-08-21 06:42:51.484577", 
    "stderr": "", 
    "stderr_lines": [], 
 "stdout": "\n{\"fsid\":\"eb52f84f-c6eb-4820-8a0a-4f9cc84e20ae\",\"health\":{\"summary\":[{\"severity\":\"HEALTH_WARN\",\"summary\":\"64 pgs degraded\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"64 pgs stuck degraded\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"64 pgs stuck unclean\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"64 pgs stuck undersized\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"64 pgs undersized\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"1 host (3 osds) down\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"3 osds down\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"noout,noscrub,nodeep-scrub flag(s) set\"},{\"severity\":\"HEALTH_WARN\",\"summary\":\"no legacy OSD present but 'sortbitwise' flag is not set\"}],\"overall_status\":\"HEALTH_WARN\",\"detail\":[]},\"election_epoch\":7,\"quorum\":[0],\"quorum_names\":[\"magna021\"],\"monmap\":{\"epoch\":2,\"fsid\":\"eb52f84f-c6eb-4820-8a0a-4f9cc84e20ae\",\"modified\":\"2018-08-21 06:34:29.260101\",\"created\":\"2018-08-21 05:43:32.108976\",\"features\":{\"persistent\":[\"kraken\",\"luminous\"],\"optional\":[]},\"mons\":[{\"rank\":0,\"name\":\"magna021\",\"addr\":\"10.8.128.21:6789/0\",\"public_addr\":\"10.8.128.21:6789/0\"}]},\"osdmap\":{\"osdmap\":{\"epoch\":44,\"num_osds\":9,\"num_up_osds\":6,\"num_in_osds\":9,\"full\":false,\"nearfull\":false,\"num_remapped_pgs\":0}},\"pgmap\":{\"pgs_by_state\":[{\"state_name\":\"active+undersized+degraded\",\"count\":64}],\"num_pgs\":64,\"num_pools\":1,\"num_objects\":0,\"data_bytes\":0,\"bytes_used\":1018404864,\"bytes_avail\":8948088016896,\"bytes_total\":8949106421760},\"fsmap\":{\"epoch\":1,\"by_rank\":[]},\"mgrmap\":{\"epoch\":60,\"active_gid\":14113,\"active_name\":\"magna021\",\"active_addr\":\"10.8.128.21:6812/185260\",\"available\":true,\"standbys\":[],\"modules\":[\"status\"],\"available_modules\":[\"balancer\",\"dashboard\",\"influx\",\"localpool\",\"prometheus\",\"restful\",\"selftest\",\"status\",\"zabbix\"],\"services\":{}},\"servicemap\":{\"epoch\":1,\"modified\":\"0.000000\",\"services\":{}}}",
Comment 12 leseb 2018-08-21 05:13:34 EDT
Can you tell me why your OSDs did not get updated?
It seems they got the right version of the package, but they still report running ceph version 10.2.10-28.el7cp after the restart.

Can I access the env and run one test?
I believe the OSDs cannot start if the 'sortbitwise' flag is not set, and thus they won't report their newer version.

I just need to trigger the command manually and see if the OSDs start reporting their actual new version after this, because at the moment they report their old version. See the sketch below.
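
Concretely, something like this (standard Ceph CLI; "ceph versions" needs a Luminous mon, which this half-upgraded cluster already has):

    # Set the missing flag, then check which version the OSDs report:
    ceph osd set sortbitwise
    ceph versions              # per-daemon version summary
    ceph tell osd.* version    # ask each OSD directly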

Thanks.
Comment 14 Ken Dreyer (Red Hat) 2018-08-21 17:23:13 EDT
https://github.com/ceph/ceph-ansible/pull/3047 was backported to the stable-3.1 branch today, and we need an upstream tag for stable-3.1 with this change.
Comment 19 subhash 2018-08-23 07:25:22 EDT
Verified with the following version: ceph-ansible-3.1.0-0.1.rc21.el7cp

steps:

1. Deployed a Ceph 2.5 cluster (10.2.10-28) and unset the sortbitwise flag
2. Upgraded ceph-ansible to 3.1 and upgraded the cluster through rolling_update.yml

The upgrade playbook ran fine and the PGs are in active+clean state. Moving to VERIFIED.
Comment 22 errata-xmlrpc 2018-09-26 14:22:32 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819
