Bug 1378549 - RHCS 1.3: Upgrading 0.94.6 -> 0.94.9 saturating mon node networking
Summary: RHCS 1.3: Upgrading 0.94.6 -> 0.94.9 saturating mon node networking
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.3.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 1.3.4
Assignee: Kefu Chai
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-22 17:38 UTC by Mike Hackett
Modified: 2019-12-16 06:52 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1379027
Environment:
Last Closed: 2018-01-31 04:18:58 UTC
Embargoed:


Attachments
Network spike on dnvrco01-cephmon-001 during upgrade (126.09 KB, image/png)
2016-09-22 17:38 UTC, Mike Hackett
Same network spike with packets/sec also included. (142.19 KB, image/png)
2016-09-22 17:38 UTC, Mike Hackett


Links
Ceph Project Bug Tracker 17386 (last updated 2016-09-22 17:38:01 UTC)
Red Hat Knowledge Base (Solution) 2726351 (last updated 2016-10-25 14:55:04 UTC)

Description Mike Hackett 2016-09-22 17:38:02 UTC
Created attachment 1203864
Network spike on dnvrco01-cephmon-001 during upgrade

Description of problem:
When upgrading a large Ceph cluster from 0.94.6 to 0.94.9, a serious performance issue is seen every time an OSD is restarted.

The monitors have already been upgraded and are running 0.94.9. Restarting the OSDs as part of the upgrade causes several minutes of network saturation on all three monitor nodes, which in turn produces thousands of slow requests.

Initially, the monitor logs were flooded with the following messages:

2016-09-14 15:51:12.174478 osd.405 24.161.248.95:6805/41332 329 : cluster [WRN] failed to encode map e727238 with expected crc
2016-09-14 15:51:12.174635 osd.220 24.161.248.119:6816/92203 301 : cluster [WRN] failed to encode map e727238 with expected crc
2016-09-14 15:51:12.178740 osd.872 24.161.248.104:6816/235917 55 : cluster [WRN] failed to encode map e727238 with expected crc

Since then, 'clog_to_monitors false' has been set and these messages no longer appear, but the network is still saturated whenever OSDs are restarted.

The above issue is discussed in the following community thread:
http://ceph-users.ceph.narkive.com/rPGrATpE/v0-94-7-hammer-released

It appears that, starting with 0.94.7, the osdmap encoding changed (which the developers did not expect). When this happens, all the 0.94.6 OSDs report the CRC problem back to the mons, but the newer 0.94.9 OSDs do not.

Ceph users mailing list discussion of the current issue:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013216.html

The current theory is that the down-level OSDs are continually pulling osdmaps from the upgraded mons.
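The "continually pulling osdmaps" theory can be turned into a rough back-of-envelope estimate of monitor egress. This is a minimal sketch, assuming each down-level OSD fetches one full osdmap per new epoch and that the load is spread evenly over the monitors; the OSD count, map size, epoch count and link speed are hypothetical inputs, not figures measured in this bug.

```python
# Hypothetical model: every down-level OSD that fails the CRC check fetches
# one full osdmap from the monitors for each new epoch. All inputs used
# below are illustrative assumptions, not measurements from this bug.

def full_map_traffic_bytes(old_osds: int, full_map_bytes: int, epochs: int) -> int:
    """Total bytes the monitors send if each old OSD fetches one full map per epoch."""
    return old_osds * full_map_bytes * epochs


def seconds_to_drain(total_bytes: int, link_bits_per_s: float, num_mons: int) -> float:
    """Time to push total_bytes when the load is split evenly across num_mons links."""
    per_mon_bytes = total_bytes / num_mons
    return per_mon_bytes * 8 / link_bits_per_s


if __name__ == "__main__":
    # Assumed: 900 down-level OSDs, a 1 MiB full map, 10 epochs during one restart
    total = full_map_traffic_bytes(old_osds=900, full_map_bytes=1 << 20, epochs=10)
    # Assumed: three monitors, each on a 1 Gbit/s link
    print(total, round(seconds_to_drain(total, 1e9, 3), 1))
```

Because the cost scales with the product of OSD count and epochs, a restart that generates a burst of new epochs in a cluster with hundreds of old OSDs can plausibly keep the monitor links busy for tens of seconds or more.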

- Opening a downstream Bugzilla, as it appears that an upgrade from 1.3.2 to RHCS 2.0 on large clusters may also be susceptible to this issue.

Version-Release number of selected component (if applicable):
1.3.2


Additional info:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013216.html
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg30783.html

Comment 3 Mike Hackett 2016-09-22 17:38:42 UTC
Created attachment 1203865
Same network spike with packets/sec also included.

Comment 7 Mike Hackett 2016-09-23 14:50:34 UTC
Upstream tracker 17386 updated with the following details:

    "It appears that starting with 0.94.7 that the osdmap encoding changed (which was unexpected by developers"

The CRC mismatch warning is expected:

pg_pool_t is a field in both OSDMap::Incremental and OSDMap itself. In 0.94.6, pg_pool_t is encoded with the v17 scheme, while in 0.94.9 it is encoded using v21. After the upgrade, the monitors encode the (incremental) osdmap using the new scheme, while OSDs still running 0.94.6 re-encode the full osdmap using v17 and then compare the CRC of the re-encoded full map with the CRC of the original full map encoded using v21. That is why the CRCs mismatch.
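The mismatch mechanism can be illustrated with a toy model: the same logical pool, encoded under two struct versions, produces different bytes and therefore a different CRC. This is a minimal sketch; the field names, byte layout and the extra v21 field are hypothetical and far simpler than Ceph's real pg_pool_t encoding. Only the version numbers (v17 vs v21) come from the comment above.

```python
import struct
import zlib

# Toy stand-in for pg_pool_t. Field names and layout are hypothetical.
POOL = {"size": 3, "min_size": 2, "pg_num": 4096, "extra_field": 0}


def encode_pool(pool: dict, version: int) -> bytes:
    # Header: struct version (u8), then fields as little-endian u32.
    fields = [pool["size"], pool["min_size"], pool["pg_num"]]
    if version >= 21:
        # Pretend v21 appends a field that the v17 encoder knows nothing about.
        fields.append(pool["extra_field"])
    return struct.pack("<B", version) + struct.pack(f"<{len(fields)}I", *fields)


def encode_full_map(epoch: int, pools: list, pool_version: int) -> bytes:
    body = struct.pack("<I", epoch)
    for p in pools:
        body += encode_pool(p, pool_version)
    return body


# Monitor (0.94.9 in this sketch) encodes with v21 and ships the CRC with the map.
mon_bytes = encode_full_map(727238, [POOL], pool_version=21)
expected_crc = zlib.crc32(mon_bytes)

# Old OSD (0.94.6) applies the map, then re-encodes with the only scheme it knows.
osd_bytes = encode_full_map(727238, [POOL], pool_version=17)
actual_crc = zlib.crc32(osd_bytes)

print("crc match:", expected_crc == actual_crc)
```

The decoded contents are identical on both sides; only the encoding differs, so the CRC check reports a problem that is not actual map corruption.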

In a large cluster, resending the full map can be a burden on the monitors and saturate the cluster network. Possible approaches:

    1. We already have the machinery to re-encode the osdmap for old clients, but we need to invoke it explicitly, i.e.:
        - add CEPH_FEATURE_RESERVED (a non-existent feature bit) to the feature bits;
        - encode the MOSDMap message in OSDMonitor::send_incremental() before handing it to the messenger, which will then just put the pre-encoded incremental and full maps into the payload buffer. (Downside: larger memory footprint.)
    2. Alternatively, add an option on the OSD side to disable the CRC check (or the full-map request upon CRC mismatch), so that it can be disabled at run time when this problem causes performance degradation. (Downside: yet another knob.)
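The effect of the second approach can be sketched as a small simulation. This assumes the default behaviour on a CRC mismatch is to request a full map from the monitors; the option name `skip_full_map_on_crc_mismatch` is hypothetical, since the comment does not name the knob.

```python
from dataclasses import dataclass

# Sketch of approach 2: an OSD-side switch that skips the full-map request
# when the re-encoded CRC does not match. The option name is hypothetical.


@dataclass
class OldOsd:
    skip_full_map_on_crc_mismatch: bool = False  # hypothetical run-time knob
    full_map_requests: int = 0

    def handle_incremental(self, crc_matches: bool) -> None:
        if crc_matches:
            return
        if self.skip_full_map_on_crc_mismatch:
            # Keep the locally applied map despite the mismatch.
            return
        # Default behaviour modelled here: ask the monitors for the full map.
        self.full_map_requests += 1


def simulate(num_osds: int, epochs: int, knob: bool) -> int:
    """Total full-map requests sent to the monitors over the given epochs."""
    osds = [OldOsd(skip_full_map_on_crc_mismatch=knob) for _ in range(num_osds)]
    for _ in range(epochs):
        for osd in osds:
            osd.handle_incremental(crc_matches=False)  # every old OSD mismatches
    return sum(o.full_map_requests for o in osds)


print(simulate(100, 5, knob=False), simulate(100, 5, knob=True))
```

The trade-off mirrors the one noted in the comment: the knob removes the monitor load at run time, at the cost of another option and of silencing a check that would otherwise catch real map corruption.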

Comment 28 Kefu Chai 2016-09-30 03:16:03 UTC
Please note: once the cluster upgrade is complete (after installing the monitor hotfix), users can either roll back to a monitor without the fix or keep the hotfix version. The fix only kicks in if the peer does not have the GMT_HITSET feature bit.
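The feature-bit gate can be sketched as follows. This assumes, in line with Comment 7, that the hotfix amounts to encoding maps in the v17 scheme for peers that lack the GMT_HITSET feature. The bit position used here is a placeholder, not Ceph's real value, and the function names are hypothetical.

```python
# Sketch of the feature-bit gate: the monitor only falls back to the old
# encoding when the peer lacks the GMT_HITSET feature bit.
FEATURE_GMT_HITSET = 1 << 20  # placeholder bit position, not Ceph's real value

NEW_POOL_VERSION = 21  # pg_pool_t encoding used by 0.94.9
OLD_POOL_VERSION = 17  # pg_pool_t encoding understood by 0.94.6


def has_feature(peer_features: int, bit: int) -> bool:
    return (peer_features & bit) == bit


def pool_encoding_version_for(peer_features: int) -> int:
    if has_feature(peer_features, FEATURE_GMT_HITSET):
        return NEW_POOL_VERSION  # upgraded peer: the fix does not kick in
    return OLD_POOL_VERSION  # down-level peer: encode the way it re-encodes


print(pool_encoding_version_for(0), pool_encoding_version_for(FEATURE_GMT_HITSET))
```

Because the gate depends only on the peer's advertised features, once every OSD has been upgraded no peer takes the fallback path, which is consistent with the note that the hotfix monitor can then be kept or rolled back.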

