1889668 – [RADOS]: Getting ambiguity messages by running enable_stretch_mode

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RADOS
Sub Component:
Version:	4.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.2
Assignee:	Greg Farnum
QA Contact:	Manohar Murthy
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-10-20 10:59 UTC by skanta
Modified:	2021-01-12 14:58 UTC
CC List:	13 users
Fixed In Version:	ceph-14.2.11-76.el8cp, ceph-14.2.11-76.el7cp
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-01-12 14:58:00 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2021:0081	0	None	None	None	2021-01-12 14:58:46 UTC

Description skanta 2020-10-20 10:59:39 UTC

Description of problem: Running the enable_stretch_mode command twice with the same arguments returns two different messages.

[root@ceph-bharath-1603163288685-node1-monmgrinstaller cephuser]# /bin/ceph mon enable_stretch_mode ceph-bharath-1603163288685-node6-mon stretch_rule datacenter

Error EINVAL: the 2 datacenterinstances in the cluster have differing weights 5372 and 4062 but stretch mode currently requires they be the same!

[root@ceph-bharath-1603163288685-node1-monmgrinstaller cephuser]# /bin/ceph mon enable_stretch_mode ceph-bharath-1603163288685-node6-mon stretch_rule datacenter

stretch mode currently committing

The first run of the command returns the error message; the second run, with identical arguments, reports that stretch mode is currently committing.
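
The first error says the two datacenter buckets carry different CRUSH weights. A minimal Python sketch of that kind of pre-check is given here for reference. It assumes a node list shaped like the `nodes` array printed by `ceph osd tree -f json` (bucket entries listing `children` ids, OSD entries carrying `crush_weight`), and it assumes stretch mode expects exactly two buckets of the dividing type, as the error text suggests. The function names `bucket_weights` and `weights_match` are hypothetical; this is not the monitor's actual check.

```python
from typing import Dict, List


def bucket_weights(nodes: List[dict], bucket_type: str) -> Dict[str, float]:
    """Sum the CRUSH weights of all OSDs beneath each bucket of the given type."""
    by_id = {node["id"]: node for node in nodes}

    def subtree_weight(node: dict) -> float:
        # Assumption: OSD nodes carry "crush_weight"; buckets list child ids.
        if node.get("type") == "osd":
            return float(node.get("crush_weight", 0.0))
        return sum(subtree_weight(by_id[child]) for child in node.get("children", []))

    return {
        node["name"]: subtree_weight(node)
        for node in nodes
        if node.get("type") == bucket_type
    }


def weights_match(weights: Dict[str, float], tolerance: float = 1e-6) -> bool:
    """Return True only when exactly two buckets exist and their weights agree."""
    if len(weights) != 2:
        return False
    first, second = weights.values()
    return abs(first - second) <= tolerance
```

When one site holds more OSD weight than the other, `weights_match` returns False, which corresponds to the condition reported by the EINVAL message above.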



Version-Release number of selected component (if applicable):
[root@ceph-bharath-1603163288685-node1-monmgrinstaller cephuser]# ceph versions
{
    "mon": {
        "ceph version 14.2.11-55.el8cp (a88999020b8767a4c384efbc8f9c061e95e78051) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.11-55.el8cp (a88999020b8767a4c384efbc8f9c061e95e78051) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.11-55.el8cp (a88999020b8767a4c384efbc8f9c061e95e78051) nautilus (stable)": 28
    },
    "mds": {},
    "overall": {
        "ceph version 14.2.11-55.el8cp (a88999020b8767a4c384efbc8f9c061e95e78051) nautilus (stable)": 32
    }
}



How reproducible:


Steps to Reproduce:

/bin/ceph config set osd osd_crush_update_on_start false
/bin/ceph osd crush move osd.0 host=host1-1 datacenter=site1
/bin/ceph osd crush move osd.1 host=host1-2 datacenter=site1
/bin/ceph osd crush move osd.2 host=host2-1 datacenter=site2
/bin/ceph osd crush move osd.3 host=host2-2 datacenter=site2
/bin/ceph osd getcrushmap > crush.map.bin
/bin/crushtool -d crush.map.bin -o crush.map.txt
cat <<EOF >> crush.map.txt
rule stretch_rule {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take site1
        step chooseleaf firstn 2 type host
        step emit
        step take site2
        step chooseleaf firstn 2 type host
        step emit
}
EOF


/bin/crushtool -c crush.map.txt -o crush2.map.bin
/bin/ceph osd setcrushmap -i crush2.map.bin
/bin/ceph mon set election_strategy connectivity
/bin/ceph mon set_location a datacenter=site1
/bin/ceph mon set_location ceph-bharath-1601914071234-node6-mon datacenter=site1
/bin/ceph mon set_location ceph-bharath-1601914071234-node1-monmgrinstaller datacenter=site2
/bin/ceph mon set_location ceph-bharath-1601914071234-node2-mon datacenter=site3
/bin/ceph osd pool create test_stretch1 8 8 replicated

/bin/ceph mon enable_stretch_mode ceph-bharath-1601914071234-node2-mon stretch_rule datacenter
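
The steps above place the monitors in site1 and site2 and the tiebreaker in site3. A small Python sketch of a sanity check on that layout is given here for reference. It assumes the tiebreaker must sit in its own datacenter and the remaining monitors must span exactly two datacenters; that reading comes from the layout in these steps, not from a confirmed rule in the monitor code. `check_mon_locations` and its input mapping are hypothetical.

```python
from typing import Dict, List


def check_mon_locations(locations: Dict[str, str], tiebreaker: str) -> List[str]:
    """Return a list of layout problems; an empty list means the layout looks sane.

    `locations` maps monitor name -> datacenter name (hypothetical input,
    e.g. collected by hand from the set_location commands).
    """
    if tiebreaker not in locations:
        return [f"tiebreaker {tiebreaker!r} has no location set"]

    problems: List[str] = []
    tiebreaker_site = locations[tiebreaker]
    # Datacenters used by every monitor other than the tiebreaker.
    data_sites = {site for mon, site in locations.items() if mon != tiebreaker}

    if tiebreaker_site in data_sites:
        problems.append(
            f"tiebreaker {tiebreaker!r} shares datacenter {tiebreaker_site!r} "
            "with another monitor"
        )
    if len(data_sites - {tiebreaker_site}) != 2:
        problems.append(
            f"expected 2 data sites besides the tiebreaker, found {sorted(data_sites)}"
        )
    return problems
```

Feeding it the four locations set in the steps, with the node2 monitor as tiebreaker, yields an empty list.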

Actual results:
1. The first run returns an error message (EINVAL).
2. The second run reports that the cluster is committing into stretch mode.

Expected results:

The command should return a consistent, accurate message on every run.

Additional info:

Comment 1 Greg Farnum 2020-10-23 16:32:11 UTC

Ah yep, this code was erroneously committing changes when it was meant to be testing validity, so things ended up in a weird half-state. Patch in progress upstream.
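
For illustration only, the pattern described in comment 1 can be modelled with a toy Python sketch. It is not Ceph code and not the upstream patch; `ClusterState`, `validate`, `enable_buggy`, and `enable_fixed` are hypothetical names chosen to show how committing state before validation finishes could produce the two different messages in the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterState:
    """Toy stand-in for monitor state; not Ceph's real data structures."""
    stretch_mode_pending: bool = False


def validate(weights: List[int]) -> Optional[str]:
    """Pure check: returns an error string or None and never touches state."""
    if len(weights) != 2:
        return f"expected 2 buckets, found {len(weights)}"
    if weights[0] != weights[1]:
        return f"differing weights {weights[0]} and {weights[1]}"
    return None


def enable_buggy(state: ClusterState, weights: List[int]) -> str:
    if state.stretch_mode_pending:
        return "stretch mode currently committing"
    # Bug pattern: the change is recorded before validation has passed.
    state.stretch_mode_pending = True
    error = validate(weights)
    if error:
        return f"Error EINVAL: {error}"
    return "ok"


def enable_fixed(state: ClusterState, weights: List[int]) -> str:
    if state.stretch_mode_pending:
        return "stretch mode currently committing"
    # Validate first; mutate state only once every check has passed.
    error = validate(weights)
    if error:
        return f"Error EINVAL: {error}"
    state.stretch_mode_pending = True
    return "ok"
```

Calling `enable_buggy` twice with unequal weights returns the EINVAL text and then "stretch mode currently committing", mirroring the report; `enable_fixed` returns the same EINVAL text both times and leaves the state untouched.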

Comment 2 Yaniv Kaul 2020-11-05 08:46:19 UTC

(In reply to Greg Farnum from comment #1)
> Ah yep, this code was erroneously committing changes when it was meant to be
> testing validity, so things ended up in a weird half-state. Patch in
> progress upstream.

Any updates? Can you post a link to the upstream patch?

Comment 3 Greg Farnum 2020-11-15 23:29:27 UTC

Fixed in ceph-4.2-rhel-patches branch.

Comment 9 errata-xmlrpc 2021-01-12 14:58:00 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0081
