Bug 1563621

Summary: [RADOS]:- Snapset inconsistency is hard to diagnose because authoritative copy used by list-inconsistent-snapset not shown
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Parikshith <pbyregow>
Component: RADOS Assignee: Josh Durgin <jdurgin>
Status: CLOSED CURRENTRELEASE QA Contact: Manohar Murthy <mmurthy>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.0 CC: ceph-eng-bugs, dzafman, kchai, nojha
Target Milestone: rc   
Target Release: 4.*   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-29 22:04:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Parikshith 2018-04-04 10:30:34 UTC
Description of problem:
Opening bz as per https://bugzilla.redhat.com/show_bug.cgi?id=1544680#c13


$ sudo rados list-inconsistent-snapset 3.7f
{"epoch":79,"inconsistents":[]}

$ sudo rados list-inconsistent-obj 3.7f --format=json-pretty
{
    "epoch": 79,
    "inconsistents": [
        {
            "object": {
                "name": "obj1",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 13
            },
            "errors": [
                "snapset_inconsistency" 
            ],
            "union_shard_errors": [],
            "selected_object_info": "3:ff7b1f36:::obj1:head(73'13 client.4471.0:1 dirty|data_digest|omap_digest s 1682 uv 13 dd 735b0743 od ffffffff alloc_hint [0 0 0])",
            "shards": [
                {
                    "osd": 1,
                    "primary": false,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "0=[]:[]+stray_clone_snaps={1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                },
                {
                    "osd": 6,
                    "primary": true,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                },
                {
                    "osd": 8,
                    "primary": false,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                }
            ]
        }
    ]
}
For now, the user has to increase the debug_osd log level and examine the OSD logs to find the authoritative copy selected for a specific object. With two or more different snapsets, we could either show the snapshot results computed from each snapset for comparison, which is more complex, or, more simply, indicate which shard is the authoritative copy. The existing code in PG::scrub_compare_maps() doesn't pass enough information to PrimaryLogPG::scrub_snapshot_metadata() for it to see both snapset variants or to know which shard it is using.
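
Until the tool reports the authoritative copy, a small script can at least show which shards disagree. The following is a minimal sketch, not part of Ceph: it assumes the JSON layout shown in the list-inconsistent-obj output above, and the helper name group_shards_by_snapset is hypothetical. It only groups OSDs by their reported snapset string; it does not reproduce the scrub code's choice of authoritative shard.

```python
import json
from collections import defaultdict


def group_shards_by_snapset(report):
    """Return {object name: {snapset string: [osd ids]}} for objects
    flagged with snapset_inconsistency.

    Hypothetical helper: assumes the JSON layout of the
    `rados list-inconsistent-obj` output quoted in this report.
    """
    result = {}
    for item in report.get("inconsistents", []):
        if "snapset_inconsistency" not in item.get("errors", []):
            continue
        groups = defaultdict(list)
        for shard in item.get("shards", []):
            # Shards without a snapset field are grouped under a placeholder.
            groups[shard.get("snapset", "<missing>")].append(shard["osd"])
        result[item["object"]["name"]] = dict(groups)
    return result


# Trimmed copy of the output above (only the fields the helper reads).
sample = json.loads("""
{
  "epoch": 79,
  "inconsistents": [
    {
      "object": {"name": "obj1", "snap": "head", "version": 13},
      "errors": ["snapset_inconsistency"],
      "shards": [
        {"osd": 1, "primary": false,
         "snapset": "0=[]:[]+stray_clone_snaps={1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"},
        {"osd": 6, "primary": true,
         "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"},
        {"osd": 8, "primary": false,
         "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"}
      ]
    }
  ]
}
""")

for name, groups in group_shards_by_snapset(sample).items():
    for snapset, osds in groups.items():
        print(name, "osds", osds, "snapset", snapset)
```

For the output in this report, the grouping places osd 1 alone with the stray_clone_snaps snapset, while osds 6 and 8 share the other one. Which of the two the scrub treated as authoritative still has to be confirmed from the OSD logs.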

Additional info:

Comment 3 Giridhar Ramaraju 2019-08-05 13:08:43 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 4 Giridhar Ramaraju 2019-08-05 13:10:08 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri