Bug 1563621 - [RADOS]:- Snapset inconsistency is hard to diagnose because authoritative copy used by list-inconsistent-snapset not shown
Summary: [RADOS]:- Snapset inconsistency is hard to diagnose because authoritative cop...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 4.*
Assignee: Josh Durgin
QA Contact: Manohar Murthy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-04 10:30 UTC by Parikshith
Modified: 2020-04-29 22:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-29 22:04:15 UTC
Target Upstream Version:



Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 23428 0 None None None 2018-04-04 10:30:33 UTC
Github ceph ceph pull 20947 0 None closed Special scrub handling of hinfo_key errors 2021-02-01 05:34:36 UTC

Description Parikshith 2018-04-04 10:30:34 UTC
Description of problem:
Opening bz as per https://bugzilla.redhat.com/show_bug.cgi?id=1544680#c13


$ sudo rados list-inconsistent-snapset 3.7f
{"epoch":79,"inconsistents":[]}

$ sudo rados list-inconsistent-obj 3.7f --format=json-pretty
{
    "epoch": 79,
    "inconsistents": [
        {
            "object": {
                "name": "obj1",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 13
            },
            "errors": [
                "snapset_inconsistency" 
            ],
            "union_shard_errors": [],
            "selected_object_info": "3:ff7b1f36:::obj1:head(73'13 client.4471.0:1 dirty|data_digest|omap_digest s 1682 uv 13 dd 735b0743 od ffffffff alloc_hint [0 0 0])",
            "shards": [
                {
                    "osd": 1,
                    "primary": false,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "0=[]:[]+stray_clone_snaps={1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                },
                {
                    "osd": 6,
                    "primary": true,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                },
                {
                    "osd": 8,
                    "primary": false,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                }
            ]
        }
    ]
}
For now, the user has to raise the debug_osd log level and examine the OSD logs to find which copy was selected as authoritative for a specific object. With two or more different snapsets, we could show the snapshot results computed from each snapset for comparison, although that adds complexity; it would be easier to simply indicate which copy is authoritative. The existing code in PG::scrub_compare_maps() doesn't pass enough information to PrimaryLogPG::scrub_snapshot_metadata() for it to see both snapset variants or to know which shard it is using.
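
Until the tool reports the authoritative copy itself, a rough workaround is to post-process the list-inconsistent-obj JSON and group the shards by their snapset string. The sketch below is only an illustration: the helper names group_shards_by_snapset and minority_osds are hypothetical, the sample data is trimmed from the output above, and treating the most common snapset as the "expected" one is an assumption. It does not necessarily match the copy the OSD actually selected as authoritative.

```python
import json
from collections import defaultdict


def group_shards_by_snapset(report):
    """Map object name -> {snapset string: [osd ids]} for objects
    flagged with snapset_inconsistency (hypothetical helper)."""
    result = {}
    for item in report.get("inconsistents", []):
        if "snapset_inconsistency" not in item.get("errors", []):
            continue
        groups = defaultdict(list)
        for shard in item.get("shards", []):
            groups[shard.get("snapset")].append(shard["osd"])
        result[item["object"]["name"]] = dict(groups)
    return result


def minority_osds(groups):
    """Return the OSD ids whose snapset differs from the most common one.
    Assumption: majority vote is only a heuristic, not Ceph's own choice;
    on a tie the first snapset seen is treated as the majority."""
    majority = max(groups, key=lambda s: len(groups[s]))
    return sorted(
        osd for s, osds in groups.items() if s != majority for osd in osds
    )


# Sample trimmed from the list-inconsistent-obj output in this report.
sample = json.loads("""
{
  "epoch": 79,
  "inconsistents": [
    {
      "object": {"name": "obj1", "nspace": "", "locator": "",
                 "snap": "head", "version": 13},
      "errors": ["snapset_inconsistency"],
      "shards": [
        {"osd": 1, "primary": false,
         "snapset": "0=[]:[]+stray_clone_snaps={1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"},
        {"osd": 6, "primary": true,
         "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"},
        {"osd": 8, "primary": false,
         "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"}
      ]
    }
  ]
}
""")

groups = group_shards_by_snapset(sample)
for name, by_snapset in groups.items():
    print(name, "differing OSDs:", minority_osds(by_snapset))
```

To use this against a live cluster, one would save the output of `rados list-inconsistent-obj 3.7f --format=json-pretty` to a file and load it with json.load() in place of the embedded sample.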

Additional info:

Comment 3 Giridhar Ramaraju 2019-08-05 13:08:43 UTC
Updating the QA Contact to Hemant. Hemant will reroute these bugs to the appropriate QE Associate.

Regards,
Giri

Comment 4 Giridhar Ramaraju 2019-08-05 13:10:08 UTC
Updating the QA Contact to Hemant. Hemant will reroute these bugs to the appropriate QE Associate.

Regards,
Giri

