Description of problem: Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. I created an ecpool of count 3+2 2. created few objects and took 3 snapshots . After each snapshot i wrote some data 3. i picked one of the shard and corrupted user.ceph.snapset xattr 4. after running the scrub on primary , status is reporting an inconsistent object but list-inconsistent-obj doesn't tell which shard is actually having the problem. Additional info: [root@magna105 ~]# rados list-inconsistent-obj 58.0 [{"object":{"name":"obj988","nspace":"","locator":"","snap":"head"},"missing":false,"stat_err":false,"read_err":false,"data_digest_mismatch":false, "omap_digest_mismatch":false,"size_mismatch":false,"attr_mismatch":true,"shards":[{"osd":0,"missing":false,"read_error":false,"data_digest_mismatch":false,"omap_digest_mismatch":false,"size_mismatch":false,"data_digest_mismatch_oi":false,"omap_digest_mismatch_oi":false,"size_mismatch_oi":false,"size":1376,"attrs":{"attr":{"name":"_","value":"DwjrAAAABAMnAAAAAAAAAAYAAABvYmo5ODj+\/\/\/\/\/\/\/\/\/yLj0iEAAAAAADoAAAAAAAAABgMcAAAAOgAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABwEwAAAAAAAFVIAQCgCwAAAAAAAFRIAQACAhUAAAAI\/rUGAAAAAAABAAAAAAAAAAAAAACCAgAAAAAAAP7lnldiKDcpAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAcBMAAAAAAAAAAAAAAAAAAAA0AAAA\/uWeV3RxlikMgVqc\/\/\/\/\/w=="},"attr":{"name":"hinfo_key","value":"AQEgAAAAYAUAAAAAAAAFAAAAxBQDO+gZjinoGY4pxBQDO8QUAzs="},"attr":{"name":"snapset","value":"AgJxAAAAAgAAAAAAAAABAgAAAAIAAAAAAAAAAQAAAAAAAAACAAAAAQAAAAAAAAACAAAAAAAAAAIAAAABAAAAAAAAAAAAAAACAAAAAAAAAAAAAAACAAAAAQAAAAAAAACCAgAAAAAAAAIAAAAAAAAAggIAAAAAAAA="}}},{"osd":2,"missing":false,"read_error":false,"data_digest_mismatch":false,"omap_digest_mismatch":false,"size_mismatch":false,"data_digest_mismatch_oi":false,"omap_digest_mismatch_oi":false,"size_mismatch_oi":false,"size":1376,"attrs":{"attr":{"name":"_","value":"DwjrAAAABAMnAAAAAAAAAAYAAABvYmo5ODj+\/\/\/\/\/\/\/\/
\/yLj0iEAAAAAADoAAAAAAAAABgMcAAAAOgAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABwEwAAAAAAAFVIAQCgCwAAAAAAAFRIAQACAhUAAAAI\/rUGAAAAAAABAAAAAAAAAAAAAACCAgAAAAAAAP7lnldiKDcpAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAcBMAAAAAAAAAAAAAAAAAAAA0AAAA\/uWeV3RxlikMgVqc\/\/\/\/\/w=="},"attr":{"name":"hinfo_key","value":"AQEgAAAAYAUAAAAAAAAFAAAAxBQDO+gZjinoGY4pxBQDO8QUAzs="},"attr":{"name":"snapset","value":"AgJxAAAAAgAAAAAAAAABAgAAAAIAAAAAAAAAAQAAAAAAAAACAAAAAQAAAAAAAAACAAAAAAAAAAIAAAABAAAAAAAAAAAAAAACAAAAAAAAAAAAAAACAAAAAQAAAAAAAACCAgAAAAAAAAIAAAAAAAAAggIAAAAAAAA="}}},{"osd":3,"missing":false,"read_error":false,"data_digest_mismatch":false,"omap_digest_mismatch":false,"size_mismatch":false,"data_digest_mismatch_oi":false,"omap_digest_mismatch_oi":false,"size_mismatch_oi":false,"size":1376,"attrs":{"attr":{"name":"_","value":"DwjrAAAABAMnAAAAAAAAAAYAAABvYmo5ODj+\/\/\/\/\/\/\/\/\/yLj0iEAAAAAADoAAAAAAAAABgMcAAAAOgAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABwEwAAAAAAAFVIAQCgCwAAAAAAAFRIAQACAhUAAAAI\/rUGAAAAAAABAAAAAAAAAAAAAACCAgAAAAAAAP7lnldiKDcpAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAcBMAAAAAAAAAAAAAAAAAAAA0AAAA\/uWeV3RxlikMgVqc\/\/\/\/\/w=="},"attr":{"name":"hinfo_key","value":"AQEgAAAAYAUAAAAAAAAFAAAAxBQDO+gZjinoGY4pxBQDO8QUAzs="},"attr":{"name":"snapset","value":"AgJxAAAAAgAAAAAAAAABAgAAAAIAAAAAAAAAAQAAAAAAAAACAAAAAQAAAAAAAAACAAAAAAAAAAIAAAABAAAAAAAAAAAAAAACAAAAAAAAAAAAAAACAAAAAQAAAAAAAACCAgAAAAAAAAIAAAAAAAAAggIAAAAAAAA="}}},{"osd":4,"missing":false,"read_error":false,"data_digest_mismatch":false,"omap_digest_mismatch":false,"size_mismatch":false,"data_digest_mismatch_oi":false,"omap_digest_mismatch_oi":false,"size_mismatch_oi":false,"size":1376,"attrs":{"attr":{"name":"_","value":"DwjrAAAABAMnAAAAAAAAAAYAAABvYmo5ODj+\/\/\/\/\/\/\/\/\/yLj0iEAAAAAADoAAAAAAAAABgMcAAAAOgAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABwEwAAAAAAAFVIAQCgCwAAAAAAAFRIAQACAhUAAAAI\/rUGAAAAAAABAAAAAAAAAAAAAACCAgAAAAAAAP7lnldiKDc
pAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAcBMAAAAAAAAAAAAAAAAAAAA0AAAA\/uWeV3RxlikMgVqc\/\/\/\/\/w=="},"attr":{"name":"hinfo_key","value":"AQEgAAAAYAUAAAAAAAAFAAAAxBQDO+gZjinoGY4pxBQDO8QUAzs="},"attr":{"name":"snapset","value":"AgJxAAAAAgAAAAAAAAABAgAAAAIAAAAAAAAAAQAAAAAAAAACAAAAAQAAAAAAAAACAAAAAAAAAAIAAAABAAAAAAAAAAAAAAACAAAAAAAAAAAAAAACAAAAAQAAAAAAAACCAgAAAAAAAAIAAAAAAAAAggIAAAAAAAA="}}},{"osd":7,"missing":false,"read_error":false,"data_digest_mismatch":false,"omap_digest_mismatch":false,"size_mismatch":false,"data_digest_mismatch_oi":false,"omap_digest_mismatch_oi":false,"size_mismatch_oi":false,"size":1376,"attrs":{"attr":{"name":"_","value":"DwjrAAAABAMnAAAAAAAAAAYAAABvYmo5ODj+\/\/\/\/\/\/\/\/\/yLj0iEAAAAAADoAAAAAAAAABgMcAAAAOgAAAAAAAAD\/\/\/\/\/AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/wAAAABwEwAAAAAAAFVIAQCgCwAAAAAAAFRIAQACAhUAAAAI\/rUGAAAAAAABAAAAAAAAAAAAAACCAgAAAAAAAP7lnldiKDcpAgIVAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAcBMAAAAAAAAAAAAAAAAAAAA0AAAA\/uWeV3RxlikMgVqc\/\/\/\/\/w=="},"attr":{"name":"hinfo_key","value":"AQEgAAAAYAUAAAAAAAAFAAAAxBQDO+gZjinoGY4pxBQDO8QUAzs="},"attr":{"name":"snapset","value":"SEVMTE8="}}}]}] From the above output its not possible to tell which shard is inconsistent. However user can go through value of each attr for each shard and can find it out which is cumbersome. It would be nice if list-inconsistent-obj itself can highlight the defective shard.
David's working on this as we speak for Kraken.
The current in-progress code for Kraken will report an "attr_value_mismatch" for this error and will indicate it on a particular shard. I can see by a quick visual inspection that the snapset on osd 7 doesn't look like the others. The scrub code picks an authoritative copy based on the integrity of certain fields. In the reporter's example, if osd 7 were selected as the authoritative copy, the result would mark all the other OSDs with the "attr_value_mismatch" error.
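To illustrate the point about the authoritative copy, here is a rough Python sketch, not the actual scrub code. The function name mark_mismatches and the GOOD_SNAPSET placeholder are made up for illustration; only the osd numbers, the osd 7 snapset value, and the "attr_value_mismatch" label come from this report. It only shows that the set of flagged shards depends on which copy is taken as the reference.

```python
import base64

# Stand-in for the long snapset value shared by osds 0, 2, 3 and 4
# (hypothetical placeholder, not the real bytes from the report).
GOOD_SNAPSET = "placeholder-for-the-common-snapset-value"

def mark_mismatches(shard_attrs, authoritative_osd):
    """Flag every shard whose attrs differ from the chosen reference shard.

    Simplified illustration only: the real scrub code picks the
    authoritative copy based on the integrity of certain fields.
    """
    reference = shard_attrs[authoritative_osd]
    return {
        osd: (["attr_value_mismatch"] if attrs != reference else [])
        for osd, attrs in shard_attrs.items()
    }

shards = {osd: {"snapset": GOOD_SNAPSET} for osd in (0, 2, 3, 4)}
shards[7] = {"snapset": "SEVMTE8="}  # value reported for osd 7

# The odd snapset on osd 7 is just base64 for a short test string.
print(base64.b64decode("SEVMTE8="))  # b'HELLO'

print(mark_mismatches(shards, 0))  # only osd 7 is flagged
print(mark_mismatches(shards, 7))  # osds 0, 2, 3 and 4 are flagged instead
```

With osd 0 as the reference only osd 7 carries the error; with osd 7 as the reference the four good shards carry it, which is the case described above.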
This was backported after all (https://github.com/ceph/ceph/pull/13146). I changed the target release to 2.3.
[root@banshee 1.38s3_head]# rados list-inconsistent-obj 1.38 --format=json-pretty { "epoch": 163, "inconsistents": [ { "object": { "name": "benchmark_data_aircobra.lab.eng.blr.redhat.c_79300_object79", "nspace": "", "locator": "", "snap": "head", "version": 2 }, "errors": [ "attr_value_mismatch" ], "union_shard_errors": [], "selected_object_info": "1:1d3dd9ca:::benchmark_data_aircobra.lab.eng.blr.redhat.c_79300_object79:head(165'2 client.4724.0:80 dirty|data_digest|omap_digest s 4198176 uv 2 dd 52fd006a od ffffffff)", "shards": [ { "osd": 1, "shard": 0, "errors": [], "size": 1399392, "attrs": [ { "name": "_", "value": "DwggAQAABANcAAAAAAAAADsAAABiZW5jaG1hcmtfZGF0YV9haXJjb2JyYS5sYWIuZW5nLmJsci5yZWRoYXQuY183OTMwMF9vYmplY3Q3Of7\/\/\/\/\/\/\/\/\/uLybUwAAAAAAAQAAAAAAAAAGAxwAAAABAAAAAAAAAP\/\/\/\/8AAAAAAAAAAP\/\/\/\/\/\/\/\/\/\/AAAAAAIAAAAAAAAApQAAAAAAAAAAAAAAAAAAAAICFQAAAAh0EgAAAAAAAFAAAAAAAAAAAAAAACAPQAAAAAAAaB0lWbJfPAICAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAADQAAABoHSVZfJ\/TAmoA\/VL\/\/\/\/\/", "Base64": true }, { "name": "hinfo_key", "value": "AQEgAAAAYFoVAAAAAAAFAAAAr+GE2KhJYfGoSWHxr+GE2Kw7MEA=", "Base64": true }, { "name": "snapset", "value": "AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==", "Base64": true } ] }, { "osd": 2, "shard": 2, "errors": [], "size": 1399392, "attrs": [ { "name": "_", "value": "DwggAQAABANcAAAAAAAAADsAAABiZW5jaG1hcmtfZGF0YV9haXJjb2JyYS5sYWIuZW5nLmJsci5yZWRoYXQuY183OTMwMF9vYmplY3Q3Of7\/\/\/\/\/\/\/\/\/uLybUwAAAAAAAQAAAAAAAAAGAxwAAAABAAAAAAAAAP\/\/\/\/8AAAAAAAAAAP\/\/\/\/\/\/\/\/\/\/AAAAAAIAAAAAAAAApQAAAAAAAAAAAAAAAAAAAAICFQAAAAh0EgAAAAAAAFAAAAAAAAAAAAAAACAPQAAAAAAAaB0lWbJfPAICAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAADQAAABoHSVZfJ\/TAmoA\/VL\/\/\/\/\/", "Base64": true }, { "name": "hinfo_key", "value": "AQEgAAAAYFoVAAAAAAAFAAAAr+GE2KhJYfGoSWHxr+GE2Kw7MEA=", "Base64": true }, { "name": "snapset", "value": "AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==", 
"Base64": true } ] }, { "osd": 3, "shard": 4, "errors": [], "size": 1399392, "attrs": [ { "name": "_", "value": "DwggAQAABANcAAAAAAAAADsAAABiZW5jaG1hcmtfZGF0YV9haXJjb2JyYS5sYWIuZW5nLmJsci5yZWRoYXQuY183OTMwMF9vYmplY3Q3Of7\/\/\/\/\/\/\/\/\/uLybUwAAAAAAAQAAAAAAAAAGAxwAAAABAAAAAAAAAP\/\/\/\/8AAAAAAAAAAP\/\/\/\/\/\/\/\/\/\/AAAAAAIAAAAAAAAApQAAAAAAAAAAAAAAAAAAAAICFQAAAAh0EgAAAAAAAFAAAAAAAAAAAAAAACAPQAAAAAAAaB0lWbJfPAICAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAADQAAABoHSVZfJ\/TAmoA\/VL\/\/\/\/\/", "Base64": true }, { "name": "hinfo_key", "value": "AQEgAAAAYFoVAAAAAAAFAAAAr+GE2KhJYfGoSWHxr+GE2Kw7MEA=", "Base64": true }, { "name": "snapset", "value": "AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==", "Base64": true } ] }, { "osd": 4, "shard": 3, "errors": [], "size": 1399392, "attrs": [ { "name": "_", "value": "DwggAQAABANcAAAAAAAAADsAAABiZW5jaG1hcmtfZGF0YV9haXJjb2JyYS5sYWIuZW5nLmJsci5yZWRoYXQuY183OTMwMF9vYmplY3Q3Of7\/\/\/\/\/\/\/\/\/uLybUwAAAAAAAQAAAAAAAAAGAxwAAAABAAAAAAAAAP\/\/\/\/8AAAAAAAAAAP\/\/\/\/\/\/\/\/\/\/AAAAAAIAAAAAAAAApQAAAAAAAAAAAAAAAAAAAAICFQAAAAh0EgAAAAAAAFAAAAAAAAAAAAAAACAPQAAAAAAAaB0lWbJfPAICAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAADQAAABoHSVZfJ\/TAmoA\/VL\/\/\/\/\/", "Base64": true }, { "name": "hinfo_key", "value": "AQEgAAAAYFoVAAAAAAAFAAAAr+GE2KhJYfGoSWHxr+GE2Kw7MEA=", "Base64": true }, { "name": "snapset", "value": "0000000xxxx000000000", "Base64": false } ] }, { "osd": 8, "shard": 1, "errors": [], "size": 1399392, "attrs": [ { "name": "_", "value": "DwggAQAABANcAAAAAAAAADsAAABiZW5jaG1hcmtfZGF0YV9haXJjb2JyYS5sYWIuZW5nLmJsci5yZWRoYXQuY183OTMwMF9vYmplY3Q3Of7\/\/\/\/\/\/\/\/\/uLybUwAAAAAAAQAAAAAAAAAGAxwAAAABAAAAAAAAAP\/\/\/\/8AAAAAAAAAAP\/\/\/\/\/\/\/\/\/\/AAAAAAIAAAAAAAAApQAAAAAAAAAAAAAAAAAAAAICFQAAAAh0EgAAAAAAAFAAAAAAAAAAAAAAACAPQAAAAAAAaB0lWbJfPAICAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAADQAAABoHSVZfJ\/TAmoA\/VL\/\/\/\/\/", 
"Base64": true }, { "name": "hinfo_key", "value": "AQEgAAAAYFoVAAAAAAAFAAAAr+GE2KhJYfGoSWHxr+GE2Kw7MEA=", "Base64": true }, { "name": "snapset", "value": "AgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==", "Base64": true } ] } ] } ] } I am verifying this issue. Errors section in individual shards/osds does't display any error. Am I missing something?
Discussed at the program meeting; need feedback from development today. Gregory to follow up.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1497