Description of problem (please be as detailed as possible and provide log snippets):
Unable to clear the crash list after killing the OSD daemon.

Version of all relevant components (if applicable):
OCP version: 4.8.0-0.nightly-2021-06-24-222938
OCS version: ocs-operator.v4.7.2-429.ci
Provider: VMware
Ceph version:
sh-4.4# ceph versions
{
    "mon": {
        "ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 2
    },
    "rgw": {
        "ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 10
    }
}

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install OCS 4.7.1

2. Get the ceph-osd PID:
[root@compute-0 /]# pidof ceph-osd
811668

3. Kill the process (SIGSEGV):
[root@compute-0 /]# kill -11 811668

4. Check the ceph crash list:
sh-4.4# ceph crash ls
ID                                                               ENTITY NEW
2021-06-28_07:03:19.750123Z_1e956973-8358-4e91-b676-64b15718a7da osd.2

5. Upgrade OCS 4.7.1 to OCS 4.7.2

6. Clear the crash list:
sh-4.4# ceph crash archive-all

7. Check the ceph crash list [the item was not deleted, fail]:
sh-4.4# ceph crash ls
ID                                                               ENTITY NEW
2021-06-28_07:03:19.750123Z_1e956973-8358-4e91-b676-64b15718a7da osd.2

Actual results:
The osd.2 item is not deleted after clearing the crash list ("ceph crash archive-all").

Expected results:
The osd.2 item is deleted after clearing the crash list ("ceph crash archive-all").

Additional info:
logs: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-1976980/
(In reply to Oded from comment #0)
> Expected results:
> osd.2 item deleted after ceph crash list clearing ("ceph crash archive-all")

I don't think this is the expected result. Have you seen it work this way in any other version/tests?

"ceph crash archive-all" will remove the health check associated with the crash, but "ceph crash ls" will still list it. You can use "ceph crash ls-new" if you don't wish to see archived crashes. Alternatively, you can use "ceph crash rm" to remove the crash.
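To illustrate the distinction described in the previous comment, here is a hedged sketch of the relevant commands (this requires ceph CLI access to a live cluster, e.g. from the Rook-Ceph toolbox pod, so it is a walkthrough rather than something runnable in isolation; the crash ID is the one from this report):

```shell
# Archiving silences the CephHealth warning but keeps the crash record:
ceph crash archive-all

# Archived crashes are still listed here -- this is expected behavior:
ceph crash ls

# Only unarchived ("new") crashes appear here; after archive-all this is empty:
ceph crash ls-new

# To remove a crash record entirely, delete it by ID
# (substitute the ID shown by "ceph crash ls" in your cluster):
ceph crash rm 2021-06-28_07:03:19.750123Z_1e956973-8358-4e91-b676-64b15718a7da
```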
This doesn't look like an issue; keeping it open until we get a reply to the question asked in the previous comment. Moving it out of 4.8.
Agree with Neha on https://bugzilla.redhat.com/show_bug.cgi?id=1976980#c4.

Expected results:
osd.2 item deleted after ceph crash list clearing ("ceph crash archive-all")
>>> We cannot expect the "ceph crash archive-all" command to clear/delete the items under "ceph crash ls". This looks to be working as expected, and this BZ can be closed as not a bug.
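For completeness: if the goal is for "ceph crash ls" to come back empty, the crash records have to be deleted rather than archived. A hedged sketch, assuming the ceph CLI is reachable (e.g. from the toolbox pod) and that the "ceph crash ls" output format is a header line followed by one ID per row, as shown in this report:

```shell
# Remove every crash record by ID. The first line of "ceph crash ls"
# is the column header ("ID ENTITY NEW"), so skip it with NR > 1.
for id in $(ceph crash ls | awk 'NR > 1 {print $1}'); do
    ceph crash rm "$id"
done

# The list should now be empty:
ceph crash ls
```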