Created attachment 1861278
Logs for rook-ceph-osd-prepare-default-1-data-0f8266--1-zbjtx

Description of problem (please be as detailed as possible and provide log snippets):
Following BZ2047318, it was found that the OSD.1 prepare job failed but was reported as "Completed/Success". The log showed a Python traceback; see the attached logs, "Logs for rook-ceph-osd-prepare-default-1-data-0f8266--1-zbjtx".

Additional info:
Please see the extensive details in the original BZ and the post-mortem notes.
- BZ2047318: https://bugzilla.redhat.com/show_bug.cgi?id=2047318
- Post-mortem OCS and MTSRE: https://docs.google.com/document/d/11VZL3OjL-gZzHtvdW3BzaBi9g26em6YN6LAfabb3lYA/
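For anyone triaging a similar case, the mismatch described above (a job reported as completed while its log contains a Python traceback) can be checked mechanically. The following is only a rough sketch, not part of Rook or the fix: the function names are hypothetical, the status strings "Completed" and "Succeeded" are assumed to be how success is reported, and the log text is expected to be fetched separately (for example from the attached log file).

```python
from typing import List

# Standard first line that CPython prints at the start of a traceback.
TRACEBACK_HEADER = "Traceback (most recent call last):"

# Assumption: these are the strings used to report a successful job/pod.
SUCCESS_STATUSES = {"Completed", "Succeeded"}


def find_tracebacks(log_text: str) -> List[int]:
    """Return the 1-based line numbers where a Python traceback starts."""
    return [
        lineno
        for lineno, line in enumerate(log_text.splitlines(), start=1)
        if TRACEBACK_HEADER in line
    ]


def looks_falsely_successful(reported_status: str, log_text: str) -> bool:
    """True if the status claims success but the log contains a traceback."""
    return reported_status in SUCCESS_STATUSES and bool(find_tracebacks(log_text))


if __name__ == "__main__":
    # Hypothetical log excerpt, invented for illustration only.
    sample_log = (
        "starting prepare\n"
        "Traceback (most recent call last):\n"
        '  File "example.py", line 1, in <module>\n'
        "RuntimeError: example failure\n"
    )
    print(find_tracebacks(sample_log))
    print(looks_falsely_successful("Completed", sample_log))
```

A check like this only flags suspicious logs for a human to read; it says nothing about why the prepare job failed.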
This bug is cloned to 4.10 here: https://bugzilla.redhat.com/show_bug.cgi?id=2054898

I would expect this was a rare condition where the disk was corrupt. The fix just makes it a bit more obvious where the failure is. I'm not sure we need to backport this to 4.8 unless it can be reproduced more consistently.
Samuel, is the fix in 4.10 sufficient, or to which release would you propose backporting it? It's simple to backport, but the failure is rare enough that I'm not sure it's needed.
Fixing 4.10 is sufficient for MTSRE. It's not a critical bug, mostly a nice to have thing for debugging so let's not bother. :) Thank you.
(In reply to Samuel Blais-Dowdy from comment #3)
> Fixing 4.10 is sufficient for MTSRE. It's not a critical bug, mostly a nice
> to have thing for debugging so let's not bother. :) Thank you.

Sounds good, I'll close this one since the fix was merged with the clone for the 4.10 release. Thanks!