Description of problem:
The customer has two questions about pg repairs:
1) The customer discovered an inconsistent pg and issued 'ceph pg repair', but the repair did not appear to start for approximately 11 hours. The customer would like to know how the repair process is scheduled.
2) The customer would like to know which pg states/conditions are safe to repair and which are not. They asked whether we could provide a list of these conditions so that they can document them.
Version-Release number of selected component (if applicable):
pg repair logs have been requested but have not yet been received.
Further question: What happens if the inconsistent copy of the object is actually the primary copy and a client attempts to read the object? Does Ceph automatically promote a different copy to primary, or will this result in a read I/O error? We have not experienced this (yet), because our workload at the time of this issue was write-only, but we'd like to know how to handle the read scenario and what type of error, if any, the application can expect.
From my understanding, if the primary copy of the object is bad and 'pg repair' is called, Ceph will replicate the object from the primary to the secondary and tertiary OSD nodes. The repair is not intelligent in any way (unless this has been resolved in a later commit).
Regarding the question above, I imagine the client would either get an I/O error if the object was sufficiently corrupted, or read the object with some degree of success.
Is this true?