Description of problem (please be as detailed as possible and provide log snippets):
2 MDS pods are in CrashLoopBackOff. PVCs are inaccessible.

Version of all relevant components (if applicable):
OCS 4.9
ceph version 16.2.0-152

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
Filesystem inaccessible

Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
no

Can this issue be reproduced from the UI?
no

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
FWIW, in Server::reconnect_tick():

```
  for (auto session : remaining_sessions) {
    // Keep sessions that have specified timeout. These sessions will prevent
    // mds from going to active. MDS goes to active after they all have been
    // killed or reclaimed.
    if (session->info.client_metadata.find("timeout") !=
        session->info.client_metadata.end()) {
      dout(1) << "reconnect keeps " << session->info.inst
              << ", need to be reclaimed" << dendl;
      client_reclaim_gather.insert(session->get_client());
      continue;
    }

    dout(1) << "reconnect gives up on " << session->info.inst << dendl;

    mds->clog->warn() << "evicting unresponsive client " << *session
                      << ", after waiting " << elapse1
                      << " seconds during MDS startup";
```

Is the MDS waiting for a session to be reclaimed?
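To make the question concrete, here is a minimal standalone sketch of the branch in the snippet above. It is not Ceph code: `SessionSketch`, `ReconnectOutcome`, `classify_remaining` and `waiting_for_reclaim` are hypothetical names invented for illustration, and the real `Session` carries far more state. It only models what the quoted comment says, namely that a remaining session whose client metadata has a "timeout" key is kept in a reclaim set rather than evicted, and that the MDS does not go active while such sessions are outstanding.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the MDS session; only the metadata map that the
// quoted check reads is modelled here.
struct SessionSketch {
  int client_id;
  std::map<std::string, std::string> client_metadata;
};

// Hypothetical result type: which sessions were kept and which were dropped.
struct ReconnectOutcome {
  std::set<int> reclaim_gather;  // kept, must be killed or reclaimed later
  std::vector<int> evicted;      // given up on after the reconnect window
};

// Mirrors the branch in the quoted loop: a "timeout" key in the client
// metadata means the session is kept for reclaim; anything else is evicted.
ReconnectOutcome classify_remaining(const std::vector<SessionSketch>& remaining) {
  ReconnectOutcome out;
  for (const auto& s : remaining) {
    if (s.client_metadata.find("timeout") != s.client_metadata.end()) {
      out.reclaim_gather.insert(s.client_id);
      continue;
    }
    out.evicted.push_back(s.client_id);
  }
  return out;
}

// Per the quoted comment, kept sessions prevent the MDS from going active,
// so a non-empty reclaim set means the MDS is still waiting.
bool waiting_for_reclaim(const ReconnectOutcome& out) {
  return !out.reclaim_gather.empty();
}
```

If this model matches the real behaviour, a single leftover session that advertised a "timeout" would be enough to keep `waiting_for_reclaim` true. The MDS log should show a "reconnect keeps ... need to be reclaimed" line for each such session, which would be worth checking in the logs from the crashing pods.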