Description of problem:
Backport: http://tracker.ceph.com/issues/14120 : Pipe::do_recv() may loop infinitely

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 2.1
Looks like the jewel backport in https://github.com/ceph/ceph/pull/12341 will probably make v10.2.4.
Problem: A client disconnect can put SimpleMessenger threads in an infinite loop that tries to read from the socket, gets EAGAIN, and loops. It is unclear exactly what environmental circumstances lead to this state, but Zheng was hitting it in his dev environment when he submitted the fix, and a customer was hitting it on seemingly every OSD on most hosts (pushing the load over 200 on an otherwise idle cluster).

Customer impact: A SimpleMessenger thread gets stuck in a loop and burns CPU. No other known impact (besides the additional system load).

How widespread: No idea. For this customer it happened to all OSDs on most hosts in the cluster, and reentered this state shortly after rebooting the host. Unclear exactly why this cluster was susceptible but others haven't seen the problem.
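For anyone trying to picture the failure mode under "Problem" above, here is a minimal sketch of a non-blocking read loop and one way to keep it from spinning. This is not Ceph's Pipe::do_recv() and not the upstream fix; recv_exact() and its timeout_ms parameter are hypothetical names made up for illustration, and it assumes a POSIX/Linux stream socket. The point is only that retrying recv() immediately on EAGAIN burns CPU, while waiting in poll() and treating a peer close as an error lets the loop exit.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper (NOT Ceph code): read exactly `len` bytes from `fd`.
// Returns 0 on success, -1 on error, peer close, or timeout.
int recv_exact(int fd, char *buf, size_t len, int timeout_ms) {
  assert(buf != nullptr);
  while (len > 0) {
    ssize_t got = ::recv(fd, buf, len, MSG_DONTWAIT);
    if (got > 0) {
      buf += got;
      len -= static_cast<size_t>(got);
      continue;
    }
    if (got == 0)
      return -1;  // peer closed the connection; stop instead of retrying
    if (errno == EINTR)
      continue;   // interrupted by a signal, safe to retry
    if (errno != EAGAIN && errno != EWOULDBLOCK)
      return -1;  // real socket error

    // EAGAIN: nothing to read right now. Calling recv() again immediately
    // is the busy loop described in this bug. Sleep in poll() instead.
    struct pollfd pfd = {fd, POLLIN, 0};
    int r = ::poll(&pfd, 1, timeout_ms);
    if (r < 0 && errno == EINTR)
      continue;
    if (r <= 0)
      return -1;  // timed out, or poll() itself failed
    if (pfd.revents & (POLLERR | POLLNVAL))
      return -1;  // socket is in an error state
    // POLLIN or POLLHUP: loop again; recv() returns data or 0 (closed).
  }
  return 0;
}
```

A caller would pass the connected socket and a bounded timeout, e.g. recv_exact(sd, header, sizeof(header), 1000), and tear down the connection on -1.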
QE has a few questions that need clarification: 1. Is this QE testable? If "YES", can you please provide the steps to reproduce the bug? If "NO", QE will run the automated regression suite.
Sam, Sage, mind answering Kiran's questions in Comment 18 above?