Created attachment 1131488: skyring.log from crashed server

Description of problem:
When I created a larger cluster (around 20 nodes) and tried to accept all nodes, skyring crashed with a long traceback starting with:

    panic: send on closed channel

See the attachment for the whole log.

Version-Release number of selected component (if applicable):
rhscon-ceph-0.0.6-8.el7.x86_64
rhscon-core-0.0.8-7.el7.x86_64
rhscon-ui-0.0.16-1.el7.noarch

How reproducible:
On a larger cluster it happened in 3 of 4 tries. We also saw the same issue once on a smaller cluster with only 6 nodes.

Steps to Reproduce:
1. Prepare nodes for a USM cluster (the more nodes, the more likely the crash).
2. Click the "Accept All" button in the USM web UI.
3. Wait a while and watch skyring.log.

Actual results:
Sometimes skyring crashes with a long traceback starting with:

    panic: send on closed channel

Expected results:
Skyring shouldn't crash when accepting any supported number of nodes.

Additional info:
When I accept nodes one by one (via the API) and wait until each node is accepted, it works well.
Tested on a cluster with 40 OSD and 3 MON nodes. "Accept All" works as expected (if initialization of any node fails, the node can be re-initialized).

USM server (RHEL 7.2):
ceph-ansible-1.0.5-26.el7scon.noarch
ceph-installer-1.0.14-1.el7scon.noarch
rhscon-ceph-0.0.33-1.el7scon.x86_64
rhscon-core-0.0.34-1.el7scon.x86_64
rhscon-core-selinux-0.0.34-1.el7scon.noarch
rhscon-ui-0.0.47-1.el7scon.noarch

Node (RHEL 7.2):
rhscon-agent-0.0.15-1.el7scon.noarch
rhscon-core-selinux-0.0.34-1.el7scon.noarch

>> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1754