1312814 – Skyring crashed when accepting nodes because of "panic: send on closed channel"

Summary: Skyring crashed when accepting nodes because of "panic: send on closed channel"

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Storage Console
Classification:	Red Hat Storage
Component:	core
Sub Component:
Version:	2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	2
Assignee:	Shubhendu Tripathi
QA Contact:	sds-qe-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-02-29 10:23 UTC by Daniel Horák
Modified:	2016-08-23 19:47 UTC
CC List:	3 users
Fixed In Version:	rhscon-core-0.0.8-10.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-08-23 19:47:29 UTC
Embargoed:

Attachments
skyring.log from crashed server (1.47 MB, text/plain) 2016-02-29 10:23 UTC, Daniel Horák	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Gerrithub.io	264635	0	None	None	None	2016-03-02 05:51:00 UTC
Red Hat Product Errata	RHEA-2016:1754	0	normal	SHIPPED_LIVE	New packages: Red Hat Storage Console 2.0	2017-04-18 19:09:06 UTC

Description Daniel Horák 2016-02-29 10:23:19 UTC

Created attachment 1131488
skyring.log from crashed server

Description of problem:
  When I created a larger cluster (around 20 nodes) and tried to accept all nodes, skyring crashed with a long traceback starting with:
    panic: send on closed channel
  See the attached log for the full traceback.
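"panic: send on closed channel" is the message the Go runtime raises when a goroutine sends on a channel that has already been closed. The report does not say which code path closed the channel early. The following minimal Go sketch only illustrates that failure mode, assuming a collector that closes its result channel while a per-node worker is still about to send. The function name sendAfterClose is hypothetical and does not come from skyring.

```go
package main

import "fmt"

// sendAfterClose (hypothetical) reproduces the panic in isolation: a worker
// goroutine sends its result only after the collector has already closed
// the results channel. The deferred recover captures the panic message so
// it can be returned instead of crashing the whole process.
func sendAfterClose() (msg string) {
	results := make(chan string, 1)
	collectorGaveUp := make(chan struct{})
	workerDone := make(chan struct{})

	go func() {
		defer close(workerDone)
		defer func() {
			if r := recover(); r != nil {
				msg = fmt.Sprint(r)
			}
		}()
		<-collectorGaveUp          // the slow node finishes after the collector stopped
		results <- "node accepted" // panics: the channel is already closed
	}()

	close(results)         // assumed collector closes the channel too early
	close(collectorGaveUp) // ...and only then does the worker try to send
	<-workerDone
	return msg
}

func main() {
	fmt.Println(sendAfterClose()) // send on closed channel
}
```

Without the recover, the panic in that goroutine would terminate the whole process. That matches the report, where one late sender brought down skyring.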

Version-Release number of selected component (if applicable):
  rhscon-ceph-0.0.6-8.el7.x86_64
  rhscon-core-0.0.8-7.el7.x86_64
  rhscon-ui-0.0.16-1.el7.noarch


How reproducible:
  On a larger cluster it happened in 3 out of 4 attempts.
  We also saw the same issue once on a smaller cluster with only 6 nodes.

Steps to Reproduce:
  1. Prepare nodes for a USM cluster (the more nodes, the more likely the crash).
  2. Click the "Accept All" button in the USM web UI.
  3. Wait for a while, monitoring skyring.log.

Actual results:
  Sometimes skyring crashes with a long traceback starting with:
    panic: send on closed channel

Expected results:
  Skyring shouldn't crash when accepting any supported number of nodes.

Additional info:
  When I accept nodes one by one (via the API) and wait until each node is accepted, it works fine.

Comment 3 Daniel Horák 2016-07-14 11:01:52 UTC

Tested on a cluster with 40 OSD and 3 MON nodes.

"Accept All" works as expected (if any node initialization fails, it is possible to re-initialize it).

USM server (RHEL 7.2):
  ceph-ansible-1.0.5-26.el7scon.noarch
  ceph-installer-1.0.14-1.el7scon.noarch
  rhscon-ceph-0.0.33-1.el7scon.x86_64
  rhscon-core-0.0.34-1.el7scon.x86_64
  rhscon-core-selinux-0.0.34-1.el7scon.noarch
  rhscon-ui-0.0.47-1.el7scon.noarch

Node (RHEL 7.2):
  rhscon-agent-0.0.15-1.el7scon.noarch
  rhscon-core-selinux-0.0.34-1.el7scon.noarch

>> VERIFIED

Comment 5 errata-xmlrpc 2016-08-23 19:47:29 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754
