1341225 – number of connections from skyring to mongodb grows up to 1000 in idle state

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Storage Console
Classification:	Red Hat Storage
Component:	core
Sub Component:
Version:	2
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	2
Assignee:	Shubhendu Tripathi
QA Contact:	Martin Bukatovic
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	Console-2-DevFreeze

Reported:	2016-05-31 13:35 UTC by Martin Bukatovic
Modified:	2016-08-23 19:53 UTC
CC List:	4 users
Fixed In Version:	rhscon-core-0.0.30-1.el7scon
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-08-23 19:53:05 UTC
Embargoed:

Attachments
number of connections log (25.18 KB, application/x-gzip) 2016-05-31 13:40 UTC, Martin Bukatovic
graph based on number of connections log (44.73 KB, image/png) 2016-05-31 13:42 UTC, Martin Bukatovic

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2016:1754	0	normal	SHIPPED_LIVE	New packages: Red Hat Storage Console 2.0	2017-04-18 19:09:06 UTC

Description Martin Bukatovic 2016-05-31 13:35:48 UTC

Description of problem
======================

The number of skyring to mongodb connections grows linearly in a cycle of
about 17 hours, which may be related to additional problems (it's not possible
to log in to the web interface, and the web interface is not available when the
number of connections is near its peak).

Version-Release 
===============

On RHSC 2.0 machine:

rhscon-core-0.0.19-1.el7scon.x86_64
rhscon-ceph-0.0.18-1.el7scon.x86_64
rhscon-ui-0.0.34-1.el7scon.noarch
ceph-installer-1.0.11-1.el7scon.noarch
ceph-ansible-1.0.5-15.el7scon.noarch
mongodb-server-2.6.5-4.1.el7.x86_64
mongodb-2.6.5-4.1.el7.x86_64

On Ceph node machines:

rhscon-agent-0.0.8-1.el7scon.noarch
ceph-base-10.2.1-6.el7cp.x86_64

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation, make sure you have a few nodes
   ready to be accepted later.
2. Accept all nodes.
3. Start the Create Cluster task.
4. Monitor the number of skyring to mongodb connections
   and let the cluster idle for a few days.

To "quickly" reproduce the issue, one would need about 2 or 3 days to
observe a few full cycles (as described below).

One can run a script like this to monitor the number of connections:

~~~
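#!/bin/bash
# Log the number of skyring to mongodb connections once a minute.
# The skyring PID is expected as the first argument.

skyring_pid=${1:?usage: $0 SKYRING_PID}
mongodb_port=27017

while true; do
  # count the open files of the skyring process that mention the mongodb port
  con_num=$(lsof -p "${skyring_pid}" | grep -c "${mongodb_port}")
  echo "$(date +"%Y-%m-%dT%H:%M") ${con_num}"
  sleep 60
done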
~~~

Actual results
==============

Skyring maintains a "low" number of connections (about 20 in my case) to mongodb
at first, but after a few hours (about 6 hours in my case), the number of
connections grows linearly until it reaches its peak of 1002
connections, at which point it plummets to just 4 connections.
From there the cycle of linear growth starts again. One such
cycle takes about 17 hours.
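
To quantify the cycle described above from the log produced by the monitoring
script, a minimal sketch along these lines could be used. It assumes each log
line has the form `<timestamp> <count>`. The helper name `summarize_conn_log`
and the rule that a drop to less than half of the previous sample counts as a
reset are illustrative assumptions, not anything skyring provides.

```shell
#!/bin/bash
# Summarize a connection log with lines of the form "<timestamp> <count>".
# summarize_conn_log is a hypothetical helper name, not part of skyring.
summarize_conn_log() {
  awk '
    NR == 1 { min = $2; max = $2 }
    # assumption: a drop to less than half of the previous sample is a reset
    NR > 1 && $2 < prev / 2 { resets++ }
    {
      if ($2 < min) min = $2
      if ($2 > max) max = $2
      prev = $2
    }
    END { printf "samples=%d min=%d max=%d resets=%d\n", NR, min, max, resets }
  ' "$1"
}
```

Calling `summarize_conn_log` with the log file as its only argument prints a
single summary line. A run without the issue would be expected to report
`resets=0` and a `max` close to `min`.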

There are additional/related problems:

1) it's not possible to log in as the admin user to the web interface, the login
   page states:

> The username or password is incorrect.

While in the skyring.log file, there are just these 2 lines for the event:

~~~
2016-05-29T15:08:35.229+02:00 ERROR    auth.go:163 Login] Error saving the session for user: admin. error: Closed explicitly
2016-05-29T15:08:35.229+02:00 ERROR    auth.go:70 login] Unable to login User:Closed explicitly
~~~

2) When the number of connections is near its peak, it's not possible to reach
   the skyring daemon via the web interface. One would need to wait until the
   number of connections plummets again to be able to use the web interface.
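
To check whether the login failures cluster around the connection peaks, the
errors can be counted per hour and compared with the connection log. This is a
rough sketch that assumes the log line format shown above.
`count_closed_errors` is a hypothetical helper name, and the log path is passed
as an argument because this report does not state where skyring.log is located.

```shell
#!/bin/bash
# Count "Closed explicitly" errors per hour in a skyring log file.
# count_closed_errors is a hypothetical helper, not part of skyring.
# Assumption: every line starts with a timestamp like 2016-05-29T15:08:35.229+02:00
count_closed_errors() {
  grep -F 'Closed explicitly' "$1" \
    | cut -c1-13 \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'   # print "<date>T<hour> <number of errors>"
}
```

Each output line holds the date and hour followed by the number of matching
error lines logged in that hour.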

Expected results
================

Skyring maintains a reasonable number of mongodb connections the whole time.

Comment 1 Martin Bukatovic 2016-05-31 13:40:32 UTC

Created attachment 1163238
number of connections log

Attaching the log file generated by the script from the "Steps to Reproduce"
section, each line contains a timestamp and the number of connections.

Comment 2 Martin Bukatovic 2016-05-31 13:42:06 UTC

Created attachment 1163239
graph based on number of connections log

Graph representation of "number of connections log". See "Actual results" section
of this BZ for the explanation.

Comment 3 Martin Bukatovic 2016-05-31 14:32:39 UTC

Additional information: restarting skyring fixes the issue with web login.

Comment 4 Nishanth Thomas 2016-06-03 05:12:53 UTC

When we discussed this earlier, these connections were getting garbage collected when the Go garbage collector kicks in. Are you seeing a steep increase in the number of connections, or is it getting cleared up after some time?

Comment 5 Martin Bukatovic 2016-06-03 09:07:22 UTC

(In reply to Nishanth Thomas from comment #4)
> When we discussed this earlier, these connections were getting garbage
> collected when the Go garbage collector kicks in.

Yes, that's true. On the other hand, the way it grows, the sheer number
of open connections and the related issues (web interface not reachable
when the number of mongodb connections is near its peak) are hardly ok
and reasonable.

> Are you seeing a steep increase in the number of connections,
> or is it getting cleared up after some time?

Look at the graph attached to this BZ. I see unreasonable linear growth
of mongodb connections. The period of this process is about 17 hours.
While linear growth is not steep (for some definition of steep), it's
definitely not fine.

Why does skyring open new connections like this over and over again?

Comment 6 Martin Bukatovic 2016-06-08 09:02:40 UTC

I was trying to recheck this issue with a new skyring build (I was asked to do
this during a meeting with Nishanth this week) and at first sight, I don't see
linear growth of skyring/mongodb connections. But I'm unable to
collect enough data to gather good evidence, because skyring crashes/stops
after a few hours (BZ 1343104) and I need to let it run for at least 4 or 5 days.

Comment 7 Shubhendu Tripathi 2016-07-04 12:06:11 UTC

With build rhscon-core-0.0.30-1.el7scon, the mongodb connections are well under control and there is no linear growth in the number of connections now.

Comment 10 Martin Bukatovic 2016-08-08 08:49:46 UTC

Checking with
=============

On RHSC 2.0 server machine:

rhscon-ui-0.0.51-1.el7scon.noarch
rhscon-core-0.0.38-1.el7scon.x86_64
rhscon-ceph-0.0.38-1.el7scon.x86_64
rhscon-core-selinux-0.0.38-1.el7scon.noarch

On Ceph 2.0 machines:

rhscon-core-selinux-0.0.38-1.el7scon.noarch
rhscon-agent-0.0.16-1.el7scon.noarch

Verification
============

When observing the number of skyring to mongodb connections for about 3.5 days,
I haven't noticed the issue at all: the number of connections remained constant
(at 17 connections) during the whole period.

~~~
$ head -2 skyring_mongod_conn.log 
2016-08-04T18:12 17
2016-08-04T18:13 17
$ tail -2 skyring_mongod_conn.log 
2016-08-08T10:39 17
2016-08-08T10:40 17
$ cut -d' ' -f2 skyring_mongod_conn.log | sort | uniq
17
~~~

>> VERIFIED

Comment 12 errata-xmlrpc 2016-08-23 19:53:05 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754
