1403770 – Incorrect incrementation of volinfo refcnt during volume start

Bug 1403770 - Incorrect incrementation of volinfo refcnt during volume start

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	rhgs-3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	RHGS 3.2.0
Assignee:	Avra Sengupta
QA Contact:	Byreddy
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1351528 1403780 1404104 1404105

Reported:	2016-12-12 10:57 UTC by Avra Sengupta
Modified:	2017-03-23 05:55 UTC
CC List:	7 users
Fixed In Version:	glusterfs-3.8.4-9
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1403780
Environment:
Last Closed:	2017-03-23 05:55:52 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:0486	0	normal	SHIPPED_LIVE	Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update	2017-03-23 09:18:45 UTC

Description Avra Sengupta 2016-12-12 10:57:58 UTC

Description of problem:
When a volume is started, glusterd_op_start_volume() increments the refcount of the volume, but the logic under 'out:' that should decrement the refcount is faulty. As a result, every time a volume is stopped and started, its refcount ends up 1 higher than before. This is pretty serious, given that we use the refcount to decide when a volume can be deleted.
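The imbalance described above can be sketched in a few lines of C. This is only an illustration and not the glusterd source: the struct, the sketch_* function names, and the took_ref flag are all hypothetical, and the exact faulty condition in the real 'out:' path is not shown in this report, so a guard flag that is never set is assumed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for glusterd's volinfo; only the refcount is modelled. */
typedef struct {
    int refcnt;
} sketch_volinfo_t;

void sketch_ref(sketch_volinfo_t *v)   { v->refcnt++; }
void sketch_unref(sketch_volinfo_t *v) { v->refcnt--; }

/* Faulty shape (assumed): the guard in 'out:' never becomes true,
 * so the reference taken at the top is never dropped. */
int sketch_start_faulty(sketch_volinfo_t *v)
{
    int ret = -1;
    int took_ref = 0;

    sketch_ref(v);          /* reference held for the duration of the op */
    /* assumed bug: took_ref is never set to 1 */

    ret = 0;                /* the start work itself would happen here */
    goto out;
out:
    if (took_ref)
        sketch_unref(v);    /* never reached in this variant */
    return ret;
}

/* Balanced shape: every ref taken on entry is dropped in 'out:'. */
int sketch_start_fixed(sketch_volinfo_t *v)
{
    int ret = -1;
    int took_ref = 0;

    sketch_ref(v);
    took_ref = 1;

    ret = 0;
    goto out;
out:
    if (took_ref)
        sketch_unref(v);
    return ret;
}

/* Run n stop/start cycles on a fresh volinfo and return the final refcount. */
int sketch_cycles(int (*start)(sketch_volinfo_t *), int n)
{
    sketch_volinfo_t v = { .refcnt = 1 };
    int i;

    for (i = 0; i < n; i++)
        start(&v);
    return v.refcnt;
}
```

With the faulty variant the count grows by one per cycle; with the balanced variant it stays at its initial value, which is the behaviour the fix is expected to restore.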

What happens when we delete a volume which has gone through the above:
We don't see it listed in the vol info, as we explicitly remove it from the list. But the volinfo continues to stay in memory.

When a failed snapshot restore is reverted, this has catastrophic consequences, as the stale volinfo not only stays in memory, but also corrupts the volume list.
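To make the delete-time consequence concrete, here is a minimal sketch of a refcount-gated delete. It is an assumption-laden model, not glusterd code: demo_vol_t, its listed field, and the demo_* functions are hypothetical names, and the real delete path does considerably more work.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a volume object (not the glusterd struct). */
typedef struct demo_vol {
    int refcnt;   /* number of outstanding references */
    int listed;   /* 1 while the volume is visible in the volume list */
} demo_vol_t;

/* Create a volume holding the single reference owned by the list. */
demo_vol_t *demo_vol_new(void)
{
    demo_vol_t *v = calloc(1, sizeof(*v));

    if (v) {
        v->refcnt = 1;
        v->listed = 1;
    }
    return v;
}

/* Drop one reference. Returns 1 if the memory was freed, 0 otherwise. */
int demo_vol_unref(demo_vol_t *v)
{
    if (--v->refcnt > 0)
        return 0;           /* still referenced: object stays in memory */
    free(v);
    return 1;
}

/* Delete: unlink from the list unconditionally, then drop the list's ref.
 * If earlier operations leaked references, the object is hidden from
 * listings but never freed. */
int demo_vol_delete(demo_vol_t *v)
{
    v->listed = 0;
    return demo_vol_unref(v);
}
```

A volume whose refcount was inflated by earlier start operations comes out of demo_vol_delete() unlisted but still allocated, which matches the stale volinfo described above.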



Comment 2 Avra Sengupta 2016-12-12 11:27:29 UTC

Master URL: http://review.gluster.org/16108

Comment 5 Avra Sengupta 2016-12-13 05:43:06 UTC

Master Url: http://review.gluster.org/16108
Release 3.9 Url: http://review.gluster.org/#/c/16113/
Release 3.8 Url: http://review.gluster.org/#/c/16114/
RHGS 3.2.0 Url: https://code.engineering.redhat.com/gerrit/#/c/92782/

Comment 7 Byreddy 2016-12-19 09:56:02 UTC

@Avra, are there steps to reproduce and verify this issue?

Comment 8 Avra Sengupta 2016-12-19 10:10:05 UTC

You need to run gdb on glusterd, put a breakpoint on glusterd_op_start_volume(), and check the value of volinfo->refcnt. This value should not increase every time we stop and then start the volume again.

Comment 9 Byreddy 2016-12-21 06:57:00 UTC

Verified this issue using glusterfs-3.8.4-9, and reproduced the issue without the fix as well.

The fix is working well. Below are the gdb results with and without the fix.

Without Fix,
============
2568	        ret  = glusterd_volinfo_find (volname, &volinfo);
(gdb) 
2569	        if (ret) {
(gdb) p volinfo->refcnt
$1 = 7
(gdb) c
Continuing.
Detaching after fork from child process 31326.
Detaching after fork from child process 31345.

(gdb) 
2568	        ret  = glusterd_volinfo_find (volname, &volinfo);
(gdb) p volinfo->refcnt
$2 = 8
(gdb) c
Continuing.
Detaching after fork from child process 31416.
Detaching after fork from child process 31435.

2568	        ret  = glusterd_volinfo_find (volname, &volinfo);
(gdb) 
2569	        if (ret) {
(gdb) p volinfo->refcnt
$3 = 9
(gdb) c
Continuing.
Detaching after fork from child process 31506.




With Fix,
=========
2569	        if (ret) {
(gdb) p volinfo->refcnt
$1 = 1
(gdb) c
Continuing.
Detaching after fork from child process 32164.
Detaching after fork from child process 32183.
Detaching after fork from child process 32202.
Detaching after fork from child process 32205.
Detaching after fork from child process 32229.
Detaching after fork from child process 32231.

Breakpoint 1, glusterd_op_start_volume (dict=dict@entry=0x7f1cf06b2988, op_errstr=op_errstr@entry=0x7f1cd83843c0) at glusterd-volume-ops.c:2544
2544	{
(gdb) next
2569	        if (ret) {
(gdb) p volinfo->refcnt
$2 = 1
(gdb) 
$3 = 1
(gdb) c
Continuing.
Detaching after fork from child process 32248.
Detaching after fork from child process 32267.
Detaching after fork from child process 32286.
Detaching after fork from child process 32289.
Detaching after fork from child process 32319.
Detaching after fork from child process 32321.
[Switching to Thread 0x7f1ce8f01700 (LWP 31781)]

Breakpoint 1, glusterd_op_start_volume (dict=dict@entry=0x7f1cf06b3198, op_errstr=op_errstr@entry=0x7f1cd83843c0) at glusterd-volume-ops.c:2544
2544	{
2568	        ret  = glusterd_volinfo_find (volname, &volinfo);
(gdb) 
2569	        if (ret) {
(gdb) p volinfo->refcnt
$4 = 1
(gdb) c
Continuing.
Detaching after fork from child process 32338.


Based on the above details, moving to verified state.

Comment 11 errata-xmlrpc 2017-03-23 05:55:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
