Bug 1403770 - Incorrect incrementation of volinfo refcnt during volume start
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.2.0
Assigned To: Avra Sengupta
QA Contact: Byreddy
Depends On:
Blocks: 1351528 1403780 1404104 1404105
Reported: 2016-12-12 05:57 EST by Avra Sengupta
Modified: 2017-03-23 01:55 EDT

See Also:
Fixed In Version: glusterfs-3.8.4-9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1403780
Environment:
Last Closed: 2017-03-23 01:55:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2017:0486
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 05:18:45 EDT

Description Avra Sengupta 2016-12-12 05:57:58 EST
Description of problem:
When a volume is started, glusterd_op_start_volume() increments the volume's refcount, but the logic under 'out:' that should decrement it is faulty. As a result, every time a volume is stopped and started, its refcount increases by 1. This is pretty serious, given that the refcount is one of the things we check when deleting a volume.
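
For illustration only, here is a minimal sketch of how an unbalanced ref/unref around an 'out:' label produces this kind of drift. It is not the glusterd code: volinfo_t, start_volume_leaky(), start_volume_balanced() and refcnt_after_cycles() are hypothetical names, and the exact faulty condition is an assumption, since the report does not spell it out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for glusterd's volinfo; only the refcount is modelled. */
typedef struct {
    int refcnt;
} volinfo_t;

static void volinfo_ref(volinfo_t *v)   { v->refcnt++; }
static void volinfo_unref(volinfo_t *v) { assert(v->refcnt > 0); v->refcnt--; }

/* Leaky shape (assumed): the reference taken at entry is dropped only on
 * the failure path, so every successful start leaves one extra reference. */
static int start_volume_leaky(volinfo_t *v, bool fail)
{
    int ret = -1;

    volinfo_ref(v);
    if (fail)
        goto out;
    ret = 0;
out:
    if (ret)
        volinfo_unref(v);
    return ret;
}

/* Balanced shape: remember that a reference was taken and always drop it. */
static int start_volume_balanced(volinfo_t *v, bool fail)
{
    int ret = -1;
    bool took_ref = false;

    volinfo_ref(v);
    took_ref = true;
    if (fail)
        goto out;
    ret = 0;
out:
    if (took_ref)
        volinfo_unref(v);
    return ret;
}

/* Run n successful start cycles and report the resulting refcount. */
static int refcnt_after_cycles(int (*start)(volinfo_t *, bool), int n)
{
    volinfo_t v = { .refcnt = 1 };  /* the one reference held by the volume list */

    for (int i = 0; i < n; i++)
        start(&v, false);
    return v.refcnt;
}
```

Calling refcnt_after_cycles() with the leaky variant shows the count growing by one per cycle, while the balanced variant leaves it at its starting value.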

What happens when we delete a volume which has gone through the above:
We don't see it listed in the vol info output, as we explicitly remove it from the list. But the volinfo continues to stay in memory.

When reverting a failed snapshot restore, this has catastrophic consequences: the stale volinfo not only stays in memory, but also corrupts the volume list.
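
A rough sketch of why a leaked reference keeps a deleted volinfo alive, assuming the usual "free on last unref" pattern. All names here (volinfo_t, delete_volume(), delete_frees_memory()) are hypothetical and only model the behaviour described above; they are not glusterd's actual functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a refcounted volinfo that lives in a volume list. */
typedef struct {
    int  refcnt;
    bool in_list;  /* whether the volume is still linked into the list */
} volinfo_t;

/* Drop one reference; free the object only when the last one goes away.
 * Returns true if the object was freed. */
static bool volinfo_unref(volinfo_t *v)
{
    assert(v->refcnt > 0);
    if (--v->refcnt == 0) {
        free(v);
        return true;
    }
    return false;
}

/* Delete: unlink from the list, then drop the list's own reference. */
static bool delete_volume(volinfo_t *v)
{
    v->in_list = false;
    return volinfo_unref(v);
}

/* Create a volinfo carrying `leaked` stale references on top of the list's
 * own, delete it, and report whether its memory was actually released. */
static bool delete_frees_memory(int leaked)
{
    volinfo_t *v = malloc(sizeof(*v));
    bool freed;

    assert(v != NULL);
    v->refcnt = 1 + leaked;
    v->in_list = true;

    freed = delete_volume(v);
    if (!freed) {
        /* Stale object: no longer listed, but still allocated. */
        assert(!v->in_list && v->refcnt == leaked);
        free(v);  /* cleanup for the demo only */
    }
    return freed;
}
```

With no leaked references the delete releases the memory; with any leaked reference the object is unlinked but never freed, which matches the "not listed, but still in memory" symptom.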


Comment 2 Avra Sengupta 2016-12-12 06:27:29 EST
Master URL: http://review.gluster.org/16108
Comment 7 Byreddy 2016-12-19 04:56:02 EST
@Avra, are there steps to reproduce and verify this issue?
Comment 8 Avra Sengupta 2016-12-19 05:10:05 EST
You need to run gdb on glusterd, put a breakpoint on glusterd_op_start_volume(), and check the value of volinfo->refcnt. This value should not increase every time we stop and start the volume again.
Comment 9 Byreddy 2016-12-21 01:57:00 EST
Verified this issue using glusterfs-3.8.4-9, and reproduced the issue without the fix as well.

The fix is working well; below is the gdb output with and without the fix.

Without Fix,
============
2568	        ret  = glusterd_volinfo_find (volname, &volinfo);
(gdb) 
2569	        if (ret) {
(gdb) p volinfo->refcnt
$1 = 7
(gdb) c
Continuing.
Detaching after fork from child process 31326.
Detaching after fork from child process 31345.

(gdb) 
2568	        ret  = glusterd_volinfo_find (volname, &volinfo);
(gdb) p volinfo->refcnt
$2 = 8
(gdb) c
Continuing.
Detaching after fork from child process 31416.
Detaching after fork from child process 31435.

2568	        ret  = glusterd_volinfo_find (volname, &volinfo);
(gdb) 
2569	        if (ret) {
(gdb) p volinfo->refcnt
$3 = 9
(gdb) c
Continuing.
Detaching after fork from child process 31506.




With Fix,
=========
2569	        if (ret) {
(gdb) p volinfo->refcnt
$1 = 1
(gdb) c
Continuing.
Detaching after fork from child process 32164.
Detaching after fork from child process 32183.
Detaching after fork from child process 32202.
Detaching after fork from child process 32205.
Detaching after fork from child process 32229.
Detaching after fork from child process 32231.

Breakpoint 1, glusterd_op_start_volume (dict=dict@entry=0x7f1cf06b2988, op_errstr=op_errstr@entry=0x7f1cd83843c0) at glusterd-volume-ops.c:2544
2544	{
(gdb) next
2569	        if (ret) {
(gdb) p volinfo->refcnt
$2 = 1
(gdb) 
$3 = 1
(gdb) c
Continuing.
Detaching after fork from child process 32248.
Detaching after fork from child process 32267.
Detaching after fork from child process 32286.
Detaching after fork from child process 32289.
Detaching after fork from child process 32319.
Detaching after fork from child process 32321.
[Switching to Thread 0x7f1ce8f01700 (LWP 31781)]

Breakpoint 1, glusterd_op_start_volume (dict=dict@entry=0x7f1cf06b3198, op_errstr=op_errstr@entry=0x7f1cd83843c0) at glusterd-volume-ops.c:2544
2544	{
2568	        ret  = glusterd_volinfo_find (volname, &volinfo);
(gdb) 
2569	        if (ret) {
(gdb) p volinfo->refcnt
$4 = 1
(gdb) c
Continuing.
Detaching after fork from child process 32338.


Based on the above details, moving to verified state.
Comment 11 errata-xmlrpc 2017-03-23 01:55:52 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
