Bug 1416684

Summary: Hangs on 32 bit systems since 3.9.0
Product: [Community] GlusterFS Reporter: Vitaly Lipatov <lav>
Component: md-cache Assignee: Vitaly Lipatov <lav>
Status: CLOSED EOL QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.9 CC: bugs, ndevos
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1417913 (view as bug list) Environment:
Last Closed: 2017-03-08 12:32:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1417913    
Bug Blocks:    

Description Vitaly Lipatov 2017-01-26 09:09:02 UTC
Description of problem:
Since 3.9.0 command
/usr/sbin/glusterfs --debug --volfile-server=SERVER --volfile-id=VOLID /mount/path
hangs on 32 bit system

Additional info:

It hangs on INCREMENT_ATOMIC from this commit:
commit 3cc7f6588c281846f8c590553da03dd16f150e8a
Author: Poornima G <pgurusid>
Date:   Wed Aug 17 12:55:37 2016 +0530

    md-cache: Add cache hit and miss counters
    
    These counters can be accessed either by .meta interface
    or statedump.
    
    From meta: cat on the private file in md-cache directory.
    Eg: cat /mnt/glusterfs/0/.meta/graphs/active/patchy-md-cache/private
    [performance/md-cache.patchy-md-cache]


         if (xdata) {
                 ret = mdc_inode_xatt_get (this, loc->inode, &xattr_rsp);
-                if (ret != 0)
+                if (ret != 0) {
+                        INCREMENT_ATOMIC (conf->mdc_counter.lock,
+                                          conf->mdc_counter.xattr_miss);
                         goto uncached;
+                }
 

Commenting out "!defined(__i386__)" in the INCREMENT_ATOMIC definition:

+#if (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1)) && !defined(__i386__)
+# define INCREMENT_ATOMIC(lk, op) __sync_add_and_fetch(&op, 1)

fixes the problem.

I see two problems here:
1. an incorrect fallback path for INCREMENT_ATOMIC
2. incorrect detection of 32-bit RHEL 5

Backtrace:
#0  0xb7e55362 in pthread_spin_lock () from /lib/libpthread.so.0
(gdb) bt
#0  0xb7e55362 in pthread_spin_lock () from /lib/libpthread.so.0
#1  0xb3541ffd in mdc_lookup (frame=0xb6df2300, this=0xb37165b8, loc=0xb03fd220, xdata=0xb66417b4) at md-cache.c:1085
#2  0xb350e3e5 in io_stats_lookup (frame=0xb6df2218, this=0xb3717680, loc=0xb03fd220, xdata=0xb66417b4) at io-stats.c:2617
#3  0xb7f3c225 in default_lookup (frame=0xb6df2218, this=0xb3702f60, loc=0xb03fd220, xdata=0xb66417b4) at defaults.c:2572
#4  0xb34fb63a in meta_lookup (frame=0xb6df2218, this=0xb3702f60, loc=0xb03fd220, xdata=0xb66417b4) at meta.c:44
#5  0xb63282d1 in fuse_first_lookup (this=0x8089400) at fuse-bridge.c:4262

Comment 1 Vitaly Lipatov 2017-01-26 09:43:20 UTC
The LOCK hangs due to a missing LOCK_INIT, i.e. a missing pthread_spin_init: the gf_lock_t lock is used uninitialized (zero-filled).

Comment 2 Vitaly Lipatov 2017-01-26 10:04:25 UTC
fix:

--- a/xlators/performance/md-cache/src/md-cache.c
+++ b/xlators/performance/md-cache/src/md-cache.c
@@ -2905,6 +2905,7 @@ init (xlator_t *this)
         GF_OPTION_INIT("cache-invalidation", conf->mdc_invalidation, bool, out);
 
         LOCK_INIT (&conf->lock);
+        LOCK_INIT (&conf->mdc_counter.lock);
         time (&conf->last_child_down);

Comment 3 Niels de Vos 2017-01-31 12:19:53 UTC
Hi Vitaly,

do you want to send a patch to our Gerrit instance for this? The steps to do so are on http://gluster.readthedocs.io/en/latest/Developer-guide/Simplified-Development-Workflow/

I have cloned this bug to get a fix in the master branch (#1417913), the patch should be sent there first.

Comment 4 Worker Ant 2017-02-02 08:45:55 UTC
REVIEW: https://review.gluster.org/16515 (add missed LOCK_INIT to fix INCREMENT_ATOMIC on conf->mdc_counter.lock when pthread_spin_* using) posted (#1) for review on master by Vitaly Lipatov (lav)

Comment 5 Vitaly Lipatov 2017-02-03 07:51:38 UTC
(In reply to Niels de Vos from comment #3)
...
> I have cloned this bug to get a fix in the master branch (#1417913), the
> patch should be sent there first.
The patch has been applied to the master branch. Do I need to do anything to backport it to the other branches?

Comment 6 Niels de Vos 2017-02-03 12:48:26 UTC
There is not much to it; it is basically cherry-picking the change on top of the other branches and modifying the commit message.

http://gluster.readthedocs.io/en/latest/Developer-guide/Backport-Guidelines/ explains the exact steps with some examples.

I'll assign this bug to you, just let me know if you run into problems or have more questions. Thanks!

Comment 7 Worker Ant 2017-02-03 13:35:47 UTC
REVIEW: https://review.gluster.org/16542 (md-cache: initialize mdc_counter.lock) posted (#1) for review on release-3.9 by Vitaly Lipatov (lav)

Comment 8 Vitaly Lipatov 2017-02-13 07:15:55 UTC
Thank you, Niels!
Is it enough to send one patch for review on release-3.9, as I did? There has been no movement on that ticket. And does the 3.10 branch need the patch to be sent manually as well?

Comment 9 Niels de Vos 2017-02-13 08:27:00 UTC
Yes, Vitaly, the change for the master branch and release-3.9 are correct. For 3.10 there is bug 1417915 that follows the same procedure.

Comment 10 Worker Ant 2017-02-13 08:28:04 UTC
REVIEW: https://review.gluster.org/16542 (md-cache: initialize mdc_counter.lock) posted (#2) for review on release-3.9 by Vitaly Lipatov (lav)

Comment 11 Kaushal 2017-03-08 12:32:48 UTC
This bug is getting closed because GlusterFS-3.9 has reached its end-of-life [1].

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please open a new bug against the newer release.

[1]: https://www.gluster.org/community/release-schedule/