Bug 1247833 - sharding - OS installation on vm image hangs on a sharded volume
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: sharding
Version: 3.7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Krutika Dhananjay
QA Contact: bugs@gluster.org
Keywords: Triaged
Depends On: 1247108
Blocks: glusterfs-3.7.4
Reported: 2015-07-28 23:49 EDT by Krutika Dhananjay
Modified: 2015-09-09 05:38 EDT (History)
CC List: 1 user

Fixed In Version: glusterfs-3.7.4
Doc Type: Bug Fix
Clone Of: 1247108
Last Closed: 2015-09-09 05:38:40 EDT
Type: Bug

Attachments: None
Description Krutika Dhananjay 2015-07-28 23:49:42 EDT
+++ This bug was initially created as a clone of Bug #1247108 +++

Description of problem:
OS installation on a VM image on a sharded volume hangs at some point.

Statedumps taken on the FUSE client at several points reveal that a readv() fop is hung:

<statedump>
...
...

[global.callpool.stack.1.frame.10]
frame=0x7f0b0bcfd150
ref_count=0
translator=dis-rep-shard
complete=0                  <==== complete is 0.
parent=dis-rep-trace
wind_from=trace_readv
wind_to=FIRST_CHILD(this)->fops->readv
unwind_to=trace_readv_cbk

...
...

[global.callpool.stack.1.frame.14]
frame=0x7f0b0bcd6f40
ref_count=1
translator=dis-rep
complete=0                <======== complete is 0
parent=fuse
wind_from=fuse_readv_resume
wind_to=FIRST_CHILD(this)->fops->readv
unwind_to=fuse_readv_cbk
...
...
</statedump>

This was found to be caused by call_count getting decremented to -1 at the end of shard_common_lookup_shards(), as a result of which this particular stack never gets unwound all the way back to FUSE:

(gdb) p (call_frame_t *)0x7f0b0bcfd150
$1 = (call_frame_t *) 0x7f0b0bcfd150
(gdb) p (shard_local_t *)$1->local
$2 = (shard_local_t *) 0x7f0b0086310c
(gdb) p $2->call_count
$3 = -1
(gdb) p $2->eexist_count 
$4 = 1



--- Additional comment from Krutika Dhananjay on 2015-07-28 03:37:32 EDT ---

http://review.gluster.org/#/c/11770/

--- Additional comment from Anand Avati on 2015-07-28 09:44:12 EDT ---

REVIEW: http://review.gluster.org/11778 (features/shard: Fix block size get from xdata) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

--- Additional comment from Anand Avati on 2015-07-28 21:53:52 EDT ---

COMMIT: http://review.gluster.org/11770 committed in master by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit d051bd14223d12ca8eaea85f6988ff41e5eef2c3
Author: Krutika Dhananjay <kdhananj@redhat.com>
Date:   Tue Jul 28 11:25:55 2015 +0530

    features/shard: (Re)initialize local->call_count before winding lookup
    
    Change-Id: I616409c38b86c0acf1817b3472a1fed73db293f8
    BUG: 1247108
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-on: http://review.gluster.org/11770
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
Comment 1 Anand Avati 2015-07-28 23:54:39 EDT
REVIEW: http://review.gluster.org/11783 (features/shard: (Re)initialize local->call_count before winding lookup) posted (#1) for review on release-3.7 by Krutika Dhananjay (kdhananj@redhat.com)
Comment 2 Anand Avati 2015-07-29 06:43:44 EDT
REVIEW: http://review.gluster.org/11789 (features/shard: Fix block size get from xdata) posted (#1) for review on release-3.7 by Krutika Dhananjay (kdhananj@redhat.com)
Comment 4 Anand Avati 2015-07-29 06:49:16 EDT
REVIEW: http://review.gluster.org/11783 (features/shard: (Re)initialize local->call_count before winding lookup) posted (#2) for review on release-3.7 by Krutika Dhananjay (kdhananj@redhat.com)
Comment 5 Anand Avati 2015-07-30 03:30:02 EDT
COMMIT: http://review.gluster.org/11789 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit 044a5623eb9af8e6f52ed2dd02f0f07d23479638
Author: Pranith Kumar K <pkarampu@redhat.com>
Date:   Tue Jul 28 18:38:56 2015 +0530

    features/shard: Fix block size get from xdata
    
            Backport of: http://review.gluster.org/11778
    
    Instead of using dict_get_ptr, dict_get_uint64 was used. If the first byte of
    the value is '\0' then size is returned as 0 because strtoull is used in
    data_to_uint64. This will make it seem like the file is not sharded at all.
    
    Original author: Pranith Kumar K <pkarampu@redhat.com>
    Change-Id: Id07a7d9523cb29d096b65dd68bbfcef395031aef
    BUG: 1247833
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-on: http://review.gluster.org/11789
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Comment 6 Anand Avati 2015-07-30 03:37:06 EDT
REVIEW: http://review.gluster.org/11802 (features/shard: Create /.shard with 0777 permissions (for now)) posted (#1) for review on release-3.7 by Krutika Dhananjay (kdhananj@redhat.com)
Comment 7 Anand Avati 2015-07-30 14:20:07 EDT
REVIEW: http://review.gluster.org/11810 (cluster/afr: Make [f]xattrop metadata transaction) posted (#1) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu@redhat.com)
Comment 8 Anand Avati 2015-07-31 01:52:16 EDT
REVIEW: http://review.gluster.org/11802 (features/shard: Create /.shard with 0777 permissions (for now)) posted (#2) for review on release-3.7 by Krutika Dhananjay (kdhananj@redhat.com)
Comment 9 Kaushal 2015-09-09 05:38:40 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
