+++ This bug was initially created as a clone of Bug #1315560 +++
Description of problem:
http://www.gluster.org/pipermail/gluster-devel/2016-March/048568.html
https://build.gluster.org/job/rackspace-regression-2GB-triggered/18872/consoleFull
https://build.gluster.org/job/rackspace-regression-2GB-triggered/18793/console
I have set the author to the author of the script to begin with.
--- Additional comment from Vijay Bellur on 2016-03-07 23:52:23 EST ---
REVIEW: http://review.gluster.org/13632 (tests: Move tier-file-create.t to bad tests) posted (#1) for review on master by Krutika Dhananjay (kdhananj)
--- Additional comment from Vijay Bellur on 2016-03-08 01:45:07 EST ---
REVIEW: http://review.gluster.org/13632 (tests: Move tier-file-create.t to bad tests) posted (#2) for review on master by Krutika Dhananjay (kdhananj)
--- Additional comment from Vijay Bellur on 2016-03-08 06:29:20 EST ---
REVIEW: http://review.gluster.org/13632 (tests: Move tier-file-create.t to bad tests) posted (#3) for review on master by Krutika Dhananjay (kdhananj)
--- Additional comment from Vijay Bellur on 2016-03-08 15:00:44 EST ---
COMMIT: http://review.gluster.org/13632 committed in master by Jeff Darcy (jdarcy)
------
commit 66d62edd08be5701407e4adcb153a676702ff8b8
Author: Krutika Dhananjay <kdhananj>
Date: Tue Mar 8 10:21:14 2016 +0530
tests: Move tier-file-create.t to bad tests
Change-Id: Iaddb244699b0e2647a67a75f257e4c47e0e69e0d
BUG: 1315560
Signed-off-by: Krutika Dhananjay <kdhananj>
Reviewed-on: http://review.gluster.org/13632
Smoke: Gluster Build System <jenkins.com>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.com>
Reviewed-by: Dan Lambright <dlambrig>
Reviewed-by: Jeff Darcy <jdarcy>
--- Additional comment from Vijay Bellur on 2016-03-11 05:48:50 EST ---
REVIEW: http://review.gluster.org/13680 (cluster/ec: Do not ref dictionary in lookup) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)
--- Additional comment from Vijay Bellur on 2016-03-14 07:40:03 EDT ---
COMMIT: http://review.gluster.org/13680 committed in master by Xavier Hernandez (xhernandez)
------
commit 64cba025b13aad7fb3020a04930cfa22fbfcb859
Author: Pranith Kumar K <pkarampu>
Date: Tue Mar 8 23:05:08 2016 +0530
cluster/ec: Do not ref dictionary in lookup
Problem:
1) dict_for_each loops over the elements without taking any locks, so the
members of the dictionary can be ref'd/unref'd while dict_for_each is being
executed by another thread, leading to crashes.
Consider a tiered volume with distributed ec as the cold tier and distributed
replicate as the hot tier. Tier sends a lookup which fails on ec. (By this
time the dict already contains ec xattrs.) After this, the lookup_everywhere
code path is hit in tier, which triggers the hashed lookup on each of the
distributes; this fails as well, which leads to the cold and hot dht's
lookup_everywhere running in two parallel epoll threads. In ec, when it tries
to set trusted.ec.version/dirty/size as keys in the dictionary, the older
values stored against the same keys get erased. While this erasing is going
on, if the thread that is doing the lookup on afr's subvolume accesses these
keys, either in dict_copy_with_ref or in the client xlator trying to
serialize, it can lead to either a crash or a hang, depending on whether the
spin/mutex lock is called on invalid memory.
2) EC deletes GF_CONTENT_KEY from the dictionary; this may lead to extra reads
in case of lookup-everywhere for tiered volumes.
Fix:
Do dict_copy_with_ref() for the lookup-dictionary.
This avoids the 1st problem rather than actually fixing it.
The 2nd problem will be fixed.
Change-Id: I5427aa14c48cb7572977d4de9a28c5ffff2b4b95
BUG: 1315560
Signed-off-by: Pranith Kumar K <pkarampu>
Reviewed-on: http://review.gluster.org/13680
Smoke: Gluster Build System <jenkins.com>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.com>
Reviewed-by: Xavier Hernandez <xhernandez>
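The idea behind the fix in the commit above (work on a copy of the lookup dictionary instead of the shared one) can be illustrated with a small single-threaded sketch. Everything here is hypothetical: toy_dict_t and the toy_* helpers are stand-ins written for this example and are not the GlusterFS dict_t API or dict_copy_with_ref(), and "example.content" is a placeholder for whatever key GF_CONTENT_KEY names. The sketch only shows that setting the trusted.ec.* keys and deleting the content key on a private copy leaves the caller's dictionary untouched; it does not model the locking or the epoll threads.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a request dictionary. Hypothetical; NOT GlusterFS's dict_t. */
#define TOY_MAX_PAIRS 8

typedef struct {
    const char *key;   /* points at a string literal in this sketch */
    char value[32];
} toy_pair_t;

typedef struct {
    toy_pair_t pairs[TOY_MAX_PAIRS];
    int count;
} toy_dict_t;

/* Return the index of key, or -1 if it is not present. */
static int toy_dict_find(const toy_dict_t *d, const char *key)
{
    for (int i = 0; i < d->count; i++)
        if (strcmp(d->pairs[i].key, key) == 0)
            return i;
    return -1;
}

/* Set key to value, overwriting any older value stored under the same key. */
static int toy_dict_set(toy_dict_t *d, const char *key, const char *value)
{
    int i = toy_dict_find(d, key);
    if (i < 0) {
        if (d->count == TOY_MAX_PAIRS)
            return -1;              /* dictionary is full */
        i = d->count++;
        d->pairs[i].key = key;
    }
    snprintf(d->pairs[i].value, sizeof(d->pairs[i].value), "%s", value);
    return 0;
}

/* Remove key if present by moving the last pair into its slot. */
static void toy_dict_del(toy_dict_t *d, const char *key)
{
    int i = toy_dict_find(d, key);
    if (i >= 0)
        d->pairs[i] = d->pairs[--d->count];
}

/* Return the value stored under key, or NULL if it is not present. */
static const char *toy_dict_get(const toy_dict_t *d, const char *key)
{
    int i = toy_dict_find(d, key);
    return i < 0 ? NULL : d->pairs[i].value;
}

/*
 * Hypothetical lookup path: take a private copy of the shared request
 * dictionary and make all modifications on the copy only.
 * The caller owns the returned copy and must free() it.
 */
static toy_dict_t *toy_lookup_on_copy(const toy_dict_t *shared)
{
    toy_dict_t *priv = malloc(sizeof(*priv));
    if (priv == NULL)
        return NULL;
    *priv = *shared;                              /* copy, do not share */
    toy_dict_set(priv, "trusted.ec.version", "");
    toy_dict_set(priv, "trusted.ec.dirty", "");
    toy_dict_set(priv, "trusted.ec.size", "");
    toy_dict_del(priv, "example.content");        /* placeholder key */
    return priv;
}

/* Small self-check: returns 0 if the shared dictionary stayed untouched. */
int toy_demo(void)
{
    toy_dict_t shared = {0};
    assert(toy_dict_set(&shared, "example.content", "4096") == 0);

    toy_dict_t *priv = toy_lookup_on_copy(&shared);
    if (priv == NULL)
        return -1;

    int ok = shared.count == 1 &&
             toy_dict_get(&shared, "example.content") != NULL &&
             toy_dict_get(&shared, "trusted.ec.version") == NULL &&
             toy_dict_get(priv, "trusted.ec.size") != NULL &&
             toy_dict_get(priv, "example.content") == NULL;
    free(priv);
    return ok ? 0 : 1;
}
```

Calling toy_demo() from a main() is enough to exercise the sketch. As the commit message itself notes, copying sidesteps the unlocked-iteration problem rather than fixing it, so this pattern should be read as an avoidance technique only.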
Comment 8 Nag Pavan Chilakam
2016-05-23 07:35:17 UTC
QA Validation:
As discussed with dev, I ran the file in a continuous loop 100 times and didn't hit any core.
Hence moving to verified.
[root@dhcp35-191 /]# rpm -qa|grep gluster
glusterfs-fuse-3.7.9-5.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-5.el7rhgs.x86_64
glusterfs-3.7.9-5.el7rhgs.x86_64
glusterfs-server-3.7.9-5.el7rhgs.x86_64
glusterfs-api-3.7.9-5.el7rhgs.x86_64
python-gluster-3.7.9-5.el7rhgs.noarch
glusterfs-libs-3.7.9-5.el7rhgs.x86_64
glusterfs-cli-3.7.9-5.el7rhgs.x86_64
[root@dhcp35-191 /]#
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2016:1240