Bug 1353426 - glusterd: glusterd provides stale port information when a volume is recreated with same brick path
Summary: glusterd: glusterd provides stale port information when a volume is recreated...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard:
Depends On: 1333749 1334270
Blocks:
 
Reported: 2016-07-07 04:20 UTC by Atin Mukherjee
Modified: 2016-07-08 14:42 UTC (History)
CC List: 7 users

Fixed In Version: glusterfs-3.8.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1334270
Environment:
Last Closed: 2016-07-08 14:42:35 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Atin Mukherjee 2016-07-07 04:20:08 UTC
+++ This bug was initially created as a clone of Bug #1334270 +++

+++ This bug was initially created as a clone of Bug #1333749 +++

Description of problem:
-------------------------

Had a 2*(4+2) volume with roughly a lakh (100,000) files of size 1k created from an NFS client. Ran 'ls -l | wc -l' and, at the same time, started creating files of 1g from another mountpoint. While both of the above commands were in progress, did a tier attach of a 2*2 volume. The command executed successfully, but these were the errors seen in the logs:

[2016-05-06 09:29:58.797805] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:30:00.848981] E [MSGID: 109037] [tier.c:1237:tier_process_brick] 0-tier: Failed to get journal_mode of sql db /bricks/brick1/ozone/.glusterfs/ozone.db
[2016-05-06 09:30:00.849018] E [MSGID: 109087] [tier.c:1341:tier_build_migration_qfile] 0-ozone-tier-dht: Brick /bricks/brick1/ozone/.glusterfs/ozone.db query failed
[2016-05-06 09:30:01.018505] E [MSGID: 109037] [tier.c:1394:tier_migrate_files_using_qfile] 0-tier: Failed to open /var/run/gluster/ozone-tier-dht/promote-ozone-1 to the query file [No such file or directory]
[2016-05-06 09:30:02.807200] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)

Multiple 'Connection refused' errors were seen on the other nodes:

[2016-05-06 09:29:19.434783] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:23.438925] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:27.446776] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:31.452944] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:35.460888] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:39.464874] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.7.9-3.el7rhgs.x86_64


How reproducible: Hit it once
--------------------

Sosreports will be copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

--- Additional comment from Sweta Anandpara on 2016-05-06 06:11:14 EDT ---

[qe@rhsqe-repo 1333749]$ 
[qe@rhsqe-repo 1333749]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1333749]$ 
[qe@rhsqe-repo 1333749]$ 
[qe@rhsqe-repo 1333749]$ pwd
/home/repo/sosreports/1333749
[qe@rhsqe-repo 1333749]$ 
[qe@rhsqe-repo 1333749]$ 
[qe@rhsqe-repo 1333749]$ ls -l 
total 97940
-rwxr-xr-x. 1 qe qe 17523472 May  6 15:37 sosreport-dhcp35-210.lab.eng.blr.redhat.com-20160506092326.tar.xz
-rwxr-xr-x. 1 qe qe 25732732 May  6 15:37 sosreport-sysreg-prod-20160506092321.tar.xz
-rwxr-xr-x. 1 qe qe 29254684 May  6 15:37 sosreport-sysreg-prod-20160506092323.tar.xz
-rwxr-xr-x. 1 qe qe 27769984 May  6 15:37 sosreport-sysreg-prod-20160506092325.tar.xz
[qe@rhsqe-repo 1333749]$

--- Additional comment from Joseph Elwin Fernandes on 2016-05-06 09:47:18 EDT ---

Looks like a network failure.

Depending on the version and journal mode of sqlite3, the tier migration process
does the query locally (RHEL 7) or via CTR (RHEL 6, which is not recommended).
To get the version and journal mode, tier does an IPC FOP to the brick. Since in
this case there is either a network failure or the brick is down, this call fails.
(Not sure which path the client translator selects, network or unix domain
socket, as it only sends it to the local bricks on the node; looking at the log
it seems to be a network call, as IPs and port numbers are used.)

Will look into the brick log of this node and see whether the bricks are up or not.
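
For reference, the journal mode the IPC FOP queries can also be checked manually on the brick node. A hedged sketch, not from the original report; it assumes the sqlite3 CLI is installed and uses the db path from the error messages above:

# Query the journal mode of the CTR database directly
sqlite3 /bricks/brick1/ozone/.glusterfs/ozone.db 'PRAGMA journal_mode;'
# Typical output: wal (or delete, truncate, ...)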

--- Additional comment from Joseph Elwin Fernandes on 2016-05-06 09:49:57 EDT ---

What's the vol info? I mean, the names of the bricks?

--- Additional comment from Joseph Elwin Fernandes on 2016-05-06 09:51:42 EDT ---

Is this the vol info of the setup?

type=5
count=16
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=6
redundancy_count=2
version=3
transport-type=0
volume-id=9227798a-cdd5-4ff6-ab5e-046a8434cc5e
username=15f61858-8340-4da1-aa0f-9df80581d1f0
password=3516b404-dbd0-4858-b29a-1470ad22c120
op-version=30700
client-op-version=30700
quota-version=0
cold_count=12
cold_replica_count=1
cold_disperse_count=6
cold_redundancy_count=2
hot_count=4
hot_replica_count=2
hot_type=2
cold_type=4
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.tier-mode=cache
features.ctr-enabled=on
performance.readdir-ahead=on
brick-0=10.70.35.13:-bricks-brick3-ozone_tier
brick-1=10.70.35.137:-bricks-brick3-ozone_tier
brick-2=10.70.35.85:-bricks-brick3-ozone_tier
brick-3=10.70.35.210:-bricks-brick3-ozone_tier
brick-4=10.70.35.210:-bricks-brick0-ozone
brick-5=10.70.35.85:-bricks-brick0-ozone
brick-6=10.70.35.137:-bricks-brick0-ozone
brick-7=10.70.35.13:-bricks-brick0-ozone
brick-8=10.70.35.210:-bricks-brick1-ozone
brick-9=10.70.35.85:-bricks-brick1-ozone
brick-10=10.70.35.137:-bricks-brick1-ozone
brick-11=10.70.35.13:-bricks-brick1-ozone
brick-12=10.70.35.210:-bricks-brick2-ozone
brick-13=10.70.35.85:-bricks-brick2-ozone
brick-14=10.70.35.137:-bricks-brick2-ozone
brick-15=10.70.35.13:-bricks-brick2-ozone

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-05-06 17:48:24 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Sweta Anandpara on 2016-05-08 23:37:30 EDT ---

Yes, it is. Had a 2*(4+2) volume (ozone) as the cold tier and a 2*2 volume (ozone_tier) as the hot tier.

--- Additional comment from Sweta Anandpara on 2016-05-08 23:45:36 EDT ---

I have the setup, if it needs to be looked at.

The hypervisor was impacted in the rack replacement/lab shutdown that took place last weekend. My setup is back online now and can be accessed at 10.70.35.210; I will share the password over email.

[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# gluster v info
 
Volume Name: ozone
Type: Tier
Volume ID: 9227798a-cdd5-4ff6-ab5e-046a8434cc5e
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.35.13:/bricks/brick3/ozone_tier
Brick2: 10.70.35.137:/bricks/brick3/ozone_tier
Brick3: 10.70.35.85:/bricks/brick3/ozone_tier
Brick4: 10.70.35.210:/bricks/brick3/ozone_tier
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.35.210:/bricks/brick0/ozone
Brick6: 10.70.35.85:/bricks/brick0/ozone
Brick7: 10.70.35.137:/bricks/brick0/ozone
Brick8: 10.70.35.13:/bricks/brick0/ozone
Brick9: 10.70.35.210:/bricks/brick1/ozone
Brick10: 10.70.35.85:/bricks/brick1/ozone
Brick11: 10.70.35.137:/bricks/brick1/ozone
Brick12: 10.70.35.13:/bricks/brick1/ozone
Brick13: 10.70.35.210:/bricks/brick2/ozone
Brick14: 10.70.35.85:/bricks/brick2/ozone
Brick15: 10.70.35.137:/bricks/brick2/ozone
Brick16: 10.70.35.13:/bricks/brick2/ozone
Options Reconfigured:
performance.readdir-ahead: on
features.ctr-enabled: on
cluster.tier-mode: cache
[root@dhcp35-210 ~]#

--- Additional comment from Atin Mukherjee on 2016-05-09 01:57:00 EDT ---

I have an initial RCA for why the client was trying to connect to the stale port.

A brick process initiates a SIGNOUT event from cleanup_and_exit(), which is called only in the graceful shutdown case. If a brick process is brought down with kill -9 semantics, glusterd doesn't receive this event, which means the stale port details remain in the data structure. Since the port search logic starts from base_port and goes up to last_alloc, glusterd will provide the older stale details instead of the new ones, resulting in this failure.

I am thinking of modifying the port map search logic to go from last_alloc down to base_port, so that we always pick up the fresh entries and eliminate clashes with older entries.

The question here is why we are using kill -9 instead of kill -15 to test the brick-down scenario.
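
For context, one way to observe the mismatch on a live node is to compare the port glusterd advertises against the port the brick actually holds. A hedged sketch, not from the original report; the volume name and grep pattern are placeholders:

# Port glusterd advertises for each brick (served from its pmap table)
gluster volume status ozone

# Ports the brick processes are actually listening on
ss -tlnp | grep glusterfsd

# After a kill -9 followed by a recreate, the two can disagree: glusterd
# may still hand out the old port (e.g. 49152) while the new brick
# listens on 49153.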

--- Additional comment from Atin Mukherjee on 2016-05-09 04:50:39 EDT ---

I confirmed with Sweta that the same volume was stopped, deleted, and recreated with the same brick path, and that kill -9 was used to bring down the brick process.

--- Additional comment from Vijay Bellur on 2016-05-09 06:23:00 EDT ---

REVIEW: http://review.gluster.org/14268 (glusterd: search port from last_alloc to base_port) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2016-07-05 07:43:24 EDT ---

COMMIT: http://review.gluster.org/14268 committed in master by Jeff Darcy (jdarcy) 
------
commit 967a77ed4db0e1c0bcc23f132e312b659ce961ef
Author: Atin Mukherjee <amukherj>
Date:   Mon May 9 12:14:37 2016 +0530

    glusterd: search port from last_alloc to base_port
    
    If a brick process is killed ungracefully then GlusterD wouldn't receive a
    PMAP_SIGNOUT event, and hence the stale port details wouldn't be removed.
    
    Now consider the following case:
    1. Create a volume with 1 brick
    2. Start the volume (say brick port allocated is 49152)
    3. Kill the brick process by 'kill -9'
    4. Stop & delete the volume
    5. Recreate the volume and start it. (Now the brick port gets 49153)
    6. Mount the volume
    
    Now in step 6 the mount will fail, as GlusterD will hand back the stale port
    number given that the query starts searching from the base_port.
    
    Solution:
    
    To avoid this, searching for port from last_alloc and coming down to base_port
    should solve the issue.
    
    Change-Id: I9afafd722a7fda0caac4cc892605f4e7c0e48e73
    BUG: 1334270
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/14268
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Samikshan Bairagya <samikshan>
    Reviewed-by: Jeff Darcy <jdarcy>
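
The six steps in the commit message map onto roughly the following CLI sequence. A sketch under stated assumptions, not taken from the bug: the host name, brick path, mount point and exact ports are placeholders, and 'force' may or may not be needed depending on where the brick directory lives:

# 1-2. Create and start a single-brick volume; note the brick port
gluster volume create testvol host1:/bricks/b1/testvol force
gluster volume start testvol
gluster volume status testvol        # say the brick got port 49152

# 3. Kill the brick ungracefully, so glusterd never sees PMAP_SIGNOUT
kill -9 $(pgrep -f 'glusterfsd.*testvol')

# 4. Stop and delete the volume (--mode=script skips confirmation prompts)
gluster --mode=script volume stop testvol
gluster --mode=script volume delete testvol

# 5. Recreate and start it; the new brick gets the next port (say 49153)
gluster volume create testvol host1:/bricks/b1/testvol force
gluster volume start testvol

# 6. Without the fix, glusterd hands the client the stale port (49152)
#    and the mount fails with 'Connection refused'
mount -t glusterfs host1:/testvol /mnt/testvol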

Comment 1 Vijay Bellur 2016-07-07 04:21:35 UTC
REVIEW: http://review.gluster.org/14867 (glusterd: search port from last_alloc to base_port) posted (#1) for review on release-3.8 by Atin Mukherjee (amukherj)

Comment 2 Vijay Bellur 2016-07-07 13:51:31 UTC
COMMIT: http://review.gluster.org/14867 committed in release-3.8 by Jeff Darcy (jdarcy) 
------
commit f4044fa4ff75389de0cf17008179d55ac0a15b33
Author: Atin Mukherjee <amukherj>
Date:   Mon May 9 12:14:37 2016 +0530

    glusterd: search port from last_alloc to base_port
    
    Backport of http://review.gluster.org/14268
    
    If a brick process is killed ungracefully then GlusterD wouldn't receive a
    PMAP_SIGNOUT event, and hence the stale port details wouldn't be removed.
    
    Now consider the following case:
    1. Create a volume with 1 brick
    2. Start the volume (say brick port allocated is 49152)
    3. Kill the brick process by 'kill -9'
    4. Stop & delete the volume
    5. Recreate the volume and start it. (Now the brick port gets 49153)
    6. Mount the volume
    
    Now in step 6 the mount will fail, as GlusterD will hand back the stale port
    number given that the query starts searching from the base_port.
    
    Solution:
    
    To avoid this, searching for port from last_alloc and coming down to base_port
    should solve the issue.
    
    >Change-Id: I9afafd722a7fda0caac4cc892605f4e7c0e48e73
    >BUG: 1334270
    >Signed-off-by: Atin Mukherjee <amukherj>
    >Reviewed-on: http://review.gluster.org/14268
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Samikshan Bairagya <samikshan>
    >Reviewed-by: Jeff Darcy <jdarcy>
    
    Change-Id: I9afafd722a7fda0caac4cc892605f4e7c0e48e73
    BUG: 1353426
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/14867
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kaushal M <kaushal>

Comment 3 Niels de Vos 2016-07-08 14:42:35 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.1, please open a new bug report.

glusterfs-3.8.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.packaging/156
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

