Bug 1333749 - glusterd: glusterd provides stale port information when a volume is recreated with same brick path
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Atin Mukherjee
QA Contact: Byreddy
URL:
Whiteboard:
Depends On:
Blocks: 1334270 1351522 1353426
 
Reported: 2016-05-06 10:04 UTC by Sweta Anandpara
Modified: 2017-03-23 05:29 UTC (History)
8 users

Fixed In Version: glusterfs-3.8.4-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1334270 (view as bug list)
Environment:
Last Closed: 2017-03-23 05:29:56 UTC


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Sweta Anandpara 2016-05-06 10:04:27 UTC
Description of problem:
-------------------------

Had a 2*(4+2) volume with roughly 100,000 files (of size 1 KB) created from an NFS client. Ran 'ls -l | wc -l', and at the same time started creating 1 GB files from another mountpoint. While both of the above commands were in progress, did a tier-attach of a 2*2 volume. The command executed successfully, but these errors were seen in the logs:

[2016-05-06 09:29:58.797805] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:30:00.848981] E [MSGID: 109037] [tier.c:1237:tier_process_brick] 0-tier: Failed to get journal_mode of sql db /bricks/brick1/ozone/.glusterfs/ozone.db
[2016-05-06 09:30:00.849018] E [MSGID: 109087] [tier.c:1341:tier_build_migration_qfile] 0-ozone-tier-dht: Brick /bricks/brick1/ozone/.glusterfs/ozone.db query failed
[2016-05-06 09:30:01.018505] E [MSGID: 109037] [tier.c:1394:tier_migrate_files_using_qfile] 0-tier: Failed to open /var/run/gluster/ozone-tier-dht/promote-ozone-1 to the query file [No such file or directory]
[2016-05-06 09:30:02.807200] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)

Multiple connection-refused errors were seen on the other nodes:

[2016-05-06 09:29:19.434783] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:23.438925] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:27.446776] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:31.452944] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:35.460888] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:39.464874] E [socket.c:2279:socket_connect_finish] 0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.7.9-3.el7rhgs.x86_64


How reproducible: Hit it once
--------------------

Sosreports will be copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

Comment 2 Joseph Elwin Fernandes 2016-05-06 13:47:18 UTC
Looks like a network failure.

Depending on the sqlite3 version and journal mode, the tier migration process runs the query either locally (RHEL 7) or via CTR (RHEL 6, which is not recommended). To get the version and journal mode, tier does an IPC FOP to the brick. This call fails either because of a network failure (not sure which path the client translator selects, network or unix domain socket, since it only sends to the local bricks on the node; looking at the log it seems to be a network call, as IPs and port numbers are used) or because the brick is down.

Will look into the brick log of this node and see whether the bricks are up or not.

Comment 3 Joseph Elwin Fernandes 2016-05-06 13:49:57 UTC
What is the vol info? I mean, what are the names of the bricks?

Comment 4 Joseph Elwin Fernandes 2016-05-06 13:51:42 UTC
Is this the vol info of the setup?

type=5
count=16
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=6
redundancy_count=2
version=3
transport-type=0
volume-id=9227798a-cdd5-4ff6-ab5e-046a8434cc5e
username=15f61858-8340-4da1-aa0f-9df80581d1f0
password=3516b404-dbd0-4858-b29a-1470ad22c120
op-version=30700
client-op-version=30700
quota-version=0
cold_count=12
cold_replica_count=1
cold_disperse_count=6
cold_redundancy_count=2
hot_count=4
hot_replica_count=2
hot_type=2
cold_type=4
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.tier-mode=cache
features.ctr-enabled=on
performance.readdir-ahead=on
brick-0=10.70.35.13:-bricks-brick3-ozone_tier
brick-1=10.70.35.137:-bricks-brick3-ozone_tier
brick-2=10.70.35.85:-bricks-brick3-ozone_tier
brick-3=10.70.35.210:-bricks-brick3-ozone_tier
brick-4=10.70.35.210:-bricks-brick0-ozone
brick-5=10.70.35.85:-bricks-brick0-ozone
brick-6=10.70.35.137:-bricks-brick0-ozone
brick-7=10.70.35.13:-bricks-brick0-ozone
brick-8=10.70.35.210:-bricks-brick1-ozone
brick-9=10.70.35.85:-bricks-brick1-ozone
brick-10=10.70.35.137:-bricks-brick1-ozone
brick-11=10.70.35.13:-bricks-brick1-ozone
brick-12=10.70.35.210:-bricks-brick2-ozone
brick-13=10.70.35.85:-bricks-brick2-ozone
brick-14=10.70.35.137:-bricks-brick2-ozone
brick-15=10.70.35.13:-bricks-brick2-ozone

Comment 6 Sweta Anandpara 2016-05-09 03:37:30 UTC
Yes, it is. Had a 2*(4+2) volume (ozone) as the cold tier and a 2*2 volume (ozone_tier) as the hot tier.

Comment 8 Atin Mukherjee 2016-05-09 05:57:00 UTC
I have an initial RCA for why the client was trying to connect to the stale port.

A brick process initiates a SIGNOUT event from cleanup_and_exit(), which is called only on a graceful shutdown. If a brick process is brought down with kill -9 semantics, glusterd never receives this event, which means the stale port details remain in the data structure. Since the port search logic starts from base_port and goes up to last_alloc, glusterd provides the older, stale entry instead of the new one in this case, resulting in this failure.

I am thinking of modifying the port map search logic to go from last_alloc down to base_port, so that we always pick up the fresh entries and eliminate the case of clashing with older entries.

The question here is: why are we using kill -9 instead of kill -15 to test the brick-down scenario?

Comment 9 Atin Mukherjee 2016-05-09 08:50:39 UTC
And I confirmed with Sweta that the same volume was stopped, deleted, and recreated with the same brick path, and that kill -9 was used to bring down the brick process.

Comment 10 Atin Mukherjee 2016-05-09 10:24:42 UTC
Upstream patch http://review.gluster.org/14268 posted for review.

Comment 11 Sweta Anandpara 2016-05-11 06:44:50 UTC
Noted. kill -15 will be used henceforth for the brick-down scenario. Thanks for the info.

Comment 13 Atin Mukherjee 2016-09-17 14:41:27 UTC
Upstream mainline : http://review.gluster.org/14268
Upstream 3.8 : http://review.gluster.org/14867

And the fix is available in rhgs-3.2.0 as part of rebase to GlusterFS 3.8.4.

Comment 18 Byreddy 2016-10-04 05:02:41 UTC
Verified this bug using the build - glusterfs-3.8.4-2.

The fix is working well and the reported issue is not seen. The steps executed to test it are as follows:

Steps:
1. Created a volume using 1 brick and started it.
2. Killed the brick process using "kill -9".
3. Stopped and deleted the volume.
4. Recreated the volume using the same brick path and the same volume name.
5. Did a fuse mount of the volume // it mounted successfully
6. Checked for errors in the mount and glusterd logs; none of the errors mentioned in the description section were found.


Note: I was able to reproduce this issue in 3.1.3 with the above steps.


With the above details, moving this bug to the verified state.

Comment 20 errata-xmlrpc 2017-03-23 05:29:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

