Bug 989448

Summary: Dist-geo-rep: Stale glusterfs mount process in slave after geo-rep session stop and delete
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: geo-replication
Reporter: M S Vishwanath Bhat <vbhat>
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
QA Contact: shilpa <smanjara>
Status: CLOSED EOL
Severity: medium
Priority: high
Version: 2.1
CC: avishwan, chrisw, csaba, mzywusko, rhs-bugs, rwheeler, vagarwal
Hardware: x86_64
OS: Linux
Whiteboard: usability
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-11-25 08:48:59 UTC
Attachments:
glusterfs mount process logfile (flags: none)
glusterd log file from the slave (flags: none)

Description M S Vishwanath Bhat 2013-07-29 09:44:54 UTC
Created attachment 779710 [details]
glusterfs mount process logfile

Description of problem:
After stopping and then deleting the geo-replication session from the master volume, a stale glusterfs (FUSE) mount process is left behind on the slave.


Version-Release number of selected component (if applicable):
glusterfs-3.4.0.12rhs.beta6-1.el6rhs.x86_64

How reproducible:
Hit once.

Steps to Reproduce:
1. Create and start master and slave volumes.
2. Create and start a geo-rep session between master and slave.
3. After the data has synced, stop the geo-rep session and then delete it (a CLI sketch of these steps follows the list).
4. Stop the master and slave volumes.
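
For reference, the four steps above correspond roughly to the following CLI sequence. This is a minimal sketch: the volume names, brick paths, and host names are hypothetical, and the exact geo-replication syntax may differ slightly between releases.

# 1. Create and start the master and slave volumes
#    (hypothetical names and brick paths).
gluster volume create master-vol masterhost:/bricks/master-brick
gluster volume start master-vol
gluster volume create slave-vol slavehost:/bricks/slave-brick    # on the slave cluster
gluster volume start slave-vol

# 2. Create and start a geo-rep session between master and slave.
gluster volume geo-replication master-vol slavehost::slave-vol create push-pem
gluster volume geo-replication master-vol slavehost::slave-vol start

# 3. After the data has synced, stop the session and then delete it.
gluster volume geo-replication master-vol slavehost::slave-vol stop
gluster volume geo-replication master-vol slavehost::slave-vol delete

# 4. Stop the master and slave volumes.
gluster volume stop master-vol
gluster volume stop slave-vol    # on the slave cluster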

Actual results:
On one of the slave nodes:

[root@falcon ~]# ps -aef | grep gluster
root     18992     1  0 Jul24 ?        00:00:30 /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
root     19334     1  0 Jul24 ?        00:00:22 glusterfs -s localhost --xlator-option=*dht.lookup-unhashed=off --volfile-id hosa-slave -l /var/log/glusterfs/geo-replication-slaves/slave.log /tmp/tmp.GoL7yiyMEf
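
As a manual workaround (not a fix), the stale mount can be cleaned up on the affected slave node. A sketch, assuming the temporary mountpoint and PID from the ps output above:

# Confirm the auxiliary mount is still present.
grep /tmp/tmp.GoL7yiyMEf /proc/mounts

# Lazily unmount it; the glusterfs client should exit once the mount is gone.
umount -l /tmp/tmp.GoL7yiyMEf

# If the process still lingers, terminate it by PID (19334 in this instance)
# and verify that nothing stale remains.
kill 19334
ps -ef | grep '[g]lusterfs'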


Expected results:
There should be no stale glusterfs mount processes left on the slave after the session is deleted.


Additional info:

The following error messages were seen in the log file of the stale slave mount process.


[2013-07-26 06:11:41.820108] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2013-07-26 06:11:41.829570] I [glusterfsd-mgmt.c:1545:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2013-07-26 06:11:41.829676] I [glusterfsd-mgmt.c:1545:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2013-07-27 21:46:01.493633] I [glusterfsd.c:1128:reincarnate] 0-glusterfsd: Fetching the volume file from server...
[2013-07-27 21:46:01.494777] I [glusterfsd-mgmt.c:1545:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2013-07-29 09:27:50.114288] W [socket.c:522:__socket_rwv] 0-hosa-slave-client-0: readv on 10.70.43.183:49152 failed (No data available)
[2013-07-29 09:27:50.114380] I [client.c:2103:client_rpc_notify] 0-hosa-slave-client-0: disconnected from 10.70.43.183:49152. Client process will keep trying to connect to glusterd until brick's port is available. 
[2013-07-29 09:27:52.248986] W [socket.c:522:__socket_rwv] 0-hosa-slave-client-3: readv on 10.70.42.219:49152 failed (No data available)
[2013-07-29 09:27:52.249323] I [client.c:2103:client_rpc_notify] 0-hosa-slave-client-3: disconnected from 10.70.42.219:49152. Client process will keep trying to connect to glusterd until brick's port is available. 
[2013-07-29 09:27:52.249827] W [socket.c:522:__socket_rwv] 0-hosa-slave-client-1: readv on 10.70.43.197:49152 failed (No data available)
[2013-07-29 09:27:52.250122] I [client.c:2103:client_rpc_notify] 0-hosa-slave-client-1: disconnected from 10.70.43.197:49152. Client process will keep trying to connect to glusterd until brick's port is available. 
[2013-07-29 09:27:52.250168] E [afr-common.c:3822:afr_notify] 0-hosa-slave-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2013-07-29 09:27:52.251311] W [socket.c:522:__socket_rwv] 0-hosa-slave-client-2: readv on 10.70.43.48:49152 failed (No data available)
[2013-07-29 09:27:52.251561] I [client.c:2103:client_rpc_notify] 0-hosa-slave-client-2: disconnected from 10.70.43.48:49152. Client process will keep trying to connect to glusterd until brick's port is available. 
[2013-07-29 09:27:52.251608] E [afr-common.c:3822:afr_notify] 0-hosa-slave-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2013-07-29 09:28:00.427878] E [client-handshake.c:1741:client_query_portmap_cbk] 0-hosa-slave-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2013-07-29 09:28:00.428047] I [client.c:2103:client_rpc_notify] 0-hosa-slave-client-0: disconnected from 10.70.43.183:24007. Client process will keep trying to connect to glusterd until brick's port is available. 
[2013-07-29 09:28:02.437290] E [client-handshake.c:1741:client_query_portmap_cbk] 0-hosa-slave-client-3: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2013-07-29 09:28:02.437569] I [client.c:2103:client_rpc_notify] 0-hosa-slave-client-3: disconnected from 10.70.42.219:24007. Client process will keep trying to connect to glusterd until brick's port is available. 
[2013-07-29 09:28:02.443829] E [client-handshake.c:1741:client_query_portmap_cbk] 0-hosa-slave-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2013-07-29 09:28:02.444072] I [client.c:2103:client_rpc_notify] 0-hosa-slave-client-1: disconnected from 10.70.43.197:24007. Client process will keep trying to connect to glusterd until brick's port is available. 
[2013-07-29 09:28:02.449285] E [client-handshake.c:1741:client_query_portmap_cbk] 0-hosa-slave-client-2: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2013-07-29 09:28:02.449493] I [client.c:2103:client_rpc_notify] 0-hosa-slave-client-2: disconnected from 10.70.43.48:24007. Client process will keep trying to connect to glusterd until brick's port is available. 
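
The portmap errors above are consistent with the slave volume's bricks having been stopped while the stale client kept retrying. The check that the log itself suggests, using the volume name from this setup:

# Run on a slave node; with the volume stopped, no brick processes
# should be listed as online.
gluster volume status hosa-slave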



I have attached the glusterfs mount log and glusterd log from that slave node.

Comment 1 M S Vishwanath Bhat 2013-07-29 09:45:44 UTC
Created attachment 779711 [details]
glusterd log file from the slave

Comment 3 Amar Tumballi 2013-08-01 10:09:40 UTC
Csaba, can you please take a look at this ASAP? We need all times in the code to be in UTC (so that failover-failback is seamless).

Comment 4 Amar Tumballi 2013-08-01 10:11:58 UTC
The above comment was meant for bug 987272.

Comment 5 M S Vishwanath Bhat 2013-08-07 09:02:26 UTC
I hit this again with glusterfs-3.4.0.15rhs.

Comment 6 Aravinda VK 2014-12-24 07:03:00 UTC
We need to check the behavior with the latest 2.1 and 3.0 RPMs.

Comment 7 Aravinda VK 2015-11-25 08:48:59 UTC
Closing this bug since the RHGS 2.1 release has reached EOL. The required bugs have been cloned to RHGS 3.1. Please reopen this issue if it is seen again.
