Bug 1000380 - unable to start volume after geo rep session stop, volume stop force and rpm upgrade
Summary: unable to start volume after geo rep session stop, volume stop force and rpm ...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-08-23 10:32 UTC by Rachana Patel
Modified: 2015-11-25 08:50 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-25 08:49:04 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Rachana Patel 2013-08-23 10:32:29 UTC
Description of problem:
Unable to start a volume after geo-rep session stop, volume stop force, and rpm upgrade.

Version-Release number of selected component (if applicable):
3.4.0.22rhs-2.el6rhs.x86_64

How reproducible:
Haven't tried.

Steps to Reproduce:
1. Stopped all geo-rep sessions between the master and slave clusters:
[root@DVM1 ~]# gluster volume geo master1 status
NODE                           MASTER     SLAVE                                              HEALTH         UPTIME       
---------------------------------------------------------------------------------------------------------------------
DVM1.lab.eng.blr.redhat.com    master1    ssh://10.70.37.219::slave1                         Not Started    N/A          
DVM1.lab.eng.blr.redhat.com    master1    ssh://rhsauto018.lab.eng.blr.redhat.com::slave1    Stopped        N/A          
DVM2.lab.eng.blr.redhat.com    master1    ssh://10.70.37.219::slave1                         Not Started    N/A          
DVM2.lab.eng.blr.redhat.com    master1    ssh://rhsauto018.lab.eng.blr.redhat.com::slave1    Stopped        N/A          
DVM5.lab.eng.blr.redhat.com    master1    ssh://10.70.37.219::slave1                         Not Started    N/A          
DVM5.lab.eng.blr.redhat.com    master1    ssh://rhsauto018.lab.eng.blr.redhat.com::slave1    Stopped        N/A          
DVM4.lab.eng.blr.redhat.com    master1    ssh://10.70.37.219::slave1                         Not Started    N/A          
DVM4.lab.eng.blr.redhat.com    master1    ssh://rhsauto018.lab.eng.blr.redhat.com::slave1    Stopped        N/A          
DVM6.lab.eng.blr.redhat.com    master1    ssh://10.70.37.219::slave1                         Not Started    N/A          
DVM6.lab.eng.blr.redhat.com    master1    ssh://rhsauto018.lab.eng.blr.redhat.com::slave1    Stopped        N/A          
[root@DVM1 ~]# gluster volume geo master2 status
No active geo-replication sessions for master2

2. Tried to stop the volumes; this failed with a 'geo-replication sessions are active' error, so used the force option:
[root@DVM5 rpm]# for i in ` ls /var/lib/glusterd/vols/` ; do gluster v stop $i ; done

Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: master1: failed: geo-replication sessions are active for the volume 'master1'.
Use 'volume geo-replication status' command for more info. Use 'force' option to ignore and stop the volume.

[root@DVM5 rpm]# gluster volume stop master1 force
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: master1: success


3. Upgraded the rpms to 3.4.0.22rhs-2.el6rhs.x86_64.

4. Tried to start the volumes. All of them started except the one that had a geo-rep session:

[root@DVM1 rpm]# gluster volume start master1
volume start: master1: failed: Commit failed on localhost. Please check the log file for more details.
[root@DVM1 rpm]# less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log 
[root@DVM1 rpm]# ls /rhs/brick1
1  dir1  f1  f10  f2  f3  f4  f5  f6  f7  f8  f9  n1
[root@DVM1 rpm]# getfattr -d -m . -e hex /rhs/brick1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
trusted.glusterfs.fa11e206-d039-4606-92fa-29f29a9a8dfa.xtime=0x521573ed0004ccfa
trusted.glusterfs.volume-id=0xfa11e206d039460692fa29f29a9a8dfa
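As a side note on the xattrs above: trusted.glusterfs.volume-id is the volume's UUID in raw hex (it matches the Volume ID reported by `gluster v info` further down), and the marker xtime xattr is, assuming the usual gluster marker encoding, a pair of big-endian 32-bit integers (seconds and microseconds). A quick sketch to decode both values from this output:

```python
import uuid

# Hex payload of trusted.glusterfs.volume-id from the getfattr output above
vol_id_hex = "fa11e206d039460692fa29f29a9a8dfa"
vol_id = str(uuid.UUID(vol_id_hex))
print(vol_id)  # fa11e206-d039-4606-92fa-29f29a9a8dfa

# xtime payload: first 4 bytes = seconds, last 4 bytes = microseconds
# (assumption: standard marker xlator encoding)
xtime_hex = "521573ed0004ccfa"
sec = int(xtime_hex[:8], 16)
usec = int(xtime_hex[8:], 16)
print(sec, usec)
```

The decoded seconds fall on 2013-08-22 UTC, consistent with the timestamps in the brick log below.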

[root@DVM1 rpm]# gluster volume start master1 force
volume start: master1: success

[root@DVM1 rpm]# gluster v info master1
 
Volume Name: master1
Type: Distributed-Replicate
Volume ID: fa11e206-d039-4606-92fa-29f29a9a8dfa
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.37.128:/rhs/brick1
Brick2: 10.70.37.110:/rhs/brick1
Brick3: 10.70.37.192:/rhs/brick1
Brick4: 10.70.37.88:/rhs/brick1
Brick5: 10.70.37.81:/rhs/brick1
Brick6: 10.70.37.88:/rhs/brick5/2
Options Reconfigured:
changelog.fsync-interval: 3
changelog.rollover-time: 15
changelog.encoding: ascii
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
diagnostics.client-log-level: INFO
changelog.changelog: on

[root@DVM1 rpm]# gluster v status master1
Status of volume: master1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.128:/rhs/brick1				N/A	N	N/A
Brick 10.70.37.110:/rhs/brick1				N/A	N	N/A
Brick 10.70.37.192:/rhs/brick1				N/A	N	N/A
Brick 10.70.37.81:/rhs/brick1				N/A	N	N/A
NFS Server on localhost					2049	Y	25966
Self-heal Daemon on localhost				N/A	Y	25972
NFS Server on 10.70.37.81				2049	Y	20253
Self-heal Daemon on 10.70.37.81				N/A	Y	20259
NFS Server on 10.70.37.192				2049	Y	20672
Self-heal Daemon on 10.70.37.192			N/A	Y	20678
NFS Server on 10.70.37.110				2049	Y	18720
Self-heal Daemon on 10.70.37.110			N/A	Y	18730
 
There are no active volume tasks

[root@DVM1 rpm]# gluster volume start master1 
volume start: master1: failed: Volume master1 already started

[root@DVM1 rpm]# gluster v status master1
Status of volume: master1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.128:/rhs/brick1				N/A	N	N/A
Brick 10.70.37.110:/rhs/brick1				N/A	N	N/A
Brick 10.70.37.192:/rhs/brick1				N/A	N	N/A
Brick 10.70.37.81:/rhs/brick1				N/A	N	N/A
NFS Server on localhost					2049	Y	25966
Self-heal Daemon on localhost				N/A	Y	25972
NFS Server on 10.70.37.192				2049	Y	20672
Self-heal Daemon on 10.70.37.192			N/A	Y	20678
NFS Server on 10.70.37.81				2049	Y	20253
Self-heal Daemon on 10.70.37.81				N/A	Y	20259
NFS Server on 10.70.37.110				2049	Y	18720
Self-heal Daemon on 10.70.37.110			N/A	Y	18730
 
There are no active volume tasks


Actual results:
Unable to start the volume (start succeeds only with the force option).

Expected results:


Additional info:
log snippet :-
brick log
[2013-08-22 23:23:50.901409] E [posix-handle.c:379:posix_handle_init] 0-master1-posix: Different dirs /rhs/brick1 (512/64770) != /rhs/brick1/.glusterfs/00/00/00000000-0000-0000-0000-000000000001 (75497675/64770)
[2013-08-22 23:23:50.901432] E [posix.c:4676:init] 0-master1-posix: Posix handle setup failed
[2013-08-22 23:23:50.901441] E [xlator.c:423:xlator_init] 0-master1-posix: Initialization of volume 'master1-posix' failed, review your volfile again
[2013-08-22 23:23:50.901451] E [graph.c:292:glusterfs_graph_init] 0-master1-posix: initializing translator failed
[2013-08-22 23:23:50.901459] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
[2013-08-22 23:23:50.901737] W [glusterfsd.c:1062:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f1ac0ba3f35] (-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x2ed) [0x40baed] (-->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x106) [0x405066]))) 0-: received signum (0), shutting down
[2013-08-22 23:24:57.613435] I [glusterfsd.c:1988:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.4.0.22rhs (/usr/sbin/glusterfsd -s 10.70.37.128 --volfile-id master1.10.70.37.128.rhs-brick1 -p /var/lib/glusterd/vols/master1/run/10.70.37.128-rhs-brick1.pid -S /var/run/1baa07e44194db8dd4a9e235f6f28f34.socket --brick-name /rhs/brick1 -l /var/log/glusterfs/bricks/rhs-brick1.log --xlator-option *-posix.glusterd-uuid=6b7ec72c-3f0a-45c2-9cdb-656231b6c04d --brick-port 49152 --xlator-option master1-server.listen-port=49152)
[2013-08-22 23:24:57.619158] I [socket.c:3487:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
[2013-08-22 23:24:57.619238] I [socket.c:3502:socket_init] 0-socket.glusterfsd: using system polling thread
[2013-08-22 23:24:57.619581] I [socket.c:3487:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-08-22 23:24:57.619613] I [socket.c:3502:socket_init] 0-glusterfs: using system polling thread
[2013-08-22 23:24:57.627461] I [graph.c:239:gf_add_cmdline_options] 0-master1-server: adding option 'listen-port' for volume 'master1-server' with value '49152'
[2013-08-22 23:24:57.627486] I [graph.c:239:gf_add_cmdline_options] 0-master1-posix: adding option 'glusterd-uuid' for volume 'master1-posix' with value '6b7ec72c-3f0a-45c2-9cdb-656231b6c04d'
[2013-08-22 23:24:57.629132] W [options.c:848:xl_opt_validate] 0-master1-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2013-08-22 23:24:57.629169] I [socket.c:3487:socket_init] 0-tcp.master1-server: SSL support is NOT enabled
[2013-08-22 23:24:57.629181] I [socket.c:3502:socket_init] 0-tcp.master1-server: using system polling thread
[2013-08-22 23:24:57.629260] I [quota.c:2748:quota_parse_limits] 0-master1-quota: could not get the limits
[2013-08-22 23:24:57.631642] E [posix-handle.c:379:posix_handle_init] 0-master1-posix: Different dirs /rhs/brick1 (512/64770) != /rhs/brick1/.glusterfs/00/00/00000000-0000-0000-0000-000000000001 (75497675/64770)

[2013-08-22 23:24:57.631668] E [posix.c:4676:init] 0-master1-posix: Posix handle setup failed
[2013-08-22 23:24:57.631678] E [xlator.c:423:xlator_init] 0-master1-posix: Initialization of volume 'master1-posix' failed, review your volfile again
[2013-08-22 23:24:57.631688] E [graph.c:292:glusterfs_graph_init] 0-master1-posix: initializing translator failed
[2013-08-22 23:24:57.631697] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
[2013-08-22 23:24:57.632023] W [glusterfsd.c:1062:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f06c5eaff35] (-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x2ed) [0x40baed] (-->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x106) [0x405066]))) 0-: received signum (0), shutting down
[2013-08-22 23:32:10.214199] I [glusterfsd.c:1988:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.4.0.22rhs (/usr/sbin/glusterfsd -s 10.70.37.128 --volfile-id master1.10.70.37.128.rhs-brick1 -p /var/lib/glusterd/vols/master1/run/10.70.37.128-rhs-brick1.pid -S /var/run/1baa07e44194db8dd4a9e235f6f28f34.socket --brick-name /rhs/brick1 -l /var/log/glusterfs/bricks/rhs-brick1.log --xlator-option *-posix.glusterd-uuid=6b7ec72c-3f0a-45c2-9cdb-656231b6c04d --brick-port 49152 --xlator-option master1-server.listen-port=49152)
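The "Different dirs" error is the posix translator refusing to initialize because the .glusterfs handle for the root gfid (00000000-0000-0000-0000-000000000001) no longer resolves to the brick root; the number pairs in the message look like inode/device, and here the device (64770) matches while the inodes (512 vs 75497675) do not. A minimal sketch of that consistency check, assuming the root-gfid handle is normally a symlink back to the brick root:

```python
import os
import tempfile

ROOT_GFID = "00000000-0000-0000-0000-000000000001"

def handle_matches_brick_root(brick):
    """Mimic the posix_handle_init check from the brick log: the
    .glusterfs handle for the root gfid must resolve to the brick
    root itself (same device and inode)."""
    handle = os.path.join(brick, ".glusterfs",
                          ROOT_GFID[:2], ROOT_GFID[2:4], ROOT_GFID)
    b, h = os.stat(brick), os.stat(handle)  # stat() follows symlinks
    return (b.st_dev, b.st_ino) == (h.st_dev, h.st_ino)

# A handle that is a symlink back to the brick root passes the check
brick = tempfile.mkdtemp()
hdir = os.path.join(brick, ".glusterfs", "00", "00")
os.makedirs(hdir)
os.symlink("../../..", os.path.join(hdir, ROOT_GFID))
ok_before = handle_matches_brick_root(brick)
print(ok_before)  # True

# A plain directory in its place has its own inode and fails the check,
# which is the state the brick appears to be in after the upgrade
os.remove(os.path.join(hdir, ROOT_GFID))
os.mkdir(os.path.join(hdir, ROOT_GFID))
ok_after = handle_matches_brick_root(brick)
print(ok_after)  # False
```

This is consistent with `gluster volume start master1 force` succeeding later: the forced start recreates the handle, after which the check passes.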


glusterd log :-

[2013-08-22 23:24:57.635804] E [glusterd-syncop.c:951:gd_commit_op_phase] 0-management: Commit of operation 'Volume Start' failed on localhost    
[2013-08-22 23:32:10.232193] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick (null) on port 49152
[2013-08-22 23:32:10.235804] E [glusterd-utils.c:4076:glusterd_brick_start] 0-management: Unable to start brick 10.70.37.128:/rhs/brick1
[2013-08-22 23:32:11.310508] E [glusterd-utils.c:3526:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/7eea64235394df91f42812e13630a0af.socket error: Permission denied

Comment 4 Aravinda VK 2015-11-25 08:49:04 UTC
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.

Comment 5 Aravinda VK 2015-11-25 08:50:55 UTC
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.

