Description of problem:
Unable to start a volume after stopping its geo-replication session, stopping the volume with 'force', and upgrading the RPMs.

Version-Release number of selected component (if applicable):
3.4.0.22rhs-2.el6rhs.x86_64

How reproducible:
Haven't tried

Steps to Reproduce:
1. Stopped all geo-replication sessions between the master and slave clusters.

[root@DVM1 ~]# gluster volume geo master1 status
NODE                          MASTER    SLAVE                                              HEALTH         UPTIME
---------------------------------------------------------------------------------------------------------------------
DVM1.lab.eng.blr.redhat.com   master1   ssh://10.70.37.219::slave1                         Not Started    N/A
DVM1.lab.eng.blr.redhat.com   master1   ssh://rhsauto018.lab.eng.blr.redhat.com::slave1    Stopped        N/A
DVM2.lab.eng.blr.redhat.com   master1   ssh://10.70.37.219::slave1                         Not Started    N/A
DVM2.lab.eng.blr.redhat.com   master1   ssh://rhsauto018.lab.eng.blr.redhat.com::slave1    Stopped        N/A
DVM5.lab.eng.blr.redhat.com   master1   ssh://10.70.37.219::slave1                         Not Started    N/A
DVM5.lab.eng.blr.redhat.com   master1   ssh://rhsauto018.lab.eng.blr.redhat.com::slave1    Stopped        N/A
DVM4.lab.eng.blr.redhat.com   master1   ssh://10.70.37.219::slave1                         Not Started    N/A
DVM4.lab.eng.blr.redhat.com   master1   ssh://rhsauto018.lab.eng.blr.redhat.com::slave1    Stopped        N/A
DVM6.lab.eng.blr.redhat.com   master1   ssh://10.70.37.219::slave1                         Not Started    N/A
DVM6.lab.eng.blr.redhat.com   master1   ssh://rhsauto018.lab.eng.blr.redhat.com::slave1    Stopped        N/A

[root@DVM1 ~]# gluster volume geo master2 status
No active geo-replication sessions for master2

2. Tried to stop the volumes, which failed with 'geo-replication sessions are active', so used the 'force' option.

[root@DVM5 rpm]# for i in `ls /var/lib/glusterd/vols/` ; do gluster v stop $i ; done
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: master1: failed: geo-replication sessions are active for the volume 'master1'. Use 'volume geo-replication status' command for more info. Use 'force' option to ignore and stop the volume.

[root@DVM5 rpm]# gluster volume stop master1 force
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: master1: success

3. Upgraded the RPMs to 3.4.0.22rhs-2.el6rhs.x86_64.

4. Tried to start the volumes. Able to start all volumes except the one which had a geo-replication session.

[root@DVM1 rpm]# gluster volume start master1
volume start: master1: failed: Commit failed on localhost. Please check the log file for more details.

[root@DVM1 rpm]# less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

[root@DVM1 rpm]# ls /rhs/brick1
1  dir1  f1  f10  f2  f3  f4  f5  f6  f7  f8  f9  n1

[root@DVM1 rpm]# getfattr -d -m . -e hex /rhs/brick1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
trusted.glusterfs.fa11e206-d039-4606-92fa-29f29a9a8dfa.xtime=0x521573ed0004ccfa
trusted.glusterfs.volume-id=0xfa11e206d039460692fa29f29a9a8dfa

[root@DVM1 rpm]# gluster volume start master1 force
volume start: master1: success

[root@DVM1 rpm]# gluster v info master1
Volume Name: master1
Type: Distributed-Replicate
Volume ID: fa11e206-d039-4606-92fa-29f29a9a8dfa
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.37.128:/rhs/brick1
Brick2: 10.70.37.110:/rhs/brick1
Brick3: 10.70.37.192:/rhs/brick1
Brick4: 10.70.37.88:/rhs/brick1
Brick5: 10.70.37.81:/rhs/brick1
Brick6: 10.70.37.88:/rhs/brick5/2
Options Reconfigured:
changelog.fsync-interval: 3
changelog.rollover-time: 15
changelog.encoding: ascii
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
diagnostics.client-log-level: INFO
changelog.changelog: on

[root@DVM1 rpm]# gluster v status master1
Status of volume: master1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.128:/rhs/brick1                          N/A     N       N/A
Brick 10.70.37.110:/rhs/brick1                          N/A     N       N/A
Brick 10.70.37.192:/rhs/brick1                          N/A     N       N/A
Brick 10.70.37.81:/rhs/brick1                           N/A     N       N/A
NFS Server on localhost                                 2049    Y       25966
Self-heal Daemon on localhost                           N/A     Y       25972
NFS Server on 10.70.37.81                               2049    Y       20253
Self-heal Daemon on 10.70.37.81                         N/A     Y       20259
NFS Server on 10.70.37.192                              2049    Y       20672
Self-heal Daemon on 10.70.37.192                        N/A     Y       20678
NFS Server on 10.70.37.110                              2049    Y       18720
Self-heal Daemon on 10.70.37.110                        N/A     Y       18730

There are no active volume tasks

[root@DVM1 rpm]# gluster volume start master1
volume start: master1: failed: Volume master1 already started

[root@DVM1 rpm]# gluster v status master1
Status of volume: master1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.128:/rhs/brick1                          N/A     N       N/A
Brick 10.70.37.110:/rhs/brick1                          N/A     N       N/A
Brick 10.70.37.192:/rhs/brick1                          N/A     N       N/A
Brick 10.70.37.81:/rhs/brick1                           N/A     N       N/A
NFS Server on localhost                                 2049    Y       25966
Self-heal Daemon on localhost                           N/A     Y       25972
NFS Server on 10.70.37.192                              2049    Y       20672
Self-heal Daemon on 10.70.37.192                        N/A     Y       20678
NFS Server on 10.70.37.81                               2049    Y       20253
Self-heal Daemon on 10.70.37.81                         N/A     Y       20259
NFS Server on 10.70.37.110                              2049    Y       18720
Self-heal Daemon on 10.70.37.110                        N/A     Y       18730

There are no active volume tasks

Actual results:
Unable to start the volume.

Expected results:

Additional info:

Log snippet from the brick log:

[2013-08-22 23:23:50.901409] E [posix-handle.c:379:posix_handle_init] 0-master1-posix: Different dirs /rhs/brick1 (512/64770) != /rhs/brick1/.glusterfs/00/00/00000000-0000-0000-0000-000000000001 (75497675/64770)
[2013-08-22 23:23:50.901432] E [posix.c:4676:init] 0-master1-posix: Posix handle setup failed
[2013-08-22 23:23:50.901441] E [xlator.c:423:xlator_init] 0-master1-posix: Initialization of volume 'master1-posix' failed, review your volfile again
[2013-08-22 23:23:50.901451] E [graph.c:292:glusterfs_graph_init] 0-master1-posix: initializing translator failed
[2013-08-22 23:23:50.901459] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
[2013-08-22 23:23:50.901737] W [glusterfsd.c:1062:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f1ac0ba3f35] (-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x2ed) [0x40baed] (-->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x106) [0x405066]))) 0-: received signum (0), shutting down
[2013-08-22 23:24:57.613435] I [glusterfsd.c:1988:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.4.0.22rhs (/usr/sbin/glusterfsd -s 10.70.37.128 --volfile-id master1.10.70.37.128.rhs-brick1 -p /var/lib/glusterd/vols/master1/run/10.70.37.128-rhs-brick1.pid -S /var/run/1baa07e44194db8dd4a9e235f6f28f34.socket --brick-name /rhs/brick1 -l /var/log/glusterfs/bricks/rhs-brick1.log --xlator-option *-posix.glusterd-uuid=6b7ec72c-3f0a-45c2-9cdb-656231b6c04d --brick-port 49152 --xlator-option master1-server.listen-port=49152)
[2013-08-22 23:24:57.619158] I [socket.c:3487:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
[2013-08-22 23:24:57.619238] I [socket.c:3502:socket_init] 0-socket.glusterfsd: using system polling thread
[2013-08-22 23:24:57.619581] I [socket.c:3487:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-08-22 23:24:57.619613] I [socket.c:3502:socket_init] 0-glusterfs: using system polling thread
[2013-08-22 23:24:57.627461] I [graph.c:239:gf_add_cmdline_options] 0-master1-server: adding option 'listen-port' for volume 'master1-server' with value '49152'
[2013-08-22 23:24:57.627486] I [graph.c:239:gf_add_cmdline_options] 0-master1-posix: adding option 'glusterd-uuid' for volume 'master1-posix' with value '6b7ec72c-3f0a-45c2-9cdb-656231b6c04d'
[2013-08-22 23:24:57.629132] W [options.c:848:xl_opt_validate] 0-master1-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2013-08-22 23:24:57.629169] I [socket.c:3487:socket_init] 0-tcp.master1-server: SSL support is NOT enabled
[2013-08-22 23:24:57.629181] I [socket.c:3502:socket_init] 0-tcp.master1-server: using system polling thread
[2013-08-22 23:24:57.629260] I [quota.c:2748:quota_parse_limits] 0-master1-quota: could not get the limits
[2013-08-22 23:24:57.631642] E [posix-handle.c:379:posix_handle_init] 0-master1-posix: Different dirs /rhs/brick1 (512/64770) != /rhs/brick1/.glusterfs/00/00/00000000-0000-0000-0000-000000000001 (75497675/64770)
[2013-08-22 23:24:57.631668] E [posix.c:4676:init] 0-master1-posix: Posix handle setup failed
[2013-08-22 23:24:57.631678] E [xlator.c:423:xlator_init] 0-master1-posix: Initialization of volume 'master1-posix' failed, review your volfile again
[2013-08-22 23:24:57.631688] E [graph.c:292:glusterfs_graph_init] 0-master1-posix: initializing translator failed
[2013-08-22 23:24:57.631697] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
[2013-08-22 23:24:57.632023] W [glusterfsd.c:1062:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f06c5eaff35] (-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x2ed) [0x40baed] (-->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x106) [0x405066]))) 0-: received signum (0), shutting down
[2013-08-22 23:32:10.214199] I [glusterfsd.c:1988:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.4.0.22rhs (/usr/sbin/glusterfsd -s 10.70.37.128 --volfile-id master1.10.70.37.128.rhs-brick1 -p /var/lib/glusterd/vols/master1/run/10.70.37.128-rhs-brick1.pid -S /var/run/1baa07e44194db8dd4a9e235f6f28f34.socket --brick-name /rhs/brick1 -l /var/log/glusterfs/bricks/rhs-brick1.log --xlator-option *-posix.glusterd-uuid=6b7ec72c-3f0a-45c2-9cdb-656231b6c04d --brick-port 49152 --xlator-option master1-server.listen-port=49152)

glusterd log:

[2013-08-22 23:24:57.635804] E [glusterd-syncop.c:951:gd_commit_op_phase] 0-management: Commit of operation 'Volume Start' failed on localhost
[2013-08-22 23:32:10.232193] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick (null) on port 49152
[2013-08-22 23:32:10.235804] E [glusterd-utils.c:4076:glusterd_brick_start] 0-management: Unable to start brick 10.70.37.128:/rhs/brick1
[2013-08-22 23:32:11.310508] E [glusterd-utils.c:3526:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/7eea64235394df91f42812e13630a0af.socket error: Permission denied
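For context on the "Different dirs" error above: posix_handle_init compares the device/inode pair of the brick root against the pair obtained by resolving the root gfid handle under .glusterfs, and refuses to initialize the brick when they differ. In a healthy brick the root handle is a symlink that resolves back to the brick root itself; a stale handle (e.g. a plain directory left behind) produces exactly this mismatch. The following is only a sketch of that comparison using a throwaway temporary directory, not the affected brick; the paths and the simulated stale handle are illustrative assumptions:

```shell
# Sketch only: mimic the inode comparison behind the "Different dirs"
# error, using a temporary directory in place of a real brick.
brick=$(mktemp -d)
mkdir -p "$brick/.glusterfs/00/00"
handle="$brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001"

# Healthy layout: the root gfid handle is a symlink that resolves
# back to the brick root, so the inodes match.
ln -s ../../.. "$handle"
[ "$(stat -Lc %i "$brick")" = "$(stat -Lc %i "$handle")" ] && echo "handle OK"

# Simulated breakage: a stale handle that is a plain directory.
# The inodes now differ, which is the condition posix_handle_init
# reports as "Different dirs" and then aborts brick init.
rm "$handle"
mkdir "$handle"
[ "$(stat -Lc %i "$brick")" != "$(stat -Lc %i "$handle")" ] && echo "handle MISMATCH"
```

Running the same `stat -Lc %i` comparison on the real brick root and its root gfid handle would show whether this brick is in the mismatched state the log complains about.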
Closing this bug since the RHGS 2.1 release has reached EOL. Required bugs have been cloned to RHGS 3.1. Please re-open this bug if the issue is seen again.