Bug 1005575 - Dist-geo-rep : gluster volume geo <master_vol> <slave_ip>::<slave_vol> config throws error 'Staging failed..command failed' after adding brick to master volume (before reconfiguration of session)
Summary: Dist-geo-rep : gluster volume geo <master_vol> <slave_ip>::<slave_vol> config...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard: config
Depends On:
Blocks:
 
Reported: 2013-09-08 15:22 UTC by Rachana Patel
Modified: 2015-11-25 08:50 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: This behaviour is expected, because glusterd today is cluster aware and not volume aware. Section 11.4 of the admin guide specifically asks the admin to perform a series of steps when a brick on a new node is added to the cluster. By choice, we do not document the exact error messages the command will output if the admin fails to perform these steps, as that would complicate the admin guide, and the error messages themselves are self-explanatory. Hence the behaviour mentioned above is not an issue.
Consequence: The same behaviour is also seen if a new node is added to the cluster (even if a new brick is not added) and the steps in section 11.4 are not followed. This is because of the way glusterd functions today (not volume aware). This is normal glusterd behaviour, the same as every other gluster command that performs the same set of operations on all nodes in the cluster, irrespective of whether a node is part of the volume being operated on.
Fix: Update section 11.4 of the admin guide so that it covers adding a new node to the cluster, along with adding new bricks to the volume.
Result:
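The section 11.4 steps referred to above amount to regenerating the common pem file and re-running create/start with force, as the reporter does in step 4 below. A minimal sketch, using the master volume, slave host, and slave volume names from this report:

```shell
# Run on a node of the master cluster after adding a brick on a new node.
# Volume and host names are the ones used in this report.

# Regenerate the common secret pem file so the new node's key is included
gluster system:: execute gsec_create

# Re-create the session, pushing the updated pem keys to the slave
gluster volume geo-replication m_master1 \
  rhsauto031.lab.eng.blr.redhat.com::slave1 create push-pem force

# Restart the session so the new brick starts participating
gluster volume geo-replication m_master1 \
  rhsauto031.lab.eng.blr.redhat.com::slave1 start force
```

These commands require a live gluster cluster with an established geo-replication session; they cannot be run standalone.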
Clone Of:
Environment:
Last Closed: 2015-11-25 08:47:48 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Rachana Patel 2013-09-08 15:22:58 UTC
Description of problem:
Dist-geo-rep : gluster volume geo <master_vol> <slave_ip>::<slave_vol> config throws error 'Staging failed..command failed' after adding brick to master volume (before reconfiguration of session)

Version-Release number of selected component (if applicable):
3.4.0.32rhs-1.el6_4.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a geo-rep session between the master and slave volumes.
2. Verify its status using the status command and check the output of the config option.
3. Add a new RHSS node to the master cluster and one brick to the master volume, then check the output of the config command:
[root@old2 ~]# gluster volume geo  m_master1 rhsauto031.lab.eng.blr.redhat.com::slave1 status
NODE                           MASTER       SLAVE                                        HEALTH     UPTIME         
---------------------------------------------------------------------------------------------------------------
old2.lab.eng.blr.redhat.com    m_master1    rhsauto031.lab.eng.blr.redhat.com::slave1    Stable     00:09:33       
old4.lab.eng.blr.redhat.com    m_master1    rhsauto031.lab.eng.blr.redhat.com::slave1    defunct    N/A            
old1.lab.eng.blr.redhat.com    m_master1    rhsauto031.lab.eng.blr.redhat.com::slave1    Stable     00:01:36       
old3.lab.eng.blr.redhat.com    m_master1    rhsauto031.lab.eng.blr.redhat.com::slave1    faulty     N/A         
   
[root@old2 ~]# gluster volume geo  m_master1 rhsauto031.lab.eng.blr.redhat.com::slave1 config
Staging failed on 10.70.35.26. Error: Geo-replication session between m_master1 and rhsauto031.lab.eng.blr.redhat.com::slave1 does not exist.
geo-replication command failed

4. Perform the reconfiguration that is required after adding a brick:
[root@old1 ~]# gluster system:: execute gsec_create

Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub
[root@old1 ~]#  gluster volume geo  m_master1 rhsauto031.lab.eng.blr.redhat.com::slave1 create push-pem force
Creating geo-replication session between m_master1 & rhsauto031.lab.eng.blr.redhat.com::slave1 has been successful
[root@old1 ~]# gluster volume geo  m_master1 rhsauto031.lab.eng.blr.redhat.com::slave1 start force
Starting geo-replication session between m_master1 & rhsauto031.lab.eng.blr.redhat.com::slave1 has been successful

5. Check the output of the config option again:
[root@old2 ~]# gluster volume geo  m_master1 rhsauto031.lab.eng.blr.redhat.com::slave1 config
special_sync_mode: partial
state_socket_unencoded: /var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.socket
gluster_log_file: /var/log/glusterfs/geo-replication/m_master1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
ignore_deletes: true
change_detector: changelog
volume_id: 6af54e44-5083-4ad6-be4f-af5985e0160f
state_file: /var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.status
remote_gsyncd: /nonexistent/gsyncd
session_owner: 6af54e44-5083-4ad6-be4f-af5985e0160f
socketdir: /var/run
working_dir: /var/run/gluster/m_master1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1
state_detail_file: /var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1-detail.status
gluster_command_dir: /usr/sbin/
pid_file: /var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.pid
log_file: /var/log/glusterfs/geo-replication/m_master1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.log
gluster_params: aux-gfid-mount xlator-option=*-dht.assert-no-child-down=true


Actual results:
After adding a brick to the master volume, the config option does not show any output and instead fails with a staging error. Once the user performs the reconfiguration steps, it shows the output.

Expected results:
The config option should show the expected output.

Additional info:

Comment 1 Rachana Patel 2013-09-08 15:25:01 UTC
log snippet:-
[2013-09-08 12:14:27.905956] W [glusterd-geo-rep.c:1404:glusterd_op_gsync_args_get] 0-: master not found
[2013-09-08 12:14:28.026453] E [glusterd-geo-rep.c:1745:glusterd_mountbroker_check] (-->/usr/lib64/libglusterfs.so.0(dict_foreach+0x45) [0x7f3a1310b4e5] (-->/usr/lib64/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0x81dde) [0x7f3a0f6dedde] (-->/usr/lib64/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0x7aa43) [0x7f3a0f6d7a43]))) 0-: Assertion failed: op_errstr
[2013-09-08 12:14:28.507895] E [glusterd-geo-rep.c:1745:glusterd_mountbroker_check] (-->/usr/lib64/libglusterfs.so.0(dict_foreach+0x45) [0x7f3a1310b4e5] (-->/usr/lib64/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0x81dde) [0x7f3a0f6dedde] (-->/usr/lib64/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0x7aa43) [0x7f3a0f6d7a43]))) 0-: Assertion failed: op_errstr
[2013-09-08 12:15:35.460649] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/gsyncd.conf).
[2013-09-08 12:15:35.784245] E [glusterd-syncop.c:102:gd_collate_errors] 0-: Staging failed on 10.70.35.26. Error: Geo-replication session between m_master1 and rhsauto031.lab.eng.blr.redhat.com::slave1 does not exist.
[2013-09-08 12:15:57.088611] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/gsyncd.conf).
[2013-09-08 12:15:57.393328] E [glusterd-syncop.c:102:gd_collate_errors] 0-: Staging failed on 10.70.35.26. Error: Geo-replication session between m_master1 and rhsauto031.lab.eng.blr.redhat.com::slave1 does not exist.
[2013-09-08 12:16:58.551798] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/gsyncd.conf).
[2013-09-08 12:16:58.878982] E [glusterd-syncop.c:102:gd_collate_errors] 0-: Staging failed on 10.70.35.26. Error: Geo-replication session between m_master1 and rhsauto031.lab.eng.blr.redhat.com::slave1 does not exist.
[2013-09-08 12:18:00.354277] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/gsyncd.conf).
[2013-09-08 12:18:40.690326] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/gsyncd.conf).
[2013-09-08 12:18:40.864229] I [glusterd-geo-rep.c:1991:glusterd_op_stage_gsync_create] 0-: Session between m_master1 and rhsauto031.lab.eng.blr.redhat.com::slave1 is already created. Force creating again.
[2013-09-08 12:18:41.278845] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/gsync-create/post/S56glusterd-geo-rep-create-post.sh --volname=m_master1 This argument will stop the hooks script
[2013-09-08 12:19:29.759510] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_rep_master1_rhsauto031.lab.eng.blr.redhat.com_slave2/gsyncd.conf).
[2013-09-08 12:19:29.862407] I [glusterd-geo-rep.c:1991:glusterd_op_stage_gsync_create] 0-: Session between m_rep_master1 and rhsauto031.lab.eng.blr.redhat.com::slave2 is already created. Force creating again.

Comment 5 Scott Haines 2013-09-27 17:08:10 UTC
Targeting for 3.0.0 (Denali) release.

Comment 6 Nagaprasad Sathyanarayana 2014-05-06 11:43:38 UTC
Dev ack to 3.0 RHS BZs

Comment 11 Aravinda VK 2015-11-25 08:47:48 UTC
Closing this bug since the RHGS 2.1 release has reached EOL. Required bugs have been cloned to RHGS 3.1. Please re-open this issue if it is found again.


