Bug 1005575

Summary: Dist-geo-rep : gluster volume geo <master_vol> <slave_ip>::<slave_vol> config throws error 'Staging failed..command failed' after adding a brick to the master volume (before reconfiguration of the session)
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rachana Patel <racpatel>
Component: geo-replication Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED EOL QA Contact: storage-qa-internal <storage-qa-internal>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.1 CC: avishwan, chrisw, csaba, mzywusko, nsathyan, rhs-bugs, vagarwal
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: config
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: This behaviour is expected and happens because glusterd today is cluster aware and not volume aware. Section 11.4 of the admin guide specifically asks the admin to perform a series of steps when a new brick on a new node is added to the cluster. By choice, we do not document the exact error messages that the command will output if the admin fails to perform these steps, as it would make the admin guide more complicated, and the error messages themselves are self-explanatory. Hence the above-mentioned behaviour is not an issue.
Consequence: This behaviour will also be seen if a new node is added to the cluster (even if a new brick is not added) and the steps mentioned in section 11.4 are not followed. This is because of the way glusterd functions today (it is not volume aware). This is normal glusterd behaviour and is the same as every other gluster command that performs the same set of operations on all the nodes in the cluster (irrespective of whether the node is part of the volume on which the operations are being performed or not).
Fix: We can update section 11.4 of the admin guide so that it covers adding a new node to the cluster along with adding new bricks to the volume (an illustrative sketch of the steps in question follows the header fields below).
Result:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-25 08:47:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
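
For reference, a minimal sketch of the reconfiguration sequence the Doc Text refers to (section 11.4 of the admin guide), based on the commands used in the reproduction steps later in this bug. The angle-bracket placeholders are illustrative and must be replaced with the actual master volume, slave host, and slave volume names; the commands are run on a master node after the new brick/node has been added:

gluster system:: execute gsec_create
gluster volume geo-replication <master_vol> <slave_host>::<slave_vol> create push-pem force
gluster volume geo-replication <master_vol> <slave_host>::<slave_vol> start force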

Description Rachana Patel 2013-09-08 15:22:58 UTC
Description of problem:
Dist-geo-rep : gluster volume geo <master_vol> <slave_ip>::<slave_vol> config throws error 'Staging failed..command failed' after adding a brick to the master volume (before reconfiguration of the session)

Version-Release number of selected component (if applicable):
3.4.0.32rhs-1.el6_4.x86_64

How reproducible:
always

Steps to Reproduce:
1. create a geo rep session between master and slave volume
2. verify its status using status command and check output of config option
3. now add an RHSS node to the master cluster and one brick to the master volume, then check the output of the status and config commands:
[root@old2 ~]# gluster volume geo  m_master1 rhsauto031.lab.eng.blr.redhat.com::slave1 status
NODE                           MASTER       SLAVE                                        HEALTH     UPTIME         
---------------------------------------------------------------------------------------------------------------
old2.lab.eng.blr.redhat.com    m_master1    rhsauto031.lab.eng.blr.redhat.com::slave1    Stable     00:09:33       
old4.lab.eng.blr.redhat.com    m_master1    rhsauto031.lab.eng.blr.redhat.com::slave1    defunct    N/A            
old1.lab.eng.blr.redhat.com    m_master1    rhsauto031.lab.eng.blr.redhat.com::slave1    Stable     00:01:36       
old3.lab.eng.blr.redhat.com    m_master1    rhsauto031.lab.eng.blr.redhat.com::slave1    faulty     N/A         
   
[root@old2 ~]# gluster volume geo  m_master1 rhsauto031.lab.eng.blr.redhat.com::slave1 config
Staging failed on 10.70.35.26. Error: Geo-replication session between m_master1 and rhsauto031.lab.eng.blr.redhat.com::slave1 does not exist.
geo-replication command failed
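
The staging failure comes from the peer added in step 3 (10.70.35.26). A hypothetical check on that node, assuming the session-directory naming convention visible in the config output below, would show that the geo-rep session metadata has not yet been created there, which is why glusterd on that peer reports the session as non-existent (hostname and output in this sketch are assumptions, not captured from the test setup):

# hypothetical listing on the newly added node
[root@new-node ~]# ls /var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1
ls: cannot access /var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1: No such file or directory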

4. now perform the reconfiguration that is required after adding a brick:
[root@old1 ~]# gluster system:: execute gsec_create

Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub
[root@old1 ~]#  gluster volume geo  m_master1 rhsauto031.lab.eng.blr.redhat.com::slave1 create push-pem force
Creating geo-replication session between m_master1 & rhsauto031.lab.eng.blr.redhat.com::slave1 has been successful
[root@old1 ~]# gluster volume geo  m_master1 rhsauto031.lab.eng.blr.redhat.com::slave1 start force
Starting geo-replication session between m_master1 & rhsauto031.lab.eng.blr.redhat.com::slave1 has been successful

5. check the output of the config option again:
[root@old2 ~]# gluster volume geo  m_master1 rhsauto031.lab.eng.blr.redhat.com::slave1 config
special_sync_mode: partial
state_socket_unencoded: /var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.socket
gluster_log_file: /var/log/glusterfs/geo-replication/m_master1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
ignore_deletes: true
change_detector: changelog
volume_id: 6af54e44-5083-4ad6-be4f-af5985e0160f
state_file: /var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.status
remote_gsyncd: /nonexistent/gsyncd
session_owner: 6af54e44-5083-4ad6-be4f-af5985e0160f
socketdir: /var/run
working_dir: /var/run/gluster/m_master1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1
state_detail_file: /var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1-detail.status
gluster_command_dir: /usr/sbin/
pid_file: /var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.pid
log_file: /var/log/glusterfs/geo-replication/m_master1/ssh%3A%2F%2Froot%4010.70.37.6%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.log
gluster_params: aux-gfid-mount xlator-option=*-dht.assert-no-child-down=true


Actual results:
After adding a brick to the master volume, the config option does not show any output; it only shows output once the user performs the reconfiguration steps.

Expected results:
The config option should show the expected output.

Additional info:

Comment 1 Rachana Patel 2013-09-08 15:25:01 UTC
log snippet:-
[2013-09-08 12:14:27.905956] W [glusterd-geo-rep.c:1404:glusterd_op_gsync_args_get] 0-: master not found
[2013-09-08 12:14:28.026453] E [glusterd-geo-rep.c:1745:glusterd_mountbroker_check] (-->/usr/lib64/libglusterfs.so.0(dict_foreach+0x45) [0x7f3a1310b4e5] (-->/usr/lib64/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0x81dde) [0x7f3a0f6dedde] (-->/usr/lib64/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0x7aa43) [0x7f3a0f6d7a43]))) 0-: Assertion failed: op_errstr
[2013-09-08 12:14:28.507895] E [glusterd-geo-rep.c:1745:glusterd_mountbroker_check] (-->/usr/lib64/libglusterfs.so.0(dict_foreach+0x45) [0x7f3a1310b4e5] (-->/usr/lib64/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0x81dde) [0x7f3a0f6dedde] (-->/usr/lib64/glusterfs/3.4.0.32rhs/xlator/mgmt/glusterd.so(+0x7aa43) [0x7f3a0f6d7a43]))) 0-: Assertion failed: op_errstr
[2013-09-08 12:15:35.460649] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/gsyncd.conf).
[2013-09-08 12:15:35.784245] E [glusterd-syncop.c:102:gd_collate_errors] 0-: Staging failed on 10.70.35.26. Error: Geo-replication session between m_master1 and rhsauto031.lab.eng.blr.redhat.com::slave1 does not exist.
[2013-09-08 12:15:57.088611] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/gsyncd.conf).
[2013-09-08 12:15:57.393328] E [glusterd-syncop.c:102:gd_collate_errors] 0-: Staging failed on 10.70.35.26. Error: Geo-replication session between m_master1 and rhsauto031.lab.eng.blr.redhat.com::slave1 does not exist.
[2013-09-08 12:16:58.551798] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/gsyncd.conf).
[2013-09-08 12:16:58.878982] E [glusterd-syncop.c:102:gd_collate_errors] 0-: Staging failed on 10.70.35.26. Error: Geo-replication session between m_master1 and rhsauto031.lab.eng.blr.redhat.com::slave1 does not exist.
[2013-09-08 12:18:00.354277] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/gsyncd.conf).
[2013-09-08 12:18:40.690326] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_master1_rhsauto031.lab.eng.blr.redhat.com_slave1/gsyncd.conf).
[2013-09-08 12:18:40.864229] I [glusterd-geo-rep.c:1991:glusterd_op_stage_gsync_create] 0-: Session between m_master1 and rhsauto031.lab.eng.blr.redhat.com::slave1 is already created. Force creating again.
[2013-09-08 12:18:41.278845] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/gsync-create/post/S56glusterd-geo-rep-create-post.sh --volname=m_master1 This argument will stop the hooks script
[2013-09-08 12:19:29.759510] I [glusterd-geo-rep.c:1573:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/m_rep_master1_rhsauto031.lab.eng.blr.redhat.com_slave2/gsyncd.conf).
[2013-09-08 12:19:29.862407] I [glusterd-geo-rep.c:1991:glusterd_op_stage_gsync_create] 0-: Session between m_rep_master1 and rhsauto031.lab.eng.blr.redhat.com::slave2 is already created. Force creating again.

Comment 5 Scott Haines 2013-09-27 17:08:10 UTC
Targeting for 3.0.0 (Denali) release.

Comment 6 Nagaprasad Sathyanarayana 2014-05-06 11:43:38 UTC
Dev ack to 3.0 RHS BZs

Comment 11 Aravinda VK 2015-11-25 08:47:48 UTC
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.