Bug 1400916

Summary: [compound FOPs]: Need to disconnect and remount the clients for Compound Fops feature to take effect
Product: Red Hat Gluster Storage Reporter: Nag Pavan Chilakam <nchilaka>
Component: replicate Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED NOTABUG QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.2 CC: kdhananj, rhs-bugs, storage-qa-internal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-02 10:23:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Nag Pavan Chilakam 2016-12-02 10:02:00 UTC
Description of problem:
=====================
It seems we need to remount the volumes on the clients for the compound FOPs feature to take effect.
Just enabling the option is not sufficient.

I had a 4-node cluster, where I created a 1x2 volume spanning n1 and n2.
I fuse-mounted the volume on clients c1 and c2 using the IPs of n3 and n4 respectively.
I then enabled compound FOPs on the volume and restarted the volume.
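
For reference, the steps above correspond roughly to the commands below (volume name, brick paths and mount points are illustrative, and the option name cluster.use-compound-fops is an assumption on my part):

# on a server node, create and start the 1x2 volume on n1 and n2
gluster volume create rep2 replica 2 n1:/bricks/b1 n2:/bricks/b1
gluster volume start rep2

# on the clients, fuse-mount via n3 and n4
mount -t glusterfs n3:/rep2 /mnt/rep2    # on c1
mount -t glusterfs n4:/rep2 /mnt/rep2    # on c2

# enable compound FOPs and restart the volume
gluster volume set rep2 cluster.use-compound-fops on
gluster volume stop rep2
gluster volume start rep2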

The brick logs show the volume graph with compound FOPs enabled, but there is no regeneration of the volume graph on the client side.

I discussed this with the Glusterd team (Atin) and found that the volume graph is not regenerated on the client side when a volume is restarted.

That means that compound FOPs will not be in effect on existing mounts.
For it to come into effect, turning on the compound-FOPs option alone will not suffice; the user has to remount the volume on all clients.
This also means that any customer who enables compound FOPs after an upgrade must remount the volume on all clients once upgraded,
hence the volume will be unavailable to those clients for some time.
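
A minimal remount sequence on each client would be something like the following (mount point and server IP are illustrative, and any application using the mount would have to be stopped first):

# on c1 (use n4 instead of n3 on c2)
umount /mnt/rep2
mount -t glusterfs n3:/rep2 /mnt/rep2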

We need to either fix this by regenerating the volume graph on the client side, or, if that is not the right approach, document the remount requirement.



Version-Release number of selected component (if applicable):
===============
3.8.4-6

Comment 2 Krutika Dhananjay 2016-12-02 10:23:46 UTC
Nag and I checked his setup.
This is not a bug.
Volfile will be generated only when there is a graph switch.

Just to double-check, we attached gdb to the client process and inspected afr's private member, and saw that compound-fops was on:

<snip>

Breakpoint 1, afr_lookup (frame=0x7f08e79dccd8, this=0x7f08d8009060, loc=0x7f08e722f678, xattr_req=0x7f08e717d35c) at afr-common.c:2858
2858    {
(gdb) p this->private
$1 = (void *) 0x7f08d804f220
(gdb) p (afr_private_t *)this->private
$2 = (afr_private_t *) 0x7f08d804f220
(gdb) p *$2
$3 = {lock = {spinlock = 1, mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
      __size = "\001", '\000' <repeats 38 times>, __align = 1}}, child_count = 2, arbiter_count = 0, children = 0x7f08d804a3b0, root_inode = 0x7f08d43ce06c,
  child_up = 0x7f08d804a350 "\001\001\r", <incomplete sequence \360\255\272>, local = 0x7f08d804a8f0 "", pending_key = 0x7f08d804a410, data_self_heal = 0x7f08d7de9e34 "on",
  data_self_heal_algorithm = 0x0, data_self_heal_window_size = 1, heal_waiting = {next = 0x7f08d804f290, prev = 0x7f08d804f290}, heal_wait_qlen = 128, heal_waiters = 0, healing = {
    next = 0x7f08d804f2a8, prev = 0x7f08d804f2a8}, background_self_heal_count = 8, healers = 0, metadata_self_heal = _gf_true, entry_self_heal = _gf_true, data_change_log = _gf_true,
  metadata_change_log = _gf_true, entry_change_log = _gf_true, metadata_splitbrain_forced_heal = _gf_false, read_child = -1, hash_mode = 1, favorite_child = -1,
  fav_child_policy = AFR_FAV_CHILD_NONE, inodelk_trace = _gf_false, entrylk_trace = _gf_false, wait_count = 1, timer = 0x0, optimistic_change_log = _gf_true, eager_lock = _gf_true,
  pre_op_compat = _gf_true, post_op_delay_secs = 1, quorum_count = 0, quorum_reads = _gf_false, vol_uuid = '\000' <repeats 36 times>, last_event = 0x7f08d804f4b0, event_generation = 6,
  choose_local = _gf_true, did_discovery = _gf_true, sh_readdir_size = 1024, ensure_durability = _gf_true, sh_domain = 0x7f08d804f440 "rep2-replicate-0:self-heal",
  afr_dirty = 0x7f08d7deccd4 "trusted.afr.dirty", shd = {iamshd = _gf_false, enabled = _gf_true, timeout = 600, index_healers = 0x7f08d804fc80, full_healers = 0x7f08d804fe30,
    split_brain = 0x7f08d804ffe0, statistics = 0x7f08d8052150, max_threads = 1, wait_qlength = 1024}, consistent_metadata = _gf_false, spb_choice_timeout = 300, need_heal = _gf_false,
  pump_private = 0x0, use_afr_in_pump = _gf_false, locking_scheme = 0x7f08d7deb341 "full", esh_granular = _gf_false, use_compound_fops = _gf_true}
(gdb)

</snip>
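
For anyone wanting to repeat this check, the session above can be reproduced roughly as follows (requires the glusterfs debuginfo packages; the pidof lookup assumes a single fuse client process on the machine):

gdb -p $(pidof glusterfs)                  # attach to the fuse client
(gdb) break afr_lookup                     # same frame as in the snippet above
(gdb) continue
# trigger any lookup on the mount (e.g. ls), then at the breakpoint:
(gdb) p *(afr_private_t *)this->private    # look for use_compound_fops
(gdb) detach
(gdb) quit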

So closing the BZ.