Description of problem:
When a volume is stopped while one node is rebooting, the rebooted node fails to pick up the correct status of the volume. After the rebooted node comes back up, it still shows the volume in the "Started" state.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-8.el7rhgs.x86_64
nfs-ganesha-2.4.1-2.el7rhgs.x86_64

How reproducible:

Steps to Reproduce:
1. Create a 4-node ganesha cluster on a 7-node gluster trusted pool and enable nfs-ganesha on it.
2. Create 4 distribute volumes.
3. Perform volume start and stop operations on different volumes.
4. Before rebooting any node, start all the volumes.
5. Reboot one of the nodes and, from one of the nodes that is still up, stop a volume:

[root@dhcp46-241 ganesha]# gluster v stop ganeshaVol5
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: ganeshaVol5: success
[root@dhcp46-241 ganesha]# showmount -e localhost
Export list for localhost:
/ganeshaVol4 (everyone)
/ganeshaVol3 (everyone)
/ganeshaVol1 (everyone)

When the rebooted node came back up, it still showed ganeshaVol5 in the Started state; on all the other nodes the volume is in the Stopped state. The following messages appear in gluster v status on the rebooted node:

Staging failed on dhcp46-241.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started
Staging failed on dhcp46-219.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started
Staging failed on dhcp47-45.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started
Staging failed on dhcp46-232.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started
Staging failed on dhcp47-33.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started
Staging failed on dhcp46-110.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started

[root@dhcp47-3 ~]# gluster v status
Status of volume: ganeshaVol1
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp46-219.lab.eng.blr.redhat.com:/mnt/data1/3    49162     0          Y       4236
Brick dhcp46-241.lab.eng.blr.redhat.com:/mnt/data1/3    49162     0          Y       30733
Brick dhcp47-3.lab.eng.blr.redhat.com:/mnt/data1/3      49152     0          Y       1820
Brick dhcp47-45.lab.eng.blr.redhat.com:/mnt/data1/3     49161     0          Y       26508
Brick dhcp46-219.lab.eng.blr.redhat.com:/mnt/data2/4    49163     0          Y       4256
Brick dhcp46-241.lab.eng.blr.redhat.com:/mnt/data2/4    49163     0          Y       30753
Brick dhcp47-3.lab.eng.blr.redhat.com:/mnt/data2/4      49153     0          Y       1827
Brick dhcp47-45.lab.eng.blr.redhat.com:/mnt/data2/4     49162     0          Y       26531
Brick dhcp46-219.lab.eng.blr.redhat.com:/mnt/data3/5    49164     0          Y       4276
Brick dhcp46-241.lab.eng.blr.redhat.com:/mnt/data3/5    49164     0          Y       30773
Brick dhcp47-3.lab.eng.blr.redhat.com:/mnt/data3/5      49154     0          Y       1840
Brick dhcp47-45.lab.eng.blr.redhat.com:/mnt/data3/5     49163     0          Y       26551

Task Status of Volume ganeshaVol1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: ganeshaVol3
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp46-219.lab.eng.blr.redhat.com:/mnt/data1/2    49159     0          Y       4137
Brick dhcp46-241.lab.eng.blr.redhat.com:/mnt/data1/2    49159     0          Y       30634
Brick dhcp47-3.lab.eng.blr.redhat.com:/mnt/data1/2      49155     0          Y       1854
Brick dhcp47-45.lab.eng.blr.redhat.com:/mnt/data1/2     49158     0          Y       26401
Brick dhcp46-219.lab.eng.blr.redhat.com:/mnt/data2/2    49160     0          Y       4157
Brick dhcp46-241.lab.eng.blr.redhat.com:/mnt/data2/2    49160     0          Y       30654
Brick dhcp47-3.lab.eng.blr.redhat.com:/mnt/data2/2      49156     0          Y       1847
Brick dhcp47-45.lab.eng.blr.redhat.com:/mnt/data2/2     49159     0          Y       26421
Brick dhcp46-219.lab.eng.blr.redhat.com:/mnt/data3/2    49161     0          Y       4177
Brick dhcp46-241.lab.eng.blr.redhat.com:/mnt/data3/2    49161     0          Y       30674
Brick dhcp47-3.lab.eng.blr.redhat.com:/mnt/data3/2      49157     0          Y       1872
Brick dhcp47-45.lab.eng.blr.redhat.com:/mnt/data3/2     49160     0          Y       26441

Task Status of Volume ganeshaVol3
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: ganeshaVol4
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp46-219.lab.eng.blr.redhat.com:/mnt/data1/6    49156     0          Y       4060
Brick dhcp46-241.lab.eng.blr.redhat.com:/mnt/data1/6    49156     0          Y       30569
Brick dhcp47-3.lab.eng.blr.redhat.com:/mnt/data1/6      49158     0          Y       1883
Brick dhcp47-45.lab.eng.blr.redhat.com:/mnt/data1/6     49155     0          Y       26320
Brick dhcp46-219.lab.eng.blr.redhat.com:/mnt/data2/6    49157     0          Y       4080
Brick dhcp46-241.lab.eng.blr.redhat.com:/mnt/data2/6    49157     0          Y       30589
Brick dhcp47-3.lab.eng.blr.redhat.com:/mnt/data2/6      49159     0          Y       1889
Brick dhcp47-45.lab.eng.blr.redhat.com:/mnt/data2/6     49156     0          Y       26343
Brick dhcp46-219.lab.eng.blr.redhat.com:/mnt/data3/6    49158     0          Y       4100
Brick dhcp46-241.lab.eng.blr.redhat.com:/mnt/data3/6    49158     0          Y       30609
Brick dhcp47-3.lab.eng.blr.redhat.com:/mnt/data3/6      49160     0          Y       1906
Brick dhcp47-45.lab.eng.blr.redhat.com:/mnt/data3/6     49157     0          Y       26371

Task Status of Volume ganeshaVol4
------------------------------------------------------------------------------
There are no active volume tasks

Staging failed on dhcp46-241.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started
Staging failed on dhcp46-219.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started
Staging failed on dhcp47-45.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started
Staging failed on dhcp46-232.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started
Staging failed on dhcp47-33.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started
Staging failed on dhcp46-110.lab.eng.blr.redhat.com. Error: Volume ganeshaVol5 is not started

Status of volume: gluster_shared_storage
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp47-3.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick   49164     0          Y       1928
Brick dhcp46-219.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick 49155     0          Y       1817
Brick dhcp46-241.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick 49155     0          Y       28366
Self-heal Daemon on localhost                                      N/A       N/A        Y       6506
Self-heal Daemon on dhcp46-241.lab.eng.blr.redhat.com              N/A       N/A        Y       32053
Self-heal Daemon on dhcp46-219.lab.eng.blr.redhat.com              N/A       N/A        Y       5445
Self-heal Daemon on dhcp47-45.lab.eng.blr.redhat.com               N/A       N/A        Y       28102
Self-heal Daemon on dhcp46-232.lab.eng.blr.redhat.com              N/A       N/A        Y       12308
Self-heal Daemon on dhcp47-33.lab.eng.blr.redhat.com               N/A       N/A        Y       10595
Self-heal Daemon on dhcp46-110.lab.eng.blr.redhat.com              N/A       N/A        Y       5481

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp47-3 ~]# gluster v info

Volume Name: ganeshaVol1
Type: Distribute
Volume ID: d5568168-ec2c-445b-9747-b8ca8fcaba7c
Status: Started
Snapshot Count: 0
Number of Bricks: 12
Transport-type: tcp
Bricks:
Brick1: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data1/3
Brick2: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data1/3
Brick3: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data1/3
Brick4: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data1/3
Brick5: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data2/4
Brick6: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data2/4
Brick7: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data2/4
Brick8: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data2/4
Brick9: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data3/5
Brick10: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data3/5
Brick11: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data3/5
Brick12: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data3/5
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.cache-invalidation: off
ganesha.enable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Volume Name: ganeshaVol3
Type: Distribute
Volume ID: c208643d-521d-4fcb-8768-0edd81f23ee6
Status: Started
Snapshot Count: 0
Number of Bricks: 12
Transport-type: tcp
Bricks:
Brick1: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data1/2
Brick2: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data1/2
Brick3: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data1/2
Brick4: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data1/2
Brick5: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data2/2
Brick6: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data2/2
Brick7: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data2/2
Brick8: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data2/2
Brick9: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data3/2
Brick10: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data3/2
Brick11: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data3/2
Brick12: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data3/2
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.cache-invalidation: off
ganesha.enable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Volume Name: ganeshaVol4
Type: Distribute
Volume ID: e87dff35-b277-45ea-abb7-5a7e8d32f4e6
Status: Started
Snapshot Count: 0
Number of Bricks: 12
Transport-type: tcp
Bricks:
Brick1: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data1/6
Brick2: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data1/6
Brick3: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data1/6
Brick4: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data1/6
Brick5: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data2/6
Brick6: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data2/6
Brick7: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data2/6
Brick8: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data2/6
Brick9: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data3/6
Brick10: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data3/6
Brick11: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data3/6
Brick12: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data3/6
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.cache-invalidation: off
ganesha.enable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Volume Name: ganeshaVol5
Type: Distribute
Volume ID: 1a6864e5-64b3-4b45-8a25-939895c630cf
Status: Started
Snapshot Count: 0
Number of Bricks: 12
Transport-type: tcp
Bricks:
Brick1: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data1/7
Brick2: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data1/7
Brick3: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data1/7
Brick4: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data1/7
Brick5: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data2/7
Brick6: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data2/7
Brick7: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data2/7
Brick8: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data2/7
Brick9: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data3/7
Brick10: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data3/7
Brick11: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data3/7
Brick12: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data3/7
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.cache-invalidation: off
ganesha.enable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: bcb7239e-1e56-41f1-a2cc-df94bb929fe9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: dhcp47-3.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Brick2: dhcp46-219.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Brick3: dhcp46-241.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
nfs-ganesha: enable
cluster.enable-shared-storage: enable

[root@dhcp47-3 ~]# firewall-cmd --list-services
dhcpv6-client rpc-bind rquota high-availability mountd glusterfs nfs ssh nlm

[root@dhcp47-3 ~]# gluster peer status
Number of Peers: 6

Hostname: dhcp46-241.lab.eng.blr.redhat.com
Uuid: 1fe28c22-4b7c-4dcd-ae69-d572b66d2434
State: Peer in Cluster (Connected)

Hostname: dhcp46-219.lab.eng.blr.redhat.com
Uuid: 35ecc4c8-84b4-4ad6-a25c-8f411e1a1087
State: Peer in Cluster (Connected)

Hostname: dhcp47-45.lab.eng.blr.redhat.com
Uuid: ff3ba838-5350-44c2-954a-be74f65b4663
State: Peer in Cluster (Connected)

Hostname: dhcp46-232.lab.eng.blr.redhat.com
Uuid: 222f7028-81e4-45c6-8b2a-eac9fafef2eb
State: Peer in Cluster (Connected)

Hostname: dhcp47-33.lab.eng.blr.redhat.com
Uuid: e90fa3d9-58db-4d38-abbb-26d6158bc205
State: Peer in Cluster (Connected)

Hostname: dhcp46-110.lab.eng.blr.redhat.com
Uuid: d7c61834-17a0-430e-b27e-cf1dc4f3f3b0
State: Peer in Cluster (Connected)

[root@dhcp47-3 ganeshaVol5]# showmount -e localhost
Export list for localhost:
/ganeshaVol1 (everyone)
/ganeshaVol3 (everyone)
/ganeshaVol4 (everyone)
/ganeshaVol5 (everyone)

On the node from which the volume stop was performed:

Volume Name: ganeshaVol5
Type: Distribute
Volume ID: 1a6864e5-64b3-4b45-8a25-939895c630cf
Status: Stopped
Snapshot Count: 0
Number of Bricks: 12
Transport-type: tcp
Bricks:
Brick1: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data1/7
Brick2: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data1/7
Brick3: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data1/7
Brick4: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data1/7
Brick5: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data2/7
Brick6: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data2/7
Brick7: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data2/7
Brick8: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data2/7
Brick9: dhcp46-219.lab.eng.blr.redhat.com:/mnt/data3/7
Brick10: dhcp46-241.lab.eng.blr.redhat.com:/mnt/data3/7
Brick11: dhcp47-3.lab.eng.blr.redhat.com:/mnt/data3/7
Brick12: dhcp47-45.lab.eng.blr.redhat.com:/mnt/data3/7
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Actual results:
The rebooted node does not fetch the correct status of the volume from the other nodes.

Expected results:
The rebooted node should reflect the correct status of the volume.

Additional info:
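For reference, the reproduction flow above can be condensed into a shell sketch. Host and volume names are taken from this report; the brick paths in the create step are illustrative, and the 4-node nfs-ganesha cluster from step 1 is assumed to already be configured:

# Create and start 4 distribute volumes, exporting each via nfs-ganesha.
for i in 1 3 4 5; do
    gluster volume create ganeshaVol$i \
        dhcp46-219.lab.eng.blr.redhat.com:/mnt/data1/brick$i \
        dhcp46-241.lab.eng.blr.redhat.com:/mnt/data1/brick$i \
        dhcp47-3.lab.eng.blr.redhat.com:/mnt/data1/brick$i \
        dhcp47-45.lab.eng.blr.redhat.com:/mnt/data1/brick$i
    gluster volume start ganeshaVol$i
    gluster volume set ganeshaVol$i ganesha.enable on
done

# Exercise start/stop on different volumes, then ensure everything is started.
gluster --mode=script volume stop ganeshaVol3 && gluster volume start ganeshaVol3

# Reboot one node (run on that node):
reboot

# While it is down, stop one volume from a surviving node:
gluster --mode=script volume stop ganeshaVol5

# Once the rebooted node is back, compare the view of ganeshaVol5:
gluster volume info ganeshaVol5 | grep '^Status'   # rebooted node: Started (bug)
gluster volume info ganeshaVol5 | grep '^Status'   # any other node: Stopped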
I got a chance to look into the setup and found that, at the time of the friend update, the node that went through the reboot detected a higher version for volume ganeshaVol5 on its peer:

[2016-12-12 09:26:13.494987] I [MSGID: 106009] [glusterd-utils.c:2914:glusterd_compare_friend_volume] 0-management: Version of volume ganeshaVol5 differ. local version = 7, remote version = 8 on peer dhcp46-241.lab.eng.blr.redhat.com

The surprising part is that, after detecting this, glusterd did not update the volume info file with the latest data and continued with the stale volinfo. The log file does not indicate any failures for this. To analyze the issue further, I have a couple of requests:

1. Is it reproducible?
2. If yes, can we enable debug logging and share the logs?

As for whether this is a blocker for rhgs-3.2.0, my answer would be no: the test case is not something that will often be executed in production, i.e. rebooting a node and stopping a volume at the same time.

Please add your thoughts.
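To make the stale state visible directly, the on-disk volinfo can be compared across nodes. This is a sketch, assuming the stock glusterd working directory /var/lib/glusterd and the usual version/status keys in the per-volume info file; it is not taken from the report itself:

# Run on every node. If the analysis above is right, the rebooted node keeps
# the lower version= value and status=1 (Started), while the other peers show
# the bumped version and status=2 (Stopped).
grep -E '^(version|status)=' /var/lib/glusterd/vols/ganeshaVol5/info

# To capture the debug logs requested above, restart glusterd at DEBUG level,
# reproduce, and then collect the glusterd log from /var/log/glusterfs/:
systemctl stop glusterd
glusterd --log-level DEBUG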
To confirm whether the issue exists on a non-nfs-ganesha setup, I tried the steps below multiple times; it worked correctly for me every time.

Steps I did:
1. Created a 4-node cluster.
2. Created 4 distribute volumes using bricks from all 4 nodes and started all the volumes.
3. Rebooted one of the cluster nodes and at the same time stopped the volumes.
4. Checked the volume status on the rebooted node; it showed correctly (the volumes were in the Stopped state).

@Manisha, are the firewall rules on your setup persistent? If they are not persistent, the chances of hitting this issue are higher. Always make the firewall rules persistent for any node-reboot-related testing.
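For completeness, a minimal sketch of making the rules persistent with firewalld (service names taken from the firewall-cmd output in the description; both variants are standard firewall-cmd usage):

# Either re-add each required service permanently and reload...
for svc in glusterfs nfs mountd rpc-bind rquota nlm high-availability ssh; do
    firewall-cmd --permanent --add-service=$svc
done
firewall-cmd --reload

# ...or simply promote whatever is currently active at runtime:
firewall-cmd --runtime-to-permanent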
(In reply to Byreddy from comment #4)
> @Manisha, are the firewall rules on your setup persistent? If they are not
> persistent, the chances of hitting this issue are higher. Always make the
> firewall rules persistent for any node-reboot-related testing.

Byreddy, I had the same query about the firewalld rules. Manisha configured the firewalld rules via gdeploy, and the glusterfs service was added permanently.

<snip>
<msaini_> [root@dhcp47-3 ~]# firewall-cmd --list-services
<msaini_> dhcpv6-client rpc-bind rquota high-availability mountd glusterfs nfs ssh nlm
</snip>
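A quick way to verify that claim on any node is to compare the runtime and permanent service lists; they should match if the rules will survive a reboot:

firewall-cmd --list-services              # runtime configuration
firewall-cmd --permanent --list-services  # persistent configuration

# Empty diff output means the runtime rules are fully persisted:
diff <(firewall-cmd --list-services | tr ' ' '\n' | sort) \
     <(firewall-cmd --permanent --list-services | tr ' ' '\n' | sort)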
(In reply to Atin Mukherjee from comment #3)
> 1. Is it reproducible?
> 2. If yes, can we enable debug logging and share the logs?

I tried reproducing the same scenario again; the issue is reproducible. With a single volume the issue is not observed; in my scenario there were 4 volumes. With the same steps (creating the volumes and performing start and stop operations on them) I am able to hit the issue again.
Created attachment 1231084 [details]
Glusterd logs of the rebooted node
Created attachment 1231086 [details]
Glusterd logs of the node from which the volume stop was performed
Based on where the issue is reproducible, it looks like it exists only on an nfs-ganesha-configured setup. This scenario works well for me on a setup where nfs-ganesha is not configured.

@Manisha, you can try the same thing on the same setup without the nfs-ganesha configuration to isolate the problem.
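A sketch of taking nfs-ganesha out of the picture on the same setup, using the standard RHGS ganesha toggles (volume names from this report; run from any cluster node):

# Stop exporting each volume via ganesha, then tear down the ganesha cluster:
for v in ganeshaVol1 ganeshaVol3 ganeshaVol4 ganeshaVol5; do
    gluster volume set $v ganesha.enable off
done
gluster nfs-ganesha disable

# Rerun the reboot + volume stop scenario, then compare on the rebooted node:
gluster volume info ganeshaVol5 | grep '^Status'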
Here are a few additional data points that Manisha and I came up with from the testing and analysis:

1. The issue does not happen on a similar setup where NFS-Ganesha is not configured.
2. The issue does not happen on a single-volume setup.
3. The issue only happens if the node goes through a reboot; killing all gluster processes and bringing them back after performing the volume stop from another node does not cause any inconsistency in the data.
4. If more than one volume is stopped, the issue does not persist.

We still do not have enough RCA evidence to say what is going wrong here. However, IMO this test does not look like a frequent use case in production, and the bug can be deferred from rhgs-3.2.0 given that a workaround is available to correct the state.
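The workaround itself is not spelled out above. Based on the RCA notes (the rebooted node holds a stale volinfo while its peers have the newer version), a plausible way to correct the state is to force glusterd on the stale node to re-sync configuration from its peers; treat this as a sketch rather than a validated procedure:

# On the rebooted (stale) node:
systemctl restart glusterd

# If the stale entry survives a plain restart, a heavier variant is to move the
# stale volume's metadata aside and let glusterd import it afresh from the peers
# on startup (assumes the stock /var/lib/glusterd layout; keep the backup):
systemctl stop glusterd
mv /var/lib/glusterd/vols/ganeshaVol5 /root/ganeshaVol5.vols.bak
systemctl start glusterd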
The doc text is slightly edited for the release notes.
Anjana, for your awareness: this needs to be taken out of the Known Issues chapter.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days