+++ This bug was initially created as a clone of Bug #1480423 +++
+++ This bug was initially created as a clone of Bug #1480332 +++
Description of problem:
One of the volume's bricks does not come up after restarting the pod.
sh-4.2# rpm -qa |grep glusterfs
glusterfs-fuse-3.8.4-39.el7rhgs.x86_64
glusterfs-server-3.8.4-39.el7rhgs.x86_64
glusterfs-libs-3.8.4-39.el7rhgs.x86_64
glusterfs-3.8.4-39.el7rhgs.x86_64
glusterfs-api-3.8.4-39.el7rhgs.x86_64
glusterfs-cli-3.8.4-39.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-39.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-39.el7rhgs.x86_64
sh-4.2#
sh-4.2# gluster v status
Status of volume: heketidbstorage
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 192.168.35.5:/var/lib/heketi/mounts/v
g_e2c07dc7b49f9f42ca327a3073966364/brick_7a
c173229b1e69e8730be424956846e4/brick 49152 0 Y 524
Brick 192.168.35.2:/var/lib/heketi/mounts/v
g_11f9761be704eb359ffd71c4d76afdf1/brick_04
0ff4a25411b19d2b24d0340bf3e9f8/brick 49152 0 Y 403
Brick 192.168.35.6:/var/lib/heketi/mounts/v
g_1438320d4e7bde3c3d2f232671afd699/brick_1f
0d216ac0f28f1f5b0fbbc4adcdc69b/brick 49152 0 Y 410
Self-heal Daemon on localhost N/A N/A Y 399
Self-heal Daemon on 192.168.35.5 N/A N/A Y 514
Self-heal Daemon on 192.168.35.6 N/A N/A Y 435
Self-heal Daemon on 192.168.35.4 N/A N/A Y 412
Self-heal Daemon on 192.168.35.2 N/A N/A Y 412
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_96d8648ebd2ff7dcb87be3a2587c6246
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 192.168.35.6:/var/lib/heketi/mounts/v
g_9ed2ce0c55973e60d0ffc25d0028157f/brick_90
862eaeb677759f02e0809786e8ec70/brick 49153 0 Y 416
Brick 192.168.35.3:/var/lib/heketi/mounts/v
g_803bb877441ad72c82b0ee6113a79511/brick_9a
7f3517851a71e13bc3ff1f3b96f4bc/brick 49152 0 Y 408
Brick 192.168.35.4:/var/lib/heketi/mounts/v
g_799ba6da52ca29cf5404bd96b53fa957/brick_3f
ed005da1f6746d2d299b1e971fcc39/brick 49152 0 Y 403
Brick 192.168.35.6:/var/lib/heketi/mounts/v
g_1438320d4e7bde3c3d2f232671afd699/brick_61
5a892b042f50617e4d228e0971c6ce/brick 49153 0 Y 416
Brick 192.168.35.4:/var/lib/heketi/mounts/v
g_3ae4bb7cd25740f3cd84a15dbb985f48/brick_bb
9eb06eb20f6d570d4d2f6378b14742/brick 49152 0 Y 403
Brick 192.168.35.5:/var/lib/heketi/mounts/v
g_96d6ff00fd2a38ead4c9ef9cd6db935f/brick_1a
6dcabc43a70402dc595ed89d76126f/brick 49153 0 Y 531
Self-heal Daemon on localhost N/A N/A Y 399
Self-heal Daemon on 192.168.35.4 N/A N/A Y 412
Self-heal Daemon on 192.168.35.2 N/A N/A Y 412
Self-heal Daemon on 192.168.35.5 N/A N/A Y 514
Self-heal Daemon on 192.168.35.6 N/A N/A Y 435
Task Status of Volume vol_96d8648ebd2ff7dcb87be3a2587c6246
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_bd43b5b5f048c469ee16f80376d73ec3
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 192.168.35.3:/var/lib/heketi/mounts/v
g_803bb877441ad72c82b0ee6113a79511/brick_80
9331b5ef0e23c3327cf4483258dd25/brick N/A N/A N N/A
Brick 192.168.35.4:/var/lib/heketi/mounts/v
g_799ba6da52ca29cf5404bd96b53fa957/brick_7d
5d8d5c3074f965d456c6d1fbfbc54d/brick 49153 0 Y 407
Brick 192.168.35.2:/var/lib/heketi/mounts/v
g_11f9761be704eb359ffd71c4d76afdf1/brick_54
dd82e9703966757b5c8f9f332c998b/brick 49152 0 Y 403
Self-heal Daemon on localhost N/A N/A Y 399
Self-heal Daemon on 192.168.35.5 N/A N/A Y 514
Self-heal Daemon on 192.168.35.6 N/A N/A Y 435
Self-heal Daemon on 192.168.35.2 N/A N/A Y 412
Self-heal Daemon on 192.168.35.4 N/A N/A Y 412
Task Status of Volume vol_bd43b5b5f048c469ee16f80376d73ec3
------------------------------------------------------------------------------
There are no active volume tasks
sh-4.2#
sh-4.2# cat glusterd.log|grep -i disconnec
[2017-08-08 11:45:35.612285] I [MSGID: 106004] [glusterd-handler.c:5879:__glusterd_peer_rpc_notify] 0-management: Peer <192.168.35.6> (<e74482f2-b77f-4920-817e-5181cb748cce>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-08-08 11:45:45.700948] E [socket.c:2360:socket_connect_finish] 0-management: connection to 192.168.35.6:24007 failed (Connection refused); disconnecting socket
[2017-08-08 15:21:15.512361] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-08 15:21:15.513083] I [MSGID: 106005] [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: Brick 192.168.35.3:/var/lib/heketi/mounts/vg_803bb877441ad72c82b0ee6113a79511/brick_809331b5ef0e23c3327cf4483258dd25/brick has disconnected from glusterd.
[2017-08-09 17:43:29.010548] I [MSGID: 106004] [glusterd-handler.c:5879:__glusterd_peer_rpc_notify] 0-management: Peer <192.168.35.4> (<29382453-610e-40f6-b2b3-9ae4edcfde51>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-08-09 17:43:39.504984] E [socket.c:2360:socket_connect_finish] 0-management: connection to 192.168.35.4:24007 failed (Connection refused); disconnecting socket
[2017-08-09 17:45:59.307446] I [MSGID: 106004] [glusterd-handler.c:5879:__glusterd_peer_rpc_notify] 0-management: Peer <192.168.35.2> (<8dbffc32-38b5-460c-a857-cd2303a35c35>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-08-09 17:46:09.793636] E [socket.c:2360:socket_connect_finish] 0-management: connection to 192.168.35.2:24007 failed (Connection refused); disconnecting socket
[2017-08-09 17:48:29.725173] I [MSGID: 106004] [glusterd-handler.c:5879:__glusterd_peer_rpc_notify] 0-management: Peer <192.168.35.6> (<e74482f2-b77f-4920-817e-5181cb748cce>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-08-09 17:48:40.112820] E [socket.c:2360:socket_connect_finish] 0-management: connection to 192.168.35.6:24007 failed (Connection refused); disconnecting socket
[2017-08-09 17:53:41.528247] E [socket.c:2360:socket_connect_finish] 0-management: connection to 192.168.35.5:24007 failed (Connection refused); disconnecting socket
[2017-08-09 17:53:41.528302] I [MSGID: 106004] [glusterd-handler.c:5879:__glusterd_peer_rpc_notify] 0-management: Peer <192.168.35.5> (<849ac512-5d53-47b3-bbe2-7afff39627e7>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-08-09 17:53:43.192552] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-09 17:53:43.193180] I [MSGID: 106005] [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: Brick 192.168.35.3:/var/lib/heketi/mounts/vg_803bb877441ad72c82b0ee6113a79511/brick_9a7f3517851a71e13bc3ff1f3b96f4bc/brick has disconnected from glusterd.
[2017-08-09 17:53:43.193974] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-09 17:53:43.194563] I [MSGID: 106005] [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: Brick 192.168.35.3:/var/lib/heketi/mounts/vg_803bb877441ad72c82b0ee6113a79511/brick_809331b5ef0e23c3327cf4483258dd25/brick has disconnected from glusterd.
sh-4.2#
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-08-10 13:46:47 EDT ---
This bug is automatically being proposed for the current release of Container-Native Storage under active development, by setting the release flag 'cns-3.6.0' to '?'.
If this bug should be proposed for a different release, please manually change the proposed release flag.
--- Additional comment from Humble Chirammal on 2017-08-10 13:47:59 EDT ---
Atin and Mohit have analysed this issue and found the root cause: a PID clash. More updates will follow soon.
--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-08-11 00:57:30 EDT ---
This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.3.0' to '?'.
If this bug should be proposed for a different release, please manually change the proposed release flag.
--- Additional comment from Atin Mukherjee on 2017-08-11 00:59:44 EDT ---
Rejy - Can you please raise a blocker flag on this bug? This issue is coming from CNS testing and we'd need to have a discussion on this bug in rhgs-3.3.0 blocker triage meeting.
--- Additional comment from Atin Mukherjee on 2017-08-11 01:14:22 EDT ---
RCA:
When the gluster pod in a CNS environment is restarted, all the gluster daemon pidfiles persist across the restart because they are stored under /var/lib/glusterd/. On restart of the pod, when glusterd tries to start the bricks, it first reads the pid from each brick's pidfile and looks for a matching entry in /proc/<pid>/cmdline; if it finds one, it assumes the brick is already running.
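A minimal sketch of this liveness check, under simplifying assumptions (the helper names, the one-token pidfile format, and the substring match are illustrative, not glusterd's actual code):

```python
def pid_from_pidfile(path):
    # Pidfiles are assumed here to hold the pid as the first token.
    with open(path) as f:
        return int(f.read().split()[0])

def cmdline_matches(pid, expected_substring):
    # glusterd consults /proc/<pid>/cmdline; if a readable entry
    # exists and matches, the brick is assumed to be running already.
    try:
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            return expected_substring.encode() in f.read()
    except OSError:
        # No such pid (or unreadable entry): brick is not running.
        return False
```

Note that the check only proves *some* process with that pid exists, which is exactly the weakness exploited by the pid clash described below.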
In this particular case, glusterd had to start two bricks on this pod. Since bricks are started asynchronously in a glusterd restart scenario, glusterd sent the trigger to start the first brick and then went ahead and tried to start the second. For every daemon, daemonization spawns two processes: first the parent, then the forked child. Once the child is forked, the parent goes away, and during this transition both pids have entries under /proc/<pid>/cmdline.
Suppose that for the first brick the parent pid was 101 and the child pid was 102, and that the second brick's pidfile contained the entry 101 from before the pod restart. When glusterd tries to restart the second brick, it finds pid 101 running (even though that pid belongs to the first brick's daemonization), assumes the second brick is already running, and never triggers a start or an attach for it.
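The clash above can be modelled with a small sketch (the function name and pid values are illustrative, mirroring the 101/102 example, not glusterd code):

```python
def brick_seems_running(pidfile_pid, live_pids):
    # Mirrors the assumption glusterd makes: if the pid recorded in
    # the brick's pidfile currently exists in /proc (modelled here as
    # a set of live pids), the brick is treated as already running.
    return pidfile_pid in live_pids

# Brick 2's stale pidfile still records pid 101 from before the
# restart; after the restart, 101 is brick 1's short-lived parent.
live_pids = {101, 102}                        # brick 1 parent and child
print(brick_seems_running(101, live_pids))    # True: brick 2 is wrongly
                                              # assumed up, never started
```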
In a container environment the pid numbers are three digits, so the probability of this clash is much higher than in CRS deployments, where the gluster storage nodes are outside of containers.
One easy way to mitigate this problem in CNS is to have the pidfiles cleaned up during start-up. In CRS, however, although the probability of this clash is *very* low, there is no guarantee that we can never hit it.
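The start-up cleanup idea can be sketched as follows (the function name, and the assumption that pidfiles end in ".pid", are illustrative only, not the actual fix):

```python
import os

def clean_stale_pidfiles(run_dir):
    # Walk the pidfile directory at pod start-up and remove every
    # leftover pidfile, so that no stale pid can be mistaken for a
    # live brick by the liveness check.
    removed = []
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            if name.endswith(".pid"):
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
    return removed
```

The eventual fix took a different route with the same effect: moving the pidfiles to a directory that the OS itself wipes on reboot.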
The impact of this issue is that the brick will not come up; however, there is no impact on I/O, since the other replica bricks keep running. Doing a 'gluster volume start <volname> force' attaches this brick, and the brick status comes back online.
We have a patch, https://review.gluster.org/#/c/13580/, which changes the pidfile location from /var/lib/glusterd to /var/run/gluster, so that on a node reboot all the pidfiles are cleaned up.
upstream patch : https://review.gluster.org/#/c/13580/
--- Additional comment from Humble Chirammal on 2017-08-11 01:47:24 EDT ---
Thanks Atin. I would like to propose this as a blocker for the CNS 3.6 release.
REVIEW: https://review.gluster.org/18025 (glusterd: Introduce run-directory option in glusterd.vol file) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)