Bug 1658984

Summary: None of the bricks come ONLINE after gluster pod reboot in OCS 3.11.1
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: vinutha <vinug>
Component: rhgs-server-container Assignee: Niels de Vos <ndevos>
Status: CLOSED ERRATA QA Contact: vinutha <vinug>
Severity: high Docs Contact:
Priority: unspecified    
Version: ocs-3.11CC: abhishku, amukherj, knarra, kramdoss, madam, moagrawa, nberry, ndevos, rgeorge, rhs-bugs, sankarshan, sarora, suprasad, vinug
Target Milestone: --- Keywords: Regression, TestBlocker, ZStream
Target Release: OCS 3.11.1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ocs/rhgs-server-rhel7:3.11.1-5 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-07 04:12:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1644160    

Description vinutha 2018-12-13 10:28:51 UTC
Description of problem:
******** This bug was hit while verifying bug https://bugzilla.redhat.com/show_bug.cgi?id=1632896 ***************

Editing the ds to set the Brick-Multiplex variable to No and then restarting the gluster pods makes the bricks of the heketidbstorage volume go offline.


Version-Release number of selected component (if applicable):
# rpm -qa| grep gluster 
glusterfs-server-3.12.2-30.el7rhgs.x86_64
gluster-block-0.2.1-29.el7rhgs.x86_64
glusterfs-api-3.12.2-30.el7rhgs.x86_64
glusterfs-cli-3.12.2-30.el7rhgs.x86_64
python2-gluster-3.12.2-30.el7rhgs.x86_64
glusterfs-fuse-3.12.2-30.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-30.el7rhgs.x86_64
glusterfs-libs-3.12.2-30.el7rhgs.x86_64
glusterfs-3.12.2-30.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-30.el7rhgs.x86_64

# oc rsh heketi-storage-1-qkgnn rpm -qa | grep heketi
heketi-8.0.0-2.el7rhgs.x86_64
heketi-client-8.0.0-2.el7rhgs.x86_64

# oc version 
oc v3.11.43
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp46-113.lab.eng.blr.redhat.com:8443
openshift v3.11.43
kubernetes v1.11.0+d4cacc0

How reproducible:
Tried once on this setup 


Steps to Reproduce:
1. 4-node OCP + OCS setup. Created a file PVC and a block PVC (a minimal creation sketch follows the volume list below).
# heketi-cli  volume list 
Id:257cd8f9ccb8de57cd1eadf7aeefb8dc    Cluster:5ca7a0269e0407efbcdbdaee8b343996    Name:vol_glusterfs_f1_a8802983-feb6-11e8-9144-005056a53ec9
Id:a1146f15b390206cd49181526881ab3f    Cluster:5ca7a0269e0407efbcdbdaee8b343996    Name:heketidbstorage
Id:da13f9f679f02100509c378f909e3b38    Cluster:5ca7a0269e0407efbcdbdaee8b343996    Name:vol_da13f9f679f02100509c378f909e3b38 [block]
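
For reference, a minimal sketch of how such a file PVC can be created. This is illustrative only: the storage class name glusterfs-storage and the claim name f1 are assumptions, not taken from this setup, and a block PVC would reference the gluster-block storage class instead.

# Create a small file PVC against the glusterfs storage class (names are illustrative)
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: f1
spec:
  storageClassName: glusterfs-storage
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF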

2. Edited the ds with the following change
 
# oc edit ds glusterfs-storage
daemonset.extensions/glusterfs-storage edited

- name: GLUSTER_BRICKMULTIPLEX
  value: "No"

All bricks of heketidbstorage are online before gluster pod reboot 
# oc rsh glusterfs-storage-8gpz8 gluster v status
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.166:/var/lib/heketi/mounts/v
g_ae300b4cee27f06380cf70817f175733/brick_36
e0027ce5f499adf741944f45361fa7/brick        49152     0          Y       417  
Brick 10.70.47.145:/var/lib/heketi/mounts/v
g_962a1bb9c334107f9cdc89dcd97dac05/brick_91
9104cfe6b87ac48bfd616687c5c75e/brick        49152     0          Y       426  
Brick 10.70.47.27:/var/lib/heketi/mounts/vg
_f398741caf7abeb26708dda31c04175d/brick_064
b73523e803d81cb348a11916af98e/brick         49152     0          Y       404  
Self-heal Daemon on localhost               N/A       N/A        Y       3715 
Self-heal Daemon on 10.70.46.237            N/A       N/A        Y       3776 
Self-heal Daemon on 10.70.47.166            N/A       N/A        Y       3624 
Self-heal Daemon on 10.70.47.27             N/A       N/A        Y       3836 
 
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks
 

3. Restarted the gluster pods one after another. Observed that the heketidbstorage bricks are not online after the gluster pod reboots.
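
A sketch of that rolling restart, for reference (the label selector glusterfs=storage-pod and the pod count of 4 are assumptions based on this setup):

# Restart the glusterfs pods one at a time, waiting for each replacement to become Ready
for pod in $(oc get pods -l glusterfs=storage-pod -o jsonpath='{.items[*].metadata.name}'); do
    # The daemonset recreates the pod on the same node after deletion
    oc delete pod "$pod"
    # Poll until all 4 glusterfs pods report 1/1 Running again
    until [ "$(oc get pods -l glusterfs=storage-pod | grep -c '1/1 *Running')" -eq 4 ]; do
        sleep 10
    done
done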


After gluster pods reboot the heketidbstorage bricks are not online 

== after rebooting gluster pod 1 

# oc rsh glusterfs-storage-blmhf gluster v status 
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.166:/var/lib/heketi/mounts/v
g_ae300b4cee27f06380cf70817f175733/brick_36
e0027ce5f499adf741944f45361fa7/brick        49152     0          Y       417  
Brick 10.70.47.145:/var/lib/heketi/mounts/v
g_962a1bb9c334107f9cdc89dcd97dac05/brick_91
9104cfe6b87ac48bfd616687c5c75e/brick        N/A       N/A        N       N/A  
Brick 10.70.47.27:/var/lib/heketi/mounts/vg
_f398741caf7abeb26708dda31c04175d/brick_064
b73523e803d81cb348a11916af98e/brick         49152     0          Y       404  
Self-heal Daemon on localhost               N/A       N/A        Y       332  
Self-heal Daemon on 10.70.47.166            N/A       N/A        Y       3624 
Self-heal Daemon on dhcp47-145.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       673  
Self-heal Daemon on 10.70.47.27             N/A       N/A        Y       3836 

== after rebooting gluster pod 2 

# oc rsh glusterfs-storage-blmhf gluster v status 
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.166:/var/lib/heketi/mounts/v
g_ae300b4cee27f06380cf70817f175733/brick_36
e0027ce5f499adf741944f45361fa7/brick        49152     0          Y       417  
Brick 10.70.47.145:/var/lib/heketi/mounts/v
g_962a1bb9c334107f9cdc89dcd97dac05/brick_91
9104cfe6b87ac48bfd616687c5c75e/brick        N/A       N/A        N       N/A  
Brick 10.70.47.27:/var/lib/heketi/mounts/vg
_f398741caf7abeb26708dda31c04175d/brick_064
b73523e803d81cb348a11916af98e/brick         49152     0          Y       404  
Self-heal Daemon on localhost               N/A       N/A        Y       332  
Self-heal Daemon on dhcp47-145.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       673  
Self-heal Daemon on 10.70.47.27             N/A       N/A        Y       3836 
Self-heal Daemon on 10.70.47.166            N/A       N/A        Y       3624 
 
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks

== after rebooting gluster pod 3 
# oc rsh glusterfs-storage-jnpf4 gluster v status
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.166:/var/lib/heketi/mounts/v
g_ae300b4cee27f06380cf70817f175733/brick_36
e0027ce5f499adf741944f45361fa7/brick        N/A       N/A        N       N/A  
Brick 10.70.47.145:/var/lib/heketi/mounts/v
g_962a1bb9c334107f9cdc89dcd97dac05/brick_91
9104cfe6b87ac48bfd616687c5c75e/brick        N/A       N/A        N       N/A  
Brick 10.70.47.27:/var/lib/heketi/mounts/vg
_f398741caf7abeb26708dda31c04175d/brick_064
b73523e803d81cb348a11916af98e/brick         49152     0          Y       404  
Self-heal Daemon on localhost               N/A       N/A        Y       311  
Self-heal Daemon on dhcp46-237.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       332  
Self-heal Daemon on dhcp47-145.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       673  
Self-heal Daemon on 10.70.47.27             N/A       N/A        Y       3836 
 
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks


== after rebooting gluster pod 4 
# oc rsh glusterfs-storage-jnpf4 gluster v status
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.166:/var/lib/heketi/mounts/v
g_ae300b4cee27f06380cf70817f175733/brick_36
e0027ce5f499adf741944f45361fa7/brick        N/A       N/A        N       N/A  
Brick 10.70.47.145:/var/lib/heketi/mounts/v
g_962a1bb9c334107f9cdc89dcd97dac05/brick_91
9104cfe6b87ac48bfd616687c5c75e/brick        N/A       N/A        N       N/A  
Brick 10.70.47.27:/var/lib/heketi/mounts/vg
_f398741caf7abeb26708dda31c04175d/brick_064
b73523e803d81cb348a11916af98e/brick         N/A       N/A        N       N/A  
Self-heal Daemon on localhost               N/A       N/A        Y       311  
Self-heal Daemon on 10.70.47.27             N/A       N/A        Y       310  
Self-heal Daemon on dhcp47-145.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       673  
Self-heal Daemon on dhcp46-237.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       332  
 
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks


Gluster volume heal info 
# oc rsh glusterfs-storage-jnpf4
sh-4.2# for i in `gluster v list` ; do echo $i; echo ""; gluster v heal $i info ; done
heketidbstorage

heketidbstorage: Not able to fetch volfile from glusterd
Volume heal failed.
vol_da13f9f679f02100509c378f909e3b38

Brick 10.70.47.27:/var/lib/heketi/mounts/vg_f398741caf7abeb26708dda31c04175d/brick_9b7bf90c161bfef754f0e90fac34ad11/brick
Status: Connected
Number of entries: 0

Brick 10.70.47.145:/var/lib/heketi/mounts/vg_962a1bb9c334107f9cdc89dcd97dac05/brick_0310c142d85cd2004522963cb203938c/brick
Status: Connected
Number of entries: 0

Brick 10.70.47.166:/var/lib/heketi/mounts/vg_ae300b4cee27f06380cf70817f175733/brick_be0aebf901f4d9dd662a1db5c9f7af11/brick
Status: Connected
Number of entries: 0

vol_glusterfs_f1_a8802983-feb6-11e8-9144-005056a53ec9

Brick 10.70.46.237:/var/lib/heketi/mounts/vg_e64f926eebfd4e5e5a4afaec4460050a/brick_94b7d877281df12bd639e0b4709cdc2c/brick
Status: Connected
Number of entries: 0

Brick 10.70.47.27:/var/lib/heketi/mounts/vg_f398741caf7abeb26708dda31c04175d/brick_120766ed1db05bbf34c660754a405b8a/brick
Status: Connected
Number of entries: 0

Brick 10.70.47.145:/var/lib/heketi/mounts/vg_962a1bb9c334107f9cdc89dcd97dac05/brick_67e812e75e4d7261f5036672d4684080/brick
Status: Connected
Number of entries: 0

== Gluster pods post rebooting 

# oc get pods | grep gluster
glusterblock-storage-provisioner-dc-1-8zvnx   1/1       Running   0          2d
glusterfs-storage-blmhf                       1/1       Running   0          1h
glusterfs-storage-bm2f8                       1/1       Running   0          51m
glusterfs-storage-g98sg                       1/1       Running   0          1h
glusterfs-storage-jnpf4                       1/1       Running   0          54m


Actual results:
Heketidbstorage bricks not online after gluster pod reboot

Expected results:
heketidbstorage bricks should be online after gluster pod reboot 

Additional info:
logs will be attached

Comment 5 RamaKasturi 2018-12-18 07:19:34 UTC
I am seeing a similar issue on the AWS setup where I upgraded from accelerated hotfix builds to the latest 3.11.1 bits. Once the upgrade of the gluster pods finishes, none of the bricks in the gluster pods come up.

Below are the errors seen in glusterd.log:
================================================

[2018-12-17 10:37:16.167479] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 1024
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:  
+------------------------------------------------------------------------------+
[2018-12-17 10:37:16.175099] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-12-17 10:42:30.063817] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2018-12-17 10:42:30.063763] I [socket.c:2481:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-12-17 10:42:30.063938] I [socket.c:3680:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2018-12-17 10:42:30.063952] E [rpcsvc.c:1349:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, ProgVers: 2, Proc: 5) to rpc-transport (socket.management)
[2018-12-17 10:42:30.063972] E [MSGID: 106430] [glusterd-utils.c:560:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2018-12-17 10:42:30.064596] I [socket.c:2481:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-12-17 10:42:30.064702] I [socket.c:3680:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2018-12-17 10:42:30.064718] E [rpcsvc.c:1349:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, ProgVers: 2, Proc: 5) to rpc-transport (socket.management)
[2018-12-17 10:42:30.064996] I [socket.c:2481:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-12-17 10:42:30.065382] I [socket.c:2481:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-12-17 10:42:30.065787] I [socket.c:3680:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2018-12-17 10:42:30.065799] E [rpcsvc.c:1349:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, ProgVers: 2, Proc: 5) to rpc-transport (socket.management)
The message "E [MSGID: 106430] [glusterd-utils.c:560:glusterd_submit_reply] 0-glusterd: Reply submission failed" repeated 2 times between [2018-12-17 10:42:30.063972] and [2018-12-17 10:42:30.065810]
The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 3158 times between [2018-12-17 10:42:30.063817] and [2018-12-17 10:42:31.502823]
[2018-12-17 10:45:30.753931] I [glusterd-locks.c:732:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk
[2018-12-17 10:45:37.169631] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 3221e583-b1ea-4de4-965a-29c54d4f69e2, host: 172.16.37.192, port: 0
[2018-12-17 10:45:37.172074] E [glusterd-utils.c:6172:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/var/lib/heketi/mounts/vg_fad4d4727a76dfba2f389f39daf27baa/brick_a4c08bf0c5b3f305093267ef46948f33/brick), brick is deemed not to be a part of the volume (heketidbstorage) 
[2018-12-17 10:45:37.172128] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to 172.16.17.167:/var/lib/heketi/mounts/vg_fad4d4727a76dfba2f389f39daf27baa/brick_a4c08bf0c5b3f305093267ef46948f33/brick
[2018-12-17 10:45:37.172153] E [glusterd-utils.c:6172:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/var/lib/heketi/mounts/vg_fad4d4727a76dfba2f389f39daf27baa/brick_fb0f05a58746ad08ca92bfd4ab98f1b0/brick), brick is deemed not to be a part of the volume (knarra_cirros10_claim001_eac46710-f896-11e8-a3a6-025dbca7e8b6) 
[2018-12-17 10:45:37.172162] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to 172.16.17.167:/var/lib/heketi/mounts/vg_fad4d4727a76dfba2f389f39daf27baa/brick_fb0f05a58746ad08ca92bfd4ab98f1b0/brick
[2018-12-17 10:45:37.172178] E [glusterd-utils.c:6172:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/var/lib/heketi/mounts/vg_fad4d4727a76dfba2f389f39daf27baa/brick_263b2ddfe254f8a84e284189f4417c05/brick), brick is deemed not to be a part of the volume (knarra_cirros10_claim002_f52b4e21-f896-11e8-a3a6-025dbca7e8b6)

Comment 11 Niels de Vos 2018-12-18 19:59:39 UTC
From the logs in /var/log/glusterfs/container/ on a node where mounting failed:

sh-4.2# tail -n2 /var/log/glusterfs/container/mountfstab
mount: special device /dev/mapper/vg_fc718f8701c690ecd8974cce6903f210-brick_015e5cd2337492f7c51c08e6cd384048 does not exist
mount command exited with code 32

But the LVs should be available:

sh-4.2# tail -n2 /var/log/glusterfs/container/lvscan    
  ACTIVE            '/dev/vg_fc718f8701c690ecd8974cce6903f210/brick_015e5cd2337492f7c51c08e6cd384048' [1.00 GiB] inherit
  ACTIVE            '/dev/dockervg/dockerlv' [<100.00 GiB] inherit

And they are!

sh-4.2# ls /dev/mapper/vg_* | tail -n4
/dev/mapper/vg_fc718f8701c690ecd8974cce6903f210-tp_fff73290c8eb84e253eef5bf6bbef9e2
/dev/mapper/vg_fc718f8701c690ecd8974cce6903f210-tp_fff73290c8eb84e253eef5bf6bbef9e2-tpool
/dev/mapper/vg_fc718f8701c690ecd8974cce6903f210-tp_fff73290c8eb84e253eef5bf6bbef9e2_tdata
/dev/mapper/vg_fc718f8701c690ecd8974cce6903f210-tp_fff73290c8eb84e253eef5bf6bbef9e2_tmeta

There are lots of bricks on this system, so there might be some delays in setting up all /dev/mapper/ devices?

sh-4.2# wc -l /var/lib/heketi/fstab
1638 /var/lib/heketi/fstab

[side question: How many bricks do we support on a single system?]

The gluster-setup.sh script has some logic to retry mounting (and to run 'vgscan --mknodes'). Unfortunately this retry logic is not executed when 'mount' returns an error. Upstream PR https://github.com/gluster/gluster-containers/pull/114 is an attempt to fix this.
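
For illustration, a minimal sketch of the intended behaviour. This is not the actual gluster-setup.sh code; only the file paths and the vgscan call are taken from the comments above.

# First pass: mount every brick listed in the heketi-managed fstab
mount -a --fstab /var/lib/heketi/fstab > /var/log/glusterfs/container/mountfstab 2>&1

# If mount complained about missing device nodes, recreate them and retry
if grep -q 'does not exist' /var/log/glusterfs/container/mountfstab; then
    # Ask LVM to (re)create missing /dev/mapper nodes for the brick LVs
    vgscan --mknodes
    # Second pass over the same fstab; bricks that are already mounted are skipped
    mount -a --fstab /var/lib/heketi/fstab >> /var/log/glusterfs/container/mountfstab 2>&1
fi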

Comment 16 RamaKasturi 2018-12-20 06:33:00 UTC
Hello Niels,

    I tested with the container image from comment 15; below are my observations.

Below are the steps followed to respin the container image:
===============================================================
1) Copied the image to the AWS systems.
2) Deleted the glusterfs template.
3) Deleted the glusterfs daemonset.
4) Edited the gluster template to point at the latest image (see the sketch after this list).
5) Created the glusterfs template and daemonset again.
6) While Niels and Sven were debugging, they started one of the volumes and rebooted the pod, which brought up some of the bricks, so I respun the pod that had no bricks online with the new container image.
7) Deleted the image and spun up a new one.
8) The pod came back up successfully.
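
A non-interactive way to point the daemonset at a respun image, sketched here for reference (the container name glusterfs and the image reference are assumptions, not taken from this setup):

# Point the daemonset's glusterfs container at the respun image
oc set image ds/glusterfs-storage glusterfs=registry.example.com/ocs/rhgs-server-rhel7:candidate

# Pods pick up the new image only when they are recreated
oc delete pod <glusterfs-storage-pod-name>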

Results / Observations :
================================
1) I could see that the bricks are mounted from the df -kh output.
2) Checked the file /var/log/glusterfs/container/fstab and see only one entry, which says "Mount successful".
3) Niels had asked me to reboot the node where the glusterfs pod resides, to trigger a potential race while LVM on the host is still detecting devices when the glusterfs-server pod is starting.
4) Rebooted the node and I see the following contents, where there are failures in the beginning and the bricks are mounted successfully later. Copied mountfstab to the link at [1].
5) Now I see that the glusterfsd processes came up, but not all of the bricks are online.

The brick which is not online is not present in /var/log/glusterfs/container/mountfstab:

/var/lib/heketi/mounts/vg_fc718f8701c690ecd8974cce6903f210/brick_aa8b1c7a713d4659f2f98ff354ce1a56/brick

For the bricks which are not online, I see the error messages below in the glusterd.log file.

[2018-12-19 19:21:46.262362] E [glusterd-utils.c:6172:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/var/lib/heketi/mounts/vg_fc718f8701c690ecd8974cce6903f210/brick_aa8b1c7a713d4659f2f98ff354ce1a56/brick), brick is deemed not to be a part of the volume (knarra_cirros14_claim071_8d3fbc2f-f985-11e8-a3a6-025dbca7e8b6)
[2018-12-19 19:21:46.262370] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to 172.16.25.103:/var/lib/heketi/mounts/vg_fc718f8701c690ecd8974cce6903f210/brick_aa8b1c7a713d4659f2f98ff354ce1a56/brick

I have copied the following logs and files to the location at [1]:
==============================================
/var/log/glusterfs/container/mountfstab
/var/log/glusterfs/glusterd.log
gluster volume status output file.

[1] http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1658984/

Resetting the needinfo on vinutha as I have provided the required results.

Placing needinfo back on Niels.

Comment 18 RamaKasturi 2018-12-24 07:45:32 UTC
Hello Niels,

   I have tested the new container image from comment 17 and I see that after a pod restart all the bricks are up. Below are the tests performed.

Test Steps 1:
==================
1) docker pull the new container image onto all nodes
2) Delete the glusterfs daemonset
3) Edit the daemonset to point to the new image
4) Create the glusterfs daemonset again
5) Now delete one of the glusterfs pods
6) Once the pod is up and running, restart the other pods too.
7) Once all the pods are up, check the following:

ps aux | grep glusterfsd <- should show that glusterfsd processes are running
gluster volume status <- should show that all bricks are up and running
df -kh should show all bricks mounted
/var/log/glusterfs/container/mountfstab should show that bricks are mounted.
/var/log/glusterfs/container/failed_bricks should not list anything.

Test 2:
===============

1) Rebooted all the nodes where the pods are running and checked the following (a combined sketch of these checks follows the list).

ps aux | grep glusterfsd <- should show that glusterfsd processes are running
gluster volume status <- should show that all bricks are up and running
df -kh should show all bricks mounted
/var/log/glusterfs/container/mountfstab should show that bricks are mounted.
/var/log/glusterfs/container/failed_bricks should not list anything.
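
For reference, a single pass over the checks above, run from inside a glusterfs pod. This is a sketch only; the log paths are the ones used in this report.

# Are the brick processes running?
ps aux | grep '[g]lusterfsd' > /dev/null || echo "WARNING: no glusterfsd processes running"

# Bricks that gluster reports as offline (Online column = N)
echo "offline bricks: $(gluster volume status | grep -c ' N ')"

# Brick filesystems that are actually mounted
echo "mounted bricks: $(df -kh | grep -c /var/lib/heketi/mounts)"

# The container setup script's own logs
grep -c 'not mounted' /var/log/glusterfs/container/mountfstab 2>/dev/null
[ -s /var/log/glusterfs/container/failed_bricks ] && echo "failed_bricks is not empty"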

@Niels, I see that with this new patch all the bricks are up and running, but below are my observations.

/var/log/glusterfs/container/mountfstab -> shows that bricks are not mounted even though they are:

sh-4.2# cat /var/log/glusterfs/container/mountfstab 
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_7284930ad6f9e613ec2f9caddb950075 does not exist
mount: special device /dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_02c090a83d8bb2dc21a5d3f5f401d974 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9c058aaaae49e074bae7107db3327a58 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_5defd825b08ebcd68c151477f961e161 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_cdd7d971bc34e73e8822aae44b113fa9 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_e7814ded5342940d6a4384cc785ae34c does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9d595fb810b44f7c5f5e6707a06040b0 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_ee2a7c445483ac950ba31248db2e2f8a does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_b1c9d3b3d03fc8ad17b0edd259c37fe8 does not exist
mount: special device /dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_301eb2f4da27b4533ed520b2c218a769 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_a260c093f0b508ed985a46c00b559d1c does not exist
mount: special device /dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_ac54c45871ad68c29e1800bb792d0a3e does not exist
mount: special device /dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_5705f68474f33c5c24dbff6ae92b39a5 does not exist
mount command exited with code 32
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_7284930ad6f9e613ec2f9caddb950075 not mounted.
/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_02c090a83d8bb2dc21a5d3f5f401d974 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9c058aaaae49e074bae7107db3327a58 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_5defd825b08ebcd68c151477f961e161 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_cdd7d971bc34e73e8822aae44b113fa9 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_e7814ded5342940d6a4384cc785ae34c not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9d595fb810b44f7c5f5e6707a06040b0 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_ee2a7c445483ac950ba31248db2e2f8a not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_b1c9d3b3d03fc8ad17b0edd259c37fe8 not mounted.
/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_301eb2f4da27b4533ed520b2c218a769 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_a260c093f0b508ed985a46c00b559d1c not mounted.
/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_ac54c45871ad68c29e1800bb792d0a3e not mounted.
/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_5705f68474f33c5c24dbff6ae92b39a5 not mounted.

sh-4.2# stat /var/log/glusterfs/container/mountfstab 
  File: '/var/log/glusterfs/container/mountfstab'
  Size: 3089      	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 67312755    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-12-24 07:30:32.682186651 +0000
Modify: 2018-12-24 07:28:36.316114233 +0000
Change: 2018-12-24 07:28:36.316114233 +0000
 Birth: -
sh-4.2# date -u
Mon Dec 24 07:44:33 UTC 2018


df -kh shows that the bricks are mounted:
=============================================
sh-4.2# df -kh
Filesystem                                                                              Size  Used Avail Use% Mounted on
overlay                                                                                  40G  2.6G   38G   7% /
tmpfs                                                                                    16G     0   16G   0% /dev
/dev/sdc                                                                                 40G   33M   40G   1% /run
/dev/mapper/docker--vol-dockerlv                                                         40G  2.6G   38G   7% /run/secrets
/dev/mapper/rhel_dhcp46--210-root                                                        35G  2.6G   33G   8% /etc/ssl
tmpfs                                                                                    16G  2.5M   16G   1% /run/lvm
devtmpfs                                                                                 16G     0   16G   0% /dev/disk
shm                                                                                      64M     0   64M   0% /dev/shm
tmpfs                                                                                    16G     0   16G   0% /sys/fs/cgroup
tmpfs                                                                                    16G   16K   16G   1% /run/secrets/kubernetes.io/serviceaccount
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_7284930ad6f9e613ec2f9caddb950075  2.0G   33M  2.0G   2% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_7284930ad6f9e613ec2f9caddb950075
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_02c090a83d8bb2dc21a5d3f5f401d974   85G   34M   85G   1% /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_02c090a83d8bb2dc21a5d3f5f401d974
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9c058aaaae49e074bae7107db3327a58   85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9c058aaaae49e074bae7107db3327a58
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_5defd825b08ebcd68c151477f961e161   85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_5defd825b08ebcd68c151477f961e161
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_cdd7d971bc34e73e8822aae44b113fa9   85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_cdd7d971bc34e73e8822aae44b113fa9
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_e7814ded5342940d6a4384cc785ae34c   85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_e7814ded5342940d6a4384cc785ae34c
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9d595fb810b44f7c5f5e6707a06040b0   85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9d595fb810b44f7c5f5e6707a06040b0
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_ee2a7c445483ac950ba31248db2e2f8a   85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_ee2a7c445483ac950ba31248db2e2f8a
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_b1c9d3b3d03fc8ad17b0edd259c37fe8   85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_b1c9d3b3d03fc8ad17b0edd259c37fe8
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_301eb2f4da27b4533ed520b2c218a769   10G   34M   10G   1% /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_301eb2f4da27b4533ed520b2c218a769
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_a260c093f0b508ed985a46c00b559d1c 1014M   33M  982M   4% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_a260c093f0b508ed985a46c00b559d1c
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_ac54c45871ad68c29e1800bb792d0a3e 1014M   33M  982M   4% /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_ac54c45871ad68c29e1800bb792d0a3e
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_5705f68474f33c5c24dbff6ae92b39a5 1014M   33M  982M   4% /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_5705f68474f33c5c24dbff6ae92b39a5

Since all the bricks are mounted, I understand that there should be no entries in failed_bricks. Any idea why I still see failed_bricks?

sh-4.2# cat /var/log/glusterfs/container/failed_bricks 
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_7284930ad6f9e613ec2f9caddb950075 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_7284930ad6f9e613ec2f9caddb950075 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_02c090a83d8bb2dc21a5d3f5f401d974 /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_02c090a83d8bb2dc21a5d3f5f401d974 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9c058aaaae49e074bae7107db3327a58 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9c058aaaae49e074bae7107db3327a58 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_5defd825b08ebcd68c151477f961e161 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_5defd825b08ebcd68c151477f961e161 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_cdd7d971bc34e73e8822aae44b113fa9 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_cdd7d971bc34e73e8822aae44b113fa9 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_e7814ded5342940d6a4384cc785ae34c /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_e7814ded5342940d6a4384cc785ae34c xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9d595fb810b44f7c5f5e6707a06040b0 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9d595fb810b44f7c5f5e6707a06040b0 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_ee2a7c445483ac950ba31248db2e2f8a /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_ee2a7c445483ac950ba31248db2e2f8a xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_b1c9d3b3d03fc8ad17b0edd259c37fe8 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_b1c9d3b3d03fc8ad17b0edd259c37fe8 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_301eb2f4da27b4533ed520b2c218a769 /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_301eb2f4da27b4533ed520b2c218a769 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_a260c093f0b508ed985a46c00b559d1c /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_a260c093f0b508ed985a46c00b559d1c xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_ac54c45871ad68c29e1800bb792d0a3e /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_ac54c45871ad68c29e1800bb792d0a3e xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_5705f68474f33c5c24dbff6ae92b39a5 /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_5705f68474f33c5c24dbff6ae92b39a5 xfs rw,inode64,noatime,nouuid 1 2

sh-4.2# stat /var/log/glusterfs/container/failed_bricks 
  File: '/var/log/glusterfs/container/failed_bricks'
  Size: 2847      	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 67312753    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-12-24 07:28:36.872109871 +0000
Modify: 2018-12-24 07:28:36.316114233 +0000
Change: 2018-12-24 07:28:36.316114233 +0000
 Birth: -

Comment 19 RamaKasturi 2018-12-24 07:46:56 UTC
Volume status output after node and pod restart:
==========================================================

sh-4.2# cat /var/log/glusterfs/volume_status_reboot.txt 
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_f95193fd292e2e2749950b5c86d6a7db/brick_728
4930ad6f9e613ec2f9caddb950075/brick         49152     0          Y       420  
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_9af5d1a3d96fb7db986dea9960641a69/brick_81
0322a38992df94abcb96b66e3e4854/brick        49152     0          Y       716  
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_261fc40b553d688500df95097fceceaf/brick_5bd
c650e5272e8c4a97ed1929334f688/brick         49152     0          Y       416  
Self-heal Daemon on localhost               N/A       N/A        Y       405  
Self-heal Daemon on apu-v311z-ocs-v311-app-
cns-0                                       N/A       N/A        Y       407  
Self-heal Daemon on 10.70.47.152            N/A       N/A        Y       707  
 
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_07dc4e02f1de21ba0e8c3ea1ce635c62
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_261fc40b553d688500df95097fceceaf/brick_433
29a33adfdd2f74f77c263afb6f156/brick         49152     0          Y       416  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_823bd4dc9b3e98d9e1829781468472f8/brick_ac5
4c45871ad68c29e1800bb792d0a3e/brick         49152     0          Y       420  
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_627c29a85d39b08e6b1c1a9ecb1d912c/brick_89
5a9e0651421196b74c763580edc3b7/brick        49152     0          Y       716  
Self-heal Daemon on localhost               N/A       N/A        Y       405  
Self-heal Daemon on apu-v311z-ocs-v311-app-
cns-0                                       N/A       N/A        Y       407  
Self-heal Daemon on 10.70.47.152            N/A       N/A        Y       707  
 
Task Status of Volume vol_07dc4e02f1de21ba0e8c3ea1ce635c62
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_08305c86405b3895c841d872bf0f4f96
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_1401faa9337c9c871b0bc91fe9b130a5/brick_14a
b2915e3cec27edff85374ac008e03/brick         49152     0          Y       416  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_f95193fd292e2e2749950b5c86d6a7db/brick_a26
0c093f0b508ed985a46c00b559d1c/brick         49152     0          Y       420  
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_627c29a85d39b08e6b1c1a9ecb1d912c/brick_a8
970da6fa60e860a3524df933250549/brick        49152     0          Y       716  
Self-heal Daemon on localhost               N/A       N/A        Y       405  
Self-heal Daemon on apu-v311z-ocs-v311-app-
cns-0                                       N/A       N/A        Y       407  
Self-heal Daemon on 10.70.47.152            N/A       N/A        Y       707  
 
Task Status of Volume vol_08305c86405b3895c841d872bf0f4f96
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_0eb56edc7b7ea789148c3af076f7ac1e
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_627c29a85d39b08e6b1c1a9ecb1d912c/brick_ca
5e400dce2e0551375ab30c1bffaaf1/brick        49152     0          Y       716  
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_1401faa9337c9c871b0bc91fe9b130a5/brick_eb2
3a6549022a6e45b62c421add5c216/brick         49152     0          Y       416  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_823bd4dc9b3e98d9e1829781468472f8/brick_570
5f68474f33c5c24dbff6ae92b39a5/brick         49152     0          Y       420  
Self-heal Daemon on localhost               N/A       N/A        Y       405  
Self-heal Daemon on apu-v311z-ocs-v311-app-
cns-0                                       N/A       N/A        Y       407  
Self-heal Daemon on 10.70.47.152            N/A       N/A        Y       707  
 
Task Status of Volume vol_0eb56edc7b7ea789148c3af076f7ac1e
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_495132035d89ae4c181d25e19e1ad3cd
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_9af5d1a3d96fb7db986dea9960641a69/brick_f5
01ed0139ac61f0eb17f87e1aacd604/brick        49152     0          Y       716  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_823bd4dc9b3e98d9e1829781468472f8/brick_301
eb2f4da27b4533ed520b2c218a769/brick         49152     0          Y       420  
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_1401faa9337c9c871b0bc91fe9b130a5/brick_b3f
b035bad566c51b328b1a9d2b3707b/brick         49152     0          Y       416  
Self-heal Daemon on localhost               N/A       N/A        Y       405  
Self-heal Daemon on apu-v311z-ocs-v311-app-
cns-0                                       N/A       N/A        Y       407  
Self-heal Daemon on 10.70.47.152            N/A       N/A        Y       707  
 
Task Status of Volume vol_495132035d89ae4c181d25e19e1ad3cd
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol_9bbc93a8dcd77536812fbe51ac4c2745
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_261fc40b553d688500df95097fceceaf/brick_400
41f8b883e8d7bd06c259a74ce3e67/brick         49152     0          Y       416  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_823bd4dc9b3e98d9e1829781468472f8/brick_02c
090a83d8bb2dc21a5d3f5f401d974/brick         49152     0          Y       420  
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_627c29a85d39b08e6b1c1a9ecb1d912c/brick_43
c890893450c308931ba5c81c794a50/brick        49152     0          Y       716  
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_627c29a85d39b08e6b1c1a9ecb1d912c/brick_2f
25f8d3c2b0f8bf0593570c0e45c981/brick        49152     0          Y       716  
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_1401faa9337c9c871b0bc91fe9b130a5/brick_751
9268de248093aa0f6ee1cfe0591b0/brick         49152     0          Y       416  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_f95193fd292e2e2749950b5c86d6a7db/brick_9c0
58aaaae49e074bae7107db3327a58/brick         49152     0          Y       420  
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_627c29a85d39b08e6b1c1a9ecb1d912c/brick_a4
de61f45bd8fc13d5601880aed0005c/brick        49152     0          Y       716  
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_261fc40b553d688500df95097fceceaf/brick_e4d
0a5a50f8b3816fef0364003325bf9/brick         49152     0          Y       416  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_f95193fd292e2e2749950b5c86d6a7db/brick_5de
fd825b08ebcd68c151477f961e161/brick         49152     0          Y       420  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_f95193fd292e2e2749950b5c86d6a7db/brick_cdd
7d971bc34e73e8822aae44b113fa9/brick         49152     0          Y       420  
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_9af5d1a3d96fb7db986dea9960641a69/brick_99
0563fc39e8f3d972ca130b620dc249/brick        49152     0          Y       716  
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_261fc40b553d688500df95097fceceaf/brick_b68
319a56dd9143bd05f86ee24eb476b/brick         49152     0          Y       416  
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_261fc40b553d688500df95097fceceaf/brick_d58
0d308a4dc579d34ca16b2e31ccaa7/brick         49152     0          Y       416  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_f95193fd292e2e2749950b5c86d6a7db/brick_e78
14ded5342940d6a4384cc785ae34c/brick         49152     0          Y       420  
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_627c29a85d39b08e6b1c1a9ecb1d912c/brick_cb
df1aab36e29d72ee5282e3f9983c28/brick        49152     0          Y       716  
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_627c29a85d39b08e6b1c1a9ecb1d912c/brick_49
2edc310c08f9b223f747d5922e9196/brick        49152     0          Y       716  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_f95193fd292e2e2749950b5c86d6a7db/brick_9d5
95fb810b44f7c5f5e6707a06040b0/brick         49152     0          Y       420  
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_261fc40b553d688500df95097fceceaf/brick_939
e3ee026fb3a19963df1bf7b011a7d/brick         49152     0          Y       416  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_f95193fd292e2e2749950b5c86d6a7db/brick_ee2
a7c445483ac950ba31248db2e2f8a/brick         49152     0          Y       420  
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_261fc40b553d688500df95097fceceaf/brick_2ea
7f4783a7cd308a19114f49de2f8cc/brick         49152     0          Y       416  
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_627c29a85d39b08e6b1c1a9ecb1d912c/brick_74
26ae9f225eb0469d8598133cf1ad9d/brick        49152     0          Y       716  
Brick 10.70.47.152:/var/lib/heketi/mounts/v
g_627c29a85d39b08e6b1c1a9ecb1d912c/brick_a3
4caa685b65f5465297cc7fd4e52580/brick        49152     0          Y       716  
Brick 10.70.46.91:/var/lib/heketi/mounts/vg
_f95193fd292e2e2749950b5c86d6a7db/brick_b1c
9d3b3d03fc8ad17b0edd259c37fe8/brick         49152     0          Y       420  
Brick 10.70.47.90:/var/lib/heketi/mounts/vg
_261fc40b553d688500df95097fceceaf/brick_125
8a526280983123ae6571b1d497eb0/brick         49152     0          Y       416  
Self-heal Daemon on localhost               N/A       N/A        Y       405  
Self-heal Daemon on apu-v311z-ocs-v311-app-
cns-0                                       N/A       N/A        Y       407  
Self-heal Daemon on 10.70.47.152            N/A       N/A        Y       707  
 
Task Status of Volume vol_9bbc93a8dcd77536812fbe51ac4c2745
------------------------------------------------------------------------------
There are no active volume tasks

Comment 21 RamaKasturi 2018-12-24 07:50:01 UTC
Adding need info on Niels back as it got cleared before.

Comment 22 Niels de Vos 2018-12-24 09:15:52 UTC
(In reply to RamaKasturi from comment #18)
...
> 7) Once all the pods are up now go and check for the following
> 
> ps aux | grep glusterfsd <- should show that glusterfsd process are running
> gluster volume status <- should show that all bricks are up and running
> df -kh should show all bricks mounted
> /var/log/glusterfs/container/mountfstab should show that bricks are mounted.
> /var/log/glusterfs/container/failed_bricks should not list anything.
> 
> Test 2:
> ===============
> 
> 1) rebooted all the nodes where the pods are running and check for the
> following.
> 
> ps aux | grep glusterfsd <- should show that glusterfsd process are running
> gluster volume status <- should show that all bricks are up and running
> df -kh should show all bricks mounted
> /var/log/glusterfs/container/mountfstab should show that bricks are mounted.
> /var/log/glusterfs/container/failed_bricks should not list anything.
> 
> @Niels, i see that with this new patch all the bricks are up and running but
> below are my observations.
> 
> /var/log/glusterfs/container/mountfstab -> shows that bricks are not mounted
> even though they are :

The /var/log/glusterfs/container/mountfstab is used for logging during the 1st try of mounting the bricks and some checks afterwards. The errors that occurred will be listed in the mountfstab file.

/var/log/glusterfs/container/failed_bricks is used for gathering the bricks that were not mounted in the 1st try. This file is in 'fstab format' and is used to mount all failed bricks a 2nd time, after creating the LVM/LV devices with 'vgscan --mknodes'.
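
A sketch of what that second pass amounts to, for illustration only (not the literal script; the vgscan invocation and file path are the ones named above):

# Recreate any /dev/mapper nodes that LVM had not yet set up
vgscan --mknodes

# failed_bricks is in fstab format, so it can drive the second mount attempt directly
while read -r device mountpoint fstype options dump pass; do
    # Skip anything that got mounted in the meantime
    mountpoint -q "$mountpoint" && continue
    mount -t "$fstype" -o "$options" "$device" "$mountpoint"
done < /var/log/glusterfs/container/failed_bricks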

Unfortunately the logging of the gluster-setup.sh script is not very good. Only some successes/failures are logged to files. This is something that we should improve so that we can better understand what happened when things fail.

The results that you shared, and the contents of the files do not suggest anything is wrong :-)

Comment 23 Niels de Vos 2018-12-24 09:52:21 UTC
The change that we need to include in the rhgs-server container image has been posted as PR#144. The current gluster-setup.sh script needs to be replaced with the updated CentOS/gluster-setup.sh one (https://github.com/gluster/gluster-containers/blob/fa8d448f07c6c76ecc1696a6018c8b52dd071c73/CentOS/gluster-setup.sh if no further review comments require changes).

The upstream change is currently under review and not merged yet.

Comment 25 RamaKasturi 2019-01-02 13:29:46 UTC
Verified with the container image brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-server-rhel7:3.11.1-8.

Below are the tests performed:
=================================

Test 1:
++++++++++++++++++++++++++++++++
1) Updated the pods with the above container image
2) Created one file pvc and restarted the first pod.

Once the pod is up, I see that the bricks of heketidbstorage and of the newly created volume are up and running.

Test2:
+++++++++++++++++++++++++++++
1) Created 50 file pvcs
2) Rebooted the third pod

Verified that all the bricks are online once the pod rebooted and came back up.

Test3:
++++++++++++++++++++++++++++++
1) Edited the daemon set to have the changes required for block to work
2) Now rebooted the third pod

Verified that after the pod came back up the bricks are online.

Test 4:
++++++++++++++++++++++++++++++++++
1) Rebooted the node where the pod is hosted

Verified that all the bricks are up once the node has been rebooted.

Copied the volume status output to the file below after pod & node reboots.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1658984/output/


Not moving this bug to verified state yet since I wanted to check with vinutha whether she has anything else to test beyond the above. Once I receive confirmation from her I will move this bug to verified state.

Comment 26 RamaKasturi 2019-01-02 13:53:33 UTC
Hello Vinutha,

   I have updated my test results in comment 25. I see that you hit this bug while editing the ds and setting brick mux to No. Can you please let me know if the results from comment 25 look good, or is there anything else we should be covering?

Thanks
kasturi

Comment 28 RamaKasturi 2019-01-03 12:06:13 UTC
Moving this to verified state based on comment 27 and comment 25.

Comment 29 vinutha 2019-01-03 13:43:22 UTC
*** Bug 1659021 has been marked as a duplicate of this bug. ***

Comment 31 errata-xmlrpc 2019-02-07 04:12:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0287