Bug 1656724

Summary: Heketi brick is in N/A state after upgrading the OCS setup with the new rhgs-server and volmanager images
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: rhgs-server-container
Version: ocs-3.11
Status: CLOSED INSUFFICIENT_DATA
Last Closed: 2019-11-11 20:18:17 UTC
Type: Bug
Severity: high
Priority: unspecified
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Reporter: Manisha Saini <msaini>
Assignee: Raghavendra Talur <rtalur>
QA Contact: Manisha Saini <msaini>
CC: jmulligan, kramdoss, madam, moagrawa, rhs-bugs

Description Manisha Saini 2018-12-06 07:51:25 UTC
Description of problem:

Freshly installed the setup with OCP 3.11 and OCS 3.11.1 (rhgs-server-rhel7:3.11.1-1, rhgs-volmanager-rhel7:3.11.1-1).

Ran tests for a couple of days, then upgraded the setup to the new heketi and rhgs-server builds (rhgs-server-rhel7:3.11.1-3, rhgs-volmanager-rhel7:3.11.1-2).

After the upgrade, one of the bricks of the heketi volume (heketidbstorage) failed to come up and its status shows N/A, even though the gluster and heketi pods were up and running.

On further checking, that brick is not mounted on the gluster node.

Heal is stuck for the heketi volume because one of its bricks is offline.

==========
# gluster v heal heketidbstorage info
Brick 10.70.46.81:/var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_2c6e1296068e94eb7fdbd3ea620a6d94/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.70.46.62:/var/lib/heketi/mounts/vg_55d43508331a4ba298eee11d6f3c39a1/brick_aaaef260d4204c047b5747a89b6a8b74/brick
/container.log 
/ 
/heketi.db 
Status: Connected
Number of entries: 3

Brick 10.70.47.60:/var/lib/heketi/mounts/vg_cf0899e90a28ac4a488a21e0e6f2b14c/brick_0bed772e7a536fc8e6dba4b9ecdf71f3/brick
/container.log 
/ 
/heketi.db 
Status: Connected
Number of entries: 3
============
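For triage across many volumes, the heal-info output can be filtered for bricks whose status is anything other than "Connected". A minimal sketch, assuming the output format shown above; `offline_bricks` is a hypothetical helper, not part of the gluster CLI:

```shell
#!/usr/bin/env bash
# offline_bricks: hypothetical helper (not part of gluster).
# Reads "gluster volume heal <vol> info" output on stdin and prints
# every brick whose Status line is anything other than "Connected".
offline_bricks() {
  awk '
    /^Brick / { brick = $2 }              # remember the brick being reported
    /^Status:/ {
      s = $0
      sub(/^Status:[ \t]*/, "", s)        # drop the "Status:" label
      sub(/[ \t]+$/, "", s)               # drop trailing whitespace
      if (s != "Connected") print brick " -> " s
    }
  '
}

# Usage (inside a gluster pod):
# gluster volume heal heketidbstorage info | offline_bricks
```

Applied to the heal output above, only the 10.70.46.81 brick would be reported, with "Transport endpoint is not connected".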

============
# gluster v status
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.81:/var/lib/heketi/mounts/vg
_dca41af8b5c15e419f66928440c4d9d6/brick_2c6
e1296068e94eb7fdbd3ea620a6d94/brick         N/A       N/A        N       N/A  
Brick 10.70.46.62:/var/lib/heketi/mounts/vg
_55d43508331a4ba298eee11d6f3c39a1/brick_aaa
ef260d4204c047b5747a89b6a8b74/brick         49152     0          Y       370  
Brick 10.70.47.60:/var/lib/heketi/mounts/vg
_cf0899e90a28ac4a488a21e0e6f2b14c/brick_0be
d772e7a536fc8e6dba4b9ecdf71f3/brick         49152     0          Y       371  
Self-heal Daemon on localhost               N/A       N/A        Y       23714
Self-heal Daemon on dhcp46-62.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       24258
Self-heal Daemon on dhcp47-154.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       23460
Self-heal Daemon on 10.70.47.60             N/A       N/A        Y       23989
 
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks
=============


glusterd.log shows the following errors for that brick:

==========
[2018-12-05 12:59:27.093108] E [glusterd-utils.c:6177:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_2c6e1296068e94eb7fdbd3ea620a6d94/brick), brick is deemed not to be a part of the volume (heketidbstorage)
[2018-12-05 12:59:27.093149] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to 10.70.46.81:/var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_2c6e1296068e94eb7fdbd3ea620a6d94/brick
==========
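The log and the earlier observation point at two things to confirm on 10.70.46.81: whether the brick's logical volume is mounted, and whether the brick root carries the `trusted.glusterfs.volume-id` extended attribute that glusterd checks. A minimal sketch, assuming heketi's usual layout in which the LV is mounted at `.../brick_<id>` and the brick root is its `brick` subdirectory; `check_brick` is a hypothetical helper, and `trusted.*` attributes are only visible when run as root:

```shell
#!/usr/bin/env bash
# check_brick: hypothetical helper (not part of gluster or heketi).
# Argument: brick root path as listed by "gluster volume status".
# Assumption: the brick LV is mounted at the parent directory
# (.../brick_<id>) and the brick root is its "brick" subdirectory.
check_brick() {
  local brick="$1"
  local mnt="${brick%/brick}"

  # 1) Is the brick's logical volume mounted?
  if ! mountpoint -q "$mnt"; then
    echo "NOT MOUNTED: $mnt"
    return 1
  fi

  # 2) Does the brick root carry the volume-id xattr glusterd expects?
  #    (trusted.* xattrs are only readable by root.)
  if ! getfattr -n trusted.glusterfs.volume-id -e hex "$brick" >/dev/null 2>&1; then
    echo "MISSING volume-id xattr: $brick"
    return 2
  fi

  echo "OK: $brick"
}

# Usage, inside the gluster pod on 10.70.46.81:
# check_brick /var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_2c6e1296068e94eb7fdbd3ea620a6d94/brick
```

The mount check runs first because an xattr lookup on an unmounted brick path says nothing about the brick's actual filesystem.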


# gluster peer status
Number of Peers: 3

Hostname: dhcp47-154.lab.eng.blr.redhat.com
Uuid: 6f3a1924-6308-4292-9bea-8daea70d90ca
State: Peer in Cluster (Connected)
Other names:
10.70.47.154

Hostname: dhcp46-62.lab.eng.blr.redhat.com
Uuid: 801158d4-4129-4a7e-8f11-b8583cc1d8ae
State: Peer in Cluster (Connected)

Hostname: 10.70.47.60
Uuid: 2e6fef14-4e42-40aa-a0dc-c316d8540927
State: Peer in Cluster (Connected)



Version-Release number of selected component (if applicable):


brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-volmanager-rhel7:3.11.1-2

brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-server-rhel7:3.11.1-3

brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-gluster-block-prov-rhel7:3.11.1-1

# oc version
oc v3.11.43
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp47-138.lab.eng.blr.redhat.com:8443
openshift v3.11.43
kubernetes v1.11.0+d4cacc0


How reproducible:
1/1

Steps to Reproduce:

1. Upgrade the OCS 3.11.1-1 setup to the new rhgs-server build (rhgs-server-rhel7:3.11.1-3) and the new heketi build (rhgs-volmanager-rhel7:3.11.1-2), following the upgrade steps in the admin guide.


Actual results:
After the upgrade, one of the heketi volume's bricks went offline.

Expected results:
All bricks should be online post upgrade.

Additional info:

Comment 4 Yaniv Kaul 2019-04-01 06:40:51 UTC
Any updates?

Comment 5 Manisha Saini 2020-08-16 14:39:23 UTC
Will reopen this BZ if the issue is observed again in testing. Clearing needinfo.