Bug 1656724 - Heketi brick is in N/A state post upgrading the OCS setup with rhgs-server and volmanager image
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhgs-server-container
Version: ocs-3.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Raghavendra Talur
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-06 07:51 UTC by Manisha Saini
Modified: 2020-08-16 14:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-11 20:18:17 UTC
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1623433 0 unspecified CLOSED Brick fails to come online after shutting down and restarting a node 2021-02-22 00:41:40 UTC

Internal Links: 1623433

Description Manisha Saini 2018-12-06 07:51:25 UTC
Description of problem:

Freshly installed the setup with OCP 3.11 and OCS 3.11.1 (rhgs-server-rhel7:3.11.1-1, rhgs-volmanager-rhel7:3.11.1-1).

After a couple of days of testing, upgraded the setup to the new heketi and rhgs-server builds (rhgs-server-rhel7:3.11.1-3, rhgs-volmanager-rhel7:3.11.1-2).

Post upgrade, one of the heketi bricks failed to come up and its status shows N/A, although the gluster pods and heketi pods were up and running.

On checking further, that brick is not mounted on the gluster node.

Heal is stuck for the heketi volume since one of the bricks is offline.

==========
# gluster v heal heketidbstorage info
Brick 10.70.46.81:/var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_2c6e1296068e94eb7fdbd3ea620a6d94/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.70.46.62:/var/lib/heketi/mounts/vg_55d43508331a4ba298eee11d6f3c39a1/brick_aaaef260d4204c047b5747a89b6a8b74/brick
/container.log 
/ 
/heketi.db 
Status: Connected
Number of entries: 3

Brick 10.70.47.60:/var/lib/heketi/mounts/vg_cf0899e90a28ac4a488a21e0e6f2b14c/brick_0bed772e7a536fc8e6dba4b9ecdf71f3/brick
/container.log 
/ 
/heketi.db 
Status: Connected
Number of entries: 3
============
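
For triage, the heal info output above can be summarised per brick with a small script. This is only a sketch: it assumes the plain-text layout shown in this report ("Brick ...", "Status: ...", "Number of entries: ..."), and the helper names parse_heal_info and unreachable_bricks are made up for illustration, not part of gluster or heketi.

```python
def parse_heal_info(text):
    """Parse `gluster v heal <vol> info` text into one dict per brick.

    Assumes the layout pasted in this report. File entries listed between
    the "Brick" and "Status" lines are ignored.
    """
    bricks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            current = {"brick": line[len("Brick "):], "status": None, "entries": None}
            bricks.append(current)
        elif current is None:
            continue
        elif line.startswith("Status:"):
            current["status"] = line.split(":", 1)[1].strip()
        elif line.startswith("Number of entries:"):
            value = line.split(":", 1)[1].strip()
            # An unreachable brick reports "-" instead of a count.
            current["entries"] = int(value) if value.isdigit() else None
    return bricks


def unreachable_bricks(text):
    """Return the bricks whose heal status is anything other than Connected."""
    return [b["brick"] for b in parse_heal_info(text) if b["status"] != "Connected"]
```

Fed the output above, unreachable_bricks would be expected to list only the 10.70.46.81 brick, since that is the one reporting "Transport endpoint is not connected".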

============
# gluster v status
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.81:/var/lib/heketi/mounts/vg
_dca41af8b5c15e419f66928440c4d9d6/brick_2c6
e1296068e94eb7fdbd3ea620a6d94/brick         N/A       N/A        N       N/A  
Brick 10.70.46.62:/var/lib/heketi/mounts/vg
_55d43508331a4ba298eee11d6f3c39a1/brick_aaa
ef260d4204c047b5747a89b6a8b74/brick         49152     0          Y       370  
Brick 10.70.47.60:/var/lib/heketi/mounts/vg
_cf0899e90a28ac4a488a21e0e6f2b14c/brick_0be
d772e7a536fc8e6dba4b9ecdf71f3/brick         49152     0          Y       371  
Self-heal Daemon on localhost               N/A       N/A        Y       23714
Self-heal Daemon on dhcp46-62.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       24258
Self-heal Daemon on dhcp47-154.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       23460
Self-heal Daemon on 10.70.47.60             N/A       N/A        Y       23989
 
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks
=============
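
Because the status table wraps long brick paths across several lines, picking out offline bricks by eye is error-prone. The sketch below rejoins the wrapped fragments and reports bricks whose Online column is N. It assumes the column layout shown above (TCP Port, RDMA Port, Online, Pid on the last fragment of each brick entry); offline_bricks is a hypothetical helper name, not a gluster command.

```python
import re

# Last fragment of a brick entry: optional path tail, then the four columns
# TCP Port, RDMA Port, Online (Y/N), Pid.
_COLUMNS = re.compile(r"^(\S*)\s+(\S+)\s+(\S+)\s+([YN])\s+(\S+)$")


def offline_bricks(status_text):
    """Return brick paths whose Online column is N in `gluster v status` text."""
    offline = []
    partial = None  # path fragments collected for the brick entry being read
    for raw in status_text.splitlines():
        line = raw.rstrip()
        if line.startswith("Brick "):
            partial = ""
            line = line[len("Brick "):]
        elif partial is None:
            # Header, self-heal daemon, and task lines are not brick entries.
            continue
        match = _COLUMNS.match(line)
        if match is None:
            # Still inside a wrapped path; keep collecting fragments.
            partial += line
            continue
        if match.group(4) == "N":
            offline.append(partial + match.group(1))
        partial = None
    return offline
```

The parser only tracks brick entries; self-heal daemon rows are skipped on purpose, since the problem reported here is limited to a brick process.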


glusterd.log shows the following errors for that brick:

==========
[2018-12-05 12:59:27.093108] E [glusterd-utils.c:6177:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_2c6e1296068e94eb7fdbd3ea620a6d94/brick), brick is deemed not to be a part of the volume (heketidbstorage)
[2018-12-05 12:59:27.093149] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to 10.70.46.81:/var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_2c6e1296068e94eb7fdbd3ea620a6d94/brick
==========
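
The missing volume-id error is consistent with the brick filesystem not being mounted: glusterd then sees an empty directory without the extended attribute. A rough way to cross-check this is sketched below. It pulls the brick path out of the log message and looks for its mount point in text in /proc/mounts format. It assumes the brick filesystem is mounted on the parent directory of the .../brick path (as the heketi path layout in this report suggests); both function names are hypothetical.

```python
import posixpath
import re

_MISSING_XATTR = re.compile(
    r"Missing trusted\.glusterfs\.volume-id extended attribute on brick root "
    r"\((?P<brick>[^)]+)\), brick is deemed not to be a part of the volume "
    r"\((?P<volume>[^)]+)\)"
)


def bricks_missing_volume_id(log_text):
    """Return (brick_path, volume) pairs named in missing volume-id errors."""
    return [(m.group("brick"), m.group("volume"))
            for m in _MISSING_XATTR.finditer(log_text)]


def is_mounted(brick_path, mounts_text):
    """Check whether the brick's parent directory is a mount point.

    mounts_text is expected in /proc/mounts format, where the second
    whitespace-separated field of each line is the mount point.
    Assumption: the brick filesystem is mounted one level above .../brick.
    """
    mount_dir = posixpath.dirname(brick_path.rstrip("/"))
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_dir:
            return True
    return False
```

On the affected gluster node, mounts_text could be read from /proc/mounts; a False result for the brick named in the log would match the observation that the brick is not mounted.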


# gluster peer status
Number of Peers: 3

Hostname: dhcp47-154.lab.eng.blr.redhat.com
Uuid: 6f3a1924-6308-4292-9bea-8daea70d90ca
State: Peer in Cluster (Connected)
Other names:
10.70.47.154

Hostname: dhcp46-62.lab.eng.blr.redhat.com
Uuid: 801158d4-4129-4a7e-8f11-b8583cc1d8ae
State: Peer in Cluster (Connected)

Hostname: 10.70.47.60
Uuid: 2e6fef14-4e42-40aa-a0dc-c316d8540927
State: Peer in Cluster (Connected)



Version-Release number of selected component (if applicable):


brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-volmanager-rhel7:3.11.1-2

brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-server-rhel7:3.11.1-3

brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-gluster-block-prov-rhel7:3.11.1-1

# oc version
oc v3.11.43
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp47-138.lab.eng.blr.redhat.com:8443
openshift v3.11.43
kubernetes v1.11.0+d4cacc0


How reproducible:
1/1

Steps to Reproduce:

1. Upgrade the OCS 3.11.1-1 setup to the new rhgs-server build (rhgs-server-rhel7:3.11.1-3) and the new heketi build (rhgs-volmanager-rhel7:3.11.1-2), following the upgrade steps from the admin guide.


Actual results:
Post upgrade, one of the heketi bricks went offline.

Expected results:
All bricks should be online post upgrade.

Additional info:

Comment 4 Yaniv Kaul 2019-04-01 06:40:51 UTC
Any updates?

Comment 5 Manisha Saini 2020-08-16 14:39:23 UTC
Will reopen this BZ if the issue is observed again in testing. Clearing needinfo.

