Description of problem:

Fresh installed the setup with OCP 3.11 and OCS 3.11.1 (rhgs-server-rhel7:3.11.1-1, rhgs-volmanager-rhel7:3.11.1-1) and did some testing for a couple of days.
Upgraded the setup to the new heketi and rhgs-server builds (rhgs-server-rhel7:3.11.1-3, rhgs-volmanager-rhel7:3.11.1-2).

Post upgrade, one of the heketi bricks failed to come up and its status shows N/A, whereas the gluster pods and heketi pods were up and running.

On checking further, that brick is not mounted on the gluster node. Heal is stuck for the heketi volume since one of the bricks is offline.

==========
# gluster v heal heketidbstorage info
Brick 10.70.46.81:/var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_2c6e1296068e94eb7fdbd3ea620a6d94/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.70.46.62:/var/lib/heketi/mounts/vg_55d43508331a4ba298eee11d6f3c39a1/brick_aaaef260d4204c047b5747a89b6a8b74/brick
/container.log
/
/heketi.db
Status: Connected
Number of entries: 3

Brick 10.70.47.60:/var/lib/heketi/mounts/vg_cf0899e90a28ac4a488a21e0e6f2b14c/brick_0bed772e7a536fc8e6dba4b9ecdf71f3/brick
/container.log
/
/heketi.db
Status: Connected
Number of entries: 3
============

============
# gluster v status
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.81:/var/lib/heketi/mounts/vg
_dca41af8b5c15e419f66928440c4d9d6/brick_2c6
e1296068e94eb7fdbd3ea620a6d94/brick         N/A       N/A        N       N/A
Brick 10.70.46.62:/var/lib/heketi/mounts/vg
_55d43508331a4ba298eee11d6f3c39a1/brick_aaa
ef260d4204c047b5747a89b6a8b74/brick         49152     0          Y       370
Brick 10.70.47.60:/var/lib/heketi/mounts/vg
_cf0899e90a28ac4a488a21e0e6f2b14c/brick_0be
d772e7a536fc8e6dba4b9ecdf71f3/brick         49152     0          Y       371
Self-heal Daemon on localhost               N/A       N/A        Y       23714
Self-heal Daemon on dhcp46-62.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       24258
Self-heal Daemon on dhcp47-154.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       23460
Self-heal Daemon on 10.70.47.60             N/A       N/A        Y       23989

Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks
=============

glusterd.log shows the following errors for that brick:

==========
[2018-12-05 12:59:27.093108] E [glusterd-utils.c:6177:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_2c6e1296068e94eb7fdbd3ea620a6d94/brick), brick is deemed not to be a part of the volume (heketidbstorage)
[2018-12-05 12:59:27.093149] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to 10.70.46.81:/var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_2c6e1296068e94eb7fdbd3ea620a6d94/brick
==========

# gluster peer status
Number of Peers: 3

Hostname: dhcp47-154.lab.eng.blr.redhat.com
Uuid: 6f3a1924-6308-4292-9bea-8daea70d90ca
State: Peer in Cluster (Connected)
Other names:
10.70.47.154

Hostname: dhcp46-62.lab.eng.blr.redhat.com
Uuid: 801158d4-4129-4a7e-8f11-b8583cc1d8ae
State: Peer in Cluster (Connected)

Hostname: 10.70.47.60
Uuid: 2e6fef14-4e42-40aa-a0dc-c316d8540927
State: Peer in Cluster (Connected)

Version-Release number of selected component (if applicable):

brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-volmanager-rhel7:3.11.1-2
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-server-rhel7:3.11.1-3
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-gluster-block-prov-rhel7:3.11.1-1

# oc version
oc v3.11.43
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp47-138.lab.eng.blr.redhat.com:8443
openshift v3.11.43
kubernetes v1.11.0+d4cacc0

How reproducible:
1/1

Steps to Reproduce:
1. Upgrade the OCS 3.11.1-1 setup to the new rhgs-server build (rhgs-server-rhel7:3.11.1-3) and the new heketi build (rhgs-volmanager-rhel7:3.11.1-2), following the upgrade steps from the admin guide.

Actual results:
Post upgrade, one of the heketi bricks went into an offline state.

Expected results:
All bricks should be online post upgrade.

Additional info:
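For anyone re-checking this after an upgrade, the following is a minimal sketch (not part of the original test run) that scans saved `gluster v heal <volume> info` output and lists the bricks whose status is anything other than "Connected". The function names `parse_heal_info` and `disconnected_bricks` are hypothetical helpers, and the parser assumes the output layout shown in this report (a `Brick` line, optional entry paths, then `Status:` and `Number of entries:`); other gluster versions may format it differently.

```python
def parse_heal_info(text):
    """Parse `gluster v heal <vol> info` text into (brick, status, entries) tuples.

    Assumes each brick section ends with a 'Number of entries:' line,
    as in the output pasted in this report.
    """
    results = []
    brick = status = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # Start of a new brick section; the remainder is host:/path.
            brick = line[len("Brick "):]
            status = None
        elif line.startswith("Status:"):
            status = line.split(":", 1)[1].strip()
        elif line.startswith("Number of entries:"):
            entries = line.split(":", 1)[1].strip()
            results.append((brick, status, entries))
        # Any other line (e.g. '/heketi.db') is a pending-heal entry; ignored here.
    return results


def disconnected_bricks(text):
    """Return the bricks whose status is not 'Connected'."""
    return [brick for brick, status, _ in parse_heal_info(text) if status != "Connected"]


# Sample taken from the heal info output in this report.
SAMPLE = """\
Brick 10.70.46.81:/var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_2c6e1296068e94eb7fdbd3ea620a6d94/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.70.46.62:/var/lib/heketi/mounts/vg_55d43508331a4ba298eee11d6f3c39a1/brick_aaaef260d4204c047b5747a89b6a8b74/brick
/container.log
/
/heketi.db
Status: Connected
Number of entries: 3

Brick 10.70.47.60:/var/lib/heketi/mounts/vg_cf0899e90a28ac4a488a21e0e6f2b14c/brick_0bed772e7a536fc8e6dba4b9ecdf71f3/brick
/container.log
/
/heketi.db
Status: Connected
Number of entries: 3
"""

if __name__ == "__main__":
    for brick in disconnected_bricks(SAMPLE):
        print("offline brick:", brick)
```

A brick flagged this way would still need to be checked on the node itself (whether the brick path is mounted and whether the trusted.glusterfs.volume-id extended attribute is present on the brick root), since the script only reads the CLI output.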
Any updates?
Will reopen this BZ if the issue is observed again in testing. Clearing needinfo.