Bug 1658984
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | None of the bricks come ONLINE after gluster pod reboot in OCS 3.11.1 | | |
| Product | [Red Hat Storage] Red Hat Gluster Storage | Reporter | vinutha <vinug> |
| Component | rhgs-server-container | Assignee | Niels de Vos <ndevos> |
| Status | CLOSED ERRATA | QA Contact | vinutha <vinug> |
| Severity | high | Priority | unspecified |
| Version | ocs-3.11 | Target Release | OCS 3.11.1 |
| Hardware | x86_64 | OS | Linux |
| Keywords | Regression, TestBlocker, ZStream | Type | Bug |
| Fixed In Version | ocs/rhgs-server-rhel7:3.11.1-5 | Last Closed | 2019-02-07 04:12:47 UTC |
| Bug Blocks | 1644160 | CC | abhishku, amukherj, knarra, kramdoss, madam, moagrawa, nberry, ndevos, rgeorge, rhs-bugs, sankarshan, sarora, suprasad, vinug |
**Description** (vinutha, 2018-12-13 10:28:51 UTC)
I am seeing a similar issue on the AWS setup, where I upgraded from the accelerated hotfix builds to the latest 3.11.1 bits: once the upgrade of the gluster pods finishes, none of the bricks in the gluster pods come up. Below are the errors seen in glusterd.log:

```
[2018-12-17 10:37:16.167479] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 1024
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:
+------------------------------------------------------------------------------+
[2018-12-17 10:37:16.175099] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-12-17 10:42:30.063817] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2018-12-17 10:42:30.063763] I [socket.c:2481:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-12-17 10:42:30.063938] I [socket.c:3680:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2018-12-17 10:42:30.063952] E [rpcsvc.c:1349:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, ProgVers: 2, Proc: 5) to rpc-transport (socket.management)
[2018-12-17 10:42:30.063972] E [MSGID: 106430] [glusterd-utils.c:560:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2018-12-17 10:42:30.064596] I [socket.c:2481:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-12-17 10:42:30.064702] I [socket.c:3680:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2018-12-17 10:42:30.064718] E [rpcsvc.c:1349:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, ProgVers: 2, Proc: 5) to rpc-transport (socket.management)
[2018-12-17 10:42:30.064996] I [socket.c:2481:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-12-17 10:42:30.065382] I [socket.c:2481:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-12-17 10:42:30.065787] I [socket.c:3680:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2018-12-17 10:42:30.065799] E [rpcsvc.c:1349:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, ProgVers: 2, Proc: 5) to rpc-transport (socket.management)
The message "E [MSGID: 106430] [glusterd-utils.c:560:glusterd_submit_reply] 0-glusterd: Reply submission failed" repeated 2 times between [2018-12-17 10:42:30.063972] and [2018-12-17 10:42:30.065810]
The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 3158 times between [2018-12-17 10:42:30.063817] and [2018-12-17 10:42:31.502823]
[2018-12-17 10:45:30.753931] I [glusterd-locks.c:732:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk
[2018-12-17 10:45:37.169631] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 3221e583-b1ea-4de4-965a-29c54d4f69e2, host: 172.16.37.192, port: 0
[2018-12-17 10:45:37.172074] E [glusterd-utils.c:6172:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/var/lib/heketi/mounts/vg_fad4d4727a76dfba2f389f39daf27baa/brick_a4c08bf0c5b3f305093267ef46948f33/brick), brick is deemed not to be a part of the volume (heketidbstorage)
[2018-12-17 10:45:37.172128] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to 172.16.17.167:/var/lib/heketi/mounts/vg_fad4d4727a76dfba2f389f39daf27baa/brick_a4c08bf0c5b3f305093267ef46948f33/brick
[2018-12-17 10:45:37.172153] E [glusterd-utils.c:6172:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/var/lib/heketi/mounts/vg_fad4d4727a76dfba2f389f39daf27baa/brick_fb0f05a58746ad08ca92bfd4ab98f1b0/brick), brick is deemed not to be a part of the volume (knarra_cirros10_claim001_eac46710-f896-11e8-a3a6-025dbca7e8b6)
[2018-12-17 10:45:37.172162] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to 172.16.17.167:/var/lib/heketi/mounts/vg_fad4d4727a76dfba2f389f39daf27baa/brick_fb0f05a58746ad08ca92bfd4ab98f1b0/brick
[2018-12-17 10:45:37.172178] E [glusterd-utils.c:6172:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/var/lib/heketi/mounts/vg_fad4d4727a76dfba2f389f39daf27baa/brick_263b2ddfe254f8a84e284189f4417c05/brick), brick is deemed not to be a part of the volume (knarra_cirros10_claim002_f52b4e21-f896-11e8-a3a6-025dbca7e8b6)
```
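The "Missing trusted.glusterfs.volume-id" errors above are what keep the bricks offline: glusterd checks for that extended attribute on the brick root before starting the brick process, and when the brick LV is not mounted the check runs against the empty mountpoint directory on the root filesystem. A minimal manual check along those lines (a sketch only; it assumes getfattr and mountpoint are available in the pod image, and the brick path is copied from the log above):

```sh
# Sketch of a manual check for one brick; path copied from the glusterd.log
# errors above. Assumes getfattr (attr package) and mountpoint (util-linux)
# exist inside the gluster pod.
BRICK=/var/lib/heketi/mounts/vg_fad4d4727a76dfba2f389f39daf27baa/brick_a4c08bf0c5b3f305093267ef46948f33/brick

# Is the brick LV actually mounted? If not, $BRICK is just an empty
# directory on the root filesystem and carries no gluster xattrs.
mountpoint "$(dirname "$BRICK")"

# A healthy brick root carries the volume's UUID; if this attribute is
# absent, glusterd deems the directory "not a part of the volume" and
# refuses to start the brick.
getfattr -n trusted.glusterfs.volume-id -e hex "$BRICK"
```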
---

From the logs in /var/log/glusterfs/container/ on a node where mounting failed:

```
sh-4.2# tail -n2 /var/log/glusterfs/container/mountfstab
mount: special device /dev/mapper/vg_fc718f8701c690ecd8974cce6903f210-brick_015e5cd2337492f7c51c08e6cd384048 does not exist
mount command exited with code 32
```

But the LVs should be available:

```
sh-4.2# tail -n2 /var/log/glusterfs/container/lvscan
  ACTIVE            '/dev/vg_fc718f8701c690ecd8974cce6903f210/brick_015e5cd2337492f7c51c08e6cd384048' [1.00 GiB] inherit
  ACTIVE            '/dev/dockervg/dockerlv' [<100.00 GiB] inherit
```

And they are!

```
sh-4.2# ls /dev/mapper/vg_* | tail -n4
/dev/mapper/vg_fc718f8701c690ecd8974cce6903f210-tp_fff73290c8eb84e253eef5bf6bbef9e2
/dev/mapper/vg_fc718f8701c690ecd8974cce6903f210-tp_fff73290c8eb84e253eef5bf6bbef9e2-tpool
/dev/mapper/vg_fc718f8701c690ecd8974cce6903f210-tp_fff73290c8eb84e253eef5bf6bbef9e2_tdata
/dev/mapper/vg_fc718f8701c690ecd8974cce6903f210-tp_fff73290c8eb84e253eef5bf6bbef9e2_tmeta
```

There are lots of bricks on this system, so there might be some delays in setting up all the /dev/mapper/ devices?

```
sh-4.2# wc -l /var/lib/heketi/fstab
1638 /var/lib/heketi/fstab
```

[Side question: how many bricks do we support on a single system?]

The gluster-setup.sh script has some logic to retry mounting (and running 'vgscan --mknodes'). Unfortunately, this retry logic is not executed when 'mount' itself exits with an error. Upstream PR https://github.com/gluster/gluster-containers/pull/114 is an attempt to fix this.
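A simplified sketch of the flow the fix aims for (illustrative only, not the actual gluster-setup.sh code; it assumes a mount(8) that supports --fstab, and uses the heketi fstab path shown above). The point is that the retry must trigger when the mount command itself fails, not only when a later check finds unmounted bricks:

```sh
#!/bin/bash
# Illustrative sketch, not the actual gluster-setup.sh code: retry when
# 'mount -a' exits non-zero (e.g. code 32, as in the log above), instead of
# retrying only when an "is it mounted?" check finds unmounted bricks.
FSTAB=/var/lib/heketi/fstab    # heketi-managed brick fstab, as seen above

if ! mount -a --fstab "$FSTAB"; then
    # /dev/mapper nodes may not all exist yet shortly after boot; let LVM
    # (re)create the device nodes, then retry the mounts once more.
    vgscan --mknodes
    sleep 5
    mount -a --fstab "$FSTAB"
fi
```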
---

Hello Niels, I did test with the container image present in comment 15; below are my observations.

Steps followed to respin the container image:

1. Copied the image to the AWS systems.
2. Deleted the glusterfs template.
3. Deleted the glusterfs daemonset.
4. Edited the gluster template to use the latest image.
5. Created the glusterfs template and daemonset again.
6. While Niels and Sven were debugging, they started one of the volumes and rebooted the pod, which brought up some of the bricks; so I respun the pod that had no bricks online with the new container image.
7. Deleted the image and spun up a new one.
8. The pod came back up successfully.

Results / observations:

1. The bricks are mounted, per the df -kh output.
2. Checked /var/log/glusterfs/container/fstab and see only one entry, which says "Mount successful".
3. Niels asked me to reboot the node where the glusterfs pod resides, to trigger a potential race while LVM on the host is still detecting devices as the glusterfs-server pod is starting.
4. Rebooted the node; mountfstab shows failures at the beginning and successful mounts later. Copied mountfstab to the link at [1].
5. The glusterfsd processes came up, but not all of the bricks are online. The brick that is not online is not present in /var/log/glusterfs/container/mountfstab: /var/lib/heketi/mounts/vg_fc718f8701c690ecd8974cce6903f210/brick_aa8b1c7a713d4659f2f98ff354ce1a56/brick

For the bricks which are not online, glusterd.log shows:

```
[2018-12-19 19:21:46.262362] E [glusterd-utils.c:6172:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/var/lib/heketi/mounts/vg_fc718f8701c690ecd8974cce6903f210/brick_aa8b1c7a713d4659f2f98ff354ce1a56/brick), brick is deemed not to be a part of the volume (knarra_cirros14_claim071_8d3fbc2f-f985-11e8-a3a6-025dbca7e8b6)
[2018-12-19 19:21:46.262370] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to 172.16.25.103:/var/lib/heketi/mounts/vg_fc718f8701c690ecd8974cce6903f210/brick_aa8b1c7a713d4659f2f98ff354ce1a56/brick
```

I have copied the following logs and files to [1]:

- /var/log/glusterfs/container/mountfstab
- /var/log/glusterfs/glusterd.log
- gluster volume status output file

[1] http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1658984/

Resetting the needinfo on vinutha, as I have provided the required results. Placing needinfo back on Niels.

---

Hello Niels, I have tested the new container image present in comment 17, and I see that after a pod restart all the bricks are up. Below are the tests performed.

Test 1:

1. docker pull the new container image on all nodes.
2. Delete the glusterfs daemonset.
3. Edit the daemonset to point to the new image.
4. Create the glusterfs daemonset again.
5. Delete one of the glusterfs pods.
6. Once the pod is up and running, restart the other pods too.
7. Once all the pods are up, check the following (a script bundling these checks is sketched after this list):
   - `ps aux | grep glusterfsd` should show that the glusterfsd processes are running
   - `gluster volume status` should show that all bricks are up and running
   - `df -kh` should show all bricks mounted
   - /var/log/glusterfs/container/mountfstab should show that the bricks are mounted
   - /var/log/glusterfs/container/failed_bricks should not list anything
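For reference, the checks above can be bundled into one small script. This is a hypothetical helper, not part of the product; the paths and commands are the ones used throughout this bug:

```sh
#!/bin/bash
# Hypothetical helper bundling the verification steps above; run inside a
# glusterfs pod.

echo "== glusterfsd processes =="
ps aux | grep '[g]lusterfsd'        # [g]... keeps grep from matching itself

echo "== brick status (every brick should show Online = Y) =="
gluster volume status

echo "== mounted bricks =="
df -kh | grep /var/lib/heketi/mounts

echo "== first-pass mount log =="
cat /var/log/glusterfs/container/mountfstab

echo "== bricks that failed the first mount attempt (expected empty) =="
cat /var/log/glusterfs/container/failed_bricks 2>/dev/null
```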
Test 2:

1. Rebooted all the nodes where the pods are running and checked the following:
   - `ps aux | grep glusterfsd` should show that the glusterfsd processes are running
   - `gluster volume status` should show that all bricks are up and running
   - `df -kh` should show all bricks mounted
   - /var/log/glusterfs/container/mountfstab should show that the bricks are mounted
   - /var/log/glusterfs/container/failed_bricks should not list anything

@Niels, I see that with this new patch all the bricks are up and running, but below are my observations.

/var/log/glusterfs/container/mountfstab shows that the bricks are not mounted, even though they are:

```
sh-4.2# cat /var/log/glusterfs/container/mountfstab
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_7284930ad6f9e613ec2f9caddb950075 does not exist
mount: special device /dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_02c090a83d8bb2dc21a5d3f5f401d974 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9c058aaaae49e074bae7107db3327a58 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_5defd825b08ebcd68c151477f961e161 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_cdd7d971bc34e73e8822aae44b113fa9 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_e7814ded5342940d6a4384cc785ae34c does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9d595fb810b44f7c5f5e6707a06040b0 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_ee2a7c445483ac950ba31248db2e2f8a does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_b1c9d3b3d03fc8ad17b0edd259c37fe8 does not exist
mount: special device /dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_301eb2f4da27b4533ed520b2c218a769 does not exist
mount: special device /dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_a260c093f0b508ed985a46c00b559d1c does not exist
mount: special device /dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_ac54c45871ad68c29e1800bb792d0a3e does not exist
mount: special device /dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_5705f68474f33c5c24dbff6ae92b39a5 does not exist
mount command exited with code 32
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_7284930ad6f9e613ec2f9caddb950075 not mounted.
/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_02c090a83d8bb2dc21a5d3f5f401d974 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9c058aaaae49e074bae7107db3327a58 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_5defd825b08ebcd68c151477f961e161 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_cdd7d971bc34e73e8822aae44b113fa9 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_e7814ded5342940d6a4384cc785ae34c not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9d595fb810b44f7c5f5e6707a06040b0 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_ee2a7c445483ac950ba31248db2e2f8a not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_b1c9d3b3d03fc8ad17b0edd259c37fe8 not mounted.
/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_301eb2f4da27b4533ed520b2c218a769 not mounted.
/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_a260c093f0b508ed985a46c00b559d1c not mounted.
/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_ac54c45871ad68c29e1800bb792d0a3e not mounted.
/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_5705f68474f33c5c24dbff6ae92b39a5 not mounted.

sh-4.2# stat /var/log/glusterfs/container/mountfstab
  File: '/var/log/glusterfs/container/mountfstab'
  Size: 3089      Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 67312755    Links: 1
Access: (0644/-rw-r--r--)   Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-12-24 07:30:32.682186651 +0000
Modify: 2018-12-24 07:28:36.316114233 +0000
Change: 2018-12-24 07:28:36.316114233 +0000
 Birth: -

sh-4.2# date -u
Mon Dec 24 07:44:33 UTC 2018
```

df -kh shows that the bricks are mounted:

```
sh-4.2# df -kh
Filesystem                                                                               Size  Used Avail Use% Mounted on
overlay                                                                                   40G  2.6G   38G   7% /
tmpfs                                                                                     16G     0   16G   0% /dev
/dev/sdc                                                                                  40G   33M   40G   1% /run
/dev/mapper/docker--vol-dockerlv                                                          40G  2.6G   38G   7% /run/secrets
/dev/mapper/rhel_dhcp46--210-root                                                         35G  2.6G   33G   8% /etc/ssl
tmpfs                                                                                     16G  2.5M   16G   1% /run/lvm
devtmpfs                                                                                  16G     0   16G   0% /dev/disk
shm                                                                                       64M     0   64M   0% /dev/shm
tmpfs                                                                                     16G     0   16G   0% /sys/fs/cgroup
tmpfs                                                                                     16G   16K   16G   1% /run/secrets/kubernetes.io/serviceaccount
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_7284930ad6f9e613ec2f9caddb950075   2.0G   33M  2.0G   2% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_7284930ad6f9e613ec2f9caddb950075
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_02c090a83d8bb2dc21a5d3f5f401d974    85G   34M   85G   1% /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_02c090a83d8bb2dc21a5d3f5f401d974
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9c058aaaae49e074bae7107db3327a58    85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9c058aaaae49e074bae7107db3327a58
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_5defd825b08ebcd68c151477f961e161    85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_5defd825b08ebcd68c151477f961e161
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_cdd7d971bc34e73e8822aae44b113fa9    85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_cdd7d971bc34e73e8822aae44b113fa9
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_e7814ded5342940d6a4384cc785ae34c    85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_e7814ded5342940d6a4384cc785ae34c
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9d595fb810b44f7c5f5e6707a06040b0    85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9d595fb810b44f7c5f5e6707a06040b0
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_ee2a7c445483ac950ba31248db2e2f8a    85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_ee2a7c445483ac950ba31248db2e2f8a
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_b1c9d3b3d03fc8ad17b0edd259c37fe8    85G   34M   85G   1% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_b1c9d3b3d03fc8ad17b0edd259c37fe8
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_301eb2f4da27b4533ed520b2c218a769    10G   34M   10G   1% /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_301eb2f4da27b4533ed520b2c218a769
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_a260c093f0b508ed985a46c00b559d1c  1014M   33M  982M   4% /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_a260c093f0b508ed985a46c00b559d1c
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_ac54c45871ad68c29e1800bb792d0a3e  1014M   33M  982M   4% /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_ac54c45871ad68c29e1800bb792d0a3e
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_5705f68474f33c5c24dbff6ae92b39a5  1014M   33M  982M   4% /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_5705f68474f33c5c24dbff6ae92b39a5
```

Since all the bricks are mounted, I understand that there should be no failed_bricks. Any idea why I still see failed_bricks?

```
sh-4.2# cat /var/log/glusterfs/container/failed_bricks
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_7284930ad6f9e613ec2f9caddb950075 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_7284930ad6f9e613ec2f9caddb950075 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_02c090a83d8bb2dc21a5d3f5f401d974 /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_02c090a83d8bb2dc21a5d3f5f401d974 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9c058aaaae49e074bae7107db3327a58 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9c058aaaae49e074bae7107db3327a58 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_5defd825b08ebcd68c151477f961e161 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_5defd825b08ebcd68c151477f961e161 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_cdd7d971bc34e73e8822aae44b113fa9 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_cdd7d971bc34e73e8822aae44b113fa9 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_e7814ded5342940d6a4384cc785ae34c /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_e7814ded5342940d6a4384cc785ae34c xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_9d595fb810b44f7c5f5e6707a06040b0 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9d595fb810b44f7c5f5e6707a06040b0 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_ee2a7c445483ac950ba31248db2e2f8a /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_ee2a7c445483ac950ba31248db2e2f8a xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_b1c9d3b3d03fc8ad17b0edd259c37fe8 /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_b1c9d3b3d03fc8ad17b0edd259c37fe8 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_301eb2f4da27b4533ed520b2c218a769 /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_301eb2f4da27b4533ed520b2c218a769 xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_f95193fd292e2e2749950b5c86d6a7db-brick_a260c093f0b508ed985a46c00b559d1c /var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_a260c093f0b508ed985a46c00b559d1c xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_ac54c45871ad68c29e1800bb792d0a3e /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_ac54c45871ad68c29e1800bb792d0a3e xfs rw,inode64,noatime,nouuid 1 2
/dev/mapper/vg_823bd4dc9b3e98d9e1829781468472f8-brick_5705f68474f33c5c24dbff6ae92b39a5 /var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_5705f68474f33c5c24dbff6ae92b39a5 xfs rw,inode64,noatime,nouuid 1 2

sh-4.2# stat /var/log/glusterfs/container/failed_bricks
  File: '/var/log/glusterfs/container/failed_bricks'
  Size: 2847      Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 67312753    Links: 1
Access: (0644/-rw-r--r--)   Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-12-24 07:28:36.872109871 +0000
Modify: 2018-12-24 07:28:36.316114233 +0000
Change: 2018-12-24 07:28:36.316114233 +0000
 Birth: -
```
Volume status output after node and pod restart:

```
sh-4.2# cat /var/log/glusterfs/volume_status_reboot.txt
Status of volume: heketidbstorage
Gluster process                                                                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_7284930ad6f9e613ec2f9caddb950075/brick    49152  0  Y  420
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_9af5d1a3d96fb7db986dea9960641a69/brick_810322a38992df94abcb96b66e3e4854/brick   49152  0  Y  716
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_261fc40b553d688500df95097fceceaf/brick_5bdc650e5272e8c4a97ed1929334f688/brick    49152  0  Y  416
Self-heal Daemon on localhost                                                                N/A    N/A  Y  405
Self-heal Daemon on apu-v311z-ocs-v311-app-cns-0                                             N/A    N/A  Y  407
Self-heal Daemon on 10.70.47.152                                                             N/A    N/A  Y  707

Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_07dc4e02f1de21ba0e8c3ea1ce635c62
Gluster process                                                                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_261fc40b553d688500df95097fceceaf/brick_43329a33adfdd2f74f77c263afb6f156/brick    49152  0  Y  416
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_ac54c45871ad68c29e1800bb792d0a3e/brick    49152  0  Y  420
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_627c29a85d39b08e6b1c1a9ecb1d912c/brick_895a9e0651421196b74c763580edc3b7/brick   49152  0  Y  716
Self-heal Daemon on localhost                                                                N/A    N/A  Y  405
Self-heal Daemon on apu-v311z-ocs-v311-app-cns-0                                             N/A    N/A  Y  407
Self-heal Daemon on 10.70.47.152                                                             N/A    N/A  Y  707

Task Status of Volume vol_07dc4e02f1de21ba0e8c3ea1ce635c62
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_08305c86405b3895c841d872bf0f4f96
Gluster process                                                                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_1401faa9337c9c871b0bc91fe9b130a5/brick_14ab2915e3cec27edff85374ac008e03/brick    49152  0  Y  416
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_a260c093f0b508ed985a46c00b559d1c/brick    49152  0  Y  420
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_627c29a85d39b08e6b1c1a9ecb1d912c/brick_a8970da6fa60e860a3524df933250549/brick   49152  0  Y  716
Self-heal Daemon on localhost                                                                N/A    N/A  Y  405
Self-heal Daemon on apu-v311z-ocs-v311-app-cns-0                                             N/A    N/A  Y  407
Self-heal Daemon on 10.70.47.152                                                             N/A    N/A  Y  707

Task Status of Volume vol_08305c86405b3895c841d872bf0f4f96
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_0eb56edc7b7ea789148c3af076f7ac1e
Gluster process                                                                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_627c29a85d39b08e6b1c1a9ecb1d912c/brick_ca5e400dce2e0551375ab30c1bffaaf1/brick   49152  0  Y  716
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_1401faa9337c9c871b0bc91fe9b130a5/brick_eb23a6549022a6e45b62c421add5c216/brick    49152  0  Y  416
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_5705f68474f33c5c24dbff6ae92b39a5/brick    49152  0  Y  420
Self-heal Daemon on localhost                                                                N/A    N/A  Y  405
Self-heal Daemon on apu-v311z-ocs-v311-app-cns-0                                             N/A    N/A  Y  407
Self-heal Daemon on 10.70.47.152                                                             N/A    N/A  Y  707

Task Status of Volume vol_0eb56edc7b7ea789148c3af076f7ac1e
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_495132035d89ae4c181d25e19e1ad3cd
Gluster process                                                                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_9af5d1a3d96fb7db986dea9960641a69/brick_f501ed0139ac61f0eb17f87e1aacd604/brick   49152  0  Y  716
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_301eb2f4da27b4533ed520b2c218a769/brick    49152  0  Y  420
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_1401faa9337c9c871b0bc91fe9b130a5/brick_b3fb035bad566c51b328b1a9d2b3707b/brick    49152  0  Y  416
Self-heal Daemon on localhost                                                                N/A    N/A  Y  405
Self-heal Daemon on apu-v311z-ocs-v311-app-cns-0                                             N/A    N/A  Y  407
Self-heal Daemon on 10.70.47.152                                                             N/A    N/A  Y  707

Task Status of Volume vol_495132035d89ae4c181d25e19e1ad3cd
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_9bbc93a8dcd77536812fbe51ac4c2745
Gluster process                                                                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_261fc40b553d688500df95097fceceaf/brick_40041f8b883e8d7bd06c259a74ce3e67/brick    49152  0  Y  416
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_823bd4dc9b3e98d9e1829781468472f8/brick_02c090a83d8bb2dc21a5d3f5f401d974/brick    49152  0  Y  420
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_627c29a85d39b08e6b1c1a9ecb1d912c/brick_43c890893450c308931ba5c81c794a50/brick   49152  0  Y  716
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_627c29a85d39b08e6b1c1a9ecb1d912c/brick_2f25f8d3c2b0f8bf0593570c0e45c981/brick   49152  0  Y  716
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_1401faa9337c9c871b0bc91fe9b130a5/brick_7519268de248093aa0f6ee1cfe0591b0/brick    49152  0  Y  416
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9c058aaaae49e074bae7107db3327a58/brick    49152  0  Y  420
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_627c29a85d39b08e6b1c1a9ecb1d912c/brick_a4de61f45bd8fc13d5601880aed0005c/brick   49152  0  Y  716
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_261fc40b553d688500df95097fceceaf/brick_e4d0a5a50f8b3816fef0364003325bf9/brick    49152  0  Y  416
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_5defd825b08ebcd68c151477f961e161/brick    49152  0  Y  420
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_cdd7d971bc34e73e8822aae44b113fa9/brick    49152  0  Y  420
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_9af5d1a3d96fb7db986dea9960641a69/brick_990563fc39e8f3d972ca130b620dc249/brick   49152  0  Y  716
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_261fc40b553d688500df95097fceceaf/brick_b68319a56dd9143bd05f86ee24eb476b/brick    49152  0  Y  416
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_261fc40b553d688500df95097fceceaf/brick_d580d308a4dc579d34ca16b2e31ccaa7/brick    49152  0  Y  416
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_e7814ded5342940d6a4384cc785ae34c/brick    49152  0  Y  420
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_627c29a85d39b08e6b1c1a9ecb1d912c/brick_cbdf1aab36e29d72ee5282e3f9983c28/brick   49152  0  Y  716
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_627c29a85d39b08e6b1c1a9ecb1d912c/brick_492edc310c08f9b223f747d5922e9196/brick   49152  0  Y  716
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_9d595fb810b44f7c5f5e6707a06040b0/brick    49152  0  Y  420
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_261fc40b553d688500df95097fceceaf/brick_939e3ee026fb3a19963df1bf7b011a7d/brick    49152  0  Y  416
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_ee2a7c445483ac950ba31248db2e2f8a/brick    49152  0  Y  420
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_261fc40b553d688500df95097fceceaf/brick_2ea7f4783a7cd308a19114f49de2f8cc/brick    49152  0  Y  416
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_627c29a85d39b08e6b1c1a9ecb1d912c/brick_7426ae9f225eb0469d8598133cf1ad9d/brick   49152  0  Y  716
Brick 10.70.47.152:/var/lib/heketi/mounts/vg_627c29a85d39b08e6b1c1a9ecb1d912c/brick_a34caa685b65f5465297cc7fd4e52580/brick   49152  0  Y  716
Brick 10.70.46.91:/var/lib/heketi/mounts/vg_f95193fd292e2e2749950b5c86d6a7db/brick_b1c9d3b3d03fc8ad17b0edd259c37fe8/brick    49152  0  Y  420
Brick 10.70.47.90:/var/lib/heketi/mounts/vg_261fc40b553d688500df95097fceceaf/brick_1258a526280983123ae6571b1d497eb0/brick    49152  0  Y  416
Self-heal Daemon on localhost                                                                N/A    N/A  Y  405
Self-heal Daemon on apu-v311z-ocs-v311-app-cns-0                                             N/A    N/A  Y  407
Self-heal Daemon on 10.70.47.152                                                             N/A    N/A  Y  707

Task Status of Volume vol_9bbc93a8dcd77536812fbe51ac4c2745
------------------------------------------------------------------------------
There are no active volume tasks
```

Adding needinfo on Niels back, as it got cleared before.

---

(In reply to RamaKasturi from comment #18)

> 7) Once all the pods are up now go and check for the following
>
> ps aux | grep glusterfsd <- should show that glusterfsd process are running
> gluster volume status <- should show that all bricks are up and running
> df -kh should show all bricks mounted
> /var/log/glusterfs/container/mountfstab should show that bricks are mounted.
> /var/log/glusterfs/container/failed_bricks should not list anything.
>
> Test 2:
> ===============
>
> 1) rebooted all the nodes where the pods are running and check for the
> following.
>
> ps aux | grep glusterfsd <- should show that glusterfsd process are running
> gluster volume status <- should show that all bricks are up and running
> df -kh should show all bricks mounted
> /var/log/glusterfs/container/mountfstab should show that bricks are mounted.
> /var/log/glusterfs/container/failed_bricks should not list anything.
>
> @Niels, i see that with this new patch all the bricks are up and running but
> below are my observations.
>
> /var/log/glusterfs/container/mountfstab -> shows that bricks are not mounted
> even though they are :

The /var/log/glusterfs/container/mountfstab file is used for logging during the first try of mounting the bricks and some checks afterwards; the errors that occurred are listed in that file.

/var/log/glusterfs/container/failed_bricks is used for gathering the bricks that were not mounted in the first try. This file is in fstab format and is used to mount all the failed bricks a second time, after creating the LVM/LV device nodes with 'vgscan --mknodes'.

Unfortunately, the logging of the gluster-setup.sh script is not very good; only some successes and failures are logged to files.
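In other words, the second pass looks roughly like this (an illustrative sketch, not the actual gluster-setup.sh code). Note that the sketch never truncates failed_bricks after a successful retry, which would be consistent with the observation above that the file can still list bricks that are in fact mounted:

```sh
# Illustrative sketch of the second mount pass described above (not the
# actual gluster-setup.sh code). failed_bricks is in fstab format:
# <device> <mountpoint> <fstype> <options> <dump> <pass>
FAILED=/var/log/glusterfs/container/failed_bricks

if [ -s "$FAILED" ]; then
    # Recreate any missing /dev/mapper device nodes first.
    vgscan --mknodes

    while read -r dev dir fstype opts _; do
        # Retry only the entries that are still not mounted.
        mountpoint -q "$dir" || mount -t "$fstype" -o "$opts" "$dev" "$dir"
    done < "$FAILED"
fi
```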
This is something that we should improve, so that we can better understand what happened when things failed. The results that you shared, and the contents of the files, do not suggest anything is wrong :-)

The change that we need to include in the rhgs-server container image has been posted as PR#144. The current gluster-setup.sh script needs to be replaced with the updated CentOS/gluster-setup.sh one (https://github.com/gluster/gluster-containers/blob/fa8d448f07c6c76ecc1696a6018c8b52dd071c73/CentOS/gluster-setup.sh, if no further review comments require changes). The upstream change is currently under review and not merged yet.

---

Verified with the container image brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-server-rhel7:3.11.1-8. Below are the tests performed:

Test 1:

1. Updated the pods with the above container image.
2. Created one file PVC and restarted the first pod. Once the pod was up, the bricks of heketidbstorage and of the newly created volume were up and running.

Test 2:

1. Created 50 file PVCs.
2. Rebooted the third pod. Verified that all the bricks were online once the pod rebooted and came back up.

Test 3:

1. Edited the daemonset to include the changes required for block to work.
2. Rebooted the third pod. Verified that the bricks were online after the pod came back up.

Test 4:

1. Rebooted the node where the pod is hosted. Verified that all the bricks came up once the node had rebooted.

Copied the volume status output after the pod and node reboots to:
http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1658984/output/

Not moving this bug to the verified state yet, since I wanted to check with vinutha whether she has anything else to test beyond the above. Once I receive her confirmation, I will move this bug to verified.

---

Hello Vinutha, I have updated my test results in comment 25. I see that you hit this bug while editing the daemonset and setting brick multiplexing to no. Can you please let me know whether the results from comment 25 look good, or whether there is anything else we should be covering? Thanks, kasturi

---

Moving this to the verified state based on comment 27 and comment 25.

---

*** Bug 1659021 has been marked as a duplicate of this bug. ***

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0287