Description of problem:

On a 3-node CNS setup, when one of the nodes hosting a gluster pod was rebooted, the gluster pod fails to come up. There are around 600 volumes in the setup. The build used for the testing has brick multiplexing enabled. I'm not sure why glusterd fails to come up, but this is the observation. A lot of brick-related messages are seen in the glusterd logs:

[2017-07-20 07:25:23.026481] I [glusterd-utils.c:5057:attach_brick] 0-management: add brick /var/lib/heketi/mounts/vg_cd31d2bc8e2d9def002ba6d2c5db7e19/brick_7c71f2212339545ca834a72ea6fee8c8/brick to existing process for /var/lib/heketi/mounts/vg_161d21b3068c4ea60ecba72c1450aedf/brick_7a11f1c67edee5557a9fbf3d4f5a0495/brick
[2017-07-20 07:25:23.026935] I [glusterd-utils.c:5057:attach_brick] 0-management: add brick /var/lib/heketi/mounts/vg_ac2b2a4bb2a522005033a9ba9eb6d8a6/brick_abce34cd3481cc4682693d32833669d9/brick to existing process for /var/lib/heketi/mounts/vg_161d21b3068c4ea60ecba72c1450aedf/brick_7a11f1c67edee5557a9fbf3d4f5a0495/brick
[2017-07-20 07:25:23.027449] I [glusterd-utils.c:5057:attach_brick] 0-management: add brick /var/lib/heketi/mounts/vg_161d21b3068c4ea60ecba72c1450aedf/brick_b2870fabb4225bcc35bfe427c92faa1a/brick to existing process for /var/lib/heketi/mounts/vg_161d21b3068c4ea60ecba72c1450aedf/brick_7a11f1c67edee5557a9fbf3d4f5a0495/brick

Connection-related messages are seen too.
[2017-07-20 07:25:18.690206] W [glusterd-utils.c:5114:attach_brick] 0-management: attach failed for /var/lib/heketi/mounts/vg_161d21b3068c4ea60ecba72c1450aedf/brick_7a11f1c67edee5557a9fbf3d4f5a0495/brick
[2017-07-20 07:25:18.746394] I [rpc-clnt.c:1060:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-07-20 07:25:19.747509] I [glusterd-utils.c:5057:attach_brick] 0-management: add brick /var/lib/heketi/mounts/vg_cd31d2bc8e2d9def002ba6d2c5db7e19/brick_9312d56fb9f5a1c85f7a06725ac27347/brick to existing process for /var/lib/heketi/mounts/vg_161d21b3068c4ea60ecba72c1450aedf/brick_7a11f1c67edee5557a9fbf3d4f5a0495/brick
[2017-07-20 07:25:19.747816] I [glusterd-utils.c:4967:send_attach_req] 0-management: not connected yet
[2017-07-20 07:25:19.848777] I [glusterd-utils.c:4967:send_attach_req] 0-management: not connected yet
[2017-07-20 07:25:19.967816] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-07-20 07:25:19.970427] I [MSGID: 106005] [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.47.49:/var/lib/heketi/mounts/vg_161d21b3068c4ea60ecba72c1450aedf/brick_7a11f1c67edee5557a9fbf3d4f5a0495/brick has disconnected from glusterd.
[2017-07-20 07:25:20.776064] I [glusterd-utils.c:4967:send_attach_req] 0-management: not connected yet
[2017-07-20 07:25:20.996851] I [glusterd-utils.c:4967:send_attach_req] 0-management: not connected yet
[2017-07-20 07:25:21.988277] I [glusterd-utils.c:4967:send_attach_req] 0-management: not connected yet
[2017-07-20 07:25:22.185533] I [glusterd-utils.c:4967:send_attach_req] 0-management: not connected yet

From the console I see this:

    A start job is running for LVM2 PV scan on device xxx

I'm not able to pinpoint where to start debugging: whether the issue is with glusterd (brick multiplexing) or is LVM-related.
Version-Release number of selected component (if applicable):
glusterfs-server-3.8.4-33.el7rhgs.x86_64
rhgs3/rhgs-server-rhel7:3.3.0-7

How reproducible: 1/1

Steps to Reproduce:
1. Create 600 volumes on a CNS setup using heketi.
2. Reboot one of the worker nodes hosting a gluster pod.

Actual results: the gluster pod fails to get respun.

Expected results: the gluster pod should get respun successfully.

Additional info: Logs will be attached shortly.
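Step 1 above can be scripted. A minimal sketch, assuming heketi-cli is already configured (e.g. HEKETI_CLI_SERVER exported) and that 1 GiB volumes are acceptable; the helper function and the dry-run trick are illustrative, not taken from this report:

```shell
# Hypothetical helper for step 1; assumes heketi-cli is configured
# (HEKETI_CLI_SERVER exported) and that 1 GiB volumes are acceptable.
create_volumes() {
    local count=$1 cmd=${2:-heketi-cli}   # second arg allows a dry run
    for i in $(seq 1 "$count"); do
        # Stop at the first failure so partial runs are visible.
        "$cmd" volume create --size=1 || return 1
    done
}

# create_volumes 600        # real run against the heketi server
create_volumes 3 echo       # dry run: prints the command lines instead
```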
On the rebooted node, pvscan --cache && pvs && vgscan --cache && vgs results in these messages:

10.70.47.49
 14:04:27 up 1:45, 1 user, load average: 0.12, 0.23, 0.52
  PV         VG                                   Fmt  Attr PSize    PFree
  /dev/sda2  rhel_dhcp47-104                      lvm2 a--   <99.00g    4.00m
  /dev/sdb   vg_rhel_dhcp47--104-var              lvm2 a--   <60.00g 1020.00m
  /dev/sdc1  docker-vg                            lvm2 a--   <50.00g   30.00g
  /dev/sdd   vg_161d21b3068c4ea60ecba72c1450aedf  lvm2 a--    99.87g   39.39g
  /dev/sde   vg_cd31d2bc8e2d9def002ba6d2c5db7e19  lvm2 a--    99.87g   92.00m
  /dev/sdg   vg_ac2b2a4bb2a522005033a9ba9eb6d8a6  lvm2 a--   511.87g  <59.36g
  Reading volume groups from cache.
  Found volume group "vg_ac2b2a4bb2a522005033a9ba9eb6d8a6" using metadata type lvm2
  Found volume group "docker-vg" using metadata type lvm2
  Found volume group "vg_161d21b3068c4ea60ecba72c1450aedf" using metadata type lvm2
  Found volume group "rhel_dhcp47-104" using metadata type lvm2
  Found volume group "vg_cd31d2bc8e2d9def002ba6d2c5db7e19" using metadata type lvm2
  Found volume group "vg_rhel_dhcp47--104-var" using metadata type lvm2
  VG                                   #PV #LV #SN Attr   VSize    VFree
  docker-vg                              1   1   0 wz--n-  <50.00g   30.00g
  rhel_dhcp47-104                        1   3   0 wz--n-  <99.00g    4.00m
  vg_161d21b3068c4ea60ecba72c1450aedf    1 120   0 wz--n-   99.87g   39.39g
  vg_ac2b2a4bb2a522005033a9ba9eb6d8a6    1 898   0 wz--n-  511.87g  <59.36g
  vg_cd31d2bc8e2d9def002ba6d2c5db7e19    1 196   0 wz--n-   99.87g   92.00m
  vg_rhel_dhcp47--104-var                1   1   0 wz--n-  <60.00g 1020.00m
  WARNING: lvmetad connection failed, cannot reconnect.
  WARNING: lvmetad cannot be used due to error: Connection reset by peer
  WARNING: To avoid corruption, restart lvmetad (or disable with use_lvmetad=0).
  (the three WARNING lines above repeat several more times)
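The warnings themselves point at two remediations: restarting lvmetad, or disabling it. A sketch follows, with the caveat that the systemd unit names (lvm2-lvmetad.socket/.service on RHEL 7) and the /etc/lvm/lvm.conf path are my assumptions, not taken from this report; the runnable part below only demonstrates the config toggle on a temporary stand-in file:

```shell
# To restart the daemon (RHEL 7 unit names assumed, not from this report):
#   systemctl restart lvm2-lvmetad.socket lvm2-lvmetad.service
# To disable it instead, set use_lvmetad = 0 in /etc/lvm/lvm.conf.
# Demonstrated here on a temporary stand-in for /etc/lvm/lvm.conf:
conf=$(mktemp)
printf 'global {\n    use_lvmetad = 1\n}\n' > "$conf"
sed -i 's/use_lvmetad = 1/use_lvmetad = 0/' "$conf"
grep use_lvmetad "$conf"    # now shows: use_lvmetad = 0
rm -f "$conf"
```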
[root@dhcp47-49 ~]# pvscan
  PV /dev/sdg    VG vg_ac2b2a4bb2a522005033a9ba9eb6d8a6   lvm2 [511.87 GiB / <59.36 GiB free]
  PV /dev/sdc1   VG docker-vg                             lvm2 [<50.00 GiB / 30.00 GiB free]
  PV /dev/sdd    VG vg_161d21b3068c4ea60ecba72c1450aedf   lvm2 [99.87 GiB / 39.39 GiB free]
  PV /dev/sda2   VG rhel_dhcp47-104                       lvm2 [<99.00 GiB / 4.00 MiB free]
  PV /dev/sde    VG vg_cd31d2bc8e2d9def002ba6d2c5db7e19   lvm2 [99.87 GiB / 92.00 MiB free]
  PV /dev/sdb    VG vg_rhel_dhcp47--104-var               lvm2 [<60.00 GiB / 1020.00 MiB free]
  Total: 6 [920.60 GiB] / in use: 6 [920.60 GiB] / in no VG: 0 [0   ]
[root@dhcp47-49 ~]# vgscan
  Reading volume groups from cache.
  Found volume group "vg_ac2b2a4bb2a522005033a9ba9eb6d8a6" using metadata type lvm2
  Found volume group "docker-vg" using metadata type lvm2
  Found volume group "vg_161d21b3068c4ea60ecba72c1450aedf" using metadata type lvm2
  Found volume group "rhel_dhcp47-104" using metadata type lvm2
  Found volume group "vg_cd31d2bc8e2d9def002ba6d2c5db7e19" using metadata type lvm2
  Found volume group "vg_rhel_dhcp47--104-var" using metadata type lvm2
I cannot say conclusively, but I think this is the explanation: on the VG vg_cd31d2bc8e2d9def002ba6d2c5db7e19 there are 196 LVs, 98 thin pools and 98 bricks. This VG is only 99GB in size and practically full. The metadata for all the LVs is stored on the same VG, and some resource contention is happening which is rendering lvmetad useless.

An easy verification is to use disks with double the required capacity for 600 volumes and try again. If we don't see this problem when the disks aren't full, it is surely an LVM bug and we will have to approach the LVM experts about the limits.

krk, can we try the above scenario please?
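To make the fullness hypothesis concrete, a quick check is the free/size ratio per VG. This sketch uses the vgs figures quoted elsewhere in this bug as inline sample data (92.00m written as 0.09g); on a live node you would pipe in `vgs --noheadings -o vg_name,lv_count,vg_size,vg_free --units g` instead. The 1% threshold is an arbitrary choice of mine:

```shell
# Flag any VG whose free space is under 1% of its size (threshold is
# arbitrary). Sample rows are the vgs numbers from this report; a live
# run would replace the here-doc with:
#   vgs --noheadings -o vg_name,lv_count,vg_size,vg_free --units g
awk '{ size = $3 + 0; free = $4 + 0
       if (free / size < 0.01)
           print $1, "is nearly full:", free "g free of", size "g" }' <<'EOF'
vg_161d21b3068c4ea60ecba72c1450aedf 120 99.87g 39.39g
vg_ac2b2a4bb2a522005033a9ba9eb6d8a6 898 511.87g 59.36g
vg_cd31d2bc8e2d9def002ba6d2c5db7e19 196 99.87g 0.09g
EOF
```

On this data only vg_cd31d2bc8e2d9def002ba6d2c5db7e19, the VG suspected above, is flagged.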
Does the problem occur on all three machines?
10.70.47.49
10.70.47.72
10.70.46.248
(In reply to Raghavendra Talur from comment #8)
> Does the problem occur on all the three machines?
> 10.70.47.49
> 10.70.47.72
> 10.70.46.248

Yes. This is seen on all three machines.
Test runs with 625 volumes and a node reboot
============================================
- I rebooted the VM dhcp42-97.lab.eng.blr.redhat.com via virt-manager.
- The test run with 625 volumes, after the node reboot, was back online in approximately 4 minutes.
Thanks a lot, Ramky!!

To summarize, 2 scenarios were performed:
1) Pod restart
2) Node restart

1) Pod restart: the gluster pod which hosts bricks for 625 volumes came back in 6 minutes on this setup after deleting the pod via the "oc delete pod <glusterfs-*>" command.
2) Node restart: the gluster pod which hosts bricks for 625 volumes came up in 3 minutes when we performed a NODE reboot!

--snip--
[root@dhcp42-112 ~]# oc get pods
NAME                                  READY     STATUS    RESTARTS   AGE
glusterblock-provisioner-dc-1-nz9mv   1/1       Running   2          20h
glusterfs-359cp                       1/1       Running   2          20h
glusterfs-3bp0t                       1/1       Running   0          55m
glusterfs-xss4t                       1/1       Running   2          20h
heketi-1-sq557                        1/1       Running   79         18h
storage-project-router-1-4kmsq        1/1       Running   2          23h
[root@dhcp42-112 ~]# oc get nodes
NAME                                STATUS                     AGE       VERSION
dhcp41-193.lab.eng.blr.redhat.com   Ready                      23h       v1.6.1+5115d708d7
dhcp42-112.lab.eng.blr.redhat.com   Ready,SchedulingDisabled   23h       v1.6.1+5115d708d7
dhcp42-97.lab.eng.blr.redhat.com    NotReady                   23h       v1.6.1+5115d708d7
dhcp43-171.lab.eng.blr.redhat.com   Ready                      23h       v1.6.1+5115d708d7
[root@dhcp42-112 ~]# oc get nodes
NAME                                STATUS                     AGE       VERSION
dhcp41-193.lab.eng.blr.redhat.com   Ready                      23h       v1.6.1+5115d708d7
dhcp42-112.lab.eng.blr.redhat.com   Ready,SchedulingDisabled   23h       v1.6.1+5115d708d7
dhcp42-97.lab.eng.blr.redhat.com    Ready                      23h       v1.6.1+5115d708d7
dhcp43-171.lab.eng.blr.redhat.com   Ready                      23h       v1.6.1+5115d708d7
[root@dhcp42-112 ~]# oc get pods
NAME                                  READY     STATUS    RESTARTS   AGE
glusterblock-provisioner-dc-1-nz9mv   1/1       Running   2          20h
glusterfs-359cp                       1/1       Running   2          20h
glusterfs-3bp0t                       1/1       Running   0          56m
glusterfs-xss4t                       0/1       Running   3          20h
heketi-1-sq557                        1/1       Running   79         18h
storage-project-router-1-4kmsq        1/1       Running   2          23h
[root@dhcp42-112 ~]# oc get pods -w
NAME                                  READY     STATUS    RESTARTS   AGE
glusterblock-provisioner-dc-1-nz9mv   1/1       Running   2          20h
glusterfs-359cp                       1/1       Running   2          20h
glusterfs-3bp0t                       1/1       Running   0          56m
glusterfs-xss4t                       0/1       Running   3          20h
heketi-1-sq557                        1/1       Running   79         18h
storage-project-router-1-4kmsq        1/1       Running   2          23h
glusterfs-xss4t                       1/1       Running   3          20h
--/snip--

Based on the test results, I am moving this back to ON_QA.
I'll answer the latter part of the question first. The test mentioned was performed with 625 volumes on a single device.

[root@dhcp42-112 ~]# heketi-cli topology info | grep "/dev"
Id:1bb70c3198d3c4d7d5d7ac529a35ef9b   Name:/dev/vdc   State:online   Size (GiB):1199   Used (GiB):632   Free (GiB):566
Id:2591462d49daa7930d4eb7b770cac00b   Name:/dev/vdc   State:online   Size (GiB):1199   Used (GiB):632   Free (GiB):566
Id:bfa9ce5c8b955141709cdd6658c3c3c5   Name:/dev/vdc   State:online   Size (GiB):1199   Used (GiB):632   Free (GiB):566

===

For the maximum number of volumes per device, there are quite a few correlations: the amount of memory available on the nodes and the amount of disk space, since most of the dev test environments are created in virtual environments. In all the tests above the VMs had about 16G of RAM, and I could create an additional 300 volumes without any errors. But when the test VMs' memory was about 4GB, volume creation used to fail at around 157-185 volumes on a single device. In the present test setup, my gut feeling is that we can go up to another 400 volumes on a single disk. But if you want a deterministic approach, we need to leave the numbers to the performance team, who can give a conclusive answer, as they test on bare metal and have access to repeatable/reproducible test environments.
I'm closing this bug as "NOTABUG", since on a single device I have been able to perform pod restart and node restart with the following builds:

Build Numbers ::
==============================
[root@dhcp42-112 ~]# rpm -qa | grep 'cns-deploy'
cns-deploy-5.0.0-38.el7rhgs.x86_64
[root@dhcp42-112 ~]# rpm -qa | grep 'heketi-client'
heketi-client-5.0.0-11.el7rhgs.x86_64

Docker Images ::
================
[root@dhcp42-112 ~]# docker images
REPOSITORY                                                                                  TAG        IMAGE ID       CREATED       SIZE
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7                3.3.0-21   7a58514538de   4 days ago    407.7 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-volmanager-rhel7            3.3.0-13   f3afaa5a05f7   6 days ago    426.7 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7                3.3.0-20   1d748122341a   6 days ago    407.7 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-gluster-block-prov-docker   3.3.0-10   e571939c4800   2 weeks ago   395.9 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-s3-server-rhel7             3.3.0-7    c684dc64abff   2 weeks ago   447.3 MB
Karthick, thanks for verifying this bug !