Description of problem:
-----------------------
The testbed contains 3 nodes with converged virt+gluster. In this case, sharded replica 3 volumes are used for the RHEV Hosted Engine VM, for the application VMs' root disk that holds the root filesystem (sda), and for the application VMs' additional disk used for application data (sdb).

I am using fio-tools [1] to launch fio in client-server fashion on all the application VMs.

While running the 80:20 fio workload on 29 VMs, I see error messages logged in the VM consoles - "echo 0 > /proc/sys/kernel/hung_task_timeout_secs". The VMs also respond slowly during this time window.

[1] - https://github.com/pcuzner/fio-tools

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHEL 7.2 Server
RHEV 3.6.7-4
RHGS 3.1.3

How reproducible:
-----------------
2/2

Steps to Reproduce:
-------------------
1. Create 2 replica 3 sharded volumes
2. Create 2 RHEV data domains backed by the gluster volumes created above - one domain for the root disk (sda) of the application VMs, and another for the additional disk (sdb) of the VMs
3. Create 29 VMs installed with RHEL 7.2 Server
4. Run the 80:20 fio workload

Actual results:
---------------
All 29 VMs see the problem and log the following error messages in the console:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/1:1:3478 blocked for more than 120 seconds.

Expected results:
-----------------
Application VMs should not face any problems
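The actual job definitions come from fio-tools (for example, the randrw8020_100iops job referenced later in this bug); the invocation below is only a rough standalone sketch of the workload shape, and the target device, block size, queue depth and IOPS cap are assumptions rather than the parameters fio-tools actually uses.

# Rough standalone equivalent of the 80:20 random read/write job (assumed parameters)
fio --name=randrw8020 --filename=/dev/sdb --rw=randrw --rwmixread=80 \
    --bs=4k --ioengine=libaio --direct=1 --iodepth=16 \
    --runtime=600 --time_based --rate_iops=80,20 --output-format=json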
Created attachment 1171397 [details] screenshot of one of the VM consoles
Problems that I see with this issue are listed below:

1. The VM console shows a lot of hung_task timeouts for a few processes. This will definitely impact any application running inside the VM.
2. The VMs tend to respond slowly during this time window.
3. The PCP charts show that I/O wait is still present, which indicates that some I/O is still happening. An strace of the brick processes makes it evident that I/O has been going on in the VMs for the last 1 hour (far too long); this I/O was expected to complete within 5 minutes.

On the first occurrence of this issue, I stopped the fio job and ssh'ed into the VMs, and they looked good. I rebooted the VMs and they came up healthy.

I am still waiting for the completion of the second iteration of the test and will update the logs after that.
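For reference, the brick-side strace check mentioned in point 3 can be taken with something along these lines; the PID is a placeholder for one glusterfsd brick process, and the exact syscall filter used in the original run is not recorded here.

# Watch read/write/sync activity of one brick process (PID is a placeholder)
strace -f -tt -T -p <glusterfsd-brick-pid> -e trace=pread64,pwrite64,fsync,fdatasync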
While running the 80:20 fio workload on 29 VMs, I see error messages logged in one of the VM consoles - "echo 0 > /proc/sys/kernel/hung_task_timeout_secs".

glusterfs version : glusterfs-3.7.9-10.el7rhgs.x86_64
ovirt version : ovirt-engine-4.0.2.6-0.1.el7ev.noarch

volume configuration :
=======================
[root@rhsqa-grafton5 ~]# gluster volume info engine

Volume Name: engine
Type: Replicate
Volume ID: e5935349-d827-4638-8c23-b953210a0a55
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.82:/rhgs/brick1/engine-4
Brick2: 10.70.36.83:/rhgs/brick1/engine
Brick3: 10.70.36.84:/rhgs/brick1/engine
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
performance.strict-o-direct: on
network.ping-timeout: 30
user.cifs: off

gluster volume info data

Volume Name: data
Type: Replicate
Volume ID: 42608dba-9a30-46a1-9b1f-710d64e2d655
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.82:/rhgs/brick3/data
Brick2: 10.70.36.83:/rhgs/brick3/data_5
Brick3: 10.70.36.84:/rhgs/brick3/data_new
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
performance.strict-o-direct: on
network.ping-timeout: 30
user.cifs: off

gluster volume info vmstore

Volume Name: vmstore
Type: Replicate
Volume ID: d094ccfa-7840-4a68-ab33-3179aa996efd
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.82:/rhgs/brick2/vmstore
Brick2: 10.70.36.83:/rhgs/brick2/vmstore-5
Brick3: 10.70.36.84:/rhgs/brick2/vmstore-6
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
performance.strict-o-direct: on
network.ping-timeout: 30
user.cifs: off
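If any of these settings needs to be double-checked later, a single option can be queried per volume with gluster volume get (available in glusterfs 3.7); the two options below are just examples, shown for the data volume.

# Confirm individual reconfigured options on a volume (examples)
gluster volume get data features.shard-block-size
gluster volume get data performance.strict-o-direct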
Are you able to reproduce the issue after changing vm.dirty_ratio and vm.dirty_background_ratio to lower values?
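A minimal sketch of what lowering those sysctls could look like; where to apply them (host or guest) and the values 10 and 5 are illustrative assumptions, not a recommendation.

# Lower the dirty-page thresholds at runtime (illustrative values)
sysctl -w vm.dirty_ratio=10
sysctl -w vm.dirty_background_ratio=5
# To persist, place the same settings in a file under /etc/sysctl.d/ and run: sysctl --system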
Hi Sahina,

I have set vm.dirty_background_ratio to 10 and tried reproducing it. I was able to hit the issue once, but when I tried again I was not able to hit it.
(In reply to RamaKasturi from comment #6)
> Hi Sahina,
> 
> I have set vm.dirty_ratio to 10 and tried reproducing it. I was able to
> hit the issue once, but when I tried again I was not able to hit it.
@Sas and Kasturi,

Assuming you have a consistent reproducer for this bug, could you run it after disabling open-behind, and then stopping and starting the VMs again?

-Krutika
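In case it helps, a minimal sketch of that sequence, assuming the data and vmstore volumes from comment 4 hold the VM images (volume names are taken from that comment; adjust as needed):

# Disable open-behind on the VM image volumes
gluster volume set data performance.open-behind off
gluster volume set vmstore performance.open-behind off
# Then stop and start the application VMs from RHV so their images are re-opened
# To revert afterwards
gluster volume set data performance.open-behind on
gluster volume set vmstore performance.open-behind on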
(In reply to Krutika Dhananjay from comment #8)
> @Sas and Kasturi,
> 
> Assuming you have a consistent reproducer for this bug, could you run it
> after disabling open-behind, and then stopping and starting the VMs again?
> 
> -Krutika

Krutika,

The worst problem happens when we try to reproduce this issue.

In my setup (with RHV 4.0.4 and RHGS 3.1.3 - glusterfs-3.7.9-12.el7rhgs), I am unable to hit this issue, where there are error messages in all the VM consoles. Instead, I observed events in the RHV UI saying 'VMs not responding'. This again happened on a few VMs.

Let me explain the workload. I have tested with fio-tools as pointed out in comment 0, with the randrw80_20 workload, with and without rate-limiting IOPs.

I tested with open-behind turned off, and I could again see events that at least 2 of the VMs are not responding. But when looking at the VMs by hand, all looks good and there are no problems. I am not sure how these events are triggered and what the consequence of them is.
(In reply to SATHEESARAN from comment #9)
> (In reply to Krutika Dhananjay from comment #8)
> > @Sas and Kasturi,
> > 
> > Assuming you have a consistent reproducer for this bug, could you run it
> > after disabling open-behind, and then stopping and starting the VMs again?
> > 
> > -Krutika
> 
> Krutika,
> 
> The worst problem happens when we try to reproduce this issue.
> 
> In my setup (with RHV 4.0.4 and RHGS 3.1.3 - glusterfs-3.7.9-12.el7rhgs),
> I am unable to hit this issue, where there are error messages in all the
> VM consoles. Instead, I observed events in the RHV UI saying 'VMs not
> responding'. This again happened on a few VMs.
> 
> Let me explain the workload. I have tested with fio-tools as pointed out
> in comment 0, with the randrw80_20 workload, with and without
> rate-limiting IOPs.
> 
> I tested with open-behind turned off, and I could again see events that
> at least 2 of the VMs are not responding. But when looking at the VMs by
> hand, all looks good and there are no problems. I am not sure how these
> events are triggered and what the consequence of them is.

Hmm.. do you have the logs corresponding to this particular failed run? If so, could you please share them?

-Krutika
(In reply to Krutika Dhananjay from comment #10)
> (In reply to SATHEESARAN from comment #9)
> > (In reply to Krutika Dhananjay from comment #8)
> > > @Sas and Kasturi,
> > > 
> > > Assuming you have a consistent reproducer for this bug, could you run
> > > it after disabling open-behind, and then stopping and starting the VMs
> > > again?
> > > 
> > > -Krutika
> > 
> > Krutika,
> > 
> > The worst problem happens when we try to reproduce this issue.
> > 
> > In my setup (with RHV 4.0.4 and RHGS 3.1.3 - glusterfs-3.7.9-12.el7rhgs),
> > I am unable to hit this issue, where there are error messages in all the
> > VM consoles. Instead, I observed events in the RHV UI saying 'VMs not
> > responding'. This again happened on a few VMs.
> > 
> > Let me explain the workload. I have tested with fio-tools as pointed out
> > in comment 0, with the randrw80_20 workload, with and without
> > rate-limiting IOPs.
> > 
> > I tested with open-behind turned off, and I could again see events that
> > at least 2 of the VMs are not responding. But when looking at the VMs by
> > hand, all looks good and there are no problems. I am not sure how these
> > events are triggered and what the consequence of them is.
> 
> Hmm.. do you have the logs corresponding to this particular failed run? If
> so, could you please share them?
> 
> -Krutika

Krutika,

I don't have logs for the corresponding test. I will reproduce the issue again and attach the required logs.
I have run the fio (80:20) workload on 29 VMs with RHV 4.0.5 and RHGS 3.1.3. I see kernel hung task messages on the VM consoles, and the latency seen by the VMs is very high. Below are the 80th-percentile latency numbers reported by fio for all the VMs on which it was run.

[root@dhcp47-196 controller]# grep "80.000000" randrw8020_100iops.job_vmstore_029_json.out
"80.000000" : 937984,
"80.000000" : 708608,
"80.000000" : 684032,
"80.000000" : 610304,
"80.000000" : 577536,
"80.000000" : 544768,
"80.000000" : 514048,
"80.000000" : 509952,
"80.000000" : 544768,
"80.000000" : 477184,
"80.000000" : 1449984,
"80.000000" : 806912,
"80.000000" : 1003520,
"80.000000" : 806912,
"80.000000" : 839680,
"80.000000" : 741376,
"80.000000" : 733184,
"80.000000" : 602112,
"80.000000" : 561152,
"80.000000" : 142336,
"80.000000" : 85504,
"80.000000" : 113152,
"80.000000" : 111104,
"80.000000" : 104960,
"80.000000" : 94720,
"80.000000" : 86528,
"80.000000" : 86528,
"80.000000" : 76288,
"80.000000" : 72192,
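As an aside, the same grep can be summarised in one go; assuming these are completion-latency percentiles from fio 2.x JSON output, the values are in microseconds.

# Average and worst 80th-percentile latency across all VMs (values assumed to be usec)
grep '"80.000000"' randrw8020_100iops.job_vmstore_029_json.out | \
    awk -F: '{gsub(/[ ,]/, "", $2); v = $2 + 0; sum += v; if (v > max) max = v}
             END {printf "avg_usec=%.0f max_usec=%d\n", sum/NR, max}'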
I hit the same issue during RHV-RHGS testing; I was running dd and kernel untar I/O on an application VM.
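The exact commands from that run are not captured in this comment; a representative version of that kind of in-guest load, with assumed paths, sizes and tarball name, would be:

# Representative in-guest load (paths, sizes and the kernel tarball name are assumptions)
dd if=/dev/zero of=/home/ddtest.img bs=1M count=4096 oflag=direct
mkdir -p /home/untar-test
tar xf linux-4.9.tar.xz -C /home/untar-test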
The issues seen here seem similar to the one reported in Bug 1389516. Based on Comment 49, Comment 52 and other comments in that bug, we need to have realistic expectations of performance - i.e. map the performance we see to what we expect from the underlying storage, and recheck the job file to ensure it limits the backlog of I/O requests.
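For reference, the two fio job options that bound the backlog are the queue depth and the IOPS rate cap; the values below are illustrative, not a recommendation for this setup.

# fio job-file options that bound the outstanding-I/O backlog
# (illustrative values; rate_iops takes read,write caps)
iodepth=8
rate_iops=80,20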
Krutika,

As I have mentioned in comment 11, I don't have the logs right at the moment, but this issue is seen on the VMs.

On a side note, we are not sure whether this is a genuine issue and whether it affects the applications or the VMs. We need to take it in for RHGS 3.3.0 and consider providing a proper solution for it.