Bug 1349335 - "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" message gets logged in VM console with 80:20 r/w fio workload
Summary: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" message gets logged in VM ...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Krutika Dhananjay
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: Gluster-HC-3
 
Reported: 2016-06-23 09:06 UTC by SATHEESARAN
Modified: 2018-04-16 18:16 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
RHEV-RHGS HCI RHEL 7.2 RHEV 3.6.7-4 RHGS 3.1.3
Last Closed: 2018-04-16 18:16:53 UTC
Embargoed:


Attachments
screenshot of one of the VM console (33.92 KB, image/png)
2016-06-23 09:54 UTC, SATHEESARAN

Description SATHEESARAN 2016-06-23 09:06:28 UTC
Description of problem:
-----------------------
The testbed contains 3 nodes with converged virt+gluster. Sharded replica 3 volumes are used for the RHEV Hosted Engine VM, for the application VMs' root disk carrying the root filesystem (sda), and for the application VMs' additional disk used for application data (sdb).

I am using fio-tools [1] to launch fio in client-server fashion on all the application VMs.

While running the 80:20 fio workload on 29 VMs, I see error messages logged in the VM consoles: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs". The VMs also respond slowly during this time window.

[1] - https://github.com/pcuzner/fio-tools
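
For context, this message comes from the guest kernel's hung-task watchdog. Its threshold can be inspected or relaxed with sysctl; the commands below are a minimal sketch with illustrative values:

# show the current hung-task timeout (seconds) inside a guest
sysctl kernel.hung_task_timeout_secs
# relax the threshold instead of silencing the warning entirely
sysctl -w kernel.hung_task_timeout_secs=240
# writing 0, as the console message suggests, disables the warning altogether
echo 0 > /proc/sys/kernel/hung_task_timeout_secs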


Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHEL 7.2 Server
RHEV 3.6.7-4
RHGS 3.1.3

How reproducible:
-----------------
2/2

Steps to Reproduce:
-------------------
1. Create 2 replica 3 sharded volumes.
2. Create 2 RHEV data domains backed by the gluster volumes created above:
one domain for the root disk (sda) of the application VMs, and another domain for the additional disk (sdb) of the VMs.
3. Create 29 VMs installed with RHEL 7.2 Server.
4. Run the 80:20 fio workload (an illustrative sketch follows this list).
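
A minimal sketch of steps 1 and 4, assuming placeholder hostnames, brick paths, and fio parameters that only approximate the actual fio-tools job (the real job file is not attached here):

# create one of the sharded replica 3 volumes (hosts and brick paths are placeholders)
gluster volume create data replica 3 \
    host1:/rhgs/brick3/data host2:/rhgs/brick3/data host3:/rhgs/brick3/data
gluster volume set data features.shard on
gluster volume start data

# inside each application VM, an 80:20 random read/write job roughly
# equivalent to the fio-tools randrw80_20 profile, run against the data disk
fio --name=randrw8020 --filename=/dev/sdb --direct=1 --rw=randrw \
    --rwmixread=80 --bs=4k --iodepth=4 --runtime=300 --time_based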

Actual results:
---------------
All 29 VMs hit the problem and log error messages on the console such as:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/1:1:3478 blocked for more than 120 seconds.

Expected results:
-----------------
Application VMs should not face any problems

Comment 1 SATHEESARAN 2016-06-23 09:54:18 UTC
Created attachment 1171397 [details]
screenshot of one of the VM console

Comment 3 SATHEESARAN 2016-06-23 10:06:10 UTC
The problems I see with this issue are listed below:

1. The VM consoles show many hung-task timeouts for a few processes. This will definitely impact any application running inside the VMs.

2. The VMs tend to respond slowly during this time window.

3. PCP charts show that I/O wait is still present, which I take to mean that some I/O is happening; stracing the brick processes made that evident (a sketch of the kind of check follows). I/O has been going on in the VMs for the last 1 hour (far too long), whereas this I/O is expected to complete within 5 minutes.
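
For reference, a sketch of the brick-side check; the process selection is illustrative and assumes a single glusterfsd brick process on the node:

# trace read/write/fsync syscalls of one brick process with per-call timings
strace -f -T -e trace=pread64,pwrite64,fsync -p $(pgrep -f glusterfsd | head -n1)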

On the first occurrence of this issue, I stopped the fio job and ssh'ed into the VMs, and they looked good. I rebooted the VMs and they came up healthy.

I am still waiting for the second iteration of the test to complete and will update the logs after that.

Comment 4 RamaKasturi 2016-08-24 09:37:39 UTC
While running the 80:20 fio workload on 29 VMs, I see error messages logged on one of the VM consoles: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs".

glusterfs version : glusterfs-3.7.9-10.el7rhgs.x86_64
ovirt version : ovirt-engine-4.0.2.6-0.1.el7ev.noarch

volume configuration :
=======================
[root@rhsqa-grafton5 ~]# gluster volume info engine
 
Volume Name: engine
Type: Replicate
Volume ID: e5935349-d827-4638-8c23-b953210a0a55
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.82:/rhgs/brick1/engine-4
Brick2: 10.70.36.83:/rhgs/brick1/engine
Brick3: 10.70.36.84:/rhgs/brick1/engine
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
performance.strict-o-direct: on
network.ping-timeout: 30
user.cifs: off

gluster volume info data
 
Volume Name: data
Type: Replicate
Volume ID: 42608dba-9a30-46a1-9b1f-710d64e2d655
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.82:/rhgs/brick3/data
Brick2: 10.70.36.83:/rhgs/brick3/data_5
Brick3: 10.70.36.84:/rhgs/brick3/data_new
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
performance.strict-o-direct: on
network.ping-timeout: 30
user.cifs: off


gluster volume info vmstore
 
Volume Name: vmstore
Type: Replicate
Volume ID: d094ccfa-7840-4a68-ab33-3179aa996efd
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.82:/rhgs/brick2/vmstore
Brick2: 10.70.36.83:/rhgs/brick2/vmstore-5
Brick3: 10.70.36.84:/rhgs/brick2/vmstore-6
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
performance.strict-o-direct: on
network.ping-timeout: 30
user.cifs: off

Comment 5 Sahina Bose 2016-09-09 05:12:12 UTC
Are you able to reproduce the issue after changing vm.dirty_ratio and vm.dirty_background_ratio to lower values?
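
For example, lower values could be applied on the hosts with something like the following (the values are illustrative, not a recommendation from this bug):

sysctl -w vm.dirty_ratio=10              # cap dirty pages before writers block
sysctl -w vm.dirty_background_ratio=5    # start background writeback earlier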

Comment 6 RamaKasturi 2016-09-09 05:18:04 UTC
Hi Sahina,

      I have set vm.dirty_background_ratio to 10 and tried reproducing the issue. I was able to hit it once, but when I tried again I was not able to hit it.

Comment 7 RamaKasturi 2016-09-09 05:18:40 UTC
(In reply to RamaKasturi from comment #6)
> Hi sahina,
> 
>       I have set the values vm.dirty_ratio to 10 and tried
> reproducing it. I was able to hit the issue once, but when i tried again i
> was not able to hit it.

Comment 8 Krutika Dhananjay 2016-09-22 03:23:30 UTC
@Sas and Kasturi,

Assuming you have a consistent reproducer for this bug, could you run it after disabling open-behind, and then stopping and starting the vms again?
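
For reference, open-behind can be toggled per volume like this (the volume name is a placeholder):

# disable the open-behind performance translator on the affected volume
gluster volume set <volname> performance.open-behind off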

-Krutika

Comment 9 SATHEESARAN 2016-10-06 09:46:41 UTC
(In reply to Krutika Dhananjay from comment #8)
> @Sas and Kasturi,
> 
> Assuming you have a consistent reproducer for this bug, could you run it
> after disabling open-behind, and then stopping and starting the vms again?
> 
> -Krutika

Krutika,

The trouble is what we see when we try to reproduce this problem.

In my setup (with RHV 4.0.4 and RHGS 3.1.3 - glusterfs-3.7.9-12.el7rhgs), I am unable to hit this issue where error messages appear on all the VM consoles.
Instead, I observed events in the RHV UI saying 'VMs not responding'.
This again happened on a few VMs.

Let me explain the workload. I have tested with fio-tools as pointed to in comment 0,
with the randrw80_20 workload, with and without rate-limiting IOPS.

I tested with open-behind turned off, and I could again see events saying that at least 2 of the VMs were not responding. But when I looked at the VMs by hand, everything looked good and there were no problems. I am not sure how these events get triggered and what their consequence is.

Comment 10 Krutika Dhananjay 2016-10-14 09:52:18 UTC
(In reply to SATHEESARAN from comment #9)
> (In reply to Krutika Dhananjay from comment #8)
> > @Sas and Kasturi,
> > 
> > Assuming you have a consistent reproducer for this bug, could you run it
> > after disabling open-behind, and then stopping and starting the vms again?
> > 
> > -Krutika
> 
> Krutika,
> 
> The worst problem happens when we try to reproduce this problem.
> 
> In my setup ( with RHV 4.0.4 and RHGS 3.1.3 - glusterfs-3.7.9-12.el7rhgs ),
> I am unable to hit this issue, where there are error messages in all VMs
> console.
> Instead of that I observed events in RHV UI, saying that 'VMs not responding'
> This again happened on few VMs.
> 
> Let me explain the workload. I have tested with fio-tool as pointed in
> comment0
> with randrw80_20 workloadm with and without rate-limiting IOPs
> 
> I tested with turning off open-behind, I could again see the events that
> atleast 2 of the VMs are not responding. But when looking at the VMs by
> hand, all looks good and no problems. Not sure how come these events are
> triggered and what is the consequence of the same

Hmm.. do you have the logs corresponding to this particular failed run? If so, could you please share them?

-Krutika

Comment 11 SATHEESARAN 2016-10-17 06:04:40 UTC
(In reply to Krutika Dhananjay from comment #10)
> (In reply to SATHEESARAN from comment #9)
> > (In reply to Krutika Dhananjay from comment #8)
> > > @Sas and Kasturi,
> > > 
> > > Assuming you have a consistent reproducer for this bug, could you run it
> > > after disabling open-behind, and then stopping and starting the vms again?
> > > 
> > > -Krutika
> > 
> > Krutika,
> > 
> > The worst problem happens when we try to reproduce this problem.
> > 
> > In my setup ( with RHV 4.0.4 and RHGS 3.1.3 - glusterfs-3.7.9-12.el7rhgs ),
> > I am unable to hit this issue, where there are error messages in all VMs
> > console.
> > Instead of that I observed events in RHV UI, saying that 'VMs not responding'
> > This again happened on few VMs.
> > 
> > Let me explain the workload. I have tested with fio-tool as pointed in
> > comment0
> > with randrw80_20 workloadm with and without rate-limiting IOPs
> > 
> > I tested with turning off open-behind, I could again see the events that
> > atleast 2 of the VMs are not responding. But when looking at the VMs by
> > hand, all looks good and no problems. Not sure how come these events are
> > triggered and what is the consequence of the same
> 
> Hmm.. do you have the logs corresponding to this particular failed run? If
> so, could you please share them?
> 
> -Krutika

Krutika,

I don't have the logs for the corresponding test.
I will reproduce the issue again and attach the required logs.

Comment 13 RamaKasturi 2016-11-24 07:23:10 UTC
I have run the fio (80:20) workload on 29 VMs with RHV 4.0.5 and RHGS 3.1.3. I see kernel hung-task messages on the VM consoles, and the latency for the VMs is very high.

Below are the 80th-percentile latency numbers, grepped from the fio JSON output, for all the VMs on which fio was run.

[root@dhcp47-196 controller]# grep "80.000000" randrw8020_100iops.job_vmstore_029_json.out
            "80.000000" : 937984,
            "80.000000" : 708608,
            "80.000000" : 684032,
            "80.000000" : 610304,
            "80.000000" : 577536,
            "80.000000" : 544768,
            "80.000000" : 514048,
            "80.000000" : 509952,
            "80.000000" : 544768,
            "80.000000" : 477184,
            "80.000000" : 1449984,
            "80.000000" : 806912,
            "80.000000" : 1003520,
            "80.000000" : 806912,
            "80.000000" : 839680,
            "80.000000" : 741376,
            "80.000000" : 733184,
            "80.000000" : 602112,
            "80.000000" : 561152,
            "80.000000" : 142336,
            "80.000000" : 85504,
            "80.000000" : 113152,
            "80.000000" : 111104,
            "80.000000" : 104960,
            "80.000000" : 94720,
            "80.000000" : 86528,
            "80.000000" : 86528,
            "80.000000" : 76288,
            "80.000000" : 72192,

Comment 15 Byreddy 2017-01-12 06:35:18 UTC
I hit the same issue during RHV-RHGS testing; I was running dd and kernel-untar I/O on an application VM.

Comment 16 Sahina Bose 2017-01-18 06:35:17 UTC
The issues seen here look similar to the one reported in Bug 1389516. Based on Comment 49, Comment 52, and other comments in that bug, we need to have realistic expectations of performance, i.e. map the performance we see to what we expect from the underlying storage, and recheck the job file to ensure it limits the backlog of I/O requests (see the sketch below).
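
For example, a job can be capped so that requests do not queue up faster than the storage can drain them; the options and values below are illustrative, not the actual fio-tools job:

# limit queue depth and rate-limit IOPS (read,write) per VM
fio --name=randrw8020_capped --filename=/dev/sdb --direct=1 --rw=randrw \
    --rwmixread=80 --bs=4k --iodepth=4 --rate_iops=80,20 --runtime=300 --time_based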

Comment 17 SATHEESARAN 2017-02-16 16:55:59 UTC
Krutika,

As I have mentioned in comment 11, I don't have the logs right at the moment, but this issue is seen on the VMs.

On a side note, we are not sure whether this is a genuine issue and whether it affects the applications or the VMs. We need to take it in for RHGS 3.3.0 and consider providing a proper solution for it.

