Bug 730412

Summary: Regular disk hangs during high load with small file random i/o
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.6
Hardware: x86_64
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED WORKSFORME
Target Milestone: rc
Reporter: Harshavardhana <fharshav>
Assignee: Red Hat Kernel Manager <kernel-mgr>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: cww
Doc Type: Bug Fix
Last Closed: 2011-11-01 22:01:17 UTC

Description Harshavardhana 2011-08-12 20:11:50 UTC
Description of problem:

The disks hang regularly on the server side under high small-file random I/O load.

Version-Release number of selected component (if applicable):

kernel 2.6.18-238.9.1.el5 (RHEL 5.6)

ext4 filesystem with a 512-byte inode size:

dumpe4fs 1.41.12 (17-May-2010)
Filesystem volume name:   /data1
Last mounted on:          /data1/data1
Filesystem UUID:          6e4af245-fea6-459d-86fe-a33a49bce8c5
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              610304000
Block count:              2441215990
Reserved block count:     122060799
Free blocks:              2364722641
Free inodes:              610303989
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      441
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   1024
Flex block group size:    16
Filesystem created:       Wed Aug  3 12:48:19 2011
Last mount time:          Wed Aug  3 13:37:22 2011
Last write time:          Wed Aug  3 13:37:22 2011
Mount count:              1
Maximum mount count:      20
Last checked:             Wed Aug  3 12:48:19 2011
Check interval:           15552000 (6 months)
Next check after:         Mon Jan 30 11:48:19 2012
Lifetime writes:          291 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               512
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      c6cb58d1-fab3-4750-b73e-64a6a2fd8dc2
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x0004f8f2
Journal start:            31790
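
(For reference: a 512-byte inode size has to be chosen when the filesystem is created. The actual mkfs invocation is not in this report; on RHEL 5 with e4fsprogs 1.41 it would have looked something like the line below, where the device and label are only illustrative.)

mke4fs -t ext4 -I 512 -L /data1 /dev/sdb1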

sysctl.conf :-

vm.swappiness = 0
vm.vfs_cache_pressure = 100000
vm.dirty_background_ratio = 1
vm.dirty_ratio = 5
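
(These values live in /etc/sysctl.conf; after editing that file they can be loaded with the standard tool, e.g.:)

sysctl -p /etc/sysctl.conf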

mount options :-

/dev/sdb1 /data1 ext4 rw,noatime,nodiratime,barrier=1,data=writeback 0 0
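
(A rough sketch of mounting the device by hand with these options and confirming what is actually in effect; device and mount point are as reported above:)

mount -t ext4 -o noatime,nodiratime,barrier=1,data=writeback /dev/sdb1 /data1
grep ' /data1 ' /proc/mounts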

Hardware :- 

Dell PERC H700 

How reproducible:

Easily; it happens every time after 5-6 hours of high load.

Steps to Reproduce:

Run a homegrown application that generates small-file random I/O; after several hours under load the kernel starts logging 'hung_task_timeout' warnings with backtraces for the disks. (An approximate load generator is sketched below.)
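
(The homegrown application is not available, so this is an approximation only: a small-file random-write workload of roughly the shape described could be produced with fio. Every parameter below is illustrative, not taken from the report.)

fio --name=smallfile-rand --directory=/data1 \
    --rw=randwrite --bs=4k --size=64m --nrfiles=1000 \
    --numjobs=16 --ioengine=libaio \
    --time_based --runtime=21600 --group_reporting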

Actual results:

The application hung after about 6 hours; the job made no further progress.

Expected results:

The application runs to completion (as it does on a NetApp backend device).

Additional info:

Logs for 'dmidecode', 'dmesg-backtrace', 'lscpi-info', 'slabinfo', etc. are available at http://shell.gluster.com/~harsha/redhat-bugzilla/
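
(The hung-task warnings themselves can be pulled out of the kernel log, and the detection timeout inspected, with standard commands such as:)

dmesg | grep -B2 -A15 'blocked for more than'
cat /proc/sys/kernel/hung_task_timeout_secs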

Comment 1 Harshavardhana 2011-09-16 19:59:45 UTC
Fixed it by tuning kernel parameters. 

# latency
echo "deadline" > /sys/block/sdb/queue/scheduler
echo "deadline" > /sys/block/sdc/queue/scheduler

# 2x queue_depth - 128
echo "256" > /sys/block/sdb/queue/nr_requests
echo "256" > /sys/block/sdc/queue/nr_requests

# 64k stripe size
echo "16" > /proc/sys/vm/page-cluster

# Saturate internal RAID cache
blockdev --setra 4096 /dev/sdb
blockdev --setra 4096 /dev/sdc

# Virtual Memory optimizations
sysctl vm.swappiness=0
sysctl vm.vfs_cache_pressure=100000
sysctl vm.dirty_ratio=5
sysctl vm.dirty_background_ratio=1
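
(These block-queue tunables do not survive a reboot on their own. A minimal sketch for making them persistent on RHEL 5 is to append them to /etc/rc.local, while the vm.* values stay in /etc/sysctl.conf as shown above:)

cat >> /etc/rc.local <<'EOF'
# re-apply I/O queue tuning for sdb/sdc at boot
echo deadline > /sys/block/sdb/queue/scheduler
echo deadline > /sys/block/sdc/queue/scheduler
echo 256 > /sys/block/sdb/queue/nr_requests
echo 256 > /sys/block/sdc/queue/nr_requests
blockdev --setra 4096 /dev/sdb
blockdev --setra 4096 /dev/sdc
EOF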