Bug 670093

Summary: Extremely slow software raid (6) with 2.6.35.10-74.fc14.x86_64
Product: Fedora
Reporter: Naoki <naoki>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 14
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Hardware: Unspecified
OS: Unspecified
Last Closed: 2011-08-30 17:29:16 UTC

Description Naoki 2011-01-17 07:24:01 UTC
Description of problem:
After a fresh build using software RAID (md), the rebuild speed reported through /proc/mdstat is ~200 KB/s, peaking at maybe 1.2 MB/s, on a six-drive array. Boot is so slow that it almost appears hung at times.

Moving to 2.6.37-2.fc15.x86_64 pushed the speed up to the expected ~66MB/s and the system was perfectly usable. The problem also did not exist in the earlier kernel (2.6.35.9-?).
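
For reference, the md resync throttle can be checked with sysctl; on an otherwise idle system the resync normally runs close to dev.raid.speed_limit_max, so a speed far below it (as seen here) points at the kernel itself rather than throttling. The raised value below is only an example:

 # sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
 # sysctl -w dev.raid.speed_limit_max=500000   # example value, to temporarily rule out throttling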

Version-Release number of selected component (if applicable):
kernel 2.6.35.10-74.fc14.x86_64

How reproducible:
100% (on our system)

Steps to Reproduce:
1. Build a software RAID set with the affected kernel (a sketch of a comparable mdadm command follows these steps).
2. Check /proc/mdstat.
3. Observe that the rebuild speed (and the system in general) is unacceptably slow.
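
A minimal sketch of step 1, assuming an array laid out like md1 in the /proc/mdstat output below (RAID-6 across six partitions, 512k chunk, 1.1 metadata); the reporter's exact create command is not recorded:

 # mdadm --create /dev/md1 --level=6 --raid-devices=6 \
         --metadata=1.1 --chunk=512 /dev/sd[a-f]2
 # watch -n 5 cat /proc/mdstat    # resync progress and speed are reported here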
  
Actual results:
I/O abnormally slow.

Expected results:
RAID build and I/O speeds closer to the actual disk performance.

Additional info:

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] 
md1 : active raid6 sda2[0] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1]
      3904940032 blocks super 1.1 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]
      [======>..............]  resync = 30.0% (292900408/976235008) finish=8892.0min speed=1280K/sec
      bitmap: 6/8 pages [24KB], 65536KB chunk

md0 : active raid1 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
      524276 blocks super 1.0 [6/6] [UUUUUU]
      
unused devices: <none>
[root@1 ~]# uname -a
Linux 1 2.6.35.10-74.fc14.x86_64 #1 SMP Thu Dec 23 16:04:50 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

--------------

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] 
md1 : active raid6 sda2[0] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1]
      3904940032 blocks super 1.1 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]
      [======>..............]  resync = 34.3% (335286144/976235008) finish=158.2min speed=67497K/sec
      bitmap: 6/8 pages [24KB], 65536KB chunk

md0 : active raid1 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
      524276 blocks super 1.0 [6/6] [UUUUUU]
      
unused devices: <none>
[root@1 ~]# uname -a
Linux 1 2.6.37-2.fc15.x86_64 #1 SMP Fri Jan 7 14:57:36 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Comment 1 Chuck Ebbert 2011-01-19 13:02:30 UTC
I don't see any changes in that latest F14 kernel that could cause this. Can you get the block device and queue information for the problem array under both a working and broken kernel? Run these commands on both kernels and attach the output:

 # cd /sys/block/md1
 # grep "" * queue/*

Comment 2 Josh Boyer 2011-08-30 17:29:16 UTC
Given 8 months of no comments and an indication that the F15 kernel was working fine, I'm closing this as NEXTRELEASE.