Bug 173143 - fully-synced raid 1 increases idle load avg without eating cpu
Summary: fully-synced raid 1 increases idle load avg without eating cpu
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
 
Reported: 2005-11-14 16:33 UTC by Alexandre Oliva
Modified: 2007-11-30 22:11 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-01-13 16:12:23 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---



Description Alexandre Oliva 2005-11-14 16:33:52 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8) Gecko/20051103 Fedora/1.5-0.5.0.rc1 Firefox/1.5

Description of problem:
I had a chance to notice that raid 1 resyncing was slow on kernel 1.1657_FC5 after trying 1660 and 1663, both of which froze shortly after zebra started up.  I'd collect stack traces for a bug report, but today's 1665 fixed the freeze, so I won't bother.

The symptom was that raid resyncing was capped at the minimum speed set in /proc/sys/dev/raid/speed_limit_min, even on a completely idle system (other than the raid syncing, of course).  Bumping that limit up made resyncing speed up accordingly.
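[Editor's note: for reference, the resync throttle mentioned above is controlled by two sysctls; a minimal sketch, assuming root access (1000 KB/s is the usual default floor, and the 50000 value is a hypothetical choice):]

```shell
# Inspect the md resync throttle (values are KB/s per device).
cat /proc/sys/dev/raid/speed_limit_min   # default is typically 1000
cat /proc/sys/dev/raid/speed_limit_max

# Raise the floor so an otherwise idle system resyncs faster
# (hypothetical value; must be run as root).
echo 50000 > /proc/sys/dev/raid/speed_limit_min
```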

I haven't had a chance to test resyncing on 1665 yet, but I suspect it will behave differently because, unlike on 1657, the load average is now stuck above N, where N is the number of active RAID 1 devices, all of them fully synced.  I couldn't find any oopses in /var/log/messages, and the fans on the affected notebooks are not spinning up, so this higher-than-expected load average is clearly wrong.  Nothing is actually eating CPU; it appears to be incorrect accounting.  One of the affected boxes, an Athlon64 notebook tracking x86_64 rawhide, has 8 RAID 1 devices, and its load is stuck slightly above 8.  An i686 notebook tracking i386 rawhide has 5 RAID 1 devices, and its load is stuck slightly above 5.  Neither has any other active RAID devices, so I can't tell whether the problem is exclusive to RAID 1.
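[Editor's note: the accounting theory is consistent with how Linux computes the load average: it counts runnable tasks plus tasks in uninterruptible sleep (state D), so an md kernel thread parked in D state inflates the load by one without using any CPU. A sketch (not from the report) of how one could check for such threads:]

```shell
# Load average includes tasks in uninterruptible sleep (state D).
# List D-state tasks and count them; on an affected box the md/raid1
# kernel threads would be expected to show up here.
ps -eo state,comm | awk '$1 ~ /^D/ { print $2; count++ }
                         END { print count+0 " task(s) in D state" }'
```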

Version-Release number of selected component (if applicable):
kernel-2.6.14-1.1665_FC5

How reproducible:
Always

Steps to Reproduce:
1. Boot the kernel

Actual Results:  Load average is elevated by the number of active RAID 1 devices

Expected Results:  Load average should return to normal (near zero on an idle system)

Additional info:

Comment 1 Alexandre Oliva 2005-11-14 16:34:42 UTC
Sorry, filed against wrong component.

Comment 2 Alexandre Oliva 2005-11-19 21:21:13 UTC
I have confirmation that it's the number of active raid devices that causes the
increased load, and I've found out another side effect of this problem: it
prevents swsusp from working.  The problem is still present on 1688_FC5,
unfortunately.  The symptom is that, when swsusp tries to stop all tasks, it
fails after a few seconds and complains that the raid-controlling processes
won't stop, and the system comes back to activity instead of going to sleep :-(

Comment 3 Alexandre Oliva 2006-01-13 16:12:23 UTC
This was fixed a while before the 2.6.15 release.

