Bug 576749
Summary: | Intel bios RAID 1 - md127_resync activity chokes system to death | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Bruce Fowler <brf33> |
Component: | mdadm | Assignee: | Doug Ledford <dledford> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 12 | CC: | agajania, agajan, bitbashing, brf33, ctyler.fedora, dledford, edward.lara.lara, joe.christy, mishu, work.eric |
Hardware: | i686 | ||
OS: | Linux | ||
Fixed In Version: | mdadm-3.1.3-0.git20100804.2.fc13 | Doc Type: | Bug Fix |
Last Closed: | 2010-12-03 16:47:41 UTC | Type: | --- |
Description
Bruce Fowler
2010-03-25 02:41:52 UTC
With some hints from the Forum, I have worked through the anacron configuration and shell scripts that run the md* code. Below is a log file that I generated during one of these lock-up episodes. Note that after the scan gets to 25% or so, even the script itself locks up (run by root with "nice -5"), and the machine sits there for over an hour until the script resumes.

I've disabled the weekly scan (now that I know how to do that), because it is unacceptable for my machine to "go away" for an hour with no way to get it back short of a power-cycle reset. Even mouse tracking ceased to function. Ctrl-Alt-F2 to a text terminal wouldn't respond to a login request.

I would like to re-enable the weekly scan as soon as possible so the integrity of my RAID array is verified on a regular basis. Otherwise it is one more manual chore I am sure to forget! :-)

Two corrections to my original report. Apparently this is a routine scan, not a broken RAID 1 mirror. And the "automatic reboot" appears to have been queued during my attempts to get the machine back; it does not normally occur.

Here is the script (run as root using "nice -5"):

    #!/bin/bash
    # Capture what's going on while mdadm is hogging machine
    #
    echo "Loop writing stats to '~/bug.log' every 30 seconds"
    while true; do
        echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>"
        iostat -t
        cat /proc/mdstat
        sleep 30
    done >>~/bug.log &

And here is the heavily edited output (much of the middle part deleted):

    >>>>>>>>>>>>>>>>>>>>>>>>>>>
    Linux 2.6.32.9-70.fc12.i686.PAE (grimm.localdomain) 03/31/2010 _i686_ (2 CPU)

    03/31/2010 01:27:36 PM
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               3.48    0.00    5.53   23.12    0.00   67.87

    Device:     tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
    sda      392.17     52352.48       116.50     5815837       12942
    sdb      210.85        51.89     48558.09        5764     5394318
    md127    824.47      3922.42       116.21      435742       12910

    Personalities : [raid1]
    md127 : active raid1 sda[1] sdb[0]
          312568832 blocks super external:/md_d-1/0 [2/2] [UU]
          [>....................]  resync = 0.8% (2690688/312568832) finish=106.7min speed=48356K/sec

    md0 : inactive sdb[1](S) sda[0](S)
          4514 blocks super external:imsm

    unused devices: <none>
    >>>>>>>>>>>>>>>>>>>>>>>>>>>
    Linux 2.6.32.9-70.fc12.i686.PAE (grimm.localdomain) 03/31/2010 _i686_ (2 CPU)

    03/31/2010 01:28:06 PM
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               3.00    0.00    4.81   19.02    0.00   73.17

    ((((((((((((((((... Iterations deleted...))))))))))))))))

    >>>>>>>>>>>>>>>>>>>>>>>>>>>
    Linux 2.6.32.9-70.fc12.i686.PAE (grimm.localdomain) 03/31/2010 _i686_ (2 CPU)

    03/31/2010 01:47:07 PM
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.53    0.00    2.22    3.42    0.00   92.83

    Device:     tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
    sda      883.86    122028.45        80.92   156390437      103704
    sdb      523.29         4.50    121407.61        5764   155594776
    md127     86.09       699.75        80.69      896790      103414

    Personalities : [raid1]
    md127 : active raid1 sda[1] sdb[0]
          312568832 blocks super external:/md_d-1/0 [2/2] [UU]
          [====>................]  resync = 24.8% (77745536/312568832) finish=59.3min speed=65922K/sec

    md0 : inactive sdb[1](S) sda[0](S)
          4514 blocks super external:imsm

    unused devices: <none>
    >>>>>>>>>>>>>>>>>>>>>>>>>>>
    Linux 2.6.32.9-70.fc12.i686.PAE (grimm.localdomain) 03/31/2010 _i686_ (2 CPU)

    03/31/2010 01:47:37 PM
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.51    0.00    2.21    3.35    0.00   92.93

    Device:     tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
    sda      887.95    122414.67        79.09   160557861      103738
    sdb      525.04         4.39    121808.16        5764   159762362
    md127     84.12       683.74        78.87      896790      103446

    Personalities : [raid1]
    md127 : active raid1 sda[1] sdb[0]
          312568832 blocks super external:/md_d-1/0 [2/2] [UU]
          [=====>...............]  resync = 25.5% (79829440/312568832) finish=55.0min speed=70408K/sec

    md0 : inactive sdb[1](S) sda[0](S)
          4514 blocks super external:imsm

    unused devices: <none>
    >>>>>>>>>>>>>>>>>>>>>>>>>>>
    Linux 2.6.32.9-70.fc12.i686.PAE (grimm.localdomain) 03/31/2010 _i686_ (2 CPU)

    03/31/2010 02:55:08 PM
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.90    0.00    1.88   67.59    0.00   29.63

    [... Rest of log deleted]

Just thought it worth sharing that I recently had a similar issue. For me, setting '/proc/sys/dev/raid/speed_limit_max' to a lower value seems to have corrected it. Hope this might help :)
*Your mileage may vary: at the time of writing I assume it worked only because the problem would typically have happened by now.

I'm wondering if this bug has been fixed. I just cut the power to my PC and restarted it. An md127_resync process is running. The data speed is staying between 55K/sec and 70K/sec. I haven't noticed any degradation in responsiveness. A few weeks ago, a kernel update included a fix for a RAID 5 issue. (See bug #575402.) Maybe that update helped this problem as well. I'm now running F13. I experienced similar symptoms last week when using the kernel from the F13 DVD.

No, it hasn't been fixed. The UI became unresponsive when the disk was about 60% resynced. The data speed was about 83K/sec. I'll try setting /proc/sys/dev/raid/speed_limit_max to see if that helps.

The following command didn't help. The UI still became unresponsive.

    echo "50000" > /proc/sys/dev/raid/speed_limit_max

The data speed did stay around 50K/sec. I noticed the following when the computer was unresponsive:
= the mouse cursor still moves OK
= windows are no longer updated
= I can go to a new virtual console and log in. (It takes a minute or so.)
Sometimes a virtual console becomes unresponsive. In that case, I am still able to go to another virtual console with C-A-Fn and log in.

*** Bug 542546 has been marked as a duplicate of this bug. ***

This is specifically a problem with imsm arrays.
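
For anyone who wants to repeat the speed_limit_max experiment from the comments above without leaving the limit changed afterwards, here is a minimal sketch. The helper names (get_limit, set_limit, with_limit) and the LIMIT_FILE override are made up for illustration; only the /proc path comes from this report. Keep in mind that one commenter above found the cap did not stop the UI from hanging, so this is a diagnostic aid rather than a fix.

```shell
#!/bin/bash
# Sketch: temporarily cap md resync bandwidth, then restore the old value.
# LIMIT_FILE can be overridden (hypothetical knob) to try the helpers on a
# scratch file instead of the real sysctl, which needs root to write.
LIMIT_FILE=${LIMIT_FILE:-/proc/sys/dev/raid/speed_limit_max}

# Print the current limit.
get_limit() {
    cat "$LIMIT_FILE"
}

# Write a new limit; reject anything that is not a plain non-negative integer.
set_limit() {
    case $1 in
        ''|*[!0-9]*)
            echo "set_limit: not a number: $1" >&2
            return 1
            ;;
    esac
    echo "$1" > "$LIMIT_FILE"
}

# with_limit VALUE COMMAND [ARGS...]
# Run COMMAND with the cap in place, restore the previous limit afterwards,
# and return COMMAND's exit status.
with_limit() {
    local new=$1 old status
    shift
    old=$(get_limit) || return 1
    set_limit "$new" || return 1
    "$@"
    status=$?
    set_limit "$old"
    return "$status"
}
```

As root, something like `with_limit 50000 sleep 3600` would hold the cap used in the comment above for an hour and then put the previous value back.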
If you wait for the resync to complete, it returns to normal. The problem has been fixed in mdadm-3.1.3-0.git20100722.1 or later.

mdadm-3.1.3-0.git20100722.1.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100722.1.fc12

mdadm-3.1.3-0.git20100722.2.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mdadm'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100722.2.fc12

mdadm-3.1.3-0.git20100804.2.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc13

mdadm-3.1.3-0.git20100804.2.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc12

mdadm-3.1.3-0.git20100804.2.fc14 has been submitted as an update for Fedora 14. http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc14

mdadm-3.1.3-0.git20100804.2.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mdadm'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc12

mdadm-3.1.3-0.git20100804.2.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mdadm'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc13

mdadm-3.1.3-0.git20100804.2.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mdadm'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc14

This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.

mdadm-3.1.3-0.git20100804.2.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.

mdadm-3.1.3-0.git20100804.2.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.
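
To check whether a given machine already carries the fix, the installed mdadm build can be compared with the first fixed build named above (mdadm-3.1.3-0.git20100722.1). The following is only a sketch: the function names are hypothetical, and GNU `sort -V` is used as an approximation of RPM's own version ordering, which it matches for the builds listed in this report but not necessarily in general.

```shell
#!/bin/bash
# Sketch: is the installed mdadm at least the first fixed build?
# ASSUMPTION: GNU `sort -V` ordering is close enough to RPM version
# comparison for the version-release strings that appear in this report.

FIXED="3.1.3-0.git20100722.1"   # first fixed build named in this report

# version_at_least HAVE WANT
# Succeeds (exit 0) when HAVE sorts at or after WANT.
version_at_least() {
    local have=$1 want=$2
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Print VERSION-RELEASE of the installed mdadm package (needs rpm).
installed_mdadm() {
    rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' mdadm
}
```

On an RPM-based system, `version_at_least "$(installed_mdadm)" "$FIXED" && echo "fixed build installed"` would report whether the update has landed.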