Description of problem:
Heavy disk I/O crashes or freezes Fedora 15, possibly MD RAID specific.

Just installed Fedora 15. Clean install to a separate partition on a spare HD with no problems. Set out to back up approx. 1.3TB of data (mostly camcorder footage) to a new 2TB hard drive, but experienced freezes or reboots after a few tens of seconds with anything that caused heavy disk I/O, such as resize2fs or rsync.

Unfortunately there is no OOPS or any other diagnostic to attach: either the X window session froze with a blank screen, requiring a reboot, or more often the machine simply rebooted.

I have successfully copied the data in F14 (kernel 2.6.36-1), and am about 90 minutes into verifying the copy (with that much data the chances of an error somewhere become rather more likely; it'll take a few hours more to compare the whole tree), but both the original copy and an attempt to verify the copy crash in F15 very quickly. I've tried both 2.6.38.6-26, which was the original F15 kernel, and 2.6.38.8-35, which is the current one.

The filesystem in question is an md RAID5 of 4 Samsung drives with an ext4 filesystem. All the drives are fairly new (18 months).

I'm not sure that this is RAID specific though: the machine also crashed during a "yum install" of some stuff which would have gone to the system partition on a separate Hitachi Deskstar T7K500. This drive is a bit older but, again, shows no problems with smartctl. The destination drive is a new WD20EARS.

This pretty much makes F15 unusable.

Version-Release number of selected component (if applicable):
Kernels 2.6.38.6-26 and (through?) 2.6.38.8-35 affected

How reproducible:
100%

Steps to Reproduce:
1. Install Fedora 15
2. Copy large filesystem

Actual results:
Reboots & freezes after a few tens of seconds.

Expected results:
Data copied without reboots or freezes

Additional info:
Platform: Core i7 920 (2.67GHz Bloomfield, not O/C'd), x86_64 kernel versions as above, Asus P6TSE with 12GB DDR3 DRAM, md RAID5 (4x Samsung HD103SJ, all clean with no errors according to smartctl) plus a 2TB Western Digital drive acting as temporary backup while I reconfigure the RAID partition, and a 320GB Hitachi for system files.
Same here, but with a single-HDD configuration. Whenever a large file is copied, whether to the HDD or to a USB thumb drive, high I/O occurs and the system freezes (is unusable) until the copy finishes. No crash (not yet), though.
Same thing. It happens on any long running sustained I/O. SATA, USB, or Firewire drives all experience the same problem. It even happens when copying with 'ionice -c3'. I started using rsync with '--bwlimit' to about half the normal throughput to work around the issue. Only X hangs, I can ssh in, and don't see any issues in /var/log/messages or dmesg. I do note that one core is running at 100% when the problem occurs, vs. about 10% when I use rsync bwlimit to throttle the throughput. Same problem on kernel-2.6.40-4.fc15.x86_64 as well.
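The throttling workaround described above can be sketched roughly as follows. This is only an illustration: the function name, the measured rate and the paths are hypothetical, and the function prints the command instead of running it. rsync's --bwlimit value is interpreted in units of 1024 bytes per second.

```shell
#!/bin/sh
# Sketch: build an rsync command throttled to half of a measured throughput.
# Arguments (all hypothetical): measured rate in KiB/s, source, destination.
# The command is only printed, so nothing is copied by this function.
throttled_rsync_cmd() {
    rate_kib=$1
    src=$2
    dst=$3
    limit=$((rate_kib / 2))   # "about half the normal throughput"
    printf 'rsync -a --bwlimit=%s %s %s\n' "$limit" "$src" "$dst"
}
```

For example, `throttled_rsync_cmd 40000 /data/ /mnt/backup/` prints an rsync line with `--bwlimit=20000`. Paths containing spaces would need quoting before the printed line is pasted into a shell.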
Same here. -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers
I was able to easily recreate the problem, by copying a large amount of data (gigabytes) to a USB attached flash card. After a few minutes, the user interface froze. I ssh'd in from another machine and noticed high iowait, but little actual I/O taking place. I switched all the devices from the CFQ scheduler to the deadline scheduler, and the problem immediately cleared. With deadline, I can't recreate it anymore. If nothing else, this should be a workaround for those with the problem. You can switch on the fly for each device with:

echo "deadline" > /sys/block/sdX/queue/scheduler

or boot with kernel option elevator=deadline
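A minimal sketch of the scheduler switch above, applied to every sd* device at once. The function names are made up for illustration; the sysfs root is a parameter so the logic can be tried against a scratch directory first, and writing to the real /sys/block requires root. The scheduler names offered by a given kernel may differ, so it is worth reading the file before writing to it.

```shell
#!/bin/sh
# Print the active scheduler from a sysfs-style scheduler file.
# The kernel marks the active entry with brackets, e.g. "noop deadline [cfq]".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Write "deadline" to every sd*/queue/scheduler file under the given root
# (default /sys/block). Files that are missing or not writable are skipped.
set_deadline_all() {
    root=${1:-/sys/block}
    for f in "$root"/sd*/queue/scheduler; do
        [ -w "$f" ] || continue
        echo deadline > "$f"
    done
}
```

On a real system, `active_scheduler /sys/block/sda/queue/scheduler` shows what is currently selected, and `set_deadline_all` (run as root) switches all matching devices. The change does not survive a reboot, which is what the elevator=deadline boot option is for.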
Confirmed this issue is still reproducible on Fedora 16 Beta with kernel 3.1.0-0.rc8.git0.1.fc16.i686. Not good. This is really a serious issue. It made backing up and switching to a GPT partition table far less smooth than it should be.
Same issue here on 2.6.40.3-0.fc15.x86_64. At first I thought it was just copying to USB devices (cf. https://bugzilla.redhat.com/show_bug.cgi?id=734516), but the issue also occurs when copying between two internal SATA drives. Really annoying...
(In reply to comment #4)
> I was able to easily recreate the problem, by copying a large amount of data
> (gigabytes) to a USB attached flash card. After a few minutes, the user
> interface froze. I ssh'd in from another machine and noticed high iowait, but
> little actual I/O taking place. I switched all the devices from the CFQ
> scheduler to the deadline scheduler, and the problem immediately cleared. With
> deadline, I can't recreate it anymore. If nothing else, this should be a
> workaround for those with the problem. You can switch on the fly for each
> device with:
>
> echo "deadline" > /sys/block/sdX/queue/scheduler
>
> or boot with kernel option elevator=deadline

For me, at least on 3.1.0-0.rc8.git0.1.fc16.i686, switching to the deadline scheduler only reduces the chance of a freeze. I still hit two freezes when copying between an internal and a USB-connected external hard drive. Both have ext4 file systems. So I guess the cause lies somewhere else.
deadline didn't really help in my case, which was copying large files to a USB storage device. The device was formatted with NTFS. I re-formatted it to ext4, and that seems to fix the problem.
(In reply to comment #8)
> deadline didn't really help in my case. my case was copying large files to a
> usb storage device.
> My usb storage device was formatted with NTFS. I re-formatted it to ext4, and
> it seems to fix the problem.

My USB external hard drive is formatted with ext4. However, the problem still exists.
Hello,

Seeing this on my up-to-date F16 box. I tried copying 7 GB to a flash drive using rsync, and my interface froze. Pull the flash disk out, and it starts working again.

[ankur@ankur ~]$ uname -a
Linux ankur.pc 3.1.0-0.rc10.git0.1.fc16.x86_64 #1 SMP Wed Oct 19 05:02:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Thanks,
Ankur
I think that https://bugzilla.redhat.com/show_bug.cgi?id=742802 is a duplicate of this bug. I use Fedora 15 x86_64, and high I/O causes long system freezes. Kernel version 2.6.40.6-0.fc15.x86_64. Today I was copying large files to a slow SD card (3 MB/s), and the whole system was mostly unresponsive. So it doesn't matter where you read from or write to. That's very bad. This could also be related to the systemd/cgroup changes. This problem should be treated as very high priority.
This issue is reproducible on Fedora 16 RC with kernel 3.1.0-1.fc16.i686. I was copying a 7.5G file to a USB stick at a speed of 7MB/s. The system froze completely and then the cooling fan became noisy, so I guess CPU usage was high as well. I had to long-press the power button to force a shutdown. REISUB didn't help.
System UI froze while copying a 1.5GB file from an external HD (ext4) to a USB key (vfat).

Linux verne 3.1.0-1.fc16.x86_64 #1 SMP Mon Oct 24 12:18:13 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
There's not much to go on here without any backtraces. Try running the kernel-debug kernel to see if that makes any additional output appear before the lockups.
(In reply to comment #14)
> There's not much to go on here without any backtraces.
> Try running the kernel-debug kernel to see if that makes any additional output
> appear before the lockups.

Here are the lines in /var/log/messages before the freeze (20:18:50) and after I forced a power off and restarted (20:21:42). I was copying a 7.5G file to an NTFS-formatted USB thumb disk. The freeze happened in less than two minutes:

Nov 1 20:18:33 localhost systemd-logind[924]: New session 6 of user lvp.
Nov 1 20:18:37 localhost gnome-session[3645]: DEBUG(+): GsmDBusClient: obj_path=/org/gnome/SessionManager interface=org.gnome.SessionManager method=IsInhibited
Nov 1 20:18:37 localhost gnome-session[3645]: DEBUG(+): GsmDBusClient: obj_path=/org/gnome/SessionManager interface=org.gnome.SessionManager method=IsInhibited
Nov 1 20:18:50 localhost dbus-daemon[940]: ** Message: No devices in use, exit
Nov 1 20:21:42 localhost systemd-tmpfiles[3850]: Successfully loaded SELinux database in 52ms 820us, size on heap is 363K.
Nov 1 20:24:42 localhost kernel: imklog 5.8.5, log source = /proc/kmsg started.
Nov 1 20:24:42 localhost rsyslogd: [origin software="rsyslogd" swVersion="5.8.5" x-pid="1032" x-info="http://www.rsyslog.com"] start
Nov 1 20:24:42 localhost kernel: [ 0.000000] Initializing cgroup subsys cpuset
Nov 1 20:24:42 localhost kernel: [ 0.000000] Initializing cgroup subsys cpu

I used kernel-debug-3.1.0-5.fc16.i686 in this case. Session 6 was started because pulseaudio constantly crashed and caused high Gnome Shell CPU usage, so I performed the copy operation in VTE. Let me know what other info I can provide.
(In reply to comment #14)
> There's not much to go on here without any backtraces.
> Try running the kernel-debug kernel to see if that makes any additional output
> appear before the lockups.

Please instruct me how to collect the information you require to identify this issue.
Dave,

Would an oprofile dump do? It will be a rather large dump...

- Gilboa
Still happens on Fedora 16 as well, x86_64 kernel 3.1.5-6.fc16.x86_64. If you switch to a virtual console you can work there, but the Xorg session is blocked by flush operations on the slow I/O device. I was copying 10GB of files from a micro SD card (with USB adapter) and a Nexus S phone. The X session is not usable while copying files (copy started using the Dolphin file manager).
I can confirm the behaviour described in #18, although when I try to start htop from the virtual console it doesn't always start. Whenever this happens and I can start htop, I see that CPU and memory use are low/normal. Tested with a 320GB external drive, an 8GB Android phone (SD card) and 2/4 GB flash drives.
After updating my computer's BIOS and switching off "legacy usb support" in the BIOS, I no longer see the problem.
http://phoronix.com/forums/showthread.php?67502-Linux-3-2-Kernel-Officially-Christened&p=245623#post245623

A user says that the 3.2 kernel fixes the problem; I'm going to grab a 3.2 build!!!
(In reply to comment #21)
> http://phoronix.com/forums/showthread.php?67502-Linux-3-2-Kernel-Officially-Christened&p=245623#post245623
>
> A user says that the 3.2 kernel fixes the problem; I'm going to grab a 3.2
> build!!!

http://koji.fedoraproject.org/koji/buildinfo?buildID=281207 would be the one I suggest if you want to try 3.2 out. It's the final 3.2 release with the debugging options disabled.
Do I need to rebuild it? It has been compiling for two hours, because it's compiling many kernel flavours. I've done a simple rpmbuild --rebuild kernel-3.2.0-2.fc17.src.rpm
(In reply to comment #23)
> Do I need to rebuild it?
> It has been compiling for two hours, because it's compiling many kernel
> flavours.
> I've done a simple rpmbuild --rebuild kernel-3.2.0-2.fc17.src.rpm

No... you should just be able to download the built RPMs directly from that link already.
I've built and installed that 3.2 kernel and it definitely fixes this bug for me. That's great!
Hi Josh,

Will 3.2 please be made available to F16? This bug is a real pain, and F17 is a long way off :/

Thanks,
Ankur
(In reply to comment #26)
> Hi Josh,
>
> Will 3.2 please be made available to F16? This bug is a real pain, and F17 is
> a long way off :/

Yes. It's been discussed on the lists: http://lists.fedoraproject.org/pipermail/devel/2012-January/160970.html

You can also use the 3.2.0 F17 kernel quite easily with F-16 without issues.
Great! Thanks! :)
I also confirm that updating to kernel 3.2 fixes the issue. Even under high computing/IO load, the UI is slowed down but still responsive! Much better than waiting until the process is completed.
After updating the kernel package to the one provided here: http://koji.fedoraproject.org/koji/buildinfo?buildID=281207 I've noticed that I can now use my system while copying files from/to USB drives, although the speed is low and not constant, despite the low system resource usage. I also experience some minor hangups, and copying between two USB drives is really slow (below 1MByte/s) and erratic (the drives are connected to the same internal USB hub).
It seems that F16 kernel version 3.1.9-1.fc16.x86_64 also fixes the problem. Could someone try that to confirm that it works? Has anyone backported those I/O patches from 3.2?
No go. I just created a 16GB VM image using dd if=/dev/zero bs=1M ... and the DE (KDE) was completely unresponsive (beyond the mouse pointer, that is). P.S. my 5 x 320GB Software RAID 5 setup was writing at ~350MBps and SSH access worked just fine, so the problem, at least as far as I could see, was limited to the GUI. - Gilboa
Which kernel did you use? Could you try the 3.2.1 kernel from F17?
F16/3.2.1 from koji hung my machine after a couple of hours of heavy usage. As I didn't have a serial console attached, I couldn't get any call stack. However, as I'm using the nVidia binary driver (and couldn't reproduce the results on an nVidia-less machine) I decided not to post a bug report. I'll wait for the next koji build before trying again. (F17 already moved to 3.3pre-rc)

- Gilboa
It appears to be solved with 3.2.1-3.fc16.x86_64. At least I could copy a file of about 12GB using KDE Dolphin without a system hang. I can observe slowness, but no long hangs, which I consider normal given the I/O operations.
Using F16/x86_64/v3.2.1-3, I dd'ed a 60GB VM image (dd if=/dev/zero of=/image/name bs=1M count=XXX) and KDE was unbearably slow. The test was conducted on a 2 x 6C Xeon w/12GB RAM and 5 x 320GB SATA drives on software RAID5 (w/ HT on). No idea why, but I can't remember hitting the same issue in early F15 kernels. (I used to compile kernels while playing spring-rts... :))
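For anyone wanting to repeat this kind of sustained-write test, here is a rough sketch. The function name, target path and size are placeholders, and conv=fsync (a GNU dd option) is an addition that is not in the command quoted above; it makes dd flush the data to disk before exiting, so the load is not absorbed entirely by the page cache.

```shell
#!/bin/sh
# Sketch: write COUNT MiB of zeros to FILE in 1 MiB blocks as a sustained
# write load. Both arguments are placeholders chosen by the tester.
# conv=fsync is an assumption of this sketch, not part of the original test.
write_load() {
    file=$1
    count=$2
    dd if=/dev/zero of="$file" bs=1M count="$count" conv=fsync 2>/dev/null
}
```

Running something like `time write_load /path/on/the/array/test.img 4096` while using the desktop gives a repeatable load whose size can be scaled up until the sluggishness appears.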
I'm getting the same sort of issue: under heavy I/O the system can freeze for a bit, then may or may not return. When running my monthly backups, things start queuing up until nothing works. That is, while frozen, if you run ps -ef it hangs, and so does top or whatever else. The system is still running and, if lucky, I can type reboot. If not, processes just start piling up. Also, if it's doing a resync when the backups trigger... forget it... the system will lock up.

Two systems, both 3.2.5-3.fc16.x86_64 and using xfs on the array.

System 1:

[root@whitestar log]# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sun Aug 14 00:28:09 2011
     Raid Level : raid5
     Array Size : 2930284032 (2794.54 GiB 3000.61 GB)
  Used Dev Size : 976761344 (931.51 GiB 1000.20 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Wed Feb 15 13:26:35 2012
          State : active, resyncing
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

  Resync Status : 84% complete

           Name : whitestar:0 (local to host whitestar)
           UUID : ff6d1724:e688b10f:e1cd698e:017c37f7
         Events : 13921

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       3       8       64        2      active sync   /dev/sde
       4       8        0        3      active sync   /dev/sda

System #2:

[root@andromeda /]# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue Aug 16 20:31:39 2011
     Raid Level : raid5
     Array Size : 9767564800 (9315.08 GiB 10001.99 GB)
  Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Wed Feb 15 13:43:11 2012
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : andromeda:0 (local to host andromeda)
           UUID : a15f896f:5adcb015:8908feeb:8f0ce6f2
         Events : 35578

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       64        1      active sync   /dev/sde
       2       8       80        2      active sync   /dev/sdf
       4       8       32        3      active sync   /dev/sdc
       6       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda
This was still present in 3.2.9-2, so I grabbed the 3.3.0-0.rc6.git2.2.fc18 from koji today and installed it on my f16 x86_64 and I no longer experience the gui lockup.
Confirming that I still get the UI lock up: Linux ankur.pc 3.2.9-1.fc16.x86_64 #1 SMP Thu Mar 1 01:41:10 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux Will test with 3.3 when it hits stable and re-confirm.
[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.
Updated to kernel-3.3.0-4.fc16 and wrote many small files to a slow USB pen drive: no slowdowns in the graphical UI (I use KDE), no lag. Great! I hope it stays this way over the next kernel updates. Thanks Dave and the other kernel devs!
Still no go. With a fully updated kernel on both host and VMs, I simultaneously updated the kernel cscope DB on two F16 VMs while browsing the web on the quad core AMD Phenom 635 machine w/ 8GB RAM. I/O was at ~10-20MBps, CPU usage was at under 100% (effectively 25%), but the UI was *very* sluggish.

- Gilboa
... Though I should point out that I had minor swap usage (~250MB), so the slowdown might have been triggered by swap-out. I'll redo the test on my Xeon workstation (see above) and report the results.

- Gilboa
I'm getting the same issues with the new 3.3.0-4.fc16 kernel. I've built a software raid 1 with two usb drives for my backups. The system freezes happen when I copy data from or to the raid or when it is resyncing. This is really annoying as the raid needs to be resynced when the system froze while I was writing to the raid..
Daniel,

(In reply to comment #46)
> I'm getting the same issues with the new 3.3.0-4.fc16 kernel.
>
> I've built a software raid 1 with two usb drives for my backups. The system
> freezes happen when I copy data from or to the raid or when it is resyncing.
>
> This is really annoying as the raid needs to be resynced when the system
> froze while I was writing to the raid..

Could you please clarify here - did you build the raid1 from the two USB drives, or do you have a system with a raid1 built from regular disk and just two USB drives installed?

Thanks,
Jes
(In reply to comment #44)
> Still no go.
> With a fully updated kernel on both host and VMs, I simultaneously updated the
> kernel cscope DB on two F16 VMs while browsing the web on the quad core AMD
> Phenom 635 machine w/ 8GB RAM.
> I/O was at ~10-20MBps, CPU usage was at under 100% (effectively 25%), but the
> UI was *very* sluggish.
>
> - Gilboa

How are you running the VMs? Are you making sure to launch them with O_DIRECT access to the VM image files? I think it's cache=none on the QEMU command line.

If you run the image files with regular buffered I/O and the guests are large compared to the amount of memory you have, you can easily thrash the system memory of the host, which will lead to a very sluggish system.

Jes
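As a rough illustration of the cache=none suggestion: on the QEMU command line the setting is part of a -drive option, for example `-drive file=/path/to/guest.img,cache=none` (the image path here is a placeholder). The helper below, whose name is made up for this sketch, only checks whether a given command-line string contains that setting.

```shell
#!/bin/sh
# Sketch: succeed if a QEMU command line (passed as a single string)
# contains cache=none, i.e. the guest disk bypasses the host page cache.
uses_cache_none() {
    case $1 in
        *cache=none*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For a running guest, the command line can be read with `tr '\0' ' ' < /proc/PID/cmdline`, with PID replaced by the QEMU process ID, and passed to `uses_cache_none`.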
(In reply to comment #47)
> Daniel,
>
> (In reply to comment #46)
> > I'm getting the same issues with the new 3.3.0-4.fc16 kernel.
> >
> > I've built a software raid 1 with two usb drives for my backups. The system
> > freezes happen when I copy data from or to the raid or when it is resyncing.
> >
> > This is really annoying as the raid needs to be resynced when the system
> > froze while I was writing to the raid..
>
> Could you please clarify here - did you build the raid1 from the two USB
> drives, or do you have a system with a raid1 built from regular disk and
> just two USB drives installed?
>
> Thanks,
> Jes

Hi Jes,

The raid is built from the two USB drives.
I'm having the same issue: kernel 3.3.0-4.fc16.i686, 4-disk soft RAID5 set. When RAID I/O is high, the process generating the load will eventually freeze, and the CPU core it runs on goes into an endless 99-100% WAIT state. When this happens, shutdown -h now doesn't work: the computer doesn't restart, it hangs during shutdown, and only the power or reset button helps. Once restarted, the RAID set has to be resynced.

Hardware is:
Supermicro X7SPA-H (Atom D510)
2GB RAM
soft RAID1 for OS
soft RAID5 for data

The problem occurred again today; the last time it happened was 10 days ago.

"Screenshot" from nmon (don't know if it'll be readable once posted):

│CPU User% Sys% Wait% Idle|0 |25 |50 |75 100|│
│ 1 0.5 0.5 0.0 99.0| > |│
│ 2 0.0 0.0 0.0 100.0| > |│
│ 3 0.0 0.5 99.5 0.0|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>│
│ 4 0.0 0.0 0.0 100.0| > |│
│ +-------------------------------------------------+│
│Avg 0.1 0.2 25.0 74.6|WWWWWWWWWWWW > |│
│ +-------------------------------------------------+│
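When a core sits in I/O wait like this, one thing that may help narrow it down is listing the tasks in uninterruptible sleep (state D in ps), since those are usually the ones blocked on I/O. A small sketch, with a made-up function name, that filters ps-style output:

```shell
#!/bin/sh
# Sketch: read "STAT PID COMMAND" lines on stdin (the format printed by
# `ps -eo stat=,pid=,comm=`) and print PID and command of every task whose
# state starts with D (uninterruptible sleep, usually waiting on I/O).
blocked_tasks() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}
```

Typical use would be `ps -eo stat=,pid=,comm= | blocked_tasks`, run from an ssh session or a text console while the stall is in progress. This only shows which tasks are stuck, not why.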
Jes,

My VMs already use cache=none. *However*, I can only seem to reproduce this issue on a fairly (?) low-end quad core AMD Phenom machine w/8GB RAM. Running the same scenario (including the minor memory over-commit) on a dual 6C Xeon w/ 12GB RAM + 5 drive software RAID5 doesn't produce the same results. Furthermore, dd'ing 4GB to a USB thumb drive while dd'ing 40GB to the MD RAID5 array did produce some sluggishness, but it was far, *far* better than with 3.0 or 3.1. I'll do some additional testing and report back.
I can confirm the behavior that is described in comment 50. When the system freezes there is nothing you can do. Even Caps Lock and Num Lock don't work anymore.
Problem still occurs in 3.3.2-6.fc16.i686
3.3.4-3.fc16.i686 is a no go, managed to run into the problem again in less than 24hrs. It's amazing this problem doesn't seem to be assigned to anyone at RH .... :(
Frollic,

We're watching the bug, however there is no obvious pattern and it is not clear what is going wrong. Only a very small number of people have seen this problem - I haven't seen anything like it in my own testing.

In addition, you are running a 32 bit kernel on a 64 bit processor with more than 1GB of RAM. That doesn't make sense as you end up using highmem and bounce buffers which will slow down your I/O performance a fair bit.

There are two issues at play here. Some people have reported problems when running RAID on top of USB devices, which is just begging for bad things to happen with the short queues and slow performance of USB. The other case is with real SATA connections, but there it also smells like there could be dodgy hardware at play.

Jes
I would also add that the problem with slow I/O devices (without raid) and Xorg freezing seems to be almost gone. Previously, on every machine I had, the GUI froze while writing to slow USB disks or SD cards (both 32 bit and 64 bit). Perhaps the problems related to raid arrays are different.

Regards
Well, it's a very old system, which has been migrated from a previous setup. You can trust me when I say I would have "upgraded" a long time ago if there was an easy way of doing it :) Is there a way I/we can help you investigate the problem? I seem to run into the issue quite frequently :( Btw, I think the problems started once I upgraded to FC16; I don't think I had them in FC15.
Frollic, It may be the case, but as I said, running a 32bit kernel is begging for problems if you want to have I/O performance. I realize upgrading might be tricky, but it would be very interesting to know if the problems you see go away if you run a 64 bit install on the box. Emilio, I suspect the USB issues are not so much related to raid, but simply that the raid code is more likely to trigger the issue in the first place because of a different way of hitting the device. Jes
(In reply to comment #55)
> Frollic,
>
> We're watching the bug, however there is no obvious pattern and it is
> not clear what is going wrong. Only a very small number of people have seen
> this problem - I haven't seen anything like it in my own testing.
>
> In addition, you are running a 32 bit kernel on a 64 bit processor with
> more than 1GB of RAM. That doesn't make sense as you end up using highmem
> and bounce buffers which will slow down your I/O performance a fair bit.
>
> There are two issues at play here. Some people have reported problems when
> running RAID on top of USB devices, which is just begging for bad things
> to happen with the short queues and slow performance of USB. The other case
> is with real SATA connections, but there it also smells like there could be
> dodgy hardware at play.
>
> Jes

If it's dodgy hardware that would be interesting. I have the same problem on two different computers (both i7 2600k). Different motherboards and controller cards. Different hard drive manufacturers.
(In reply to comment #59)
> If it's dodgy hardware that would be interesting. I have the same problem on
> two different computers (both i7 2600k). Different motherboards and
> controller cards. Different hard drive manufacturers.

This is the first time I have seen anyone mention multiple cases of this problem, whereas others have mentioned the problem went away when moving to other hardware.

Can you list the controller info, motherboard chipset, and kernel version? Are you running partitions directly on top of the raid or do you have lvm in between? From your previous posting I presume you are running the 64 bit kernel?

Jes
Created attachment 583544
Computer 1

The raid is direct partitions and the filesystem on top is XFS.
Created attachment 583552
Computer 2

Here's the second computer.
OK, swapped to x86_64, problem appeared in less than 24hrs after the system was ready (same as when I ran i686), and the soft raid5 IO increased. kernel is 3.3.4-3.fc16.x86_64 I'll try the 3.3.5-2 kernel after I reboot the server tomorrow morning.
Though this issue seemed to have gone away with the kernel 3.2.X series, it has returned on all the current 3.3.X kernels on Fedora 17. That includes:

kernel-3.3.0-1.fc17.x86_64
kernel-3.3.4-5.fc17.x86_64
kernel-3.3.6-3.fc17.x86_64

For other hardware info, please see my smolt profile: http://www.smolts.org/client/show/pub_79c55d9b-ea8d-45f5-8159-fb546e8d06f6

I tried two different portable hard disks connected via USB. One is a 500GB Seagate formatted with NTFS and the other is similar but formatted with ext4. The system freezes completely while either writing large files to them or reading large files from them.
OK, upgraded to x86_64 last week, and (as expected) the problem still occurs. Kernel release is 3.3.5-2.fc16.x86_64. This time I'm actually able to kill the application causing the I/O wait, but the wait itself doesn't go away.
With the official announcement of kernel 3.4.0, I am kind of hoping this issue has been fixed upstream. I will give the 3.4.0 F17 kernel a try once it's in Koji.
After Tommy He's findings (#64), I decided to downgrade to 3.2.6-3.fc16.x86_64. I've used that kernel for over a week and have had no freeze so far.
I am starting to think this isn't raid related, at least not all the time. Copying files from an external USB drive to a fast memory stick, and I am seeing my laptop freeze regularly. 3.3.5-2.fc16.x86_64 It feels like it fills up the write queue and then stalls while the writes complete rather than schedule and let other stuff run in the mean time. It doesn't lock up solid, but that is most likely due to the flash drive being a very fast one....
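One way to check the "write queue fills up" impression, offered only as a diagnostic sketch and not as a confirmed explanation, is to watch how much data is waiting to be written back during a copy. The kernel reports this as the Dirty and Writeback fields in /proc/meminfo. The function name below is made up; the file argument exists so the parsing can be tried on a saved copy.

```shell
#!/bin/sh
# Sketch: print the Dirty and Writeback counters (in kB) from a
# meminfo-format file, defaulting to the live /proc/meminfo.
dirty_writeback_kb() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2 }' "${1:-/proc/meminfo}"
}
```

Running `watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'` during the copy shows the same two numbers live. If Dirty climbs steadily and the stalls coincide with large Writeback values, that would be consistent with the description above.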
I tried a few other things to see if anything changed. 1. Upgrade to latest BIOS. No change. 2. Add "usb-handoff" to kernel boot parameter. No change.
Ran some more tests yesterday - I found I see this problem if I copy from one USB drive to another. However when I switched to copying from network to the flash drive, the hangs went away.

Question for those who see this problem with USB: Has anyone seen this on USB using just one drive, or does it always happen when more than one USB drive is involved?

If the latter, it could indicate we have the USB stack fighting over a lock.
(In reply to comment #70)
> Question for those who see this problem with USB: Has anyone seen this on
> USB using just one drive, or does it always happen when more than one USB
> drive is involved?
>
> If the latter, it could indicate we have the USB stack fighting over a lock.

I had this problem months ago making backups with DejaDup on an NTFS-formatted external hard disk. I also had a similar problem not so long ago (system almost unusable) copying an 8 GB file to an NTFS-formatted pendrive.

I will try to run a test this weekend.
Jes,

I can confirm that the problem mostly happens when I am copying data from one USB drive to another. I can't remember it happening when copying data between USB and another source (such as the network) or my internal HDD. Moreover, I found that the problem occurs when I am copying data from a fast USB drive to a much slower one.
(In reply to comment #70)
> Ran some more tests yesterday - I found I see this problem if I copy from
> one USB drive to another. However when I switched to copying from network
> to the flash drive, the hangs went away.
>
> Question for those who see this problem with USB: Has anyone seen this on
> USB using just one drive, or does it always happen when more than one USB
> drive is involved?
>
> If the latter, it could indicate we have the USB stack fighting over a lock.

The freeze happens in both cases, at least for me. The issue normally occurs when copying large files or multiple small files.
Hi all, This issue seems to be addressed in 3.4.0-1.fc17.x86_64. I tried copying and moving several large files from USB connected portable HDD to internal HDD and no system freeze happened. Thanks,
I agree with Tommy He, I've stopped experiencing it.
Yes, the problem appears to be fixed. I transferred almost 1 TB between two RAID5 sets, one of them USB-based, with no problem whatsoever.
Hi, Since it looks like the problem has been resolved for everyone, I am going to close this bug. If you see this problem returning, please reopen the bug. Thanks, Jes
Hi guys, sadly I have to report the same problem described here on Fedora 19, kernel 3.11.3-201.fc19.i686.PAE. The system freezes/hangs on heavy I/O loads, no matter whether it is from one internal partition to another, from an internal partition to a USB disk, or from a USB disk to an internal partition. The system freezes completely; it is not possible to switch to another terminal, and I am forced to do a hard reboot. My internal system drive is a new HP sps-drv hdd 750gb 7200sat2.5in; the USB disk can be any. My system is an Alienware 15x, Intel® Core™ i7 CPU Q 820 @ 1.73GHz × 8, with an Nvidia GeForce GTX 260M/PCIe/SSE2. The problem also happens with the nouveau driver. What else can I add to help with this report? At least the "messages" log seems to report the same things already in this thread.
I just tried making a 1Gig SD card for my Raspberry Pi and I got this issue again. It's been solved for a LONG time, but it just came back. I'm running: 3.11.4-201.fc19.x86_64
Hi, this problem caught me too on 3.11.6-200.fc19.x86_64. I was syncing data with eSATA drives when the system froze :( With 3.11.3-201.fc19.x86_64 everything is working for me so far.
Not sure what changed... but when I made a Pi SD card today I got no slowness. I'm on 3.11.7-200.fc19.x86_64 now.
I switched from Ubuntu 12.04 to Fedora (because it has a usable desktop UI in the form of KDE). After 3 months of Fedora 17, I upgraded to Fedora 19. This bug unnerved me in Fedora 17, and continues to do so in Fedora 19. Copying large amounts of data, starting a VM with >512MB in vmware ... most things (but not everything) that move a large amount of data slow KDE down to a crawl. Twice now, some misbehaving program forced me to shut down my computer via the power button, because the UI didn't react and I could not kill the program. This could be either a kernel bug introduced after Ubuntu 12.04, or some Red Hat/Ubuntu difference/optimization. Either way, this never happened in Ubuntu or Windows.

Running on a Lenovo Thinkpad X201:
3.12.7-200.fc19.x86_64 #1 SMP Fri Jan 10 15:32:06 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Please provide a workaround/fix for this.
(In reply to Scott Baker from comment #79)
> I just tried making a 1Gig SD card for my Raspberry Pi and I got this issue
> again. It's been solved for a LONG time, but it just came back. I'm running:
> 3.11.4-201.fc19.x86_64

I totally agree. It was an issue for me too before kernel 3.4, then it was solved, and now it's back. I'm running 3.12.7-300.fc20.x86_64. It does not depend on the type of the destination drive; I've experienced it with an internal HDD, a pendrive and an SD card. Both cp on filesystems and dd on raw block devices are affected.
Could someone please advise what to do and how to get to the root of the problem? Right now I am at a loss as to which Linux distribution is usable at all: Ubuntu has an unusable interface but not this annoying bug, while Red Hat has a working KDE but becomes unusable whenever something creates a larger I/O load. If you need additional information, I am willing to provide it.
I've been running kernel-3.10.5-201.fc19 for a long time (180 days or so), without any soft-RAID5 issues at all. You should be able to find it at http://koji.fedoraproject.org/koji/packageinfo?packageID=8 Today I upgraded to FC20 and 3.12.10-300.fc20, we'll see if it works properly.
Thank you. I downloaded

kernel-3.10.5-201.fc19.x86_64.rpm
kernel-devel-3.10.5-201.fc19.x86_64.rpm
kernel-headers-3.10.5-201.fc19.x86_64.rpm
kernel-tools-3.10.5-201.fc19.x86_64.rpm
kernel-tools-libs-3.10.5-201.fc19.x86_64.rpm

Installing with the rpm command did not work, so I tried

# yum localinstall *.rpm

which worked (I come from Ubuntu, so everything Fedora-specific is new to me). Then I tried everything important to me that depends on the kernel version, namely vmware and virtualbox, and both are working now. So I do indeed have a working system now! I will come back and comment again if it didn't solve the problem. In the meantime, thank you for your advice and help in getting me a working Linux system again.
Seeing this whenever my F19 system is carrying out a reasonable amount of I/O. 3.12.9-201.fc19.x86_64 Easiest way to reproduce is copying files to a USB stick (FAT or NTFS both seem to trigger).
Installing kernel 3.10 did the trick! Now I just have to figure out how to set this kernel as the default without resetting all of my /boot/grub2/grub.cfg entries to some default. Yeah well, not complaining, but Ubuntu had a nice GUI for this :-) Nevertheless, the problem is solved (for me) by a different kernel. Thank you, especially Frollic Nilsson, who pointed me to how to work around this problem.
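On the default-kernel question: one option on Fedora is grubby, whose --set-default option takes the path of an installed kernel image and marks the matching boot entry as default, leaving the other entries in place. The sketch below only prints the command for a given version string. The function name is made up, and the /boot/vmlinuz-VERSION path layout is the usual Fedora one but is still an assumption about the particular system.

```shell
#!/bin/sh
# Sketch: print the grubby command that would make the given kernel version
# the default boot entry. The command is only printed, not executed;
# running it for real requires root.
default_kernel_cmd() {
    printf 'grubby --set-default=/boot/vmlinuz-%s\n' "$1"
}
```

For the kernel mentioned above this would be `default_kernel_cmd 3.10.5-201.fc19.x86_64`. After running the printed command as root, `grubby --default-kernel` should report the new default.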