Bug 318661
Summary: | F7 crashes, kernel 2.6.22.9-91.fc7: Journal commit I/O error | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Gilbert Sebenste <sebenste> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | low | ||
Version: | 7 | ||
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-10-22 16:10:30 UTC | Type: | --- |
Description
Gilbert Sebenste
2007-10-04 16:11:45 UTC
Created attachment 216061
This is my /var/spool/messages file, showing the relevant bootup messages. No errors occurred before this point; what you see is the first message following reboot and everything after it.
This is usually caused by failing disk drives. What does smartctl say about the drive's health?

    # smartctl -t short <device>
    [wait for test to finish]
    # smartctl -a <device>

Hi Chuck,

Yes, I agree, but this is new, and is happening on two machines. I've never used smartctl before. df -k yields:

    Filesystem                      1K-blocks     Used Available Use% Mounted on
    /dev/mapper/VolGroup00-LogVol00 707523888 35975188 635028764   6% /
    /dev/sda1                          101086    18723     77144  20% /boot
    tmpfs                             1815340        0   1815340   0% /dev/shm

So when I type:

    smartctl -t short /dev/mapper/VolGroup00-LogVol00

or

    smartctl -t short /

I get:

    Smartctl: please specify device type with the -d option.

It's a SATA drive. What would be the correct command?

OK, I tried this on the kernel part of it. It took 5 seconds to run, or is it still running? It says it will take 79 minutes, but...

    smartctl -t short /dev/sda1
    smartctl version 5.37 [i386-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/

    Warning! SMART Attribute Data Structure error: invalid SMART checksum.
    Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
    === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
    Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
    Testing has begun.
    Please wait 79 minutes for test to complete.
    Test will complete after Thu Oct 4 15:52:06 2007

    Use smartctl -X to abort test.
    [root@machine username]#

Wait out the 79 minutes, then run smartctl -a

Created attachment 217031
Output from the hard drive test.
This one is the output from the hard drive check.
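A note on the smartctl exchange above: SMART data belongs to the physical drive, so the usual target is the whole-disk node (here /dev/sda) rather than an LVM volume or a mount point. Below is a minimal Python sketch of that mapping, under stated assumptions. The helper names `whole_disk` and `smartctl_commands` are hypothetical, the regex only covers classic /dev/sdXN names, and `-d ata` is an assumed device type for a SATA disk on an older smartctl such as 5.37 (newer releases usually detect the type on their own). The sketch only builds the command lines; it does not run them.

```python
import re
from typing import List

# Assumption: classic SCSI/libata naming (/dev/sda, /dev/sda1, ...).
# LVM, device-mapper, md, and NVMe paths are deliberately rejected.
_SD_DEVICE = re.compile(r"^(/dev/sd[a-z]+)\d*$")


def whole_disk(device: str) -> str:
    """Return the whole-disk node for a /dev/sdX or /dev/sdXN path."""
    match = _SD_DEVICE.match(device)
    if match is None:
        raise ValueError(f"cannot map {device!r} to a physical disk")
    return match.group(1)


def smartctl_commands(device: str, device_type: str = "ata") -> List[List[str]]:
    """Build the two invocations from the thread: start a short test, then report.

    device_type is passed to smartctl's -d option; "ata" is an assumption
    for a SATA drive and may be unnecessary on newer smartctl versions.
    """
    disk = whole_disk(device)
    return [
        ["smartctl", "-d", device_type, "-t", "short", disk],  # start short self-test
        ["smartctl", "-d", device_type, "-a", disk],           # full report, run after the test ends
    ]


if __name__ == "__main__":
    for command in smartctl_commands("/dev/sda1"):
        print(" ".join(command))
```

Rejecting /dev/mapper paths outright is a design choice: finding the physical volume behind an LVM logical volume needs a tool such as pvs, which this sketch does not attempt.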
I saw this in the logwatch email I get daily:

    WARNING: Kernel Errors Present
    res 51/04:00:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error) ...: 6 Time(s)
    res 51/04:00:01:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error) ...: 2 Time(s)
    res 51/04:01:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error) ...: 3 Time(s)
    res 51/04:01:01:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error) ...: 4 Time(s)
    res 51/04:01:06:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error) ...: 1 Time(s)

Let me guess...failing hard drive?

OK, here's an expanded version of the above, at 4 different times last evening. None so far today.

    Oct 4 14:33:05 weather kernel: ata3.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
    Oct 4 14:33:06 weather kernel: res 51/04:00:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
    Oct 4 14:33:06 weather kernel: ata3.00: configured for UDMA/133
    Oct 4 14:33:06 weather kernel: ata3: EH complete
    Oct 4 14:33:06 weather kernel: sd 2:0:0:0: [sda] 1465149168 512-byte hardware sectors (750156 MB)
    Oct 4 14:33:06 weather kernel: sd 2:0:0:0: [sda] Write Protect is off
    Oct 4 14:33:06 weather kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support $
    Oct 4 14:33:06 weather kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Oct 4 14:33:06 weather kernel: ata3.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
    Oct 4 14:33:06 weather kernel: res 51/04:00:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
    Oct 4 14:33:06 weather kernel: ata3.00: configured for UDMA/133
    Oct 4 14:33:06 weather kernel: ata3: EH complete
    Oct 4 14:33:06 weather kernel: sd 2:0:0:0: [sda] 1465149168 512-byte hardware sectors (750156 MB)
    Oct 4 14:33:06 weather kernel: sd 2:0:0:0: [sda] Write Protect is off

I am getting ready to call this a failing hard drive, and call it a night...

Gentlemen, I am sorry to have wasted your time. It is now becoming apparent that I got a bad batch of hard drives from Seagate.
I couldn't believe they're all bad, but after further analysis this afternoon, I can't escape the dreadful conclusion. I am terribly sorry to have wasted time here with you. My apologies, and have a great weekend. Mark this as NOTABUG and close, please. Thanks.

Gilbert

New hard drives do not stop this from happening. occurs randomly, but computer will not stay up more than 18 hours.

Using a Seagate 750 GB SATA 3Gb/s drive, 7200 RPM, 16 MB.

Since this machine is new out-of-the-box, I can tell you that older machines I have do NOT have this problem. And they have a higher load and use the drive more heavily than this one.

I have also switched back to an older kernel, the original for F7: 2.6.21-1.3194.fc7. Will notify if this holds or not.

(In reply to comment #10)
> New hard drives do not stop this from happening. occurs randomly, but computer
> will not stay up more than 18 hours.
>
> Using a Seagate 750 GB SATA 3Gb/s drive, 7200 RPM, 16 MB.
>

Set for 3Gb/s or jumpered down to 1.5? The faster setting can cause problems...

Hi Chuck,

Really? How does one jumper it down to 1.5? I see 4 pins on the back, but nothing to jumper it with in the box.

(In reply to comment #13)
> How does one jumper it down to 1.5? I see 4 pins on the back, but
> nothing to jumper it with in the box.

They're just standard jumpers; they should be available at a parts store or salvageable from broken equipment. But you do need the manual for the drive to tell you which pins to jumper.

Wait. If my HD is 3 GB/sec, and my motherboard (ASUS) supports SATA-2, why is there a problem? Is it BIOS, or OS related?

(In reply to comment #15)
> Wait. If my HD is 3 GB/sec, and my motherboard (ASUS) supports SATA-2, why is
> there a problem? Is it BIOS, or OS related?

Could be anything: cables, drive firmware, controller or OS.

Jumper is on as of 19:15 CT on 10/9/07. Will keep you posted.

I am not sure what to do now. I am now on kernel 2.6.23.1. I am also using a jumper to keep my SATA drives using the SATA1 protocol instead of SATA2. No crashes (except for an apparently unrelated bug, for which I just filed a report). Should I close this?
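A note on the libata lines quoted in the logwatch comment above: in `cmd XX/YY:...` the first two bytes are the ATA command and feature registers, and in `res XX/YY:...` they are the status and error registers. The Python sketch below decodes just those two bytes. The function names and the trimmed lookup tables are assumptions for illustration, not kernel code; the bit names follow the ATA register definitions. Read this way, `cmd b0/da` is a SMART RETURN STATUS request and `res 51/04` is the drive answering with ERR set and the request aborted. That is one reading of the log format, not a diagnosis made by anyone in this thread.

```python
import re

# Subset of ATA status and error register bits (names as used by libata).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}

# ATA command 0xB0 is SMART; the feature register selects the subcommand.
SMART_COMMAND = 0xB0
SMART_FEATURES = {
    0xD0: "SMART READ DATA",
    0xD4: "SMART EXECUTE OFF-LINE IMMEDIATE",
    0xDA: "SMART RETURN STATUS",
}

# Matches the leading "cmd b0/da:" or "res 51/04:" pair in a libata log line.
_TASKFILE = re.compile(r"\b(cmd|res) ([0-9a-f]{2})/([0-9a-f]{2}):")


def set_flags(value, table):
    """Return the names of all bits from table that are set in value."""
    return [name for bit, name in table.items() if value & bit]


def decode(line):
    """Decode the first two taskfile bytes of a libata cmd/res line, else None."""
    match = _TASKFILE.search(line)
    if match is None:
        return None
    kind = match.group(1)
    first, second = int(match.group(2), 16), int(match.group(3), 16)
    if kind == "cmd":
        if first == SMART_COMMAND:
            name = SMART_FEATURES.get(second, "SMART (other subcommand)")
        else:
            name = "unknown"
        return {"kind": "cmd", "command": first, "feature": second, "name": name}
    return {
        "kind": "res",
        "status": set_flags(first, STATUS_BITS),
        "error": set_flags(second, ERROR_BITS),
    }


if __name__ == "__main__":
    sample = [
        "Oct 4 14:33:05 weather kernel: ata3.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0",
        "Oct 4 14:33:06 weather kernel: res 51/04:00:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)",
    ]
    for entry in sample:
        print(decode(entry))
```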