Description of problem: mkfs.ext3 /dev/sdb1 (a 300 GB partition) runs fine, even if there are disk failures. SMART disk status says OK. Disk is Maxtor 6L300S0 (sata 150) on a sata2 (sata 300, NCQ) JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02). Errors appear while "writing superblocks and FS accounting information" (approximative translation from my native language). From logs: (note the strange "synchronization", failures appear exactly 1 minute after ntpd did its job) Feb 1 17:47:07 a64 ntpd[2696]: synchronized to LOCAL(0), stratum 10 Feb 1 17:47:07 a64 ntpd[2696]: kernel time sync enabled 0001 Feb 1 17:48:11 a64 ntpd[2696]: synchronized to 64.142.114.146, stratum 2 Feb 1 17:49:11 a64 kernel: ata3.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x2 frozen Feb 1 17:49:11 a64 kernel: ata3.00: tag 0 cmd 0x61 Emask 0x4 stat 0x40 err 0x0 (timeout) Feb 1 17:49:11 a64 kernel: ata3.00: tag 1 cmd 0x61 Emask 0x4 stat 0x40 err 0x0 (timeout) Feb 1 17:49:11 a64 kernel: ata3.00: tag 2 cmd 0x61 Emask 0x4 stat 0x40 err 0x0 (timeout) ...(up to tag 30) Feb 1 17:49:11 a64 kernel: ata3: soft resetting port Feb 1 17:49:18 a64 kernel: ata3: port is slow to respond, please be patient (Status 0xc0) Feb 1 17:49:41 a64 kernel: ata3: port failed to respond (30 secs, Status 0xc0) Feb 1 17:49:41 a64 kernel: ata3: softreset failed (device not ready) Feb 1 17:49:41 a64 kernel: ata3: softreset failed, retrying in 5 secs Feb 1 17:49:46 a64 kernel: ata3: hard resetting port Feb 1 17:49:46 a64 kernel: ata3: COMRESET failed (device not ready) Feb 1 17:49:46 a64 kernel: ata3: hardreset failed, retrying in 5 secs Feb 1 17:49:51 a64 kernel: ata3: hard resetting port Feb 1 17:49:52 a64 kernel: ata3: COMRESET failed (device not ready) Feb 1 17:49:52 a64 kernel: ata3: reset failed, giving up Feb 1 17:49:52 a64 kernel: ata3.00: disabled Feb 1 17:49:52 a64 kernel: ata3: EH complete Feb 1 17:49:52 a64 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Feb 1 17:49:52 a64 kernel: end_request: I/O error, dev sdb, sector 114032703 Feb 1 17:49:52 a64 kernel: Buffer I/O error on device sdb1, logical block 28508160 Feb 1 17:49:52 a64 kernel: lost page write due to I/O error on sdb1 Feb 1 17:49:52 a64 kernel: Buffer I/O error on device sdb1, logical block 28508161 Feb 1 17:49:52 a64 kernel: lost page write due to I/O error on sdb1 Feb 1 17:49:52 a64 kernel: Buffer I/O error on device sdb1, logical block 28508162 Feb 1 17:49:52 a64 kernel: lost page write due to I/O error on sdb1 Feb 1 17:49:52 a64 kernel: Buffer I/O error on device sdb1, logical block 28508163 Feb 1 17:49:52 a64 kernel: lost page write due to I/O error on sdb1 Feb 1 17:49:52 a64 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Feb 1 17:49:52 a64 kernel: end_request: I/O error, dev sdb, sector 110362687 Feb 1 17:49:52 a64 kernel: Buffer I/O error on device sdb1, logical block 27590656 Feb 1 17:49:52 a64 kernel: lost page write due to I/O error on sdb1 Feb 1 17:49:52 a64 kernel: Buffer I/O error on device sdb1, logical block 27590657 Feb 1 17:49:52 a64 kernel: lost page write due to I/O error on sdb1 Feb 1 17:49:52 a64 kernel: Buffer I/O error on device sdb1, logical block 27590658 Feb 1 17:49:52 a64 kernel: lost page write due to I/O error on sdb1 Feb 1 17:49:52 a64 kernel: Buffer I/O error on device sdb1, logical block 27590659 Feb 1 17:49:52 a64 kernel: lost page write due to I/O error on sdb1 Feb 1 17:49:52 a64 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Feb 1 17:49:52 a64 kernel: end_request: I/O error, dev sdb, sector 117178431 Feb 1 17:49:52 a64 kernel: Buffer I/O error on device sdb1, logical block 29294592 Feb 1 17:49:52 a64 kernel: lost page write due to I/O error on sdb1 Feb 1 17:49:52 a64 kernel: Buffer I/O error on device sdb1, logical block 29294593 Feb 1 17:49:52 a64 kernel: lost page write due to I/O error on sdb1 Feb 1 17:49:52 a64 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Feb 1 17:49:52 a64 kernel: end_request: I/O error, dev sdb, sector 112984127 Feb 1 17:49:52 a64 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Feb 1 17:49:52 a64 kernel: end_request: I/O error, dev sdb, sector 116654143 (then loads of other sector errors) Another try: (apparently no more relation to ntpd synchronization) Feb 2 10:14:56 a64 ntpd[2676]: synchronized to LOCAL(0), stratum 10 Feb 2 10:14:56 a64 ntpd[2676]: kernel time sync enabled 0001 Feb 2 10:15:59 a64 ntpd[2676]: synchronized to 62.112.194.64, stratum 2 Feb 2 10:16:03 a64 Installed: python-virtinst.noarch 0.98.0-1.fc6 Feb 2 10:25:24 a64 kernel: ata3.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x2 frozen Feb 2 10:25:24 a64 kernel: ata3.00: tag 0 cmd 0x61 Emask 0x4 stat 0x40 err 0x0 (timeout) Feb 2 10:25:24 a64 kernel: ata3.00: tag 1 cmd 0x61 Emask 0x4 stat 0x40 err 0x0 (timeout) Feb 2 10:25:24 a64 kernel: ata3.00: tag 2 cmd 0x61 Emask 0x4 stat 0x40 err 0x0 (timeout) then the same as previous try. Disk seems to have a half SMART implementation: Feb 2 10:41:53 a64 smartd[2987]: Device: /dev/sdb, not capable of SMART self-check Feb 2 10:41:54 a64 smartd[2987]: Sending warning via mail to root ... Feb 2 10:41:56 a64 smartd[2987]: Warning via mail to root: successful Feb 2 10:41:56 a64 smartd[2987]: Device: /dev/sdb, failed to read SMART Attribute Data [root@a64 ~]# smartctl /dev/sdb --all -T permissive smartctl version 5.36 [x86_64-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen Home page is http://smartmontools.sourceforge.net/ Short INQUIRY response, skip product id SMART Health Status: OK Read defect list: asked for grown list but didn't get it Error Counter logging not supported Device does not support Self Test logging mkfs.ext3 output: [root@a64 ~]# mkfs.ext3 /dev/sdb1 mke2fs 1.39 (29-May-2006) Étiquette de système de fichiers= Type de système d'exploitation : Linux Taille de bloc=4096 (log=2) Taille de fragment=4096 (log=2) 36634624 i-noeuds, 73262416 blocs 3663120 blocs (5.00%) réservés pour le super utilisateur Premier bloc de données=0 Nombre maximum de blocs du système de fichiers=4294967296 2236 groupes de blocs 32768 blocs par groupe, 32768 fragments par groupe 16384 i-noeuds par groupe Superblocs de secours stockés sur les blocs : 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616 Écriture des tables d'i-noeuds : complété Création du journal (32768 blocs) : complété Écriture des superblocs et de l'information de comptabilité du système de fichiers : complété Le système de fichiers sera automatiquement vérifié tous les 20 montages ou après 180 jours, selon la première éventualité. Utiliser tune2fs -c ou -i pour écraser la valeur. Version-Release number of selected component (if applicable): 1.39-7.fc6 How reproducible: Unsure. formatted the same way a similar disk (same model but sata 300) without problem. Is it a bug with sata 150 HD on sata 300 port ? Still have to find hardware and time for further tests. Steps to Reproduce: 1. 2. 3. Actual results: mkfs.ext3 says everything ran fine. Expected results: should be aware of the failure. Additional info:
ok, so this bug is what mkfs does in the face of IO errors... I'll have to take a look at the error checking. Out of curiosity, what does subsequent mount or fsck say? The root cause is unknown, looks like you have ATA problems...
Actually, there is not much that mkfs can do here; it is doing buffered writes (which is fine) which will actually be flushed to disk at a later time. If those later attempts to flush data fail, mkfs won't know; it may well have already exited; there is no feedback possible. Unless mkfs were to use direct IO it would not catch this sort of error. But, this is true of any application which does buffered writes. I'm going to close this as NOTABUG; I think the only real problem here is that you have a failing disk.