Bug 513221 - ext4 filesystem corruption and data loss
Summary: ext4 filesystem corruption and data loss
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 11
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 513460
 
Reported: 2009-07-22 15:30 UTC by Doug Ledford
Modified: 2009-12-01 04:44 UTC
CC List: 7 users

Fixed In Version: 2.6.30.9-99.fc11
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-01 04:44:31 UTC
Type: ---
Embargoed:


Attachments
strace excerpt of Dovecot file access before/after corruption (16.73 KB, text/plain)
2009-11-01 10:36 UTC, Tim Eliseo
jprobe for ext4_fallocate (1.16 KB, application/x-gzip)
2009-11-10 21:56 UTC, Eric Sandeen

Description Doug Ledford 2009-07-22 15:30:14 UTC
Description of problem:
Afflicted machine is both a sendmail and dovecot imap mail server.  Local mail delivery is via procmail.  We have now experienced three separate instances of mailbox file corruption resulting in minor to severe loss of mail data.  Both procmail and dovecot are running on a local ext4 filesystem, so NFS locking issues should not be a problem.

Version-Release number of selected component (if applicable):
kernel-2.6.29.5-191.fc11.x86_64

How reproducible:
Random occurrence, has happened three times in roughly 1 week.

Steps to Reproduce:
I don't have a reproducer yet, but I would assume two tasks both reading and writing to the same file, using some sort of file locking to mediate access, should replicate the problem.
  
Actual results:
The affected files all have blocks of NULL characters.  Usually, those blocks of NULL characters have overwritten previous data and new data is appended beyond the NULL stream.


Expected results:
No file corruption!

Additional info:
I have corrupted files available for inspection if that would help.

Comment 2 Ric Wheeler 2009-07-22 19:39:10 UTC
Can you add details on the specific F11 kernel version & mount options used for the file system?

Thanks!

Comment 3 Eric Sandeen 2009-07-22 19:47:35 UTC
Did the machine ever experience a power loss or crash?

Comment 4 Eric Sandeen 2009-07-22 19:52:50 UTC
Oh another thing that may be interesting is to map the file - filefrag -v would suffice.  This will tell us whether we have 0s on disk, or holes in the file.

If there was no power loss or other odd event, my first hunch is delalloc problems; if this is at all repeatable and you're willing to experiment, mounting with -o nodelalloc would take that out of the picture.

(Potential downside is that nodelalloc doesn't get a lot of testing...)

Ric also asks whether you may have ever run out of space during operation...

Comment 5 Doug Ledford 2009-07-22 22:27:24 UTC
Kernel version is in comment #1

Mount options: 

/dev/md/root            /         ext4    defaults,noatime,nodiratime 1 1

No power loss.  No recent crashes since disabling SELinux about 3 weeks ago.  Prior to that I had some NFS<->SELinux related crashes: this machine was the NFS server, and when a client tried to access files it would, on random occasions and for random files, fail to get an SELinux context.

[doug@firewall redhat-lists]$ sudo filefrag -v bugzilla
Checking bugzilla
Filesystem type is: ef53
Filesystem cylinder groups is approximately 18299
File is stored in extents format
Blocksize of file bugzilla is 4096
File size of bugzilla is 991855 (243 blocks)
First block: 22356864
Last block: 22471154
Discontinuity: Block 128 is at 22471040 (was 22356991)
bugzilla: 2 extents found
[doug@firewall redhat-lists]$ 

Never ran out of space.  Current usage has over 1TB free.

Comment 6 Eric Sandeen 2009-07-23 03:02:49 UTC
Thanks Doug.  Could you also:

# debugfs /dev/whatever
debugfs> stat relative/path/to/bugzilla

Just trying to collect some info about the file to try to correlate to the location of the 0s.

thanks,
-Eric

Comment 7 Doug Ledford 2009-07-23 11:58:01 UTC
stat /home/doug/Mail/redhat-lists/bugzilla
Inode: 5541183   Type: regular    Mode:  0600   Flags: 0x80000
Generation: 3103348147    Version: 0x00000000:00000001
User:  2218   Group:  2218   Size: 995714
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 1952
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4a684b9e:819806a0 -- Thu Jul 23 07:38:06 2009
 atime: 0x4a37f0bd:00000000 -- Tue Jun 16 15:21:33 2009
 mtime: 0x4a684b9e:819806a0 -- Thu Jul 23 07:38:06 2009
crtime: 0x4a3804ec:961871c8 -- Tue Jun 16 16:47:40 2009
Size of extra inode fields: 28
Extended attributes stored in inode body: 
  selinux = "unconfined_u:object_r:user_home_t:s0\000" (37)
BLOCKS:
(0-127):22356864-22356991, (128-243):22471040-22471155
TOTAL: 244

debugfs:

Comment 8 Doug Ledford 2009-07-23 12:32:29 UTC
Umm....Holy Crap.  After the unclean shutdowns in the past it always came back up mostly clean, just did a journal replay and went on.  But, given the current issues, I ran fsck, and this is what I got.


[root@firewall openvpn]# fsck.ext4 -n /dev/md/root
e2fsck 1.41.4 (27-Jan-2009)
Warning!  /dev/md/root is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
firewall/ contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found.  Fix? no

Inode 4760 was part of the orphaned inode list.  IGNORED.
Inode 5374 was part of the orphaned inode list.  IGNORED.
Inode 5447 was part of the orphaned inode list.  IGNORED.
Inode 11048 was part of the orphaned inode list.  IGNORED.
Inode 11200 was part of the orphaned inode list.  IGNORED.
Inode 11231 was part of the orphaned inode list.  IGNORED.
Inode 11357 was part of the orphaned inode list.  IGNORED.
Inode 32344 was part of the orphaned inode list.  IGNORED.
Inode 99504 was part of the orphaned inode list.  IGNORED.
Inode 101530 was part of the orphaned inode list.  IGNORED.
Inode 103650 was part of the orphaned inode list.  IGNORED.
Deleted inode 105652 has zero dtime.  Fix? no

Inode 155994 was part of the orphaned inode list.  IGNORED.
Inode 155995 was part of the orphaned inode list.  IGNORED.
Inode 164750 was part of the orphaned inode list.  IGNORED.
Inode 166059 was part of the orphaned inode list.  IGNORED.
Inode 166122 was part of the orphaned inode list.  IGNORED.
Inode 167136 was part of the orphaned inode list.  IGNORED.
Inode 168909 was part of the orphaned inode list.  IGNORED.
Inode 169301 was part of the orphaned inode list.  IGNORED.
Inode 169302 was part of the orphaned inode list.  IGNORED.
Inode 169348 was part of the orphaned inode list.  IGNORED.
Inode 169389 was part of the orphaned inode list.  IGNORED.
Inode 169396 was part of the orphaned inode list.  IGNORED.
Inode 169443 was part of the orphaned inode list.  IGNORED.
Inode 169477 was part of the orphaned inode list.  IGNORED.
Inode 169484 was part of the orphaned inode list.  IGNORED.
Inode 169497 was part of the orphaned inode list.  IGNORED.
Inode 169516 was part of the orphaned inode list.  IGNORED.
Inode 169521 was part of the orphaned inode list.  IGNORED.
Inode 169547 was part of the orphaned inode list.  IGNORED.
Inode 169682 was part of the orphaned inode list.  IGNORED.
Inode 170515 was part of the orphaned inode list.  IGNORED.
Inode 170849 was part of the orphaned inode list.  IGNORED.
Inode 170850 was part of the orphaned inode list.  IGNORED.
Inode 171815 was part of the orphaned inode list.  IGNORED.
Inode 172876 was part of the orphaned inode list.  IGNORED.
Inode 175733 was part of the orphaned inode list.  IGNORED.
Inode 175833 was part of the orphaned inode list.  IGNORED.
Inode 175835 was part of the orphaned inode list.  IGNORED.
Inode 175901 was part of the orphaned inode list.  IGNORED.
Inode 175969 was part of the orphaned inode list.  IGNORED.
Inode 176022 was part of the orphaned inode list.  IGNORED.
Inode 176302 was part of the orphaned inode list.  IGNORED.
Inode 176353 was part of the orphaned inode list.  IGNORED.
Inode 176542 was part of the orphaned inode list.  IGNORED.
Inode 176588 was part of the orphaned inode list.  IGNORED.
Inode 176781 was part of the orphaned inode list.  IGNORED.
Inode 176855 was part of the orphaned inode list.  IGNORED.
Inode 176934 was part of the orphaned inode list.  IGNORED.
Inode 177176 was part of the orphaned inode list.  IGNORED.
Inode 177213 was part of the orphaned inode list.  IGNORED.
Inode 177382 was part of the orphaned inode list.  IGNORED.
Inode 177474 was part of the orphaned inode list.  IGNORED.
Inode 177488 was part of the orphaned inode list.  IGNORED.
Inode 177586 was part of the orphaned inode list.  IGNORED.
Inode 177709 was part of the orphaned inode list.  IGNORED.
Inode 177716 was part of the orphaned inode list.  IGNORED.
Inode 177766 was part of the orphaned inode list.  IGNORED.
Inode 177775 was part of the orphaned inode list.  IGNORED.
Inode 177814 was part of the orphaned inode list.  IGNORED.
Inode 177883 was part of the orphaned inode list.  IGNORED.
Inode 177950 was part of the orphaned inode list.  IGNORED.
Inode 177977 was part of the orphaned inode list.  IGNORED.
Inode 178227 was part of the orphaned inode list.  IGNORED.
Inode 178259 was part of the orphaned inode list.  IGNORED.
Inode 178326 was part of the orphaned inode list.  IGNORED.
Inode 178396 was part of the orphaned inode list.  IGNORED.
Inode 178467 was part of the orphaned inode list.  IGNORED.
Inode 178520 was part of the orphaned inode list.  IGNORED.
Inode 178584 was part of the orphaned inode list.  IGNORED.
Inode 178804 was part of the orphaned inode list.  IGNORED.
Inode 178867 was part of the orphaned inode list.  IGNORED.
Inode 197633 was part of the orphaned inode list.  IGNORED.
Inode 199282 was part of the orphaned inode list.  IGNORED.
Inode 199300 was part of the orphaned inode list.  IGNORED.
Inode 199348 was part of the orphaned inode list.  IGNORED.
Inode 201848 was part of the orphaned inode list.  IGNORED.
Inode 204603 was part of the orphaned inode list.  IGNORED.
Inode 204684 was part of the orphaned inode list.  IGNORED.
Inode 204688 was part of the orphaned inode list.  IGNORED.
Inode 204755 was part of the orphaned inode list.  IGNORED.
Inode 205220 was part of the orphaned inode list.  IGNORED.
Inode 205308 was part of the orphaned inode list.  IGNORED.
Inode 205309 was part of the orphaned inode list.  IGNORED.
Inode 208867 was part of the orphaned inode list.  IGNORED.
Inode 208874 was part of the orphaned inode list.  IGNORED.
Inode 208918 was part of the orphaned inode list.  IGNORED.
Inode 208922 was part of the orphaned inode list.  IGNORED.
Inode 208934 was part of the orphaned inode list.  IGNORED.
Inode 208979 was part of the orphaned inode list.  IGNORED.
Inode 208987 was part of the orphaned inode list.  IGNORED.
Inode 209186 was part of the orphaned inode list.  IGNORED.
Inode 209190 was part of the orphaned inode list.  IGNORED.
Inode 209210 was part of the orphaned inode list.  IGNORED.
Inode 209213 was part of the orphaned inode list.  IGNORED.
Inode 209214 was part of the orphaned inode list.  IGNORED.
Inode 209225 was part of the orphaned inode list.  IGNORED.
Inode 209335 was part of the orphaned inode list.  IGNORED.
Inode 209392 was part of the orphaned inode list.  IGNORED.
Inode 209634 was part of the orphaned inode list.  IGNORED.
Inode 212153 was part of the orphaned inode list.  IGNORED.
Inode 212262 was part of the orphaned inode list.  IGNORED.
Inode 212268 was part of the orphaned inode list.  IGNORED.
Inode 212269 was part of the orphaned inode list.  IGNORED.
Inode 212276 was part of the orphaned inode list.  IGNORED.
Inode 212281 was part of the orphaned inode list.  IGNORED.
Inode 212316 was part of the orphaned inode list.  IGNORED.
Inode 212682 was part of the orphaned inode list.  IGNORED.
Inode 212765 was part of the orphaned inode list.  IGNORED.
Inode 212803 was part of the orphaned inode list.  IGNORED.
Inode 216493 was part of the orphaned inode list.  IGNORED.
Inode 217003 was part of the orphaned inode list.  IGNORED.
Inode 218025 was part of the orphaned inode list.  IGNORED.
Inode 218080 was part of the orphaned inode list.  IGNORED.
Inode 218197 was part of the orphaned inode list.  IGNORED.
Inode 218312 was part of the orphaned inode list.  IGNORED.
Inode 218371 was part of the orphaned inode list.  IGNORED.
Inode 218397 was part of the orphaned inode list.  IGNORED.
Inode 218421 was part of the orphaned inode list.  IGNORED.
Inode 218446 was part of the orphaned inode list.  IGNORED.
Inode 218751 was part of the orphaned inode list.  IGNORED.
Inode 218903 was part of the orphaned inode list.  IGNORED.
Inode 219089 was part of the orphaned inode list.  IGNORED.
Inode 219094 was part of the orphaned inode list.  IGNORED.
Inode 219620 was part of the orphaned inode list.  IGNORED.
Inode 219814 was part of the orphaned inode list.  IGNORED.
Inode 219817 was part of the orphaned inode list.  IGNORED.
Inode 219835 was part of the orphaned inode list.  IGNORED.
Inode 240154 was part of the orphaned inode list.  IGNORED.
Inode 241714 was part of the orphaned inode list.  IGNORED.
Inode 241718 was part of the orphaned inode list.  IGNORED.
Inode 247978 was part of the orphaned inode list.  IGNORED.
Inode 247983 was part of the orphaned inode list.  IGNORED.
Inode 247992 was part of the orphaned inode list.  IGNORED.
Inode 248010 was part of the orphaned inode list.  IGNORED.
Inode 248290 was part of the orphaned inode list.  IGNORED.
Inode 248298 was part of the orphaned inode list.  IGNORED.
Inode 248299 was part of the orphaned inode list.  IGNORED.
Inode 248300 was part of the orphaned inode list.  IGNORED.
Inode 248301 was part of the orphaned inode list.  IGNORED.
Inode 248302 was part of the orphaned inode list.  IGNORED.
Inode 248306 was part of the orphaned inode list.  IGNORED.
Inode 248315 was part of the orphaned inode list.  IGNORED.
Inode 266976 was part of the orphaned inode list.  IGNORED.
Inode 291027 was part of the orphaned inode list.  IGNORED.
Inode 291546 was part of the orphaned inode list.  IGNORED.
Inode 291570 was part of the orphaned inode list.  IGNORED.
Inode 291572 was part of the orphaned inode list.  IGNORED.
Inode 291762 was part of the orphaned inode list.  IGNORED.
Inode 293046 was part of the orphaned inode list.  IGNORED.
Inode 309020 was part of the orphaned inode list.  IGNORED.
Inode 309081 was part of the orphaned inode list.  IGNORED.
Inode 309082 was part of the orphaned inode list.  IGNORED.
Inode 309083 was part of the orphaned inode list.  IGNORED.
Inode 5377632 was part of the orphaned inode list.  IGNORED.
Inode 5379671 was part of the orphaned inode list.  IGNORED.
Inode 6029642 was part of the orphaned inode list.  IGNORED.
Inode 6032427 was part of the orphaned inode list.  IGNORED.
Inode 6032568 was part of the orphaned inode list.  IGNORED.
Inode 6032728 was part of the orphaned inode list.  IGNORED.
Inode 6032731 was part of the orphaned inode list.  IGNORED.
Inode 6032736 was part of the orphaned inode list.  IGNORED.
Inode 6032944 was part of the orphaned inode list.  IGNORED.
Inode 6032975 was part of the orphaned inode list.  IGNORED.
Inode 6033281 was part of the orphaned inode list.  IGNORED.
Inode 6033432 was part of the orphaned inode list.  IGNORED.
Inode 6033685 was part of the orphaned inode list.  IGNORED.
Inode 6034543 was part of the orphaned inode list.  IGNORED.
Inode 6035373 was part of the orphaned inode list.  IGNORED.
Inode 6035533 was part of the orphaned inode list.  IGNORED.
Inode 6036431 was part of the orphaned inode list.  IGNORED.
Inode 6036448 was part of the orphaned inode list.  IGNORED.
Inode 6036482 was part of the orphaned inode list.  IGNORED.
Inode 6036517 was part of the orphaned inode list.  IGNORED.
Inode 6036555 was part of the orphaned inode list.  IGNORED.
Inode 6036561 was part of the orphaned inode list.  IGNORED.
Inode 6036593 was part of the orphaned inode list.  IGNORED.
Inode 6036602 was part of the orphaned inode list.  IGNORED.
Inode 6036618 was part of the orphaned inode list.  IGNORED.
Inode 6036621 was part of the orphaned inode list.  IGNORED.
Inode 6036626 was part of the orphaned inode list.  IGNORED.
Inode 6036650 was part of the orphaned inode list.  IGNORED.
Inode 6036656 was part of the orphaned inode list.  IGNORED.
Inode 6036665 was part of the orphaned inode list.  IGNORED.
Inode 6036667 was part of the orphaned inode list.  IGNORED.
Inode 6036675 was part of the orphaned inode list.  IGNORED.
Inode 6036684 was part of the orphaned inode list.  IGNORED.
Inode 6036691 was part of the orphaned inode list.  IGNORED.
Inode 6036696 was part of the orphaned inode list.  IGNORED.
Inode 6036701 was part of the orphaned inode list.  IGNORED.
Inode 6036703 was part of the orphaned inode list.  IGNORED.
Inode 6037595 was part of the orphaned inode list.  IGNORED.
Inode 10592426 was part of the orphaned inode list.  IGNORED.
Inode 12320927 was part of the orphaned inode list.  IGNORED.
Inode 12320928 was part of the orphaned inode list.  IGNORED.
Inode 12320929 was part of the orphaned inode list.  IGNORED.
Pass 2: Checking directory structure
Entry '.fetchmail.pid' in /home/dledford (5541444) has deleted/unused inode 5379205.  Clear? no
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(16128--16169) -(17152--17221) -(35808--35837) -(40832--40946) -(95040--95102) -(115680--115708) -(516096--516131) -(656832--656865) -(657536--657553) -(659904--659938) -(660416--660467) -(660512--660538) -(661120--661165) -(677248--677375) -(681696--681714) -(690048--690068) -(690080--690095) -(693504--693623) -(695232--695294) -(696608--696624) -(701184--701245) -(710752--710778) -(712640--712702) -(725696--725732) -(727040--727106) -(727680--727753) -(729376--729400) -(734464--734502) -(736864--736881) -(737152--737251) -(751296--751333) -(755200--755358) -(760576--760613) -(778240--778405) -(788224--788244) -(799392--799417) -(820720--820735) -(825904--825919) -(837760--837793) -(837888--837911) -(839424--839666) -(839936--839983) -(840000--840042) -(842112--842136) -(845216--845233) -(857600--857913) -(858304--858322) -(861568--861644) -(864704--864739) -(864832--864885) -(864896--864932) -(864960--864981) -(869984--870011) -(875392--875410) -(876128--876156) -(879488--879543) -(880320--880337) -(885792--885808) -(886240--886263) -(887072--887100) -(887264--887282) -(887744--887795) -(887904--887930) -(993536--993602) -(1023392--1023415) -(1023488--1023504) -(1086464--1086499) -(1091840--1091896) -(1107456--1107648) -(1110016--1110319) -(1110528--1110894) -(1115648--1115683) -(1145344--1145434) -(1155328--1155432) -(1156608--1156718) -(1166976--1167018) -(1217600--1217621) -(1237504--1237594) -(1243264--1243291) -(1262592--1262746) -(1264896--1264943) -(1269760--1269844) -(1269888--1269930) -(1353472--1353642) -(1420032--1420171) -(1507824--1507836) -(1519040--1519072) -(1531392--1531681) -(1538560--1538704) -(1547264--1547727) -(1548494--1548504) -(1609728--1611775) -(1613824--1615871) -(1617920--1618960) -(1639424--1639561) -(1749472--1749488) -(1782272--1782719) -(20803584--20804063) -(20805120--20805631) -(22806754--22806776) -(22806975--22806978) -(22839376--22839381) -(22840351--22840354) -(22840519--22840526) -(24416704--24416736) -(24416768--24416805) -(24416832--24416864) -(24417024--24417175) -(26181993--26181997) -(26182073--26182078) -(26182589--26182611) -(26182743--26182755) -(26182780--26182791) -(26182805--26182810) -(26183084--26183091) -(26183110--26183120) -(26183140--26183145) -(26183613--26183619) -(26192279--26192288) -(26192357--26192364) -(26192453--26192460) -(26192681--26192689) -(26192994--26193000) -(26193098--26193103) -(26193124--26193135) -(26193472--26193476) -(26193732--26193738) -(26193879--26193888) -(26193905--26193916) -(48662074--48662079) -(48727051--48727055) -(48732056--48732062) -(48742236--48742245) -(48742259--48742284) -(48742297--48742306) -(49333056--49333112) -(54463467--54463468) -54463531 -54464487 -(54479728--54479737) -(54479853--54479857) -54494288 -54495066 -(54502541--54502545) -54541479 -54541511 -(54541513--54541514) -54541526 -(56197270--56197271) -(58004151--58004154) -(58011927--58011929) -58532044 -58758794 -58758848 -58758905 -58758971 -(58884347--58884353) -59475269 -(60523864--60523871) -(60571899--60571912) -(60571933--60571938) -(60589445--60589456) -(60590604--60590614) -(60591017--60591019) -(60607154--60607168) -(60607914--60607926) -(60609124--60609127)
Fix? no

Free blocks count wrong for group #1617 (7734, counted=7735).
Fix? no

Free blocks count wrong for group #1661 (24072, counted=24074).
Fix? no

Free blocks count wrong for group #1662 (31088, counted=31093).
Fix? no

Free blocks count wrong for group #1663 (29854, counted=29874).
Fix? no

Free blocks count wrong for group #1716 (22408, counted=22409).
Fix? no

Free blocks count wrong for group #1766 (24047, counted=24049).
Fix? no

Free blocks count wrong for group #1767 (21221, counted=21227).
Fix? no

Free blocks count wrong for group #1776 (4224, counted=4226).
Fix? no

Free blocks count wrong for group #1786 (13739, counted=13738).
Fix? no

Free blocks count wrong for group #1858 (30121, counted=30123).
Fix? no

Free blocks count wrong (524151895, counted=524151947).
Fix? no

Inode bitmap differences:  -4760 -5374 -5447 -11048 -11200 -11231 -11357 -32344 -99504 -101530 -103650 -105652 -(155994--155995) -164750 -166059 -166122 -167136 -168909 -(169301--169302) -169348 -169389 -169396 -169443 -169477 -169484 -169497 -169516 -169521 -169547 -169682 -170515 -(170849--170850) -171815 -172876 -175733 -175833 -175835 -175901 -175969 -176022 -176302 -176353 -176542 -176588 -176781 -176855 -176934 -177176 -177213 -177382 -177474 -177488 -177586 -177709 -177716 -177766 -177775 -177814 -177883 -177950 -177977 -178227 -178259 -178326 -178396 -178467 -178520 -178584 -178804 -178867 -197633 -199282 -199300 -199348 -201848 -204603 -204684 -204688 -204755 -205220 -(205308--205309) -208867 -208874 -208918 -208922 -208934 -208979 -208987 -209186 -209190 -209210 -(209213--209214) -209225 -209335 -209392 -209634 -212153 -212262 -(212268--212269) -212276 -212281 -212316 -212682 -212765 -212803 -216493 -217003 -218025 -218080 -218197 -218312 -218371 -218397 -218421 -218446 -218751 -218903 -219089 -219094 -219620 -219814 -219817 -219835 -240154 -241714 -241718 -247978 -247983 -247992 -248010 -248290 -(248298--248302) -248306 -248315 -266976 -291027 -291546 -291570 -291572 -291762 -293046 -309020 -(309081--309083) -5377632 -5379205 -5379671 -6032427 -6032568 -6032728 -6032731 -6032736 -6032944 -6032975 -6033281 -6033685 -6035533 -6036618 -10592426 -(12320927--12320929)
Fix? no

Free inodes count wrong for group #656 (199, counted=198).
Fix? no

Free inodes count wrong for group #736 (252, counted=277).
Fix? no

Free inodes count wrong for group #737 (7114, counted=7115).
Fix? no

Free inodes count wrong (150641317, counted=150641342).
Fix? no


firewall/: ********** WARNING: Filesystem still has errors **********

firewall/: 1647963/152289280 files (0.7% non-contiguous), 85002345/609154240 blocks
[root@firewall openvpn]#

Comment 9 Doug Ledford 2009-07-23 14:12:26 UTC
OK, the previous comment is a red herring.  It was simply because I did a read-only check of a mounted filesystem.  A similar read-only check while the fs was mounted ro came back clean.  So, whatever corruption I have isn't a case of inconsistent metadata; the metadata is consistent, it's just pointing to data blocks that it shouldn't :-(

Comment 10 Eric Sandeen 2009-07-23 15:10:23 UTC
Heh, don't you wish you could delete comments ;)

Anyway, your 0s were to 0x5c00, or 23552 bytes, which is not on a 4k boundary, it's 5.75 4k blocks.  It is on a 512-byte boundary though.  It also doesn't correlate with your block layout boundaries currently; your first extent is 244 blocks long, so it's not a sparse file leading to 0s;  you really do seem to be pointing at valid blocks which just happen to have 0s in them.

Any idea if the 0s always start on 512-byte boundaries?  Any other pattern to them?

Working backwards from strings of 0s to "what went wrong" will be tough.  :(

The 512 boundary is a little suspicious; if it were a delalloc problem I'd expect it to be on fs block boundaries (4k in your case).

Comment 11 Ric Wheeler 2009-07-23 15:16:38 UTC
It would be interesting to get some idea of how your mail server is using the file system - is it using O_DIRECT? A_SYNC? etc? 

Is it multi-threaded? (This reminds me of a mis-behaving multi-threaded app from my past that tried to be clever with its threads all writing to the same file).

Comment 12 Eric Sandeen 2009-07-23 16:12:37 UTC
Doug can you attach your dovecot config?  I may try pointing http://www.imapwiki.org/ImapTest at dovecot over ext4.

Are the dovecot mail files the only thing you've seen problems with?

-Eric

Comment 13 Doug Ledford 2009-07-23 16:42:46 UTC
(In reply to comment #10)
> Anyway, your 0s were to 0x5c00, or 23552 bytes, which is not on a 4k boundary,
> it's 5.75 4k blocks.  It is on a 512-byte boundary though.

You guys need glasses or something ;-)  You asked for the kernel version when I gave it in comment #1, and the 0s were to 0x5c000 not 0x5c00, which is exactly 368k, which *is* a 4k boundary, it's file block 0 through 91 ;-)  That kinda makes the rest of your comment moot.

Although I'll answer the 0s question for you: this is on a relatively new raid5 device.  As a part of the initial raid5 sync, the entire array gets 0ed out.  So, any time the metadata points to blocks that have never actually been written, which is going to be almost all the time when it points to random otherwise unallocated blocks, it's going to return 0s.  In the future, when this array has had lots of stuff written and deleted, that will start to show random data instead.

Now, as for the fileserver/mail server.  I'm using the exact same setup I've been using for several years.  However, about a month ago, I upgraded from Fedora 9 to Fedora 11.  During that upgrade, I put in new hard drives to expand capacity and made a new raid5 array on the new drives (the old setup had a 4 disk raid5 array, the new a 6 disk raid5 array).  Obviously, the version of sendmail and dovecot have been updated during the f9-f11 update, but they are close enough that I was able to simply reuse my old config files.  The only major difference between the old server and the new is the switch from ext3 to ext4.  In the 6 years or so since I've been running my own mail server over ext3, I can't recall any problems like I've had recently with ext4, while this new setup has resulted in at least 5 known mail file corruptions so far.

That being said, I'm more than happy to give you my dovecot config and sendmail config files since they might help you track this down, but so you know these are known working configurations on ext3.

Comment 16 Eric Sandeen 2009-07-23 16:52:05 UTC
Sigh, dropped a 0 on the size, sorry.  Ok well it's still not a sparse file, and also the 0s are not aligned on any extent boundaries.  ;)

Yeah I'm not all that inclined to implicate dovecot or sendmail, but running an imap stress-test over ext4 might be a way to replicate the bug.

Thanks for the info; sorry about the misreading :)

-Eric

Comment 17 Doug Ledford 2009-07-23 20:41:19 UTC
I'm not sure an IMAP stress test will help.  I think you need a combination IMAP/procmail stress test to really get to things.  The basic operation is that you have an IMAP mbox file with some number of emails.  On occasion, you purge the deleted emails (I have no idea if dovecot mmap's the file and rewrites its own memory and then flushes the changes or what it does, but it does it somehow).  Simultaneously, you have new mails being delivered.  Obviously, there is locking between a dovecot flush and a procmail append operation, but it's likely that somewhere in this locking is the problem.

Would it help if I did a raw grep of the hard disk looking for the missing email messages?  If they are found, it would indicate that the blocks got redirected to somewhere they shouldn't (or the write from dovecot was supposed to go to these blocks instead of writing over the original blocks, but never made it).  If they aren't found, then it's a clear indication that things got overwritten in some way.  I suspect the data is there on the disk somewhere, and that instead of overwriting in place the ext4 code attempted to do a new block write, but that the new data simply never made it to the disk.  Of course, that's just my guess.

Comment 18 Eric Sandeen 2009-07-23 21:25:11 UTC
(In reply to comment #17)

> Would it help if I did a raw grep of the hard disk looking for the missing
> email messages?

Sure, it'd be a clue ...

-Eric

Comment 19 Doug Ledford 2009-07-28 13:40:00 UTC
I had a new instance of corruption show up today (I'm making regular reviews of the mail spool files and IMAP folders for all accounts now).  This one, unless I can work around it, will wipe out the wife's entire Inbox :-(

Comment 20 Ric Wheeler 2009-07-28 13:56:59 UTC
Doug, while we try to reproduce, it might be an interesting test to try and use XFS for your mail spool - it has delayed allocation and other similar features to ext4. If the corruption issues show up with XFS as well, it will help point us in a different direction (app issues? non-fs corruption? )

Comment 21 Eric Sandeen 2009-07-28 14:28:33 UTC
Doug, any time you find this corruption, can you provide details on where the 0s were in the file?

I'm wondering, were the 0s where there was previously good data, or is it looking like fresh data that should have been written?  Oops, putting on my glasses ;) you said:

"Usually, those blocks of NULL characters have overwritten previous data and new data is appended beyond the NULL stream."

Is that still the case?  Is there any apparent pattern or consistency to this overwriting?

Did the grep through the disk find anything interesting?

Comment 22 Doug Ledford 2009-07-28 17:10:22 UTC
Regarding switching the mail spool to xfs, I would have to shrink my / partition, free up some space on the disks, create a new raid array, put xfs on it, and mount it as the mail spool, plus do additional mounts in users' home directories for the IMAP mailboxes.  Not a small task :-(

As for the corruption, the NULL bytes can happen in the middle of existing emails.  However, when they do, they usually wipe out far more than just the one email they start in the middle of.  The latest corruption looks like this:

/var/spool/mail/jessica
0001000 0000 0000 0000 0000 0000 0000 0000 0000
*
000b000 7274 6e6f 3e67 4c20 6f6f 206b 7075 7920
--
000c000 0000 0000 0000 0000 0000 0000 0000 0000
*
0028000 7720 7469 2068 5345 544d 2050 6469 6e20

So, the initial run of NULL bytes is smallish in length (0xa000), but it's right at the beginning of the file.  It falls within the boundary of the second email in the spool file, and the transition looks like this in vi:

(Mostly that would^@^@^@

....

        <a href="http

The email that starts the corruption is definitely a different mail than the one where the corruption ends.

The next spot of corruption starts in yet another email (not far away) and looks like this:

Received: from mail1.no^@^@^@

...

        (version=TLSv1/SSLv3 

Even though this corruption begins in a mail header and ends in a mail header, they definitely are not the same message.


Here's the filefrag output on this spool file as it stands:

[root@firewall ~]# filefrag -v /var/spool/mail/jessica 
Checking /var/spool/mail/jessica
Filesystem type is: ef53
Filesystem cylinder groups is approximately 18299
File is stored in extents format
Blocksize of file /var/spool/mail/jessica is 4096
File size of /var/spool/mail/jessica is 1430389 (350 blocks)
First block: 24416256
Last block: 24416605
/var/spool/mail/jessica: 1 extent found
[root@firewall ~]# 


And here's the debug output as it stands:

[root@firewall ~]# debugfs /dev/md/root
debugfs 1.41.4 (27-Jan-2009)
stat /var/spool/mail/jessica
debugfs:  stat /var/spool/mail/jessica
Inode: 5384004   Type: regular    Mode:  0660   Flags: 0x80000
Generation: 3103287788    Version: 0x00000000:00000001
User:   501   Group:    12   Size: 1425738
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 2800
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4a6f3000:0d4152ec -- Tue Jul 28 13:06:08 2009
 atime: 0x4a692e4d:246c438c -- Thu Jul 23 23:45:17 2009
 mtime: 0x4a6f3000:0d4152ec -- Tue Jul 28 13:06:08 2009
crtime: 0x4a380231:36390498 -- Tue Jun 16 16:36:01 2009
Size of extra inode fields: 28
Extended attributes stored in inode body: 
  selinux = "system_u:object_r:mail_spool_t:s0\000" (34)
BLOCKS:
(IND):44892265, (0-348):24416256-24416604
TOTAL: 350

debugfs:  


So, the first 4k of the file is good, then a break, then another good 4k block, then another big break.  If I had to guess, I would almost say this looks like some sort of defrag reallocation, where the fs moves data blocks to avoid fragmentation in a file that has lots of deletes in the middle of the file data, and the data is not getting moved from the old blocks to the new blocks properly.  But that's just a guess on my part.

Comment 25 Ric Wheeler 2009-07-29 13:58:25 UTC
What helped me debug this kind of issue in the past (different distro/fs/app) was getting a trace of the IO-related system calls.  We are only interested in the ones that could change the state of the file (open/creat/lseek/write/mmap).  Before starting, verify that all of the dovecot mail files are clean; we can stop the trace once you see any zeros and then search backwards in the trace for the last touch.

If you have not done the dump/restore yet, any chance you have enough time/space to store a massive trace :-) ?

Comment 26 Ric Wheeler 2009-07-29 14:19:58 UTC
One other question - how do we synchronize the RAID5 remirroring IO with FS IO?  I wonder if delalloc and the remirror operation have opened a window here where MD might zero blocks that should have valid data?

Comment 28 Doug Ledford 2009-07-29 21:58:35 UTC
What are you calling "remirroring I/O"?

Comment 29 Ric Wheeler 2009-07-29 22:09:56 UTC
In comment 13 above (https://bugzilla.redhat.com/show_bug.cgi?id=513221#c13), you mentioned the original setup zeros data. Probably not related, but I do wonder if delayed allocation could play tricks with the MD threads scheduling this kind of zeroing.

One interesting question is did you see any files not related to dovecot that had this corruption (again, assuming you would notice)?

Comment 30 Doug Ledford 2009-07-29 23:03:30 UTC
Ah, ok.  The answer to that is no.  The raid subsystem actually does a reconstruct on drive initialization; I said it zeroed the blocks because I knew the drives were brand new and it was going to read all zeros and construct parity to match.  However, any time a write comes in, whether or not we've completed the reconstruction, the write goes to the correct data blocks on the disks, and when we get around to those blocks we read back from disk what was written.  So, no race condition.  Actually, this description is still somewhat simplified from reality in that it describes a resync when what we do on raid creation is a reconstruction, but it gets the point across.

As for other files, I didn't have an easy means of checking them for corruption.  I could run an rpm verify, but that would only check rpm installed files, which are mostly static and probably wouldn't have this problem.  Data, such as the files in a git repo, would eventually get caught by a hash/data mismatch.  Other data files, like the ones I keep in /srv/Fedora that are mirrors of upstream get overwritten and so likely wouldn't stay corrupted even if they are today.  Other data files, like my music collection, I wouldn't know are corrupted until I tried to play them, and that's over 7GB of music files so it would take over 7 days of 24 hour a day listening to check them all :-/

Comment 31 Eric Sandeen 2009-07-30 19:32:43 UTC
Ok, just a whim.

I see that dovecot actually does use posix_fallocate in places:

int file_set_size(int fd, off_t size)
{
...
#ifdef HAVE_POSIX_FALLOCATE
        if (posix_fallocate_supported) {
                if (posix_fallocate(fd, st.st_size, size - st.st_size) == 0)
                        return 0;
...

in f11, posix_fallocate on ext4 will actually call sys_fallocate, which creates unwritten extents.  For unwritten extents, on-disk data is not exposed because they are flagged as uninitialized, and read back as 0s.

It would be interesting to see if, possibly, your 0s are due to that range of blocks being marked as unwritten.  It's a longshot, but ...

http://sandeen.fedorapeople.org/utilities/fiemap-test-new.c

is a bit of a hacky fiemap exerciser; unwritten extents will be shown in flags:

[root@inode tmp]# dd if=/dev/zero of=testfile bs=4k count=1
[root@inode tmp]# /root/fallocate -o 4k -l 4k testfile
[root@inode tmp]# ./fiemap-test-new -S testfile
ext:   0 logical: [       0..       0] phys:  1323743.. 1323743 flags: 0x000 tot: 1
ext:   1 logical: [       1..       1] phys:  1323739.. 1323739 flags: 0x801 tot: 1

0x800 is preallocated.

Pointing the fiemap tool at a corrupted file *might* be interesting.

-Eric
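(For reference, a minimal sketch of the same idea, not the fiemap-test-new tool itself: preallocate a range with posix_fallocate() and dump the extent flags via the FIEMAP ioctl.  A preallocated extent comes back with FIEMAP_EXTENT_UNWRITTEN (0x800) set and reads back as 0s until real data is written there and the extent is converted.  The file name and sizes below are illustrative only.)

/* minimal sketch (illustrative): preallocate, then dump extent flags via FIEMAP */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>
#include <linux/fiemap.h>        /* FIEMAP_EXTENT_UNWRITTEN is 0x800 */

int main(void)
{
        int fd = open("testfile", O_RDWR | O_CREAT, 0644);   /* illustrative path */
        if (fd < 0) { perror("open"); return 1; }

        char buf[4096];
        memset(buf, 'x', sizeof(buf));
        pwrite(fd, buf, sizeof(buf), 0);          /* block 0: real data */
        posix_fallocate(fd, 4096, 4096);          /* block 1: unwritten extent */
        fsync(fd);

        size_t sz = sizeof(struct fiemap) + 8 * sizeof(struct fiemap_extent);
        struct fiemap *fm = calloc(1, sz);
        fm->fm_length = FIEMAP_MAX_OFFSET;
        fm->fm_flags = FIEMAP_FLAG_SYNC;
        fm->fm_extent_count = 8;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) { perror("FS_IOC_FIEMAP"); return 1; }

        for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
                struct fiemap_extent *e = &fm->fm_extents[i];
                printf("logical %llu phys %llu len %llu flags 0x%llx%s\n",
                       (unsigned long long)e->fe_logical,
                       (unsigned long long)e->fe_physical,
                       (unsigned long long)e->fe_length,
                       (unsigned long long)e->fe_flags,
                       (e->fe_flags & FIEMAP_EXTENT_UNWRITTEN) ? " (unwritten)" : "");
        }
        free(fm);
        close(fd);
        return 0;
}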

Comment 32 Eric Sandeen 2009-07-30 19:35:29 UTC
Oh, and -if- by chance an extent shows up as preallocated/unwritten, then using the physical block number to go and read directly from disk may be the next step.

-Eric

Comment 34 Tim Eliseo 2009-10-18 09:17:00 UTC
I have also seen this corruption several times in an mbox file written only by procmail (via sendmail) and dovecot (accessed using POP3).  A comparison with a pre-corruption backup of the file from a few days earlier shows that the first 4096 bytes are set to NUL, with no other corruption.  Offset 4096 is not a message boundary, or even a line boundary.  There were some system reboots between the time of the good backup and when I noticed the corruption, but none were unclean shutdowns.

The filesystem was originally created as ext3, and then converted to ext4 a while ago by adding the new options.

The sendmail and dovecot configs are pretty close to stock, with no changes in procmail or dovecot file handling options. Other info in the log below. I'm sorry that I was unable to try reading from the preallocated physical block 26161055 before I fixed the file by rewriting the correct data (maybe next time).

I might be willing to try a system call trace, if there is not a lot of overhead. Please let me know exactly how you'd like to see this set up.

(Yes, I know that the filesystem on /dev/md1 is much smaller than the volume.)

---
# uname -r
2.6.30.8-64.fc11.x86_64
# cd /var/spool/mail
# ls -ls paula
3852 -rw-rw----. 1 paula paula 3936926 Oct 18 01:02 paula
# fiemap-test-new -vv paula
Printing values in 4096-byte-block units
Mapping 18446744073709551615 bytes from offset 0
Input:  start 0 length 18446744073709551615 flags 0x0 count 32
Output: start 0 length 18446744073709551615 flags 0x0 count 32 mapped 5
ext:   0 logical: [       0..       0] phys: 26161055..26161055 flags: 0x800 tot: 1
ext:   1 logical: [       1..     127] phys: 26161056..26161182 flags: 0x000 tot: 127
ext:   2 logical: [     128..     243] phys: 26149376..26149491 flags: 0x000 tot: 116
ext:   3 logical: [     244..     511] phys: 26214956..26215223 flags: 0x000 tot: 268
ext:   4 logical: [     512..     961] phys: 43226112..43226561 flags: 0x001 tot: 450
# debugfs /dev/md1
debugfs 1.41.4 (27-Jan-2009)
debugfs:  stat /var/spool/mail/paula
Inode: 5322715   Type: regular    Mode:  0660   Flags: 0x80000
Generation: 4234752118    Version: 0x00000000:00000001
User:  1065   Group:  1065   Size: 3936926
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 7704
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4adacb95:4bd9f51c -- Sun Oct 18 01:02:29 2009
 atime: 0x4adacc6a:372b140c -- Sun Oct 18 01:06:02 2009
 mtime: 0x4adacb95:4bd9f51c -- Sun Oct 18 01:02:29 2009
crtime: 0x4a960276:c16641bc -- Wed Aug 26 20:50:14 2009
Size of extra inode fields: 28
Extended attributes stored in inode body: 
  selinux = "unconfined_u:object_r:default_t:s0\000" (35)
BLOCKS:
(IND):20021749, (0-127):26161055-26161182, (128-243):26149376-26149491, (244-511):26214956-26215223, (512-961):43226112-43226561
TOTAL: 963

debugfs:  # 
# od -Ax -c paula | head
000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
001000   o   n   t   e   n   t   -   T   y   p   e   :       t   e   x
001010   t   /   h   t   m   l   ;  \n  \t   c   h   a   r   s   e   t
001020   =   "   U   S   -   A   S   C   I   I   "  \n   C   o   n   t
001030   e   n   t   -   T   r   a   n   s   f   e   r   -   E   n   c
001040   o   d   i   n   g   :       q   u   o   t   e   d   -   p   r
001050   i   n   t   a   b   l   e  \n  \n   <   h   t   m   l       x
001060   m   l   n   s   :   v   =   3   D   "   u   r   n   :   s   c
001070   h   e   m   a   s   -   m   i   c   r   o   s   o   f   t   -
# df .
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md1             412990128 225527976 166494148  58% /
# mdadm -D /dev/md1
/dev/md1:
        Version : 1.00
  Creation Time : Tue Feb  3 05:41:05 2009
     Raid Level : raid10
     Array Size : 1744112384 (1663.32 GiB 1785.97 GB)
  Used Dev Size : 872056192 (831.66 GiB 892.99 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Oct 18 01:00:35 2009
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : offset=2
     Chunk Size : 64K

           Name : root
           UUID : a6be9de2:11f8f516:463ae293:3729c4a4
         Events : 66910

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       22        1      active sync   /dev/sdb6
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
# dumpe2fs -h /dev/md1
dumpe2fs 1.41.4 (27-Jan-2009)
Filesystem volume name:   /
Last mounted on:          <not available>
Filesystem UUID:          7cbef9a4-c767-4841-8389-e5e234a16b68
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              25651200
Block count:              104857600
Reserved block count:     5242001
Free blocks:              46865504
Free inodes:              24266395
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      226
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8016
Inode blocks per group:   501
RAID stride:              16
Filesystem created:       Thu Nov  6 03:52:33 2008
Last mount time:          Sat Oct 17 20:45:37 2009
Last write time:          Sat Oct 17 20:45:37 2009
Mount count:              37
Maximum mount count:      -1
Last checked:             Wed Oct  7 03:43:41 2009
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:	          256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       114603
Default directory hash:   half_md4
Directory Hash Seed:      b5d3063b-46f7-437e-9baf-c3158dcd4369
Journal backup:           inode blocks
Journal size:             128M

# diff /etc/dovecot.conf{.dist,}
20c20
< #protocols = imap imaps pop3 pop3s
---
> protocols = imap imaps pop3 pop3s
42c42
< #listen = *, [::]
---
> listen = *, [::]
95,96c95,96
< #ssl_cert_file = /etc/pki/dovecot/certs/dovecot.pem
< #ssl_key_file = /etc/pki/dovecot/private/dovecot.pem
---
> ssl_cert_file = /etc/pki/dovecot/certs/dovecot.pem
> ssl_key_file = /etc/pki/dovecot/private/dovecot.pem
#

Comment 35 Tim Eliseo 2009-10-22 22:49:17 UTC
I have seen this corruption again, with exactly the same symptom, on the same file. This time, I have read the raw disk data underlying the extent flagged erroneously as FIEMAP_EXTENT_UNWRITTEN and determined that it does contain the proper data that previously appeared at that place in the file (that is, the file had correct data, and then somehow the flag got set on an extent).

I'm now running strace on dovecot to try to catch the next corruption. The common sequence seems to be:

1. procmail appends messages to the file, extending it.

2. /usr/libexec/dovecot/pop3 reads messages from the file, and then uses ftruncate() to delete all but the first sentinel message. In this case, the file is left at 530 bytes, with one block in one extent allocated. Note that since the file never becomes empty (0 bytes), the first block has had the same block number through many cycles of append and truncate, and at least two cycles of corruption and repair.

3. Once the corruption occurs, dovecot refuses to use the file, conveniently preserving its state. procmail continues to append to it, however.
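A minimal, untested sketch of that append/truncate cycle follows (a single process alternating the procmail and pop3 roles; the path, message sizes, sentinel size and flock()-based locking are illustrative guesses, and this has not been shown to trigger the bug):

/* hypothetical reproducer sketch for the sequence above -- not a confirmed test case */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

#define SENTINEL_SIZE 530        /* file is truncated back to ~530 bytes, per the report */

int main(void)
{
        const char *path = "mbox.test";          /* illustrative path */
        char msg[8192];
        memset(msg, 'M', sizeof(msg));

        for (int i = 0; i < 10000; i++) {
                /* "procmail": lock, append a message, unlock */
                int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0660);
                if (fd < 0) { perror("open for append"); return 1; }
                flock(fd, LOCK_EX);              /* stand-in for dotlock/fcntl locking */
                if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
                        perror("append");
                flock(fd, LOCK_UN);
                close(fd);

                /* "pop3": lock, read everything, truncate back to the sentinel message */
                fd = open(path, O_RDWR);
                if (fd < 0) { perror("open for read"); return 1; }
                flock(fd, LOCK_EX);
                char buf[4096];
                while (read(fd, buf, sizeof(buf)) > 0)
                        ;
                if (ftruncate(fd, SENTINEL_SIZE) < 0)
                        perror("ftruncate");
                flock(fd, LOCK_UN);
                close(fd);
        }
        return 0;
}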

A log is below, slightly edited for privacy. It shows the erroneous state, then a file repair, then the state after.

---
# cd /var/spool/mail
# fiemap-test-new -vv paula
Printing values in 4096-byte-block units
Mapping 18446744073709551615 bytes from offset 0
Input:  start 0 length 18446744073709551615 flags 0x0 count 32
Output: start 0 length 18446744073709551615 flags 0x0 count 32 mapped 10
ext:   0 logical: [       0..       0] phys: 26161055..26161055 flags: 0x800 tot: 1
ext:   1 logical: [       1..      13] phys: 24102136..24102148 flags: 0x000 tot: 13
ext:   2 logical: [      14..      14] phys: 24102186..24102186 flags: 0x000 tot: 1
ext:   3 logical: [      15..      69] phys: 24084981..24085035 flags: 0x000 tot: 55
ext:   4 logical: [      70..     105] phys: 24085376..24085411 flags: 0x000 tot: 36
ext:   5 logical: [     106..     511] phys: 26870352..26870757 flags: 0x000 tot: 406
ext:   6 logical: [     512..    1023] phys: 26871296..26871807 flags: 0x000 tot: 512
ext:   7 logical: [    1024..    2047] phys: 43353088..43354111 flags: 0x000 tot: 1024
ext:   8 logical: [    2048..    4095] phys: 63455232..63457279 flags: 0x000 tot: 2048
ext:   9 logical: [    4096..    5071] phys: 63461376..63462351 flags: 0x001 tot: 976
# debugfs /dev/md1
debugfs 1.41.4 (27-Jan-2009)
debugfs:  stat /var/spool/mail/paula
Inode: 5322715   Type: regular    Mode:  0660   Flags: 0x80000
Generation: 4234752118    Version: 0x00000000:00000001
User:  1065   Group:  1065   Size: 20774824
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 40584
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4ae0ac06:6df02d80 -- Thu Oct 22 12:01:26 2009
 atime: 0x4ae0aa0a:528cc5d4 -- Thu Oct 22 11:52:58 2009
 mtime: 0x4ae0ac06:6df02d80 -- Thu Oct 22 12:01:26 2009
crtime: 0x4a960276:c16641bc -- Wed Aug 26 20:50:14 2009
Size of extra inode fields: 28
Extended attributes stored in inode body: 
  selinux = "unconfined_u:object_r:default_t:s0\000" (35)
BLOCKS:
(IND):20021749, (0):26161055, (1-13):24102136-24102148, (14):24102186, (15-69):24084981-24085035, (70-105):24085376-24085411, (106-511):26870352-26870757, (512-1023):26871296-26871807, (1024-2047):43353088-43354111, (2048-4095):63455232-63457279, (4096-5071):63461376-63462351
TOTAL: 5073

debugfs:  # 
# od -Ax -c paula |head
000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
001000       i   d       n   9   K   J   I   D   T   6   0   1   3   8
001010   2   1  \n  \t   f   o   r       <   _   _   _   _   _   _   _
001020   _   _   _   @   _   _   _   _   _   _   _   _   _   _   _   _
001030   .   _   _   _   >   ;       T   u   e   ,       2   0       O
001040   c   t       2   0   0   9       1   2   :   1   8   :   1   3
001050       -   0   7   0   0  \n   E   n   v   e   l   o   p   e   -
001060   T   o   :       <   _   _   _   _   _   _   _   _   _   _   @
001070   _   _   _   _   _   _   _   _   _   _   _   _   .   _   _   _
# dd if=/dev/md1 bs=4096 skip=26161055 count=1 | od -Ax -c | tail
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00264496 s, 1.5 MB/s
000f70   s   t   -   m   a   i   l   .   c   o   m   >  \n   R   e   c
000f80   e   i   v   e   d   :       f   r   o   m       p   e   a   k
000f90   m   a   i   l   2   .   v   e   c   t   o   r   v   e   s   t
000fa0   .   l   o   c   a   l       (   m   a   i   l   .   v   e   c
000fb0   t   o   r   v   e   s   t   -   m   a   i   l   .   c   o   m
000fc0       [   6   6   .   1   2   9   .   7   2   .   2   2   8   ]
000fd0   )  \n  \t   b   y       _   _   _   _   _   _   _   _   _   _
000fe0   _   .   _   _   _       (   8   .   1   4   .   3   /   8   .
000ff0   1   4   .   3   )       w   i   t   h       E   S   M   T   P
001000
# dd if=/dev/md1 bs=4096 skip=26161055 count=1 | dd of=paula bs=4096 conv=notrunc
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 2.2419e-05 s, 183 MB/s
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 4.2116e-05 s, 97.3 MB/s
# fiemap-test-new -vv paula
Printing values in 4096-byte-block units
Mapping 18446744073709551615 bytes from offset 0
Input:  start 0 length 18446744073709551615 flags 0x0 count 32
Output: start 0 length 18446744073709551615 flags 0x0 count 32 mapped 10
ext:   0 logical: [       0..       0] phys: 26161055..26161055 flags: 0x000 tot: 1
ext:   1 logical: [       1..      13] phys: 24102136..24102148 flags: 0x000 tot: 13
ext:   2 logical: [      14..      14] phys: 24102186..24102186 flags: 0x000 tot: 1
ext:   3 logical: [      15..      69] phys: 24084981..24085035 flags: 0x000 tot: 55
ext:   4 logical: [      70..     105] phys: 24085376..24085411 flags: 0x000 tot: 36
ext:   5 logical: [     106..     511] phys: 26870352..26870757 flags: 0x000 tot: 406
ext:   6 logical: [     512..    1023] phys: 26871296..26871807 flags: 0x000 tot: 512
ext:   7 logical: [    1024..    2047] phys: 43353088..43354111 flags: 0x000 tot: 1024
ext:   8 logical: [    2048..    4095] phys: 63455232..63457279 flags: 0x000 tot: 2048
ext:   9 logical: [    4096..    5071] phys: 63461376..63462351 flags: 0x001 tot: 976
# debugfs /dev/md1
debugfs 1.41.4 (27-Jan-2009)
debugfs:  stat /var/spool/mail/paula
Inode: 5322715   Type: regular    Mode:  0660   Flags: 0x80000
Generation: 4234752118    Version: 0x00000000:00000001
User:  1065   Group:  1065   Size: 20774824
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 40584
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4ae0ad17:bfeca90c -- Thu Oct 22 12:05:59 2009
 atime: 0x4ae0ad38:aff3e468 -- Thu Oct 22 12:06:32 2009
 mtime: 0x4ae0ad17:bfeca90c -- Thu Oct 22 12:05:59 2009
crtime: 0x4a960276:c16641bc -- Wed Aug 26 20:50:14 2009
Size of extra inode fields: 28
Extended attributes stored in inode body: 
  selinux = "unconfined_u:object_r:default_t:s0\000" (35)
BLOCKS:
(IND):20021749, (0):26161055, (1-13):24102136-24102148, (14):24102186, (15-69):24084981-24085035, (70-105):24085376-24085411, (106-511):26870352-26870757, (512-1023):26871296-26871807, (1024-2047):43353088-43354111, (2048-4095):63455232-63457279, (4096-5071):63461376-63462351
TOTAL: 5073

debugfs:  # 
#

Comment 36 Eric Sandeen 2009-10-23 01:40:41 UTC
oh!  Ok, this is the first time, I think, that we've determined that it really is flagged as unwritten extents...  That's a huge clue, and narrows down the problem significantly, thank you!

I wonder if there's a way to turn that off or short-circuit it for the time being, just to save your mail spool.

I have a version of fsx that does random fallocates; perhaps I can tweak it to do it more intensively and see if fsx can ever hit a corruption that way, or maybe after taking a hard look at your description of the IO we can come up with a testcase ...

Otherwise maybe tracing read/write/fallocate/truncates would give us a clue, though I hate to sacrifice any more mail spools for this ...
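(As a rough illustration of that kind of stress, a toy sketch only and not the modified fsx; the path and sizes are made up.  The loop repeatedly preallocates a range, writes one block inside it, and verifies that the data does not read back as 0s:)

/* toy stand-in for an fsx-style fallocate stress -- illustrative only */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        int fd = open("fsx.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);  /* illustrative path */
        if (fd < 0) { perror("open"); return 1; }

        char wbuf[4096], rbuf[4096];
        memset(wbuf, 'D', sizeof(wbuf));

        for (int i = 0; i < 10000; i++) {
                off_t off = (off_t)(rand() % 1024) * 4096;

                /* preallocate a 64k range (unwritten extents), then write one block in it */
                int err = posix_fallocate(fd, off, 64 * 1024);
                if (err)
                        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
                if (pwrite(fd, wbuf, sizeof(wbuf), off) != (ssize_t)sizeof(wbuf))
                        perror("pwrite");

                /* the block just written must never read back as 0s */
                if (pread(fd, rbuf, sizeof(rbuf), off) != (ssize_t)sizeof(rbuf))
                        perror("pread");
                else if (memcmp(wbuf, rbuf, sizeof(wbuf)) != 0)
                        fprintf(stderr, "mismatch at offset %lld\n", (long long)off);
        }
        close(fd);
        return 0;
}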

Comment 37 Eric Sandeen 2009-10-23 04:26:41 UTC
Any chance you have errors like "Error: Corrupted index cache file" in dovecot error logs (maybe syslog?)

Comment 38 Eric Sandeen 2009-10-23 05:09:22 UTC
On 2.6.30, running dovecot's imaptest against ext4 gives me the corrupted index cache file messages above.

However, it only seems to show up on sub-page blocks; so far 4k blocks seem ok, and I expect you guys all had 4k blocks.  It was also ok w/ 1k blocks and nodelalloc....

In any case, 2.6.31 seems to run fine.  Perhaps the problem is related; I'll see what fixed the sub-page-block case in .31 and see if it seems relevant...

Ok, that particular bug is likely to be

c364b22c9580a885e0f8c0d0f9710d67dc448958 ext4: Fix mmap/truncate race when blocksize < pagesize && delayed allocation

though I'm not sure that's going to fix the original case here.

Comment 39 Doug Ledford 2009-10-25 05:27:58 UTC
Don't let yourself get too excited about the "Error: Corrupted index cache file" messages from dovecot.  I get them on ext3, ext4, and xfs.  They do not, however, affect operation.  I have removed the supposedly corrupt index files entirely so that dovecot could remake them and had them return in less than 24 hours.  This appears to be an issue with dovecot that has nothing to do with the underlying file system.  However, this issue has never resulted in a corrupt mail box or data, just warning messages in syslog (and possibly slower performance although I wouldn't know that without a better benchmarking system).

Comment 40 Mikkel Lauritsen 2009-10-26 12:09:49 UTC
I'd just like to add a "me too"...

07ffff0 6564 3768 7569 5246 3465 7934 6f74 3556
0800000 0000 0000 0000 0000 0000 0000 0000 0000
*
2478130 7246 6d6f 6c20 6e69 7875 762d 7861 622d
2478140 756f 636e 7365 6d40 6961 2e6c 6570 6772

It basically seems as if the mbox file was overwritten from 0800000 onwards; the text at 2478130 is a new mail that was received after the problem occurred.

Fortunately it was my own inbox (a little less than two years' worth of mail :-( ) and not the old missus', as the latter would probably have caused the immediate termination of myself.

This is on kernel 2.6.30.8-64.fc11.i686.PAE with postfix-2.5.6-3.fc11.i586 and dovecot-1.2.6-1_101.fc11.i586 ; the fs is ext4 with default options.

Comment 41 Mikkel Lauritsen 2009-10-26 12:16:42 UTC
Managed to forget one thing:

The only possible lead I have as to what might have caused the error is that it may have happened as the dovecot rpm was upgraded to 1.2.6-1_101 from the last version before that.  The log files don't show any apparent problems; it seems as if dovecot was shut down and restarted cleanly.

Comment 42 Eric Sandeen 2009-10-26 14:54:27 UTC
Mikkel, if you still have the file around - and for anyone else who hits this - if you could point that fiemap-test application from comment #31 at the file in question with the -vv option, we can confirm that it is an unwritten extent problem.

Thanks,
-Eric

Comment 43 Mikkel Lauritsen 2009-10-26 16:27:13 UTC
Sure - here you go:

[mikkel@server ~]$ ./fiemap-test-new -vv /var/spool/mail/mikkel 
Printing values in 4096-byte-block units
Mapping 18446744073709551615 bytes from offset 0
Input:  start 0 length 18446744073709551615 flags 0x0 count 32
Output: start 0 length 18446744073709551615 flags 0x0 count 32 mapped 2
ext:   0 logical: [       0..    1305] phys:   155648..  156953 flags: 0x800 tot: 1306
ext:   1 logical: [    1306..    1311] phys:   156954..  156959 flags: 0x001 tot: 6

Comment 44 Eric Sandeen 2009-10-26 16:35:53 UTC
Yepper, 0x800 is the preallocated/unwritten flag, so that first range will just report 0s.

You might try what the other reporter did in comment #35, and do a raw dd of those physical blocks (4k blocks 155648 through 156959) - your data may well still be there, and that would be a way to recover your mbox.

If it does look intact that would be worth reporting back too, thanks!

-Eric

Comment 45 Mikkel Lauritsen 2009-10-26 19:16:42 UTC
As it turns out, something (dovecot, probably) had rewritten the mbox file to remove the zeroed block, so I wasn't able to recover anything.  Oh well.  I guess the lesson to learn is that if you end up with a corrupt inbox, don't touch it until you've tried recovering using the hack discussed above :-P

Comment 46 Eric Sandeen 2009-10-28 18:26:29 UTC
Tim, do you have those traces to share?  Are you doing full traces or just some operations?  I think creat/open/close/sync/mmap/write/fallocate/truncate would probably be sufficient...

Thanks,
-Eric

Comment 47 Tim Eliseo 2009-11-01 10:36:20 UTC
Created attachment 367000 [details]
strace excerpt of Dovecot file access before/after corruption

I was doing (almost) full straces of Dovecot with the following command:

   strace -s128 -q -e!epoll_wait -tt -v -f

A trace of all syscalls outputs a trace line every second from the network listen process; excluding the tracing of epoll_wait() omits this clutter.

I had another file corruption while the trace was running, but on a different mail file than before. The trace excerpt shows the last good access to the file, when Dovecot updates data in the sentinel message in the first file block (lines 21-22), right after the ftruncate(). The next access several hours later shows that the first block reads as NULs (line 111). Note (line 100) that the atime has the timestamp of the read on line 21, so presumably procmail was the only other accessor in between, appending to the file.

As before, the raw disk block still contains the file's correct block 0 data. Note that the value in the sentinel message (X-IMAP: 1251348499 0000026887) matches what was written in line 22 of the trace.

This data doesn't seem to pin down when the corruption occurred. I see three possibilities:

1. Dovecot triggered the setting of the unwritten flag. Either it happened after line 21, or somehow the block data was read (instead of returning NULs) despite the flag (from a cache?).

2. procmail triggered the setting of the unwritten flag. The simple method of tracing procmail would require tracing sendmail and all children, which is far too much overhead for us. I have some alternate ideas.

3. Something else happened, unrelated to dovecot or procmail. Between the two trace times, an rsync backup process ran that seemed to hang for a long time while reading some sparse VM image files of around 100 GB. I don't have any more info on that one yet.

Below is a log of the file state and its repair. One peculiar thing is that the flags for the first block still show 0x800 immediately after the dd write, but then show as 0 a short time later. Is this a journal/inode sync issue?

---
# cd /var/spool/mail/ralph
# fiemap-test-new -vv ralph
Printing values in 4096-byte-block units
Mapping 18446744073709551615 bytes from offset 0
Input:  start 0 length 18446744073709551615 flags 0x0 count 32
Output: start 0 length 18446744073709551615 flags 0x0 count 32 mapped 9
ext:   0 logical: [       0..       0] phys: 37507421..37507421 flags: 0x800 tot: 1
ext:   1 logical: [       1..    2148] phys: 42766313..42768460 flags: 0x000 tot: 2148
ext:   2 logical: [    2149..    2815] phys: 59170848..59171514 flags: 0x000 tot: 667
ext:   3 logical: [    2816..    2883] phys: 59171564..59171631 flags: 0x000 tot: 68
ext:   4 logical: [    2884..    2884] phys: 59171660..59171660 flags: 0x000 tot: 1
ext:   5 logical: [    2885..    2947] phys: 59171662..59171724 flags: 0x000 tot: 63
ext:   6 logical: [    2948..    2987] phys: 62042174..62042213 flags: 0x000 tot: 40
ext:   7 logical: [    2988..    3460] phys: 62032790..62033262 flags: 0x000 tot: 473
ext:   8 logical: [    3461..    3771] phys: 62036151..62036461 flags: 0x001 tot: 311
# dd if=/dev/md1 bs=4096 skip=37507421 count=1 | dd of=ralph bs=4096 conv=notrunc
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 2.3118e-05 s, 177 MB/s
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 5.8109e-05 s, 70.5 MB/s
# fiemap-test-new -vv ralph
Printing values in 4096-byte-block units
Mapping 18446744073709551615 bytes from offset 0
Input:  start 0 length 18446744073709551615 flags 0x0 count 32
Output: start 0 length 18446744073709551615 flags 0x0 count 32 mapped 9
ext:   0 logical: [       0..       0] phys: 37507421..37507421 flags: 0x800 tot: 1
ext:   1 logical: [       1..    2148] phys: 42766313..42768460 flags: 0x000 tot: 2148
ext:   2 logical: [    2149..    2815] phys: 59170848..59171514 flags: 0x000 tot: 667
ext:   3 logical: [    2816..    2883] phys: 59171564..59171631 flags: 0x000 tot: 68
ext:   4 logical: [    2884..    2884] phys: 59171660..59171660 flags: 0x000 tot: 1
ext:   5 logical: [    2885..    2947] phys: 59171662..59171724 flags: 0x000 tot: 63
ext:   6 logical: [    2948..    2987] phys: 62042174..62042213 flags: 0x000 tot: 40
ext:   7 logical: [    2988..    3460] phys: 62032790..62033262 flags: 0x000 tot: 473
ext:   8 logical: [    3461..    3771] phys: 62036151..62036461 flags: 0x001 tot: 311
# head ralph
From MAILER_DAEMON  Mon Sep 28 11:28:14 2009
Date: Mon, 28 Sep 2009 11:28:14 -0700
From: Mail System Internal Data <MAILER-DAEMON@_____>
Subject: DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA
Message-ID: <1254162494@_____>
X-IMAP: 1251348499 0000026887
Status: RO

This text is part of the internal format of your mail folder, and is not
a real message.  It is created automatically by the mail system software.
# fiemap-test-new -vv ralph
Printing values in 4096-byte-block units
Mapping 18446744073709551615 bytes from offset 0
Input:  start 0 length 18446744073709551615 flags 0x0 count 32
Output: start 0 length 18446744073709551615 flags 0x0 count 32 mapped 9
ext:   0 logical: [       0..       0] phys: 37507421..37507421 flags: 0x000 tot: 1
ext:   1 logical: [       1..    2148] phys: 42766313..42768460 flags: 0x000 tot: 2148
ext:   2 logical: [    2149..    2815] phys: 59170848..59171514 flags: 0x000 tot: 667
ext:   3 logical: [    2816..    2883] phys: 59171564..59171631 flags: 0x000 tot: 68
ext:   4 logical: [    2884..    2884] phys: 59171660..59171660 flags: 0x000 tot: 1
ext:   5 logical: [    2885..    2947] phys: 59171662..59171724 flags: 0x000 tot: 63
ext:   6 logical: [    2948..    2987] phys: 62042174..62042213 flags: 0x000 tot: 40
ext:   7 logical: [    2988..    3460] phys: 62032790..62033262 flags: 0x000 tot: 473
ext:   8 logical: [    3461..    3771] phys: 62036151..62036461 flags: 0x001 tot: 311
#

Comment 48 Eric Sandeen 2009-11-01 14:13:17 UTC
Interesting, thanks - were there no fallocate calls in the trace?

Comment 49 Tim Eliseo 2009-11-01 21:34:35 UTC
The only fallocate call by processes 22115 or 26139 was on the file "/home/ralph/mail/.imap/INBOX/dovecot.index.cache". I presume this is not relevant, as this file was not corrupted.

I'm sorry I can't post a large trace file, as it is too difficult to sanitize. If you're looking for anything else specific, please let me know.

Comment 50 Mikkel Lauritsen 2009-11-02 08:51:46 UTC
Now it has eaten almost the entire rest of my inbox :-/ This time I restarted the server (cleanly).

[mikkel@server ~]$ hexdump /var/spool/mail/mikkel
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
051a000 382e 532f 6275 696d 2974 6920 2064 396e
051a010 4751 735a 366b 3130 3637 3138 0a3b 4d09
[...]

[mikkel@server ~]$ ./fiemap-test-new -vv /var/spool/mail/mikkel 
Printing values in 4096-byte-block units
Mapping 18446744073709551615 bytes from offset 0
Input:  start 0 length 18446744073709551615 flags 0x0 count 32
Output: start 0 length 18446744073709551615 flags 0x0 count 32 mapped 4
ext:   0 logical: [       0..    1305] phys:   155648..  156953 flags: 0x800 tot: 1306
ext:   1 logical: [    1306..    1314] phys:   156954..  156962 flags: 0x000 tot: 9
ext:   2 logical: [    1315..    1366] phys: 11888901..11888952 flags: 0x000 tot: 52
ext:   3 logical: [    1367..    1370] phys: 11909120..11909123 flags: 0x001 tot: 4

Comment 51 Mikkel Lauritsen 2009-11-02 12:51:38 UTC
FWIW, in this case the hint from comment #44 about restoring the missing data using dd worked fine.

Comment 52 Eric Sandeen 2009-11-09 21:36:13 UTC
Ok, focusing more on this bug now - I had to sort out an upstream corruption regression in .32 first ... sorry.  I'll look over your logs etc. and see if there's anything useful and/or anything more I can ask for ...

-Eric

Comment 53 Eric Sandeen 2009-11-09 22:10:44 UTC
(In reply to comment #47)
> This data doesn't seem to help us in knowing when the corruption occurs. I see
> three options:
> 
> 1. Dovecot triggered the setting of the unwritten flag. Either it happened
> after line 21, or somehow the block data was read (instead of returning NULs)
> despite the flag (from a cache?).

well, we don't see any allocation calls on this file ... 

> 2. procmail triggered the setting of the unwritten flag. The simple method of
> tracing procmail would require tracing sendmail and all children, which is far
> too much overhead for us. I have some alternate ideas.

I don't see any calls in procmail that would result in preallocation ...
 
> 3. Something else happened, unrelated to dovecot or procmail. During the period
> in between the two trace times an rsync backup process ran which seemed to be
> hung for a long time reading some sparse VM image files of around 100 GB. I
> don't have any more info on that one yet.

I was wondering if we possibly reused a block which had been preallocated, and somehow got "old" prealloc state; but the file in question was truncated down to 530, so its first block never should have changed... which throws out that idea, I guess.

I wonder if a jprobe in the kernel's fallocate path would be worth it, to log something to dmesg whenever fallocate gets called on this fs, and by which process...?

> Below is a log of the file state, and it's repair. One peculiar thing is that
> the flags for the first block still show 0x800 immediately after the dd write,
> but then show as 0 a short time later. Is this a journal/inode sync issue?

Hm, I'll have to look into when the flag gets reset when the block gets written to.

-Eric

Comment 54 Eric Sandeen 2009-11-10 21:56:07 UTC
Created attachment 368953 [details]
jprobe for ext4_fallocate

This might be helpful...

tar xvzf the tarball, and then in the ext4_falloc_probe/ directory, do:

make -C /lib/modules/`uname -r`/build M=`pwd`

(you'll need a matching kernel & kernel-devel rpm installed for the `uname -r` kernel)

That will build a .ko file, and if you insmod it you'll see messages like this on every call into ext4_fallocate:

Planted jprobe at ffffffff81225cf1, handler addr ffffffffa006d000
jprobe: ext4_fallocate by imap for inode 234972 (dovecot.index.cache), off 179200, len 5120
jprobe: ext4_fallocate by imap for inode 234972 (dovecot.index.cache), off 184320, len 1024
jprobe: ext4_fallocate by imap for inode 234972 (dovecot.index.cache), off 185344, len 7168
jprobe: ext4_fallocate by imap for inode 234972 (dovecot.index.cache), off 192512, len 1024
jprobe: ext4_fallocate by imap for inode 234972 (dovecot.index.cache), off 193536, len 1024
jprobe: ext4_fallocate by imap for inode 16 (testuser), off 12802047, len 6323

....

Not sure if this will be helpful or not, but it might be worth a shot.
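
(In case the tarball ever goes missing: the module amounts to a jprobe roughly along the lines below.  This is a from-memory sketch, not the attachment's actual source; in particular the ext4_fallocate() prototype is the assumed 2.6.30-era (struct inode *, int, loff_t, loff_t) variant, and it only logs the process name and inode number rather than the filename.)

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>
#include <linux/fs.h>
#include <linux/sched.h>      /* current */

/* The handler must mirror the probed function's prototype
 * (assumed 2.6.30-era signature here). */
static long jext4_fallocate(struct inode *inode, int mode,
                            loff_t offset, loff_t len)
{
        printk(KERN_INFO "jprobe: ext4_fallocate by %s for inode %lu, "
               "off %lld, len %lld\n",
               current->comm, inode->i_ino,
               (long long)offset, (long long)len);
        jprobe_return();        /* mandatory: jump back to the real function */
        return 0;               /* never reached */
}

static struct jprobe jp = {
        .entry = (void *)jext4_fallocate,
        .kp    = { .symbol_name = "ext4_fallocate" },
};

static int __init jp_init(void)
{
        return register_jprobe(&jp);
}

static void __exit jp_exit(void)
{
        unregister_jprobe(&jp);
}

module_init(jp_init);
module_exit(jp_exit);
MODULE_LICENSE("GPL");

(Built with the usual one-line Kbuild Makefile - obj-m += <source-file>.o - plus the make command above; insmod the resulting .ko, watch dmesg, and rmmod when done.)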

Comment 55 Eric Sandeen 2009-11-13 17:21:07 UTC
I've run 20 fallocate-enabled fsx instances in parallel for 12 hours, with no corruption.  Not hitting it at random ...

Comment 56 Eric Sandeen 2009-11-16 17:34:52 UTC
This may actually resolve it:

commit a41f20716975910d9beb90b7efc61107901492b8
Author: Aneesh Kumar K.V <aneesh.kumar.ibm.com>
Date:   Wed Jun 10 14:22:55 2009 -0400

    ext4: Avoid corrupting the uninitialized bit in the extent during truncate
    
    The uninitialized bit was not properly preserved in an extent that is
    partially truncated, because it was getting set to the value of the
    first extent to be removed or truncated as part of the truncate
    operation.  If multiple extents are removed or modified as part of the
    truncate operation, only the last extent might be partially truncated,
    and its uninitialized bit is not necessarily the same as that of the
    first extent to be truncated.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso>


I don't think anyone has hit this in 2.6.31 and beyond, and that's where it landed.  I'll build a test kernel if anyone is willing to try it ...

-Eric

Comment 57 Fedora Update System 2009-11-18 12:16:55 UTC
kernel-2.6.30.9-99.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/kernel-2.6.30.9-99.fc11

Comment 58 Eric Sandeen 2009-11-18 17:46:46 UTC
For the folks who haven't given up on ext4 for mail spools yet, please try the kernel in comment #57; I think it should resolve this issue.

Thanks,
-Eric

Comment 59 Mikkel Lauritsen 2009-11-18 19:33:30 UTC
Thanks. Your efforts in fixing this issue are highly appreciated.

Comment 60 Eric Sandeen 2009-11-18 20:07:46 UTC
Sure thing, apologies it took so long, that fix just slipped my mind ... :(

Comment 61 Fedora Update System 2009-11-20 05:27:38 UTC
kernel-2.6.30.9-99.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-11803

Comment 62 Fedora Update System 2009-12-01 04:44:06 UTC
kernel-2.6.30.9-99.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.

