Bug 515529 - ENOSPC during fsstress leads to filesystem corruption on ext2, ext3, and ext4
Summary: ENOSPC during fsstress leads to filesystem corruption on ext2, ext3, and ext4
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: x86_64
OS: Linux
Priority: low
Severity: urgent
Target Milestone: rc
Assignee: Eric Sandeen
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 526950 533192 665067
 
Reported: 2009-08-04 16:55 UTC by Devin Nate
Modified: 2010-12-22 16:12 UTC (History)
8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 665067 (view as bug list)
Environment:
Last Closed: 2010-03-30 07:42:10 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
testing script to repeatedly run tests - takes manual config to change (1.06 KB, text/plain)
2009-08-04 16:55 UTC, Devin Nate
no flags Details
Proposed patch (4.33 KB, patch)
2009-08-06 23:10 UTC, Eric Sandeen
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0178 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update 2010-03-29 12:18:21 UTC

Description Devin Nate 2009-08-04 16:55:08 UTC
Created attachment 356209 [details]
testing script to repeatedly run tests - takes manual config to change

Description of problem: (copied from the initial information posted in https://bugzilla.redhat.com/show_bug.cgi?id=494927)

Corruption when using fsstress to test ext2, ext3, and ext4 on brd (ramdisk) and loop (raw disk not tested yet). Reproducible using the below script. Takes minor tweaks to change between file systems and mount commands.

Testing with:
the attached test script, although the heart of it is: fsstress -d /mnt/test_file_system/work -p 3 -l 0 -n 100000000 -X. Some tests were re-run with -p 1 to remove the possibility of parallel tasks impacting the test. fsstress was retrieved from LTP two days ago, at this URL:
http://ltp.cvs.sourceforge.net/viewvc/ltp/ltp/testcases/kernel/fs/fsstress/

RHEL 5.4 beta kernel version 2.6.18-157.el5 (otherwise a stock 5.3 install). No
other processes are accessing the test_file_system.

RHEL is running as a vmware esxi 4.0 guest, 2vcpu, 2GB ram. No errors in dmesg. Also testing on RHEL 5.4 kernel on a physical box (ibm x336 with ServeRAID/ips driver), so far no difference in results.

1. fails on ext2, blocksizes 1024, 2048, 4096, when mounted on a loop device
(loop0) losetup against a file in /dev/shm (tmpfs).

2. fails on ext2, blocksizes 1024, 2048, 4096, when mounted on a ramdisk (ram0)
using the new brd driver (part of rhel 5.4).

3. fails on ext3, blocksizes 1024, 2048, 4096, when mounted on a loop device
(loop0) losetup against a file in /dev/shm (tmpfs), ext3 mount options
data=writeback and data=ordered

4. fails on ext3, blocksizes 1024, 2048, 4096, when mounted on a ramdisk (ram0)
using the new brd driver (part of rhel 5.4), ext3 mount options data=writeback
and data=ordered

5. fails on ext3, blocksize default, when mounted on a loop device (loop0)
losetup against a file in / (existing ext3 filesystem).

6. succeeds on ext3 for the above, with data=journal mount option.

7. Basically, as best I can tell, ext3 with data=journal always works, and
nothing else does.

8. Doesn't appear to be limited to ramdisk (or at least, loop devices on the
actual hard drive fail also).

9. I tried ext4; it also failed. I don't recall which variation I used above.
It was mounted with default mount options.

note:
"Fail" is evidenced by some combination of errors similar to:
+ e2fsck -fvp /dev/ram0
/dev/ram0: HTREE directory inode 61596 has an invalid root node.
HTREE INDEX CLEARED.
/dev/ram0: HTREE directory inode 61604 has an invalid root node.
HTREE INDEX CLEARED.
/dev/ram0: HTREE directory inode 61632 has an invalid root node.
HTREE INDEX CLEARED.
/dev/ram0: HTREE directory inode 61644 has an invalid root node.
HTREE INDEX CLEARED.
/dev/ram0: HTREE directory inode 61660 has an invalid root node.
HTREE INDEX CLEARED.
/dev/ram0: Inode 69395, i_size is 902294, should be 1906688.  FIXED.
/dev/ram0: Inode 69723, i_size is 1277952, should be 1280000.  FIXED.
/dev/ram0: Inode 69781, i_size is 0, should be 939008.  FIXED.
/dev/ram0: Inode 70098, i_size is 662593, should be 1267712.  FIXED.
/dev/ram0: Inode 70227, i_size is 0, should be 330752.  FIXED.
/dev/ram0: Inode 70418, i_size is 892928, should be 896000.  FIXED.
/dev/ram0: Inode 70541, i_size is 380928, should be 382976.  FIXED.
/dev/ram0: Inode 71272, i_size is 503808, should be 506880.  FIXED.
/dev/ram0: Inode 71663, i_size is 2002944, should be 2004992.  FIXED.
/dev/ram0: Inode 72421, i_size is 348160, should be 350208.  FIXED.
/dev/ram0: Inode 73392, i_size is 958464, should be 960512.  FIXED.
/dev/ram0: Inode 73980, i_size is 434176, should be 437248.  FIXED.
/dev/ram0: Inode 74468, i_size is 466944, should be 470016.  FIXED.
/dev/ram0: Inode 76077, i_size is 200704, should be 202752.  FIXED.
/dev/ram0: Inode 71114, i_size is 0, should be 103424.  FIXED.
/dev/ram0: Inode 70462, i_size is 0, should be 72704.  FIXED.
/dev/ram0: Entry 'c638' in /work/p0/d1/d141/d3d7/d4ed (61644) has an incorrect
filetype (was 3, should be 1).


/dev/ram0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)


"Success" was evidenced by no errors after repeated iterations (in my case, I
let it run a couple hundred runs).
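For what it's worth, the e2fsck signature above is consistent with blocks having been allocated past i_size: each "should be" size that e2fsck reports ends on a block boundary, while the recorded i_size is the smaller value from before the failed extending write. A small illustrative check (ours, not part of the report) re-examining a few of the numbers printed above, assuming the 1k-block run:

```python
# Illustration only: the "should be" sizes e2fsck prints correspond to the
# end of the last allocated block, so each is a multiple of the block size.
BLOCK_SIZE = 1024  # assumed; the tests above used 1024/2048/4096-byte blocks

# (corrupt i_size, size e2fsck fixed it to) pairs taken from the output above
fixes = [(902294, 1906688), (1277952, 1280000), (0, 939008), (662593, 1267712)]

for bad, fixed in fixes:
    assert fixed % BLOCK_SIZE == 0   # repaired size ends on a block boundary
    assert bad < fixed               # blocks were allocated past i_size
    blocks_past_eof = (fixed - bad + BLOCK_SIZE - 1) // BLOCK_SIZE
    print(f"i_size {bad}: {blocks_past_eof} block(s) allocated past EOF")
```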

Furthermore, I put a bunch of links into a comment against the RHEL 5.4 brd
driver bug, discussing various aspects I found today regarding the use of the
new brd ramdisk driver in RHEL 5.4:
https://bugzilla.redhat.com/show_bug.cgi?id=480663

Thanks,
Devin

Comment 1 Devin Nate 2009-08-04 17:14:00 UTC
Initial testing with a fresh kernel-2.6.30.4, compiled from the RHEL 5.4 beta kernel .config file, has so far produced successes for all prior tests. Still working through the tests, but no failures yet.

Being tested on the vmware instance at this time, but much more successful so far than on the RHEL kernel. Only the kernel has changed; it's the same system as before.

Thanks,
Devin

Comment 2 Eric Sandeen 2009-08-04 17:26:49 UTC
Ok, thanks for the bug report.  I think the issue is that the device is fairly small (16MB in your attached testcase), and things go badly on ENOSPC.  I have a distinct testcase that seems to show it:

# mkfs.ext3 -b 4096 /dev/sdb3 16384
# mount /dev/sdb3 /mnt/test
# fsstress-ltp -d /mnt/test/work/ -s 1249856130 -p 3 -l 0 -n 100000000
<wait 30s, about 'til full>
# umount /mnt/test
# e2fsck -fvp /dev/sdb3
/dev/sdb3: Inode 53, i_size is 588535, should be 888832.  FIXED.
/dev/sdb3: Inode 573, i_size is 0, should be 172032.  FIXED.
....

etc.

I think I know which mod fixed this upstream, testing now.

Thanks,
-Eric

Comment 3 Devin Nate 2009-08-04 22:27:35 UTC
Testing so far on stock kernel-2.6.30.4 vs rhel 5.4 kernel, which produced the above results.


Summary:

1. ext2 seems more stable in the current kernel compared to rhel 5.4 2.6.18-157 (at least, it did pass 5 iterations, which isn't entirely comprehensive; I'll try to run it overnight).

2. ext3 seems much less stable in current stock kernel compared to rhel 5.4 2.6.18-157 (kernel errors and unable to reboot), although neither passed the fsstress in data=ordered.

2a. ext3 with data=journal worked "fine" in rhel 5.4 2.6.18-157 (a couple hundred iterations). ext3 with data=journal failed with a kernel failure in stock kernel on about the 21st iteration.

3. ext4 data=ordered seems more stable in current stock kernel compared to rhel 5.4 2.6.18-157 (kernel errors instead of fs corruption), but that's not entirely a huge improvement (and kernel errors presumably could lead to fs corruption).

4. fsstress/this test script is causing problems all over the place on ext2, ext3, and ext4. None of the above are inspiring - either on the rhel kernel or stock. So far the only 'successful' candidates appear to be ext2 with stock kernel or ext3 with data=journal on rhel kernel.

Thanks,
Devin


Details follow:

Minimum of 5 iterations each (previously, testing never got past the first
iteration, i.e. it failed immediately). Five may not be enough iterations; some
websites claim problems only start after >700 iterations. But previously I
wasn't even getting one iteration through, so it's a starting point, especially
for the failures.

1. no failure in 5 iterations on ext2, blocksizes 1024, 2048, 4096, when
mounted on a loop device (loop0) losetup against a file in /dev/shm (tmpfs).

2. no failure in 5 iterations on ext2, blocksizes 1024, 2048, 4096, when
mounted on a ramdisk (ram0) using the new brd driver (part of rhel 5.4).

3. kernel failure on ext4, blocksize 1024, data=ordered (default), using brd
(ram0). Happened on cycle 4 (not 1, 2, or 3). Rebooted after for further
testing, system shut down and rebooted fine. Messages as follows:

Message from syslogd@ at Tue Aug  4 14:40:50 2009 ...
cclinux02 kernel: This should not happen.!! Data will be lost
Message from syslogd@ at Tue Aug  4 14:40:50 2009 ...
cclinux02 kernel: mpage_da_map_blocks block allocation failed for inode 450 at
logical offset 777 with max blocks 12 with error -28
Message from syslogd@ at Tue Aug  4 14:40:50 2009 ...
cclinux02 kernel: Total free blocks count 0
Message from syslogd@ at Tue Aug  4 14:40:50 2009 ...
cclinux02 kernel: Free/Dirty block details
Message from syslogd@ at Tue Aug  4 14:40:50 2009 ...
cclinux02 kernel: free_blocks=0
Message from syslogd@ at Tue Aug  4 14:40:50 2009 ...
cclinux02 kernel: dirty_blocks=-30
Message from syslogd@ at Tue Aug  4 14:40:51 2009 ...
cclinux02 kernel: i_reserved_data_blocks=12
Message from syslogd@ at Tue Aug  4 14:40:50 2009 ...
cclinux02 kernel: Block reservation details
Message from syslogd@ at Tue Aug  4 14:40:51 2009 ...
cclinux02 kernel: i_reserved_meta_blocks=2
Message from syslogd@ at Tue Aug  4 14:40:53 2009 ...
cclinux02 kernel: mpage_da_map_blocks block allocation failed for inode 2817 at
logical offset 991 with max blocks 25 with error -28
Message from syslogd@ at Tue Aug  4 14:40:53 2009 ...
cclinux02 kernel: This should not happen.!! Data will be lost
Message from syslogd@ at Tue Aug  4 14:40:53 2009 ...
cclinux02 kernel: Total free blocks count 0
Message from syslogd@ at Tue Aug  4 14:40:53 2009 ...
cclinux02 kernel: Free/Dirty block details
Message from syslogd@ at Tue Aug  4 14:40:53 2009 ...
cclinux02 kernel: free_blocks=0
Message from syslogd@ at Tue Aug  4 14:40:53 2009 ...
cclinux02 kernel: dirty_blocks=-4
Message from syslogd@ at Tue Aug  4 14:40:53 2009 ...
cclinux02 kernel: Block reservation details
Message from syslogd@ at Tue Aug  4 14:40:53 2009 ...
cclinux02 kernel: i_reserved_data_blocks=25
Message from syslogd@ at Tue Aug  4 14:40:53 2009 ...
cclinux02 kernel: i_reserved_meta_blocks=2
Message from syslogd@ at Tue Aug  4 14:40:59 2009 ...
cclinux02 kernel: mpage_da_map_blocks block allocation failed for inode 1446 at
logical offset 1894 with max blocks 2 with error -28
Message from syslogd@ at Tue Aug  4 14:40:59 2009 ...
cclinux02 kernel: This should not happen.!! Data will be lost
Message from syslogd@ at Tue Aug  4 14:40:59 2009 ...
cclinux02 kernel: Total free blocks count 0
Message from syslogd@ at Tue Aug  4 14:40:59 2009 ...
cclinux02 kernel: Free/Dirty block details
Message from syslogd@ at Tue Aug  4 14:40:59 2009 ...
cclinux02 kernel: free_blocks=0
Message from syslogd@ at Tue Aug  4 14:40:59 2009 ...
cclinux02 kernel: dirty_blocks=-1
Message from syslogd@ at Tue Aug  4 14:40:59 2009 ...
cclinux02 kernel: Block reservation details
Message from syslogd@ at Tue Aug  4 14:40:59 2009 ...
cclinux02 kernel: i_reserved_data_blocks=2
Message from syslogd@ at Tue Aug  4 14:40:59 2009 ...
cclinux02 kernel: i_reserved_meta_blocks=2^C

4. kernel failure on ext4, blocksize 1024, data=ordered (default), using loop
(loop0) on /dev/shm. Happened on cycle 7 (not 1, 2, or 3). Rebooted after for
further testing, system shutdown and rebooted fine. Messages as follows:

Message from syslogd@ at Tue Aug  4 15:49:55 2009 ...
cclinux02 kernel: mpage_da_map_blocks block allocation failed for inode 289 at
logical offset 740 with max blocks 4 with error -28
Message from syslogd@ at Tue Aug  4 15:49:55 2009 ...
cclinux02 kernel: Free/Dirty block details
Message from syslogd@ at Tue Aug  4 15:49:55 2009 ...
cclinux02 kernel: Total free blocks count 0
Message from syslogd@ at Tue Aug  4 15:49:55 2009 ...
cclinux02 kernel: This should not happen.!! Data will be lost
Message from syslogd@ at Tue Aug  4 15:49:55 2009 ...
cclinux02 kernel: free_blocks=0
Message from syslogd@ at Tue Aug  4 15:49:55 2009 ...
cclinux02 kernel: Block reservation details
Message from syslogd@ at Tue Aug  4 15:49:55 2009 ...
cclinux02 kernel: dirty_blocks=-1
Message from syslogd@ at Tue Aug  4 15:49:55 2009 ...
cclinux02 kernel: i_reserved_data_blocks=4
Message from syslogd@ at Tue Aug  4 15:49:55 2009 ...
cclinux02 kernel: i_reserved_meta_blocks=2

5. kernel failure on ext3, data=ordered, on ram0 (brd). Reboot was attempted
but hung; the system had to be powered off. Messages as follows:

Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel: ------------[ cut here ]------------
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel: invalid opcode: 0000 [#1] SMP
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel: last sysfs file:
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel: Stack:
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel:  ffff880019016e80 ffffffff802a89b4 ffff880001058b40
ffff88007f4026c0
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel:  ffffffffa0052e60 ffffffff802a8a43 ffff880019016e80
ffff88007e5a4800
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel: Call Trace:
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel:  [<ffffffff802a89b4>] ? generic_shutdown_super+0x70/0xdd
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel:  [<ffffffff802a8a43>] ? kill_block_super+0x22/0x3a
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel:  [<ffffffff802a8d7f>] ? deactivate_super+0x5f/0x77
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel:  [<ffffffff802ba82a>] ? sys_umount+0x2bc/0x313
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel:  [<ffffffff8020b96b>] ? system_call_fastpath+0x16/0x1b
Message from syslogd@ at Tue Aug  4 15:55:54 2009 ...
cclinux02 kernel: Code: 48 8b 51 40 48 81 c6 88 02 00 00 89 04 24 31 c0 e8 90
79 1f e0 48 8b 6d 00 48 8b 45 00 4c 39 e5 0f 18 08 75 b7 4d 39 24 24 74 04 <0f>
0b eb fe 49 8b bd 38 01 00 00 e8 15 19 28 e0 48 8b bb b0 01


6. ext3 with data=journal, 1024-byte blocksize, on brd (ram0). Reboot failed,
similar to the prior ext3 test (machine had to be powered off and back on).
Kernel error as follows:

EXT3-fs: mounted filesystem with journal data mode.
EXT3 Inode ffff8800775a4978: orphan list check failed!
ffff8800775a4978: 0000186e 00000000 00000000 00000000
ffff8800775a4988: 00000000 00000000 00000000 00000000
ffff8800775a4998: 00000000 00000000 00000000 00000000
ffff8800775a49a8: 00000000 00000000 00000000 00000000
ffff8800775a49b8: 00000000 00000000 00000000 00000000
ffff8800775a49c8: 00000000 00000001 00000000 00000000
ffff8800775a49d8: 00000000 00000000 00000000 00000000
ffff8800775a49e8: 775a49e8 ffff8800 775a49e8 ffff8800
ffff8800775a49f8: ffffffff ffffffff ffffffff ffffffff
ffff8800775a4a08: 6cbd1b98 ffff8800 6cbd1b98 ffff8800
ffff8800775a4a18: 000001a3 00000000 00000000 00000000
ffff8800775a4a28: 00000001 00000000 775a4a30 ffff8800
ffff8800775a4a38: 775a4a30 ffff8800 00000000 00000000
ffff8800775a4a48: 00000000 00000000 00000000 00000000
ffff8800775a4a58: 00100100 00000000 00200200 00000000
ffff8800775a4a68: 775a4a68 ffff8800 775a4a68 ffff8800
ffff8800775a4a78: 775a4a78 ffff8800 775a4a78 ffff8800
ffff8800775a4a88: 0000039a 00000000 00000000 00000001
ffff8800775a4a98: 00000000 00000000 00000000 00000000
ffff8800775a4aa8: 00000001 00000000 000001a3 00000000
ffff8800775a4ab8: 4a78b3bb 00000000 00000000 00000000
ffff8800775a4ac8: 4a78b3a0 00000000 00000000 00000000
ffff8800775a4ad8: 4a78b3b9 00000000 00000000 00000000
ffff8800775a4ae8: 0000000a 00000000 00000002 00000000
ffff8800775a4af8: a1ff0000 00000404 00000001 00000000
ffff8800775a4b08: 775a4b08 ffff8800 775a4b08 ffff8800
ffff8800775a4b18: 00000000 00000000 00000000 00000000
ffff8800775a4b28: 775a4b28 ffff8800 775a4b28 ffff8800
ffff8800775a4b38: a0049220 ffffffff 80cd4840 ffffffff
ffff8800775a4b48: 7dd5e400 ffff8800 00000000 00000000
ffff8800775a4b58: 775a4b60 ffff8800 775a4a48 ffff8800
ffff8800775a4b68: 00000000 00000020 00000000 00000000
ffff8800775a4b78: 00000707 00000000 00000000 00000000
ffff8800775a4b88: 00010001 00000000 775a4b90 ffff8800
ffff8800775a4b98: 775a4b90 ffff8800 00000202 00000002
ffff8800775a4ba8: 00000000 00000000 00000000 00000000
ffff8800775a4bb8: a0048660 ffffffff 001200d2 00000000
ffff8800775a4bc8: 7e0cf258 ffff8800 00000404 00000000
ffff8800775a4bd8: 775a4bd8 ffff8800 775a4bd8 ffff8800
ffff8800775a4be8: 00000000 00000000 00000000 00000000
ffff8800775a4bf8: 00000000 00000000 775a4c00 ffff8800
ffff8800775a4c08: 775a4c00 ffff8800 00000000 00000000
ffff8800775a4c18: 00000000 4ba46971 00000000 00000000
ffff8800775a4c28: 00000000 00000000 775a4c30 ffff8800
ffff8800775a4c38: 775a4c30 ffff8800 00000001 00000000
ffff8800775a4c48: 775a4c48 ffff8800 775a4c48 ffff8800
ffff8800775a4c58: 00000000 00000000 00000040 00000000
ffff8800775a4c68: 0007c20d 00000001 00000000 00000000
ffff8800775a4c78: 00000000 00000000 00000000 00000000
Pid: 7494, comm: umount Not tainted 2.6.30.4 #1
Call Trace:
 [<ffffffffa0041a47>] ? ext3_destroy_inode+0x65/0x78 [ext3]
 [<ffffffff802b7625>] ? dispose_list+0xaa/0xd9
 [<ffffffff802b7db8>] ? invalidate_inodes+0xe5/0x103
 [<ffffffff802a898d>] ? generic_shutdown_super+0x49/0xdd
 [<ffffffff802a8a43>] ? kill_block_super+0x22/0x3a
 [<ffffffff802a8d7f>] ? deactivate_super+0x5f/0x77
 [<ffffffff802ba82a>] ? sys_umount+0x2bc/0x313
 [<ffffffff8020b96b>] ? system_call_fastpath+0x16/0x1b
sb orphan head is 922
sb_info orphan list:
  inode ram0:922 at ffff8800775a4a48: mode 120777, nlink 1, next 0
------------[ cut here ]------------
kernel BUG at fs/ext3/super.c:434!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
CPU 1
Modules linked in: vsock vmmemctl pvscsi dm_mirror dm_multipath scsi_dh video
output sbs sbshc battery acpi_memhotplug lp sg ide_cd_mod cdrom serio_raw
parport_pc floppy parport button rtc_cmos rtc_core ac rtc_lib vmxnet3 vmci
shpchp pcspkr i2c_piix4 i2c_core dm_region_hash dm_log dm_mod ata_piix libata
mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd
ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 7494, comm: umount Not tainted 2.6.30.4 #1 VMware Virtual Platform
RIP: 0010:[<ffffffffa0041e1a>]  [<ffffffffa0041e1a>] ext3_put_super+0x185/0x1ea
[ext3]
RSP: 0018:ffff88006ca2de88  EFLAGS: 00010202
RAX: ffff8800775a4a08 RBX: ffff88006cbd1a00 RCX: 000000000000c6f1
RDX: 0000000000003b3b RSI: 0000000000000046 RDI: ffffffff80842474
RBP: ffff88006cbd1b98 R08: ffff88007dd5e400 R09: ffff88000107b380
R10: ffff88000107b380 R11: 000001020000000e R12: ffff88006cbd1b98
R13: ffff88007dd5e400 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f2872e65760(0000) GS:ffff88000104b000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000030260d3150 CR3: 000000007c7d9000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount (pid: 7494, threadinfo ffff88006ca2c000, task ffff88001b18b4b0)
Stack:
 ffffffff00000000 ffff88007dd5e400 ffffffffa0048d00 0000000000000003
 ffff88007e1817c0 ffffffff802a89b4 ffff880001058b40 ffff88007f42da40
 ffffffffa0052e60 ffffffff802a8a43 ffff88007e1817c0 ffff88007dd5e400
Call Trace:
 [<ffffffff802a89b4>] ? generic_shutdown_super+0x70/0xdd
 [<ffffffff802a8a43>] ? kill_block_super+0x22/0x3a
 [<ffffffff802a8d7f>] ? deactivate_super+0x5f/0x77
 [<ffffffff802ba82a>] ? sys_umount+0x2bc/0x313
 [<ffffffff8020b96b>] ? system_call_fastpath+0x16/0x1b
Code: 48 8b 51 40 48 81 c6 88 02 00 00 89 04 24 31 c0 e8 90 79 1f e0 48 8b 6d
00 48 8b 45 00 4c 39 e5 0f 18 08 75 b7 4d 39 24 24 74 04 <0f> 0b eb fe 49 8b bd
38 01 00 00 e8 15 19 28 e0 48 8b bb b0 01
RIP  [<ffffffffa0041e1a>] ext3_put_super+0x185/0x1ea [ext3]
 RSP <ffff88006ca2de88>
---[ end trace e093690d8588901b ]---
------------[ cut here ]------------
WARNING: at kernel/exit.c:896 do_exit+0x3d/0x5fe()
Hardware name: VMware Virtual Platform
Modules linked in: vsock vmmemctl pvscsi dm_mirror dm_multipath scsi_dh video
output sbs sbshc battery acpi_memhotplug lp sg ide_cd_mod cdrom serio_raw
parport_pc floppy parport button rtc_cmos rtc_core ac rtc_lib vmxnet3 vmci
shpchp pcspkr i2c_piix4 i2c_core dm_region_hash dm_log dm_mod ata_piix libata
mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd
ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 7494, comm: umount Tainted: G      D    2.6.30.4 #1
Call Trace:
 [<ffffffff8023b709>] ? do_exit+0x3d/0x5fe
 [<ffffffff80238b9e>] ? warn_slowpath_common+0x77/0x8e
 [<ffffffff8023b709>] ? do_exit+0x3d/0x5fe
 [<ffffffff802392cf>] ? release_console_sem+0x174/0x18e
 [<ffffffff804bfb7c>] ? oops_end+0xa8/0xad
 [<ffffffff8020ceec>] ? do_invalid_op+0x85/0x8f
 [<ffffffffa0041e1a>] ? ext3_put_super+0x185/0x1ea [ext3]
 [<ffffffff802397e2>] ? printk+0x4e/0x56
 [<ffffffff8020c6d5>] ? invalid_op+0x15/0x20
 [<ffffffffa0041e1a>] ? ext3_put_super+0x185/0x1ea [ext3]
 [<ffffffffa0041e04>] ? ext3_put_super+0x16f/0x1ea [ext3]
 [<ffffffff802a89b4>] ? generic_shutdown_super+0x70/0xdd
 [<ffffffff802a8a43>] ? kill_block_super+0x22/0x3a
 [<ffffffff802a8d7f>] ? deactivate_super+0x5f/0x77
 [<ffffffff802ba82a>] ? sys_umount+0x2bc/0x313
 [<ffffffff8020b96b>] ? system_call_fastpath+0x16/0x1b
---[ end trace e093690d8588901c ]---

Comment 4 Eric Sandeen 2009-08-04 22:39:41 UTC
Devin, thanks for the extra testing.  The "mpage_da_map_blocks" business is ext4 running out of space for delayed allocation.  The big orphan inode dump is related to

http://bugzilla.kernel.org/show_bug.cgi?id=13676

only recently fixed by:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9eaaa2d5759837402ec5eee13b2a97921808c3eb

The most recent kernels I've tested are ok on ext3, and I found a couple mods that I was quite sure had fixed this upstream, but so far I've not got a solution for RHEL5.  But I have a testcase, so it won't be long.  :)

Thanks,
-Eric

Comment 5 Devin Nate 2009-08-04 22:47:22 UTC
Dare I ask, will ext2 be handled by the mods for RHEL5?

(I don't really care... just need to know - this actually impacts a business process, whereby we mount a ramdisk with ext2 because we don't need a journal for the ramdisk ... therefore if ext2 is excluded we'll use ext3 instead).

Thanks,
Devin

Comment 6 Eric Sandeen 2009-08-04 23:12:50 UTC
If ext2 is corrupted by fsstress, yep, we'll fix that too.  I need to get to the bottom of all this, though, to see the root cause and determine whether it's the same issue for ext2 vs. 3 vs. 4.

FWIW, I think that at least some of this is caused by ENOSPC issues; if you test on larger devices, you may find that you don't hit the problem if you don't run out of space, so that may put your mind at ease to some degree for the business process?

-Eric

Comment 7 Devin Nate 2009-08-05 05:34:28 UTC
Hi Eric;

I agree with your comment re: ENOSPC making the test more difficult (I don't know if running out of inodes is also considered ENOSPC, but it further complicates things). It does offer some confidence, plus there's the length of time I've used Linux on ext3 ;) That said, I still look forward to seeing clean stress tests!

Thanks for your involvement, let me know if/when there's anything I can do to help!

Devin

Comment 8 Eric Sandeen 2009-08-05 21:28:04 UTC
Ok, this is a pretty narrow failure case, related to failed direct IO on ext2/3/4.

Fixed by:

commit 0f64415d42760379753e6088787ce3fd3e069509
Author: Dmitri Monakhov <dmonakhov>
Date:   Tue Jan 6 14:40:04 2009 -0800

    fs: truncate blocks outside i_size after O_DIRECT write error
    
    In case of error extending write may have instantiated a few blocks
    outside i_size.  We need to trim these blocks.  We have to do it
    *regardless* to blocksize.  At least ext2, ext3 and reiserfs interpret
    (i_size < biggest block) condition as error.  Fsck will complain about
    wrong i_size.  Then fsck will fix the error by changing i_size according
    to the biggest block.  This is bad because this blocks contain garbage
    from previous write attempt.  And result in data corruption.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0f64415d42760379753e6088787ce3fd3e069509

I don't know if you use direct IO in your workload; if not you should not hit this bug.  If you modify the testcase to invoke fsstress with "-f dwrite=0" it will turn off direct IO writes, and should avoid this bug.  I'll test that too, but if you do find any further corruption cases, please open another bug for that.
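The commit above trims blocks that an extending O_DIRECT write instantiated past i_size before failing. As a rough userspace illustration of the same principle (ours, not the kernel code; function name and failure simulation are hypothetical), the idea is: remember the old size, and if the extending write errors out partway, truncate back so nothing is left allocated past the size you report:

```python
import os
import tempfile

def extending_write(path, data, offset, simulate_enospc=False):
    """Userspace sketch of the fix's principle: an extending write that
    fails must not leave allocated space past the old file size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        old_size = os.fstat(fd).st_size
        try:
            if simulate_enospc:
                # a partial write lands, then the device "runs out of space"
                os.pwrite(fd, data[: len(data) // 2], offset)
                raise OSError(28, "No space left on device")
            os.pwrite(fd, data, offset)
        except OSError:
            os.ftruncate(fd, old_size)  # the fix: trim past the old i_size
            raise
    finally:
        os.close(fd)

path = tempfile.NamedTemporaryFile(delete=False).name
extending_write(path, b"x" * 4096, 0)          # succeeds; file is 4096 bytes
try:
    extending_write(path, b"y" * 4096, 4096, simulate_enospc=True)
except OSError:
    pass
print(os.path.getsize(path))                    # 4096: no garbage past old EOF
os.unlink(path)
```

Without the truncate-on-error step, the half-written garbage would remain past EOF, which is exactly the "i_size is X, should be Y" pattern e2fsck reported above.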

Thanks,
-Eric

Comment 9 Devin Nate 2009-08-05 21:48:07 UTC
Hi Eric;

We don't use direct io at all. We'll rerun tests with the proposed -f dwrite=0 in the next day.

Thanks,
Devin

Comment 10 Devin Nate 2009-08-06 04:50:54 UTC
Test results so far:

On rhel 5.4 kernel, 2.6.18-157:

1. ext2, same test script, no failure on 5 iterations, -f dwrite=0, ram0

2. ext2, same test script, no failure on 5 iterations, -f dwrite=0, loop0

3. ext3, same test script, *failure* on 1st iteration, -f dwrite=0, ram0, default mount options, blocksize=1024

4. ext3, same test script, *failure* on 1st iteration, -f dwrite=0, loop0, default mount options, blocksize=1024

Note: for the above two tests, errors (actual case for one iteration) were similar to:

/dev/loop0: Inode 102, i_size is 663552, should be 666624.  FIXED.
/dev/loop0: Inode 190, i_size is 827392, should be 830464.  FIXED.
/dev/loop0: Inode 370, i_size is 430080, should be 433152.  FIXED.
/dev/loop0: Inode 417, i_size is 1474560, should be 1476608.  FIXED.
/dev/loop0: Inode 469, i_size is 614400, should be 617472.  FIXED.
/dev/loop0: Inode 841, i_size is 921600, should be 923648.  FIXED.
/dev/loop0: Inode 877, i_size is 1990656, should be 1993728.  FIXED.
/dev/loop0: Inode 905, i_size is 1409024, should be 1412096.  FIXED.
/dev/loop0: Inode 914, i_size is 1171789, should be 2091008.  FIXED.
/dev/loop0: Inode 1072, i_size is 1138688, should be 1140736.  FIXED.
/dev/loop0: Inode 1094, i_size is 512000, should be 514048.  FIXED.
/dev/loop0: Inode 1097, i_size is 847872, should be 849920.  FIXED.
/dev/loop0: Inode 1105, i_size is 581632, should be 584704.  FIXED.
/dev/loop0: Inode 1309, i_size is 733184, should be 736256.  FIXED.
/dev/loop0: Inode 1526, i_size is 688128, should be 690176.  FIXED.
/dev/loop0: Inode 1703, i_size is 745472, should be 748544.  FIXED.
/dev/loop0: Inode 1787, i_size is 704512, should be 706560.  FIXED.
/dev/loop0: Inode 1868, i_size is 1544192, should be 1546240.  FIXED.
/dev/loop0: Inode 840, i_size is 258048, should be 260096.  FIXED.
/dev/loop0: Inode 1673, i_size is 77824, should be 80896.  FIXED.
/dev/loop0: Inode 583, i_size is 237568, should be 240640.  FIXED.
/dev/loop0: Inode 2099, i_size is 1708032, should be 1710080.  FIXED.
/dev/loop0: Inode 2234, i_size is 921600, should be 924672.  FIXED.
/dev/loop0: Inode 2547, i_size is 364544, should be 366592.  FIXED.
/dev/loop0: Inode 2717, i_size is 671744, should be 674816.  FIXED.
/dev/loop0: Inode 2722, i_size is 991232, should be 993280.  FIXED.
/dev/loop0: Inode 2724, i_size is 376832, should be 379904.  FIXED.

5. ext3, same test script, *failure* on 2nd iteration, -f dwrite=0, ram0, data=journal mount option. (this seems new).

6. ext3, same test script, *failure* on 1st iteration, -f dwrite=0, loop0, data=journal mount option. (this seems new).

7. For fun, re-ran ext2 tests... no failure after 5 iterations.

8. ext4, same test script, no failure on 5 iterations, -f dwrite=0, ram0, default mount options.

9. For fun re-ran ext3 test, ext3, default mount options, ram0, -f dwrite=0.. *failure* on iteration 1.

10. For fun, re-ran ext3 test, data=journal, ram0, -f dwrite=0, *failure* on iteration 1.

11. For fun, re-ran ext3 test, data=journal, ram0, *without* the -f dwrite=0 fsstress option; did not fail on tests.

12. For fun again, re-ran ext3 tests, data=journal, ram0, -f dwrite=0, *failure* on iteration #2.

13. For fun, re-ran #11, trying to get to the bottom of this: *failure* on iteration 6, albeit with only one line of corrections (instead of many). Output below:

+ e2fsck -fvp /dev/ram0
/dev/ram0: Inode 229, i_size is 913408, should be 915456.  FIXED.

    3501 inodes used (85.47%)
      15 non-contiguous inodes (0.4%)
         # of inodes with ind/dind/tind blocks: 19/5/0
    4730 blocks used (28.87%)
       0 bad blocks
       0 large files

     585 regular files
    1519 directories
    1140 character device files
       0 block device files
       0 fifos
    2473 links
     248 symbolic links (21 fast symbolic links)
       0 sockets
--------
    5965 files

14. Rerunning ext2 tests overnight, so far no errors.. iteration #8.


So, mostly... ext3 now seems to fail consistently, even when I mix tests to validate the test script itself. And the testing is still pretty brief, with only 5 iterations for all other file systems. What's unfortunately "new" is ext3 with data=journal no longer passing. I'd previously run a couple hundred iterations of this (without the -f dwrite=0 option) and it didn't show issues, and now it fails. I've confirmed no changes to packages on this system (only uptime, memory, and other things that change with uptime have changed).

This only changes the result slightly, I guess? Maybe ext2 and ext4 are better (at least they didn't fail in 5 iterations or less). ext3 seems as bad or worse.

All tests ran against rhel 5.4 kernel 2.6.18-157

Same test script, Eric; it should still be easy to duplicate. For what it's worth, I got my copy of fsstress from the LTP website a couple days ago and confirmed that it accepts -f dwrite=0 as an option.

Thanks,
Devin

Comment 11 Eric Sandeen 2009-08-06 05:15:18 UTC
Well crud ;)

There were a few ext3 journaling fixes that I -think- might resolve this, I'll distill another simple testcase & see what's going on.

ext3 shouldn't get -worse- by -not- doing direct IO writes ;)  Maybe you just got unlucky w/ the new mix of ops.

For your loop tests above was the loopback file a file on a real filesystem or tmpfs?

I'm going to focus testing on "real" filesystems first, to rule out potential problems w/ ramdisks and the like, then move on to that if all seems well.

thanks for the detailed testing; sorry for the problems.

Comment 12 Eric Sandeen 2009-08-06 05:55:05 UTC
So far I've got 12 cycles of fsstress with -f dwrite=0 passing on a 16MB "real" disk w/ 4k-block ext3 in writeback mode.

However, the same test on 1k blocks failed on the first run.

There were some journaling fixes lately that may have addressed this; I'll look into this new failure.  It does seem to be ok upstream.  (Just realized I'd been running w/ 4k blocks before, which is a simpler case from the fs perspective on x86.)

I'll likely split this out into a new bug, but will need to get to the root cause before I file it.  If/when I do I'll keep you on cc:

Comment 13 Devin Nate 2009-08-06 10:54:31 UTC
Hi Eric;

I'm glad you're getting similar results to mine. I can also confirm that blocksize is making a difference for me.

The above loop tests were on tmpfs, so I'll add testing of actual disk-based file systems. Normally we don't use the loop system; I was just trying it as an alternative to the brd-based ram0 system. I also propose we use a common seed value, which I suspect governs the behavior of fsstress, or communicate that value as part of the results. It's one possible explanation of why/how my prior ext3 tests went so well before but so poorly through today's iterations. The seed value is -s num. In a completely unoriginal way, I'm starting with a seed value of 1 ;)

Hopefully with a set seed value, you and I should get identical results in identical cycles.

For reference, my fsstress command is currently:
SEED=1
fsstress -f dwrite=0 -d /mnt/test_file_system/work -p 3 -l 0 -n 100000000 -X -s $SEED
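
As a sketch, the command above can be wrapped in a small per-seed driver (the helper names here are made up; the path and option values are copied from this comment):

```python
import subprocess

def fsstress_cmd(seed, workdir="/mnt/test_file_system/work"):
    """Build the fsstress argument list for one cycle with a fixed seed."""
    return ["fsstress", "-f", "dwrite=0", "-d", workdir,
            "-p", "3", "-l", "0", "-n", "100000000", "-X",
            "-s", str(seed)]

def run_cycle(seed):
    """Run one fsstress cycle; returns the process exit status."""
    return subprocess.run(fsstress_cmd(seed)).returncode
```

With the seed fixed per cycle, two machines running the same cycle number should see the same sequence of operations, which makes results comparable across testers.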

No worries about the problems... at this point it's an interesting challenge for me ;)

Thanks... off to set up a small disk in the machine now. At least virtualization makes that trivial.

Devin

Comment 14 Eric Sandeen 2009-08-06 17:11:27 UTC
Ok, similar problem for buffered writes.

commit 5ec8b75e3a2a94860ee99b5456fe1a963c8680e5
Author: Aneesh Kumar K.V <aneesh.kumar.ibm.com>
Date:   Sat Oct 18 20:28:00 2008 -0700

    ext3: truncate block allocated on a failed ext3_write_begin
    
    For blocksize < pagesize we need to remove blocks that got allocated in
    block_write_begin() if we fail with ENOSPC for later blocks.
    block_write_begin() internally does this if it allocated page locally.
    This makes sure we don't have blocks outside inode.i_size during ENOSPC.

No good way to avoid this in your fsstress testing; it takes an ext3 kernel patch to make the change.  But in my 1k-block testing of a 16MB ext3 fs w/ data=ordered, I survived 30+ runs.

These 2 fixes are more or less the same issue for different types of writes; I'm happy to just keep them in this bug.

And they're both related to ENOSPC; I don't know if that's any comfort to you in your workload (whether you are likely to encounter ENOSPC or can avoid it).

Comment 15 Eric Sandeen 2009-08-06 20:56:31 UTC
I've built a kernel on a private branch with these 2 fixes if you'd like to test.  Do you need xen variants or not?  (I built x86_64)

-Eric

Comment 16 Devin Nate 2009-08-06 22:42:15 UTC
Hi Eric;

I don't need xen, just standard x86_64. How do I get them?

Comment 17 Eric Sandeen 2009-08-06 23:04:02 UTC
Binary kernel is at http://people.redhat.com/esandeen/bz515529/

I can upload src.rpm and kernel-devel and whatever else if you need them, just trying to stay under quota.

-Eric

Comment 18 Eric Sandeen 2009-08-06 23:10:19 UTC
Created attachment 356597 [details]
Proposed patch

This is the patch used to build the kernel in the previous comment.

Comment 19 Devin Nate 2009-08-07 20:28:15 UTC
Hi Eric;

I'm running a touch behind. I'll be testing this weekend or monday.

I ran a huge number of iterations (a couple thousand) against kernel 2.6.18-157 and ext3 data=ordered with 4096 block size, on a ram disk, and saw no problems (per your note that 4096-byte blocks look good). I can confirm this.

ext4 threw a problem after a few hundred cycles.

Haven't touched ext2 for a bit.

I'll review asap, which is hopefully this weekend.

Thanks, and talk soon.
Devin

Comment 20 Devin Nate 2009-08-07 20:28:42 UTC
Oh, regarding my previous post: I have not tried your patches yet; I'm running behind and haven't got to them.

Comment 21 Devin Nate 2009-08-08 15:38:45 UTC
Hi Eric;

I'm starting testing now, thanks for your work. One question: in your Comment #4 above, you identified a patch and also some ext4 items, but I don't see those changes in your patch file. Are they being handled as well?

Thanks,
Devin

Comment 22 Eric Sandeen 2009-08-08 16:12:22 UTC
Devin, no that one's not in yet.  It actually affects ext3 as well, but I hadn't seen the problem yet in my testing on RHEL5, so was holding off for a bit.  The patch you're asking about depends on yet another change (695f6ae0dcea3dd83bfbb9634ff067f780649ba8 at least), so I'm just being a bit conservative - I don't think you'll hit a problem though with this testcase.  If you do, you can try running again w/o the symlink op in fsstress to verify.

Thanks,
-Eric

Comment 23 Devin Nate 2009-08-08 16:23:20 UTC
That's fine, Eric; just wanted to make sure we didn't forget something by accident.

At this point, if I had to guess (based on your feedback) ext2 and ext3 will pass 1000 cycles with block sizes of 1024, 2048, and 4096, and with any data= flag for ext3, on ram0 or a real device.

I have tests running now on a ramdisk on the kernel you linked to this bug. I'm logging each cycle and all parameters used to a log file, so even if I lose my mind after thousands of cycles I'll know what actually happened ;)

I'll post results. Thanks,
Devin

Comment 24 Devin Nate 2009-08-08 16:50:41 UTC
Oh, and in testing, I am no longer including -f dwrite=0, as the O_DIRECT fixes appear to be included in your patchset.

Thanks,
Devin

Comment 25 Devin Nate 2009-08-08 17:38:38 UTC
Oh yeah... any idea when this kernel will become production? There are two key benefits in this kernel: the new brd ramdisk and these filesystem fixes.

Comment 26 Devin Nate 2009-08-09 15:59:37 UTC
Ok, at 1000 test cycles per configuration, it takes quite a while (about 31,000 seconds, to be exact).

Command being run:

fsstress -d /mnt/test_file_system/work -p 3 -l 0 -n 100000000 -X -s $i
(where $i is cycle number)

I started with the most common case, ext3 with default options and 4096 block size, and I'd say it passed. I've got ext2 running now, and plan to do ext4 next. Then I'll get into block sizes and different mount options.

Results (so far):
completed cycle 1000; device=/dev/ram0; fstype=ext3; blocksize=4096; mountopts=

Thanks,
Devin

Comment 27 Eric Sandeen 2009-08-09 18:35:55 UTC
re: comment #25

The 162 kernel is part of the stream for RHEL5.4; it's a post-beta kernel.

As to when the fixes identified in this bug will be available, it'd be best to lobby w/ your support folks for that; it's too late to make these changes in the initial 5.4 release, at least.

Comment 28 Devin Nate 2009-08-09 19:16:44 UTC
Hi Eric;
I'm ok with that. As long as the fixes make it in at some stage. I plan to take testing through ext2, ext3, ext4, with block sizes of 1024, 2048, and 4096. I'm not sure how I'll work data= options in, as a comprehensive test would do all combinations.

My only question for you is... is that worth my effort? My goal is not just to address this bug report but to provide a solid fsstress testcase for Red Hat and for the Linux kernel (and I believe you are connected with the kernel developers), especially for ext4, which is coming.

If that's not true I won't bother. Thoughts?

Thanks,
Devin

Comment 29 Eric Sandeen 2009-08-09 20:09:13 UTC
Hi Devin -

My advice would be to test as far as you need to ensure that things are working, robust, and solid for your usecase.  Anything else is going well above and beyond the call of duty for a bug reporter.  :)  I will be sure that we get better test coverage of these sorts of issues in internal testing.

I have been working on extending an existing test suite to do more tests on "generic" posix filesystems, and this is something that would be good to add.

Thanks,
-Eric

Comment 30 Devin Nate 2009-08-18 02:01:37 UTC
Ok, well... I can do more if you like; however, I'll accept that you're putting together a test platform for this.

Anyhow,
ext3 passed just fine, block size 4096, as you saw.

ext2 had an error during the fsck stage. I was able to duplicate it once, but not again. The error was strange: there was nothing on the screen or in dmesg, just an exit code of 1 from fsck. Anyhow, I cannot duplicate it again, and last time it took 932 cycles to reach the error. I'm not sure I care that much about ext2.
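
For reference, e2fsck's exit status is a bitmask (per the e2fsck(8) man page), so an exit code of 1 means errors were found and corrected, which would indicate the filesystem did have something wrong. A small decoder, as a sketch:

```python
# Decode e2fsck's bitmask exit status; messages per the e2fsck(8) man page.
E2FSCK_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}

def decode_e2fsck(status):
    """Return the list of conditions encoded in an e2fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [msg for bit, msg in E2FSCK_BITS.items() if status & bit]
```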

Haven't started ext4.

I'll keep you posted, thanks.
Devin

Comment 31 Yehia.Adham 2009-09-10 17:53:35 UTC
Hello,

A few hours ago, one of my servers encountered an error with ext4. The filesystem is on top of LVM and a 2-disk array. The array and all the drives in it report optimal status, so I don't suspect a hardware problem. I was able to reboot into single-user mode, manually fsck the filesystem, and get it back running, but I don't think that addressed the root cause; this started after we upgraded to RHEL 5.4, kernel version 2.6.18-164.el5. (I believe LVM is not the problem, as we are experiencing the same error on 2 boxes not running over LVM.)

kernel: mpage_da_map_blocks block allocation failed for inode 1843701 at logical offset 0 with max blocks 339 with error -122
Message from syslogd@ at Mon Sep 7 18:03:59 2009 ...
cairoserver kernel: This should not happen.!! Data will be lost

Please advise.

Thank You!
Yehia

Comment 32 Eric Sandeen 2009-09-10 18:26:58 UTC
Error -122 is EDQUOT /* Quota exceeded */

Do you have quotas in use?

In short, what happened was that delayed-allocation data found no place to go when it came time to allocate and flush, presumably due to the quota issue.
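
Kernel messages like the one above report errno as a negative number; the mapping can be checked with Python's errno module (a sketch, with Linux errno values; the helper name is made up):

```python
import errno
import os

def describe_kernel_error(err):
    """Map a (possibly negative) kernel errno to its symbolic name and text.

    Kernel log lines like "error -122" use negated errno values; on Linux,
    122 is EDQUOT ("Disk quota exceeded").
    """
    code = abs(err)
    return errno.errorcode.get(code, "unknown"), os.strerror(code)
```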

This is likely a different root cause than this bug was opened for; I'd appreciate it if you could file a new bug for it, and also preferably run this through your RHEL support contacts.

In any new bug, please let me know a bit about your quota usage etc.

Thanks,
-Eric

Comment 33 procaccia 2009-09-14 13:25:07 UTC
I have the same problem with RHEL 5.4 and the latest kernel, 2.6.18-164.el5.

Message from syslogd@ at Mon Sep 14 15:11:03 2009 ...
gizeh kernel: This should not happen.!! Data will be lost
Message from syslogd@ at Mon Sep 14 15:11:03 2009 ...
gizeh kernel: mpage_da_map_blocks block allocation failed for inode 409873 at logical offset 0 with max blocks 8 with error -122

I use ext4 with LVM and quota
As suggested by Eric Sandeen, I've opened a new bug:

https://bugzilla.redhat.com/show_bug.cgi?id=523201

Comment 34 Eric Sandeen 2009-09-14 21:33:45 UTC
sent to rhkernel-list for RHEL5.5 on 9/14/2009

Comment 35 RHEL Program Management 2009-09-25 17:37:06 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 37 Don Zickus 2009-11-17 21:56:33 UTC
in kernel-2.6.18-174.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 42 errata-xmlrpc 2010-03-30 07:42:10 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
