Bug 717635
Summary: | JFFS2 oopses on mount, complaining about "BUG at fs/jffs2/nodelist.c:644" | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Nikolai Zhubr <n-a-zhubr> | ||||||
Component: | kernel | Assignee: | David Woodhouse <dwmw2> | ||||||
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 15 | CC: | daniel-gl, dwmw2, gansalmon, itamar, jonathan, kernel-maint, linzhenhua1205, madhu.chinakonda, n-a-zhubr, shanevolpe | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | i686 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2012-06-06 15:38:18 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
I suspect you've just got a corrupted filesystem. Can you reproduce this error consistently with a new filesystem on different media? (In reply to comment #1) > I suspect you've just got a corrupted filesystem. Can you reproduce this error > consistently with a new filesystem on different media? Yes, I'll try to reproduce it from a clean state, though it might require numerous experiments and probably will not be quick. Anyway, I think this is no excuse for the oops, because the embedded system is never guaranteed to be shut down cleanly and is never guaranteed to have 100% fail-free MTD memory, so essentially if JFFS required a consistent state of filesystem image in order to just not oops on mount then it would make just no sense. There is even no such thing as fsck.jffs2 - AFAICS jffs is supposed to address the issues internally. You are correct; there is no excuse for the crash. I'll take a look... Created attachment 511016 [details]
Complete self-contained example
This example targets fedora 15 live cd/stick, but it is mostly generic (except for "yum install mtd-utils" in runme0.sh which should be replaced as necessary). How to use: extract contents to any writeable directory, then run runme0.sh, runme1.sh, runme2.sh
What it does:
1. Creates temporary jffs2 image (runme0.sh)
2. Loads this image to ram mtd emulator and performs some writes to filesystem, then unmounts (runme1.sh)
3. Attempts to mount it again - with oops (runme2.sh)
So far I've got more info:
1. The problem is unrelated of sudden poweroff or broken memory. Attachment 511016 [details] shows how to reproduce all stages on mostly any random desktop system (preferrably running fedora 15 live cd/stick)
2. The problem seems to go away if filesystem is created with cleanmarkers (that is, --no-cleanmarkers option removed from runme0.sh)
I'm also seeing this bug with the 2.6.39 kernel on an ARM (xscale) target device. Target: PXA270 FLASH (128MB): NOR Intel P30 RAM (64MB): Linux Version: 2.6.39.2 Kernel Dump: [<c0028884>] (__bug+0x1c/0x28) from [<c010fdbc>] (jffs2_link_node_ref+0x80/0x1b4) [<c010fdbc>] (jffs2_link_node_ref+0x80/0x1b4) from [<c011db58>] (jffs2_sum_scan_sumnode+0x2a4/0x6e4) [<c011db58>] (jffs2_sum_scan_sumnode+0x2a4/0x6e4) from [<c0115390>] (jffs2_scan_medium+0x34c/0x1444) [<c0115390>] (jffs2_scan_medium+0x34c/0x1444) from [<c011857c>] (jffs2_do_mount_fs+0x16c/0x580) [<c011857c>] (jffs2_do_mount_fs+0x16c/0x580) from [<c011a968>] (jffs2_do_fill_super+0x14c/0x238) [<c011a968>] (jffs2_do_fill_super+0x14c/0x238) from [<c0195af4>] (mount_mtd_aux.clone.2+0x50/0xe4) [<c0195af4>] (mount_mtd_aux.clone.2+0x50/0xe4) from [<c0195c94>] (mount_mtd+0x10c/0x13c) [<c0195c94>] (mount_mtd+0x10c/0x13c) from [<c011abdc>] (jffs2_mount+0x10/0x18) [<c011abdc>] (jffs2_mount+0x10/0x18) from [<c009a100>] (mount_fs+0x10/0xc4) [<c009a100>] (mount_fs+0x10/0xc4) from [<c00b1144>] (vfs_kern_mount+0x4c/0x8c) [<c00b1144>] (vfs_kern_mount+0x4c/0x8c) from [<c00b12e0>] (do_kern_mount+0x34/0xcc) [<c00b12e0>] (do_kern_mount+0x34/0xcc) from [<c00b2aec>] (do_mount+0x614/0x68c) [<c00b2aec>] (do_mount+0x614/0x68c) from [<c00b2d70>] (sys_mount+0x84/0xc4) [<c00b2d70>] (sys_mount+0x84/0xc4) from [<c0008bec>] (mount_block_root+0xe8/0x284) [<c0008bec>] (mount_block_root+0xe8/0x284) from [<c0009040>] (prepare_namespace+0x160/0x1b4) [<c0009040>] (prepare_namespace+0x160/0x1b4) from [<c00089c4>] (kernel_init+0x100/0x13c) [<c00089c4>] (kernel_init+0x100/0x13c) from [<c00268b4>] (kernel_thread_exit+0x0/0x8) Code: e59f0010 e1a01003 eb0a64b3 e3a03000 (e5833000) If I disable jffs2 summery support in the kernel after which the kernel stopped dumping when mounting the rootfs. FYI: Although my kernel has summery support enabled I did not run sumtool on the rootfs.jffs2 before I flashed it. Aha, I did not run sumtool too. I'll try with summary support disabled as soon as I get the board back. Thanks for the information. I've also just noticed that summary support is marked as "EXPERIMENTAL" (though usually enabled) in kernel configuration, which probably confirms that it might cause some problems. I have isolated the problem down to 1 sector in my jffs2 file system. I can copy that sector, re-flash the board and copy the sector back in recreating the failure. I've used jffs2dump to view the sector and used the verbose mode so I could look at the summery table as well. Below (end of email) is *part* of the offending summery table. The address that is causing the OPPs on my system is: 0x00024cc4. What is interesting is that 0x00024cc4 is the only value in my summery table with multiple entries. The last 0x00024cc4 entry, however, is the correct entry as I'm assuming it should be if it is some sort of logging summery table. Here is the actual 0x0024cc4 inode header from the main flash (also used jffs2dump to get it): Inode node at 0x00024cc4, totlen 0x000000d6, #ino 5501, version 32, isize 217, csize 146, dsize 169, offset 48 as you can see from looking at the summery table the last entry for 0x00024cc4 is: #ino 5501,version 32, offset 0x00024cc4, totlen 0x000000d6 which matches the inode header at 0x00024cc4. Is there documentation somewhere on the jffs2 summery table other than the kernel source code? I've been learning how it works a bit by browsing and using printks which is showing to be a slow process so any help/feedback from someone who is more knowledgeable on the Linux jffs2 code would be greatly appreciated. Could it be that the multiple entries in the summery table for the same offset are not being properly handled by the kernel? ***Summery Table (Not the complete table, just what I think is interesting) *** #ino 5449, version 53, offset 0x000244a8, totlen 0x00000044 #ino 5449, version 54, offset 0x000244ec, totlen 0x00000072 #ino 5449, version 55, offset 0x00024560, totlen 0x00000078 #ino 5449, version 56, offset 0x000245d8, totlen 0x0000006c #ino 5449, version 57, offset 0x00024644, totlen 0x00000044 #ino 5449, version 58, offset 0x00024688, totlen 0x00000072 #ino 5449, version 59, offset 0x000246fc, totlen 0x00000078 #ino 5449, version 60, offset 0x00024774, totlen 0x0000006d #ino 5449, version 61, offset 0x000247e4, totlen 0x00000044 #ino 5449, version 62, offset 0x00024828, totlen 0x00000072 #ino 5449, version 63, offset 0x0002489c, totlen 0x00000078 #ino 5449, version 64, offset 0x00024914, totlen 0x0000006d #ino 5449, version 65, offset 0x00024984, totlen 0x00000044 #ino 5449, version 66, offset 0x000249c8, totlen 0x00000072 #ino 5449, version 67, offset 0x00024a3c, totlen 0x00000078 #ino 5449, version 68, offset 0x00024ab4, totlen 0x0000006d #ino 5449, version 69, offset 0x00024b24, totlen 0x00000044 #ino 5449, version 70, offset 0x00024b68, totlen 0x00000072 #ino 5449, version 71, offset 0x00024bdc, totlen 0x00000078 #ino 5449, version 72, offset 0x00024c54, totlen 0x0000006d #ino 5502, version 1, offset 0x00024cc4, totlen 0x00000044 #ino 5502, version 1, offset 0x00024cc4, totlen 0x00000044 #ino 164, version 8, offset 0x00024cc4, totlen 0x00000046 #ino 164, version 8, offset 0x00024cc4, totlen 0x00000046 #ino 164, version 9, offset 0x00024cc4, totlen 0x00000046 #ino 164, version 9, offset 0x00024cc4, totlen 0x00000046 #ino 5501, version 4, offset 0x00024cc4, totlen 0x0000004a #ino 5501, version 5, offset 0x00024cc4, totlen 0x0000004a #ino 5501, version 6, offset 0x00024cc4, totlen 0x0000005d #ino 5501, version 7, offset 0x00024cc4, totlen 0x0000005d #ino 5501, version 8, offset 0x00024cc4, totlen 0x00000064 #ino 5501, version 9, offset 0x00024cc4, totlen 0x00000064 #ino 5501, version 10, offset 0x00024cc4, totlen 0x00000056 #ino 5501, version 11, offset 0x00024cc4, totlen 0x00000056 #ino 5501, version 12, offset 0x00024cc4, totlen 0x0000004a #ino 5501, version 13, offset 0x00024cc4, totlen 0x0000004a #ino 5501, version 14, offset 0x00024cc4, totlen 0x0000004d #ino 5501, version 15, offset 0x00024cc4, totlen 0x0000004d #ino 5501, version 16, offset 0x00024cc4, totlen 0x0000004d #ino 5501, version 17, offset 0x00024cc4, totlen 0x0000004d #ino 5501, version 18, offset 0x00024cc4, totlen 0x00000069 #ino 5501, version 19, offset 0x00024cc4, totlen 0x00000069 #ino 5501, version 20, offset 0x00024cc4, totlen 0x0000004d #ino 5501, version 21, offset 0x00024cc4, totlen 0x0000004d #ino 5501, version 22, offset 0x00024cc4, totlen 0x0000004a #ino 5501, version 23, offset 0x00024cc4, totlen 0x0000004a #ino 5501, version 24, offset 0x00024cc4, totlen 0x000000d9 #ino 5501, version 25, offset 0x00024cc4, totlen 0x000000d9 #ino 526, version 2, offset 0x00024cc4, totlen 0x0000019f #ino 526, version 3, offset 0x00024cc4, totlen 0x0000019f #ino 5501, version 26, offset 0x00024cc4, totlen 0x000000d8 #ino 5501, version 27, offset 0x00024cc4, totlen 0x000000d8 #ino 526, version 4, offset 0x00024cc4, totlen 0x0000019f #ino 526, version 5, offset 0x00024cc4, totlen 0x0000019f #ino 5501, version 28, offset 0x00024cc4, totlen 0x000000d6 #ino 5501, version 29, offset 0x00024cc4, totlen 0x000000d6 #ino 526, version 6, offset 0x00024cc4, totlen 0x0000019f #ino 526, version 7, offset 0x00024cc4, totlen 0x0000019f #ino 5503, version 1, offset 0x00024cc4, totlen 0x00000044 #ino 5503, version 1, offset 0x00024cc4, totlen 0x00000044 #ino 5501, version 30, offset 0x00024cc4, totlen 0x00000066 #ino 5501, version 31, offset 0x00024cc4, totlen 0x00000066 #ino 5501, version 32, offset 0x00024cc4, totlen 0x000000d6 #ino 526, version 8, offset 0x00024d9c, totlen 0x0000019f #ino 5504, version 1, offset 0x00024f3c, totlen 0x00000044 This problem occurs at the following codition: When process the summary node, it has duplicate "node summary entry"(ps: Every physical raw node, such as data node/ dirent node, has summary info in the summary node, I call it "node summary entry" in the following) refence to the same phycal node, and add the node ref to the system twice, trigger the BUG(). The detail process flow like this: 1) When the jffs2 file system is mounted, it would scan the whole medium by scaning each (erase) block; 2) During proccessing an (erase) block (impl in the jffs2_scan_eraseblock() function, which called from jffs2_scan_medium() ), it first procedures the summary node in this block, and then the whole block if necessary; 3) If this block exists the valid summary info, the mounting procedure will parse and fetch the node info from the summary node, this is impl in the function jffs2_sum_scan_sumnode() and jffs2_sum_process_sum_data(); 4) IF finds a valid node, it will link node ref into the system by the node type. (in the following path: jffs2_sum_process_sum_data() ==> sum_link_node_ref() ==> jffs2_link_node_ref() ). Actually, In mostly case, this Phenomenon can be ignored by commit out the code "BUG();" in the jffs2_link_node_ref(), and let it go on. The problem maybe disapper in the furture if this block is expropriated by the GC. But it have risk, it may detory the file system in the next (or next ...) mouting. I dream up two solutions to deal this problem: 1) disable the summary feature; 2) Avoid the duplicate "node summary entry" flows into the (raw) summary node, by confilt check in the jffs2_sum_write_data() function, which call from jffs2_sum_write_sumnode(); and if necessary, we can also ingore the duplicate "node summary entry" when the summary node process in the jffs2_sum_process_sum_data() function. (In reply to comment #5) > 2. The problem seems to go away if filesystem is created with cleanmarkers > (that is, --no-cleanmarkers option removed from runme0.sh) This problem is different from the one in comment #10. The problem is jffs2_sum_write_data setting cln_mkr to nonzero even for erase blocks without cleanmarker. This happens when filling the last, incomplete block of an image generated with --no-cleanmarkers. On mount jffs2_sum_scan_sumnode adds a node of cln_mkr bytes at the beginning of the the block, which then conflicts with the first entry read from the summary. |
Created attachment 510454 [details] The jffs2 image causing kernel oops (compressed file) Description of problem: JFFS2 oopses when trying to mount some specific JFFS2 image, complaining about "BUG at fs/jffs2/nodelist.c:644" (see log below). The image in question (see attached file) is a result of just normal filesystem usage on an embedded ARM system with stock 2.6.36.4 kernel. BTW, this image oopses the same way on the said ARM system too. Version-Release number of selected component (if applicable): 2.6.38.6-26.rc1.fc15.i686 How reproducible: Always Steps to Reproduce: 1. modprobe mtdblock 2. modprobe jffs2 3. modprobe mtdram total_size=2048 erase_size=256 4. dd if=rootfs_data_save.img of=/dev/mtdblock0 bs=512 5. mkdir /mnt/tmp 6. mount -t jffs2 /dev/mtdblock0 /mnt/tmp Actual results: [ 0.000000] Linux version 2.6.38.6-26.rc1.fc15.i686 (mockbuild.fedoraproject.org) (gcc version 4.6.0 20110428 (Red Hat 4.6.0-6) (GCC) ) #1 SMP Mon May 9 20:43:14 UTC 2011 ................trim some (uninteresting)...................... [ 430.092040] JFFS2 error: (1729) jffs2_link_node_ref: Adding new ref ebe6c008 at (0x00000000-0x00000034) not immediately after previous (0x00000000-0x00000000) [ 430.094589] ------------[ cut here ]------------ [ 430.095641] kernel BUG at fs/jffs2/nodelist.c:644! [ 430.095641] invalid opcode: 0000 [#1] SMP [ 430.095641] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map [ 430.095641] Modules linked in: mtdchar mtdram jffs2 zlib_deflate mtdblock mtd_blkdevs fuse cpufreq_ondemand 8021q garp stp acpi_cpufreq llc mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables arc4 snd_hda_codec_si3054 snd_hda_codec_realtek gspca_zc3xx gspca_main snd_hda_intel videodev snd_hda_codec snd_hwdep ath5k snd_seq btusb snd_seq_device r852 sm_common nand bluetooth ath nand_ids snd_pcm nand_ecc mtd mac80211 snd_timer snd cfg80211 microcode r8169 serio_raw iTCO_wdt joydev soundcore iTCO_vendor_support mii snd_page_alloc asus_laptop sparse_keymap rfkill ipv6 squashfs vfat fat sdhci_pci sdhci firewire_ohci mmc_core firewire_core crc_itu_t usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan] [ 430.095641] [ 430.095641] Pid: 1729, comm: mount Not tainted 2.6.38.6-26.rc1.fc15.i686 #1 ASUSTeK Computer Inc. A8Le /A8Le [ 430.095641] EIP: 0060:[<f8b0c3ab>] EFLAGS: 00010296 CPU: 1 [ 430.095641] EIP is at jffs2_link_node_ref+0xc9/0x115 [jffs2] [ 430.095641] EAX: 000000a8 EBX: f6810000 ECX: 00000046 EDX: 00000000 [ 430.095641] ESI: ebe6c008 EDI: 00000034 EBP: ebde1d18 ESP: ebde1ce4 [ 430.095641] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 430.095641] Process mount (pid: 1729, ti=ebde0000 task=f6810000 task.ti=ebde0000) [ 430.095641] Stack: [ 430.095641] f8b1b1cf 000006c1 f8b1aa10 ebe6c008 00000000 00000034 00000000 00000000 [ 430.095641] eeff0c00 00000000 ed85c000 eeff0c00 00000000 ebde1d3c f8b19664 00000034 [ 430.095641] edb2a3c0 00000000 eeff0c00 f9954d28 edb2a3c0 0000000c ebde1d90 f8b19ebd [ 430.095641] Call Trace: [ 430.095641] [<f8b19664>] sum_link_node_ref+0x54/0x5c [jffs2] [ 430.095641] [<f8b19ebd>] jffs2_sum_scan_sumnode+0x291/0x57d [jffs2] [ 430.095641] [<c045c921>] ? clockevents_program_event+0xbc/0xce [ 430.095641] [<f8b103ad>] jffs2_scan_medium+0x2ea/0x118c [jffs2] [ 430.095641] [<f8b196e6>] ? jffs2_sum_init+0x7a/0xc7 [jffs2] [ 430.095641] [<f8b12c4f>] ? jffs2_do_mount_fs+0x3c/0x43d [jffs2] [ 430.095641] [<f8b12db2>] jffs2_do_mount_fs+0x19f/0x43d [jffs2] [ 430.095641] [<c04d88c3>] ? __kmalloc+0xe3/0xed [ 430.095641] [<f8b14aaa>] jffs2_do_fill_super+0x125/0x213 [jffs2] [ 430.095641] [<f8b14efd>] ? jffs2_fill_super+0x2b/0xe1 [jffs2] [ 430.095641] [<f8b14fad>] jffs2_fill_super+0xdb/0xe1 [jffs2] [ 430.095641] [<f8df5adb>] mount_mtd_aux+0x46/0x8d [mtd] [ 430.095641] [<f8b14ed2>] ? jffs2_fill_super+0x0/0xe1 [jffs2] [ 430.095641] [<f8df5bed>] mount_mtd+0xcb/0x132 [mtd] [ 430.095641] [<f8b14ed2>] ? jffs2_fill_super+0x0/0xe1 [jffs2] [ 430.095641] [<f8b14cfd>] ? jffs2_mount+0x0/0x24 [jffs2] [ 430.095641] [<f8b14d1c>] jffs2_mount+0x1f/0x24 [jffs2] [ 430.095641] [<f8b14ed2>] ? jffs2_fill_super+0x0/0xe1 [jffs2] [ 430.095641] [<c04e62a5>] vfs_kern_mount+0x98/0x1c2 [ 430.095641] [<c04f71c9>] ? get_fs_type+0x38/0x91 [ 430.095641] [<c04e6422>] do_kern_mount+0x39/0xb5 [ 430.095641] [<c04f9868>] do_mount+0x5ec/0x620 [ 430.095641] [<c04bc831>] ? strndup_user+0x2e/0x3f [ 430.095641] [<c04f9ac3>] sys_mount+0x6d/0x99 [ 430.095641] [<c07d66b4>] syscall_call+0x7/0xb [ 430.095641] Code: fc 01 c8 01 d7 89 4c 24 18 89 7c 24 14 89 54 24 10 89 44 24 1c 8b 83 00 02 00 00 c7 04 24 cf b1 b1 f8 89 44 24 04 e8 55 27 cc c7 <0f> 0b 85 d2 89 73 2c 74 0a 8b 4a 04 89 0e 89 72 04 eb 06 c7 06 [ 430.095641] EIP: [<f8b0c3ab>] jffs2_link_node_ref+0xc9/0x115 [jffs2] SS:ESP 0068:ebde1ce4 [ 430.096168] ---[ end trace 6d38df10002c39b4 ]--- Expected results: JFFS2 filesystem image mounted at /mnt/tmp and available for normal use. Additional info: It might be important that the erase-block size is 256K. This size is used in real MTD memory of the abovementioned embedded ARM system.