We have 33 VMs hosted on a VMware ESXi cluster. They ran fine for around six months under Fedora-12. They have become unstable since I upgraded them (using yum) to Fedora-14. Two other VMs installed from scratch have started to show the same symptoms, so I don't think the problem is related to the upgrade process. We have more than a hundred workstations and laptops installed the same way with Fedora-14 that do not have this problem.

The symptoms may appear after a rather long time, anywhere between one day and a week.

Example 1: loop of kernel oops

The most serious one: the VM loops on a kernel oops and is no longer accessible. We have to force a reboot. A manual fsck is then sometimes needed. The oops most often shows calls to system_call_fastpath and ext4_file_write (or nfs3_decode_dirent), but not always.

Example 2: uptime and top segfault in libproc

[601941.287198] uptime[4348]: segfault at 42410073 ip \
    00000035fbe0a001 sp 00007fff569236e0 error 6 \
    in libproc-3.2.8.so[35fbe00000+e000]

rpm -V confirms a corruption in libproc:

rpm -Vf /lib64/libproc-3.2.8.so
prelink: /lib64/libproc-3.2.8.so: prelinked file was modified
S.?...... /lib64/libproc-3.2.8.so

After rebooting this is solved.

Example 3: /var/log/messages showing EXT4-fs error

Like:

EXT4-fs error (device sda2): ext4_lookup: inode #923158: \
    (comm find) deleted inode referenced: 923185
EXT4-fs error (device sda2): ext4_ext_check_inode: inode #209183: \
    (comm find) bad header/extent: invalid magic - magic 0, entries 0, \
    max 0(0), depth 0(0)

We walk the filesystem with find every night.

Any advice on how to investigate this further is welcome. I plan to reconfigure half of those VMs to use EXT3 instead of EXT4. Do you think that is a worthwhile test?

Thanks.
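To compare affected VMs, it may help to tally the filesystem errors per device, kernel function, and inode. The following is only a minimal sketch: the regular expression is an assumption based on the two EXT4-fs lines quoted above, and `summarize_fs_errors` is a hypothetical helper name, not part of any existing tool.

```python
import re
from collections import Counter

# Pattern modelled on the messages quoted in this report, e.g.
#   EXT4-fs error (device sda2): ext4_lookup: inode #923158: ...
# The inode part is optional because not every error names one.
EXT_ERROR = re.compile(
    r"EXT[34]-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+):(?: inode #(?P<inode>\d+))?"
)


def summarize_fs_errors(lines):
    """Count EXT3/EXT4 error lines by (device, function, inode)."""
    counts = Counter()
    for line in lines:
        match = EXT_ERROR.search(line)
        if match is None:
            continue  # not a filesystem error line
        inode = match.group("inode")
        key = (
            match.group("device"),
            match.group("function"),
            int(inode) if inode is not None else None,
        )
        counts[key] += 1
    return counts


# Sample input taken from the messages in this report (joined onto one line).
sample = [
    "EXT4-fs error (device sda2): ext4_lookup: inode #923158: "
    "(comm find) deleted inode referenced: 923185",
    "EXT4-fs error (device sda2): ext4_ext_check_inode: inode #209183: "
    "(comm find) bad header/extent: invalid magic - magic 0, entries 0, "
    "max 0(0), depth 0(0)",
]
for key, count in summarize_fs_errors(sample).items():
    print(key, count)
```

On a real system the same function could be fed the lines of /var/log/messages, for example `summarize_fs_errors(open("/var/log/messages", errors="replace"))`, assuming the file is readable by the user running the script.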
I observed Example 3 on one of my machines too. In my environment, the inode in question belongs to a broken directory that has been deleted but still appears in its parent directory.
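To find which directory entry still refers to an inode number reported in such a message, one option is to walk the tree and compare the inode stored in each directory entry. This is a rough sketch under stated assumptions: `find_entries_by_inode` is a hypothetical helper, and it relies on `os.scandir`, whose `DirEntry.inode()` on Unix reads the number from the directory entry itself, so an entry pointing at a deleted inode can still be listed even if a stat on it would fail. Inode numbers are unique only within one filesystem, so the sketch does not descend into other mounts.

```python
import os


def find_entries_by_inode(root, inode):
    """Return paths under root whose directory entry carries this inode number."""
    matches = []
    root_dev = os.lstat(root).st_dev
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            with os.scandir(directory) as entries:
                for entry in entries:
                    # inode() comes from the directory entry on Unix,
                    # so no stat of the (possibly broken) target is needed.
                    if entry.inode() == inode:
                        matches.append(entry.path)
                    try:
                        if (entry.is_dir(follow_symlinks=False)
                                and entry.stat(follow_symlinks=False).st_dev == root_dev):
                            stack.append(entry.path)
                    except OSError:
                        # Entry cannot be examined (e.g. I/O error); skip descending.
                        pass
        except OSError:
            # Directory itself is unreadable; move on to the next one.
            continue
    return matches
```

For the first message in the report this would be called as `find_entries_by_inode("/", 923185)`, assuming `/` is the mount point of sda2. It is roughly what `find / -xdev -inum 923185` does, and like find it only reads the filesystem; it does not repair anything.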
> I plan to reconfigure half of those VMs to use EXT3 instead of EXT4.

I did that and saw an "EXT3-fs error in htree_dirblock_to_tree: bad entry in directory" once on one VM, so the problem was not specific to EXT4.

I rebooted all of them at the beginning of May after a full "yum update", which included the 2.6.35.12-90.fc14 kernel, and the problem seems solved.

You can close this bug.