| Summary: | EXT4-fs error and kernel oops in VMs hosted by VMware ESXi | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Francis.Montagnac |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 14 | CC: | colyli, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-06-25 15:30:43 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Regarding Example 3: I have observed this on one of my machines too. In my environment, the file with the reported inode number is a broken directory, which was deleted but still appears in its parent directory.

> I plan to reconfigure half of those VMs to use EXT3 instead of EXT4.

I did that and saw an "EXT3-fs error in htree_dirblock_to_tree:
bad entry in directory" once on one VM, so the problem was not specific to EXT4.
I rebooted all of them at the beginning of May after a full
"yum update", which included the 2.6.35.12-90.fc14 kernel, and the problem
seems solved.
You can close this bug.
We have 33 VMs hosted by a VMware ESXi cluster. They ran fine for around six months under Fedora-12 and became unstable after I upgraded them (using yum) to Fedora-14. Two other VMs installed from scratch have started to show the same symptoms, so I don't think it is related to the upgrade process. We have more than a hundred stations and laptops installed the same way with Fedora-14 that do not have this problem.

The symptoms may appear after a rather long time, anywhere between one day and a week.

Example 1: loop of kernel oops

The most serious one: the VM loops on a kernel oops and is no longer accessible. We have to force a reboot. A manual fsck is then sometimes needed. The oops most often shows calls to system_call_fastpath and ext4_file_write (or nfs3_decode_dirent), but not always.

Example 2: uptime and top segfault in libproc

    [601941.287198] uptime[4348]: segfault at 42410073 ip \
    00000035fbe0a001 sp 00007fff569236e0 error 6 \
    in libproc-3.2.8.so[35fbe00000+e000]

rpm -V confirms a corruption in libproc:

    rpm -Vf /lib64/libproc-3.2.8.so
    prelink: /lib64/libproc-3.2.8.so: prelinked file was modified
    S.?...... /lib64/libproc-3.2.8.so

After rebooting, this is solved.

Example 3: /var/log/messages showing EXT4-fs error

Like:

    EXT4-fs error (device sda2): ext4_lookup: inode #923158: \
    (comm find) deleted inode referenced: 923185
    EXT4-fs error (device sda2): ext4_ext_check_inode: inode #209183: \
    (comm find) bad header/extent: invalid magic - magic 0, entries 0, \
    max 0(0), depth 0(0)

We walk the filesystem with find every night.

Any advice on how to investigate this further is welcome. I plan to reconfigure half of those VMs to use EXT3 instead of EXT4. Do you think that is a valuable test? Thanks.
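
To compare occurrences across many VMs, it may help to pull the key fields out of the log lines quoted above. The following is only a minimal sketch in Python: the regular expressions are modelled on the exact lines in this report and may not match messages from other kernel versions, and the function names (`join_continuations`, `describe_fault`, `segfault_offsets`, `extfs_errors`) are hypothetical helpers, not part of any existing tool. The fault description assumes the standard x86 page-fault error-code bits (bit 0: protection violation, bit 1: write, bit 2: user mode).

```python
import re

# Patterns modelled only on the log lines quoted in this report (assumption:
# other kernel versions may format these messages differently).
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)
EXTFS_RE = re.compile(
    r"(?P<fs>EXT[34])-fs error \(device (?P<dev>[^)]+)\): "
    r"(?P<func>\w+): inode #(?P<inode>\d+):"
)


def join_continuations(text):
    """Join lines that end with a backslash, as wrapped in the report."""
    return re.sub(r"\s*\\\s*\n\s*", " ", text)


def describe_fault(code):
    """Decode the low three bits of an x86 page-fault error code."""
    mode = "user" if code & 4 else "kernel"
    access = "write" if code & 2 else "read"
    cause = "protection violation" if code & 1 else "page not present"
    return f"{mode}-mode {access}, {cause}"


def segfault_offsets(text):
    """Return (command, object, offset of ip inside the object, fault type)."""
    results = []
    for m in SEGFAULT_RE.finditer(join_continuations(text)):
        offset = int(m["ip"], 16) - int(m["base"], 16)
        results.append(
            (m["comm"], m["obj"], hex(offset), describe_fault(int(m["err"])))
        )
    return results


def extfs_errors(text):
    """Return (filesystem, device, kernel function, inode) per error line."""
    return [
        (m["fs"], m["dev"], m["func"], int(m["inode"]))
        for m in EXTFS_RE.finditer(join_continuations(text))
    ]
```

For the segfault line in Example 2, the instruction pointer 0x35fbe0a001 minus the mapping base 0x35fbe00000 gives offset 0xa001, which lies inside the 0xe000-byte libproc mapping, and error code 6 decodes as a user-mode write to a page that is not present. Collecting the same offset from several VMs would show whether the faulting location in libproc is always the same.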