Red Hat Bugzilla – Bug 221621
FC6 host reliably hangs when VMware guest writes to shared host xfs filesystem
Last modified: 2007-11-30 17:11:52 EST
Description of problem:
The whole host computer freezes and requires a power cycle when a VMware guest
writes to a shared host xfs filesystem.
Version-Release number of selected component (if applicable):
How reproducible:
Have a VMware guest write to an xfs filesystem.
Steps to Reproduce:
1. Boot a Windows 2000 guest in VMware. Give it 256MB RAM on a host system that
has 512MB RAM.
2. Load a large Excel spreadsheet that lives on the host's xfs filesystem and is
shared via samba.
3. Change values in the spreadsheet and let the spreadsheet update for a while.
Eventually the auto save feature kicks in and tries to write out the
spreadsheet. The host computer immediately freezes.
Expected results:
The auto save feature succeeds in saving the spreadsheet and neither the host
nor the guest crashes.
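For context, the repro above shares a host xfs directory to the Windows guest via samba. A hypothetical minimal smb.conf share along those lines might look like this (share name, path, and user are placeholders, not taken from the original report):

```ini
; Hypothetical example share for the repro: an xfs-backed directory
; on the host, writable by the Windows guest over samba.
[xfsshare]
    path = /mnt/xfs/share
    read only = no
    valid users = someuser
```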
The bug occurs with VMware Workstation 5.5.3 as well as 6.0 Beta, and with VMware Player
This bug does *not* require the xfs filesystem to be on an encrypted volume.
However, there is some chance it may be related to bug 221619.
Created attachment 144917 [details]
Output of "top" command during the FC6 host crash.
I just began rerunning the "steps to reproduce" above and this time the host
system froze up before the guest even finished booting. At that time, the only
thing being written to was the guest's virtual disk, which is stored on the
host's xfs filesystem. Therefore, apparently any write by the guest to the
host's xfs filesystem has the potential to freeze the host system. However, the
freeze on boot only happened at about the fifth attempt. All the other times, I
had to do something complicated that forced several writes (the spreadsheet update).
Created attachment 144919 [details]
The output of the "top" command when the host system was crashing during a guest boot.
Can you test whether this is specific to xfs by testing it with an ext3
filesystem for example?
Unfortunately, I didn't try this bug on any other filesystem like I did with the
possibly related bug #221619.
I no longer have the same system configuration since I gave up on FC6 and went
back to FC5.
Thanks David (I missed the fact that you had filed both of these bugs)
I'll see if I can find some time to reproduce one or the other, see if it's
unique to xfs or not.
This bug began with FC6, but I just reproduced it in F7 test1 as well.
I can confirm that the bug is *not* specific to the XFS file system. I can
produce identical results with an XP Guest running on an x86_64 system and
everything is EXT3, (except the guest, which believes it is writing [shudder] to
an NTFS file system). This bug ranges across all the latest x86_64 kernels as
well. I'm running dual 285 Opterons, with 4 GB of RAM, and the guest has 1GB.
The hangs are random, from my point of view, but everyone I've corresponded with
about this seems to believe that it is related to high levels of disk I/O.
Usually, nothing is written to any logs, either from VMWare or the host, about
panics or the like because the machine gets too thoroughly frozen, too quickly.
For what it might be worth, RHEL4 has no issues at all, that I've seen, with the
same basic setup. (At work, on RHEL4 it works fine, at home on FC6 it wedges
the machine _totally_ after a little while.)
I can confirm that the bug does not involve interaction with X-Windows.
Running only a small FreeDOS VM, with my server at run level 3 (logging in
remotely, using VMWare's Remote Console utility), the machine fell off the
network and appears from the outside to be locked.
The physical console for the server shows a scrolling array of messages along
the lines of:
mptscsih: ioc0: task abort: SUCCESS (sc=ffff810011176ec3c0)
mptscsih: ioc0: attempting task abort! (sc=ffff810078123080)
hda: lost interrupt
mptscsih: ioc0: task abort: SUCCESS (sc=ffff810078123080)
(this was recorded by hand, as carefully as I could given the speed at which it
was scrolling by, so if there are typos or if I got some punctuation wrong don't
put too much stock in that.)
The machine would not answer the keyboard or any network login attempts, nothing
but a power off seemed to be able to get its attention again.
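Since the machine freezes too quickly for anything to reach the on-disk logs, one way to capture messages like the ones above is to stream kernel output to a second machine with netconsole. A hedged sketch, not from the original report; the IPs, port numbers, interface name, and MAC address are placeholders:

```shell
# Hypothetical example: send kernel messages over UDP to another machine,
# so a hard freeze still leaves a trace somewhere.
# Syntax: netconsole=src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
modprobe netconsole netconsole=6665@192.168.1.10/eth0,6666@192.168.1.20/00:11:22:33:44:55

# On the receiving machine (GNU netcat syntax; BSD netcat drops the -p):
nc -u -l -p 6666
```

A serial console, if the hardware has one, serves the same purpose.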
The kernel that it is running at this time is 2.6.19-1.2911.6.5 x86_64 SMP.
At the time it went south I was in fact creating a large compressed tar file of
the shut-down XP Virtual Machine. (I was trying to make a backup against the
chance that everything would crash and leave me with a VM I couldn't boot.)
FWIW, the VM is on a logical volume partition on a SATA drive, and the tar file
was being written to a logical volume partition on a SCSI drive.
I have what I believe to be the same problem running release 2.6.19-1.2288.2.4.fc5
At random times while trying to install win2k, the machine locks up tight,
requiring a hard reboot. Once, this occurred when presumably (but not
definitely) there was no disk access. I looked down to verify the key I had
typed, looked up, and the host system was locked. Presumably the VM was
doing nothing, as it was awaiting input from me...
I think this is likely to be something that vmware needs to fix rather than a
kernel bug.
Please do not close this bug without verifying whether it is related to bug
221619, which has nothing to do with VMware.
This may or may not be related, but I have started seeing hangs on an FC5 system
running 2.6.20-1.2300.fc5.i686 that started about the same time I switched to
the 2300 kernel.
When the problem happens processes start hanging, but the system stays up. It will
respond to ping packets, but generally all of the network services end up hung.
Unfortunately I can't easily touch the machine when this has happened which limits
how much I can look at when this has happened. Resets have worked (as opposed to
powering the box down) to clear up the problem. When I did have an ssh open
when this occurred, I was able to run ps and see a lot of hung processes.
Eventually the ssh process locked up and I couldn't recover without a reboot.
I am running the same kernel on two other machines and have not seen a problem
on either of them yet. They don't get as much use.
The one that is locking up less than a day after rebooting is using ext3
file systems with write barriers enabled on top of raid 1 (using md devices).
On one of the other machines, I have a similar set up, but the write barriers
are failing (I am not sure why) and are getting disabled. That predates the
2300 kernel, but I don't think it always did that. I haven't been concerned
enough about that problem to spend time digging into it.
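For reference, ext3 write barriers are controlled per mount. A hedged sketch for checking whether barriers are active; the device and mount point below are hypothetical, not from this report:

```shell
# Hypothetical example: mount an ext3 filesystem on an md RAID1 device
# with write barriers requested explicitly.
mount -o barrier=1 /dev/md0 /mnt/data

# If the underlying device cannot honor barriers, the kernel disables them
# and logs something along the lines of:
#   JBD: barrier-based sync failed on md0 - disabling barriers
dmesg | grep -i barrier
```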
I fell back to 2.6.19-1.2288.2.4.fc5 and didn't see the problem recur overnight,
which is longer than I was typically getting when using 2300.
I just stumbled across this bug and thought you may want to try disabling
selinux to see if is related to bug 212201. I don't know if this would be of any
help to anyone.
Thanks, but I was aware of bug 212201 and have been running with selinux
disabled. Also, Karl Lewis above reproduced this bug on ext3, so it is not
xfs-specific as bug 212201 is.
Did you try the vmware-any-any-update108 update?
Good suggestion. I had tried vmware-any-any-update105 a while back, but not 108.
Unfortunately, I can confirm that this bug still exists. Here is my latest
configuration:
kernel 2.6.20-1.2933.fc6 with selinux disabled
VMware Server 1.0.2
Without vmware-any-any-update108: fails within two minutes
With vmware-any-any-update108: fails within one hour
The failure mode has not changed. The entire system freezes and requires a
power cycle. Sysrq does not work.
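One hedged sanity check before concluding SysRq is dead during a hang: magic SysRq is a sysctl and may be disabled by default on some installs. A sketch (run as root):

```shell
# Hypothetical check: confirm the magic SysRq key is enabled at all.
cat /proc/sys/kernel/sysrq        # 0 means SysRq is disabled

# Enable all SysRq functions for the current boot.
echo 1 > /proc/sys/kernel/sysrq
```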
This is not a duplicate of bug 221619 since that bug is fixed as of
2.6.20-1.2933.fc6. They may be related; the failure mode is identical and unusual.
I think you really need to report this problem to vmware...
I have reported this to vmware, but they refuse to look at it. Also, since it
is so similar to the other bug, it looks more likely that it is in FC6.
(In reply to comment #20)
> I have reported this to vmware, but they refuse to look at it. Also, since it
> is so similar to the other bug, it looks more likely that it is in FC6.
Why did vmware refuse to look at it?
RE: VMware Support Request SR# 367153
DO NOT CHANGE THE SUBJECT LINE if you want to respond to this email.
Thank you for your Support Request.
VMware makes a point to support the greatest variety of host operating
systems in the virtualization industry.
However, Fedora Core is not supported at this time.
I have the same issue on a machine almost identical to Karl W. Lewis (comments
above). I'm running 2.6.20-1.2925.
However, I also have another machine (t42p laptop) that I have been running this
exact same XP VM on for over 20 hours now. I have done extensive disk access
both from and to this VM. So far, it has not exhibited this lock-up behavior.
Could this possibly be related to dual-core?
Unfortunately, the machine on which it fails for me (a Thinkpad T30) has an old
single-core processor.
The T42p is still running without issue. I've been banging on it pretty hard too.
One thing that is different between it and the machine that has the issue: I am
not running "desktop effects" on it (ie. compiz).
Anyone experiencing the problem that has desktop effects disabled?
It sounds like you are talking about desktop effects being on the host rather
than the guest, but Karl Lewis has shown above that the bug is independent of
X-Windows on the host. Also, I am not using desktop effects and it fails for me.
Going back over the posts, the same VM can succeed on one machine and fail on
another. However, two very similar machines, a Thinkpad T30 and a Thinkpad
T42p, show different behavior. Comparing their hardware, I don't see any
differences that should affect VMware (noting that X-Windows graphics has
already been ruled out).
I have experienced the failure on the T30 from 2.6.18-1.2869.fc6 through the
latest kernels.
I tested with the latest kernel, 2.6.20-1.2933.fc6, and it still fails. I
grabbed a generic 184.108.40.206 from kernel.org, and the VMs have both, (WinXP and
FreeDOS), been running for more than 12 hours, which is unheard of, heretofore.
I will try walking forward, slowly, through the generic kernels to see where
the problem starts.
A generic 220.127.116.11 from kernel.org provided no issues after 4 or 5 days. I'm
trying 18.104.22.168 also from kernel.org. (8.5 hours so far, no crash.)
I did try 22.214.171.124 from kernel.org, and that froze up first thing with the VM
running.
The 126.96.36.199 plain vanilla kernel is perfectly stable, that is to say that the
VM guest and this kernel have been co-resident in memory for days in a row now
with no issues at all. Others on the VMWare forum have led me to believe that
stepping up to a 2.6.19 kernel will crash the system.
This may or may not be relevant, but I was struck by comment #23 above. I'm
running FC6 kernel 2.6.18-1.2798 and have just switched from dual single-core
Opteron 240s to dual dual-core Opteron 265s. Nothing else has changed.
I had no problems on the single-core processors, but on the dual-core a WinXP
guest under VMware Workstation 5.5.4 hangs intermittently when accessing the
host's EXT3 disks via Samba.
I have the same (or a similar) problem with 2 Opteron 246s (not dual core)
running 2.6.20-1.2952.fc6 - I can't say for sure if it is a samba problem or
just a heavy disk IO/network IO issue, but I can reproduce a total server
lockup (black screen) running several torrents through VMWare Workstation 5.5
and 6. The only testing I have done so far does involve samba activity between
the VMWare machine and the server.
The latest news I have is that VMWare Workstation v6 fixes the problem. So even
though VMWare hasn't talked about it they seem to have found a way to make
VMWare work with kernels later than 2.6.18.
Ok, interesting. If another reporter or two can confirm, let's close this then.
FWIW, I've been running VMWare Workstation v6 on an Intel Dual Core Processor
under Fedora 8, (started with test 3), for two or three weeks now with no issues
at all. The guest is Windblows XP. The kernel is 2.6.23-something. As I say,
no issues running the guest for a week at a time. VMWare Server 2.0 beta has
just come out and I'll try to test that out on my dualie and see if that works
with a current kernel.
I've now upgraded to VMWare Workstation v6. Nothing else has changed (see
comment #31). The problem has indeed been fixed.