Description of problem:
The whole host computer freezes and requires a power cycle when a VMware guest writes to a shared host xfs filesystem.

Version-Release number of selected component (if applicable):
2.6.18-1.2869.fc6

How reproducible:
Have a VMware guest write to an xfs filesystem.

Steps to Reproduce:
1. Boot a Windows 2000 guest in VMware. Give it 256MB RAM on a host system that has 512MB RAM.
2. Load a large Excel spreadsheet that lives on the host's xfs filesystem and is shared via samba.
3. Change values in the spreadsheet and let the spreadsheet update for a while.

Actual results:
Eventually the auto save feature kicks in and tries to write out the spreadsheet. The host computer immediately freezes.

Expected results:
The auto save feature succeeds in saving the spreadsheet and neither the host nor the guest crashes.

Additional info:
This works for VMware Workstation 5.5.3 as well as 6.0 Beta, and VMware Player 1.0.3. This bug does *not* require the xfs filesystem to be on an encrypted volume. However, there is some chance it may be related to bug 221619.
Created attachment 144917: Output of "top" command during the FC6 host crash.
I just began rerunning the "steps to reproduce" above and this time the host system froze up before the guest even finished booting. At that time, the only thing being written to was the guest's virtual disk, which is stored on the host's xfs filesystem. Therefore, apparently any write by the guest to the host's xfs filesystem has the potential to freeze the host system. However, the freeze on boot only happened at about the fifth attempt. All the other times, I had to do something complicated that forced several writes (the spreadsheet update).
Created attachment 144919: The output of the "top" command when the host system was crashing during a guest boot.
Can you test whether this is specific to xfs by testing it with an ext3 filesystem for example? Thanks, -Eric
Unfortunately, I didn't try this bug on any other filesystem like I did with the possibly related bug #221619. I no longer have the same system configuration since I gave up on FC6 and went back to FC5.
Thanks David (I missed the fact that you had filed both of these bugs) I'll see if I can find some time to reproduce one or the other, see if it's unique to xfs or not. Thanks, -Eric
This bug began with FC6, but I just reproduced it in F7 test1 as well.
I can confirm that the bug is *not* specific to the XFS file system. I can produce identical results with an XP guest running on an x86_64 system where everything is EXT3 (except the guest, which believes it is writing [shudder] to an NTFS file system). This bug ranges across all the latest x86_64 kernels as well. I'm running dual 285 Opterons with 4 GB of RAM, and the guest has 1GB. The hangs are random, from my point of view, but everyone I've corresponded with about this seems to believe that it is related to high levels of disk I/O. Usually, nothing is written to any logs, either from VMWare or the host, about panics or the like, because the machine gets too thoroughly frozen, too quickly. For what it might be worth, RHEL4 has no issues at all, that I've seen, with the same basic setup. (At work, on RHEL4, it works fine; at home, on FC6, it wedges the machine _totally_ after a little while.) KWL
I can confirm that the bug does not involve interaction with X-Windows. Running only a small FreeDos VM, with my server at run level 3 (logging in remotely, using VMWare's Remote Console utility), the machine fell off the network and appears from the outside to be locked. The physical console for the server shows a scrolling array of messages along the lines of:

mptscsih: ioc0: task abort: SUCCESS (sc=ffff810011176ec3c0)
mptscsih: ioc0: attempting task abort! (sc=ffff810078123080)
hda lost interupt
mptscsih: ioc0: task abort: SUCCESS (sc=ffff810078123080)

(This was recorded by hand, as carefully as I could given the speed at which it was scrolling by, so if there are typos or if I got some punctuation wrong, don't put too much stock in that.)

The machine would not answer the keyboard or any network login attempts; nothing but a power off seemed to be able to get its attention again. The kernel that it is running at this time is 2.6.19-1.2911.6.5 x86_64 SMP. At the time it went south I was in fact creating a large compressed tar file of the shutdown XP Virtual Machine. (I was trying to make a backup against the chance that everything would crash and leave me with a VM I couldn't boot.) FWIW, the VM is on a logical volume partition on a SATA drive, and the tar file was being written to a logical volume partition on a SCSI drive. KWL
I have what I believe to be the same problem running release 2.6.19-1.2288.2.4.fc5. At random times while trying to install win2k, the machine locks up tight, requiring a hard reboot. Once, this occurred when presumably (but not definitely) there was no disk access. I looked down to verify the key I had typed in, looked up, and the host system was locked. Presumably the VM was doing nothing, as it was awaiting input from me...
I think this is likely to be something that vmware need to fix rather than a kernel bug.
Please do not close this bug without verifying whether it is related to bug 221619, which has nothing to do with VMware.
This may or may not be related, but I have started seeing hangs on an FC5 system running 2.6.20-1.2300.fc5.i686.rpm, beginning at about the same time I switched to that kernel. When the problem happens, processes start hanging, but the system stays up. It will respond to ping packets, but generally all of the network services end up hung. Unfortunately, I can't easily touch the machine when this has happened, which limits how much I can look at. Resets have worked (as opposed to powering the box down) to clear up the problem. When I did have an ssh session open when this occurred, I was able to run ps and see a lot of hung processes. Eventually the ssh process locked up and I couldn't recover without a reboot. I am running the same kernel on two other machines and have not seen a problem on either of them yet. They don't get as much use. The one that is locking up less than a day after rebooting is using ext3 file systems with write barriers enabled on top of raid 1 (using md devices). On one of the other machines, I have a similar setup, but the write barriers are failing (I am not sure why) and are getting disabled. That predates the 2300 kernel, but I don't think it always did that. I haven't been concerned enough about that problem to spend time digging into it.
I fell back to 2.6.19-1.2288.2.4.fc5 and didn't see the problem reoccur overnight, which is longer than I was typically getting when using 2300.
I just stumbled across this bug and thought you may want to try disabling selinux to see if it is related to bug 212201. I don't know if this would be of any help to anyone. Good luck.
Thanks, but I was aware of bug 212201 and have been running with selinux disabled. Also, Karl Lewis above reproduced this bug on ext3, so it is not xfs-specific as bug 212201 is.
Did you try the update: http://knihovny.cvut.cz/ftp/pub/vmware/vmware-any-any-update108.tar.gz
Good suggestion. I had tried vmware-any-any-update105 a while back, but not 108. Unfortunately, I can confirm that this bug still exists. Here is my latest configuration.

kernel 2.6.20-1.2933.fc6 with selinux disabled
VMware Server 1.0.2
Without vmware-any-any-update108: fails within two minutes
With vmware-any-any-update108: fails within one hour

The failure mode has not changed. The entire system freezes and requires a power cycle. Sysrq does not work. This is not a duplicate of bug 221619 since that bug is fixed as of 2.6.20-1.2933.fc6. They may be related; the failure mode is identical and unusual.
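A note on "Sysrq does not work": the thread does not say whether magic SysRq was enabled on the affected host, so one thing worth ruling out is the value of /proc/sys/kernel/sysrq. The sketch below decodes that value using the bitmask meanings as I understand them from the kernel's sysrq documentation; the function name `describe_sysrq` is made up for this illustration, and the table should be checked against the documentation for the kernel actually in use. Even with every function enabled, a hard hang that stops interrupt handling can still leave SysRq unresponsive, so an enabled value does not contradict the report above.

```python
from pathlib import Path
from typing import List

# Bit meanings as described in the kernel's sysrq documentation
# (an assumption to verify against the running kernel's docs).
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def describe_sysrq(value: int) -> List[str]:
    """Translate a kernel.sysrq value into a list of enabled functions."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    # Any other value is treated as a bitmask of allowed functions.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


if __name__ == "__main__":
    sysrq_file = Path("/proc/sys/kernel/sysrq")
    if sysrq_file.exists():
        current = int(sysrq_file.read_text().strip())
        print(f"kernel.sysrq = {current}")
        for line in describe_sysrq(current):
            print(f"  {line}")
    else:
        print("No /proc/sys/kernel/sysrq here (not a Linux host?)")
```

If the value turns out to be 0, writing 1 to the same file as root enables all SysRq functions until the next reboot, which would make a repeat of the freeze test more informative.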
I think you really need to report this problem to vmware...
I have reported this to vmware, but they refuse to look at it. Also, since it is so similar to the other bug, it looks more likely that it is in FC6.
(In reply to comment #20)
> I have reported this to vmware, but they refuse to look at it. Also, since it
> is so similar to the other bug, it looks more likely that it is in FC6.

Why did vmware refuse to look at it?
RE: VMware Support Request SR# 367153
DO NOT CHANGE THE SUBJECT LINE if you want to respond to this email.

Dear David,

Thank you for your Support Request. VMware makes a point to support the greatest variety of host operating systems in the virtualization industry. However, Fedora Core is not supported at this time.
I have the same issue on a machine almost identical to Karl W. Lewis's (comments above). I'm running 2.6.20-1.2925. However, I also have another machine (a T42p laptop) that I have been running this exact same XP VM on for over 20 hours now. I have done extensive disk access both from and to this VM. So far, it has not exhibited this lock-up behavior. Could this possibly be related to dual-core?
Unfortunately, the machine on which it fails for me (a Thinkpad T30) has an old single-core processor.
The T42p is still running without issue. I've been banging on it pretty hard too. One thing that is different between it and the machine that has the issue: I am not running "desktop effects" on it (ie. compiz). Anyone experiencing the problem that has desktop effects disabled?
It sounds like you are talking about desktop effects being on the host rather than the guest, but Karl Lewis has shown above that the bug is independent of X-Windows on the host. Also, I am not using desktop effects and it fails for me.
Going back over the posts, the same VM can succeed on one machine and fail on another. However, two very similar machines, a Thinkpad T30 and a Thinkpad T42p, show different behavior. Comparing their hardware, I don't see any differences that should affect VMware (noting that X-Windows graphics has already been ruled out).

T30: ftp://ftp.software.ibm.com/pc/pccbbs/mobiles_pdf/92p1840.pdf
T42p: ftp://ftp.software.ibm.com/pc/pccbbs/mobiles_pdf/13n6243.pdf

I have experienced the failure on the T30 from 2.6.18-1.2869.fc6 through 2.6.20-1.2933.fc6. Strange.
I tested with the latest kernel, 2.6.20-1.2933.fc6, and it still fails. I grabbed a generic 2.6.17.1 from kernel.org, and the VMs have both, (WinXP and FreeDOS), been running for more than 12 hours, which is unheard of, heretofore. I will try walking forward, slowly, through the generic kernels to see if the problem resurfaces. KWL
A generic 2.6.17.1 from kernel.org provided no issues after 4 or 5 days. I'm trying 2.6.18.1 also from kernel.org. (8.5 hours so far, no crash.) I did try 2.6.20.4 from kernel.org, and that froze up first thing with the VM running. KWL
The 2.6.18.1 plain vanilla kernel is perfectly stable; that is to say, the VM guest and this kernel have been co-resident in memory for days in a row now with no issues at all. Others on the VMWare forum have led me to believe that stepping up to a 2.6.19 kernel will crash the system. KWL
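For anyone continuing the walk through vanilla kernels, the search for the first failing release can be organised as a bisection instead of a linear walk. The following is only a sketch: `first_bad`, `CANDIDATES`, and `reported_result` are names invented for this example, and `reported_result` simply replays what has been reported in this thread so far (2.6.17.1 and 2.6.18.1 stable, 2.6.20.4 freezing, and 2.6.19 failing only according to second-hand reports from the VMware forum). A real run would replace it with "boot this kernel, run the VM, and see whether the host freezes".

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version in `versions` for which is_bad() is True.

    Assumes `versions` is ordered oldest to newest, the first entry is
    known good, the last entry is known bad, and that once a version is
    bad every later version is bad too.
    """
    good, bad = 0, len(versions) - 1  # indices of known-good / known-bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid
        else:
            good = mid
    return versions[bad]


# Hypothetical candidate list built from the kernels named in this thread.
CANDIDATES = ["2.6.17.1", "2.6.18.1", "2.6.19", "2.6.20.4"]


def reported_result(version: str) -> bool:
    """Stand-in for a real test run, using results reported in the thread.

    The 2.6.19 entry is hearsay, not something tested here.
    """
    return version in {"2.6.19", "2.6.20.4"}


if __name__ == "__main__":
    print(first_bad(CANDIDATES, reported_result))
```

With finer-grained candidates (release candidates, or individual commits via `git bisect` on a kernel tree), the same loop narrows the regression in a logarithmic number of boots, which matters when each test can take hours before the freeze shows up.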
This may or may not be relevant, but I was struck by comment #23 above. I'm running FC6 kernel 2.6.18-1.2798 and have just switched from dual single-core Opteron 240s to dual dual-core Opteron 265s. Nothing else has changed. I had no problems on the single-core processors, but on the dual-core a WinXP guest under VMware Workstation 5.5.4 hangs intermittently when accessing the host's EXT3 disks via Samba.
I have the same (or a similar) problem with 2 Opteron 246s (not dual core) running 2.6.20-1.2952.fc6. I can't say for sure if it is a samba problem or just a heavy disk IO/network IO issue, but I can reproduce a total server lockup (black screen) running several torrents through VMWare Workstation 5.5 and 6. The only testing I have done so far does involve samba activity between the VMWare machine and the server.
The latest news I have is that VMWare Workstation v6 fixes the problem. So even though VMWare hasn't talked about it they seem to have found a way to make VMWare work with kernels later than 2.6.18. KWL
Ok, interesting. If another reporter or two can confirm, let's close this then, I guess... -Eric
FWIW, I've been running VMWare Workstation v6 on an Intel Dual Core Processor under Fedora 8, (started with test 3), for two or three weeks now with no issues at all. The guest is Windblows XP. The kernel is 2.6.23-something. As I say, no issues running the guest for a week at a time. VMWare Server 2.0 beta has just come out and I'll try to test that out on my dualie and see if that works with a current kernel. KWL
I've now upgraded to VMWare Workstation v6. Nothing else has changed (see comment #31). The problem has indeed been fixed.