Bug 1284969 - Corruption of Windows 2012 R2 VM filesystems [NEEDINFO]
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: virtio-win
Version: 6.6
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: pre-dev-freeze
Assigned To: Vadim Rozenfeld
QA Contact: Virtualization Bugs
Blocks: 1200970 1269194
Reported: 2015-11-24 10:01 EST by Olimp Bockowski
Modified: 2016-09-28 05:45 EDT
Doc Type: Bug Fix
Last Closed: 2016-08-24 23:30:02 EDT
Type: Bug
obockows: needinfo? (avinashsau)


Description Olimp Bockowski 2015-11-24 10:01:26 EST
Description of problem:
Serious recurring problem with corruption of the filesystem of Windows VMs. 

There are some doubts:
1. Write cache enabled, according to Microsoft:
Microsoft analysed this issue and pointed out many warnings like this one in the logs:
"The driver detected that the device \Device\Harddisk0\DR0 has its write cache enabled. Data corruption may occur."

But all virtual machines have cache=none for qemu-kvm.
Anyway, Microsoft wrote:
"I suggest you to check the drivers/storage settings for the write cache. The drivers settings are in the registry, you should be able to inspect the settings booting with Command prompt and loading the SYSTEM hive in RegEdit."
Could these driver settings override the cache=none setting?

2. FUA is also a suspect; see closed bug 837324.

That bug seems to have been fixed in virtio-win-1.7.4-1.el7.noarch.rpm.
The driver version is DriverVer=06/04/2014,60.71.104.8600 and the fix comes from this commit:
commit d7d34714f216cc3291c14427820e781eeb2d5668
Author: Vadim Rozenfeld <vrozenfe@redhat.com>
Date: Sun May 4 15:56:36 2014 +1000
[viostor] Bug 837324 - [virtio-win][viostor]viostor reports support for FUA, but does not implement it.

The concern is:
"This driver version is a bit old and we have found out that Microsoft has updated the viostor.sys driver version in Windows Update from version 06/04/2014,60.71.104.8600 to 22/09/2015 62.72.104.11000 We don't know for sure what are the changes/fixes implemented. Please, Could you investigate this further?"


Version-Release number of selected component (if applicable):


rhevh 6.6 
vdsm-4.16.13.1-1.el6ev.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6_6.3.x86_64
rhevm-3.5.5-0.1.el6ev.noarch

How reproducible:

No pattern. So far we have observed it only on Windows 2012 R2 virtual machines.


Actual results:
random corruption of virtual machines.

Expected results:
no corruption

Additional info:
The hosts use Emulex LPe12000 HBAs to access storage over FC. In this case there is NO corruption of metadata, even at the VM LV level. The corruption is present only at the OS FS level.
Comment 2 Kevin Wolf 2015-11-24 10:55:07 EST
(In reply to Olimp Bockowski from comment #0)
> But all virtual machines have cache=none for qemu-kvm.
> Anyway, Microsoft wrote:
> "I suggest you to check the drivers/storage settings for the write cache.
> The drivers settings are in the registry, you should be able to inspect the
> settings booting with Command prompt and loading the SYSTEM hive in RegEdit."
> Could these driver settings override the cache=none setting?

cache=none is a writeback cache mode (cache.direct=on,cache.writeback=on). The
guest can in theory change cache.writeback (resulting in cache=directsync), but
it can't change cache.direct. I don't know whether the Windows virtio-blk
drivers actually implement this, though.

In any case, you don't want to use cache.writeback=off because it comes at a
pretty bad performance penalty, and it's not necessary if the guest driver
operates correctly.
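
The relationship between the cache= shorthand and the individual flags can be written down as a small lookup table. This is only an illustrative sketch: the flag values follow the upstream QEMU documentation for the cache modes, while CacheFlags, CACHE_MODES, mode_name and guest_disables_writeback are hypothetical names invented for this example and are not part of QEMU or virtio-win.

```python
from typing import NamedTuple, Optional


class CacheFlags(NamedTuple):
    direct: bool     # cache.direct: bypass the host page cache (O_DIRECT)
    writeback: bool  # cache.writeback: writes may complete before reaching stable storage
    no_flush: bool   # cache.no-flush: guest flush requests are ignored


# Shorthand cache= modes and the flags they stand for.
CACHE_MODES = {
    "writeback":    CacheFlags(direct=False, writeback=True,  no_flush=False),
    "none":         CacheFlags(direct=True,  writeback=True,  no_flush=False),
    "writethrough": CacheFlags(direct=False, writeback=False, no_flush=False),
    "directsync":   CacheFlags(direct=True,  writeback=False, no_flush=False),
    "unsafe":       CacheFlags(direct=False, writeback=True,  no_flush=True),
}


def mode_name(flags: CacheFlags) -> Optional[str]:
    """Return the shorthand name for a flag combination, or None if it has none."""
    for name, known in CACHE_MODES.items():
        if known == flags:
            return name
    return None


def guest_disables_writeback(mode: str) -> Optional[str]:
    """Effective mode if the guest turns cache.writeback off.

    cache.direct is left untouched because the guest cannot change it.
    """
    return mode_name(CACHE_MODES[mode]._replace(writeback=False))


if __name__ == "__main__":
    print(guest_disables_writeback("none"))
```

Under these assumptions, a guest that turns the write cache off on a cache=none disk ends up with the flag combination that QEMU calls directsync.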
Comment 4 Alexandros Gkesos 2015-11-25 06:51:27 EST
Hello Vadim,

Olimp is out of the office today.
Thank you for the explanation.
May I ask which "crash dump" you need?
As I understand from the case, they restarted or shut down the VM, and after that the FS was corrupted.

Thank you,
Alexandros
Comment 5 Vadim Rozenfeld 2015-11-25 07:15:29 EST
(In reply to Alexandros Gkesos from comment #4)
> Hello Vadim,
> 
> Olimp is out of the office today.
> Thank you for the explanation.
> May I ask which "crash dump" you need?

Hi Alexandros,

Normally the default folder for storing small and kernel dump files should be C:\windows\Minidump\. You can also check if C:\windows\MEMORY.DMP is present.

> As I understand from the case, they restarted or shut down the VM, and after
> that the FS was corrupted.
But how do they recover the system after such crashes?
Another question: do they have any kind of antivirus installed on the system?
Can they also check whether the problem is reproducible when the system is operating in Safe Mode?

Best regards,
Vadim.
> 
> Thank you,
> Alexandros
Comment 30 Avinash Kumar 2016-02-24 03:35:10 EST
Hi Guys,

The dump size is around 140 MB. Could you please tell me where to upload it?

Regards
Avinash Kumar
Comment 31 Vadim Rozenfeld 2016-02-24 04:17:14 EST
(In reply to Avinash Kumar from comment #30)
> Hi Guys,
> 
> The dump size is around 140 MB. Could you please tell me where to upload it?
> 
> Regards
> Avinash Kumar

Can you try uploading it to Red Hat Customer Portal?
Btw, as a rule dump files have a very good compression ratio, so it should be much smaller after compression.

Thanks,
Vadim.
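
Compressing the dump before uploading it can be done with any archiver. As a minimal sketch using only the Python standard library (the function name compress_dump is hypothetical, and the actual size reduction depends on the dump contents):

```python
import gzip
import shutil
from pathlib import Path


def compress_dump(src: Path) -> Path:
    """Write a gzip-compressed copy of src next to it and return the new path."""
    dst = src.with_name(src.name + ".gz")
    # Stream the file in chunks so a large dump is never held in memory at once.
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling compress_dump on a copy of the dump file produces a .gz file alongside it, which can then be uploaded in place of the raw dump.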
Comment 38 Avinash Kumar 2016-03-02 06:00:15 EST
(In reply to Vadim Rozenfeld from comment #31)
> (In reply to Avinash Kumar from comment #30)
> > Hi Guys,
> > 
> > The dump size is around 140 MB. Could you please tell me where to upload it?
> > 
> > Regards
> > Avinash Kumar
> 
> Can you try uploading it to Red Hat Customer Portal?
> Btw, as a rule dump files have a very good compression ratio, so it should
> be much smaller after compression.
> 
> Thanks,
> Vadim.

Hi Vadim,

I have uploaded it to Dropbox as Case01579700.rar.
Let me know if you can't find it and I will upload it again.


[root@rhevm tmp]# curl -T /tmp/Case01579700.rar ftp://dropbox.redhat.com/incoming/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 43.7M    0     0  100 43.7M      0   642k  0:01:09  0:01:09 --:--:-- 1630k


Regards
Avinash Kumar
Comment 39 Vadim Rozenfeld 2016-03-03 00:25:45 EST
(In reply to Avinash Kumar from comment #38)
> 
> Hi Vadim,
> 
> I have uploaded it to Dropbox as Case01579700.rar.
> Let me know if you can't find it and I will upload it again.
> 
> 
> [root@rhevm tmp]# curl -T /tmp/Case01579700.rar ftp://dropbox.redhat.com/incoming/
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 43.7M    0     0  100 43.7M      0   642k  0:01:09  0:01:09 --:--:-- 1630k
> 
> 
> Regards
> Avinash Kumar

Hi Avinash,

Unfortunately, this crash dump is corrupted and doesn't provide any useful information. I even failed to retrieve the list of running modules.

Best regards,
Vadim.
Comment 41 Avinash Kumar 2016-03-13 01:29:34 EST
(In reply to Vadim Rozenfeld from comment #39)
> 
> Hi Avinash,
> 
> Unfortunately, this crash dump is corrupted and doesn't provide any useful
> information. I even failed to retrieve the list of running modules.
> 
> Best regards,
> Vadim.

Hi Vadim,

The memory dump seems to be corrupted.

I have uploaded the memory dump of another VM that crashed (Win2012 64-bit) to the dropbox.

File Name : CASE-01579700.zip 

[root@rhevm tmp]# curl -T CASE-01579700.zip ftp://dropbox.redhat.com/incoming/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 48.2M    0     0  100 48.2M      0  1188k  0:00:41  0:00:41 --:--:-- 1222k


We have 3 VMs corrupted in a similar way (one 32-bit VM and two 64-bit VMs).

Regards
Avinash Kumar
Comment 42 Vadim Rozenfeld 2016-03-14 07:26:10 EDT
(In reply to Avinash Kumar from comment #41)
> 
> Hi Vadim,
> 
> The memory dump seems to be corrupted.
> 
> I have uploaded the memory dump of another VM that crashed (Win2012 64-bit)
> to the dropbox.
> 
> File Name : CASE-01579700.zip
> 
> [root@rhevm tmp]# curl -T CASE-01579700.zip ftp://dropbox.redhat.com/incoming/
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 48.2M    0     0  100 48.2M      0  1188k  0:00:41  0:00:41 --:--:-- 1222k
> 
> 
> We have 3 VMs corrupted in a similar way (one 32-bit VM and two 64-bit VMs).
> 
> Regards
> Avinash Kumar

The crashes in 090815-122000-01.dmp and 090915-79296-01.dmp are caused by the vioserial driver, which seems to be a bit old (Timestamp: Wed Jun 04 23:23:05 2014 (538F1DB9)). I suggest updating it to a more recent version.


memory.dmp
120615-22546-01.dmp
120615-24234-01.dmp
120615-24390-01.dmp
120615-31125-01.dmp
120615-50578-01.dmp
120615-57437-01.dmp
120715-25937-01.dmp
120715-38171-01.dmp
120715-42265-01.dmp
120815-31218-01.dmp
120815-34125-01.dmp
all crashed with bugcheck code 0xc00002e2, which is Directory Services related. Could it be that they tried to remove the DirectoryServices-DomainController role?
Did they try to restore Directory Services and check whether there is a problem with the Active Directory database after that?

Thanks,
Vadim.
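
The driver timestamp and the dump file names above can be cross-checked with a short script. This is only a sketch under two assumptions that the thread does not state: that the hexadecimal value in parentheses is the driver's link timestamp in seconds since the Unix epoch, and that the first six digits of each minidump name encode the crash date as MMDDYY. The helper names decode_link_timestamp and dump_date are made up for this illustration.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone


def decode_link_timestamp(hex_stamp: str, utc_offset_hours: int = 0) -> datetime:
    """Interpret a hex string as seconds since the Unix epoch (assumption)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=tz)


def dump_date(name: str) -> date:
    """Read the leading six digits of a minidump name as MMDDYY (assumption)."""
    return datetime.strptime(name.split("-")[0], "%m%d%y").date()


# Minidump names quoted in the comment above (memory.dmp carries no date).
DUMPS = [
    "090815-122000-01.dmp", "090915-79296-01.dmp",
    "120615-22546-01.dmp", "120615-24234-01.dmp", "120615-24390-01.dmp",
    "120615-31125-01.dmp", "120615-50578-01.dmp", "120615-57437-01.dmp",
    "120715-25937-01.dmp", "120715-38171-01.dmp", "120715-42265-01.dmp",
    "120815-31218-01.dmp", "120815-34125-01.dmp",
]

if __name__ == "__main__":
    print(decode_link_timestamp("538F1DB9"))
    for day, count in sorted(Counter(dump_date(n) for n in DUMPS).items()):
        print(day, count)
```

Read as UTC, 538F1DB9 decodes to 2014-06-04 13:23:05. Passing utc_offset_hours=10 reproduces the 23:23:05 shown above, which suggests the quoted time is a local time; that offset is inferred, not stated in the thread.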
Comment 43 Avinash Kumar 2016-08-08 07:57:11 EDT
Hi

This issue was fixed by Microsoft. Kindly close this bug.

Regards
Avinash Kumar
Comment 44 Olimp Bockowski 2016-08-08 08:26:55 EDT
Hello Avinash,

Could you provide more details?

olimpb
Comment 46 Avinash Kumar 2016-09-01 07:52:52 EDT
Hi Olimp,

The disk image was recovered with the help of the Microsoft team.
The VMs were getting corrupted due to high I/O latency on the storage side.

Kindly let me know if you require more information.

Regards
Avinash Kumar
Comment 47 Olimp Bockowski 2016-09-28 05:45:00 EDT
Hello Avinash,

Could you share the Microsoft case number with us? Another customer would like to ask Microsoft to check whether they are hitting the same issue you experienced.
I have sent you a private e-mail. I would appreciate it if you could send me the Microsoft ticket number.

olimpb
