Bug 1879388

Summary: red hat virtio scsi disk device 6.3.9600.18758
Product: Red Hat Enterprise Linux 8
Component: virtio-win
Sub component: distribution
Version: ---
Hardware: x86_64
OS: Windows
Status: CLOSED WORKSFORME
Severity: urgent
Priority: unspecified
Target Milestone: rc
Target Release: ---
Reporter: Evgen Puzanov <e.puzanov>
Assignee: Vadim Rozenfeld <vrozenfe>
QA Contact: Peixiu Hou <phou>
Docs Contact:
CC: gklein, lijin, lsurette, qizhu, vrozenfe
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-07-03 23:34:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Evgen Puzanov 2020-09-16 07:35:35 UTC
Hello,
 
We have a cloud environment that consists of hosts backed by KVM hypervisors. About half of the virtual machines run Windows Server operating systems (2008R2, 2012R2, 2016, 2019); there are hundreds of such instances, and almost all of them use VirtIO drivers (mostly 0.1.160).
 
Sometimes (it has occurred about 3-4 times) we encounter the following glitch: a guest operating system decides that its primary disk is larger than it actually is. For example, an instance had a 200 GB virtual drive that had worked fine for years, but at some moment (no one knows exactly when) the primary partition (we mean "drive C:", which is usually the second partition, since the first one is used by the operating system) became 210 GB out of the blue. After that, the system event log started filling with the following error message: `The driver detected a controller error \Device\Harddisk0\DR`. Obviously, this happens when the operating system tries to write data to sectors that don't exist.
 
Once we expand the virtual drive to 210 GB, the error messages no longer appear. Still, after that we find part of the data corrupted (perhaps fragments of files were being written to the non-existent sectors), so this is a real problem for us whenever it happens.
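When this happens, it may help to compare the size the guest believes the disk has with the size the host actually exposes. A minimal host-side sketch, assuming a libvirt-managed guest; the domain name "win2016-guest", the disk target "sda", and the image path are placeholders:

    # List the guest's disks and their backing files (placeholder domain name)
    virsh domblklist win2016-guest

    # Capacity is the size the guest should see, in bytes (target name as shown by domblklist)
    virsh domblkinfo win2016-guest sda

    # Cross-check against the backing image itself (placeholder path)
    qemu-img info /var/lib/libvirt/images/win2016-guest.qcow2

If the "virtual size" reported by qemu-img matches the Capacity reported by virsh, but the guest still reports a larger disk, the discrepancy is on the guest/driver side rather than in the host storage stack.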
 
Alas, we have not found a way to reproduce this. As stated above, it has happened only 3-4 times, but each time the consequences are quite unpleasant.
 
Should we provide more data regarding this issue? Should we consider upgrading the driver? Perhaps this is a bug that was already fixed in a release after 0.1.160 and we simply don't know about it? Just curious: has anyone filed a similar bug report before? We tried to find one, though with no luck.
 
Thanks in advance for your feedback.

Comment 6 Peixiu Hou 2022-05-24 04:16:48 UTC
Hi Evgen Puzanov,

(In reply to Evgen Puzanov from comment #0)

Sorry for the late response on this issue~
I would like to try to reproduce it, but I need some information about your environment:

1) What kind of cloud environment are you using? RHV, OpenStack, CNV, or something else?
2) What version of the KVM hypervisor are you running, and what is the host kernel version? Which BIOS mode is the guest using, SeaBIOS or OVMF?
3) Have you hit the issue again since then? If possible, please also provide the VM's QEMU command line.
On the host, run "ps aux | grep qemu" to get this information (see the command sketch below).
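For reference, a sketch of how the requested host-side details could be collected. This is only a suggestion; the domain name "my-windows-vm" is a placeholder, and package names may differ per distribution:

    # Full QEMU command line of the running guest
    ps aux | grep qemu

    # Host kernel and virtualization package versions (package names may vary)
    uname -r
    rpm -q qemu-kvm libvirt

    # Check whether the guest uses OVMF (UEFI) or SeaBIOS:
    # an OVMF guest has a <loader>/<nvram> element pointing at an OVMF firmware image
    virsh dumpxml my-windows-vm | grep -i -E 'loader|nvram'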

Thanks a lot~
Peixiu