Bug 1723530
Summary: qemu aio=native on Gluster: Metadata corruption detected at xfs_buf_ioend+0x58/0x1e0 [xfs], xfs_inode block 0x240 xfs_inode_bug_verify

Product: Red Hat Enterprise Linux Advanced Virtualization
Component: qemu-kvm
qemu-kvm sub component: Gluster
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Version: ---
Keywords: Triaged
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Avihai <aefrat>
Assignee: Virtualization Maintenance <virt-maint>
QA Contact: qing.wang <qinwang>
CC: bugs, coli, gveitmic, jinzhao, juzhang, mkalinin, mtessun, nsoffer, qinwang, sabose, sgarzare, virt-maint, xuwei, ymankad
Last Closed: 2021-09-08 19:15:19 UTC
Type: Bug
Bug Blocks: 1758964
Description
Avihai
2019-06-24 18:00:17 UTC
Created attachment 1584165 [details]
vm xml extracted from vdsm.log

Nir Soffer (comment 2):
Avihai, we need more data about this.
- Can you reproduce this manually?
- Does it happen with a RHEL 7.6 guest, or only with a RHEL 7.7 guest?
- Does it happen only with a RHEL 7.7 host, or also with RHEL 7.6 hosts and a 7.7 guest?
- Can you attach the vm journal? (journalctl -b) Maybe start the same vm on RHEL 7.6 if the issue happens only with 7.7.
- Can you add gluster logs from /var/log/gluster*/?
When we have more info we can ask the qemu team to look at this.

Created attachment 1584203 [details]
gluster logs from the host
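The data Nir asks for above can be gathered with a couple of commands. A sketch only; the output file names are made up for illustration, and the commands assume a systemd guest and a Gluster host:

```shell
# Sketch of the data collection requested above; output file
# names are hypothetical.

# Inside the guest (if it is still reachable): capture the current boot's journal.
journalctl -b > vm-journal.txt

# On the host: bundle the Gluster logs for attaching to the bug.
tar czf gluster-logs.tar.gz /var/log/gluster*/
```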
Avihai (comment 4), in reply to Nir Soffer (comment 2):

> Can you reproduce this manually?
Yes, but only about 70% of the time (only on a RHEL 8 guest with a Gluster OS disk). Manually the issue occurred on 10 of 14 created VMs (RHEL 8 guests), so reproducibility is ~70%. In automation the issue occurs every time I run TestCase18868 (ran about 7 times).

> Does it happen with rhel 7.6 guest or only with rhel 7.7 guest?
7.6 guest: tried 8 times, issue did not reproduce.
7.7 guest: tried 8 times, issue did not reproduce.
8.1 guest with a Gluster OS disk: issue occurred 10/14 times (~70% reproduction ratio) manually, and every time when running automation TestCase18868 (which creates a VM from a RHEL 8 template + cold snapshot).
8.1 guest with an FCP OS disk: created 8 VMs, issue did not occur once.

> Does it happen only with rhel 7.7 host or also with rhel 7.6 hosts and 7.7 guest?
Until now we worked only with RHEL 7.6 (4.3.4) hosts on the same environments/Gluster, and the issue did not occur once. If you are asking whether we tested the latest 4.3.5 with RHEL 7.6 hosts, we did not, as 4.3.5 is synced with RHEL 7.7.

> Can you attach the vm journal? (journalctl -b)
No: when the VM reaches this state you cannot connect via ssh, and the console is not responsive.

> Can you add gluster logs from /var/log/gluster*/?
Added.

More info: the issue occurs on multiple environments, and with two different Gluster clusters running two different versions:
glusterfs 3.12.6 (TLV2 site) - our infra TierX runs are done here
glusterfs 6.3 (Raanana site) - local RHV storage team

Sahina Bose (comment 5), in reply to Avihai (comment 4):
Could this be related to Bug 1701736? Can you try changing the UseNativeIOForGluster to false in vdc_options before the run?

Avihai, in reply to Sahina Bose (comment 5):
I tried but I cannot set it, please help.
(host FQDN = hosted-engine-06.lab.eng.tlv2.redhat.com)

This is what I get:
[root@hosted-engine-06 ~]# engine-config -s UseNativeIOForGluster=False
Please select a version:
1. 4.1
2. 4.2
3. 4.3
3
Error setting UseNativeIOForGluster's value. No such entry with version 4.3.

OK, I found how to add it according to Bug 1701736:
[root@hosted-engine-06 ~]# vi /etc/ovirt-engine/engine-config/engine-config.properties
Add the following lines at the end:
UseNativeIOForGluster.description=Access volumes on glusterfs with aio native instead of threads
UseNativeIOForGluster.type=Boolean

[root@hosted-engine-06 ~]# engine-config -s UseNativeIOForGluster=False
Please select a version:
1. 4.1
2. 4.2
3. 4.3
3
[root@hosted-engine-06 ~]# engine-config -g UseNativeIOForGluster
UseNativeIOForGluster: false version: 4.1
UseNativeIOForGluster: true version: 4.2
UseNativeIOForGluster: False version: 4.3
[root@hosted-engine-06 ~]# systemctl restart ovirt-engine

So to answer Sahina's question:
> Could this be related to Bug 1701736? Can you try changing the
> UseNativeIOForGluster to false in vdc_options before the run?
Indeed it looks related.
What I did:
1) Once I turned UseNativeIOForGluster off, the issue was not seen anymore (tried on 16 VMs and none of them had the issue).
2) Once I returned to the original engine-config.properties (without a UseNativeIOForGluster value), I saw the issue again
(created a pool of 8 VMs and saw the issue on 2 out of 8 VMs).
At the moment engine-config.properties does not contain a UseNativeIOForGluster entry, so it has to be added manually.
Was the UseNativeIOForGluster value changed to be enabled in RHEL 7.7, or is this an RHV 4.3.5 addition?
Another odd thing is that I see this issue only on a RHEL 8 guest.
This issue is seen only with the following combination:
Host = RHEL 7.7 / RHV 4.3.5
Guest = RHEL 8.1
VM OS disk is on Gluster (the Gluster storage version does not matter)
UseNativeIOForGluster value does not exist in engine-config.properties and is turned on somehow.
Sahina, how do you want to proceed here?
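The workaround Avihai describes above can be scripted roughly as follows. This is a sketch, not a supported procedure: the properties file path and key names come from the commands in this bug, but piping the version choice into engine-config's interactive prompt ("3" selects 4.3 in the menu shown above) is an assumption that should be verified on the target engine:

```shell
# Sketch of the UseNativeIOForGluster workaround from this bug.

# 1) Declare the option so engine-config accepts it (per Bug 1701736):
cat >> /etc/ovirt-engine/engine-config/engine-config.properties <<'EOF'
UseNativeIOForGluster.description=Access volumes on glusterfs with aio native instead of threads
UseNativeIOForGluster.type=Boolean
EOF

# 2) Disable native AIO for Gluster; "3" answers the interactive
#    version prompt (assumed to select 4.3, as in the menu above):
printf '3\n' | engine-config -s UseNativeIOForGluster=False

# 3) Verify, then restart the engine so newly started VMs get aio=threads:
engine-config -g UseNativeIOForGluster
systemctl restart ovirt-engine
```

Note that only VMs started after the engine restart pick up the new setting.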
So I guess this should be marked as a duplicate then, shouldn't it?

Sahina Bose (comment 10), in reply to Avihai (comment 8):
Thanks Avihai for checking this.
We introduced aio=native due to the performance gains seen in testing (Bug 1630744). One option is to revert to using aio=threads (which we will do, since corruption takes precedence over performance).
But there does seem to be an issue with aio=native that depends on the guest OS used - this also needs to be investigated. Perhaps change the component to aio to investigate?

Avihai (comment 11), in reply to Sahina Bose (comment 10):
> We introduced aio=native due to the performance gains seen on test (Bug 1630744)
This change was introduced in 4.2 and the issue was not seen until now; what changed in 4.3.5 to make it noticeable?
> Perhaps change the component to aio to investigate?
I am not familiar with this component (it is not in the drop-down), can you please change it?

Sahina Bose, in reply to Avihai (comment 11):
The change was introduced in 4.2, and the bug is seen now. The only change I can see is the guest OS, and RHEL 7.7 on the host.

CongLi (comment 19):
Hi Avihai,
KVM QE could not reproduce it in a QEMU env.
Is it possible for you to provide the QEMU CML when 'UseNativeIOForGluster' = False is set?
I would like to confirm that the option 'UseNativeIOForGluster' corresponds to the option 'aio=native' on the QEMU side. Thanks.

CongLi (comment 20):
Sorry, I meant that 'UseNativeIOForGluster' = False corresponds to the option 'aio=threads' on the QEMU side. Thanks.

Avihai (comment 21), in reply to CongLi (comment 20):
I've set 'UseNativeIOForGluster' = False on this engine FQDN: "storage-ge-08.scl.lab.tlv.redhat.com".
But I need your help on providing the QEMU CML, can you please help?

Details on how I set it:
[root@storage-ge-08 ~]# vim /etc/ovirt-engine/engine-config/engine-config.properties
Added the following lines at the end:
UseNativeIOForGluster.description=Access volumes on glusterfs with aio native instead of threads
UseNativeIOForGluster.type=Boolean
[root@storage-ge-08 ~]# engine-config -s UseNativeIOForGluster=False
Please select a version:
1. 4.1
2. 4.2
3. 4.3
3
[root@storage-ge-08 ~]# engine-config -g UseNativeIOForGluster
UseNativeIOForGluster: false version: 4.1
UseNativeIOForGluster: true version: 4.2
UseNativeIOForGluster: False version: 4.3
[root@storage-ge-08 ~]# systemctl restart ovirt-engine

CongLi (comment 22), in reply to Avihai (comment 21):
Could you please try '# ps aux | grep qemu' on your host / engine? Thanks.

Avihai, in reply to CongLi (comment 22):
Indeed I see aio=threads after setting UseNativeIOForGluster to false.

Engine output:
[root@storage-ge-08 ~]# ps aux | grep qemu
root 1176 0.0 0.0 44216 2564 ? Ss Jul03 0:36 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist=guest-file-open,guest-file-close,guest-file-read,guest-file-write,guest-file-seek,guest-file-flush,guest-exec,guest-exec-status -F/etc/qemu-ga/fsfreeze-hook
root 15741 0.0 0.0 112720 964 pts/0 S+ 15:12 0:00 grep --color=auto qemu

Host output:
[root@storage-ge8-vdsm2 ~]# ps aux | grep qemu
root 725 0.0 0.0 44216 2436 ?
Ss Jul03 1:14 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist=guest-file-open,guest-file-close,guest-file-read,guest-file-write,guest-file-seek,guest-file-flush,guest-exec,guest-exec-status -F/etc/qemu-ga/fsfreeze-hook qemu 12943 13.0 19.6 1998592 760756 ? Rl Jul04 543:30 /usr/libexec/qemu-kvm -name guest=pool_vm_gluster-1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-pool_vm_gluster-1/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -object iothread,id=iothread1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid 13b09533-008c-4e40-acde-3efa7150e383 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.7-7.el7,serial=53a0fc14-3841-4c1d-a7cc-b5b7b25874a4,uuid=13b09533-008c-4e40-acde-3efa7150e383 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=32,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-07-04T14:48:37,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,iothread=iothread1,id=ua-5f9f4544-5e3a-4f0b-9c4b-f2b08bb8e9ba,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=ua-0669b3db-2ede-40e1-af66-42ed04cf035c,max_ports=16,bus=pci.0,addr=0x5 -drive if=none,id=drive-ua-2f4d2630-0b13-4973-9c9d-309548498205,werror=report,rerror=report,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ua-2f4d2630-0b13-4973-9c9d-309548498205,id=ua-2f4d2630-0b13-4973-9c9d-309548498205 -drive 
file=/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge8__volume__0/dc3f1c4c-10a8-459b-ada3-901201bd1df2/images/db3220fa-7af3-4220-a5e4-52e49086edb2/161ae4ac-f98f-47f1-a59a-63dc97531130,format=qcow2,if=none,id=drive-ua-db3220fa-7af3-4220-a5e4-52e49086edb2,serial=db3220fa-7af3-4220-a5e4-52e49086edb2,werror=stop,rerror=stop,cache=none,aio=threads -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-db3220fa-7af3-4220-a5e4-52e49086edb2,id=ua-db3220fa-7af3-4220-a5e4-52e49086edb2,bootindex=1,write-cache=on -netdev tap,fd=36,id=hostua-ced84a2f-971d-4766-914a-e0a1852ea21c,vhost=on,vhostfd=41 -device virtio-net-pci,host_mtu=1500,netdev=hostua-ced84a2f-971d-4766-914a-e0a1852ea21c,id=ua-ced84a2f-971d-4766-914a-e0a1852ea21c,mac=00:1a:4a:16:25:e9,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,fd=42,server,nowait -device virtserialport,bus=ua-0669b3db-2ede-40e1-af66-42ed04cf035c.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 -chardev socket,id=charchannel1,fd=43,server,nowait -device virtserialport,bus=ua-0669b3db-2ede-40e1-af66-42ed04cf035c.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=ua-0669b3db-2ede-40e1-af66-42ed04cf035c.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,tls-port=5901,addr=10.35.82.80,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -vnc 10.35.82.80:2,password,tls,x509=/etc/pki/vdsm/libvirt-vnc -k en-us -device qxl-vga,id=ua-5bca4eec-f416-45b3-a56b-291efea91a1c,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=ua-3d0e9700-ad99-40c0-b15d-665b730373de,bus=pci.0,addr=0x4 
-device hda-duplex,id=ua-3d0e9700-ad99-40c0-b15d-665b730373de-codec0,bus=ua-3d0e9700-ad99-40c0-b15d-665b730373de.0,cad=0 -device virtio-balloon-pci,id=ua-a0d9d265-ffc4-4a92-b6db-e4ae8fc59db6,bus=pci.0,addr=0x8 -object rng-random,id=objua-9806d9a9-c31c-4ca8-b87a-b09efdc9309e,filename=/dev/urandom -device virtio-rng-pci,rng=objua-9806d9a9-c31c-4ca8-b87a-b09efdc9309e,id=ua-9806d9a9-c31c-4ca8-b87a-b09efdc9309e,bus=pci.0,addr=0x9 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on qemu 12971 12.9 19.1 1936940 740856 ? Rl Jul04 540:09 /usr/libexec/qemu-kvm -name guest=pool_vm_gluster-2,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-5-pool_vm_gluster-2/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -object iothread,id=iothread1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid e1059eb4-0526-43ef-8a27-62487d5b4588 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.7-7.el7,serial=53a0fc14-3841-4c1d-a7cc-b5b7b25874a4,uuid=e1059eb4-0526-43ef-8a27-62487d5b4588 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=38,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-07-04T14:48:38,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,iothread=iothread1,id=ua-70ab3713-14ce-47d6-a5fc-34312acedf3d,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=ua-8ca2e0ce-47d2-44cc-b97b-f681af344975,max_ports=16,bus=pci.0,addr=0x6 -drive if=none,id=drive-ua-22b0ab7e-3454-4bfb-829d-78822ce8a4e3,werror=report,rerror=report,readonly=on -device 
ide-cd,bus=ide.1,unit=0,drive=drive-ua-22b0ab7e-3454-4bfb-829d-78822ce8a4e3,id=ua-22b0ab7e-3454-4bfb-829d-78822ce8a4e3 -drive file=/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge8__volume__0/dc3f1c4c-10a8-459b-ada3-901201bd1df2/images/e97c1add-a000-4100-9045-40da96ee378b/4f12ef9d-a1f2-4772-815f-9b2c268acbe0,format=qcow2,if=none,id=drive-ua-e97c1add-a000-4100-9045-40da96ee378b,serial=e97c1add-a000-4100-9045-40da96ee378b,werror=stop,rerror=stop,cache=none,aio=threads -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-e97c1add-a000-4100-9045-40da96ee378b,id=ua-e97c1add-a000-4100-9045-40da96ee378b,bootindex=1,write-cache=on -netdev tap,fd=40,id=hostua-e554b0e4-7c8d-4263-9cea-3a54b34900db,vhost=on,vhostfd=32 -device virtio-net-pci,host_mtu=1500,netdev=hostua-e554b0e4-7c8d-4263-9cea-3a54b34900db,id=ua-e554b0e4-7c8d-4263-9cea-3a54b34900db,mac=00:1a:4a:16:25:ea,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,fd=36,server,nowait -device virtserialport,bus=ua-8ca2e0ce-47d2-44cc-b97b-f681af344975.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 -chardev socket,id=charchannel1,fd=41,server,nowait -device virtserialport,bus=ua-8ca2e0ce-47d2-44cc-b97b-f681af344975.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=ua-8ca2e0ce-47d2-44cc-b97b-f681af344975.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5903,tls-port=5904,addr=10.35.82.80,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -vnc 10.35.82.80:5,password,tls,x509=/etc/pki/vdsm/libvirt-vnc -k en-us -device 
qxl-vga,id=ua-759d513c-80e2-4eb2-b9e4-7a40029f55b0,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=ua-ae3ba68a-8dfd-4eb9-b43d-0b585110a540,bus=pci.0,addr=0x4 -device hda-duplex,id=ua-ae3ba68a-8dfd-4eb9-b43d-0b585110a540-codec0,bus=ua-ae3ba68a-8dfd-4eb9-b43d-0b585110a540.0,cad=0 -device virtio-balloon-pci,id=ua-82435524-5918-4777-aeeb-17cc2fca4fa3,bus=pci.0,addr=0x8 -object rng-random,id=objua-9806d9a9-c31c-4ca8-b87a-b09efdc9309e,filename=/dev/urandom -device virtio-rng-pci,rng=objua-9806d9a9-c31c-4ca8-b87a-b09efdc9309e,id=ua-9806d9a9-c31c-4ca8-b87a-b09efdc9309e,bus=pci.0,addr=0x9 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on qemu 13370 11.8 14.6 1968764 569236 ? Rl Jul04 494:22 /usr/libexec/qemu-kvm -name guest=pool_vm_gluster-6,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-pool_vm_gluster-6/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -object iothread,id=iothread1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid b35a0328-6340-46c3-a697-888a1eec74ac -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.7-7.el7,serial=53a0fc14-3841-4c1d-a7cc-b5b7b25874a4,uuid=b35a0328-6340-46c3-a697-888a1eec74ac -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=33,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-07-04T14:49:31,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,iothread=iothread1,id=ua-08d5b00d-83f4-4249-8c5d-7dad3ca97c2c,bus=pci.0,addr=0x5 -device 
virtio-serial-pci,id=ua-43532932-fe0c-4698-bc7f-615e0ebe69aa,max_ports=16,bus=pci.0,addr=0x6 -drive if=none,id=drive-ua-28053b06-9947-4b3e-a737-03a150e8809f,werror=report,rerror=report,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ua-28053b06-9947-4b3e-a737-03a150e8809f,id=ua-28053b06-9947-4b3e-a737-03a150e8809f -drive file=/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge8__volume__0/dc3f1c4c-10a8-459b-ada3-901201bd1df2/images/fea12c3b-307d-4f3d-89fc-db0ac82d5b67/f4169dcc-c8a8-40fa-9eb9-61b65b6566be,format=qcow2,if=none,id=drive-ua-fea12c3b-307d-4f3d-89fc-db0ac82d5b67,serial=fea12c3b-307d-4f3d-89fc-db0ac82d5b67,werror=stop,rerror=stop,cache=none,aio=threads -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-fea12c3b-307d-4f3d-89fc-db0ac82d5b67,id=ua-fea12c3b-307d-4f3d-89fc-db0ac82d5b67,bootindex=1,write-cache=on -netdev tap,fd=35,id=hostua-4439dbe7-a366-43bd-82aa-fe3a0836afe3,vhost=on,vhostfd=36 -device virtio-net-pci,host_mtu=1500,netdev=hostua-4439dbe7-a366-43bd-82aa-fe3a0836afe3,id=ua-4439dbe7-a366-43bd-82aa-fe3a0836afe3,mac=00:1a:4a:16:25:ee,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,fd=37,server,nowait -device virtserialport,bus=ua-43532932-fe0c-4698-bc7f-615e0ebe69aa.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 -chardev socket,id=charchannel1,fd=38,server,nowait -device virtserialport,bus=ua-43532932-fe0c-4698-bc7f-615e0ebe69aa.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=ua-43532932-fe0c-4698-bc7f-615e0ebe69aa.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice 
port=5906,tls-port=5907,addr=10.35.82.80,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -vnc 10.35.82.80:8,password,tls,x509=/etc/pki/vdsm/libvirt-vnc -k en-us -device qxl-vga,id=ua-1b09afab-447d-45da-aa2c-05dd017663fa,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=ua-f9ceb167-c491-4630-8d0b-19430b7d9d9a,bus=pci.0,addr=0x4 -device hda-duplex,id=ua-f9ceb167-c491-4630-8d0b-19430b7d9d9a-codec0,bus=ua-f9ceb167-c491-4630-8d0b-19430b7d9d9a.0,cad=0 -device virtio-balloon-pci,id=ua-2d0a78cb-7d26-4b96-811e-e61b6a687c2b,bus=pci.0,addr=0x8 -object rng-random,id=objua-9806d9a9-c31c-4ca8-b87a-b09efdc9309e,filename=/dev/urandom -device virtio-rng-pci,rng=objua-9806d9a9-c31c-4ca8-b87a-b09efdc9309e,id=ua-9806d9a9-c31c-4ca8-b87a-b09efdc9309e,bus=pci.0,addr=0x9 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

> Thanks.
>
> Details on how I set it:
> [root@storage-ge-08 ~]# vim /etc/ovirt-engine/engine-config/engine-config.properties
> Added the following lines at the end:
> UseNativeIOForGluster.description=Access volumes on glusterfs with aio native instead of thread
> UseNativeIOForGluster.type=Boolean
>
> [root@storage-ge-08 ~]# engine-config -s UseNativeIOForGluster=False
> Please select a version:
> 1. 4.1
> 2. 4.2
> 3. 4.3
> 3
> [root@storage-ge-08 ~]# engine-config -g UseNativeIOForGluster
> UseNativeIOForGluster: false version: 4.1
> UseNativeIOForGluster: true version: 4.2
> UseNativeIOForGluster: False version: 4.3
> [root@storage-ge-08 ~]# systemctl restart ovirt-engine

Thanks Avihai.

Could you please also help provide the steps and script of TestCase18868?

Thanks.

(In reply to CongLi from comment #24)
> Thanks Avihai.
> Could you please also help provide the steps and script of TestCase18868?
> Thanks.

All the tests mentioned in the QA whiteboard (TestCase18868 included) do the same simple thing: start a VM from a RHEL8 OS disk that resides on a Gluster storage domain, then wait for an IP and try to SSH, which fails when the issue occurs.

We use a large framework called ART that drives the tests through REST API calls, so there is no simple script I can supply unless you are already working with that framework.

However, this was reproduced manually many times, not on all VMs but on 2 out of 8 VMs or more:
1) Create a template based on a RHEL8 OS disk on a Gluster storage domain.
2) Create as many VMs as possible from that template.
3) Start each VM and wait for an IP.
4) Once the VM has an IP, try to SSH => fails.

The VM starts and gets an IP, but right afterward you see this error in the console. From that point the IP is still available, SSH fails, the VM is 'UP', but you cannot do anything with it and the console is also stuck.

(In reply to CongLi from comment #19)
> Hi Avihai,
>
> KVM QE could not reproduce it in QEMU env.
>
> Is it possible for you to provide the QEMU CML when set
> 'UseNativeIOForGluster' = False ?
> I would like to confirm the option 'UseNativeIOForGluster'
> is corresponding to the option 'aio=native' in QEMU side.
>
> Thanks.

Hi CongLi,
I'm starting to work on this BZ.
Are you able to reproduce in the QEMU env?
Thanks,
Stefano

I tried the following command:

/usr/libexec/qemu-kvm \
 -name "guest-rhel8.0-2" \
 -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
 -cpu Westmere \
 -m 6144 \
 -realtime mlock=off \
 -uuid dbfc4b9a-74bf-4c21-95c6-0840743fd57a \
 -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.7-7.el7,serial=4c4c4544-0047-3210-8053-c4c04f473632,uuid=dbfc4b9a-74bf-4c21-95c6-0840743fd57a' \
 -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 \
 -nodefaults \
 -vga qxl \
 -object iothread,id=iothread0 \
 -drive file=/mnt/gluster/rhel810-64-virtio2.qcow2,format=qcow2,if=none,id=drive-ua-1,serial=1,werror=stop,rerror=stop,cache=none,aio=native \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-ua-1,id=ua-1,bootindex=1,write-cache=on \
 -vnc :2 \
 -monitor stdio \
 -device virtio-net-pci,mac=9a:b5:b6:a1:b2:c2,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pci.0,addr=0x9 \
 -netdev tap,id=idxgXAlm,vhost=on \
 -qmp tcp:localhost:5952,server,nowait \
 -chardev file,path=/home/serial2.log,id=serial_id_serial0 \
 -device isa-serial,chardev=serial_id_serial0

/mnt/gluster/rhel810-64-virtio2.qcow2 is my guest image. I can not reproduce this issue.

Hi Stefano, could you please help run the above command on your VDSM node. You need to replace "/mnt/gluster/rhel810-64-virtio2.qcow2" with a guest image located on your gluster fs. Could you please share your guest image if you can reproduce this issue with this command.

(In reply to qing.wang from comment #28)
>
> Hi Stefano, could you please help to run above command on your vdsm node.
> You need to replace "/mnt/gluster/rhel810-64-virtio2.qcow2" with your guest
> image which locate your gluster fs.

Hi Qing,
I'm not able to either. I started QEMU with a very similar command line, but it works well.
Maybe Avihai can help us, because I don't have access to the VDSM node.
It looks like this issue is related to the oVirt environment.
Avihai said the new sanlock-3.7.3-1.el7.x86_64 is involved; it may result in a regression. I suggest checking the sanlock log when it happens. Is it possible to roll back to the previous version of the sanlock package and test again?

Guys,

The ENV is there for you to use/debug - please use it ASAP to extract what you need.

It took me a while to reproduce this issue, but I did; it is hard to reproduce.

From 12 VMs, only 3 VMs have this issue (see print screen attached):
VM name= rhel8_nativeaio-3
VM name= rhel8_nativeaio-6
VM name= rhel8_nativeaio-10

Created attachment 1602624 [details]
print screen of the issue: 3 VMs out of 12 have the issue
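The failure mode reported above (VM 'UP' with an IP, but SSH dead) can be separated from healthy guests with a simple TCP probe of each guest's SSH port. This is only an illustrative sketch; the IPs below are hypothetical placeholders, not the actual guest addresses from this environment:

```shell
#!/bin/sh
# Probe TCP port 22 on each guest and report which ones stopped answering.
# The guest IPs below are hypothetical placeholders.

probe_ssh() {
    # succeed if a TCP connection to host $1 port $2 opens within 2 seconds
    timeout 2 bash -c "echo > /dev/tcp/$1/$2" 2>/dev/null
}

for ip in 10.0.0.11 10.0.0.12; do
    if probe_ssh "$ip" 22; then
        echo "$ip: ssh port reachable"
    else
        echo "$ip: ssh port NOT reachable (guest possibly hung)"
    fi
done
```

A guest hit by this bug still has its IP up while SSH stops responding, so a plain port probe like this flags the hung VMs without logging in.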
My test steps:

1. Shut down rhel8_nativeaio-1 - rhel8_nativeaio-8 on ovirt-engine.
2. Create guest scripts (vm1.sh - vm8.sh) with a qemu-kvm command like:

file=/rhev/data-center/mnt/glusterSD/gluster01.lab.eng.tlv2.redhat.com:_GE__7__volume01/92e8d1b3-bc5b-4759-a997-c5cbdb32f6e7/images/6e798b8b-ef87-4b13-9ef1-43726b79f724/6c7c2f45-a756-48e5-bc73-41955cb8d629
mac=00:1a:4a:16:88:45
idx=1

/usr/libexec/qemu-kvm \
 -name "guest-rhel8.0-${idx}" \
 -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
 -cpu Westmere \
 -m 6144 \
 -realtime mlock=off \
 -uuid dbfc4b9a-74bf-4c21-95c6-0840743fd57a \
 -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.7-7.el7,serial=4c4c4544-0047-3210-8053-c4c04f473632,uuid=dbfc4b9a-74bf-4c21-95c6-0840743fd57a' \
 -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 \
 -nodefaults \
 -vga qxl \
 -object iothread,id=iothread0 \
 -drive file=${file},format=qcow2,if=none,id=drive-ua-1,serial=1,werror=stop,rerror=stop,cache=none,aio=native \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-ua-1,id=ua-1,bootindex=1,write-cache=on \
 -vnc :2${idx} \
 -monitor stdio \
 -device virtio-net-pci,mac=${mac},id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pci.0,addr=0x9 \
 -netdev tap,id=idxgXAlm,vhost=on \
 -qmp tcp:localhost:595${idx},server,nowait \
 -chardev file,path=/home/serial${idx}.log,id=serial_id_serial0 \
 -device isa-serial,chardev=serial_id_serial0 \
 -device vmcoreinfo

In each vmN.sh, the file and mac values correspond to the respective oVirt guest VMs rhel8_nativeaio-1 - 8. (Those scripts are located in /root/test/ on lynx25.lab.eng.tlv2.redhat.com and lynx26.lab.eng.tlv2.redhat.com.)

3. Run vm1.sh vm2.sh vm3.sh vm4.sh on lynx25.lab.eng.tlv2.redhat.com; run vm5.sh vm6.sh vm7.sh vm8.sh on lynx26.lab.eng.tlv2.redhat.com. (Set the default NIC to DHCP enabled in the guest VMs.)
4. Wait for the guest VMs to start and get an IP, then wait 20-30 minutes.
5. Check the remote consoles to see if there is any issue on the guests.
6. Power off vm1-vm8 and repeat steps 3-5.
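The eight launcher scripts in step 2 differ only in the image path, MAC address, and index. As a sketch (with placeholder paths and MACs, not the actual lab values), they could be generated in a loop instead of edited by hand:

```shell
#!/bin/sh
# Generate vm1.sh..vm3.sh from one template, varying file/mac/idx.
# Image paths and MAC addresses here are placeholders for illustration.
outdir=$(mktemp -d)

for idx in 1 2 3; do
    cat > "$outdir/vm$idx.sh" <<EOF
#!/bin/sh
file=/rhev/images/disk$idx.qcow2    # placeholder image path
mac=00:1a:4a:16:88:4$idx            # placeholder MAC address
exec /usr/libexec/qemu-kvm -name "guest-rhel8.0-$idx" \\
    -drive file=\$file,format=qcow2,if=none,id=drive-ua-1,cache=none,aio=native \\
    -device virtio-blk-pci,drive=drive-ua-1,id=ua-1,bootindex=1
EOF
    chmod +x "$outdir/vm$idx.sh"
done

ls "$outdir"
```

The heredoc expands $idx at generation time while \$file stays literal, so each generated script picks up its own image path when it runs.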
===================================================================

I ran the above test and did not find the issue. vm1-vm8 are still running on lynx25.lab.eng.tlv2.redhat.com and lynx26.lab.eng.tlv2.redhat.com. We may wait until tomorrow to see if there is a problem with long-running VMs.

vm1-vm8 hostnames of the VMs: vm-17-69 to vm-17-76

This is a shared stand and other teams also need to run other tests on it by tomorrow, so please debug as needed till then.

The issue was already there on 3 VMs (see below); were you able to see what is special about them?
VM name= rhel8_nativeaio-3
VM name= rhel8_nativeaio-6
VM name= rhel8_nativeaio-10

(In reply to Avihai from comment #35)
> This is a shared stand and also other teams need to run other tests on it by
> tomorrow so please debug as needed till then.
>
> The issue was already there on 3 VM's(see below), were you able to see what
> is special about them?
> VM name= rhel8_nativeaio-3
> VM name= rhel8_nativeaio-6
> VM name= rhel8_nativeaio-10

I have shut down the above VMs because my VMs shared their images in my qemu command line testing.

But I can not reproduce it.

After some investigation, I suspect the oVirt test is not valid: it sets aio=native when creating a VM based on a template.

For more detail, please refer to https://www.redhat.com/archives/libvirt-users/2017-January/msg00025.html.

I picked one VM, rhel8_nativeaio-12, to explain it.
disk info:

    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source file='/rhev/data-center/mnt/glusterSD/gluster01.lab.eng.tlv2.redhat.com:_GE__7__volume01/92e8d1b3-bc5b-4759-a997-c5cbdb32f6e7/images/e5508a87-85c0-44cc-94fd-19ebc974c9a8/77ff077c-3ac6-4614-b75b-eb54294058bc'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/rhev/data-center/mnt/glusterSD/gluster01.lab.eng.tlv2.redhat.com:_GE__7__volume01/92e8d1b3-bc5b-4759-a997-c5cbdb32f6e7/images/e5508a87-85c0-44cc-94fd-19ebc974c9a8/d5d90f0a-c496-4946-95fd-41e17abac359'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <serial>e5508a87-85c0-44cc-94fd-19ebc974c9a8</serial>
      <boot order='1'/>
      <alias name='ua-e5508a87-85c0-44cc-94fd-19ebc974c9a8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

This VM used one image: /rhev/data-center/mnt/glusterSD/gluster01.lab.eng.tlv2.redhat.com:_GE__7__volume01/92e8d1b3-bc5b-4759-a997-c5cbdb32f6e7/images/e5508a87-85c0-44cc-94fd-19ebc974c9a8/77ff077c-3ac6-4614-b75b-eb54294058bc

[root@lynx26 test]# qemu-img info /rhev/data-center/mnt/glusterSD/gluster01.lab.eng.tlv2.redhat.com:_GE__7__volume01/92e8d1b3-bc5b-4759-a997-c5cbdb32f6e7/images/e5508a87-85c0-44cc-94fd-19ebc974c9a8/77ff077c-3ac6-4614-b75b-eb54294058bc
image: /rhev/data-center/mnt/glusterSD/gluster01.lab.eng.tlv2.redhat.com:_GE__7__volume01/92e8d1b3-bc5b-4759-a997-c5cbdb32f6e7/images/e5508a87-85c0-44cc-94fd-19ebc974c9a8/77ff077c-3ac6-4614-b75b-eb54294058bc
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 121M
cluster_size: 65536
backing file: d5d90f0a-c496-4946-95fd-41e17abac359 (actual path: /rhev/data-center/mnt/glusterSD/gluster01.lab.eng.tlv2.redhat.com:_GE__7__volume01/92e8d1b3-bc5b-4759-a997-c5cbdb32f6e7/images/e5508a87-85c0-44cc-94fd-19ebc974c9a8/d5d90f0a-c496-4946-95fd-41e17abac359)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

This indicates the image is not pre-allocated, so aio=native is not suitable for it.

@Stefano Please help to confirm it.

(In reply to Avihai from comment #32)
> Guys,
>
> The ENV is there for you to use/debug - Please use it ASAP to extract what
> you need.
>
> It took me a while to reproduce this issue but I did so this is hard to
> reproduce.
>
> From 12 VM's only 3 VM's has this issue(see print screen attached):
> VM name= rhel8_nativeaio-3
> VM name= rhel8_nativeaio-6
> VM name= rhel8_nativeaio-10

Sorry for the late response, but I was on PTO.
I think that the VMs are not running anymore.
Please, can you try to reproduce the issue again?

(In reply to qing.wang from comment #36)
> I have shutdown above VMs due to my VMs shared their images in my qemu
> command line testing.
>
> But i can not reproduce it.
>
> After some investigation, i suspect it is not valid testing in ovirt
> testing: set aio=native when create vm based on template.

Good catch!

> More detail please refer to
> https://www.redhat.com/archives/libvirt-users/2017-January/msg00025.html
> [...]
> @Stefano Please help to confirm it

Right, it is not recommended, and here are further details:

- https://access.redhat.com/articles/41313
  "Specifically, if qemu-kvm is used with the aio=native IO mode over a sparse device image hosted on the ext4 or xfs filesystem, guest filesystem corruption will occur if partitions are not aligned with the host filesystem block size"

- https://drive.google.com/file/d/0B44EcgFDZNtXSnhKbEZfNE1ad28 [Slide 61/66]
  "Native AIO can block the VM if the file is not fully allocated and is therefore not recommended for use on sparse files"
  "Writes to sparsely allocated files are more likely to block than fully preallocated files. Therefore it is recommended to only use aio=native on fully preallocated files, local disks, or logical volumes."

So, if we want to use aio=native, maybe we should provide a fully preallocated image.

Suggestion for Insights Rule: if qemu-kvm is used with the aio=native IO mode over a sparse device image hosted on the ext4 or xfs filesystem, give a warning that this may cause guest filesystem corruption if partitions are not aligned with the host filesystem block size.
For details, see https://access.redhat.com/articles/41313 and comment 23 above.

This is an old issue which was triggered when the RHHI/Gluster default was set to aio=native in RHV 4.3.5, failing our regression tests (bug 1701736).

I think this issue is no longer urgent, as the default was set back to aio=threads for Gluster/RHHI storage in bug 1701736. Since that fix was made, this issue is no longer seen when running our automation regressions.

Lowering severity to high; please raise it if you feel otherwise.

Avihai,
do you have any VM where you can replicate this issue?

At which level should we fix this issue? If we follow the Insights Rule mentioned in comment 41, I think it is an upper-application (oVirt) issue, not a QEMU issue, right?

(In reply to qing.wang from comment #45)
> Which level do we ready to fix this issue?
> if we follow the Insights Rule mentioned in comment 41, i think it is
> upper application (ovirt) issue, not the qemu issue, right?

We should understand better why we get the corruption when we use aio=native. It could be an alignment problem, or another issue in XFS, QEMU, the FUSE driver, or Gluster; maybe aio=native just changes the timing and brings out a hidden bug.

(In reply to Stefano Garzarella from comment #43)
> Avihai,
> do you have any VM where you can replicate this issue?

Hi Stefano,
I reproduced the issue multiple times and left the environment for dev debugging, and I can do it again if it will help.

As this issue seems not supported for now, and the default was set to aio=threads for gluster/RHHI storage in bug 1701736, is this still needed?

(In reply to Avihai from comment #47)
> Hi Stefano,
> I reproduce the issue multiple times and left the environment for DEV debug
> and can do it again if this will help.

Thanks! I'll ping you on IRC when I work on it.

> As this issue seems not supported for now and the default was set to
> aio=threads for gluster/RHHI storage at bug 1701736, is this still needed?

Maybe we can reduce the priority and severity, but I think it can still be useful to solve this issue.

Deferring it to RHEL8-AV. Worth fixing, but not urgent or a regression (see previous comments).

QEMU has been recently split into sub-components, and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Since this issue is not relevant to RHV anymore (due to changing the default to aio=threads when working with glusterfs), and since it is so hard to reproduce, I propose to close this bug as WONTFIX or DEFERRED.

Closing based on comment 53
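As a practical footnote to the sparse-image discussion above (comments 36-41): whether a file is sparse can be checked by comparing its apparent size with the bytes actually allocated on disk. A minimal sketch, using a scratch file rather than a real guest image:

```shell
#!/bin/sh
# Detect a sparse file: allocated blocks * 512 is far below the apparent size.
# The /tmp scratch file is just for illustration.
img=$(mktemp /tmp/img.XXXXXX)
truncate -s 10M "$img"                        # 10 MiB apparent size, ~0 allocated

apparent=$(stat -c %s "$img")                 # logical (apparent) size in bytes
allocated=$(( $(stat -c %b "$img") * 512 ))   # 512-byte blocks actually allocated

if [ "$allocated" -lt "$apparent" ]; then
    echo "$img is sparse"
else
    echo "$img is fully allocated"
fi
rm -f "$img"
```

For qcow2 images the same idea shows up in `qemu-img info` output ('disk size' far below 'virtual size', as in the output quoted in comment 36); a fully preallocated qcow2 suitable for aio=native can be created with `qemu-img create -f qcow2 -o preallocation=full disk.qcow2 10G`.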