Description of problem:

Last September I discovered really odd behaviour when I ran qemu against qcow2 images on glusterfs, as described in https://bugs.launchpad.net/qemu/+bug/1793904: when using qcow2 images and the direct qemu-gluster interface, the image gets corrupted. However, it went away when I mounted my images via the fuse filesystem, so I ignored the issue, also because I planned to reinstall the whole thing with oVirt at that point.

It turns out that qcow images are getting corrupted on a fresh CentOS based cloud, too, and I am surprised to see that happening, as ps shows that it is using the filesystem (fuse) and not gluster://. I find this behaviour really serious, because I can hardly think of any reason why it would affect only my datacenter, so I am opening the bug here as well before I spend more time on research.

Version-Release number of selected component (if applicable):
4.3 (CentOS 7)

How reproducible:

Steps to Reproduce:
1. Set up a glusterfs storage with replica 2 arbiter 1
2. Set up a virtual machine
3. Do some stuff, e.g. upgrades, disc IO

Actual results:
Some files' contents are randomly zeroed out.

Expected results:
Should not happen under any circumstances.

Additional info:
https://bugs.launchpad.net/qemu/+bug/1793904
This bug could be a problem in glusterfs. Having spoken to some friends who are using similar setups (qemu/kvm/gluster), the only difference I can spot so far is that I decided to use ext4 as the underlying filesystem instead of XFS. We are running on an emergency nfs domain for now.
Are you encountering this bug when you're running with oVirt? Can you provide the output of "gluster volume info" and also the fuse mount logs for the gluster volume?
Here is the gluster volume info; the other info follows.

[root@arbiter0 ~]# gluster volume info

Volume Name: ovirt_engine
Type: Distributed-Replicate
Volume ID: ed509cda-c236-49bf-a6e5-ff57855d0558
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: rack1storage1.wg.csph.cloud:/bricks/sda/ovirt_engine
Brick2: rack2storage1.wg.csph.cloud:/bricks/sda/ovirt_engine
Brick3: arbiter0.wg.csph.cloud:/bricks/sda/ovirt_engine (arbiter)
Brick4: rack1storage1.wg.csph.cloud:/bricks/sdb/ovirt_engine
Brick5: rack2storage1.wg.csph.cloud:/bricks/sdb/ovirt_engine
Brick6: arbiter0.wg.csph.cloud:/bricks/sdb/ovirt_engine (arbiter)
Brick7: rack1storage1.wg.csph.cloud:/bricks/sdc/ovirt_engine
Brick8: rack2storage1.wg.csph.cloud:/bricks/sdc/ovirt_engine
Brick9: arbiter0.wg.csph.cloud:/bricks/sdc/ovirt_engine (arbiter)
Brick10: rack1storage1.wg.csph.cloud:/bricks/sdd/ovirt_engine
Brick11: rack2storage1.wg.csph.cloud:/bricks/sdd/ovirt_engine
Brick12: arbiter0.wg.csph.cloud:/bricks/sdd/ovirt_engine (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
auth.allow: 10.253.0.10
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on

Volume Name: ovirt_export
Type: Distributed-Replicate
Volume ID: cd114fe8-aae5-42e9-a2e5-3e02319cb6d7
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: rack1storage1.wg.csph.cloud:/bricks/sda/ovirt_export
Brick2: rack2storage1.wg.csph.cloud:/bricks/sda/ovirt_export
Brick3: arbiter0.wg.csph.cloud:/bricks/sda/ovirt_export (arbiter)
Brick4: rack1storage1.wg.csph.cloud:/bricks/sdb/ovirt_export
Brick5: rack2storage1.wg.csph.cloud:/bricks/sdb/ovirt_export
Brick6: arbiter0.wg.csph.cloud:/bricks/sdb/ovirt_export (arbiter)
Brick7: rack1storage1.wg.csph.cloud:/bricks/sdc/ovirt_export
Brick8: rack2storage1.wg.csph.cloud:/bricks/sdc/ovirt_export
Brick9: arbiter0.wg.csph.cloud:/bricks/sdc/ovirt_export (arbiter)
Brick10: rack1storage1.wg.csph.cloud:/bricks/sdd/ovirt_export
Brick11: rack2storage1.wg.csph.cloud:/bricks/sdd/ovirt_export
Brick12: arbiter0.wg.csph.cloud:/bricks/sdd/ovirt_export (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
auth.allow: 10.253.1.*,10.253.2.*

Volume Name: ovirt_images
Type: Distributed-Replicate
Volume ID: dd865c50-49f1-4402-8894-c0cd07853d10
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: rack1storage1.wg.csph.cloud:/bricks/sda/ovirt_images
Brick2: rack2storage1.wg.csph.cloud:/bricks/sda/ovirt_images
Brick3: arbiter0.wg.csph.cloud:/bricks/sda/ovirt_images (arbiter)
Brick4: rack1storage1.wg.csph.cloud:/bricks/sdb/ovirt_images
Brick5: rack2storage1.wg.csph.cloud:/bricks/sdb/ovirt_images
Brick6: arbiter0.wg.csph.cloud:/bricks/sdb/ovirt_images (arbiter)
Brick7: rack1storage1.wg.csph.cloud:/bricks/sdc/ovirt_images
Brick8: rack2storage1.wg.csph.cloud:/bricks/sdc/ovirt_images
Brick9: arbiter0.wg.csph.cloud:/bricks/sdc/ovirt_images (arbiter)
Brick10: rack1storage1.wg.csph.cloud:/bricks/sdd/ovirt_images
Brick11: rack2storage1.wg.csph.cloud:/bricks/sdd/ovirt_images
Brick12: arbiter0.wg.csph.cloud:/bricks/sdd/ovirt_images (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
auth.allow: 10.253.1.*,10.253.2.*

Volume Name: ovirt_iso
Type: Distributed-Replicate
Volume ID: 82adda67-e958-4b19-b94e-bafb0f3ee9f7
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: rack1storage1.wg.csph.cloud:/bricks/sda/ovirt_iso
Brick2: rack2storage1.wg.csph.cloud:/bricks/sda/ovirt_iso
Brick3: arbiter0.wg.csph.cloud:/bricks/sda/ovirt_iso (arbiter)
Brick4: rack1storage1.wg.csph.cloud:/bricks/sdb/ovirt_iso
Brick5: rack2storage1.wg.csph.cloud:/bricks/sdb/ovirt_iso
Brick6: arbiter0.wg.csph.cloud:/bricks/sdb/ovirt_iso (arbiter)
Brick7: rack1storage1.wg.csph.cloud:/bricks/sdc/ovirt_iso
Brick8: rack2storage1.wg.csph.cloud:/bricks/sdc/ovirt_iso
Brick9: arbiter0.wg.csph.cloud:/bricks/sdc/ovirt_iso (arbiter)
Brick10: rack1storage1.wg.csph.cloud:/bricks/sdd/ovirt_iso
Brick11: rack2storage1.wg.csph.cloud:/bricks/sdd/ovirt_iso
Brick12: arbiter0.wg.csph.cloud:/bricks/sdd/ovirt_iso (arbiter)
Options Reconfigured:
auth.allow: 10.253.1.*,10.253.2.*
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
Created attachment 1557669 [details]
glusterfs client logs

The requested log files.
> "Are you encountering this bug when you're running with oVirt?"

Yes, I am encountering the bug when running oVirt, which I set up 2 weeks ago; I recently installed 4.3. Those glusterfs bricks are currently running on ext4, not xfs, but as I talked with some friends who are running gluster, this is the one main difference between my setup and theirs. I am already planning a test where I move those bricks over to xfs and see if the bug is still there.
(In reply to zem from comment #3)
> Here is the gluster volume info, the other info follows.
>
> [root@arbiter0 ~]# gluster volume info
>
> Volume Name: ovirt_engine
> [...]
> Options Reconfigured:
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> auth.allow: 10.253.0.10
> features.quota: on
> features.inode-quota: on
> features.quota-deem-statfs: on

Can you disable quota?

> Volume Name: ovirt_export
> [...]
> Volume Name: ovirt_images
> [...]
> Volume Name: ovirt_iso
> [...]

From the volume info it looks like none of the recommended options are set for use as a storage domain in oVirt. Please take a look at https://github.com/gluster/glusterfs/blob/master/extras/group-virt.example. You can set these options using "gluster volume set <volumename> group virt".

** NOTE ** Setting group virt adds the sharding option to the volume. If your volume already has data, this will cause issues, so try it on newly created volumes.
You would also need to set the permissions on the volume so that the images are accessible by qemu:kvm (how did it work without these settings, I wonder?):

gluster volume set <volumename> storage.owner-uid 36
gluster volume set <volumename> storage.owner-gid 36

Can you ensure your volume has these settings and check again?
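For reference, applying those recommended settings to all four volumes can be scripted. A minimal sketch, assuming the volume names from this report; the `RUN` indirection and the `apply_virt_settings` helper are additions of this sketch so the loop can be dry-run (it echoes the gluster commands instead of executing them):

```shell
# Hypothetical helper: print (dry run) or execute the recommended
# settings for each oVirt volume. Set RUN= (empty) on a gluster node
# to actually apply them. Mind the sharding caveat above: apply
# "group virt" only to newly created volumes.
RUN=echo
apply_virt_settings() {
    for vol in ovirt_engine ovirt_export ovirt_images ovirt_iso; do
        $RUN gluster volume set "$vol" group virt
        $RUN gluster volume set "$vol" storage.owner-uid 36
        $RUN gluster volume set "$vol" storage.owner-gid 36
    done
}
apply_virt_settings
```

In dry-run mode this prints the twelve `gluster volume set` commands that would be executed.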
I set the storage.owner-uid and gid manually on the mounted filesystem, as this was suggested in the first Q&A that I could find via Google. The documentation on ovirt.org, as good as it is, seems a bit incomplete regarding that topic. I will incorporate those flags into my storage tests later this week. I remember that I avoided any sort of striping because it somehow did not work reliably, but I am not sure if sharding was already a thing back then.
Test description:

If your VM is facing issues, you can use the following script:

-----find_zeros.sh----------------------------------
#!/bin/bash
cd /
find opt/ usr/ -type f | (
    EXPOUT="00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................"
    while read f
    do
        TESTOUT=$(xxd -l 16 "${f}")
        if [[ "${TESTOUT}" == "${EXPOUT}" ]]
        then
            echo ${f}
        fi
    done
)
------------------------------------------------

I know that this is not absolute, but in my experience the results are good enough. I run the test as follows:

1. Launch a fresh Debian stretch qcow2 or raw image. (I did try with raw as mentioned.)
2. Run find_zeros.sh --> should produce no output.
3. vim /etc/apt/sources.list --> s/stretch/buster/
4. apt dist-upgrade
5. Reboot the VM to flush all caches.
6. Run find_zeros.sh --> should list corrupted files if the image is broken.

So far I ran the following tests:
- nfs-ext4: good
- gluster-ext4-without-group-virt: bad
- gluster-ext4-without-group-virt-thick-provisioned: good
- gluster-ext4-with-group-virt: TBD
- gluster-xfs-without-group-virt: TBD
- gluster-xfs-with-group-virt: TBD

I plan to make those tests on my cloud this Saturday. Switching to xfs is sort of a no-going-back operation, because I don't have enough disks left.
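The detection idea behind the script can be demonstrated self-contained. A sketch, assuming `cmp` against /dev/zero as a stand-in for the xxd comparison (both succeed exactly when the first 16 bytes of a file are all zero); the file names here are examples, not from the report:

```shell
# Create one normal and one simulated "zeroed out" file, then apply
# the same zero-prefix check as find_zeros.sh above.
tmp=$(mktemp -d)
printf 'not a corrupted file' > "$tmp/good"
head -c 16 /dev/zero > "$tmp/zeroed"

# cmp -s -n 16 FILE /dev/zero exits 0 iff FILE starts with 16 zero bytes
hits=$(find "$tmp" -type f | while read -r f; do
    if cmp -s -n 16 "$f" /dev/zero; then
        echo "$f"
    fi
done)

echo "$hits"    # lists only the zeroed file
rm -r "$tmp"
```

As in the original script, files shorter than 16 bytes are never flagged, since the comparison then fails on the missing bytes.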
Thanks, setting needinfo for results with the group virt profile.
Sad news: neither the filesystem nor the group virt settings make any difference.

Tests:
- gluster-ext4-with-group-virt: bad
- gluster-xfs-without-group-virt: bad
- gluster-xfs-with-group-virt: bad

------------------------------------------------------------------------------
Here is the output of volume info for the new volume on ext4 with group=virt:

Volume Name: ovirt_images_virt
Type: Distributed-Replicate
Volume ID: 9391c6fb-c97b-4dcb-90e0-ea696915cb2b
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: rack1storage1.wg.csph.cloud:/bricks/sda/ovirt_images_virt
Brick2: rack2storage1.wg.csph.cloud:/bricks/sda/ovirt_images_virt
Brick3: arbiter0.wg.csph.cloud:/bricks/sda/ovirt_images_virt (arbiter)
Brick4: rack1storage1.wg.csph.cloud:/bricks/sdb/ovirt_images_virt
Brick5: rack2storage1.wg.csph.cloud:/bricks/sdb/ovirt_images_virt
Brick6: arbiter0.wg.csph.cloud:/bricks/sdb/ovirt_images_virt (arbiter)
Brick7: rack1storage1.wg.csph.cloud:/bricks/sdc/ovirt_images_virt
Brick8: rack2storage1.wg.csph.cloud:/bricks/sdc/ovirt_images_virt
Brick9: arbiter0.wg.csph.cloud:/bricks/sdc/ovirt_images_virt (arbiter)
Brick10: rack1storage1.wg.csph.cloud:/bricks/sdd/ovirt_images_virt
Brick11: rack2storage1.wg.csph.cloud:/bricks/sdd/ovirt_images_virt
Brick12: arbiter0.wg.csph.cloud:/bricks/sdd/ovirt_images_virt (arbiter)
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

----------------------------------------------------------------------------
xfs + group=virt:

Volume Name: ovirt_images
Type: Distributed-Replicate
Volume ID: d8398afd-ad3e-4ef4-9753-251f625f1c0b
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: rack1storage1.wg.csph.cloud:/bricks/sda/ovirt_images
Brick2: rack2storage1.wg.csph.cloud:/bricks/sda/ovirt_images
Brick3: arbiter0.wg.csph.cloud:/bricks/sda/ovirt_images (arbiter)
Brick4: rack1storage1.wg.csph.cloud:/bricks/sdb/ovirt_images
Brick5: rack2storage1.wg.csph.cloud:/bricks/sdb/ovirt_images
Brick6: arbiter0.wg.csph.cloud:/bricks/sdb/ovirt_images (arbiter)
Brick7: rack1storage1.wg.csph.cloud:/bricks/sdc/ovirt_images
Brick8: rack2storage1.wg.csph.cloud:/bricks/sdc/ovirt_images
Brick9: arbiter0.wg.csph.cloud:/bricks/sdc/ovirt_images (arbiter)
Brick10: rack1storage1.wg.csph.cloud:/bricks/sdd/ovirt_images
Brick11: rack2storage1.wg.csph.cloud:/bricks/sdd/ovirt_images
Brick12: arbiter0.wg.csph.cloud:/bricks/sdd/ovirt_images (arbiter)
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

---------------------------------------------------------------------
xfs without group=virt:

Volume Name: ovirt_images_virt
Type: Distributed-Replicate
Volume ID: ab29a92f-876e-4b82-9963-f2aa75a938ee
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: rack1storage1.wg.csph.cloud:/bricks/sda/ovirt_images_virt
Brick2: rack2storage1.wg.csph.cloud:/bricks/sda/ovirt_images_virt
Brick3: arbiter0.wg.csph.cloud:/bricks/sda/ovirt_images_virt (arbiter)
Brick4: rack1storage1.wg.csph.cloud:/bricks/sdb/ovirt_images_virt
Brick5: rack2storage1.wg.csph.cloud:/bricks/sdb/ovirt_images_virt
Brick6: arbiter0.wg.csph.cloud:/bricks/sdb/ovirt_images_virt (arbiter)
Brick7: rack1storage1.wg.csph.cloud:/bricks/sdc/ovirt_images_virt
Brick8: rack2storage1.wg.csph.cloud:/bricks/sdc/ovirt_images_virt
Brick9: arbiter0.wg.csph.cloud:/bricks/sdc/ovirt_images_virt (arbiter)
Brick10: rack1storage1.wg.csph.cloud:/bricks/sdd/ovirt_images_virt
Brick11: rack2storage1.wg.csph.cloud:/bricks/sdd/ovirt_images_virt
Brick12: arbiter0.wg.csph.cloud:/bricks/sdd/ovirt_images_virt (arbiter)
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
-----------------------------------------------------------------------------------------------------
I set up 6 virtual machines on my Fedora 29 desktop with glusterfs 5 and glusterfs 6 to provide some reproducible results (also a fail), but that led me to an important fact in my setup that I completely overlooked till now: my ovirt-engine is hosted as a libvirt/virt-manager VM on a separate host, but its storage is also located on the glusterfs storage in volume ovirt_engine. There is no recognizable image corruption on that engine VM or on VMs that are running via virt-manager and a fuse mount. So I started digging:

----------This is how ovirt starts the vm----------------------------------------
-drive file=/rhev/data-center/mnt/glusterSD/gluster:_ovirt__images__virt/c0e6c246-4c2f-4b65-95fe-8495a07689e8/images/a14c6441-84c0-4a98-9483-85f74ce0a80c/bbaa47a9-e802-4a84-9d64-b3734a574aeb,format=qcow2,if=none,id=drive-ua-a14c6441-84c0-4a98-9483-85f74ce0a80c,serial=a14c6441-84c0-4a98-9483-85f74ce0a80c,werror=stop,rerror=stop,cache=none,aio=native
-device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x6,drive=drive-ua-a14c6441-84c0-4a98-9483-85f74ce0a80c,id=ua-a14c6441-84c0-4a98-9483-85f74ce0a80c,bootindex=1,write-cache=on
--------------------------------------------------------------------------------------

At the moment I could narrow the possible flags down to:
- aio=native (EA mode: native)
- cache=none (buffer mode: none)
- write-cache=on (buffer mode: none)

After adding aio=native I had my breakthrough. I now have a Fedora workstation here with 3 VMs providing glusterfs, and one VM with aio=native showing the issue. I am not sure yet if this was the issue last September, as I still have a few gfapi results not adding up, but it is the problem now.

How can I change those performance behaviors in oVirt? Why am I the only one having that problem when running oVirt? (rhetorical question)
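For the plain libvirt/virt-manager reproduction, the qemu aio mode corresponds to the `io` attribute of the disk `<driver>` element in the domain XML (editable via `virsh edit <vm>`). A minimal sketch; the source path and target device are examples, not from this report:

```xml
<disk type='file' device='disk'>
  <!-- io='native' matches the failing aio=native case above;
       io='threads' matches qemu's default and the passing case -->
  <driver name='qemu' type='qcow2' cache='none' io='threads'/>
  <source file='/var/lib/libvirt/images/test.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
```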
Got it!

https://github.com/oVirt/ovirt-engine/commit/df07e633d3cdd2c1e0d21dd90e441fee94c452aa#diff-d6f7100af881feb7d909f23faeda326f
https://bugzilla.redhat.com/show_bug.cgi?id=1630744

If I read that change correctly, all installations of oVirt 4.3 made after 2nd November 2018 using glusterfs are probably affected. I am not sure how the config settings database is upgraded when you upgrade to the new release. I am trying to deactivate aio=native using

engine-settings set UseNativeIOForGluster false
I may need help setting this property.
[root@engine ~]# engine-config -g UseNativeIOForGluster
Error fetching UseNativeIOForGluster value: no such entry. Please verify key name and property file support.

...which is a bit annoying, as the option should be there, and it should be true.
Accidentally unset the needinfo flag.
(In reply to zem from comment #12)
> got it!
>
> https://github.com/oVirt/ovirt-engine/commit/df07e633d3cdd2c1e0d21dd90e441fee94c452aa#diff-d6f7100af881feb7d909f23faeda326f
> https://bugzilla.redhat.com/show_bug.cgi?id=1630744
>
> If I read that change correctly all installations of Ovirt 4.3 made after
> 2nd November 2018 using glusterfs are propably affected.
> I am not sure how the config settings database is upgraded, when you upgrade
> to the new release, I am trying to deactivarte aio=native using
>
> engine-settings set UseNativeIOForGluster false

Not sure I understand. Are you saying that aio=native is the culprit here? And you're NOT seeing the issue with aio=threads?

As far as the gluster version is concerned, you're using 4.3? Can you confirm that?

-Krutika
That is exactly what my test shows. It is oVirt 4.3 and gluster 5.5 or 5.6, a recent installation made one month ago.

More importantly: I can reproduce the behaviour on my Fedora with virt-manager as described, with a fully virtualized glusterfs:
- aio=native -> fail
- hypervisor default (aio=threads) -> success

I could not switch my oVirt instance to aio=threads yet (don't know how), so I could not run the test on oVirt itself.
I figured out how to do the hotfix:

-------------------------------------------------------------------------------------------------
1. Patch /etc/ovirt-engine/engine-config/engine-config.properties to make the missing UseNativeIOForGluster parameter available to engine-config:

[root@engine engine-config]# diff -u engine-config.properties.orig engine-config.properties
--- engine-config.properties.orig	2019-04-30 17:30:44.408000000 +0200
+++ engine-config.properties	2019-04-30 17:32:57.738000000 +0200
@@ -513,3 +513,5 @@
 CinderlibCommandTimeoutInMinutes.descritpion=The cinderlib command timeout in minutes
 CinderlibCommandTimeoutInMinutes.type=Integer
 CinderlibCommandTimeoutInMinutes.validValues=0..3000
+UseNativeIOForGluster.description=Access volumes on glusterfs with aio native instead of threads
+UseNativeIOForGluster.type=Boolean

2. Set UseNativeIOForGluster to false:

[root@engine ovirt-engine]# engine-config -s UseNativeIOForGluster=False

3. Restart the engine:

[root@engine ovirt-engine]# systemctl restart ovirt-engine

4. Restart/start the affected virtual machines.

5. Run the test cycle to see if the change has taken effect.
-----------------------------------------------------------------------------------

Possible side effects: the fix may reintroduce the behaviour that bug 1630744 intended to fix in the first place. (I can live with that.)

This solution is a workaround to prevent my data from being killed; the originating bug is located somewhere in the interface between qemu and glusterfs and should be fixed there. A bug report for qemu has been open for a while now and the link can be found here.
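To verify step 5 took effect, one can check which aio mode the running qemu processes actually use. A sketch; `aio_mode_of` is a hypothetical helper, and the command line below is a shortened example in the style of the -drive line quoted earlier:

```shell
# Hypothetical helper: extract the first aio=... flag from a qemu
# command line. After the hotfix, restarted VMs should show aio=threads
# (or no aio flag at all, since threads is qemu's default).
aio_mode_of() {
    printf '%s\n' "$1" | grep -o 'aio=[a-z]*' | head -n 1
}

# Shortened example; on a real host one would feed in the output of:
#   ps -eo args | grep qemu-kvm
cmdline='-drive file=/rhev/data-center/...,format=qcow2,cache=none,aio=native'
aio_mode_of "$cmdline"    # prints: aio=native  -> this VM still needs a restart
```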
Which version of Gluster are you using, btw?
Also adding needinfo on Sas to try out the test case in Comment 8
@Yaniv: as I wrote, gluster 5; I think it may still be 5.5 on the server and 5.6 in my virtualized test.
(In reply to Sahina Bose from comment #20)
> Also adding needinfo on Sas to try out the test case in Comment 8

I have tried out the same scenario with RHV 4.3.3 & RHGS 3.4.4 (glusterfs-3.12.2-47.el7rhgs). I used a 2x(2+1) distributed arbitrated replicated volume for this testing. The issue is not seen; the script returned no output. And the VMs are using aio=native.
@satheesaran:

- Is Gluster 3.12 the latest RHGS gluster version? I was using glusterfs 5 (5.5 or 5.6) in my tests.
- Can you check whether the qcow2 image you used was stored sparse (thin provisioned) or preallocated?
- Should I prepare a nested qcow2 showing the issue and send it over?

The qemu version might also be of interest. This is the CentOS version of my storage system:

[root@rack1storage1 ~]# uname -a
Linux rack1storage1.boot.csph.cloud 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

[root@rack1storage1 ~]# cat /proc/version
Linux version 3.10.0-957.10.1.el7.x86_64 (mockbuild.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Mon Mar 18 15:06:45 UTC 2019

[root@rack1storage1 ~]# rpm -qa | grep gluster
glusterfs-5.5-1.el7.x86_64
glusterfs-server-5.5-1.el7.x86_64
glusterfs-client-xlators-5.5-1.el7.x86_64
glusterfs-api-5.5-1.el7.x86_64
glusterfs-cli-5.5-1.el7.x86_64
centos-release-gluster5-1.0-1.el7.centos.noarch
glusterfs-libs-5.5-1.el7.x86_64
glusterfs-fuse-5.5-1.el7.x86_64

[root@rack1server2 ~]# rpm -qa | grep qemu
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
libvirt-daemon-driver-qemu-4.5.0-10.el7_6.6.x86_64
qemu-kvm-ev-2.12.0-18.el7_6.3.1.x86_64
qemu-img-ev-2.12.0-18.el7_6.3.1.x86_64
qemu-kvm-common-ev-2.12.0-18.el7_6.3.1.x86_64
(In reply to zem from comment #23)
> @satheesaran:
>
> - Is Gluster 3.12 the latest RHGS gluster version? As I was using glusterfs
> 5 (5.5 or 5.6) in my tests.

Thanks Zem. Gluster 6 is the latest, though 5.6 is the latest in the Gluster 5 stream. My answer to Sahina's question was based on the downstream product 'Red Hat Gluster Storage'.

> - Can you check if your qcow2 Image that you have uses was stored in sparse
> mode or in qcow2 mode?

I have used a 'Preallocated' raw image. Have you thin allocated the image file with qcow2?

> - should I prepare a nested qcow2 showing that issue and send it over?

I can try in our systems.

> The qemu version might also be of interest
> [...]

It's a downstream-specific qemu that I used: qemu-kvm-rhev-2.12.0-18.el7_6.4.x86_64

Just guide me with the image creation:
1. Type of image: raw or qcow2?
2. Resource allocation: is the image preallocated or thinly provisioned?
3. What's the size of the disk?
(In reply to SATHEESARAN from comment #24)
> I have used 'Preallocated' raw image.
> Have you thin allocated the image file with qcow2 ?

As I pointed out in comment 8 (in the test results), the problem does not occur with thick provisioned (preallocated) images. As far as my testing showed, it has to be a sparse raw or a sparse qcow2 image.

> > - should I prepare a nested qcow2 showing that issue and send it over?
> I can try in our systems.

OK, fine. :) It should be reproducible.

> Just guide me with the image creation.
> 1. type of image: raw or qcow2 ?

The error shows with both formats as long as they are thin provisioned by use of sparse mode (you can't create thin provisioned raw files within oVirt as far as I am aware). I do the following check on my images: if ls -l shows me the 30 GB size of the image and du shows me 1.6 GB, the image is thin provisioned using sparse.

> 2. Resource allocation:
> Is the image preallocated or thinly provisioned ?

Thinly provisioned!

> 3. What's the size of the disk ?

I have used 20-30 GB. My default template has 30 GB thin provisioned and 1-1.6 GB in use. My experience is that the virtual size does not matter.
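The ls-versus-du check described above can be sketched as a small script. This is a demonstration on a freshly created sparse file (an assumption of the sketch, since no real image is at hand here); the same comparison on a real image distinguishes thin provisioned from preallocated:

```shell
# A sparse file has a large apparent size (what ls -l shows) but few
# allocated blocks (what du counts). truncate creates exactly such a file.
img=$(mktemp)
truncate -s 1G "$img"                          # 1 GiB apparent size
apparent=$(stat -c %s "$img")                  # bytes, as in ls -l
allocated=$(( $(stat -c %b "$img") * 512 ))    # bytes actually on disk
if [ "$allocated" -lt "$apparent" ]; then
    echo "thin provisioned (sparse)"
else
    echo "preallocated"
fi
rm -f "$img"
```

For the images in this report (30 GB apparent, 1.6 GB via du) the check reports thin provisioning, which is the configuration that reproduces the corruption.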
Can you share the gluster fuse mount and brick logs for a run where you hit this issue? What I mean by "gluster fuse mount log" is the logs of the mount where a vm that gets these extra zeroes is installed from. Also, do you see this issue even when you install your vm with the same qemu parameters but hosted directly on an xfs file system? In other words, when there's no gluster (and fuse kernel) in picture, do you see this bug? I need to know this to isolate the layer which is causing this issue. -Krutika
Hi, sorry for the late answer. I had quite busy weeks, and as I already offered, I can prepare you a qcow2 image with a test setup instead of back-and-forth guesswork. That one will contain all the needed logs.

Spoiler: I think we should not use aio=native by default!

-------------------------------------------------------------------------
@Krutika, to answer your question first:

> Also, do you see this issue even when you install your vm with the same qemu parameters but hosted directly on an xfs file system?

No, but I would not bet on it. The answer why can be found here: https://access.redhat.com/articles/41313 Thanks for asking!

-------------------------------------------------------------------------
You should also read and probably re-judge the following bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1305886
https://bugzilla.redhat.com/show_bug.cgi?id=1305886#c4 that Yaniv mentioned
https://bugzilla.redhat.com/show_bug.cgi?id=1630744#c8

Especially: I have a very hard time finding any reports of the mentioned "extra tests", of what was tested and whether those tests were successful. I don't need a full report protocol in a change, that's overkill, but I could not even find an "I tested something, no issues!" line, which means I am either so biased that I can't see properly or it has not been properly tested!

I also have a very hard time understanding why, with so many cluebat hits, aio=native was made a default. AFAIR the original request was to have the option to use aio=native to deal with some rare performance issues, not to have it as a default, so where did the idea to make aio=native a default even come from? What I could find was that the patch from 1630744 seems to have been sort of "pushed" through gerrit (probably to meet a deadline or so), and it was even incomplete, as you can see in comment 18 of this bug.
After all this, my suggestion for the least invasive solution is:

- go back to aio=threads by default (which is also qemu's default),
- make nichawla, as reporter of https://bugzilla.redhat.com/show_bug.cgi?id=1616270, aware that we do so,
- add my patch from comment 18 (this bug) to make the settings configurable.

4 lines to change if I did not miscount, and it can easily be backported to oVirt 4.2, as this release is affected, too.

regards
Hans
Did you try this with preallocated disks?
(In reply to Sahina Bose from comment #31)
> To sasundar: Did you try this with preallocated disks?

Let me rephrase that question, as you already wrote the answer in https://bugzilla.redhat.com/show_bug.cgi?id=1701736#c24

_Have you already tried with thin provisioned disks?_
Verified at 4.3.5.3-0.1.el7. The same tests as before were run, meaning:

1) The Tier 1 TCs with multiple failures, previously seen only on gluster with this issue after starting VMs, were not witnessed here.
2) I did the same manual tests that reproduced the issue on the previous engine build (create 8-12 VMs from a RHEL 8 template and run them), and none of the VMs got stuck on:

XFS: Metadata corruption detected at xfs_buf_ioend+0x58/0x1e0 [xfs], xfs_inode block 0x240 xfs_inode_bug_verify
(In reply to Sahina Bose from comment #31)
> Did you try this with preallocated disks?

Yes, I tried with preallocated disks, and I also tried with aio=native, and I am not facing any issues. But in all my cases I used RHEL 7 guests; not sure whether that has anything to do with this issue. Anyhow, it is now aio=threads with oVirt 4.3.5; let's see whether that solves the problems without affecting the performance.
This bugzilla is included in oVirt 4.3.5 release, published on July 30th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.