Bug 861806
| Summary: | some of parallel qemu-img convert processes fail to write output file | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ben England <bengland> | ||||
| Component: | glusterfs | Assignee: | Amar Tumballi <amarts> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | Sudhir D <sdharane> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 2.0 | CC: | perfbz, rhs-bugs, vbellur, vraman | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2012-10-09 17:55:12 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
Ben England
2012-09-30 20:52:50 UTC
> # mount -t glusterfs -o background-qlen=64 gprfs025-10ge:/kvmfs /mnt/glusterfs
background-qlen is 64 by default; can you check whether raising it to 128 helps?
Created attachment 619926
script that runs a command on specified list of hosts in parallel
This script is used to start qemu-img processes on all 8 clients in parallel.
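The attachment itself is not reproduced here. As a rough idea of what such a parallel launcher can look like, here is a minimal POSIX shell sketch. It is not the attached script: the function name `run_parallel` and the `REMOTE_SHELL` override are hypothetical, and it assumes passwordless ssh to each client.

```shell
#!/bin/sh
# Hypothetical stand-in for the attached script, NOT its actual contents.
# REMOTE_SHELL is an assumed override so the sketch can be dry-run
# (for example with REMOTE_SHELL=echo) without ssh access.
REMOTE_SHELL="${REMOTE_SHELL:-ssh}"

# Usage: run_parallel "<command>" host1 host2 ...
# Starts the command on every host at once, waits for all of them,
# and returns the number of hosts on which the command failed.
run_parallel() {
    rp_cmd=$1
    shift
    rp_pids=""
    for rp_host in "$@"; do
        # One background job per host so all hosts start together.
        $REMOTE_SHELL "$rp_host" "$rp_cmd" &
        rp_pids="$rp_pids $!"
    done
    rp_failed=0
    for rp_pid in $rp_pids; do
        # wait <pid> yields that job's exit status.
        wait "$rp_pid" || rp_failed=$((rp_failed + 1))
    done
    return "$rp_failed"
}
```

A dry run such as `REMOTE_SHELL=echo run_parallel "qemu-img --version" client1 client2` prints what would be executed on each host; the host names here are placeholders.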
Here are two traces of a qemu-img process failing under heavy Gluster load. The command used to generate the traces was run outside the above workload:

rm -fv /mnt/glusterfs/junk.tmp* ; strace -ttT -f qemu-img convert -f raw -O qcow2 /mnt/glusterfs/virt/rhs-vm.img /mnt/glusterfs/junk.img8 2>&1 | tee r2.log

http://perf1.lab.bos.redhat.com/bengland/laptop/matte/virt/qemu-img-fail1.log
http://perf1.lab.bos.redhat.com/bengland/laptop/matte/virt/qemu-img-fail2.log

Maybe someone with KVM expertise can help explain what happened inside qemu-img.

I tried both background-qlen=16 and background-qlen=256; neither one helped. If we could understand what qemu-img was doing with the filesystem when it saw the error, we could at least come up with an easier reproducer that would help us isolate the problem.

As a workaround, I'll just have my qemu-img scripts retry when the failure occurs.

Even with 10 retries and a 20-second delay between retries, I still see some failures when using 32 clients with 1 qemu-img per client. As a result, even with the workaround the throughput is still pretty bad. I think this is a scalability issue, perhaps because with qcow2 all of the read pressure is on a single server or two. I will see whether this happens with raw format or with Gluster/NFS.

I was using qemu-img wrong: I was not creating an image backed by the master image, but copying from the master image instead. When you do this right, the cloned VM image size is only 256 KB, but it grows after you boot it, as the VM writes data to the disk image.
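To make the two approaches discussed above concrete, here is a minimal shell sketch of the retry workaround and of a backing-file clone. The function names `retry` and `clone_from_master` are illustrative and not taken from the reporter's scripts. The sketch also assumes that the installed qemu-img accepts `-F` to state the backing file's format, and that the master image is raw, as the `-f raw` in the convert command suggests.

```shell
#!/bin/sh
# retry <attempts> <delay_seconds> <command...>
# Re-runs the command until it succeeds or the attempts are used up.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    rt_attempts=$1
    rt_delay=$2
    shift 2
    rt_n=1
    while :; do
        "$@" && return 0
        [ "$rt_n" -ge "$rt_attempts" ] && return 1
        rt_n=$((rt_n + 1))
        sleep "$rt_delay"
    done
}

# Copy approach from the report (reads the whole master image), wrapped in
# the 10-retry / 20-second workaround. MASTER and CLONE are placeholders:
#   retry 10 20 qemu-img convert -f raw -O qcow2 "$MASTER" "$CLONE"

# clone_from_master <master_image> <clone_image>
# Backing-file approach: creates a small qcow2 overlay that refers to the
# master image instead of copying it.
clone_from_master() {
    # -b names the backing (master) image; -F raw is an assumption about
    # the master's format and about the qemu-img version in use.
    qemu-img create -f qcow2 -b "$1" -F raw "$2"
}
```

With the backing-file form, the new image holds only the blocks that differ from the master, which matches the small initial clone size reported above.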