Bug 682174
Summary: | qcow2 code falls into an endless loop in alloc_clusters_noref | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Mark Wu <dwu> |
Component: | kvm | Assignee: | Kevin Wolf <kwolf> |
Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 5.6 | CC: | areis, juzhang, jwest, mkenneth, pgervase, pmdyermms, rhod, shuang, tburke, virt-maint |
Target Milestone: | rc | ||
Target Release: | 5.7 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2012-04-10 08:40:05 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 580949, 807971 |
Description
Mark Wu
2011-03-04 11:07:18 UTC
qemu-img check prints a lot of errors, I suppose? 31 errors on the copy of the image I have locally.

Paul

qemu-img check malongo.img
Leaked cluster 475439 refcount=1 reference=0
Leaked cluster 475461 refcount=1 reference=0
Leaked cluster 475462 refcount=1 reference=0
Leaked cluster 493070 refcount=1 reference=0
Leaked cluster 505351 refcount=1 reference=0
Leaked cluster 507836 refcount=1 reference=0
Leaked cluster 2156950 refcount=1 reference=0
Leaked cluster 2156951 refcount=1 reference=0
Leaked cluster 2246627 refcount=1 reference=0
Leaked cluster 2246628 refcount=1 reference=0
Leaked cluster 2246629 refcount=1 reference=0
Leaked cluster 2246630 refcount=1 reference=0
Leaked cluster 2277456 refcount=1 reference=0
Leaked cluster 2277457 refcount=1 reference=0
Leaked cluster 2277458 refcount=1 reference=0
Leaked cluster 2277459 refcount=1 reference=0
Leaked cluster 2277460 refcount=1 reference=0
Leaked cluster 2277461 refcount=1 reference=0
Leaked cluster 2277462 refcount=1 reference=0
Leaked cluster 2277463 refcount=1 reference=0
Leaked cluster 2277464 refcount=1 reference=0
Leaked cluster 2277465 refcount=1 reference=0
Leaked cluster 2277466 refcount=1 reference=0
Leaked cluster 2277467 refcount=1 reference=0
Leaked cluster 2277468 refcount=1 reference=0
Leaked cluster 2277469 refcount=1 reference=0
Leaked cluster 2277470 refcount=1 reference=0
Leaked cluster 2277471 refcount=1 reference=0
Leaked cluster 2277472 refcount=1 reference=0
Leaked cluster 2277473 refcount=1 reference=0
Leaked cluster 2277474 refcount=1 reference=0
31 leaked clusters were found on the image. This means waste of disk space, but no harm to data.

Hm, okay, but those are only cluster leaks, so the metadata seems to be consistent. The hexdump that Mark posted actually looks not too bad; it's just a fully used refcount block. Are you sure that the endless loop is in alloc_clusters_noref and not in a caller?
You can use the 'finish' gdb command to avoid single-stepping all the way through the refcounts.

Hi, I was able to run qemu-img check against the two production images that are the origin of this ticket. One image had 31 leaked clusters. The other had 64 leaked clusters. The information I sent earlier, from the USB copy of the production image, is consistent with what we are seeing in production.

Paul

Kevin, I think the problem here is that there is no zeroed entry available in the refcount table. Reading from the refcount table always seems to return the following data:

(gdb) x/2048h 0x17245630
0x17245630: 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100
0x17245640: 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100
0x17245650: 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100
0x17245660: 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100 0x0100
....

So it keeps retrying in alloc_clusters_noref:

retry:
    for(i = 0; i < nb_clusters; i++) {
        int64_t next_cluster_index = s->free_cluster_index++;
        refcount = get_refcount(bs, next_cluster_index);

        if (refcount < 0) {
            return refcount;
        } else if (refcount != 0) {
            goto retry;
        }
    }

But I have no idea how all the clusters came to be referenced.

In every loop iteration s->free_cluster_index is increased by one, so we can't get an endless loop here. If no refcount block contains a free cluster, then after scanning all clusters in vain a new refcount block would be appended at the end and initialized with zeros. If you say that you do see an endless loop, what value does s->free_cluster_index have, and how does it change while looping through this code?

Kevin, yes, I was wrong. It's not an endless loop in alloc_clusters_noref. I found that under some conditions update_refcount moves free_cluster_index (2474903) back to the freed cluster (1058272). Then the next alloc_clusters_noref has to traverse the refcount table from 1058272 to 2474903. Could you shed some light on this for me? Many thanks!
I will continue to look into it.

Breakpoint 31, update_refcount (bs=0xd898010, offset=4334685696, length=<value optimized out>, addend=-1) at block-qcow2.c:3132
3132        s->free_cluster_index = cluster_index;
(gdb) p ((BDRVQcowState*)0xd895dd0)->free_cluster_index
$128 = 2474903
(gdb) n
54          return bswap_16(x);
(gdb) n
3134        s->refcount_block_cache[block_index] = cpu_to_be16(refcount);
(gdb) p ((BDRVQcowState*)0xd895dd0)->free_cluster_index
$129 = 1058272
(gdb) bt
#0  update_refcount (bs=0xd898010, offset=4334685696, length=<value optimized out>, addend=-1) at block-qcow2.c:3134
#1  0x000000000049982e in free_clusters (bs=0xd8af630, offset=3008, size=227106256) at block-qcow2.c:2740
#2  0x000000000049b2e0 in alloc_cluster_link_l2 (bs=0xd898010, m=<value optimized out>) at block-qcow2.c:1127
#3  0x000000000049b37e in qcow_aio_write_cb (opaque=0xee1bc10, ret=0) at block-qcow2.c:1592
#4  0x0000000000419c57 in posix_aio_read (opaque=<value optimized out>) at block-raw-posix.c:483
#5  0x0000000000409672 in main_loop_wait (timeout=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:4048
#6  0x00000000005005fa in kvm_main_loop () at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:596
#7  0x000000000040e795 in main_loop (argc=39, argv=0x7fff08c583b8, envp=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:4105
#8  main (argc=39, argv=0x7fff08c583b8, envp=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:6545
(gdb) cont
Continuing.
[Thread 0x66200940 (LWP 14430) exited]
[New Thread 0x66200940 (LWP 14439)]

Breakpoint 31, update_refcount (bs=0xd898010, offset=4334722048, length=<value optimized out>, addend=-1) at block-qcow2.c:3132
3132        s->free_cluster_index = cluster_index;
(gdb) p ((BDRVQcowState*)0xd895dd0)->free_cluster_index
$130 = 2475063
(gdb) n
54          return bswap_16(x);
(gdb) n
3134        s->refcount_block_cache[block_index] = cpu_to_be16(refcount);
(gdb) p ((BDRVQcowState*)0xd895dd0)->free_cluster_index
$131 = 1058281
(gdb) bt
#0  update_refcount (bs=0xd898010, offset=4334722048, length=<value optimized out>, addend=-1) at block-qcow2.c:3134
#1  0x000000000049982e in free_clusters (bs=0xd8af630, offset=3026, size=227106256) at block-qcow2.c:2740
#2  0x000000000049b2e0 in alloc_cluster_link_l2 (bs=0xd898010, m=<value optimized out>) at block-qcow2.c:1127
#3  0x000000000049b37e in qcow_aio_write_cb (opaque=0xee1bc10, ret=0) at block-qcow2.c:1592
#4  0x0000000000419c57 in posix_aio_read (opaque=<value optimized out>) at block-raw-posix.c:483
#5  0x0000000000409672 in main_loop_wait (timeout=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:4048
#6  0x00000000005005fa in kvm_main_loop () at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:596
#7  0x000000000040e795 in main_loop (argc=39, argv=0x7fff08c583b8, envp=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:4105
#8  main (argc=39, argv=0x7fff08c583b8, envp=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:6545
(gdb) cont
Continuing.

[Thread 0x66200940 (LWP 14439) exited]
[New Thread 0x66200940 (LWP 14447)]
[New Thread 0x69405940 (LWP 14448)]

Kevin, I think the problem here is the big jump backwards of free_cluster_index. After the jump, alloc_clusters_noref probably has to check all the refcount entries from the new index up to the original one, as mentioned in comment 12: from 1058272 to 2474903, and from 1058281 to 2475063.
That's why I see qemu spending a lot of time in pread, with the data "\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1\0\1" being read out repeatedly. From the backtrace in comment 12, the jump backwards should happen when freeing clusters in alloc_cluster_link_l2. If there are two concurrent writes to an unallocated cluster, they will use separate clusters. When the second write finishes, it copies the unmodified data from the cluster used by the first write and frees the old cluster. That would mean there were concurrent writes to the same location in the guest OS. However, the problem was reproduced simply with a dd command in the guest, with no explicit concurrent writing. So I added some debugging log in alloc_cluster_link_l2 to verify this and see which locations were written to, but unfortunately I can't reproduce it any more with the image. Any idea?

Mark, I have updated case 390745, twice, since last I heard from you. Could you send a response? Thanks

Paul, I have copied the postings from the 390745 ticket to here. We're going over the output and will update you with our findings.

Pete

Hi, I ran qemu-img convert and moved the KVM guests to a raw image. That increased the image size by about 5 GB, but has corrected the problem with the file. The uninterruptible sleep state no longer occurs on disk writes. I have the original problem image on www.ctghome.com. You can download it if more work needs to be done. I am moving on with this. Thanks for the help.

Paul

Hi pmdyermms, did you remove the image? I cannot find it.
Thanks,
suqin

Reproduced it with the special image:

kvm-83-241.el5
2.6.18-286.el5

cmd:
/usr/libexec/qemu-kvm -monitor stdio -drive file=/root/bug682174,index=0,if=virtio,media=disk,cache=none,boot=on,format=qcow2 -net nic,vlan=0,model=virtio,macaddr='9a:d0:37:77:25:21' -net tap,vlan=0,ifname='t0-043914-vo3C',script=/home/Auto/autotest/client/tests/kvm/scripts/qemu-ifup-switch,downscript='no' -m 2048 -smp 4,cores=2,threads=1,sockets=2 -cpu qemu64,+sse2 -vnc :1 -rtc-td-hack -boot c -usbdevice tablet -no-kvm-pit-reinjection

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

The bug is 13 months old, GSS has moved on after using a workaround, and we need a special image to reproduce it. I recommend closing it as WONTFIX on RHEL5.