+++ This bug was initially created as a clone of Bug #1700316 +++

Description of problem:
------------------------------
After the user creates a VM out of a template and runs it, the VM goes into a paused state with a storage I/O error, and later investigation of the logs found an error stating "stale file handle".

Version-Release number of selected component:
----------------------------------------------
glusterfs-server-3.12.2-47.el7rhgs.x86_64
gluster-ansible-cluster-1.0-1.el7rhgs.noarch
gluster-ansible-repositories-1.0-1.el7rhgs.noarch
gluster-ansible-features-1.0.4-5.el7rhgs.noarch
gluster-ansible-infra-1.0.3-3.el7rhgs.noarch
gluster-ansible-maintenance-1.0.1-1.el7rhgs.noarch
gluster-ansible-roles-1.0.4-4.el7rhgs.noarch
Red Hat Enterprise Linux Server release 7.6 (Maipo)

How reproducible:
-------------------
2/2

Steps to Reproduce:
---------------------
1. Successfully deploy gluster and the hosted engine.
2. Create a RHEL 7.6 template and then create a VM out of it.
3. Run the VM; it goes into a paused state.

Actual results:
----------------------
The VM goes into a paused state with a storage I/O error.

Expected results:
----------------------
The VM should not go into a paused state.

--- Additional comment from RHEL Product and Program Management on 2019-04-16 09:59:33 UTC ---

This bug is automatically being proposed for the RHHI-V 1.7 release of the Red Hat Hyperconverged Infrastructure for Virtualization product, by setting the release flag 'rhiv-1.7' to '?'. If this bug should be proposed for a different release, please manually change the proposed release flag.
I see read (which is an operation on an fd) failing with ESTALE. The only reason I can think of for this is the use of anonymous fds. Can you turn off performance.open-behind and see whether it helps?
Created attachment 1557151 [details] vmstore.log
(In reply to comment #3) I turned off performance.open-behind but could still see this issue. The logs have been attached.
Mugdha,

The logs seem to indicate that a lot of images were deleted:

[2019-04-22 06:21:03.286624] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=a3180be9-e95b-4fc0-81a2-5f5ab9465a18 from backend
[2019-04-22 06:21:03.289603] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=92a11c36-a0f0-4a17-9187-484c876106bc from backend
[2019-04-22 06:21:03.291567] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=a535e911-fd7d-4bba-9eb9-91842f3ee2ca from backend
[2019-04-22 06:21:03.601443] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=1f9cddce-d0cc-46f8-ab4b-c2918ffaf01b from backend
[2019-04-22 06:21:03.604608] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=aebb9d5b-3af7-4fd3-b737-4a6402b5f677 from backend
[2019-04-22 06:21:03.607079] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=dccb4887-6e03-428b-8a68-5466c38c5829 from backend
[2019-04-22 06:21:03.894875] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=c244b562-504a-43e1-b2b5-521ff35776ec from backend
[2019-04-22 06:21:03.897581] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=685e0108-35fa-45b2-b919-9d08d0b7c2cd from backend
[2019-04-22 06:21:03.899639] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=8c4902db-4318-4005-a451-0d332f5aee59 from backend
[2019-04-22 06:21:04.205551] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=fb6075c9-e744-4048-b30d-773bdbc2aa5a from backend
[2019-04-22 06:21:04.208530] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=25f6cf12-050a-4d64-92e2-25a2ba640df4 from backend
[2019-04-22 06:21:04.210682] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=f901881f-2d21-4ff1-9a33-0e6c929c0cc1 from backend
[2019-04-22 06:21:04.507627] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=e2d262df-7cf7-4af0-b529-18960155b592 from backend
[2019-04-22 06:21:04.510694] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=292e85a1-e197-4778-bb40-7441d31356c3 from backend
[2019-04-22 06:21:04.513183] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=2555816a-88c2-47cd-837d-07113910592b from backend
[2019-04-22 06:21:04.789920] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=45581e16-64c6-4c08-b0a0-e8796773c46d from backend
[2019-04-22 06:21:04.792762] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=61536a7c-0b69-4220-81c9-2e0acdf53ee8 from backend
[2019-04-22 06:21:04.794852] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=c99ac123-fc3a-4a0d-a142-ab3bdb144523 from backend
[2019-04-22 06:21:05.076846] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=bfa89657-c62c-4555-96d8-0ca7e80c8ded from backend
[2019-04-22 06:21:05.079858] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=c1351e11-46f4-4954-9c4c-844db7823ab7 from backend
[2019-04-22 06:21:05.082067] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=06b6ca57-f81d-4a2d-9d4c-3dba7a825f83 from backend
[2019-04-22 06:29:34.520531] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=376ebba2-cc30-4aa5-85db-f36f8998026d from backend

Was deletion part of the test case too? If so, can you shed some details around that, and at what point was it done?
-Krutika
Digging further into why fd-based operations fail with ESTALE errors, I had a discussion with Krutika on how shard handles open fds. fds are opened only on the base shard and hence they are not immune to deletions of the path. However, I am not sure whether there was any explicit deletion of shards. But I do see quite a few shard_delete_shards msgs like:

[2019-04-22 08:59:53.237690] W [fuse-bridge.c:2756:fuse_readv_cbk] 0-glusterfs-fuse: 3625714: READ => -1 gfid=ee412a06-262f-4710-948d-36ae97d4cbc3 fd=0x7ff6900c0358 (Stale file handle)
[2019-04-22 09:05:25.498899] I [MSGID: 133022] [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=ee412a06-262f-4710-948d-36ae97d4cbc3 from backend

Though the readv failed earlier than the shard_delete_shards msg above, I would nevertheless like to explore further:
* what caused the deletion of shards?
* is it possible that shards were deleted earlier than the fd-based operations?

I would request Krutika to explore the above two questions.

As to the relevance of this bug to dht, the only error msgs present in the logs relevant to DHT are those that just say the file is not present on a particular subvolume, and these msgs don't indicate an error condition.
(In reply to Raghavendra G from comment #7)
> Digging further into why fd based operations fail with ESTALE errors, I had
> a discussion with Krutika on how shard handles open fds. fds are opened only
> on base shard and hence they are not immune to deletions of path. However, I
> am not sure whether there were any explicit deletion of shards. But, I do
> see quite a few shard_delete_shards msgs like,
>
> [2019-04-22 08:59:53.237690] W [fuse-bridge.c:2756:fuse_readv_cbk]
> 0-glusterfs-fuse: 3625714: READ => -1
> gfid=ee412a06-262f-4710-948d-36ae97d4cbc3 fd=0x7ff6900c0358 (Stale file
> handle)
> [2019-04-22 09:05:25.498899] I [MSGID: 133022]
> [shard.c:3636:shard_delete_shards] 0-vmstore-shard: Deleted shards of
> gfid=ee412a06-262f-4710-948d-36ae97d4cbc3 from backend
>
> Though readv failed earlier than shard_delete_shards msg above. Nevertheless
> I would like to explore further on:
> * what caused deletion of shards?
> * is it possible that shards were deleted earlier than fd based operations?
>
> I would request Krutika to explore the above two questions.

Like I mentioned yesterday in the discussion on chat, deleting a vm is disallowed unless it's powered off. In other words, qemu closes the fd on the base file long before deletion. Moreover, the logs about deletion stem from qe/dev cleaning up old vms before rerunning the test.

I tried the test case myself and can confirm that the deletions happened long after the pause, at the time I cleaned up the setup before a rerun. There are multiple reasons to believe deletions and reads are not happening in parallel, including the order of events seen in the logs. I think we can safely drop that theory and explore other possibilities. I'll debug further on the qe setup and post my analysis.

> As to relevance of this bug to dht, only error msgs present in logs relevant
> to DHT are those that just say the file is not present on a particular
> subvolume and these msgs doesn't indicate an error condition.
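As a side note on why an fd-based read failing like this is surprising: on a plain local POSIX filesystem, an open fd keeps the inode alive across an unlink, so a deletion alone would not break reads through an already-open fd. A minimal local illustration (ordinary files and Python, not gluster; gluster's anonymous fds resolve by gfid instead, which is why a missing gfid handle surfaces as ESTALE there):

```python
import os
import tempfile

# On a local POSIX filesystem an open fd pins the inode: reads keep
# working even after the path is unlinked. Gluster anonymous fds are
# resolved by gfid on each operation, so this guarantee does not hold.
path = os.path.join(tempfile.mkdtemp(), "disk.img")
with open(path, "wb") as f:
    f.write(b"shard data")

fd = os.open(path, os.O_RDONLY)
os.unlink(path)                  # the path is gone...
data = os.pread(fd, 10, 0)       # ...but the fd still reads fine
os.close(fd)
print(data)                      # b'shard data'
```

This is only meant to show that a plain delete-after-close cannot explain an ESTALE on an fd-based read, which matches the conclusion that the deletions were not the cause.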
(In reply to Mugdha Soni from comment #0)
> +++ This bug was initially created as a clone of Bug #1700316 +++
>
> Description of problem:
> ------------------------------
> After user creates a VM out of template and runs it the VM is going to
> paused state with storage I/O storage error and later investigating the logs
> found an error stating " stale file handler ".

In which logs did you find "Stale file handle": the VM's logs or the glusterfs client logs? I'm asking in order to identify which syscall's failure might have caused the VM to pause. From the glusterfs client logs attached I could see statfs and readv responses failing with ESTALE errors. I assume one of these would have caused the VM pause, but I just want to confirm.
Here's a short summary of the debugging done today.

The vm pauses during a readv fop on it, specifically on the 12th shard of the vm when I reproduced it today. The error that caused the pause is ESTALE, which happens as follows.

shard creates an anon fd on the 12th shard and sends readv on it. DHT winds this down to AFR. AFR in turn winds this to its read child (child 0 in this case), as evidenced by the occurrence of the following log:

[2019-04-24 09:20:20.772460] W [MSGID: 114031] [client-rpc-fops.c:2927:client3_3_readv_cbk] 10-vmstore-client-0: remote operation failed [No such file or directory]

The brick returns ENOENT on the read. So afr attempts an inode refresh using fstat. The fstat also fails with ENOENT:

[2019-04-24 09:20:20.772959] W [MSGID: 114031] [client-rpc-fops.c:1464:client3_3_fstat_cbk] 10-vmstore-client-0: remote operation failed [No such file or directory]
[2019-04-24 09:20:20.773113] W [MSGID: 114031] [client-rpc-fops.c:1464:client3_3_fstat_cbk] 10-vmstore-client-2: remote operation failed [No such file or directory]
[2019-04-24 09:20:20.773161] W [MSGID: 114031] [client-rpc-fops.c:1464:client3_3_fstat_cbk] 10-vmstore-client-1: remote operation failed [No such file or directory]

The ENOENT is propagated upwards to DHT. DHT, in its readv cbk, on seeing that the error is ENOENT, launches the migration-check synctask, in which it sends a lookup on the gfid. This lookup is failed in the resolution phase in protocol/server, and the server returns ESTALE:

[2019-04-24 09:21:58.005820] E [MSGID: 109040] [dht-helper.c:1387:dht_migration_complete_check_task] 10-vmstore-dht: 2da0e8e7-e2ba-44db-af25-22abae893d41: failed to lookup the file on vmstore-dht [Stale file handle]

DHT propagates this back to shard, and shard propagates it upwards to fuse, where the READV is failed with ESTALE, following which qemu pauses the vm.
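For orientation, the shard translator maps a file offset to a block index by dividing by the shard block size; block 0 lives in the base file and higher-numbered blocks are separate files under .shard/. A minimal sketch, assuming the 64 MiB (67108864-byte) shard size seen in the stat output in the logs:

```python
SHARD_BLOCK_SIZE = 64 * 1024 * 1024  # 67108864, matching the shard sizes in the logs

def shard_index(offset):
    # Block 0 is stored in the base file itself; block N (N >= 1) is
    # the separate file .shard/<base-gfid>.N on the volume.
    return offset // SHARD_BLOCK_SIZE

# Any read in [12 * 64 MiB, 13 * 64 MiB) lands on shard #12:
print(shard_index(12 * SHARD_BLOCK_SIZE))      # 12
print(shard_index(13 * SHARD_BLOCK_SIZE - 1))  # 12
```

So a readv failing on "the 12th shard" means the guest touched an offset somewhere in the 768 MiB to 832 MiB range of the image.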
[2019-04-24 09:31:09.806926] W [fuse-bridge.c:2756:fuse_readv_cbk] 0-glusterfs-fuse: 13005629: READ => -1 gfid=1da75ccf-3de7-4db4-85d2-c5bef44aaec8 fd=0x7ff6900ace58 (Stale file handle)

However, this is where it gets interesting. I checked the backend for shard 12 of 1da75ccf-3de7-4db4-85d2-c5bef44aaec8 and it very much exists, along with its gfid handle; only it has a different gfid than the one on which the anon fd was opened and the readv sent. The readv was sent on gfid 2da0e8e7-e2ba-44db-af25-22abae893d41 (non-existent in the backend), whereas the actual gfid of the shard in the backend is 1540624d-59e1-4c8c-b89e-360aa0e635c7:

[root@rhsqa-grafton1-nic2 .shard]# ls -lai 1da75ccf-3de7-4db4-85d2-c5bef44aaec8.12
68786459214 -rw-rw----. 2 root root 67108864 Apr 24 14:12 1da75ccf-3de7-4db4-85d2-c5bef44aaec8.12
[root@rhsqa-grafton1-nic2 .shard]# cd ..
[root@rhsqa-grafton1-nic2 vmstore]# find . -inum 68786459214
./.glusterfs/15/40/1540624d-59e1-4c8c-b89e-360aa0e635c7
./.shard/1da75ccf-3de7-4db4-85d2-c5bef44aaec8.12

The question is why there is a gfid mismatch between the in-memory inode and the backend inode.
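The `find . -inum` check above works because the .glusterfs/<g1>/<g2>/<gfid> handle on the brick is a hardlink to the shard file (note the link count of 2 in the `ls -lai` output). A quick local sketch of verifying that two paths name the same inode, with hypothetical file names standing in for the shard and its gfid handle:

```python
import os
import tempfile

# Create a file and a hardlink to it, then confirm both paths resolve
# to the same (st_dev, st_ino) pair, which is what find -inum relies on.
d = tempfile.mkdtemp()
shard = os.path.join(d, "1da75ccf.12")   # hypothetical shard name
handle = os.path.join(d, "gfid-handle")  # hypothetical .glusterfs handle
open(shard, "wb").close()
os.link(shard, handle)

s1, s2 = os.stat(shard), os.stat(handle)
same = (s1.st_dev, s1.st_ino) == (s2.st_dev, s2.st_ino)
print(same)  # True: same inode, two names
```

The mismatch in the bug, then, is that the brick's inode (and its handle) were replaced while a client kept using the old gfid.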
The in-memory inode object is in fact clean, with a non-zero ref, and is properly linked into the inode table:

(gdb) p local->inode_list[0]
$14 = (inode_t *) 0x7ff69972c848
(gdb) p *local->inode_list[0]
$15 = {table = 0x7ff68d4d4d30, gfid = "-\240\350\347\342\272Dۯ%\"\253\256\211=A", lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, nlookup = 1, fd_count = 1, active_fd_count = 1, ref = 4, ia_type = IA_IFREG, fd_list = {next = 0x7ff63c027db8, prev = 0x7ff63c027db8}, dentry_list = {next = 0x7ff6a42817e8, prev = 0x7ff6a42817e8}, hash = {next = 0x7ff68d9d9a50, prev = 0x7ff68d9d9a50}, list = {next = 0x7ff6981d5d00, prev = 0x7ff6996a8270}, _ctx = 0x7ff6981f6b40, invalidate_sent = _gf_false}
(gdb) call inode_resolve(0x7ff68d4d4d30, "/.shard/1da75ccf-3de7-4db4-85d2-c5bef44aaec8.12")
[Thread 0x7ff6a132c700 (LWP 15827) exited]
[Thread 0x7ff6a136d700 (LWP 15826) exited]
[Thread 0x7ff6a1471700 (LWP 15821) exited]
[Thread 0x7ff6a1430700 (LWP 15823) exited]
$19 = (inode_t *) 0x7ff69972c848

This is the only thing left to debug, which I will complete tomorrow.

Thanks,
Krutika

P.S. - I'm still using the setup, so I would like it to be left undisturbed.
Found the issue. I had to load the trace translator manually in the volfile to find it. AGAIN, the bug was hit on shard #12 of the file. The logs seen at the time of failure are:

<readv-estale-failure>
[2019-04-25 10:14:28.404340] W [MSGID: 114031] [client-rpc-fops.c:2927:client3_3_readv_cbk] 0-vmstore-client-1: remote operation failed [No such file or directory]
[2019-04-25 10:14:28.404684] W [MSGID: 114031] [client-rpc-fops.c:1464:client3_3_fstat_cbk] 0-vmstore-client-1: remote operation failed [No such file or directory]
[2019-04-25 10:14:28.404703] W [MSGID: 114031] [client-rpc-fops.c:1464:client3_3_fstat_cbk] 0-vmstore-client-2: remote operation failed [No such file or directory]
[2019-04-25 10:14:28.404848] W [MSGID: 114031] [client-rpc-fops.c:1464:client3_3_fstat_cbk] 0-vmstore-client-0: remote operation failed [No such file or directory]
[2019-04-25 10:14:28.406755] E [MSGID: 109040] [dht-helper.c:1387:dht_migration_complete_check_task] 0-vmstore-dht: 6f750303-3bf5-4d88-89f8-d054ab6df464: failed to lookup the file on vmstore-dht [Stale file handle]
[2019-04-25 10:14:28.406813] W [fuse-bridge.c:2756:fuse_readv_cbk] 0-glusterfs-fuse: 1068032: READ => -1 gfid=33789188-e848-4e9f-90e5-700f31f257b8 fd=0x7f5d2407eb88 (Stale file handle)
</readv-estale-failure>

So in this setup, when creation of vms from the template was initiated, the fallocate for the vm in question was sent on node 3's gluster mountpoint:

[2019-04-25 09:27:43.098832] I [trace.c:715:trace_mknod_cbk] 0-vmstore-shard-child: 224: gfid=00000000-0000-0000-0000-000000000000 (op_ret=0 *stbuf = {gfid=6f750303-3bf5-4d88-89f8-d054ab6df464 ino=9941925239492310116, mode=100660, nlink=1, uid=0, gid=0, size=0, blocks=0, atime=2019-04-25-09:27:43 mtime=2019-04-25-09:27:43 ctime=2019-04-25-09:27:43 atime_sec=1556184463, atime_nsec=97066167, mtime_sec=1556184463, mtime_nsec=97066167, ctime_sec=1556184463, ctime_nsec=98066174}, *preparent = {gfid=be318638-e8a0-4c6d-977d-7a937aa84806 ino=10916015845652580358, mode=40755, nlink=3, uid=0, gid=0, size=4096, blocks=8, atime=2019-04-15-11:15:26 mtime=2019-04-25-09:27:43 ctime=2019-04-25-09:27:43 atime_sec=1555326926, atime_nsec=688394340, mtime_sec=1556184463, mtime_nsec=97105740, ctime_sec=1556184463, ctime_nsec=97105740}, *postparent = {gfid=be318638-e8a0-4c6d-977d-7a937aa84806 ino=10916015845652580358, mode=40755, nlink=3, uid=0, gid=0, size=4096, blocks=8, atime=2019-04-15-11:15:26 mtime=2019-04-25-09:27:43 ctime=2019-04-25-09:27:43 atime_sec=1555326926, atime_nsec=688394340, mtime_sec=1556184463, mtime_nsec=97105740, ctime_sec=1556184463, ctime_nsec=97105740})

and as part of its creation, shard #12 got linked with gfid 6f750303-3bf5-4d88-89f8-d054ab6df464 at time 09:27:43.*. shard remembers this association in memory so that it doesn't have to look it up on disk every time before sending operations on it.

But around 6 seconds later, an ftruncate to size 0 followed by another fallocate is sent on this very same vm image. This truncate entails deleting all shards of the file, shard #12 being no exception:

<ftruncate-on-node-2>
[2019-04-25 09:27:49.330619] I [trace.c:1730:trace_lookup] 0-vmstore-shard-child: 396: gfid=00000000-0000-0000-0000-000000000000 path=/.shard/33789188-e848-4e9f-90e5-700f31f257b8.12
[2019-04-25 09:27:49.331083] I [trace.c:626:trace_lookup_cbk] 0-vmstore-shard-child: 396: gfid=6f750303-3bf5-4d88-89f8-d054ab6df464 (op_ret=0 *buf {gfid=6f750303-3bf5-4d88-89f8-d054ab6df464 ino=9941925239492310116, mode=100660, nlink=1, uid=0, gid=0, size=67108864, blocks=131072, atime=2019-04-25-09:27:43 mtime=2019-04-25-09:27:43 ctime=2019-04-25-09:27:43 atime_sec=1556184463, atime_nsec=97066167, mtime_sec=1556184463, mtime_nsec=197066836, ctime_sec=1556184463, ctime_nsec=199066849}, *postparent {gfid=be318638-e8a0-4c6d-977d-7a937aa84806 ino=10916015845652580358, mode=40755, nlink=3, uid=0, gid=0, size=4096, blocks=8, atime=2019-04-15-11:15:26 mtime=2019-04-25-09:27:49 ctime=2019-04-25-09:27:49 atime_sec=1555326926, atime_nsec=688394340, mtime_sec=1556184469, mtime_nsec=236149085, ctime_sec=1556184469, ctime_nsec=236149085}

*** ^^ as part of the resolution of shards before the truncate, shard 12 is looked up on node 2. At this point it's still associated with gfid 6f750303-3bf5-4d88-89f8-d054ab6df464. ***

[2019-04-25 09:27:49.422671] I [trace.c:1884:trace_unlink] 0-vmstore-shard-child: 396: gfid=6f750303-3bf5-4d88-89f8-d054ab6df464 path=/.shard/33789188-e848-4e9f-90e5-700f31f257b8.12 flag=0
[2019-04-25 09:27:49.422712] I [trace.c:486:trace_unlink_cbk] 0-vmstore-shard-child: 396: gfid=26c203c6-f958-405c-a98b-0257f105c233 op_ret=0, *preparent = {gfid=be318638-e8a0-4c6d-977d-7a937aa84806 ino=10916015845652580358, mode=40755, nlink=3, uid=0, gid=0, size=4096, blocks=8, atime=2019-04-15-11:15:26 mtime=2019-04-25-09:27:49 ctime=2019-04-25-09:27:49 atime_sec=1555326926, atime_nsec=688394340, mtime_sec=1556184469, mtime_nsec=422150398, ctime_sec=1556184469, ctime_nsec=422150398}, *postparent = {gfid=be318638-e8a0-4c6d-977d-7a937aa84806 ino=10916015845652580358, mode=40755, nlink=3, uid=0, gid=0, size=4096, blocks=8, atime=2019-04-15-11:15:26 mtime=2019-04-25-09:27:49 ctime=2019-04-25-09:27:49 atime_sec=1555326926, atime_nsec=688394340, mtime_sec=1556184469, mtime_nsec=422150398, ctime_sec=1556184469, ctime_nsec=422150398})

*** ^^^ the unlink above, as part of the truncate, successfully deletes the shard. This is expected. ***

[2019-04-25 09:27:49.505079] I [trace.c:2236:trace_truncate] 0-vmstore-shard-child: 396: gfid=33789188-e848-4e9f-90e5-700f31f257b8 path=(null), offset=0
[2019-04-25 09:27:49.506329] I [trace.c:935:trace_truncate_cbk] 0-vmstore-shard-child: 396: (op_ret=0, *prebuf = {gfid=33789188-e848-4e9f-90e5-700f31f257b8 ino=10440874521693149112, mode=100660, nlink=1, uid=36, gid=36, size=67108864, blocks=131072, atime=2019-04-25-09:27:42 mtime=2019-04-25-09:27:43 ctime=2019-04-25-09:27:49 atime_sec=1556184462, atime_nsec=961065257, mtime_sec=1556184463, mtime_nsec=195066822, ctime_sec=1556184469, ctime_nsec=505109024}, *postbuf = {gfid=33789188-e848-4e9f-90e5-700f31f257b8 ino=10440874521693149112, mode=100660, nlink=1, uid=36, gid=36, size=0, blocks=0, atime=2019-04-25-09:27:42 mtime=2019-04-25-09:27:49 ctime=2019-04-25-09:27:49 atime_sec=1556184462, atime_nsec=961065257, mtime_sec=1556184469, mtime_nsec=505109024, ctime_sec=1556184469, ctime_nsec=505109024})

*** ^^^ the truncate on the base file is also completed ***

[2019-04-25 09:27:49.506375] I [trace.c:1667:trace_xattrop] 0-vmstore-shard-child: 396: gfid=33789188-e848-4e9f-90e5-700f31f257b8 (path=(null) flags=1)
[2019-04-25 09:27:49.506929] I [trace.c:1342:trace_xattrop_cbk] 0-vmstore-shard-child: 396: gfid=33789188-e848-4e9f-90e5-700f31f257b8 op_ret=0, op_errno=0
[2019-04-25 09:27:49.506951] I [trace.c:1195:trace_ftruncate_cbk] 0-vmstore-shard-parent: 396: op_ret=0, *prebuf = {gfid=33789188-e848-4e9f-90e5-700f31f257b8 ino=10440874521693149112, mode=100660, nlink=1, uid=36, gid=36, size=42949672960, blocks=83886080, atime=2019-04-25-09:27:42 mtime=2019-04-25-09:27:43 ctime=2019-04-25-09:27:43 atime_sec=1556184462, atime_nsec=961065257, mtime_sec=1556184463, mtime_nsec=195066822, ctime_sec=1556184463, ctime_nsec=341067799}, *postbuf = {gfid=33789188-e848-4e9f-90e5-700f31f257b8 ino=10440874521693149112, mode=100660, nlink=1, uid=36, gid=36, size=0, blocks=83755008, atime=2019-04-25-09:27:42 mtime=2019-04-25-09:27:43 ctime=2019-04-25-09:27:43 atime_sec=1556184462, atime_nsec=961065257, mtime_sec=1556184463, mtime_nsec=195066822, ctime_sec=1556184463, ctime_nsec=341067799})

*** ^^^ and now shard unwinds success to its parent, which is passed on up to fuse/qemu. This marks the completion of the ftruncate. ***
</ftruncate-on-node-2>

Not too long after this, the same client on node 2 initiates fallocate again on the vm. As part of this, block #12 gets recreated, but now with a different gfid. Nothing surprising yet.

<mknod-block-12-again>
[2019-04-25 09:27:49.601551] I [trace.c:1823:trace_mknod] 0-vmstore-shard-child: 398: gfid=00000000-0000-0000-0000-000000000000 path=/.shard/33789188-e848-4e9f-90e5-700f31f257b8.12 mode=33200 umask=00, dev=0)
[2019-04-25 09:27:49.602291] I [trace.c:715:trace_mknod_cbk] 0-vmstore-shard-child: 398: gfid=00000000-0000-0000-0000-000000000000 (op_ret=0 *stbuf = {gfid=a53dd89c-c5c5-4821-a4df-71b3ce0671e5 ino=11880339359096271333, mode=100660, nlink=1, uid=0, gid=0, size=0, blocks=0, atime=2019-04-25-09:27:49 mtime=2019-04-25-09:27:49 ctime=2019-04-25-09:27:49 atime_sec=1556184469, atime_nsec=600927283, mtime_sec=1556184469, mtime_nsec=600927283, ctime_sec=1556184469, ctime_nsec=600927283}, *preparent = {gfid=be318638-e8a0-4c6d-977d-7a937aa84806 ino=10916015845652580358, mode=40755, nlink=3, uid=0, gid=0, size=4096, blocks=8, atime=2019-04-15-11:15:26 mtime=2019-04-25-09:27:49 ctime=2019-04-25-09:27:49 atime_sec=1555326926, atime_nsec=688394340, mtime_sec=1556184469, mtime_nsec=601151662, ctime_sec=1556184469, ctime_nsec=601151662}, *postparent = {gfid=be318638-e8a0-4c6d-977d-7a937aa84806 ino=10916015845652580358, mode=40755, nlink=3, uid=0, gid=0, size=4096, blocks=8, atime=2019-04-15-11:15:26 mtime=2019-04-25-09:27:49 ctime=2019-04-25-09:27:49 atime_sec=1555326926, atime_nsec=688394340, mtime_sec=1556184469, mtime_nsec=601151662, ctime_sec=1556184469, ctime_nsec=601151662})
</mknod-block-12-again>

Now shard 12 is linked to gfid a53dd89c-c5c5-4821-a4df-71b3ce0671e5.

When creation of the vm from the template finishes, I attempt to boot the vm. The vm gets launched on host 3, which still holds the old gfid-to-block-12 association in memory. It creates an anon fd on this inode and sends a readv, which fails with ESTALE because the old .glusterfs gfid link no longer exists.
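To summarize the race, here is a toy model of it in Python. The dicts below are only illustrative stand-ins: `backend` plays the role of the brick's .glusterfs gfid handles, and each per-host dict plays the role of that client's cached shard-name-to-gfid association; none of this is gluster's actual data structure.

```python
import uuid

backend = {}  # stand-in for the brick's gfid handles

def create(cache, name):
    # mknod of a shard: new gfid on the brick, cached by the creator.
    gfid = str(uuid.uuid4())
    backend[gfid] = b""
    cache[name] = gfid

def anon_fd_read(cache, name):
    # An anonymous-fd read uses the cached association without a
    # fresh lookup; a vanished gfid handle surfaces as ESTALE.
    gfid = cache[name]
    if gfid not in backend:
        raise OSError("ESTALE: stale file handle")
    return backend[gfid]

host2, host3 = {}, {}
create(host2, "shard.12")
host3["shard.12"] = host2["shard.12"]  # host 3 resolved the shard once

del backend[host2["shard.12"]]  # ftruncate on host 2 deletes the shard...
create(host2, "shard.12")       # ...and the re-fallocate recreates it (new gfid)

try:
    anon_fd_read(host3, "shard.12")
    stale = False
except OSError:
    stale = True  # host 3's cached gfid no longer resolves
print(stale)  # True
```

The shard file exists again under the same name, but host 3's read goes by the old gfid, which is exactly the mismatch observed on the bricks.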
This new method of creating vms from templates breaks a major assumption in sharding: that this workload is a single-writer/single-reader use case. I will think about whether there is an easy way to fix this without hurting performance, in terms of having to resolve each shard before the read.

Meanwhile, I am curious to know why, in the new scheme (I say new because the algorithm seems to have changed since the previous version of RHV), RHV successfully fallocates the image through one mount, only to discard it all with a truncate to 0 and a reallocation (another fallocate) on another mount point. And the larger the image, the more time is spent in fallocation followed by truncation followed by reallocation. What exactly is the reason for vm creation from template being done this way? Is this expected behavior?

-Krutika
(In reply to Raghavendra G from comment #9)
> (In reply to Mugdha Soni from comment #0)
> > [...]
>
> Which logs did you find "Stale file handle"? Is it logs of VM or glusterfs
> client logs? Asking this question to identify failure of which syscall
> might've caused the VM to pause. From glusterfs client logs attached I could
> see statfs and readv responses fail with ESTALE errors. I assume one of
> these would've caused VM Pause, but just want to confirm the same.

The error was seen in the glusterfs client logs.
Restoring needinfo on Benny Zlotnick.
Eyal worked on this, moving needinfo
(In reply to Benny Zlotnik from comment #15)
> Eyal worked on this, moving needinfo

Hi,

Did you get a chance to check the queries?
Hi,

Can you please provide more information about the VM / template?

Gluster based?
Thin/Preallocated?
etc.
(In reply to Eyal Shenitzky from comment #18)
> Hi,
>
> Can you please provide more information about the VM / template?
>
> Gluster based?
> Thin/Preallocated?
> etc..

Yes, gluster-based.

I *think* it is preallocated, but I'll let Sas confirm.
Also, I think that the truncate itself that I talked about in comment #12 was internally called by the `qemu-img` command. So what I'm observing is a case of multiple hypervisors trying to execute qemu-img/preallocate on the same file a few seconds apart, with the second hypervisor effectively undoing and redoing everything that the first one did.
(In reply to Krutika Dhananjay from comment #20)
> Also, I think that the truncate itself that I talked about in comment #12
> was internally called by the `qemu-img` command.
> So what I'm observing is a case of multiple hypervisors trying to execute
> qemu-img/preallocate on the same file a few seconds apart, with the second
> hypervisor effectively undoing and redoing everything that the first one did.

Setting needinfo on Eyal over comments #12 and #20.
Please attach the engine log and the vdsm log from the hypervisor
(In reply to Krutika Dhananjay from comment #19)
> (In reply to Eyal Shenitzky from comment #18)
> > Hi,
> >
> > Can you please provide more information about the VM / template?
> >
> > Gluster based?
> > Thin/Preallocated?
> > etc..
>
> Yes, gluster-based.
>
> I *think* it is preallocated but I'll let Sas confirm.

Yes, it's Gluster-based and preallocated.
(In reply to Eyal Shenitzky from comment #22)
> Please attach the engine log and the vdsm log from the hypervisor

A sosreport from the hypervisor is available. Refer to comment #2.
(In reply to Krutika Dhananjay from comment #26)
> (In reply to Atin Mukherjee from comment #25)
> > Status please?
>
> Setting needinfo on the right person.
>
> Eyal, any news on this?

Can you please add the engine / vdsm version?

There was no change in the flow for creating raw-preallocated file-based volumes in vdsm for versions < 4.3.

In version 4.3 we create a raw-preallocated file-based volume in the following flow:

1. truncate the volume to its size
2. fallocate the volume to the requested size
3. change the volume permissions

The flow used in version 4.3 when copying a raw-preallocated file-based template volume for a new VM that is based on it:

1. truncate the volume to its size
2. fallocate the volume to an initial size (1)
3. change the volume permissions
4. fallocate the created volume again to its full size

This change was done in order to keep the file-based volume preallocated (before version 4.3 it was converted to sparse) and to prevent fallocating the volume twice.
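For reference, the 4.3 copy flow described above can be sketched against a local file with the corresponding syscalls. This is only an illustration of the sequence, not vdsm's code: sizes are shrunk, `os.posix_fallocate` requires Linux, and on RHHI-V the target would sit on the gluster mount rather than a temp directory.

```python
import os
import tempfile

SIZE = 8 * 1024 * 1024  # illustrative size; a real volume would be GBs
path = os.path.join(tempfile.mkdtemp(), "vol.raw")

fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o660)
os.ftruncate(fd, SIZE)           # 1. truncate the volume to its size
os.posix_fallocate(fd, 0, 1)     # 2. fallocate to an initial size (1)
os.fchmod(fd, 0o660)             # 3. change the volume permissions
os.posix_fallocate(fd, 0, SIZE)  # 4. fallocate again to the full size
os.close(fd)

print(os.stat(path).st_size == SIZE)  # True
```

Note that on a sharded gluster volume, step 4 is where all the shard blocks beyond the base file get created.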
Given that there's a change in the fallocate process, do we need any changes?
(In reply to Eyal Shenitzky from comment #27)
> (In reply to Krutika Dhananjay from comment #26)
> > [...]
>
> Can you please add the engine / vdsm version?
>
> There was no change in the flow for creating raw-preallocated file-based
> volumes in vdsm for versions < 4.3.
>
> In version 4.3 we create raw-preallocated file-based volume in the following
> flow:
>
> 1. truncate the volume to its size
> 2. fallocate the volume to the requested size
> 3. changing the volume permissions
>
> The flow that is used for version 4.3 when copying raw-preallocated
> file-based template volume for new VM that is based on it:
>
> 1. truncate the volume to its size
> 2. fallocate the volume to an initial size (1)
> 3. changing the volume permissions
> 4. fallocate the created volume again to it's full size.
>
> This change was done in order to keep the file-based volume as preallocated
> volume (before version 4.3 it was converted to sparse)
> and to prevent fallocating the volume twice.

What I see in the gluster trace logs is different from what is described above.

In particular, the initial fallocate is not done to size 1 (byte?). In fact, the file is fully fallocated to its real size of 40GB before being truncated to 0 and then fallocated again to full size.

The vdsm version used:
vdsm.x86_64 4.30.13-1.el7ev @rhv-4.3.3

Am I missing something here?

-Krutika
(In reply to Sahina Bose from comment #28)
> Given that there's a change in the fallocate process, do we need any changes?

I think if the first fallocate that is executed is of size < $features.shard-block-size, no changes are needed and things should work fine without any pauses. But I don't see that to be the case: it seems to be doing a fallocate to full size on both occasions.

-Krutika
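To make that size condition concrete, here is a small sketch of how many shard blocks beyond the base file an allocation of a given size forces shard to create, assuming the default 64 MiB shard-block-size seen in the logs (the exact threshold is a reading of the comment above, not a statement of shard's implementation):

```python
SHARD_BLOCK_SIZE = 64 * 1024 * 1024  # 67108864, as seen in the shard sizes in the logs

def extra_shards(size):
    # Number of .shard/<gfid>.N blocks (beyond block 0, the base file)
    # needed to cover `size` bytes.
    if size <= SHARD_BLOCK_SIZE:
        return 0
    return -(-size // SHARD_BLOCK_SIZE) - 1  # ceiling division, minus the base block

print(extra_shards(1))             # 0: a 1-byte fallocate creates no extra shards
print(extra_shards(40 * 1024**3))  # 639: a 40 GiB fallocate creates blocks .1 .. .639
```

This is why an initial fallocate of 1 byte would sidestep the stale-shard race entirely, while a full-size fallocate recreates hundreds of shard blocks under new gfids.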
(In reply to Krutika Dhananjay from comment #29)
> (In reply to Eyal Shenitzky from comment #27)
> > [...]
> > The flow that is used for version 4.3 when copying raw-preallocated
> > file-based template volume for new VM that is based on it:
> >
> > 1. truncate the volume to its size
> > 2. fallocate the volume to an initial size (1)
> > 3. changing the volume permissions
> > 4. fallocate the created volume again to it's full size.
> > [...]
>
> So what I see in the gluster trace logs is different from what is described
> above.
>
> In particular, the initial fallocate is not done to size 1(byte?). In fact
> the file is fully fallocated to its real size of 40GB before being truncated
> to 0 and then fallocated again to full size.
>
> The vdsm version used:
> vdsm.x86_64 4.30.13-1.el7ev @rhv-4.3.3
>
> Am I missing something here?
>
> -Krutika

Can you please explain what looks problematic to you (RHV / VDSM side)? What do you think is causing the problem?
(In reply to Eyal Shenitzky from comment #31) > [...] > Can you please explain what looks problematic to you (RHV / VDSM side)? > What do you think is causing the problem? Let me try to answer this.
Eyal, as per the flow you mentioned in comment 27, the file is initially truncated to 40G, and then fallocate is called with 1 byte. But Krutika thinks fallocate was called with the real size of 40GB and the file then truncated to 0. This is where the mismatch is seen. Can you confirm whether the workflow is in fact calling fallocate (full size), truncate(0) and then fallocate (full size)?
(In reply to Sahina Bose from comment #32) > [...] > This is where the mismatch is seen. Can you confirm whether the workflow is > in fact calling fallocate (full size), truncate(0) and then fallocate (full > size)? This I already understand; I am trying to understand how it affects you. The volume will be fully allocated twice only if deferred_preallocation is not supported by the host.
(In reply to Eyal Shenitzky from comment #33) > [...] > This I already understand; I am trying to understand how it affects you. This is a problem in RHHI because the shard translator in gluster assumes that all writes to a given file come from a single writer. That assumption held until this issue was hit. Even if the full-size fallocates were done from the same gluster mount point for a given VM, we would be good. But what I saw was that the first full-size fallocate was executed on host 3's gluster mountpoint whereas the second was executed from host 2's gluster mountpoint. > > The volume will be fully allocated twice only if deferred_preallocation is > not supported by the host. What's deferred_preallocation? Could you throw some light on that? -Krutika
I simulated the flow where an image is first truncated to actual size, then fallocated to 1B and then fallocated to full size. I can confirm that if these are the steps followed, then there will be no io errors (which in turn implies there will be no vm pause at boot time). -Krutika
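The simulated sequence above can be replayed with plain syscalls (a local-filesystem sketch only, for Linux: it exercises the same truncate/fallocate order as the vdsm flow, but will not reproduce the gluster shard behaviour, which also depends on which mount issues each call; the 1 MiB size is a stand-in for the 40G image):

```python
import os
import tempfile

def simulate_clone_flow(path, full_size):
    """Replay the volume-creation syscall order from comment 27:
    truncate to full size, fallocate 1 byte, then fallocate full size."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o660)
    try:
        os.ftruncate(fd, full_size)           # 1. truncate the volume to its size
        os.posix_fallocate(fd, 0, 1)          # 2. fallocate to an initial size (1)
        os.posix_fallocate(fd, 0, full_size)  # 4. fallocate again to full size
    finally:
        os.close(fd)

path = tempfile.mktemp()
simulate_clone_flow(path, 1 << 20)  # 1 MiB stand-in for the 40G image
final_size = os.path.getsize(path)
print(final_size)  # -> 1048576
os.unlink(path)
```

Run against a gluster fuse mount (with the two fallocates issued from different mounts) rather than a local path, this is essentially the reproduction/verification scenario being discussed.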
I could reproduce this issue with RHV 4.3.5 + RHEL 7.7 + RHGS 3.5.0 (glusterfs-6.0-5) with the following steps:
1. Create a RHEL 7.6 VM with a 40GB boot disk, preallocated (using falloc) - all of this is done with the VM creation workflow in RHV
2. Post OS installation, power off the VM and create a template of the VM
3. Create 10 VMs out of the template
4. Start these VMs altogether. One of the VMs paused
> > This is a problem in RHHI as shard translator in gluster assumes that the > writes are from a single writer. > And this was the case until this issue was hit, which breaks this assumption. > Even if full-size fallocates were done from the same gluster mount point for > a given vm, we are good. But what I saw > was that the first full-size fallocate was executed on host 3's gluster > mountpoint whereas the second was executed from > host 2's gluster mountpoint. OK, now I understand. Yes, this new flow breaks this assumption. The allocation operation can be done from different hosts: the first allocation can be done from host A and the second from host B. But the allocations are done one after the other, not concurrently. The ability to perform storage operations from hosts other than the SPM has existed since version 4.1. > > What's deferred_preallocation? Could you throw some light on that? The new deferred_preallocation capability is what I explained before. It is the ability to postpone the full allocation to a later stage in the process - in this case, to the point when the data is copied rather than when the disk is created. Hosts support this new capability since version 4.3. > > -Krutika
(In reply to Eyal Shenitzky from comment #37) > [...] > Yes, this new flow breaks this assumption. > The allocation operation can be done from different hosts. > > The first allocation can be done from host A and the second can be done from > host B. > But, the allocations are done one after the other, not concurrently. Sure, but even if the first fallocate was indeed of 1B size and the second fallocate of full size, with them being executed from different hosts, things will work. I simulated the same operations on gluster with a 1B fallocate later followed by a full-size fallocate, and things work. But in this case, what we're seeing is that both the first and second fallocate are being done to full size. In short, the problem is with two full-size fallocates being executed from two different mounts, even if at different points in time. Could you please check if that is the case in the vdsm code? And also maybe point to the code in github where these two fallocates are done? -Krutika
> Sure, but even if the first fallocate was indeed of 1B size and the second > fallocate of full size, with them being executed from different hosts, things > will work. > I simulated the same operations on gluster with a 1B fallocate later > followed by a full-size fallocate, and things work. > > But in this case, what we're seeing is that both the first and second > fallocate are being done to full size. > In short, the problem is with two full-size fallocates being executed > from two different mounts, even if at different points in time. > > Could you please check if that is the case in the vdsm code? And also maybe > point to the code in github where these two fallocates are done? In order to check this, I need the VDSM logs; according to comment #11 the bug appears in the logs around 2019-04-25 09:27:43, but the logs in the SOS report only go up to 2019-04-22. This is the relevant code - https://github.com/oVirt/vdsm/blob/01b77155a367e3c863f29df079457e088cacc6fc/lib/vdsm/storage/fileVolume.py#L429 > > -Krutika
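The decision being discussed in that code path can be paraphrased as a hypothetical sketch (this is NOT the actual fileVolume.py code, and `preallocation_size` is an invented name; see the github link in the comment above for the real implementation):

```python
# Hypothetical sketch of the choice vdsm faces when creating a
# preallocated file volume (invented helper, not the real vdsm API).
def preallocation_size(capacity, initial_size=None):
    # When the engine passes an initial size (deferred_preallocation),
    # only that much is fallocated up front; the full allocation happens
    # later, when the data is copied.  Without an initial size, the whole
    # capacity is fallocated immediately -- which, combined with the
    # second full-size fallocate during the copy, is the double
    # allocation seen in this bug.
    if initial_size is not None:
        return initial_size
    return capacity

print(preallocation_size(40 << 30))     # no initial size -> full 40G up front
print(preallocation_size(40 << 30, 1))  # deferred -> 1 byte up front
```

Under this reading, the question is whether the engine actually passed an initial size of 1 for the cloned volume, or omitted it.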
(In reply to Eyal Shenitzky from comment #39) > [...] > In order to check this, I need the VDSM logs; according to comment #11 > the bug appears in the logs around 2019-04-25 09:27:43, but the logs in the > SOS report only go up to 2019-04-22. > > This is the relevant code - > https://github.com/oVirt/vdsm/blob/01b77155a367e3c863f29df079457e088cacc6fc/lib/vdsm/storage/fileVolume.py#L429 The sosreports are for older runs. Let me try to deduce the timestamp of the VM pause from the old sosreports and get back. -Krutika
(In reply to Krutika Dhananjay from comment #40) > [...] > The sosreports are for older runs. Let me try to deduce the timestamp of the > VM pause from the old sosreports and get back.
Here's one that I found in the sosreports from comment #2:
[2019-04-16 06:42:51.201964] W [fuse-bridge.c:2756:fuse_readv_cbk] 0-glusterfs-fuse: 846853: READ => -1 gfid=1f9cddce-d0cc-46f8-ab4b-c2918ffaf01b fd=0x7ff690008128 (Stale file handle)
Some more:
[2019-04-17 06:27:15.835814] W [fuse-bridge.c:2756:fuse_readv_cbk] 0-glusterfs-fuse: 1372629: READ => -1 gfid=1f9cddce-d0cc-46f8-ab4b-c2918ffaf01b fd=0x7ff690008128 (Stale file handle)
[2019-04-17 06:27:21.898401] W [fuse-bridge.c:2756:fuse_readv_cbk] 0-glusterfs-fuse: 1372645: READ => -1 gfid=1f9cddce-d0cc-46f8-ab4b-c2918ffaf01b fd=0x7ff690008128 (Stale file handle)
[2019-04-17 06:27:22.711071] W [fuse-bridge.c:2756:fuse_readv_cbk] 0-glusterfs-fuse: 1372647: READ => -1 gfid=1f9cddce-d0cc-46f8-ab4b-c2918ffaf01b fd=0x7ff690008128 (Stale file handle)
[2019-04-17 06:27:22.877170] W [fuse-bridge.c:2756:fuse_readv_cbk] 0-glusterfs-fuse: 1372649: READ => -1 gfid=1f9cddce-d0cc-46f8-ab4b-c2918ffaf01b fd=0x7ff690008128 (Stale file handle)
Please let me know if this is sufficient. Also, all timestamps are in UTC. -Krutika
From the VDSM logs, I see only a call for creating a new volume without an initial size, which means that the volume will be fully allocated twice (once at creation and again when copying the data). Can you please verify that the host doesn't support deferred_preallocation by running the following command on the hosts in the relevant environment and sharing the output: vdsm-client Host getCapabilities | grep deferred_preallocation
(In reply to Eyal Shenitzky from comment #42) > From the VDSM logs, I see only call for creating new volume without initial > size, which means that the volume will be fully allocated twice (in the > creation and when copying the data). > > Can you please verify that the host doesn't support the > deferred_preallocation by running the following command on the hosts in the > relevant environment and share the output: > > vdsm-client Host getCapabilities | grep deferred_preallocation Asking Satheesaran and Mugdha since they have the setup. Sas, Mugdha, Could you share the output of the command above? -Krutika
I do see these in the vdsm logs. A search for "deferred_preallocation" shows the following match - "u'deferred_preallocation': True,". Let me know if this is the same as the output of the vdsm command in comment #42 you're looking for. 2019-04-22 16:01:10,248+0530 INFO (jsonrpc/6) [api.host] FINISH getCapabilities return={'status': {'message': 'Done', 'code': 0}, 'info': {u'HBAInventory': ... u'kdumpStatus': 0, u'deferred_preallocation': True, ... }} from=::1,57490 (api:54) -Krutika
Yes, it is the same. So I investigated it and it seems we had a bug in the RHV engine: it didn't set the initial size of the volume to 1 byte, which caused the volume to be fully allocated twice. Will fixing the bug solve this issue?
(In reply to Eyal Shenitzky from comment #45) > Yes, it is the same. > > So I investigated it and it seems we had a bug in the RHV engine: it didn't > set the initial size of the volume to 1 byte, which caused the volume to be > fully allocated twice. > > Will fixing the bug solve this issue? Yes, fixing the bug should solve this issue, based on a test I did on gluster with an initial fallocate of 1B followed by another to full size, both executed from different mounts, and then a read on the file from the first mount. But we can also test your patch if you can give us a build containing the fix. -Krutika
Eyal, assigning to you based on Comment 45
Please provide a clear scenario so I can see if I can ack.
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Found non-acked flags: '{'rhevm-4.3.z': '?'}', ] For more info please contact: rhv-devops
Steps to reproduce:
1. Create a VM with a raw-preallocated file-based disk (Gluster in this case)
2. Create a template from this VM
3. Clone a new VM from the template created in step 2

Before the fix, the imageInitialSizeInBytes sent to the CreateVolumeVDSCommand was 'null':

2019-07-08 08:08:10,199+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (default task-4) [c28bb1f4-624a-4bd7-ad10-5b792ed8c03b] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='f7c3d0eb-7e72-4eb3-8f33-321c2d8cebb7', ignoreFailoverLimit='false', storageDomainId='8ec58e25-d3ae-4079-8925-880873dd15ad', imageGroupId='773e23e0-cc1a-48b6-8947-4bbb0a36bfbb', imageSizeInBytes='3221225472', volumeFormat='RAW', newImageId='40d5f984-2801-4ad0-8009-8c1623ca47ea', imageType='Preallocated', newImageDescription='null', imageInitialSizeInBytes='null', imageId='00000000-0000-0000-0000-000000000000', sourceImageGroupId='00000000-0000-0000-0000-000000000000'}), log id: 6a8b4221

After the fix, the imageInitialSizeInBytes sent to the CreateVolumeVDSCommand is '1':

2019-07-08 08:08:10,199+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (default task-4) [c28bb1f4-624a-4bd7-ad10-5b792ed8c03b] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='f7c3d0eb-7e72-4eb3-8f33-321c2d8cebb7', ignoreFailoverLimit='false', storageDomainId='8ec58e25-d3ae-4079-8925-880873dd15ad', imageGroupId='773e23e0-cc1a-48b6-8947-4bbb0a36bfbb', imageSizeInBytes='3221225472', volumeFormat='RAW', newImageId='40d5f984-2801-4ad0-8009-8c1623ca47ea', imageType='Preallocated', newImageDescription='null', imageInitialSizeInBytes='1', imageId='00000000-0000-0000-0000-000000000000', sourceImageGroupId='00000000-0000-0000-0000-000000000000'}), log id: 6a8b4221
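The before/after difference can be checked straight from engine.log by pulling the imageInitialSizeInBytes field out of the CreateVolumeVDSCommand parameter dump. A small hypothetical helper (the regex simply matches the quoted key='value' format shown in the log lines above; the abbreviated sample lines are illustrative):

```python
import re

def image_initial_size(log_line):
    """Extract the value of imageInitialSizeInBytes='...' from a
    CreateVolumeVDSCommand parameter dump; return None if the
    field is absent."""
    m = re.search(r"imageInitialSizeInBytes='([^']*)'", log_line)
    return m.group(1) if m else None

before = "... imageType='Preallocated', imageInitialSizeInBytes='null', ..."
after = "... imageType='Preallocated', imageInitialSizeInBytes='1', ..."
print(image_initial_size(before))  # null
print(image_initial_size(after))   # 1
```

With the fix applied, every clone-from-template of a raw-preallocated disk should log '1' rather than 'null' for this field.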
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed: [Found non-acked flags: '{}', ] For more info please contact: rhv-devops
Verified on: engine-4.4.0-0.13.master.el7, with the following steps:
1. Create a VM with a raw-preallocated file-based disk (Gluster)
2. Create a template from this VM
3. Clone a new VM from the template created in step 2
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed: [Found non-acked flags: '{}', ] For more info please contact: rhv-devops
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3247