Created attachment 1287243 [details]
tar that collects all the logs

Description of problem:
Impossible to delete a snapshot.

Version-Release number of selected component (if applicable):
vdsm 4.19.15
ovirt-engine 4.1.2.2
cluster compatibility 4.1
glusterfs 3.8.12

How reproducible:
Create a VM with 4 snapshots.

Steps to Reproduce:
1. Delete one snapshot.
2. Delete one more.
3. Wait until the engine realizes that the merge has failed.

Actual results:
The merge fails.

Expected results:
The merge completes without error.

Additional info:
In vdsm.log I found 2 strange facts:

(1) The line below should report the list of images in the chain, but it does not:

2017-06-12 14:36:24,070+0200 INFO (merge/3973ee62) [storage.Image] sdUUID=73ca2906-e59d-4c13-97f4-f636cad9fb0e imgUUID=09e08d20-6317-4418-9b69-9e5f396b64f9 chain=[<storage.glusterVolume.GlusterVolume object at 0x1bd37d0>, <storage.glusterVolume.GlusterVolume object at 0x24b3810>] (image:285)

(2) The message at the end of the traceback refers to an image that was deleted by the first snapshot removal:

VolumeDoesNotExist: Volume does not exist: ('94f463fa-f431-4a6a-b1f5-11b234161de3',)

The chain status is saved in chain-before-delete1.txt, after-delete-1.txt and after-delete2.txt inside log.tar.

Could it be something related to vdsm doing a blockcommit while some module expects a blockpull?

Contents of log.tar:
- engine.log
- vdsm.log
- sanlock.log
- libvirt.log
- glusterMountPoint.log
- chain-before-delete1.txt
- after-delete-1.txt
- after-delete2.txt
Useful additional info:

vmName=mergeWftest4
vmId=3973ee62-98ee-4732-8b63-c63adffd855d
disk_group_id=09e08d20-6317-4418-9b69-9e5f396b64f9
mountpointpath: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9

snapshot_disk_id:
current = c54dd527-06ac-4790-bcb8-8c3881b05442
s4 = dd3485a6-6843-4664-bbbc-b6db9f68eb58
s3 = 94f463fa-f431-4a6a-b1f5-11b234161de3
s2 = bd74e5dc-95fd-40e3-aeb0-5fef6b67d172
s1 = 81b15267-721d-45ec-b9e2-061663febe5a

Delete s2:
correlationId: c303cb38-bd04-4a44-a8c2-15f8608d4047
Jun 12, 2017 2:23:42 PM Snapshot 's2' deletion for VM 'mergeWftest4' was initiated by admin@internal-authz.
Jun 12, 2017 2:25:14 PM Snapshot 's2' deletion for VM 'mergeWftest4' has been completed.

New situation:
snapshot_disk_id:
current = c54dd527-06ac-4790-bcb8-8c3881b05442
s4 = dd3485a6-6843-4664-bbbc-b6db9f68eb58
s3 = bd74e5dc-95fd-40e3-aeb0-5fef6b67d172
s1 = 81b15267-721d-45ec-b9e2-061663febe5a

Start deleting s2:
correlationID: a9a0b2f0-ac6b-47f7-affa-590e119fd212
Jun 12, 2017 2:35:29 PM Snapshot 's1' deletion for VM 'mergeWftest4' was initiated by admin@internal-authz.
In the log I see an error indicating that the storage wasn't available. Can you please try deleting the same snapshot again and update us with the results?
Created attachment 1288065 [details]
logs after second attempt to delete the snapshot

As requested, the logs of the new delete attempt.
I am looking at the chains in the attached files.

Original chain (file: chain-before-delete1.txt):

c54dd527-06ac-4790-bcb8-8c3881b05442 ->
dd3485a6-6843-4664-bbbc-b6db9f68eb58 ->
94f463fa-f431-4a6a-b1f5-11b234161de3 ->
bd74e5dc-95fd-40e3-aeb0-5fef6b67d172 ->
81b15267-721d-45ec-b9e2-061663febe5a

After deleting s3 (94f463fa-f431-4a6a-b1f5-11b234161de3), I see the expected chain (file: after-delete-1.txt):

c54dd527-06ac-4790-bcb8-8c3881b05442 ->
dd3485a6-6843-4664-bbbc-b6db9f68eb58 ->
bd74e5dc-95fd-40e3-aeb0-5fef6b67d172 ->
81b15267-721d-45ec-b9e2-061663febe5a

However, after deleting s2 (bd74e5dc-95fd-40e3-aeb0-5fef6b67d172), I see two chains (file: after-delete2.txt):

chain-1:
c54dd527-06ac-4790-bcb8-8c3881b05442 ->
dd3485a6-6843-4664-bbbc-b6db9f68eb58 ->
81b15267-721d-45ec-b9e2-061663febe5a

chain-2:
bd74e5dc-95fd-40e3-aeb0-5fef6b67d172 ->
81b15267-721d-45ec-b9e2-061663febe5a

chain-1 is what I expected to see. Trying to understand how we got chain-2.

Can you please send the current chain?

In addition, could you please send the volumes info by running following command on each volume (also on 94f463fa-f431-4a6a-b1f5-11b234161de3):

vdsm-client Volume getInfo storagepoolID=<spUUID> storagedomainID=<sdUUID> imageID=<imgUUID> volumeID=<volUUID>
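For illustration only, using the pool, domain and image UUIDs already given in this report (substitute volumeID with each volume in the chain in turn):

vdsm-client Volume getInfo storagepoolID=00000001-0001-0001-0001-0000000001a9 storagedomainID=73ca2906-e59d-4c13-97f4-f636cad9fb0e imageID=09e08d20-6317-4418-9b69-9e5f396b64f9 volumeID=81b15267-721d-45ec-b9e2-061663febe5a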
One more request please, could you please set log level for storage and virt to DEBUG?

On the host, edit /etc/vdsm/logger.conf and change log level of logger_storage and logger_virt to DEBUG.

Thanks.
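For reference, a sketch of what the relevant sections of /etc/vdsm/logger.conf typically look like after the change; only the level= lines need to change, and the handlers/qualname values should stay whatever your file already has (the ones below are assumptions):

[logger_storage]
level=DEBUG
handlers=logfile
qualname=storage
propagate=0

[logger_virt]
level=DEBUG
handlers=logfile
qualname=virt
propagate=0

Restarting vdsm afterwards, as is done later in this thread, makes sure the new level takes effect.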
(In reply to Ala Hino from comment #5)
> One more request please, could you please set log level for storage and virt
> to DEBUG?
>
> On the host, edit /etc/vdsm/logger.conf and change log level of
> logger_storage and logger_virt to DEBUG.
>
> Thanks.

And of course try deleting the snapshot again and upload the logs.
(In reply to Ala Hino from comment #4)
> I am looking at the chains in the attached files.
>
> Original chain (file: chain-before-delete1.txt):
>
> c54dd527-06ac-4790-bcb8-8c3881b05442 ->
> dd3485a6-6843-4664-bbbc-b6db9f68eb58 ->
> 94f463fa-f431-4a6a-b1f5-11b234161de3 ->
> bd74e5dc-95fd-40e3-aeb0-5fef6b67d172 ->
> 81b15267-721d-45ec-b9e2-061663febe5a
>
> After deleting s3 (94f463fa-f431-4a6a-b1f5-11b234161de3), I see the expected
> chain (file: after-delete-1.txt):
>
> c54dd527-06ac-4790-bcb8-8c3881b05442 ->
> dd3485a6-6843-4664-bbbc-b6db9f68eb58 ->
> bd74e5dc-95fd-40e3-aeb0-5fef6b67d172 ->
> 81b15267-721d-45ec-b9e2-061663febe5a
>
> However, after deleting s2 (bd74e5dc-95fd-40e3-aeb0-5fef6b67d172), I see two
> chains (file: after-delete2.txt):
>
> chain-1:
> c54dd527-06ac-4790-bcb8-8c3881b05442 ->
> dd3485a6-6843-4664-bbbc-b6db9f68eb58 ->
> 81b15267-721d-45ec-b9e2-061663febe5a
>
> chain-2:
> bd74e5dc-95fd-40e3-aeb0-5fef6b67d172 ->
> 81b15267-721d-45ec-b9e2-061663febe5a
>
> chain-1 is what I expected to see. Trying to understand how we got chain-2.
>
> Can you please send the current chain?

image: c54dd527-06ac-4790-bcb8-8c3881b05442
file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 4.9G
cluster_size: 65536
backing file: dd3485a6-6843-4664-bbbc-b6db9f68eb58
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 163M
cluster_size: 65536
backing file: 81b15267-721d-45ec-b9e2-061663febe5a
backing file format: raw
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 1.3G
cluster_size: 65536
backing file: 81b15267-721d-45ec-b9e2-061663febe5a
backing file format: raw
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: 81b15267-721d-45ec-b9e2-061663febe5a
file format: raw
virtual size: 15G (16106127360 bytes)
disk size: 6.0G

> In addition, could you please send the volumes info by running following
> command on each volume (also on 94f463fa-f431-4a6a-b1f5-11b234161de3):

The volume 94f463fa-f431-4a6a-b1f5-11b234161de3 does not exist since the deletion of s3.
ls on the folder /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9:

-rw-rw---- 1 vdsm kvm  15G Jun 12 14:36 81b15267-721d-45ec-b9e2-061663febe5a
-rw-rw---- 1 vdsm kvm 1.0M Jun  9 11:53 81b15267-721d-45ec-b9e2-061663febe5a.lease
-rw-r--r-- 1 vdsm kvm  342 Jun 19 14:08 81b15267-721d-45ec-b9e2-061663febe5a.meta
-rw-rw---- 1 vdsm kvm 1.3G Jun 12 14:24 bd74e5dc-95fd-40e3-aeb0-5fef6b67d172
-rw-rw---- 1 vdsm kvm 1.0M Jun  9 13:26 bd74e5dc-95fd-40e3-aeb0-5fef6b67d172.lease
-rw-r--r-- 1 vdsm kvm  265 Jun 12 14:24 bd74e5dc-95fd-40e3-aeb0-5fef6b67d172.meta
-rw-rw---- 1 vdsm kvm 4.9G Jun 19 14:08 c54dd527-06ac-4790-bcb8-8c3881b05442
-rw-rw---- 1 vdsm kvm 1.0M Jun  9 14:31 c54dd527-06ac-4790-bcb8-8c3881b05442.lease
-rw-r--r-- 1 vdsm kvm  265 Jun  9 14:31 c54dd527-06ac-4790-bcb8-8c3881b05442.meta
-rw-rw---- 1 vdsm kvm 163M Jun 12 14:36 dd3485a6-6843-4664-bbbc-b6db9f68eb58
-rw-rw---- 1 vdsm kvm 1.0M Jun  9 14:25 dd3485a6-6843-4664-bbbc-b6db9f68eb58.lease
-rw-r--r-- 1 vdsm kvm  269 Jun  9 14:31 dd3485a6-6843-4664-bbbc-b6db9f68eb58.meta

> vdsm-client Volume getInfo storagepoolID=<spUUID> storagedomainID=<sdUUID>
> imageID=<imgUUID> volumeID=<volUUID>

{
    "status": "OK",
    "lease": {"owners": [], "version": null},
    "domain": "73ca2906-e59d-4c13-97f4-f636cad9fb0e",
    "capacity": "16106127360",
    "voltype": "INTERNAL",
    "description": "{\"DiskAlias\":\"mergeWftest4_Disk1\",\"DiskDescription\":\"mergeWftest4_Disk1\"}",
    "parent": "00000000-0000-0000-0000-000000000000",
    "format": "RAW",
    "generation": 0,
    "image": "09e08d20-6317-4418-9b69-9e5f396b64f9",
    "uuid": "81b15267-721d-45ec-b9e2-061663febe5a",
    "disktype": "2",
    "legality": "LEGAL",
    "mtime": "0",
    "apparentsize": "16106127360",
    "truesize": "6432301056",
    "type": "SPARSE",
    "children": [],
    "pool": "",
    "ctime": "1497001994"
}

{
    "status": "OK",
    "lease": {"owners": [], "version": null},
    "domain": "73ca2906-e59d-4c13-97f4-f636cad9fb0e",
    "capacity": "16106127360",
    "voltype": "LEAF",
    "description": "",
    "parent": "81b15267-721d-45ec-b9e2-061663febe5a",
    "format": "COW",
    "generation": 0,
    "image": "09e08d20-6317-4418-9b69-9e5f396b64f9",
    "uuid": "bd74e5dc-95fd-40e3-aeb0-5fef6b67d172",
    "disktype": "2",
    "legality": "LEGAL",
    "mtime": "0",
    "apparentsize": "1356070912",
    "truesize": "1365323776",
    "type": "SPARSE",
    "children": [],
    "pool": "",
    "ctime": "1497007610"
}

{
    "status": "OK",
    "lease": {"owners": [], "version": null},
    "domain": "73ca2906-e59d-4c13-97f4-f636cad9fb0e",
    "capacity": "16106127360",
    "voltype": "LEAF",
    "description": "",
    "parent": "dd3485a6-6843-4664-bbbc-b6db9f68eb58",
    "format": "COW",
    "generation": 0,
    "image": "09e08d20-6317-4418-9b69-9e5f396b64f9",
    "uuid": "c54dd527-06ac-4790-bcb8-8c3881b05442",
    "disktype": "2",
    "legality": "LEGAL",
    "mtime": "0",
    "apparentsize": "5252448256",
    "truesize": "5286686720",
    "type": "SPARSE",
    "children": [],
    "pool": "",
    "ctime": "1497011478"
}

{
    "status": "OK",
    "lease": {"owners": [], "version": null},
    "domain": "73ca2906-e59d-4c13-97f4-f636cad9fb0e",
    "capacity": "16106127360",
    "voltype": "INTERNAL",
    "description": "",
    "parent": "94f463fa-f431-4a6a-b1f5-11b234161de3",
    "format": "COW",
    "generation": 0,
    "image": "09e08d20-6317-4418-9b69-9e5f396b64f9",
    "uuid": "dd3485a6-6843-4664-bbbc-b6db9f68eb58",
    "disktype": "2",
    "legality": "LEGAL",
    "mtime": "0",
    "apparentsize": "170786816",
    "truesize": "170655744",
    "type": "SPARSE",
    "children": [],
    "pool": "",
    "ctime": "1497011131"
}
Yes, it is expected that 94f463fa-f431-4a6a-b1f5-11b234161de3 no longer exists. The problem is that the Vdsm metadata isn't updated: see the volume info of dd3485a6-6843-4664-bbbc-b6db9f68eb58, whose parent is still 94f463fa-f431-4a6a-b1f5-11b234161de3 even though that volume doesn't exist. This, by the way, explains the failure you are getting each time you try to delete the snapshot:

VolumeDoesNotExist: Volume does not exist: ('94f463fa-f431-4a6a-b1f5-11b234161de3',)

Somehow, and this is what I am trying to figure out, the Vdsm metadata isn't consistent with the storage: after the first delete, the parent of dd3485a6-6843-4664-bbbc-b6db9f68eb58 should be bd74e5dc-95fd-40e3-aeb0-5fef6b67d172, not 94f463fa-f431-4a6a-b1f5-11b234161de3.

Will you be able to try deleting the snapshot again while DEBUG level is turned on for the storage and virt components?
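For reference, a minimal sketch of how that inconsistency can be spotted from the host, by comparing the parent recorded in each Vdsm .meta file with the backing file reported by qemu-img. The script is illustrative only (not part of Vdsm), assumes the image directory path from this report, and should only be run while the volumes are not being actively written:

import glob
import subprocess

# Image directory taken from this report; adjust for other disks.
IMG_DIR = ("/rhev/data-center/00000001-0001-0001-0001-0000000001a9/"
           "73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/"
           "09e08d20-6317-4418-9b69-9e5f396b64f9")

for meta in sorted(glob.glob(IMG_DIR + "/*.meta")):
    vol = meta[:-len(".meta")]
    # Parent according to Vdsm metadata (the PUUID line of the .meta file).
    with open(meta) as f:
        puuid = next((line.split("=", 1)[1].strip()
                      for line in f if line.startswith("PUUID=")), None)
    # Parent according to qemu: the backing file of the qcow2 volume, if any.
    # The raw value may also include the "(actual path: ...)" suffix.
    out = subprocess.check_output(["qemu-img", "info", vol]).decode()
    backing = next((line.split(":", 1)[1].strip()
                    for line in out.splitlines()
                    if line.startswith("backing file:")), None)
    print("%s\n  vdsm parent : %s\n  qemu backing: %s"
          % (vol.rsplit("/", 1)[-1], puuid, backing))

In a consistent chain the two values should name the same volume (or both be empty/the null UUID for the base volume).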
Created attachment 1289106 [details]
change logger to DEBUG

Changed the logger to DEBUG, restarted vdsm, and deleted s1 again.

(In reply to Ala Hino from comment #6)
> (In reply to Ala Hino from comment #5)
> > One more request please, could you please set log level for storage and virt
> > to DEBUG?
> >
> > On the host, edit /etc/vdsm/logger.conf and change log level of
> > logger_storage and logger_virt to DEBUG.
> >
> > Thanks.
>
> And of course try deleting the snapshot again and upload the logs.
Created attachment 1289653 [details]
new test from the beginning with log level set to DEBUG

To help you with the debugging, I made a new test from scratch with a new VM and the log level set to DEBUG. In case you need info on the gluster configuration and status, please let me know.

vmName=mergeWftest5
vmId=80d20eee-4a95-40f4-bdc6-c9b3581ba3dd
disk_group_id=52b5d0ba-a458-4b7f-b18c-82f990af52aa
mountpointpath: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/52b5d0ba-a458-4b7f-b18c-82f990af52aa

snapshot_disk_id:
current = 6c9d1f32-fa4f-4d77-97ca-ef41604569c4
s4 = 423847b4-2a6f-428d-8e95-73cd5f0177dc
s3 = e0d4ba92-7c18-42c8-9336-2d7a63470178
s2 = fad94d5e-7db2-458e-94a3-921c3db937c9
s1 = b004c421-47c4-425b-bb0f-d21bab267c6b

Check the chain:
qemu-img info: file qemu-info-chain1.txt
vdsm-client Volume getInfo: file volumeInfo1.txt

Now start deleting s1:
correlationId: ee29a99a-8a5f-4448-858d-dbcced68388c
Jun 20, 2017 12:08:56 PM Snapshot 's1' deletion for VM 'mergeWftest5' was initiated by admin@internal-authz.
Jun 20, 2017 12:11:30 PM Snapshot 's1' deletion for VM 'mergeWftest5' has been completed.

New situation:
snapshot_disk_id:
current = 6c9d1f32-fa4f-4d77-97ca-ef41604569c4
s4 = 423847b4-2a6f-428d-8e95-73cd5f0177dc
s3 = e0d4ba92-7c18-42c8-9336-2d7a63470178
s2 = b004c421-47c4-425b-bb0f-d21bab267c6b

Check the chain:
qemu-img info: file qemu-info-chain2.txt
vdsm-client Volume getInfo: file volumeInfo2.txt

Update ovs_storage.

Now start deleting s2:
correlationID: 5f1e2cde-21d4-4285-9c0a-51a5f93b9ab0
Jun 20, 2017 12:19:46 PM Snapshot 's2' deletion for VM 'mergeWftest5' was initiated by admin@internal-authz.

Check the chain:
qemu-img info: file qemu-info-chain3.txt
vdsm-client Volume getInfo: file volumeInfo3.txt
Much appreciated. Thanks. Please do upload gluster configuration.
Hope this are all the info that you need, but if you need more let me know ill give it to you asap. Thanks for the help. Gluster info: ################################# ################################# #gluster --version glusterfs 3.8.12 built on May 11 2017 18:46:22 ################################# ################################# #gluster volume status vm-images-repo-demo Status of volume: vm-images-repo-demo Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick compute-0-0:/data/glusterfs/brick2/vm -images-repo-demo 49162 0 Y 2139 Brick compute-0-1:/data/glusterfs/brick2/vm -images-repo-demo 49158 0 Y 2462 Brick compute-0-2:/data/glusterfs/brick2/vm -images-repo-demo 49158 0 Y 2896 Brick compute-0-0:/data/glusterfs/brick3/vm -images-repo-demo 49163 0 Y 2156 Brick compute-0-1:/data/glusterfs/brick3/vm -images-repo-demo 49159 0 Y 2457 Brick compute-0-2:/data/glusterfs/brick3/vm -images-repo-demo 49159 0 Y 2902 Brick compute-0-0:/data/glusterfs/brick4/vm -images-repo-demo 49165 0 Y 2254 Brick compute-0-3:/data/glusterfs/brick2/vm -images-repo-demo 49152 0 Y 3868 Brick compute-0-4:/data/glusterfs/brick2/vm -images-repo-demo 49152 0 Y 2706 Brick compute-0-1:/data/glusterfs/brick4/vm -images-repo-demo 49161 0 Y 2446 Brick compute-0-3:/data/glusterfs/brick3/vm -images-repo-demo 49153 0 Y 3862 Brick compute-0-4:/data/glusterfs/brick3/vm -images-repo-demo 49153 0 Y 2687 Brick compute-0-2:/data/glusterfs/brick4/vm -images-repo-demo 49161 0 Y 2910 Brick compute-0-3:/data/glusterfs/brick4/vm -images-repo-demo 49154 0 Y 3855 Brick compute-0-4:/data/glusterfs/brick4/vm -images-repo-demo 49154 0 Y 2699 Self-heal Daemon on localhost N/A N/A Y 13984 Self-heal Daemon on compute-0-2 N/A N/A Y 8653 Self-heal Daemon on compute-0-0 N/A N/A Y 11348 Self-heal Daemon on compute-0-3 N/A N/A Y 6701 Self-heal Daemon on compute-0-4 N/A N/A Y 12306 Task Status of Volume vm-images-repo-demo ------------------------------------------------------------------------------ Task : Rebalance ID : ac0c2ff9-927e-40d7-bc68-f8fa54de6049 Status : completed ############################################################# ############################################################# ##gluster peer status Number of Peers: 4 Hostname: compute-0-3 Uuid: ef2132f5-f4c8-4f6d-ab06-62a3e025372a State: Peer in Cluster (Connected) Other names: 192.168.3.104 Hostname: compute-0-2 Uuid: fb11cda4-e716-4e30-85c5-9cbf73466b01 State: Peer in Cluster (Connected) Other names: 192.168.3.103 Hostname: compute-0-4 Uuid: fc02d6bb-96b3-4d63-89d1-a277a5ef26db State: Peer in Cluster (Connected) Other names: 192.168.3.105 Hostname: compute-0-0 Uuid: d286dbdf-f6e9-4c9a-af92-82d71a5f2d51 State: Peer in Cluster (Connected) Other names: 192.168.3.101 ################################ ############################### #gluster volume get vm-images-repo-demo all Option Value ------ ----- cluster.lookup-unhashed on cluster.lookup-optimize off cluster.min-free-disk 10% cluster.min-free-inodes 5% cluster.rebalance-stats off cluster.subvols-per-directory (null) cluster.readdir-optimize off cluster.rsync-hash-regex (null) cluster.extra-hash-regex (null) cluster.dht-xattr-name trusted.glusterfs.dht cluster.randomize-hash-range-by-gfid off cluster.rebal-throttle normal cluster.lock-migration off cluster.local-volume-name (null) cluster.weighted-rebalance on cluster.switch-pattern (null) cluster.entry-change-log on cluster.read-subvolume (null) cluster.read-subvolume-index -1 
cluster.read-hash-mode 1 cluster.background-self-heal-count 8 cluster.metadata-self-heal on cluster.data-self-heal on cluster.entry-self-heal on cluster.self-heal-daemon on cluster.heal-timeout 600 cluster.self-heal-window-size 1 cluster.data-change-log on cluster.metadata-change-log on cluster.data-self-heal-algorithm full cluster.eager-lock enable disperse.eager-lock on cluster.quorum-type auto cluster.quorum-count (null) cluster.choose-local true cluster.self-heal-readdir-size 1KB cluster.post-op-delay-secs 1 cluster.ensure-durability on cluster.consistent-metadata no cluster.heal-wait-queue-length 128 cluster.favorite-child-policy none cluster.stripe-block-size 128KB cluster.stripe-coalesce true diagnostics.latency-measurement off diagnostics.dump-fd-stats off diagnostics.count-fop-hits off diagnostics.brick-log-level INFO diagnostics.client-log-level INFO diagnostics.brick-sys-log-level CRITICAL diagnostics.client-sys-log-level CRITICAL diagnostics.brick-logger (null) diagnostics.client-logger (null) diagnostics.brick-log-format (null) diagnostics.client-log-format (null) diagnostics.brick-log-buf-size 5 diagnostics.client-log-buf-size 5 diagnostics.brick-log-flush-timeout 120 diagnostics.client-log-flush-timeout 120 diagnostics.stats-dump-interval 0 diagnostics.fop-sample-interval 0 diagnostics.fop-sample-buf-size 65535 diagnostics.stats-dnscache-ttl-sec 86400 performance.cache-max-file-size 0 performance.cache-min-file-size 0 performance.cache-refresh-timeout 1 performance.cache-priority performance.cache-size 32MB performance.io-thread-count 16 performance.high-prio-threads 16 performance.normal-prio-threads 16 performance.low-prio-threads 32 performance.least-prio-threads 1 performance.enable-least-priority on performance.least-rate-limit 0 performance.cache-size 128MB performance.flush-behind on performance.nfs.flush-behind on performance.write-behind-window-size 1MB performance.resync-failed-syncs-after-fsyncoff performance.nfs.write-behind-window-size1MB performance.strict-o-direct off performance.nfs.strict-o-direct off performance.strict-write-ordering off performance.nfs.strict-write-ordering off performance.lazy-open yes performance.read-after-open no performance.read-ahead-page-count 4 performance.md-cache-timeout 1 performance.cache-swift-metadata true features.encryption off encryption.master-key (null) encryption.data-key-size 256 encryption.block-size 4096 network.frame-timeout 1800 network.ping-timeout 10 network.tcp-window-size (null) features.lock-heal off features.grace-timeout 10 network.remote-dio enable client.event-threads 2 network.ping-timeout 10 network.tcp-window-size (null) network.inode-lru-limit 16384 auth.allow * auth.reject (null) transport.keepalive (null) server.allow-insecure on server.root-squash off server.anonuid 65534 server.anongid 65534 server.statedump-path /var/run/gluster server.outstanding-rpc-limit 64 features.lock-heal off features.grace-timeout 10 server.ssl (null) auth.ssl-allow * server.manage-gids off server.dynamic-auth on client.send-gids on server.gid-timeout 300 server.own-thread (null) server.event-threads 2 ssl.own-cert (null) ssl.private-key (null) ssl.ca-list (null) ssl.crl-path (null) ssl.certificate-depth (null) ssl.cipher-list (null) ssl.dh-param (null) ssl.ec-curve (null) performance.write-behind on performance.read-ahead off performance.readdir-ahead on performance.io-cache off performance.quick-read off performance.open-behind on performance.stat-prefetch off performance.client-io-threads off 
performance.nfs.write-behind on performance.nfs.read-ahead off performance.nfs.io-cache off performance.nfs.quick-read off performance.nfs.stat-prefetch off performance.nfs.io-threads off performance.force-readdirp true features.uss off features.snapshot-directory .snaps features.show-snapshot-directory off network.compression off network.compression.window-size -15 network.compression.mem-level 8 network.compression.min-size 0 network.compression.compression-level -1 network.compression.debug false features.limit-usage (null) features.quota-timeout 0 features.default-soft-limit 80% features.soft-timeout 60 features.hard-timeout 5 features.alert-time 86400 features.quota-deem-statfs off geo-replication.indexing off geo-replication.indexing off geo-replication.ignore-pid-check off geo-replication.ignore-pid-check off features.quota off features.inode-quota off features.bitrot disable debug.trace off debug.log-history no debug.log-file no debug.exclude-ops (null) debug.include-ops (null) debug.error-gen off debug.error-failure (null) debug.error-number (null) debug.random-failure off debug.error-fops (null) nfs.enable-ino32 no nfs.mem-factor 15 nfs.export-dirs on nfs.export-volumes on nfs.addr-namelookup off nfs.dynamic-volumes off nfs.register-with-portmap on nfs.outstanding-rpc-limit 16 nfs.port 2049 nfs.rpc-auth-unix on nfs.rpc-auth-null on nfs.rpc-auth-allow all nfs.rpc-auth-reject none nfs.ports-insecure off nfs.trusted-sync off nfs.trusted-write off nfs.volume-access read-write nfs.export-dir nfs.disable true nfs.nlm on nfs.acl on nfs.mount-udp off nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab nfs.rpc-statd /sbin/rpc.statd nfs.server-aux-gids off nfs.drc off nfs.drc-size 0x20000 nfs.read-size (1 * 1048576ULL) nfs.write-size (1 * 1048576ULL) nfs.readdir-size (1 * 1048576ULL) nfs.rdirplus on nfs.exports-auth-enable (null) nfs.auth-refresh-interval-sec (null) nfs.auth-cache-ttl-sec (null) features.read-only off features.worm off features.worm-file-level off features.default-retention-period 120 features.retention-mode relax features.auto-commit-period 180 storage.linux-aio off storage.batch-fsync-mode reverse-fsync storage.batch-fsync-delay-usec 0 storage.owner-uid 36 storage.owner-gid 36 storage.node-uuid-pathinfo off storage.health-check-interval 30 storage.build-pgfid off storage.bd-aio off cluster.server-quorum-type server cluster.server-quorum-ratio 0 changelog.changelog off changelog.changelog-dir (null) changelog.encoding ascii changelog.rollover-time 15 changelog.fsync-interval 5 changelog.changelog-barrier-timeout 120 changelog.capture-del-path off features.barrier disable features.barrier-timeout 120 features.trash off features.trash-dir .trashcan features.trash-eliminate-path (null) features.trash-max-filesize 5MB features.trash-internal-op off cluster.enable-shared-storage disable cluster.write-freq-threshold 0 cluster.read-freq-threshold 0 cluster.tier-pause off cluster.tier-promote-frequency 120 cluster.tier-demote-frequency 3600 cluster.watermark-hi 90 cluster.watermark-low 75 cluster.tier-mode cache cluster.tier-max-promote-file-size 0 cluster.tier-max-mb 4000 cluster.tier-max-files 10000 features.ctr-enabled off features.record-counters off features.ctr-record-metadata-heat off features.ctr_link_consistency off features.ctr_lookupheal_link_timeout 300 features.ctr_lookupheal_inode_timeout 300 features.ctr-sql-db-cachesize 1000 features.ctr-sql-db-wal-autocheckpoint 1000 locks.trace off locks.mandatory-locking off cluster.disperse-self-heal-daemon enable cluster.quorum-reads 
no client.bind-insecure (null) ganesha.enable off features.shard on features.shard-block-size 4MB features.scrub-throttle lazy features.scrub-freq biweekly features.scrub false features.expiry-time 120 features.cache-invalidation off features.cache-invalidation-timeout 60 features.leases off features.lease-lock-recall-timeout 60 disperse.background-heals 8 disperse.heal-wait-qlength 128 cluster.heal-timeout 600 dht.force-readdirp on disperse.read-policy round-robin cluster.shd-max-threads 8 cluster.shd-wait-qlength 10000 cluster.locking-scheme granular cluster.granular-entry-heal no
I see that you create replica 5 volumes.

For the sake of debug, can you try the following?

1. create a replica 1 volume (on gluster server)
2. create a gluster storage domain (sd) with the replica 1 volume created before
3. create a new VM with disk on that gluster sd (no need to install OS nor copy data)
4. create 4 snapshots and delete them as done before

Thanks!
Can you also please send the gluster logs under /var/log/glusterfs? There is a cli.log file and the volume log(s).
Created attachment 1290121 [details]
gluster logs

(In reply to Ala Hino from comment #13)
> I see that you create replica 5 volumes.
>
> For the sake of debug, can you try the following?
>
> 1. create a replica 1 volume (on gluster server)
> 2. create a gluster storage domain (sd) with the replica 1 volume created
> before
> 3. create a new VM with disk on that gluster sd (no need to install OS nor
> copy data)
> 4. create 4 snapshots and delete them as done before

The vm-images-repo-demo volume, in which the VM image is located, is configured as a replica 3 volume.

In the attachment you can find the gluster logs from the test done before.
Can you please send the content of the following files:

/rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/c54dd527-06ac-4790-bcb8-8c3881b05442.meta

/rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/dd3485a6-6843-4664-bbbc-b6db9f68eb58.meta

/rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/bd74e5dc-95fd-40e3-aeb0-5fef6b67d172.meta

/rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/81b15267-721d-45ec-b9e2-061663febe5a.meta
(In reply to Ala Hino from comment #16)
> Can you please send the content of the following files:
>
> /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-
> 97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/c54dd527-06ac-
> 4790-bcb8-8c3881b05442.meta

DOMAIN=73ca2906-e59d-4c13-97f4-f636cad9fb0e
CTIME=1497011478
FORMAT=COW
DISKTYPE=2
LEGALITY=LEGAL
SIZE=31457280
VOLTYPE=LEAF
DESCRIPTION=
IMAGE=09e08d20-6317-4418-9b69-9e5f396b64f9
PUUID=dd3485a6-6843-4664-bbbc-b6db9f68eb58
MTIME=0
POOL_UUID=
TYPE=SPARSE
GEN=0
EOF

> /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-
> 97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/dd3485a6-6843-
> 4664-bbbc-b6db9f68eb58.meta

DOMAIN=73ca2906-e59d-4c13-97f4-f636cad9fb0e
CTIME=1497011131
FORMAT=COW
DISKTYPE=2
LEGALITY=LEGAL
SIZE=31457280
VOLTYPE=INTERNAL
DESCRIPTION=
IMAGE=09e08d20-6317-4418-9b69-9e5f396b64f9
PUUID=94f463fa-f431-4a6a-b1f5-11b234161de3
MTIME=0
POOL_UUID=
TYPE=SPARSE
GEN=0
EOF

> /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-
> 97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/bd74e5dc-95fd-
> 40e3-aeb0-5fef6b67d172.meta

DOMAIN=73ca2906-e59d-4c13-97f4-f636cad9fb0e
CTIME=1497007610
FORMAT=COW
DISKTYPE=2
LEGALITY=LEGAL
SIZE=31457280
VOLTYPE=LEAF
DESCRIPTION=
IMAGE=09e08d20-6317-4418-9b69-9e5f396b64f9
PUUID=81b15267-721d-45ec-b9e2-061663febe5a
MTIME=0
POOL_UUID=
TYPE=SPARSE
GEN=0
EOF

> /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-
> 97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/81b15267-721d-
> 45ec-b9e2-061663febe5a.meta

DOMAIN=73ca2906-e59d-4c13-97f4-f636cad9fb0e
CTIME=1497001994
FORMAT=RAW
DISKTYPE=2
LEGALITY=LEGAL
SIZE=31457280
VOLTYPE=INTERNAL
DESCRIPTION={"DiskAlias":"mergeWftest4_Disk1","DiskDescription":"mergeWftest4_Disk1"}
IMAGE=09e08d20-6317-4418-9b69-9e5f396b64f9
PUUID=00000000-0000-0000-0000-000000000000
MTIME=0
POOL_UUID=
TYPE=SPARSE
GEN=0
EOF
Thanks.

I need qemu-img info again for the original volumes (the image name was missing from what you sent before) - the output of the following commands:

qemu-img info /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/c54dd527-06ac-4790-bcb8-8c3881b05442

qemu-img info /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/dd3485a6-6843-4664-bbbc-b6db9f68eb58

qemu-img info /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/bd74e5dc-95fd-40e3-aeb0-5fef6b67d172

qemu-img info /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/81b15267-721d-45ec-b9e2-061663febe5a

Please make sure to include the image and the backing file.

Thanks.
(In reply to Ala Hino from comment #18)
> Thanks.
>
> I need qemu-img info again of the original volumes (the image name was
> missing from what you sent before) - the output of the following commands:
>
> qemu-img info
> /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-
> 97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/c54dd527-06ac-
> 4790-bcb8-8c3881b05442

image: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/c54dd527-06ac-4790-bcb8-8c3881b05442
file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 6.0G
cluster_size: 65536
backing file: dd3485a6-6843-4664-bbbc-b6db9f68eb58 (actual path: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/dd3485a6-6843-4664-bbbc-b6db9f68eb58)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

> qemu-img info
> /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-
> 97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/dd3485a6-6843-
> 4664-bbbc-b6db9f68eb58

image: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/dd3485a6-6843-4664-bbbc-b6db9f68eb58
file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 163M
cluster_size: 65536
backing file: 81b15267-721d-45ec-b9e2-061663febe5a (actual path: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/81b15267-721d-45ec-b9e2-061663febe5a)
backing file format: raw
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

> qemu-img info
> /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-
> 97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/bd74e5dc-95fd-
> 40e3-aeb0-5fef6b67d172

image: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/bd74e5dc-95fd-40e3-aeb0-5fef6b67d172
file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 1.3G
cluster_size: 65536
backing file: 81b15267-721d-45ec-b9e2-061663febe5a (actual path: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/81b15267-721d-45ec-b9e2-061663febe5a)
backing file format: raw
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

> qemu-img info
> /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-
> 97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/81b15267-721d-
> 45ec-b9e2-061663febe5a

image: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/09e08d20-6317-4418-9b69-9e5f396b64f9/81b15267-721d-45ec-b9e2-061663febe5a
file format: raw
virtual size: 15G (16106127360 bytes)
disk size: 6.0G

> Please make sure to include the image and the backing file.
>
> Thanks.

Thanks.
4.1.4 is planned as a minimal, fast, z-stream version to fix any open issues we may have in supporting the upcoming EL 7.4. Pushing out anything unrelated, although if there's a minimal/trivial, SAFE fix that's ready on time, we can consider introducing it in 4.1.4.
Created attachment 1294988 [details]
test with oVirt 4.1.3

I updated to oVirt 4.1.3 and tried a new test. Unfortunately, the results are the same. I collected all the useful information in the tar. Please tell me if there is anything I can do to help find and solve the problem. Thank you so much.

vmName= mergeWftest9
vmId= 02c854d2-fade-44a3-a215-1d81bc68108a
disk_group_id= bfdbd9cb-dcc6-4278-84bb-7a08be7fbfcf
mountpointpath: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/bfdbd9cb-dcc6-4278-84bb-7a08be7fbfcf

snapshot_disk_id:
current = 5abffd41-d079-47de-8dc6-b2aba94f801b
s4 = 4c6c9cb7-a280-4ce9-96a4-2f7f4fb52137
s3 = 6478a13f-5e83-45ab-ad84-23e90b2b323c
s2 = 966ab0f5-887b-41f8-bd85-7cb9bce8fdfb
s1 = 49bb049c-db4c-42dc-822d-fc89ed075e9a

Check the chain:
qemu-img info: file qemu-info-chain1.txt
vdsm-client Volume getInfo: file vdsm-Volume-Info1.txt

Now start deleting s1:
Jul 6, 2017 5:40:39 PM Snapshot 's1' deletion for VM 'mergeWftest9' was initiated by admin@internal-authz.
Jul 6, 2017 5:42:48 PM Snapshot 's1' deletion for VM 'mergeWftest9' has been completed.
correlationId: bff808e8-3866-4d52-a061-995bcdc2fcea

New situation:
snapshot_disk_id:
current = 5abffd41-d079-47de-8dc6-b2aba94f801b
s4 = 4c6c9cb7-a280-4ce9-96a4-2f7f4fb52137
s3 = 6478a13f-5e83-45ab-ad84-23e90b2b323c
s2 = 49bb049c-db4c-42dc-822d-fc89ed075e9a

Check the chain:
qemu-img info: file qemu-info-chain2.txt
vdsm-client Volume getInfo: file volumeInfo2.txt

Update ovs_storage.

Now start deleting s2:
Jul 6, 2017 5:48:16 PM Snapshot 's2' deletion for VM 'mergeWftest9' was initiated by admin@internal-authz.
correlationId: 09edb0bd-ba1a-4fb8-a638-c141467278ce
Thank you very much for providing this valuable information. I will look into the new logs and see what I can learn from them. Will update ASAP.
I see that during the block commit job a libvirt VIR_EVENT_HANDLE_HANGUP event is fired, and it seems that the job had not completed yet. I will have to investigate further why that event was fired, what the consequences are, and how to recover.

How many hosts do you have in your environment?

Can you upload the SPM logs?

Thanks.
Created attachment 1299824 [details]
test within SPM

Hi, I have 5 hosts in my configuration.

About the SPM log: I could not find the vdsm logs of the old test because they had already been overwritten by log rotation. So I made a new test, and I ran it on the host with the SPM role assigned.

I hope the tar contains all the logs that you need; if you need more, I'll be happy to provide them. Thank you very much for helping.
Can you please send the qemu-img info of each volume in the chain?
Created attachment 1301237 [details]
test13_InsideSPM

By mistake I uploaded the old tar, please forgive me. Here are the files of the new test.

vmName= mergeWftest13
vmId= 4a2be9b3-e24d-45f6-ab56-123b5c60e253
disk_group_id= 11a424e1-65dc-4769-981c-10d312d26e86
mountpointpath: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/11a424e1-65dc-4769-981c-10d312d26e86

snapshot_disk_id:
current = 968e63b2-7a92-4045-82b6-69f96b6d66e8
s4 = cb49d1ca-3d90-4ccb-805d-f8b60b596358
s3 = dbde119b-5686-4ab9-8d8c-7ea4aaec52f8
s2 = e49fbac2-9be8-44b4-84ff-2edb029e357c
s1 = ebb51580-08fe-4c68-b03a-abc4a1fed413

Check the chain:
qemu-img info: file qemu-info-chain1.txt
vdsm-client Volume getInfo: file vdsm-Volume-Info1.txt

Now start deleting s1:
Jul 17, 2017 12:01:52 PM Snapshot 's1' deletion for VM 'mergeWftest13' was initiated by admin@internal-authz.
Jul 17, 2017 12:02:52 PM Snapshot 's1' deletion for VM 'mergeWftest13' has been completed.
correlationId: e8b27d6a-ebe6-456b-be54-39994c04eb66

New situation:
snapshot_disk_id:
current = 968e63b2-7a92-4045-82b6-69f96b6d66e8
s4 = cb49d1ca-3d90-4ccb-805d-f8b60b596358
s3 = dbde119b-5686-4ab9-8d8c-7ea4aaec52f8
s2 = ebb51580-08fe-4c68-b03a-abc4a1fed413

Check the chain:
qemu-img info: file qemu-info-chain2.txt
vdsm-client Volume getInfo: file volumeInfo2.txt

Update ovs_storage.

Now start deleting s2:
Jul 17, 2017 12:09:13 PM Snapshot 's2' deletion for VM 'mergeWftest13' was initiated by admin@internal-authz.
correlationId: b1da1e27-17c5-4601-8bfb-d7d98561caa4

Check the chain:
qemu-img info: file qemu-info-chain3.txt
vdsm-client Volume getInfo: file volumeInfo3.txt
Can you please send the HSM (the host running the VM) logs as well?
I've created a patch that adds more debug logs to help us better investigate the issue you are encountering. Can you apply the changes in your setup and try the flow again?

The patch is here:
https://gerrit.ovirt.org/79622

In your setup, on the hosts (I need the SPM and the host running the VM, or all the hosts):

1. Open /usr/share/vdsm/storage/fileVolume.py
2. From the patch, add the highlighted lines at exactly the same line numbers
3. Restart the host
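Just to illustrate what step 2 means in practice. This is not the content of the gerrit patch, which remains the authoritative reference, and the attribute names below are only assumed to exist on the volume object:

# Illustrative debug line only -- apply the actual lines from
# https://gerrit.ovirt.org/79622 at the line numbers it indicates.
self.log.debug("fileVolume: imagePath=%s volUUID=%s",
               self.imagePath, self.volUUID)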
Created attachment 1301749 [details]
wf with patch 79622

Applied the patch, rebooted the node and executed the workflow. Can you please tell me where I can find the HSM log?

vmName= testWfmerge14
vmId= a748fb84-2dc5-4ab5-89fb-cc25f2e8a25f
disk_group_id= e3d05664-42b3-4c8a-87c9-1dee5403dedd
mountpointpath: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/e3d05664-42b3-4c8a-87c9-1dee5403dedd

snapshot_disk_id:
current = 15f3790f-4763-40e1-9176-e688630df07b
s4 = 2c29f69c-a552-4aed-ac5c-c23f5cb04e5e
s3 = 64cd8559-9d75-41e9-9ae2-0c91ef7c8048
s2 = 440676d3-fcb9-424d-bf54-77348b34e275
s1 = 59286a0b-683c-4fa2-b167-59bf8c665141

Check the chain:
qemu-img info: file qemu-info-chain1.txt
vdsm-client Volume getInfo: file vdsm-Volume-Info1.txt

Now start deleting s1:
Jul 20, 2017 2:51:42 PM Snapshot 's1' deletion for VM 'testWfmerge14' was initiated by admin@internal-authz.
Jul 20, 2017 2:52:41 PM Snapshot 's1' deletion for VM 'testWfmerge14' has been completed.
correlationId: 2c4abb74-da53-49c5-b28a-31141cb09937

New situation:
snapshot_disk_id:
current = 15f3790f-4763-40e1-9176-e688630df07b
s4 = 2c29f69c-a552-4aed-ac5c-c23f5cb04e5e
s3 = 64cd8559-9d75-41e9-9ae2-0c91ef7c8048
s2 = 59286a0b-683c-4fa2-b167-59bf8c665141

Check the chain:
qemu-img info: file qemu-info-chain2.txt
vdsm-client Volume getInfo: file vdsm-Volume-Info2.txt

Now start deleting s2:
Jul 20, 2017 3:02:44 PM Snapshot 's2' deletion for VM 'testWfmerge14' was initiated by admin@internal-authz.
correlationId: 6c84cce0-593f-45a0-8b14-e2cf39df4aa7

Check the chain:
qemu-img info: file qemu-info-chain3.txt
vdsm-client Volume getInfo: file volumeInfo3.txt
HSM is actually the host running the VM. The log file is still the Vdsm log file on that machine.
So in this case (the host running the VM is also the SPM) you need just the vdsm.log, am I right?

The logs are in the tar attached to comment 29:
- vdsm.log
- vdsm.log.1.xz

Thank you so much for helping me.
My bad, I wasn't looking for the correct message. I am seeing something unexpected and have added more messages that will help me understand the root cause of the issue. I will have to ask you to apply the new messages (in addition to the existing ones) and try again. Can you?

Please make sure you look at the updated version, here:
https://gerrit.ovirt.org/#/c/79622/2/vdsm/storage/fileVolume.py

Thanks!
Can you please run the following command on the host where the VM is running?

rpm -qa
Created attachment 1301894 [details]
test 15 patch 2

Applied the new patch, rebooted the host, and executed the workflow.

vmName= mergeWftest15
vmId=
disk_group_id= 73aef966-5dfe-4a8c-8bad-9c71a423a9ed
mountpointpath: /rhev/data-center/00000001-0001-0001-0001-0000000001a9/73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/73aef966-5dfe-4a8c-8bad-9c71a423a9ed

snapshot_disk_id:
current = d5e0eabd-3973-4502-bb1c-0eb6584fc50f
s4 = f5e891bf-4251-481d-8aee-3ae87626da26
s3 = 6473996c-a623-41b5-9388-6ffd9f3a471d
s2 = ddee50e3-cd88-44e4-9f40-ffb0962f1cd7
s1 = 675d0eef-644c-44d7-b196-17459f6bcc56

Check the chain:
qemu-img info: file qemu-info-chain1.txt
vdsm-client Volume getInfo: file vdsm-Volume-Info1.txt

Now start deleting s1:
Jul 20, 2017 6:34:23 PM Snapshot 's1' deletion for VM 'mergeWftest15' was initiated by admin@internal-authz.
Jul 20, 2017 6:35:15 PM Snapshot 's1' deletion for VM 'mergeWftest15' has been completed.
correlationId: a9de6c4d-7465-4fe8-879c-c273be8f113e

New situation:
current = d5e0eabd-3973-4502-bb1c-0eb6584fc50f
s4 = f5e891bf-4251-481d-8aee-3ae87626da26
s3 = 6473996c-a623-41b5-9388-6ffd9f3a471d
s2 = 675d0eef-644c-44d7-b196-17459f6bcc56

Check the chain:
qemu-img info: file qemu-info-chain2.txt
vdsm-client Volume getInfo: file vdsm-Volume-Info2.txt

Now start deleting s2:
Jul 20, 2017 6:42:04 PM Snapshot 's2' deletion for VM 'mergeWftest15' was initiated by admin@internal-authz.
correlationId: 675d0eef-644c-44d7-b196-17459f6bcc56

Check the chain:
qemu-img info: file qemu-info-chain3.txt
vdsm-client Volume getInfo: file volumeInfo3.txt
Created attachment 1301895 [details]
rpm -qa

As requested, the output of rpm -qa.
Thanks a lot for providing the info. This is much appreciated.

This failure happens because the volume name (vm-images-repo-demo) contains "images".

I'd like to kindly ask you to enable DEBUG logs for the following components in /etc/vdsm/logger.conf:

logger_root
logger_vds
logger_storage (probably already at DEBUG level)
logger_IOProcess

And try again, but this time with a volume that doesn't contain "images" in its name.

Once again, thanks a lot for your cooperation on this!
I've applied a fix to handle volume names that include "images". The fix is here:

https://gerrit.ovirt.org/#/c/79657/1

Please note that the fix was applied on the master branch. If you'd like to apply the fix locally, in /usr/share/vdsm/storage/fileVolume.py replace line 169 with:

domPath = self.imagePath.rsplit('images', 1)[0]

Hope this fixes the issue you see.
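For context, a small illustration of why such a path trips up a split anchored at the first occurrence of "images". The mount path below is made up for illustration, and whether the original line 169 split exactly this way is my assumption; the point is only that a gluster volume name like vm-images-repo-demo itself contains "images":

# Hypothetical image path on a gluster storage domain whose volume
# name ("vm-images-repo-demo") contains the string "images".
image_path = ("/rhev/data-center/mnt/glusterSD/server:_vm-images-repo-demo/"
              "73ca2906-e59d-4c13-97f4-f636cad9fb0e/images/"
              "09e08d20-6317-4418-9b69-9e5f396b64f9")

# Splitting on the first occurrence of "images" cuts inside the volume name
# and produces a bogus domain path:
print(image_path.split("images", 1)[0])
# /rhev/data-center/mnt/glusterSD/server:_vm-

# Splitting on the last occurrence (what the fixed line does) keeps the
# real domain path:
print(image_path.rsplit("images", 1)[0])
# /rhev/data-center/mnt/glusterSD/server:_vm-images-repo-demo/73ca2906-e59d-4c13-97f4-f636cad9fb0e/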
Thank you so much for the help, the patch works well. Do you need the vdsm.log? Do you think the fix will be available in 4.1.5?
Good news. No need for vdsm.log. Yes, the fix will be in 4.1.5. Thank you for your cooperation on this bug.
This issue is about using "images" in file-based storage domains - gluster or NFS.

To reproduce/verify:

1. Create a sd that contains "images" in its name
2. Create 3 snapshots - s1, s2 and s3
3. Delete s1 - this works
4. Delete s2 - this fails without the fix
(In reply to Ala Hino from comment #40) > To reproduce/verify: > > 1. Create a sd that contains "images" in its name Don't you mean in its **path**?
Indeed. Fixed. Thanks
Eyal - this BZ was automatically moved to ON_QA, but no target release was set. Was that intentional?
Hi Allon,

We never automated or agreed on the logic to add a target release to bugs, and it's currently done manually by the bug owner/project maintainer. There are several issues with automating it, and there wasn't an agreement on the process when it was discussed.
--------------------------------------
Tested with the following code:
----------------------------------------
rhevm-4.1.5-0.1.el7.noarch
vdsm-4.19.25-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Create a VM with 4 snapshots
2. Delete one snapshot
3. Delete one more
4. Wait until the engine realizes that the merge has failed

Actual results:
Merge is completed successfully and the snapshot is removed.

Expected results:

Moving to VERIFIED!
Ala, can you please add some doctext to this BZ?