Credit: Thanks to Shilpa Manjarabad Jagannath for reporting this issue.

Description of problem:
OpenStack Glance image files are not migrated after a remove-brick operation, and the DHT layout (hex range) is not reset. The impact on RHS-RHOS integration is that the image goes missing from the $filesystem_store_datadir specified in /etc/glance/glance-api.conf, and any further launch of a new instance fails ('ImageNotFound: Image 92a4a598-584a-47a9-9e32-ffb9c95bc7d6 could not be found.') once all base images in OpenStack Nova are automatically removed after remove_unused_original_minimum_age_seconds, which defaults to 86400 seconds.

Version-Release number of selected component (if applicable):
3.4.0.15rhs

How reproducible:
Tested once.

Steps to Reproduce:
1. Create a 6x2 distribute-replicate volume.
2. Fuse-mount the volume for OpenStack Glance.
3. Create a Glance image and launch an instance.
4. Perform remove-brick start followed by commit on the volume accessed by the Glance image, removing the old brick/sub-directory.

Actual results:
An empty file still exists on the old brick, and the Glance image file is not available to RHOS at the $filesystem_store_datadir specified in /etc/glance/glance-api.conf.

Expected results:
The file should be migrated to a new brick/sub-directory and be available at the $filesystem_store_datadir specified in /etc/glance/glance-api.conf.

Additional info:

1.
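For context on why the image disappears from the client's view: Nova's image cache manager removes base images unused for longer than remove_unused_original_minimum_age_seconds (86400s by default). A minimal sketch of that aging behaviour, assuming a flat cache directory; the prune_base_images helper is hypothetical and greatly simplified compared to Nova's real image-cache manager:

```python
import os
import time

# Default from nova.conf, per the description above: base images unused
# for longer than this many seconds are removed from the cache.
REMOVE_UNUSED_ORIGINAL_MINIMUM_AGE_SECONDS = 86400

def prune_base_images(cache_dir, max_age=REMOVE_UNUSED_ORIGINAL_MINIMUM_AGE_SECONDS):
    """Remove cached base images older than max_age seconds (simplified sketch)."""
    removed = []
    now = time.time()
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        # Only regular files count as cached images in this sketch.
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age:
            os.remove(path)
            removed.append(name)
    return removed
```

Once this pruning runs, any instance launch that needs the base image falls back to Glance; with the image file stranded on the removed brick, that fallback fails with the ImageNotFound error quoted above.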
Volume info before remove-brick:

[root@rhs-vm1 glusterfs-15]# gluster v i
Volume Name: glance-vol
Type: Distributed-Replicate
Volume ID: 967048b5-7f20-4804-8225-983881e1f9b0
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.168:/rhs/brick1/g1
Brick2: 10.70.37.74:/rhs/brick1/g2
Brick3: 10.70.37.220:/rhs/brick1/g3
Brick4: 10.70.37.203:/rhs/brick1/g4
Brick5: 10.70.37.220:/rhs/brick1/g7
Brick6: 10.70.37.203:/rhs/brick1/g8
Brick7: 10.70.37.168:/rhs/brick1/g9
Brick8: 10.70.37.74:/rhs/brick1/g10
Brick9: 10.70.37.220:/rhs/brick1/g11
Brick10: 10.70.37.203:/rhs/brick1/g12
Brick11: 10.70.37.168:/rhs/brick1/g5
Brick12: 10.70.37.74:/rhs/brick1/g6
Options Reconfigured:
storage.owner-gid: 161
storage.owner-uid: 161
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

2.
[root@rhs-vm1 glusterfs-15]# find / -name 92a4a598-584a-47a9-9e32-ffb9c95bc7d6
/rhs/brick1/g1/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
(The image is located in /rhs/brick1/g1.)

3.
[root@rhs-vm1 glusterfs-15]# gluster v remove-brick glance-vol 10.70.37.168:/rhs/brick1/g1 10.70.37.74:/rhs/brick1/g2 start
volume remove-brick start: success
ID: 125b887f-6ce9-4244-a567-28d208b5068f

gluster v remove-brick glance-vol 10.70.37.168:/rhs/brick1/g1 10.70.37.74:/rhs/brick1/g2 commit
Removing brick(s) can result in data loss. Do you want to Continue?
(y/n) y
volume remove-brick commit: success

After the remove-brick operation:

Volume Name: glance-vol
Type: Distributed-Replicate
Volume ID: 967048b5-7f20-4804-8225-983881e1f9b0
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: 10.70.37.220:/rhs/brick1/g3
Brick2: 10.70.37.203:/rhs/brick1/g4
Brick3: 10.70.37.220:/rhs/brick1/g7
Brick4: 10.70.37.203:/rhs/brick1/g8
Brick5: 10.70.37.168:/rhs/brick1/g9
Brick6: 10.70.37.74:/rhs/brick1/g10
Brick7: 10.70.37.220:/rhs/brick1/g11
Brick8: 10.70.37.203:/rhs/brick1/g12
Brick9: 10.70.37.168:/rhs/brick1/g5
Brick10: 10.70.37.74:/rhs/brick1/g6
Options Reconfigured:
storage.owner-gid: 161
storage.owner-uid: 161
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

4.
[root@rhs-vm1 glusterfs-15]# gluster v remove-brick glance-vol 10.70.37.168:/rhs/brick1/g1 10.70.37.74:/rhs/brick1/g2 status
        Node   Rebalanced-files     size   scanned   failures   skipped        status   run-time in secs
   localhost                  0   0Bytes         0          0         0     completed               0.00
10.70.37.220                  0   0Bytes         0          0         0   not started               0.00
10.70.37.203                  0   0Bytes         0          0         0   not started               0.00
 10.70.37.74                  0   0Bytes         1          0         0     completed               0.00

5.
[root@rhs-vm1 glusterfs-15]# find / -name 92a4a598-584a-47a9-9e32-ffb9c95bc7d6
/rhs/brick1/g1/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
/rhs/brick1/g5/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6

6.
ll /rhs/brick1/g1/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
-rw-r----- 2 161 161 251985920 Aug 5 15:32 /rhs/brick1/g1/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
ll /rhs/brick1/g5/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
---------T 2 161 161 0 Aug 5 17:36 /rhs/brick1/g5/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6

7.
getfattr -d -m .
/rhs/brick1/g1/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/g1/
trusted.afr.glance-vol-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.glance-vol-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAAAAAAA==
trusted.glusterfs.volume-id=0slnBItX8gSASCJZg4geH5sA==

getfattr -d -m . /rhs/brick1/g2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/g2
trusted.afr.glance-vol-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.glance-vol-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAAAAAAA==
trusted.glusterfs.volume-id=0slnBItX8gSASCJZg4geH5sA==

[root@rhs-vm1 brick1]# getfattr -d -m . /rhs/brick1/g5/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/g5/
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAAAzMzMzZmZmZQ==
trusted.glusterfs.volume-id=0slnBItX8gSASCJZg4geH5sA==

Logs from /var/log/glusterfs/glance-vol-rebalance.log:

[2013-08-05 12:06:04.358728] I [dht-common.c:2650:dht_setxattr] 0-glance-vol-dht: fixing the layout of /glance/images
[2013-08-05 12:06:04.363508] I [dht-rebalance.c:1116:gf_defrag_migrate_data] 0-glance-vol-dht: migrate data called on /glance/images
[2013-08-05 12:06:04.375743] I [dht-rebalance.c:1333:gf_defrag_migrate_data] 0-glance-vol-dht: Migration operation on dir /glance/images took 0.01 secs
[2013-08-05 12:06:04.390329] I [dht-rebalance.c:1766:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 0.00 secs
[2013-08-05 12:06:04.390362] I [dht-rebalance.c:1769:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 1, failures: 0, skipped: 0
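The trusted.glusterfs.dht values in the getfattr output are base64-encoded (that is what getfattr's "0s" prefix means) and hold four big-endian 32-bit integers. A small decoder (an illustrative sketch, not GlusterFS code; the field names count/type/start/stop follow my reading of the DHT on-disk layout format and should be treated as an assumption) makes the difference visible: the removed bricks g1/g2 carry an all-zero hash range, while g5 holds the range 0x33333333-0x66666665.

```python
import base64
import struct

def decode_dht_layout(xattr_value):
    """Decode a trusted.glusterfs.dht value as printed by getfattr.

    The '0s' prefix marks a base64-encoded value; the 16-byte payload is
    four big-endian 32-bit integers. Field naming (count, hash type,
    start, stop) is my assumption about the DHT on-disk format.
    """
    if xattr_value.startswith("0s"):
        xattr_value = xattr_value[2:]
    raw = base64.b64decode(xattr_value)
    cnt, hash_type, start, stop = struct.unpack(">4I", raw)
    return {"count": cnt, "type": hash_type,
            "start": f"0x{start:08x}", "stop": f"0x{stop:08x}"}

# Values taken verbatim from the getfattr output above:
print(decode_dht_layout("0sAAAAAQAAAAAAAAAAAAAAAA=="))  # g1/g2: zero-length range
print(decode_dht_layout("0sAAAAAQAAAAAzMzMzZmZmZQ=="))  # g5: non-empty range
```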
Tried to reproduce this on a new volume with eager-lock turned off, as suggested by Shishir. The issue persists.

With eager-lock enabled:

Volume Name: vol-glance
Type: Distributed-Replicate
Volume ID: 93bd85a9-1621-444e-8f0d-3122cfa86723
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.168:/rhs/brick3/g1
Brick2: 10.70.37.74:/rhs/brick3/g2
Brick3: 10.70.37.220:/rhs/brick3/g3
Brick4: 10.70.37.203:/rhs/brick3/g4
Brick5: 10.70.37.168:/rhs/brick3/g5
Brick6: 10.70.37.74:/rhs/brick3/g6
Brick7: 10.70.37.220:/rhs/brick3/g7
Brick8: 10.70.37.203:/rhs/brick3/g8
Brick9: 10.70.37.168:/rhs/brick3/g9
Brick10: 10.70.37.74:/rhs/brick3/g10
Brick11: 10.70.37.220:/rhs/brick3/g11
Brick12: 10.70.37.203:/rhs/brick3/g12
Options Reconfigured:
storage.owner-uid: 161
storage.owner-gid: 161
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

Remove-brick test on file c4efd768-3b9a-44ab-9b91-5dcfc2989fc0:

[root@rhs-vm1 home]# find / -name c4efd768-3b9a-44ab-9b91-5dcfc2989fc0
/rhs/brick3/g9/glance/images/c4efd768-3b9a-44ab-9b91-5dcfc2989fc0

[root@rhs-vm1 home]# gluster v remove-brick vol-glance 10.70.37.168:/rhs/brick3/g9 10.70.37.74:/rhs/brick3/g10 start
volume remove-brick start: success
ID: 99dea640-3b2d-4c20-92da-0759fa860af6

[root@rhs-vm1 home]# gluster v remove-brick vol-glance 10.70.37.168:/rhs/brick3/g9 10.70.37.74:/rhs/brick3/g10 status
        Node   Rebalanced-files     size   scanned   failures   skipped        status   run-time in secs
   localhost                  0   0Bytes         0          0         0     completed               0.00
10.70.37.220                  0   0Bytes         0          0         0   not started               0.00
10.70.37.203                  0   0Bytes         0          0         0   not started               0.00
 10.70.37.74                  0   0Bytes         4          0         0     completed               0.00

After remove-brick:

[root@rhs-vm1 home]# find / -name c4efd768-3b9a-44ab-9b91-5dcfc2989fc0
/rhs/brick3/g9/glance/images/c4efd768-3b9a-44ab-9b91-5dcfc2989fc0
[root@rhs-vm3 home]# find / -name c4efd768-3b9a-44ab-9b91-5dcfc2989fc0
/rhs/brick3/g11/glance/images/c4efd768-3b9a-44ab-9b91-5dcfc2989fc0

[root@rhs-vm1 home]# ll /rhs/brick3/g9/glance/images/c4efd768-3b9a-44ab-9b91-5dcfc2989fc0
-rw-r----- 2 161 161 251985920 Aug 6 16:35 /rhs/brick3/g9/glance/images/c4efd768-3b9a-44ab-9b91-5dcfc2989fc0
[root@rhs-vm3 home]# ll /rhs/brick3/g11/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3
---------T 2 161 161 0 Aug 6 16:56 /rhs/brick3/g11/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3

With eager-lock off:

Volume Name: vol-glance
Type: Distributed-Replicate
Volume ID: 93bd85a9-1621-444e-8f0d-3122cfa86723
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: 10.70.37.168:/rhs/brick3/g1
Brick2: 10.70.37.74:/rhs/brick3/g2
Brick3: 10.70.37.220:/rhs/brick3/g3
Brick4: 10.70.37.203:/rhs/brick3/g4
Brick5: 10.70.37.168:/rhs/brick3/g5
Brick6: 10.70.37.74:/rhs/brick3/g6
Brick7: 10.70.37.220:/rhs/brick3/g7
Brick8: 10.70.37.203:/rhs/brick3/g8
Brick9: 10.70.37.220:/rhs/brick3/g11
Brick10: 10.70.37.203:/rhs/brick3/g12
Options Reconfigured:
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: off
network.remote-dio: enable
storage.owner-gid: 161
storage.owner-uid: 161

Tested remove-brick on file 7ece2be5-2cd5-41c7-a9bc-be44eadb84b3:

[root@rhs-vm1 home]# find / -name 7ece2be5-2cd5-41c7-a9bc-be44eadb84b3
/rhs/brick3/g1/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3

[root@rhs-vm1 home]# gluster v remove-brick vol-glance 10.70.37.168:/rhs/brick3/g1 10.70.37.74:/rhs/brick3/g2 start
volume remove-brick start: success
ID: f63be36c-1c45-4835-b10c-bc4d784c5001

[root@rhs-vm1 home]# gluster v remove-brick vol-glance 10.70.37.168:/rhs/brick3/g1 10.70.37.74:/rhs/brick3/g2 status
        Node   Rebalanced-files     size   scanned   failures   skipped        status   run-time in secs
   localhost                  0   0Bytes         0          0         0     completed               0.00
10.70.37.220                  0   0Bytes         0          0         0   not started               0.00
10.70.37.203                  0   0Bytes         0          0         0   not started               0.00
 10.70.37.74                  0   0Bytes         2          0         0     completed               0.00

[root@rhs-vm1 home]# ll /rhs/brick3/g1/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3
-rw-r----- 2 161 161 251985920 Aug 6 16:35 /rhs/brick3/g1/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3
[root@rhs-vm3 home]# ll /rhs/brick3/g11/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3
---------T 2 161 161 0 Aug 6 16:56 /rhs/brick3/g11/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3

As seen above, both tests yield the same result: the file is still present on the original brick that was removed.
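The "---------T" entries in the listings above are DHT link files: zero-byte placeholders whose only permission bit is the sticky bit, used by DHT to point lookups at the brick that actually holds the data. A quick way to spot them when inspecting a brick directly (a sketch; the helper name is mine, and real link files additionally carry a trusted.glusterfs.dht.linkto xattr that this heuristic does not check):

```python
import os
import stat

def is_dht_linkfile(path):
    """Heuristic check for a DHT link file on a brick: a regular file of
    zero bytes whose mode is exactly ---------T (only the sticky bit set)."""
    st = os.lstat(path)
    if not stat.S_ISREG(st.st_mode):
        return False
    perms = stat.S_IMODE(st.st_mode)
    # stat.S_ISVTX is the sticky bit (mode 01000); no rwx bits allowed.
    return st.st_size == 0 and perms == stat.S_ISVTX
```

Here the check would flag /rhs/brick3/g11/glance/images/7ece2be5-... as a link file, while the 251985920-byte -rw-r----- file on the removed brick g1 is the real data that failed to migrate.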
Tested on a plain distribute volume. The files are successfully migrated after rebalance. The issue is seen only on distribute-replicate volumes.
(In reply to shilpa from comment #3)
> Tested on a distribute volume. The files are successfully migrated after
> rebalance. Issue found only on distribute-replicate volumes.
Continuing tests on the distribute-replicate volume: with the gluster volume for Glance unmounted on the OpenStack client, rebalance seems to work.

[root@rhs-client40 cinder(keystone_admin)]# umount /mnt/gluster

[root@rhs-vm1 brick1]# gluster v remove-brick glance-vol 10.70.37.220:/rhs/brick1/g7 10.70.37.203:/rhs/brick1/g8 start
volume remove-brick start: success
ID: cb4a59ad-8888-45ea-9740-a0079c1a8efa

[root@rhs-vm1 brick1]# gluster v remove-brick glance-vol 10.70.37.220:/rhs/brick1/g7 10.70.37.203:/rhs/brick1/g8 status
        Node   Rebalanced-files     size   scanned   failures   skipped        status   run-time in secs
   localhost                  0   0Bytes         0          0         0   not started               0.00
10.70.37.220                  3    1.4GB         5          0         0     completed              51.00
10.70.37.203                  0   0Bytes         4          0         0     completed               1.00
 10.70.37.74                  0   0Bytes         0          0         0   not started               0.00
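The remove-brick status tables in these transcripts are awkward to eyeball once flattened. A small row parser (illustrative only; the function name is mine) turns a data row into fields, taking care of the two-word "not started" status by splitting the row from both ends:

```python
def parse_removebrick_status_row(line):
    """Parse one data row of `gluster v remove-brick ... status` output.

    Columns: node, rebalanced-files, size, scanned, failures, skipped,
    status, run-time in secs. The status column may itself contain a
    space ("not started"), so everything between the sixth token and
    the final run-time token is joined back together as the status.
    """
    parts = line.split()
    return {
        "node": parts[0],
        "rebalanced_files": int(parts[1]),
        "size": parts[2],
        "scanned": int(parts[3]),
        "failures": int(parts[4]),
        "skipped": int(parts[5]),
        "status": " ".join(parts[6:-1]),
        "run_time_secs": float(parts[-1]),
    }

# Row taken from the status output above:
row = parse_removebrick_status_row(
    "10.70.37.220    3    1.4GB    5    0    0    completed    51.00")
print(row["node"], row["rebalanced_files"], row["status"])
```

Applied to the table above, it confirms that with the client unmounted, node 10.70.37.220 migrated 3 files (1.4GB) with zero failures.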
Sosreports of the RHS nodes and the OpenStack node are available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/993119/.
https://code.engineering.redhat.com/gerrit/11380
Verified in glusterfs-3.4.0.19rhs-1.
Tested with a new volume vol-glance, on file d31aca1c-d462-4241-9c4f-e0550466cedb.

Volume Name: vol-glance
Type: Distributed-Replicate
Volume ID: 1f37f298-1563-4df7-844d-6953685ae3ff
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.168:/rhs/brick3/g1
Brick2: 10.70.37.74:/rhs/brick3/g2
Brick3: 10.70.37.220:/rhs/brick3/g3
Brick4: 10.70.37.203:/rhs/brick3/g4
Brick5: 10.70.37.168:/rhs/brick3/g5
Brick6: 10.70.37.74:/rhs/brick3/g6
Brick7: 10.70.37.220:/rhs/brick3/g7
Brick8: 10.70.37.203:/rhs/brick3/g8
Brick9: 10.70.37.168:/rhs/brick3/g9
Brick10: 10.70.37.74:/rhs/brick3/g10
Brick11: 10.70.37.220:/rhs/brick3/g11
Brick12: 10.70.37.203:/rhs/brick3/g12
Options Reconfigured:
storage.owner-uid: 161
storage.owner-gid: 161
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

[root@rhs-vm3 brick3]# find /rhs -name d31aca1c-d462-4241-9c4f-e0550466cedb
/rhs/brick3/g7/glance/images/d31aca1c-d462-4241-9c4f-e0550466cedb

[root@rhs-vm3 brick3]# gluster v remove-brick vol-glance 10.70.37.220:/rhs/brick3/g7 10.70.37.203:/rhs/brick3/g8 start
volume remove-brick start: success
ID: c0218192-d977-4984-bd4f-cf672eda3089

[root@rhs-vm3 brick3]# gluster v remove-brick vol-glance 10.70.37.220:/rhs/brick3/g7 10.70.37.203:/rhs/brick3/g8 stat
        Node   Rebalanced-files      size   scanned   failures   skipped        status   run-time in secs
   localhost                  1   892.6MB         2          0         0     completed              17.00
10.70.37.203                  0    0Bytes         1          0         0     completed               0.00
10.70.37.168                  0    0Bytes         0          0         0   not started               0.00
 10.70.37.74                  0    0Bytes         0          0         0   not started               0.00

File successfully migrated to brick g11:

[root@rhs-vm3 brick3]# find /rhs -name d31aca1c-d462-4241-9c4f-e0550466cedb
/rhs/brick3/g11/glance/images/d31aca1c-d462-4241-9c4f-e0550466cedb

[root@rhs-vm3 brick3]# gluster v remove-brick vol-glance 10.70.37.220:/rhs/brick3/g7
10.70.37.203:/rhs/brick3/g8 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html