Bug 993119 - [RHS-RHOS] DHT hex is not reset on the glance image post remove-brick and it goes missing from filesystem_store_datadir specified in glance-api.conf
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Priority: high
Severity: high
Assigned To: shishir gowda
QA Contact: shilpa
Reported: 2013-08-05 11:58 EDT by Gowrishankar Rajaiyan
Modified: 2013-12-08 20:36 EST
Fixed In Version: glusterfs-3.4.0.19rhs-1
Doc Type: Bug Fix
Environment: virt rhos cinder rhs integration
Last Closed: 2013-09-23 18:35:59 EDT
Type: Bug
Attachments: None
Description Gowrishankar Rajaiyan 2013-08-05 11:58:08 EDT
Credit: Thanks to Shilpa Manjarabad Jagannath for reporting this issue.


Description of problem: OpenStack Glance image files are not migrated after a remove-brick operation; the DHT hex (the trusted.glusterfs.dht layout xattr) is not reset.

The impact on RHS-RHOS integration is that the image goes missing from the $filesystem_store_datadir specified in /etc/glance/glance-api.conf, and any further attempt to launch a new instance fails ('ImageNotFound: Image 92a4a598-584a-47a9-9e32-ffb9c95bc7d6 could not be found.') once all base images in the OpenStack Nova image cache are automatically removed after remove_unused_original_minimum_age_seconds (default 86400 seconds).


Version-Release number of selected component (if applicable): 3.4.0.15rhs


How reproducible: Tested once.


Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume.
2. FUSE-mount the volume for OpenStack Glance.
3. Create a glance image and launch an instance from it.
4. Perform a remove-brick start followed by commit on the volume backing the glance images, removing the old brick/sub-directory.

Actual results: An empty file still exists on the old brick, and the glance image file is not available to RHOS at the $filesystem_store_datadir specified in /etc/glance/glance-api.conf.


Expected results: The file should be migrated to a new brick/sub-directory and should be available at the $filesystem_store_datadir specified in /etc/glance/glance-api.conf.


Additional info: 
1. Volume info before remove-brick:
 
 
[root@rhs-vm1 glusterfs-15]# gluster v i
 
Volume Name: glance-vol
Type: Distributed-Replicate
Volume ID: 967048b5-7f20-4804-8225-983881e1f9b0
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.168:/rhs/brick1/g1
Brick2: 10.70.37.74:/rhs/brick1/g2
Brick3: 10.70.37.220:/rhs/brick1/g3
Brick4: 10.70.37.203:/rhs/brick1/g4
Brick5: 10.70.37.220:/rhs/brick1/g7
Brick6: 10.70.37.203:/rhs/brick1/g8
Brick7: 10.70.37.168:/rhs/brick1/g9
Brick8: 10.70.37.74:/rhs/brick1/g10
Brick9: 10.70.37.220:/rhs/brick1/g11
Brick10: 10.70.37.203:/rhs/brick1/g12
Brick11: 10.70.37.168:/rhs/brick1/g5
Brick12: 10.70.37.74:/rhs/brick1/g6
Options Reconfigured:
storage.owner-gid: 161
storage.owner-uid: 161
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
 
2. [root@rhs-vm1 glusterfs-15]# find / -name 92a4a598-584a-47a9-9e32-ffb9c95bc7d6
/rhs/brick1/g1/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
 
(The image is located in /rhs/brick1/g1.)
 
3. [root@rhs-vm1 glusterfs-15]# gluster v remove-brick glance-vol 10.70.37.168:/rhs/brick1/g1 10.70.37.74:/rhs/brick1/g2 start
volume remove-brick start: success
ID: 125b887f-6ce9-4244-a567-28d208b5068f
 
  gluster v remove-brick glance-vol 10.70.37.168:/rhs/brick1/g1 10.70.37.74:/rhs/brick1/g2 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
 
 
 
 
After remove-brick operation:
 
Volume Name: glance-vol
Type: Distributed-Replicate
Volume ID: 967048b5-7f20-4804-8225-983881e1f9b0
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: 10.70.37.220:/rhs/brick1/g3
Brick2: 10.70.37.203:/rhs/brick1/g4
Brick3: 10.70.37.220:/rhs/brick1/g7
Brick4: 10.70.37.203:/rhs/brick1/g8
Brick5: 10.70.37.168:/rhs/brick1/g9
Brick6: 10.70.37.74:/rhs/brick1/g10
Brick7: 10.70.37.220:/rhs/brick1/g11
Brick8: 10.70.37.203:/rhs/brick1/g12
Brick9: 10.70.37.168:/rhs/brick1/g5
Brick10: 10.70.37.74:/rhs/brick1/g6
Options Reconfigured:
storage.owner-gid: 161
storage.owner-uid: 161
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
 
4. [root@rhs-vm1 glusterfs-15]# gluster v remove-brick glance-vol 10.70.37.168:/rhs/brick1/g1 10.70.37.74:/rhs/brick1/g2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run-time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                0        0Bytes             0             0             0      completed             0.00
                            10.70.37.220                0        0Bytes             0             0             0    not started             0.00
                            10.70.37.203                0        0Bytes             0             0             0    not started             0.00
                             10.70.37.74                0        0Bytes             1             0             0      completed             0.00
 
 
5. [root@rhs-vm1 glusterfs-15]# find / -name 92a4a598-584a-47a9-9e32-ffb9c95bc7d6
/rhs/brick1/g1/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
/rhs/brick1/g5/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
 
6. ll /rhs/brick1/g1/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
-rw-r----- 2 161 161 251985920 Aug  5 15:32 /rhs/brick1/g1/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
 
   ll /rhs/brick1/g5/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
---------T 2 161 161 0 Aug  5 17:36 /rhs/brick1/g5/glance/images/92a4a598-584a-47a9-9e32-ffb9c95bc7d6
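The zero-byte file with mode ---------T left on g5 is a DHT linkfile: a sticky-bit-only, zero-size placeholder DHT creates to point at the brick that actually holds the data. A minimal Python sketch of that signature check, as an illustrative heuristic only (a full check would also read the trusted.glusterfs.dht.linkto xattr; the demo path is hypothetical):

```python
import os
import stat

def looks_like_dht_linkfile(path):
    """Heuristic check for a GlusterFS DHT linkfile as seen on a brick:
    a zero-byte regular file whose mode bits are exactly the sticky bit
    (rendered by ls as ---------T)."""
    st = os.lstat(path)
    if not stat.S_ISREG(st.st_mode):
        return False
    perm = stat.S_IMODE(st.st_mode)  # permission + setuid/setgid/sticky bits
    return perm == stat.S_ISVTX and st.st_size == 0

# Demo: create a file matching the signature from step 6 above
# (hypothetical path, not part of the original report).
open("/tmp/linkfile-demo", "w").close()
os.chmod("/tmp/linkfile-demo", 0o1000)  # ---------T
print(looks_like_dht_linkfile("/tmp/linkfile-demo"))
```

A real file that was migrated correctly would keep its normal mode (here -rw-r-----) and nonzero size, so the heuristic distinguishes the stale data file on g1 from the linkfile on g5.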
 
7. getfattr -d -m . /rhs/brick1/g1/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/g1/
trusted.afr.glance-vol-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.glance-vol-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAAAAAAA==
trusted.glusterfs.volume-id=0slnBItX8gSASCJZg4geH5sA==
 
 getfattr -d -m . /rhs/brick1/g2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/g2
trusted.afr.glance-vol-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.glance-vol-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAAAAAAA==
trusted.glusterfs.volume-id=0slnBItX8gSASCJZg4geH5sA==
 
 
[root@rhs-vm1 brick1]# getfattr -d -m . /rhs/brick1/g5/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/g5/
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAAAzMzMzZmZmZQ==
trusted.glusterfs.volume-id=0slnBItX8gSASCJZg4geH5sA==
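The 0s prefix in the getfattr output marks a base64-encoded value. The trusted.glusterfs.dht payload is 16 bytes, commonly read as four big-endian 32-bit fields (count, layout type, hash-range start, hash-range stop — an assumed field interpretation, not stated in this report). A sketch decoding the values shown above:

```python
import base64
import struct

def decode_dht_layout(xattr_value):
    """Decode a trusted.glusterfs.dht value as printed by getfattr.
    The '0s' prefix marks base64; the 16-byte payload is treated here as
    four big-endian uint32s: count, type, hash-range start, hash-range stop."""
    raw = base64.b64decode(xattr_value.removeprefix("0s"))
    cnt, layout_type, start, stop = struct.unpack(">4I", raw)
    return {"count": cnt, "type": layout_type,
            "start": f"0x{start:08x}", "stop": f"0x{stop:08x}"}

# Values from the getfattr output above:
print(decode_dht_layout("0sAAAAAQAAAAAAAAAAAAAAAA=="))  # g1/g2: range 0x00000000-0x00000000
print(decode_dht_layout("0sAAAAAQAAAAAzMzMzZmZmZQ=="))  # g5: range 0x33333333-0x66666665
```

The decode makes the bug visible: g5 was assigned a proper one-fifth slice of the hash ring (0x33333333-0x66666665, consistent with the 5x2 post-removal layout), while the removed bricks g1/g2 still carry a zeroed range rather than having their layout cleared.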
 
 
Logs from /var/log/glusterfs/glance-vol-rebalance.log
 
[2013-08-05 12:06:04.358728] I [dht-common.c:2650:dht_setxattr] 0-glance-vol-dht: fixing the layout of /glance/images
[2013-08-05 12:06:04.363508] I [dht-rebalance.c:1116:gf_defrag_migrate_data] 0-glance-vol-dht: migrate data called on /glance/images
[2013-08-05 12:06:04.375743] I [dht-rebalance.c:1333:gf_defrag_migrate_data] 0-glance-vol-dht: Migration operation on dir /glance/images took 0.01 secs
[2013-08-05 12:06:04.390329] I [dht-rebalance.c:1766:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 0.00 secs
[2013-08-05 12:06:04.390362] I [dht-rebalance.c:1769:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 1, failures: 0, skipped: 0
Comment 2 shilpa 2013-08-06 07:50:38 EDT
Tried to reproduce this on a new volume with eager-lock turned off as suggested by Shishir. The issue persists. 

With eager-lock enabled:

Volume Name: vol-glance
Type: Distributed-Replicate
Volume ID: 93bd85a9-1621-444e-8f0d-3122cfa86723
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.168:/rhs/brick3/g1
Brick2: 10.70.37.74:/rhs/brick3/g2
Brick3: 10.70.37.220:/rhs/brick3/g3
Brick4: 10.70.37.203:/rhs/brick3/g4
Brick5: 10.70.37.168:/rhs/brick3/g5
Brick6: 10.70.37.74:/rhs/brick3/g6
Brick7: 10.70.37.220:/rhs/brick3/g7
Brick8: 10.70.37.203:/rhs/brick3/g8
Brick9: 10.70.37.168:/rhs/brick3/g9
Brick10: 10.70.37.74:/rhs/brick3/g10
Brick11: 10.70.37.220:/rhs/brick3/g11
Brick12: 10.70.37.203:/rhs/brick3/g12
Options Reconfigured:
storage.owner-uid: 161
storage.owner-gid: 161
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

Remove-brick. File: c4efd768-3b9a-44ab-9b91-5dcfc2989fc0

[root@rhs-vm1 home]# find / -name c4efd768-3b9a-44ab-9b91-5dcfc2989fc0
/rhs/brick3/g9/glance/images/c4efd768-3b9a-44ab-9b91-5dcfc2989fc0


[root@rhs-vm1 home]# gluster v remove-brick vol-glance 10.70.37.168:/rhs/brick3/g9 10.70.37.74:/rhs/brick3/g10 start
volume remove-brick start: success
ID: 99dea640-3b2d-4c20-92da-0759fa860af6

[root@rhs-vm1 home]# gluster v remove-brick vol-glance 10.70.37.168:/rhs/brick3/g9 10.70.37.74:/rhs/brick3/g10 status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run-time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                0        0Bytes             0             0             0      completed             0.00
                            10.70.37.220                0        0Bytes             0             0             0    not started             0.00
                            10.70.37.203                0        0Bytes             0             0             0    not started             0.00
                             10.70.37.74                0        0Bytes             4             0             0      completed             0.00
After remove brick:

[root@rhs-vm1 home]# find / -name c4efd768-3b9a-44ab-9b91-5dcfc2989fc0
/rhs/brick3/g9/glance/images/c4efd768-3b9a-44ab-9b91-5dcfc2989fc0

[root@rhs-vm3 home]# find / -name c4efd768-3b9a-44ab-9b91-5dcfc2989fc0
/rhs/brick3/g11/glance/images/c4efd768-3b9a-44ab-9b91-5dcfc2989fc0

[root@rhs-vm1 home]# ll /rhs/brick3/g9/glance/images/c4efd768-3b9a-44ab-9b91-5dcfc2989fc0
-rw-r----- 2 161 161 251985920 Aug  6 16:35 /rhs/brick3/g9/glance/images/c4efd768-3b9a-44ab-9b91-5dcfc2989fc0


[root@rhs-vm3 home]# ll /rhs/brick3/g11/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3
---------T 2 161 161 0 Aug  6 16:56 /rhs/brick3/g11/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3


With eager-lock off:

Volume Name: vol-glance
Type: Distributed-Replicate
Volume ID: 93bd85a9-1621-444e-8f0d-3122cfa86723
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: 10.70.37.168:/rhs/brick3/g1
Brick2: 10.70.37.74:/rhs/brick3/g2
Brick3: 10.70.37.220:/rhs/brick3/g3
Brick4: 10.70.37.203:/rhs/brick3/g4
Brick5: 10.70.37.168:/rhs/brick3/g5
Brick6: 10.70.37.74:/rhs/brick3/g6
Brick7: 10.70.37.220:/rhs/brick3/g7
Brick8: 10.70.37.203:/rhs/brick3/g8
Brick9: 10.70.37.220:/rhs/brick3/g11
Brick10: 10.70.37.203:/rhs/brick3/g12
Options Reconfigured:
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: off
network.remote-dio: enable
storage.owner-gid: 161
storage.owner-uid: 161


Tested remove-brick on file: 7ece2be5-2cd5-41c7-a9bc-be44eadb84b3

[root@rhs-vm1 home]# find / -name 7ece2be5-2cd5-41c7-a9bc-be44eadb84b3
/rhs/brick3/g1/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3
[root@rhs-vm1 home]# gluster v remove-brick vol-glance 10.70.37.168:/rhs/brick3/g1 10.70.37.74:/rhs/brick3/g2 start
volume remove-brick start: success
ID: f63be36c-1c45-4835-b10c-bc4d784c5001
[root@rhs-vm1 home]# gluster v remove-brick vol-glance 10.70.37.168:/rhs/brick3/g1 10.70.37.74:/rhs/brick3/g2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run-time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                0        0Bytes             0             0             0      completed             0.00
                            10.70.37.220                0        0Bytes             0             0             0    not started             0.00
                            10.70.37.203                0        0Bytes             0             0             0    not started             0.00
                             10.70.37.74                0        0Bytes             2             0             0      completed             0.00

[root@rhs-vm1 home]# ll /rhs/brick3/g1/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3
-rw-r----- 2 161 161 251985920 Aug  6 16:35 /rhs/brick3/g1/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3

[root@rhs-vm3 home]# ll /rhs/brick3/g11/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3
---------T 2 161 161 0 Aug  6 16:56 /rhs/brick3/g11/glance/images/7ece2be5-2cd5-41c7-a9bc-be44eadb84b3


As seen above, both tests yield the same result: the file is still present on the original brick that was removed.
Comment 3 shilpa 2013-08-07 05:09:00 EDT
Tested on a 6x2 distribute volume. The files were successfully migrated after rebalance. The issue is seen only on distribute-replicate volumes.
Comment 4 shilpa 2013-08-07 06:19:26 EDT
(In reply to shilpa from comment #3)
> Tested on a distribute volume. The files are successfully migrated after
> rebalance. Issue found only on distribute-replicate volumes.
Comment 5 shilpa 2013-08-07 08:22:42 EDT
Continuing tests on the distribute-replicate volume. With the gluster volume for glance unmounted on the OpenStack client, rebalance seems to work.

[root@rhs-client40 cinder(keystone_admin)]# umount /mnt/gluster


[root@rhs-vm1 brick1]# gluster v remove-brick glance-vol 10.70.37.220:/rhs/brick1/g7 10.70.37.203:/rhs/brick1/g8 start

volume remove-brick start: success
ID: cb4a59ad-8888-45ea-9740-a0079c1a8efa

[root@rhs-vm1 brick1]# gluster v remove-brick glance-vol 10.70.37.220:/rhs/brick1/g7 10.70.37.203:/rhs/brick1/g8 status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run-time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                0        0Bytes             0             0             0    not started             0.00
                            10.70.37.220                3         1.4GB             5             0             0      completed            51.00
                            10.70.37.203                0        0Bytes             4             0             0      completed             1.00
                             10.70.37.74                0        0Bytes             0             0             0    not started             0.00
Comment 6 shilpa 2013-08-08 05:30:22 EDT
sosreports of RHS nodes and openstack node in http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/993119/.
Comment 7 Amar Tumballi 2013-08-13 05:31:42 EDT
https://code.engineering.redhat.com/gerrit/11380
Comment 8 shilpa 2013-08-14 02:47:49 EDT
Verified in glusterfs-3.4.0.19rhs-1.
Comment 9 shilpa 2013-08-14 06:37:03 EDT
Tested with a new volume vol-glance on file d31aca1c-d462-4241-9c4f-e0550466cedb.



Volume Name: vol-glance
Type: Distributed-Replicate
Volume ID: 1f37f298-1563-4df7-844d-6953685ae3ff
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.168:/rhs/brick3/g1
Brick2: 10.70.37.74:/rhs/brick3/g2
Brick3: 10.70.37.220:/rhs/brick3/g3
Brick4: 10.70.37.203:/rhs/brick3/g4
Brick5: 10.70.37.168:/rhs/brick3/g5
Brick6: 10.70.37.74:/rhs/brick3/g6
Brick7: 10.70.37.220:/rhs/brick3/g7
Brick8: 10.70.37.203:/rhs/brick3/g8
Brick9: 10.70.37.168:/rhs/brick3/g9
Brick10: 10.70.37.74:/rhs/brick3/g10
Brick11: 10.70.37.220:/rhs/brick3/g11
Brick12: 10.70.37.203:/rhs/brick3/g12
Options Reconfigured:
storage.owner-uid: 161
storage.owner-gid: 161
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off


[root@rhs-vm3 brick3]# find /rhs -name d31aca1c-d462-4241-9c4f-e0550466cedb
/rhs/brick3/g7/glance/images/d31aca1c-d462-4241-9c4f-e0550466cedb


[root@rhs-vm3 brick3]# gluster v remove-brick vol-glance 10.70.37.220:/rhs/brick3/g7 10.70.37.203:/rhs/brick3/g8 start
volume remove-brick start: success
ID: c0218192-d977-4984-bd4f-cf672eda3089


[root@rhs-vm3 brick3]# gluster v remove-brick vol-glance 10.70.37.220:/rhs/brick3/g7 10.70.37.203:/rhs/brick3/g8 stat
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run-time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                1       892.6MB             2             0             0      completed            17.00
                            10.70.37.203                0        0Bytes             1             0             0      completed             0.00
                            10.70.37.168                0        0Bytes             0             0             0    not started             0.00
                             10.70.37.74                0        0Bytes             0             0             0    not started             0.00


File successfully migrated to brick g11:

[root@rhs-vm3 brick3]# find /rhs -name d31aca1c-d462-4241-9c4f-e0550466cedb
/rhs/brick3/g11/glance/images/d31aca1c-d462-4241-9c4f-e0550466cedb

[root@rhs-vm3 brick3]# gluster v remove-brick vol-glance 10.70.37.220:/rhs/brick3/g7 10.70.37.203:/rhs/brick3/g8 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
Comment 10 Scott Haines 2013-09-23 18:35:59 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
