| Summary: | [RHS-RHOS] VM damaged beyond repair if the cinder volume attached to it is rebalanced post add-brick. | | | |
|---|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Gowrishankar Rajaiyan <grajaiya> | |
| Component: | glusterd | Assignee: | crisbud <crisbud> | |
| Status: | CLOSED ERRATA | QA Contact: | shilpa <smanjara> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 2.1 | CC: | aavati, divya, grajaiya, kparthas, nsathyan, pkarampu, psriniva, rhs-bugs, sdharane, tkatarki, vagarwal, vbellur, vraman | |
| Target Milestone: | --- | Keywords: | ZStream | |
| Target Release: | RHGS 2.1.2 | Flags: | psriniva: needinfo? (pkarampu) | |
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | Bug Fix | |
| Doc Text: | Previously, when you performed an add-brick operation followed by a rebalance operation, the Cinder volumes created on top of the Red Hat Storage volumes were rendered unusable. This issue was also observed with VMs managed by Red Hat Enterprise Virtualization Manager. With this fix, the Cinder volumes work as expected in this scenario. | |
| Story Points: | --- | |
| Clone Of: | | | | |
| : | 1013456 (view as bug list) | Environment: | virt rhos cinder rhs integration | |
| Last Closed: | 2014-02-25 07:35:08 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | ||||
| Bug Blocks: | 1013456 | |||
| Attachments: | | | | |
Description
Gowrishankar Rajaiyan
2013-08-21 15:07:10 UTC
[root@rhsqa-virt06 ~]# for i in 10.70.43.66 10.70.43.47 10.70.43.53 10.70.43.67; do ssh $i "(echo $i:; getfattr -d -m . /rhs/bricks/cinder1/volume-64633c79-6748-4d01-b60b-2923a5d1db3e; echo)"; done
10.70.43.66:
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1/volume-64633c79-6748-4d01-b60b-2923a5d1db3e
trusted.gfid=0sebxoy0WtQayHIsGc4lpcew==
trusted.glusterfs.dht.linkto="cinder-vol-replicate-4"

10.70.43.47:
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1/volume-64633c79-6748-4d01-b60b-2923a5d1db3e
trusted.afr.cinder-vol-client-8=0sAAAAAAAAAAAAAAAA
trusted.afr.cinder-vol-client-9=0sAAAAAAAAAAAAAAAA
trusted.gfid=0sebxoy0WtQayHIsGc4lpcew==

10.70.43.53:
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1/volume-64633c79-6748-4d01-b60b-2923a5d1db3e
trusted.afr.cinder-vol-client-8=0sAAAAAAAAAAAAAAAA
trusted.afr.cinder-vol-client-9=0sAAAAAAAAAAAAAAAA
trusted.gfid=0sebxoy0WtQayHIsGc4lpcew==

10.70.43.67:
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1/volume-64633c79-6748-4d01-b60b-2923a5d1db3e
trusted.gfid=0sebxoy0WtQayHIsGc4lpcew==
trusted.glusterfs.dht.linkto="cinder-vol-replicate-4"

[root@rhsqa-virt06 ~]#

Created attachment 789038 [details]
cinder-vol-rebalance logs from all the nodes

$ for i in 10.70.43.66 10.70.43.67 10.70.43.68 10.70.43.69 10.70.43.70 10.70.43.61 10.70.43.56 10.70.43.57 10.70.43.53 10.70.43.47 10.70.43.44 10.70.43.42 10.70.43.40 10.70.43.35 10.70.43.36 10.70.43.37; do scp root@$i:/var/log/glusterfs/cinder-vol-rebalance.log /tmp/cinder-vol-rebalance.log.$i; done
$ tar -cjvf cinder-vol-rebalance.log.tar.bz2 /tmp/cinder-vol-rebalance.log*

[root@rhsqa-virt06 ~]# for i in 10.70.43.66 10.70.43.47 10.70.43.53 10.70.43.57 10.70.43.67 10.70.43.56 10.70.43.61 10.70.43.44 10.70.43.68 10.70.43.69 10.70.43.72 10.70.43.70 10.70.43.42 10.70.43.76 10.70.43.40 10.70.43.35 10.70.43.36 10.70.43.37; do ssh $i "(echo $i; getfattr -m - -d -e hex /rhs/bricks/cinder1; echo)"; done
10.70.43.66
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000001ffffffe
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.47
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007ffffffc9ffffffa
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.53
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007ffffffc9ffffffa
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.57
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000005ffffffd7ffffffb
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.67
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000001ffffffe
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.56
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000005ffffffd7ffffffb
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.61
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000003ffffffe5ffffffc
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.44
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.afr.cinder-vol-client-10=0x000000000000000000000000
trusted.afr.cinder-vol-client-11=0x000000000000000000000002
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000009ffffffbbffffff9
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.68
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000001fffffff3ffffffd
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.69
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000001fffffff3ffffffd
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.72
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000002492492449249247
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.70
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000003ffffffe5ffffffc
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

ssh: connect to host 10.70.43.42 port 22: No route to host

10.70.43.76
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000002492492449249247
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.40
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000bffffffadffffff8
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.35
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000bffffffadffffff8
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.36
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000dffffff9ffffffff
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

10.70.43.37
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000dffffff9ffffffff
trusted.glusterfs.volume-id=0x4d0b2e572eb344c09f9c5e2a1835b81a

[root@rhsqa-virt06 ~]#

Created attachment 789039 [details]
/var/log/glusterfs/bricks/rhs-bricks-cinder1.log logs
# for i in 10.70.43.66 10.70.43.67 10.70.43.68 10.70.43.69 10.70.43.70 10.70.43.61 10.70.43.56 10.70.43.57 10.70.43.53 10.70.43.47 10.70.43.44 10.70.43.42 10.70.43.40 10.70.43.35 10.70.43.36 10.70.43.37; do scp root@$i:/var/log/glusterfs/bricks/rhs-bricks-cinder1.log /tmp/rhs-bricks-cinder1.log.$i; done
# tar -jcvf rhs-bricks-cinder1.logs.tar.bz rhs-bricks-cinder1.log*
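For reference, the trusted.afr.*-client-* values shown in the brick dumps above are AFR changelog counters. Assuming the standard on-disk encoding of three big-endian 32-bit counters for pending data, metadata, and entry operations (my reading of the format, not stated in this bug), a small bash helper (`decode_afr` is a hypothetical name) can decode the hex form:

```shell
# Decode a trusted.afr.* changelog xattr (hex form) into its three
# big-endian 32-bit counters: pending data, metadata, entry operations.
# Assumed format -- verify against your glusterfs version.
decode_afr() {
    hex=${1#0x}
    echo "data=$((16#${hex:0:8})) metadata=$((16#${hex:8:8})) entry=$((16#${hex:16:8}))"
}

# Value seen on brick 10.70.43.44 for cinder-vol-client-11:
decode_afr 0x000000000000000000000002
# -> data=0 metadata=0 entry=2
```

A non-zero counter means that brick holds changes its replica pair has not yet acknowledged, i.e. self-heal is pending in that direction.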
Created attachment 789040 [details]
glustershd logs from all nodes
# for i in 10.70.43.66 10.70.43.67 10.70.43.68 10.70.43.69 10.70.43.70 10.70.43.61 10.70.43.56 10.70.43.57 10.70.43.53 10.70.43.47 10.70.43.44 10.70.43.42 10.70.43.40 10.70.43.35 10.70.43.36 10.70.43.37; do scp root@$i:/var/log/glusterfs/glustershd.log /tmp/glustershd.log.$i; done
# tar -jcvf glustershd.logs.tar.bz glustershd.log*
Note that node 10.70.43.42 was down under mysterious circumstances during this rebalance operation, so you should see a lot of the following messages. I don't think it is relevant, since its replica pair 10.70.43.44 was alive and healthy, but I am mentioning it here to help with diagnosis.
<snip>
[2013-08-21 15:01:24.595182] D [name.c:155:client_fill_address_family] 0-cinder-vol-client-11: address-family not specified, guessing it to be inet from (remote-host: 10.70.43.42)
[2013-08-21 15:01:24.601564] D [common-utils.c:248:gf_resolve_ip6] 0-resolver: returning ip-10.70.43.42 (port-24007) for hostname: 10.70.43.42 and port: 24007
[2013-08-21 15:01:24.601850] D [name.c:155:client_fill_address_family] 0-glance-vol-client-11: address-family not specified, guessing it to be inet from (remote-host: 10.70.43.42)
[2013-08-21 15:01:24.607525] D [common-utils.c:248:gf_resolve_ip6] 0-resolver: returning ip-10.70.43.42 (port-24007) for hostname: 10.70.43.42 and port: 24007
[2013-08-21 15:01:24.699835] D [socket.c:615:__socket_shutdown] 0-cinder-vol-client-11: shutdown() returned -1. Transport endpoint is not connected
[2013-08-21 15:01:24.699902] D [socket.c:492:__socket_rwv] 0-cinder-vol-client-11: EOF on socket
[2013-08-21 15:01:24.699923] D [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
</snip>
Created attachment 789051 [details]
Client log
Client:
[root@rhs-hpc-srv1 glusterfs(keystone_admin)]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdb1 ext4 1.1T 1.5G 1.1T 1% /
tmpfs tmpfs 24G 0 24G 0% /dev/shm
/dev/sdf1 ext4 30G 211M 28G 1% /boot
/dev/sdc1 ext4 1.1T 28G 1013G 3% /var
/srv/loopback-device/device1
ext4 960M 34M 876M 4% /srv/node/device1
10.70.43.44:glance-vol
fuse.glusterfs 300G 29G 272G 10% /var/lib/glance/images
10.70.43.44:cinder-vol
fuse.glusterfs 400G 29G 372G 8% /var/lib/cinder/volumes/1d12e17a168a458a2db39ca37ee302fd
10.70.43.44:cinder-vol
fuse.glusterfs 400G 29G 372G 8% /var/lib/nova/mnt/1d12e17a168a458a2db39ca37ee302fd
[root@rhs-hpc-srv1 glusterfs(keystone_admin)]#
Log file attached is /var/log/glusterfs/var-lib-cinder-volumes-1d12e17a168a458a2db39ca37ee302fd.log
Other than the two errors below, there are no errors reported. Link files still exist, as migration of a few files was skipped due to space constraints. Those should not lead to any failures, since AFR is involved too.

[2013-08-21 01:26:54.073212] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x165) [0x7f056f71cf55] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7f056f71ca93] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f056f71c9ae]))) 1-cinder-vol-client-11: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2013-08-21 01:26:11.904694 (xid=0x443x)
[2013-08-21 01:26:54.083631] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x165) [0x7f056f71cf55] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7f056f71ca93] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f056f71c9ae]))) 1-cinder-vol-client-11: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2013-08-21 01:26:11.904704 (xid=0x444x)

I shall re-test this issue, but I am not clear which log level you are asking me to set: DEBUG or TRACE. I believe TRACE (since I am trying to reproduce it); please correct me if otherwise.

Please re-run the tests with the log level set to DEBUG. With TRACE, there is a possibility of the issue being masked by excessive logging. (If the issue is hit again and DEBUG does not help, we may need to re-run it with TRACE.)

I could reproduce this. Add-brick followed by rebalance plus mkfs.ext4 (in this case) on the attached device causes this. Attaching logs and other info shortly.

Fixing the summary, since per dev:
- ----T files will still be created if moving a file would lead to volume imbalance wrt space consumption
- even if stale ----T files are left on the back-end, dht can and will clean them up later during readdirp/access operations (which hit the dht xlator)
- ----T files are not an issue here:
if they are missing or stale, dht can clean them up as needed.

[root@rhsqa-virt06 ~]# for i in 10.70.43.178 10.70.43.73 10.70.42.250 10.70.42.185 10.70.42.172 10.70.42.178 10.70.42.255 10.70.42.157 10.70.42.164 10.70.43.95 10.70.42.169 10.70.42.159 10.70.42.156 10.70.43.92 10.70.42.163 10.70.42.180 10.70.42.154 10.70.42.161 10.70.42.173 10.70.42.170; do ssh $i "(echo $i; getfattr -m - -d -e hex /rhs/bricks/cinder-volume; echo)"; done
10.70.43.178
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000e38e38e0ffffffff
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.43.73
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000e38e38e0ffffffff
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.250
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000005555555471c71c6f
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.185
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000005555555471c71c6f
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.172
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000071c71c708e38e38b
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.178
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000071c71c708e38e38b
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.255
getfattr: /rhs/bricks/cinder-volume: No such file or directory

10.70.42.157
getfattr: /rhs/bricks/cinder-volume: No such file or directory

10.70.42.164
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000038e38e3855555553
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.43.95
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000038e38e3855555553
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.169
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000001c71c71c38e38e37
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.159
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000001c71c71c38e38e37
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.156
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000001c71c71b
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.43.92
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000001c71c71b
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.163
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000008e38e38caaaaaaa7
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.180
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000008e38e38caaaaaaa7
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.154
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaa8c71c71c3
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.161
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaa8c71c71c3
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.173
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000c71c71c4e38e38df
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

10.70.42.170
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/cinder-volume
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000c71c71c4e38e38df
trusted.glusterfs.volume-id=0xced7fae7c90c4df6953f7148ee2ada76

[root@rhsqa-virt06 ~]#

[root@localhost ~]# gluster vol info cinder-volume
Volume Name: cinder-volume
Type: Distributed-Replicate
Volume ID: ced7fae7-c90c-4df6-953f-7148ee2ada76
Status: Started
Number of Bricks: 9 x 2 = 18
Transport-type: tcp
Bricks:
Brick1: 10.70.43.178:/rhs/bricks/cinder-volume
Brick2: 10.70.43.73:/rhs/bricks/cinder-volume
Brick3: 10.70.42.250:/rhs/bricks/cinder-volume
Brick4: 10.70.42.185:/rhs/bricks/cinder-volume
Brick5: 10.70.42.172:/rhs/bricks/cinder-volume
Brick6: 10.70.42.178:/rhs/bricks/cinder-volume
Brick7: 10.70.42.164:/rhs/bricks/cinder-volume
Brick8: 10.70.43.95:/rhs/bricks/cinder-volume
Brick9: 10.70.42.156:/rhs/bricks/cinder-volume
Brick10: 10.70.43.92:/rhs/bricks/cinder-volume
Brick11: 10.70.42.169:/rhs/bricks/cinder-volume
Brick12: 10.70.42.159:/rhs/bricks/cinder-volume
Brick13: 10.70.42.163:/rhs/bricks/cinder-volume
Brick14: 10.70.42.180:/rhs/bricks/cinder-volume
Brick15: 10.70.42.154:/rhs/bricks/cinder-volume
Brick16: 10.70.42.161:/rhs/bricks/cinder-volume
Brick17: 10.70.42.173:/rhs/bricks/cinder-volume
Brick18: 10.70.42.170:/rhs/bricks/cinder-volume
Options Reconfigured:
diagnostics.brick-log-level: DEBUG
diagnostics.client-log-level: DEBUG
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-uid: 165
storage.owner-gid: 165
[root@localhost ~]#
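The trusted.glusterfs.dht values dumped above encode each brick's DHT hash range. Assuming the usual encoding of four big-endian 32-bit words -- count, type, range start, range stop (an assumption; the format is not documented in this bug), a bash sketch to decode one (`decode_dht` is a hypothetical helper):

```shell
# Decode a trusted.glusterfs.dht layout xattr (hex form) into its four
# big-endian 32-bit words: subvol count, type, hash-range start/stop.
# Assumed on-disk format -- verify against your glusterfs version.
decode_dht() {
    hex=${1#0x}
    echo "count=$((16#${hex:0:8})) type=$((16#${hex:8:8})) start=0x${hex:16:8} stop=0x${hex:24:8}"
}

# Layout published on bricks 10.70.43.178 / 10.70.43.73 above:
decode_dht 0x0000000100000000e38e38e0ffffffff
# -> count=1 type=0 start=0xe38e38e0 stop=0xffffffff
```

On a healthy distribute volume, the per-subvolume ranges tile 0x00000000..0xffffffff with no gaps or overlaps; a broken tiling is exactly what the "found anomalies in /. holes=1 overlaps=1" message later in this bug reports.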
[root@localhost ~]# gluster volume rebalance cinder-volume status
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 4 4.0GB 78 0 3 completed 47.00
10.70.43.73 0 0Bytes 74 0 0 completed 2.00
10.70.42.250 10 10.0GB 88 0 0 completed 101.00
10.70.42.185 0 0Bytes 74 0 0 completed 2.00
10.70.42.172 8 8.0GB 93 0 0 completed 83.00
10.70.42.178 0 0Bytes 74 0 0 completed 2.00
10.70.42.255 0 0Bytes 74 0 0 completed 2.00
10.70.42.157 0 0Bytes 74 0 0 completed 2.00
10.70.42.164 4 4.0GB 88 0 0 completed 243.00
10.70.43.95 0 0Bytes 74 0 0 completed 2.00
10.70.42.169 8 8.0GB 94 0 0 completed 84.00
10.70.42.159 0 0Bytes 74 0 0 completed 2.00
10.70.42.156 0 0Bytes 74 0 2 completed 2.00
10.70.43.92 0 0Bytes 74 0 0 completed 2.00
10.70.42.163 0 0Bytes 74 0 8 completed 2.00
10.70.42.180 0 0Bytes 74 0 0 completed 2.00
10.70.42.154 0 0Bytes 74 0 0 completed 2.00
10.70.42.161 0 0Bytes 74 0 0 completed 2.00
10.70.42.173 0 0Bytes 74 0 0 completed 2.00
10.70.42.170 0 0Bytes 74 0 0 completed 2.00
volume rebalance: cinder-volume: success:
[root@localhost ~]#
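DHT link files, discussed above, are zero-byte files with only the sticky bit set (mode ---------T) and a trusted.glusterfs.dht.linkto xattr. A minimal sketch for listing leftover link files on a brick after rebalance (`list_linkto` is a hypothetical helper; run it on each brick as root, with the brick path from this setup as an example):

```shell
# List candidate DHT link files under a brick: zero-byte regular files
# with the sticky bit set, skipping the brick's .glusterfs directory.
list_linkto() {
    brick="$1"
    find "$brick" -path "$brick/.glusterfs" -prune -o \
        -type f -perm -1000 -size 0 -print
}

# Example, using the brick path from this bug's setup:
# list_linkto /rhs/bricks/cinder-volume
```

For each file it prints, `getfattr -n trusted.glusterfs.dht.linkto <file>` shows which replicate subvolume the link file points to, as in the volume-64633c79 dump at the top of this bug.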
Created attachment 790895 [details]
var-lib-cinder-volumes-01cbb41254aca5f79b543409c40afa79.log
Client info and logs attached.
[root@rhs-hpc-srv1 glusterfs(keystone_admin)]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 30G 10G 18G 36% /
tmpfs tmpfs 24G 0 24G 0% /dev/shm
/dev/sdf ext4 1.1T 318M 1.1T 1% /var/log
10.70.43.178:/glance-volume
fuse.glusterfs 300G 22G 279G 8% /var/lib/glance
/srv/loopback-device/device1
ext4 960M 34M 876M 4% /srv/node/device1
10.70.43.178:/cinder-volume
fuse.glusterfs 450G 22G 429G 5% /var/lib/cinder/volumes/01cbb41254aca5f79b543409c40afa79
10.70.43.178:/cinder-volume
fuse.glusterfs 450G 22G 429G 5% /var/lib/nova/mnt/01cbb41254aca5f79b543409c40afa79
[root@rhs-hpc-srv1 glusterfs(keystone_admin)]#
Created attachment 790896 [details]
Screenshot of the instance
On the latest graph of the client, we see these network disconnection log messages:

[2013-08-27 04:23:18.693638] I [afr-common.c:3953:afr_local_init] 2-cinder-volume-replicate-7: no subvolumes up
[2013-08-27 04:23:18.693694] D [dht-common.c:2885:dht_fd_cbk] 2-cinder-volume-dht: subvolume cinder-volume-replicate-7 returned -1 (Transport endpoint is not connected)
[2013-08-27 04:23:18.693710] I [afr-common.c:3953:afr_local_init] 2-cinder-volume-replicate-8: no subvolumes up
[2013-08-27 04:23:18.693721] D [dht-common.c:2885:dht_fd_cbk] 2-cinder-volume-dht: subvolume cinder-volume-replicate-8 returned -1 (Transport endpoint is not connected)
....
[2013-08-27 04:23:18.698527] I [afr-common.c:3953:afr_local_init] 2-cinder-volume-replicate-7: no subvolumes up
[2013-08-27 04:23:18.698565] I [afr-common.c:3953:afr_local_init] 2-cinder-volume-replicate-8: no subvolumes up
[2013-08-27 04:23:21.693287] I [afr-common.c:3953:afr_local_init] 2-cinder-volume-replicate-7: no subvolumes up
[2013-08-27 04:23:21.693342] I [afr-common.c:3953:afr_local_init] 2-cinder-volume-replicate-8: no subvolumes up
[2013-08-27 04:23:23.573663] I [afr-common.c:3953:afr_local_init] 2-cinder-volume-replicate-7: no subvolumes up
[2013-08-27 04:23:23.573720] I [afr-common.c:3953:afr_local_init] 2-cinder-volume-replicate-8: no subvolumes up

More disconnection logs:

[2013-08-27 04:23:18.442073] I [dht-layout.c:633:dht_layout_normalize] 2-cinder-volume-dht: found anomalies in /. holes=1 overlaps=1 missing=0 down=2 misc=0
[2013-08-27 04:23:18.443150] D [dht-layout.c:649:dht_layout_normalize] (-->/usr/lib64/glusterfs/3.4.0.22rhs/xlator/protocol/client.so(client3_3_lookup_cbk+0x626) [0x7f5c2aae71f6] (-->/usr/lib64/glusterfs/3.4.0.22rhs/xlator/cluster/replicate.so(afr_lookup_cbk+0x520) [0x7f5c2a8abe80] (-->/usr/lib64/glusterfs/3.4.0.22rhs/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x484) [0x7f5c2a633b64]))) 2-cinder-volume-dht: path=/ err=Transport endpoint is not connected on subvol=cinder-volume-replicate-8
[2013-08-27 04:23:18.443181] D [dht-layout.c:649:dht_layout_normalize] (-->/usr/lib64/glusterfs/3.4.0.22rhs/xlator/protocol/client.so(client3_3_lookup_cbk+0x626) [0x7f5c2aae71f6] (-->/usr/lib64/glusterfs/3.4.0.22rhs/xlator/cluster/replicate.so(afr_lookup_cbk+0x520) [0x7f5c2a8abe80] (-->/usr/lib64/glusterfs/3.4.0.22rhs/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x484) [0x7f5c2a633b64]))) 2-cinder-volume-dht: path=/ err=Transport endpoint is not connected on subvol=cinder-volume-replicate-7
[2013-08-27 04:23:18.443187] D [dht-common.c:499:dht_lookup_dir_cbk] 2-cinder-volume-dht: fixing assignment on /
[2013-08-27 04:23:18.443204] W [dht-selfheal.c:909:dht_selfheal_directory] 2-cinder-volume-dht: 2 subvolumes down -- not fixing

The subvolumes seem to have come back up a few seconds later:

[2013-08-27 04:23:33.237097] I [afr-common.c:3795:afr_notify] 2-cinder-volume-replicate-8: Subvolume 'cinder-volume-client-17' came back up; going online.
[2013-08-27 04:23:37.315468] I [afr-common.c:3795:afr_notify] 2-cinder-volume-replicate-7: Subvolume 'cinder-volume-client-14' came back up; going online.

It looks like the graph switch happened on the 'CHILD_CONNECTING' event, and the CHILD_UP came a little late. Due to this, we are done with the graph switch, but the new graph is not fully active.

Created attachment 791431 [details]
sosreport-1-to-9-rhs-nodes.tar.bz
# tar -jcvf sosreport-1-to-9-rhs-nodes.tar.bz sosreport-10.70.42.154-20130828203046-eac2.tar.xz sosreport-10.70.42.156-20130828202709-4c1b.tar.xz sosreport-10.70.42.159-20130828202853-9a92.tar.xz sosreport-10.70.42.161-20130828203137-d166.tar.xz sosreport-10.70.42.163-20130828202932-d450.tar.xz sosreport-10.70.42.164-20130828200908-6e11.tar.xz sosreport-10.70.42.169-20130828202816-adac.tar.xz sosreport-10.70.42.170-20130828203245-f0b1.tar.xz sosreport-10.70.42.172-20130828200505-c1d1.tar.xz
Created attachment 791432 [details]
sosreport-10-to-18-rhs-nodes.tar.bz
# tar -jcvf sosreport-10-to-18-rhs-nodes.tar.bz sosreport-10.70.42.173-20130828203211-12a7.tar.xz sosreport-10.70.42.178-20130828200552-8ff6.tar.xz sosreport-10.70.42.180-20130828203008-670e.tar.xz sosreport-10.70.42.185-20130828200344-0b69.tar.xz sosreport-10.70.42.250-20130828200235-ec2a.tar.xz sosreport-10.70.43.178-20130828200124-b8a2.tar.xz sosreport-10.70.43.73-20130828200201-2c23.tar.xz sosreport-10.70.43.92-20130828202742-cbb3.tar.xz sosreport-10.70.43.95-20130828201036-de7b.tar.xz
Looks like we now know the issue with rebalance and self-heal. There is a patch in the works, but as proposed by development, the fix is not 'non-intrusive'; hence we would like a little more soaking time. I proposed it in the Big Bend readout meeting and got approval to push it to Update 1. The patch will be posted, reviewed, and tested upstream in the meantime.

Are we documenting this as a known issue for Big Bend?

<snip> Internal Whiteboard: Big Bend → Big Bend U1 Flags: rhs-2.1.0+ → rhs-2.1.z? </snip>

Removing the blocker flag as it seems to be targeted for Update 1 now.

Shishir, this bug has been identified as a known issue for the Big Bend release. Please provide CCFR information in the Doc Text field.

Shishir, I have edited the Doc Text field. Kindly review and sign off.

A possible fix for this issue is posted upstream at http://review.gluster.org/#/c/5891/

The upstream patch was accepted and is now posted downstream at:
https://code.engineering.redhat.com/gerrit/#/c/16038/
https://code.engineering.redhat.com/gerrit/#/c/16039/

Per integration bug triage 12/13, request the blocker? flag for Corbett if this failed QA.

Tested and verified on glusterfs-3.4.0.44rhs-1.el6rhs.x86_64. Files migrate successfully without leaving T-files behind.

Pranith, can you please review the doc text for technical accuracy?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html