Created attachment 918157 [details]
Description of problem: Cinder volume creation works only intermittently: some create requests succeed while others fail. Cinder's backend is a GlusterFS distributed volume.
Version-Release number of selected component (if applicable):

How reproducible:
Intermittent; some volumes are created while others are not.
Steps to Reproduce:
1. Configure Cinder to use Gluster volume via packstack
2. Run "cinder create 1" (also tried with --display-name, and with --image-id xxxx)
3. Some volumes are created (status "available") while others fail with status "error".
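The reproduce step can be sketched as a loop; the volume names and count are illustrative, and the sketch assumes the OpenStack credentials file (e.g. keystonerc_admin) is already sourced:

```shell
# Create several 1 GB volumes and then report their final status.
for i in 1 2 3 4 5; do
    cinder create 1 --display-name "tshefi-test-$i"
done
sleep 30   # give the scheduler and the glusterfs driver time to settle
# List only the test volumes with their status (available vs. error)
cinder list | grep tshefi-test-
```

With an intermittent failure like this, creating a batch in one run makes the available/error mix visible in a single listing.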
The Gluster volume is a distributed volume; its options are listed below.
Both Gluster servers reply to ping and showmount reports the export as OK.
Restarting the Gluster volume share didn't help.
Restarting the AIO (all-in-one) setup didn't help either.
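The basic connectivity checks above can be sketched as follows; the server names are illustrative placeholders:

```shell
# Verify both Gluster servers answer ping and export the volume.
for srv in gluster-server1 gluster-server2; do
    ping -c 3 "$srv"
    showmount -e "$srv"    # lists exports; confirms the server is responding
done
# Confirm the GlusterFS share is actually mounted on the Cinder node
mount | grep glusterfs
```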
Actual results:
Some volumes are created while others fail; volume log attached.

Expected results:
Volumes should be created successfully every time.
Possibly a Gluster issue rather than RHOS; errors from the Gluster mnt log:
[2014-07-16 05:40:34.598553] E [socket.c:2158:socket_connect_finish] 0-gluster-glance-tshefi-client-0: connection to 10.35.102.17:49493 failed (No route to host)
[2014-07-16 05:40:38.598859] I [rpc-clnt.c:1690:rpc_clnt_reconfig] 0-gluster-glance-tshefi-client-0: changing port to 49493 (from 0)
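The "No route to host" error points at a brick port, and reachability can be probed directly; the IP and port below are taken from the log lines above:

```shell
# Find which ports the bricks of this volume listen on
gluster volume status gluster-glance-tshefi
# Probe the brick port the client log complains about (10.35.102.17:49493).
# "No route to host" is often an ICMP host-unreachable generated by an
# iptables REJECT rule rather than a genuine routing problem.
nc -zv -w 5 10.35.102.17 49493
```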
Note that the errors are for Glance's volume, which oddly enough works fine; it did, however, give me a hunch.
As a next troubleshooting step, I created a new Gluster distributed volume with a single brick, on the server that doesn't exhibit these "no route to host" problems. Cinder operations now work without a problem.
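Creating such a single-brick volume can be sketched as below; the volume name, server name, and brick path are illustrative, not taken from the report:

```shell
# Create a one-brick volume on the healthy server (tiger) only.
gluster volume create gluster5-cinder-tshefi tiger:/share/gluster5-cinder-tshefi
gluster volume start gluster5-cinder-tshefi
# Point the Cinder glusterfs driver at the new share and restart the service
echo "tiger:/gluster5-cinder-tshefi" > /etc/cinder/shares.conf
service openstack-cinder-volume restart
```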
This is not a blocker, as I can continue Cinder sanity testing on the new volume.
Re-configuring the original volume reproduces the problem; is this something we want to check?
This indicates something is not quite right:
WARNING cinder.volume.drivers.glusterfs [-] Exception during mounting GlusterFS share at /var/lib/cinder/mnt/7395c6c5745705b5c77187ca90f13207 is not writable by the Cinder volume service. Snapshot operations will not be supported.
What are the permissions on the mount?
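A quick way to answer that on the Cinder node; the mount path is copied from the warning above (the hashed directory name is derived by Cinder from the share string):

```shell
MNT=/var/lib/cinder/mnt/7395c6c5745705b5c77187ca90f13207
# Inspect ownership and permissions of the mount point
ls -ld "$MNT"
# Check whether the cinder service user can actually write there
sudo -u cinder test -w "$MNT" && echo writable || echo "not writable"
```

If this reports "not writable" while the brick directories look correct, a server-side option such as root-squash or a network fault between client and brick is a likely cause.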
Appears to be caused by network connectivity issues between the Cinder volume server and the GlusterFS server.
Yes, I agree with that assumption, but I'm uneasy about letting it pass just yet.
I'll try testing another single-brick volume, this time on the suspected server Orion, since on the Tiger server this passed.
The volume options:
server.root-squash on (tested with off also)
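For reference, toggling and verifying that option looks roughly like this (volume name illustrative):

```shell
# root-squash maps root on clients to an anonymous uid on the bricks,
# which can make the mount unwritable for services running as root.
gluster volume set gluster-cinder-tshefi server.root-squash off
gluster volume info gluster-cinder-tshefi | grep root-squash   # verify
```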
[root@tiger share]# ll | grep cinder-tshefi
drwxrwxr-x 3 165 165 4096 Jul 17 10:38 gluster2-cinder-tshefi
drwxrwxr-x 3 165 165 4096 Jul 17 10:17 gluster3-cinder-tshefi
drwxrwxr-x 3 165 165 23 Jul 17 10:35 gluster4-cinder-tshefi
drwxrwxr-x 3 165 165 4096 Jul 17 10:07 gluster-cinder-tshefi
Flushing the firewall rules (iptables -F) on the offending Gluster server fixed the problem.
Cinder operations now work; continuing the test run.
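Rather than flushing all rules, the Gluster ports can be opened explicitly. A sketch, assuming the default GlusterFS port layout for this release (brick ports start at 49152, so the 49493 seen in the log falls in this range; adjust to whatever "gluster volume status" reports):

```shell
# Allow glusterd management and brick traffic instead of dropping the firewall.
iptables -I INPUT -p tcp --dport 24007:24008 -j ACCEPT   # glusterd / management
iptables -I INPUT -p tcp --dport 49152:49664 -j ACCEPT   # brick ports
service iptables save   # persist the rules across reboots
```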