Bug 1183764 - [SNAPSHOT]: fails to create on thin LV
Summary: [SNAPSHOT]: fails to create on thin LV
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: snapshot
Version: 3.6.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Avra Sengupta
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-01-19 17:48 UTC by pille
Modified: 2016-08-23 12:43 UTC (History)
CC List: 6 users

Fixed In Version:
Clone Of: 1178031
Environment:
Last Closed: 2016-08-23 12:43:33 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments

Description pille 2015-01-19 17:48:27 UTC
Description of problem:

Creating a snapshot of a distributed volume on thin-provisioned LVs fails.


Steps to Reproduce:
1. create bricks on thin-LV
2. create distributed volume
3. gluster snapshot create <snapname> <volname>
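The steps above might look like the following in practice. This is a hypothetical sketch: host names, mount points, and sizes are illustrative; only the VG and thin-pool names (vg_crypt, brickthinpool, data) follow the lvs output in this report.

```shell
# 1. create a brick on a thin-provisioned LV (repeat on each node);
#    pool/VG names follow this report, everything else is illustrative
lvcreate --size 45T --thinpool brickthinpool vg_crypt
lvcreate --thin vg_crypt/brickthinpool --virtualsize 42T --name data
mkfs.xfs /dev/vg_crypt/data
mkdir -p /bricks/brick1
mount /dev/vg_crypt/data /bricks/brick1

# 2. create and start a distributed (no replica) volume across the nodes
gluster volume create storage_www \
    node1:/bricks/brick1 node2:/bricks/brick1
gluster volume start storage_www

# 3. attempt the snapshot -- this is the step that fails
gluster snapshot create storage_snap storage_www
```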

Actual results:
% time sudo gluster snapshot create storage_snap storage_www                                                                                                                                                                                                                                      
snapshot create: failed: Commit failed on localhost. Please check log file for details.
Snapshot command failed
sudo gluster snapshot create storage_snap storage_www  0.03s user 0.01s system 0% cpu 7.049 total

Each attempt leaves a stale LV behind on the node where the above command was issued:
% sudo lvs                                                  
  LV                                 VG       Attr      LSize  Pool          Origin Data%  Move Log Copy%  Convert
  acc0f3d72e2940d7ba28c9485a16c447_0 vg_crypt Vwi-a-tz- 42.00t brickthinpool data     0.01                        
  b66cf2b9867b49308b6352ddbe6894d0_0 vg_crypt Vwi-a-tz- 42.00t brickthinpool data     0.01                        
  brickthinpool                      vg_crypt twi-a-tz- 45.00t                        0.02                        
  data                               vg_crypt Vwi-aotz- 42.00t brickthinpool          0.01                        
  e8f75cf4fac74027b59545b72646e1dd_0 vg_crypt Vwi-a-tz- 42.00t brickthinpool data     0.01                        

Expected results:

The snapshot is created.


Additional info:

The volume contains a single file and both brick hosts are idle.
Running the command on another brick host yields similar results.

log:

[2015-01-19 17:38:50.439886] D [cli.c:612:cli_rpc_init] 0-cli: Connecting to glusterd using default socket
[2015-01-19 17:38:50.439921] D [rpc-clnt.c:972:rpc_clnt_connection_init] 0-glusterfs: defaulting frame-timeout to 30mins
[2015-01-19 17:38:50.439929] D [rpc-clnt.c:986:rpc_clnt_connection_init] 0-glusterfs: disable ping-timeout
[2015-01-19 17:38:50.439938] D [rpc-transport.c:262:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib/x86_64-linux-gnu/glusterfs/3.6.1/rpc-transport/socket.so
[2015-01-19 17:38:50.440259] D [socket.c:3684:socket_init] 0-glusterfs: disabling nodelay
[2015-01-19 17:38:50.440270] D [socket.c:3799:socket_init] 0-glusterfs: SSL support on the I/O path is NOT enabled
[2015-01-19 17:38:50.440275] D [socket.c:3802:socket_init] 0-glusterfs: SSL support for glusterd is NOT enabled
[2015-01-19 17:38:50.440280] D [socket.c:3819:socket_init] 0-glusterfs: using system polling thread
[2015-01-19 17:38:50.440288] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
[2015-01-19 17:38:50.440294] T [socket.c:2871:socket_connect] 0-glusterfs: connecting 0x7f5aaff0f5a0, state=0 gen=0 sock=-1
[2015-01-19 17:38:50.440301] T [name.c:290:af_unix_client_get_remote_sockaddr] 0-glusterfs: using connect-path /var/run/glusterd.socket
[2015-01-19 17:38:50.440310] T [name.c:106:af_unix_client_bind] 0-glusterfs: bind-path not specified for unix socket, letting connect to assign default value
[2015-01-19 17:38:50.440359] D [rpc-clnt.c:972:rpc_clnt_connection_init] 0-glusterfs: defaulting frame-timeout to 30mins
[2015-01-19 17:38:50.440367] D [rpc-clnt.c:986:rpc_clnt_connection_init] 0-glusterfs: disable ping-timeout
[2015-01-19 17:38:50.440375] D [rpc-transport.c:262:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib/x86_64-linux-gnu/glusterfs/3.6.1/rpc-transport/socket.so
[2015-01-19 17:38:50.440397] D [socket.c:3799:socket_init] 0-glusterfs: SSL support on the I/O path is NOT enabled
[2015-01-19 17:38:50.440402] D [socket.c:3802:socket_init] 0-glusterfs: SSL support for glusterd is NOT enabled
[2015-01-19 17:38:50.440407] D [socket.c:3819:socket_init] 0-glusterfs: using system polling thread
[2015-01-19 17:38:50.440427] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
[2015-01-19 17:38:50.440432] T [socket.c:2871:socket_connect] 0-glusterfs: connecting 0x7f5aaff149f0, state=0 gen=0 sock=-1
[2015-01-19 17:38:50.440438] T [name.c:290:af_unix_client_get_remote_sockaddr] 0-glusterfs: using connect-path /tmp/quotad.socket
[2015-01-19 17:38:50.440446] T [name.c:106:af_unix_client_bind] 0-glusterfs: bind-path not specified for unix socket, letting connect to assign default value
[2015-01-19 17:38:50.440463] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[14 further identical cli_cmd_register "Returning 0" lines omitted]
[2015-01-19 17:38:50.474027] D [cli-cmd-volume.c:1799:cli_check_gsync_present] 0-cli: Returning 0
[39 further identical cli_cmd_register "Returning 0" lines omitted]
[2015-01-19 17:38:50.474382] T [cli.c:264:cli_rpc_notify] 0-glusterfs: got RPC_CLNT_CONNECT
[2015-01-19 17:38:50.474394] T [cli-quotad-client.c:94:cli_quotad_notify] 0-glusterfs: got RPC_CLNT_CONNECT
[2015-01-19 17:38:50.474400] I [socket.c:2344:socket_event_handler] 0-transport: disconnecting now
[2015-01-19 17:38:50.474414] T [cli-quotad-client.c:100:cli_quotad_notify] 0-glusterfs: got RPC_CLNT_DISCONNECT
[2015-01-19 17:38:50.474459] T [rpc-clnt.c:1381:rpc_clnt_record] 0-glusterfs: Auth Info: pid: 0, uid: 0, gid: 0, owner: 
[2015-01-19 17:38:50.474479] T [rpc-clnt.c:1238:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 264, payload: 200, rpc hdr: 64
[2015-01-19 17:38:50.474704] T [socket.c:2863:socket_connect] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f5aae800c96] (--> /usr/lib/x86_64-linux-gnu/glusterfs/3.6.1/rpc-transport/socket.so(+0x5d79)[0x7f5aab904d79] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_submit+0x3a4)[0x7f5aae17cde4] (--> gluster(cli_submit_request+0x149)[0x7f5aaecb1009] (--> gluster(cli_cmd_submit+0x7f)[0x7f5aaecb2bdf] ))))) 0-glusterfs: connect () called on transport already connected
[2015-01-19 17:38:50.474732] T [rpc-clnt.c:1573:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x1 Program: Gluster CLI, ProgVers: 2, Proc: 39) to rpc-transport (glusterfs)
[2015-01-19 17:38:50.474743] D [rpc-clnt-ping.c:231:rpc_clnt_start_ping] 0-glusterfs: ping timeout is 0, returning
[2015-01-19 17:38:53.440595] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
[2015-01-19 17:38:53.440840] T [socket.c:2863:socket_connect] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f5aae800c96] (--> /usr/lib/x86_64-linux-gnu/glusterfs/3.6.1/rpc-transport/socket.so(+0x5d79)[0x7f5aab904d79] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[0x7f5aae17b339] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_proc+0xd0)[0x7f5aae81ac90] (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f5aadbf3182] ))))) 0-glusterfs: connect () called on transport already connected
[2015-01-19 17:38:53.440855] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
[2015-01-19 17:38:53.440861] T [socket.c:2871:socket_connect] 0-glusterfs: connecting 0x7f5aaff149f0, state=0 gen=0 sock=-1
[2015-01-19 17:38:53.440871] T [name.c:290:af_unix_client_get_remote_sockaddr] 0-glusterfs: using connect-path /tmp/quotad.socket
[2015-01-19 17:38:53.440884] T [name.c:106:af_unix_client_bind] 0-glusterfs: bind-path not specified for unix socket, letting connect to assign default value
[2015-01-19 17:38:53.440941] T [cli-quotad-client.c:94:cli_quotad_notify] 0-glusterfs: got RPC_CLNT_CONNECT
[2015-01-19 17:38:53.440959] I [socket.c:2344:socket_event_handler] 0-transport: disconnecting now
[2015-01-19 17:38:53.440978] T [cli-quotad-client.c:100:cli_quotad_notify] 0-glusterfs: got RPC_CLNT_DISCONNECT
[2015-01-19 17:38:56.441116] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
[2015-01-19 17:38:56.441361] T [socket.c:2863:socket_connect] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f5aae800c96] (--> /usr/lib/x86_64-linux-gnu/glusterfs/3.6.1/rpc-transport/socket.so(+0x5d79)[0x7f5aab904d79] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[0x7f5aae17b339] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_proc+0xd0)[0x7f5aae81ac90] (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f5aadbf3182] ))))) 0-glusterfs: connect () called on transport already connected
[2015-01-19 17:38:56.441377] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
[2015-01-19 17:38:56.441383] T [socket.c:2871:socket_connect] 0-glusterfs: connecting 0x7f5aaff149f0, state=0 gen=0 sock=-1
[2015-01-19 17:38:56.441392] T [name.c:290:af_unix_client_get_remote_sockaddr] 0-glusterfs: using connect-path /tmp/quotad.socket
[2015-01-19 17:38:56.441403] T [name.c:106:af_unix_client_bind] 0-glusterfs: bind-path not specified for unix socket, letting connect to assign default value
[2015-01-19 17:38:56.441457] T [cli-quotad-client.c:94:cli_quotad_notify] 0-glusterfs: got RPC_CLNT_CONNECT
[2015-01-19 17:38:56.441475] I [socket.c:2344:socket_event_handler] 0-transport: disconnecting now
[2015-01-19 17:38:56.441493] T [cli-quotad-client.c:100:cli_quotad_notify] 0-glusterfs: got RPC_CLNT_DISCONNECT
[2015-01-19 17:38:57.482590] T [rpc-clnt.c:660:rpc_clnt_reply_init] 0-glusterfs: received rpc message (RPC XID: 0x1 Program: Gluster CLI, ProgVers: 2, Proc: 39) from rpc-transport (glusterfs)
[2015-01-19 17:38:57.482696] D [cli-cmd.c:384:cli_cmd_submit] 0-cli: Returning -1
[2015-01-19 17:38:57.482714] E [cli-rpc-ops.c:9189:gf_cli_snapshot] 0-cli: cli_to_glusterd for snapshot failed
[2015-01-19 17:38:57.482721] D [cli-rpc-ops.c:9234:gf_cli_snapshot] 0-cli: Returning -1
[2015-01-19 17:38:57.482744] D [cli-cmd-snapshot.c:75:cli_cmd_snapshot_cbk] 0-cli: frame->local is not NULL (0x7f5a9c000b80)
[2015-01-19 17:38:57.482755] I [input.c:36:cli_batch] 0-: Exiting with: -1

Comment 1 Vijay Bellur 2015-01-19 17:51:24 UTC
Can you also provide glusterd logs from all nodes?

Comment 3 Vijay Bellur 2015-01-19 18:14:18 UTC
Comment from pille <pille+redhat+bugzilla>:

That's a good hint.
Reading this, I should add that /run is a tmpfs without xattr support.

node 1) [local]
[2015-01-19 17:38:57.269064] E [glusterd-snapshot.c:3997:glusterd_snap_brick_create] 0-management: Failed to set extended attribute trusted.glusterfs.volume-id on /run/gluster/snaps/b66cf2b9867b49308b6352ddbe6894d0/brick2. Reason: No data available, snap: b66cf2b9867b49308b6352ddbe6894d0
[2015-01-19 17:38:57.269093] W [glusterd-snapshot.c:4004:glusterd_snap_brick_create] 0-management: unmounting the snap brick mount /run/gluster/snaps/b66cf2b9867b49308b6352ddbe6894d0/brick2
[2015-01-19 17:38:57.287664] E [glusterd-snapshot.c:4309:glusterd_take_brick_snapshot] 0-management: not able to create the brick for the snap storage_snap, volume b66cf2b9867b49308b6352ddbe6894d0
[2015-01-19 17:38:57.287686] E [glusterd-snapshot.c:5366:glusterd_take_brick_snapshot_task] 0-management: Failed to take backend snapshot for brick 2a.gluster.xxxxx.com:/run/gluster/snaps/b66cf2b9867b49308b6352ddbe6894d0/brick2 volume(b66cf2b9867b49308b6352ddbe6894d0)
[2015-01-19 17:38:57.287719] E [glusterd-snapshot.c:5503:glusterd_schedule_brick_snapshot] 0-management: Failed to create snapshot
[2015-01-19 17:38:57.287739] E [glusterd-snapshot.c:5621:glusterd_snapshot_create_commit] 0-management: Failed to take backend snapshot storage_snap
[2015-01-19 17:38:57.287790] W [glusterd-snapshot.c:2514:glusterd_snap_volume_remove] 0-management: Failed to remove lvm snapshot volume b66cf2b9867b49308b6352ddbe6894d0
[2015-01-19 17:38:57.288194] E [glusterd-snapshot.c:6792:glusterd_snapshot] 0-management: Failed to create snapshot
[2015-01-19 17:38:57.288204] W [glusterd-mgmt.c:223:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2015-01-19 17:38:57.288210] E [glusterd-mgmt.c:1139:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Snapshot on local node
[2015-01-19 17:38:57.288216] E [glusterd-mgmt.c:1906:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
[2015-01-19 17:38:57.481014] I [glusterd-snapshot.c:5278:glusterd_do_snap_cleanup] 0-management: Snapshot (storage_snap) does not exist

node 2)
[2015-01-19 17:38:57.486189] I [glusterd-snapshot.c:5278:glusterd_do_snap_cleanup] 0-management: Snapshot (storage_snap) does not exist
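The setxattr failure in the node 1 log can be probed independently of Gluster: trusted.* xattrs require root privileges and a filesystem that supports them (a tmpfs built without CONFIG_TMPFS_XATTR rejects them). A hypothetical check, assuming root and the setfattr utility from the attr package:

```shell
# Probe whether the snapshot brick mount point accepts trusted.* xattrs
# the way glusterd needs (run as root; /run/gluster is the path from
# the log above, the attribute name here is just a probe).
probe=/run/gluster/.xattr_probe
touch "$probe"
if setfattr -n trusted.glusterfs.probe -v test "$probe" 2>/dev/null; then
    echo "trusted xattrs supported on /run/gluster"
else
    echo "trusted xattrs NOT supported -- snapshot brick creation will fail"
fi
rm -f "$probe"
```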

Comment 4 pille 2015-01-19 19:17:13 UTC
By the way: I have since created /run/gluster as an XFS mount on all nodes.
The issue, including the log output, stays the same.
The snaps subdirectory was created.

Comment 5 pille 2015-03-20 14:06:55 UTC
What is the progress on this?

I have had snapshots running in the meantime, but after an issue with the LVM layer (the thin pool's tmeta LV ran full on 3 of 4 nodes) I see this behavior again on one node.
The LVM issue was resolved by lvextending the tmeta, and Gluster eventually worked again as if nothing had happened, but I cannot take any new snapshots.

I am on 3.6.2 now.
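For reference, checking and extending a thin pool's metadata LV as described above might look like this (the VG/pool names follow the lvs output earlier in this report; the size increment is illustrative):

```shell
# Inspect data and metadata usage of the thin pool; a Meta% near 100
# means the tmeta LV is full and pool operations can start failing.
lvs -o lv_name,data_percent,metadata_percent vg_crypt/brickthinpool

# Grow the pool's metadata LV by an additional 1 GiB.
lvextend --poolmetadatasize +1G vg_crypt/brickthinpool
```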

Comment 6 rjoseph 2016-08-23 12:43:33 UTC
This bug is being closed as GlusterFS-3.6 is nearing its End-Of-Life and only important security bugs will be fixed. If you still face this bug with the newer GlusterFS versions, please open a new bug.

