Description of problem:
On the machines obtained from softserve, all test cases fail to run because the FUSE mount fails. Please find the client logs attached.

Version-Release number of selected component (if applicable):
mainline (source installed)

How reproducible:
2/2 on two machines

Steps to Reproduce:
1. Build the gluster source code
2. Run any .t file (prove -v <.t file>)

Actual results:
Test cases fail.

Expected results:
Test cases should pass.

Additional info:
Created attachment 1472715 [details] Client logs with TRACE enabled
Did you follow the specific, detailed instructions that are required? https://github.com/gluster/softserve/wiki/Running-Regressions-on-clean-Centos-7-machine After you get the machine, you need to run an ansible playbook on it, then restart and re-run it to pick up the last few ipv6-disable fixes. Only then can you run tests.
Yes, I did follow all the steps. Lookup on root is trivial and works fine. The other observation is that if I create a volume from the CLI and mount it, it works perfectly; but when I run a test case, it fails at the lines where it mounts the volume.
[2018-07-31 10:41:38.308584] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 0-master-client-0: changing port to 49152 (from 0)

> Got a valid port (49152) for the brick; reconfiguring.

[2018-07-31 10:41:38.308606] T [socket.c:861:__socket_disconnect] 0-master-client-0: disconnecting 0x7f36f0078be0, sock=12
[2018-07-31 10:41:38.308819] T [socket.c:865:__socket_disconnect] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f3703988368] (--> /usr/local/lib/glusterfs/4.2dev/rpc-transport/socket.so(+0x608e)[0x7f36f810d08e] (--> /usr/local/lib/glusterfs/4.2dev/rpc-transport/socket.so(+0xe02b)[0x7f36f811502b] (--> /usr/local/lib/libgfrpc.so.0(rpc_transport_disconnect+0x96)[0x7f370374a45b] (--> /usr/local/lib/glusterfs/4.2dev/xlator/protocol/client.so(+0x54292)[0x7f36f5e0f292] ))))) 0-master-client-0: tearing down socket connection
[2018-07-31 10:41:38.308877] T [socket.c:3002:socket_event_handler] 0-master-client-0: (sock:12) socket_event_poll_in returned 0
[2018-07-31 10:41:38.308898] T [socket.c:2960:socket_event_handler] 0-master-client-0: client (sock:12) in:1, out:0, err:16
[2018-07-31 10:41:38.308923] T [socket.c:236:socket_dump_info] 0-master-client-0: $$$ client: disconnecting from (af:2,sock:12) 23.253.56.86 non-SSL (errno:0:Success)
[2018-07-31 10:41:38.308938] D [socket.c:3021:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:12) (non-SSL)
[2018-07-31 10:41:38.308959] D [MSGID: 0] [client.c:2242:client_rpc_notify] 0-master-client-0: got RPC_CLNT_DISCONNECT
[2018-07-31 10:41:38.308977] D [MSGID: 0] [client.c:2284:client_rpc_notify] 0-master-client-0: disconnected (skipped notify)
[2018-07-31 10:41:38.308997] T [rpc-clnt.c:404:rpc_clnt_reconnect] 0-master-client-0: attempting reconnect
[2018-07-31 10:41:38.309011] T [socket.c:3409:socket_connect] 0-master-client-0: connecting 0x7f36f0078be0, sock=-1
[2018-07-31 10:41:38.309028] T [name.c:243:af_inet_client_get_remote_sockaddr] 0-master-client-0: option remote-port missing in volume master-client-0. Defaulting to 24007

> Even after getting a valid port for the remote brick, why is this defaulting to 24007 again? Something is fishy; this needs a deeper look. I observed a similar pattern for client-1 and client-2 too.

[2018-07-31 10:41:38.309568] D [MSGID: 0] [common-utils.c:339:gf_resolve_ip6] 0-resolver: returning ip-23.253.56.86 (port-24007) for hostname: builderhrk500.cloud.gluster.org and port: 24007
[2018-07-31 10:41:38.312685] D [MSGID: 0] [common-utils.c:339:gf_resolve_ip6] 0-resolver: returning ip-104.130.69.104 (port-24007) for hostname: builderhrk500.cloud.gluster.org and port: 24007
[2018-07-31 10:41:38.312749] T [socket.c:961:__socket_nodelay] 0-master-client-0: NODELAY enabled for socket 14
[2018-07-31 10:41:38.312779] T [socket.c:1049:__socket_keepalive] 0-master-client-0: Keep-alive enabled for socket: 14, (idle: 20, interval: 2, max-probes: 9, timeout: 0)
This was because of a bug somewhere in the ansible playbook: /etc/hosts contained entries with an incorrect IP. Rather than tracking this down, we're deprecating the old instructions in favor of new ones, which will run faster in any case.
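For anyone hitting similar symptoms, one quick sanity check is to count the distinct IPs that a hosts file maps to the builder's hostname; more than one (or one that disagrees with DNS) suggests stale entries like the ones described above. This is only an illustrative sketch: the hostname and IPs are copied from the log, and the throwaway temp file stands in for /etc/hosts on a real machine.

```shell
# Sketch: detect conflicting entries for one hostname in a hosts file.
# On an affected machine you would point HOSTS_FILE at /etc/hosts instead.
HOSTS_FILE=$(mktemp)
cat > "$HOSTS_FILE" <<'EOF'
23.253.56.86 builderhrk500.cloud.gluster.org
104.130.69.104 builderhrk500.cloud.gluster.org
EOF

# Count the distinct IPs mapped to the builder's hostname
COUNT=$(awk '$2 == "builderhrk500.cloud.gluster.org" {print $1}' "$HOSTS_FILE" \
        | sort -u | wc -l | tr -d ' ')
echo "distinct IPs: $COUNT"   # more than 1 means conflicting entries

rm -f "$HOSTS_FILE"
```

On a live machine you could then compare the /etc/hosts answer against DNS (for example with `getent ahosts <hostname>` versus `dig +short <hostname>`) to see which entry is stale.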