Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1611635

Summary: infra: softserve machines, regression tests fails
Product: [Community] GlusterFS
Reporter: Kotresh HR <khiremat>
Component: project-infrastructure
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs, gluster-infra, nigelb, rgowdapp
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-08-13 10:09:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Client logs with TRACE enabled (flags: none)

Description Kotresh HR 2018-08-02 14:10:28 UTC
Description of problem:
On machines obtained from softserve, all test cases fail to run because the FUSE mount fails.

Please find the client logs attached.

Version-Release number of selected component (if applicable):
mainline (source installed)

How reproducible:
2/2 (reproduced on both machines I tried)

Steps to Reproduce:
1. Build gluster source code
2. Run any .t test file (prove -v <.t file>)


Actual results:
Test cases fail.

Expected results:
Test cases should not fail.

Additional info:

Comment 1 Kotresh HR 2018-08-02 14:11:05 UTC
Created attachment 1472715 [details]
Client logs with TRACE enabled

Comment 2 Nigel Babu 2018-08-02 14:31:10 UTC
Did you follow the very specific and detailed instructions you need to carry out?

https://github.com/gluster/softserve/wiki/Running-Regressions-on-clean-Centos-7-machine

After you get the machine, you need to run an Ansible playbook on it. Then restart the machine and run the playbook again to pick up the last few IPv6-disable fixes. Only then can you run tests.

Comment 3 Kotresh HR 2018-08-02 14:42:15 UTC
Yes, I did follow all the steps.

The lookup on root is a very trivial operation. The other observation is that if I create a volume from the CLI and mount it, it works perfectly. But when I run a test case, it fails at the lines where it mounts the volume.

Comment 4 Raghavendra G 2018-08-02 17:17:20 UTC
[2018-07-31 10:41:38.308584] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 0-master-client-0: changing port to 49152 (from 0)

> Got a valid port 49152 for the brick. Reconfiguring.

[2018-07-31 10:41:38.308606] T [socket.c:861:__socket_disconnect] 0-master-client-0: disconnecting 0x7f36f0078be0, sock=12
[2018-07-31 10:41:38.308819] T [socket.c:865:__socket_disconnect] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f3703988368] (--> /usr/local/lib/glusterfs/4.2dev/rpc-transport/socket.so(+0x608e)[0x7f36f810d08e] (--> /usr/local/lib/glusterfs/4.2dev/rpc-transport/socket.so(+0xe02b)[0x7f36f811502b] (--> /usr/local/lib/libgfrpc.so.0(rpc_transport_disconnect+0x96)[0x7f370374a45b] (--> /usr/local/lib/glusterfs/4.2dev/xlator/protocol/client.so(+0x54292)[0x7f36f5e0f292] ))))) 0-master-client-0: tearing down socket connection
[2018-07-31 10:41:38.308877] T [socket.c:3002:socket_event_handler] 0-master-client-0: (sock:12) socket_event_poll_in returned 0
[2018-07-31 10:41:38.308898] T [socket.c:2960:socket_event_handler] 0-master-client-0: client (sock:12) in:1, out:0, err:16
[2018-07-31 10:41:38.308923] T [socket.c:236:socket_dump_info] 0-master-client-0: $$$ client: disconnecting from (af:2,sock:12) 23.253.56.86 non-SSL (errno:0:Success)
[2018-07-31 10:41:38.308938] D [socket.c:3021:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:12) (non-SSL)
[2018-07-31 10:41:38.308959] D [MSGID: 0] [client.c:2242:client_rpc_notify] 0-master-client-0: got RPC_CLNT_DISCONNECT
[2018-07-31 10:41:38.308977] D [MSGID: 0] [client.c:2284:client_rpc_notify] 0-master-client-0: disconnected (skipped notify)
[2018-07-31 10:41:38.308997] T [rpc-clnt.c:404:rpc_clnt_reconnect] 0-master-client-0: attempting reconnect
[2018-07-31 10:41:38.309011] T [socket.c:3409:socket_connect] 0-master-client-0: connecting 0x7f36f0078be0, sock=-1
[2018-07-31 10:41:38.309028] T [name.c:243:af_inet_client_get_remote_sockaddr] 0-master-client-0: option remote-port missing in volume master-client-0. Defaulting to 24007

> Even after getting a valid port for the remote brick, why is this defaulting to 24007 again? Something is fishy; this needs a deeper look. I observed a similar pattern for client-1 and client-2 too.

[2018-07-31 10:41:38.309568] D [MSGID: 0] [common-utils.c:339:gf_resolve_ip6] 0-resolver: returning ip-23.253.56.86 (port-24007) for hostname: builderhrk500.cloud.gluster.org and port: 24007
[2018-07-31 10:41:38.312685] D [MSGID: 0] [common-utils.c:339:gf_resolve_ip6] 0-resolver: returning ip-104.130.69.104 (port-24007) for hostname: builderhrk500.cloud.gluster.org and port: 24007
[2018-07-31 10:41:38.312749] T [socket.c:961:__socket_nodelay] 0-master-client-0: NODELAY enabled for socket 14
[2018-07-31 10:41:38.312779] T [socket.c:1049:__socket_keepalive] 0-master-client-0: Keep-alive enabled for socket: 14, (idle: 20, interval: 2, max-probes: 9, timeout: 0)
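The pattern described above (a client translator learns the brick port through rpc_clnt_reconfig, and its very next reconnect logs "option remote-port missing in volume ... Defaulting to 24007") can be checked for client-0, client-1, client-2 and any others mechanically rather than by eye. This is a minimal Python sketch that assumes the log line layout in the excerpt above (timestamp, one-letter level, optional [MSGID: n], [file:line:function], translator name, message). find_port_fallbacks and Suspect are hypothetical helper names, not GlusterFS code. Lines that do not fit the layout, such as the backtrace line, are skipped.

```python
import re
from typing import Dict, Iterable, List, NamedTuple

# Assumed layout, taken from the TRACE log excerpt, e.g.:
# [2018-07-31 10:41:38.308584] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 0-master-client-0: changing port to 49152 (from 0)
# Some lines carry an extra "[MSGID: n]" field before the source location.
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: \d+\] )?"
    r"\[(?P<src>[^\]]+)\] (?P<xlator>\S+): (?P<msg>.*)$"
)
PORT_CHANGE_RE = re.compile(r"changing port to (\d+)")
DEFAULT_PORT_RE = re.compile(r"option remote-port missing .*Defaulting to (\d+)")


class Suspect(NamedTuple):
    xlator: str          # client translator name, e.g. "0-master-client-0"
    brick_port: int      # port learned from "changing port to N"
    fallback_port: int   # port used on the following reconnect


def find_port_fallbacks(lines: Iterable[str]) -> List[Suspect]:
    """Hypothetical helper: report translators that, after learning a
    non-zero brick port, next log a fallback to the default port."""
    learned: Dict[str, int] = {}
    suspects: List[Suspect] = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # backtraces, reviewer commentary, blank lines
        xl, msg = m["xlator"], m["msg"]
        pc = PORT_CHANGE_RE.search(msg)
        if pc and int(pc.group(1)) != 0:
            learned[xl] = int(pc.group(1))
            continue
        dp = DEFAULT_PORT_RE.search(msg)
        if dp and xl in learned:
            # A port was learned, yet the reconnect used the default one.
            suspects.append(Suspect(xl, learned.pop(xl), int(dp.group(1))))
    return suspects
```

To run it over the attached client log, pass an open file object, for example find_port_fallbacks(open("client.log")), where the file name is a placeholder. An empty result means no client fell back to the default port right after learning a brick port.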

Comment 5 Nigel Babu 2018-08-13 10:09:15 UTC
This was caused by a bug in Ansible somewhere: the /etc/hosts file had entries with an incorrect IP address. Rather than tracking this down, we're deprecating the old instructions in favor of new ones, which run faster in any case.
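Because the root cause was an incorrect /etc/hosts entry, one cheap sanity check on a freshly provisioned machine is to list hostnames that map to more than one address of the same family. The Python sketch below assumes the standard hosts(5) layout (an address followed by one or more names, with # starting a comment); find_conflicting_hosts is a hypothetical helper. It only detects ambiguous mappings: a single wrong address would still pass, so its output is a hint to compare against the machine's real addresses, not a full check. For context, the log in Comment 4 shows builderhrk500.cloud.gluster.org resolving to both 23.253.56.86 and 104.130.69.104.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_conflicting_hosts(lines: Iterable[str]) -> Dict[Tuple[str, int], Set[str]]:
    """Hypothetical helper: return {(hostname, ip_version): {addresses}}
    for names mapped to more than one address of the same IP version."""
    mapping: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed: an address without any name
        addr, names = fields[0], fields[1:]
        try:
            version = ipaddress.ip_address(addr).version  # 4 or 6
        except ValueError:
            continue  # first field is not a parseable IP address
        for name in names:
            # Hostnames are case-insensitive; key by IP version so that
            # 127.0.0.1 and ::1 for localhost are not reported.
            mapping[(name.lower(), version)].add(addr)
    return {key: addrs for key, addrs in mapping.items() if len(addrs) > 1}
```

On the machine itself this would be called as find_conflicting_hosts(open("/etc/hosts")). Keying by IP version keeps the usual IPv4 and IPv6 localhost lines from showing up as false positives.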