1611635 – infra: softserve machines, regression tests fails

Summary: infra: softserve machines, regression tests fails

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	project-infrastructure
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	bugs@gluster.org
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-08-02 14:10 UTC by Kotresh HR
Modified:	2018-08-13 10:09 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2018-08-13 10:09:15 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments
Client logs with TRACE enabled (109.45 KB, text/plain) 2018-08-02 14:11 UTC, Kotresh HR

Description Kotresh HR 2018-08-02 14:10:28 UTC

Description of problem:
On machines obtained from softserve, every test case fails to run because the FUSE mount fails.

Please find the client logs attached.

Version-Release number of selected component (if applicable):
mainline (source installed)

How reproducible:
Always (2/2): reproduced on both machines I tried.

Steps to Reproduce:
1. Build and install GlusterFS from source
2. Run any .t file (prove -v <.t file>)


Actual results:
Test cases fail.

Expected results:
Test cases should not fail.

Additional info:

Comment 1 Kotresh HR 2018-08-02 14:11:05 UTC

Created attachment 1472715
Client logs with TRACE enabled

Comment 2 Nigel Babu 2018-08-02 14:31:10 UTC

Did you follow the very specific and detailed setup instructions?

https://github.com/gluster/softserve/wiki/Running-Regressions-on-clean-Centos-7-machine

After you get the machine, you need to run an ansible playbook on it. Then restart the machine and run the playbook again to pick up the last few IPv6-disable fixes. Only then can you run tests.

Comment 3 Kotresh HR 2018-08-02 14:42:15 UTC

Yes, I did follow all the steps.

A lookup on root is a very trivial operation. The other observation is that if I create a volume from the CLI and mount it, it works perfectly. But if I run a test case, it fails at the lines where it mounts the volume.

Comment 4 Raghavendra G 2018-08-02 17:17:20 UTC

[2018-07-31 10:41:38.308584] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 0-master-client-0: changing port to 49152 (from 0)

> Got a valid port 49152 for brick. reconfiguring.

[2018-07-31 10:41:38.308606] T [socket.c:861:__socket_disconnect] 0-master-client-0: disconnecting 0x7f36f0078be0, sock=12
[2018-07-31 10:41:38.308819] T [socket.c:865:__socket_disconnect] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f3703988368] (--> /usr/local/lib/glusterfs/4.2dev/rpc-transport/socket.so(+0x608e)[0x7f36f810d08e] (--> /usr/local/lib/glusterfs/4.2dev/rpc-transport/socket.so(+0xe02b)[0x7f36f811502b] (--> /usr/local/lib/libgfrpc.so.0(rpc_transport_disconnect+0x96)[0x7f370374a45b] (--> /usr/local/lib/glusterfs/4.2dev/xlator/protocol/client.so(+0x54292)[0x7f36f5e0f292] ))))) 0-master-client-0: tearing down socket connection
[2018-07-31 10:41:38.308877] T [socket.c:3002:socket_event_handler] 0-master-client-0: (sock:12) socket_event_poll_in returned 0
[2018-07-31 10:41:38.308898] T [socket.c:2960:socket_event_handler] 0-master-client-0: client (sock:12) in:1, out:0, err:16
[2018-07-31 10:41:38.308923] T [socket.c:236:socket_dump_info] 0-master-client-0: $$$ client: disconnecting from (af:2,sock:12) 23.253.56.86 non-SSL (errno:0:Success)
[2018-07-31 10:41:38.308938] D [socket.c:3021:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:12) (non-SSL)
[2018-07-31 10:41:38.308959] D [MSGID: 0] [client.c:2242:client_rpc_notify] 0-master-client-0: got RPC_CLNT_DISCONNECT
[2018-07-31 10:41:38.308977] D [MSGID: 0] [client.c:2284:client_rpc_notify] 0-master-client-0: disconnected (skipped notify)
[2018-07-31 10:41:38.308997] T [rpc-clnt.c:404:rpc_clnt_reconnect] 0-master-client-0: attempting reconnect
[2018-07-31 10:41:38.309011] T [socket.c:3409:socket_connect] 0-master-client-0: connecting 0x7f36f0078be0, sock=-1
[2018-07-31 10:41:38.309028] T [name.c:243:af_inet_client_get_remote_sockaddr] 0-master-client-0: option remote-port missing in volume master-client-0. Defaulting to 24007

> Even after getting a valid port for the remote brick, why is this defaulting to 24007 again? Something is fishy here and needs a deeper look. I observed a similar pattern for client-1 and client-2 too.
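To check whether the same port fallback shows up for other clients in a large TRACE log, a small script can help. The following is only a sketch: it assumes the two message shapes quoted above ("changing port to N (from M)" and "option remote-port missing ... Defaulting to N") appear verbatim in the log, and the function name `ports_reset_after_reconfig` is made up for illustration.

```python
import re

# Both patterns are modelled on the log lines quoted in this comment.
RECONFIG = re.compile(r"\] (\S+): changing port to (\d+) \(from (\d+)\)")
DEFAULTED = re.compile(
    r"\] (\S+): option remote-port missing in volume \S+ Defaulting to (\d+)"
)


def ports_reset_after_reconfig(lines):
    """Return {client: (brick_port, fallback_port)} for every client that
    fell back to a default port after being reconfigured to a brick port."""
    brick_port = {}
    resets = {}
    for line in lines:
        m = RECONFIG.search(line)
        if m:
            brick_port[m.group(1)] = int(m.group(2))
            continue
        m = DEFAULTED.search(line)
        if m and m.group(1) in brick_port:
            resets[m.group(1)] = (brick_port[m.group(1)], int(m.group(2)))
    return resets


# The two relevant lines from the excerpt above.
sample = [
    "[2018-07-31 10:41:38.308584] I [rpc-clnt.c:2087:rpc_clnt_reconfig] "
    "0-master-client-0: changing port to 49152 (from 0)",
    "[2018-07-31 10:41:38.309028] T [name.c:243:af_inet_client_get_remote_sockaddr] "
    "0-master-client-0: option remote-port missing in volume master-client-0. "
    "Defaulting to 24007",
]
print(ports_reset_after_reconfig(sample))
```

Feeding it the whole client log (for example `open(path).read().splitlines()`, with `path` pointing at the attached log) would list every client that shows this sequence.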

[2018-07-31 10:41:38.309568] D [MSGID: 0] [common-utils.c:339:gf_resolve_ip6] 0-resolver: returning ip-23.253.56.86 (port-24007) for hostname: builderhrk500.cloud.gluster.org and port: 24007
[2018-07-31 10:41:38.312685] D [MSGID: 0] [common-utils.c:339:gf_resolve_ip6] 0-resolver: returning ip-104.130.69.104 (port-24007) for hostname: builderhrk500.cloud.gluster.org and port: 24007
[2018-07-31 10:41:38.312749] T [socket.c:961:__socket_nodelay] 0-master-client-0: NODELAY enabled for socket 14
[2018-07-31 10:41:38.312779] T [socket.c:1049:__socket_keepalive] 0-master-client-0: Keep-alive enabled for socket: 14, (idle: 20, interval: 2, max-probes: 9, timeout: 0)
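The two gf_resolve_ip6 lines above return different addresses (23.253.56.86 and 104.130.69.104) for the same hostname. Whether that is expected on these machines is not established here, but it is easy to list. A minimal sketch, assuming the resolver message format shown above; `addresses_by_hostname` is an illustrative name, not part of GlusterFS:

```python
import re
from collections import defaultdict

# Modelled on the "returning ip-... for hostname: ..." resolver lines above.
RESOLVED = re.compile(
    r"returning ip-(\S+) \(port-\d+\) for hostname: (\S+) and port"
)


def addresses_by_hostname(lines):
    """Map each hostname to the distinct IPs the resolver returned for it,
    in the order they first appear in the log."""
    seen = defaultdict(list)
    for line in lines:
        m = RESOLVED.search(line)
        if m:
            ip, host = m.group(1), m.group(2)
            if ip not in seen[host]:
                seen[host].append(ip)
    return dict(seen)


resolver_lines = [
    "[2018-07-31 10:41:38.309568] D [MSGID: 0] [common-utils.c:339:gf_resolve_ip6] "
    "0-resolver: returning ip-23.253.56.86 (port-24007) for hostname: "
    "builderhrk500.cloud.gluster.org and port: 24007",
    "[2018-07-31 10:41:38.312685] D [MSGID: 0] [common-utils.c:339:gf_resolve_ip6] "
    "0-resolver: returning ip-104.130.69.104 (port-24007) for hostname: "
    "builderhrk500.cloud.gluster.org and port: 24007",
]
for host, ips in addresses_by_hostname(resolver_lines).items():
    print(host, ips)
```

A hostname that maps to more than one address is not necessarily an error, but it is a useful place to start when a mount works by hand and fails from the test harness.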

Comment 5 Nigel Babu 2018-08-13 10:09:15 UTC

This was because of a bug in ansible somewhere: the /etc/hosts file had entries for an incorrect IP. Rather than tracking this down, we're deprecating the old instructions in favor of new ones, which will run faster in any case.
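For anyone who hits something similar, a quick sanity check of /etc/hosts can be sketched as below. This is only an illustration: the actual contents of the broken file are not recorded in this bug, so the sample data uses made-up names and documentation-range addresses, and the helper names are hypothetical.

```python
from collections import defaultdict


def parse_hosts(text):
    """Map each hostname or alias to the list of IPs it is listed under."""
    mapping = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            if ip not in mapping[name]:
                mapping[name].append(ip)
    return dict(mapping)


def conflicting_entries(text):
    """Return only the names that are mapped to more than one IP."""
    return {n: ips for n, ips in parse_hosts(text).items() if len(ips) > 1}


# Hypothetical file contents, not taken from the affected machines.
SAMPLE_HOSTS = """\
127.0.0.1   localhost
192.0.2.10  builder.example.org builder   # hypothetical stale entry
192.0.2.20  builder.example.org
"""
print(conflicting_entries(SAMPLE_HOSTS))
```

On a real machine the input would be `open("/etc/hosts").read()`. Note that `localhost` is commonly listed under both 127.0.0.1 and ::1, so it may be reported even on a healthy system.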
