Created attachment 1332879 [details]
engine and vdsm logs

Description of problem:
In our automation, we see several failures where starting a VM fails.

engine.log:
2017-09-30 14:12:46,236+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-22) [vms_syncAction_b646c6ef-d06d-45ad] EVENT_ID: USER_STARTED_VM(153), VM vm_TestCase18894_REST_GLUS_3014123420 was started by admin@internal-authz (Host: host_mixed_2).
2017-09-30 14:12:53,520+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-1) [] VM '7c03609e-da72-41d4-90a5-65efe9a264f2' was reported as Down on VDS '2e5274d9-65d1-4d1c-b5cf-4c5d07c53690'(host_mixed_2)
2017-09-30 14:12:53,523+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-1) [] START, DestroyVDSCommand(HostName = host_mixed_2, DestroyVmVDSCommandParameters:{hostId='2e5274d9-65d1-4d1c-b5cf-4c5d07c53690', vmId='7c03609e-da72-41d4-90a5-65efe9a264f2', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 74698ad4
2017-09-30 14:12:54,734+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-1) [] FINISH, DestroyVDSCommand, log id: 74698ad4
2017-09-30 14:12:54,734+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-1) [] VM '7c03609e-da72-41d4-90a5-65efe9a264f2'(vm_TestCase18894_REST_GLUS_3014123420) moved from 'WaitForLaunch' --> 'Down'
2017-09-30 14:12:54,747+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-1) [] EVENT_ID: VM_DOWN_ERROR(119), VM vm_TestCase18894_REST_GLUS_3014123420 is down with error. Exit message: internal error: qemu unexpectedly closed the monitor: [2017-09-30 11:12:50.057812] E [MSGID: 108006] [afr-common.c:5001:__afr_handle_child_down_event] 0-GE_12_volume02-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
2017-09-30 14:12:54,748+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-1) [] add VM '7c03609e-da72-41d4-90a5-65efe9a264f2'(vm_TestCase18894_REST_GLUS_3014123420) to rerun treatment
2017-09-30 14:12:54,751+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-1) [] Rerun VM '7c03609e-da72-41d4-90a5-65efe9a264f2'. Called from VDS 'host_mixed_2'
2017-09-30 14:12:54,768+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-162) [] EVENT_ID: USER_INITIATED_RUN_VM_FAILED(151), Failed to run VM vm_TestCase18894_REST_GLUS_3014123420 on Host host_mixed_2.
vdsm.log:
2017-09-30 14:12:53,475+0300 ERROR (vm/7c03609e) [virt.vm] (vmId='7c03609e-da72-41d4-90a5-65efe9a264f2') The vm start process failed (vm:866)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 799, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2576, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 125, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 586, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1069, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: [2017-09-30 11:12:50.057812] E [MSGID: 108006] [afr-common.c:5001:__afr_handle_child_down_event] 0-GE_12_volume02-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
2017-09-30T11:12:51.992205Z qemu-kvm: -drive file=gluster://gluster01.scl.lab.tlv.redhat.com/GE_12_volume02/bcc4bcc2-3bdf-4f64-b12e-20dd532a9876/images/6b2fb326-366e-4db6-83d8-3edce3c63a03/f9a12dfb-9278-4853-b296-3dacff38bb18,file.debug=4,format=qcow2,if=none,id=drive-virtio-disk0,serial=6b2fb326-366e-4db6-83d8-3edce3c63a03,cache=none,werror=stop,rerror=stop,aio=threads: Could not read L1 table: Input/output error
2017-09-30 14:12:53,476+0300 INFO (vm/7c03609e) [virt.vm] (vmId='7c03609e-da72-41d4-90a5-65efe9a264f2') Changed state to Down: internal error: qemu unexpectedly closed the monitor: [2017-09-30 11:12:50.057812] E [MSGID: 108006] [afr-common.c:5001:__afr_handle_child_down_event] 0-GE_12_volume02-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
2017-09-30T11:12:51.992205Z qemu-kvm: -drive file=gluster://gluster01.scl.lab.tlv.redhat.com/GE_12_volume02/bcc4bcc2-3bdf-4f64-b12e-20dd532a9876/images/6b2fb326-366e-4db6-83d8-3edce3c63a03/f9a12dfb-9278-4853-b296-3dacff38bb18,file.debug=4,format=qcow2,if=none,id=drive-virtio-disk0,serial=6b2fb326-366e-4db6-83d8-3edce3c63a03,cache=none,werror=stop,rerror=stop,aio=threads: Could not read L1 table: Input/output error (code=1) (vm:1553)
2017-09-30 14:12:53,520+0300 INFO (vm/7c03609e) [virt.vm] (vmId='7c03609e-da72-41d4-90a5-65efe9a264f2') Stopping connection (guestagent:429)
2017-09-30 14:12:53,529+0300 INFO (jsonrpc/3) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:10.35.69.107,35318 (api:46)
2017-09-30 14:12:53,529+0300 INFO (jsonrpc/3) [virt.vm] (vmId='7c03609e-da72-41d4-90a5-65efe9a264f2') Release VM resources (vm:4762)
2017-09-30 14:12:53,530+0300 WARN (jsonrpc/3) [virt.vm] (vmId='7c03609e-da72-41d4-90a5-65efe9a264f2') trying to set state to Powering down when already Down (vm:542)
2017-09-30 14:12:53,530+0300 INFO (jsonrpc/3) [virt.vm] (vmId='7c03609e-da72-41d4-90a5-65efe9a264f2') Stopping connection (guestagent:429)

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20170925172449.git9d850a9.el7.centos.noarch
vdsm-4.20.3-95.git0813890.el7.centos.x86_64

How reproducible:
From time to time

Steps to Reproduce:
1. Start VM
2.
3.

Actual results:

Expected results:

Additional info:
Moving to virt for initial analysis. If there's a storage issue here, we'll need you guys to point us in the right direction. Thanks!
When there is a qemu failure, the qemu log is often needed. Please attach it as well.
Created attachment 1333134 [details]
qemu log
There's no other issue here beyond the L1 table read failure, which is gluster related. Sahina?
It does look gluster related - the volume has lost quorum according to the logs:

[2017-09-30 11:13:04.830383] E [MSGID: 108006] [afr-common.c:5001:__afr_handle_child_down_event] 0-GE_12_volume02-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.

Was there a network issue at this time? Could you also attach glusterd logs and brick logs from the 3 nodes?
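For reference, a minimal sketch of collecting those (assuming the default glusterfs log locations - glusterd.log and the per-brick logs under /var/log/glusterfs/; adjust the paths if the nodes log elsewhere):

# run on each of the 3 gluster nodes
ls /var/log/glusterfs/glusterd.log /var/log/glusterfs/bricks/*.log
tar czf /tmp/$(hostname -s)-gluster-logs.tar.gz \
    /var/log/glusterfs/glusterd.log /var/log/glusterfs/bricks/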
(In reply to Sahina Bose from comment #5)
> It does look gluster related - the volume has lost quorum according to the logs:
> [2017-09-30 11:13:04.830383] E [MSGID: 108006]
> [afr-common.c:5001:__afr_handle_child_down_event]
> 0-GE_12_volume02-replicate-0: All subvolumes are down. Going offline until
> atleast one of them comes back up.
>
> Was there a network issue at this time? Could you also attach glusterd logs
> and brick logs from the 3 nodes?

There wasn't any network issue during that time. I don't have the requested logs, but I will attach them when I see this error again in our automation.
Adding back the needinfo for the requested logs.
This bug was discussed on a mail thread:

On Mon, Oct 23, 2017 at 2:16 PM, Krutika Dhananjay <kdhananj> wrote:
I checked the logs in the gluster storage pool machines and the machines that are mounting the volume. The only thing that stands out at the moment is frequent connect/disconnect logs between the bricks and the mount process.
Adding Raghavendra G and Milind.
Are there any network issues known between the servers and the client machines?
-Krutika

On Mon, Oct 23, 2017 at 11:49 AM, Raz Tamir <ratamir> wrote:
On Oct 21, 2017 15:22, "Raz Tamir" <ratamir> wrote:
Take a look at this: https://bugzilla.redhat.com/show_bug.cgi?id=1497511 - seems like the same issue.
Sahina, this bug has a needinfo for me. All the requested information can be found in this thread. If anything is still missing, let me know.

On Oct 21, 2017 12:00, "Elad Ben Aharon" <ebenahar> wrote:
Hi Ravi,
Thanks for looking into this. We're using glusterfs-3.8.15-2.el7.x86_64.
I tried the workaround; it doesn't solve the issue.

[root@gluster01 ~]# gluster volume set storage_local_ge2_volume_0 stat-prefetch off
volume set: success

[2017-10-21 08:52:22.358182] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-storage_local_ge2_volume_0-server: Shutting down connection storage-ge2-vdsm1.scl.lab.tlv.redhat.com-1319-2017/10/21-08:52:21:185720-storage_local_ge2_volume_0-client-0-0-0
[2017-10-21 08:52:25.157006] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-storage_local_ge2_volume_0-server: accepted client from storage-ge2-vdsm1.scl.lab.tlv.redhat.com-18117-2017/10/21-08:52:25:92950-storage_local_ge2_volume_0-client-0-0-0 (version: 3.12.1)
[2017-10-21 08:52:25.191877] E [MSGID: 113107] [posix.c:1084:posix_seek] 0-storage_local_ge2_volume_0-posix: seek failed on fd 19 length 196768 [No such device or address]
[2017-10-21 08:52:25.191980] E [MSGID: 115089] [server-rpc-fops.c:2007:server_seek_cbk] 0-storage_local_ge2_volume_0-server: 22: SEEK-2 (13be5b8f-8855-4dd2-b05b-c01507053bcf) ==> (No such device or address) [No such device or address]
[2017-10-21 08:52:25.197340] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-storage_local_ge2_volume_0-server: disconnecting connection from storage-ge2-vdsm1.scl.lab.tlv.redhat.com-18117-2017/10/21-08:52:25:92950-storage_local_ge2_volume_0-client-0-0-0
[2017-10-21 08:52:25.197470] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-storage_local_ge2_volume_0-server: Shutting down connection storage-ge2-vdsm1.scl.lab.tlv.redhat.com-18117-2017/10/21-08:52:25:92950-storage_local_ge2_volume_0-client-0-0-0

On Fri, Oct 20, 2017 at 12:08 PM, Ravishankar N <ravishankar> wrote:
On 10/19/2017 01:56 PM, Sahina Bose wrote:
The message indicates a quorum loss. Adding Krutika and Ravi to take a look.

On Thu, Oct 19, 2017 at 1:44 PM, Denis Chaplygin <dchaplyg> wrote:
Hello!
It looks like HW storage issues or maybe storage misconfiguration.
I found the following log lines on gluster01.scl.lab.tlv.redhat.com in gluster_volumes-storage_local_ge2_volume_0.log:

[2017-10-19 08:03:03.469689] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-storage_local_ge2_volume_0-server: accepted client from storage-ge2-vdsm1.scl.lab.tlv.redhat.com-6360-2017/10/19-08:03:03:385573-storage_local_ge2_volume_0-client-0-0-0 (version: 3.12.1)
[2017-10-19 08:03:03.490477] E [MSGID: 113107] [posix.c:1084:posix_seek] 0-storage_local_ge2_volume_0-posix: seek failed on fd 18 length 196768 [No such device or address]
[2017-10-19 08:03:03.490608] E [MSGID: 115089] [server-rpc-fops.c:2007:server_seek_cbk] 0-storage_local_ge2_volume_0-server: 20: SEEK-2 (7878ea49-99a8-4408-965e-1e21ae088770) ==> (No such device or address) [No such device or address]
[2017-10-19 08:03:03.495805] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-storage_local_ge2_volume_0-server: disconnecting connection from storage-ge2-vdsm1.scl.lab.tlv.redhat.com-6360-2017/10/19-08:03:03:385573-storage_local_ge2_volume_0-client-0-0-0

These messages seem to be similar to what is observed in https://bugzilla.redhat.com/show_bug.cgi?id=1482994. What version of gluster/rhgs is this? Does the work-around mentioned in the BZ doc text (i.e. gluster volume set <volname> stat-prefetch off) solve the problem?
Regards,
Ravi

On Wed, Oct 18, 2017 at 1:08 PM, Elad Ben Aharon <ebenahar> wrote:
+ Sahina and Denis

On Wed, Oct 18, 2017 at 11:57 AM, Elad Ben Aharon <ebenahar> wrote:
Hi Ala,
We're unable to start a VM with a disk on Gluster. The volume is replica 3.

storage-ge2-vdsm1.scl.lab.tlv.redhat.com:
2017-10-18 11:52:40,011+0300 DEBUG (monitor/b203db5) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3514f0c5a516008d6|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' b203db55-5dcf-40de-bb8b-b7a1d991c6a9 (cwd None) (commands:70)
2017-10-18 11:52:40,256+0300 DEBUG (monitor/b203db5) [storage.Misc.excCmd] SUCCESS: <err> = ''; <rc> = 0 (commands:94)
2017-10-18 11:52:40,629+0300 ERROR (vm/fede7e95) [virt.vm] (vmId='fede7e95-9ad9-4282-8cf3-240cfe17a010') The vm start process failed (vm:894)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 823, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2650, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 125, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 586, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1069, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: [2017-10-18 08:52:38.212261] E [MSGID: 108006] [afr-common.c:5001:__afr_handle_child_down_event] 0-storage_local_ge2_volume_0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.

Setup details:
storage-ge-02.scl.lab.tlv.redhat.com admin/123456
storage-ge2-vdsm{1..3}.scl.lab.tlv.redhat.com

Can you please take a look?
Thanks

On Wed, Oct 18, 2017 at 11:40 AM, Elad Ben Aharon <ebenahar> wrote:
Looking

On Wed, Oct 18, 2017 at 11:29 AM, Tareq Alayan <talayan> wrote:
On Wed, Oct 18, 2017 at 8:25 AM, Artyom Lukianov <alukiano> wrote:
I have a problem with the gluster storage under different automation runs:
https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.1v4-ge-runner-tier3/57 - two VMs that have disks on gluster storage dropped to the paused state. You can see ticket https://projects.engineering.redhat.com/browse/RHEVM-3638 for more information.
https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier2/30 - VMs that have disks on gluster storage failed to start. You can see ticket https://projects.engineering.redhat.com/browse/RHEVM-3632 for more information.
So I have two questions:
Did anyone encounter other issues with our gluster storage?
Can we disable gluster storage in our automation runs until we have a solution to the problem?

I don't think it is a good idea to disable gluster. I suggest you reproduce it and work with one of the guys from the storage team to pinpoint the issue. WDYT? Elad, can you help with that?
--Tareq

Best Regards
I checked the logs in the gluster storage pool machines and the machines that are mounting the volume. The only thing that stands out at the moment is frequent connect/disconnect logs between the bricks and mount process. Adding Raghavendra G and Milind to cc list. Are there any network issues known between the servers and the client machines? Is this issue being hit consistently? -Krutika
(In reply to Krutika Dhananjay from comment #9)
> I checked the logs in the gluster storage pool machines and the machines
> that are mounting the volume. The only thing that stands out at the moment
> is frequent connect/disconnect logs between the bricks and mount process.
>
> Adding Raghavendra G and Milind to cc list.
>
> Are there any network issues known between the servers and the client
> machines?

No, we didn't experience network issues during our automation executions at the mentioned time.

> Is this issue being hit consistently?

Yes, we have been seeing this issue for the last 3-4 weeks.

> -Krutika
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
*** Bug 1525098 has been marked as a duplicate of this bug. ***
Hi Milind, We see this bug more and more. Can you please take a look?
ping Milind. Can you respond?
(In reply to Sahina Bose from comment #20)
> ping Milind. Can you respond?

I need access to the live systems for further investigation before I can comment any further.
Created attachment 1379450 [details]
logs
Is there any action needed from gluster side? I think Milind has analysed and provided the recommendations?
(In reply to Sahina Bose from comment #39)
> Is there any action needed from gluster side? I think Milind has analysed
> and provided the recommendations?

I do not recall changing the gluster server options to get to this state. See comment #38 for the existing options I saw about a month ago.
About Milind's recommendations, I think comment #37 answers that point.
Sahina, is there anything QE needs to change on their setup? How can we make progress on this BZ?
From the conversation above, I think there are 2 things:
1. Ensure that client and server gluster packages are compatible - ensure same version
2. Set transport.socket.listen-backlog in /etc/glusterfs/glusterd.vol to something higher than 151, in case SYN flooding messages are seen in /var/log/messages

Is the QE still facing issues on setup?
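For reference, a minimal sketch of point 2, assuming the stock glusterd.vol layout; the backlog value below is only an example:

# /etc/glusterfs/glusterd.vol - add the option inside the existing 'volume management' block
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    ...
    option transport.socket.listen-backlog 1024
end-volume

# check whether the SYN flooding symptom is present
grep -i "SYN flooding" /var/log/messages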
Could you please ensure the points mentioned in Comment 42 are addressed and let us know if you continue to face the issue?
(In reply to Sahina Bose from comment #42)
> From the conversation above, I think there are 2 things:
> 1. Ensure that client and server gluster packages are compatible - ensure
> same version

About the same version on client and server gluster: currently it's not the same (see details below). Sahina, why is this an issue? We had to upgrade the server to 3.12.2-1 - Kevin, do you recall why?
- Gluster storage cluster gluster01/02/03.scl.lab.tlv.redhat.com is build 3.12.2-1.
- Gluster client (VDSM hosts) is 3.8.4-54 -> this is the official gluster build for the latest 4.2 downstream (4.2.2.4-0.1 -> http://bob.eng.lab.tlv.redhat.com/builds/4.2/rhv-4.2.2-6/el7Everything/x86_64/)

> 2. Set transport.socket.listen-backlog in /etc/glusterfs/glusterd.vol to
> something higher than 151, in case SYN flooding messages are seen in
> /var/log/messages

I set 'option transport.socket.listen-backlog 152' in /etc/glusterfs/glusterd.vol on all 3 Gluster storage cluster hosts gluster01/02/03.scl.lab.tlv.redhat.com. Should I do an additional action, like a gluster restart, to load the new config?

> Is the QE still facing issues on setup?

I'll run the tests again and update.
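For reference, a minimal sketch of how the version check and the reload could be done, assuming RPM-based hosts with a systemd-managed glusterd (package names may differ per build):

# on the gluster servers
rpm -q glusterfs-server
gluster --version

# on the VDSM hosts (clients)
rpm -q glusterfs glusterfs-fuse

# glusterd.vol is only read when glusterd starts, so after editing it:
systemctl restart glusterd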
> Is the QE still facing issues on setup?

I reran all the tests where we saw this issue and it does not reproduce on the current release, 4.2.2.4-0.1, but I have a feeling we'll see it again, as it has already disappeared and reappeared since this bug was opened.
Keeping this open but moving this to 4.2.3. Please provide the setup if the issue reproduces again.
Did you run into the issue again? If not, we will close this bz and re-open it if you encounter it again.
(In reply to Sahina Bose from comment #47)
> Did you run into the issue again?

I tested on the latest 4.2.2.6-0.1 and the issue is not seen.

> If not, we will close this bz and re-open it if you encounter it again.

Agreed. If we see it again, we will re-open this issue.
Closing based on previous comment