Bug 1329636

Summary: libvirtError: Argument list too long
Product: Red Hat OpenStack
Reporter: Ondrej <ochalups>
Component: openstack-nova
Assignee: Eoghan Glynn <eglynn>
Status: CLOSED DUPLICATE
QA Contact: Prasanth Anbalagan <panbalag>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.0 (Kilo)
CC: abeekhof, apevec, berrange, dasmith, ebarrera, eglynn, fdinitto, fleitner, jruemker, jschwarz, kchamart, libvirt-maint, nalmond, ochalups, rbalakri, rbryant, sbauza, sferdjao, sgordon, skinjo, sputhenp, srevivo, vromanso
Target Milestone: async
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-22 12:02:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Comment 23 Andrew Beekhof 2016-05-09 01:56:59 UTC
In order to make any useful comments, I'm going to need logs from the controllers and the bad compute node. If sos won't work on some of those locations, please grab /var/log/pacemaker.log and /var/log/corosync.log manually.
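
If sos cannot run on a node, the two files can be bundled by hand. A minimal Python sketch of that manual step, assuming only the standard library; the function name `bundle_logs` and the archive name in the usage note are illustrative, not part of any Red Hat tooling:

```python
import os
import tarfile
from typing import Iterable, List

# The two log files requested above.
DEFAULT_LOGS = ("/var/log/pacemaker.log", "/var/log/corosync.log")


def bundle_logs(paths: Iterable[str], archive: str) -> List[str]:
    """Write every existing file in `paths` into a gzip-compressed tarball.

    Missing files are skipped rather than treated as an error, so the same
    call can be used on controllers and compute nodes. Returns the list of
    paths that were actually added.
    """
    added = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            if os.path.isfile(path):
                tar.add(path)
                added.append(path)
    return added
```

Calling `bundle_logs(DEFAULT_LOGS, "cluster-logs.tar.gz")` as root on each node would produce one archive per node; the returned list shows which of the files were present.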

Comment 24 Sadique Puthen 2016-05-10 04:26:36 UTC
The problematic compute node has been rebooted, which corrected all issues with it. The following problems were magically solved by the reboot:

- Failure to take snapshots of instances running on it.
- Failure to schedule new instances to that compute node.
- Failure to live migrate to and from this compute node.

It is still not known why this was failing. Logs are on collabshell if you would like to take a look at them.

Compute-1 /var/log: /cases/01618921/logs_community-cmpt01.localdomain.tar.gz/var/log - These logs only go up to 20th April, when the node was still having problems.

All logs in that location dated before the 24th are from before the reboot.

Comment 25 Andrew Beekhof 2016-05-10 23:52:31 UTC
OK, I can see that the errors started at Apr 19 05:51:57:

Apr 18 12:50:17 [78627] community-cmpt01.localdomain pacemaker_remoted:     info: crm_compress_string:  Compressed 342635 bytes into 14345 (ratio 23:1) in 106ms
Apr 19 05:51:57 [78627] community-cmpt01.localdomain pacemaker_remoted:    error: crm_send_tls: Connection terminated rc = -53
Apr 19 05:51:57 [78627] community-cmpt01.localdomain pacemaker_remoted:    error: crm_send_tls: Connection terminated rc = -10
Apr 19 05:51:57 [78627] community-cmpt01.localdomain pacemaker_remoted:    error: crm_remote_send:      Failed to send remote msg, rc = -10
Apr 19 05:51:57 [78627] community-cmpt01.localdomain pacemaker_remoted:    error: lrmd_tls_send_msg:    Failed to send remote lrmd tls msg, rc = -10
Apr 19 05:51:57 [78627] community-cmpt01.localdomain pacemaker_remoted:  warning: send_client_notify:   Notification of client remote-lrmd-community-cmpt01:3121/29014a03-c5e0-47af-8ec4-23da75b63cec failed
Apr 19 05:51:57 [78627] community-cmpt01.localdomain pacemaker_remoted:     info: lrmd_remote_client_msg:       Client disconnect detected in tls msg dispatcher.
Apr 19 05:51:57 [78627] community-cmpt01.localdomain pacemaker_remoted:     info: cancel_recurring_action:      Cancelling ocf operation nova-compute_monitor_10000

But without the logs from the controllers I can't tell if this correlates with anything the rest of the system was doing.
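
For lining up the controller logs against this timestamp, the first error entry can be pulled out of each pacemaker.log mechanically. A rough Python sketch, assuming every line follows the `timestamp [pid] host daemon: level: function: message` layout seen in the excerpt above; the regular expression and the names `Entry`, `parse_line` and `first_error` are illustrative, not part of Pacemaker:

```python
import re
from typing import Iterable, NamedTuple, Optional

# Layout assumed from the excerpt above:
#   Apr 19 05:51:57 [78627] host daemon:    level: function: message
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] (?P<host>\S+) (?P<daemon>[\w-]+):\s+"
    r"(?P<level>\w+):\s+(?P<func>\w+):\s+(?P<msg>.*)$"
)


class Entry(NamedTuple):
    stamp: str
    daemon: str
    level: str
    func: str
    msg: str


def parse_line(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None if the line does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["stamp"], m["daemon"], m["level"], m["func"], m["msg"])


def first_error(lines: Iterable[str]) -> Optional[Entry]:
    """Return the first entry logged at level 'error', or None if there is none."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level == "error":
            return entry
    return None
```

Running `first_error(open("/var/log/pacemaker.log"))` on each controller would give one timestamp per node to compare with the compute node's Apr 19 05:51:57. Lines that do not fit the assumed layout are silently ignored.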

Comment 31 Alan Pevec 2016-08-22 12:02:40 UTC

*** This bug has been marked as a duplicate of bug 1354601 ***

Comment 32 awaugama 2017-09-07 19:06:50 UTC
Dup -- QE will decide about automating the original