Bug 962557

Summary: Issue with 65k message limit in qpid
Product: Red Hat OpenStack
Reporter: Mark McLoughlin <markmc>
Component: openstack-nova
Assignee: Russell Bryant <rbryant>
Status: CLOSED ERRATA
QA Contact: Attila Fazekas <afazekas>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.0
CC: ajeain, dallan, jkt, mlopes, ndipanov
Target Milestone: async
Keywords: OtherQA
Target Release: 4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-nova-2013.2-0.21.b3.el6ost
Doc Type: Bug Fix
Doc Text:
Under certain conditions, Compute could send a qpid message that was larger than the maximum size qpid was able to encode. The message, and the operation that depended on it, would then fail. This update removes the size limit in qpid message encoding. Consequently, operations that previously failed because of the qpid message size limit now succeed.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-20 00:02:40 UTC
Type: Bug

Description Mark McLoughlin 2013-05-13 20:59:30 UTC
See:

  https://bugs.launchpad.net/nova/+bug/1175808

  Qpid has a limitation where it cannot serialize a Python dict
  containing a string longer than 65535 characters. This can result
  in problems when making a conductor call that returns a large
  structure - for example, instance_get_all_by_host on one of my
  systems returns 38 instances, which when serialized as JSON is
  too long for Qpid to handle.

Sounds like an issue that is (a) only seen at scale and (b) specific to nova-conductor and, therefore, to Grizzly.
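
To check whether a given payload would run into this limit, something like the following could help. It is only a minimal sketch, not code from nova or oslo: `find_oversized_strings`, `QPID_MAX_STR_LEN`, and the sample payload are all hypothetical, and the only figure taken from the report is the 65535-character limit (65535 is the largest value an unsigned 16-bit length field can hold, which is presumably where the limit comes from).

```python
import json

# Longest string the report says Qpid can serialize inside a dict.
QPID_MAX_STR_LEN = 65535


def find_oversized_strings(obj, limit=QPID_MAX_STR_LEN, path="$"):
    """Return the paths of all strings in obj longer than limit characters."""
    if isinstance(obj, str):
        return [path] if len(obj) > limit else []
    if isinstance(obj, dict):
        items = ((f"{path}.{key}", value) for key, value in obj.items())
    elif isinstance(obj, (list, tuple)):
        items = ((f"{path}[{i}]", value) for i, value in enumerate(obj))
    else:
        # Numbers, None, booleans, etc. cannot exceed the string limit.
        return []
    found = []
    for child_path, value in items:
        found.extend(find_oversized_strings(value, limit, child_path))
    return found


# Hypothetical payload: 38 made-up instance records returned as one JSON string.
instances = [{"uuid": "0" * 32, "metadata": "x" * 2000} for _ in range(38)]
reply = {"result": json.dumps(instances), "failure": None}
oversized = find_oversized_strings(reply)
```

With this made-up data the JSON string under `result` is longer than 65535 characters, so `oversized` contains its path, even though no individual record is anywhere near the limit.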

Comment 3 Russell Bryant 2013-05-30 19:30:10 UTC
This isn't as bad as it looks at first. It doesn't really affect Grizzly unless it's receiving a message sent by Havana. The fix for Havana will likely require a Grizzly change, though, to make sure Grizzly can still understand Havana messages.

https://review.openstack.org/#/c/28711/
https://code.launchpad.net/bugs/1175808
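
The compatibility concern can be pictured with a small sketch. It assumes, purely for illustration, that a newer sender ships the whole payload as a single JSON string tagged with a content type, and that the receiver must accept both that form and the older plain-dict form. `Message`, `pack_json`, `unpack`, and `JSON_CONTENT_TYPE` are hypothetical names, and `Message` is a stand-in for a broker message (body plus content-type header), not the Qpid API or the oslo code.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical marker telling the receiver that the body is a JSON string.
JSON_CONTENT_TYPE = "application/json; charset=utf8"


@dataclass
class Message:
    """Stand-in for a broker message: a body plus a content-type header."""
    content: Any
    content_type: Optional[str] = None


def pack_json(payload):
    """Newer sender: serialize the whole payload into one JSON string body."""
    return Message(content=json.dumps(payload), content_type=JSON_CONTENT_TYPE)


def unpack(message):
    """Receiver: decode JSON bodies, pass old-style dict bodies through."""
    if message.content_type == JSON_CONTENT_TYPE:
        return json.loads(message.content)
    return message.content


# A new-style message carrying a string longer than 65535 characters,
# and an old-style message whose body is still a plain dict.
new_style = pack_json({"method": "ping", "args": {"blob": "x" * 70000}})
old_style = Message(content={"method": "ping", "args": {}})
```

A receiver without the content-type check in `unpack` would treat the JSON string as if it were a dict, which is the kind of breakage an older release would need a change to avoid.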

Comment 6 Russell Bryant 2013-07-23 20:40:00 UTC
Fixed upstream in havana:

commit 781a8f908cd3e5e69ff8b88d998fa93c48532e15
Author: Andrew Laski <andrew.laski>
Date:   Wed Jun 5 10:02:07 2013 -0400

    Update rpc/impl_qpid.py from oslo
    
    The current qpid driver cannot serialize objects containing strings
    longer than 65535 characters.  This just became a breaking issue when
    the message to scheduler_run_instance went over that limit.  The fix has
    been committed to oslo, so this just syncs it over to Nova.
    
    Bug 1175808
    Bug 1187595
    
    Change-Id: If95c11a7e03c81d89133f6cad0dcbb6d8acb8148

Comment 9 Attila Fazekas 2013-12-02 08:32:41 UTC
I was able to boot 120+ instances on the same hypervisor, one by one, without an ERROR and without any suspicious log messages.

Several related commands:
nova quota-update --cores -1 $TENANT
nova quota-update --ram -1 $TENANT
nova quota-update --instances -1 $TENANT
a=1; while nova boot server-$a --image cirros-0.3.1-x86_64-uec --flavor 42 --poll; do a=$((a+1)); done

packages:
openstack-nova-conductor-2013.2-5.el6ost.noarch
openstack-nova-scheduler-2013.2-5.el6ost.noarch

The new oslo rpc code is in the python-nova package.

Comment 12 errata-xmlrpc 2013-12-20 00:02:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html