Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1321325

Summary: stompTests.StompTests test_echo(4096, False) ERROR
Product: [oVirt] vdsm
Reporter: Sandro Bonazzola <sbonazzo>
Component: Core
Assignee: Piotr Kliczewski <pkliczew>
Status: CLOSED CURRENTRELEASE
QA Contact: Pavel Stehlik <pstehlik>
Severity: high
Docs Contact:
Priority: unspecified
Version: ---
CC: bugs, lsvaty, mperina, pkliczew, s.kieske, stirabos
Target Milestone: ovirt-4.0.0-beta
Keywords: Automation
Target Release: 4.17.999
Flags: mperina: ovirt-4.0.0?, rule-engine: planning_ack+, mperina: devel_ack+, rule-engine: testing_ack?
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-26 11:07:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
logs of the failing job (flags: none)

Description Sandro Bonazzola 2016-03-25 13:40:49 UTC
Created attachment 1140313
logs of the failing job

http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc23-x86_64/4389/

Failed on stompTests.StompTests test_echo(4096, False) ERROR

Attached logs for further investigation.

Comment 1 Sandro Bonazzola 2016-04-29 07:44:35 UTC
Raising severity according to mailing list discussion:

Simone:
I'd suggest investigating this a bit more since it could hide a serious issue.
I'm moving hosted-engine-setup from XMLRPC to JsonRPC and I'm facing
exactly this kind of issue: it seems that some requests get lost and I
just receive a JsonRpcNoResponseError after a long time.
The real issue is that my request never reached VDSM, getting lost
somehow in the queuing mechanism.

Piotr:
Simone,
The issue you are seeing is very interesting: a message we add to a deque has disappeared the next time we check, and according to the log you provided there is no code accessing the deque. It happens only for one specific message. All the other messages work OK. Can you please gather logs so we can see what is really happening with it?

Adding needinfo on Simone.

Comment 2 Piotr Kliczewski 2016-04-29 07:58:50 UTC
According to the conversation that I had with Simone, it seems that the message was not sent because vdsm was being restarted and, as a result, the connection was lost. I doubt that the two issues are connected.

Comment 3 Simone Tiraboschi 2016-04-29 08:04:27 UTC
Yes, the issue I was talking about was due to sending a message on an unconnected client.
By the way, I think that in that case quickly throwing an explicit exception instead of relying on the response timeout could really help in detecting it.

Comment 4 Piotr Kliczewski 2016-04-29 08:07:54 UTC
I agree, we are missing that. I will add this behavior soon.
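
For illustration only, the fail-fast behavior discussed in comments 3 and 4 could look roughly like the sketch below. All names here (SketchClient, NotConnectedError, send) are hypothetical and are not vdsm's actual JSON-RPC client API; the point is only that send() checks the connection state and raises immediately instead of queuing a request that would later surface as a response timeout.

```python
import collections


class NotConnectedError(Exception):
    """Raised when a request is sent on a client with no live connection."""


class SketchClient:
    """Hypothetical client; names are illustrative, not vdsm's API."""

    def __init__(self):
        self._connected = False
        self._outbox = collections.deque()

    def connect(self):
        self._connected = True

    def disconnect(self):
        self._connected = False

    def send(self, request):
        # Fail fast: do not queue a request that can never be delivered.
        if not self._connected:
            raise NotConnectedError("client is not connected")
        self._outbox.append(request)
        # Return the number of queued requests so callers can inspect it.
        return len(self._outbox)
```

With a check like this, a caller that sends after a vdsm restart gets an explicit error right away rather than a JsonRpcNoResponseError after a long wait.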

Comment 5 Sandro Bonazzola 2016-05-02 09:48:33 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 6 Lukas Svaty 2016-07-26 11:07:11 UTC
This bug was fixed and is slated to be in the upcoming version. As we
are focusing our testing at this phase on severe bugs, this bug was
closed without going through its verification step. If you think this
bug should be verified by QE, please set its severity to high and move
it back to ON_QA.