Created attachment 1140313
logs of the failing job
Failed on stompTests.StompTests test_echo(4096, False) ERROR
Attached logs for further investigation.
Raising severity according to mailing list discussion:
I'd suggest investigating this a bit more since it could be hiding a serious issue.
I'm moving hosted-engine-setup from XMLRPC to JsonRPC and I'm facing
exactly this kind of issue: it seems that some requests got lost and I
just receive a JsonRpcNoResponseError after a long time.
The real issue is that my request never reached VDSM; it got lost
somehow in the queuing mechanism.
The issue you are seeing is very interesting: a message we add to a deque disappears the next time we check, and according to the log you provided there is no code accessing the deque. It happens only for one specific message; all the other messages work fine. Can you please gather logs so we can see what is really happening with it?
Adding needinfo on Simone.
According to the conversation I had with Simone, it seems that the message was not sent because vdsm was being restarted and, as a result, the connection was lost. I doubt that the two issues are connected.
Yes, the issue I was talking about was due to sending a message on an unconnected client.
By the way, I think that in that case quickly throwing an explicit exception instead of relying on the response timeout would really help in detecting it.
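A minimal sketch of the fail-fast behavior suggested above, assuming a client that tracks its own connection state. The names `SketchClient`, `ClientNotConnectedError`, and `_outbox` are hypothetical and are not taken from the vdsm code base; the point is only that `send()` checks the connection before queuing, so the caller gets an immediate error rather than a late JsonRpcNoResponseError.

```python
class ClientNotConnectedError(Exception):
    """Hypothetical error: a request was sent on an unconnected client."""


class SketchClient(object):
    """Hypothetical stand-in for a JSON-RPC client; not the vdsm API."""

    def __init__(self):
        self._connected = False
        self._outbox = []  # requests queued for sending

    def connect(self):
        self._connected = True

    def disconnect(self):
        self._connected = False

    def send(self, request):
        # Fail fast: refuse to queue a request that cannot be delivered,
        # instead of letting the caller wait for the response timeout.
        if not self._connected:
            raise ClientNotConnectedError(
                "cannot send %r: client is not connected" % (request,))
        self._outbox.append(request)
```

With a check like this, a request issued while the connection is down (for example during a vdsm restart) would fail at the call site and could be retried or reported right away.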
I agree, we are missing that. I will add this behavior soon.
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.
This bug was fixed and is slated to be in the upcoming version. As we
are focusing our testing at this phase on severe bugs, this bug was
closed without going through its verification step. If you think this
bug should be verified by QE, please set its severity to high and move
it back to ON_QA