Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1751520

Summary: [logging] filter "RPC call" logs
Product: [oVirt] vdsm Reporter: Germano Veit Michel <gveitmic>
Component: CoreAssignee: Marcin Sobczyk <msobczyk>
Status: CLOSED CURRENTRELEASE QA Contact: Petr Matyáš <pmatyas>
Severity: low Docs Contact:
Priority: low    
Version: 4.30.0CC: bugs, lsurette, michal.skrivanek, mperina, srevivo, ycui
Target Milestone: ovirt-4.4.3Flags: pm-rhel: ovirt-4.4+
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: vdsm-4.40.35 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-22 14:32:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Germano Veit Michel 2019-09-12 06:04:13 UTC
Description of problem:

VDSM logs are full of lines such as below:
2019-09-12 15:50:48,950+1000 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getStats succeeded in 0.02 seconds (__init__:312)

In fact, 19.3% of all lines in vdsm.log are messages such as above:

# wc -l /var/log/vdsm/vdsm.log
6839 /var/log/vdsm/vdsm.log

# grep "succeeded in" /var/log/vdsm/vdsm.log | wc -l
1321

This is not really useful and producing too much noise, it is obvious to tell if these calls were quick:

2019-09-12 15:50:49,637+1000 INFO  (jsonrpc/3) [api.host] START getAllVmStats() from=::1,38554 (api:48)
2019-09-12 15:50:49,638+1000 INFO  (jsonrpc/3) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,38554 (api:54)
2019-09-12 15:50:49,638+1000 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)

I would suggest to log "RPC call...." only if:
* response is error
OR
* elapsed time is higher than N seconds

Note a low value of N would deal with most of these calls easily:

# grep "succeeded in" /var/log/vdsm/vdsm.log | awk -F ' ' '{print $11,$12}' | sort -n -k 1 | uniq -c
    900 0.00 seconds
     96 0.01 seconds
    424 0.02 seconds
     35 0.03 seconds
      1 0.04 seconds

    def _serveRequest(self, ctx, req):
        start_time = monotonic_time()
        response = self._handle_request(req, ctx)
        error = getattr(response, "error", None)
        if error is None:
            response_log = "succeeded"
        else:
            response_log = "failed (error %s)" % (error.code,)
        self.log.info("RPC call %s %s in %.2f seconds",
                      req.method, response_log, monotonic_time() - start_time)
        if response is not None:
            ctx.requestDone(response)

Comment 1 Germano Veit Michel 2019-09-12 06:17:58 UTC
From loaded hypervisor with very old, slow hardware and storage, almost all under 0.1s.

# grep "succeeded in" /var/log/vdsm/vdsm.log | awk -F ' ' '{print $11,$12}' | sort -n -k 1 | uniq -c
    820 0.00 seconds
    131 0.01 seconds
     44 0.02 seconds
      3 0.03 seconds
     22 0.04 seconds
    104 0.05 seconds
     25 0.06 seconds
      3 0.07 seconds
      1 0.11 seconds
      6 3.29 seconds
     15 3.30 seconds
     21 3.31 seconds
     14 3.32 seconds
     10 3.33 seconds
     13 3.34 seconds
      9 3.35 seconds
      2 3.36 seconds
      2 3.37 seconds
      1 3.40 seconds
      1 3.41 seconds

Comment 2 Michal Skrivanek 2019-09-13 05:01:07 UTC
0.1s sounds like a good threshold then

Comment 4 Sandro Bonazzola 2020-11-11 06:45:32 UTC
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.