Bug 1972701
| Summary: | Stalld running not running as fifo | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | browsell |
| Component: | Performance Addon Operator | Assignee: | Martin Sivák <msivak> |
| Status: | CLOSED ERRATA | QA Contact: | Gowrishankar Rajaiyan <grajaiya> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.8 | CC: | aos-bugs, fsimonce, grajaiya, keyoung, msivak, shajmakh |
| Target Milestone: | --- | | |
| Target Release: | 4.8.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-26 14:52:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1973237 | | |
| Bug Blocks: | 1970940 | | |

Comment 1, Martin Sivák, 2021-06-16 12:55:18 UTC

Oh.. I think I know what happened. RHCOS 8.4 includes stalld and systemd picked up the unit shipped with it instead of the unit that NTO creates.

Jirka: Where do you install the systemd unit that you install via NTO?

(In reply to Martin Sivák from comment #1)
> That is weird. I see NTO should be using this to start stalld:
>
> ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld --systemd $CLIST $AGGR
> $BP $BR $BD $THRESH $LOGGING $FG $PF

Where do you see this, Martine?

    sh-4.4# grep ExecStart= /usr/lib/systemd/system/stalld.service
    ExecStart=/usr/bin/stalld --systemd $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING $FG $PF

NTO no longer ships the stalld unit files as of:
https://github.com/openshift/cluster-node-tuning-operator/pull/226

The CoreOS-shipped stalld.service file is now used, and that one seems to be missing the "/usr/bin/chrt -f 10", as pointed out by Brent.

(In reply to Martin Sivák from comment #3)
> Jirka: Where do you install the systemd unit that you install via NTO?

Again, NTO no longer installs any systemd stalld unit files; it relies on the CoreOS-provided ones.

I found it here: https://github.com/openshift/cluster-node-tuning-operator/blob/master/pkg/tuned/host_payload.go#L80

Fixed in 4.9.0-0.nightly-2021-06-18-002931 and above. The next OCP 4.8 nightly should also have the fix, as https://github.com/openshift/cluster-node-tuning-operator/pull/237 merged a while ago.

Thanks Jirka!

Verifying the bug fix on:

    oc version
    Client Version: 4.8.0-0.nightly-2021-06-22-192915
    Server Version: 4.8.0-0.nightly-2021-06-22-192915
    Kubernetes Version: v1.21.0-rc.0+120883f

    oc get csv
    NAME                                DISPLAY                      VERSION   REPLACES   PHASE
    performance-addon-operator.v4.8.0   Performance Addon Operator   4.8.0                Succeeded

Verify that stalld now runs as SCHED_FIFO:

    ps -ef | grep stalld
    root 7294 1 0 14:16 ? 00:00:00 /usr/local/bin/stalld --systemd -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid

    systemctl status stalld
    # Write a pidfile
    # ex: PF=--pidfile /run/stalld.pid
    Environment=PF="--pidfile /run/stalld.pid"
    ExecStartPre=/usr/local/bin/throttlectl.sh off
    ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld --systemd $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING >
    ExecStopPost=/usr/local/bin/throttlectl.sh on
    Restart=always
    User=root

As can be seen above, the stalld binary shipped by NTO is now used, and it runs under the FIFO scheduler (the chrt flag for FIFO is -f) with priority 10.

Following comment 14, retrieving the scheduling attributes of the stalld PID gives:

    chrt -ap 7294
    pid 7294's current scheduling policy: SCHED_FIFO
    pid 7294's current scheduling priority: 10

This verifies that the scheduling policy is SCHED_FIFO.
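
The two checks used in the verification above (does the unit's ExecStart wrap stalld in `chrt -f <prio>`, and does the running PID report SCHED_FIFO) could be scripted roughly as follows. This is only a sketch, not anything shipped by NTO or stalld: the helper names `fifo_priority_from_execstart` and `describe_scheduling` are made up for illustration, the ExecStart parser assumes the priority directly follows `-f`/`--fifo` as in the unit quoted in this bug, and the live check relies on Python's Linux-only `os.sched_getscheduler` and `os.sched_getparam`.

```python
import os
import shlex

# The two ExecStart lines quoted in this bug: the CoreOS-shipped unit
# (no chrt wrapper) and the unit seen after the fix.
SHIPPED_UNIT_LINE = (
    "ExecStart=/usr/bin/stalld --systemd $CLIST $AGGR $BP $BR $BD "
    "$THRESH $LOGGING $FG $PF"
)
FIXED_UNIT_LINE = (
    "ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld --systemd $CLIST "
    "$AGGR $BP $BR $BD $THRESH $LOGGING $FG $PF"
)

# Map numeric policies to names, for whichever constants this platform has.
POLICY_NAMES = {
    getattr(os, name): name
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR", "SCHED_BATCH", "SCHED_IDLE")
    if hasattr(os, name)
}


def fifo_priority_from_execstart(line):
    """Return the FIFO priority if the ExecStart line wraps its command in
    `chrt -f <prio>`, else None. Hypothetical helper; simplified on purpose:
    it ignores systemd ExecStart prefixes such as '-' or '@'."""
    key, _, command = line.partition("=")
    if key.strip() != "ExecStart":
        return None
    argv = shlex.split(command)
    if not argv or os.path.basename(argv[0]) != "chrt":
        return None
    for i, arg in enumerate(argv[1:], start=1):
        # Assumes the priority is the token right after the FIFO flag.
        if arg in ("-f", "--fifo") and i + 1 < len(argv) and argv[i + 1].isdigit():
            return int(argv[i + 1])
    return None


def describe_scheduling(pid):
    """Return (policy_name, priority) for a PID, similar in spirit to
    `chrt -p <pid>`. Linux-only; pid 0 means the calling process."""
    policy = os.sched_getscheduler(pid)
    # The kernel may OR in the reset-on-fork flag; strip it before lookup.
    policy &= ~getattr(os, "SCHED_RESET_ON_FORK", 0)
    priority = os.sched_getparam(pid).sched_priority
    return POLICY_NAMES.get(policy, str(policy)), priority


if __name__ == "__main__":
    print("shipped unit:", fifo_priority_from_execstart(SHIPPED_UNIT_LINE))
    print("fixed unit:  ", fifo_priority_from_execstart(FIXED_UNIT_LINE))
    if hasattr(os, "sched_getscheduler"):
        # Replace 0 with the stalld PID (7294 in the comment above) on a node.
        print("this process:", describe_scheduling(0))
```

On a node, passing the stalld PID to `describe_scheduling` should agree with what `chrt -ap` prints for the main thread; note that `chrt -a` also lists every thread of the process, which this sketch does not attempt.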