Description of problem:
On some systems, update_yaml.rb can take quite a long time to run. Since the stickshift-facts cron job uses output redirection ('>') to write facts.yaml, and output redirection truncates the file immediately, facts.yaml can remain a 0-byte file for a long period of time. In addition, stickshift-facts does not check whether another instance is already running; if one is, it should just quietly exit (otherwise both will update the file, which can lead to problems).

-- Proposed fix for the output redirection problem --
/etc/cron.minutely/stickshift-facts should be updated to:

/usr/libexec/mcollective/update_yaml.rb > /etc/mcollective/facts.yaml.tmp && mv /etc/mcollective/facts.yaml.tmp /etc/mcollective/facts.yaml

Version-Release number of selected component (if applicable):
stickshift-mcollective-agent-0.0.5-1.el6_3.noarch

How reproducible:
Very (in PROD)

Steps to Reproduce:
1. Add lots of gears (~1000) to a machine.
2. Watch the /etc/mcollective/facts.yaml file.
3. Notice that it can remain a 0-length file for long periods (sometimes over a minute) when the stickshift-facts cron job runs.

Actual results:
facts.yaml can be a 0-length file for a long time.

Expected results:
facts.yaml should be a 0-length file for as short a period of time as possible.
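A minimal sketch (with temporary stand-in paths, not the real /etc/mcollective files) of the race this report describes, and of the temp-file-plus-rename pattern in the proposed fix. Opening a file for writing truncates it immediately, so readers can observe a 0-byte file until the slow generator finishes; rename(2) over the target on the same filesystem is atomic, so readers see either the old or the new contents, never an empty file:

```ruby
require 'tmpdir'

Dir.mktmpdir do |dir|
  facts = File.join(dir, 'facts.yaml')   # stand-in for /etc/mcollective/facts.yaml
  File.write(facts, "old: contents\n")

  # Unsafe pattern (what '> facts.yaml' does): the target is truncated
  # the moment it is opened, long before any output is produced.
  File.open(facts, 'w') do |f|
    puts "size during slow write: #{File.size(facts)}"  # 0 bytes here
    f.write("new: contents\n")                          # arrives much later
  end

  # Safe pattern (the proposed fix): generate into a temp file, then
  # rename it over the target in one atomic step.
  tmp = facts + '.tmp'
  File.write(tmp, "newer: contents\n")
  File.rename(tmp, facts)                               # atomic replace
  puts "size after atomic update: #{File.size(facts)}"
end
```

Note the atomicity only holds when the temp file is on the same filesystem as the target, which is why the fix writes facts.yaml.tmp next to facts.yaml rather than in /tmp.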
Submitted https://github.com/openshift/crankcase/pull/259
When will the change be merged into the build? Checked on devenv_1912; it is still not merged.

=========
[root@ip-10-28-203-164 bin]# cat /usr/libexec/mcollective/update_yaml.rb
#!/bin/env ruby
require 'facter'
require 'yaml'

puts YAML.dump(Facter.to_hash)
[root@ip-10-28-203-164 bin]# cat /etc/cron.minutely/stickshift-facts
#!/bin/bash
/usr/libexec/mcollective/update_yaml.rb > /etc/mcollective/facts.yaml
New pull request has been sent - https://github.com/openshift/crankcase/pull/286
Verified this bug on devenv_1920, and PASS.

1. Create a lot of gears. Here I copied many gear dirs (~1000) into /var/lib/stickshift, and added memory and CPU stress via the stress tool to simulate a loaded environment (stress --vm 1 --vm-bytes 1G --vm-keep -c 10).
2. Open a terminal to watch the size of the /etc/mcollective/facts.yaml file:
# while :; do ls -l /etc/mcollective/facts.yaml; done
3. Run the stickshift-facts cron job manually:
# time /usr/libexec/mcollective/update_yaml.rb /etc/mcollective/facts.yaml
4. While the script is running, watch the terminal opened in step 2: the size of /etc/mcollective/facts.yaml is never 0.
5. While the script is running, try to run it again:
# ps -ef|grep yaml
root 17987 17982 5 01:20 ? 00:00:00 ruby /usr/libexec/mcollective/update_yaml.rb /etc/mcollective/facts.yaml
root 18046 3528 0 01:20 pts/0 00:00:00 grep yaml
# time /usr/libexec/mcollective/update_yaml.rb /etc/mcollective/facts.yaml
Script /usr/libexec/mcollective/update_yaml.rb is already running

real 0m0.969s
user 0m0.042s
sys 0m0.017s
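The transcript above shows the patched script taking the output path as an argument and refusing to start while another instance is running. A hedged sketch of how such a script could behave, using a non-blocking flock for single-instance detection and a temp-file rename for the atomic write; this reconstructs the observed behavior only, and the actual implementation in crankcase PR #286 may differ (the lock-file name and hash contents here are assumptions):

```ruby
#!/bin/env ruby
require 'yaml'

# Writes +data+ as YAML to +output_path+ atomically. Returns false
# (after printing a message, as seen in the transcript) if another
# instance already holds the lock.
def update_yaml(output_path, data)
  lock_path = output_path + '.lock'   # hypothetical lock-file name
  File.open(lock_path, File::CREAT | File::RDWR) do |lock|
    # LOCK_NB makes flock return false instead of blocking when the
    # lock is already held by another process.
    unless lock.flock(File::LOCK_EX | File::LOCK_NB)
      puts "Script #{$0} is already running"
      return false
    end
    tmp = output_path + '.tmp'
    File.write(tmp, YAML.dump(data))
    File.rename(tmp, output_path)     # atomic replace; never 0 bytes
    true
  end
end

if __FILE__ == $0 && ARGV[0]
  # The real script would dump Facter.to_hash; a literal hash keeps
  # this sketch self-contained.
  update_yaml(ARGV[0], 'architecture' => 'x86_64')
end
```

The kernel releases the flock automatically when the process exits, so a crashed run cannot leave a stale lock behind the way a pid file would.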