Bug 841681

Summary: /etc/cron.minutely/stickshift-facts wipes out the facts file for long periods of time
Product: OKD
Reporter: Thomas Wiest <twiest>
Component: Containers
Assignee: Mrunal Patel <mpatel>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Priority: high
Version: 2.x
CC: bmeng, jialiu
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: devenv_1920
Doc Type: Bug Fix
Last Closed: 2012-08-07 20:42:33 UTC
Type: Bug

Description Thomas Wiest 2012-07-19 20:37:42 UTC
Description of problem:

On some systems, update_yaml.rb can take a long time to run. stickshift-facts writes facts.yaml using shell output redirection ('>'), which truncates the file as soon as the command starts. As a result, facts.yaml stays 0 bytes until update_yaml.rb writes its output, which can take a long time.

Also, stickshift-facts does not check whether another stickshift-facts instance is already running. If one is, the new instance should exit quietly. Otherwise both processes write the file at the same time, which can leave it inconsistent.
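A minimal sketch of the "exit quietly if another instance is running" check, assuming the script takes an exclusive, non-blocking flock(2) lock on a lock file. The helper name `with_single_instance` and the lock path are hypothetical. This is not necessarily how the pull requests in this bug implement the check.

```ruby
require 'tmpdir'

# Hypothetical helper (not taken from the bug or its pull requests): run the
# block only if no other process holds an exclusive flock(2) lock on lock_path.
# Returns true if the block ran, false if another instance holds the lock.
def with_single_instance(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    # With LOCK_NB, flock returns false instead of blocking when the lock
    # is already held through another open file description.
    return false unless lock.flock(File::LOCK_EX | File::LOCK_NB)

    yield
    true
  end # File.open closes the file here, which releases the lock
end

# Example use in the cron job; the lock path is a hypothetical choice.
lock_path = File.join(Dir.tmpdir, 'stickshift-facts.lock')
ran = with_single_instance(lock_path) do
  # regenerate facts.yaml here
end
# If ran is false, another instance is already running: fall through and
# exit quietly without touching facts.yaml.
```

flock locks are advisory. They only exclude processes that also try to take the same lock, which holds here because every run of the cron job would go through the same helper.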


-- Proposed fix for the output redirection problem --
/etc/cron.minutely/stickshift-facts should be updated to be:
/usr/libexec/mcollective/update_yaml.rb > /etc/mcollective/facts.yaml.tmp && mv /etc/mcollective/facts.yaml.tmp /etc/mcollective/facts.yaml
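The proposed command works because rename(2) is atomic within a single filesystem. Readers of facts.yaml see either the old complete file or the new complete file, never an empty one. A minimal Ruby sketch of the same write-to-temp-then-rename pattern follows. The helper name `atomic_write_yaml` is hypothetical, and the `facts` hash is a placeholder standing in for `Facter.to_hash` so that the example runs without Facter.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper mirroring the proposed cron change: write the YAML to
# "<path>.tmp" in the same directory, then rename it over path. rename(2)
# within one filesystem is atomic, so readers of path see either the old
# complete file or the new complete file, never an empty one.
def atomic_write_yaml(path, data)
  tmp_path = "#{path}.tmp"
  File.open(tmp_path, 'w') do |f|
    f.write(YAML.dump(data))
    f.fsync # push the data to disk before it becomes visible under path
  end
  File.rename(tmp_path, path)
end

# Placeholder data standing in for Facter.to_hash so the example runs
# without Facter installed.
facts = { 'example_fact' => 'example_value' }
atomic_write_yaml(File.join(Dir.tmpdir, 'facts.yaml'), facts)
```

As in the proposed shell command, the temporary file name is fixed. Two overlapping runs would therefore write the same facts.yaml.tmp, which is another reason to combine this pattern with the single-instance check the bug also asks for.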



Version-Release number of selected component (if applicable):
stickshift-mcollective-agent-0.0.5-1.el6_3.noarch


How reproducible:
Very reproducible in PROD.


Steps to Reproduce:
1. Add many gears (around 1000) to a machine.
2. Watch the /etc/mcollective/facts.yaml file.
3. Notice that the file can stay 0 bytes for long periods (sometimes over a minute) while the stickshift-facts cron job runs.

  
Actual results:
facts.yaml can stay 0 bytes for a long time.


Expected results:
facts.yaml should be 0 bytes for as short a time as possible.

Comment 1 Mrunal Patel 2012-07-23 18:59:45 UTC
Submitted https://github.com/openshift/crankcase/pull/259

Comment 2 Meng Bo 2012-07-26 02:41:51 UTC
When will the change be merged into the build?

Checked on devenv_1912; it is still not merged.

=========

[root@ip-10-28-203-164 bin]# cat /usr/libexec/mcollective/update_yaml.rb
#!/bin/env ruby

require 'facter'
require 'yaml'

puts YAML.dump(Facter.to_hash)


[root@ip-10-28-203-164 bin]# cat /etc/cron.minutely/stickshift-facts
#!/bin/bash
/usr/libexec/mcollective/update_yaml.rb > /etc/mcollective/facts.yaml

Comment 3 Mrunal Patel 2012-07-27 00:49:05 UTC
New pull request has been sent -
https://github.com/openshift/crankcase/pull/286

Comment 4 Johnny Liu 2012-07-30 05:21:09 UTC
Verified this bug on devenv_1920: PASS.

1. Create many gears. Here I copied about 1000 gear directories into /var/lib/stickshift and increased memory and CPU load with the stress tool to simulate a busy environment (stress --vm 1 --vm-bytes 1G --vm-keep -c 10).
2. Open a terminal to watch the size of the /etc/mcollective/facts.yaml file.
# while :; do ls -l /etc/mcollective/facts.yaml; done
3. Run the stickshift-facts cron job manually.
# time /usr/libexec/mcollective/update_yaml.rb /etc/mcollective/facts.yaml
4. While the script is running, watch the terminal opened in step 2: the size of /etc/mcollective/facts.yaml never drops to 0.
5. While the script is running, try to run it again.
# ps -ef|grep yaml
root     17987 17982  5 01:20 ?        00:00:00 ruby /usr/libexec/mcollective/update_yaml.rb /etc/mcollective/facts.yaml
root     18046  3528  0 01:20 pts/0    00:00:00 grep yaml

# time /usr/libexec/mcollective/update_yaml.rb /etc/mcollective/facts.yaml
Script /usr/libexec/mcollective/update_yaml.rb is already running

real	0m0.969s
user	0m0.042s
sys	0m0.017s