Bug 669829

Summary: Snapshot load with DAEMON_LIST only changes doesn't always tell clients to restart
Product: Red Hat Enterprise MRG
Reporter: Robert Rati <rrati>
Component: wallaby
Assignee: Will Benton <willb>
Status: CLOSED ERRATA
QA Contact: Tomas Rusnak <trusnak>
Severity: medium
Docs Contact:
Priority: medium
Version: 1.3
CC: iboverma, ltrilety, matt, trusnak
Target Milestone: 1.3.2   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: wallaby-0.10.4-1
Doc Type: Bug Fix
Doc Text:
Loading a wallaby snapshot could have caused a pre-existing configuration for a certain node, at a certain version, to be hidden by an incorrectly-copied configuration for the default group at that same version. Additionally, under certain circumstances, this could have occurred even when a node had a configuration with the same version. Newly-created snapshots now include version information for each node. If a versioned configuration is available for a given node when a snapshot is loaded, the default group configuration is not copied over. Also, wallaby no longer duplicates default group configurations at a certain version for nodes that already have a configuration at that version. Also, wallaby version 0.10.4 and later identifies and safely deletes spurious versioned configurations upon service startup.
Last Closed: 2011-02-15 12:15:21 UTC
Attachments:
Description                                   Flags
grid_scale_2011-01-12_15:16:05_small_test2    none
grid_scale_baseline_2011-01-12                none
base-db.yaml                                  none

Description Robert Rati 2011-01-14 21:58:45 UTC
Created attachment 473601 [details]
grid_scale_2011-01-12_15:16:05_small_test2

Description of problem:
Wallaby does not tell configd which subsystems to restart or reconfigure when switching between two snapshots whose only changed parameter (among those attached to a subsystem) is DAEMON_LIST.  When loading the grid_scale_baseline_2011-01-12 snapshot and activating it with ccp (condor_configure_pool), configd receives an update notice and pulls a configuration, but when it asks for the affected subsystems it receives empty lists for restart and reconfig.
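
As a rough illustration of the behavior described above, the affected subsystems can be thought of as a function of which parameters differ between the old and new configurations. The sketch below is not wallaby's implementation: the mapping tables, the function names, the parameter values, and the MAX_STARTD_LOG entry are all assumptions made for illustration. Only the pairing of a DAEMON_LIST change with a master restart is taken from this report.

```python
# Hypothetical mapping from a changed parameter to the subsystem that must act
# on it. Only the DAEMON_LIST -> master restart pairing comes from this report;
# the reconfig entry is invented purely for illustration.
RESTART_ON_CHANGE = {"DAEMON_LIST": "master"}
RECONFIG_ON_CHANGE = {"MAX_STARTD_LOG": "startd"}


def changed_params(old, new):
    """Return the names of parameters whose values differ between two configs."""
    return sorted(k for k in set(old) | set(new) if old.get(k) != new.get(k))


def affected_subsystems(old, new):
    """Return (restart, reconfig) lists for the transition old -> new."""
    changed = changed_params(old, new)
    restart = sorted({RESTART_ON_CHANGE[p] for p in changed if p in RESTART_ON_CHANGE})
    # A subsystem that is restarted does not also need a reconfig.
    reconfig = sorted(
        {RECONFIG_ON_CHANGE[p] for p in changed if p in RECONFIG_ON_CHANGE}
        - set(restart)
    )
    return restart, reconfig


# Hypothetical parameter values; the real snapshots are attached to this bug.
baseline = {"DAEMON_LIST": "MASTER", "CONDOR_HOST": "cm.example.com"}
small_test = {"DAEMON_LIST": "MASTER, STARTD", "CONDOR_HOST": "cm.example.com"}

print(affected_subsystems(small_test, baseline))  # (['master'], [])
print(affected_subsystems(baseline, baseline))    # ([], [])
```

Under this model, an empty restart list for a DAEMON_LIST-only change means the two configurations being compared were effectively identical from the store's point of view.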

Test setup:
wallaby load grid_scale_baseline_2011-01-12
condor_configure_pool --take-snapshot grid_scale_baseline_2011-01-12
wallaby load grid_scale_2011-01-12_15:16:05_small_test2
condor_configure_pool --take-snapshot grid_scale_2011-01-12_15:16:05_small_test2

Version-Release number of selected component (if applicable):
wallaby-0.10.1-1

How reproducible:
100%

Steps to Reproduce:
1. condor_configure_pool --load-snapshot grid_scale_2011-01-12_15:16:05_small_test2
2. condor_configure_pool --activate
3. Wait until configd restarts and connects to wallaby
4. condor_configure_pool --load-snapshot grid_scale_baseline_2011-01-12
5. condor_configure_pool --activate
  
Actual results:
Log contains:
01/14 14:42:11 DEBUG: Received a NodeUpdatedNotice
01/14 14:42:11 DEBUG: The event is for this node
01/14 14:42:11 DEBUG: Checking version of configuration
01/14 14:42:11 DEBUG: Performing a checkin with the store
01/14 14:42:11 DEBUG: Checked in with the store
01/14 14:42:11 INFO: Retrieving configuration version "1295034130265672" from the store
01/14 14:42:13 INFO: Retrieved configuration from the store
01/14 14:42:14 DEBUG: Daemons to restart: []
01/14 14:42:14 DEBUG: Daemons to reconfig: []


Expected results:
01/14 14:42:11 DEBUG: Received a NodeUpdatedNotice
01/14 14:42:11 DEBUG: The event is for this node
01/14 14:42:11 DEBUG: Checking version of configuration
01/14 14:42:11 DEBUG: Performing a checkin with the store
01/14 14:42:11 DEBUG: Checked in with the store
01/14 14:42:11 INFO: Retrieving configuration version "1295034130265672" from the store
01/14 14:42:13 INFO: Retrieved configuration from the store
01/14 14:42:14 DEBUG: Daemons to restart: [master]
01/14 14:42:14 DEBUG: Daemons to reconfig: []
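
When checking many nodes, the two "Daemons to ..." lines can be extracted from the configd log mechanically. The following is a minimal sketch that assumes the log format shown above; the function name is made up, and the comma-separated handling of multiple daemons is an assumption, since the logs in this report never list more than one.

```python
import re

# Matches lines such as "01/14 14:42:14 DEBUG: Daemons to restart: [master]".
LINE_RE = re.compile(r"Daemons to (restart|reconfig): \[(.*)\]")


def parse_affected(log_text):
    """Return {'restart': [...], 'reconfig': [...]} from the last matching lines."""
    result = {"restart": [], "reconfig": []}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            kind, body = m.groups()
            # Assumption: several daemons would appear comma-separated.
            result[kind] = [d.strip() for d in body.split(",") if d.strip()]
    return result


actual = """01/14 14:42:14 DEBUG: Daemons to restart: []
01/14 14:42:14 DEBUG: Daemons to reconfig: []"""
expected = """01/14 14:42:14 DEBUG: Daemons to restart: [master]
01/14 14:42:14 DEBUG: Daemons to reconfig: []"""

print(parse_affected(actual))    # {'restart': [], 'reconfig': []}
print(parse_affected(expected))  # {'restart': ['master'], 'reconfig': []}
```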


Additional info:

Comment 1 Robert Rati 2011-01-14 21:59:34 UTC
Created attachment 473602 [details]
grid_scale_baseline_2011-01-12

Comment 2 Will Benton 2011-01-26 06:18:12 UTC
The cause of this bug was spurious versioned-configuration creation on snapshot load; i.e., when loading a new snapshot, the most recent default-group configuration would be copied to all nodes.  The fix includes several means to solve this problem:

(1) snapshots now include the last_updated_version for a node; if a versioned configuration contains a last_updated_version, we will verify that it exists in the snapshot database and, if so, set last_updated_version when reconstituting the node,
(2) snapshot loads of old-style snapshots (i.e. without versioning information in serialized nodes) will not spuriously duplicate the default node config, and
(3) the wallaby agent will losslessly identify and delete any spurious versioned configurations upon startup and when requested via API.
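
Points (1) and (2) can be sketched roughly as follows. This is a simplified model, not wallaby's code: the dictionary layout, the function name, and the default version of 0 are assumptions made for illustration. Only the field name last_updated_version and the "verify it exists, then set it" rule come from the description above.

```python
def reconstitute_node(serialized_node, versioned_configs):
    """Rebuild a node record from a snapshot entry.

    serialized_node:   dict with 'name' and, in new-style snapshots,
                       'last_updated_version'.
    versioned_configs: dict mapping (node_name, version) -> config dict,
                       standing in for the snapshot database.
    """
    # Assumption: 0 stands for "no versioned configuration known".
    node = {"name": serialized_node["name"], "last_updated_version": 0}
    version = serialized_node.get("last_updated_version")
    # (1) Restore the version only if the snapshot database really holds it.
    if version is not None and (node["name"], version) in versioned_configs:
        node["last_updated_version"] = version
    # (2) Old-style snapshots carry no version; nothing is copied from the
    # default group here, so no spurious versioned configuration appears.
    return node


db = {("node-a", 7): {"DAEMON_LIST": "MASTER"}}
print(reconstitute_node({"name": "node-a", "last_updated_version": 7}, db))
print(reconstitute_node({"name": "node-a", "last_updated_version": 9}, db))
print(reconstitute_node({"name": "node-b"}, db))
```

The point of the sketch is that loading a snapshot only reads the versioned-configuration store; it never writes a new entry on behalf of a node.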

To reproduce the initial buggy behavior, you can run "rake spec SPEC=spec/bz669829_spec.rb" from the wallaby source repository against an older wallaby release.

Comment 4 Will Benton 2011-01-27 16:07:23 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C:  Loading a wallaby snapshot could cause the versioned configuration for the default group to be copied as a new versioned configuration for individual nodes.  Under certain circumstances, this could happen even when the nodes had configurations with the same version.
C:  Users could observe that a pre-existing configuration for node N at version V was hidden by the spuriously-copied configuration for the default group at version V.
F:  Firstly, newly-created snapshots will include version information for each node; if a versioned configuration is available for a given node at snapshot load time, the default group configuration will not be copied over.  In addition, wallaby will not duplicate default group configurations at a version V for nodes that already have a configuration at version V.  Finally, wallaby versions 0.10.4 and above will identify and safely delete spurious versioned configurations on service startup.
R:  Users should no longer observe this incorrect behavior.  Spurious versioned configurations should be automatically, quickly, and safely removed from the snapshot database on wallaby startup.
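
The startup cleanup mentioned in the note could look something like the sketch below. The criterion used here (a node entry is spurious when it is an exact copy of the default group's entry at the same version) is an assumption chosen to make the example concrete, and it may differ from the test wallaby actually applies. The store layout, the DEFAULT key, and all function names are hypothetical.

```python
DEFAULT = "<default group>"  # hypothetical key for the default group's entries


def find_spurious(store):
    """Return (node, version) keys whose config merely duplicates the default
    group's config at the same version."""
    return sorted(
        (owner, version)
        for (owner, version), config in store.items()
        if owner != DEFAULT and store.get((DEFAULT, version)) == config
    )


def lookup(store, node, version):
    """Node-specific config if present, else the default group's."""
    return store.get((node, version), store.get((DEFAULT, version)))


def delete_spurious(store):
    """Remove spurious entries in place and return the keys removed."""
    removed = find_spurious(store)
    for key in removed:
        del store[key]
    return removed


store = {
    (DEFAULT, 2): {"DAEMON_LIST": "MASTER"},
    ("node-a", 2): {"DAEMON_LIST": "MASTER"},          # copy of the default
    ("node-b", 2): {"DAEMON_LIST": "MASTER, STARTD"},  # genuine node config
}
before = lookup(store, "node-a", 2)
print(delete_spurious(store))                # [('node-a', 2)]
print(lookup(store, "node-a", 2) == before)  # True
```

In this model the deletion loses nothing for node-a, because a lookup falls back to the default group's entry and returns the same configuration as before.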

Comment 6 Tomas Rusnak 2011-01-31 15:11:52 UTC
I tried to reproduce it on RHEL5/x86 with the following packages:

wallaby-0.9.18-2.el5
ruby-1.8.5-5.el5_4.8
rubygems-1.3.0 - not a part of RHEL5 - downloaded from rubyforge
bz669829_spec.rb - downloaded from wallaby repository

Rake was installed with gem. To run rake, the jeweler package must also be installed.

Jeweler cannot be installed with rubygems-1.3.0:

# gem install jeweler
ERROR:  Error installing jeweler:
	bundler requires RubyGems version >= 1.3.6

I can't use a newer rubygems because it depends on ruby-1.8.6. Upgrading ruby is possible, but doing so could break compatibility with wallaby.

This looks like a circular dependency problem, which prevents me from reproducing the bug with the provided spec file.

Could you please suggest another way to use rake and the spec files on stock RHEL5?

Comment 7 Will Benton 2011-01-31 21:52:25 UTC
The easiest way to reproduce this without installing a slew of dependencies is to follow these steps:

1.  Check out the entire wallaby repository (you'll need it).
2.  Delete the contents of the lib directory from your cloned repository.
3.  Install rubygems version 1.3.5
4.  Install rspec-1.2.9 ("gem install rspec --version '1.2.9'")
5.  From within the wallaby repository, run
   ruby -Ispec $(which spec) spec/bz669829_spec.rb -b

This should run against the wallaby libraries you have installed on your system via RPM.

Comment 8 Tomas Rusnak 2011-02-01 09:12:23 UTC
Thank you, Will. I tried with rubygems-1.3.5 and it seems to work. However, the file spec/base-db.yaml is missing from the wallaby repository; see the output:

# ruby -Ispec $(which spec) spec/bz669829_spec.rb -b
F..

1)
Errno::ENOENT in 'Mrg::Grid::Config::ConfigVersion should not add spurious versioned configs when loading a new snapshot'
No such file or directory - /root/wallaby/spec/base-db.yaml
/root/wallaby/spec/spec_helper.rb:67:in `initialize'
/root/wallaby/spec/spec_helper.rb:67:in `open'
/root/wallaby/spec/spec_helper.rb:67:in `dbtext'
/root/wallaby/spec/spec_helper.rb:55:in `reconstitute_db'
./spec/bz669829_spec.rb:13:
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_methods.rb:70:in `instance_eval'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_methods.rb:70:in `eval_each_fail_fast'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_methods.rb:70:in `each'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_methods.rb:70:in `eval_each_fail_fast'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_group_hierarchy.rb:17:in `run_before_each'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_methods.rb:103:in `run_before_each'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_methods.rb:124:in `before_each_example'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_methods.rb:39:in `execute'
/usr/lib/ruby/1.8/timeout.rb:48:in `timeout'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_methods.rb:37:in `execute'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_group_methods.rb:214:in `run_examples'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_group_methods.rb:212:in `each'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_group_methods.rb:212:in `run_examples'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_group_methods.rb:103:in `run'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/runner/example_group_runner.rb:23:in `run'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/runner/example_group_runner.rb:22:in `each'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/runner/example_group_runner.rb:22:in `run'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/runner/options.rb:151:in `run_examples'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/runner/command_line.rb:9:in `run'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/bin/spec:5:
/usr/bin/spec:19:in `load'
/usr/bin/spec:19:

Finished in 21.622807 seconds

3 examples, 1 failure

Could you please fix the spec file if a different YAML pool configuration file should be used, or add the correct one to the repository?

Comment 9 Will Benton 2011-02-01 13:42:40 UTC
Created attachment 476396 [details]
base-db.yaml

Comment 10 Will Benton 2011-02-01 13:43:33 UTC
Sorry, base-db.yaml is usually put in place by rake.  I've attached a copy.

Comment 11 Tomas Rusnak 2011-02-01 14:04:37 UTC
Reproduced with wallaby-0.9.18-2.el5 on RHEL5/x86

# ruby -Ispec $(which spec) spec/bz669829_spec.rb -b
F..

1)
'Mrg::Grid::Config::ConfigVersion should not add spurious versioned configs when loading a new snapshot' FAILED
expected: ["ALLOW_ADMINISTRATOR", "ALLOW_READ", "ALLOW_WRITE", "COLLECTOR_NAME", "CONDOR_DEVELOPERS", "CONDOR_DEVELOPERS_COLLECTOR", "CONDOR_HOST", "DAEMON_LIST", "MASTER", "MASTER.SEC_DEFAULT_AUTHENTICATION_METHODS", "MASTER_ADDRESS_FILE", "SEC_DEFAULT_AUTHENTICATION", "SEC_DEFAULT_AUTHENTICATION_METHODS", "SEC_DEFAULT_CRYPTO_METHODS", "SEC_DEFAULT_NEGOTIATION", "WALLABY_CONFIG_VERSION"],
     got: ["ALLOW_ADMINISTRATOR", "ALLOW_READ", "ALLOW_WRITE", "BENCHMARKTIMER", "COLLECTOR_NAME", "CONDOR_DEVELOPERS", "CONDOR_DEVELOPERS_COLLECTOR", "CONDOR_HOST", "CONSOLE_DEVICES", "CONTINUE", "DAEMON_LIST", "KILL", "MASTER", "MASTER.SEC_DEFAULT_AUTHENTICATION_METHODS", "MASTER_ADDRESS_FILE", "MAXJOBRETIREMENTTIME", "MAX_STARTD_LOG", "MAX_STARTER_LOG", "PERIODIC_CHECKPOINT", "PREEMPT", "RUNBENCHMARKS", "SEC_DEFAULT_AUTHENTICATION", "SEC_DEFAULT_AUTHENTICATION_METHODS", "SEC_DEFAULT_CRYPTO_METHODS", "SEC_DEFAULT_NEGOTIATION", "START", "STARTD", "STARTD_ADDRESS_FILE", "STARTD_ATTRS", "STARTD_DEBUG", "STARTD_JOB_EXPRS", "STARTD_LOG", "STARTER", "STARTER_DEBUG", "STARTER_LIST", "STARTER_LOG", "SUSPEND", "WALLABY_CONFIG_VERSION", "WANT_SUSPEND", "WANT_VACATE"] (using ==)
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/expectations/fail_with.rb:41:in `fail_with'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/matchers/operator_matcher.rb:39:in `fail_with_message'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/matchers/operator_matcher.rb:61:in `__delegate_operator'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/matchers/operator_matcher.rb:51:in `eval_match'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/matchers/operator_matcher.rb:29:in `=='
./spec/bz669829_spec.rb:50:
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_methods.rb:40:in `instance_eval'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_methods.rb:40:in `execute'
/usr/lib/ruby/1.8/timeout.rb:48:in `timeout'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_methods.rb:37:in `execute'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_group_methods.rb:214:in `run_examples'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_group_methods.rb:212:in `each'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_group_methods.rb:212:in `run_examples'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/example/example_group_methods.rb:103:in `run'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/runner/example_group_runner.rb:23:in `run'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/runner/example_group_runner.rb:22:in `each'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/runner/example_group_runner.rb:22:in `run'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/runner/options.rb:151:in `run_examples'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/lib/spec/runner/command_line.rb:9:in `run'
/usr/lib/ruby/gems/1.8/gems/rspec-1.2.9/bin/spec:5:
/usr/bin/spec:19:in `load'
/usr/bin/spec:19:

Finished in 30.549478 seconds

3 examples, 1 failure
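
To make the failure above easier to read, the two parameter lists can be compared as sets. The snippet below copies the names from the "expected" and "got" lines of the output; the variable names are arbitrary.

```python
# Parameter names copied from the "expected" and "got" lists in the failure output.
expected = """
ALLOW_ADMINISTRATOR ALLOW_READ ALLOW_WRITE COLLECTOR_NAME CONDOR_DEVELOPERS
CONDOR_DEVELOPERS_COLLECTOR CONDOR_HOST DAEMON_LIST MASTER
MASTER.SEC_DEFAULT_AUTHENTICATION_METHODS MASTER_ADDRESS_FILE
SEC_DEFAULT_AUTHENTICATION SEC_DEFAULT_AUTHENTICATION_METHODS
SEC_DEFAULT_CRYPTO_METHODS SEC_DEFAULT_NEGOTIATION WALLABY_CONFIG_VERSION
""".split()

got = """
ALLOW_ADMINISTRATOR ALLOW_READ ALLOW_WRITE BENCHMARKTIMER COLLECTOR_NAME
CONDOR_DEVELOPERS CONDOR_DEVELOPERS_COLLECTOR CONDOR_HOST CONSOLE_DEVICES
CONTINUE DAEMON_LIST KILL MASTER MASTER.SEC_DEFAULT_AUTHENTICATION_METHODS
MASTER_ADDRESS_FILE MAXJOBRETIREMENTTIME MAX_STARTD_LOG MAX_STARTER_LOG
PERIODIC_CHECKPOINT PREEMPT RUNBENCHMARKS SEC_DEFAULT_AUTHENTICATION
SEC_DEFAULT_AUTHENTICATION_METHODS SEC_DEFAULT_CRYPTO_METHODS
SEC_DEFAULT_NEGOTIATION START STARTD STARTD_ADDRESS_FILE STARTD_ATTRS
STARTD_DEBUG STARTD_JOB_EXPRS STARTD_LOG STARTER STARTER_DEBUG STARTER_LIST
STARTER_LOG SUSPEND WALLABY_CONFIG_VERSION WANT_SUSPEND WANT_VACATE
""".split()

extra = sorted(set(got) - set(expected))    # present only in the actual result
missing = sorted(set(expected) - set(got))  # present only in the expectation

print(len(expected), len(got), len(extra))  # 16 40 24
print(missing)                              # []
print(extra)
```

Every expected parameter is present; the failure consists entirely of 24 additional parameters, many of them startd- and starter-related by name, which is consistent with a configuration from a different source being returned for the node.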

Comment 12 Tomas Rusnak 2011-02-02 11:05:18 UTC
Retested on all supported platforms (RHEL5 x86 and x86_64) with wallaby-0.10.4-2.el5:

# ruby -Ispec $(which spec) spec/bz669829_spec.rb -b
...
Finished in 30.128564 seconds
3 examples, 0 failures

No regressions found.

>>> VERIFIED

Comment 13 Douglas Silas 2011-02-09 15:38:03 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1,3 @@
-C:  Loading a wallaby snapshot could cause the versioned configuration for the default group to be copied as a new versioned configuration for individual nodes.  Under certain circumstances, this could happen even when the nodes had configurations with the same version.
+Loading a wallaby snapshot could have caused a pre-existing configuration for a certain node, at a certain version, to be hidden by an incorrectly-copied configuration for the default group at that same version. Additionally, under certain circumstances, this could have occurred even when a node had a configuration with the same version.
-C:  Users could observe that a pre-existing configuration for node N at version V was hidden by the spuriously-copied configuration for the default group at version V.
+
-F:  Firstly, newly-created snapshots will include version information for each node; if a versioned configuration is available for a given node at snapshot load time, the default group configuration will not be copied over.  In addition, wallaby will not duplicate default group configurations at a version V for nodes that already have a configuration at version V.  Finally, wallaby versions 0.10.4 and above will identify and safely delete spurious versioned configurations on service startup.
+Newly-created snapshots now include version information for each node. If a versioned configuration is available for a given node when a snapshot is loaded, the default group configuration is not copied over. Also, wallaby no longer duplicates default group configurations at a certain version for nodes that already have a configuration at that version. Also, wallaby version 0.10.4 and later identifies and safely deletes spurious versioned configurations upon service startup.
-R:  Users should no longer observe this incorrect behavior.  Spurious versioned configurations should be automatically, quickly, and safely removed from the snapshot database on wallaby startup.

Comment 14 errata-xmlrpc 2011-02-15 12:15:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0217.html