Bug 1643250 - EVM server won't start after restoring DB
Summary: EVM server won't start after restoring DB
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.10.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: GA
: 5.10.0
Assignee: Keenan Brock
QA Contact: Sudhir Mallamprabhakara
Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-10-25 18:13 UTC by Shveta
Modified: 2019-01-10 15:00 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-10 14:01:39 UTC
Category: ---
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Shveta 2018-10-25 18:13:28 UTC
Description of problem: Restore DB mentioned in BZ 
https://bugzilla.redhat.com/show_bug.cgi?id=1641012. 
Stop evm server to restore DB 


Version-Release number of selected component (if applicable):
5.10.0.21.20181023151612_4c11b0d

How reproducible:


Steps to Reproduce:
1. Create a 5.10 appliance .
2. Stop evm server to restore DB 
3. Follow these steps : 

echo "Stopping EVM Server"
systemctl stop evmserverd

echo "Dropping VMDB_PRODUCTION"
dropdb vmdb_production

echo "Creating VMDB_PRODUCTION"
createdb vmdb_production

echo "Restoring ${DbName}"
pg_restore -c -C -v -U root -d vmdb_production -j 6 ${DbName} 2>&1 | tee pg_restore.out

cd /var/www/miq/vmdb

echo "Executing FIX_AUTH"
bundle exec tools/fix_auth.rb --v2 --invalid bogus 2>&1 | tee fix_auth.out 

Update password :
bundle exec rails c
User.find_by_userid("admin").update_attributes(:password => "smartvm")

4. systemctl start evmserverd

Actual results:
Getting this error in restarting server 
 Started EVM server daemon.
Oct 25 14:07:39 dhcp-8-196-145.cfme2.lab.eng.rdu2.redhat.com abrt[47604]: detected unhandled Ruby exception in '/var/www/miq/vmdb/lib/workers/bin/evm_server.rb'
Oct 25 14:07:39 dhcp-8-196-145.cfme2.lab.eng.rdu2.redhat.com abrt[47604]: can't communicate with ABRT daemon, is it running? No such file or directory - connect(2) for /var/run/abrt/abrt.socket
Oct 25 14:07:40 dhcp-8-196-145.cfme2.lab.eng.rdu2.redhat.com systemd[1]: evmserverd.service: main process exited, code=exited, status=1/FAILURE
[root@dhcp-8-196-145 vmdb]# bundle exec bin/update


Expected results:


Additional info:

Appliance : 10.8.196.145

Comment 2 Shveta 2018-10-25 18:14:09 UTC
Same steps work in 5.9

Comment 3 Dave Johnson 2018-10-25 18:45:36 UTC
Please assess the impact of this issue and update the severity accordingly.  Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition.

If it's something like a tracker bug where it doesn't matter, please set the severity to Low.

Comment 4 Joe Rafaniello 2018-10-29 19:42:19 UTC
Hi Shveta,

I don't see the logs showing that error from Oct 25.  Does this fail every time? I'm not sure what's happening here but a prior BZ had the same abrt error and it had to do with starting the server after restoring a DB and missing some migrations.

https://bugzilla.redhat.com/show_bug.cgi?id=1624952

Please try to do this a few times and report back if it happens every time or only sometimes.

Comment 5 Shveta 2018-10-29 21:08:16 UTC
@joe - This is happening everytime .
I tried on two different appliances , restored DB and tried to restart server 4-5 times . Same result. Server won't start .

Comment 7 Joe Rafaniello 2018-10-29 21:30:36 UTC
We're failing to autoload a class, MiqAeMethodService::, with namespace and no class:

bundler: failed to load command: tools/fix_auth.rb (tools/fix_auth.rb)
ArgumentError: undefined class/module MiqAeMethodService::
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/class_loader.rb:54:in `path2class'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/class_loader.rb:54:in `resolve'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/class_loader.rb:46:in `find'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/class_loader.rb:28:in `load'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:396:in `resolve_class'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:208:in `visit_Psych_Nodes_Mapping'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:16:in `visit'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:6:in `accept'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:32:in `accept'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:338:in `block in revive_hash'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:336:in `each'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:336:in `each_slice'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:336:in `revive_hash'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:162:in `visit_Psych_Nodes_Mapping'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:16:in `visit'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:6:in `accept'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:32:in `accept'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:311:in `visit_Psych_Nodes_Document'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:16:in `visit'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:6:in `accept'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:32:in `accept'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych/nodes/node.rb:38:in `to_ruby'
  /opt/rh/rh-ruby23/root/usr/share/ruby/psych.rb:255:in `load'
  /var/www/miq/vmdb/tools/fix_auth/auth_config_model.rb:36:in `recrypt'
  /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:46:in `block in fix_passwords'
  /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:44:in `each'
  /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:44:in `fix_passwords'
  /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:85:in `block in run'


It appears to fail because the seetings have:

:instance_type: !ruby/object:MiqAeMethodService::MiqAeServiceManageIQ_Providers_Openstack_CloudManager_Flavor
  object: !ruby/object:ManageIQ::Providers::Openstack::CloudManager::Flavor

Comment 8 Joe Rafaniello 2018-10-29 21:31:56 UTC
Keenan believes these PRs could be related (note, it's meant for gaprindashvili backport (5.9) but hasn't been yet:

https://github.com/ManageIQ/manageiq/pull/17413

And for fine/5.8:
https://github.com/ManageIQ/manageiq/pull/17428

Comment 9 Keenan Brock 2018-10-29 21:52:19 UTC
Hello Shveta,

The underlying issue is the customer's automate scripts storing invalid values in the requests table. They are not the first customer to do this. The customer should check that their automate scripts are only storing numbers, strings, hashes, and lists in the miq_requests table.

The short term fix:

On the 5.10 appliance, is the --resilient option available for you?
If so, please use it and let me know any other errors you see.
If not, please let me know what date is on the appliance. It should be available on appliances for the past few months.

Comment 10 Shveta 2018-10-30 17:48:15 UTC
Hi Keenan , 

I have no idea how to use --resilient option. 
can you explain more with example command .
This is the appliance I am going to test - 10.8.197.127 .
I asked Joe too but he didn't know either .
Thanks, 
Shveta

Comment 11 Keenan Brock 2018-11-01 14:43:54 UTC
Hi Shveta,

I setup local vm and it had the option --allow-failures. Sorry, I forgot the had name changed.

It looks like you loaded a different database on that vm. After you restore the database, maybe you could try this command and let me know if that helps clear up the automate tables?


    # --v2 is only needed for older versions of cfme
    # --allow-failures tries extra efforts to find objects serialized to the automate requests table.
    # --invalid bogus will set all secrets stored with an unknown v2_key to the value "bogus"

    fix_auth --invalid bogus --allow-failures 2>&1 | tee fix_auth.out


Thank you,
Keenan

Comment 12 Shveta 2018-11-01 17:37:54 UTC
Hi Keenan , 
Tried that . It fixed the auth errors and command completed but evm server still won't start .
Check the pastebin here http://pastebin.test.redhat.com/665438.
Line 6840 onwards (bottom of the log)

Comment 14 Shveta 2018-11-05 18:23:24 UTC
Hi Keenan , 
Not sure which appliance you checked.
Please check 10.16.5.91 .


[root@10-16-5-91 ~]# systemctl start evmserverd 
[root@10-16-5-91 ~]# systemctl status evmserverd 
● evmserverd.service - EVM server daemon
   Loaded: loaded (/usr/lib/systemd/system/evmserverd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2018-11-05 13:21:07 EST; 21s ago
  Process: 7845 ExecStop=/bin/sh -c /bin/evmserver.sh stop (code=killed, signal=TERM)
  Process: 7966 ExecStart=/bin/sh -c /bin/evmserver.sh start (code=exited, status=0/SUCCESS)
 Main PID: 7996 (ruby)
   CGroup: /system.slice/evmserverd.service
           └─7996 MIQ Server

Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service failed.
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Starting EVM server daemon...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: ** CFME 5.10.0.22, codename: Hammer
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Starting EVM...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Running EVM in background...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Started EVM server daemon.
[root@10-16-5-91 ~]# service evmserverd status
Redirecting to /bin/systemctl status evmserverd.service
● evmserverd.service - EVM server daemon
   Loaded: loaded (/usr/lib/systemd/system/evmserverd.service; enabled; vendor preset: disabled)
   Active: deactivating (stop) (Result: exit-code) since Mon 2018-11-05 13:21:37 EST; 3s ago
  Process: 7966 ExecStart=/bin/sh -c /bin/evmserver.sh start (code=exited, status=0/SUCCESS)
 Main PID: 7996 (code=exited, status=1/FAILURE);         : 8145 (evmserver.sh)
   CGroup: /system.slice/evmserverd.service
           └─control
             ├─8145 /bin/bash /bin/evmserver.sh stop
             └─8150 /opt/rh/cfme-gemset/bin/rake evm:stop

Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service failed.
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Starting EVM server daemon...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: ** CFME 5.10.0.22, codename: Hammer
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Starting EVM...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Running EVM in background...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Started EVM server daemon.
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: detected unhandled Ruby exception in '/var/www/miq/vmdb/lib/workers/bin/evm_server.rb'
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: can't communicate with ABRT daemon, is it running? No such file or directory - connect(2) for /var/run/abrt/abrt.socket
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service: main process exited, code=exited, status=1/FAILURE
[root@10-16-5-91 ~]# systemctl status evmserverd 
● evmserverd.service - EVM server daemon
   Loaded: loaded (/usr/lib/systemd/system/evmserverd.service; enabled; vendor preset: disabled)
   Active: deactivating (stop) (Result: exit-code) since Mon 2018-11-05 13:21:37 EST; 12s ago
  Process: 7966 ExecStart=/bin/sh -c /bin/evmserver.sh start (code=exited, status=0/SUCCESS)
 Main PID: 7996 (code=exited, status=1/FAILURE);         : 8145 (evmserver.sh)
   CGroup: /system.slice/evmserverd.service
           └─control
             ├─8145 /bin/bash /bin/evmserver.sh stop
             └─8150 /opt/rh/cfme-gemset/bin/rake evm:stop

Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service failed.
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Starting EVM server daemon...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: ** CFME 5.10.0.22, codename: Hammer
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Starting EVM...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Running EVM in background...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Started EVM server daemon.
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: detected unhandled Ruby exception in '/var/www/miq/vmdb/lib/workers/bin/evm_server.rb'
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: can't communicate with ABRT daemon, is it running? No such file or directory - connect(2) for /var/run/abrt/abrt.socket
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service: main process exited, code=exited, status=1/FAILURE
[root@10-16-5-91 ~]# systemctl status evmserverd 
● evmserverd.service - EVM server daemon
   Loaded: loaded (/usr/lib/systemd/system/evmserverd.service; enabled; vendor preset: disabled)
   Active: deactivating (stop) (Result: exit-code) since Mon 2018-11-05 13:21:37 EST; 26s ago
  Process: 7966 ExecStart=/bin/sh -c /bin/evmserver.sh start (code=exited, status=0/SUCCESS)
 Main PID: 7996 (code=exited, status=1/FAILURE);         : 8145 (evmserver.sh)
   CGroup: /system.slice/evmserverd.service
           └─control
             ├─8145 /bin/bash /bin/evmserver.sh stop
             └─8150 /opt/rh/cfme-gemset/bin/rake evm:stop

Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service failed.
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Starting EVM server daemon...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: ** CFME 5.10.0.22, codename: Hammer
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Starting EVM...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Running EVM in background...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Started EVM server daemon.
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: detected unhandled Ruby exception in '/var/www/miq/vmdb/lib/workers/bin/evm_server.rb'
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: can't communicate with ABRT daemon, is it running? No such file or directory - connect(2) for /var/run/abrt/abrt.socket
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service: main process exited, code=exited, status=1/FAILURE

Comment 15 Keenan Brock 2018-11-06 16:34:45 UTC
In the logs, I noticed the following exception:

[----] E, [2018-11-06T03:39:50.406252 #15037:e68f84] ERROR -- : [ActiveModel::UnknownAttributeError]: unknown attribute 'resource_type' for MiqSchedule.  Method:[block (2 levels) in <class:LogProxy>]
[----] E, [2018-11-06T03:39:50.406461 #15037:e68f84] ERROR -- : /opt/rh/cfme-gemset/gems/activemodel-5.0.7/lib/active_model/attribute_assignment.rb:48:in `_assign_attribute'
activemodel-5.0.7/lib/active_model/attribute_assignment.rb:40:in `block in _assign_attributes'
...
app/models/miq_widget.rb:546:in `seed'

Which sounds like there is a missing attribute when running the seeds.
So that sounds like the database tables are not up to date.

I ran:

  vmdb
  rake db:migrate
  rake evm:db:seed & ; tail -f log/evm.log

I assume all went well when I saw "Synchronizing configuration complete..."


When you load a database from an older version of the application, please make sure you run into a new version of the application, you always need to run the migrations.

I think all is running. Would you please forward other problems my way.

Thank you

Comment 16 Shveta 2018-11-07 00:46:47 UTC
Works with db migrate option .
Thanks Keenan.


Note You need to log in before you can comment on or make changes to this bug.