Bug 1643250
| Summary: | EVM server won't start after restoring DB | ||
|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Shveta <sshveta> |
| Component: | Appliance | Assignee: | Keenan Brock <kbrock> |
| Status: | CLOSED NOTABUG | QA Contact: | Sudhir Mallamprabhakara <smallamp> |
| Severity: | high | Docs Contact: | Red Hat CloudForms Documentation <cloudforms-docs> |
| Priority: | high | ||
| Version: | 5.10.0 | CC: | abellott, dmetzger, hkataria, jrafanie, kbrock, lavenel, mfeifer, mpovolny, obarenbo, simaishi, sshveta |
| Target Milestone: | GA | ||
| Target Release: | 5.10.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-01-10 14:01:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | CFME Core | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Shveta
2018-10-25 18:13:28 UTC
Same steps work in 5.9 Please assess the impact of this issue and update the severity accordingly. Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition. If it's something like a tracker bug where it doesn't matter, please set the severity to Low. Hi Shveta, I don't see the logs showing that error from Oct 25. Does this fail every time? I'm not sure what's happening here but a prior BZ had the same abrt error and it had to do with starting the server after restoring a DB and missing some migrations. https://bugzilla.redhat.com/show_bug.cgi?id=1624952 Please try to do this a few times and report back if it happens every time or only sometimes. @joe - This is happening everytime . I tried on two different appliances , restored DB and tried to restart server 4-5 times . Same result. Server won't start . We're failing to autoload a class, MiqAeMethodService::, with namespace and no class: bundler: failed to load command: tools/fix_auth.rb (tools/fix_auth.rb) ArgumentError: undefined class/module MiqAeMethodService:: /opt/rh/rh-ruby23/root/usr/share/ruby/psych/class_loader.rb:54:in `path2class' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/class_loader.rb:54:in `resolve' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/class_loader.rb:46:in `find' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/class_loader.rb:28:in `load' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:396:in `resolve_class' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:208:in `visit_Psych_Nodes_Mapping' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:16:in `visit' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:6:in `accept' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:32:in `accept' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:338:in `block in revive_hash' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:336:in `each' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:336:in `each_slice' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:336:in `revive_hash' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:162:in `visit_Psych_Nodes_Mapping' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:16:in `visit' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:6:in `accept' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:32:in `accept' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:311:in `visit_Psych_Nodes_Document' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:16:in `visit' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/visitor.rb:6:in `accept' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/visitors/to_ruby.rb:32:in `accept' /opt/rh/rh-ruby23/root/usr/share/ruby/psych/nodes/node.rb:38:in `to_ruby' /opt/rh/rh-ruby23/root/usr/share/ruby/psych.rb:255:in `load' /var/www/miq/vmdb/tools/fix_auth/auth_config_model.rb:36:in `recrypt' /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:46:in `block in fix_passwords' /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:44:in `each' /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:44:in `fix_passwords' /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:85:in `block in run' It appears to fail because the seetings have: :instance_type: !ruby/object:MiqAeMethodService::MiqAeServiceManageIQ_Providers_Openstack_CloudManager_Flavor object: !ruby/object:ManageIQ::Providers::Openstack::CloudManager::Flavor Keenan believes these PRs could be related (note, it's meant for gaprindashvili backport (5.9) but hasn't been yet: https://github.com/ManageIQ/manageiq/pull/17413 And for fine/5.8: https://github.com/ManageIQ/manageiq/pull/17428 Hello Shveta, The underlying issue is the customer's automate scripts storing invalid values in the requests table. They are not the first customer to do this. The customer should check that their automate scripts are only storing numbers, strings, hashes, and lists in the miq_requests table. The short term fix: On the 5.10 appliance, is the --resilient option available for you? If so, please use it and let me know any other errors you see. If not, please let me know what date is on the appliance. It should be available on appliances for the past few months. Hi Keenan , I have no idea how to use --resilient option. can you explain more with example command . This is the appliance I am going to test - 10.8.197.127 . I asked Joe too but he didn't know either . Thanks, Shveta Hi Shveta,
I setup local vm and it had the option --allow-failures. Sorry, I forgot the had name changed.
It looks like you loaded a different database on that vm. After you restore the database, maybe you could try this command and let me know if that helps clear up the automate tables?
# --v2 is only needed for older versions of cfme
# --allow-failures tries extra efforts to find objects serialized to the automate requests table.
# --invalid bogus will set all secrets stored with an unknown v2_key to the value "bogus"
fix_auth --invalid bogus --allow-failures 2>&1 | tee fix_auth.out
Thank you,
Keenan
Hi Keenan , Tried that . It fixed the auth errors and command completed but evm server still won't start . Check the pastebin here http://pastebin.test.redhat.com/665438. Line 6840 onwards (bottom of the log) Hi Keenan ,
Not sure which appliance you checked.
Please check 10.16.5.91 .
[root@10-16-5-91 ~]# systemctl start evmserverd
[root@10-16-5-91 ~]# systemctl status evmserverd
● evmserverd.service - EVM server daemon
Loaded: loaded (/usr/lib/systemd/system/evmserverd.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2018-11-05 13:21:07 EST; 21s ago
Process: 7845 ExecStop=/bin/sh -c /bin/evmserver.sh stop (code=killed, signal=TERM)
Process: 7966 ExecStart=/bin/sh -c /bin/evmserver.sh start (code=exited, status=0/SUCCESS)
Main PID: 7996 (ruby)
CGroup: /system.slice/evmserverd.service
└─7996 MIQ Server
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service failed.
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Starting EVM server daemon...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: ** CFME 5.10.0.22, codename: Hammer
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Starting EVM...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Running EVM in background...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Started EVM server daemon.
[root@10-16-5-91 ~]# service evmserverd status
Redirecting to /bin/systemctl status evmserverd.service
● evmserverd.service - EVM server daemon
Loaded: loaded (/usr/lib/systemd/system/evmserverd.service; enabled; vendor preset: disabled)
Active: deactivating (stop) (Result: exit-code) since Mon 2018-11-05 13:21:37 EST; 3s ago
Process: 7966 ExecStart=/bin/sh -c /bin/evmserver.sh start (code=exited, status=0/SUCCESS)
Main PID: 7996 (code=exited, status=1/FAILURE); : 8145 (evmserver.sh)
CGroup: /system.slice/evmserverd.service
└─control
├─8145 /bin/bash /bin/evmserver.sh stop
└─8150 /opt/rh/cfme-gemset/bin/rake evm:stop
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service failed.
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Starting EVM server daemon...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: ** CFME 5.10.0.22, codename: Hammer
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Starting EVM...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Running EVM in background...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Started EVM server daemon.
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: detected unhandled Ruby exception in '/var/www/miq/vmdb/lib/workers/bin/evm_server.rb'
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: can't communicate with ABRT daemon, is it running? No such file or directory - connect(2) for /var/run/abrt/abrt.socket
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service: main process exited, code=exited, status=1/FAILURE
[root@10-16-5-91 ~]# systemctl status evmserverd
● evmserverd.service - EVM server daemon
Loaded: loaded (/usr/lib/systemd/system/evmserverd.service; enabled; vendor preset: disabled)
Active: deactivating (stop) (Result: exit-code) since Mon 2018-11-05 13:21:37 EST; 12s ago
Process: 7966 ExecStart=/bin/sh -c /bin/evmserver.sh start (code=exited, status=0/SUCCESS)
Main PID: 7996 (code=exited, status=1/FAILURE); : 8145 (evmserver.sh)
CGroup: /system.slice/evmserverd.service
└─control
├─8145 /bin/bash /bin/evmserver.sh stop
└─8150 /opt/rh/cfme-gemset/bin/rake evm:stop
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service failed.
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Starting EVM server daemon...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: ** CFME 5.10.0.22, codename: Hammer
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Starting EVM...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Running EVM in background...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Started EVM server daemon.
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: detected unhandled Ruby exception in '/var/www/miq/vmdb/lib/workers/bin/evm_server.rb'
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: can't communicate with ABRT daemon, is it running? No such file or directory - connect(2) for /var/run/abrt/abrt.socket
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service: main process exited, code=exited, status=1/FAILURE
[root@10-16-5-91 ~]# systemctl status evmserverd
● evmserverd.service - EVM server daemon
Loaded: loaded (/usr/lib/systemd/system/evmserverd.service; enabled; vendor preset: disabled)
Active: deactivating (stop) (Result: exit-code) since Mon 2018-11-05 13:21:37 EST; 26s ago
Process: 7966 ExecStart=/bin/sh -c /bin/evmserver.sh start (code=exited, status=0/SUCCESS)
Main PID: 7996 (code=exited, status=1/FAILURE); : 8145 (evmserver.sh)
CGroup: /system.slice/evmserverd.service
└─control
├─8145 /bin/bash /bin/evmserver.sh stop
└─8150 /opt/rh/cfme-gemset/bin/rake evm:stop
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service failed.
Nov 05 13:20:50 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Starting EVM server daemon...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: ** CFME 5.10.0.22, codename: Hammer
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Starting EVM...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com sh[7966]: Running EVM in background...
Nov 05 13:21:07 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: Started EVM server daemon.
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: detected unhandled Ruby exception in '/var/www/miq/vmdb/lib/workers/bin/evm_server.rb'
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com abrt[7996]: can't communicate with ABRT daemon, is it running? No such file or directory - connect(2) for /var/run/abrt/abrt.socket
Nov 05 13:21:37 10-16-5-91.rhq.lab.eng.bos.redhat.com systemd[1]: evmserverd.service: main process exited, code=exited, status=1/FAILURE
In the logs, I noticed the following exception: [----] E, [2018-11-06T03:39:50.406252 #15037:e68f84] ERROR -- : [ActiveModel::UnknownAttributeError]: unknown attribute 'resource_type' for MiqSchedule. Method:[block (2 levels) in <class:LogProxy>] [----] E, [2018-11-06T03:39:50.406461 #15037:e68f84] ERROR -- : /opt/rh/cfme-gemset/gems/activemodel-5.0.7/lib/active_model/attribute_assignment.rb:48:in `_assign_attribute' activemodel-5.0.7/lib/active_model/attribute_assignment.rb:40:in `block in _assign_attributes' ... app/models/miq_widget.rb:546:in `seed' Which sounds like there is a missing attribute when running the seeds. So that sounds like the database tables are not up to date. I ran: vmdb rake db:migrate rake evm:db:seed & ; tail -f log/evm.log I assume all went well when I saw "Synchronizing configuration complete..." When you load a database from an older version of the application, please make sure you run into a new version of the application, you always need to run the migrations. I think all is running. Would you please forward other problems my way. Thank you Works with db migrate option . Thanks Keenan. |