Bug 1734309 - evmserverd fails to start: Scm branch can't be blank Method
Summary: evmserverd fails to start: Scm branch can't be blank Method
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.11.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.11.0
Assignee: Nick Carboni
QA Contact: Jaroslav Henner
Docs Contact: Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On:
Blocks: 1655794
Reported: 2019-07-30 08:34 UTC by Jaroslav Henner
Modified: 2019-08-02 14:03 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-02 13:01:14 UTC
Category: Bug
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:


Attachments
evm.log (11.34 KB, text/plain)
2019-07-30 08:34 UTC, Jaroslav Henner

Description Jaroslav Henner 2019-07-30 08:34:55 UTC
Created attachment 1594537 [details]
evm.log

Description of problem:
After upgrading an environment with Ansible Tower, evmserverd fails to start:

[----] E, [2019-07-29T08:24:32.287655 #14110:2b0e51cb25c0] ERROR -- : [ActiveRecord::RecordInvalid]: Validation failed: ManageIQ::Providers::EmbeddedAnsible::AutomationManager::ConfigurationScriptSource: Scm branch can't be blank  Method:[block (2 levels) in <class:LogProxy>]
[----] E, [2019-07-29T08:24:32.287815 #14110:2b0e51cb25c0] ERROR -- : /opt/rh/cfme-gemset/gems/activerecord-5.1.7/lib/active_record/validations.rb:78:in `raise_validation_error'
/opt/rh/cfme-gemset/gems/activerecord-5.1.7/lib/active_record/validations.rb:50:in `save!'


Version-Release number of selected component (if applicable):
cfme-5.11.0.16-1.el8cf.x86_64


How reproducible:
2/2

Steps to Reproduce:
1. Boot a 5.9 or 5.10 instance and toggle the switch in the configuration to enable it to serve as embedded Ansible.
2. pg_dump --format custom --file /net/$STORE/srv/export/db_dump/ansible_enabled.5.9.dump vmdb_production

on 5.11 instance:

3.
$ pg_restore -v --dbname=vmdb_production /net/$STORE/srv/export/db_dump/ansible_enabled.5.9.dump
$ scp $CFME_5_9:/var/www/miq/vmdb/config/database.yml  config/database.yml
$ scp $CFME_5_9:/var/www/miq/vmdb/certs/v2_key  certs/v2_key
$ systemctl start evmserverd


Actual results:
evmserverd fails to start

Expected results:
evmserverd starts

Additional info:

Comment 2 Nick Carboni 2019-07-31 18:47:42 UTC
I was able to complete this process with no errors, but the upgrade procedure is not particularly trivial.

These are the steps I went through. I'll open a separate docs BZ to get the 5.10 -> 5.11 upgrade procedure documented.

On 5.10 appliance:

- `systemctl stop evmserverd`
- `pg_dumpall -c --if-exists -f vmdb-5.10.pg`

On newly deployed 5.11 appliance:

- Configure a new internal database (appliance console option 7).
- Fetch v2_key from one of the 5.10 appliances.
- Select the same region number and database password as the old installation.
- Wait for the application to come up.

- `systemctl stop evmserverd`
- Copy the GUID file from the 5.10 appliance
- Copy the database dump from the 5.10 appliance
- `psql -f vmdb-5.10.pg postgres`
- `vmdb`
- `bin/rake db:migrate`
- `bin/rake evm:automate:reset`
- `systemctl start evmserverd`
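
For illustration, the 5.11-side commands above could be collected into one script. This is only a sketch, not a tested upgrade tool: the `restore_and_migrate` and `run` helpers and the `DRY_RUN` variable are hypothetical names, and it assumes `vmdb` is the appliance shorthand for changing to /var/www/miq/vmdb. Copying the GUID file and the dump is left out.

```shell
#!/bin/sh
# Sketch only: replays the restore steps listed above on the new 5.11 appliance.
set -eu

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1: path to the pg_dumpall file copied from the 5.10 appliance
# $2: vmdb directory (assumed default: /var/www/miq/vmdb)
restore_and_migrate() {
  dump_file="$1"
  vmdb_dir="${2:-/var/www/miq/vmdb}"

  run systemctl stop evmserverd
  run psql -f "$dump_file" postgres   # connect to "postgres", as in the steps above
  run cd "$vmdb_dir"
  run bin/rake db:migrate
  run bin/rake evm:automate:reset
  run systemctl start evmserverd
}
```

Calling `DRY_RUN=1 restore_and_migrate vmdb-5.10.pg` only prints the commands, which is a cheap way to review them before running anything against a real appliance.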

I intend to refine these a bit, but that should do for now.
Jaroslav, can you re-test this and ensure that you can also upgrade successfully? I was able to go from 5.10.7.1 to 5.11.0.17 without issue.

Comment 3 Jaroslav Henner 2019-08-01 14:02:15 UTC
(In reply to Nick Carboni from comment #2)
> I was able to complete this process with no errors, but the upgrade
> procedure is not particularly trivial.
> 
> These were the steps I went through, I'll open a separate docs BZ to get the
> 5.10 -> 5.11 upgrade procedure documented.
> 
> On 5.10 appliance:
> 
> - `systemctl stop evmserverd`
> - `pg_dumpall -c --if-exists -f vmdb-5.10.pg`
> 
> On newly deployed 5.11 appliance:
> 
> - Configure a new internal database (appliance console option 7).
> - Fetch v2_key from one of the 5.10 appliance.
> - Select the same region number and database password as the old installation
> - wait for the application to come up
> 
> - `systemctl stop evmserverd`
> - Copy the GUID file from the 5.10 appliance
> - Copy the database dump from the 5.10 appliance
> - `psql -f vmdb-5.10.pg postgres`
> - `vmdb`
> - `bin/rake db:migrate`
> - `bin/rake evm:automate:reset`
> - `systemctl start evmserverd`
> 
> I intend to refine these a bit, but that should do for now.
> Jaroslav, can you re-test this and ensure that you can also upgrade
> successfully? I was able to go from 5.10.7.1 to 5.11.0.17 without issue.

I have seen various other errors when starting evmserverd, such as:
Aug 01 09:48:39 dhcp-8-197-147.cfme2.lab.eng.rdu2.redhat.com sh[19254]: ActiveRecord::SubclassNotFound: The single-table inheritance mechanism failed to locate the subclass: 'EmbeddedAnsibleWorker'. This error is raised because the column 'type' is reserved for storing the class in case of inheritance. Please rename this column if you didn't intend it to be used for storing the inheritance class or overwrite MiqWorker.inheritance_column to use another column for that information.
Aug 01 09:48:39 dhcp-8-197-147.cfme2.lab.eng.rdu2.redhat.com sh[19254]: /opt/rh/cfme-gemset/gems/activerecord-5.1.7/lib/active_record/inheritance.rb:196:in `rescue in find_sti_class'
Aug 01 09:48:39 dhcp-8-197-147.cfme2.lab.eng.rdu2.redhat.com sh[19254]: /opt/rh/cfme-gemset/gems/activerecord-5.1.7/lib/active_record/inheritance.rb:189:in `find_sti_class'

This happened when I tried to restore from a dump that I had created without stopping evmserverd, regardless of whether it was created with pg_dumpall or pg_dump.


When I stop evmserverd first, a pg_dump or the dump made by the appliance console seems to be enough for a good restore.
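
As a rough sketch of that workflow, a wrapper could refuse to dump while the service is up. The `dump_if_stopped` function name is hypothetical; `systemctl is-active --quiet` is the standard systemd check, and the `pg_dump` options and database name are taken from the reproduction steps.

```shell
# Sketch only: take a custom-format dump, but only if evmserverd is not running.
# $1: output file for the dump
dump_if_stopped() {
  out_file="$1"
  # "systemctl is-active" exits 0 when the unit is active
  if systemctl is-active --quiet evmserverd; then
    echo "evmserverd is running; stop it first: systemctl stop evmserverd" >&2
    return 1
  fi
  pg_dump --format custom --file "$out_file" vmdb_production
}
```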

It just makes me worried that some db accesses should be done in transactions and aren't, so the data in the db may not be consistent at all times. But maybe that is a design decision.

Comment 4 Nick Carboni 2019-08-01 14:19:10 UTC
> when I tried to restore from dump that I created without stopping the evmserverd

Don't do that. For upgrades the server should always be stopped. Also, the initial bug report states that you stopped the server before running the dump, so I feel like that's not the issue for this BZ.

> It just makes me worried that some db accesses should be done in transactions and aren't, so the data in the db may be not consistent all the time

This doesn't have anything to do with transactions. The database is in a consistent state, just not one that's suitable for upgrades.

Were you able to upgrade without the reported error with the steps that I provided?

Comment 5 Jaroslav Henner 2019-08-01 14:32:16 UTC
(In reply to Nick Carboni from comment #4)
> > when I tried to restore from dump that I created without stopping the evmserverd
> 
> Don't do that. For upgrades the server should always be stopped. Also, the
> initial bug report states that you stopped the server before running the
> dump, so I feel like that's not the issue for this BZ.

I was not aware that I needed to stop evmserverd to create the dump. The doc:

https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.7/html/deployment_planning_guide/introduction#creating_a_database_dump

doesn't say that either. The appliance_console dump command doesn't seem to complain about evmserverd running.

> 
> > It just makes me worried that some db accesses should be done in transactions and aren't, so the data in the db may be not consistent all the time
> 
> This doesn't have anything to do with transactions. The database is in a
> consistent state, just not one that's suitable for upgrades.
> 
> Were you able to upgrade without the reported error with the steps that I
> provided?

I am not sure whether I can yum update to CFME 5.11 from any previous version. It was not possible before, and I don't think anything has changed. Therefore I am doing dump + restore + db:migrate as the upgrade to 5.11. This works with pg_dump or the appliance console dump (which is surely running pg_dump), but evmserverd seems to have to be stopped while the dump is being taken.

Comment 6 Nick Carboni 2019-08-01 14:56:11 UTC
Please follow the steps in comment 2 and let me know if the upgrade works with those steps.

Comment 7 Jaroslav Henner 2019-08-02 12:12:43 UTC
Yes, I tried both the dumpall and the dump approach, each with and without stopping evmserverd before creating the dump file.

When restoring the dumpall, I had a problem:

[root@dhcp-8-197-160 vmdb]# psql -f ~/59.nostop.dumpall 
psql: FATAL:  database "root" does not exist


which goes away after

[root@dhcp-8-197-160 vmdb]# createdb root
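
The error appears because psql, when no database is given, connects to one named after the current OS user (here root). Instead of creating a root database, the existing postgres database can be named explicitly, as comment 2 does. A minimal sketch, with a hypothetical function name:

```shell
# Sketch only: restore a pg_dumpall file without relying on psql's default database.
# $1: path to the pg_dumpall output
restore_dumpall() {
  # Naming "postgres" avoids: FATAL:  database "root" does not exist
  psql -f "$1" postgres
}
```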



Today, no matter which method I used and whether or not I stopped evmserverd, everything worked.
Yesterday I had problems starting evmserverd after restoring from either dump file created with evmserverd running. With evmserverd stopped, both ways worked fine.


Conclusion
==========
* It seems the documentation should say to stop evmserverd before making any dump.
* The appliance_console should check whether evmserverd is running.

Comment 8 Nick Carboni 2019-08-02 13:01:14 UTC
(In reply to Jaroslav Henner from comment #7)
> Conclusion
> ==========
> * It seems the documentation should tell to stop evmserverd before making
> any dump.
> * The appliance_console should check whether the evmserverd is running.

I disagree. The database dumps from the console are not meant to be restored or used as backups.
The option is there for people during support calls so that we can take a look at a possible data issue.

In this particular case we are using a database dump for upgrade and we will document the correct steps as a part of https://bugzilla.redhat.com/show_bug.cgi?id=1726467

I'm closing this as WORKSFORME

