Bug 1638922 - [AIO] standalone deployment cinder-volume storage does not survive a machine reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: beta
Target Release: 14.0 (Rocky)
Assignee: Alan Bishop
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-12 20:14 UTC by Fabio Massimo Di Nitto
Modified: 2019-01-11 11:54 UTC

Fixed In Version: openstack-tripleo-heat-templates-9.0.1-0.20181013060868.ffbe879.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the loopback device for Cinder iSCSI/LVM backend was not recreated after a system restart, which prevented the cinder-volume service from restarting. This fix adds a systemd service that recreates the loopback device and therefore persists the Cinder iSCSI/LVM backend after a restart.
Clone Of:
Environment:
Last Closed: 2019-01-11 11:53:55 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 597202 0 'None' MERGED Recreate cinder LVM loopback device on startup 2020-11-17 12:58:15 UTC
Red Hat Product Errata RHEA-2019:0045 0 None None None 2019-01-11 11:54:03 UTC

Description Fabio Massimo Di Nitto 2018-10-12 20:14:41 UTC
Deploying AIO without ceph with the following parameters:

parameter_defaults:
  CloudName: osp
  # default gateway
  ControlPlaneStaticRoutes:
    - ip_netmask: 0.0.0.0/0
      next_hop: 192.168.0.1
      default: true
  Debug: true
  DeploymentUser: stack
  DnsServers:
    - 192.168.0.1
  NtpServer:
    - 192.168.0.1
  # needed for vip & pacemaker
  KernelIpNonLocalBind: 1
  DockerInsecureRegistryAddress:
    - osp.int.fabbione.net:8787
    - docker-registry.engineering.redhat.com
  NeutronPublicInterface: eth0
  # domain name used by the host
  NeutronDnsDomain: stoca
  # i'm just adding random flags pretending i know what i'm doing
  # stop pretending you all-mighty
  NeutronEnableInternalDNS: true
  DnsServers: ["192.168.0.1"]
  # re-use ctlplane bridge for public net, defined in the standalone
  # net config (do not change unless you know what you're doing)
  NeutronBridgeMappings: datacentre:br-ctlplane
  NeutronPhysicalBridge: br-ctlplane
  # enable to force metadata for public net
  #NeutronEnableForceMetadata: true
  StandaloneEnableRoutedNetworks: false
  StandaloneHomeDir: /home/stack
  StandaloneLocalMtu: 1500
  # Needed if running in a VM, not needed if on baremetal
  #StandaloneExtraConfig:
  #  nova::compute::libvirt::services::libvirt_virt_type: qemu
  #  nova::compute::libvirt::libvirt_virt_type: qemu
  HeatEngineOptVolumes:
    - /usr/lib/heat:/usr/lib/heat:ro

resource_registry:
  OS::TripleO::Services::HeatApi: /usr/share/openstack-tripleo-heat-templates/docker/services/heat-api.yaml
  OS::TripleO::Services::HeatApiCfn: /usr/share/openstack-tripleo-heat-templates/docker/services/heat-api-cfn.yaml
  OS::TripleO::Services::HeatEngine: /usr/share/openstack-tripleo-heat-templates/docker/services/heat-engine.yaml

sudo openstack tripleo deploy \
 --templates \
 --local-ip=192.168.0.202/22 \
 -e /usr/share/openstack-tripleo-heat-templates/environments/standalone.yaml \
 -r /usr/share/openstack-tripleo-heat-templates/roles/Standalone.yaml \
 -e $HOME/containers-prepare-parameters.yaml \
 -e $HOME/standalone_parameters.yaml \
 --output-dir $HOME/workdir \
 --standalone

The default cinder-volume storage is configured to use iSCSI, and the iSCSI backend currently uses a loopback device backed by a file:

[root@osp ~]# losetup -a
/dev/loop2: [64768]:904031 (/var/lib/cinder/cinder-volumes)

The problem is that Linux (not just RHEL) has no default facility to attach a loop device to a backing file at boot.

After a reboot, loop2 is not configured, and the iSCSI/cinder-volume services stop functioning properly:

2018-10-12 18:52:49.691 64 INFO cinder.volume.manager [req-8ce372ac-9886-457f-9c8d-36916252f401 - - - - -] Initializing RPC dependent components of volume driver LVMVolumeDriver (3.0.0)
2018-10-12 18:52:49.691 64 ERROR cinder.utils [req-8ce372ac-9886-457f-9c8d-36916252f401 - - - - -] Volume driver LVMVolumeDriver not initialized
2018-10-12 18:52:49.691 64 ERROR cinder.volume.manager [req-8ce372ac-9886-457f-9c8d-36916252f401 - - - - -] Cannot complete RPC initialization because driver isn't initialized properly.: DriverNotInitialized: Volume driver not ready.
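One way to confirm this state after a reboot is to check whether the backing file still has a loop device attached. The following is only a minimal sketch: the function names loop_device_for and check_cinder_loopback are hypothetical, and the parsing assumes the `losetup -a` output format shown above (device, colon, inode details, backing file in parentheses at the end of the line).

```shell
# Print the loop device attached to a backing file, given `losetup -a`
# style output on stdin. Prints nothing if the file is not attached.
# (Hypothetical helper; assumes the backing path is the last field.)
loop_device_for() {
    awk -v want="($1)" '$NF == want { sub(/:$/, "", $1); print $1 }'
}

# Report whether the Cinder backing file currently has a loop device.
# Returns non-zero when no loop device is attached.
check_cinder_loopback() {
    backing_file=${1:-/var/lib/cinder/cinder-volumes}
    dev=$(losetup -a | loop_device_for "$backing_file")
    if [ -n "$dev" ]; then
        echo "attached: $dev"
    else
        echo "not attached: $backing_file"
        return 1
    fi
}
```

Running check_cinder_loopback as root before and after a reboot should show whether the loop device was lost.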

After some searching and testing, the only viable option I found was to introduce a custom systemd unit file (http://www.anthonyldechiaro.com/blog/2010/12/19/lvm-loopback-how-to/).

[root@osp ~]# cat /etc/systemd/system/cinder-volume-loopback.service 
[Unit]
Description=Activate Cinder Volume Loopback device
DefaultDependencies=no
After=systemd-udev-settle.service
Before=lvm2-activation-early.service
Wants=systemd-udev-settle.service

[Service]
ExecStart=/sbin/losetup /dev/loop2 /var/lib/cinder/cinder-volumes
Type=oneshot

[Install]
WantedBy=local-fs.target

systemctl enable cinder-volume-loopback

Enabling the unit returns the system to the correct state after a reboot.
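The unit above hardcodes /dev/loop2 and would fail if that device is already in use. As a hedged alternative (not the merged upstream patch), the attach step could be made idempotent with a small helper that the unit's ExecStart calls. The function name attach_cinder_loopback is hypothetical; `losetup -j` and `losetup --find --show` are standard util-linux options. This sketch assumes LVM locates the physical volume by scanning devices, so the exact loop device number should not matter.

```shell
# Attach the Cinder backing file to a loop device unless it is already
# attached, and print the device name in either case.
# (Hypothetical helper; must run as root on a real system.)
attach_cinder_loopback() {
    backing_file=${1:-/var/lib/cinder/cinder-volumes}

    # `losetup -j FILE` lists loop devices associated with FILE,
    # one per line, in the form "/dev/loopN: [dev]:inode (FILE)".
    existing=$(losetup -j "$backing_file" | cut -d: -f1 | head -n 1)
    if [ -n "$existing" ]; then
        echo "$existing"
        return 0
    fi

    # `--find` picks the first unused loop device, `--show` prints it.
    losetup --find --show "$backing_file"
}
```

Because the helper does nothing when the file is already attached, running it more than once is harmless.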

Side notes:
1) Adding both DFG:DF and DFG:Storage since it affects both.
2) I don't know whether this storage configuration is supported. If it is not, then this BZ should turn into an RFE to change the default backend. I am no storage expert; I just notice when my volumes disappear :P
3) Severity: High due to the impact of the problem. Priority: Medium since AIO is still TP.

Comment 1 Michele Baldessari 2018-10-12 20:25:44 UTC
Note that this has been closed as WONTFIX in the past as LVM on loopback is not considered for production use (https://bugzilla.redhat.com/show_bug.cgi?id=1241644)

Comment 2 Fabio Massimo Di Nitto 2018-10-13 04:47:48 UTC
(In reply to Michele Baldessari from comment #1)
> Note that this has been closed as WONTFIX in the past as LVM on loopback is
> not considered for production use
> (https://bugzilla.redhat.com/show_bug.cgi?id=1241644)

Noted, but then the default storage should be changed (see also side note 2 in the description).

That said, since this is a single-node deployment that doesn't suffer from HA complexity, it might be treated differently.

Comment 3 Alan Bishop 2018-10-17 15:26:23 UTC
Upstream patch has (nearly) merged, and I'll propose it for Rocky so we can get this fixed for OSP-14.

Comment 4 Alan Bishop 2018-10-21 15:25:48 UTC
Patch has merged on master, and has been proposed for stable/rocky.

Comment 13 errata-xmlrpc 2019-01-11 11:53:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

