Bug 1720399
| Summary: | Last update time for variable length metadata not properly set on pool setup | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | mulhern <amulhern> |
| Component: | stratisd | Assignee: | John Baublitz <jbaublitz> |
| Status: | CLOSED ERRATA | QA Contact: | guazhang <guazhang> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 8.2 | CC: | amulhern, bgurney, dkeefe, rhandlin |
| Target Milestone: | rc | Flags: | pm-rhel:
mirror+
|
| Target Release: | 8.2 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 1.0.6 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-04-28 15:41:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
mulhern
2019-06-13 21:48:30 UTC
Upstream PR: https://github.com/stratis-storage/stratisd/pull/1590. I was able to reproduce on a RHEL 8 test machine with these steps:
Required test partitions: at least 2. (In my example, I used /dev/sdc1 and /dev/sdc2.)
1. (Set the system clock to a time in the future.)
2. stratis pool create spool1 /dev/sdc1
3. stratis fs create spool1 sfs1
4. (Reset the system clock to the current time.)
5. stratis pool add-data spool1 /dev/sdc2
6. Reboot.
After this point, the stratis pool will fail to start. Only the "originsub" linear device will be started, which is on the block device with the "newer" MDA/BDA metadata (with the newer last_update_time and initialization_time):
Jul 30 14:20:08 localhost stratisd[828]: WARN libstratis::engine::strat_engine::engine: no pool set up, reason: Engine(Error, "failed to set up pool for (pool UUID: d1558c99-38f5-4ecf-a1b9-ffc27efa8806
, devnodes: /dev/sdc1): reason: DM(Core(Error(IoctlError(DeviceInfo { hdr: Struct_dm_ioctl { version: [4, 39, 0], data_size: 16384, data_start: 312, target_count: 0, open_count: 0, flags: 0, event_nr: 0, dev: Device { ma
jor: 0, minor: 0 }, name: \"stratis-1-private-d1558c9938f54ecfa1b9ffc27efa8806-thinpool-pool\", uuid: \"\" } }, Sys(EINVAL)), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })))")
Jul 30 14:20:08 localhost kernel: device-mapper: thin: 253:5: pool target (24523 blocks) too small: expected 49063
Jul 30 14:20:08 localhost kernel: device-mapper: table: 253:5: thin-pool: preresume failed, error = -22
Jul 30 14:20:08 localhost stratisd[828]: WARN libstratis::engine::strat_engine::engine: no pool set up, reason: Engine(Error, "failed to set up pool for (pool UUID: d1558c99-38f5-4ecf-a1b9-ffc27efa8806
, devnodes: /dev/sdc1 ,/dev/sdc2): reason: Engine(NotFound, \"Stratis device with device number 8:34, devnode /dev/sdc2, pool UUID d1558c99-38f5-4ecf-a1b9-ffc27efa8806 and device UUID ee9753b2-d430-414b-97da-c687d45eca5a
had no record in pool metadata\")")
(In reply to Bryan Gurney from comment #3) > I was able to reproduce on a RHEL 8 test machine with these steps: > > Required test partitions: at least 2. (In my example, I used /dev/sdc1 and > /dev/sdc2.) > > 1. (Set the system clock to a time in the future.) > 2. stratis pool create spool1 /dev/sdc1 > 3. stratis fs create spool1 sfs1 > 4. (Reset the system clock to the current time.) > 5. stratis pool add-data spool1 /dev/sdc2 > 6. Reboot. > > After this point, the stratis pool will fail to start. Only the "originsub" > linear device will be started, which is on the block device with the "newer" > MDA/BDA metadata (with the newer last_update_time and initialization_time): > > Jul 30 14:20:08 localhost stratisd[828]: WARN > libstratis::engine::strat_engine::engine: no pool set up, reason: > Engine(Error, "failed to set up pool for (pool UUID: > d1558c99-38f5-4ecf-a1b9-ffc27efa8806 > , devnodes: /dev/sdc1): reason: DM(Core(Error(IoctlError(DeviceInfo { hdr: > Struct_dm_ioctl { version: [4, 39, 0], data_size: 16384, data_start: 312, > target_count: 0, open_count: 0, flags: 0, event_nr: 0, dev: Device { ma > jor: 0, minor: 0 }, name: > \"stratis-1-private-d1558c9938f54ecfa1b9ffc27efa8806-thinpool-pool\", uuid: > \"\" } }, Sys(EINVAL)), State { next_error: None, backtrace: > InternalBacktrace { backtrace: None } })))") > Jul 30 14:20:08 localhost kernel: device-mapper: thin: 253:5: pool target > (24523 blocks) too small: expected 49063 > Jul 30 14:20:08 localhost kernel: device-mapper: table: 253:5: thin-pool: > preresume failed, error = -22 > Jul 30 14:20:08 localhost stratisd[828]: WARN > libstratis::engine::strat_engine::engine: no pool set up, reason: > Engine(Error, "failed to set up pool for (pool UUID: > d1558c99-38f5-4ecf-a1b9-ffc27efa8806 > , devnodes: /dev/sdc1 ,/dev/sdc2): reason: Engine(NotFound, \"Stratis device > with device number 8:34, devnode /dev/sdc2, pool UUID > d1558c99-38f5-4ecf-a1b9-ffc27efa8806 and device UUID > ee9753b2-d430-414b-97da-c687d45eca5a > had no record in pool metadata\")") This test is interesting, but doesn't really reproduce the problem that this bug addresses. In this test, the failure is quite curious, and makes it look as if the metadata wasn't written at all after the second block device was added. So it needs further investigation. The upstream bug: https://github.com/stratis-storage/stratisd/issues/1509 describes what happened, and how to reproduce, but the salient features are these: At time t_1, a pool is constructed. Then, stratisd is taken down. Then, at time t_2, which is less than t_1, stratisd is brought up again with the same devices. The thinpool is constructed, but when stratisd attempts to write out variable length metadata to the blockdevs, as part of its setup process, there is an error, because the time associated with the new metadata is less than that associated with the most recently recorded metadata. So there is a failure and the pool is left in stratisd's incomplete_pools list. The distinguishing error is logged as: "WARN libstratis::engine::strat_engine::engine: no pool set up, reason: Engine(Error, "failed to set up pool for (pool UUID: bcf7b4d9-8a28-4e58-a602-834045a6d604, devnodes: /dev/sdb1 ,/dev/sdb2): reason: Engine(Error, \"Failed to save metadata to even one device in pool\")")" The fix works by associating a time, the time when it was written, with the variable length metadata read from the devices on setup. When newer metadata is written, this time is compared with the current time, and a correct timestamp for the metadata to be written is constructed. Previously, no time was associated with the metadata, so the code to write out the new metadata followed the same path as if it were writing metadata for the first time, until it encountered the error above. Mass migration to Guangwu. Hello test pass with stratis-cli-2.0.0-1.el8, stratisd-2.0.0-4.el8.x86_64 , so move to verified thanks guazhang Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1634 |