Calculated value for `osd target memory` too high for deployments with multiple OSDs per device #7435
Comments
The number of OSDs defined by the `lvm_volumes` variable is added to `num_osds` in task `Count number of osds for lvm scenario`. Therefore these devices must not be counted in task `Set_fact num_osds (add existing osds)`. There are currently three problems with the existing approach:

1. Bluestore DB and WAL devices are counted as OSDs.
2. `lvm_volumes` supports a second notation to directly specify logical volumes instead of devices when the `data_vg` key exists. This scenario is not yet accounted for.
3. The `difference` filter used to remove devices from `lvm_volumes` returns a list of **unique** elements, thus not accounting for multiple OSDs on a single device.

The first problem is solved by filtering the list of logical volumes for devices used as `type` `block`. For the second and third problem, lists are created from `lvm_volumes` containing either paths to devices or logical volume devices. For the second problem, the output of `ceph-volume` is simply filtered for `lv_path`s appearing in the list of logical volume devices described above. To solve the third problem, the remaining OSDs in the output are compiled into a list of their used devices, which is then filtered for devices appearing in the list of devices from `lvm_volumes`.

Closes: ceph#7435
Signed-off-by: Jan Horstmann <[email protected]>
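To illustrate the third problem, here is a minimal sketch (a hypothetical standalone playbook, not a task from ceph-ansible) showing that Ansible's `difference` filter collapses duplicate devices:

```yaml
# Hypothetical layout: two OSDs share /dev/sdb, so three OSDs exist in total,
# but `difference` returns unique elements only and therefore counts two.
- hosts: localhost
  gather_facts: false
  vars:
    lvm_volume_devices: ['/dev/sdb', '/dev/sdb', '/dev/sdc']
    already_counted: []
  tasks:
    - ansible.builtin.debug:
        msg: "{{ lvm_volume_devices | difference(already_counted) | length }}"  # prints 2, not 3
```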
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
Please note the discussion in the linked PR.
Original commit message:

Subject: [PATCH] ceph-config: fix calculation of `num_osds`

The number of OSDs defined by the `lvm_volumes` variable is added to `num_osds` in task `Count number of osds for lvm scenario`. Therefore these devices must not be counted in task `Set_fact num_osds (add existing osds)`. There are currently three problems with the existing approach:

1. Bluestore DB and WAL devices are counted as OSDs.
2. `lvm_volumes` supports a second notation to directly specify logical volumes instead of devices when the `data_vg` key exists. This scenario is not yet accounted for.
3. The `difference` filter used to remove devices from `lvm_volumes` returns a list of **unique** elements, thus not accounting for multiple OSDs on a single device.

The first problem is solved by filtering the list of logical volumes for devices used as `type` `block`. For the second and third problem, lists are created from `lvm_volumes` containing either paths to devices or logical volume devices. For the second problem, the output of `ceph-volume` is simply filtered for `lv_path`s appearing in the list of logical volume devices described above. To solve the third problem, the remaining OSDs in the output are compiled into a list of their used devices, which is then filtered for devices appearing in the list of devices from `lvm_volumes`.

Fixes: ceph/ceph-ansible#7435
Signed-off-by: Jan Horstmann <[email protected]>
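A rough sketch of the first fix described above, assuming a variable `ceph_volume_lvm_list` holds the parsed JSON output of `ceph-volume lvm list` keyed by OSD id (the variable name and exact structure are assumptions, not the PR's actual task):

```yaml
# Hypothetical sketch: keep only logical volumes used as bluestore "block"
# devices so that DB and WAL volumes are not counted as OSDs.
- ansible.builtin.set_fact:
    _block_lvs: >-
      {{ ceph_volume_lvm_list
         | dict2items
         | map(attribute='value')
         | flatten
         | selectattr('type', 'equalto', 'block')
         | list }}

- ansible.builtin.debug:
    msg: "existing bluestore block LVs: {{ _block_lvs | length }}"
```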
@guits Can we merge this? We have prepared a backport on our side (osism/container-image-ceph-ansible#488), but I would prefer to merge it upstream. The PR #7502 was closed yesterday because of inactivity.
Bug Report

What happened:
`osd target memory` was set to a much higher value after upgrading to `pacific`, resulting in recurring out-of-memory kills of OSDs.

Cause:
Commit 225ae38ee2f74165e7d265817597fe451df3e919 changed the calculation of `num_osds`, which is used to calculate a sensible value for `osd memory target`. The new formula uses Ansible's `difference` filter, which according to the docs returns a list with unique elements. Thus, on deployments with multiple OSDs per device, where the same device should be counted multiple times, the value for `num_osds` is too small and the available memory per OSD is overestimated. Apart from that, DB devices are now also counted into `num_osds`.
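To make the overestimation concrete, a rough numeric sketch with assumed values (64 GiB budget, four OSDs placed two per device; the real ceph-ansible formula differs in detail, but it likewise divides a host memory budget by `num_osds`):

```yaml
# Illustration with hypothetical numbers: undercounting num_osds from 4 to 2
# doubles the per-OSD memory target.
- hosts: localhost
  gather_facts: false
  vars:
    memory_budget_bytes: "{{ 64 * 1024 * 1024 * 1024 }}"  # assumed budget
    actual_osds: 4    # two OSDs on each of two devices
    counted_osds: 2   # unique devices only, as returned by `difference`
  tasks:
    - ansible.builtin.debug:
        msg:
          - "intended per-OSD target: {{ (memory_budget_bytes | int) // actual_osds }} bytes"
          - "calculated per-OSD target: {{ (memory_budget_bytes | int) // counted_osds }} bytes"
```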
Workarounds:
Set a fixed value for `osd memory target` in `ceph_conf_overrides`.
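A minimal sketch of such an override in the group vars (the 4 GiB value is purely illustrative and must be sized for the actual host):

```yaml
ceph_conf_overrides:
  osd:
    osd memory target: 4294967296  # illustrative value only
```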
Environment:
- Ansible version (e.g. `ansible-playbook --version`):
- ceph-ansible version (e.g. `git head or tag or stable branch`): `stable-6.0` (same calculation in `stable-7.0` and `main`, but unverified)
- Ceph version (e.g. `ceph -v`): `ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)`