Releases: sustainable-computing-io/kepler

release-0.7.12

15 Oct 06:26
3eb6297

908db28 fix(bpf_collector): fix command name in case of kernel processes
b90a63b fix(bpf): exclude bpf overhead in bpf_cpu_time
987a139 fix(metrics): Remove resource usage check for skipping bpf metrics
c20c23c fix: error initializing dcgm
4121376 fix(aa66ada): re-add libraries to the builder
4c63b43 fix(validator): update the bpf cpu time query
097541c fix: gosec failures (#1778)
5b95acb fix: nvml/dcgm builds
9d14581 fix: habana image build
aa66ada fix(dockerfile): remove redundant habanalabs installation steps
34d27b8 fix: do not probe for power-meters when disabled
ecd5f54 feat(models): update acpi dyn to 0.7.11
7303454 feat(models): update acpi abspower to 0.7.11
fcc8e0a feat(models): update intel-rapl abspower to 0.7.11
c84bfd0 feat(models): log model source url
7b07762 feat: get trainer from model_name in weight
a7b9892 fix(models): add predictor name in errors
aa822ee feat: compute core ratio for local regressor (#1743)
5d6ecc0 fix(config): trim spaces and new lines in MODEL_CONFIG
b2926d1 fix: limit max core ratio to 1
28d42a4 feat: compute idle power with core ratio (#1732)
d8a6c14 feat: add machine spec generator/reader for model weight request
11ff51d feat: add --disable-power-meter option
414533c fix: set default trainer only for local regressor
d446231 fix: format ComponentModelWeights
a6f75a4 feat: add model_name attribute to ComponentModelWeights
6858c58 fix: watcher resubmit items to workqueue (#1686)
eb5a72a fix(bpf): Fix overhead when sampling (#1685)
a97030f fix(bpf): use prev_tgid to register process
5432a39 feat: customize vm_id with libvirt metadata
73367d3 fix: correct regex path name for VM
6a6017b feat: save validation result as json and show the static dashboard using js
c99e399 fix(pkg/bpf): Use channel to process events (#1671)
cde7833 fix: resolve pid 0 to system_processes
b424607 feat(kubernetes): Use workqueues
a39ae55 fix: typo in filename
d73094e fix: apply suggestions from code review
5a113b2 fix(bpf): tgid is in the upper 32 bits
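
The last fix in this list, 5a113b2, is about the layout of a combined 64-bit process identifier. The kernel helper `bpf_get_current_pid_tgid()` returns the thread group ID (what user space calls the PID) in the upper 32 bits and the thread ID in the lower 32 bits. The following is a minimal Python sketch of that split, assuming the commit refers to this helper's return value. `split_pid_tgid` is a hypothetical name used only for illustration and is not taken from the Kepler source.

```python
MASK_32 = 0xFFFFFFFF  # selects the low 32 bits of a value


def split_pid_tgid(pid_tgid: int) -> tuple:
    """Split a 64-bit pid_tgid value into (tgid, tid).

    Hypothetical helper for illustration only.
    """
    tgid = (pid_tgid >> 32) & MASK_32  # upper 32 bits: thread group ID
    tid = pid_tgid & MASK_32           # lower 32 bits: thread ID
    return tgid, tid


# Build a sample value the same way the kernel does: tgid << 32 | tid.
combined = (1234 << 32) | 5678
print(split_pid_tgid(combined))  # (1234, 5678)
```

Reading the lower half as the process ID would instead yield the thread ID, which differs from the tgid for every thread other than the group leader.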

Full Changelog: v0.7.11...v0.7.12

release-0.7.11

11 Jul 06:10
bf1f62d

Changes

  • 1506691 - feat(validator): trigger validator workflow on changes (#1591)
  • c7b3ddb - fix(collector): convert cpu time in collection time instead of reporting time to avoid inconsistent use of cpu time in models
  • 9c80387 - bpf: account all running state processes (#1546)
  • 91fc8d4 - feat(validator): Add workflow for validator tests (#1570)
  • a2289d2 - fix: fallback to reading cpus.yaml relative to current dir (#1572)
  • bfaadae - pick up the go mod vendor changes
  • fb7ef35 - feat(metrics): selectively expose prom metrics to reduce overhead
  • d412bfb - fix: vendor/github.com/jaypipes/ghw/Dockerfile to reduce vulnerabilities (#1578)
  • 365ac03 - bpf: remove tgid map
  • c427a47 - fix(manifest): uncomment openshift SCC (#1575)
  • ec2a775 - fix(validator): improve the validator config sample (#1569)
  • 0e22839 - fix: update the VERSION variable assignment method (#1552)
  • 96dd443 - fix: Fix uncomment of YAML in hack/build-manifests.sh
  • 8931d61 - feat(validator): load validations from validations.yaml
  • 4a7bc31 - fix(compose): enable bpf cgroup id
  • a57041c - fix(bpf): Fix kepler_write_page_cache attach
  • fbe9b3c - fix(bpf): Access __state from task_struct (#1550)
  • 0b0b215 - fix(bpf): Use BTF-Defined Raw Tracepoints (#1542)
  • a08a5f6 - deps: Fix usage of textparse.NewPromParser
  • aad6964 - fix(bpf): Fix map lookup for IRQ/Page Cache
  • 9114e75 - Fix MSE and MAPE Single Queries (#1522)
  • 330a531 - fix(bpf): restore command label in process metrics
  • edd4d04 - review feedback: fix mse queries
  • d6420d5 - bump up local_dev_cluster_version version
  • 4ced508 - bpf-collector: change log verbosity to easily show it in CI
  • 34889bb - libbpf: update to use microseconds instead of milliseconds in the ebpf code because the low precision was reporting active processes as not active
  • 4337a5e - bpf: remove task time
  • 0426e8f - feat(exporter): Graceful Shutdown
  • aec3ab5 - report validator results
  • 07636b1 - Replace expected and actual query with single query (#1489)
  • 1759cca - feat(compose): add build arguments for Kepler image
  • c678217 - use pmu name to get arm cpu id since archspec does not help here
  • 8bb405f - stats: update the verbosity of annoying key error message due to missing gpu metrics (#1480)
  • 59af568 - Add Test Cases for Prometheus, Config, Stresser for Validator (#1461)
  • bdd44b1 - bpf: fix the process parameter order to match the c and go code (#1479)
  • 747e7eb - fix: ensure all entries from bpf map are copied (#1477)
  • c092204 - make: quote ldflags
  • 468ed25 - add vm name option to validator (#1474)
  • 244ae8b - feat: expose version label in kepler_build_info (#1473)
  • 6ae21a0 - update validator usage; remove job from prom query
  • 3ac4f6b - feat(cgroup): Add podman support (#1455)
  • ada7884 - fix platform power return unit (#1468)
  • 3c7e777 - fix(collector): Fix use of waitgroups
  • b134a84 - fix(cmd/validator): Don't add when passing a wg
  • 0158b0b - fix(dev-dashboard): update and correct metrics in dev dashboard
  • f92532a - add new maintainers per 05/21 community meeting vote results (#1462)
  • efad46f - provide a simple template for maintainer nomination (#1463)
  • 9e957f3 - finish kepler on rhel tests
  • dcf78e6 - fix: remove logging while collecting GPU metrics
  • 49acca9 - fix(model): Use correct variable in IsNodeComponentPowerModelEnabled() (#1458)
  • 1b93eb1 - Adding New Metric Cases to Case module (#1453)
  • ea3e2f8 - add equinix metal instance to CI
  • 53d06d4 - add PR review bot (#1446)
  • 0baec47 - feat: Fixed eBPF Feature Detection (#1443)
  • 5f59172 - fix(bpf): cleanup initialising structs and nested ifs (#1444)
  • 2bca8dc - update hack/libbpf-headers.sh script to pull v1.3.0
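
Commit 34889bb above switches the eBPF code from milliseconds to microseconds because the lower precision made active processes look inactive. The sketch below shows one way that can happen, assuming durations are converted with integer division: a run shorter than one millisecond truncates to zero. The commit message does not state the exact mechanism, and the constants and the function name `to_whole_units` are hypothetical.

```python
NS_PER_US = 1_000      # nanoseconds per microsecond
NS_PER_MS = 1_000_000  # nanoseconds per millisecond


def to_whole_units(duration_ns: int, ns_per_unit: int) -> int:
    """Convert a nanosecond duration to whole units, truncating the remainder."""
    return duration_ns // ns_per_unit


run_ns = 300_000  # a process that was on the CPU for 300 microseconds

print(to_whole_units(run_ns, NS_PER_MS))  # 0   -> looks like no activity
print(to_whole_units(run_ns, NS_PER_US))  # 300 -> activity is visible
```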

Full Changelog: v0.7.10...v0.7.11

release-0.7.10

15 May 23:29
54f3613

Summary

  • fix(bpfassets): Fix object file lookup (#1419)
  • feat(bpf): Build for bpfel and bpfeb
  • feat(bpf): Bump up libbpf to 1.3.0
  • fix(dashboard): show metal and VM metrics correctly (#1395)
  • doc(dev): add section on how to profile (#1396)
  • feat(bpf): Portable eBPF Probes
  • feat(test): initial version of validator tool
  • dev(compose): add manifests for validation
  • fix(collector): Fix Segmentation fault when collecting CPU Freq from BPF (#1387)
  • feat(kepler): enable pprof (#1383)
  • fix habana installation
  • fix previous pid of finish_task_switch (#1370)
  • fix: update dashboard for docker-compose
  • fix(build): reduce image size by squashing install and clean steps
  • feat(compose): add docker-compose for easier local development
  • feat(exporter): log listening port
  • fix(build): reduce container image size (#1336)

Full Changelog: v0.7.9...v0.7.10

release-0.7.8

08 Apr 17:55
bot: Updated coverage badge.

Signed-off-by: sustainable-computing-bot

release-0.7.8

04 Mar 14:38
bot: Updated coverage badge.

Signed-off-by: sustainable-computing-bot

release-0.7.7

23 Feb 14:55
c34c19a
revert rpm source (#1254)

Signed-off-by: Huamin Chen

release-0.7.6

23 Feb 14:47
d480b25
fix rpm spec (#1253)

Signed-off-by: Huamin Chen

release-0.7.5

23 Feb 14:43
bot: Updated coverage badge.

Signed-off-by: sustainable-computing-bot

release-0.4

23 Feb 14:40
bot: Updated coverage badge.

Signed-off-by: sustainable-computing-bot

release-0.7.3

12 Feb 15:30

In the Kepler 0.7 release:

  • switch to libbpf as default ebpf provider
  • update the base image to decouple the GPU driver from the kepler image itself
  • use kprobe instead of tracepoint for ebpf to obtain context switch information
  • add task clock event to ebpf and use it to calculate cpu usage for each process. The event is also exported to prometheus
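
The last item describes deriving per-process CPU usage from a task clock event. As a rough sketch of the idea, assume the task clock is a cumulative counter of nanoseconds a process has spent on the CPU, sampled at the start and end of a collection interval. The function name `cpu_usage_ratio` and the sampling scheme are assumptions for illustration and are not Kepler's actual implementation.

```python
def cpu_usage_ratio(prev_ns: int, curr_ns: int, interval_ns: int) -> float:
    """Fraction of one CPU used during an interval.

    prev_ns and curr_ns are cumulative on-CPU nanoseconds (assumed semantics
    of the task clock counter) at the start and end of the interval.
    """
    if interval_ns <= 0:
        raise ValueError("interval_ns must be positive")
    # Guard against a counter reset, which would make the delta negative.
    delta_ns = max(curr_ns - prev_ns, 0)
    return delta_ns / interval_ns


# A process that accumulated 250 ms of CPU time over a 1 s interval.
print(cpu_usage_ratio(1_000_000_000, 1_250_000_000, 1_000_000_000))  # 0.25
```

A multi-threaded process can exceed 1.0 under this definition, since its threads may run on several CPUs at once.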