Add parquet output using parquet2 via Rust #1240

Draft: wants to merge 22 commits into base: main

Commits
66412ad Add Rust build support (the80srobot, Nov 24, 2023)
9e339b6 rust: add a way to build cxxbridge-cmd (the80srobot, Dec 1, 2023)
fca8f1a rust: Add a rust_cxx_bridge target. (the80srobot, Dec 1, 2023)
ac68f97 Initial (rust-only) parquet writer logic (the80srobot, Dec 8, 2023)
3d9cae5 Add support for parquet strings (the80srobot, Dec 11, 2023)
d554ed7 Split code into multiple files (the80srobot, Dec 13, 2023)
661438b Simplify the generic traits by instead enumerating the supported types (the80srobot, Dec 13, 2023)
65536f8 Add a table wrapper (the80srobot, Dec 13, 2023)
4eac75e Table encapsulates writing (the80srobot, Dec 14, 2023)
1b55890 Add the C++ API for building a table (the80srobot, Dec 14, 2023)
5819f98 Actually write a parquet file (the80srobot, Dec 14, 2023)
b9c185a Enable statistics for byte arrays (the80srobot, Dec 14, 2023)
9afbf9e Fix a missing definition level bug (the80srobot, Dec 14, 2023)
18d3798 Only flush if something is buffered (the80srobot, Dec 14, 2023)
2666346 Document some implementation notes (the80srobot, Dec 14, 2023)
97fbd58 Code size tweaks (the80srobot, Dec 14, 2023)
4c2ec63 Clean up and document the API (the80srobot, Dec 14, 2023)
fdf6b48 Remove the old C-FFI POC code and add an XCTest (the80srobot, Dec 14, 2023)
85eb27c Extremely basic e2e test to prove the file is valid (the80srobot, Dec 14, 2023)
608a650 Update guidance about use of the Result type (the80srobot, Dec 14, 2023)
0798853 Improve the e2e test to validate the file contents (the80srobot, Dec 15, 2023)
48ec5ef Document references to docs (the80srobot, Dec 15, 2023)
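
Commit 9afbf9e refers to definition levels. For a flat optional Parquet column, each row carries a definition level (0 for null, 1 for present), and only the non-null values are stored. The following is a minimal sketch of that split, not code from this PR; `split_optional` is a hypothetical helper name chosen for illustration.

```rust
/// Hypothetical helper (not from this PR): split an optional column into
/// per-row definition levels and a dense vector of the non-null values.
fn split_optional<T: Clone>(rows: &[Option<T>]) -> (Vec<u8>, Vec<T>) {
    let mut levels = Vec::with_capacity(rows.len());
    let mut values = Vec::new();
    for row in rows {
        match row {
            // Present value: definition level 1, value is stored.
            Some(v) => {
                levels.push(1);
                values.push(v.clone());
            }
            // Null: definition level 0, nothing is stored.
            None => levels.push(0),
        }
    }
    (levels, values)
}

fn main() {
    let rows = [Some(10i32), None, Some(30)];
    let (levels, values) = split_optional(&rows);
    assert_eq!(levels, vec![1, 0, 1]);
    assert_eq!(values, vec![10, 30]);
    println!("levels={:?} values={:?}", levels, values);
}
```

Forgetting to emit a level for a row leaves the level and row counts out of step, which is the general kind of mismatch a "missing definition level" fix would address; the actual bug in this PR is not shown here.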
3 changes: 3 additions & 0 deletions .gitignore
@@ -18,3 +18,6 @@ tulsigen-*
compile_commands.json
.cache/
.vscode/*

# Rust stuff
target
296 changes: 296 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

6 changes: 6 additions & 0 deletions Cargo.toml
@@ -0,0 +1,6 @@
[workspace]
members = [
    # The below list must be kept in sync with the crates_repository.manifest
    # key in the root WORKSPACE file.
    "Source/santad/Logs/EndpointSecurity/ParquetLogger",
]
54 changes: 54 additions & 0 deletions Source/santad/Logs/EndpointSecurity/ParquetLogger/BUILD
@@ -0,0 +1,54 @@
load("@crate_index//:defs.bzl", "aliases", "all_crate_deps")
load("@rules_cc//cc:defs.bzl", "cc_binary", "cc_library")
load("@rules_rust//rust:defs.bzl", "rust_static_library", "rust_test")
load("//:helper.bzl", "rust_cxx_bridge", "santa_unit_test")

rust_static_library(
    name = "parquet_logger",
    srcs = [
        "column_builder.rs",
        "cpp_api.rs",
        "page_builder.rs",
        "parquet_logger.rs",
        "table.rs",
        "value.rs",
        "writer.rs",
    ],
    aliases = aliases(),
    proc_macro_deps = all_crate_deps(
        proc_macro = True,
    ),
    deps = all_crate_deps(
        normal = True,
    ),
)

cc_binary(
    name = "write_test_file",
    srcs = ["write_test_file.cc"],
    deps = [":parquet_bridge"],
)

rust_test(
    name = "parquet_logger_test",
    crate = ":parquet_logger",
)

rust_cxx_bridge(
    name = "parquet_bridge",
    src = "cpp_api.rs",
    deps = [":parquet_logger"],
)

cc_library(
    name = "ParquetLogger",
    srcs = ["ParquetLogger.cc"],
    hdrs = ["ParquetLogger.h"],
    deps = [":parquet_bridge"],
)

santa_unit_test(
    name = "ParquetLoggerTest",
    srcs = ["ParquetLoggerTest.mm"],
    deps = [":ParquetLogger"],
)
25 changes: 25 additions & 0 deletions Source/santad/Logs/EndpointSecurity/ParquetLogger/Cargo.toml
@@ -0,0 +1,25 @@
[package]
name = "parquet_logger"
version = "0.1.0"
edition = "2021"
description = "Parquet output support for Santa"

[lib]
name = "parquet_logger"
path = "parquet_logger.rs"
crate-type = ["cdylib", "staticlib"]

[dependencies]
parquet2 = "0.17.2"
cxx = "1.0"

# The release profile is tweaked for binary size. Not all of these options are
# applied by bazel at the moment.
[profile.release]
# Automatically strip symbols from the binary. Note: this seems to have less of
# an effect than just calling strip on the binary after the fact.
strip = true
opt-level = "z" # Optimize for size.
lto = true
codegen-units = 1 # Disable parallel codegen.
panic = "abort" # This matches the behavior of LOG(FATAL).
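
With `panic = "abort"`, a Rust panic terminates the process instead of unwinding, so recoverable failures need to be reported as `Result` values rather than panics. The sketch below illustrates that style and also mirrors the "only flush if something is buffered" idea from the commit list. It is an illustration only: `BufferedSink` and its methods are hypothetical names and are not taken from this PR.

```rust
use std::io::{self, Write};

/// Hypothetical buffered sink; the type and its fields are illustrative only.
struct BufferedSink<W: Write> {
    inner: W,
    buffer: Vec<u8>,
}

impl<W: Write> BufferedSink<W> {
    fn new(inner: W) -> Self {
        Self { inner, buffer: Vec::new() }
    }

    /// Queue bytes in memory without touching the underlying writer.
    fn push(&mut self, bytes: &[u8]) {
        self.buffer.extend_from_slice(bytes);
    }

    /// Write out buffered bytes and return how many were flushed.
    /// Returns Ok(0) without touching the writer when nothing is buffered.
    /// I/O failures are returned to the caller instead of panicking.
    fn flush(&mut self) -> io::Result<usize> {
        if self.buffer.is_empty() {
            return Ok(0);
        }
        self.inner.write_all(&self.buffer)?;
        self.inner.flush()?;
        let flushed = self.buffer.len();
        self.buffer.clear();
        Ok(flushed)
    }
}

fn main() {
    // Vec<u8> implements Write, so it can stand in for a file here.
    let mut sink = BufferedSink::new(Vec::new());
    assert_eq!(sink.flush().unwrap(), 0); // nothing buffered, nothing written
    sink.push(b"abc");
    assert_eq!(sink.flush().unwrap(), 3);
    assert_eq!(sink.inner, b"abc".to_vec());
}
```

Returning `io::Result` leaves the decision to abort with the caller, which can log and terminate at a single well-defined point.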
@@ -0,0 +1,2 @@
#include "ParquetLogger.h"
