Semantic detection PoC
This defines a framework for more advanced, statistics-based ways of importing data into `synth`. This paves the way for more automation in the process of writing `synth` schemas tailored to a specific data source.

Underpinning this is the `semdet` crate, which aims to provide `synth` with the ability to do fast, zero-copy, in-memory trainable analytics for table instances provided by the user as an import data source. It is built on `arrow`, `ndarray` and `tch`.

The PoC is an end-to-end implementation of a dummy model that detects the most likely `fake` generator based on a simple dictionary lookup. The example is simple enough that we can get it done very quickly and yet involves enough moving parts to evidence the possibility of implementing more complex data-driven inference mechanisms.

How to test it
`cargo test --features torch` in `semdet/` will run the dummy E2E scenario and should be successful.

Roadmap to readiness
- `Encoder`/`Decoder`/`Module` APIs
- `sqlx` query results
- `tch` optional so the built binary does not have to carry a dynamic dependency on `libtorch`
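
For readers unfamiliar with the idea behind the dummy model, the dictionary lookup can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in `semdet`: the `Generator` enum, `build_dictionary` and `detect` are hypothetical names, the dictionary entries are made up, and the sketch uses plain `std` collections rather than `arrow`, `ndarray` or `tch`.

```rust
use std::collections::HashMap;

/// Hypothetical generator labels (not the actual `semdet` or `fake` API).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Generator {
    FirstName,
    CountryName,
}

/// Builds a toy dictionary mapping known lowercase values to a generator label.
/// The entries are illustrative only.
fn build_dictionary() -> HashMap<&'static str, Generator> {
    let mut dict = HashMap::new();
    for name in ["alice", "bob", "carol"] {
        dict.insert(name, Generator::FirstName);
    }
    for country in ["france", "japan", "brazil"] {
        dict.insert(country, Generator::CountryName);
    }
    dict
}

/// Returns the label matched by the most values in the column,
/// or `None` if no value is found in the dictionary.
fn detect(column: &[&str], dict: &HashMap<&'static str, Generator>) -> Option<Generator> {
    let mut votes: HashMap<Generator, usize> = HashMap::new();
    for value in column {
        // Case-insensitive lookup: dictionary keys are stored in lowercase.
        if let Some(generator) = dict.get(value.to_lowercase().as_str()) {
            *votes.entry(*generator).or_insert(0) += 1;
        }
    }
    votes
        .into_iter()
        .max_by_key(|&(_, count)| count)
        .map(|(generator, _)| generator)
}

fn main() {
    let dict = build_dictionary();
    let column = ["Alice", "Bob", "France"];
    // Two values vote for FirstName and one for CountryName.
    println!("{:?}", detect(&column, &dict)); // prints "Some(FirstName)"
}
```

A majority vote over a column is only one possible way to turn per-value lookups into a single guess; ties are broken arbitrarily in this sketch.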