
Parse search strings as byteseek regexes, or just convert to byte arrays? #19

Open
nishihatapalmer opened this issue Aug 18, 2019 · 8 comments

Comments

@nishihatapalmer
Owner

Search algorithms usually provide a String constructor.

Currently, that just lets us match the sequence of bytes encoded by that string (either in the default Charset, or one that's provided).

But since byteseek is byte oriented, a hex string (or full byteseek regex syntax) might be more useful. One person has already asked whether they can pass hex string bytes directly to the search algorithms. I had to say that isn't possible, and that they need to use the SequenceMatcherCompiler to create a SequenceMatcher for the search algorithm to look for.

The same search algorithms also have a byte[] array constructor, if you want to explicitly search for a byte pattern. It's easy to convert a string to a byte array if that's what needs to be searched for.
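A minimal sketch of the difference between the two readings of a string, assuming a hypothetical helper class `PatternBytes` (it is not part of byteseek, and the hex parsing here is far simpler than the byteseek regex syntax):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a byteseek class.
final class PatternBytes {

    private PatternBytes() {}

    // Encodes the characters of the text with an explicit charset.
    // "01ff" becomes the four bytes 0x30 0x31 0x66 0x66.
    static byte[] fromText(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // Interprets the text as pairs of hex digits.
    // "01ff" becomes the two bytes 0x01 0xFF.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string needs an even number of digits: " + hex);
        }
        byte[] result = new byte[hex.length() / 2];
        for (int i = 0; i < result.length; i++) {
            int high = Character.digit(hex.charAt(2 * i), 16);
            int low = Character.digit(hex.charAt(2 * i + 1), 16);
            if (high < 0 || low < 0) {
                throw new IllegalArgumentException("Not a hex digit in: " + hex);
            }
            result[i] = (byte) ((high << 4) | low);
        }
        return result;
    }
}
```

Either result can be handed to the existing byte[] constructor, which is why a String constructor that only does the first conversion adds little.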

So I guess the String constructor is essentially redundant. Unless we support byteseek regex construction directly, we should just get rid of those constructors.

If we did support byteseek regex syntax in the search algorithm String constructors, what do we do with Compiler / Parse Exceptions?

@nishihatapalmer
Owner Author

I guess those constructors just throw CompileExceptions. It only affects you if you use those constructors - as it should.
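A rough sketch of how that could look, assuming the exception is a checked one. The names `SketchCompileException` and `SketchSearcher` are hypothetical stand-ins, and the parsing is plain hex rather than the real byteseek syntax:

```java
// Hypothetical checked exception standing in for a compile or parse failure.
class SketchCompileException extends Exception {
    SketchCompileException(String message) {
        super(message);
    }
}

// Hypothetical searcher for illustration; not a byteseek class.
final class SketchSearcher {

    private final byte[] pattern;

    // No parsing happens here, so no checked exception is declared.
    SketchSearcher(byte[] pattern) {
        this.pattern = pattern.clone();
    }

    // Only callers of this constructor have to handle the exception.
    SketchSearcher(String hexExpression) throws SketchCompileException {
        this(parseHex(hexExpression));
    }

    private static byte[] parseHex(String hex) throws SketchCompileException {
        if (hex.length() % 2 != 0) {
            throw new SketchCompileException("Odd number of hex digits: " + hex);
        }
        byte[] result = new byte[hex.length() / 2];
        for (int i = 0; i < result.length; i++) {
            int high = Character.digit(hex.charAt(2 * i), 16);
            int low = Character.digit(hex.charAt(2 * i + 1), 16);
            if (high < 0 || low < 0) {
                throw new SketchCompileException("Not a hex digit in: " + hex);
            }
            result[i] = (byte) ((high << 4) | low);
        }
        return result;
    }

    int patternLength() {
        return pattern.length;
    }
}
```

Code that builds searchers from byte arrays is unaffected; only the String path needs a try/catch or a throws clause.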

@nishihatapalmer
Owner Author

This is essentially a convenience constructor. Either a string is a byte array, or it's a regex. In either case we already have constructors for the outputs (SequenceMatcher or byte array). And the byte array can also be modelled by a SequenceMatcher.

What gives the best convenience?

@nishihatapalmer
Owner Author

So - SequenceMatcher constructor is the only general constructor for SequenceSearch algorithms.
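To illustrate why the matcher-based constructor is the general one, here is a simplified sketch. `SketchMatcher`, `FixedBytesMatcher` and `NaiveSearcher` are hypothetical names; byteseek's real SequenceMatcher interface is richer and its searchers are not naive scans:

```java
// Hypothetical minimal stand-in for a sequence matcher.
interface SketchMatcher {
    int length();
    boolean matchesAt(byte[] data, int position);
}

// A plain byte array modelled as a matcher.
final class FixedBytesMatcher implements SketchMatcher {

    private final byte[] bytes;

    FixedBytesMatcher(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    public int length() {
        return bytes.length;
    }

    public boolean matchesAt(byte[] data, int position) {
        if (position < 0 || position + bytes.length > data.length) {
            return false;
        }
        for (int i = 0; i < bytes.length; i++) {
            if (data[position + i] != bytes[i]) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical searcher: the matcher constructor is the general one,
// and the byte[] constructor is a convenience that delegates to it.
final class NaiveSearcher {

    private final SketchMatcher matcher;

    NaiveSearcher(SketchMatcher matcher) {
        this.matcher = matcher;
    }

    NaiveSearcher(byte[] pattern) {
        this(new FixedBytesMatcher(pattern));
    }

    // Returns the first position where the matcher matches, or -1 if none.
    int indexIn(byte[] data) {
        for (int pos = 0; pos + matcher.length() <= data.length; pos++) {
            if (matcher.matchesAt(data, pos)) {
                return pos;
            }
        }
        return -1;
    }
}
```

A String constructor would be one more convenience of the same kind, delegating to whichever of the two existing forms the string is taken to mean.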

@nishihatapalmer
Owner Author

The downside of making the String constructors for search algorithms process regexes is that it creates a hard dependency from all search algorithms on the byteseek sequence matcher compiler and regex parser.

@nishihatapalmer
Owner Author

Currently, matchers and searchers don't depend on the parser and compiler in any way.

@nishihatapalmer
Owner Author

The only excuse for such a higher level dependency is convenience - which is what this is.

Is the convenience of instantiating hex string (or more complex syntax) searchers directly worth the dependency it creates?

@nishihatapalmer
Owner Author

I don't think a hard dependency between the Searcher and Compiler package will really hurt anything.

It's a general design principle to try to keep them as cleanly separated as possible, but this is a case where we already had a support question raised by a user. They expected (or wanted) to be able to do this.

@nishihatapalmer
Owner Author

I'm going to explore using SequenceMatcher compilers directly in the SequenceSearcher String constructors.
