Support octal, hex, and arbitrary radix numbers #140
base: main
0aa17e2 to 378f19f
b4f0715 to 3238c4c
3238c4c to 14d4863
14d4863 to 3ac56e9
cf7fc2d to b450179
/* The 'r' used in arbitrary radix (prefixed with N and then r, where N is the radix, 2 <= radix <= 36; */
/* e.g. 2r10101 for binary, 16rebed00d for hex) */
We've kept the comment from when we had a new member here.
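The syntax in that comment (N, then r, then digits) implies a digit alphabet of 0-9 followed by letters. Below is a minimal sketch of validating the digits after the r, assuming Clojure's convention that letters are case-insensitive and stand for 10 through 35. radix_digit_value and first_invalid_digit are hypothetical names, not part of jank.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: maps c to its digit value (0-9 -> 0..9, a-z / A-Z -> 10..35),
// or -1 if c is not alphanumeric. Assumes ASCII, where letters are contiguous.
constexpr int radix_digit_value(char const c)
{
  if(c >= '0' && c <= '9')
  {
    return c - '0';
  }
  if(c >= 'a' && c <= 'z')
  {
    return c - 'a' + 10;
  }
  if(c >= 'A' && c <= 'Z')
  {
    return c - 'A' + 10;
  }
  return -1;
}

// Hypothetical helper: index of the first char in `digits` that is not a valid
// digit for `radix` (assumed to already be in 2..36), or nullopt if all are valid.
std::optional<std::size_t> first_invalid_digit(std::string_view const digits, int const radix)
{
  for(std::size_t i{}; i < digits.size(); ++i)
  {
    int const value{ radix_digit_value(digits[i]) };
    if(value < 0 || value >= radix)
    {
      return i;
    }
  }
  return std::nullopt;
}
```

Under these assumptions, both examples from the comment validate, while 2r3, 35rz, 8re71, and 19r-ghi from the invalid-radix test each fail at the first char after the r.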
contains_leading_digit = true;
if(auto const [f, l]{ peek(2).unwrap_or({ ' ', 1 }) }; f != ' ')
{
  // auto const[n, _] {peek(3).unwrap_or({' ', 1})};
Please remove this commented code, as well as the others in the various branches below.
#include <iostream>
#include <clang/Basic/Diagnostic.h>
Please put these at the top of the file. Also, the comment on line 6 is specifically for the doctest include.
We actually don't need them. Removed.
SUBCASE("Invalid arbitrary radix")
{
  processor p{ "2r3 35rz 8re71 19r-ghi 2r 16r" };
  native_vector<result<token, error>> tokens(p.begin(), p.end());
  CHECK(tokens
        == make_results({
          error{ 0, 3, "invalid number: char 3 are invalid for radix 2" },
          error{ 4, 8, "invalid number: char z are invalid for radix 35" },
          error{ 9, 14, "invalid number: char e are invalid for radix 8" },
          error{ 15, 22, "invalid number: char - are invalid for radix 19" },
          error{ 23, 25, "unexpected end of radix number, expecting more digits" },
          error{ 26, 29, "unexpected end of radix number, expecting more digits" }
        }));
}
I really like how you've captured both positive and negative tests here. Let's dig deeper into potential negative cases, since I think there are more. Off the top of my head, to get the ball rolling:
- 0r0 (is radix 0 supported? In Clojure, no)
- 1r1 (is radix 1 supported? In Clojure, no)
- 2048r0 (how high can radix actually go? In Clojure, 36r0 is the highest)
- 2.0r1 (what if radix is a real?)
- 2rr1 (what if r is specified multiple times?)
- 2r1r (what if r is specified at the end of a valid number?)
- 2r.0 (what if a decimal is specified after r?)
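Several of these cases can be caught from the prefix alone, before the digits are examined. Below is a minimal sketch, assuming the same 2..36 limit Clojure uses and a lowercase r separator only; split_radix_prefix is a hypothetical helper, not jank's API.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical helper: splits "NrDIGITS" into the radix N and the digit string.
// Only the prefix is validated here; checking digits against the radix is a
// separate step. Returns nullopt if the prefix is malformed.
std::optional<std::pair<int, std::string_view>> split_radix_prefix(std::string_view const text)
{
  auto const r_pos{ text.find('r') };
  // Need at least one char before the first 'r' and at least one after it ("2r", "16r").
  if(r_pos == std::string_view::npos || r_pos == 0 || r_pos + 1 == text.size())
  {
    return std::nullopt;
  }
  int radix{};
  for(std::size_t i{}; i < r_pos; ++i)
  {
    char const c{ text[i] };
    if(c < '0' || c > '9')
    {
      return std::nullopt; // non-decimal radix, e.g. "2.0r1"
    }
    radix = radix * 10 + (c - '0');
    if(radix > 36)
    {
      return std::nullopt; // too large, e.g. "2048r0"; bailing early also avoids int overflow
    }
  }
  if(radix < 2)
  {
    return std::nullopt; // e.g. "0r0", "1r1"
  }
  return std::make_pair(radix, text.substr(r_pos + 1));
}
```

Note that 2rr1, 2r1r, and 2r.0 pass this step. Since r is a legitimate digit (value 27) for radixes 28 and up, those cases are probably better rejected by the per-radix digit check than by special-casing a second r.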
Please also give the same treatment to octal and hex. I can help with some test cases if you need more suggestions.
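For octal and hex, the analogous first step is choosing the radix from the prefix. Below is a sketch assuming Clojure-style integer literals (0x or 0X for hex, a leading 0 followed by more characters for octal) and a token already known to be an integer; classify_int_prefix is hypothetical, and jank's exact rules may differ.

```cpp
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical helper: picks the radix from an integer literal's prefix and
// returns it with the digits that still need validating, or nullopt when the
// prefix leaves nothing to read (e.g. "0x").
std::optional<std::pair<int, std::string_view>> classify_int_prefix(std::string_view const text)
{
  if(text.empty())
  {
    return std::nullopt;
  }
  bool const leading_zero{ text.size() >= 2 && text[0] == '0' };
  if(leading_zero && (text[1] == 'x' || text[1] == 'X'))
  {
    if(text.size() == 2)
    {
      return std::nullopt; // "0x" with no hex digits
    }
    return std::make_pair(16, text.substr(2));
  }
  if(leading_zero)
  {
    return std::make_pair(8, text.substr(1)); // e.g. "017"
  }
  return std::make_pair(10, text); // includes a lone "0"
}
```

Under these assumptions, negative cases fall out naturally: 0x has no digits, while 08 and 0xg survive this step but then contain a digit that is invalid for their radix.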
Updated with more unit tests.
b450179 to 1b98551
1b98551 to b747cde
b747cde to 3cb967c
SUBCASE("Arbitrary radix edge cases")
{
  /* exceeds 64-bit integer max */
  processor p{ "36r0123456789abcdefghijklmnopqrstuvwxyz" };
This seems like it should be an error until we support big numbers. The token we create is lossy.
  { 0, 39, token_kind::integer, 9223372036854775807ll }
}));

processor p2{ "2r1111111111111111111111111111111111111111111" };
There's a neat edge case of a binary number with more bits than our integer. That should also be an error.
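Both of these cases come down to detecting overflow while accumulating digits, rather than clamping to the maximum. Below is a minimal sketch, assuming the digits were already validated for the radix and the target type is std::int64_t; accumulate_checked is a hypothetical name.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical sketch: converts already-validated digits (0-9, a-z, A-Z) in
// `radix`, returning nullopt instead of a clamped value if the result
// would not fit in std::int64_t.
std::optional<std::int64_t> accumulate_checked(std::string_view const digits, int const radix)
{
  constexpr auto max{ std::numeric_limits<std::int64_t>::max() };
  std::int64_t value{};
  for(char const c : digits)
  {
    // ASCII: OR-ing 0x20 lowercases a letter; digits are handled by the first branch.
    int const d{ c <= '9' ? c - '0' : (c | 0x20) - 'a' + 10 };
    // value * radix + d <= max  <=>  value <= (max - d) / radix, checked before multiplying.
    if(value > (max - d) / radix)
    {
      return std::nullopt;
    }
    value = value * radix + d;
  }
  return value;
}
```

As an alternative, std::from_chars (C++17) accepts any base from 2 to 36 and reports std::errc::result_out_of_range on overflow. For a signed target type, though, it also accepts a leading minus sign, which matters for inputs like 19r-ghi.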
@@ -313,12 +325,52 @@ namespace jank::read

 static native_bool is_symbol_char(char32_t const c)
 {
-  return !std::iswspace(c) && !is_special_char(c)
+  return !std::iswspace(safe_cast_char32_t_to_int(c)) && !is_special_char(c)
I don't think this is what we want. What we want is this:
- We have a char32_t and we want to check if it's a space (true or false).

With this change, we have this:
- We have a char32_t and we want to check if it's a space (true or false), but we throw an exception if its value doesn't fit in an int.

Instead, let's just provide our own iswspace which takes a char32_t. The docs for it are here: https://en.cppreference.com/w/cpp/string/wide/iswspace. It looks like a simple switch with 6 cases should work.
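The suggested replacement is small enough to sketch. This assumes we only want the six characters cppreference lists as whitespace for iswspace in the default C locale; it does not try to match the extra Unicode spaces some other locales recognize. is_space is a hypothetical name.

```cpp
// Hypothetical replacement for std::iswspace that takes a char32_t directly,
// so no narrowing cast (and no exception path) is needed.
constexpr bool is_space(char32_t const c)
{
  switch(c)
  {
    case U' ':  // space (0x20)
    case U'\f': // form feed (0x0c)
    case U'\n': // line feed (0x0a)
    case U'\r': // carriage return (0x0d)
    case U'\t': // horizontal tab (0x09)
    case U'\v': // vertical tab (0x0b)
      return true;
    default:
      return false;
  }
}
```

With something like this, is_symbol_char could test is_space(c) on the char32_t directly, with no cast to int.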
return err(
  error{ token_start,
         pos,
         fmt::format("invalid number: char {} are invalid for radix {}",
Let's clarify the error message here and fix the grammar by saying "invalid number: chars '{}' are invalid for radix {}".
The code is a bit ugly but should work: