Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support octal, hex, and arbitrary radix numbers #140

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jianlingzhong
Copy link
Contributor

code is a bit ugly but should work:

❯ build/jank repl
Bottom of clojure.core
clojure.core=> 071
57
clojure.core=> -071
-57
clojure.core=> 08
Read error (0 - 2): invalid number: char 8 are invalid for radix 8
clojure.core=> 08.9
8.9
clojure.core=> 08e2
800
clojure.core=> 0-7.1
0
clojure.core=> 0xabc
2748
clojure.core=> 0Xabc
2748
clojure.core=> -0xef
-239
clojure.core=> 0xg
Read error (0 - 3): invalid number: char g are invalid for radix 16
clojure.core=> 0x-a
Read error (0 - 4): invalid number: char - are invalid for radix 16
clojure.core=> 2r011110
30
clojure.core=> 8r71
57
clojure.core=> -8r71
-57
clojure.core=> 1r0
Read error (0 - 3): invalid number: radix 1 is out of range
clojure.core=> -36razxyu
-18473142
clojure.core=> -30razxyu
Read error (0 - 9): invalid number: char zxyu are invalid for radix 30
clojure.core=> 37rzb
Read error (0 - 5): invalid number: radix 37 is out of range
clojure.core=> 36r4.5
Read error (0 - 6): invalid number: char . are invalid for radix 36
clojure.core=> 36r-4.5
Read error (0 - 7): invalid number: char -. are invalid for radix 36
clojure.core=> 3e-4
0.0003
clojure.core=> 16r3e4
996
clojure.core=>  36r3e4
4396
clojure.core=> 3e2/3
Read error (0 - 3): invalid ratio
clojure.core=> 7r3e4
Read error (0 - 5): invalid number: char e are invalid for radix 7
clojure.core=>  0xe
14
clojure.core=> 16re3
227
clojure.core=> 16r3e
62
clojure.core=> 0x3e
62
clojure.core=>  0x3e3
995
clojure.core=> 16r3e3
995

@jianlingzhong jianlingzhong force-pushed the integer branch 2 times, most recently from 0aa17e2 to 378f19f Compare December 3, 2024 07:30
@jianlingzhong jianlingzhong force-pushed the integer branch 6 times, most recently from cf7fc2d to b450179 Compare December 19, 2024 01:34
@jianlingzhong jianlingzhong requested a review from jeaye December 19, 2024 04:48
Comment on lines 160 to 161
/* The 'r' used in arbitrary radix (prefixed with N and then r, where N is the radix (2 <= radix <= 36); */
/* e.g. 2r10101 for binary, 16rebed00d for hex) */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've kept the comment from when we had a new member here.

contains_leading_digit = true;
if(auto const [f, l]{ peek(2).unwrap_or({ ' ', 1 }) }; f != ' ')
{
// auto const[n, _] {peek(3).unwrap_or({' ', 1})};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this commented code, as well as the others in the various branches below.

Comment on lines 7 to 8
#include <iostream>
#include <clang/Basic/Diagnostic.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put these at the top of the file. Also, the comment on line 6 is specifically for the doctest include.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we actually don't need them. removed.

Comment on lines 573 to 728
SUBCASE("Invalid arbitrary radix")
{
processor p{ "2r3 35rz 8re71 19r-ghi 2r 16r" };
native_vector<result<token, error>> tokens(p.begin(), p.end());
CHECK(tokens
== make_results({
error{ 0, 3, "invalid number: char 3 are invalid for radix 2" },
error{ 4, 8, "invalid number: char z are invalid for radix 35" },
error{ 9, 14, "invalid number: char e are invalid for radix 8" },
error{ 15, 22, "invalid number: char - are invalid for radix 19" },
error{ 23, 25, "unexpected end of radix number, expecting more digits" },
error{ 26, 29, "unexpected end of radix number, expecting more digits" }
}));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like how you've captured both positive and negative tests here. Let's dig deeper into potential negative cases, since I think there are more. Off the top of my head, to get the ball rolling:

  • 0r0 (is radix 0 supported?) -- In Clojure, no
  • 1r1 (is radix 1 supported?) -- In Clojure, no
  • 2048r0 (how high can radix actually go?) -- In Clojure, 36r0 is the highest
  • 2.0r1 (what if radix is a real?)
  • 2rr1 (what if r is specified multiple times?)
  • 2r1r (what if r is specified at the end of a valid number?)
  • 2r.0 (what if a decimal is specified after r?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also give the same treatment to octal and hex. I can help with some test cases if you need more suggestions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with more unit tests.

SUBCASE("Arbitrary radix edge cases")
{
/* exceeds 64-bit integer max */
processor p{ "36r0123456789abcdefghijklmnopqrstuvwxyz" };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be an error, until we support big numbers. The token we create is lossy.

{ 0, 39, token_kind::integer, 9223372036854775807ll }
}));

processor p2{ "2r1111111111111111111111111111111111111111111" };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a neat edge case of a binary number with more bits than our integer. That should also be an error.

@@ -313,12 +325,52 @@ namespace jank::read

static native_bool is_symbol_char(char32_t const c)
{
return !std::iswspace(c) && !is_special_char(c)
return !std::iswspace(safe_cast_char32_t_to_int(c)) && !is_special_char(c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is what we want. What we want is this:

  • We have a char32_t and we want to check if it's a space (true or false)

With this change, we have this:

  • We have a char32_t and we want to check if it's a space (true or false), but we throw an exception if it's larger than an int

Instead, let's just provide our own iswspace which takes a char32_t. The docs for it are here: https://en.cppreference.com/w/cpp/string/wide/iswspace Looks like a simple switch with 6 cases should work.

return err(
error{ token_start,
pos,
fmt::format("invalid number: char {} are invalid for radix {}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clarify the error message here and fix the grammar by saying invalid number: chars '{}' are invalid for radix {}.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants