Support octal, hex, and arbitrary radix numbers #140

jianlingzhong · 2024-12-03T07:27:31Z

The code is a bit ugly, but it should work:

❯ build/jank repl
Bottom of clojure.core
clojure.core=> 071
57
clojure.core=> -071
-57
clojure.core=> 08
Read error (0 - 2): invalid number: char 8 are invalid for radix 8
clojure.core=> 08.9
8.9
clojure.core=> 08e2
800
clojure.core=> 0-7.1
0
clojure.core=> 0xabc
2748
clojure.core=> 0Xabc
2748
clojure.core=> -0xef
-239
clojure.core=> 0xg
Read error (0 - 3): invalid number: char g are invalid for radix 16
clojure.core=> 0x-a
Read error (0 - 4): invalid number: char - are invalid for radix 16
clojure.core=> 2r011110
30
clojure.core=> 8r71
57
clojure.core=> -8r71
-57
clojure.core=> 1r0
Read error (0 - 3): invalid number: radix 1 is out of range
clojure.core=> -36razxyu
-18473142
clojure.core=> -30razxyu
Read error (0 - 9): invalid number: char zxyu are invalid for radix 30
clojure.core=> 37rzb
Read error (0 - 5): invalid number: radix 37 is out of range
clojure.core=> 36r4.5
Read error (0 - 6): invalid number: char . are invalid for radix 36
clojure.core=> 36r-4.5
Read error (0 - 7): invalid number: char -. are invalid for radix 36
clojure.core=> 3e-4
0.0003
clojure.core=> 16r3e4
996
clojure.core=>  36r3e4
4396
clojure.core=> 3e2/3
Read error (0 - 3): invalid ratio
clojure.core=> 7r3e4
Read error (0 - 5): invalid number: char e are invalid for radix 7
clojure.core=>  0xe
14
clojure.core=> 16re3
227
clojure.core=> 16r3e
62
clojure.core=> 0x3e
62
clojure.core=>  0x3e3
995
clojure.core=> 16r3e3
995
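For reference, the session above implies a few rules for zero-prefixed literals: `0x`/`0X` needs at least one hex digit after it, a bare leading `0` means octal, and `08.9` or `08e2` fall back to decimal floats rather than erroring. Below is a minimal C++17 sketch of just that classification step. It assumes the lexer has already isolated the literal's text, and `split_prefixed`, `literal_kind`, and `split_literal` are hypothetical names, not jank's lexer API.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string_view>

/* Hypothetical result of classifying a zero-prefixed literal. */
enum class literal_kind
{
  not_prefixed, /* plain decimal, NrDIGITS radix, or a float such as 08.9 */
  octal,
  hex,
  error
};

struct split_literal
{
  literal_kind kind;
  bool negative;
  std::string_view digits; /* sign and 0 / 0x prefix removed */
};

split_literal split_prefixed(std::string_view s)
{
  bool negative{ false };
  if(!s.empty() && (s.front() == '-' || s.front() == '+'))
  {
    negative = s.front() == '-';
    s.remove_prefix(1);
  }

  /* A lone 0, or anything not starting with 0, is handled elsewhere. */
  if(s.size() < 2 || s.front() != '0')
  {
    return { literal_kind::not_prefixed, negative, s };
  }

  if(s[1] == 'x' || s[1] == 'X')
  {
    auto const digits{ s.substr(2) };
    bool const ok{ !digits.empty()
                   && std::all_of(digits.begin(), digits.end(), [](unsigned char const c) {
                        return std::isxdigit(c) != 0;
                      }) };
    return { ok ? literal_kind::hex : literal_kind::error, negative, digits };
  }

  /* 08.9 and 08e2 read as decimal floats, so a leading 0 only means
   * octal when there is no '.' and no exponent marker. */
  if(s.find_first_of(".eE") != std::string_view::npos)
  {
    return { literal_kind::not_prefixed, negative, s };
  }

  auto const digits{ s.substr(1) };
  bool const ok{ std::all_of(digits.begin(), digits.end(), [](char const c) {
    return c >= '0' && c <= '7';
  }) };
  return { ok ? literal_kind::octal : literal_kind::error, negative, digits };
}
```

Turning the digit string into a value is a separate step. `std::stoll(std::string{ digits }, nullptr, 8)` would give 57 for `"71"`, but it reports overflow by throwing `std::out_of_range`, so a lexer that returns error tokens would want its own checked loop.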

jeaye · 2024-12-19T19:25:10Z

compiler+runtime/include/cpp/jank/read/lex.hpp

+    /* The 'r' used in arbitrary radix (prefixed with N and then r, where N is the radix (2 <= radix <= 36); */
+    /* e.g. 2r10101 for binary, 16rebed00d for hex) */


This comment was left over from when we had a new member here.

jeaye · 2024-12-19T19:27:23Z

compiler+runtime/src/cpp/jank/read/lex.cpp

+              contains_leading_digit = true;
+              if(auto const [f, l]{ peek(2).unwrap_or({ ' ', 1 }) }; f != ' ')
+              {
+                // auto const[n, _] {peek(3).unwrap_or({' ', 1})};


Please remove this commented code, as well as the others in the various branches below.

jeaye · 2024-12-19T19:38:52Z

compiler+runtime/test/cpp/jank/read/lex.cpp

+#include <iostream>
+#include <clang/Basic/Diagnostic.h>


Please put these at the top of the file. Also, the comment on line 6 is specifically for the doctest include.

We actually don't need them, so I removed them.

jeaye · 2024-12-19T19:46:43Z

compiler+runtime/test/cpp/jank/read/lex.cpp

+      SUBCASE("Invalid arbitrary radix")
+      {
+        processor p{ "2r3 35rz 8re71 19r-ghi 2r 16r" };
+        native_vector<result<token, error>> tokens(p.begin(), p.end());
+        CHECK(tokens
+              == make_results({
+                error{  0,  3,        "invalid number: char 3 are invalid for radix 2" },
+                error{  4,  8,       "invalid number: char z are invalid for radix 35" },
+                error{  9, 14,        "invalid number: char e are invalid for radix 8" },
+                error{ 15, 22,       "invalid number: char - are invalid for radix 19" },
+                error{ 23, 25, "unexpected end of radix number, expecting more digits" },
+                error{ 26, 29, "unexpected end of radix number, expecting more digits" }
+        }));
+      }


I really like how you've captured both positive and negative tests here. Let's dig deeper into potential negative cases, since I think there are more. Off the top of my head, to get the ball rolling:

0r0 (is radix 0 supported?) -- In Clojure, no

1r1 (is radix 1 supported?) -- In Clojure, no

2048r0 (how high can radix actually go?) -- In Clojure, 36r0 is the highest

2.0r1 (what if radix is a real?)

2rr1 (what if r is specified multiple times?)

2r1r (what if r is specified at the end of a valid number?)

2r.0 (what if a decimal is specified after r?)

Please also give the same treatment to octal and hex. I can help with some test cases if you need more suggestions.
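To make the list above concrete, here is a sketch of a validator for the `NrDIGITS` form that rejects each of those inputs. It assumes only a lowercase `r` (as in the REPL session) and reports the first bad digit rather than collecting all of them. `check_radix_literal` and `digit_value` are hypothetical helpers, and the messages are illustrative rather than jank's.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

/* Value of c as a digit in bases up to 36, or -1 if it is not a digit. */
int digit_value(char const c)
{
  if(c >= '0' && c <= '9')
  {
    return c - '0';
  }
  if(c >= 'a' && c <= 'z')
  {
    return c - 'a' + 10;
  }
  if(c >= 'A' && c <= 'Z')
  {
    return c - 'A' + 10;
  }
  return -1;
}

/* Hypothetical validator for [sign]NrDIGITS literals. Returns an error
 * message, or std::nullopt if the literal is well formed. */
std::optional<std::string> check_radix_literal(std::string_view s)
{
  if(!s.empty() && (s.front() == '-' || s.front() == '+'))
  {
    s.remove_prefix(1);
  }

  auto const r{ s.find('r') };
  if(r == std::string_view::npos)
  {
    return "not a radix literal";
  }

  /* The radix must be plain decimal digits, which rejects 2.0r1. */
  std::string const radix_text{ s.substr(0, r) };
  if(radix_text.empty()
     || !std::all_of(radix_text.begin(), radix_text.end(), [](char const c) {
          return c >= '0' && c <= '9';
        }))
  {
    return "invalid radix '" + radix_text + "'";
  }

  /* Two digits are enough for 2..36; checking the length first keeps
   * std::stoi from overflowing on an enormous radix. */
  int const radix{ radix_text.size() > 2 ? 0 : std::stoi(radix_text) };
  if(radix < 2 || radix > 36)
  {
    return "radix " + radix_text + " is out of range";
  }

  auto const digits{ s.substr(r + 1) };
  if(digits.empty())
  {
    return "unexpected end of radix number, expecting more digits";
  }

  /* A second 'r' or a '.' after the radix fails this digit check. */
  for(char const c : digits)
  {
    auto const d{ digit_value(c) };
    if(d < 0 || d >= radix)
    {
      return "invalid digit '" + std::string(1, c) + "' for radix " + std::to_string(radix);
    }
  }
  return std::nullopt;
}
```

Note that `2rr1` and `2r1r` fail only because `r` (digit value 27) is not a digit below base 28. From base 28 up, `r` is an ordinary digit, so `36rr1` passes this check.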

Updated with more unit tests.

jeaye · 2024-12-22T02:38:35Z

compiler+runtime/test/cpp/jank/read/lex.cpp

+      SUBCASE("Arbitrary radix edge cases")
+      {
+        /* exceeds 64-bit integer max */
+        processor p{ "36r0123456789abcdefghijklmnopqrstuvwxyz" };


This seems like it should be an error until we support big numbers; the token we create is lossy.

jeaye · 2024-12-22T02:39:39Z

compiler+runtime/test/cpp/jank/read/lex.cpp

+                { 0, 39, token_kind::integer, 9223372036854775807ll }
+        }));
+
+        processor p2{ "2r1111111111111111111111111111111111111111111" };


There's a neat edge case of a binary number with more bits than our integer. That should also be an error.

jeaye · 2024-12-22T02:43:48Z

compiler+runtime/src/cpp/jank/read/lex.cpp

@@ -313,12 +325,52 @@ namespace jank::read

    static native_bool is_symbol_char(char32_t const c)
    {
-      return !std::iswspace(c) && !is_special_char(c)
+      return !std::iswspace(safe_cast_char32_t_to_int(c)) && !is_special_char(c)


I don't think this is what we want. What we want is this:

We have a char32_t and we want to check if it's a space (true or false)

With this change, we have this:

We have a char32_t and we want to check if it's a space (true or false), but we throw an exception if it's larger than an int

Instead, let's just provide our own iswspace which takes a char32_t. The docs for it are here: https://en.cppreference.com/w/cpp/string/wide/iswspace Looks like a simple switch with 6 cases should work.
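Something along these lines, as a sketch: it checks the six code points cppreference lists for the default C locale (space, form feed, line feed, carriage return, horizontal tab, vertical tab) and takes `char32_t` directly, so nothing narrows or throws. `is_space` is a hypothetical name. It does not treat other Unicode space characters (for example U+3000) as whitespace, which a locale-aware `std::iswspace` may do.

```cpp
/* Hypothetical char32_t replacement for std::iswspace. It recognizes exactly
 * the six whitespace characters of the default C locale and never throws. */
constexpr bool is_space(char32_t const c)
{
  switch(c)
  {
    case U' ':  /* space, 0x20 */
    case U'\f': /* form feed, 0x0c */
    case U'\n': /* line feed, 0x0a */
    case U'\r': /* carriage return, 0x0d */
    case U'\t': /* horizontal tab, 0x09 */
    case U'\v': /* vertical tab, 0x0b */
      return true;
    default:
      return false;
  }
}
```

`is_symbol_char` could then call `!is_space(c)` directly, with no cast to `int` in between.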

jeaye · 2024-12-22T02:45:46Z

compiler+runtime/src/cpp/jank/read/lex.cpp

+                return err(
+                  error{ token_start,
+                         pos,
+                         fmt::format("invalid number: char {} are invalid for radix {}",


Let's clarify the error message here and fix the grammar by saying invalid number: chars '{}' are invalid for radix {}.
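A sketch of producing that wording while collecting every offending character, which matches what the REPL session prints for `-30razxyu` (`zxyu`) and `36r-4.5` (`-.`). It uses plain `std::string` concatenation instead of `fmt::format` so it compiles on its own; `invalid_digits_error` and `digit_value` are hypothetical helpers.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

/* Value of c as a digit in bases up to 36, or -1 if it is not a digit. */
int digit_value(char const c)
{
  if(c >= '0' && c <= '9')
  {
    return c - '0';
  }
  if(c >= 'a' && c <= 'z')
  {
    return c - 'a' + 10;
  }
  if(c >= 'A' && c <= 'Z')
  {
    return c - 'A' + 10;
  }
  return -1;
}

/* Hypothetical helper: gathers every character of digits that is not a valid
 * digit for radix and builds the message in the suggested wording. */
std::optional<std::string> invalid_digits_error(std::string_view const digits, int const radix)
{
  std::string invalid;
  for(char const c : digits)
  {
    auto const d{ digit_value(c) };
    if(d < 0 || d >= radix)
    {
      invalid.push_back(c);
    }
  }
  if(invalid.empty())
  {
    return std::nullopt;
  }
  return "invalid number: chars '" + invalid + "' are invalid for radix " + std::to_string(radix);
}
```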

jianlingzhong force-pushed the integer branch 2 times, most recently from 0aa17e2 to 378f19f, on December 3, 2024 07:30

jeaye reviewed Dec 8, 2024

compiler+runtime/src/cpp/jank/read/lex.cpp (outdated, resolved)

jianlingzhong force-pushed the integer branch 3 times, most recently from b4f0715 to 3238c4c, on December 9, 2024 07:11

jianlingzhong requested a review from jeaye December 9, 2024 07:12

jianlingzhong force-pushed the integer branch from 3238c4c to 14d4863 on December 9, 2024 07:35

jeaye reviewed Dec 10, 2024

compiler+runtime/src/cpp/jank/read/lex.cpp (outdated, resolved)

jeaye reviewed Dec 10, 2024

compiler+runtime/include/cpp/jank/read/lex.hpp (outdated, resolved)

jianlingzhong force-pushed the integer branch from 14d4863 to 3ac56e9 on December 10, 2024 06:39

jeaye reviewed Dec 10, 2024

compiler+runtime/include/cpp/jank/read/lex.hpp (outdated, resolved)

jeaye reviewed Dec 10, 2024

compiler+runtime/src/cpp/jank/read/lex.cpp (outdated, resolved)

jianlingzhong force-pushed the integer branch 6 times, most recently from cf7fc2d to b450179, on December 19, 2024 01:34

jianlingzhong requested a review from jeaye December 19, 2024 04:48

jeaye reviewed Dec 19, 2024

compiler+runtime/src/cpp/jank/read/lex.cpp (resolved)

jeaye reviewed Dec 19, 2024

compiler+runtime/src/cpp/jank/read/lex.cpp (resolved)

jeaye reviewed Dec 19, 2024

jianlingzhong force-pushed the integer branch from b450179 to 1b98551 on December 20, 2024 06:48

jianlingzhong requested a review from jeaye December 20, 2024 06:53

jianlingzhong force-pushed the integer branch from 1b98551 to b747cde on December 20, 2024 07:00

Support octal, hex, and arbitrary radix numbers

3cb967c

jianlingzhong force-pushed the integer branch from b747cde to 3cb967c on December 20, 2024 21:55

jeaye reviewed Dec 22, 2024
