Refactor lexer and parser classes #172

Draft: wants to merge 29 commits into main

29 commits
0f08898 Refactor lexer and parser classes (Shamantak12, Sep 21, 2024)
769d912 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 21, 2024)
1a4eaef Auto stash before merge of "modified" and "origin/modified" (Shamantak12, Sep 21, 2024)
c33a880 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 21, 2024)
9a79f0c [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 21, 2024)
8d7457b Merge branch 'modified' of https://github.com/Shamantak12/crosstl int… (Shamantak12, Sep 23, 2024)
ad3023e [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 23, 2024)
8af3a13 Merge branch 'modified' of https://github.com/Shamantak12/crosstl int… (Shamantak12, Sep 23, 2024)
4444591 Merge branch 'modified' of https://github.com/Shamantak12/crosstl int… (Shamantak12, Sep 23, 2024)
61a83d2 Merge branch 'modified' of https://github.com/Shamantak12/crosstl int… (Shamantak12, Sep 23, 2024)
1b0f525 Fix import formatting in test_lexer.py (Shamantak12, Sep 23, 2024)
f34fed1 Merge branch 'modified' of https://github.com/Shamantak12/crosstl int… (Shamantak12, Sep 25, 2024)
7de3299 new modified (Shamantak12, Sep 25, 2024)
ccf0f88 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 25, 2024)
ecf9dec new (Shamantak12, Sep 25, 2024)
425ed3c Merge branch 'modified' of https://github.com/Shamantak12/crosstl int… (Shamantak12, Sep 25, 2024)
7563f30 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 25, 2024)
5df086d Refactor lexer.py to improve tokenization efficiency (Shamantak12, Sep 25, 2024)
48d251c Merge branch 'modified' of https://github.com/Shamantak12/crosstl int… (Shamantak12, Sep 25, 2024)
531bebb modified: crosstl/src/translator/lexer.py (Shamantak12, Oct 3, 2024)
ad69379 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Oct 3, 2024)
328635c Refactor lexer.py to improve tokenization efficiency (Shamantak12, Oct 3, 2024)
03c5c4b Merge branch 'modified' of https://github.com/Shamantak12/crosstl int… (Shamantak12, Oct 3, 2024)
bbde10e [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Oct 3, 2024)
3bfc2cf Refactor lexer.py to improve tokenization efficiency (Shamantak12, Oct 5, 2024)
5c34582 Merge branch 'modified' of https://github.com/Shamantak12/crosstl int… (Shamantak12, Oct 5, 2024)
796e2a6 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Oct 5, 2024)
f181d57 Refactor lexer.py to improve tokenization efficiency (Shamantak12, Oct 19, 2024)
0dcc328 Merge branch 'modified' of https://github.com/Shamantak12/crosstl int… (Shamantak12, Oct 19, 2024)
2 changes: 1 addition & 1 deletion crosstl/src/translator/lexer.py
@@ -133,4 +133,4 @@ def tokenize(self):
                     f"Illegal character '{unmatched_char}' at position {pos}\n{highlighted_code}"
                 )
 
-        self.tokens.append(("EOF", None)) # End of file token
+        self.tokens.append(("EOF", None))
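The only change to lexer.py drops the inline comment on the EOF line; the `("EOF", None)` sentinel itself stays. As a hedged illustration of why a lexer appends such a sentinel: a parser can always look at the current token without a bounds check, because its cursor never moves past the final EOF entry. `TokenStream`, `peek`, `advance`, and `at_end` are hypothetical names for this sketch, not crosstl's parser API.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, Optional[str]]


class TokenStream:
    """Hypothetical cursor over a token list that ends with ("EOF", None)."""

    def __init__(self, tokens: List[Token]):
        # The sentinel is what makes peek() safe, so insist on it up front.
        if not tokens or tokens[-1][0] != "EOF":
            raise ValueError("token list must end with an EOF sentinel")
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Token:
        # Always a valid index: pos never advances beyond the EOF token.
        return self.tokens[self.pos]

    def advance(self) -> Token:
        token = self.tokens[self.pos]
        if token[0] != "EOF":  # stay parked on EOF once it is reached
            self.pos += 1
        return token

    def at_end(self) -> bool:
        return self.peek()[0] == "EOF"


stream = TokenStream([("IF", "if"), ("WHITESPACE", " "), ("EOF", None)])
kinds = []
while not stream.at_end():
    kinds.append(stream.advance()[0])
print(kinds)  # ['IF', 'WHITESPACE']
```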
60 changes: 60 additions & 0 deletions tests/test_translator/test_lexer.py
@@ -9,6 +9,66 @@ def tokenize_code(code: str) -> List:
     return lexer.tokens
 
 
+class Lexer:
+    def __init__(self, input_code):
+        self.input_code = input_code
+        self.tokens = []
+        self.tokenize()
+
+    def tokenize(self):
+        pos = 0
+        while pos < len(self.input_code):
+            match = None
+            for token_spec in token_specification:
+                pattern, tag = token_spec
+                regex = re.compile(pattern)
+                match = regex.match(self.input_code, pos)
+                if match:
+                    token = (tag, match.group(0))
+                    self.tokens.append(token)
+                    pos = match.end(0)
+                    break
+            if not match:
+                unmatched_char = self.input_code[pos]
+                highlighted_code = (
+                    self.input_code[:pos]
+                    + "["
+                    + self.input_code[pos]
+                    + "]"
+                    + self.input_code[pos + 1 :]
+                )
+                raise SyntaxError(
+                    f"Illegal character '{unmatched_char}' at position {pos}\n{highlighted_code}"
+                )
+        self.tokens.append(("EOF", None))
+
+
+# Example token definitions (including the provided excerpt)
+token_specification = [
+    ("WHITESPACE", r"\s+"),
+    ("IF", r"\bif\b"),
+    ("ELSE", r"\belse\b"),
+    ("FOR", r"\bfor\b"),
+    ("RETURN", r"\breturn\b"),
+    ("BITWISE_SHIFT_LEFT", r"<<"),
+    ("BITWISE_SHIFT_RIGHT", r">>"),
+    ("LESS_EQUAL", r"<="),
+    ("GREATER_EQUAL", r">="),
+    ("GREATER_THAN", r">"),
+    ("LESS_THAN", r"<"),
+    ("INCREMENT", r"\+\+"),
+    ("DECREMENT", r"--"),
+    ("EQUAL", r"=="),
+    ("NOT_EQUAL", r"!="),
+    ("ASSIGN_AND", r"&="),
+    ("ASSIGN_OR", r"\|="),
+    ("ASSIGN_XOR", r"\^="),
+    ("LOGICAL_AND", r"&&"),
+    ("LOGICAL_OR", r"\|\|"),
+    # Add other token definitions here
+]
+
+
 def test_input_output_tokenization():
     code = """
     input vec3 position;
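Two details of the test-local `Lexer` are worth noting. It unpacks each entry as `pattern, tag`, while `token_specification` lists entries as `(tag, pattern)`, so as written it compiles the tag names (for example `"WHITESPACE"`) as regular expressions. It also calls `re.compile` inside the scanning loop and tries every specification in turn. Since several commits aim to improve tokenization efficiency, one common alternative is to compile all patterns once into a single alternation of named groups and read the matched kind from `Match.lastgroup`. The sketch below assumes that approach; `build_master_pattern`, `MASTER`, and `tokenize` are hypothetical names, and it skips whitespace instead of emitting `WHITESPACE` tokens, which is a behavior change from the diff.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, Optional[str]]

# Same (tag, pattern) entries as token_specification in the diff. Order matters:
# the alternation tries groups left to right and keeps the first that matches.
TOKEN_SPECIFICATION = [
    ("WHITESPACE", r"\s+"),
    ("IF", r"\bif\b"),
    ("ELSE", r"\belse\b"),
    ("FOR", r"\bfor\b"),
    ("RETURN", r"\breturn\b"),
    ("BITWISE_SHIFT_LEFT", r"<<"),
    ("BITWISE_SHIFT_RIGHT", r">>"),
    ("LESS_EQUAL", r"<="),
    ("GREATER_EQUAL", r">="),
    ("GREATER_THAN", r">"),
    ("LESS_THAN", r"<"),
    ("INCREMENT", r"\+\+"),
    ("DECREMENT", r"--"),
    ("EQUAL", r"=="),
    ("NOT_EQUAL", r"!="),
    ("ASSIGN_AND", r"&="),
    ("ASSIGN_OR", r"\|="),
    ("ASSIGN_XOR", r"\^="),
    ("LOGICAL_AND", r"&&"),
    ("LOGICAL_OR", r"\|\|"),
]


def build_master_pattern(spec) -> "re.Pattern[str]":
    # One named group per token kind, unpacked as (tag, pattern), compiled once.
    return re.compile("|".join(f"(?P<{tag}>{pattern})" for tag, pattern in spec))


MASTER = build_master_pattern(TOKEN_SPECIFICATION)


def tokenize(code: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(code):
        match = MASTER.match(code, pos)
        if match is None:
            # Same error format as the diff: mark the offending character.
            highlighted = code[:pos] + "[" + code[pos] + "]" + code[pos + 1 :]
            raise SyntaxError(
                f"Illegal character '{code[pos]}' at position {pos}\n{highlighted}"
            )
        tag = match.lastgroup  # name of the group that matched
        if tag != "WHITESPACE":  # skip whitespace rather than emitting it
            tokens.append((tag, match.group()))
        pos = match.end()
    tokens.append(("EOF", None))
    return tokens


print(tokenize("if <= <<"))
# [('IF', 'if'), ('LESS_EQUAL', '<='), ('BITWISE_SHIFT_LEFT', '<<'), ('EOF', None)]
```

Because the combined pattern still tries alternatives left to right, multi-character operators must stay ahead of their prefixes (`<<` and `<=` before `<`), exactly as in the list-based version.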