[Feature request] C support #989

Closed
aaaaaa123456789 opened this issue Apr 1, 2022 · 5 comments

@aaaaaa123456789
Member

aaaaaa123456789 commented Apr 1, 2022

There's always been an unnecessary conflict between C programmers and assembly programmers in the community, mostly due to the underperformance of C code and the development complexity of assembly. While there's little doubt that some tasks in a GB game (or program, in general) are more suitable for one language or the other, the impossibility of mixing both in a friendly manner (particularly given toolchain incompatibility) has continuously forced projects to make a difficult choice right at the start.
While using small snippets of assembly code in a primarily C codebase is a need that GBDK already caters to, to some extent, the opposite is an unfulfilled niche.

Therefore, my proposal and request is to allow the direct inclusion of C code in assembly codebases in RGBASM, which the toolchain could compile suitably. Taking advantage of the fact that C and ENDC are already keywords, such a block could be introduced to the language without much difficulty:

C
unsigned max16(unsigned, unsigned);
ENDC

SECTION "Code", ROM0

max16:
  ; returns the maximum between hl and de (in hl)
  ld a, h
  cp d
  jr c, .swap
  ret nz
  ld a, l
  cp e
  ret nc
.swap
  ld h, d
  ld l, e
  ret

C
unsigned min16 (unsigned first, unsigned second) {
  return ~max16(~first, ~second);
}
ENDC
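
The complement trick in min16 relies on the identity min(a, b) = ~max(~a, ~b) for unsigned values. A minimal host-side sketch in plain C can check it, assuming 16-bit values modeled with uint16_t. The names max16_model and min16_model are hypothetical: they mirror the assembly routine and the C block above and are not RGBASM output.

```c
#include <stdint.h>

/* Hypothetical host-side model of the assembly max16 routine: it compares
   the high bytes first, then the low bytes, like the cp d / cp e sequence. */
uint16_t max16_model(uint16_t hl, uint16_t de) {
  uint8_t h = (uint8_t) (hl >> 8), d = (uint8_t) (de >> 8);
  if (h < d) return de;            /* jr c, .swap */
  if (h != d) return hl;           /* ret nz */
  uint8_t l = (uint8_t) hl, e = (uint8_t) de;
  if (l >= e) return hl;           /* ret nc */
  return de;                       /* .swap */
}

uint16_t min16_model(uint16_t first, uint16_t second) {
  /* On a host, ~ promotes its operand to int, so each result is narrowed
     back to 16 bits to match what a 16-bit unsigned would hold on the GB. */
  return (uint16_t) ~max16_model((uint16_t) ~first, (uint16_t) ~second);
}
```

The explicit narrowing casts are only needed because a host int is usually wider than 16 bits; on a target where unsigned is 16 bits wide, the original min16 needs none.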

When encountering such a block, RGBASM would compile it (using some reasonable optimization configuration), producing the equivalent assembly. In a code section (ROM0/ROMX), the C code must only contain functions and read-only globals (const or arrays thereof); in a data section (WRAM0, etc.), read-only and regular globals could be defined, but not functions. Pure declarations (e.g., forwards) should be allowed anywhere, including outside of sections.

Since RGBASM already has a concept of a translation unit, such a concept would extend to the C code defined therein for purposes of scoping; likewise, any C block can use names defined in a previous block in the same translation unit. (This enables the use of headers.)

While an implementation of C in RGBASM should be a fully-compliant freestanding C17 implementation (or realistically, by the time the feature is finished, C23), focus should be placed on features that a GB/GBC game could realistically use. For instance, while there should be an implementation of floating point (float, double and long double are mandatory types, and their minimum ranges essentially require the use of true floating point to represent them; plus, the odd users using these types would expect IEEE floats), it should not be the focus of optimizations: anyone using floating point in a GBC program (for example, a scientific calculator program) expects it to be slow.

A C implementation that interoperates with assembly requires a well-defined calling convention so that both languages can call into each other. This must be documented; however, it makes little sense to make it configurable, as it would inhibit development of the optimizer (which is necessary to achieve any acceptable performance from C code). My suggestion is the following:

  • All registers are caller-save: there's enough register pressure already, and assembly-language callers can take advantage of this.
  • 16-bit arguments are passed in hl, de, bc in that order. The first 8-bit argument is passed in a; other 8-bit arguments use registers from the 16-bit list not yet accounted for. (A function taking (uint8_t, uint8_t, uint16_t) would have its arguments in a, d, hl in that order.)
  • Any leftover arguments are passed on the stack. The stack is cleaned up by the callee for non-varargs functions, so they don't need to preserve values they pop off and consume.
  • Varargs functions use the stack for all arguments, and the stack is cleaned up by the caller (because the callee cannot know the true size of the argument list: it's valid for it to not consume all of its arguments).
  • 8-bit return values go in a, 16-bit values in hl, 32-bit values in hlde. Anything wider than 32 bits (uint64_t, structs, etc.) takes a pointer to the return buffer in hl; in this case, hl is not available for arguments.
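
The register assignment rules above can be sketched as a small host-side C model. This is only an illustration under stated assumptions: assign_registers is a hypothetical helper, not part of RGBASM. The order in which later 8-bit arguments take register halves (high half first, then low half) is extrapolated from the single (uint8_t, uint8_t, uint16_t) example, and arguments wider than 16 bits are assumed to go on the stack.

```c
#include <stddef.h>

/* Hypothetical model of the suggested convention. sizes[i] is the width of
   argument i in bytes; out[i] receives a register name or "stack". */
void assign_registers(const unsigned char *sizes, size_t count, const char **out) {
  static const char *const pairs[] = {"hl", "de", "bc"};
  static const char *const halves[] = {"h", "l", "d", "e", "b", "c"};
  size_t next_pair = 0;

  /* Pass 1: 16-bit arguments claim hl, de, bc in order. Everything else
     defaults to the stack (assumption for widths other than 1 and 2). */
  for (size_t i = 0; i < count; i++) {
    out[i] = "stack";
    if (sizes[i] == 2 && next_pair < 3) out[i] = pairs[next_pair++];
  }

  /* Pass 2: the first 8-bit argument goes in a; later ones take halves of
     the register pairs that no 16-bit argument claimed (assumed order). */
  int a_used = 0;
  size_t next_half = next_pair * 2;
  for (size_t i = 0; i < count; i++) {
    if (sizes[i] != 1) continue;
    if (!a_used) {
      out[i] = "a";
      a_used = 1;
    } else if (next_half < 6) {
      out[i] = halves[next_half++];
    }
  }
}
```

Running the 16-bit pass first is what lets a later uint16_t still receive hl even when 8-bit arguments precede it, which matches the a, d, hl example.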

A conformant C implementation requires a preprocessor, and this case is no different. However, such a preprocessor must only be applied to C blocks. Some special cases are of interest:

  • #define for object-like macros (i.e., those defined without arguments) should use the same namespace as EQUS, for simplicity. This means that EQUS definitions from assembly would be available in C, and #define macros from C would be available in assembly. (C function-like macros have no equivalent in RGBASM, so those would only be available within C blocks. However, the implementation could require them to not match the name of an EQUS or an object-like macro. Another alternative would be to introduce function-like macros to assembly, although that might collide with other proposals.)
  • #undef would therefore behave like PURGE.
  • #include <file> should only pull from the standard library, or perhaps a user-defined inclusion directory. #include "file", if file exists, would behave exactly like INCLUDE in assembly. (As per C17, section 6.10.2, clause 3, if #include "file" fails to find that file, it should behave like #include <file>.) Including a file from C code would allow straight-up C files to exist in the codebase: as the contents of the included file are virtually transcluded at the point of inclusion, they would be included within a C block, and thus would be processed as C code.
  • #if ... #endif blocks should begin and end within the same C block. While it's entirely possible to make them interact with IF ... ENDC blocks, ENDC itself is being used as an ending marker for the C block as a whole (intentionally so, as it disallows this harmful interaction), and the contexts in which the corresponding expressions are evaluated are completely different, which makes them non-interchangeable.
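
As a reminder of the standard C behavior that the #define and #undef items above build on, here is a minimal sketch in plain C. The macro and function names (TILE_SIZE, TILES_TO_BYTES, tile_bytes, small_tile_bytes) are hypothetical, and the comments about EQUS and PURGE describe the proposal, not anything RGBASM does today.

```c
/* Object-like macro: under the proposal it would share a namespace with EQUS. */
#define TILE_SIZE 16

/* Function-like macro: no assembly equivalent, so it would stay C-only. */
#define TILES_TO_BYTES(n) ((n) * TILE_SIZE)

unsigned tile_bytes(unsigned tiles) {
  return TILES_TO_BYTES(tiles);   /* TILE_SIZE expands to 16 here */
}

/* Under the proposal, this would behave like PURGE TILE_SIZE. */
#undef TILE_SIZE
#define TILE_SIZE 8

unsigned small_tile_bytes(unsigned tiles) {
  return TILES_TO_BYTES(tiles);   /* same macro, but TILE_SIZE is now 8 */
}
```

Macros are expanded at the point of use, so removing and redefining an object-like macro changes what later expansions see. That is why a definition removed from one language would also affect code in the other.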

The concept of a standard library is meaningful in itself. A freestanding C implementation doesn't require any functions to be defined by the standard library. (C17, section 4, clause 6.) Therefore, any functions used by C blocks would have to be implemented by the user elsewhere. (RGBASM may provide a standard library of functions in its headers; however, since there is no default location for RGBASM to place code (as RGBASM doesn't normally generate any code not submitted by the user), it would also have to provide the facilities for users to designate the location where that code would be included in the resulting binary. This is probably excessive for an initial implementation.)

There are no provisions for banking or an entry point in the previous description. A freestanding C implementation doesn't have a defined entry point; there is no special main function. Since the environment already has a well-defined entry point (at address 256), there is no need to introduce a new one.

As for banking considerations, since the primary use of the feature is defining locally-relevant blocks of C code, it should be left entirely to the user. It's entirely possible for users to define their own banking functions where needed, like so:

C
struct farptr {
  unsigned bank;
  void * address;
};

unsigned char get_far_byte(const struct farptr *);
ENDC

SECTION "ROM0 code", ROM0

get_far_byte:
  ldh a, [hCurrentBankLow]
  ld e, a
  ldh a, [hCurrentBankHigh]
  ld d, a
  push de
  ld a, [hli]
  push hl
  ld h, [hl]
  ld l, a
  call Bankswitch
  pop hl
  inc hl
  ld a, [hli]
  ld h, [hl]
  ld l, a
  ld c, [hl]
  pop hl
  call Bankswitch
  ld a, c
  ret

SECTION "ROMX code", ROMX

C
unsigned get_far_word (const struct farptr * pointer) {
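  // note: really inefficient!
  // the cast keeps the shift unsigned: a promoted 16-bit int would overflow for bytes >= 0x80
  return get_far_byte(pointer) |
    ((unsigned) get_far_byte((const struct farptr []) {{.bank = pointer -> bank, .address = (char *) pointer -> address + 1}}) << 8);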
}
ENDC
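
The save, switch, read, restore sequence used by get_far_byte can be modeled on a host in plain C. This is a sketch under assumptions: the banked ROM is a two-dimensional array, and sim_farptr, bankswitch, sim_get_far_byte and sim_get_far_word are hypothetical stand-ins for the routines above, not real RGBASM or hardware interfaces.

```c
#include <stdint.h>

#define BANK_SIZE 0x4000u   /* size of the switchable ROMX window */
#define BANK_COUNT 4u       /* arbitrary bank count for the model */

static uint8_t rom[BANK_COUNT][BANK_SIZE];
static unsigned current_bank = 1;

/* Hypothetical far pointer: address is a ROMX address in 0x4000-0x7FFF. */
struct sim_farptr {
  unsigned bank;
  uint16_t address;
};

static void bankswitch(unsigned bank) {
  current_bank = bank;
}

uint8_t sim_get_far_byte(const struct sim_farptr *p) {
  unsigned saved = current_bank;          /* like pushing the current bank */
  bankswitch(p->bank);
  uint8_t value = rom[current_bank][p->address - BANK_SIZE];
  bankswitch(saved);                      /* restore before returning */
  return value;
}

unsigned sim_get_far_word(const struct sim_farptr *p) {
  struct sim_farptr next = {p->bank, (uint16_t) (p->address + 1)};
  /* Cast before shifting so the high byte never lands in an int's sign bit. */
  return sim_get_far_byte(p) | ((unsigned) sim_get_far_byte(&next) << 8);
}
```

As in the C block above, the word read performs two full bank switches, which is the inefficiency the original comment points out.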

I hope my exposition has been clear, and I'm sure the compelling benefits of this idea will lead to a quick implementation. I'm open to any questions you might have.

@ISSOtm
Member

ISSOtm commented Apr 1, 2022

A few notes:

  • As I understand it, then, it should be possible to #define a macro from a C block, but then PURGE it in an assembly block. I don't think this is a good idea, as it could be confusing to the user.
  • This is sadly blocked by #916 ([Feature request] Nested label scopes); otherwise, the ASM code won't be able to interact with nested struct/union members. :(
  • How should C and ENDC interact with equs? It'll likely be the usual: C can be equs'd, but unless the endc is part of the same, it can't be.
  • How do we want to deal with inlining? Should we provide constructs for the programmer to delineate "function" boundaries, so that ASM functions can be inlined into C code?
  • Should we use this same functionality to allow inline ASM in C blocks, or should we have another feature to make RGBASM switch back to a (restricted) ASM mode inside of a C block?

All in all, this is a resounding "yes" from me! This should finally bridge the gap we've had for so long—I'm actually flabbergasted we haven't thought of it earlier.
Another thing I really like about this feature is that it would automatically take care of #98 without any extra effort!
This also means that we can bring the power of RGBASM metaprogramming to C, which is definitely a thing that it needed.

@aaaaaa123456789
Member Author

As I understand it, then, it should be possible to #define a macro from a C block, but then PURGE it in an assembly block. I don't think this is a good idea, as it could be confusing to the user.

This is a mirror of doing it the other way around (EQUS + #undef), which might be just as surprising. It's consistent with being able to access strings/macros defined in either language from either language. But requiring redefinitions and undefinitions to come from the same language as the original definition is also a possibility.

A more complicated case would be that of EQU definitions. These could be defined in C code, in the same way enum members are: as compile-time integral constants. The real issue is the type of such constants: intuitively the correct type would be int32_t, but that's wider than int on the GB and thus would cause some undesirable integer promotions when used in expressions. (Note that C has no way of undefining enum constants.)

This is sadly blocked by #916, otherwise the ASM code won't be able to interact with nested struct/union members. :(

I'd say that #916 is complementary to this request, not a blocker. If a way to designate the type of a symbol is added to RGBASM, then, after #916 is implemented, C structures could be used when defining a label to automatically create local members.

For example:

C
struct A {
  int first;
  unsigned second;
  long third;
};

struct B {
  int foo;
  char bar;
  char baz;
  struct A substruct;
};
ENDC

SECTION "Data", WRAM0

wSomeData:: dc struct B

This example introduces a hypothetical dc keyword (for "define C") which reserves space for a C type and introduces any members it has (if it is a struct/union type) as local labels. Under this example, after #916 is implemented, wSomeData.substruct.third would access the last member of the internal struct (the same as wSomeData + 8, given a 2-byte int, a 4-byte long and no padding).
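
The wSomeData + 8 figure can be checked on a host with offsetof. This is a sketch under assumptions: int16_t, uint16_t and int32_t stand in for the GB's int, unsigned and long, which is an assumption about the target's type sizes. The function offset_of_substruct_third is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-width stand-ins for an assumed 16-bit int and 32-bit long. */
struct A {
  int16_t first;
  uint16_t second;
  int32_t third;
};

struct B {
  int16_t foo;
  char bar;
  char baz;
  struct A substruct;
};

/* Offset of substruct.third from the start of struct B. */
size_t offset_of_substruct_third(void) {
  return offsetof(struct B, substruct) + offsetof(struct A, third);
}
```

Struct layout is implementation-defined, but on common host ABIs these members need no padding, so the helper returns 8: 4 bytes for foo, bar and baz, plus 4 bytes for first and second.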

How should C and ENDC interact with equs? It'll likely be the usual: C can be equs'd, but unless the endc is part of the same, it can't be.

I'd replicate the same behavior that currently exists for MACRO/ENDM.

How do we want to deal with inlining? Should we provide constructs for the programmer to delineate "function" boundaries, so that ASM functions can be inlined into C code?
Should we use this same functionality to allow inline ASM in C blocks, or should we have another feature to make RGBASM switch back to a (restricted) ASM mode inside of a C block?

I'll address inlining in a separate message, since it's a complex topic.

@aaaaaa123456789
Member Author

aaaaaa123456789 commented Apr 1, 2022

Inlining a pure assembly function directly into C code would be problematic, for the reasons stated above. In particular, assembly functions don't have a well-defined boundary; they also lack any formal way of specifying their inputs and outputs, and therefore cannot be safely used when interacting with C code.

The solution to this problem is to write assembly code directly within C. Of course, within these assembly blocks, the full syntax of RGBASM assembly would be available (with the exception of C blocks themselves, since nesting blocks that way is confusing and not useful). For example:

C
static inline void halt (void) {
  __asm {
    halt
  }
}
ENDC

(Note that this function is defined outside of any sections, as it is marked static inline. This should be allowed.)

Many compilers allow defining a "naked" assembly function (i.e., a function without a prologue or epilogue), but this wouldn't be necessary here, as the function can be defined entirely in assembly and simply forward-declared in C code. (See the get_far_byte example in the OP.)

@meithecatte
Contributor

In my opinion, this is a prime example of feature creep. This functionality would best be provided by an optional macro package; there's no need to include it in RGBDS's core.

@ISSOtm
Member

ISSOtm commented Apr 2, 2022

Oh duh, you're totally right. Marking as WONTFIX INVALID APRILFOOLS.

ISSOtm closed this as completed Apr 2, 2022