
Introduce custom string literal syntax? #328

Open

adam-antonik opened this issue Jan 27, 2020 · 12 comments

Comments

@adam-antonik
Contributor

Firstly, Hobbes is great, thanks.
Working with hobbes in an interactive environment, Unqualifiers are the natural way to load data, but typing

stuff = (readCsv :: (CSV "file name.csv" _) => _)

or whatever repeatedly feels like too much boilerplate. I'd like to suggest the ability to register custom literal prefixes to allow one to write

stuff = csv:"file name.csv"

and have that be interpreted as the first expression. Having hacked this in locally, it does seem to work rather nicely. It would also, for example, allow printf or Python-style format strings

:t printf:"%s %010d"
([char] * double) -> [char]
printf:"%s %010d"("hi", 123)
hi 0000000123

@kthielen
Contributor

kthielen commented Jan 27, 2020

I like your idea, and I agree with you that this syntax is awkward. I think that most people want to be able to avoid talking about types if possible (hence the popularity of type inference), and in this case it definitely seems possible (especially with the "_ ... _) => _" parts that read like "blah blah blah"). Avoiding boilerplate is really important, and this particular pattern comes up a lot (loading structured data files and making connections to remote processes are also examples in common use).

I'm not sure if you chose the printf example deliberately for this purpose (it's a great example), but it's one of the common arguments for dependent types, like in Cayenne (see the example here):
https://en.wikipedia.org/wiki/Cayenne_(programming_language)
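
To make that concrete, here is a minimal Haskell sketch (my notation, not hobbes code) of the usual workaround in a language without dependent types: encode the format as a data structure whose type index spells out the type printf should have, instead of computing that type from the string literal itself the way Cayenne can:

{-# LANGUAGE GADTs #-}

-- A format descriptor whose type index records the type printf will have.
data Format r where
  End :: Format String                     -- no more arguments
  Lit :: String -> Format r -> Format r    -- literal text
  Str :: Format r -> Format (String -> r)  -- a "%s" hole
  Int :: Format r -> Format (Int -> r)     -- a "%d" hole

-- The type of printf is read off the format descriptor.
printf :: Format r -> r
printf fmt = go fmt ""
  where
    go :: Format r -> String -> r
    go End       acc = acc
    go (Lit s f) acc = go f (acc ++ s)
    go (Str f)   acc = \s -> go f (acc ++ s)
    go (Int f)   acc = \n -> go f (acc ++ show n)

-- printf (Str (Lit " " (Int End))) "hi" 123  ==  "hi 123"

In a dependently typed setting the Format value could itself be computed from the string "%s %010d", which is exactly the step that needs either dependent types or a constant-literal rule like the one you're proposing.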

Basically, the syntax that you describe works well when the argument(s) are "clearly constant", and by making it a syntax distinguished from ordinary function calls you might avoid the issue of dependent types because you could justify having some kind of "illegal argument: not a constant" error message (or even a vanilla parse error).

When arguments aren't "clearly constant" (e.g. variables determined later), then (if we were very principled about it) they would create a new "stage" in compilation. We stop talking about "parse-time" versus "type-inference-time" versus "compile-time" versus "run-time" and instead start talking about "N-time", where N is one of a (finite?) set of stages where we might need the full compiler (which is fine; this project is intentionally an embedded JIT compiler for this reason).

Does that make sense? So if you don't know the format string to printf up front but instead read it from a file, then you won't know what the type should be until you've got the file to read (which you might deliberately want to stage later than "compile-time").

Sorry if this is more nuance than you wanted to think about; if your proposal is to support just a syntax for these macros on constant strings, that's totally fine and maybe we should do that (can't hurt, it's an easy special case in a dependently-typed setting anyway).

But I wanted to give you an idea of some of the thinking I've done in this general area because I'm curious to know what you think of it. Dependent types can be a little "mind bending" for some people (at least, they have been for me) but there are a lot of useful things that you can do with them. Actually that's an understatement, you can get to a foundation for all mathematics this way (https://en.wikipedia.org/wiki/Homotopy_type_theory) but it's also an active area of research.

What do you think? Do we take the blue pill and have a special syntax accepting just string literals, or do we take the red pill? :)

@adam-antonik
Contributor Author

No, printf wasn't a coincidence; I'm fully on board the dependent-type bandwagon, and I see from the code comments that that's where you think this needs to go too. However, I don't know when they're going to land in hobbes, and I certainly wasn't going to be able to make that happen in an evening. Adding an extra lexer token and a little logic to deal with it does greatly improve the ergonomics from my point of view, though, and it seems like a reasonable price to pay in the meantime.

@kthielen
Contributor

You said that you made a prototype. Is it something that could be made into a PR? What does it look like to introduce one of these macros? Is it something that gets hard-coded in the parser?

@adam-antonik
Contributor Author

I did say hack. I haven't got a full implementation for a PR, just a lexer rule introducing a new token

[a-zA-Z_][a-zA-Z0-9_]*/:\" { SAVE_STR;  return TSTRINGLIT; }

and in yacc under l6expr

| "stringlit" ":" "stringV" { if (Expr * decoded = strlit(*$1, *$3, m(@1, @3))) {$$ = decoded;} else {yyerror("unknown literal");};}

where strlit is a function that hard-codes a handful of prefixes and either returns something like

ConstraintPtr c(new Constraint("CSV", list(TString::make(val), TVar::make("q"))));
return new Assump(ExprPtr(varCtorFn("readCsv", la)), QualTypePtr(new QualType(list(c), TVar::make("q"))),la);

or a lovely nullptr if not there. I think it should be possible to replace this with a call to a single unqualifier of the form (literal :: (StringLiteral _) => _), on which one can register mappings from a prefix to a pair of (variable name, *Unqualifier), and which would then forward its methods to the registered handlers, but I haven't got around to trying to do it properly yet.
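
For a rough idea of the shape I mean, here's a purely illustrative sketch in Haskell (all names hypothetical; the real thing would live in the C++ Unqualifier machinery): a registry from literal prefixes to handlers that turn the quoted string into an expression.

import qualified Data.Map as M

-- Hypothetical: a handler turns the quoted string into whatever expression
-- type the host compiler uses, or fails with a message.
type Handler expr = String -> Either String expr

newtype LiteralRegistry expr = LiteralRegistry (M.Map String (Handler expr))

registerPrefix :: String -> Handler expr -> LiteralRegistry expr -> LiteralRegistry expr
registerPrefix prefix h (LiteralRegistry m) = LiteralRegistry (M.insert prefix h m)

-- What the parser would call when it sees prefix:"body".
expandLiteral :: LiteralRegistry expr -> String -> String -> Either String expr
expandLiteral (LiteralRegistry m) prefix body =
  case M.lookup prefix m of
    Just h  -> h body
    Nothing -> Left ("unknown literal prefix: " ++ prefix)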

@kthielen
Contributor

Fair enough, we could probably make that a little bit more convenient for users (to avoid hacking on the parser each time). I agree with you that this is worthwhile. If you want to work it up to a PR, that'd be awesome. If you'd rather not, I could take a look.

@adam-antonik
Contributor Author

I'd need to get sign-off for a PR, but I can have a look at that.

@adam-antonik
Contributor Author

BTW, the ww project looks very nice, although I'm still undecided as to whether future-me truly trusts past-me with an extensible grammar. On the other hand, I do appreciate how hackable the hobbes code base is; for instance, I've also redefined "if" as

class If c i e r | c i e -> r where
    (if) :: (c * i * e) -> r

That wasn't difficult, but it would have been cleaner with an extensible (and rebindable) grammar.

@kthielen
Contributor

Yes, that same uncertainty has made it hard to convince others that this way would be worthwhile (https://github.com/kthielen/ww for anyone reading this thread outside the context of the parser issue, closed now). Also the eventual prospect of IDE support is much more complicated that way (as far as I know, all common IDE tooling assumes a fixed grammar).

Your type class for if is interesting (I guess also complicated by eager evaluation forcing both branches, unlike the lazy setting in Haskell).

Out of curiosity, if you don't mind, did you set up this If c i e r constraint to allow i and e to convert into r? I've had the thought before that we could replace unification errors with conversion into a common type, to transparently handle this sort of thing (some PLs have limited support for this idea, e.g. C/C++). But I worried that this might make error messages even more inscrutable than they already are (several people have complained about the difficulty of deciphering error messages).

But there's also a lot of value in automated conversion, especially for structural types. I set up the Convert class to represent this directed conversion. We have used it for automating message versioning decisions (e.g. a sending process sending {x:int,y:int,z:int} to a process that expects {z:long,x:double} can have such a conversion function automatically derived). It's just a little annoying to have to explicitly write convert(x) to get this behavior where we'd otherwise rather just write x.
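
As a sketch of the directed-conversion idea (written in Haskell here, with the instance spelled out by hand where hobbes would derive the structural version automatically; the record and field names are made up):

{-# LANGUAGE MultiParamTypeClasses #-}

import Data.Int (Int64)

-- Directed conversion: an a can be turned into a b (not necessarily back).
class Convert a b where
  convert :: a -> b

-- Message versioning: the receiver wants a subset of the sender's fields,
-- some at widened types.
data V1 = V1 { v1x :: Int, v1y :: Int, v1z :: Int }
data V2 = V2 { v2z :: Int64, v2x :: Double }

-- Written by hand here; with structural record types this can be derived.
instance Convert V1 V2 where
  convert v = V2 { v2z = fromIntegral (v1z v), v2x = fromIntegral (v1x v) }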

Anyway, just another dimension in PL design space I guess. My intuition is to make these things we've discussed optional but there are also folks who are more conservative.

@adam-antonik
Contributor Author

I set the If typeclass up to allow broadcasting of pointwise expressions over collections of points, but one could equally use it to allow conversions. I wouldn't recommend it at the moment, as I haven't worked out how to get

if true then 1 else newPrim()

to work, i.e. for the compiler to notice that "If bool int b a" can be unified with "If bool a a a", for which an instance (the original if op) exists.
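
For concreteness, the broadcasting use I have in mind looks roughly like this Haskell sketch (my own rendering of the class; the list instance is hypothetical):

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

class If c i e r | c i e -> r where
  if' :: c -> i -> e -> r

-- the ordinary if: condition plus two branches of one shared type
instance If Bool a a a where
  if' c t e = if c then t else e

-- a broadcast instance: element-wise selection over collections of points
instance If [Bool] [a] [a] [a] where
  if' cs ts es = zipWith3 (\c t e -> if c then t else e) cs ts es

-- if' [True, False, True] [1, 2, 3] [10, 20, 30]  ==  [1, 20, 3 :: Int]
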

@kthielen
Contributor

It sounds like you've got an unbound type variable at the type inference step. In your example, since you've changed the meaning of "if" so that the two branches don't necessarily need to have the same type, then 1 isn't enough to determine the type of newPrim() (ie: the 1 is inferred to have type int but the newPrim() is inferred to have type a).

So you're in that state, and then you go into type class instance selection -- this is where we might get unstuck, since the "refine" step in constraint resolution can make decisions for type inference. As implemented here, functional dependencies (fundeps) are how type class constraints make those decisions for type inference. Your fundep says c i e -> r, which means "for unique types c, i, and e, there is one unique r". In your expression if true then 1 else newPrim() you do have unique c and i, but not e, and so this constraint can't make progress.

You might say "but I do have an instance If bool a a a that should say that r = e = i". But unfortunately that's not what the instance says (given your fundep). It actually says "if e = i, then r = e (or equally r = i)".

Does that make sense? Basically, it's just the fundeps that say how the class can advance type inference. Patterns in instance heads are "one way" (not unifying).
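
To make the "one way" point concrete, here is the situation rendered in plain Haskell (repeating the class so the snippet stands alone, and assuming GHC's fundep behaviour is close enough to illustrate the shape of the problem):

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

class If c i e r | c i e -> r where
  if' :: c -> i -> e -> r

-- the ordinary if: both branches share a type
instance If Bool a a a where
  if' c t e = if c then t else e

-- fine: c = Bool, i = Int, e = Int are all fixed, so the instance matches
-- and the fundep determines r = Int
known :: Int
known = if' True (1 :: Int) (2 :: Int)

-- stuck: here e is still an unconstrained type variable (undefined plays the
-- role of newPrim(): a value whose type is not yet determined), so the wanted
-- constraint "If Bool Int e r" neither matches the instance head (matching is
-- one-way; it will not unify e with Int) nor lets the fundep fix r
-- stuck = if' True (1 :: Int) undefined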

I'm not trying to say how it must be, just how it is. This is also a reason I made the Unqualifier interface distinct from type classes (ie: type classes are just one kind of Unqualifier). It's possible to define alternate ways to resolve constraints other than type classes.

FWIW, on the conversion point and how it might be related to what you're trying to do, maybe this helps:

$ cat q.hob
instance Convert a [:a|n:] where
    convert p = [p | _ <- newPrim()::r]::r
                                                                                                                                                                        
$ hi -s q.hob
> t = newPrim()::[:int|42:]
> selectInto([1..100], t)
> t
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42]
> if true then 42 else t
stdin:1,1-2: Cannot unify types: int != [:int|42L:]
1 if true then 42 else t
> if true then convert(42) else t
[42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42]
>

Here the convert is explicit to make it work, but you can see that I just put it where the unification failure was. We might possibly say that, rather than raising an error where there's a unification failure, we could insert a convert patch. If we did that, then this if true then 42 else t expression would work (given the Convert t [:t|n:] definition). But it'd require some experimentation/research; we wouldn't want to produce less comprehensible error messages than the ones we started with.

@adam-antonik
Contributor Author

I withdraw this proposal; I should be using quoted strings as arguments in my own unqualifiers to get something simple. See the printf example in #362 (and with that same PR I can write a function that takes a quoted string and wires up e.g. inputFile, if I so wanted).

@kthielen
Contributor

Let's keep it open as a use-case for that PR then, good idea. There's a small amount of boilerplate around things like inputFile and connection, and it'd be great to eliminate that.
