Rewrote the LaTeX parser to be more flexible #10

Happypig375 · 2019-02-25T10:50:33Z

No description provided.

Happypig375 · 2019-02-28T00:07:31Z

If no more review comments, then I'm merging.

MathDisplay/DataTypes/AliasDictionary.fs

charlesroddie · 2019-02-28T00:39:48Z

Only had time for a very brief review. Can look again this weekend.

charlesroddie · 2019-02-28T00:44:58Z

This comment here as it may get some discussion:
toAtom gives a Result<MathAtom, string>
Should it give this or a MathAtom option?

The former will have more performant processing of error cases, since they are not exceptional. But it will need more work to maintain, as results are harder to compose. It will also be slower in the happy case, especially if computation expressions are needed to deal with the complexity of Results.

The alternative is an option, where failure to find things gives a None, and where invalid syntax gives an exception.

I think invalid syntax can be treated as exceptional, so we can make our lives easier by avoiding Result. Any other opinions?

charlesroddie · 2019-02-28T01:07:42Z

In general it's good that you have got something working but a lot of simplification and explanation will be needed. I wasn't able to discover the strategy used for parsing LaTeX because it was hidden behind functions with highly obscure type signatures.

Names should be informative and their should be xml docs but it's only possible to name and describe objects once they have been simplified enough to be comprehensible.

I will try to do some constructive work in this direction this weekend.

charlesroddie · 2019-02-28T01:11:20Z

Possibly useful?
https://github.com/charlesroddie/StyleGuide

Happypig375 · 2019-02-28T11:14:34Z

But it will need more work to maintain, as results are harder to compose.

Result.map and Result.bind can be used.

It will also be slower in the happy case, especially if computation expressions are needed to deal with the complexity of Results.

I read about the slowness of computation expressions. I wouldn't use them for Results.

In general it's good that you have got something working but a lot of simplification and explanation will be needed. I wasn't able to discover the strategy used for parsing LaTeX because it was hidden behind functions with highly obscure type signatures.

Will simplify the code a bit.

MathDisplay/MathAtom/LaTeX.fs

charlesroddie · 2019-03-03T02:23:00Z

I have thought more about Results and also considered a simple issue in another repo.
The alternatives are:

Return results locally during parsing, catching errors.
Throwing exceptions for a parsing failure and catching them in the main parse function (generating a None or a Result.Error).

I believe exceptions is better, because the inner functions are simplified to have types which return 'a instead of types which return Option<'a> or Result<'a,'e>. Of course in .Net exceptions should only be used for exceptional data.

A parsing function routine which for example looks for a certain pattern should return None if it fails to find the pattern; these are not exceptional.

A parsing function which is used when a certain pattern is known to be implied should throw an exception on failure. For example after seeing \color the algorithm is allowed to find { and } and fail if it doesn't find them.

The global algorithm can catch failures and either return Error("invalid LaTeX") or fail with an InvalidLatexException.

Delimiter to DU

charlesroddie

Removed

MathDisplay/DataTypes/AliasDictionary.fs

charlesroddie · 2019-02-28T00:13:00Z

MathDisplay/DataTypes/ListExtensions.fs

+    loop []
+
+  let mapFoldResult f (state:'State) (list: 'T list) : Result<'Result list * 'State, 'Error> =


Needs XML comment.

charlesroddie · 2019-02-28T00:14:06Z

MathDisplay/MathAtom/LaTeX.fs

    }
+
+
+[<Struct>] type internal Read = Until of char | UntilRightDelimiter | OneArgument | All


XML doc for this type?

MathDisplay/DataTypes/AliasDictionary.fs

MathDisplay/MathAtom/LaTeX.fs

charlesroddie · 2019-02-28T00:47:33Z

MathDisplay/MathAtom/LaTeX_Old.fs

+    let (|CommandName|) cs = match cs with
+                             | c::cs when not <| isAlphabet c -> string c, cs
+                             | PartitionAlphabets (ab, cs) -> System.String.Concat ab, skipSpaces cs
+    let rec read tableEnv until cs list =


The type signature of read is very complex. From the name and the type signature (and failing that an xml doc) you should be able to work our what any function does.

MathDisplay/MathAtom/LaTeX_Old.fs

charlesroddie · 2019-02-28T00:50:52Z

MathDisplay/MathAtom/LaTeX_Old.fs

+                             | PartitionAlphabets (ab, cs) -> System.String.Concat ab, skipSpaces cs
+    let rec read tableEnv until cs list =
+        ///Reads a delimiter
+        let readDelimiter cs =


This function is almost comprehensible.
Why does it return a Delimiter * char list instead of just a Delimiter?
If it returned a Delimiter then the type signature and name would indicate what it does.

It takes a list of to-be-read characters, slices out a delimiter, returns it along with the to-be-read characters after the delimiter.

charlesroddie · 2019-02-28T00:55:52Z

MathDisplay/MathAtom/LaTeX.fs

+                            | '['::cs -> readArgUntil' readOptionalArgUntil id (Until ']') argDict argDict.AddOptional tableEnv cs
+                            | _ -> Ok (tableEnv, ValueNone, cs)
+
+                    let rec replaceArguments atom state =


replaceArguments is obviously a key function but has an obfuscated type signature. The same goes for the inner functions replace1 and replace2.

MathDisplay.Tests.Benchmarks/AliasMap.fs

charlesroddie · 2019-03-03T13:00:34Z

I've resolved the AliasDictionary issues (simplification and documentation).
Can move to LaTeX.fs now. Would appreciate documentation or simplification explanation of the methods so I can get started with this @Happypig375 . Thanks.

Happypig375 · 2019-03-03T13:16:02Z

Now AliasDictionary, as a collection type, does not implement normal collection interfaces. Like MathCollection, it will surprise the user when trying to use AliasDictionary in functions taking e.g. IEnumerable<T>.

charlesroddie · 2019-03-03T13:20:16Z

Why does a user need to interact with an alias dictionary? If there is a need for extension in future it can be done but it is comprehensible now, as simple as possible for what it is doing, documented, and definitely provides the functionality we are using at the moment (or else the project wouldn't compile).

Happypig375 · 2019-03-03T13:25:15Z

For example, LaTeXDefaultMaps.Commands.Where(command => command.Key.StartsWith("a")) does not compile because AliasDictionary does not implement IEnumerable<T>.

charlesroddie · 2019-03-03T13:39:46Z

I removed as much as possible while keeping the main project buildable. Ok do do one or two extra things like expose the keys or remove private on the dictionaries as needed in future. But we shouldn't add them until then. It's important to keep good discipline. I think the file went from 100 lines to 50. Maybe in future it goes back up to 55 if we need something extra.

Happypig375 · 2019-03-03T13:42:39Z

While I agree on YANGI in general, I think all collection types should at least act like collections, have collection methods, and implement collection interfaces.

Also, why make Delimiter a DU which limits its possible values? It also kind of repeats information in LaTeXDefaultMaps.Delimiters.

charlesroddie · 2019-03-03T14:49:47Z

I think all collection types should at least act like collections, have collection methods, and implement collection interfaces

Not all interfaces; otherwise it gets too large to maintain (as the previous code was). We should consider this an internal type, and add these things as needed. The correct API if we do add them is not obvious because there are two dictionaries. For example member __.Aliases = d.Keys is probably OK (I have added this back as it should be useful for parsing) but should we have member __.Values = d.Values or member __.Values = e.Keys; these actually have different, and rather obscure, types. The original code always chose the dictionary from aliases to values when doing the various additional properties but that is not a clear decision. Maybe casting both dictionaries to IReadOnlyDictionary might give us everything we could possibly need in future. But no need to spend time on that (writing this conversion, code review, documentation) at the moment. I have limited time for code review and documentation and would rather spend this on important stuff like the parser than on methods that don't end up getting used.

Also, why make Delimiter a DU which limits its possible values

There are a limited number of delimiters so representing these as a limited number of values is better. Easier to understand code with a finite number of values than infinite number of possible values of which most are invalid. There is the option of adding an Delimiter.UserDefined of string case if absolutely necessary (much) later.

charlesroddie · 2019-03-03T16:57:57Z

You had enumerated all of the LaTeX delimiters:

charlesroddie · 2019-03-03T18:07:53Z

This idea of yours is interesting:

let Commands =
    [   ["frac"], Fraction (Argument 1, Argument 2, Center, Center, ValueNone)
        ["sqrt"], Radical (Argument_Optional (1, Row []), Argument 1)
    |> AliasDictionary

While we may not be able to do exactly this (since it requires usage of an undisplayable internal Latex-specific Argument atom), we could adjust it like this:

[<Struct>][<RequireQualifiedAccess>]
type Command =
    | Frac
    /// A square root or nth root
    | Sqrt
    | Color

let Commands =
    [   ["frac"], Command.Frac
        ["sqrt"], Command.Sqrt
        ["color"], Command.Color]
    |> AliasDictionary

let parseColor:string -> System.Drawing.Color = failwith "not implemented"

let rec parseCommand(command:Command,
                     optionals:string[],
                     args:string[],
                     parseAtom:string->MathAtom) =
    let require (n:int) =
        if args.Length <> n
        then failwith <| (string n) + " inputs required by " + (Commands.GetPrimaryAlias command)
    let maxOptional (n:int) =
        if optionals.Length > n
        then failwith <| "At most " + (string n) + " optional inputs allowed in " + (Commands.GetPrimaryAlias command)
    let getOptional n =
        if n < optionals.Length then Some(optionals.[n]) else None
    match command with
    | Command.Frac ->
        require 2
        Fraction(parseAtom args.[0], parseAtom args.[1], ValueNone, ValueNone, ValueNone)
    | Command.Sqrt ->
        require 1
        maxOptional 1
        Radical(getOptional 0 |> Option.map parseAtom, parseAtom args.[0])
    | Command.Color ->
        require 2
        Colored(parseAtom args.[1], parseColor args.[0])

This also works with things to be parsed that are not Atoms (e.g. Color as shown).

Happypig375 · 2019-03-04T02:12:45Z

It gets complicated when we are going to implement "\binom" or "\TeX" which give atoms with more complicated structures. These two commands are both present in CSharpMath.

["binom"], Delimited (Delimiter.LeftBracket, Frac (Argument 1, Argument 2, Center, Center, ValueNone), Delimiter.RightBracket)
["TeX"], Styled (Row [Ordinary "T", Space -.1667, Offsetted (Ordinary "E", 0, -4.5), Space -.125, Ordinary "X"], FontStyles.Roman)

Happypig375 · 2019-03-04T10:53:04Z

Regarding Result vs Exception:

MathAtom/MathDisplay.Tests.Benchmarks/Result_ExceptionBenchmark.fs

Lines 43 to 48 in a0eac56

    
                     Method |         Mean |       Error |      StdDev | 
        
           ---------------- |-------------:|------------:|------------:| 
        
                  Ok_Result |     46.72 ns |   0.2317 ns |   0.2167 ns | 
        
               Ok_Exception |     17.36 ns |   0.1735 ns |   0.1623 ns | 
        
               Error_Result |     37.50 ns |   0.1186 ns |   0.1110 ns | 
        
            Error_Exception | 39,582.16 ns | 204.3940 ns | 181.1899 ns |

While Result is 2.7 times slower than Exception when the result is Ok, Exception is 1055.5 times slower than Result when the result is Error.

Happypig375 · 2019-03-04T10:54:53Z

This will probably be very noticeable when using the Example app which displays user-inputted valid/invalid LaTeX real-time.

charlesroddie · 2019-03-04T14:38:05Z

It gets complicated when we are going to implement "\binom" or "\TeX" which give atoms with more complicated structures. These two commands are both present in CSharpMath.

Those examples easy with the draft scheme I gave.

    | Command.Binom ->
        require 2
        Delimited (Delimiter.LeftBracket, Frac (parseAtom args.[0], parseAtom args.[1], Center, Center, ValueNone), Delimiter.RightBracket)
    | Command.TeX ->
        require 0
        Styled (Row [Ordinary "T", Space -.1667, Offsetted (Ordinary "E", 0, -4.5), Space -.125, Ordinary "X"], FontStyles.Roman)

require may not be the best way to do it. The best structure may depend on the answer to an important question:

Is it possible to tokenize a LaTeX string (split it into a list of commands, delimiters, ordinaries...) before implementing any logic?

Happypig375 · 2019-03-05T11:18:00Z

Now we need to define new Commands if we are going to add new ones. That goes against the point of a Commands dictionary, which is to reduce duplicate information.

Is it possible to tokenize a LaTeX string (split it into a list of commands, delimiters, ordinaries...) before implementing any logic?

That is the purpose of LaTeX.ToAtom instead of drawing them directly.

charlesroddie · 2019-03-05T12:15:37Z

That goes against the point of a Commands dictionary
A commands dictionary containing code is no longer a two-way dictionary. (The current code in the PR at this point has a two way dictionary but the second direction won't be usable or useful.)

The purpose of AliasDictionary is to convert multiple alias strings into the things they represent and to give the primary alias of a represented object.

A one-way dictionary is OK from commands to behavior, which is defined on all commands, is better known/represented as a function.

That is the purpose of LaTeX.ToAtom

This does more than lexing.

Happypig375 · 2019-03-06T12:07:24Z

A one-way dictionary is OK from commands to behavior, which is defined on all commands, is better known/represented as a function.

What I'm trying to recreate with Commands is two-way pattern matching.

This does more than lexing.

It's just some sort of pattern matching that converts between LaTeX strings and MathAtoms.

charlesroddie · 2019-03-06T13:03:20Z

Some miscommunication. I am asking whether lexing of LaTeX is possible.

Happypig375 · 2019-03-06T15:19:45Z

Lexing itself is possible, just that I don't see a reason for the extra step.

charlesroddie · 2019-03-09T23:12:57Z

Two way pattern matching: I'm skeptical that will work. I think we need to get this repo into a clean state with comprehensible code and any attempts like that can branch off as experiments. Don't want to tackle extremely difficult stuff before we have good basics in place.

charlesroddie · 2019-03-09T23:20:20Z

I think I will need to implement a simple parser.

charlesroddie · 2019-03-09T23:24:45Z

Serialization of some simple mathatoms into latex will have some utility. And some sharing of data, especially names and aliases, could be shared between parser and serializer. But it's important not to couple parsing to a complex system to create a difficult coupling between parser and serializer and mathatom itself. That will prevent getting master into any sort of reasonable shape in the near future.

charlesroddie · 2019-03-09T23:33:32Z

I would like this pr to go into master which means keeping it clean. I will write a parser but am travelling for next 2 weeks (including HK, could meet there). Anything to do with Serialization (beyond aliases and a list of commands) shouldn't go in this pr. Once we have a good pr to merge then master will be in a good state to branch off from with feature branches.

Happypig375 · 2019-03-12T16:26:12Z

When will you be in HK?

Happypig375 added 8 commits February 8, 2019 23:52

Started a LaTeX code rewrite

69bc573

Still not buildable yet...

c22da95

More arguments that can now be replaced

c92ea5b

Actual progress!!!

da0d631

Buildable again...!

37f5512

\frac23 successful

ca21278

LaTeX Rewrite - commit!

f3d011d

Remove unused functions

3d3702b

Happypig375 requested a review from charlesroddie February 27, 2019 10:54

charlesroddie reviewed Feb 28, 2019

View reviewed changes

MathDisplay/DataTypes/AliasDictionary.fs Outdated Show resolved Hide resolved

FoggyFinder reviewed Feb 28, 2019

View reviewed changes

MathDisplay/MathAtom/LaTeX.fs Outdated Show resolved Hide resolved

Happypig375 added 2 commits February 28, 2019 19:56

Simplification

3867c0a

Added \left and \right

4356592

Charles Roddie added 3 commits March 3, 2019 12:43

documented and simplified AliasDictionary

85ce106

Delimiter to DU

finish tidying AliasDictionary

bd84e27

finish tidy of AliasDictionary

9640f15

charlesroddie reviewed Mar 3, 2019

View reviewed changes

Charles Roddie added 2 commits March 3, 2019 12:56

rename collapseRow

acae974

tidy

efc8bef

add Aliases property to AliasDictionary

dc67af6

Simplified AliasDictionary

4004ec9

[WIP] Working on FromAtom

8fa8bd2

		loop []

		let mapFoldResult f (state:'State) (list: 'T list) : Result<'Result list * 'State, 'Error> =

		}


		[<Struct>] type internal Read = Until of char \| UntilRightDelimiter \| OneArgument \| All

Rewrote the LaTeX parser to be more flexible #10

Are you sure you want to change the base?

Rewrote the LaTeX parser to be more flexible #10

Conversation

Happypig375 commented Feb 25, 2019

Happypig375 commented Feb 28, 2019

charlesroddie commented Feb 28, 2019

charlesroddie commented Feb 28, 2019

charlesroddie commented Feb 28, 2019

charlesroddie commented Feb 28, 2019

Happypig375 commented Feb 28, 2019

charlesroddie commented Mar 3, 2019

charlesroddie left a comment

Choose a reason for hiding this comment

charlesroddie Feb 28, 2019

Choose a reason for hiding this comment

charlesroddie Feb 28, 2019

Choose a reason for hiding this comment

charlesroddie Feb 28, 2019

Choose a reason for hiding this comment

charlesroddie Feb 28, 2019

Choose a reason for hiding this comment

Happypig375 Mar 4, 2019

Choose a reason for hiding this comment

charlesroddie Feb 28, 2019

Choose a reason for hiding this comment

charlesroddie commented Mar 3, 2019

Happypig375 commented Mar 3, 2019

charlesroddie commented Mar 3, 2019 • edited Loading

Happypig375 commented Mar 3, 2019 • edited Loading

charlesroddie commented Mar 3, 2019 • edited Loading

Happypig375 commented Mar 3, 2019 • edited Loading

charlesroddie commented Mar 3, 2019 • edited Loading

charlesroddie commented Mar 3, 2019

charlesroddie commented Mar 3, 2019 • edited Loading

Happypig375 commented Mar 4, 2019 • edited Loading

Happypig375 commented Mar 4, 2019

Happypig375 commented Mar 4, 2019

charlesroddie commented Mar 4, 2019

Happypig375 commented Mar 5, 2019

charlesroddie commented Mar 5, 2019

Happypig375 commented Mar 6, 2019

charlesroddie commented Mar 6, 2019

Happypig375 commented Mar 6, 2019

charlesroddie commented Mar 9, 2019 • edited Loading

charlesroddie commented Mar 9, 2019

charlesroddie commented Mar 9, 2019

charlesroddie commented Mar 9, 2019

Happypig375 commented Mar 12, 2019

charlesroddie commented Mar 3, 2019 •

edited

Loading

Happypig375 commented Mar 3, 2019 •

edited

Loading

charlesroddie commented Mar 3, 2019 •

edited

Loading

Happypig375 commented Mar 3, 2019 •

edited

Loading

charlesroddie commented Mar 3, 2019 •

edited

Loading

charlesroddie commented Mar 3, 2019 •

edited

Loading

Happypig375 commented Mar 4, 2019 •

edited

Loading

charlesroddie commented Mar 9, 2019 •

edited

Loading