v1 milestones & release #2

Open
MrPowers opened this issue Apr 3, 2021 · 5 comments

Comments

@MrPowers
Contributor

MrPowers commented Apr 3, 2021

I'm really excited about this project!

Think about the features that'll be included in the "initial public release". Once all the initial features are built, ping me, and I'll make a commit that makes a compelling sell in the project README.

Once the README is updated, I'll start marketing the project to try to get users and feedback on the code.

Does that sound like a good plan? I'm definitely interested in seeing this project grow & get a lot of users!

@alfonsorr
Member

Hey @MrPowers, I'm very happy about your interest 😄

I'm still refactoring the code to get a first usable version ready. I expect to have all types except structs included, and the idea is to have the basic map-style functionality (withColumn, filter, select, drop, etc.), but typed.

The idea for now is to keep the syntax very close to the Spark API. An example would be something like:

df.withColumn("new_col", getInt("c1") + getInt("c2"))
df.withColumn("new_col", getInt("c1") + getTimestamp("c2")) // won't compile

Any errors at runtime will be accumulated, so if c1 and c2 are not integers, a single error will be thrown saying that both selected columns are invalid.
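
Roughly, the accumulation could work the way cats' ValidatedNec does. Here is a minimal, self-contained sketch of the idea (getInt here is a stand-in that checks a fake schema, not the real implementation):

import cats.data.ValidatedNec
import cats.implicits._

type Errors[A] = ValidatedNec[String, A]

// Stand-in for a typed column lookup: succeeds only if the column
// exists in the (fake) schema with the expected type.
def getInt(schema: Map[String, String])(name: String): Errors[String] =
  schema.get(name) match {
    case Some("int") => name.validNec
    case Some(other) => s"column '$name' is $other, expected int".invalidNec
    case None        => s"column '$name' not found".invalidNec
  }

val schema = Map("c1" -> "string", "c2" -> "timestamp")
val lookup = getInt(schema) _

// mapN accumulates both failures instead of stopping at the first one.
val result = (lookup("c1"), lookup("c2")).mapN(_ + " + " + _)
// => Invalid(...) carrying both error messages at once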

I will try to have some basic functionality ready in the next few days to show you.

@MrPowers
Copy link
Contributor Author

MrPowers commented Apr 5, 2021

@alfonsorr - that sounds like a good first implementation. I like the idea of making this lib a "minimalistic, performant way to write typesafe Spark code". It can have these selling points:

  • it allows for typesafe programming with compile-time checks
  • it's just as performant as regular Spark DataFrames (unlike Datasets)
  • it can be used in conjunction with "regular Spark code"

Bringing typesafe programming to the Spark-Scala community will be huge!
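
To make the first and third points concrete, here is a rough sketch of the kind of design that would deliver them (TypedCol, getInt, and getTimestamp are hypothetical stand-ins, not this library's actual API): a phantom-typed wrapper over a plain Spark Column rejects mismatched types at compile time, yet unwraps to a normal Column so it mixes freely with regular DataFrame code.

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

// The type parameter T exists only at compile time; at runtime this is
// just a plain Spark Column, so there is no Dataset-style encoding cost.
final case class TypedCol[T](untyped: Column) {
  def +(other: TypedCol[T]): TypedCol[T] = TypedCol(untyped + other.untyped)
}

def getInt(name: String): TypedCol[Int] = TypedCol(col(name))
def getTimestamp(name: String): TypedCol[java.sql.Timestamp] = TypedCol(col(name))

def addTotals(df: DataFrame): DataFrame =
  df.withColumn("total", (getInt("c1") + getInt("c2")).untyped) // plain Spark again
// getInt("c1") + getTimestamp("c2") would be rejected at compile time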

Let me know when you're finished with the basic prototype and I'll try it out. No rush. Definitely excited!

@jserranohidalgo
Member

Awesome selling points :)

My only possible caveat is that the message sounds too strong. DataFrames are dynamically typed, and doric expressions won't avoid that: compile-time checks may succeed and we may still get typing errors at runtime, right?

Things might be different if we could start from some kind of ValidatedDataFrame[T]. Dynamic typing errors could still happen, of course, but they would be caught in advance, so we could say that execution is guaranteed to succeed provided that the validation checks on the accompanying DataFrame pass. I'm not sure whether this kind of ValidatedDataFrame would actually be useful, though. Maybe it would be enough to constrain the scope of type-safety, in a footnote, to well-formed Spark column expressions or something like that, leaving your selling points intact.
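
For the sake of discussion, a rough sketch of what such a ValidatedDataFrame[T] might look like (all names are hypothetical, and a real version would derive the expected schema from T instead of hard-coding one column). The private constructor means the only way to obtain one is through the upfront check, which is what would license the "guaranteed at runtime" claim:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.IntegerType

// A DataFrame whose schema has already been checked against T.
final class ValidatedDataFrame[T] private (val df: DataFrame)

object ValidatedDataFrame {
  // Stand-in check: only verifies that an Int column "id" exists; a real
  // version would compare the full schema derived from T.
  def validate[T](df: DataFrame): Either[String, ValidatedDataFrame[T]] =
    df.schema.find(_.name == "id") match {
      case Some(f) if f.dataType == IntegerType =>
        Right(new ValidatedDataFrame[T](df))
      case Some(f) =>
        Left(s"column 'id' has type ${f.dataType}, expected IntegerType")
      case None =>
        Left("column 'id' not found")
    }
}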

Thanks for your involvement, @MrPowers!

@alfonsorr
Member

I've opened a few issues covering the elements pending for a first release and created a project in GitHub to keep track of them.

@MrPowers
Contributor Author

@alfonsorr - I checked the issues and the project and it looks like you're making great progress. Ping me when the v1 stuff is done, so I can try out the project and provide feedback. Can't wait!!
