AssertSmallDataFrameEquality with ignoreNullable set to true isn't working #118

Open
labbedaine opened this issue Jul 5, 2023 · 1 comment

labbedaine commented Jul 5, 2023

Hello. I have a question about using the ignoreNullable flag when a DataFrame is not created with createDF. The following test passes:

test("IgnoreNullable") {
  val df1 =
    spark.createDF(
      List(("Hello, world!")),
      List(("Test", StringType, false))
    )

  val df2 =
    spark.createDF(
      List(("Hello, world!")),
      List(("Test", StringType, true))
    )

  assertSmallDataFrameEquality(df1, df2, ignoreNullable = true)
}

However, when I compare a DataFrame produced by production code (e.g. via .transform) with an expected DataFrame created with createDF, ignoreNullable appears to be ignored and the library throws a schema mismatch error.


test("IgnoreNullable: Not working") {
  Given("")
  //-- NOOP

  When("")
  val actualDF = spark.table("MyTable").transform(ApplyBuisnessLogic())

  Then("")
  val expectedDF =
    spark.createDF(
      List(("Hello, world!")),
      List(("Test", StringType, true))
    )

  assertSmallDataFrameEquality(actualDF, expectedDF, ignoreNullable = true)
}
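
One way to check whether nullability is really the only difference is to force every top-level column to nullable on both sides before comparing. The sketch below is an assumption-laden debugging aid, not part of spark-fast-tests: `NullableDebug` and `withAllNullable` are hypothetical names, and nested struct, array, and map fields keep their original nullability.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical debugging helper (not part of spark-fast-tests).
object NullableDebug {
  // Rebuilds `df` with the same rows, but with every top-level field
  // marked nullable. Nested fields are left untouched.
  def withAllNullable(spark: SparkSession, df: DataFrame): DataFrame = {
    val relaxed = StructType(df.schema.fields.map(_.copy(nullable = true)))
    spark.createDataFrame(df.rdd, relaxed)
  }
}

// Possible use inside the failing test (assumes `spark`, `actualDF`,
// and `expectedDF` from the test above are in scope):
//
//   actualDF.printSchema()
//   expectedDF.printSchema()
//   assertSmallDataFrameEquality(
//     NullableDebug.withAllNullable(spark, actualDF),
//     NullableDebug.withAllNullable(spark, expectedDF)
//   )
```

If the assertion still fails after both schemas are relaxed this way, the mismatch is in a column name, a data type, or a nested field rather than in top-level nullability.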

Is this a bug, or am I using the flag incorrectly? I am using v1.3.0.

Thank you!

scheleaap commented Apr 3, 2024

Are you sure there isn't a data type difference somewhere? The way the output is formatted can be misleading, and several times it has led me to believe, mistakenly, that ignoreNullable doesn't work.

In the example below, several rows are marked red because they differ in some way. However, not every difference causes the test to fail. For example, transactionVersion (line 3) is colored red because its nullability differs. If you step through the code, you'll see that this doesn't cause the test to fail. The actual cause is line 17, where the field name and type differ.
image
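
To separate the differences that matter from the nullability-only ones, a field-by-field comparison of the two schemas can help. This is a minimal sketch using only Spark's schema types; `SchemaDiff` and `diffIgnoringNullable` are hypothetical names, not part of spark-fast-tests, and it compares fields by position while ignoring top-level nullability only (nullability nested inside struct, array, or map types still counts as a type difference).

```scala
import org.apache.spark.sql.types.{StructField, StructType}

// Hypothetical helper (not part of spark-fast-tests).
object SchemaDiff {
  // Returns one message per position where the two schemas differ in
  // field name or data type. Top-level nullability is not compared.
  def diffIgnoringNullable(actual: StructType, expected: StructType): Seq[String] = {
    val a: Seq[Option[StructField]] = actual.fields.toSeq.map(Option(_))
    val e: Seq[Option[StructField]] = expected.fields.toSeq.map(Option(_))

    def show(f: StructField): String = s"${f.name}: ${f.dataType.simpleString}"

    // zipAll pads the shorter schema with None so extra or missing
    // fields are reported as well.
    a.zipAll(e, None, None).zipWithIndex.collect {
      case ((Some(x), Some(y)), i) if x.name != y.name || x.dataType != y.dataType =>
        s"field $i differs: actual [${show(x)}], expected [${show(y)}]"
      case ((Some(x), None), i) =>
        s"field $i only in actual: [${show(x)}]"
      case ((None, Some(y)), i) =>
        s"field $i only in expected: [${show(y)}]"
    }
  }
}
```

Calling `SchemaDiff.diffIgnoringNullable(actualDF.schema, expectedDF.schema).foreach(println)` before the assertion prints only the name and type mismatches, so a field such as transactionVersion that differs solely in nullability would not appear in the output.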
