
Generating synthetic data from a scan report without Field Values results in empty CSV files #420

Open
howff opened this issue Sep 13, 2024 · 1 comment

howff commented Sep 13, 2024

Describe the bug
Generating synthetic data from a scan report without Field Values results in empty CSV files

To Reproduce

  • Scan without Field Values ticked
  • Generate from that scan report
  • CSV files are created but completely empty (zero bytes)

Expected behavior
The CSV files should contain some generated data.

howff added the bug label Sep 13, 2024

janblom (Collaborator) commented Oct 10, 2024

The fake data generation code was developed some 6 years ago and, at least according to the documentation, has never left the experimental stage.

What I can tell from the code:

  • fake data generation was developed with the existing field values in mind: values are randomly chosen from the values in the scan report
  • if field values are not in the scan report, like in your case, there is some code present that attempts to generate completely random data
  • for generating random data, the type of the fields is used, and some types, like Date and Float, are not supported
  • if the scan report contains no types for the fields (e.g. when run on csv files), no data will be generated
  • if the scan report contains a zero (0) value in the "N rows checked" column(s) of the "Table Overview" sheet, no data will be generated at all. This will be the case when "Scan without Field Values" is ticked. You can of course circumvent this by changing the 0 to 1 for each table you want data for, but you still end up with random (or incomplete) data, for which I cannot see much value.
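The rules listed above can be summarised as a small decision procedure. What follows is a minimal Python sketch under stated assumptions, not the project's actual implementation: the names `FieldInfo`, `TableInfo`, `random_value`, `generate_rows` and `to_csv_text` are hypothetical, the set of "supported" type names is invented for illustration, a field without a type is assumed to yield an empty value, and the zero-byte output is modelled by writing nothing when no rows were generated.

```python
import csv
import io
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldInfo:
    name: str
    type: Optional[str] = None  # None models a scan report without field types (e.g. scanned CSV files)
    values: list = field(default_factory=list)  # empty models "Scan without Field Values"


@dataclass
class TableInfo:
    name: str
    n_rows_checked: int  # the "N rows checked" value from the "Table Overview" sheet
    fields: list


def random_value(field_type: str, rng: random.Random):
    """Hypothetical type-based fallback; unsupported types (e.g. DATE, FLOAT) give None."""
    t = field_type.upper()
    if t.startswith(("VARCHAR", "CHAR", "TEXT")):
        return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))
    if t in ("INT", "INTEGER", "BIGINT"):
        return rng.randint(0, 1000)
    return None  # type not supported by the random generator


def generate_rows(table: TableInfo, n_rows: int, rng: random.Random) -> list:
    if table.n_rows_checked == 0:
        return []  # a zero in "N rows checked" means nothing is generated at all
    rows = []
    for _ in range(n_rows):
        row = {}
        for f in table.fields:
            if f.values:
                row[f.name] = rng.choice(f.values)  # preferred path: sample scanned values
            elif f.type is None:
                row[f.name] = None  # no values and no type: nothing to generate from
            else:
                row[f.name] = random_value(f.type, rng)  # fallback: random data by type
        rows.append(row)
    return rows


def to_csv_text(rows: list) -> str:
    """Models the observed behaviour: no rows means an empty (zero-byte) file."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)  # None is written as an empty cell
    return buf.getvalue()
```

With a table whose "N rows checked" is 0, `generate_rows` returns an empty list and `to_csv_text` returns an empty string, which mirrors the zero-byte files in the bug report; changing that value to 1 produces rows, but fields without scanned values only receive random or empty data.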

There is no simple, obvious and meaningful fix for this use case, as far as I can tell.

janblom self-assigned this Oct 10, 2024