Data for Sandboxes Only #66

Open
odscjames opened this issue Jan 27, 2021 · 5 comments


odscjames commented Jan 27, 2021

We want to have some data that is private (i.e. does NOT appear to the general public) but does appear in Sandboxes (these sandboxes are password protected, so only certain people can see them).

Spreadsheets

Have new fields at relevant places alongside existing status fields, labelled "In Private Sandboxes".

This is a string. People can enter one or more Sandbox IDs, comma separated, or leave it empty:

  • sandbox1
  • sandbox1, sandbox2

When Status is:

  • Public - the "In Private Sandboxes" field is ignored, even if it has contents. The data is all public anyway.
  • Disputed - the "In Private Sandboxes" field is ignored, even if it has contents. The data will not be in any sandboxes. If it's disputed no-one outside GoLab should see it until the dispute is resolved.
  • Private - the "In Private Sandboxes" field is used. The data will not appear in public, but it will appear in the sandboxes specified in "In Private Sandboxes".

Essentially then, the "In Private Sandboxes" field ONLY does something if the status is Private - hence its name, "In Private Sandboxes".
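
The three status rules above could be sketched roughly like this. It is only an illustration: the function names `parse_sandbox_ids` and `visibility` are hypothetical, and treating an unrecognised status like Disputed (shown nowhere) is an assumption, not something decided in this issue.

```python
def parse_sandbox_ids(raw):
    """Split a comma-separated "In Private Sandboxes" cell into a list of IDs."""
    if not raw:
        return []
    # Strip whitespace around each ID and drop empty entries such as "a,,b".
    return [part.strip() for part in raw.split(",") if part.strip()]


def visibility(status, in_private_sandboxes):
    """Return (is_public, sandbox_ids) for one row of a spreadsheet."""
    status = (status or "").strip().lower()
    if status == "public":
        # Field ignored: the data is public anyway.
        return True, []
    if status == "private":
        # The only case where the field is actually used.
        return False, parse_sandbox_ids(in_private_sandboxes)
    # Disputed (and, by assumption, anything unrecognised): shown nowhere.
    return False, []
```

For example, `visibility("Private", "sandbox1, sandbox2")` gives `(False, ["sandbox1", "sandbox2"])`, while the same field with status Disputed gives `(False, [])`.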

Sandbox definition

We will have a new model / database table, "Sandbox". This will have fields

  • public_id: a public and humanly used id, sandbox1, sandbox2
  • name or title: just to show in Web UI as a more friendly thing

This can be set via Django admin for now, but later options can be added to the GoLab admin interface so GoLab staff can make them themselves.

An API Key is set in the app config.

API

When getting project data from the API (e.g. https://golab-indigo-data-store.herokuapp.com/app/api1/project/INDIGO-POJ-0XXX ) a header or GET parameter can be set; this is the API key.
Normally the response has

  • project/id
  • project/data - all public data

Now it will also have

  • project/sandboxes/ID - for each sandbox with data, the entire data block (Public and Sandbox) will be here

The Plump website will want public data and several sandboxes, so this will allow them to get both with one request.
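
A rough sketch of how the response could be assembled, assuming "project/id" etc. mean nested JSON keys. The function name `build_project_response` and its parameters are hypothetical; how the key is read from the request (header or GET parameter) is left out, and the per-sandbox blocks are assumed to be prepared elsewhere.

```python
import hmac


def build_project_response(project_id, public_data, sandbox_blocks,
                           supplied_key, configured_key):
    """Build the API payload; sandbox blocks are added only for a valid key."""
    project = {"id": project_id, "data": public_data}
    # Constant-time comparison, so timing does not leak the key.
    key_ok = bool(supplied_key) and hmac.compare_digest(
        supplied_key.encode("utf-8"), configured_key.encode("utf-8")
    )
    if key_ok:
        # One entry per sandbox that has data for this project.
        project["sandboxes"] = dict(sandbox_blocks)
    return {"project": project}
```

Without a key (or with a wrong one) the caller just gets the normal public response, so existing clients are not affected.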

Internal data

Internally, to support this the Project model will have a new field data_sandboxes, a JSON field. This will be

{
    'sandbox1': { .........  extra data in that sandbox .........    },
    'sandbox2': {......... extra data in that sandbox .........     },
}

This field will be calculated at the existing update_data stage.

This should be data in that sandbox ONLY; not public data too. This makes it easy to:

  • see if a project has any extra data in a sandbox or not
  • in the web UI, show admins what that extra data is (without confusing them with public data too)
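
To illustrate why storing only the extra data makes these checks easy, here is a small sketch over plain dicts shaped like data_sandboxes. The helper names are hypothetical, and `projects` is assumed to be a mapping of project ID to that project's data_sandboxes value rather than real model instances.

```python
def has_sandbox_data(data_sandboxes, sandbox_id):
    """True if this project has any extra data in the given sandbox."""
    # An absent key and an empty block both count as "no extra data".
    return bool((data_sandboxes or {}).get(sandbox_id))


def projects_with_sandbox_data(projects, sandbox_id):
    """List the IDs of projects that have extra data in one sandbox."""
    return [
        project_id
        for project_id, data_sandboxes in projects.items()
        if has_sandbox_data(data_sandboxes, sandbox_id)
    ]
```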

Web UI

This will make it easy to show in the admin UI of the app:

  • If a project has any extra info in a particular sandbox, and what it is.
  • A list of projects with extra info in one sandbox.

The first one should be done, but the second one might not be done straight away.

CSV, spreadsheets, etc

To start with, no sandbox data will appear in these. There are several issues to sort here:

  • (user issue) How to communicate to members of the public that this data is private and not fully open
  • (technical issue) How to get a user to download one without leaking the API key

@ScatteredInk

Public - this does nothing. It's all public anyway.

If the sandbox field is filled in then does it not make more sense to assume that status should be 'private', rather than over-riding and publishing as public - or am I misunderstanding?

@odscjames

Well first I've made clearer:


When Status is:

  • Public - the "In Private Sandboxes" field is ignored, even if it has contents. The data is all public anyway.
  • Disputed - the "In Private Sandboxes" field is ignored, even if it has contents. The data will not be in any sandboxes. If it's disputed no-one outside GoLab should see it until the dispute is resolved.
  • Private - the "In Private Sandboxes" field is used. The data will not appear in public, but it will appear in the sandboxes specified in "In Private Sandboxes".

It's not great having 2 fields and having combinations in which one of them is ignored, but I can't think of a better way.

I think you're suggesting:


If the "In Private Sandboxes" field is empty, then status works as it did before.

If the "In Private Sandboxes" field has contents and ...

  • status is "private", data is not in public but is in sandbox
  • status is "public", we assume this is a mistake and we should treat status as "private"
  • status is "disputed", data is not in public and not in sandbox because no-one outside GoLab should see it until the dispute is resolved.

Hmmmm

@odscjames

I'd be tempted to leave the first one (cos honestly, that's easier :-P ) and instead put a Data Quality Report check in:

If status=public but "In Private Sandboxes" has contents, raise a flag saying this is ambiguous and we assume public.
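
As a rough idea of what that check might look like (the function name and message wording are made up here, and how it plugs into the existing Data Quality Report is not shown):

```python
def check_sandbox_field_ambiguity(status, in_private_sandboxes):
    """Return a warning message for an ambiguous row, or None if it is fine."""
    is_public = (status or "").strip().lower() == "public"
    has_contents = bool((in_private_sandboxes or "").strip())
    if is_public and has_contents:
        return (
            'Status is public but "In Private Sandboxes" has contents; '
            "this is ambiguous and the data is treated as public."
        )
    return None
```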

@odscjames

Actually, maybe a cleaner way that avoids any issues about how it works in different ways is to make them be more explicit.

Add a 4th status, "SANDBOX".

  • If Status is "PRIVATE", "PUBLIC" or "DISPUTED" then just ignore anything in the other "In Sandboxes" field.
  • If status is "Sandbox", it's not public, and look at contents of "In Sandboxes" field to see which ones it appears in.

This means putting data in a sandbox takes an extra step and is more explicit.
It also makes it simpler to use: no knowledge is required about combinations of the 2 fields (e.g. "what if disputed and sandboxed?"). It's a simple rule: if status is sandbox, the data is in those sandboxes; if it's not, it's just not.
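
With a 4th status the rule collapses to a single check, roughly like this (the function name is hypothetical, and comparing the status case-insensitively is an assumption):

```python
def sandboxes_for_row(status, in_sandboxes):
    """Return the sandbox IDs a row appears in under the 4-status rule."""
    if (status or "").strip().lower() != "sandbox":
        # PRIVATE, PUBLIC and DISPUTED all ignore the "In Sandboxes" field.
        return []
    return [s.strip() for s in (in_sandboxes or "").split(",") if s.strip()]
```

So `sandboxes_for_row("SANDBOX", "sandbox1, sandbox2")` gives both IDs, and any other status gives an empty list whatever the field contains.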

odscjames added a commit that referenced this issue Feb 10, 2021
#66

This is only available for the transactions table in projects so far.

But it is a general purpose framework that could apply to any status field.
@odscjames

I've just realised the API needs to be simplified.

It has to be: "sandboxes" key in API returns all data (public and sandbox).

If it just returns only the sandbox data, there is a problem merging the data and we have just handed that problem to the client. That sucks. We should merge on server, and make client easy.

(The problem is that you can't just replace the sandbox data over the original data; if a key is a list then the list must be merged)
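
A minimal sketch of the kind of server-side merge meant here, assuming lists are merged by appending the sandbox items after the public ones (the real rule might need to match items by an ID instead). The function name `merge_sandbox_data` is hypothetical.

```python
import copy


def merge_sandbox_data(public_data, sandbox_data):
    """Return public data with sandbox-only data merged in; inputs untouched."""
    merged = copy.deepcopy(public_data)
    for key, value in sandbox_data.items():
        existing = merged.get(key)
        if isinstance(value, dict) and isinstance(existing, dict):
            # Nested blocks are merged key by key.
            merged[key] = merge_sandbox_data(existing, value)
        elif isinstance(value, list) and isinstance(existing, list):
            # Lists are combined rather than replaced (assumption: append).
            merged[key] = existing + copy.deepcopy(value)
        else:
            # New keys and plain values: the sandbox value is used.
            merged[key] = copy.deepcopy(value)
    return merged
```

The result of this for each sandbox is what would sit under the "sandboxes" key, so a client can use one block directly without doing any merging itself.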
