cpg0037-oasis #92
-
Hi Shantanu, thanks for getting the ball rolling on this! Excited! Confirming that the assay uses the standard Cell Painting v3.0 dye set to stain primary human hepatocytes (PHHs). There are 68 x 384-well plates in the dataset. We use the whole plate and do not exclude any edge wells. We image 15 sites (fields of view) per well using the 40x objective. Each field comprises 5 fluorescent channel images and 1 brightfield (BF) image. Each individual channel image is around 8 MB, so the images will total around 20 TB. Images were acquired on an Operetta; we will also transfer the associated flat field correction (FFC) profiles and XML metadata for each plate. In addition to the images, we are also happy to share our in-house analysis QC pipeline results, including:
Leveraging the embeddings has proven particularly insightful. We have seen that ~70% of the compounds in this dataset produce image embeddings that are significantly different from the DMSO vehicle control embeddings on the same plate, and so could be considered bioactive.
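As a quick sanity check of the "around 20 TB" figure, here is a minimal sketch that multiplies out the numbers quoted in this comment. It assumes every well has all 15 sites and all 6 channel images, and it treats the "around 8 MB" per image as exact, so the result is only a rough estimate.

```python
# Back-of-envelope size estimate from the numbers quoted in the comment above.
PLATES = 68
WELLS_PER_PLATE = 384        # whole plate used, no edge wells excluded
SITES_PER_WELL = 15
CHANNELS_PER_SITE = 6        # 5 fluorescent + 1 brightfield
MB_PER_IMAGE = 8             # "around 8 MB"; an approximation


def estimate_images_and_size(plates=PLATES, wells=WELLS_PER_PLATE,
                             sites=SITES_PER_WELL, channels=CHANNELS_PER_SITE,
                             mb_per_image=MB_PER_IMAGE):
    """Return (number of channel images, total size in decimal TB)."""
    n_images = plates * wells * sites * channels
    total_tb = n_images * mb_per_image / 1_000_000  # MB -> TB (decimal units)
    return n_images, total_tb


n_images, total_tb = estimate_images_and_size()
print(f"{n_images:,} images, about {total_tb:.1f} TB")
```

Under these assumptions the estimate comes out slightly below 20 TB, which is consistent with the rounded figure given above.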
-
Thanks for all these details @ktitterton! Let's start with image data first. Do you anticipate any issues in structuring the data like this? As for the remaining components, I suspect they will all go into one of the existing folders in
And then finally, would it be possible for you to create metadata for each of these plates, as detailed in https://broadinstitute.github.io/cellpainting-gallery/data_structure.html#arrayed-metadata?
-
Hi Shantanu, Currently our data are indeed under a The raw image data follows the standard PerkinElmer/Revvity format:
The associated metadata and features folder structure is:
Some misc. notes:
Yes, we are happy to create metadata following the specifications. Is this to be provided in addition to the data as it currently exists, described above? Tagging my colleague Alex @cabreraalex, who is our resident data wizard and far more expert in these matters than I am!
-
The Axiom team just chatted a bit internally, and we were wondering if it would be most helpful to go ahead and copy over the data as it currently exists into an S3 bucket and share it with y'all. That way you can let us know exactly what should be added, deleted, or rearranged. Let us know if this would be a good next step! Excited!
-
@ktitterton The images are structured exactly per the specs, so all set there. The metadata.parquet looks good; it will eventually need to be wrangled to follow the structure, but that can wait. Next, can you please provide the following for 1-2 plates?
Please upload that to
-
Hi @cabreraalex, thanks for transferring everything; we've kicked off the image processing on our end. Do you, @ktitterton, or @shntnu know the plan for transferring the Dino features? In other projects, we've created additional folders to hold alternative features. For example, in this case we have 'images' for the raw .tiff files, 'workspace' for CellProfiler features, and 'workspace_dl' for some deep learning-derived features: I imagine that we'll want to do something like this as well. Has it already been discussed?
-
@ktitterton
But instead were uploaded with image files directly in the plate folder:
How might we go about getting the
-
@ktitterton @cabreraalex I also want to confirm: is the channel mapping I've parsed below correct, and is it consistent across all plates and all batches? channels: channelid:
-
Hi Erin, Yes, I can confirm there are two different Harmony versions and two different microscopes in the dataset. The vast majority of the data was acquired on O1, with a very small minority on O2. There was an unfortunate automation error in prod_25, so there are some technical artifacts in this batch, including the microscope switch for a couple of plates. In our unbiased image clustering, we have noticed some patterning by microscope in high-tox conditions. In general, the O2 microscope seems to generate images that have slightly more texture, but the raw image histograms are otherwise extremely similar. Yup, I can also confirm that the channel mapping looks correct! Please do let us know if you have any other questions; no question is too small. We are all very much looking forward to seeing your results sometime soon! Thanks for checking in.
-
@ktitterton There are several that are off by 6 files (34555 files), which would be a single field of view. I am not concerned about these, but I will note them here for the sake of completeness.
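A per-plate file-count check like the one described here could be sketched as follows. This is not the script actually used in this thread: the plate names and counts are made-up illustrative values, and the "expected" count is simply taken to be the most common count across plates. The only number taken from the thread is 6 files per field of view (5 fluorescent channels plus 1 brightfield).

```python
from collections import Counter


def summarize_file_counts(file_counts, files_per_field=6):
    """Compare each plate's file count against the most common count.

    file_counts: dict mapping plate name -> number of files found.
    Returns (expected_count, report), where report lists only the plates
    whose count differs from the expected one.
    """
    expected = Counter(file_counts.values()).most_common(1)[0][0]
    report = {}
    for plate, n_files in sorted(file_counts.items()):
        missing = expected - n_files
        if missing != 0:
            report[plate] = {
                "missing_files": missing,
                "missing_fields": missing / files_per_field,
            }
    return expected, report


# Hypothetical counts, small numbers for illustration only.
counts = {"plate_A": 120, "plate_B": 120, "plate_C": 114}
expected, report = summarize_file_counts(counts)
print(expected, report)
```

A shortfall of exactly 6 files maps to one missing field of view, which matches the reasoning in the comment above.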
-
Hi @cabreraalex and @ktitterton - I'm not seeing any positive or negative controls in the Dino embeddings (and I'm wondering if @ErinWeisbart has any in the images / resulting CellProfiler features). After compiling the metadata.parquet across all plates in locations like this, I don't see any compounds with the name "DMSO" or "dimethyl sulfoxide", and no compounds with a concentration of 0. Can you let me know if you either A) didn't collect poscons/negcons on each plate or B) just didn't include them in the data that you uploaded? Poscons I can live without, but the DMSO samples are pretty crucial for the normalization and the concentration-response analysis.
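For reference, the kind of check described here can be sketched in plain Python over rows already loaded from the compiled metadata. The column names `compound` and `concentration` are assumptions for illustration; the actual metadata.parquet schema is not shown in this thread.

```python
def find_vehicle_controls(rows, name_key="compound", conc_key="concentration"):
    """Return rows that look like vehicle (DMSO) controls.

    rows: iterable of dicts, one per well.
    name_key / conc_key: hypothetical column names, adjust to the real schema.
    A row matches if its compound name is a known DMSO alias or its
    concentration is exactly 0.
    """
    dmso_names = {"dmso", "dimethyl sulfoxide"}
    hits = []
    for row in rows:
        name = str(row.get(name_key, "")).strip().lower()
        conc = row.get(conc_key)
        if name in dmso_names or conc == 0:
            hits.append(row)
    return hits


# Made-up example rows, not taken from the dataset.
example_rows = [
    {"compound": "compound_X", "concentration": 10.0},
    {"compound": "DMSO", "concentration": None},
    {"compound": "compound_Y", "concentration": 0},
]
controls = find_vehicle_controls(example_rows)
print(len(controls), "candidate control wells")
```

An empty result across all plates would correspond to the situation reported above, where no DMSO wells appear in the uploaded metadata.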
-
Hi folks, our plan was to upload Axiom's processed data as-is and then reorganize later. Let's do that now. We will follow the instructions here. This is the current structure, under
Proposed structure Under
The Then under
Additionally, @jessica-ewald has new CNN-based features from Ray Jones and Bram Gorissen; these would be structured like this:
Does all this sound ok, @ktitterton @cabreraalex @jessica-ewald? We (Broad) should be able to help complete this task but might ping some of you for help.
-
For our notes: we also have an internal email thread discussing this, "Updating metadata.parquet files / blocklists".
-
The data from Axiom will kick us off!
We will follow the process documented here https://github.com/broadinstitute/cellpainting-gallery/wiki/Data-Transfer-Workflow
Axiom team, could you please answer the following:
images, analysis, backend, load_data_csv, profiles. Optional components: pipelines, qc, etc.). Note that metadata is required.
cpg0037-oasis
Then, please orient yourself to the prep in https://broadinstitute.github.io/cellpainting-gallery/contributing_to_cpg.html#preparing-for-data-deposition
For Broadies, here's our checklist from https://github.com/broadinstitute/cellpainting-gallery/wiki/Data-Transfer-Workflow
Champion: Shantanu Singh
Dataset name: cpg0037-oasis
cpg0031) by incrementing the identifier of the most recent dataset in the dataset list, and update the list with the new identifier.
cpg0031-caicedo-cmvip).
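The identifier step in the checklist (increment the identifier of the most recent dataset) can be sketched as below. This is only an illustration: the `cpgNNNN` pattern is inferred from the names in this thread, the function name is hypothetical, and `cpg0030-example` is a made-up entry.

```python
import re


def next_dataset_identifier(existing_names):
    """Return the next 'cpgNNNN' identifier after the highest one in the list.

    existing_names: dataset names such as 'cpg0031-caicedo-cmvip'.
    Assumes identifiers are 'cpg' followed by four digits.
    """
    numbers = []
    for name in existing_names:
        match = re.match(r"cpg(\d{4})", name)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        raise ValueError("no cpgNNNN identifiers found in the dataset list")
    return f"cpg{max(numbers) + 1:04d}"


# 'cpg0030-example' is hypothetical; 'cpg0031-caicedo-cmvip' appears in the checklist.
new_id = next_dataset_identifier(["cpg0030-example", "cpg0031-caicedo-cmvip"])
print(new_id)
```

The dataset list itself still has to be updated by hand with the new identifier, as the checklist notes.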