Hi @montanalow. This is really great work. I really like how you abstract away the common pitfalls in machine learning and streamline the process in this project. I see a lot of potential in it from a data scientist's perspective. If you don't mind, I can provide feedback from using this tool.
For this particular issue, I encountered an h5py error because of too many Input layers. As shown here, we have to pass one encoder for each column in the dataframe, and each encoder corresponds to one Input layer. I deal with a lot of DNA sequence data, which usually has >5000 columns. I think it makes sense to at least combine the columns using Continuous or Pass encoders into one Input.
@Guzzii This is something we've run into internally as well. The current workaround is to set short_names = True, which will get you to hundreds, but probably not thousands, of inputs.
What if encoders that share a common base name followed by a number, e.g. 'sequence_1', 'sequence_2', 'sequence_3', ... 'sequence_n', were mapped into a single input 'sequence' with shape (n,), for all types where that is possible?
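A minimal sketch of the grouping idea, independent of the library's actual API: collect column names that share a base name with a numeric suffix, and report a single combined input shape per group instead of one input per column. The function name and regex are hypothetical, not part of the project.

```python
import re
from collections import defaultdict

def group_columns(columns):
    """Group column names that share a base name followed by a
    numeric suffix (e.g. 'sequence_1' ... 'sequence_n'), so each
    group can map to one Input of shape (n,) instead of n Inputs.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for col in columns:
        match = re.fullmatch(r"(.+)_(\d+)", col)
        base = match.group(1) if match else col
        groups[base].append(col)
    # The combined input shape is the number of grouped columns.
    return {base: (len(cols),) for base, cols in groups.items()}

cols = ["sequence_1", "sequence_2", "sequence_3", "label"]
print(group_columns(cols))  # {'sequence': (3,), 'label': (1,)}
```

With >5000 suffixed columns, this collapses thousands of Input layers into one per base name.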
Hi @montanalow. I think it makes sense. Just want to make sure I understand correctly: in this case, it would aggregate columns with a shared base name like sequence_col_{}, and the encoder-generated inputs like one_hot_{}, respectively.
Correct. I think there will be a little bit of complexity around encoders that have a sequence_length, like the Token encoder, because they will need to go to a 2D-shaped input, but it should still work in theory.
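To illustrate the 2D case mentioned above (a sketch with made-up sizes, not the project's actual encoder output): columns from an encoder with a sequence_length each yield a fixed-length vector per row, so combining n such columns gives each sample a 2D input of shape (n_columns, sequence_length) rather than a flat (n,).

```python
import numpy as np

# Hypothetical sizes: 3 token-encoded columns, each row a
# fixed-length sequence of token ids (sequence_length = 4).
batch_size, n_columns, sequence_length = 2, 3, 4
encoded_columns = [
    np.random.randint(0, 100, size=(batch_size, sequence_length))
    for _ in range(n_columns)
]
# Stack per-column arrays so each sample becomes one 2D input
# of shape (n_columns, sequence_length).
combined = np.stack(encoded_columns, axis=1)
print(combined.shape)  # (2, 3, 4)
```

Scalar encoders (Continuous, Pass) are the degenerate 1D case of the same layout.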