Summary & next steps #2

Open
ml-evs opened this issue May 14, 2024 · 0 comments
ml-evs commented May 14, 2024

Now that the dust has settled a bit, we should collect our ideas for how to extend this project, taking inspiration from other projects at the hackathon. I'll kick things off with a somewhat random list of takeaways/ideas:

  • Our datalab API package can be made more human and AI-friendly
    • A lot of the improvements are things that were already planned, e.g., using a datalab instance's info endpoints at the initial handshake to populate things like allowed item and block types, so that better error messages and completions can be offered (see the first sketch after this list)
    • We could write a "robots.txt" (or ai.txt) file for the API package that would give others a starting point for using our package with AI agents. Perhaps this should even be AI-generated from the code itself (with the 'best' available model) and then used by smaller models
    • Following the example of LangSim, rather than letting the agent write code using our API directly, we could also consider packaging up our methods as tools that can be called directly (see the second sketch after this list) -- this doesn't look like it would be too difficult, though we would want to avoid maintaining separate AI and human APIs.
  • LLM agents offer new UI opportunities
    • We should continue on the path of making our UI pluggable; one such AI-driven plugin could be a new "add item" button that spins up a pre-prompted LLM agent and accepts text, images, etc., with the aim of creating one or many new entries in datalab.
    • Another would be a Jupyter interface to the same agent that can be used directly alongside the Python code for things like data analysis, ideally where the agent can write cells itself for execution.
    • Broadening the idea of UI, our project (and now gpt4o, which has somewhat upstaged it...) could be the basis for multimodal UIs in the lab, e.g., a camera watching a glovebox and receiving speech commands like "record me making this cell and create a datalab entry" [whereby it would try to read IDs/barcodes, or even watch and understand the procedure], or remote questions like "what is currently in glovebox " or "can you see my sample in any of the gloveboxes?". To de-emphasise this as purely a surveillance tool, direct datalab integration would allow the speech and video for a sample to be stored alongside the sample itself in datalab, with the ability to easily make corrections. Another project along these lines was speech-schema-filling -- certainly such a device could be prompted in such a way that it is agnostic to the actual data backend (e.g., datalab, NOMAD or otherwise).
    • e.g., filming chemical inventory
  • New models, new modalities, running costs and hardware
    • Models like gpt4o that combine text with audiovisual inputs in a single model look very promising for HCI. Will other smaller/cheaper models be 'good enough' for our structured use case? For example, we found good performance and speed with haiku, whereas opus was a bit too slow for interactive use. These models, released ages ago in March 2024, seem to get smoked by gpt4o (released yesterday) in both speed and performance (though not price).
    • Will we ever be able to escape using external paid APIs? Are the big OSS models up to the task already, and can we afford to run them if we have some shared infrastructure across datalab projects?
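A first sketch of the handshake idea mentioned above: query an instance's info endpoint once when the client starts up, then use the response to validate input and drive completions client-side before anything hits the server. The endpoint path (`/info`) and the response field (`available_item_types`) are assumptions here, not the confirmed datalab API.

```python
# Minimal sketch: fetch server capabilities at handshake time and use them to
# give friendlier errors/completions. Endpoint path and response fields are
# assumptions, not the confirmed datalab API.
import requests

DATALAB_URL = "https://demo.datalab-org.io"  # hypothetical instance URL


def fetch_capabilities(base_url: str) -> dict:
    """Fetch server metadata once at client start-up."""
    response = requests.get(f"{base_url}/info", timeout=10)
    response.raise_for_status()
    return response.json()


def validate_item_type(item_type: str, capabilities: dict) -> None:
    """Raise a helpful client-side error instead of a bare 400 from the server."""
    allowed = capabilities.get("available_item_types", [])
    if allowed and item_type not in allowed:
        raise ValueError(
            f"Unknown item type {item_type!r}; this instance accepts: {', '.join(allowed)}"
        )


if __name__ == "__main__":
    caps = fetch_capabilities(DATALAB_URL)
    validate_item_type("samples", caps)
```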
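And a second sketch of the LangSim-style tool-calling idea: wrap a few client methods as tools that an agent can call directly, instead of having it write raw API calls. The `DatalabClient` import, its method names and signatures, and the instance URL are stand-ins for whatever the real datalab-api package exposes; LangChain's `@tool` decorator is used only as one example of a tool-calling framework.

```python
# Minimal sketch: expose datalab operations as agent-callable tools. Client
# class, method names, and signatures are stand-ins, not the confirmed API.
from langchain_core.tools import tool

from datalab_api import DatalabClient  # hypothetical import path

client = DatalabClient("https://demo.datalab-org.io")  # hypothetical instance


@tool
def get_item(item_id: str) -> dict:
    """Fetch a single datalab entry (sample, cell, starting material, ...) by its ID."""
    return client.get_item(item_id)


@tool
def create_item(item_id: str, item_type: str, description: str = "") -> dict:
    """Create a new datalab entry of the given type with an optional free-text description."""
    return client.create_item(
        {"item_id": item_id, "type": item_type, "description": description}
    )


# These tools can then be bound to any tool-calling model, e.g.
#   llm.bind_tools([get_item, create_item])
# so the same Python functions serve both the human API and the agent,
# avoiding a separate AI-only API surface.
```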