Towards Roadmap #1215
Comments
Indeed, an algorithm abstraction and strong documentation sound like the way to go!!
Great job @MischaPanch! Keep us updated :)
Thanks for the update! I've been using Tianshou for a while and am looking forward to the next stable release!
Indeed, the Tianshou project aims to be a standard for unifying RL algorithm research and production development into one library by adhering to high software engineering practices.
Thanks for the update! I really hope to see this project succeed; it is rare to find a flexible, well-planned library in the RL space. Tianshou is one of the best-designed RL libraries I have seen in terms of flexibility. Is there any plan to support "end-to-end" RL where everything (from environment to batching to logging) runs on GPU (like Warp-Drive or PureJaxRL)?
Great to see your roadmap. As a contributor to Sample-Factory, clean-rl and StableBaselines3, and the creator of the open-source Godot RL Agents library, I can attest to the value that Tianshou brings to the RL community. Looking forward to your future developments.
Thank you very much for this great roadmap with a clear vision and a concrete plan! As a former contributor and heavy user of Tianshou, it warms my heart to see the amazing work you guys did over the past few years. I really hope you can continue working on it and turn the vision into reality.
Tianshou is an amazing RL library for researchers, and it would be great to see it expand into RLHF with LLMs!
As someone who has worked extensively with Tianshou over the past year, I have been consistently impressed by its readability and the high code quality that underpins the project. The high-level interfaces and comprehensive examples make it straightforward to set up experiments while also offering the flexibility needed for customization, which is invaluable for both practitioners and researchers. I believe that Tianshou holds great potential for bridging the gap between straightforward implementations and the complex needs of researchers, thanks to its readable, maintainable codebase and its active, responsive community. I completely agree with the points proposed for the next major release, and for me, the most important ones are:
I’m excited about Tianshou’s future and would be happy to continue contributing towards its success!
Cool, looking forward to the next release! Thanks for your work on Tianshou!
Late to the party, but @MischaPanch et al., thank you for your hard work over the last year that has made Tianshou gradually better!
This issue serves to inform about and discuss the next major release of Tianshou, after which the library can be considered mature and stable from our perspective. The progress and the related issues are tracked in the corresponding milestone.
Reasoning - the Current State of RL Software
Before outlining the work to be done for the next major release, we should clarify why we want to put so much work into Tianshou at all. The reason is the immense potential that we believe this project holds, and the RL community's need for such a resource.
Currently, most open-source RL projects fall into one of two types (we only consider actively maintained projects that support gymnasium, so dopamine and acme are out of the picture):
Libraries that are mainly oriented towards practitioners who want convenient interfaces for training an agent but don't need to develop algorithms (rllib is the prime example, to some extent also pearl). These libraries are generally not well suited for RL researchers, who are by far the largest part of the RL community, because of the typically enormous complexity of their lower-level interfaces and builtins. While such libraries have to focus on the stability of high-level interfaces, they often go through large breaking changes in low-level interfaces, which can be frustrating for the select few researchers who put in the effort of learning them for some version of the project.
Projects that help researchers quickly prototype an algorithm. These include educational projects like spinning up, single-file implementations like clean-rl, low-level libraries with building blocks like pytorch-rl, and many others. They are often targeted at training on selected environments and at producing training curves rather than at creating useful artifacts for production code. Many other features are often missing, like a good integration of hyper-parameter optimization or proper evaluation protocols - both of crucial importance for reliable research. More often than not, such projects are not suitable for use in production code. In addition, many such projects are the results of work during a PhD or postdoc and don't follow the highest coding standards (type-safety, sufficient testing, flexible design).
A notable exception is SB3 and its ecosystem, which can be considered the gold standard of RL libraries (as the name tells, it took 2 complete rewrites to get there from the original stable baselines). The main limitation of SB3 is its restricted scope - there are no plans to include offline RL, multi-agent RL, multi-node support or other things. Many important algorithms are not included in SB3, and as the name suggests, it is directed more towards stability than towards the very active development needed for completeness. SB3 interfaces are also somewhat rigid (e.g. the buffers can contain only specific fields, while Tianshou's internal data structures can contain anything while still being type-safe, thus facilitating algorithm development), and as a minor aside, there are no purely configuration-based high-level interfaces, which are usually required in more "production oriented" application code.
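As a small illustration of the flexibility referred to here, a minimal example based on Tianshou's documented `Batch` API (the custom field names are made up for illustration):

```python
import numpy as np
from tianshou.data import Batch

# Batch accepts arbitrary, possibly nested fields; dicts become nested Batches.
batch = Batch(
    obs=np.zeros((4, 3)),
    act=np.array([0, 1, 0, 1]),
    extra={"intrinsic_reward": np.ones(4)},  # custom field, no schema change needed
)
print(batch.extra.intrinsic_reward)  # nested attribute access
print(batch[0])  # indexing slices every field consistently
```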
Predictably, the absence of convenient, complete and reliable code for researchers has led to an enormous proliferation of projects of the second type. At the same time, the absence of researchers' influence on projects of the first type means that their user base is much smaller than it should be for such a complex topic as RL. Therefore, projects of the first type often lack important features and algorithms, have performance issues for common use cases, and can be frustrating to work with even in production settings.
Tianshou's Approach
We believe that Tianshou can make a significant positive contribution to the community by being a resource that is useful both for practitioners and researchers - possibly the only (or first) of its kind! Such a resource would need at least
We believe that the existence of such a resource can be transformative for the RL community and for its somewhat sorry current state, plagued by code duplication, limited applicability, evaluation and reproducibility crises, and community fragmentation. Moreover, we believe that Tianshou 2.0 will be such a resource.
Current State of Tianshou
As it is now, Tianshou ticks several boxes outlined above, but not all. Importantly, it does lay the right foundations by already having
Tianshou focuses on simplicity of use for researchers and practitioners (including simplicity in reading the code), a philosophy that is paramount to the development and will always be a priority.
However, a large block of work remains before all of the (minimal) requirements outlined above are fulfilled.
What is Missing
1. Algorithm Abstraction
A major technical problem in Tianshou is the lack of an `Algorithm` abstraction. Instead, the current codebase implements on-policy, off-policy and offline learning on the level of the `Trainer`. This has multiple downsides, most importantly that the `learn` method doesn't have a proper interface (an issue that is hidden by the usage of `**kwargs` there) and that one needs to think about things like buffer size for on-policy learning, where it doesn't make sense. This is described in issues #1034 #949 #948. Solving this will be a major breaking change. It doesn't make sense to write proper documentation on implementing algorithms before solving this, which is why work on documentation has stalled.
2. Improvements in Internal Data Structures
There are several issues with the central data structures and objects like `Batch`, `Buffer` and the Collectors. Some parts of the library still carry confusing names and convoluted testing strategies, which (rightly) hinders adoption by researchers. Such issues need to be resolved; the core interfaces of Tianshou need to be very clean and properly documented. We did a lot of work on this in the last year, but some still remains to be done.
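For reference, the core buffer interface as it stands today (based on Tianshou's documented 1.x API; the exact reserved keys have varied across versions):

```python
import numpy as np
from tianshou.data import Batch, ReplayBuffer

buf = ReplayBuffer(size=100)
# Each transition is itself a Batch using the buffer's reserved keys.
buf.add(
    Batch(
        obs=np.zeros(3),
        act=0,
        rew=1.0,
        terminated=False,
        truncated=False,
        obs_next=np.ones(3),
        info={},
    )
)
sampled, indices = buf.sample(batch_size=1)  # uniform sampling, with indices
```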
3. Automated Benchmark and More Testing
While performance of algorithms is tested to some extent in CI, this is insufficient. The results reported in the docs under Benchmark need to be easily reproducible and in fact need to be recreated on every release. The recent addition of the `eval` package represents a major step in that direction, but we need more. In particular, we should explore using fast envs like in pufferlib for representative performance tests.
4. Exhaustive Documentation and Knowledge Base
Documentation is severely lacking; the tutorials cover only a tiny fraction of what is possible with Tianshou. Many more tutorials and how-tos are needed to make it easy for the community to make the most out of Tianshou. However, part of the documentation writing is blocked by the refactorings.
A knowledge base as such barely exists in Tianshou at the moment.
5. Improved Parallelization Performance
Multi-node support is possible in principle through the Ray worker, but it is not currently practicable. Throughput can be improved by better vectorization (depending on the speed of `env.step`), better integration of async collection (some support is already there) and by double-buffering.
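The existing vectorization interface that this work would build on (based on Tianshou's documented API; `SubprocVectorEnv` runs each env in a worker process, and a Ray-backed variant exists for the multi-node case):

```python
import gymnasium as gym
import numpy as np
from tianshou.env import SubprocVectorEnv

if __name__ == "__main__":  # subprocess workers need a main guard on some platforms
    envs = SubprocVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(8)])
    obs, info = envs.reset()  # batched observations from all 8 workers
    actions = np.random.randint(0, 2, size=8)  # random CartPole actions
    obs, rew, terminated, truncated, info = envs.step(actions)
    envs.close()
```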
6. More Standard RL Algorithms
Many important algorithms are not implemented, including MPO, IQL and ARS.
7. Fix Long-Standing RNN Bugs
The RNN 'history summarization' is currently broken, though the interfaces support it in principle (a long-standing bug that will be addressed during the `Algorithm` implementation).
8. Integrate Hyperparameter Optimization
HPO is not fully integrated yet; CARBS as the default would probably be a good idea.
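Independently of whether CARBS or another optimizer becomes the default, the integration would look roughly like this sketch (using Optuna purely as a stand-in; `run_tianshou_training` is a hypothetical helper, not an existing function):

```python
import optuna


def run_tianshou_training(lr: float, gamma: float) -> float:
    """Hypothetical helper: build envs, policy and trainer with the given
    hyperparameters and return the final evaluation score."""
    return 0.0  # placeholder; wire up an actual Tianshou training run here


def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.999)
    return run_tianshou_training(lr=lr, gamma=gamma)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```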
9. Enhanced Logging and Callbacks
The logging and callbacks need to become more flexible and configurable through the Trainers; this is already work in progress.
10. Unify Interfaces
Agent and Critic interfaces need to be consolidated; there are already various issues written on that.
Handling of `dist_fn` needs to be improved; discrete agents shouldn't require a `dist_fn` as input at all (it will always be `Categorical`), as the example below illustrates.
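To make the point concrete, here is roughly how a discrete PPO agent is set up today (patterned after Tianshou's own examples; exact signatures vary between versions, so treat this as a sketch):

```python
import gymnasium as gym
import torch
from tianshou.policy import PPOPolicy
from tianshou.utils.net.common import ActorCritic, Net
from tianshou.utils.net.discrete import Actor, Critic

env = gym.make("CartPole-v1")
net = Net(state_shape=env.observation_space.shape, hidden_sizes=[64, 64])
actor = Actor(net, env.action_space.n)
critic = Critic(net)
optim = torch.optim.Adam(ActorCritic(actor, critic).parameters(), lr=3e-4)

policy = PPOPolicy(
    actor=actor,
    critic=critic,
    optim=optim,
    # Boilerplate the roadmap wants to drop: for discrete action spaces
    # the distribution is always Categorical, so requiring it is redundant.
    dist_fn=torch.distributions.Categorical,
    action_space=env.action_space,
)
```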
11. Support Gymnasium 1.0.0
Gymnasium 1.0.0 is not supported yet, and integrating it will be a breaking change (@pseudo-rnd-thoughts FYI).
Developers, Plan and Timeline
While there is a lot to do, the work is not insurmountable! Tianshou already has an excellent foundation, is widely used and well structured. The already established high performance and large test coverage, together with the static code analysis, make further development safe and relatively easy, especially for seasoned programmers. The foundation for tested documentation has been laid. A lot of work has already been invested in testing out what the `Algorithm` abstraction should look like (we can also take inspiration from SB3) and how to fix the RNN bugs (where we can take inspiration from Pearl).

The main developers working on this are @MischaPanch and @opcode81, with significant support in coordination from the original main developer @Trinkle23897. Recurrent contributors are @dantp-ai, @maxhuettenrauch and @carlocagnetta. We are hoping for support on multi-node and double-buffering topics from @destin-v ;).

There will be several intermediate releases until 2.0.0; the latter will only come after the breaking changes of `Algorithm`. After the 2.0.0 release there will be a dedicated focus on documentation and the establishment of a knowledge base. After this point, Tianshou can be considered mature. We believe that researchers will appreciate how much faster they can get to results by using Tianshou, and also how much faster these results can be adopted by the community after being integrated through a PR. Our ambition is large: Tianshou should become the first go-to place for a large part of the RL community.

Development times are very difficult to estimate, and we still need to fully verify the extent of support that the developers will receive both from official supporting organizations (including the current ones) and from the community. But very roughly, Tianshou 2.0.0 can be achieved in more than 6 months but likely less than a year, with intermediate releases happening every few months.
Newer Developments in RL
In recent years, several new research directions in RL have appeared, the largest of them being RL for language models (à la RLHF) and the combination of RL and diffusion approaches. Originally, Tianshou was not developed with such directions in mind. In the coming months we will evaluate how naturally these approaches fit into the development plans, and also how important incorporating them is to the community.
To be Discussed
Tianshou is torch-based, but the part that actually depends on torch is rather small. We just use optimizers to perform gradient updates, and the highly useful and differentiable `Distribution` objects. Most other things, like `Batch`, `Buffer`, Collector and Trainer, do not rely on any torch specifics. Given the popularity of Jax for RL, and the maturity and huge community behind keras3, it might be possible to introduce multi-backend support to Tianshou with little effort. However, it might also not be possible without major breaking changes or without increasing complexity, in which case this likely won't be done as part of 2.0.0. The feasibility of multi-backend support using keras as the differentiation engine will be evaluated over the next months.
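A toy illustration of the multi-backend idea (not Tianshou code): Keras 3 exposes backend-agnostic tensor operations, so the same numerical code can run on PyTorch, JAX or TensorFlow, selected via the `KERAS_BACKEND` environment variable:

```python
import keras  # Keras 3


def mse_loss(pred, target):
    # keras.ops dispatches to whichever backend is active (torch, jax, tensorflow)
    return keras.ops.mean(keras.ops.square(pred - target))


loss = mse_loss(keras.ops.ones((4,)), keras.ops.zeros((4,)))
```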