Towards Roadmap #1215
Comments
Indeed, an algorithm abstraction and strong documentation sound like the way to go!!
Great job @MischaPanch! Keep us updated :)
Thanks for the update! I've been using Tianshou for a while and am looking forward to the next stable release!
Indeed, the Tianshou project aims to be a standard for unifying RL algorithm research and production development into one library by adhering to high software engineering practices.
Thanks for the update! I really hope to see this project succeed; it is rare to find a flexible, well-planned library in the RL space. Tianshou is one of the best-designed RL libraries I have seen in terms of flexibility. Is there any plan to support "end-to-end" RL where everything (from environment to batching to logging) runs on GPU (like Warp-Drive or PureJaxRL)?
Great to see your roadmap. As a contributor to Sample-Factory, clean-rl and StableBaselines3, and the creator of the open-source Godot RL Agents library, I can attest to the value that Tianshou brings to the RL community. Looking forward to your future developments.
Thank you very much for this great roadmap with a clear vision and a concrete plan! As a former contributor and heavy user of Tianshou, it warms my heart to see the amazing work you guys did over the past few years. I really hope you can continue working on it and turn the vision into reality.
Tianshou is an amazing RL library for researchers, and it would be great to see it expand into RLHF with LLMs!
As someone who has worked extensively with Tianshou over the past year, I have been consistently impressed by its readability and the high code quality that underpins the project. The high-level interfaces and comprehensive examples make it straightforward to set up experiments while also offering the flexibility needed for customization, which is invaluable for both practitioners and researchers. I believe that Tianshou holds great potential for bridging the gap between straightforward implementations and the complex needs of researchers, thanks to its readable, maintainable codebase and its active, responsive community. I completely agree with the points proposed for the next major release, and for me, the most important ones are:
I’m excited about Tianshou’s future and would be happy to continue contributing towards its success!
Cool, looking forward to the next release! Thanks for your work on Tianshou!
Late to the party, but @MischaPanch et al., thank you for your hard work over the last year that has made Tianshou gradually better!
This issue serves to inform about and discuss the next major release of Tianshou, after which the library can be considered mature and stable from our perspective. The progress and the related issues are tracked in the corresponding milestone.
Reasoning - the Current State of RL Software
Before outlining the work to be done for the next major release, we should clarify why we want to put so much work into Tianshou at all. The reason is the immense potential that we believe this project holds, and the RL community's need for such a resource.
Currently, most open-source RL projects fall into one of two types (we only consider actively maintained projects that support gymnasium, so dopamine and acme are out of the picture):
Libraries that are mainly oriented towards practitioners who want convenient interfaces for training an agent but don't need to develop algorithms (rllib is the prime example, to some extent also pearl). These libraries are generally not well suited for RL researchers, who are by far the largest part of the RL community, because of the typically enormous complexity of their lower-level interfaces and builtins. While such libraries have to focus on the stability of high-level interfaces, they often go through large breaking changes in low-level interfaces, which can be frustrating for the select few researchers who put in the effort of learning them for some version of the project.
Projects that help researchers quickly prototype an algorithm. These include educational projects like spinning up, single-file implementations like clean-rl, low-level libraries with building blocks like pytorch-rl, and many others. They are often targeted at training on selected environments and at producing training curves rather than at creating useful artifacts for production code. Many other features are often missing, like a good integration of hyper-parameter optimization or proper evaluation protocols - both of crucial importance for reliable research. More often than not, such projects are not suitable for use in production code. In addition, many such projects are the results of work during a PhD or postdoc and don't follow the highest coding standards (type-safety, sufficient testing, flexible design).
A notable exception is SB3 and its ecosystem, which can be considered the gold standard of RL libraries (as the name tells, it took 2 complete rewrites to get there from the original stable baselines). The main limitation of SB3 is its restricted scope - there are no plans to include offline RL, multi-agent RL, multi-node support or other things. Many important algorithms are not included in SB3, and as the name suggests, it is directed more towards stability than towards the very active development needed for completeness. SB3 interfaces are also somewhat rigid (e.g. the buffers can contain only specific fields, while Tianshou's internal data structures can contain anything while still being type-safe, thus facilitating algorithm development), and as a minor aside, there are no purely configuration-based high-level interfaces, which are usually required in more "production oriented" application code.
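As a small illustration of the flexibility referred to here, a minimal example based on Tianshou's documented `Batch` API (the custom field names are made up for illustration):

```python
import numpy as np
from tianshou.data import Batch

# Batch accepts arbitrary, possibly nested fields; dicts become nested Batches.
batch = Batch(
    obs=np.zeros((4, 3)),
    act=np.array([0, 1, 0, 1]),
    extra={"intrinsic_reward": np.ones(4)},  # custom field, no schema change needed
)
print(batch.extra.intrinsic_reward)  # nested attribute access
print(batch[0])  # indexing slices every field consistently
```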
Predictably, the absence of convenient, complete and reliable code for researchers has led to an enormous proliferation of projects of the second type. At the same time, the absence of researchers' influence on projects of the first type means that their user base is much smaller than it should be for such a complex topic as RL. Therefore, projects of the first type often lack important features and algorithms, have performance issues for common use cases, and can be frustrating to work with even in production settings.
Tianshou's Approach
We believe that Tianshou can make a significant positive contribution to the community by being a resource that is useful both for practitioners and researchers - possibly the only (or first) of its kind! Such a resource would need at least
We believe that the existence of such a resource can be transformative for the RL community and for its somewhat sorry current state, plagued by code duplication, limited applicability, evaluation and reproducibility crises, and community fragmentation. Moreover, we believe that Tianshou 2.0 will be such a resource.
Current State of Tianshou
As it is now, Tianshou ticks several boxes outlined above, but not all. Importantly, it does lay the right foundations by already having
Tianshou focuses on simplicity of use for researchers and practitioners (including simplicity in reading the code), a philosophy that is paramount to the development and will always be a priority.
However, a large block of work remains before all of the (minimal) requirements outlined above are fulfilled.
What is Missing
1. Algorithm Abstraction
A major technical problem in Tianshou is the lack of an `Algorithm` abstraction. Instead, the current codebase implements on-policy, off-policy and offline learning on the level of the `Trainer`. This has multiple downsides, most importantly that the `learn` method doesn't have a proper interface (an issue that is hidden by the usage of `**kwargs` there) and that one needs to think about things like buffer size for on-policy learning, where it doesn't make sense. This is described in issues #1034 #949 #948. Solving this will be a major breaking change. It doesn't make sense to write proper documentation on implementing algorithms before solving this, which is why work on documentation has stalled.
2. Improvements in Internal Data Structures
There are several issues with the central data structures and objects like `Batch`, `Buffer` and the Collectors. Some parts of the library still carry confusing names and convoluted testing strategies, which (rightly) hinders adoption by researchers. Such issues need to be resolved; the core interfaces of Tianshou need to be very clean and properly documented. We did a lot of work on this in the last year, but some still remains to be done.
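For reference, the core buffer interface as it stands today (based on Tianshou's documented 1.x API; the exact reserved keys have varied across versions):

```python
import numpy as np
from tianshou.data import Batch, ReplayBuffer

buf = ReplayBuffer(size=100)
# Each transition is itself a Batch using the buffer's reserved keys.
buf.add(
    Batch(
        obs=np.zeros(3),
        act=0,
        rew=1.0,
        terminated=False,
        truncated=False,
        obs_next=np.ones(3),
        info={},
    )
)
sampled, indices = buf.sample(batch_size=1)  # uniform sampling, with indices
```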
3. Automated Benchmark and More Testing
While performance of algorithms is tested to some extent in CI, this is insufficient. The results reported in the docs under Benchmark need to be easily reproducible and in fact need to be recreated on every release. The recent addition of the `eval` package represents a major step in that direction, but we need more. In particular, we should explore using fast envs like in pufferlib for representative performance tests.
4. Exhaustive Documentation and Knowledge Base
Documentation is severely lacking; the tutorials cover only a tiny fraction of what is possible with Tianshou. Many more tutorials and how-tos are needed to make it easy for the community to make the most out of Tianshou. However, part of the documentation writing is blocked by the refactorings.
A knowledge base as such barely exists in Tianshou at the moment.
5. Improved Parallelization Performance
Multi-node support is possible in principle through the Ray worker, but it is not currently practicable. Throughput can be improved by better vectorization (depending on the speed of `env.step`), better integration of async collection (some support is already there) and by double-buffering.
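The existing vectorization interface that this work would build on (based on Tianshou's documented API; `SubprocVectorEnv` runs each env in a worker process, and a Ray-backed variant exists for the multi-node case):

```python
import gymnasium as gym
import numpy as np
from tianshou.env import SubprocVectorEnv

if __name__ == "__main__":  # subprocess workers need a main guard on some platforms
    envs = SubprocVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(8)])
    obs, info = envs.reset()  # batched observations from all 8 workers
    actions = np.random.randint(0, 2, size=8)  # random CartPole actions
    obs, rew, terminated, truncated, info = envs.step(actions)
    envs.close()
```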
6. More Standard RL Algorithms
Many important algorithms are not implemented, including MPO, IQL and ARS.
7. Fix Long-Standing RNN Bugs
The RNN 'history summarization' is currently broken, though the interfaces support it in principle (a long-standing bug that will be addressed during the `Algorithm` implementation).
8. Integrate Hyperparameter Optimization
HPO is not fully integrated yet; CARBS as the default would probably be a good idea.
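Independently of whether CARBS or another optimizer becomes the default, the integration would look roughly like this sketch (using Optuna purely as a stand-in; `run_tianshou_training` is a hypothetical helper, not an existing function):

```python
import optuna


def run_tianshou_training(lr: float, gamma: float) -> float:
    """Hypothetical helper: build envs, policy and trainer with the given
    hyperparameters and return the final evaluation score."""
    return 0.0  # placeholder; wire up an actual Tianshou training run here


def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.999)
    return run_tianshou_training(lr=lr, gamma=gamma)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```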
9. Enhanced Logging and Callbacks
The logging and callbacks need to become more flexible and configurable through the Trainers; this is already work in progress.
10. Unify Interfaces
Agent and Critic interfaces need to be consolidated; there are already various issues written on that.
Handling of `dist_fn` needs to be improved; discrete agents shouldn't require a `dist_fn` as input at all (it will always be `Categorical`), as the example below illustrates.
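To make the point concrete, here is roughly how a discrete PPO agent is set up today (patterned after Tianshou's own examples; exact signatures vary between versions, so treat this as a sketch):

```python
import gymnasium as gym
import torch
from tianshou.policy import PPOPolicy
from tianshou.utils.net.common import ActorCritic, Net
from tianshou.utils.net.discrete import Actor, Critic

env = gym.make("CartPole-v1")
net = Net(state_shape=env.observation_space.shape, hidden_sizes=[64, 64])
actor = Actor(net, env.action_space.n)
critic = Critic(net)
optim = torch.optim.Adam(ActorCritic(actor, critic).parameters(), lr=3e-4)

policy = PPOPolicy(
    actor=actor,
    critic=critic,
    optim=optim,
    # Boilerplate the roadmap wants to drop: for discrete action spaces
    # the distribution is always Categorical, so requiring it is redundant.
    dist_fn=torch.distributions.Categorical,
    action_space=env.action_space,
)
```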
11. Support Gymnasium 1.0.0
Gymnasium 1.0.0 is not supported yet, and integrating it will be a breaking change (@pseudo-rnd-thoughts FYI).
Developers, Plan and Timeline
While there is a lot to do, the work is not insurmountable! Tianshou already has an excellent foundation, is widely used and well structured. The already established high performance and large test coverage, together with the static code analysis, make further development safe and relatively easy, especially for seasoned programmers. The foundation for tested documentation has been laid. A lot of work has already been invested in testing out what the `Algorithm` abstraction should look like (we can also take inspiration from SB3) and how to fix the RNN bugs (where we can take inspiration from Pearl).

The main developers working on this are @MischaPanch and @opcode81, with significant support in coordination from the original main developer @Trinkle23897. Recurrent contributors are @dantp-ai, @maxhuettenrauch and @carlocagnetta. We are hoping for support on multi-node and double-buffering topics from @destin-v ;).

There will be several intermediate releases until 2.0.0; the latter will only come after the breaking changes of `Algorithm`. After the 2.0.0 release there will be a dedicated focus on documentation and the establishment of a knowledge base. After this point, Tianshou can be considered mature. We believe that researchers will appreciate how much faster they can get to results by using Tianshou, and also how much faster these results can be adopted by the community after being integrated through a PR. Our ambition is large: Tianshou should become the first go-to place for a large part of the RL community.

Development times are very difficult to estimate, and we still need to fully verify the extent of support that the developers will receive both from official supporting organizations (including the current ones) and from the community. But very roughly, Tianshou 2.0.0 can be achieved in more than 6 months but likely less than a year, with intermediate releases happening every few months.
Newer Developments in RL
In recent years, several new research directions in RL have appeared, the largest of them being RL for language models (à la RLHF) and the combination of RL and diffusion approaches. Originally, Tianshou was not developed with such directions in mind. In the coming months we will evaluate how naturally these approaches fit into the development plans, and also how important incorporating them is to the community.
To be Discussed
Tianshou is torch-based, but the part that actually depends on torch is rather small. We just use optimizers to perform gradient updates, and the highly useful and differentiable `Distribution` objects. Most other things, like `Batch`, `Buffer`, Collector and Trainer, do not rely on any torch specifics. Given the popularity of Jax for RL, and the maturity and huge community behind keras3, it might be possible to introduce multi-backend support to Tianshou with little effort. However, it might also not be possible without major breaking changes or without increasing complexity, in which case this likely won't be done as part of 2.0.0. The feasibility of multi-backend support using keras as the differentiation engine will be evaluated over the next months.
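A toy illustration of the multi-backend idea (not Tianshou code): Keras 3 exposes backend-agnostic tensor operations, so the same numerical code can run on PyTorch, JAX or TensorFlow, selected via the `KERAS_BACKEND` environment variable:

```python
import keras  # Keras 3


def mse_loss(pred, target):
    # keras.ops dispatches to whichever backend is active (torch, jax, tensorflow)
    return keras.ops.mean(keras.ops.square(pred - target))


loss = mse_loss(keras.ops.ones((4,)), keras.ops.zeros((4,)))
```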