diff --git a/README.md b/README.md
index c9a309e..e112b32 100644
--- a/README.md
+++ b/README.md
@@ -9,9 +9,10 @@
- Semantic Signal Separation - S³ 🧭
- KeyNMF (paper in progress ⏳)
- GMM :gem: (paper soon)
- - Implementations of existing transformer-based topic models
+ - Implementations of other transformer-based topic models
- Clustering Topic Models: BERTopic and Top2Vec
- Autoencoding Topic Models: CombinedTM and ZeroShotTM
+ - FASTopic
- Streamlined scikit-learn compatible API 🛠️
- Easy topic interpretation
- Dynamic Topic Modeling (GMM, ClusteringTopicModel and KeyNMF)
@@ -19,43 +20,45 @@
> This package is still a work in progress, and scientific papers on some of the novel methods are currently undergoing peer review. If you use this package and encounter any problems, let us know by opening an issue.
-### New in version 0.4.0
+### New in version 0.5.0
-#### Online KeyNMF
+#### Hierarchical KeyNMF
-You can now online fit and finetune KeyNMF as you wish!
+You can now subdivide topics in KeyNMF at will.
```python
-from itertools import batched
from turftopic import KeyNMF
-model = KeyNMF(10, top_n=5)
-
-corpus = ["some string", "etc", ...]
-for batch in batched(corpus, 200):
- batch = list(batch)
- model.partial_fit(batch)
+model = KeyNMF(2, top_n=15, random_state=42).fit(corpus)
+model.hierarchy.divide_children(n_subtopics=3)
+print(model.hierarchy)
```
-#### $S^3$ Concept Compasses
+
+You can now use [FASTopic](https://github.com/BobXWu/FASTopic) inside Turftopic.
+```python
+from turftopic import FASTopic
+
+model = FASTopic(10).fit(corpus)
+model.print_topics()
+```
## Basics [(Documentation)](https://x-tabdeveloping.github.io/turftopic/)
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/x-tabdeveloping/turftopic/blob/main/examples/basic_example_20newsgroups.ipynb)
@@ -180,6 +183,7 @@ Alternatively you can use the [Figures API](https://x-tabdeveloping.github.io/to
## References
- Kardos, M., Kostkan, J., Vermillet, A., Nielbo, K., Enevoldsen, K., & Rocca, R. (2024, June 13). $S^3$ - Semantic Signal separation. arXiv.org. https://arxiv.org/abs/2406.09556
+- Wu, X., Nguyen, T., Zhang, D. C., Wang, W. Y., & Luu, A. T. (2024). FASTopic: A Fast, Adaptive, Stable, and Transferable Topic Modeling Paradigm. arXiv preprint arXiv:2405.17978.
- Grootendorst, M. (2022, March 11). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv.org. https://arxiv.org/abs/2203.05794
- Angelov, D. (2020, August 19). Top2VEC: Distributed representations of topics. arXiv.org. https://arxiv.org/abs/2008.09470
- Bianchi, F., Terragni, S., & Hovy, D. (2020, April 8). Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence. arXiv.org. https://arxiv.org/abs/2004.03974
diff --git a/docs/FASTopic.md b/docs/FASTopic.md
new file mode 100644
index 0000000..9338469
--- /dev/null
+++ b/docs/FASTopic.md
@@ -0,0 +1,15 @@
+# FASTopic
+
+FASTopic is a neural topic model based on Dual Semantic-relation Reconstruction.
+
+> Turftopic's implementation is adapted to fit our API, but most of the code comes from the [original FASTopic package](https://github.com/BobXWu/FASTopic).
+
+:warning: This part of the documentation is still under construction :warning:
+
+## References
+
+Wu, X., Nguyen, T., Zhang, D. C., Wang, W. Y., & Luu, A. T. (2024). FASTopic: A Fast, Adaptive, Stable, and Transferable Topic Modeling Paradigm. arXiv preprint arXiv:2405.17978.
+
+## API Reference
+
+::: turftopic.models.fastopic.FASTopic
diff --git a/docs/KeyNMF.md b/docs/KeyNMF.md
index c785f62..85742b7 100644
--- a/docs/KeyNMF.md
+++ b/docs/KeyNMF.md
@@ -309,6 +309,47 @@ for batch in batched(zip(corpus, timestamps)):
model.partial_fit_dynamic(text_batch, timestamps=ts_batch, bins=bins)
```
+## Hierarchical Topic Modeling
+
+When you suspect that subtopics might be present in the topics you find with the model, KeyNMF can be used to discover topics further down the hierarchy.
+
+This is done by utilising a special case of **weighted NMF**, where documents are weighted by how high they score on the parent topic.
+In other words:
+
+1. Decompose keyword matrix $M \approx WH$
+2. To find subtopics in topic $j$, define document weights $w$ as the $j$th column of $W$.
+3. Estimate subcomponents with **wNMF** $M \approx \mathring{W} \mathring{H}$ with document weight $w$
+    1. Initialise $\mathring{H}$ and $\mathring{W}$ randomly.
+    2. Perform multiplicative updates until convergence.
+       $\mathring{W}^T = \mathring{W}^T \odot \frac{\mathring{H} \cdot (M^T \odot w)}{\mathring{H} \cdot \mathring{H}^T \cdot (\mathring{W}^T \odot w)}$
+       $\mathring{H}^T = \mathring{H}^T \odot \frac{ (M^T \odot w)\cdot \mathring{W}}{\mathring{H}^T \cdot (\mathring{W}^T \odot w) \cdot \mathring{W}}$
+4. To sufficiently differentiate the subcomponents from each other, a pseudo-c-tf-idf weighting scheme is applied to $\mathring{H}$:
+    1. $\mathring{H}_{ij} = \mathring{H}_{ij} \cdot \ln\left(1 + \frac{A}{1+\sum_k \mathring{H}_{kj}}\right)$, where $A$ is the average of all elements in $\mathring{H}$
+
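The steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not Turftopic's actual implementation: the function names `weighted_nmf` and `pseudo_ctfidf`, the fixed iteration count used instead of a convergence check, and the small `eps` added to the denominators are all hypothetical choices made for this example. `M` is assumed to be a non-negative documents-by-terms matrix and `w` a vector with one non-negative weight per document.

```python
import numpy as np


def weighted_nmf(M, w, n_components, n_iter=200, eps=1e-10, seed=0):
    """Sketch of row-weighted NMF: M (docs x terms) ~ W @ H with document weights w."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = M.shape
    # Step 3.1: random non-negative initialisation.
    W = rng.random((n_docs, n_components))
    H = rng.random((n_components, n_terms))
    Mw = M * w[:, None]  # every document row scaled by its weight
    # Step 3.2: multiplicative updates (fixed number of iterations for simplicity).
    for _ in range(n_iter):
        Ww = W * w[:, None]
        W *= (Mw @ H.T) / (Ww @ (H @ H.T) + eps)
        Ww = W * w[:, None]
        H *= (W.T @ Mw) / (W.T @ Ww @ H + eps)
    return W, H


def pseudo_ctfidf(H):
    """Step 4: down-weight terms that score highly in every subcomponent."""
    A = H.mean()
    return H * np.log(1 + A / (1 + H.sum(axis=0, keepdims=True)))


# Toy usage: random non-negative data standing in for a keyword matrix.
rng = np.random.default_rng(1)
M = rng.random((30, 20))
w = rng.random(30)  # stand-in for the parent topic's column of W
W_sub, H_sub = weighted_nmf(M, w, n_components=3)
H_sub = pseudo_ctfidf(H_sub)
print(W_sub.shape, H_sub.shape)
```

The updates are written in the untransposed form, so `W` and `H` are updated directly rather than through their transposes; the expressions are otherwise the same as in step 3.2.
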
+To create a hierarchical model, you can use the `hierarchy` property of the model.
+
+```python
+# This divides each of the topics in the model into 3 subtopics.
+model.hierarchy.divide_children(n_subtopics=3)
+print(model.hierarchy)
+```
+
+
+
+For a detailed tutorial on hierarchical modeling click [here](hierarchical.md).
+
## Considerations
### Strengths
diff --git a/docs/dynamic.md b/docs/dynamic.md
index 772a60b..6693cf5 100644
--- a/docs/dynamic.md
+++ b/docs/dynamic.md
@@ -77,7 +77,7 @@ model.plot_topics_over_time(top_k=5)
Topics over time on a Figure
-## Interface
+## API reference
All dynamic topic models have a `temporal_components_` attribute, which contains the topic-term matrices for each time slice, along with a `temporal_importance_` attribute, which contains the importance of each topic in each time slice.
diff --git a/docs/hierarchical.md b/docs/hierarchical.md
new file mode 100644
index 0000000..de5696e
--- /dev/null
+++ b/docs/hierarchical.md
@@ -0,0 +1,152 @@
+# Hierarchical Topic Modeling
+
+> Note: Hierarchical topic modeling in Turftopic is still in its early stages; you can expect more visualization utilities, tools and models in the future :sparkles:
+
+You might expect some topics in your corpus to belong to a hierarchy of topics.
+Some models in Turftopic (currently only [KeyNMF](KeyNMF.md)) allow you to investigate hierarchical relations and build a taxonomy of topics in a corpus.
+
+## Divisive Hierarchical Modeling
+
+Currently, Turftopic, in contrast with other topic modeling libraries, only allows for hierarchical modeling in a divisive context.
+This means that topics can be divided into subtopics in a **top-down** manner.
+[KeyNMF](KeyNMF.md) does not discover a topic hierarchy automatically,
+ but you can manually instruct the model to find subtopics in larger topics.
+
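To make the top-down idea concrete, here is a toy sketch of a topic tree in plain Python. It is not Turftopic's hierarchy class: `ToyTopicNode` and its round-robin word split are hypothetical placeholders, and a real model would refit on the documents of the topic being divided. The sketch only illustrates how manually dividing nodes grows a tree and how nested indexing walks down it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopicNode:
    """Hypothetical stand-in for a node in a topic hierarchy."""

    path: tuple = ()
    words: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def divide(self, n_subtopics):
        # Placeholder split: deal this node's words out round-robin.
        self.children = [
            ToyTopicNode(self.path + (i,), self.words[i::n_subtopics])
            for i in range(n_subtopics)
        ]
        return self

    def divide_children(self, n_subtopics):
        # Divide every direct child, adding one level to the tree.
        for child in self.children:
            child.divide(n_subtopics)
        return self

    def __getitem__(self, index):
        return self.children[index]


root = ToyTopicNode(
    children=[
        ToyTopicNode((0,), ["windows", "dos", "os", "disk", "card", "drivers"]),
        ToyTopicNode((1,), ["atheism", "religion", "god", "belief", "christian", "atheist"]),
    ]
)
root.divide_children(n_subtopics=3)  # every root-level topic gets 3 subtopics
root[0][0].divide(2)  # divide a single subtopic further
print(root[0][0].path, root[0][0].words)
```
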
+As a demonstration, let's load a corpus that we know to have hierarchical themes.
+
+```python
+from sklearn.datasets import fetch_20newsgroups
+
+corpus = fetch_20newsgroups(
+    subset="all",
+    remove=("headers", "footers", "quotes"),
+    categories=[
+        "comp.os.ms-windows.misc",
+        "comp.sys.ibm.pc.hardware",
+        "talk.religion.misc",
+        "alt.atheism",
+    ],
+).data
+```
+
+In this case, we have two base themes: **computers** and **religion**.
+Let us fit a KeyNMF model with two topics to see if the model finds these.
+
+```python
+from turftopic import KeyNMF
+
+model = KeyNMF(2, top_n=15, random_state=42).fit(corpus)
+model.print_topics()
+```
+
+| Topic ID | Highest Ranking |
+| - | - |
+| 0 | windows, dos, os, disk, card, drivers, file, pc, files, microsoft |
+| 1 | atheism, atheist, atheists, religion, christians, religious, belief, christian, god, beliefs |
+
+The results confirm our intuition. Topic 0 seems to revolve around IT, while Topic 1 revolves around atheism and religion.
+We can already suspect, however, that more granular topics could be discovered in this corpus.
+For instance, Topic 0 contains terms related to operating systems, like *windows* and *dos*, but also hardware components, like *disk* and *card*.
+
+We can access the hierarchy of topics in the model at the current stage with the model's `hierarchy` property.
+
+```python
+print(model.hierarchy)
+```
+
+
+
+There isn't much to see yet: the model contains a flat hierarchy of the two topics we discovered, and we are at the root level.
+We can dissect these topics by adding a level to the hierarchy.
+
+Let us add 3 subtopics to each topic on the root level.
+
+```python
+model.hierarchy.divide_children(n_subtopics=3)
+```
+
+
+
+As you can see, the model managed to identify meaningful subtopics of the two larger topics we found earlier.
+Topic 0 got divided into a topic mostly concerned with DOS and Windows, a topic on operating systems in general, and one about hardware,
+while Topic 1 got divided into a topic about newsgroups, one about atheism, and one about morality and Christianity.
+
+You can also easily access nodes of the hierarchy by indexing it:
+```python
+model.hierarchy[0]
+```
+
+
+
+You can also divide individual topics into a number of subtopics by using the `divide()` method.
+Let us divide Topic 0.0 into 5 subtopics.
+
+```python
+model.hierarchy[0][0].divide(5)
+model.hierarchy
+```
+
+