Merge pull request #362 from aitomatic/docs

update examples and tutorials
aitomatic · Oct 9, 2024 · 7d87fa0
2 parents c13832d + 80df9a0
commit 7d87fa0
Showing 26 changed files with 6,619 additions and 1,044 deletions.
diff --git a/README.md b/README.md
@@ -3,8 +3,8 @@
 # OpenSSA: Neurosymbolic Agentic AI for Industrial Problem-Solving
 
 OpenSSA is an open-source neurosymbolic agentic AI framework
-designed to solve complex, high-stakes problems in industries like semiconductor, manufacturing and finance,
-where consistency, accuracy and deterministic outcomes are essential.
+designed to solve complex, high-stakes problems in industries like semiconductor, energy and finance,
+where consistency, accuracy and deterministic outcomes are paramount.
 
 At the core of OpenSSA is the [__Domain-Aware Neurosymbolic Agent (DANA)__](https://arxiv.org/abs/2410.02823) architecture,
 advancing generative AI from basic pattern matching and information retrieval to industrial-grade problem solving.

diff --git a/docs/GETTING_STARTED.md b/docs/GETTING_STARTED.md
@@ -16,10 +16,10 @@ Go straight to [OpenSSA Streamlit app](https://openssa.streamlit.app/) and start
 
 ## Getting Started as a Developer
 
-See some example user programs in the [examples/notebooks](./examples/notebooks) directory. For example, to see the sample use case on ALD semiconductor knowledge, do:
+See some example user programs in the [examples](./examples) directory. For example, to see the sample use case on semiconductor knowledge, do:
 
 ```bash
-% cd examples/notebooks
+% cd examples/semiconductor
 ```
 
 ### Common `make` targets for OpenSSA developers

diff --git a/docs/diagrams/ssm-QA-vs-PS.drawio.png b/docs/diagrams/ssm-QA-vs-PS.drawio.png
diff --git a/docs/diagrams/ssm-class-diagram.drawio.png b/docs/diagrams/ssm-class-diagram.drawio.png
diff --git a/docs/diagrams/ssm-composability.drawio.png b/docs/diagrams/ssm-composability.drawio.png
diff --git a/docs/diagrams/ssm-full-industrial-use-case.drawio.png b/docs/diagrams/ssm-full-industrial-use-case.drawio.png
diff --git a/docs/diagrams/ssm-industrial-use-case.drawio.png b/docs/diagrams/ssm-industrial-use-case.drawio.png
diff --git a/docs/diagrams/ssm-key-components.drawio.png b/docs/diagrams/ssm-key-components.drawio.png
diff --git a/docs/diagrams/ssm-llama-index-integration-patterns.drawio.png b/docs/diagrams/ssm-llama-index-integration-patterns.drawio.png
diff --git a/docs/diagrams/ssm-llama-index-integration.drawio.png b/docs/diagrams/ssm-llama-index-integration.drawio.png
diff --git a/docs/diagrams/ssm-team-of-experts.drawio.png b/docs/diagrams/ssm-team-of-experts.drawio.png
diff --git a/examples/FinanceBench-Lite/.env.template b/examples/FinanceBench-Lite/.env.template
@@ -0,0 +1,2 @@
+HF_API_KEY=[... HuggingFace API key if running HuggingFace-hosted models ...]
+OPENAI_API_KEY=[... OpenAI API key if running on OpenAI services ...]
diff --git a/examples/FinanceBench-Lite/.gitignore b/examples/FinanceBench-Lite/.gitignore
@@ -0,0 +1,15 @@
+# data files
+.data/
+
+# environment variables
+.env
+
+# iPython/Jupyter notebooks
+*.ipynb
+
+# log files
+.log/
+*.log
+
+# Streamlit secrets
+.streamlit/secrets.toml
diff --git a/examples/FinanceBench-Lite/Makefile b/examples/FinanceBench-Lite/Makefile
@@ -0,0 +1,33 @@
+dana-solve:
+	@poetry run python dana.py ${id}
+
+dana-solve-w-knowledge:
+	@poetry run python dana.py ${id} --knowledge
+
+dana-solve-w-prog-store:
+	@poetry run python dana.py ${id} --prog-store
+
+dana-solve-w-knowledge-and-prog-store:
+	@poetry run python dana.py ${id} --knowledge --prog-store
+
+dana-solve-w-llama3:
+	@poetry run python dana.py ${id} --llama3
+
+dana-solve-w-knowledge-w-llama3:
+	@poetry run python dana.py ${id} --knowledge --llama3
+
+dana-solve-w-prog-store-w-llama3:
+	@poetry run python dana.py ${id} --prog-store --llama3
+
+dana-solve-w-knowledge-and-prog-store-w-llama3:
+	@poetry run python dana.py ${id} --knowledge --prog-store --llama3
+
+dana-solve-all-combos:
+	@poetry run python dana.py ${id}
+	@poetry run python dana.py ${id} --knowledge
+	@poetry run python dana.py ${id} --prog-store
+	@poetry run python dana.py ${id} --knowledge --prog-store
+	@poetry run python dana.py ${id} --llama3
+	@poetry run python dana.py ${id} --knowledge --llama3
+	@poetry run python dana.py ${id} --prog-store --llama3
+	@poetry run python dana.py ${id} --knowledge --prog-store --llama3
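The eight `dana-solve*` recipes above cover every on/off combination of the `--knowledge`, `--prog-store` and `--llama3` flags. As a minimal sketch (not part of this PR), the same command lines can be enumerated in Python. The helper name `all_combo_commands` is hypothetical, and the code only builds the command strings without running them.

```python
from itertools import product


def all_combo_commands(fb_id: str) -> list[str]:
    """Build the 8 command lines listed under `dana-solve-all-combos`."""
    commands = []
    # product() varies the last element fastest, so unpacking as
    # (llama3, prog_store, knowledge) reproduces the Makefile's order.
    for llama3, prog_store, knowledge in product((False, True), repeat=3):
        flags = [flag for flag, enabled in (("--knowledge", knowledge),
                                            ("--prog-store", prog_store),
                                            ("--llama3", llama3)) if enabled]
        commands.append(" ".join(["poetry run python dana.py", fb_id, *flags]))
    return commands


if __name__ == "__main__":
    for command in all_combo_commands("00807"):
        print(command)
```

Generating the list this way keeps the flag set in one place, at the cost of hiding the individual target names that the Makefile exposes.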
diff --git a/examples/FinanceBench-Lite/README.md b/examples/FinanceBench-Lite/README.md
@@ -0,0 +1,58 @@
+<!-- markdownlint-disable MD013 MD043 -->
+
+# OpenSSA-FinanceBench Lite benchmarking
+
+This is a lite version of the `OpenSSA` performance benchmark
+on the `FinanceBench` dataset. It uses 1 question from the dataset to demonstrate the use of `OpenSSA` with the `DANA` architecture.
+
+## [`FinanceBench` Dataset](https://github.com/patronus-ai/financebench/blob/main/financebench_sample_150.csv)
+
+## Getting Started with DANA Agent
+
+Have Python 3.12 installed.
+
+__Install__ the project, and update its dependencies from time to time:
+__`make install`__.
+
+Create `.env` file following the `.env.template` and fill in necessary credentials.
+
+__Solve__ the problem with `financebench_id` `00807`:
+__`make dana-solve id=00807`__.
+
+### Question
+
+`Does 3M have a reasonably healthy liquidity profile based on its quick ratio for Q2 of FY2023? If the quick ratio is not relevant to measure liquidity, please state that and explain why.`
+
+### Knowledge
+
+To solve this question, you can add knowledge related to `liquidity`. See the example below:
+
+- Liquidity Metric Formulas
+  - `(Net) Working Capital` = `(Total) Current Assets` - `(Total) Current Liabilities`
+  - `Working Capital Ratio` = `(Total) Current Assets` / `(Total) Current Liabilities`
+
+Go to `knowledge-store.txt` to add relevant knowledge yourself and see how it helps the agent to solve this question.
+
+### Program
+
+With the knowledge above, a program we can provide to the agent could look like this:
+
+- Goal: To assess the liquidity health of a company, calculate its `quick ratio`
+  - Task: To calculate `quick ratio`, use this formula:
+    `Quick Ratio` = (
+      (`Cash & Cash Equivalents` +
+       `Short-Term Investments or (Current) Marketable Securities` +
+       `(Net) Accounts Receivable, a.k.a. (Net) (Trade) Receivables`)
+      / `(Total) Current Liabilities`
+    )
+    - Sub-task 1: What is the value in dollars of `Cash & Cash Equivalents`?
+    - Sub-task 2: What is the value in dollars of `Short-Term Investments or (Current) Marketable Securities`?
+    - Sub-task 3: What is the value in dollars of `(Net) Accounts Receivable, a.k.a. (Net) (Trade) Receivables`?
+    - Sub-task 4: What is the value in dollars of `(Total) Current Liabilities`?
+
+Go to `program-store.yml` to see the details of the program yourself! You can experiment with different plans to see how they help the agent solve the problem as well.
+
+## Advancing DANA Agent with Domain Knowledge and Program Store
+
+- To solve the question with added domain knowledge, run `make dana-solve-w-knowledge id=00807`
+- To solve the question with added domain knowledge and program store, run `make dana-solve-w-knowledge-and-prog-store id=00807`
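The quick-ratio formula in the README's Program section reduces to a few lines of arithmetic. The following is a minimal sketch of that calculation, not code from this PR: the function name `quick_ratio`, its parameter names, and the input figures are all hypothetical, and the numbers are made up for illustration rather than taken from any 3M filing.

```python
def quick_ratio(cash_and_equivalents: float,
                short_term_investments: float,
                net_accounts_receivable: float,
                total_current_liabilities: float) -> float:
    """Quick Ratio = (cash + short-term investments + receivables) / current liabilities."""
    if total_current_liabilities <= 0:
        raise ValueError("total current liabilities must be positive")
    liquid_assets = (cash_and_equivalents
                     + short_term_investments
                     + net_accounts_receivable)
    return liquid_assets / total_current_liabilities


# Hypothetical inputs for illustration only (NOT 3M's reported figures).
example = quick_ratio(cash_and_equivalents=400.0,
                      short_term_investments=100.0,
                      net_accounts_receivable=500.0,
                      total_current_liabilities=800.0)
print(f"{example:.2f}")  # 1.25
```

Each argument corresponds to one of the four sub-tasks, so the agent's job in this program is to retrieve the four values from the filing before applying the formula.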
diff --git a/examples/FinanceBench-Lite/dana.py b/examples/FinanceBench-Lite/dana.py
@@ -0,0 +1,155 @@
+from argparse import ArgumentParser
+from functools import cache
+
+from openssa import DANA, ProgramStore, HTP, HTPlanner, FileResource, LMConfig
+from openssa.core.util.lm.huggingface import HuggingFaceLM
+from openssa.core.util.lm.openai import OpenAILM, default_llama_index_openai_lm
+
+# pylint: disable=wrong-import-order,wrong-import-position
+from data_and_knowledge import (DocName, FbId, Answer, Doc, FB_ID_COL_NAME, DOC_NAMES_BY_FB_ID, QS_BY_FB_ID,
+                                EXPERT_KNOWLEDGE, EXPERT_PROGRAMS, EXPERT_HTP_COMPANY_KEY, EXPERT_HTP_PERIOD_KEY)
+from util import QAFunc, enable_batch_qa_and_eval, log_qa_and_update_output_file
+
+
+@cache
+def get_main_lm(use_llama3: bool = False):
+    return (HuggingFaceLM if use_llama3 else OpenAILM).from_defaults()
+
+
+@cache
+def get_or_create_expert_program_store(use_llama3: bool = False) -> ProgramStore:
+    program_store = ProgramStore(lm=get_main_lm(use_llama3=use_llama3))
+
+    for program_name, htp_dict in EXPERT_PROGRAMS.items():
+        htp = HTP.from_dict(htp_dict)
+        program_store.add_or_update_program(name=program_name, description=htp.task.ask, program=htp)
+
+    return program_store
+
+
+@cache
+def get_or_create_agent(doc_name: DocName, expert_knowledge: bool = False, expert_programs: bool = False,
+                        max_depth=3, max_subtasks_per_decomp=6,
+                        use_llama3: bool = False,
+                        llama_index_openai_lm_name: str = LMConfig.OPENAI_DEFAULT_MODEL) -> DANA:
+    # pylint: disable=too-many-arguments
+    return DANA(knowledge={EXPERT_KNOWLEDGE} if expert_knowledge else None,
+
+                program_store=(get_or_create_expert_program_store(use_llama3=use_llama3)
+                               if expert_programs
+                               else ProgramStore()),
+
+                programmer=HTPlanner(lm=get_main_lm(use_llama3=use_llama3),
+                                     max_depth=max_depth, max_subtasks_per_decomp=max_subtasks_per_decomp),
+
+                resources={FileResource(path=Doc(name=doc_name).dir_path,
+                                        lm=default_llama_index_openai_lm(llama_index_openai_lm_name))})
+
+
+@cache
+def get_or_create_adaptations(doc_name: DocName) -> dict[str, str]:
+    return {EXPERT_HTP_COMPANY_KEY: (doc := Doc(name=doc_name)).company, EXPERT_HTP_PERIOD_KEY: doc.period}
+
+
+@enable_batch_qa_and_eval(output_name='DANA')
+@log_qa_and_update_output_file(output_name='DANA')
+def solve(fb_id: FbId) -> Answer:
+    return get_or_create_agent(doc_name=DOC_NAMES_BY_FB_ID[fb_id]).solve(
+        problem=QS_BY_FB_ID[fb_id],
+        adaptations_from_known_programs=get_or_create_adaptations(doc_name=DOC_NAMES_BY_FB_ID[fb_id]))
+
+
+@enable_batch_qa_and_eval(output_name='DANA-wKnowledge')
+@log_qa_and_update_output_file(output_name='DANA-wKnowledge')
+def solve_with_knowledge(fb_id: FbId) -> Answer:
+    return get_or_create_agent(doc_name=DOC_NAMES_BY_FB_ID[fb_id], expert_knowledge=True).solve(
+        problem=QS_BY_FB_ID[fb_id],
+        adaptations_from_known_programs=get_or_create_adaptations(doc_name=DOC_NAMES_BY_FB_ID[fb_id]))
+
+
+@enable_batch_qa_and_eval(output_name='DANA-wProgStore')
+@log_qa_and_update_output_file(output_name='DANA-wProgStore')
+def solve_with_program_store(fb_id: FbId) -> Answer:
+    return get_or_create_agent(doc_name=DOC_NAMES_BY_FB_ID[fb_id], expert_programs=True).solve(
+        problem=QS_BY_FB_ID[fb_id],
+        adaptations_from_known_programs=get_or_create_adaptations(doc_name=DOC_NAMES_BY_FB_ID[fb_id]))
+
+
+@enable_batch_qa_and_eval(output_name='DANA-wKnowledge-wProgStore')
+@log_qa_and_update_output_file(output_name='DANA-wKnowledge-wProgStore')
+def solve_with_knowledge_and_program_store(fb_id: FbId) -> Answer:
+    return get_or_create_agent(DOC_NAMES_BY_FB_ID[fb_id], expert_knowledge=True, expert_programs=True).solve(
+        problem=QS_BY_FB_ID[fb_id],
+        adaptations_from_known_programs=get_or_create_adaptations(doc_name=DOC_NAMES_BY_FB_ID[fb_id]))
+
+
+@enable_batch_qa_and_eval(output_name='DANA-wLlama3')
+@log_qa_and_update_output_file(output_name='DANA-wLlama3')
+def solve_with_llama3(fb_id: FbId) -> Answer:
+    return get_or_create_agent(doc_name=DOC_NAMES_BY_FB_ID[fb_id], use_llama3=True).solve(
+        problem=QS_BY_FB_ID[fb_id],
+        adaptations_from_known_programs=get_or_create_adaptations(doc_name=DOC_NAMES_BY_FB_ID[fb_id]))
+
+
+@enable_batch_qa_and_eval(output_name='DANA-wKnowledge-wLlama3')
+@log_qa_and_update_output_file(output_name='DANA-wKnowledge-wLlama3')
+def solve_with_knowledge_with_llama3(fb_id: FbId) -> Answer:
+    return get_or_create_agent(doc_name=DOC_NAMES_BY_FB_ID[fb_id], expert_knowledge=True, use_llama3=True).solve(
+        problem=QS_BY_FB_ID[fb_id],
+        adaptations_from_known_programs=get_or_create_adaptations(doc_name=DOC_NAMES_BY_FB_ID[fb_id]))
+
+
+@enable_batch_qa_and_eval(output_name='DANA-wProgStore-wLlama3')
+@log_qa_and_update_output_file(output_name='DANA-wProgStore-wLlama3')
+def solve_with_program_store_with_llama3(fb_id: FbId) -> Answer:
+    return get_or_create_agent(doc_name=DOC_NAMES_BY_FB_ID[fb_id], expert_programs=True, use_llama3=True).solve(
+        problem=QS_BY_FB_ID[fb_id],
+        adaptations_from_known_programs=get_or_create_adaptations(doc_name=DOC_NAMES_BY_FB_ID[fb_id]))
+
+
+@enable_batch_qa_and_eval(output_name='DANA-wKnowledge-wProgStore-wLlama3')
+@log_qa_and_update_output_file(output_name='DANA-wKnowledge-wProgStore-wLlama3')
+def solve_with_knowledge_and_program_store_with_llama3(fb_id: FbId) -> Answer:
+    return get_or_create_agent(DOC_NAMES_BY_FB_ID[fb_id], expert_knowledge=True, expert_programs=True, use_llama3=True).solve(  # noqa: E501
+        problem=QS_BY_FB_ID[fb_id],
+        adaptations_from_known_programs=get_or_create_adaptations(doc_name=DOC_NAMES_BY_FB_ID[fb_id]))
+
+
+if __name__ == '__main__':
+    arg_parser = ArgumentParser()
+    arg_parser.add_argument('fb_id')
+    arg_parser.add_argument('--from-id', action='store_true')
+    arg_parser.add_argument('--knowledge', action='store_true')
+    arg_parser.add_argument('--prog-store', action='store_true')
+    arg_parser.add_argument('--llama3', action='store_true')
+    args = arg_parser.parse_args()
+
+    match (args.knowledge, args.prog_store, args.llama3):
+        case (False, False, False):
+            solve_func: QAFunc = solve
+
+        case (True, False, False):
+            solve_func: QAFunc = solve_with_knowledge
+
+        case (False, True, False):
+            solve_func: QAFunc = solve_with_program_store
+
+        case (True, True, False):
+            solve_func: QAFunc = solve_with_knowledge_and_program_store
+
+        case (False, False, True):
+            solve_func: QAFunc = solve_with_llama3
+
+        case (True, False, True):
+            solve_func: QAFunc = solve_with_knowledge_with_llama3
+
+        case (False, True, True):
+            solve_func: QAFunc = solve_with_program_store_with_llama3
+
+        case (True, True, True):
+            solve_func: QAFunc = solve_with_knowledge_and_program_store_with_llama3
+
+    if not (fb_id := args.fb_id).startswith(FB_ID_COL_NAME):
+        fb_id: FbId = f'{FB_ID_COL_NAME}_{fb_id}'
+
+    solve_func(f'from:{fb_id}' if args.from_id else fb_id)
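The eight solver functions in `dana.py` differ only in the three flags and in the `output_name` passed to the decorators, and the `__main__` block prefixes the ID before dispatching. As a rough sketch of those two naming rules (not a proposed change to the PR), the helpers below reproduce them in isolation. The names `output_name` and `normalize_fb_id` are hypothetical, and the value of `FB_ID_COL_NAME` is an assumption, since `data_and_knowledge.py` is not shown in this diff.

```python
# Assumption: the real constant is defined in data_and_knowledge.py (not shown here).
FB_ID_COL_NAME = "financebench_id"


def output_name(knowledge: bool = False, prog_store: bool = False, llama3: bool = False) -> str:
    """Compose the decorator output name, e.g. 'DANA-wKnowledge-wLlama3'."""
    suffixes = [suffix for suffix, enabled in (("wKnowledge", knowledge),
                                               ("wProgStore", prog_store),
                                               ("wLlama3", llama3)) if enabled]
    return "-".join(["DANA", *suffixes])


def normalize_fb_id(raw_id: str, from_id: bool = False) -> str:
    """Mirror the ID handling at the end of the `__main__` block."""
    fb_id = raw_id if raw_id.startswith(FB_ID_COL_NAME) else f"{FB_ID_COL_NAME}_{raw_id}"
    return f"from:{fb_id}" if from_id else fb_id


if __name__ == "__main__":
    print(output_name(knowledge=True, prog_store=True))
    print(normalize_fb_id("00807"))
```

How the `from:` prefix is interpreted depends on the decorators in `util.py`, which this diff does not include.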