- Episodic Memory Tool Set: This tool measures how quickly and accurately MemMachine performs core episodic memory tasks. For a list of specific commands, check out the Episodic Memory Tool Set.
- Episodic Profile Agent Tool Set: This tool is designed to evaluate the speed and quality of MemMachine’s Profile Agent. For a list of specific commands, check out the Episodic Profile Agent Tool Set.
Getting Started
Before you run any benchmarks, you’ll need to set up your environment.
General Prerequisites:
- MemMachine Backend: Both tools require that your MemMachine backend be installed and configured. If you need help with this, you can check out our QuickStart Guide.
- Start the Backend: Once everything is set up, start MemMachine with this command:
- Episodic Memory: For this tool, please ensure your `cfg.yml` file has been copied into your `locomo` directory (`/memmachine/evaluation/locomo/`) and renamed to `locomo_config.yaml`.
- Episodic Profile Agent: This tool requires your MCP Server to be running before you run any commands.
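The Episodic Memory prerequisite above (copy `cfg.yml` into the `locomo` directory and rename it to `locomo_config.yaml`) can be sketched as a small helper. This is only an illustration, not part of the MemMachine tooling: the function name `install_locomo_config` is hypothetical, and where your `cfg.yml` lives is an assumption you will need to adjust.

```python
import shutil
from pathlib import Path


def install_locomo_config(cfg_path, locomo_dir):
    """Copy cfg.yml into the locomo directory under the name locomo_config.yaml.

    Hypothetical helper: both arguments are supplied by the caller, so nothing
    here assumes a particular MemMachine install location.
    """
    src = Path(cfg_path)
    dest_dir = Path(locomo_dir)
    if not src.is_file():
        raise FileNotFoundError(f"config file not found: {src}")
    if not dest_dir.is_dir():
        raise NotADirectoryError(f"locomo directory not found: {dest_dir}")
    dest = dest_dir / "locomo_config.yaml"
    # copyfile copies the contents only; the original cfg.yml stays in place.
    shutil.copyfile(src, dest)
    return dest
```

With the default layout described above, a call might look like `install_locomo_config("cfg.yml", "/memmachine/evaluation/locomo/")`, assuming `cfg.yml` is in your current directory.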
Running the Benchmark
Ready to go? Follow these simple steps.
A. All commands should be run from their respective tool directory (e.g., `locomo/episodic_memory/` or `locomo/episodic_profile_agent/`).
B. The path to your data file, `locomo10.json`, should be updated to match its location. By default, you can find it in `/memmachine/evaluation/locomo/`.
C. Once you have performed step 1 below, you can repeat the benchmark run by performing steps 2-4. Once you are finished with the benchmark, run step 5.
1. Ingest a Conversation
First, let’s add conversation data to MemMachine. This only needs to be done once per test run.
2. Search the Conversation
Now, let’s search through the data you just added.
3. Evaluate the Responses
Next, run a LoCoMo evaluation against the search results.
4. Generate Your Final Score
Once the evaluation is complete, you can generate the final scores.
The output will be a table in your shell showing the mean scores for each category and an overall score.
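To make the shape of that summary concrete, here is a minimal sketch of how per-category means and an overall score could be computed. It is not the tool's actual scoring code: the record layout (a `category` and a numeric `score` per evaluated question) and the function name `summarize_scores` are hypothetical, and treating the overall score as the mean over all questions, rather than the mean of the category means, is an assumption.

```python
from collections import defaultdict
from statistics import mean


def summarize_scores(results):
    """Return (per-category mean scores, overall mean) for evaluated answers.

    `results` is an iterable of dicts with hypothetical keys "category" and
    "score"; the real evaluation output may be structured differently.
    """
    by_category = defaultdict(list)
    for item in results:
        by_category[item["category"]].append(float(item["score"]))
    # Mean score within each category, listed in sorted category order.
    category_means = {cat: mean(scores) for cat, scores in sorted(by_category.items())}
    # Assumption: the overall score averages every question equally.
    all_scores = [s for scores in by_category.values() for s in scores]
    overall = mean(all_scores) if all_scores else None
    return category_means, overall
```

Under this assumption, categories with more questions weigh more heavily in the overall figure; averaging the category means instead would weight every category equally.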
5. Clean Up Your Data
When you’re finished, you may want to delete the test data. This is especially important before running a different benchmark.
- For Episodic Memory: Simply run this command:
- For Episodic Profile Agent: You’ll need to run two commands to ensure all data is removed:
  Then, clean up the `locomo` data as well: