# Memory Recall Cost-Mode Evaluation

## Objective
Run local-first memory recall with semantic embeddings as fallback only, then evaluate quality/cost at Day 1, 7, 14, and 30.

## Start Date
- 2026-02-24

## Mode Definition
- Primary: local recall (QMD/grep/scripts)
- Secondary: semantic embeddings only for ambiguous or failed local recall
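The two-tier mode above can be sketched as a small dispatcher. This is a minimal illustration, not the actual tooling: `recall`, `local_search`, `semantic_search`, and the `max_local_hits` threshold are all hypothetical names, and treating "too many local hits" as the definition of an ambiguous result is an assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: a search function maps a query to matching snippets.
SearchFn = Callable[[str], List[str]]


def recall(
    query: str,
    local_search: SearchFn,
    semantic_search: SearchFn,
    max_local_hits: int = 5,  # assumed threshold for an "ambiguous" result
) -> Tuple[List[str], str]:
    """Return (hits, source), preferring local recall over embeddings."""
    local_hits = local_search(query)

    # A non-empty, reasonably small local result counts as a success.
    if 0 < len(local_hits) <= max_local_hits:
        return local_hits, "local"

    # Local recall failed (no hits) or was ambiguous (too many hits):
    # only now try the semantic path.
    try:
        semantic_hits = semantic_search(query)
    except Exception:
        # Embedding backend unavailable (for example, quota exhausted):
        # fall back to whatever local recall produced.
        return local_hits, "local-only"

    if semantic_hits:
        return semantic_hits, "semantic"
    return local_hits, "local-only"
```

Returning the source label alongside the hits makes it straightforward to count how often the semantic fallback was actually needed during each evaluation period.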

## Baseline (Day 0)
- The `memory_search` tool is currently failing because the embedding quota is exhausted (HTTP 429, `insufficient_quota`).
- Local knowledge system remains available.

## Evaluation Checkpoints
- Day 1: 2026-02-25
- Day 7: 2026-03-03
- Day 14: 2026-03-10
- Day 30: 2026-03-26
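Checkpoint dates can be derived from the start date rather than entered by hand, which makes them easy to cross-check. A minimal sketch, assuming Day N means the start date plus N calendar days; `checkpoint_dates` is a hypothetical helper name.

```python
from datetime import date, timedelta
from typing import Dict, Iterable

START_DATE = date(2026, 2, 24)       # Day 0, from the Start Date section
CHECKPOINT_DAYS = (1, 7, 14, 30)     # offsets named in the objective


def checkpoint_dates(
    start: date, offsets: Iterable[int] = CHECKPOINT_DAYS
) -> Dict[int, date]:
    """Map each day offset to its calendar date (start + offset days)."""
    return {offset: start + timedelta(days=offset) for offset in offsets}


if __name__ == "__main__":
    for day, when in checkpoint_dates(START_DATE).items():
        print(f"Day {day}: {when.isoformat()}")
```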

## Metrics to Capture at Each Checkpoint
1. Recall quality
   - Number of memory-related requests answered successfully
   - Number needing follow-up clarification
   - Number of misses
2. Cost/usage
   - Embedding/API usage during period (if available)
   - Any quota or outage events
3. Latency/reliability
   - Perceived responsiveness
   - Any retrieval failures
4. User satisfaction
   - Quick subjective rating (good / acceptable / poor)
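To keep checkpoints comparable, the metrics above could be recorded in one fixed structure per checkpoint. The sketch below is one possible shape, not a prescribed format: `CheckpointRecord` and its field names are hypothetical, and it assumes that "answered", "needed follow-up", and "missed" are disjoint counts.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_RATINGS = {"good", "acceptable", "poor"}


@dataclass
class CheckpointRecord:
    day: int                      # checkpoint offset, e.g. 1, 7, 14, 30
    answered: int                 # requests answered successfully
    needed_followup: int          # requests needing clarification
    missed: int                   # requests with no usable recall
    quota_events: int = 0         # quota or outage events in the period
    retrieval_failures: int = 0   # retrieval errors in the period
    satisfaction: str = "acceptable"

    def __post_init__(self) -> None:
        # Restrict the rating to the three values used in this plan.
        if self.satisfaction not in ALLOWED_RATINGS:
            raise ValueError(f"unknown rating: {self.satisfaction!r}")

    @property
    def total(self) -> int:
        """Total memory-related requests in the period."""
        return self.answered + self.needed_followup + self.missed

    @property
    def success_rate(self) -> Optional[float]:
        """Share of requests answered without follow-up, or None if no requests."""
        return self.answered / self.total if self.total else None
```

Embedding/API usage is left out of the record because the plan notes it may not always be available; it could be added as an optional field.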

## Decision Rubric (Day 30)
- Keep local-first if quality remains acceptable and outages/costs decrease.
- Reintroduce broader semantic usage only if recall quality is materially impaired.
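The rubric can be written down as a small function so the Day 30 call is applied consistently. This is a sketch under assumptions: `day30_decision` and its two boolean inputs are hypothetical, and the rubric does not say what to do when quality is acceptable but outages/costs have not decreased, so that case is returned as undecided rather than guessed.

```python
def day30_decision(recall_materially_impaired: bool,
                   outages_or_costs_decreased: bool) -> str:
    """Apply the Day 30 rubric to two judgments made at the final checkpoint."""
    if recall_materially_impaired:
        # Only material quality loss justifies broader semantic usage.
        return "reintroduce broader semantic usage"
    if outages_or_costs_decreased:
        # Quality acceptable and outages/costs down: stay local-first.
        return "keep local-first"
    # Not covered by the rubric as written; needs a manual decision.
    return "undecided"
```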

## Log
### 2026-02-24
- Plan created.
- Added reminder checkpoints in Apple Reminders for Day 1, 7, 14, 30.
