Evidence Mapping
This document maps each design requirement to its supporting evidence across primary research (user interviews), secondary research (literature), and competitive analysis.
Design Requirement 1: Support Recognition Over Recall
Requirement: The solution should support vague recall, emphasizing visual and relational cues over precision search, enabling users to recognize what they are looking for rather than formulate exact queries.
Primary Research Evidence (User Interviews)
| Participant | Quote | Source |
|---|---|---|
| P1 | “It requires you to remember precisely the keywords that are relevant and specific ones. Like, if you remember something too broad, then it’s going to bring up 30 things for you to search through, and then that’s not very helpful.” | Winter Report, lines 248-249 |
| P2 | “Something like Obsidian’s neural map—automatic linking of related content. Instead of manually remembering connections, it reminds me.” | Interview Notes, p. 25 |
| P2 | “As a map or threaded interface showing how new information connects to previous saved content—whether it expands or contradicts it.” | Interview Notes, p. 25 |
| P3 | “What’s most frustrating—not knowing where it is, or not knowing how to query it? A bit of both, but mostly not knowing where it is.” | Interview Notes, p. 35 |
| P4 | “Sometimes I vaguely remember something from a research paper and try to find it using vague keywords on Google Scholar. That happens a lot.” | Interview Notes, p. 50 |
| P4 | “Mostly through figures—the more innovative visualization, the better. Visuals are most impactful.” | Interview Notes, p. 51 |
Pattern: All 6 participants reported difficulty with precision-based search when recall is vague (per Winter Report, Finding 4). Quotes from 4 of 6 are included above; P5 and P6 transcripts are unavailable for direct citation.
Secondary Research Evidence
| Finding | Source |
|---|---|
| Vocabulary gaps between how users remember information and how it was originally stored reduce keyword search effectiveness. | Secondary Research, Section 3 (specific statistic unverified—no primary source identified) |
| “Semantic search delivers relevant answers to even vague or unconventional queries.” | Secondary Research, Section 3 |
| Belkin’s ASK (Anomalous State of Knowledge): “The request is an incomplete, distorted expression of the underlying need.” | Agentic IR Conference Tutorial (CHIIR 2026) |
| “Users do not specify their needs in fleshed outcome.” | Agentic IR Conference Tutorial |
Competitive Analysis Evidence
| Tool | Evidence |
|---|---|
| Heptabase/Scrintal | Success of visual PKM tools validates spatial/visual modality for retrieval |
| Recall AI | Built entire product around semantic search for vague queries—validates problem space |
| Obsidian Graph View | User (P2) specifically requested “something like Obsidian’s neural map” |
Evidence Strength: STRONG
- 6/6 interview participants experienced this problem (per Winter Report; 4 directly quoted above)
- Vocabulary gap between recall and storage is well-established in IR literature
- Multiple successful products built around this insight
Design Requirement 2: Preserve Persistent Project Context
Requirement: The solution should serve as the default entry surface at the start of a work session, stabilizing context and supporting ongoing reasoning. Each project should maintain its own contextual field where materials, notes, claims, and artifacts coexist within that boundary.
Primary Research Evidence (User Interviews)
| Participant | Quote | Source |
|---|---|---|
| P1 | “What starts out working—as you gain more and more information that you’re trying to store—I usually don’t adapt my system appropriately.” | Winter Report, Finding 2 |
| P2 | “Obsidian was fragile for me. Tagging was inconsistent—‘career’ vs ‘career development’ became separate tags. It got messy, so I abandoned it.” | Interview Notes, p. 26 |
| P3 | “I’m running out of storage in my primary account I’ve used for 10+ years. I switched to another Google account… Now it’s an extra task to remember which drive my docs are in.” | Interview Notes, p. 36 |
| P3 | “Sometimes my files or Excel sheets are structured in a way where I can’t remember why I set it up that way… It takes a minute—five or ten minutes—to remember.” | Winter Report, Finding 2 |
| P4 | “The failure is the gap between literature review time and writing time. That gap can be 4–5 months, 10 months, even a year. The larger the gap, the harder it is to remember.” | Interview Notes, p. 50 |
Pattern: Organizational systems decay over time. Context drifts. Project boundaries blur.
Secondary Research Evidence
| Finding | Source |
|---|---|
| OpenAI reports Custom GPTs and Projects usage increased 19x year-to-date, with 20% of Enterprise messages sent via a Custom GPT or Project. | OpenAI Enterprise Report 2025 |
| Microsoft “Frontier Firm” model: Teams form around goals, not functions—like movie production with tailored teams assembling for projects. | Microsoft 2025 Work Trend Index |
| “Context engineering is becoming a critical tool for unlocking the value of AI.” | QCon London 2026 (speaker/talk title unrecorded) |
| A-MEM (NeurIPS 2025): Proposes Zettelkasten-inspired approach where “new memories trigger updates to existing representations.” | Secondary Research, Section 2 |
Competitive Analysis Evidence
| Tool | Evidence |
|---|---|
| Claude Code | CLAUDE.md files persist project context; auto-memory saves learnings across sessions |
| Windsurf | “Memories” feature that persists across sessions—ranked #1 in AI dev tools |
| NotebookLM/Claude Projects | 19x growth in project-scoped AI tools validates demand |
| All AI containers | Structural limitation: require manual upload, don’t integrate across tools |
Evidence Strength: STRONG
- Organizational decay observed across quoted participants (P1-P4); P5 and P6 not directly quoted
- 19x year-to-date growth in enterprise usage of Custom GPTs and Projects (OpenAI Enterprise Report 2025)
- Leading AI tools (Claude Code, Windsurf) building memory/context features
Design Requirement 3: Maintain Visible Source Traceability
Requirement: The solution should treat “ideas/claims/insights” as the primary unit of reasoning. Every idea should remain visibly connected to its originating sources. Traceability should be preserved from raw material → extracted idea → synthesis artifact.
Primary Research Evidence (User Interviews)
| Participant | Quote | Source |
|---|---|---|
| P4 | “I wouldn’t say PowerPoint is the source of truth—the code is—but PowerPoint filters the most important figures/plots for people who don’t want technical details.” | Winter Report, Finding 5 |
| P4 | “It becomes hard to cite work we used to base our research. If we don’t cite, it’s ethically wrong. And it’s frustrating to find the paper with only vague keywords—sometimes you find it, sometimes you don’t.” | Interview Notes, p. 50 |
| P4 | “Ideally I ask AI with the paper, or an AI that has all research papers: it gives the answer and points to the paper so I can cite it.” | Interview Notes, p. 51 |
| P3 | “If it’s AI-driven—hallucinations. If it gives me synthesized material not traceable to sources, that’s a concern.” | Interview Notes, p. 37 |
| Research observation | “Synthesis is manually reconstructed in Slides, becoming the temporary source of truth. Source traceability becomes fragmented across tools.” | Winter Report |
Pattern: Synthesis artifacts (slides, docs) become disconnected from sources. Users need to reverse-engineer connections.
Secondary Research Evidence
| Finding | Source |
|---|---|
| A-MEM: Creates interconnected knowledge networks through dynamic indexing and linking, generating notes with contextual descriptions, keywords, and tags. | Xu, W., et al. (2025). A-MEM: Agentic Memory for LLM Agents. arXiv:2502.12110 |
| Faithfulness metrics for agentic evaluation: “evidence support,” “source authority score,” “source freshness,” “viewpoint diversity.” | Agentic IR Conference Tutorial |
| Microsoft CHI 2025: AI should be positioned as “thought partner” and “provocateur” rather than an answer-delivery system. | Tankelevitch, L., et al. (2025). Tools for Thought. CHI 2025. Microsoft Research. |
Competitive Analysis Evidence
| Tool | Gap |
|---|---|
| NotebookLM | Partial traceability—can cite uploaded sources but not synthesis chain |
| Claude Projects | Partial—cites uploaded docs but synthesis reasoning not visible |
| Heptabase | Strong traceability via bidirectional links, but requires full migration |
| All AI containers | Don’t maintain provenance from raw material → idea → synthesis |
Evidence Strength: MODERATE-STRONG
- 2 participants explicitly mentioned traceability as critical (P3, P4)
- P4 frames the citation problem as acute: failing to cite is “ethically wrong”
- Agentic evaluation metrics validate this as emerging standard
- Gap exists in all current tools
Design Requirement 4: Reduce Context Switching
Requirement: The solution should minimize the cognitive cost of retrieval by keeping users in their current context. Retrieval should feel like recognition, not research.
Primary Research Evidence (User Interviews)
| Participant | Quote | Source |
|---|---|---|
| P2 | “It distracts me. Searching exposes unexpected information. I start reading irrelevant things, which leads to procrastination.” | Interview Notes, p. 27 |
| P2 | “Maybe two screens. Right now I switch tabs and lose context. Gemini is a good example—it opens a pop-up so I can search without fully leaving my current task.” | Interview Notes, p. 27 |
| P3 | “My focus deviates from the deeper thinking needed to synthesize material and shifts to getting new information from search. Then I have to track what I left off on.” | Interview Notes, p. 36 |
| P3 | “Side panels—maybe sticky notes… Having a sticky note on the side while searching could be nice, because it can stay on the screen while Chrome is in the back.” | Interview Notes, p. 36 |
Pattern: Search mode disrupts cognitive flow. Users want retrieval that doesn’t require leaving current context.
Secondary Research Evidence
| Finding | Source |
|---|---|
| Microsoft CHI 2025: “AI shifts cognitive work toward verification, integration, and task oversight. Workers expend more effort on high-stakes tasks, less on routine work.” | Tankelevitch, L., et al. (2025). Tools for Thought. CHI 2025. Microsoft Research. |
| “Retrieval breakdown is particularly costly because it occurs during in-flow project work—moments when working memory is already occupied with complex synthesis tasks.” | Winter Report, Finding 1 |
| Average knowledge worker uses 10+ tools daily. | Winter Report, Finding 3 |
Competitive Analysis Evidence
| Tool | Evidence |
|---|---|
| Granola | Success due to “bot-free” approach—doesn’t interrupt meetings |
| Gemini popup | P2 specifically cited as positive example of in-context retrieval |
| All AI containers | Require context switch to separate app/tab |
Evidence Strength: MODERATE-STRONG
- 2/6 participants directly quoted on cognitive disruption (P2, P3); others not quoted on this topic
- Users specifically requested in-context solutions (side panels, pop-ups)
- Microsoft research validates cognitive load impact
Summary: Evidence Strength by Requirement
| Requirement | Interview Evidence | Literature Evidence | Competitive Evidence | Overall |
|---|---|---|---|---|
| 1. Recognition Over Recall | 6/6 participants (4 quoted) | Vocabulary gap in keyword search | Visual PKM success | STRONG |
| 2. Persistent Project Context | Decay observed (4 quoted) | 19x project tool growth | Claude Code/Windsurf | STRONG |
| 3. Source Traceability | 2 explicit (P3, P4), plus research observation | Agentic eval metrics | Gap in all tools | MODERATE-STRONG |
| 4. Reduce Context Switching | 2/6 quoted on disruption | CHI 2025 research | Granola success | MODERATE-STRONG |
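The "quoted" counts in the Interview Evidence column can be re-derived from the quote tables. The sketch below is one way to do that tally, assuming the participant IDs are transcribed by hand from each requirement's table; the variable names are illustrative and not part of the research materials. It counts direct quotes only, so the 6/6 figure for Requirement 1 (which comes from the Winter Report) and the research observation under Requirement 3 are not reflected.

```python
# Total interview participants (P1-P6), as stated in this document.
TOTAL_PARTICIPANTS = 6

# Participant IDs in the order they appear in each requirement's quote table.
# The "Research observation" row under Requirement 3 is excluded because it
# is not a participant quote.
quoted = {
    "1. Recognition Over Recall": ["P1", "P2", "P2", "P3", "P4", "P4"],
    "2. Persistent Project Context": ["P1", "P2", "P3", "P3", "P4"],
    "3. Source Traceability": ["P4", "P4", "P4", "P3"],
    "4. Reduce Context Switching": ["P2", "P2", "P3", "P3"],
}

# Distinct participants quoted per requirement.
coverage = {req: sorted(set(ids)) for req, ids in quoted.items()}

for req, ids in coverage.items():
    print(f"{req}: {len(ids)}/{TOTAL_PARTICIPANTS} quoted ({', '.join(ids)})")
```

Keeping the tally next to the table makes it easier to update the summary if P5 or P6 quotes are recovered later.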
Evidence Gaps to Address
Weaker Evidence Areas
- Source Traceability
  - Only 2 participants explicitly mentioned this need (P3, P4)
  - Could strengthen with more targeted questions in follow-up research
  - Consider: Is this more acute for academic/research users than general knowledge workers?
- Project Scoping vs. Global Knowledge Base
  - Users expressed need for organization, but did not explicitly request “project boundaries”
  - This is more of a design hypothesis than user-stated need
  - Consider: Could this become another organizational burden?
Questions for Spring Usability Testing
- Does bounded project context feel natural or constraining?
- Do users actually use visual/recognition features, or revert to search?
- Is source traceability visible enough to be useful, or does it add clutter?
- Does ambient operation actually reduce context switching, or add cognitive load?
Appendix: Participant Tool Usage
Evidence that knowledge work is inherently multi-tool (supports need for ambient integration):
| Participant | Tools Mentioned |
|---|---|
| P1 | Python, Jupyter, Google Slides, Zotero, Google Sheets, Docs, Overleaf, YouTube, Calendar, Canvas, Drive, Gemini |
| P2 | Chrome Bookmarks, Notion, YouTube, Instagram, Figma, FigJam, Granola, Google Search, Obsidian, ChatGPT, Gemini, Claude/Claude Code |
| P3 | Google Calendar, Scholar, Docs, Word, ChatGPT, Excel, Drive, GitHub, Bookmarks, Obsidian, Apple Notes, Notepad, Sticky Notes, OneNote |
| P4 | Python, VS Code, QGIS, ArcGIS, Slack, Jupyter, Word, PowerPoint, GitHub, OneDrive, Scholar, Mendeley, Zotero, Notion, Drive, LinkedIn, ChatGPT, Excel |
| P5 | PyTorch, scikit-learn, AWS, GitHub, Jupyter, Google Slides, Slack, Overleaf, Scholar, Drive, Sheets, Maps |
| P6 | Python, VS Code, QGIS, ArcGIS, Slack, Jupyter, Word, PowerPoint, GitHub, OneDrive, Scholar, Mendeley, Zotero, Notion, Drive, LinkedIn, ChatGPT, Excel, Google Earth Engine |
Note: P6’s tool list may be incomplete—transcript was not preserved. Verify against researcher notes.
Average: ~14 tools per participant (P1-P5 verified; P6 approximate)
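The average can be checked against the table above. The following is a minimal sketch, assuming each comma-separated entry counts as one tool and that “Claude/Claude Code” is a single entry; both are counting conventions chosen here for illustration, not ones stated in the research notes.

```python
# Tool lists transcribed from the appendix table, one string per participant.
raw = {
    "P1": "Python, Jupyter, Google Slides, Zotero, Google Sheets, Docs, Overleaf, "
          "YouTube, Calendar, Canvas, Drive, Gemini",
    "P2": "Chrome Bookmarks, Notion, YouTube, Instagram, Figma, FigJam, Granola, "
          "Google Search, Obsidian, ChatGPT, Gemini, Claude/Claude Code",
    "P3": "Google Calendar, Scholar, Docs, Word, ChatGPT, Excel, Drive, GitHub, "
          "Bookmarks, Obsidian, Apple Notes, Notepad, Sticky Notes, OneNote",
    "P4": "Python, VS Code, QGIS, ArcGIS, Slack, Jupyter, Word, PowerPoint, GitHub, "
          "OneDrive, Scholar, Mendeley, Zotero, Notion, Drive, LinkedIn, ChatGPT, Excel",
    "P5": "PyTorch, scikit-learn, AWS, GitHub, Jupyter, Google Slides, Slack, "
          "Overleaf, Scholar, Drive, Sheets, Maps",
    "P6": "Python, VS Code, QGIS, ArcGIS, Slack, Jupyter, Word, PowerPoint, GitHub, "
          "OneDrive, Scholar, Mendeley, Zotero, Notion, Drive, LinkedIn, ChatGPT, "
          "Excel, Google Earth Engine",
}

# Count comma-separated entries per participant.
counts = {p: len(text.split(", ")) for p, text in raw.items()}


def mean_tools(participants):
    """Mean number of tools across the given participant IDs."""
    return sum(counts[p] for p in participants) / len(participants)


verified_mean = mean_tools(["P1", "P2", "P3", "P4", "P5"])  # P6 is approximate
all_mean = mean_tools(list(counts))

print(counts)
print(f"P1-P5 mean: {verified_mean:.1f}; P1-P6 mean: {all_mean:.1f}")
```

Under these conventions the verified participants average 13.6 tools and all six average 14.5, which is consistent with the ~14 figure whether or not P6 is included.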