Exploration Guide

Learn how Assay helps you discover connections between documents through theme matching and Jaccard similarity.

Web-Based Exploration

Theme-Based Search

Every document is automatically classified using a canonical theme taxonomy with 180 themes across 30 domains. Use theme search to:

  • Browse documents by research area or topic
  • Filter search results by specific themes
  • Discover related documents through theme overlap
  • Explore public collections by theme

Tip: Use fuzzy search to find themes even if you don't know the exact name. The system suggests matches as you type.
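To make the idea of fuzzy theme lookup concrete, here is a minimal sketch in Python. It does not show how Assay matches themes internally, which this guide does not specify. It uses the standard-library difflib module as a stand-in, and the theme list, the function name suggest_themes, and the cutoff value are all illustrative assumptions.

```python
import difflib

# Hypothetical slice of a theme taxonomy, for illustration only.
THEMES = [
    "Cloud Architecture",
    "Machine Learning",
    "Zero Trust Security",
    "Fault Tolerance",
    "Computer Vision",
]


def suggest_themes(query, themes=THEMES, limit=3, cutoff=0.5):
    """Return up to `limit` theme names that approximately match `query`.

    Matching is case-insensitive; results are ordered best match first.
    """
    # Map lowercase names back to their display form.
    by_lower = {name.lower(): name for name in themes}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lower), n=limit, cutoff=cutoff
    )
    return [by_lower[m] for m in matches]


if __name__ == "__main__":
    # A misspelled query still surfaces the intended theme.
    print(suggest_themes("cloud architecure"))
    print(suggest_themes("machne lerning"))
```

A character-ratio matcher like this tolerates typos but not synonyms, so it only approximates the "suggest as you type" behavior described in the tip.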

[Screenshot: Theme-based search interface showing fuzzy search suggestions]
[Screenshot: Search results showing documents organized by themes]

Similar Document Discovery

Find related documents using Jaccard similarity. When viewing a document, click "Find Similar Documents" to see:

  • Up to 15 documents from your private collection
  • Up to 15 documents from the public collection
  • An explanation of why documents matched (which themes overlap)
  • Similarity scores sorted by relevance

How it works: The system computes the Jaccard similarity between the document you are viewing and every other document, then ranks the candidates by a weighted score in which L1 (sub-theme) matches are weighted 0.8 and L0 (domain) matches are weighted 0.2.
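The ranking step can be sketched in Python as follows. This is an illustration under stated assumptions, not Assay's implementation: it assumes the weighted score is 0.8 times the Jaccard similarity of the L1 theme sets plus 0.2 times the Jaccard similarity of the L0 theme sets, that documents with a zero score are dropped, and that the result is capped at 15. The document dictionaries and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_score(doc, other, w_l1=0.8, w_l0=0.2):
    """Assumed combination: weighted sum of per-level Jaccard scores."""
    return (
        w_l1 * jaccard(doc["l1"], other["l1"])
        + w_l0 * jaccard(doc["l0"], other["l0"])
    )


def find_similar(target, candidates, limit=15):
    """Score every candidate, drop non-matches, return the top `limit`."""
    scored = [(weighted_score(target, c), c["title"]) for c in candidates]
    scored = [(score, title) for score, title in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Hypothetical documents: "l0" holds domains, "l1" holds sub-themes.
target = {
    "title": "T",
    "l0": {"Artificial Intelligence"},
    "l1": {"Machine Learning", "Deep Learning"},
}
candidates = [
    {"title": "A", "l0": {"Artificial Intelligence"},
     "l1": {"Machine Learning", "Computer Vision"}},
    {"title": "B", "l0": {"Artificial Intelligence"}, "l1": {"Robotics"}},
    {"title": "C", "l0": {"Security"}, "l1": {"Zero Trust"}},
]

if __name__ == "__main__":
    for score, title in find_similar(target, candidates):
        print(f"{title}: {score:.3f}")
```

In this sketch, document A outranks document B because it shares a specific sub-theme with the target, while B shares only the broad domain; C shares nothing and is dropped.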

[Screenshot: Similar documents view showing theme overlap and similarity scores]

Library Dashboard

Your personalized dashboard provides theme-based aggregation of your collection:

  • View your documents organized by research domains (L0 themes)
  • See sub-theme breakdowns (L1 themes) within each domain
  • Discover related public documents aligned with your interests
  • Get AI-generated insights about your research profile

MCP-Based Exploration

Exploring with Claude Desktop

Use MCP tools to explore your collection through conversational AI

When using Assay MCP tools with Claude Desktop, you can explore your collection through natural language conversations. Claude uses the same Jaccard similarity algorithm and theme-based organization behind the scenes.

Key MCP Exploration Tools

Search & Discovery

  • search_by_theme - Find documents by canonical theme
  • search_documents - Search by theme, author, or title
  • browse_themes - Explore the theme taxonomy
  • search_by_keywords - Search in keywords and concepts

Similarity & Analysis

  • get_similar_documents - Uses Jaccard similarity
  • compare_documents - Compare up to 10 documents
  • ask_question - Ask questions across your library
  • produce_faq - Generate FAQs from multiple docs

Example MCP Queries

"Find documents similar to this one"

Claude uses get_similar_documents with Jaccard similarity

"What documents share the Cloud Architecture theme?"

Claude uses search_by_theme to find theme matches

"Compare these 3 documents about zero trust"

Claude uses compare_documents to analyze theme overlap and differences

When to Use Each Method

Web Interface

Best for:

  • Browsing and reading summaries
  • Visual exploration of your collection
  • Quick theme-based filtering
  • Managing document visibility
  • Viewing library insights and stats

MCP Integration

Best for:

  • Conversational document exploration
  • Asking complex questions across documents
  • Generating FAQs and comparisons
  • AI-powered synthesis and analysis
  • Workflow integration with Claude

How Jaccard Similarity Works

Understanding Document Similarity

Documents are related when they share themes, not just keywords

Assay uses the Jaccard similarity coefficient to measure how similar two documents are based on their shared themes. This approach enables semantic discovery: it finds related documents even when they use different terminology.

The Jaccard Formula

Jaccard(A, B) = |A ∩ B| / |A ∪ B|

Where A and B are the theme sets of two documents

What this means:

  • Intersection (A ∩ B): Themes shared by both documents
  • Union (A ∪ B): All unique themes from both documents
  • Result: A score between 0 (no overlap) and 1 (identical themes)

Example Calculation

Document A themes: [AI, Machine Learning, Neural Networks, Deep Learning]
Document B themes: [AI, Machine Learning, Computer Vision, Deep Learning]

Intersection: [AI, Machine Learning, Deep Learning] = 3 themes

Union: [AI, Machine Learning, Neural Networks, Deep Learning, Computer Vision] = 5 themes

Jaccard similarity = 3/5 = 0.6 (60% similarity)
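The same calculation can be reproduced with Python sets. This is a minimal sketch of the formula above; the function name jaccard and the choice to return 0.0 for two empty theme sets are assumptions made for the example.

```python
def jaccard(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|, or 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


doc_a = {"AI", "Machine Learning", "Neural Networks", "Deep Learning"}
doc_b = {"AI", "Machine Learning", "Computer Vision", "Deep Learning"}

if __name__ == "__main__":
    print(sorted(doc_a & doc_b))  # the 3 shared themes
    print(len(doc_a | doc_b))     # 5 unique themes in total
    print(jaccard(doc_a, doc_b))  # 0.6
```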

Why Jaccard?

Theme-Based Matching

Compares documents based on their thematic content, not just keywords. Two documents about "fault tolerance" are related even if they use different terminology.

Normalized Scoring

Provides a consistent 0-1 similarity score regardless of document size. Documents with many themes don't dominate the results.
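A small sketch shows why normalization matters. A raw count of shared themes favors documents tagged with many themes, while the Jaccard score divides by the size of the union. The theme sets below are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical theme sets.
query = {"t1", "t2", "t3"}
focused = {"t1", "t2", "t4"}                 # 3 themes, 2 shared
broad = {f"t{i}" for i in range(1, 13)}      # 12 themes, 3 shared

raw_focused = len(query & focused)           # 2
raw_broad = len(query & broad)               # 3

if __name__ == "__main__":
    # By raw overlap, the broad document looks like the better match...
    print(raw_focused, raw_broad)
    # ...but Jaccard ranks the focused document higher.
    print(jaccard(query, focused), jaccard(query, broad))  # 0.5 0.25
```

The broad document shares more themes in absolute terms, yet most of its themes are unrelated to the query, so its normalized score is lower.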

Hierarchical Awareness

Uses weighted scoring (L1 sub-themes: 0.8, L0 domains: 0.2) to prioritize specific theme matches over broad domain overlap.

Contextual Relevance

Finds documents that share research areas and concepts, enabling semantic discovery across your collection.