Technical documentation and theoretical framework
Misophonia research faces a challenge: while clinical studies provide controlled data, they often miss the rich, naturalistic experiences shared by sufferers in conversational settings. Podcast interviews represent an untapped reservoir of phenomenological data, containing detailed first-person accounts of triggers, coping mechanisms, disease progression, and lived experiences across diverse populations.
LERAGE (Lived Experience Retrieval Augmented Generation Engine) is a computational framework for systematically analyzing misophonia podcast transcripts. It combines Maximal Marginal Relevance (MMR) diversity optimization, hybrid search strategies, weighted bias control, and statistical analysis to turn unstructured conversational data into research insights while maintaining methodological standards.
The system includes: (1) MMR diversity optimization for qualitative health research, (2) weighted bias control that reduces researcher influence without information loss, (3) hybrid search architecture combining semantic, temporal, and causal retrieval strategies, and (4) automated generation of statistics with Wilson confidence intervals and effect sizes.
Misophonia, characterized by intense emotional and physiological responses to specific sounds, presents challenges for researchers:
Podcast interviews with misophonia sufferers offer several advantages over traditional research methods:
Analyzing large volumes of conversational data introduces challenges: researcher bias in interpretation, information overload, lack of systematic diversity controls, and the need for statistical analysis. LERAGE addresses these challenges through computational methods designed for qualitative health research.
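LERAGE implements four computational components: MMR diversity optimization, weighted bias control, hybrid search, and automated statistical analysis.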
LERAGE extends the traditional RAG paradigm with enhancements designed for research integrity:
MMR balances relevance and diversity through the mathematical formula:
MMR = λ × Sim(query, doc) - (1-λ) × max(Sim(doc, selected_doc))
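Where λ (between 0.0 and 1.0) sets the balance between relevance and diversity, Sim(query, doc) is the similarity of a candidate document to the query, and max(Sim(doc, selected_doc)) is that candidate's highest similarity to any document already selected. Higher λ favors relevance; lower λ favors diversity.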
LERAGE implements MMR for qualitative research by calculating relevance scores, selecting the highest relevance document first, then iteratively selecting diverse documents based on combined MMR scores that balance query relevance against similarity to already-selected content.
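The selection loop described above can be sketched in a few lines. This is a minimal illustration, not LERAGE's actual code: it assumes documents and the query are already embedded as plain lists of floats, uses cosine similarity for Sim, and the names `cosine` and `mmr_select` are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query: Sequence[float], docs: List[Sequence[float]],
               k: int, lam: float = 0.7) -> List[int]:
    """Return indices of up to k documents chosen by greedy MMR."""
    relevance = [cosine(query, d) for d in docs]
    selected: List[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy is the highest similarity to anything already picked.
            # It is 0 for the first pick, so (for lam > 0) that pick is the
            # most relevant document.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the function reduces to a plain relevance ranking; lowering `lam` makes near-duplicates of already-selected documents progressively less likely to be chosen.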
LERAGE employs three complementary search strategies (semantic, temporal, and causal), each optimized for different research questions.
The hybrid approach dynamically weights these strategies based on query analysis, adjusting weights for temporal queries, causal queries, or maintaining balance for general questions.
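One way to picture the dynamic weighting is a small function that inspects the query and returns normalized weights. The source does not describe the actual query-analysis rules, so the cue words, the `+1.0` boost, and the function name `strategy_weights` below are all assumptions made for illustration.

```python
import re
from typing import Dict

# Hypothetical cue words; the real query analysis is not documented here.
TEMPORAL_CUES = re.compile(r"\b(when|age|onset|over time|progress\w*|since|years?)\b", re.I)
CAUSAL_CUES = re.compile(r"\b(why|because|causes?|caused|leads? to|results? in)\b", re.I)


def strategy_weights(query: str) -> Dict[str, float]:
    """Return weights for the semantic, temporal and causal searches, summing to 1."""
    weights = {"semantic": 1.0, "temporal": 1.0, "causal": 1.0}  # balanced default
    if TEMPORAL_CUES.search(query):
        weights["temporal"] += 1.0  # assumed boost for temporal questions
    if CAUSAL_CUES.search(query):
        weights["causal"] += 1.0    # assumed boost for causal questions
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

A general question keeps all three strategies at one third each, while a question containing a temporal or causal cue shifts weight toward the matching strategy.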
LERAGE implements weighted bias control, which lowers the influence of selected content without discarding it. The system detects speculation levels, applies contributor downweighting when appropriate, and adjusts relevance scores based on configured weights (typically 0.5 for speculative content from non-sufferers).
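The difference from outright exclusion is easiest to see in code. The sketch below assumes each retrieved chunk already carries a relevance score and two boolean flags; the `Chunk` fields and the function `apply_bias_weights` are hypothetical names, and only the 0.5 default comes from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    score: float          # relevance score from retrieval
    is_sufferer: bool     # speaker reports first-hand experience
    is_speculative: bool  # flagged by a speculation detector


def apply_bias_weights(chunks: List[Chunk], downweight: float = 0.5) -> List[Chunk]:
    """Rescale scores instead of dropping chunks, then re-sort by adjusted score."""
    adjusted = []
    for c in chunks:
        # Only speculative content from non-sufferers is downweighted.
        factor = downweight if (c.is_speculative and not c.is_sufferer) else 1.0
        adjusted.append(Chunk(c.text, c.score * factor, c.is_sufferer, c.is_speculative))
    return sorted(adjusted, key=lambda c: c.score, reverse=True)
```

Every chunk is still present in the output, so a downweighted passage can surface when nothing more relevant exists; it simply ranks lower than it otherwise would.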
This approach enables researchers to:
LERAGE implements publication-quality statistical methods:
Multi-factor confidence assessment combines base confidence from average scores, volume bonus from number of chunks, and diversity bonus from unique episodes to produce overall confidence metrics.
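As a rough illustration of how the three factors might combine, the sketch below adds two capped bonuses to the mean relevance score. The source does not give the actual weights or caps, so `max_bonus`, `chunk_target`, `episode_target`, and the additive form itself are assumptions.

```python
from typing import List, Tuple


def overall_confidence(results: List[Tuple[float, str]],
                       max_bonus: float = 0.1,
                       chunk_target: int = 20,
                       episode_target: int = 10) -> float:
    """results holds (relevance_score, episode_id) pairs for the retrieved chunks."""
    if not results:
        return 0.0
    # Base confidence: average relevance score.
    base = sum(score for score, _ in results) / len(results)
    # Volume bonus: grows with the number of chunks, capped at max_bonus.
    volume_bonus = max_bonus * min(len(results) / chunk_target, 1.0)
    # Diversity bonus: grows with the number of unique episodes, capped at max_bonus.
    episodes = {episode for _, episode in results}
    diversity_bonus = max_bonus * min(len(episodes) / episode_target, 1.0)
    return min(base + volume_bonus + diversity_bonus, 1.0)
```

Capping each bonus keeps a large pile of weakly relevant chunks from outscoring a small set of strongly relevant ones.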
LERAGE processes HTML transcripts with structured speaker attribution, maintaining temporal context while normalizing speaker roles for consistent analysis. The system extracts speaker turns with timestamps, normalizes Host/Guest roles, preserves original attribution, and generates deterministic episode IDs.
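The normalization and ID steps can be sketched independently of the HTML parsing, whose markup is not described here. In the sketch below, the `Turn` fields, the `ep_` prefix, the 12-character hash length, and the idea of deriving the ID from an episode slug are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import Set


@dataclass
class Turn:
    speaker: str    # original attribution, preserved unchanged
    role: str       # normalized role: "host" or "guest"
    timestamp: str  # e.g. "00:12:34", kept for temporal context
    text: str


def episode_id(slug: str) -> str:
    """Same slug in, same ID out, on every run and every machine."""
    return "ep_" + hashlib.sha256(slug.encode("utf-8")).hexdigest()[:12]


def normalize_turn(speaker: str, timestamp: str, text: str, host_names: Set[str]) -> Turn:
    """Map a raw speaker label to a Host/Guest role while keeping the original label."""
    role = "host" if speaker.strip().lower() in host_names else "guest"
    return Turn(speaker=speaker, role=role, timestamp=timestamp, text=text.strip())
```

Hashing a stable input rather than using a random UUID is what makes the ID deterministic: re-ingesting the same episode yields the same identifier.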
Three parallel chunking approaches ensure comprehensive coverage:
Each chunk receives deterministic IDs enabling consistent processing across system restarts.
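A single overlapping-window chunker with content-derived IDs might look like the sketch below. It covers only one fixed-size strategy, assumes that size and overlap are measured in characters (the unit is not stated in the source), and uses the hypothetical name `chunk_text`.

```python
import hashlib
from typing import List, Tuple


def chunk_text(episode: str, text: str,
               size: int = 600, overlap: int = 150) -> List[Tuple[str, str]]:
    """Split text into overlapping windows and return (chunk_id, chunk_text) pairs."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        piece = text[start:start + size]
        # The ID depends only on the episode, the offset and the content,
        # so re-running ingestion yields identical IDs.
        digest = hashlib.sha256(f"{episode}:{start}:{piece}".encode("utf-8")).hexdigest()[:16]
        chunks.append((digest, piece))
    return chunks
```

Because the IDs are stable, a vector store can upsert chunks by ID after a restart instead of accumulating duplicates.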
The system determines search strategy weights from query characteristics, runs weighted searches across the semantic, temporal, and causal dimensions as needed, then merges and reranks the results. Retrieval is thereby adapted to the kind of research question being asked.
The bias control system detects speculation through pattern matching of phrases like "I think", "I believe", "it seems", calculating speculation density per chunk. It identifies primary sources through experiential language patterns like "I experience", "when I hear", "my triggers", scoring content based on first-person accounts versus third-party speculation.
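The pattern matching described above can be sketched with two regular expressions built from the quoted phrases. The density unit (matches per 100 words) and the simple "more experiential than speculative" rule are assumptions, as are the function names.

```python
import re

# Phrase lists taken from the examples in the text; a real list would be longer.
SPECULATION = re.compile(r"\b(i think|i believe|it seems)\b", re.I)
EXPERIENCE = re.compile(r"\b(i experience|when i hear|my triggers)\b", re.I)


def speculation_density(text: str) -> float:
    """Speculative phrases per 100 words (0.0 for empty text)."""
    words = len(text.split())
    return 100.0 * len(SPECULATION.findall(text)) / words if words else 0.0


def is_primary_source(text: str) -> bool:
    """True when experiential phrases outnumber speculative ones."""
    return len(EXPERIENCE.findall(text)) > len(SPECULATION.findall(text))
```

Normalizing by chunk length keeps a long chunk with one hedge from being treated the same as a short chunk made up entirely of hedges.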
The analytics service calculates prevalence with Wilson confidence intervals, providing accurate bounds for proportions. It implements chi-square tests for associations, Cramér's V for effect sizes, and odds ratios for relationship strength, all following standard statistical methodologies.
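The association measures named above follow standard textbook definitions and can be computed from a contingency table of counts. The sketch below uses only the standard library, applies no continuity correction, and assumes every expected cell count is non-zero; the function names are illustrative.

```python
from math import sqrt
from typing import List


def chi_square(table: List[List[int]]) -> float:
    """Pearson chi-square statistic (no continuity correction) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table: List[List[int]]) -> float:
    """Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))


def odds_ratio(table: List[List[int]]) -> float:
    """Odds ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]]; b and c must be non-zero."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)
```

For a 2x2 table such as `[[20, 10], [10, 20]]` (rows: feature present or absent, columns: second feature present or absent), the three functions give the test statistic, the effect size, and the relationship strength respectively.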
LERAGE supports multiple research query types:
Each query type activates appropriate search strategies and bias controls.
The system provides prevalence calculation with confidence intervals, association testing with effect sizes, and generation of publication-ready tables through the command-line interface.
Researchers can exclude specific contributors completely, reduce contributor influence by weighting, filter for primary sources only (excluding speculation), and control diversity versus relevance balance through the MMR lambda parameter.
This work demonstrates the application of Maximal Marginal Relevance to qualitative health research, providing:
Unlike binary exclusion approaches, weighted bias control offers:
The combination of semantic, temporal, and causal search strategies provides:
The system uses Docker Compose for service orchestration, with ChromaDB for vector storage and a FastAPI-based API service. Configuration is managed through environment variables including API keys and model specifications.
# Data Management
lerage data stats # System statistics
lerage data ingest /path --exclude-researchers # Ingest with filtering
lerage data ingestion-status # Check processing status
# Analysis Commands
lerage analyze search "question" # Basic search
lerage analyze search "question" --exclude-author name --mmr-lambda 0.7
lerage analyze prevalence feature1 feature2 # Prevalence analysis
lerage analyze association feature1 feature2 # Association testing
lerage analyze triggers --sufferers-only # Common trigger analysis
lerage analyze coping --sufferers-only # Coping strategy patterns
lerage analyze onset --sufferers-only # Age of onset analysis
lerage analyze patterns # Comprehensive pattern extraction
# Content Commands
lerage content find-highlights <slug> # AI clip discovery
lerage content extract-text-assets <slug> # Quote generation
lerage content process-clips <slug> # Video production
lerage content generate-schedule # Social media calendar
lerage content seo-audit # SEO optimization
--exclude-author NAME # Complete contributor exclusion
--downweight-author FACTOR # Weight multiplier (0.1-1.0)
--mmr-lambda VALUE # Diversity control (0.0-1.0)
--primary-sources # Exclude speculation
--min-sources N # Minimum unique episodes
LERAGE demonstrates how computational techniques can extend traditional qualitative research methods. Its components (MMR diversity optimization, weighted bias control, hybrid search strategies, and automated research statistics) address the challenge of scaling qualitative analysis while maintaining scientific rigor.
Key contributions include:
The system has successfully processed extensive podcast content, enabling researchers to extract systematic insights from conversational data at scale. By making experiential data accessible for rigorous analysis, LERAGE opens possibilities for understanding misophonia and potentially other conditions where qualitative data exists in unstructured formats.
# Core API Settings
OPENAI_API_KEY=your_api_key_here
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
LLM_MODEL=gpt-4
# Chunking Parameters
CHUNK_SIZE=600
CHUNK_OVERLAP=150
# Bias Control Defaults
DEFAULT_DOWNWEIGHT_FACTOR=0.5
DEFAULT_MMR_LAMBDA=0.7
DEFAULT_TOP_K=40
# Statistical Settings
CONFIDENCE_LEVEL=0.95
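Settings like these are commonly read into a typed object at startup so that invalid values fail early. The sketch below is one possible loader, not the project's own: the `Settings` class and `load_settings` function are hypothetical, and only the variable names and default values come from the listing above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    chunk_size: int
    chunk_overlap: int
    downweight_factor: float
    mmr_lambda: float
    top_k: int
    confidence_level: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from env, falling back to the documented defaults."""
    settings = Settings(
        chunk_size=int(env.get("CHUNK_SIZE", "600")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "150")),
        downweight_factor=float(env.get("DEFAULT_DOWNWEIGHT_FACTOR", "0.5")),
        mmr_lambda=float(env.get("DEFAULT_MMR_LAMBDA", "0.7")),
        top_k=int(env.get("DEFAULT_TOP_K", "40")),
        confidence_level=float(env.get("CONFIDENCE_LEVEL", "0.95")),
    )
    # Basic sanity checks so a bad value is reported before any processing starts.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    if not 0.0 <= settings.mmr_lambda <= 1.0:
        raise ValueError("DEFAULT_MMR_LAMBDA must be between 0.0 and 1.0")
    return settings
```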
The bias configuration includes options for excluding contributors, setting downweight factors, adjusting MMR lambda, filtering for primary sources only, and excluding speculation.
CI = (p̂ + z²/(2n) ± z√(p̂(1-p̂)/n + z²/(4n²))) / (1 + z²/n)
Where p̂ is the sample proportion, z is the critical value, and n is the sample size.
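The formula translates directly into code. The sketch below is a plain standard-library version with the hypothetical name `wilson_interval`; it defaults to z = 1.96, the critical value for the 95% confidence level.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval (lower, upper) for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width
```

Unlike the simpler normal-approximation interval, the Wilson interval stays within [0, 1] and remains informative when the observed proportion is 0 or 1, which matters for rare features in a small set of episodes.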
V = √(χ²/(n × (min(rows,cols) - 1)))
Standardized measure of association strength between categorical variables.
MMR(Di) = λ × Sim(Di, Q) - (1-λ) × max[Sim(Di, Dj)]
Where Dj represents already selected documents, balancing relevance and diversity.
LERAGE: Advancing misophonia research through computationally-enhanced, bias-controlled analysis of lived experiences.