Yappify Models
Technical information about Brian's AI chat models
A lightweight, pattern-based response system that runs entirely in your browser.
Uses Retrieval-Augmented Generation (RAG) with pre-computed embeddings to provide
fast responses about Brian's background and projects.
- Instant responses with no model download required
- Works offline after initial page load
- Pattern matching with contextual responses
- RAG integration with Brian's portfolio data
- Privacy-first: all processing happens locally
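The retrieval step behind these features can be sketched as a cosine-similarity search over the pre-computed JSON embeddings. This is an illustrative sketch only: the `Chunk` shape, function names, and toy 2-D vectors are assumptions, not the actual implementation.

```typescript
// Sketch of local RAG retrieval over pre-computed embeddings.
// Assumes the portfolio ships as a JSON array of { text, embedding }
// entries; all names and shapes here are illustrative.

interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks most similar to the query vector.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((c) => ({ c, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.c);
}

// Tiny demo with 2-D "embeddings" standing in for real vectors.
const kb: Chunk[] = [
  { text: "Brian's projects", embedding: [1, 0] },
  { text: "Brian's background", embedding: [0, 1] },
];
console.log(topK([0.9, 0.1], kb, 1)[0].text); // nearest to [1, 0]
```

Because both the knowledge base and the query embedding live in the page, this search needs no server round-trip, which is what makes offline, privacy-first operation possible.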
Technical Specifications
Architecture: Rule-based + RAG
Response Time: ~500ms
Memory Usage: < 5MB
Vector Database: JSON embeddings
Production Ready
A state-of-the-art, 1.5-billion-parameter language model from Alibaba Cloud,
running locally in the browser via WebLLM and WebGPU. It provides sophisticated
conversational AI with deep knowledge of Brian's work alongside broad general knowledge.
- 1.5B parameter transformer model
- Advanced reasoning and conversation abilities
- Integrated with Brian's knowledge base via RAG
- WebGPU acceleration for fast inference
- Quantized to 4-bit for efficient browser deployment
Technical Specifications
Parameters: 1.5B
Quantization: 4-bit (q4f16_1)
Model Size: ~900MB
Context Length: 32K tokens
Acceleration: WebGPU
Framework: MLC-AI WebLLM
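The ~900MB download is consistent with back-of-envelope quantization math: 1.5B parameters at 4 bits each is roughly 715 MiB of raw weights, with the remainder coming from fp16 scale factors (the "f16" in q4f16_1), layers kept at higher precision, and metadata. A rough sketch (the overhead breakdown is an assumption, not a measured figure):

```typescript
// Back-of-envelope size estimate for a 4-bit quantized 1.5B model.

const params = 1.5e9;        // 1.5B parameters
const bitsPerParam = 4;      // q4 quantization
const weightBytes = (params * bitsPerParam) / 8;

const MB = 1024 * 1024;
console.log(`4-bit weights: ~${Math.round(weightBytes / MB)} MiB`);
// ~715 MiB of raw weights; fp16 scales and unquantized layers
// plausibly account for the rest of the ~900MB package.

// Compare with full fp16 storage at 2 bytes per parameter.
const fp16Bytes = params * 2;
console.log(`fp16 equivalent: ~${Math.round(fp16Bytes / MB)} MiB`);
// 4-bit quantization cuts weight storage by 4x versus fp16.
```

This 4x reduction versus fp16 is what brings a 1.5B-parameter model within reach of a browser download and WebGPU memory budgets.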
Beta Release
Planned integration with OpenAI's GPT-4 and Anthropic's Claude models via secure API connections.
These will offer the most advanced AI capabilities available, with secure API key handling to protect user data.
- State-of-the-art reasoning capabilities
- Multimodal understanding (text, images, documents)
- Advanced code generation and analysis
- Real-time web search integration
- Secure API key management
Coming Soon