Emerging Architectures for LLM Applications | Andreessen Horowitz
Large language models are a powerful new primitive for building software. But since they are so new, and behave so differently from normal computing resources, it’s not always obvious how to use them. In this post, we’re sharing a reference architecture for the emerging LLM app stack. It shows the most common systems, tools, and design patterns we’ve seen used by AI startups and sophisticated tech companies. This stack is still very early and may change substantially as the underlying technology advances, but we hope it will be a useful reference for developers working with LLMs now.
in Artificial Intelligence > AI > LLM/FM (Large Language / Foundation Models) with ai, architecture, foundationmodels, generativeai, largelanguagemodels, llm
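To make the pattern concrete, here is a minimal, self-contained sketch of the in-context learning flow at the heart of that stack: embed documents, retrieve the most similar chunks from a vector store, and assemble a grounded prompt for the LLM. All names are illustrative assumptions, not from the article: `embed` stands in for a real embedding model and the toy `VectorStore` for a real vector database.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model (normally an API or model call);
    # here: a crude character-frequency vector so the sketch runs offline.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

class VectorStore:
    # Toy in-memory stand-in for a vector database.
    def __init__(self) -> None:
        self.items: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def query(self, text: str, k: int = 2) -> list[str]:
        q = embed(text)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [t for _, t in ranked[:k]]

def build_prompt(question: str, store: VectorStore) -> str:
    # Orchestration step: ground the LLM with retrieved context.
    context = "\n".join(store.query(question))
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

store = VectorStore()
for doc in ["GPUs accelerate LLM inference.", "Vector databases store embeddings."]:
    store.add(doc)
print(build_prompt("What stores embeddings?", store))
```

The assembled prompt would then be sent to an LLM API; swapping the stubs for a hosted embedding model and vector database gives the standard retrieval-augmented setup the article describes.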
LLM Inference Sizing and Performance Guidance
When planning to deploy a chatbot or a simple Retrieval-Augmented Generation (RAG) pipeline, you may have questions about sizing (capacity) and performance based on your existing GPU resources or potential future GPU acquisitions. For instance:
- What is the maximum number of concurrent requests that can be supported for a specific Large Language Model (LLM) on a specific GPU?
- What is the maximum sequence length (or prompt size) that a user can send to the chat app without experiencing a noticeably slow response time?
- What is the estimated response time (latency) for generating output tokens, and how does it vary with different input sizes and LLM sizes?
- Conversely, if you have specific capacity or latency requirements, how many GPUs do you need?
in Computers > Hardware > AI with concurrency, gpu, llm, memoryrequirements, scaling, sizing
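First-order answers to these questions follow from two standard rules of thumb (the formulas and the example numbers below are back-of-envelope assumptions, not figures from the guide itself): GPU memory left over after loading the weights, divided by the per-request KV-cache footprint, bounds concurrency; and decoding is memory-bandwidth-bound, so time per output token is roughly model bytes divided by memory bandwidth. A sketch in Python, assuming a 13B fp16 model on one 80 GB GPU:

```python
GIB = 1024**3

# --- Assumed model/hardware parameters (hypothetical 13B-class model) ---
params          = 13e9    # model parameters
bytes_per_param = 2       # fp16/bf16 weights and KV cache
n_layers        = 40
n_kv_heads      = 40
head_dim        = 128
gpu_mem_gib     = 80      # single 80 GB GPU
mem_bw_gbs      = 2000    # HBM bandwidth in GB/s (assumed)

weights_bytes = params * bytes_per_param

# KV cache per token: 2 (K and V) * layers * kv_heads * head_dim * bytes.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_param

def max_concurrency(seq_len: int) -> int:
    """Rough max concurrent requests: leftover memory / per-request KV cache."""
    free = gpu_mem_gib * GIB - weights_bytes
    return int(free // (seq_len * kv_bytes_per_token))

def time_per_output_token_ms() -> float:
    """Decode is memory-bound: each output token reads all weights once."""
    return weights_bytes / (mem_bw_gbs * 1e9) * 1e3

print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"Max concurrent 4k-token requests: {max_concurrency(4096)}")
print(f"~{time_per_output_token_ms():.1f} ms per output token "
      f"(~{1000 / time_per_output_token_ms():.0f} tok/s per request)")
```

Real serving stacks (paged KV caches, quantization, grouped-query attention, tensor parallelism) shift these numbers considerably, so treat the output as a starting estimate rather than a capacity plan.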