LLM Queries

Local LLM
Cloud
API

LLM TUTORIALS

Futurepedia - ai tools

Matt Wolfe

AI Revolution

David Ondrej

Code King

Riley Brown

local install

LLMs

* Consider trying 8-bit quantization as a test if runs

vLLM or Text Generation WebUI for GPU acceleration.

Textual Language Translation

Mistral AI - Mistral 7B

Meta - Llama 3.2 8B (4-bit quantization)

Alibaba - Qwen3-8B (4-bit quantization)

DeepSeek-R1, Distill-Llama-8B (4-bit quantization)

-------------

DeepSeek-R1, Distill-Qwen-7B (4-bit quantization)

Stanford - Alpaca-LoRA 7B (4-bit quantization)

Salesforce - XGen-7B (4-bit quantization)

Meta AI - Llama 3.3 3B

Alibaba - Qwen3-4B (4-bit quantization)
Microsoft - Phi-3, 4B

-------------

Nvidia Minitron 4B - More focused on efficiency than writing quality

Stability.ai - https://github.com/Stability-AI/StableLM
https://github.com/Stability-AI/stablediffusion
Stable LM 2 3B (4-bit) - Smaller model, basic writing capabilities

MiniStral 3B (4-bit) - Very compact, limited complexity handling

Google - Gemma 2B small for complex article structure

Inception Labs
Mercury Coder Mini 5B (4-bit quantization)

Text to Image (Diffusion Models)

Use Automatic1111 WebUI (GitHub) for SD models.
ComfyUI (GitHub) for modular workflows.
Diffusers (Hugging Face) for Python-based inference.

Start with Flux.1 [schnell] on Replicate or Poe.com (free daily credits available) to test without local setup.

https://huggingface.co/RunDiffusion/Juggernaut-XL
,br> FLUX.1 [schnell] - 12B parameter, image generator

Stable Diffusion 1.5/2.1: Easily at full precision
Stable Diffusion LCM-LoRA
https://stability.ai/license

Dreamlike Photoreal 2.0

https://huggingface.co/latent-consistency/lcm-lora-sdxl

SDXL (with --medvram), DeepFloyd IF (if RAM allows)
. Segmind Stable Diffusion (SSD-1B)

Text to Speech (TTS) - Audio

Speech to Text
Language Translation

Text to Video

444

Local Installation (AI)

Ollama - pulls a model.
Ollama (if using GGUF-quantized SD models)
MSTY enhances Ollama

llama.cpp with Q4_K_M quantization

Automatic1111 WebUI (supports 4-bit via --medvram)

LangChain - LangChain is an open-source framework that simplifies the development of applications using large language models (LLMs). It provides a suite of tools to help developers build applications that combine language models with other resources, such as databases, APIs, and other data sources, to create powerful, flexible, and context-aware systems.

ComfyUI (better for LCM-LoRA & efficiency)open-source, user-friendly graphical interface for working with machine learning models, particularly those in the field of generative AI (such as image generation and processing). It is designed to make it easier for non-technical users to interact with complex AI models without needing to write code.

Replit

CPU: multi-core CPU (Intel Core i7/i9 with 6+ core). CPU-Only: A PC with 64 GB RAM, a decent multi-core CPU, and an SSD. This could run a 4-bit quantized 70B model at 1-2 tokens/second, relying on system RAM. Without a GPU, inference will lean heavily on RAM bandwidth, so a CPU with multiple memory channels (e.g., dual-channel DDR4/DDR5) helps.

GPU-Assisted: An NVIDIA RTX 3090 (24 GB VRAM), 32 GB system RAM, and a mid-tier CPU. This offloads the model to the GPU, potentially reaching 5-10 tokens/second depending on optimization.

Quantization (1-bit, 4-bit or 8-bit precision), which reduces the memory footprint. Software like llama.cpp or Ollama can optimize the model for lower-end hardware.

High-speed internet connection

Cloud Server

Lease hosted, managed cloud space
digital Ocean, affiliate program

Netlify
Hugging Face
Spaces
Models

https://www.alibabacloud.com/en

LLMs API

Amazon, Nova

Inflection AI, Inflection-2.5

OpenAI - Chat GPT

Google - DeepMind, Gemini 2.5 Pro, Gemini 2.0

Grok - Elon Musk, Tesla - X (Twitter) -
Optimus - Humanoid Robot

Claude 3.7 - Anthropic
- Sonnet, Amazon AWS Cloud

Perplexity AI
Multimodal capabilities (text, images via DALL-E 3, Stable Diffusion XL). DeepSeek, R1
DeepSeek GitHub
DeepSeek Hugging Face

Flux
Tencent - Hunyuan
Doubao (LLM) is ByteDance’s family of large language models (LLMs), powering its AI chatbot, also named Doubao, designed to compete with models like OpenAI’s ChatGPT. Launched in August 2023, it supports text, audio, image, and video generation, with a focus on cost-efficiency and enterprise use.

- Doubao-1.5-Pro: Matches or outperforms GPT-4o and Claude 3.5 Sonnet in benchmarks. Priced at ~$0.11 per million input tokens, it’s up to 50x cheaper than competitors.

- Video Generation: Includes models like Doubao-PixelDance (10-second videos) and Doubao-Seaweed (30-second videos), leveraging Douyin and CapCut’s video expertise for lifelike clips from text/image prompts.

- Accessible via ByteDance’s Volcano Engine, with pricing as low as 0.0008 yuan per 1,000-token prompt.

Tarsier2 is a large vision-language model (LVLM)

TikTok, launched globally in September 2017 after ByteDance acquired Musical.ly, is a short-form video-sharing platform.

Douyin (抖音), launched in September 2016, is ByteDance’s Chinese version of TikTok, serving 700 million users in China. It’s a short-form video platform with more advanced features than TikTok, tailored for the Chinese market.

TikTok and Douyin are available on app stores, with Douyin restricted to China.

Access Doubao via https://volcanoengine.com or the Doubao app.