BPAC
 

Large Language Models (LLMs)

1. Elon Musk (xAI) - Grok; Tesla, X (Twitter)
Optimus - Tesla's humanoid robot

2. OpenAI - ChatGPT
Microsoft Azure Cloud - Copilot

3. Anthropic - Claude 3
- Sonnet (one of the Claude 3 models, alongside Opus and Haiku)
- Amazon AWS Cloud (Anthropic's main cloud partner)

4. Meta (Facebook) - Llama 3
5. Google - DeepMind - Gemini (formerly Bard, tied into Google Assistant), Lumiere (video generation)

Other notable models:
DeepSeek - R1
Ai2 (Allen Institute for AI) - Tulu 3 - https://allenai.org/tulu
Hugging Face - BLOOM
Mistral AI - Mistral
MosaicML Foundations - MPT-7B

---------------------

Reka - https://chat.reka.ai
Baidu - Ernie - https://yiyan.baidu.com/
Alibaba - EMO
Moonshot AI - Kimi
Apple
Stability.ai - Stable Diffusion (image generation)
Adobe - Firefly (generative AI)

Microsoft - Copilot (Bing, Azure Cloud AI; integrated with Microsoft Office 365)

Google - Imagen, ImageFX (image generation)

6. NVIDIA - GPUs and the CUDA software stack on which most LLMs are trained and run

7. Apple - Siri

8. Tencent - WeChat and QQ
Tencent Cloud

9. ByteDance - TikTok and its Chinese counterpart Douyin

API ACCESS

Most LLM providers expose their models through an API; you sign up for an account and authenticate each call with an API key (see the Python example below).

PYTHON

Python has the most LLM libraries (LangChain, LlamaIndex, the OpenAI API, Hugging Face). Most open-source LLMs (LLaMA, Mistral, Falcon, Gemma) have first-class Python support, and major frameworks like Ollama, LangChain, Auto-GPT, and GPT-Agents are written in Python.

Open-source community highlights:
LangChain - framework for LLM-powered apps
LlamaIndex - data management for AI agents
Auto-GPT - fully autonomous AI agent
Ollama - runs local LLMs easily
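
To make the API-key point concrete, here is a minimal sketch of calling a hosted LLM from Python with the official openai client. The model name is illustrative, and the key is assumed to live in the OPENAI_API_KEY environment variable; other providers' SDKs follow the same shape.

    # Minimal sketch: one chat completion via the OpenAI Python client.
    # Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY automatically

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any chat model your key can access
        messages=[{"role": "user", "content": "In one line, what is an LLM?"}],
    )
    print(response.choices[0].message.content)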

LOCAL APPLICATIONS

Ollama pulls a model onto your machine, and LangChain can then talk to it (see the sketch after the hardware notes).

HARDWARE REQUIREMENTS

Quantization: 4-bit or 8-bit precision reduces the memory footprint. For a 70B model with 4-bit quantization (a common optimization), the memory requirement is roughly 64 GB of RAM or VRAM.

GPU (recommended): a card with at least 24 GB of VRAM (e.g., an NVIDIA RTX 3090 or 4090) can offload the model for faster inference. With a GPU, system RAM can drop to 16-32 GB, since the model weights reside in VRAM.

CPU: a multi-core chip (AMD Ryzen 7 / Ryzen 9 5900X or Intel Core i7/i9 with 6+ cores); higher core counts and faster memory bandwidth (DDR5) help. Without a GPU, inference leans heavily on RAM bandwidth, so multiple memory channels (e.g., dual-channel DDR4/DDR5) matter.

Storage: 500 GB of free space, preferably on an SSD, for model files and decent load times.

CPU-only example: 64 GB RAM, a decent multi-core CPU, and an SSD can run a 4-bit quantized 70B model at 1-2 tokens/second out of system RAM.

GPU-assisted example: an NVIDIA RTX 3090 (24 GB VRAM), 32 GB system RAM, and a mid-tier CPU offload the model to the GPU, potentially reaching 5-10 tokens/second depending on optimization.

Software like llama.cpp or Ollama can squeeze a model onto lower-end hardware, but speed will suffer compared to high-end setups (e.g., multi-GPU rigs with 100+ GB of VRAM). For a smoother experience (10+ tokens/second), you would need a beefier setup, such as 128 GB RAM or a GPU with 48 GB of VRAM (e.g., an NVIDIA A6000).

High-speed internet connection (for downloading model files)
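
As a sanity check on the 64 GB figure: 70 billion parameters at 4 bits each is 70e9 × 0.5 bytes ≈ 35 GB for the weights alone; the KV cache and runtime overhead push the practical requirement toward the quoted 64 GB.

Once the hardware is in place, the Ollama-plus-LangChain flow mentioned above takes only a few lines. A minimal sketch, assuming Ollama is installed, `ollama pull llama3` has been run once, and the langchain-ollama package is available:

    # Minimal sketch: query a locally served Ollama model from LangChain.
    from langchain_ollama import ChatOllama

    llm = ChatOllama(model="llama3", temperature=0.7)  # pulled via `ollama pull llama3`
    reply = llm.invoke("Explain 4-bit quantization in one sentence.")
    print(reply.content)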

OPEN SOURCE COMMUNITIES

Hugging Face

https://huggingface.co/spaces

Google Colab

GitHub

Discord Servers
https://top.gg/ - top list of Discord bots

Poe - a chat aggregator offering models from many of the main developers in one place

Tools

Frameworks

LangChain - an open-source framework that simplifies the development of applications using large language models (LLMs). It provides a suite of tools for combining language models with other resources, such as databases, APIs, and document stores, to build powerful, flexible, context-aware systems (a minimal example follows).
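
A minimal sketch of what a LangChain "chain" looks like, assuming the langchain-openai package is installed and OPENAI_API_KEY is set; the model name is illustrative:

    # Minimal sketch: prompt -> model -> output parser, composed with |.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI

    prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
    chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

    print(chain.invoke({"text": "LangChain wires LLMs to databases and APIs."}))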

ComfyUI - an open-source, node-based graphical interface for working with machine learning models, particularly generative AI (image generation and processing). It is designed to let non-technical users interact with complex AI models without writing code.
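
ComfyUI also exposes a small local HTTP API. A minimal sketch, assuming ComfyUI is running on its default port (8188) and workflow_api.json was exported from the UI's "Save (API Format)" option:

    # Minimal sketch: queue a saved ComfyUI workflow over the local HTTP API.
    import json
    import urllib.request

    with open("workflow_api.json") as f:
        workflow = json.load(f)

    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    print(urllib.request.urlopen(req).read().decode())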

RAG (Retrieval-Augmented Generation) - a technique that retrieves relevant documents from an external knowledge base and adds them to the model's prompt, grounding answers in your own data rather than only the model's training (sketch below).
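
A minimal RAG sketch, assuming langchain-openai and faiss-cpu are installed and OPENAI_API_KEY is set; the documents are toy examples:

    # Minimal sketch: embed texts, retrieve the best match, answer with context.
    from langchain_community.vectorstores import FAISS
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings

    docs = [
        "Ollama runs open-source LLMs locally.",
        "n8n automates workflows across services.",
    ]
    store = FAISS.from_texts(docs, OpenAIEmbeddings())
    context = store.similarity_search("What runs models locally?", k=1)[0].page_content

    llm = ChatOpenAI(model="gpt-4o-mini")
    print(llm.invoke(f"Context: {context}\n\nQuestion: What runs models locally?").content)
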
n8n - a tool for automating workflows by connecting different services, APIs, and tools. It enables you to create complex, automated workflows without writing a lot of code, often by using simple drag-and-drop interfaces. It is capable of handling data inputs, making API calls, processing results, and triggering actions across various applications.
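
A minimal sketch of triggering an n8n workflow from Python; the webhook path is hypothetical (n8n shows the real URL on the workflow's Webhook node), and 5678 is n8n's default port:

    # Minimal sketch: fire an n8n workflow through one of its Webhook nodes.
    import json
    import urllib.request

    payload = json.dumps({"text": "summarize this"}).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:5678/webhook/my-llm-flow",  # hypothetical webhook path
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    print(urllib.request.urlopen(req).read().decode())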