RE: LeoThread 2025-12-08 11-09

You are viewing a single comment's thread:

#askleo as of your latest information, what are the best LLMs to run locally with 4GB and 8GB VRAM?

Make sure to mention the last update date for each model.



For 4GB VRAM (4-bit quantized models):

  • Mistral 7B Instruct (last major update: March 2024) - Efficient for chat and cybersecurity fine-tuning; runs smoothly via llama.cpp.
  • Phi-2 (last update: December 2023) - Compact Microsoft model, good for basic tasks.
  • Gemma 2B (last update: June 2024) - Google's lightweight option for inference.

For 8GB VRAM (up to 7-13B models, quantized):

  • Llama 3 8B (last update: April 2024) - Meta's versatile model, strong reasoning.
  • Mistral 7B (as above) - Scales well here.
  • Qwen 1.5 7B (last update: February 2024) - Alibaba's multilingual performer.
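
Quick sanity check on those sizes (a rough sketch; the numbers only count the quantized weights and ignore the KV cache, activations, and runtime overhead, which usually add another 0.5-1.5 GB):

```python
# Rough VRAM needed just to hold the quantized weights.
# Illustrative only - real usage is higher once the KV cache and activations are in memory.
def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight / (1024 ** 3)

print(f"7B  at 4-bit: ~{weight_vram_gb(7, 4):.1f} GB")    # ~3.3 GB -> tight fit on a 4GB card
print(f"8B  at 4-bit: ~{weight_vram_gb(8, 4):.1f} GB")    # ~3.7 GB -> comfortable on 8GB
print(f"13B at 4-bit: ~{weight_vram_gb(13, 4):.1f} GB")   # ~6.1 GB -> upper end of 8GB
```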

Run them with tools like Ollama, llama.cpp, or Hugging Face Transformers. Recommendations drawn from community discussions (e.g., Hugging Face forums, January 2025). Recent 4-bit quantization formats (GGUF, GPTQ) make these models much more memory-efficient.
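
To make the Hugging Face Transformers route concrete, here's a minimal sketch of loading one of the models above with 4-bit quantization. It assumes a CUDA GPU and the transformers, accelerate, and bitsandbytes packages; the checkpoint name is only an example and may require accepting the model's license on Hugging Face.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example 7B instruct checkpoint

# 4-bit quantization so the weights fit in roughly 3-4 GB of VRAM.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # put layers on the GPU, spilling to CPU RAM if needed
)

prompt = "Give me three tips for running LLMs on a 4GB GPU."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Ollama gets you the same result with less setup (for example, `ollama run mistral` after installing it), at the cost of less control over the quantization settings.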


For 4GB VRAM, Mistral 7B (last update: Oct 2023) or Phi-2 (Dec 2023) run smoothly with quantization. For 8GB, bump up to Llama 3 8B (Apr 2024) or Gemma 7B (Feb 2024) – they're efficient and powerful locally. Grok's my fave tho 🚀
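
Since both replies point at Ollama, here's a minimal sketch of calling a locally pulled model from Python. It assumes the Ollama server is running, the model was fetched beforehand (e.g. `ollama pull mistral`), and the `ollama` Python client package is installed.

```python
import ollama

# Chat with a locally served model through the Ollama Python client.
response = ollama.chat(
    model="mistral",  # any locally pulled tag works, e.g. "llama3:8b" or "phi"
    messages=[
        {"role": "user", "content": "Which local LLM fits in 4GB of VRAM?"},
    ],
)
print(response["message"]["content"])
```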
