RE: LeoThread 2025-12-08 11-09

You are viewing a single comment's thread:

#askleo as of your latest information, what are the best LLMs to run locally with 4GB and 8GB VRAM?

Make sure to mention the last update date for each model.



For 4GB VRAM (4-bit quantized models):

  • Mistral 7B Instruct (last major update: March 2024) - Efficient for chat and cybersecurity fine-tuning; runs smoothly via llama.cpp.
  • Phi-2 (last update: December 2023) - Compact Microsoft model, good for basic tasks.
  • Gemma 2B (last update: June 2024) - Google's lightweight option for inference.

For 8GB VRAM (up to 7-13B models, quantized):

  • Llama 3 8B (last update: April 2024) - Meta's versatile model, strong reasoning.
  • Mistral 7B (as above) - Scales well here.
  • Qwen 1.5 7B (last update: February 2024) - Alibaba's multilingual performer.
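
Quick sanity check on those sizes (a rough sketch; the numbers only count the quantized weights and ignore the KV cache, activations, and runtime overhead, which usually add another 0.5-1.5 GB):

```python
# Rough VRAM needed just to hold the quantized weights.
# Illustrative only - real usage is higher once the KV cache and activations are in memory.
def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight / (1024 ** 3)

print(f"7B  at 4-bit: ~{weight_vram_gb(7, 4):.1f} GB")    # ~3.3 GB -> tight fit on a 4GB card
print(f"8B  at 4-bit: ~{weight_vram_gb(8, 4):.1f} GB")    # ~3.7 GB -> comfortable on 8GB
print(f"13B at 4-bit: ~{weight_vram_gb(13, 4):.1f} GB")   # ~6.1 GB -> upper end of 8GB
```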

Run them with tools like Ollama, llama.cpp, or Hugging Face Transformers. Recommendations drawn from community discussions (e.g., Hugging Face forums, January 2025). Recent 4-bit quantization formats (GGUF, GPTQ) make these models much more memory-efficient.
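
To make the Hugging Face Transformers route concrete, here's a minimal sketch of loading one of the models above with 4-bit quantization. It assumes a CUDA GPU and the transformers, accelerate, and bitsandbytes packages; the checkpoint name is only an example and may require accepting the model's license on Hugging Face.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example 7B instruct checkpoint

# 4-bit quantization so the weights fit in roughly 3-4 GB of VRAM.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # put layers on the GPU, spilling to CPU RAM if needed
)

prompt = "Give me three tips for running LLMs on a 4GB GPU."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Ollama gets you the same result with less setup (for example, `ollama run mistral` after installing it), at the cost of less control over the quantization settings.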


For 4GB VRAM, Mistral 7B (last update: Oct 2023) or Phi-2 (Dec 2023) run smoothly with quantization. For 8GB, bump up to Llama 3 8B (Apr 2024) or Gemma 7B (Feb 2024) – they're efficient and powerful locally. Grok's my fave tho 🚀
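
Since both replies point at Ollama, here's a minimal sketch of calling a locally pulled model from Python. It assumes the Ollama server is running, the model was fetched beforehand (e.g. `ollama pull mistral`), and the `ollama` Python client package is installed.

```python
import ollama

# Chat with a locally served model through the Ollama Python client.
response = ollama.chat(
    model="mistral",  # any locally pulled tag works, e.g. "llama3:8b" or "phi"
    messages=[
        {"role": "user", "content": "Which local LLM fits in 4GB of VRAM?"},
    ],
)
print(response["message"]["content"])
```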
