If you’re intrigued by AI assistants like ChatGPT, Google Bard, or Bing Chat, you might have concerns about privacy, cost, and more. This is where Llama 2 steps in.
LLaMA (Large Language Model Meta AI) is a powerful language model developed by Meta that can understand and respond to natural language input in a conversational manner. LLaMA 2 is the successor of the original LLaMA model, which was released in early 2023.
LLaMA 2 builds upon the advancements made in the first version of the model, with several improvements in areas such as:
- Contextual Understanding: LLaMA 2 has improved contextual understanding abilities, enabling it to better comprehend the nuances of language and provide more accurate responses.
- Conversational Ability: The new model has been trained on a larger dataset and can engage in longer, more complex conversations than its predecessor.
- Emotional Intelligence: LLaMA 2 has been fine-tuned to recognize emotional cues in text, such as sarcasm, and to respond with appropriate empathy.
- Multi-Turn Responses: The model can now engage in multi-turn conversations, where it responds to multiple messages in a conversation without losing context.
- Improved Performance: LLaMA 2 has been optimized for better performance on a wide range of tasks, including question answering, text classification, and language translation.
Overall, LLaMA 2 represents a significant improvement over the original LLaMA model, with enhanced conversational abilities, emotional intelligence, and multi-turn response capabilities.
This model is trained on 2 trillion tokens and by default supports a context length of 4,096 tokens. The Llama 2 Chat models are additionally fine-tuned on over 1 million human annotations and are made for chat.
Some use cases for software developers
- Explain what this code does
- Simplify it
- Rewrite it in a more efficient way
- Rewrite it in a more readable way
- Replace the use of a library with another
- Write documentation for it
- Describe potential edge cases
- Write unit tests for those edge cases
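Any of these works as a direct prompt. For example, once you have a model running locally with Ollama (set up below), you can pass a one-shot prompt straight from the shell; the main.py path here is just a placeholder for your own file:
ollama run llama2 "Explain what this code does: $(cat main.py)"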
Memory requirements
- 7B models generally require at least 8GB of RAM (3.8GB download)
- 13B models generally require at least 16GB of RAM (8GB download)
- 70B models generally require at least 64GB of RAM
If you run into issues with higher quantization levels, try the q4 model, or shut down any other programs that are using a lot of memory.
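For example, with Ollama (set up in the next section) you can pull a smaller 4-bit quantized build by tag. The exact tag below is an assumption on my part, so check the Ollama model library for the current names:
ollama run llama2:7b-chat-q4_0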
How to run Llama 2 on your Mac, Linux or Windows for free using Ollama
If you have a Mac or a Linux machine (Windows support is coming), you can use Ollama to run Llama 2. It’s by far the easiest way of all the platforms, as it requires minimal setup. All you need is your machine and some time to download the LLM, as it’s a large file.
Download Ollama
The first thing you’ll need to do is download Ollama. It runs on Mac and Linux and makes it easy to download and run multiple models, including Llama 2. Once Ollama is downloaded, move the ollama.app to the macOS Applications folder and run it.
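Once it’s running, you can confirm the command-line tool is available by checking its version (exact output depends on your release):
ollama --version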
If you have 8GB RAM
info
Llama 3 is now available; use `ollama run llama3` instead.
We will then download a 7-billion-parameter Llama 2 model. Open a new terminal and run the following (it will download a bit less than 4GB):
ollama run llama2
If you have 16GB RAM
We will then download a 13-billion-parameter Llama 2 model. Open a new terminal and run the following (it will download a bit less than 8GB):
ollama pull llama2:13b
then run it with:
ollama run llama2:13b
You’re done: you now have a ChatGPT-like AI assistant in your terminal, for free!
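Ollama also serves a local HTTP API on port 11434 (the same endpoint the web UI in the next section talks to), so you can script the model instead of chatting interactively. A minimal sketch against the generate endpoint, which streams the reply back as JSON lines:
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?"}'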
Running Ollama in a web browser
We will use Docker Compose with the following docker-compose.yml:
version: '3.8'
services:
  ollama:
    volumes:
      - ollama:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: ollama/ollama:latest
  ollama-webui:
    build:
      context: .
      args:
        OLLAMA_API_BASE_URL: '/ollama/api'
      dockerfile: Dockerfile
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    volumes:
      - ollama-webui:/app/backend/data
    depends_on:
      - ollama
    ports:
      - ${OLLAMA_WEBUI_PORT-3003}:8080
    environment:
      - 'OLLAMA_API_BASE_URL=http://ollama:11434/api'
      - 'WEBUI_SECRET_KEY='
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
volumes:
  ollama: {}
  ollama-webui: {}
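Save this as docker-compose.yml (next to the Dockerfile the ollama-webui service builds from), then start the stack:
docker compose up -d
The web UI port defaults to 3003; set OLLAMA_WEBUI_PORT in your environment before starting to change it.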
You can now access http://localhost:3003 and self-register an admin account.
Using the text-generation-webui user interface for nearly any AI model
We will be using the project https://github.com/oobabooga/text-generation-webui
warning
Most serious, well-performing models need a beefy graphics card and a lot of memory; don’t expect to run them on a low- or mid-range GPU or an older CPU.
git clone https://github.com/oobabooga/text-generation-webui
then start whichever of these scripts matches your platform in a terminal:
- start_linux.sh
- start_windows.bat
- start_macos.sh
- start_wsl.bat
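For example, on Linux (the first run takes a while, since the script installs its own dependencies):
cd text-generation-webui
./start_linux.sh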
Now browse to http://127.0.0.1:7860.
Go to the Model tab; we need to load Llama 2 with 13 billion parameters first.
Go to "Download model or LoRA" and enter TheBloke/Llama-2-13B-Chat-fp16 (a 32GB download).
Click the Download button and wait…
When it finishes, hit the reload button, select the model in the list box, and you’re done.