Cost-effective comparison of LLMs
Overview
Comparing LLM outputs BEFORE operationalizing at scale is key. We will take a look at how to do that with ease using Google Colab.
We will run everything on CPU, without GPUs or extra API cost, which also keeps the carbon footprint low.
Objective
Get an answer to the SAME question from two different LLMs, run locally with the Ollama tool:
- Phi3 Mini
- Llama 3.1
We will use LangChain to invoke both LLMs.
Setup
This code installs the necessary software to run large language models (LLMs) locally in Google Colab.
!curl https://ollama.ai/install.sh | sh : Downloads and executes the installation script for Ollama, a tool for running LLMs locally.
!pip install ollama langchain langchain_ollama pandas langchain_community : Installs the required Python libraries:
- ollama: Python library to interact with the Ollama server.
- langchain: Framework for building applications with LLMs.
- langchain_ollama: Provides integration between LangChain and Ollama.
- pandas: Data analysis and manipulation library.
- langchain_community: Community-driven extensions for LangChain.
!curl https://ollama.ai/install.sh | sh
!pip install ollama langchain langchain_ollama pandas langchain_community
>>> Downloading ollama...
############################################################################################# 100.0%
>>> Installing ollama to /usr/local/bin...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
WARNING: Unable to detect NVIDIA/AMD GPU. Install lspci or lshw to automatically detect and install GPU dependencies.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
Requirement already satisfied: ollama in /usr/local/lib/python3.10/dist-packages (0.3.1)
Requirement already satisfied: langchain in /usr/local/lib/python3.10/dist-packages (0.2.12)
...
...
...
Requirement already satisfied: langchain_ollama in
Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx<0.28.0,>=0.27.0->ollama) (1.2.2)
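As an optional sanity check (a quick sketch, not part of the original commands), we can confirm the packages installed and print their versions before moving on:
from importlib.metadata import version

# Optional check: confirm the freshly installed packages are present
# and print their versions (PyPI distribution names use dashes).
for pkg in ["ollama", "langchain", "langchain-ollama", "langchain-community", "pandas"]:
    print(pkg, version(pkg))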
Launch Ollama
This code starts the Ollama server as a background process.
import subprocess: Imports the subprocess module, which allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.
process = subprocess.Popen("nohup ollama serve &", shell=True): Starts the Ollama server in the background using the following:
- nohup: Ensures the Ollama server continues running even if the current terminal session is closed.
- &: Runs the command as a background process, so the notebook cell returns immediately.
import subprocess
process = subprocess.Popen("nohup ollama serve &", shell=True)
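The serve process takes a moment to come up. As an optional guard (a small sketch; the address comes from the install log above, and the timeout value is an assumption), we can poll the server before pulling any models:
import time
import urllib.request

def wait_for_ollama(url="http://127.0.0.1:11434", timeout=30):
    """Poll the local Ollama server until it answers or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:  # root endpoint responds once the server is up
                    return True
        except OSError:
            time.sleep(1)  # server not ready yet; retry
    return False

print("Ollama ready:", wait_for_ollama())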
Run phi3:mini
!ollama pull phi3:mini
pulling manifest
pulling 633fc5be925f... 100% 2.2 GB
pulling fa8235e5b48f... 100% 1.1 KB
pulling 542b217f179c... 100% 148 B
pulling 8dde1baf1db0... 100% 78 B
pulling 23291dc44752... 100% 483 B
verifying sha256 digest
writing manifest
removing any unused layers
success
!nohup ollama run phi3:mini &
nohup: appending output to 'nohup.out'
Validate LLM
The following command lists the models currently loaded by Ollama. If phi3 appears in the list, the validation is successful.
!ollama ps
NAME ID SIZE PROCESSOR UNTIL
phi3:mini 4f2222927938 6.0 GB 100% CPU 4 minutes from now
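The same check can be done from Python by querying the local REST API behind ollama ps (a sketch; the /api/ps endpoint and the shape of its response are taken from the Ollama API documentation, so treat the field names as assumptions):
import json
import urllib.request

# Ask the local Ollama server which models are currently loaded
# (assumed endpoint: /api/ps, returning a JSON object with a "models" list).
with urllib.request.urlopen("http://127.0.0.1:11434/api/ps") as resp:
    running = json.load(resp)

loaded = [m.get("name", "") for m in running.get("models", [])]
print("Loaded models:", loaded)
print("phi3 validated:", any(name.startswith("phi3") for name in loaded))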
Eliciting output
Let us elicit an answer from the LLM for one of our questions.
###########
# QUESTIONS
###########
base = "Can you write a social media posting in passive voice in GenZ language with hashtags and emoticons"
questions = [
    base + " for a purple beaded jewellery?",
    base + " for a green jewellery with golden flowers?",
    base + " for a pink and black beaded jewellery with baked tear-drop style pendant?",
]
###################
# Hyper parameters
###################
temperature = 1.0      # sampling randomness; higher values give more varied output
num_predict = 400      # maximum number of tokens to generate
top_k = 2              # sample only from the 2 most likely next tokens
top_p = 0.5            # nucleus sampling: restrict to the top 50% of probability mass
repeat_penalty = 1.5   # penalize repeated tokens to reduce loops
model_name = 'phi3:mini'
# Model invocation
from langchain_community.llms import Ollama
llm = Ollama(model=model_name,temperature=temperature,num_predict=num_predict,top_k=top_k,top_p=top_p,repeat_penalty=repeat_penalty)
output1 = llm.invoke(questions[0])
print(output1)
Hey fam, just dropped some killer new #PurplishBeads on the block! These bad boys are straight fire & totally glow-up your style game. They're not only dope to look at but also made with love and care by our skilled artisans
#JewelryLove #PurpleIsTheNewBlack @OwnItWithBeads
Check out the link in bio for a sneak peek! Don't sleep on this purty piece, it's gonna be lit AF
#PassiveVibesOnly #BeadedBliss (Link to product)
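Our setup also installed langchain_ollama; if you prefer its newer wrapper over the langchain_community one used above, the invocation looks roughly like this (a sketch that reuses the variables defined above and assumes OllamaLLM accepts the same sampling parameters):
# Alternative invocation via langchain_ollama (sketch; assumes OllamaLLM
# supports the same sampling parameters as the community wrapper).
from langchain_ollama import OllamaLLM

llm_alt = OllamaLLM(model=model_name, temperature=temperature, num_predict=num_predict,
                    top_k=top_k, top_p=top_p, repeat_penalty=repeat_penalty)
print(llm_alt.invoke(questions[0]))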
Switching gears
We restart Ollama here so that phi3:mini is unloaded and memory is freed before pulling the next model. Note that systemctl does not work on Colab (there is no systemd), so the server process is killed and relaunched manually, as the output below shows.
!ollama ps
!systemctl restart ollama
!sudo killall -s 9 ollama
!ollama ps
NAME ID SIZE PROCESSOR UNTIL
phi3:mini 4f2222927938 6.0 GB 100% CPU 4 minutes from now
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
Error: could not connect to ollama app, is it running?
import subprocess
process = subprocess.Popen("nohup ollama serve &", shell=True)
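If you switch models often, the kill-and-relaunch steps can be wrapped in a small helper (a convenience sketch for Colab, where systemd is unavailable; the sleep durations are assumptions to give the server time to exit and restart):
import subprocess
import time

def restart_ollama(wait_seconds=5):
    """Kill any running Ollama server and start a fresh one in the background."""
    subprocess.run("sudo killall -s 9 ollama", shell=True)       # stop the old server
    time.sleep(2)                                                # let the process exit
    proc = subprocess.Popen("nohup ollama serve &", shell=True)  # relaunch in background
    time.sleep(wait_seconds)                                     # give it time to come up
    return proc

# process = restart_ollama()   # equivalent to the manual killall + serve above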
!ollama pull llama3.1 &
pulling manifest
pulling 87048bcd5521... 100% 4.7 GB
pulling 8cf247399e57... 100% 1.7 KB
pulling f1cd752815fc... 100% 12 KB
pulling 56bb8bd477a5... 100% 96 B
pulling e711233e7343... 100% 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
!ollama ps
NAME ID SIZE PROCESSOR UNTIL
Running Llama 3.1
!nohup ollama run llama3.1 &
nohup: appending output to 'nohup.out'
!ollama ps
NAME ID SIZE PROCESSOR UNTIL
llama3.1:latest 62757c860e01 6.2 GB 100% CPU 4 minutes from now
###################
# Hyper parameters
###################
model_name='llama3.1'   # the sampling parameters defined above are reused for this run
# Model invocation
from langchain_community.llms import Ollama
llm = Ollama(model=model_name,temperature=temperature,num_predict=num_predict,top_k=top_k,top_p=top_p,repeat_penalty=repeat_penalty)
output2=llm.invoke(questions[0])
!ollama ps
!systemctl restart ollama
!sudo killall -s 9 ollama
!ollama ps
NAME ID SIZE PROCESSOR UNTIL
llama3.1:latest 62757c860e01 6.2 GB 100% CPU 4 minutes from now
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
Error: could not connect to ollama app, is it running?
Comparing outputs
Now let's compare the outputs from the two models.
print(f"\nQuestion : {questions[0]}")
print(f"\n\n\tLLM1 phi3:mini : {output1}")
print(f"\n\n\tLLM2 llama3.1 : {output2}")
Question : Can you write a social media posting in passive voice in GenZ language with hashtags and emoticons for a purple beaded jewellery?
LLM1 phi3:mini : Hey fam, just dropped some killer new #PurplishBeads on the block! These bad boys are straight fire & totally glow-up your style game. They're not only dope to look at but also made with love and care by our skilled artisans
#JewelryLove #PurpleIsTheNewBlack @OwnItWithBeads
Check out the link in bio for a sneak peek! Don't sleep on this purty piece, it's gonna be lit AF
#PassiveVibesOnly #BeadedBliss (Link to product)
LLM2 llama3.1 : Here's the post:
"Lowkey obsessed, but our new collection of lavender-hued beads is getting loved by everyone who sees it. It seems like people are really vibing these gorgeous pieces that just happen to have been designed with love and care... #PurpleVibesOnly #BeadedBliss"
Conclusion
Thus, without any extra cost and using only the free tier of Google Colab, one can effectively compare and contrast the outputs of different open-source LLMs.