Cost-effective comparison of LLMs

A carbon-neutral way of using the CPU to compare LLMs

Overview

Comparing LLM outputs BEFORE operationalizing at scale is key. We will take a look at how to do that with ease using Google Colab.

We will explore a carbon-neutral way of doing this without using GPUs or incurring extra API costs.

Objective

Get answers to the SAME question by using the Ollama tool to run two different LLMs:

  • Phi3 Mini
  • Llama 3.1

We will utilize the power of LangChain to invoke the LLMs.

Setup

This code installs the necessary software to run large language models (LLMs) locally in Google Colab.

!curl https://ollama.ai/install.sh | sh:

Downloads and executes the installation script for Ollama, a tool for running LLMs locally.

!pip install ollama langchain langchain_ollama pandas langchain_community:

Installs the required Python libraries:

  • ollama: Python library to interact with the Ollama server.

  • langchain: Framework for building applications with LLMs.

  • langchain_ollama: Provides integration between LangChain and Ollama.

  • pandas: Data analysis and manipulation library.

  • langchain_community: Community-driven extensions for LangChain.

!curl https://ollama.ai/install.sh | sh
!pip install ollama langchain langchain_ollama pandas langchain_community
>>> Downloading ollama...
############################################################################################# 100.0%
>>> Installing ollama to /usr/local/bin...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
WARNING: Unable to detect NVIDIA/AMD GPU. Install lspci or lshw to automatically detect and install GPU dependencies.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
Requirement already satisfied: ollama in /usr/local/lib/python3.10/dist-packages (0.3.1)
Requirement already satisfied: langchain in /usr/local/lib/python3.10/dist-packages (0.2.12)
...
...
...
Requirement already satisfied: langchain_ollama in 
Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx<0.28.0,>=0.27.0->ollama) (1.2.2)
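
As an optional sanity check, the freshly installed packages can be imported once to confirm the setup before moving on (this check is illustrative and was not in the original notebook):

# Optional sanity check: confirm the freshly installed packages import cleanly.
import ollama, langchain, langchain_ollama, pandas, langchain_community
print("All libraries imported successfully")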

Launch Ollama

This code starts the Ollama server as a background process.

import subprocess: Imports the subprocess module, which allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

process = subprocess.Popen("nohup ollama serve &", shell=True): Starts the Ollama server in the background using the following:

nohup: Ensures the Ollama server continues running even if the current terminal session ends, while the trailing & runs the server as a background job.

import subprocess
process = subprocess.Popen("nohup ollama serve &", shell=True)
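
Since the server starts asynchronously, the next cells can race ahead of it. Below is a minimal sketch that waits for the local API endpoint reported by the installer (127.0.0.1:11434) to respond before any models are pulled; the wait_for_ollama helper is illustrative, not part of the original notebook:

import time
import requests

def wait_for_ollama(url="http://127.0.0.1:11434", timeout=30):
    """Poll the local Ollama API until it responds or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(url).status_code == 200:
                return True          # server answers with "Ollama is running"
        except requests.exceptions.ConnectionError:
            pass                     # server not listening yet
        time.sleep(1)
    return False

print("Ollama ready:", wait_for_ollama())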

Run phi3:mini

!ollama pull phi3:mini
pulling manifest 
pulling 633fc5be925f... 100% 2.2 GB
pulling fa8235e5b48f... 100% 1.1 KB
pulling 542b217f179c... 100%  148 B
pulling 8dde1baf1db0... 100%   78 B
pulling 23291dc44752... 100%  483 B
verifying sha256 digest 
writing manifest 
removing any unused layers 
success
!nohup ollama run phi3:mini &
nohup: appending output to 'nohup.out'

Validate LLM

The following command lists the models currently being run by Ollama. If phi3 appears in the list, the validation is successful.

!ollama ps
NAME     	ID          	SIZE  	PROCESSOR	UNTIL              
phi3:mini	4f2222927938	6.0 GB	100% CPU 	4 minutes from now
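
As an additional check, the ollama Python library can confirm that the model has been pulled and is available locally, complementing the ollama ps view of models currently loaded in memory. A small sketch, assuming the response dictionary returned by ollama.list() exposes a "models" list with "name" entries:

import ollama

# Names of all models available to the local Ollama server.
local_models = [m["name"] for m in ollama.list()["models"]]
print(local_models)
print("phi3:mini available:", any(n.startswith("phi3") for n in local_models))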

Eliciting output

Let us elicit an answer from the LLM for one of the questions.

############
# QUESTIONS
############

base = "Can you write a social media posting in passive voice in GenZ language with hashtags and emoticons"
questions = [
    base + " for a purple beaded jewellery?",
    base + " for a green jewellery with golden flowers?",
    base + " for a pink and black beaded jewellery with baked tear-drop style pendant?",
]


###################
# Hyper parameters
###################
temperature = 1.0       # sampling temperature; higher values give more varied output
num_predict = 400       # maximum number of tokens to generate
top_k = 2               # sample only from the 2 most likely next tokens
top_p = 0.5             # nucleus sampling: keep tokens covering 50% of the probability mass
repeat_penalty = 1.5    # penalize tokens that have already been generated
model_name = 'phi3:mini'

#  Model invocation
from langchain_community.llms import Ollama
llm = Ollama(
    model=model_name,
    temperature=temperature,
    num_predict=num_predict,
    top_k=top_k,
    top_p=top_p,
    repeat_penalty=repeat_penalty,
)
output1 = llm.invoke(questions[0])
print(output1)
🔵✨ Hey fam, just dropped some killer new #PurplishBeads on the block! These bad boys are straight fire & totally glow-up your style game. They're not only dope to look at but also made with love and care by our skilled artisans 💖✨
#JewelryLove #PurpleIsTheNewBlack @OwnItWithBeads  ✅ Check out the link in bio for a sneak peek! Don't sleep on this purty piece, it's gonna be lit AF 🔥✨
#PassiveVibesOnly #BeadedBliss (Link to product)

Switching gears

We restart Ollama here so that phi3:mini is unloaded before Llama 3.1 is pulled. Note that systemctl fails on Colab (which does not run systemd), so the killall command is what actually stops the server before it is relaunched.

!ollama ps
!systemctl restart ollama
!sudo killall -s 9 ollama
!ollama ps
NAME     	ID          	SIZE  	PROCESSOR	UNTIL              
phi3:mini	4f2222927938	6.0 GB	100% CPU 	4 minutes from now	
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
Error: could not connect to ollama app, is it running?
import subprocess
process = subprocess.Popen("nohup ollama serve &", shell=True)
!ollama pull llama3.1 &
pulling manifest 
pulling 87048bcd5521... 100% 4.7 GB
pulling 8cf247399e57... 100% 1.7 KB
pulling f1cd752815fc... 100%  12 KB
pulling 56bb8bd477a5... 100%   96 B
pulling e711233e7343... 100%  485 B
verifying sha256 digest 
writing manifest 
removing any unused layers 
success
!ollama ps
NAME	ID	SIZE	PROCESSOR	UNTIL

Running Llama 3.1

!nohup ollama run llama3.1 &
nohup: appending output to 'nohup.out'
!ollama ps
NAME           	ID          	SIZE  	PROCESSOR	UNTIL              
llama3.1:latest	62757c860e01	6.2 GB	100% CPU 	4 minutes from now
###################
# Hyper parameters
###################
model_name = 'llama3.1'   # the other hyperparameters are reused from above

#  Model invocation
from langchain_community.llms import Ollama
llm = Ollama(
    model=model_name,
    temperature=temperature,
    num_predict=num_predict,
    top_k=top_k,
    top_p=top_p,
    repeat_penalty=repeat_penalty,
)
output2 = llm.invoke(questions[0])
!ollama ps
!systemctl restart ollama
!sudo killall -s 9 ollama
!ollama ps
NAME           	ID          	SIZE  	PROCESSOR	UNTIL              
llama3.1:latest	62757c860e01	6.2 GB	100% CPU 	4 minutes from now	
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
Error: could not connect to ollama app, is it running?

Comparing outputs

Now let's compare the outputs from the two models.

print(f"\nQuestion : {questions[0]}")
print(f"\n\n\tLLM1 phi3:mini : {output1}")
print(f"\n\n\tLLM2 llama3.1  : {output2}")
Question : Can you write a social media posting in passive voice in GenZ language with hashtags and emoticons for a purple beaded jewellery?
    
    
        LLM1 phi3:mini : 🔵✨ Hey fam, just dropped some killer new #PurplishBeads on the block! These bad boys are straight fire & totally glow-up your style game. They're not only dope to look at but also made with love and care by our skilled artisans 💖✨
    #JewelryLove #PurpleIsTheNewBlack @OwnItWithBeads  ✅ Check out the link in bio for a sneak peek! Don't sleep on this purty piece, it's gonna be lit AF 🔥✨
    #PassiveVibesOnly #BeadedBliss (Link to product)
    
    
        LLM2 llama3.1  : Here's the post:
    
    "Lowkey obsessed ๐Ÿ’œ๐Ÿ”ฎ, but our new collection of lavender-hued beads is getting loved by everyone who sees it ๐Ÿคฉ. It seems like people are really vibing these gorgeous pieces that just happen to have been designed with love and care โค๏ธ... #PurpleVibesOnly โœจ๐Ÿ’Ž #BeadedBliss ๐Ÿ˜Œ๐Ÿ‘€"

Conclusion

Thus, without any extra cost, using the free tier of Google Colab, one can effectively compare and contrast the outputs of different open-source LLMs.
