top_ollama_models_plot
plot top 10 most pulled Ollama models https://ollama.com/library
Thinking
Configuring DataFrame Display
I'm considering how to effectively display multiple tables using display(...). It looks like showcasing the DataFrame shape and columns can be done with a single call by creating a meta DataFrame first.
I could format this as meta, showing the number of rows and columns, and then display the first few rows of the main DataFrame. It seems I should avoid multiple tables and keep it simple, ending with display(df.head(10)). Let's streamline this code for clarity!
Let’s first pull the Ollama library page into a small table and verify we captured model names and pull counts correctly before making the chart.
</>Scrape model rankings
import re
from urllib.parse import urljoin
import pandas as pd
import requests
from bs4 import BeautifulSoup
from IPython.display import display
url = "https://ollama.com/library"
resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
resp.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
html = resp.text
soup = BeautifulSoup(html, "html.parser")
def parse_count(text):
    # Convert an abbreviated count such as "114.8M" into a plain number.
    text = text.strip().upper().replace(",", "")
    mult = 1
    if text.endswith("K"):
        mult = 1_000
        text = text[:-1]
    elif text.endswith("M"):
        mult = 1_000_000
        text = text[:-1]
    elif text.endswith("B"):
        mult = 1_000_000_000
        text = text[:-1]
    return float(text) * mult
rows = []
seen = set()
for a in soup.select('a[href^="/library/"]'):
    href = a.get("href", "")
    if href in {"/library", "/library/"}:
        continue
    # Flatten the card text and look for a "<count> Pulls" fragment.
    text = " ".join(a.stripped_strings)
    m = re.search(r"([\d.,]+[KMB]?)\s+Pulls", text, flags=re.I)
    if not m:
        continue
    model = href.rstrip("/").split("/")[-1]
    if model in seen:  # skip duplicate links to the same model
        continue
    seen.add(model)
    pulls_text = m.group(1)
    rows.append(
        {
            "model": model,
            "pulls_text": pulls_text,
            "pulls": parse_count(pulls_text),
            "url": urljoin(url, href),
        }
    )
df = pd.DataFrame(rows).sort_values("pulls", ascending=False).reset_index(drop=True)
meta = pd.DataFrame(
    {
        "shape": [str(df.shape)],
        "columns": [list(df.columns)],
    }
)
display(meta)
display(df.head(10))
| | shape | columns |
|---|---|---|
| 0 | (229, 4) | [model, pulls_text, pulls, url] |

| | model | pulls_text | pulls | url |
|---|---|---|---|---|
| 0 | llama3.1 | 114.8M | 114800000.0 | https://ollama.com/library/llama3.1 |
| 1 | deepseek-r1 | 86M | 86000000.0 | https://ollama.com/library/deepseek-r1 |
| 2 | nomic-embed-text | 71M | 71000000.0 | https://ollama.com/library/nomic-embed-text |
| 3 | llama3.2 | 70M | 70000000.0 | https://ollama.com/library/llama3.2 |
| 4 | gemma3 | 36.9M | 36900000.0 | https://ollama.com/library/gemma3 |
| 5 | qwen2.5 | 30.8M | 30800000.0 | https://ollama.com/library/qwen2.5 |
| 6 | mistral | 29.4M | 29400000.0 | https://ollama.com/library/mistral |
| 7 | qwen3 | 29.4M | 29400000.0 | https://ollama.com/library/qwen3 |
| 8 | llama3 | 24M | 24000000.0 | https://ollama.com/library/llama3 |
| 9 | gemma2 | 23.7M | 23700000.0 | https://ollama.com/library/gemma2 |
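To double-check that the abbreviated counts were converted correctly, here is a minimal, self-contained sketch that mirrors the parse_count helper from the cell above and compares a few pulls_text values from the table with the numbers they should produce. The sample strings are copied from the output shown; the rounding to whole pulls is an assumption added here to sidestep floating-point noise.

```python
def parse_count(text: str) -> float:
    """Convert an abbreviated count such as '114.8M' or '86M' to a number."""
    text = text.strip().upper().replace(",", "")
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    mult = 1
    if text and text[-1] in multipliers:
        mult = multipliers[text[-1]]
        text = text[:-1]
    return float(text) * mult


# Values copied from the pulls_text column in the table above.
samples = {"114.8M": 114_800_000, "86M": 86_000_000, "23.7M": 23_700_000}

for raw, expected in samples.items():
    # round() guards against small float artefacts in the multiplication
    assert round(parse_count(raw)) == expected, raw

parsed = {raw: round(parse_count(raw)) for raw in samples}
print(parsed)
```

If the page ever changes its suffixes or number format, one of these assertions should fail before the chart is drawn from bad values.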
The scrape already gives the ranked values needed for your chart. Based on the observed table, the top 10 most-pulled Ollama models are:
1. llama3.1: 114.8M pulls
2. deepseek-r1: 86M pulls
3. nomic-embed-text: 71M pulls
4. llama3.2: 70M pulls
5. gemma3: 36.9M pulls
6. qwen2.5: 30.8M pulls
7. mistral: 29.4M pulls
8. qwen3: 29.4M pulls
9. llama3: 24M pulls
10. gemma2: 23.7M pulls
A couple of quick takeaways:
- llama3.1 is clearly ahead, with a sizable gap over deepseek-r1.
- The top four models are far above the rest, all at 70M+ pulls.
- mistral and qwen3 are effectively tied in the current snapshot.
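These takeaways can be checked directly against the numbers. The sketch below hard-codes the top-10 snapshot from the table above (so it runs without the scrape) and computes the leader's gap, the 70M+ tier, and any ties. The 70M threshold is simply the cut-off used in the text, not a value from the Ollama site.

```python
# Top-10 snapshot copied from the table above (model, pulls).
top10 = [
    ("llama3.1", 114_800_000),
    ("deepseek-r1", 86_000_000),
    ("nomic-embed-text", 71_000_000),
    ("llama3.2", 70_000_000),
    ("gemma3", 36_900_000),
    ("qwen2.5", 30_800_000),
    ("mistral", 29_400_000),
    ("qwen3", 29_400_000),
    ("llama3", 24_000_000),
    ("gemma2", 23_700_000),
]
pulls = dict(top10)

# Gap between the leader and the runner-up.
lead_gap = pulls["llama3.1"] - pulls["deepseek-r1"]

# Models at or above the 70M mark (threshold chosen to match the text).
top_tier = [name for name, n in top10 if n >= 70_000_000]

# Pairs whose displayed (rounded) counts coincide.
ties = [
    (a, b)
    for i, (a, na) in enumerate(top10)
    for b, nb in top10[i + 1:]
    if na == nb
]

print(f"lead gap: {lead_gap / 1e6:.1f}M")
print("70M+ tier:", top_tier)
print("ties:", ties)
```

Because the library page reports rounded counts, a tie here only means two models share the same displayed value, not necessarily the same exact number of pulls.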
horizontal bar chart please
Thinking
Creating a bar chart
I need to respond with a simple code snippet to plot a horizontal bar chart using seaborn and matplotlib. The data frame is already in the notebook, so I should keep it concise. I’ll analyze the latest output and base the code on that, focusing on creating just one chart. Maybe I can create a variable for the top 10 entries by sorting them by 'pulls.' Lastly, I should format the x-axis in millions and avoid using print. Let's generate this with seaborn!
Great — the data is already sorted, so we can turn the current top 10 into a horizontal bar chart.
</>Plot top pulled models
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FuncFormatter
top10 = df.head(10).sort_values("pulls", ascending=True)
sns.set_theme(style="whitegrid")
fig, ax = plt.subplots(figsize=(10, 6))
sns.barplot(data=top10, x="pulls", y="model", palette="Blues_r", ax=ax)
ax.set_title("Top 10 Most Pulled Ollama Models")
ax.set_xlabel("Pulls")
ax.set_ylabel("Model")
ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f"{x/1_000_000:.0f}M"))
# Write each bar's original count label just past the end of the bar.
for i, v in enumerate(top10["pulls"]):
    ax.text(v + top10["pulls"].max() * 0.01, i, top10.iloc[i]["pulls_text"], va="center")
plt.tight_layout()
plt.show()
Thinking
Deciding on the output format
I need to decide between options A and B based on what's sufficient. The latest outputs have a cell1 top 10 table with numeric values, and cell2 has a figure object. My goal is to create a horizontal bar chart. I should analyze the latest executed outputs and compare them against the user's original goal. Since both a valid chart and explicit numeric values are present, I’ll go with A but focus on summarizing insights rather than restating the visible data. I’ll keep the final answer concise.
The horizontal bar chart is supported by the scraped top-10 values and shows a clear ranking of the most-pulled Ollama models.
From the observed data behind the chart:
- llama3.1 leads with 114.8M pulls.
- deepseek-r1 is second at 86M, with a noticeable gap from the leader.
- nomic-embed-text and llama3.2 are both around 70M, forming a strong top four.
- After that, pulls drop sharply to gemma3 at 36.9M.
- mistral and qwen3 are tied at 29.4M.
So the chart answers the original request well: it highlights both the ordering and the size difference between the leaders and the rest, with llama3.1 standing out as the dominant model in this snapshot of the Ollama library.
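The x-axis formatter in the plotting cell labels ticks in whole millions. As a sketch of a more general alternative, the hypothetical helper below (format_pulls is not part of the notebook) picks a K, M, or B suffix by magnitude. It accepts the (value, position) arguments that matplotlib's FuncFormatter passes to its callback, so it could stand in for the lambda.

```python
def format_pulls(value, pos=None):
    """Format a raw count with a K/M/B suffix, e.g. 36_900_000 -> '36.9M'.

    `pos` is unused; it is accepted so the function has the
    (value, position) signature that matplotlib's FuncFormatter calls.
    """
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            # :g drops a trailing '.0', so 86_000_000 becomes '86M'
            return f"{round(value / threshold, 1):g}{suffix}"
    return f"{value:g}"


labels = [format_pulls(v) for v in (114_800_000, 86_000_000, 29_400_000, 950)]
print(labels)
```

In the notebook this would be wired in as ax.xaxis.set_major_formatter(FuncFormatter(format_pulls)), which keeps one decimal place on ticks that are not whole millions.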
better colors, make it look better, sexy
Thinking
Enhancing chart presentation
I need to respond with a one-step code focused on improving chart styling. The user wants better colors, so I should consider using seaborn or matplotlib. A gradient palette with a dark background might work, and I should keep the code concise for notebook use. I'll clarify that the current chart is correct and mention potential improvements like removing clutter and enhancing labels. Finally, the code block will end with a simple expression like plt.show(). Let's get to crafting that code!
Improving bar chart aesthetics
I'm exploring ways to enhance the bar chart with bold white or black labels based on bar length. Using colored bars like flare or rocket_r for horizontal bars sounds promising. I might want a dark background, like #0f172a, but then I need white text. Adjustments are required for seaborn on dark themes, so maybe a light theme is simpler. I'll provide improved code that includes a cleaner palette, rounded title, gradient colors, and stronger labels. Let’s craft that code!
Let’s upgrade the styling with a smoother palette, cleaner background, and stronger value labels.
</>Plot top pulls
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FuncFormatter
top10 = df.head(10).sort_values("pulls", ascending=True).copy()
sns.set_theme(style="white")
fig, ax = plt.subplots(figsize=(11, 6.5), facecolor="#f8fafc")
ax.set_facecolor("#f8fafc")
colors = sns.color_palette("mako", n_colors=len(top10))
colors[-1] = (0.12, 0.47, 0.71) # highlight the top model a bit more
sns.barplot(
    data=top10,
    x="pulls",
    y="model",
    palette=colors,
    ax=ax,
    edgecolor="none",
)
ax.set_title("Top 10 Most Pulled Ollama Models", fontsize=18, weight="bold", pad=16)
ax.set_xlabel("Pulls", fontsize=12)
ax.set_ylabel("")
ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f"{x/1_000_000:.0f}M"))
ax.grid(axis="x", color="#cbd5e1", alpha=0.35, linewidth=1)
ax.grid(axis="y", visible=False)
sns.despine(left=True, bottom=True)
offset = top10["pulls"].max() * 0.012
# Place a bold value label slightly to the right of each bar.
for i, (_, row) in enumerate(top10.iterrows()):
    ax.text(
        row["pulls"] + offset,
        i,
        row["pulls_text"],
        va="center",
        ha="left",
        fontsize=11,
        weight="bold",
        color="#0f172a",
    )
ax.tick_params(axis="y", labelsize=11)
ax.tick_params(axis="x", labelsize=10, colors="#334155")
plt.tight_layout()
plt.show()
Thinking
Analyzing visual improvements
I need to analyze the latest outputs against the original goal of creating "better colors" and a more appealing look. We have numeric values in output-1 and visual elements in outputs 2 and 3. Since the goal focuses on aesthetics, the current visual output looks promising. I'll emphasize improvements such as a white background, soft slate face color, mako palette, highlighted top bar, and larger typography. The current setup should meet the user's request for a "sexy" design. Let's make it concise!
This version is a clear upgrade for the “better colors, make it look better, sexy” goal.
What improved in the latest chart:
- The mako palette gives it a more premium, modern look than the earlier plain blue gradient.
- The soft off-white background (#f8fafc) makes the chart feel cleaner and less default-matplotlib.
- The top bar is subtly highlighted, which draws attention to llama3.1 without making the chart noisy.
- Direct value labels on each bar make it easier to read than relying only on the axis.
- The light x-grid only and removed spines keep it polished and less cluttered.
- Bigger, bolder title text improves visual hierarchy.
The data itself still comes through clearly:
- llama3.1 is the standout leader at 114.8M pulls.
- deepseek-r1 follows at 86M.
- nomic-embed-text and llama3.2 are close together around 70M.
- There’s a noticeable drop after the top four, which the horizontal layout shows nicely.
So visually, the newest chart is doing what you wanted: it looks more intentional, more stylish, and more presentation-ready while still keeping the ranking easy to read.
provide insights
Thinking
Analyzing user insights
I need to answer the user's request for insights based on the notebook's state and original goal. I should analyze the latest outputs and the current state to provide relevant insights, particularly focusing on chart data and observed metrics without including code. It’s essential to be concise, mentioning aspects like top 10 rankings, relative dominance, concentration, and differences between embeddings and chat models. For clarity, I’ll summarize insights using concise bullet points.
A few quick takeaways from the data and the latest chart:
- llama3.1 is the clear leader with 114.8M pulls. It has a sizable lead over the rest of the library.
- deepseek-r1 is a strong second at 86M pulls, but still about 28.8M behind llama3.1.
- There’s a second tier around 70M:
  - nomic-embed-text at 71M
  - llama3.2 at 70M
- After the top 4, there’s a big drop-off:
  - gemma3 falls to 36.9M
  - then qwen2.5, mistral, and qwen3 are clustered in the high-20M to low-30M range.
- mistral and qwen3 are essentially tied at 29.4M.
- The top 10 mix includes:
  - general LLM families (llama, qwen, gemma, mistral)
  - at least one embedding-focused model (nomic-embed-text), which stands out by ranking so high.
From a storytelling angle, the chart suggests:
- A winner-takes-most pattern at the top
- A tight middle cluster
- Strong popularity for both Llama-family models and Qwen/Gemma ecosystems
Visually, the horizontal layout helps show that structure well:
- easy to compare the top leader,
- easy to see the steep drop after rank 4,
- and easy to spot near-ties in the middle.
If you want, the next nice step is to compute a few summary stats like:
- top 10 share of all pulls,
- gap between ranks,
- or brand/family-level grouping (llama, qwen, gemma, etc.).
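Two of these follow-ups can be sketched from the top-10 snapshot alone. The example below computes the gap between consecutive ranks and groups pulls by model family. The family rule (the leading run of letters in the model name) is an assumption made for illustration, not something the Ollama library defines. The share of all pulls is left out because it needs the full 229-row table rather than the ten rows reproduced here.

```python
import re
from collections import defaultdict

# Top-10 snapshot copied from the scraped table (model, pulls).
top10 = [
    ("llama3.1", 114_800_000),
    ("deepseek-r1", 86_000_000),
    ("nomic-embed-text", 71_000_000),
    ("llama3.2", 70_000_000),
    ("gemma3", 36_900_000),
    ("qwen2.5", 30_800_000),
    ("mistral", 29_400_000),
    ("qwen3", 29_400_000),
    ("llama3", 24_000_000),
    ("gemma2", 23_700_000),
]

# Gap between each rank and the next one down: (upper, lower, difference).
gaps = [
    (upper[0], lower[0], upper[1] - lower[1])
    for upper, lower in zip(top10, top10[1:])
]


def family(model: str) -> str:
    """Assumed family rule: leading letters only, e.g. 'qwen2.5' -> 'qwen'."""
    return re.match(r"[a-z]+", model).group(0)


family_totals = defaultdict(int)
for model, n in top10:
    family_totals[family(model)] += n

largest_gap = max(gaps, key=lambda g: g[2])
ranked_families = sorted(family_totals.items(), key=lambda kv: kv[1], reverse=True)

print("largest gap:", largest_gap)
for name, total in ranked_families:
    print(f"{name:<10} {total / 1e6:6.1f}M")
```

The same logic would apply to the full DataFrame from the scrape, where a groupby on a derived family column would replace the dictionary.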