top_ollama_models_plot

piotrek / top_ollama_models_plot

plot top 10 most pulled Ollama models https://ollama.com/library

Thinking

Configuring DataFrame Display

I'm considering how to effectively display multiple tables using display(...). It looks like showcasing the DataFrame shape and columns can be done with a single call by creating a meta DataFrame first.

I could format this as meta, showing the number of rows and columns, and then display the first few rows of the main DataFrame. It seems I should avoid multiple tables and keep it simple, ending with display(df.head(10)). Let's streamline this code for clarity!

Let’s first pull the Ollama library page into a small table and verify we captured model names and pull counts correctly before making the chart.

</>Scrape model rankings
import re
from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup
from IPython.display import display

url = "https://ollama.com/library"
html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

def parse_count(text):
    text = text.strip().upper().replace(",", "")
    mult = 1
    if text.endswith("K"):
        mult = 1_000
        text = text[:-1]
    elif text.endswith("M"):
        mult = 1_000_000
        text = text[:-1]
    elif text.endswith("B"):
        mult = 1_000_000_000
        text = text[:-1]
    return float(text) * mult

rows = []
seen = set()

for a in soup.select('a[href^="/library/"]'):
    href = a.get("href", "")
    if href in {"/library", "/library/"}:
        continue

    text = " ".join(a.stripped_strings)
    m = re.search(r"([\d.,]+[KMB]?)\s+Pulls", text, flags=re.I)
    if not m:
        continue

    model = href.rstrip("/").split("/")[-1]
    if model in seen:
        continue
    seen.add(model)

    pulls_text = m.group(1)
    rows.append(
        {
            "model": model,
            "pulls_text": pulls_text,
            "pulls": parse_count(pulls_text),
            "url": urljoin(url, href),
        }
    )

df = pd.DataFrame(rows).sort_values("pulls", ascending=False).reset_index(drop=True)

meta = pd.DataFrame(
    {
        "shape": [str(df.shape)],
        "columns": [list(df.columns)],
    }
)

display(meta)
display(df.head(10))
   shape     columns
0  (229, 4)  [model, pulls_text, pulls, url]

   model             pulls_text  pulls        url
0  llama3.1          114.8M      114800000.0  https://ollama.com/library/llama3.1
1  deepseek-r1       86M         86000000.0   https://ollama.com/library/deepseek-r1
2  nomic-embed-text  71M         71000000.0   https://ollama.com/library/nomic-embed-text
3  llama3.2          70M         70000000.0   https://ollama.com/library/llama3.2
4  gemma3            36.9M       36900000.0   https://ollama.com/library/gemma3
5  qwen2.5           30.8M       30800000.0   https://ollama.com/library/qwen2.5
6  mistral           29.4M       29400000.0   https://ollama.com/library/mistral
7  qwen3             29.4M       29400000.0   https://ollama.com/library/qwen3
8  llama3            24M         24000000.0   https://ollama.com/library/llama3
9  gemma2            23.7M       23700000.0   https://ollama.com/library/gemma2
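
Before charting, a quick sanity check along these lines could confirm that the parsed numbers agree with the scraped text. This is only a minimal sketch: `validate_pulls` is a hypothetical helper that is not part of the notebook, `parse_count` is restated compactly with the same rules as the scraper, and the three sample rows are copied from the table above.

```python
import numpy as np
import pandas as pd


def parse_count(text):
    """Convert '114.8M', '2.5K' or '1,234' to a float (same rules as the scraper)."""
    text = text.strip().upper().replace(",", "")
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    if text and text[-1] in multipliers:
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)


def validate_pulls(frame):
    """Hypothetical helper: list the problems found in a scraped table."""
    problems = []
    if frame["model"].duplicated().any():
        problems.append("duplicate model names")
    if (frame["pulls"] <= 0).any():
        problems.append("non-positive pull counts")
    # Re-parse the display text and compare it with the numeric column.
    reparsed = frame["pulls_text"].map(parse_count)
    if not np.isclose(reparsed, frame["pulls"]).all():
        problems.append("pulls does not match pulls_text")
    return problems


# Three rows copied from the scraped output.
sample = pd.DataFrame(
    {
        "model": ["llama3.1", "deepseek-r1", "nomic-embed-text"],
        "pulls_text": ["114.8M", "86M", "71M"],
        "pulls": [114_800_000.0, 86_000_000.0, 71_000_000.0],
    }
)

print(validate_pulls(sample) or "no problems found")
```

In the notebook itself the same call could be made on the full table, as `validate_pulls(df)`. An empty list would suggest the regex and the suffix handling agree with each other, though it cannot detect models the page selector missed entirely.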

The scrape already gives the ranked values needed for your chart. Based on the observed table, the top 10 most-pulled Ollama models are:

  1. llama3.1 — 114.8M pulls
  2. deepseek-r1 — 86M pulls
  3. nomic-embed-text — 71M pulls
  4. llama3.2 — 70M pulls
  5. gemma3 — 36.9M pulls
  6. qwen2.5 — 30.8M pulls
  7. mistral — 29.4M pulls
  8. qwen3 — 29.4M pulls
  9. llama3 — 24M pulls
  10. gemma2 — 23.7M pulls

A couple of quick takeaways:
- llama3.1 is clearly ahead, with a sizable gap over deepseek-r1.
- The top four models are far above the rest, all at 70M+ pulls.
- mistral and qwen3 are effectively tied in the current snapshot.
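
These takeaways can be checked with a little arithmetic. The sketch below rebuilds the top 10 from the values in the table above (in millions of pulls) and computes the gap between each model and the next one down. The column names `pulls_m`, `gap_to_next_m`, and `ratio_to_leader` are illustrative and do not come from the notebook.

```python
import pandas as pd

# Top-10 values copied from the scraped table, in millions of pulls.
top10 = pd.DataFrame(
    {
        "model": [
            "llama3.1", "deepseek-r1", "nomic-embed-text", "llama3.2", "gemma3",
            "qwen2.5", "mistral", "qwen3", "llama3", "gemma2",
        ],
        "pulls_m": [114.8, 86.0, 71.0, 70.0, 36.9, 30.8, 29.4, 29.4, 24.0, 23.7],
    }
)

# Gap to the next-ranked model; the last row has no successor, so it is NaN.
top10["gap_to_next_m"] = (top10["pulls_m"] - top10["pulls_m"].shift(-1)).round(1)

# Each model's pulls relative to the leader (1.0 for the leader itself).
top10["ratio_to_leader"] = (top10["pulls_m"] / top10["pulls_m"].iloc[0]).round(2)

print(top10.to_string(index=False))

# Row with the largest drop to the next rank.
biggest_drop = top10.loc[top10["gap_to_next_m"].idxmax()]
print("Largest drop comes after:", biggest_drop["model"])
```

On this snapshot the largest single drop falls between rank 4 and rank 5, which is consistent with the "top four" observation, and the mistral to qwen3 gap is zero at the precision the page reports.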

horizontal bar chart please

Thinking

Creating a bar chart

I need to respond with a simple code snippet to plot a horizontal bar chart using seaborn and matplotlib. The data frame is already in the notebook, so I should keep it concise. I’ll analyze the latest output and base the code on that, focusing on creating just one chart. Maybe I can create a variable for the top 10 entries by sorting them by 'pulls.' Lastly, I should format the x-axis in millions and avoid using print. Let's generate this with seaborn!

Great — the data is already sorted, so we can turn the current top 10 into a horizontal bar chart.

</>Plot top pulled models
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FuncFormatter

top10 = df.head(10).sort_values("pulls", ascending=True)

sns.set_theme(style="whitegrid")
fig, ax = plt.subplots(figsize=(10, 6))

sns.barplot(data=top10, x="pulls", y="model", palette="Blues_r", ax=ax)

ax.set_title("Top 10 Most Pulled Ollama Models")
ax.set_xlabel("Pulls")
ax.set_ylabel("Model")
ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f"{x/1_000_000:.0f}M"))

for i, v in enumerate(top10["pulls"]):
    ax.text(v + top10["pulls"].max() * 0.01, i, top10.iloc[i]["pulls_text"], va="center")

plt.tight_layout()
plt.show()
[Notebook output: horizontal bar chart, "Top 10 Most Pulled Ollama Models"]
Thinking

Deciding on the output format

I need to decide between options A and B based on what's sufficient. The latest outputs have a cell1 top 10 table with numeric values, and cell2 has a figure object. My goal is to create a horizontal bar chart. I should analyze the latest executed outputs and compare them against the user's original goal. Since both a valid chart and explicit numeric values are present, I’ll go with A but focus on summarizing insights rather than restating the visible data. I’ll keep the final answer concise.

The horizontal bar chart is supported by the scraped top-10 values and shows a clear ranking of the most-pulled Ollama models.

From the observed data behind the chart:
- llama3.1 leads with 114.8M pulls.
- deepseek-r1 is second at 86M, with a noticeable gap from the leader.
- nomic-embed-text and llama3.2 are both around 70M, forming a strong top four.
- After that, pulls drop sharply to gemma3 at 36.9M.
- mistral and qwen3 are tied at 29.4M.

So the chart answers the original request well: it highlights both the ordering and the size difference between the leaders and the rest, with llama3.1 standing out as the dominant model in this snapshot of the Ollama library.

better colors, make it look better, sexy

Thinking

Enhancing chart presentation

I need to respond with a one-step code focused on improving chart styling. The user wants better colors, so I should consider using seaborn or matplotlib. A gradient palette with a dark background might work, and I should keep the code concise for notebook use. I'll clarify that the current chart is correct and mention potential improvements like removing clutter and enhancing labels. Finally, the code block will end with a simple expression like plt.show(). Let's get to crafting that code!

Improving bar chart aesthetics

I'm exploring ways to enhance the bar chart with bold white or black labels based on bar length. Using colored bars like flare or rocket_r for horizontal bars sounds promising. I might want a dark background, like #0f172a, but then I need white text. Adjustments are required for seaborn on dark themes, so maybe a light theme is simpler. I'll provide improved code that includes a cleaner palette, rounded title, gradient colors, and stronger labels. Let’s craft that code!

Let’s upgrade the styling with a smoother palette, cleaner background, and stronger value labels.

</>Plot top pulls
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FuncFormatter

top10 = df.head(10).sort_values("pulls", ascending=True).copy()

sns.set_theme(style="white")
fig, ax = plt.subplots(figsize=(11, 6.5), facecolor="#f8fafc")
ax.set_facecolor("#f8fafc")

colors = sns.color_palette("mako", n_colors=len(top10))
colors[-1] = (0.12, 0.47, 0.71)  # highlight the top model a bit more

sns.barplot(
    data=top10,
    x="pulls",
    y="model",
    palette=colors,
    ax=ax,
    edgecolor="none"
)

ax.set_title("Top 10 Most Pulled Ollama Models", fontsize=18, weight="bold", pad=16)
ax.set_xlabel("Pulls", fontsize=12)
ax.set_ylabel("")

ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f"{x/1_000_000:.0f}M"))
ax.grid(axis="x", color="#cbd5e1", alpha=0.35, linewidth=1)
ax.grid(axis="y", visible=False)

sns.despine(left=True, bottom=True)

offset = top10["pulls"].max() * 0.012
for i, (_, row) in enumerate(top10.iterrows()):
    ax.text(
        row["pulls"] + offset,
        i,
        row["pulls_text"],
        va="center",
        ha="left",
        fontsize=11,
        weight="bold",
        color="#0f172a"
    )

ax.tick_params(axis="y", labelsize=11)
ax.tick_params(axis="x", labelsize=10, colors="#334155")

plt.tight_layout()
plt.show()
[Notebook output: restyled horizontal bar chart, "Top 10 Most Pulled Ollama Models"]
Thinking

Analyzing visual improvements

I need to analyze the latest outputs against the original goal of creating "better colors" and a more appealing look. We have numeric values in output-1 and visual elements in outputs 2 and 3. Since the goal focuses on aesthetics, the current visual output looks promising. I'll emphasize improvements such as a white background, soft slate face color, mako palette, highlighted top bar, and larger typography. The current setup should meet the user's request for a "sexy" design. Let's make it concise!

This version is a clear upgrade for the “better colors, make it look better, sexy” goal.

What improved in the latest chart:
- The mako palette gives it a more premium, modern look than the earlier plain blue gradient.
- The soft off-white background (#f8fafc) makes the chart feel cleaner and less default-matplotlib.
- The leading model's bar is subtly highlighted, which draws attention to llama3.1 without making the chart noisy.
- Direct value labels on each bar make it easier to read than relying only on the axis.
- The light x-grid only and removed spines keep it polished and less cluttered.
- Bigger, bolder title text improves visual hierarchy.

The data itself still comes through clearly:
- llama3.1 is the standout leader at 114.8M pulls.
- deepseek-r1 follows at 86M.
- nomic-embed-text and llama3.2 are close together around 70M.
- There’s a noticeable drop after the top four, which the horizontal layout shows nicely.

So visually, the newest chart is doing what you wanted: it looks more intentional, more stylish, and more presentation-ready while still keeping the ranking easy to read.

provide insights

Thinking

Analyzing user insights

I need to answer the user's request for insights based on the notebook's state and original goal. I should analyze the latest outputs and the current state to provide relevant insights, particularly focusing on chart data and observed metrics without including code. It’s essential to be concise, mentioning aspects like top 10 rankings, relative dominance, concentration, and differences between embeddings and chat models. For clarity, I’ll summarize insights using concise bullet points.

A few quick takeaways from the data and the latest chart:

  • llama3.1 is the clear leader with 114.8M pulls. It has a sizable lead over the rest of the library.
  • deepseek-r1 is a strong second at 86M pulls, but still about 28.8M behind llama3.1.
  • There’s a second tier around 70M:
      • nomic-embed-text at 71M
      • llama3.2 at 70M
  • After the top 4, there’s a big drop-off:
      • gemma3 falls to 36.9M
      • then qwen2.5, mistral, and qwen3 are clustered in the high-20M to low-30M range.
  • mistral and qwen3 are essentially tied at 29.4M.
  • The top 10 mix includes:
      • general LLM families (llama, qwen, gemma, mistral)
      • at least one embedding-focused model (nomic-embed-text), which stands out by ranking so high.

From a storytelling angle, the chart suggests:
- A winner-takes-most pattern at the top
- A tight middle cluster
- Strong popularity for both Llama-family models and Qwen/Gemma ecosystems

Visually, the horizontal layout helps show that structure well:
- easy to compare the top leader,
- easy to see the steep drop after rank 4,
- and easy to spot near-ties in the middle.

If you want, the next nice step is to compute a few summary stats like:
- top 10 share of all pulls,
- gap between ranks,
- or brand/family-level grouping (llama, qwen, gemma, etc.).
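
A rough sketch of two of those summary stats follows, using the top-10 values from the scraped table (in millions of pulls). The family rule here, taking the leading run of letters in the model name, is an assumption made for illustration and will misgroup some names. `top_n_share` is a hypothetical helper; for the true share of all pulls it would need the full 229-row `df` rather than this top-10 extract.

```python
import pandas as pd

# Top-10 values copied from the scraped table, in millions of pulls.
top10 = pd.DataFrame(
    {
        "model": [
            "llama3.1", "deepseek-r1", "nomic-embed-text", "llama3.2", "gemma3",
            "qwen2.5", "mistral", "qwen3", "llama3", "gemma2",
        ],
        "pulls_m": [114.8, 86.0, 71.0, 70.0, 36.9, 30.8, 29.4, 29.4, 24.0, 23.7],
    }
)

# Assumption: the family is the leading run of lowercase letters,
# e.g. "llama3.1" -> "llama", "deepseek-r1" -> "deepseek".
top10["family"] = top10["model"].str.extract(r"^([a-z]+)", expand=False)

family_totals = (
    top10.groupby("family")["pulls_m"].sum().sort_values(ascending=False)
)
print(family_totals)


def top_n_share(frame, n=10, column="pulls"):
    """Hypothetical helper: fraction of total pulls held by the n largest rows."""
    return frame[column].nlargest(n).sum() / frame[column].sum()


# Share of the top 3 within this top-10 extract only (not the whole library).
print(f"Top 3 share of top 10: {top_n_share(top10, n=3, column='pulls_m'):.1%}")
```

In the notebook, `top_n_share(df, n=10)` on the full scraped table would give the top-10 share of all pulls listed on the library page.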

Made with MLJAR