While large language models (LLMs) have captured the spotlight in recent years, their smaller counterparts, small language models (SLMs), have often been overshadowed. Despite their widespread adoption in smart devices, SLMs have received significantly less academic attention.
This disparity is largely due to the focus on achieving artificial general intelligence (AGI) with LLMs, which are typically deployed in data centers and cloud environments. In contrast, SLM research aims to make machine intelligence more accessible, affordable, and efficient for everyday tasks.
This article explores five of the most promising SLMs and discusses the potential for further advances in this often-overlooked area of AI research.
1. Meta
Meta introduced OPT and Galactica in 2022.
OPT, which stands for Open Pre-trained Transformer, was trained predominantly on English text, with a small amount of non-English data present through the CommonCrawl corpus. Like GPT-3, it is a decoder-only model trained with the same self-supervised causal language modeling objective: predicting the next word in a sequence. The small variants covered here come in four sizes: 125M, 350M, 1.3B, and 2.7B parameters.
from transformers import pipeline

# Load the smallest OPT checkpoint into a text-generation pipeline
generator = pipeline("text-generation", model="facebook/opt-125m")
generator("What are we having for dinner?")
The GALACTICA models were trained on a huge corpus of scientific text, so you can think of them as well-read scientists. They handle a range of scientific tasks: citation prediction, answering science questions, solving math problems, summarizing complex topics, generating documents, predicting molecular properties, and extracting entities. The Papers with Code team at Meta AI created these models to study how language models can automatically organize scientific knowledge.
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
model = OPTForCausalLM.from_pretrained("facebook/galactica-125m")

# [START_REF] is Galactica's citation-prediction token: the model
# completes the reference for the work being cited
input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
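Galactica defines other task-specific control tokens as well. As a rough sketch, assuming the <work> token described in the Galactica paper (which asks the model to show step-by-step reasoning), you could prompt the same checkpoint like this; on a 125M model, expect modest output quality:

# Sketch: <work> is Galactica's step-by-step reasoning token (per the paper)
input_text = "Question: What is the integral of x^2 ? <work>"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))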
2. Google
Gemma is a family of lightweight open models from Google, built on the same research and technology behind the Gemini models. Gemma models excel at understanding and generating text: they can answer questions, summarize complex information, and reason through problems.
import torch
from transformers import pipeline
pipe = pipeline(
    "text-generation",
    model="google/gemma-2-2b",
    device="cuda",  # replace with "mps" to run on a Mac device
)
text = "Once upon a time,"
outputs = pipe(text, max_new_tokens=256)
response = outputs[0]["generated_text"]
print(response)
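Gemma also ships instruction-tuned variants. As a sketch, assuming the google/gemma-2-2b-it checkpoint and a recent transformers version whose pipelines accept chat-style message lists directly:

chat = pipeline(
    "text-generation",
    model="google/gemma-2-2b-it",
    device="cuda",  # replace with "mps" on a Mac, as above
)
messages = [{"role": "user", "content": "Explain small language models in one sentence."}]
out = chat(messages, max_new_tokens=64)
# For chat inputs, generated_text holds the full conversation;
# the last message is the assistant's reply
print(out[0]["generated_text"][-1]["content"])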
Google also released RecurrentGemma, another family of open language models, but with a twist: it is built on a recurrent architecture (Griffin) developed at Google. Like Gemma, RecurrentGemma models are strong text generators and can tackle question answering, summarization, and reasoning. The recurrent design is the key difference: it needs less memory and runs faster when generating long sequences, which makes these models especially attractive for those with limited resources.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/recurrentgemma-2b", device_map="auto")

input_text = "Write me a poem about Machine Learning."
# Move the encoded inputs to whichever device the model was placed on
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
3. TinyLlama
TinyLlama is a 1.1B-parameter model published in 2023. From my point of view, it is the go-to model whenever your problem calls for a truly tiny model.
from transformers import AutoTokenizer
import transformers
import torch

model = "PY007/TinyLlama-1.1B-Chat-v0.5"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# End-of-turn token id used by the TinyLlama chat format
CHAT_EOS_TOKEN_ID = 32002

prompt = "How to get into a good university?"
formatted_prompt = (
    f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
)

sequences = pipeline(
    formatted_prompt,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    num_return_sequences=1,
    repetition_penalty=1.1,
    max_new_tokens=1024,
    eos_token_id=CHAT_EOS_TOKEN_ID,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
4. Stability AI
StableLM Zephyr 3B is a 3-billion-parameter model inspired by HuggingFaceH4's Zephyr recipe. It was trained on a mix of publicly available and synthetic datasets using Direct Preference Optimization (DPO), a technique that aligns a model with human preferences directly from pairs of preferred and rejected responses, without training a separate reward model.
That training makes StableLM Zephyr 3B well-equipped to handle a wide variety of text-based tasks.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('stabilityai/stablelm-zephyr-3b')
model = AutoModelForCausalLM.from_pretrained(
    'stabilityai/stablelm-zephyr-3b',
    device_map="auto"
)

prompt = [{'role': 'user', 'content': 'List 3 synonyms for the word "tiny"'}]
# apply_chat_template wraps the messages in the model's expected chat format
inputs = tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
    return_tensors='pt'
)

tokens = model.generate(
    inputs.to(model.device),
    max_new_tokens=1024,
    temperature=0.8,
    do_sample=True
)
print(tokenizer.decode(tokens[0], skip_special_tokens=False))
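To give a feel for the DPO technique mentioned above, here is a minimal sketch of its loss function (not Stability AI's actual training code): given log-probabilities of a preferred and a rejected answer under the policy being trained and under a frozen reference model, DPO pushes the policy to widen the margin between them.

import torch.nn.functional as F

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit rewards: how much more likely the policy makes each
    # answer compared with the frozen reference model
    chosen_reward = beta * (pi_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (pi_rejected_logp - ref_rejected_logp)
    # Maximize the probability that the chosen answer beats the rejected one
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()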
5. Databricks
Databricks' dolly-v2-3b is an instruction-following language model developed on the Databricks machine learning platform and licensed for commercial use, so it is ready for real-world tasks.
It can handle brainstorming, classification, question answering, text generation, information extraction, and summarization. Even though it is not the newest model available, dolly-v2-3b stands out for the quality of its instruction following.
If you need more capacity, Dolly v2 is also available in larger sizes: dolly-v2-7b with 6.9 billion parameters and dolly-v2-12b with 12 billion parameters.
import torch
from transformers import pipeline
generate_text = pipeline(
    model="databricks/dolly-v2-3b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])
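Note that trust_remote_code=True is needed here because the Dolly repository ships its own instruction-following pipeline code, which transformers downloads and runs alongside the model weights.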
In conclusion, while large language models have rightfully garnered significant attention and resources, it's important to recognize the value and potential of their smaller counterparts, small language models (SLMs).
SLMs have the power to bring machine intelligence to everyday devices and make it more accessible and affordable for a wide range of users.
By exploring and advancing SLM research, we can unlock new possibilities and ensure that the benefits of AI are not limited to data centers and cloud environments.
Thank you for reading this article; I hope it added something to your knowledge bank!