Summarization
#1, opened by MCFred
Great release!
Do you have an example prompt which you have used successfully for summarization?
Hi @MCFred
Since T5Gemma 2 is an encoder-decoder model, you can prompt it directly like this:
Prompt: "summarize: [Your long document or text goes here]", or
Prompt: "Summarize the following document into a concise paragraph focusing on the key technical findings:
[Insert Document]"
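For reference, the two prompt styles suggested above can be assembled as plain strings before they are handed to the processor. This is only a minimal sketch: `build_prompt` and its `style` argument are hypothetical names introduced for illustration, not part of transformers or of the T5Gemma 2 release.

```python
def build_prompt(document: str, style: str = "prefix") -> str:
    """Assemble one of the two suggested prompt styles (hypothetical helper)."""
    document = document.strip()
    if style == "prefix":
        # Short task-prefix form: "summarize: <text>"
        return f"summarize: {document}"
    if style == "instruction":
        # Longer natural-language instruction followed by the document
        return (
            "Summarize the following document into a concise paragraph "
            "focusing on the key technical findings:\n"
            f"{document}"
        )
    raise ValueError(f"unknown style: {style!r}")


if __name__ == "__main__":
    print(build_prompt("Coffee was first discovered in Ethiopia.", "prefix"))
```

The resulting string would then be passed as `text=` to the processor in the usual way.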
Thanks
I have created this example script testing out the different models with different prompts, but the output is either random noise, nothing, or simply the input without any changes. What am I doing wrong?
The script can be run with uv run script.py
# /// script
# requires-python = ">=3.10"
# dependencies = [
# "transformers[torch]==5.0.0rc3",
# "pillow",
# ]
# ///
from transformers import AutoProcessor, AutoModelForSeq2SeqLM
text = """
The Global Journey of the Coffee Bean
Though millions of people start their day with a fresh cup of coffee, few stop to consider the complex history behind the brew. Legend has it that coffee was first discovered in the 9th century by an Ethiopian goat herder named Kaldi, who noticed his flock became unusually energetic after eating berries from a certain tree. From the Ethiopian highlands, the knowledge of these "magic beans" spread to the Arabian Peninsula, where coffee was first roasted and brewed as we know it today.
By the 17th century, coffee had made its way to Europe. Despite initial pushback from some religious leaders who dubbed it the "bitter invention of Satan," the drink quickly became a staple of social life. "Coffeehouses" began popping up in London, Paris, and Amsterdam, becoming known as "penny universities" because for the price of a cup of coffee, one could engage in high-level intellectual discussion and debate. These hubs played a significant role in the development of the Enlightenment, as they provided a sober alternative to the alcohol-heavy atmosphere of local taverns.
Today, coffee is much more than a social lubricant; it is a massive global commodity. It is the second most traded product in the world, surpassed only by crude oil. The industry supports the livelihoods of approximately 25 million smallholder farmers across the "Bean Belt"—the equatorial region between the Tropics of Cancer and Capricorn. However, the industry faces modern challenges, including the volatile price of beans on the global market and the increasing threat of climate change, which impacts the delicate environments where high-quality Arabica beans thrive.
"""
prompt_templates = {
'simple': 'Summarize:\n\n',
'concise': 'Provide a short, concise summary of the following transcript in english. Focus on the main points only:\n\n',
'detailed': 'Provide a detailed summary of the following transcript in english. Keep all relevant information, removing only filler words and repetitions:\n\n',
}
for model_size in ('270m', '1b', '4b'):
    model_path = f"google/t5gemma-2-{model_size}-{model_size}"
    processor = AutoProcessor.from_pretrained(model_path)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
    model = model.to('cuda')
    for prompt_type in ('simple', 'concise', 'detailed'):
        prompt = prompt_templates[prompt_type] + text
        print('=' * 80)
        print(model_size, prompt_type)
        model_inputs = processor(text=prompt, return_tensors="pt").to('cuda')
        generation = model.generate(**model_inputs, max_new_tokens=2000, do_sample=False)
        print(processor.decode(generation[0]))
        print('=' * 80)
    del model
    del processor
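To make the three failure modes described above (random noise, nothing, or the input echoed back) easier to tell apart across the nine model/prompt combinations, each decoded output could be labelled automatically. The following is a rough sketch only: `classify_output` and the 0.9 similarity threshold are assumptions made for illustration, and the check says nothing about why a model behaves this way.

```python
from difflib import SequenceMatcher


def classify_output(source: str, output: str, echo_threshold: float = 0.9) -> str:
    """Label a decoded generation as 'empty', 'echo', or 'other' (hypothetical helper)."""
    out_words = output.split()
    if not out_words:
        # Nothing but whitespace came back
        return "empty"
    src_words = source.split()
    # Word-level similarity; autojunk is disabled so long texts are compared fully
    similarity = SequenceMatcher(None, src_words, out_words, autojunk=False).ratio()
    if similarity >= echo_threshold:
        # Output is (nearly) the input repeated verbatim
        return "echo"
    return "other"


if __name__ == "__main__":
    print(classify_output("hello world", "hello world"))
```

In the script above, this could be called with `text` and the decoded string inside the inner loop, printing the label next to the model size and prompt type.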