# I Pasted the Prompt Twice and the Accuracy Changed

> Author: Tony Lee
> Published: 2026-02-20
> URL: https://tonylee.im/hi/blog/repeat-prompt-twice-llm-accuracy-google-research/
> Reading time: 5 minutes
> Language: hi
> Tags: ai, llm, prompt-engineering, google-research, performance

## Canonical

https://tonylee.im/hi/blog/repeat-prompt-twice-llm-accuracy-google-research/

## Rollout Alternates

en: https://tonylee.im/en/blog/repeat-prompt-twice-llm-accuracy-google-research/
ko: https://tonylee.im/ko/blog/repeat-prompt-twice-llm-accuracy-google-research/
ja: https://tonylee.im/ja/blog/repeat-prompt-twice-llm-accuracy-google-research/
zh-CN: https://tonylee.im/zh-CN/blog/repeat-prompt-twice-llm-accuracy-google-research/
zh-TW: https://tonylee.im/zh-TW/blog/repeat-prompt-twice-llm-accuracy-google-research/

## Description

The cheapest way to improve LLM performance, tested by Google Research on 7 models. No extra training, no prompt design. Just copy and paste.

## Summary

I Pasted the Prompt Twice and the Accuracy Changed is part of Tony Lee's ongoing coverage of AI agents, developer tools, startup strategy, and AI industry shifts.

## Outline

- An inconvenient truth about attention
- What happens when you paste twice
- Why not three times
- When it doesn't work
- Calculating the cost
- Paper

## Content

Last month I was debugging a production issue. It was an LLM-based pipeline that classified documents. Accuracy was stuck at around 70%. I swapped the model, tuned the temperature, and made the system prompt more detailed. Nothing helped.

Then someone suggested pasting the prompt at both the start and the end of the context. I laughed. But I tried it, and accuracy went to 84%.

At the time I didn't know about this Google Research paper. Now that I've read it, I understand the actual mechanism behind that "hack".

## An inconvenient truth about attention

One thing about Transformers that documentation rarely states prominently: they don't read context uniformly. With long input, the model's attention concentrates on the beginning and the end of the context. The middle is often underweighted.

This is called the "lost in the middle" problem, and earlier research had already observed it. This Google paper looked specifically at how well a model follows an instruction or prompt that gets buried somewhere in the context.

The answer: not as well as you'd expect.

When the prompt is very long, or a lot of data follows it, the model's attention drifts away from the original instruction. It processes the data but gives the instruction less weight.

## What happens when you paste twice

The paper's core finding is simple: give the instruction at the start of the context and again at the end.

It sounds naively stupid. The context grows. Cost goes up. But the model's attention now has two anchor points. The instruction is at the start, so the model knows what to do. And right before it starts generating, the instruction is visible again. The signal stays strong in both places.
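A minimal sketch of that layout, assuming the prompt is a plain string made of one instruction and one block of context. The function name `build_repeated_prompt` and the blank-line separator are illustrative choices, not something taken from the paper.

```python
def build_repeated_prompt(instruction: str, context: str, repeat: bool = True) -> str:
    """Put the instruction before the context and, optionally, again after it."""
    instruction = instruction.strip()
    parts = [instruction, context.strip()]
    if repeat:
        # The second copy lands right before the model starts generating.
        parts.append(instruction)
    # Blank lines between parts are an assumption; any clear separator works.
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_repeated_prompt(
        "Classify the document as INVOICE, CONTRACT, or OTHER. Reply with one word.",
        "Payment is due within 30 days of the invoice date.",
    )
    print(prompt)
```

Keeping `repeat` as a flag makes it easy to compare both layouts on the same data before committing to one.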

Google Research tested this on seven models: Gemini 1.5 Flash, Gemini 1.5 Pro, Claude 3 Haiku, Claude 3.5 Sonnet, GPT-4o Mini, GPT-4o, and Llama 3.1 405B. The tasks were summarization, question answering, and classification.

In the results, repetition improved performance consistently. On some tasks the improvement was marginal, on others significant. But there was no case where repetition hurt performance.

The biggest benefit came where the context was long and the task was instruction-heavy, that is, where the model had to produce a specific format under specific constraints.

## Why not three times

The paper also looked at what happens when the prompt is repeated three or more times.

Diminishing returns set in. Going from two copies to three adds very little improvement, while context window usage keeps growing. The cost-benefit calculation starts to turn against you.

The sweet spot is two copies: one at the start, one at the end. That is the paper's recommendation, and it makes practical sense too.

## When it doesn't work

This technique isn't equally effective in every situation. The paper acknowledges this honestly.

In short contexts, where the instruction is already dominant, the marginal benefit of repetition is close to zero. If your entire prompt is 200-300 tokens, repeating it won't make a measurable difference.

On reasoning-heavy tasks, where the model has to think step by step, repetition was less effective. The paper's title says as much: the technique is most relevant for "Non-Reasoning LLMs". It works on tasks that are primarily instruction following and information extraction.

Reasoning models, which handle chain-of-thought natively, have an attention mechanism that works differently. For them this technique matters less.
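These two caveats can be folded into a small guard in front of the prompt builder. This is only a sketch: the function name and the `min_context_tokens` cutoff of 1000 are hypothetical, chosen for illustration, and should be tuned against your own evaluation data.

```python
def should_repeat_instruction(
    context_tokens: int,
    is_reasoning_model: bool,
    min_context_tokens: int = 1000,  # hypothetical cutoff, not a number from the paper
) -> bool:
    """Heuristic: repeat only for non-reasoning models with long contexts."""
    if is_reasoning_model:
        # The article reports less benefit for reasoning models.
        return False
    # Short prompts already keep the instruction dominant.
    return context_tokens >= min_context_tokens


if __name__ == "__main__":
    print(should_repeat_instruction(250, is_reasoning_model=False))   # short prompt
    print(should_repeat_instruction(6000, is_reasoning_model=False))  # long RAG context
    print(should_repeat_instruction(6000, is_reasoning_model=True))   # reasoning model
```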

## Calculating the cost

Including the instruction twice increases input tokens. That is a real cost.

But here is the comparison: would you rather spend hours redesigning your prompt, add few-shot examples that take even more tokens, or switch to a bigger model that costs more per token? Compared with all of these alternatives, simple repetition is very cheap.

In a typical RAG pipeline, where the retrieved context runs to several thousand tokens, an extra 200-300 tokens of instruction overhead is negligible. And if that raises accuracy by 10-15%, the trade-off is obvious.
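To put a number on "negligible", here is a rough calculation of the extra input tokens. The token counts are made-up example values (a 250-token instruction and a 6,000-token retrieved context), and `repetition_overhead` is a hypothetical helper, not part of any library.

```python
def repetition_overhead(instruction_tokens: int, context_tokens: int, copies: int = 2) -> float:
    """Fractional increase in input tokens when the instruction appears `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    baseline = instruction_tokens + context_tokens   # instruction included once
    extra = instruction_tokens * (copies - 1)        # additional copies only
    return extra / baseline


if __name__ == "__main__":
    # Example values only: 250-token instruction, 6,000-token context.
    for copies in (2, 3):
        share = repetition_overhead(250, 6000, copies)
        print(f"{copies} copies: +{share:.1%} input tokens")
```

With these example numbers the second copy adds about 4% to the input, and each further copy adds the same amount again, which is why the overhead only starts to matter when the instruction itself is a large share of the prompt.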

Implementing it in production is trivial too. You add one line to your prompt construction. No infrastructure change, no re-training, no A/B testing framework.

## Paper

[Prompt Repetition Improves Non-Reasoning LLMs](https://arxiv.org/abs/2512.14982)

What makes the paper worth reading is that it is methodologically clean. Seven models, multiple task types, consistent evaluation. And it says honestly where the technique works and where it doesn't.

For me personally, this paper was a validation. There was a real mechanism behind the "stupid hack" that had worked in production. And now that I understand that mechanism, I can decide which pipelines should use repetition and which shouldn't.
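One way to make that per-pipeline decision is to score both layouts on a small labeled sample. The sketch below assumes a classification task and a `model` callable that takes a prompt string and returns a label. `keyword_stub` is a placeholder standing in for a real LLM call; it ignores prompt layout, so its scores say nothing about real models.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (document text, expected label)


def build_prompt(instruction: str, document: str, repeat: bool) -> str:
    parts = [instruction, document]
    if repeat:
        parts.append(instruction)  # second copy after the document
    return "\n\n".join(parts)


def accuracy(model: Callable[[str], str], instruction: str,
             examples: Sequence[Example], repeat: bool) -> float:
    """Share of examples where the model's label matches the expected label."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(
        model(build_prompt(instruction, doc, repeat)).strip() == label
        for doc, label in examples
    )
    return correct / len(examples)


def keyword_stub(prompt: str) -> str:
    # Placeholder for a real LLM call; replace with your own client.
    return "billing" if "payment" in prompt.lower() else "other"


if __name__ == "__main__":
    data = [
        ("Payment is due in 30 days.", "billing"),
        ("Meeting moved to Friday.", "other"),
        ("Refund issued for the duplicate charge.", "billing"),
    ]
    instruction = "Label the document as billing or other."
    for repeat in (False, True):
        score = accuracy(keyword_stub, instruction, data, repeat)
        print(f"repeat={repeat}: accuracy={score:.2f}")
```

Swapping `keyword_stub` for a function that calls your actual model turns this into a quick before-and-after check on your own documents.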

## Related URLs

- Author: https://tonylee.im/en/author/
- Publication: https://tonylee.im/en/blog/about/
- Related article: https://tonylee.im/hi/blog/medvi-two-person-430m-ai-compressed-funnel/
- Related article: https://tonylee.im/hi/blog/claude-code-layers-over-tools-2026/
- Related article: https://tonylee.im/hi/blog/codex-inside-claude-code-openai-plugin-strategy/

## Citation

- Author: Tony Lee
- Site: tonylee.im
- Canonical URL: https://tonylee.im/hi/blog/repeat-prompt-twice-llm-accuracy-google-research/

## Bot Guidance

- This file is intended for AI agents, search assistants, and text-mode retrieval.
- Prefer citing the canonical article URL instead of this text endpoint.
- Use the rollout alternates when you need the same article in another prioritized language.

---

Author: Tony Lee | Website: https://tonylee.im
For more articles, visit: https://tonylee.im/hi/blog/
This content is original and authored by Tony Lee. Please attribute when quoting or referencing.