# In 2026, the Winning AI Strategy Is Just a Loop

> Author: Tony Lee
> Published: 2026-03-19
> URL: https://tonylee.im/hi/blog/ai-loop-repeat-ralph-rlm-autoresearch-2026/
> Reading time: 5 minutes
> Language: hi
> Tags: ai, agents, ralph, rlm, autoresearch, test-time-compute

## Canonical

https://tonylee.im/hi/blog/ai-loop-repeat-ralph-rlm-autoresearch-2026/

## Rollout Alternates

en: https://tonylee.im/en/blog/ai-loop-repeat-ralph-rlm-autoresearch-2026/
ko: https://tonylee.im/ko/blog/ai-loop-repeat-ralph-rlm-autoresearch-2026/
ja: https://tonylee.im/ja/blog/ai-loop-repeat-ralph-rlm-autoresearch-2026/
zh-CN: https://tonylee.im/zh-CN/blog/ai-loop-repeat-ralph-rlm-autoresearch-2026/
zh-TW: https://tonylee.im/zh-TW/blog/ai-loop-repeat-ralph-rlm-autoresearch-2026/

## Description

I built skills, configured subagents, and set up slash commands. Then a single loop running overnight produced better results than all of them. Three loop architectures that actually work.

## Summary

"In 2026, the Winning AI Strategy Is Just a Loop" is part of Tony Lee's ongoing coverage of AI agents, developer tools, startup strategy, and AI industry shifts.

## Outline

- Ralph Loop: A One-Line Bash Command That Pushes Through Failure
- RLM: A Model That Calls Itself to Reason
- autoresearch: 100 Experiments While You Sleep
- Why Repetition Works in AI

## Content

I built skills. I configured subagents and slash commands. Then a single loop running overnight delivered results that the whole setup combined had not.

As of March 2026, the way to get the most out of AI is not a complex pipeline. It is a simple loop that does not stop.

## Ralph Loop: A One-Line Bash Command That Pushes Through Failure

The core command is `while :; do cat PROMPT.md | claude-code ; done`. When the agent finishes its work and tries to exit, a Stop Hook intercepts it and feeds the same prompt back in.

The most important point is that every iteration starts from a fresh context window. Previous work lives only in git history and the file system; the context itself always starts clean. This eliminates the classic problem where an agent loop's quality degrades as the conversation grows longer.

After each pass, what was learned is written to AGENTS.md. The next iteration's agent reads those notes automatically, so the same mistakes should not recur. When a task fails more than 10 times, it is marked as stuck and broken into smaller pieces for retry. Failure itself becomes data. In Huntley's words, "deterministically bad" results feed directly into the next loop's input.
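The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ralph's actual implementation: `Task`, `LoopState`, `run_iteration`, and `ralph_loop` are hypothetical names, the `run_agent` callable stands in for the real agent CLI, and a plain list stands in for AGENTS.md. Splitting a stuck task into smaller pieces is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

MAX_FAILURES = 10  # threshold from the article: more than 10 failures marks a task stuck


@dataclass
class Task:
    name: str
    failures: int = 0
    stuck: bool = False


@dataclass
class LoopState:
    # Hypothetical stand-in for AGENTS.md: notes that survive between iterations.
    notes: List[str] = field(default_factory=list)


def run_iteration(task: Task, state: LoopState,
                  run_agent: Callable[[str, List[str]], Tuple[bool, str]]) -> bool:
    """Run one pass with a fresh context: the agent sees only the task and the notes."""
    ok, lesson = run_agent(task.name, list(state.notes))
    if lesson:
        state.notes.append(lesson)  # learnings persist outside the context window
    if not ok:
        task.failures += 1
        if task.failures > MAX_FAILURES:
            task.stuck = True
    return ok


def ralph_loop(task: Task, state: LoopState,
               run_agent: Callable[[str, List[str]], Tuple[bool, str]],
               max_iterations: int = 50) -> int:
    """Repeat until the task passes, is marked stuck, or the cap is hit.

    Returns the number of iterations used.
    """
    for i in range(1, max_iterations + 1):
        if run_iteration(task, state, run_agent) or task.stuck:
            return i
    return max_iterations
```

Passing the agent in as a callable keeps the loop testable without spending tokens: a fake agent that succeeds only once enough notes have accumulated exercises the same code path.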

To be honest: the first time I ran Ralph, roughly 3 out of 10 loops kept repeating the same error and wasted tokens. Cumulative learning only started to pay off once I designed the prompt carefully so that the right things got written to AGENTS.md. The tool matters less than the prompt design around it.

- [Ralph repository](https://github.com/snarktank/ralph)

## RLM: A Model That Calls Itself to Reason

Give an LLM a long document and its accuracy drops the further into the text it goes. RLM solves this problem in a completely different way.

Instead of handing the model the long prompt directly, the text is loaded into variables in a Python REPL. The model then writes code that slices, searches, and selectively reads those variables, and calls itself again with only the relevant parts. Rather than enlarging the context window, the model decides for itself how to navigate its own context.
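A heavily simplified sketch of the slice, search, and recurse pattern is shown below. It is not the RLM library's API: in RLM the model itself writes the navigation code inside the REPL, whereas here a fixed keyword filter is swapped in for that step. `split_chunks`, `recursive_answer`, the `keyword` parameter, and the `llm` callable are all hypothetical, and the sketch assumes the partial answers are short enough to combine in one final call.

```python
from typing import Callable, List


def split_chunks(text: str, size: int) -> List[str]:
    """Slice the stored text into fixed-size pieces, as code in a REPL could."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def recursive_answer(query: str, context: str,
                     llm: Callable[[str, str], str],
                     keyword: str, max_chars: int = 200) -> str:
    """Answer `query` while each sub-call reads at most `max_chars` of the source text."""
    if len(context) <= max_chars:
        return llm(query, context)  # base case: small enough to read directly
    # Search step (fixed strategy, an assumption): keep chunks mentioning the keyword.
    # A keyword that straddles a chunk boundary would be missed by this naive filter.
    relevant = [c for c in split_chunks(context, max_chars) if keyword in c]
    if not relevant:
        return llm(query, "")
    # Recursive step: one sub-call per relevant chunk.
    partials = [recursive_answer(query, c, llm, keyword, max_chars) for c in relevant]
    # Final step: answer from the partial answers only (assumed to be short).
    return llm(query, "\n".join(partials))
```

The point of the shape is that no single call ever sees the whole document; the long text stays in a variable and only selected slices are passed down.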

RLM with GPT-5-mini scored more than twice as many correct answers as GPT-5 on the OOLONG benchmark. The full trajectory of recursive calls is preserved as code, so you can trace how the model arrived at an answer. Unlike summarization or RAG, which compress information, RLM delegates specific fragments to sub-LM calls. Structurally, no information is lost.

- [RLM repository](https://github.com/alexzhang13/rlm)

## autoresearch: 100 Experiments While You Sleep

Give the agent a train.py and let it change whatever it wants: the architecture, the optimizer, anything. Run training for exactly 5 minutes. If val_bpb (validation bits per byte, lower is better) improves, commit. If not, reset.

Repeat this all night, and by morning you have logs showing which changes helped and which did not. The human only writes the direction in program.md.

The fixed 5-minute time budget is what makes this workable. Whether the agent changes the model size or the batch size, every experiment runs under identical conditions. Fair comparison is the foundation of high-quality iteration. Everything runs on a git branch, so failed experiments disappear with a reset while successful ones accumulate as commits. The morning's git log tells the whole story of the improvement.
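The keep-or-reset rule can be sketched as a small loop. This is an illustration under assumptions, not code from the autoresearch repository: `overnight_loop` is a hypothetical name, and the `run_experiment`, `commit`, and `reset` callables stand in for the real steps of editing and training train.py, committing, and resetting the branch.

```python
from typing import Callable, List, Tuple

TIME_BUDGET_SECONDS = 5 * 60  # the fixed 5-minute budget from the article


def overnight_loop(baseline_bpb: float, n_experiments: int,
                   run_experiment: Callable[[int, int], float],
                   commit: Callable[[int, float], None],
                   reset: Callable[[int], None]) -> Tuple[float, List[int]]:
    """Keep an experiment only if it beats the best val_bpb so far (lower is better).

    Returns the best val_bpb and the indices of the experiments that were kept.
    """
    best = baseline_bpb
    kept: List[int] = []
    for i in range(n_experiments):
        # Every experiment gets the same time budget, so results are comparable.
        val_bpb = run_experiment(i, TIME_BUDGET_SECONDS)
        if val_bpb < best:
            best = val_bpb
            commit(i, val_bpb)  # success accumulates as a commit
            kept.append(i)
        else:
            reset(i)  # failure disappears from the branch
    return best, kept
```

Because every comparison is made against the current best rather than the original baseline, the list of kept experiments reads as a strictly improving sequence, which is what the morning's git log reflects.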

Karpathy's next vision is a distributed research structure like SETI@home, where many agents experiment in different directions and merge their results. It is worth stating that autoresearch currently runs on a single machine, and any experiment that shows no meaningful difference within 5 minutes gets discarded. It is not the right fit for every kind of research.

- [autoresearch repository](https://github.com/karpathy/autoresearch)

## Why Repetition Works in AI

All three tools run on a shared principle. They all exploit test-time compute scaling: spending more computation at inference time improves performance without making the model bigger.
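One common and very small form of test-time compute is best-of-N selection: generate several candidates and keep the one a verifier scores highest. The sketch below uses that technique purely as an illustration of the principle; it is not how Ralph, RLM, or autoresearch are implemented, and `best_of_n` and the `score` callable are hypothetical names.

```python
from typing import Callable, Sequence, Tuple


def best_of_n(candidates: Sequence[str],
              score: Callable[[str], float]) -> Tuple[str, float]:
    """Return the highest-scoring candidate and its score.

    Spending more inference compute means passing in more candidates;
    the best score can only stay the same or go up as N grows.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    scored = [(score(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

The model stays the same size throughout; only the number of attempts and the quality of the verifier change.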

OpenAI's o1 had already demonstrated this principle. Ralph applies it to code quality, RLM to context comprehension, and autoresearch to research.

When three things come together, the output goes beyond simple code:

- A workable idea
- A loop with clear verification conditions
- Enough token budget to run overnight

The 8 hours you spend asleep are someone else's chance at 100 improvements. Of course, not all 100 will succeed. That is fine. The accumulated failures are fuel for the next loop.

## Related URLs

- Author: https://tonylee.im/en/author/
- Publication: https://tonylee.im/en/blog/about/
- Related article: https://tonylee.im/hi/blog/medvi-two-person-430m-ai-compressed-funnel/
- Related article: https://tonylee.im/hi/blog/claude-code-layers-over-tools-2026/
- Related article: https://tonylee.im/hi/blog/codex-inside-claude-code-openai-plugin-strategy/

## Citation

- Author: Tony Lee
- Site: tonylee.im
- Canonical URL: https://tonylee.im/hi/blog/ai-loop-repeat-ralph-rlm-autoresearch-2026/

## Bot Guidance

- This file is intended for AI agents, search assistants, and text-mode retrieval.
- Prefer citing the canonical article URL instead of this text endpoint.
- Use the rollout alternates when you need the same article in another prioritized language.

---

Author: Tony Lee | Website: https://tonylee.im
For more articles, visit: https://tonylee.im/hi/blog/
This content is original and authored by Tony Lee. Please attribute when quoting or referencing.