📚 Weekly AI Paper Digest

Period: 2026-05-04 ~ 2026-05-09 | Selection: the five papers that drew the most attention this week


๐Ÿ† ์ด๋ฒˆ ์ฃผ Top 5

| Rank | Paper | ⬆️ | Deep Dive |
|------|-------|----|-----------|
| 🥇 | MolmoAct2: Action Reasoning Models for R… | 266 | DD-082 |
| 🥈 | From Context to Skills: Can Language Mod… | 145 | DD-083 |
| 🥉 | Stream-R1: Reliability-Perplexity Aware … | 117 | DD-084 |
| 4. | RLDX-1 Technical Report | 101 | DD-085 |
| 5. | ARIS: Autonomous Research via Adversaria… | 99 | DD-086 |

๐Ÿ” ์ด๋ฒˆ ์ฃผ ํŠธ๋ Œ๋“œ

ํ•ต์‹ฌ ํ‚ค์›Œ๋“œ

  • VLA (Vision-Language-Action) models: fused models that control a robot's physical actions from language and visual input. Research aimed at deployment in real environments is growing rapidly.
  • Real-world deployment: attempts to go beyond benchmark scores and address the latency, complexity, and reliability problems of the physical world.
  • Autonomous research agents: refinement of the "harness" structures and collaboration schemes that let LLMs carry out research and solve problems on their own.
  • Reliability-aware distillation: a technique for improving the training efficiency of generative models (video generation, for example) that weights the teacher model's outputs by their reliability instead of accepting them uncritically.
  • Skill extraction and learning (skill learning): an approach in which a language model extracts rules or procedures from a complex context as "skills" to strengthen its problem-solving ability.
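The reliability-aware distillation keyword above can be illustrated with a minimal sketch. This is not the method of Stream-R1 or any other paper in this digest. It only shows the general idea of down-weighting teacher outputs that are judged less reliable. The function names, the per-sample KL loss, and the assumption that a reliability score in [0, 1] is already available for each sample are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def reliability_weighted_distillation_loss(teacher_logits, student_logits, reliabilities):
    """Weighted mean of per-sample KL(teacher || student).

    `reliabilities` holds one hypothetical score in [0, 1] per sample;
    a score of 0 removes that teacher output from the loss entirely.
    """
    if not (len(teacher_logits) == len(student_logits) == len(reliabilities)):
        raise ValueError("all inputs must contain one entry per sample")
    total_weight = sum(reliabilities)
    if total_weight == 0:
        return 0.0  # nothing trustworthy to learn from in this batch
    weighted_sum = 0.0
    for t, s, w in zip(teacher_logits, student_logits, reliabilities):
        weighted_sum += w * kl_divergence(softmax(t), softmax(s))
    return weighted_sum / total_weight


if __name__ == "__main__":
    teacher = [[2.0, 0.0], [0.0, 2.0]]
    student = [[2.0, 0.0], [2.0, 0.0]]
    # The second teacher output is treated as unreliable and ignored.
    print(reliability_weighted_distillation_loss(teacher, student, [1.0, 0.0]))
    # Both teacher outputs are trusted equally.
    print(reliability_weighted_distillation_loss(teacher, student, [1.0, 1.0]))
```

Normalizing by the total weight keeps the loss scale comparable across batches with different amounts of trusted data. How the reliability scores are produced is the hard part and is left out of this sketch.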

Common Themes

This week's papers show AI models moving beyond simply processing or generating information toward **"actionable AI" that acts in real environments or solves complex problems autonomously**. The effort is most visible in robot control (VLA) and autonomous research agents, where the goal is to apply a model's intelligence efficiently and reliably to physical tasks or long-running research processes.

Points of Note

In robotics, techniques that combine "action reasoning" with complex memory and motion-awareness capabilities (MolmoAct2, RLDX-1) are drawing attention as a way to make VLA models more practical. In the training of generative and language models, the focus is shifting from simply collecting more data to more refined optimization techniques (Stream-R1, From Context to Skills) that judge which information is more valuable (reliability) or extract "core skills" in order to maximize training efficiency and reasoning performance.

Practical Implications

Developers in robotics and automation should watch the trend toward open-source VLA models and the latency optimization techniques for real-time control, and design robot control systems that can be deployed on actual industrial sites. AI researchers and engineers should look beyond parameter count when improving model performance and actively adopt training-efficiency algorithms such as **"reliability weighting of data" and "structures for extracting skills from context"**. When building autonomous agent systems, they should recognize that the "harness" architecture, which manages how the agent stores and retrieves information, matters more to performance than the model itself, and focus on system-level design.


📑 Paper Summaries

🥇 1. MolmoAct2: Action Reasoning Models for Real-world Deployment

arXiv: 2605.02881 | ⬆️ 266 | → View Deep Dive | Tags: vla embodied-ai robotics open-source molmoact2 flow-matching reasoning

๊ธฐ์กด์˜ ํ์‡„์ ์ด๊ฑฐ๋‚˜ ๊ณ ์„ฑ๋Šฅ ํ•˜๋“œ์›จ์–ด๋ฅผ ์š”๊ตฌํ•˜๋˜ ๋ชจ๋ธ๋“ค๊ณผ ๋‹ฌ๋ฆฌ, ์‹ค์ œ ํ˜„์žฅ ๋ฐฐ์น˜๋ฅผ ๋ชฉํ‘œ๋กœ ํ•˜๋Š” ์™„์ „ ๊ฐœ๋ฐฉํ˜•์ด๊ณ  ํšจ์œจ์ ์ธ ํ–‰๋™ ์ถ”๋ก  ๋ชจ๋ธ์„ ์ œ์‹œํ•˜์—ฌ ๋กœ๋ด‡์˜ ์ผ๋ฐ˜ํ™” ๊ฐ€๋Šฅ์„ฑ๊ณผ ์‹ค์šฉ์„ฑ์„ ํš๊ธฐ์ ์œผ๋กœ ๋†’์˜€์Šต๋‹ˆ๋‹ค.

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


🥈 2. From Context to Skills: Can Language Models Learn from Context Skillfully?

arXiv: 2604.27660 | ⬆️ 145 | → View Deep Dive | Tags: context-learning self-play llm-agents skill-augmentation reasoning automous-learning ctx2skill

์ด ๋…ผ๋ฌธ์€ ๋ณต์žกํ•œ ๋งฅ๋ฝ์—์„œ ์ธ๊ฐ„์˜ ๊ฐœ์ž… ์—†์ด๋„ ์–ธ์–ด ๋ชจ๋ธ์ด ์Šค์Šค๋กœ ํ•„์š”ํ•œ ์ง€์‹๊ณผ ๊ทœ์น™์„ ์ถ”์ถœํ•˜์—ฌ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๋Šฅ๋ ฅ์„ ๊ฐ–์ถ”๋„๋ก ๋งŒ๋“  ์ž๊ฐ€ ์ง„ํ™”(Self-evolving) ํ”„๋ ˆ์ž„์›Œํฌ์ธ Ctx2Skill์„ ์ œ์‹œํ–ˆ๋‹ค๋Š” ์ ์—์„œ ๋งค์šฐ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


🥉 3. Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

arXiv: 2605.03849 | ⬆️ 117 | → View Deep Dive | Tags: stream-r1 video-generation knowledge-distillation reward-modeling ai-efficiency diffusion-models computer-vision

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


4. RLDX-1 Technical Report

arXiv: 2605.03269 | ⬆️ 101 | → View Deep Dive | Tags: rldx-1 vla multi-modal-learning robotics dexterous-manipulation synthetic-data transformer

This work matters because it adds functional capabilities such as motion awareness, long-term memory, and physical sensing to the intelligent understanding of existing vision-language-action models, yielding a general-purpose robot policy capable of human-like dexterous manipulation in real environments.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


5. ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

arXiv: 2605.03042 | ⬆️ 99 | → View Deep Dive | Tags: autonomous-research multi-agent adversarial-collaboration ml-automation llm-agents peer-review system-architecture

๋‹จ์ผ ๋ชจ๋ธ์ด ์Šค์Šค๋กœ ์ˆ˜ํ–‰ํ•˜๊ณ  ๊ฒ€์ฆํ•˜๋Š” ๊ธฐ์กด ๋ฐฉ์‹์˜ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด, ์„œ๋กœ ๋‹ค๋ฅธ ๋ชจ๋ธ ๊ณ„์—ด์ด ๋Œ€๋ฆฝ์ ์œผ๋กœ ํ˜‘์—…ํ•˜๋Š” ์—„๊ฒฉํ•œ ๊ฒ€์ฆ ์‹œ์Šคํ…œ์„ ํ†ตํ•ด ์žฅ๊ธฐ์ ์ธ ๋จธ์‹ ๋Ÿฌ๋‹ ์—ฐ๊ตฌ์˜ ์‹ ๋ขฐ์„ฑ์„ ํ™•๋ณดํ•œ ARIS ์‹œ์Šคํ…œ์„ ์ œ์‹œํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


๐Ÿ“… ์ƒ์„ฑ์ผ: 2026-05-10 | ๐Ÿค– GLM-4.7 Weekly Digest