📚 Weekly AI Paper Digest

Period: 2026-05-25 ~ 2026-05-30
Selection: the five papers that drew the most attention this week (Top 5)


๐Ÿ† ์ด๋ฒˆ ์ฃผ Top 5

| Rank | Paper | ⬆️ | Deep Dive |
|---|---|---|---|
| 🥇 | Gamma-World: Generative Multi-Agent Worl… | 404 | DD-097 |
| 🥈 | SkillOpt: Executive Strategy for Self-Ev… | 207 | DD-098 |
| 🥉 | DVAO: Dynamic Variance-adaptive Advantag… | 132 | DD-099 |
| 4 | LocateAnything: Fast and High-Quality Vi… | 127 | DD-100 |
| 5 | AgentDoG 1.5: A Lightweight and Scalable… | 120 | DD-101 |

๐Ÿ” ์ด๋ฒˆ ์ฃผ ํŠธ๋ Œ๋“œ

Core Keywords

  • Multi-Agent Systems: complex simulation and world modeling in which, rather than a single agent, several agents interact or act simultaneously in a shared environment.
  • Self-Evolving Agents: mechanisms by which an agent improves its own performance without human intervention, optimizing its skills as if they were external weights.
  • Efficient Alignment & Optimization: reinforcement-learning alignment under multiple rewards (multi-reward), lightweight safety frameworks, and faster inference through parallel decoding.

Common Themes

This week's papers focus on complex, dynamic agent ecosystems that go beyond the limits of a single agent, and on how to keep those ecosystems under control. The work goes past simple instruction following: agents are designed to interact in complex (multi-agent) environments or to learn on their own (self-evolution), and algorithms for controlling such highly capable agents safely and efficiently (alignment, optimization, safety frameworks) are proposed alongside them.

Notable Points

It is interesting that **'Gamma-World'** and **'SkillOpt'** define the agent not as a mere execution tool but as a "learning subject" that interacts with its environment and optimizes its own internal state (skills). In addition, as **'LocateAnything'** and **'AgentDoG 1.5'** show, there is a clear technical push not only to raise model performance but also to secure speed and security in real production environments through parallel processing and lightweight design.

Practical Implications

Developers and researchers should now prepare to move beyond building single LLM-based chatbots and design cooperation and competition scenarios among multiple agents. To maximize agent performance, prompt engineering alone is not enough: reinforcement-learning-based alignment techniques (such as DVAO) and self-optimization loops should be actively brought into the model development pipeline, and a pre-deployment process that verifies **inference speed (parallel decoding) and safety (Alignment Framework)** has become essential.


📑 Paper Summaries

🥇 1. Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

arXiv: 2605.28816 | ⬆️ 404 → View Deep Dive
Tags: world-model multi-agent diffusion-transformer simplex-rope video-generation simulation efficient-architecture

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


🥈 2. SkillOpt: Executive Strategy for Self-Evolving Agent Skills

arXiv: 2605.23904 | ⬆️ 207 → View Deep Dive
Tags: skillopt llm-agent text-optimization self-evolving prompt-optimization reinforcement-learning nlp

์ด ๋…ผ๋ฌธ์€ ๊ฑฐ๋Œ€ ์–ธ์–ด ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์ˆ˜์ •ํ•˜์ง€ ์•Š๊ณ ๋„ ํ…์ŠคํŠธ ํ˜•ํƒœ์˜ โ€˜์Šคํ‚ฌ(Skill)โ€˜์„ ๋งˆ์น˜ ์‹ ๊ฒฝ๋ง์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ตœ์ ํ™”ํ•˜๋“ฏ ์•ˆ์ •์ ์ด๊ณ  ์ง€์†์ ์œผ๋กœ ๋ฐœ์ „์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š” ์ตœ์ดˆ์˜ ์ตœ์ ํ™” ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์•ˆํ–ˆ๋‹ค๋Š” ์ ์—์„œ ๋งค์šฐ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


🥉 3. DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

arXiv: 2605.25604 | ⬆️ 132 → View Deep Dive
Tags: llm rlhf grpo multi-reward optimization alignment davo reasoning

DVAO matters because it addresses the training instability that arises when several rewards (multi-reward) are optimized simultaneously during reinforcement-learning alignment of large language models: by adjusting for variance dynamically, it optimizes more stably and efficiently than existing methods.
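To make the variance issue concrete, here is a minimal sketch of one generic way to keep a high-variance reward from dominating a group-relative advantage: standardize each reward separately within the sampled group, then combine. This is an illustration under stated assumptions, not DVAO's formula, which this digest does not give; the function name, the equal default weights, and the `eps` constant are all hypothetical.

```python
from statistics import fmean, pstdev


def per_reward_normalized_advantages(rewards, weights=None, eps=1e-8):
    """Combine several rewards into one advantage per sampled response.

    rewards: list of rows, one row per response, one column per reward.
    Each column is standardized within the group (zero mean, unit variance)
    before the weighted sum, so a reward with a large numeric scale does not
    drown out the others. Assumption: equal weights unless given.
    """
    n_rewards = len(rewards[0])
    if weights is None:
        weights = [1.0 / n_rewards] * n_rewards
    advantages = [0.0] * len(rewards)
    for column, weight in zip(zip(*rewards), weights):
        mean, std = fmean(column), pstdev(column)
        for i, value in enumerate(column):
            # eps keeps the division finite when a reward is constant in the group
            advantages[i] += weight * (value - mean) / (std + eps)
    return advantages


# Four responses scored by a 0/1 correctness reward and a large-scale reward.
group = [[1.0, 100.0], [0.0, 300.0], [1.0, 200.0], [0.0, 400.0]]
advantages = per_reward_normalized_advantages(group)
```

Because each column is rescaled by its own standard deviation, multiplying the second reward by 10 leaves the advantages essentially unchanged, whereas normalizing only the summed reward would let that column dominate.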

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


4. LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

arXiv: 2605.27365 | ⬆️ 127 → View Deep Dive
Tags: vlm object-detection grounding parallel-decoding computer-vision efficiency transformer

๊ธฐ์กด์˜ ์ˆœ์ฐจ์  ํ† ํฐ ์ƒ์„ฑ ๋ฐฉ์‹์ด ๊ฐ€์ง„ ์†๋„์™€ ์ •ํ™•๋„์˜ ํ•œ๊ณ„๋ฅผ, ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค(Bounding Box)๋ฅผ ํ•˜๋‚˜์˜ ๋‹จ์œ„๋กœ ํ•œ ๋ฒˆ์— ํ•ด์„ํ•˜๋Š” ๋ณ‘๋ ฌ ๋””์ฝ”๋”ฉ(Parallel Decoding) ๊ธฐ๋ฒ•์„ ํ†ตํ•ด ํš๊ธฐ์ ์œผ๋กœ ๊ฐœ์„ ํ•˜์—ฌ ์‹ค์‹œ๊ฐ„ ๋น„์ „-์–ธ์–ด ๋ชจ๋ธ์˜ ์‹ค์šฉํ™”๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


5. AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

arXiv: 2605.29801 | ⬆️ 120 → View Deep Dive
Tags: agent-safety alignment lightweight-models data-purification trajectory-analysis rlhf guardrails

To address the security risks of recent open-world AI agents (such as OpenClaw), the paper proposes AgentDoG 1.5, a lightweight alignment framework that matches top closed-source models with as few as 1,000 training examples, laying the groundwork for a safe and scalable agent ecosystem.

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


๐Ÿ“… ์ƒ์„ฑ์ผ: 2026-05-31 | ๐Ÿค– GLM-4.7 Weekly Digest