📚 Weekly AI Paper Digest

Period: 2026-04-13 ~ 2026-04-18 | Selection: the five papers that drew the most attention this week (Top 5)


๐Ÿ† ์ด๋ฒˆ ์ฃผ Top 5

| Rank | Paper | ⬆️ | Deep Dive |
|------|-------|-----|-----------|
| 🥇 | WildDet3D: Scaling Promptable 3D Detecti… | 238 | DD-067 |
| 🥈 | Seedance 2.0: Advancing Video Generation… | 136 | DD-068 |
| 🥉 | The Past Is Not Past: Memory-Enhanced Dy… | 135 | DD-069 |
| 4. | ClawGUI: A Unified Framework for Trainin… | 134 | DD-070 |
| 5. | QuanBench+: A Unified Multi-Framework Be… | 121 | DD-071 |

๐Ÿ” ์ด๋ฒˆ ์ฃผ ํŠธ๋ Œ๋“œ

ํ•ต์‹ฌ ํ‚ค์›Œ๋“œ

  • ์ŠคํŽ˜์ด์…œ ์ธํ…”๋ฆฌ์ „์Šค (Spatial Intelligence): ๋‹จ์ผ ์ด๋ฏธ์ง€๋กœ 3D ๊ณต๊ฐ„์„ ์ดํ•ดํ•˜๊ณ  ๊ฐ์ฒด๋ฅผ ๊ฐ์ง€ํ•˜๋ฉฐ, ์˜คํ”ˆ ์›”๋“œ ํ™˜๊ฒฝ์—์„œ ํ”„๋กฌํ”„ํŠธ๋ฅผ ํ†ตํ•ด ์ž‘๋™ํ•˜๋Š” ๊ธฐ์ˆ .
  • ๋„ค์ดํ‹ฐ๋ธŒ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์ƒ์„ฑ (Native Multi-modal Generation): ํ…์ŠคํŠธ, ์ด๋ฏธ์ง€๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์˜ค๋””์˜ค์™€ ๋น„๋””์˜ค๋ฅผ ํ†ตํ•ฉ์ ์œผ๋กœ ์ƒ์„ฑํ•˜๊ณ  ๋ณต์žกํ•œ ์„ธ๊ณ„๋ฅผ ๋ชจ๋ธ๋งํ•˜๋Š” ์•„ํ‚คํ…์ฒ˜.
  • GUI ์—์ด์ „ํŠธ ์ธํ”„๋ผ (GUI Agent Infrastructure): API๊ฐ€ ์•„๋‹Œ ์‹œ๊ฐ์  ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ํ†ตํ•ด ์†Œํ”„ํŠธ์›จ์–ด๋ฅผ ์ œ์–ดํ•˜๋Š” ์—์ด์ „ํŠธ๋ฅผ ํ•™์Šต ๋ฐ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•œ ํ†ตํ•ฉ ํ”„๋ ˆ์ž„์›Œํฌ.
  • ๋ฉ”๋ชจ๋ฆฌ ๊ธฐ๋ฐ˜ ๊ฐ•ํ™” ํ•™์Šต (Memory-Enhanced RL): ๊ณผ๊ฑฐ์˜ ์‹คํŒจ ํŒจํ„ด์„ ๊ธฐ์–ตํ•˜์—ฌ ๋ณด์ƒ์„ ๋™์ ์œผ๋กœ ์กฐ์ •ํ•˜๊ณ  ์ •์ฑ…์˜ ๋‹ค์–‘์„ฑ์„ ํ™•๋ณดํ•˜๋Š” LLM ํ•™์Šต ๋ฐฉ๋ฒ•.
  • ์ „๋ฌธ ๋ถ„์•ผ ๋ฒค์น˜๋งˆํ‚น (Specialized Benchmarking): ์–‘์ž ์ปดํ“จํŒ…๊ณผ ๊ฐ™์€ ํŠน์ • ๋„๋ฉ”์ธ์—์„œ์˜ ์ฝ”๋“œ ์ƒ์„ฑ ๋Šฅ๋ ฅ์„ ๋‹ค์ค‘ ํ”„๋ ˆ์ž„์›Œํฌ์— ๊ฑธ์ณ ํ‰๊ฐ€ํ•˜๋Š” ์ฒ™๋„.

Common Themes

This week's papers show AI moving beyond text and images to understand and interact with a more complex, realistic world: 3D space, video, audio, and GUI environments. Rather than simply scaling up model size, they focus on **‘system-level and algorithmic refinement to maximize model performance’**, for example by building training infrastructure for agents or improving the reward mechanism in reinforcement learning. They also underline the importance of establishing standards that test whether LLMs are usable in specialized fields such as quantum code generation.

Points of Note

Particularly interesting is that WildDet3D extends the ‘promptable’ concept, used mainly in NLP and 2D vision, to 3D object detection, so that it is designed to work flexibly even in open-world environments with no predefined categories. ClawGUI, for its part, is drawing attention not for modeling capability itself but for addressing the lack of ‘full-stack infrastructure’ that bottlenecks agent research, laying the groundwork for moving on to automating real software.

Practical Implications

Developers and researchers have reached the point where they need to look beyond text-centric LLM development and understand the architecture of multimodal generative models that integrate vision, language, and audio. When building agents, they should also recognize that the key success factor will be not only the model's intelligence but also a training and evaluation environment (infrastructure) that can interact with real applications. Finally, when applying reinforcement learning, they should consider that data-driven reward design based on records of past failures can play an important role in raising model quality.


📑 Paper Summaries

🥇 1. WildDet3D: Scaling Promptable 3D Detection in the Wild

arXiv: 2604.08626 | ⬆️ 238 | → View Deep Dive | Tags: ai-paper ml

This paper presents the first unified geometry-aware architecture that can detect objects in 3D from a monocular image in real-world ("in the wild") settings, including objects not seen during training, using varied prompts such as text or clicks. Its significance lies in extending spatial intelligence to the open world.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


🥈 2. Seedance 2.0: Advancing Video Generation for World Complexity

arXiv: 2604.14148 | ⬆️ 136 | → View Deep Dive | Tags: video-generation diffusion-transformer world-modeling temporal-consistency seedance generative-ai computer-vision

This paper matters because it goes beyond mere visual polish: by modeling the ‘World Complexity’ of the real world, where physical laws and intricate interactions are intertwined, it markedly improves the realism and logical consistency of generated video.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


🥉 3. The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

arXiv: 2604.11297 | ⬆️ 135 | → View Deep Dive | Tags: llm reinforcement-learning reward-shaping exploration clustering memory-system reasoning

This paper is important because it proposes a memory-based framework that remembers past error patterns and feeds them dynamically into reward design, addressing the problem of large language models (LLMs) trained with reinforcement learning becoming fixated on a narrow set of behaviors during training.
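To make the idea concrete, here is a minimal Python sketch of memory-based reward shaping. It is not the paper's algorithm: the `FailureMemory` class, the string-valued failure patterns, and the linear, capped penalty are all assumptions made for illustration. The only point it shows is that a penalty which grows each time the same failure pattern recurs discourages the policy from repeating that failure.

```python
from collections import Counter
from typing import Optional


class FailureMemory:
    """Hypothetical helper: counts how often each failure pattern has occurred."""

    def __init__(self, penalty_scale: float = 0.1, max_penalty: float = 0.5):
        self.counts = Counter()              # failure pattern -> times seen so far
        self.penalty_scale = penalty_scale   # assumed penalty added per repeat
        self.max_penalty = max_penalty       # assumed cap so penalties stay bounded

    def shaped_reward(self, base_reward: float, failure_pattern: Optional[str]) -> float:
        """Return base_reward minus a penalty that grows with repeated failures."""
        if failure_pattern is None:
            # Successful rollout: leave the reward untouched.
            return base_reward
        # Penalty depends on how many times this pattern was seen *before* now.
        penalty = min(self.max_penalty, self.penalty_scale * self.counts[failure_pattern])
        # Remember this failure so the next occurrence is penalized more.
        self.counts[failure_pattern] += 1
        return base_reward - penalty


# Usage: the same mistake costs more each time it is repeated.
memory = FailureMemory()
rewards = [memory.shaped_reward(0.0, "sign_error") for _ in range(3)]
```

In a real system the failure pattern would more plausibly come from clustering rollouts (the tag list mentions clustering) than from a hand-written label; the string key here is only a stand-in.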

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


4. ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

arXiv: 2604.11784 | ⬆️ 134 | → View Deep Dive | Tags: gui-agent reinforcement-learning automation framework mobile-world mllm deployment evaluation

This paper tackles the lack of infrastructure that has been the biggest bottleneck in GUI agent development. It provides the first unified open-source framework in which training, evaluation, and real-world deployment can all be carried out in one place, which makes it highly significant as a step toward trustworthy AI automation that runs on real devices.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


5. QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation

arXiv: 2604.08570 | ⬆️ 121 | → View Deep Dive | Tags: quantum-computing llm-benchmark code-generation qiskit pennylane cirq evaluation-metric kl-divergence

This paper matters because its unified benchmark (QuanBench+) is not tied to a single framework but spans Qiskit, PennyLane, and Cirq, offering the first standardized measure of a language model's underlying quantum reasoning ability, independent of any one framework.
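The tag list mentions kl-divergence, though this digest does not say how QuanBench+ applies it. As a hedged illustration of why a distribution-level metric suits a multi-framework benchmark, the Python sketch below compares measurement counts from a reference circuit with those from a generated circuit. Because it needs only bitstring counts, the same check would apply whether the circuit was written in Qiskit, PennyLane, or Cirq. The function names, the epsilon smoothing, and the sample counts are assumptions, not part of the benchmark.

```python
import math
from typing import Dict


def to_distribution(counts: Dict[str, int]) -> Dict[str, float]:
    """Normalize raw measurement counts (bitstring -> shots) into probabilities."""
    total = sum(counts.values())
    return {bits: n / total for bits, n in counts.items()}


def kl_divergence(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-12) -> float:
    """KL(P || Q) in nats. Outcomes missing from q are smoothed to eps (an assumption)."""
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0.0:
            continue  # terms with zero probability under P contribute nothing
        q_x = max(q.get(outcome, 0.0), eps)
        total += p_x * math.log(p_x / q_x)
    return total


# Made-up counts: a Bell-state reference versus a skewed generated circuit.
reference = to_distribution({"00": 512, "11": 512})
generated = to_distribution({"00": 256, "11": 768})
score = kl_divergence(reference, generated)  # 0.0 would mean identical distributions
```

A lower score means the generated circuit's output statistics are closer to the reference; any pass/fail threshold would have to come from the benchmark itself.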

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


📅 Generated: 2026-04-19 | 🤖 GLM-4.7 Weekly Digest