📚 Weekly AI Paper Digest

Period: 2026-04-20 ~ 2026-04-25
Selection: the Top 5 papers that drew the most attention this week


๐Ÿ† ์ด๋ฒˆ ์ฃผ Top 5

| Rank | Paper | ⬆️ | Deep Dive |
| --- | --- | --- | --- |
| 🥇 | Tstars-Tryon 1.0: Robust and Realistic V… | 244 | DD-072 |
| 🥈 | LLaDA2.0-Uni: Unifying Multimodal Unders… | 227 | DD-073 |
| 🥉 | AgentSPEX: An Agent SPecification and EX… | 153 | DD-074 |
| 4. | Extending One-Step Image Generation from… | 94 | DD-075 |
| 5. | OneVL: One-Step Latent Reasoning and Pla… | 84 | DD-076 |

๐Ÿ” ์ด๋ฒˆ ์ฃผ ํŠธ๋ Œ๋“œ

Core Keywords

  • One-Step Generation: techniques that generate an image or carry out complex reasoning in a single step instead of a multi-step inference process, which sharply cuts inference time
  • Unified Multimodal: handling understanding and generation within one architecture rather than in separate models
  • Agent Specification Language: a framework that defines an agent's behavior and flow in an explicit language rather than in plain prompts, which makes the agent easier to control
  • Real-time Optimization: compressing the reasoning process to reduce latency in settings where real-time performance matters, such as autonomous driving

Common Themes

This week's AI research trends can be summed up as **"maximizing efficiency (speed)" and "structuring systems (control)"**. Researchers are working to remove the inefficiency of existing multi-step generation models and reasoning methods by producing results in a single step (One-step), and they are doing so in image generation and autonomous driving at the same time. In parallel, the work focuses on making AI systems both more capable and more controllable, for example by unifying multimodal capabilities or by explicitly defining an agent's execution flow.

Highlights

The most interesting development is the spread of "one-step" techniques. In image generation, conditioning was extended from class labels to text input, which widens the usefulness of one-step generation. In autonomous driving, the Chain-of-Thought reasoning process was compressed into a latent space to address the bottleneck in real-time processing. In addition, the proposal of a dedicated specification language (AgentSPEX) for reining in the unpredictable behavior of LLM agents suggests that AI is evolving from a simple chatbot into a software system that can be trusted.

Practical Implications

๊ฐœ๋ฐœ์ž์™€ ์—ฐ๊ตฌ์ž๋Š” ์ถ”๋ก  ์†๋„์™€ ๋น„์šฉ ํšจ์œจ์„ฑ์„ ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ๋Š” ์›์Šคํ… ์ƒ์„ฑ ๋ฐ ์ž ์žฌ์  ์ถ”๋ก (Latent Reasoning) ๊ธฐ๋ฒ•์— ์ฃผ๋ชฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ํŠนํžˆ ์„œ๋น„์Šค ๋ ˆ๋ฒจ์—์„œ ์‹ค์‹œ๊ฐ„ ๋ฐ˜์‘ ์†๋„๊ฐ€ ์ค‘์š”ํ•œ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ๊ฐœ๋ฐœํ•œ๋‹ค๋ฉด, ๊ธฐ์กด์˜ ์ž๊ฐ€ํšŒ๊ท€(Autoregressive) ๋ฐฉ์‹ ๋Œ€์‹  ์••์ถ•๋œ ์ถ”๋ก  ๋ฐฉ์‹์„ ๋„์ž…ํ•˜๋Š” ๊ฒƒ์„ ๊ณ ๋ คํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ, ๋ณต์žกํ•œ ์—์ด์ „ํŠธ ์‹œ์Šคํ…œ์„ ๊ตฌ์ถ•ํ•  ๋•Œ๋Š” ๋ฐ˜์‘ํ˜• ํ”„๋กฌํ”„ํŒ…์— ์˜์กดํ•˜๊ธฐ๋ณด๋‹ค ๊ตฌ์กฐํ™”๋œ ์›Œํฌํ”Œ๋กœ์šฐ๋‚˜ ๋ช…์‹œ์ ์ธ ์ œ์–ด ์–ธ์–ด๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์‹œ์Šคํ…œ์˜ ์•ˆ์ •์„ฑ๊ณผ ๋””๋ฒ„๊น… ์šฉ์ด์„ฑ์„ ํ™•๋ณดํ•˜๋Š” ์ „๋žต์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.


📑 Paper Summaries

🥇 1. Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items

arXiv: 2604.19748 | ⬆️ 244 → View Deep Dive
Tags: virtual-try-on fashion-tech image-generation computer-vision generative-ai robustness commercial-ai

์ด ๋…ผ๋ฌธ์€ ์‹ค์ œ ์ƒ์šฉ ํ™˜๊ฒฝ์—์„œ ๋ฐœ์ƒํ•˜๋Š” ๊ทนํ•œ์˜ ์กฐ๊ฑด์—์„œ๋„ ๊ฒฌ๊ณ ํ•˜๊ณ  ์‚ฌ์‹ค์ ์ธ ๊ฒฐ๊ณผ๋ฅผ ๋‚ด๋Š” ๋Œ€๊ทœ๋ชจ ๊ฐ€์ƒ ํ”ผํŒ… ์‹œ์Šคํ…œ์„ ์ œ์•ˆํ•˜์—ฌ ๊ธฐ์ˆ ์˜ ์‹ค์ œ ์ ์šฉ ๊ฐ€๋Šฅ์„ฑ์„ ์ž…์ฆํ–ˆ๊ธฐ์— ๋งค์šฐ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


🥈 2. LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

arXiv: 2604.20796 | ⬆️ 227 → View Deep Dive
Tags: multimodal diffusion-model llm image-generation nlp unified-architecture ai-research

This paper matters because it uses a Discrete Diffusion Large Language Model to unify multimodal understanding and generation in a single framework, presenting a new paradigm in which one model, with no separate models, can both interpret and create text and images.

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


🥉 3. AgentSPEX: An Agent SPecification and EXecution Language

arXiv: 2604.13346 | ⬆️ 153 → View Deep Dive
Tags: llm-agents workflow-orchestration agentspex dsl react-prompting ai-research software-engineering

This paper matters because it proposes AgentSPEX, a dedicated language that separates the workflow of a complex LLM agent from programming code (Python) and defines it with explicit control flow and a modular structure, markedly improving the maintainability and controllability of agent development.
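To make the idea of separating an agent's workflow from the code that runs it more concrete, here is a minimal sketch. It is not AgentSPEX syntax, which this digest does not show: the spec format, the field names (`id`, `call`, `next`, `on_true`, `on_false`), and the stand-in tool functions are all hypothetical, chosen only to illustrate explicit control flow held as data.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical workflow spec: plain data, kept apart from the code that runs it.
# The field names ("id", "call", "next", "on_true", "on_false") are invented here.
WORKFLOW: List[dict] = [
    {"id": "draft", "call": "write", "next": "review"},
    {"id": "review", "call": "check", "on_true": "done", "on_false": "draft"},
    {"id": "done", "call": "finish", "next": None},
]

def write(state: dict) -> None:
    """Stand-in for an LLM drafting call; it only counts attempts."""
    state["attempts"] = state.get("attempts", 0) + 1

def check(state: dict) -> bool:
    """Stand-in for a review step; it accepts the second draft."""
    return state["attempts"] >= 2

def finish(state: dict) -> None:
    """Mark the workflow as finished."""
    state["done"] = True

TOOLS: Dict[str, Callable[[dict], object]] = {
    "write": write,
    "check": check,
    "finish": finish,
}

def run(workflow: List[dict], tools: Dict[str, Callable[[dict], object]],
        state: dict, max_steps: int = 20) -> List[str]:
    """Interpret the spec step by step and return the visited step ids."""
    steps = {step["id"]: step for step in workflow}
    current: Optional[str] = workflow[0]["id"]
    trace: List[str] = []
    while current is not None:
        if len(trace) >= max_steps:
            raise RuntimeError("workflow exceeded max_steps")
        step = steps[current]
        trace.append(current)
        result = tools[step["call"]](state)
        # Branching is explicit in the spec, not buried in prompt text.
        if "on_true" in step:
            current = step["on_true"] if result else step["on_false"]
        else:
            current = step["next"]
    return trace

if __name__ == "__main__":
    print(run(WORKFLOW, TOOLS, {}))
```

Because the loop from `review` back to `draft` is written in the spec, the returned trace shows exactly which path was taken, which is the kind of debuggability the summary points to.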

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


4. Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation

arXiv: 2604.18168 | ⬆️ 94 → View Deep Dive
Tags: one-step-generation text-to-image meanflow flow-matching semantic-representation efficiency blip3o generative-models

์ด๋ฏธ์ง€ ์ƒ์„ฑ์„ ๋‹จ ํ•œ ๋‹จ๊ณ„๋กœ ์™„๋ฃŒํ•˜๋Š” ๊ธฐ์ˆ ์ธ MeanFlow๋ฅผ ๋‹จ์ˆœํ•œ ํด๋ž˜์Šค ๋ถ„๋ฅ˜์—์„œ ์ž์—ฐ์–ด ํ”„๋กฌํ”„ํŠธ๋กœ ํ™•์žฅํ•˜์—ฌ, ์†๋„ ์ €ํ•˜ ์—†์ด๋„ ๋ณต์žกํ•œ ํ…์ŠคํŠธ ์˜๋ฏธ๋ฅผ ๋ฐ˜์˜ํ•œ ๊ณ ํ’ˆ์งˆ ์ด๋ฏธ์ง€ ์ƒ์„ฑ์„ ์ตœ์ดˆ๋กœ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ–ˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


5. OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

arXiv: 2604.18486 | ⬆️ 84 → View Deep Dive
Tags: autonomous-driving vla chain-of-thought latent-reasoning world-model real-time-planning qwen-vl

์ด ๋…ผ๋ฌธ์€ ์ž์œจ์ฃผํ–‰์—์„œ Chain-of-Thought(CoT) ์ถ”๋ก ์˜ ๋†’์€ ์ •ํ™•๋„์™€ ์‹ค์‹œ๊ฐ„ ์ฒ˜๋ฆฌ๊ฐ€ ํ•„์š”ํ•œ ์†๋„ ์‚ฌ์ด์˜ Trade-off๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด, ์‹œ๊ฐ์  ์–ธ์–ด์  ์ถ”๋ก ์„ ์••์ถ•๋œ ์ž ์žฌ ํ† ํฐ์œผ๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ๋‹จ ํ•œ ๋ฒˆ์˜ ๋‹จ๊ณ„(One-step)๋กœ ๋น ๋ฅด๊ณ  ์ •ํ™•ํ•œ ์ฃผํ–‰ ๊ณ„ํš์„ ์ˆ˜๋ฆฝํ•˜๋Š” ์ƒˆ๋กœ์šด ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์‹œํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ“– ์ƒ์„ธ ๋ถ„์„: โ†’ Deep Dive ๋ณด๊ธฐ์—์„œ ์‹ฌ์ธต ๋ถ„์„์„ ํ™•์ธํ•˜์„ธ์š”.


๐Ÿ“… ์ƒ์„ฑ์ผ: 2026-04-26 | ๐Ÿค– GLM-4.7 Weekly Digest