📚 Weekly AI Paper Digest

Period: 2026-05-18 ~ 2026-05-23 | Selection: this week's five most-noticed papers (Top 5)


🏆 This Week's Top 5

| Rank | Paper | ⬆️ | Deep Dive |
| --- | --- | --- | --- |
| 🥇 | CiteVQA: Benchmarking Evidence Attributi… | 262 | DD-092 |
| 🥈 | Code as Agent Harness | 199 | DD-093 |
| 🥉 | Anti-Self-Distillation for Reasoning RL … | 189 | DD-094 |
| 4. | DelTA: Discriminative Token Credit Assig… | 189 | DD-095 |
| 5. | TransitLM: A Large-Scale Dataset and Ben… | 167 | DD-096 |

🔍 This Week's Trends

Key Keywords

  • Trustworthy Reasoning: Research that goes beyond simply getting the answer right, either by proving that the evidence behind an answer is correct (Evidence Attribution) or by making the training process itself more reliable (RL).
  • Code-Based Agents (Code as Agent Harness): A paradigm that treats code not as a mere generated artifact but as the core substrate through which an agent thinks, acts, and models its environment.
  • Refining Reinforcement Learning (Advanced RL for Reasoning): Attempts to build reasoning ability through verifiable rewards while maximizing training efficiency, by analyzing token-level credit assignment and the causes of self-distillation failure.
  • Structure-Independent Intelligence (Map-Free Intelligence): A practical approach in which a language model plans routes directly from large-scale data, without relying on a complex internal map engine or a structured database.

Common Themes

This week's papers move past the stage where an AI model simply "generates an answer" and focus on securing reliability and transparency about how and why that answer was derived. The shared direction is to make the model's reasoning process verifiable: requiring evidence in document analysis, analyzing reward mechanisms in mathematical reasoning, and interacting with the environment through code.

Highlights

CiteVQA points out the limits of existing VQA evaluation and argues that an answer should be counted as wrong if the model reaches it by chance while citing incorrect evidence. The DelTA and Anti-Self-Distillation papers also stand out for their fine-grained approach: rather than only reporting aggregate gains in reinforcement learning performance, they dig into the internal mechanics of the training algorithm, analyzing how the reward affects individual tokens and characterizing mathematically (for example, through mutual information) when self-distillation fails.

Practical Implications

When building RAG (retrieval-augmented generation) or document analysis systems, developers and researchers should design a separate **citation verification process that checks whether the evidence for an answer matches its source**, in addition to measuring final-answer accuracy. When developing agents that require complex reasoning, it is also worth actively using code as an executable interface, and when building training datasets, worth considering data that cultivates purely language-based problem solving with reduced dependence on external tools (map-free).
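
The citation-verification step described above can be sketched as follows. This is a minimal illustration, not the CiteVQA metric: the function names (`tokenize`, `citation_supported`, `score_answer`), the token-overlap heuristic, and the 0.6 threshold are all assumptions made for this example. A real system would likely replace the overlap check with an entailment model or span matching.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a deliberately crude stand-in for a real matcher."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def citation_supported(claim: str, cited_passage: str, threshold: float = 0.6) -> bool:
    """Return True when enough of the claim's tokens appear in the cited passage.

    The threshold is an assumed value chosen for illustration only.
    """
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return False
    overlap = len(claim_tokens & tokenize(cited_passage)) / len(claim_tokens)
    return overlap >= threshold


def score_answer(answer_correct: bool, claim: str, cited_passage: str) -> bool:
    """Count an answer only if it is correct AND its citation supports it."""
    return answer_correct and citation_supported(claim, cited_passage)


claim = "Revenue grew 12 percent in 2023"
good_passage = "In 2023, revenue grew by 12 percent year over year."
bad_passage = "The company was founded in 1998."

print(score_answer(True, claim, good_passage))  # correct answer, supporting citation
print(score_answer(True, claim, bad_passage))   # correct answer, unsupported citation
```

The point of keeping `score_answer` separate is that accuracy and attribution are tracked as two independent signals, so an answer that is right for the wrong reason can be flagged rather than silently counted as a success.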


📑 Per-Paper Summaries

🥇 1. CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

arXiv: 2605.12882 | ⬆️ 262 → View Deep Dive
Tags: citevqa document-intelligence multimodal-llm benchmark hallucination evidence-attribution trustworthy-ai doc-vqa

This paper matters because, to verify the reliability of document understanding models, it proposes a new benchmark that goes beyond plain answer accuracy and also evaluates whether the model accurately cites the specific location in the document that supports its answer.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


🥈 2. Code as Agent Harness

arXiv: 2605.18747 | ⬆️ 199 → View Deep Dive
Tags: llm-agent code-as-harness multi-agent-system software-engineering prompt-engineering ai-orchestration reasoning survey-paper

This paper redefines code not as a mere generated artifact but as an 'Agent Harness', the core infrastructure through which an AI agent reasons, acts, and interacts with its environment. In doing so it offers an important theoretical framework for extending the software engineering paradigm from single-model generation to multi-agent collaboration.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


🥉 3. Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

arXiv: 2605.11609 | ⬆️ 189 → View Deep Dive
Tags: anti-self-distillation reasoning-rl pmi llm math-reasoning rlvr on-policy-learning

This paper matters because it uses statistical analysis to show that, in mathematical reasoning, answer information the model generates itself can actually hinder exploration, and then turns that finding around to propose a new reinforcement learning paradigm that markedly improves training efficiency.
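
For reference, the standard definition of pointwise mutual information between two outcomes $x$ and $y$ is given below. How the paper instantiates $x$ and $y$ is not stated in this summary, so the symbols here are generic placeholders rather than the paper's notation.

```latex
\mathrm{PMI}(x; y) = \log \frac{p(x, y)}{p(x)\,p(y)} = \log \frac{p(y \mid x)}{p(y)}
```

A positive value means $x$ and $y$ co-occur more often than independence would predict, a negative value means less often, and zero means they are independent.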

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


4. DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

arXiv: 2605.21467 | ⬆️ 189 → View Deep Dive
Tags: rlvr llm reasoning credit-assignment reinforcement-learning math-reasoning delta-paper

This paper matters because it gives a theoretical account of how reinforcement learning from verifiable rewards (RLVR) drives token-level learning from sequence-level rewards alone, and builds on that account to propose a new credit assignment technique that effectively improves language models' reasoning ability.
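
To make the credit assignment problem concrete, the sketch below shows the uniform baseline that the summary alludes to: a single sequence-level reward is normalized within a group of sampled responses and then copied to every token of that response. This is not DelTA's method, which the summary does not specify; the function name `uniform_token_advantages` and the group-normalization scheme are assumptions chosen for illustration.

```python
def uniform_token_advantages(rewards: list[float], lengths: list[int]) -> list[list[float]]:
    """Broadcast each response's group-normalized reward to all of its tokens.

    rewards: one scalar (sequence-level) reward per sampled response.
    lengths: number of tokens in each response.
    """
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = variance ** 0.5

    advantages = []
    for reward, n_tokens in zip(rewards, lengths):
        # If every response got the same reward there is no learning signal.
        advantage = 0.0 if std == 0 else (reward - mean) / std
        # Every token receives the same credit, whether or not it mattered.
        advantages.append([advantage] * n_tokens)
    return advantages


# Two sampled responses: one verified correct (reward 1), one incorrect (reward 0).
result = uniform_token_advantages([1.0, 0.0], [3, 2])
print(result)
```

Because every token in a response shares one value, a decisive reasoning step and a filler token are credited identically. Token-level credit assignment methods aim to replace that flat vector with one that discriminates between tokens.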

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


5. TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

arXiv: 2605.22355 | ⬆️ 167 → View Deep Dive
Tags: transitlm route-planning llm nlp spatial-reasoning map-free benchmark transportation

This paper matters because it demonstrates that complex public transit route finding can be solved by learning from data alone, without conventional map data or a routing engine.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


📅 Generated: 2026-05-24 | 🤖 GLM-4.7 Weekly Digest