📚 Weekly AI Paper Digest

Period: 2026-04-06 ~ 2026-04-11 | Selection: this week's Top 5 most-noticed papers


🏆 This Week's Top 5

| Rank | Paper | ⬆️ | Deep Dive |
|---|---|---|---|
| 🥇 | Adam’s Law: Textual Frequency Law on Lar… | 411 | DD-062 |
| 🥈 | GrandCode: Achieving Grandmaster Level i… | 348 | DD-061 |
| 🥉 | Rethinking Generalization in Reasoning S… | 228 | DD-066 |
| 4. | InCoder-32B-Thinking: Industrial Code Wo… | 225 | DD-064 |
| 5. | Video-MME-v2: Towards the Next Stage in … | 225 | DD-063 |

🔍 This Week's Trends

Key Keywords

  • Reasoning & Code Specialization: Most of the work aims to maximize model reasoning ability on hard problems such as competitive programming and industrial code.
  • Generalization & Learning Dynamics: Revisits whether SFT (supervised fine-tuning) merely memorizes or can generalize, and analyzes the optimization and data conditions under which it does.
  • Rise of RL: Agent-based reinforcement learning (RL) is used aggressively to push past human level in coding and problem solving.
  • Robust Evaluation: Points out score inflation in existing benchmarks and proposes stricter evaluation criteria for measuring a model's actual performance and reliability.

Common Themes

This week's papers concentrate on pushing AI performance past human level, particularly in code generation and complex reasoning. Rather than simply scaling up models, they take a close look at how to make efficient use of **reinforcement learning (RL) and high-quality reasoning data (Chain-of-Thought)**, and at what role SFT actually plays.

Points of Note

The prevailing view has been that ‘SFT memorizes and RL generalizes’. **Paper 3 (Rethinking Generalization in Reasoning SFT)** pushes back on this, demonstrating that SFT can also generalize strongly depending on the optimization conditions, which makes it the most interesting result of the week. Two other efforts show how much more capable AI can become in real-world settings: **competitive programming (GrandCode)**, where humans had still held the edge, is taken on at grandmaster level with multi-agent RL, and the **industrial code** work attempts to build a ‘world model’ that understands even hardware constraints.

Practical Implications

To improve reasoning ability, developers and researchers should consider maximizing the potential of SFT first, by revisiting data composition (long CoT, error-focused synthesis, and so on) and optimization strategy, rather than applying RL without a clear rationale. When building coding models, go beyond generating the correct code alone: include engineers' problem-solving processes (reasoning traces) in the training data to strengthen domain-specific reasoning. Finally, when evaluating models, do not settle for a high leaderboard score; check the model's actual utility using the ‘robustness’ indicators proposed by new benchmarks such as Video-MME-v2.
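As a rough illustration of including a reasoning trace in training data, the sketch below packs a problem, an engineer's numbered reasoning steps, and the final code into one prompt/completion record. The function name `build_sft_record`, the `<think>` delimiters, and the record layout are all assumptions made for this example; none of them come from the papers in this digest.

```python
import json


def build_sft_record(problem: str, reasoning_steps: list[str], final_code: str) -> dict:
    """Pack a problem, a reasoning trace, and the final code into one SFT record."""
    if not reasoning_steps:
        raise ValueError("a reasoning trace needs at least one step")
    # Number the steps so the order of the reasoning is explicit in the text.
    trace = "\n".join(f"{i}. {step}" for i, step in enumerate(reasoning_steps, start=1))
    # The <think> ... </think> delimiters are an assumed convention for this sketch.
    completion = f"<think>\n{trace}\n</think>\n{final_code}"
    return {"prompt": problem, "completion": completion}


record = build_sft_record(
    problem="Clamp a sensor reading to the range [0, 255].",
    reasoning_steps=[
        "The target register is 8 bits wide, so values above 255 would overflow.",
        "Negative readings are noise and should map to 0.",
    ],
    final_code="def clamp(x):\n    return max(0, min(255, x))",
)
print(json.dumps(record, indent=2))
```

Keeping the trace and the answer in a single completion string means a standard SFT loss covers both; whether that is the best layout for a given domain is an open design choice.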


📑 Paper Summaries

🥇 1. Adam’s Law: Textual Frequency Law on Large Language Models

arXiv: 2604.02176 | ⬆️ 411 | → View Deep Dive
Tags: llm data-selection frequency-law prompt-engineering fine-tuning nlp efficiency

This paper matters because it proposes a ‘textual frequency law’: using textual expressions that occur more frequently improves performance in both training and inference of large language models. From this it derives new guidance for making prompting and fine-tuning more efficient.
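One way to picture the idea is to pick, among several paraphrases of a prompt, the one whose words are most common in a reference corpus. The sketch below does this with a smoothed mean log word frequency. The scoring rule, the helper names, and the toy corpus are assumptions for illustration only; the digest does not describe how the paper itself measures frequency.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def mean_log_frequency(text: str, counts: Counter, total: int) -> float:
    """Average log relative frequency of the tokens, with add-one smoothing."""
    tokens = tokenize(text)
    if not tokens:
        return float("-inf")
    vocab = len(counts)
    # Add-one smoothing gives unseen words a small nonzero probability.
    return sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens) / len(tokens)


def pick_most_frequent(paraphrases: list[str], corpus: str) -> str:
    """Return the paraphrase whose wording is most frequent in the corpus."""
    counts = Counter(tokenize(corpus))
    total = sum(counts.values())
    return max(paraphrases, key=lambda p: mean_log_frequency(p, counts, total))


# Toy reference corpus (hypothetical); a real one would be far larger.
corpus = "please sort the list. sort the numbers in the list. the list is short"
print(pick_most_frequent(["arrange the sequence", "sort the list"], corpus))
# prints: sort the list
```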

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


🥈 2. GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

arXiv: 2604.02721 | ⬆️ 348 | → View Deep Dive
Tags: competitive-programming reinforcement-learning multi-agent grpo llm agentic-ai code-generation

Competitive programming is a field where humans had held the edge. This paper matters because its GrandCode system is the first AI to beat human grandmasters and take first place in a live contest, demonstrating the potential of a new reinforcement learning algorithm that handles multi-agent collaboration and the delayed-reward problem.
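The tags list `grpo`, but the digest does not describe GrandCode's algorithm. For orientation only, the sketch below shows the group-relative advantage normalization commonly associated with GRPO: several solutions are sampled for one problem, and each reward is standardized against the group mean and standard deviation. The function name, the `eps` value, and the toy rewards are assumptions, and this is not presented as the paper's variant.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize each reward against its own sampling group."""
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every sample received the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled solutions to one problem; 1.0 means all tests passed (toy values).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Because the reward arrives only once a whole solution has been judged, a group-level baseline like this is one simple way to assign credit without a learned value function.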

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


🥉 3. Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

arXiv: 2604.06628 | ⬆️ 228 | → View Deep Dive
Tags: reasoning-sft llm scaling-laws generalization optimization data-efficiency math-reasoning ai-mentoring

This study systematically analyzes how optimization, data, and model capability interact during supervised fine-tuning (SFT) for reasoning, and points toward efficient ways to train reasoning models that go beyond simply adding more data.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


4. InCoder-32B-Thinking: Industrial Code World Model for Thinking

arXiv: 2604.03144 | ⬆️ 225 | → View Deep Dive
Tags: industrial-code code-generation chain-of-thought world-model verilog gpu-optimization ai-mentoring llm-reasoning

This paper matters because it combines general code generation ability with reasoning that satisfies the strict hardware constraints of industrial settings, markedly improving performance on real-world industrial code development such as complex chip design and GPU optimization.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


5. Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

arXiv: 2604.05015 | ⬆️ 225 | → View Deep Dive
Tags: video-mme-v2 benchmark video-understanding evaluation temporal-reasoning data-contamination multimodal-llm robustness

This paper matters because it addresses score inflation and data leakage in existing benchmarks, proposing a next-generation evaluation standard that can assess the true reasoning ability and reliability of video multimodal large language models (Video MLLMs).
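To see how a plain per-question score can look inflated, the sketch below compares it with a stricter score that credits a video only when every question about it is answered correctly. This group-consistency rule is a hypothetical stand-in chosen for illustration; the digest does not specify Video-MME-v2's actual metrics, and the function names and sample results are made up.

```python
from collections import defaultdict


def per_question_accuracy(results: list[tuple[str, bool]]) -> float:
    """Fraction of individual questions answered correctly."""
    return sum(ok for _, ok in results) / len(results)


def group_consistent_accuracy(results: list[tuple[str, bool]]) -> float:
    """Fraction of groups (here, videos) where every question was answered correctly."""
    groups: dict[str, list[bool]] = defaultdict(list)
    for group_id, ok in results:
        groups[group_id].append(ok)
    return sum(all(answers) for answers in groups.values()) / len(groups)


# (video id, answered correctly?) pairs; toy data, not benchmark results.
results = [
    ("video1", True), ("video1", True), ("video1", False),
    ("video2", True), ("video2", True),
]
print(per_question_accuracy(results))      # 0.8
print(group_consistent_accuracy(results))  # 0.5
```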

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


📅 Generated: 2026-04-12 | 🤖 GLM-4.7 Weekly Digest