AI Infra · AI Insights

Trend signal · WeChat official account · Regular source

Nobody taught AI, yet it quietly learned "emotions"

TMTPost AGI (钛媒体AGI) · 2026-05-31 · AI Infra

GPU

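76 overall · 80 importance · 65 popularity

Trend signal · WeChat official account · Regular source

Facing the same laws of physics, why did GPUs, FPGAs, and carbon-based brains branch in different directions?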

WeChat AI Sources (WeRSS Local) · 2026-05-30 · AI Infra

GPU

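76 overall · 80 importance · 65 popularity

Today's high priority · WeChat official account · Regular source

Surpassing TurboQuant: a true 2-bit KV quantization algorithm for long-context inference arrives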

WeChat AI Sources (WeRSS Local) · 2026-05-29 · AI Infra

quantization · inference systems

85 overall · 92 importance · 65 popularity

Today's high priority · Paper · Regular source

Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference

arXiv AI Infra · 2026-05-12 · AI Infra

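This paper scores highly on the current topic and is worth a quick assessment: Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference.

GPU · HBM · MLIR · TensorRT-LLM · memory · vLLM

91 overall · 100 importance · 65 popularity

Today's high priority · Paper · Regular source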

Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC

arXiv AI Infra · 2026-04-08 · AI Infra

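This paper scores highly on the current topic and is worth a quick assessment: Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC.

GPU · NPU · RDMA · SGLang · TensorRT-LLM · memory

91 overall · 100 importance · 65 popularity

Today's high priority · Paper · Regular source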

Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI

arXiv AI Infra · 2025-11-17 · AI Infra

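This paper scores highly on the current topic and is worth a quick assessment: Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI.

GPU · PagedAttention · TGI · memory · vLLM · inference systems

91 overall · 100 importance · 65 popularity

Today's high priority · Paper · Regular source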

TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference

arXiv AI Infra · 2025-05-16 · AI Infra

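This paper scores highly on the current topic and is worth a quick assessment: TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference.

GPU · NVLink · SGLang · TensorRT-LLM · vLLM

91 overall · 100 importance · 65 popularity

Today's high priority · Paper · Regular source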

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

arXiv AI Infra · 2025-01-02 · AI Infra

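This paper scores highly on the current topic and is worth a quick assessment: FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving.

DiT · FlashInfer · GPU · SGLang · memory · vLLM

91 overall · 100 importance · 65 popularity

Today's high priority · Paper · Regular source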

COMET: Towards Partical W4A4KV4 LLMs Serving

arXiv AI Infra · 2024-10-16 · AI Infra

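This paper scores highly on the current topic and is worth a quick assessment: COMET: Towards Partical W4A4KV4 LLMs Serving.

DiT · GPU · KV cache · TensorRT-LLM · memory · quantization

91 overall · 100 importance · 65 popularity

Today's high priority · Paper · Regular source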

SCOOT: SLO-Oriented Performance Tuning for LLM Inference Engines

arXiv AI Infra · 2024-08-08 · AI Infra

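This paper scores highly on the current topic and is worth a quick assessment: SCOOT: SLO-Oriented Performance Tuning for LLM Inference Engines.

TensorRT-LLM · vLLM

91 overall · 100 importance · 65 popularity

Today's high priority · Paper · Regular source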

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

arXiv AI Infra · 2024-05-07 · AI Infra

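This paper scores highly on the current topic and is worth a quick assessment: QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving.

DiT · GPU · KV cache · TensorRT-LLM · memory · quantization

91 overall · 100 importance · 65 popularity

Today's high priority · Paper · Regular source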

SGLang: Efficient Execution of Structured Language Model Programs

arXiv AI Infra · 2023-12-12 · AI Infra

This paper scores highly on the current topic and is worth a quick assessment: SGLang: Efficient Execution of Structured Language Model Programs.

KV cache · NPU · SGLang

91 overall · 100 importance · 65 popularity