Research — AI news & research

Benchmarks, datasets, training techniques and the papers pushing the state of the art. Updated continuously from 20+ curated sources.

Agents Vision Robotics Policy Research Tools LLMs

arxiv · AI

ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs

arXiv:2606.12451v1 Announce Type: new Abstract: Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approaches rely on compact encode…

arxiv · Computer Vision

Stereo Vision-Based Fall Prediction and Detection using Human Pose Estimation on the AMD Kria K26 SOM

arXiv:2606.12473v1 Announce Type: new Abstract: Background and Objective: Falls among elderly people can cause serious injury and reduce quality of life. Timely prediction and detection are essential to prevent harm and…

arxiv · Machine Learning

Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics

arXiv:2606.12476v1 Announce Type: new Abstract: Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that…

arxiv · Machine Learning

Boltzmann Attention: Learnable Ising Couplings for Cooperative Attention

arXiv:2606.12478v1 Announce Type: new Abstract: Attention mechanisms are central to modern sequence models, yet standard attention computes relevance primarily through individual query--key similarities. Although softma…

arxiv · Machine Learning

ReCal: Reward Calibration for RL-based LLM Routing

arXiv:2606.12479v1 Announce Type: new Abstract: Large language model (LLM) routing has emerged as an effective paradigm for leveraging the complementary strengths of multiple LLMs through dynamic model and reasoning-str…

arxiv · Machine Learning

Representing Time Series as Structured Programs for LLM Reasoning

arXiv:2606.12481v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated strong reasoning and instruction-following capabilities, making them potentially powerful tools for time-series analysis. Ho…

arxiv · Machine Learning

Scalable anomaly detection via a univariate Christoffel function

arXiv:2606.12483v1 Announce Type: new Abstract: Anomaly detection plays a critical role in identifying unusual patterns across domains such as fraud detection, network intrusion, and system fault diagnosis. Recently, Ch…

arxiv · Machine Learning

Speculative Rollback Correction for Quality-Diverse Web Agent Imitation

arXiv:2606.12485v1 Announce Type: new Abstract: Training interactive web agents through imitation learning from expert trajectories has emerged as a highly effective approach. However, determining the optimal timing for…

arxiv · Machine Learning

An Empirical Study on Predictive Maintenance for Component X in Heavy-Duty Scania Trucks

arXiv:2606.12486v1 Announce Type: new Abstract: Condition-based Predictive Maintenance (PdM) for truck fleets has gained momentum in recent years. This maintenance strategy aims to minimize unplanned downtimes and reduc…

arxiv · Machine Learning

DynamicPTQ: Mitigating Activation Quantization Collapse via Residual-Stream Dynamics

arXiv:2606.12487v1 Announce Type: new Abstract: Post-training quantization (PTQ) is essential for efficient large language model inference, but reliably quantizing activations remains challenging when weights, activatio…

arxiv · Machine Learning

A Stationary (and Therefore Compatible) Representation is All You Need

arXiv:2606.12488v1 Announce Type: new Abstract: Learning compatible representations aims to learn feature representations that can be used interchangeably over time whenever a model undergoes updates. In this paper, we…

arxiv · Machine Learning

Robustness Verification of Recurrent Neural Networks with Abstraction Refinement

arXiv:2606.12490v1 Announce Type: new Abstract: Certified local robustness verification for recurrent neural networks (RNNs) is challenging because approximation errors introduced by nonlinear relaxations can propagate…

arxiv · Machine Learning

Net-Ev$^2$: A Generative Simulator for Network Event Evolution

arXiv:2606.12494v1 Announce Type: new Abstract: Reducing real-world trial and error has long been a central goal of decision making, and generative simulators advance this goal by modeling the evolution of future states…

arxiv · Machine Learning

$\mu$VLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models

arXiv:2606.12497v1 Announce Type: new Abstract: Vision-language-action (VLA) models predict chunks of future actions from the current observation, an assumption that fails under partial observability, where decisions de…

arxiv · Machine Learning

Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation

arXiv:2606.12500v1 Announce Type: new Abstract: Traffic microsimulation combined with surrogate safety measures has increasingly been used as a proactive alternative to historical crash data for predicting crash frequen…

arxiv · Machine Learning

Policy-driven Conformal Prediction for Trustworthy QoT Estimation

arXiv:2606.12501v1 Announce Type: new Abstract: We propose Conformal QoT, a policy-driven framework that combines statistically guaranteed QoT estimation with operational decision policies, enabling reliable lightpath-f…

arxiv · Machine Learning

Dolph2Vec: Self-Supervised Representations of Dolphin Vocalizations

arXiv:2606.12503v1 Announce Type: new Abstract: Self-supervised learning (SSL) has opened new opportunities in bioacoustics by enabling scalable modeling of animal vocalizations without the need for expensive manual ann…

arxiv · Machine Learning

Boosting Direct Preference Optimization with Penalization

arXiv:2606.12505v1 Announce Type: new Abstract: Offline preference optimization has become a practical substitute for reinforcement learning from human feedback, but pairwise objectives such as Direct Preference Optimiz…

arxiv · Machine Learning

Rubric-Guided Self-Distillation: Post-Training Without Rubric Verifiers

arXiv:2606.12507v1 Announce Type: new Abstract: Rubrics have emerged as an alternative to RLVR in open-ended domains where a single ground-truth final answer is not available. Existing rubric-based training methods rely…

arxiv · Machine Learning

Crossing the Validation Crisis: Cross-Validation Reduces Benchmarking Variance Surprisingly Well

arXiv:2606.12552v1 Announce Type: new Abstract: Modern machine learning progresses through empirical work, benchmarking new methods to evaluate relative performance. However, the statistical variability inherent to eval…

arxiv · Computer Vision

HairPort: In-context 3D-aware Hair Import and Transfer for Images

arXiv:2606.12562v1 Announce Type: new Abstract: Transferring hairstyles between images is an important but challenging task in computer graphics, computer vision, and visual effects. It enables users to explore new look…

arxiv · AI

Arbor: Tree Search as a Cognition Layer for Autonomous Agents

arXiv:2606.12563v1 Announce Type: new Abstract: Arbor is a multi-agent framework that introduces structured tree search as a cognition layer for autonomous agents operating in large, stateful action spaces. Prior autono…

arxiv · NLP / LLM

EDEN: A Large-Scale Corpus of Clinical Notes for Italian

arXiv:2606.12569v1 Announce Type: new Abstract: We present EDEN (Emergency Department Electronic Notes), a new and unique large-scale corpus of clinical notes produced in Emergency Departments of Italian hospitals. The…

arxiv · Computer Vision

High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation

arXiv:2606.12575v1 Announce Type: new Abstract: Few-step diffusion distillation has become increasingly mature for 4-8-step generation, yet pushing further to 2 steps remains challenging. In this work, we introduce Z-Im…

arxiv · NLP / LLM

Helping Figures Tell their Story! Paper-Grounded Video Generation Explaining Complex Scientific Figures

arXiv:2606.12576v1 Announce Type: new Abstract: Scientific figures compress complex pipelines into a single canvas, yet understanding them requires paper-grounded, step-by-step narration aligned with visual highlights a…

arxiv · NLP / LLM

MARD: Mirror-Augmented Reasoning Distillation for Mechanism-Level Drug-Drug Interaction Prediction

arXiv:2606.12578v1 Announce Type: new Abstract: Mechanism-level drug-drug interaction (DDI) prediction requires identifying which enzyme or pharmacodynamic axis is implicated, in which direction, and with which evidence…

arxiv · AI

Strategic Decision Support for AI Agents

arXiv:2606.12587v1 Announce Type: new Abstract: Traditionally, decision support studies how humans use machine learning models to make better decisions. In modern agentic systems, this division of roles is increasingly…

arxiv · Computer Vision

Analyzing and Improving Fine-grained Preference Optimization in Medical LVLMs

arXiv:2606.12590v1 Announce Type: new Abstract: Large Vision-Language Models (LVLMs) have achieved strong performance across medical imaging tasks, yet they remain prone to factual inconsistencies, poor visual grounding…

arxiv · AI

Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation

arXiv:2606.12594v1 Announce Type: new Abstract: Modern Lean theorem provers achieve strong performance only with substantial training and inference compute, driven in part by scarce verified proof data and the long reas…

arxiv · Machine Learning

Emerging Flexible Designs for Geospatial Multimodal Foundation Models

arXiv:2606.12595v1 Announce Type: new Abstract: Foundation models are rapidly transforming Earth observation by enabling scalable pretraining across diverse unlabeled geospatial modalities. However, their architectural…

sciencedaily

A classic brain test exposed AI's biggest weakness

Researchers gave top AI models a classic attention test used in psychology and found a major flaw. While the models could correctly name colors in short lists, their performance deteriorated sharply as the task became lo…

arxiv

Automated Scoring of Arabic Text Using Large Language Models: A Literature Review

arXiv:2606.09830v1 Announce Type: new Abstract: In modern educational systems, Automatic Text Scoring (ATS) plays a central role by enabling scalable and consistent evaluation of learner responses without human intervention. Recently, the increased accessibility of LLMs and Arabic-specific datasets…

arxiv

Mechanistic Analysis of Alignment Algorithms in Language Models

arXiv:2606.09850v1 Announce Type: new Abstract: Post-training alignment algorithms are predominantly evaluated as black boxes, obscuring how they reshape language models' internal computations. We present a systematic mechanistic analysis of six preference-optimization methods: PPO, DPO, SimPO, ORP…

arxiv

SynIB: Informational Bottleneck for Maximizing Synergy in Multimodal Learning

arXiv:2606.09853v1 Announce Type: new Abstract: A central objective in multimodal learning is to capture synergy: task-relevant information that arises only from the joint use of multiple modalities, and is not available from any single modality alone. While most approaches operate at the architect…

arxiv

Can Multi-Agent LLMs Identify Their Peers? Stylometric Fingerprinting in Role-Constrained Political Analysis

arXiv:2606.09854v1 Announce Type: new Abstract: Multi-agent large language model (LLM) pipelines for political statement analysis are vulnerable to peer-preservation bias: models tend to protect peer models from deactivation and show identity-dependent scoring distortions. Prompt-level anonymizatio…

arxiv

Using Probabilistic Programs to Train Inductive Reasoning in Large Language Models

arXiv:2606.09856v1 Announce Type: new Abstract: Post-training Large Language Models (LLMs) for reasoning typically focuses on deductive tasks such as mathematics and coding where correctness is verifiable. Yet, many real-world reasoning problems are inductive: agents must infer uncertain beliefs fr…

arxiv

Uncertainty-aware Multi-fidelity Closure via Conditional Normalizing Flows

arXiv:2606.09857v1 Announce Type: new Abstract: Reduced-order models (ROMs) provide an efficient surrogate for complex multiscale systems, but their predictive accuracy is often compromised by truncation errors and the inadequate representation of interactions between resolved and unresolved scales…

arxiv

Mitigating Manifold Departure: Uncertainty-Aware Subspace Rectification for Trustworthy MLLM Decoding

arXiv:2606.09859v1 Announce Type: new Abstract: MLLMs frequently hallucinate objects inconsistent with visual inputs. This issue is typically attributed to the over-reliance on language priors, which can override the visual context. Recent training-free decoding strategies address this by penalizin…

arxiv

Conformal Risk Prediction for Non-Alcoholic Fatty Liver Disease Using Gradient Boosting with Distribution-Free Coverages

arXiv:2606.09860v1 Announce Type: new Abstract: Non-alcoholic fatty liver disease (NAFLD) affects roughly 25% of global adults, posing substantial hepatic and cardiovascular risks. Yet, population-level screening tools remain inadequate. We present Method, a machine-learning framework for NAFLD ris…

arxiv

Time Series as Language: A Universal Tokenizer for General-Purpose Time Series Foundation Models

arXiv:2606.09861v1 Announce Type: new Abstract: While Next-Token Prediction (NTP) has unified LLM pretraining, its adaptation to unbounded, continuous time series (TS) remains open. To bridge the gap, we introduce UniTok, a universal tokenizer that transforms TS into discrete tokens, and UniTok-FM,…

arxiv

Blurry Window Attention

arXiv:2606.09862v1 Announce Type: new Abstract: The Softmax Attention operation in Transformer language models has a quadratic complexity in the sequence length and a growing state size in the form of KV cache, which becomes a bottleneck in long context scenarios. To overcome this limitation, alter…

ieee

Beyond Dexterity: Why Contact May Define the Next Era of Robotics

This article is brought to you by AGILINK.Throughout the exhibition hall at the 2026 IEEE International Conference on Robotics (ICRA), in Vienna, one demonstration seemed to attract a disproportionate amount of attention…

arxiv

Page image classifier fine-tuned on century-spanning archives of scanned documents for further content-specific processing

arXiv:2606.07558v1 Announce Type: new Abstract: Purpose: Digitization projects in the humanities produce vast, heterogeneous archives of historical documents, making manual sorting impractical at scale. This work addresses the need for an automated system to classify scanned page images based on vi…

arxiv

Multimodal Group Emotion Recognition In-the-Wild Towards a Privacy-Safe Non-Individual Approach

arXiv:2606.07585v1 Announce Type: new Abstract: This thesis addresses group emotion recognition (GER) in-the-wild with a focus on privacy preservation. Unlike traditional emotion recognition methods that rely on individual-level cues such as face, gaze, or voice analysis, this work uses collective …

arxiv

SlideCheck: Guiding Self-Supervised Pretraining of Pathology Foundation Models via Dataset Distributions

arXiv:2606.07590v1 Announce Type: new Abstract: Pathology foundation models are pretrained on large streams of WSI-derived patches, while supervision during data construction is often slide-level, sparse, or heterogeneous. This mismatch makes it difficult to understand and control which biological …

arxiv

A Mechanistic Analysis of Adversarial Fine-tuning of Vision Transformers

arXiv:2606.07593v1 Announce Type: new Abstract: The widespread use of image classification models in high-risk, real-world situations necessitates making these models robust to slight disturbances or perturbations, such as blurring or sharpening, in the input images. While vision transformers (ViTs…

arxiv

VisualLeakBench: Reproducible Action-Boundary Propagation Failures in Vision-Language Agents

arXiv:2606.07595v1 Announce Type: new Abstract: Vision-language agents increasingly consume screenshots, documents, and user interfaces before writing to memory, sending messages, or invoking external tools. We study a concrete failure mode in this setting: action-boundary propagation, where sensit…

arxiv

Can You Trust What You See? Human and AI Detection of Synthetic Legal Evidence

arXiv:2606.07613v1 Announce Type: new Abstract: Visual evidence has long been treated as a reliable form of legal proof, but advances in artificial intelligence (AI) are undermining that assumption. This article asks how well humans and frontier multimodal large language models (MLLMs) can distingu…

arxiv

SENTRY: Statistical Reliability Analysis of Vision Transformers Under Soft Errors

arXiv:2606.07620v1 Announce Type: new Abstract: With the growth of Vision Transformers in safety-critical domains like autonomous systems and medical imaging, ensuring their reliability against soft errors is paramount. While ViTs offer state-of-the-art accuracy, their massive parameter counts rend…

arxiv

Eyes All Around: Design and Analysis of 360-Degree LiDAR Perception Using Equivariant Feature Learning in Unstructured Traffic

arXiv:2606.07626v1 Announce Type: new Abstract: Perception in dense, unstructured urban traffic remains a major challenge for autonomous driving because of the wide variety of road users, frequent occlusions, irregular motion patterns, and the lack of standardized road layouts. Although recent LiDA…

arxiv

AMN: An Adaptive Multi-Scale Fusion Network with Boundary and Uncertainty Modeling for Nuclei Segmentation

arXiv:2606.07633v1 Announce Type: new Abstract: Accurate classification of nuclei subtypes in histopathology images is critical for downstream tasks including tumor grading, immune infiltrate quantification, and prognosis prediction. Existing approaches rely on either convolutional or transformer-b…

arxiv

NeuroAlign: Hierarchical Multimodal Fusion of Dynamic and Structural Neuroimaging for MCI Analysis

arXiv:2606.07635v1 Announce Type: new Abstract: Multimodal neuroimaging fusion of functional MRI (fMRI) and diffusion tensor imaging (DTI) provides complementary information for cognitive impairment analysis, but remains challenged by heterogeneous feature spaces and misaligned representations. We …

arxiv

PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow

arXiv:2606.07549v1 Announce Type: new Abstract: Recent advances in Multimodal Large Language Models (MLLMs) and agent workflows have shown strong promise for computational pathology, yet reliable patch-level reasoning remains challenging. End-to-end pathology MLLMs often hallucinate morphological f…

arxiv

OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs

arXiv:2606.07577v1 Announce Type: new Abstract: Audio-visual large language models (LLMs) hold strong promise for long-form video understanding, yet their long-video inference is fundamentally limited by the linear growth of video tokens and key-value (KV) caches. We present OmniMem, a memory-effic…

arxiv

Syll: Open-Source Personal Automation with Cross-Surface Execution

arXiv:2606.07594v1 Announce Type: new Abstract: Personal AI agents must increasingly operate across APIs, shells, web surfaces, and desktop GUIs, yet many systems remain tuned to a single interface and offer limited support for user teaching and auditability. We present Syll, an open-source, self-h…

arxiv

A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline

arXiv:2606.07718v1 Announce Type: new Abstract: Agentic AI tools offer a promising path to automating software development bottlenecks in scientific research pipelines, particularly for stages that take domain experts days to months to build, where scientists care about correctness and robustness, …

arxiv

Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning

arXiv:2606.07720v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated remarkable reasoning abilities on mathematical and multi-hop planning tasks. The CoCoNuT (Chain of Continuous Thought) paradigm~\cite{hao2024coconut} extends this by enabling models to reason in latent sp…

arxiv

Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model

arXiv:2606.07721v1 Announce Type: new Abstract: Objectives: Automatic data extraction from free-text radiology reports enables large-scale research, but few studies assessed the performance of large language models (LLMs) on Dutch neuroradiology reports. Methods: We analyzed 947 brain MRI reports f…

arxiv

Some hypotheses on how chatbots work in problem-solving-driven conversations. Large Language Models as confirmation of the Innovation Illusion

arXiv:2606.07722v1 Announce Type: new Abstract: This article offers a perspective on the nature of chatbots as genuine conversation partners when discussing problems in relation to their solutions. What can chatbots do and what can't they do, and how can this be explained? Our argument draws on Agg…

arxiv

Land cover and flood type govern the detection limits of satellite-based flood mapping across diverse global flood events

arXiv:2606.07780v1 Announce Type: new Abstract: Floods are among the most destructive natural hazards, and their increasing frequency under climate change makes satellite-based inundation mapping essential for disaster response. Geospatial foundation models pretrained on satellite archives offer ge…

Browse briefing issues →

Get the briefing

The one story that matters, 5 headlines and the paper everyone's citing — every Tuesday, free.

Subscribe free