Hi, I'm Xavier Juanola.

PhD candidate at UPF researching Audio-Visual Sound Source Localization. Bridging deep learning research and real-world ML engineering.


4 Publications

4 Talks

4 Years Teaching


About Me

Physicist turned AI Researcher

I'm Xavier, a PhD candidate at Universitat Pompeu Fabra (Barcelona), working on Audio-Visual Sound Source Localization in the IMVA group, supervised by Prof. Gloria Haro and co-advised by Prof. Magdalena Fuentes (NYU). My research asks: can models learn to localize sound sources in an image while remaining robust when the audio is silence, noise, or an offscreen sound?

My path has been unconventional. I started with a BS in Theoretical Physics (UB), followed by two master's degrees — Intelligent Interactive Systems (UPF) and Astrophysics & Cosmology (UAB). Before the PhD, I spent 3+ years as a Data Scientist and AI Engineer in industry (DRIVING01, Schneider Electric), building NLP systems and predictive models. A visiting scholar stint at NYU MARL in 2024 deepened my research collaborations and led to an ICASSP 2025 publication.

I'm supported by an FPI doctoral fellowship from Spain's Ministry of Science and am part of the MuVAU research project. When not doing research, I teach Calculus II labs at UPF.

Education

PhD, Information & Communication Technologies — UPF (2022–present)
MS, Astrophysics & Cosmology — UAB (2020–2021)
MS, Intelligent Interactive Systems — UPF (2019–2020)
BS, Theoretical Physics — UB (2014–2019)

Research

Publications

BMVC 2025

Learning from Silence and Noise for Visual Sound Source Localization Models

Xavier Juanola, Giovana Morais, Magdalena Fuentes, Gloria Haro

Project arXiv PDF Code

CVPR Workshop 2025

Visual Sound Source Localization: Assessing Performance with Both Positive and Negative Audio

Xavier Juanola, Giovana Morais, Magdalena Fuentes, Gloria Haro

Proceedings PDF

ICASSP 2025

A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio

Xavier Juanola, Gloria Haro, Magdalena Fuentes

Project arXiv PDF Proceedings Code

IPOL MLBriefs 2024

A Brief Analysis of SLAVC method for Sound Source Localization

Xavier Juanola, Gloria Haro

Paper page PDF

Work

Projects

SSL-SaN

Self-supervised model for visual sound source localization that is robust to negative audio (silence, noise, and offscreen sounds). Introduces new metrics and an extended evaluation benchmark, IS3+.

PyTorch · Audio-Visual · Self-supervised · BMVC 2025

VSSL Eval

Extended benchmark and new metrics for rigorous evaluation of visual sound source localization models. Tests models in scenarios with negative audio (silence, noise, offscreen).

Evaluation · Benchmarking · ICASSP 2025

SLAVC Analysis

Interactive analysis and online executable demo of the SLAVC self-supervised sound source localization method. Users can test the model on custom image-audio pairs.

Demo · IPOL 2024 · Interactive

Expertise

Skills

Languages

Python
MATLAB

ML / Deep Learning

PyTorch (torchvision, etc.)
HuggingFace
scikit-learn
OpenCLIP
timm
NumPy · Pandas
einops · Lightning
etc.

Audio / Vision

librosa
OpenCV
DINO · CLIP
etc.

Tools

Git
Docker · Singularity
HPC · SLURM
W&B
Jupyter · LaTeX
Power BI
etc.

Background

Experience

PhD Candidate

Researching audio-visual sound source localization in the IMVA group. Advisors: Gloria Haro (UPF) and Magdalena Fuentes (NYU). FPI doctoral fellowship. Part of the MuVAU project (Spanish Ministry of Science).

Deep Learning · Audio-Visual · Research

Visiting Scholar

Collaborated with Magdalena Fuentes at NYU MARL on sound source localization research, culminating in an ICASSP 2025 publication.

Research · Collaboration

Data Scientist

Led predictive modeling on Salesforce data to estimate offer acceptance probabilities. Contributed to the development of AI conversational agents and served as a global AI consultant.

ML · NLP · Consulting

AI/NLP Engineer → Data Scientist

Built AI conversational agents for the real estate sector and for Liceu using NLP and neural networks. Later developed ML predictive models for Chupa Chups, COMSA Service, and the real estate sector, and created Power BI dashboards.

NLP · ML · Power BI

Service

Organizing Committee — LAMIR 2024
Hackathon Organization — LAMIR 2024
Reviewer — LAMIR 2024

Education

Teaching

Calculus II (23952) — Labs & Seminars

Universitat Pompeu Fabra · Barcelona

January–March, 2022 · 2023 · 2024 · 2025 (4 years)

Teaching assistant for laboratory sessions and seminars. Topics: multivariable functions, domain and image, curves and surfaces, partial derivatives, tangent subspaces, Taylor approximation, multiple integration, and gradient descent optimization.

Get in Touch

Contact

Interested in collaboration, research, or just want to chat about audio-visual ML? Reach out.

xavier.juanola@upf.edu Google Scholar xavijuanola xavijuanola
