Hi, I'm Xavier Juanola.

PhD candidate at UPF researching Audio-Visual Sound Source Localization. Bridging deep learning research and real-world ML engineering.

4 Publications
4 Talks
4 Years Teaching
Xavier Juanola Molet

Physicist turned AI Researcher

I'm Xavier, a PhD candidate at Universitat Pompeu Fabra (Barcelona), working on Audio-Visual Sound Source Localization in the IMVA group, supervised by Prof. Gloria Haro and co-advised by Prof. Magdalena Fuentes (NYU). My research asks: can models learn to localize sound sources in an image while staying robust when the audio is silence, noise, or offscreen?

My path has been unconventional. I started with a BS in Theoretical Physics (UB), followed by two master's degrees — Intelligent Interactive Systems (UPF) and Astrophysics & Cosmology (UAB). Before the PhD, I spent 3+ years as a Data Scientist and AI Engineer in industry (DRIVING01, Schneider Electric), building NLP systems and predictive models. A visiting scholar stint at NYU MARL in 2024 deepened my research collaborations and led to an ICASSP 2025 publication.

I'm supported by an FPI doctoral fellowship from Spain's Ministry of Science and am part of the MuVAU research project. When not doing research, I teach Calculus II labs at UPF.

Education

  • PhD, Information & Communication Technologies — UPF (2022–present)
  • MS, Astrophysics & Cosmology — UAB (2020–2021)
  • MS, Intelligent Interactive Systems — UPF (2019–2020)
  • BS, Theoretical Physics — UB (2014–2019)

Publications

Projects

SSL-SaN

Self-supervised model for visual sound source localization that is robust to negative audio (silence, noise, and offscreen sounds). Introduces new metrics and an extended evaluation benchmark, IS3+.

PyTorch · Audio-Visual · Self-supervised · BMVC 2025

VSSL Eval

Extended benchmark and new metrics for rigorous evaluation of visual sound source localization models. Tests models in scenarios with negative audio (silence, noise, offscreen).

Evaluation · Benchmarking · ICASSP 2025

SLAVC Analysis

Interactive analysis and online executable demo of the SLAVC self-supervised sound source localization method. Users can test the model on custom image-audio pairs.

Demo · IPOL 2024 · Interactive

Skills

Languages

  • Python
  • MATLAB

ML / Deep Learning

  • PyTorch (torchvision, etc.)
  • HuggingFace
  • scikit-learn
  • OpenCLIP
  • timm
  • NumPy · Pandas
  • einops · Lightning
  • etc.

Audio / Vision

  • librosa
  • OpenCV
  • DINO · CLIP
  • etc.

Tools

  • Git
  • Docker · Singularity
  • HPC · SLURM
  • W&B
  • Jupyter · LaTeX
  • Power BI
  • etc.

Experience

PhD Candidate

Universitat Pompeu Fabra · Barcelona

Nov 2022 – present

Researching audio-visual sound source localization in the IMVA group. Advisors: Gloria Haro (UPF) and Magdalena Fuentes (NYU). FPI doctoral scholarship. Part of the MuVAU project (Spanish Ministry of Science).

Deep Learning · Audio-Visual · Research

Visiting Scholar

NYU MARL · New York

2024

Collaborated with Prof. Magdalena Fuentes on sound source localization research, culminating in an ICASSP 2025 publication.

Research · Collaboration

Data Scientist

Schneider Electric · Barcelona

2021 – 2022

Led predictive modeling on Salesforce data to estimate offer acceptance probabilities. Contributed to the development of an AI conversational agent. Served as a global AI consultant.

ML · NLP · Consulting

AI/NLP Engineer → Data Scientist

DRIVING01 · Barcelona

2019 – 2021

Built AI conversational agents for real estate clients and Liceu using NLP and neural networks. Later developed ML predictive models for Chupa Chups, COMSA Service, and real estate clients, and created Power BI dashboards.

NLP · ML · Power BI

Service

Teaching

Calculus II (23952) — Labs & Seminars

Universitat Pompeu Fabra · Barcelona

Jan–Mar each year: 2022 · 2023 · 2024 · 2025 (4 years)

Teaching assistant for laboratory sessions and seminars. Topics: multivariable functions, domain and image, curves and surfaces, partial derivatives, tangent subspaces, Taylor approximation, multiple integration, and gradient descent optimization.

Contact

Interested in collaboration, research, or just want to chat about audio-visual ML? Reach out.