Publications

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization

Published in Interspeech, 2025

Even state-of-the-art speaker diarization systems exhibit high variance in error rates across different datasets, representing numerous use cases and domains. Furthermore, comparing across systems requires careful application of best practices such as dataset splits and metric definitions to allow for apples-to-apples comparison. We propose SDBench (Speaker Diarization Benchmark), an open-source benchmark suite that integrates 13 diverse datasets with built-in tooling for consistent and fine-grained analysis of speaker diarization performance for various on-device and server-side systems. SDBench enables reproducible evaluation and easy integration of new systems over time. To demonstrate the efficacy of SDBench, we built SpeakerKit, an inference efficiency-focused system built on top of Pyannote v3. SDBench enabled rapid execution of ablation studies that led to SpeakerKit being 9.6x faster than Pyannote v3 while achieving comparable error rates.

Recommended citation: Berkin Durmus, Blaise Munyampirwa, Eduardo Pacheco, Atila Orhon, Andrey Leonov. SDBench: A Comprehensive Benchmark Suite for Speaker Diarization. https://arxiv.org/pdf/2507.16136

Down with the Hierarchy: The H in HNSW stands for Hubs

Published in ICML 2025 VecDB Workshop, 2025

Driven by recent breakthrough advances in neural representation learning, approximate near-neighbor (ANN) search over vector embeddings has emerged as a critical computational workload. With the introduction of the seminal Hierarchical Navigable Small World (HNSW) algorithm, graph-based indexes have established themselves as the overwhelmingly dominant paradigm for efficient and scalable ANN search.

Recommended citation: Blaise Munyampirwa, Vihan Lakshman, Benjamin Coleman. Down with the Hierarchy: The H in HNSW stands for Hubs. https://arxiv.org/pdf/2412.01940

Deep learning detects actionable molecular and clinical features directly from head/neck squamous cell carcinoma histopathology slides

Published in International Journal of Radiation Oncology, Biology, Physics, 2020

The purpose of this abstract is to describe the application of deep learning to digital histopathology slide data for detection of clinically relevant features. Deep learning is a form of artificial intelligence that can process graphical data and “learn” to extract hidden features. Here we test the ability of deep learning to detect human papillomavirus, location of origin, and other features.

Recommended citation: Deep learning detects actionable molecular and clinical features directly from head/neck squamous cell carcinoma histopathology slides. J. Dolezal, J.N. Kather, S. Kochanny, J. Schulte, A. Patel, B. Munyampirwa, S. Morin, A. Srisuwananukorn, N. Cipriani, D. Basu, A. Pearson. International Journal of Radiation Oncology, Biology, Physics, Volume 106, Issue 5, 1165 https://www.redjournal.org/article/S0360-3016(19)34202-6/abstract