Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Page Not Found

Page not found. Your pixels are in another canvas.

Page not in menu

This is a page not in the main menu.

Jupyter notebook markdown generator

Posts

Reproducing Kernel Hilbert Spaces

7 minute read

Published: November 13, 2023

In this blog post we will examine what Reproducing Kernel Hilbert Spaces (RKHS) are and the important properties that make them useful in statistical machine learning and functional analysis. Without loss of generality, we will take the general metric space $(\mathcal{X}, d)$ to be the standard Euclidean metric space $(\mathbb{R}^n, d)$.

Portfolio

Portfolio item number 1

Short description of portfolio item number 1

Portfolio item number 2

Short description of portfolio item number 2

Publications

Deep learning detects actionable molecular and clinical features directly from head/neck squamous cell carcinoma histopathology slides

Published in International Journal of Radiation Oncology, Biology, Physics, 2020

The purpose of this abstract is to describe the application of deep learning to digital histopathology slide data for detection of clinically relevant features. Deep learning is a form of artificial intelligence which can process graphical data and “learn” to extract hidden features. Here we test the ability of deep learning to detect human papilloma virus, location of origin, and other features.

Recommended citation: Deep learning detects actionable molecular and clinical features directly from head/neck squamous cell carcinoma histopathology slides. J. Dolezal, J.N. Kather, S. Kochanny, J. Schulte, A. Patel, B. Munyampirwa, S. Morin, A. Srisuwananukorn, N. Cipriani, D. Basu, A. Pearson. International Journal of Radiation Oncology, Biology, Physics, Volume 106, Issue 5, 1165 https://www.redjournal.org/article/S0360-3016(19)34202-6/abstract

Down with the Hierarchy: The H in HNSW stands for Hubs

Published in ICML 2025 VecDB Workshop, 2025

Driven by recent breakthrough advances in neural representation learning, approximate near-neighbor (ANN) search over vector embeddings has emerged as a critical computational workload. With the introduction of the seminal Hierarchical Navigable Small World (HNSW) algorithm, graph-based indexes have established themselves as the overwhelmingly dominant paradigm for efficient and scalable ANN search.

Recommended citation: Blaise Munyampirwa, Vihan Lakshman, Benjamin Coleman. Down with the Hierarchy: The H in HNSW stands for Hubs. https://arxiv.org/pdf/2412.01940

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization

Published in Interspeech, 2025

Even state-of-the-art speaker diarization systems exhibit high variance in error rates across different datasets, representing numerous use cases and domains. Furthermore, comparing across systems requires careful application of best practices such as dataset splits and metric definitions to allow for apples-to-apples comparison. We propose SDBench (Speaker Diarization Benchmark), an open-source benchmark suite that integrates 13 diverse datasets with built-in tooling for consistent and fine-grained analysis of speaker diarization performance for various on-device and server-side systems. SDBench enables reproducible evaluation and easy integration of new systems over time. To demonstrate the efficacy of SDBench, we built SpeakerKit, an inference efficiency-focused system built on top of Pyannote v3. SDBench enabled rapid execution of ablation studies that led to SpeakerKit being 9.6x faster than Pyannote v3 while achieving comparable error rates.

Recommended citation: Berkin Durmus, Blaise Munyampirwa, Eduardo Pacheco, Atila Orhon, Andrey Leonov. SDBench: A Comprehensive Benchmark Suite for Speaker Diarization. https://arxiv.org/pdf/2507.16136

Talks

Optimizing HNSW in the age of vector databases

Published: January 22, 2025

Near neighbor search over vector embeddings is a linchpin of modern ML infrastructure, forming a core component of established applications to search and retrieval as well as emerging LLM applications via retrieval-augmented generation (RAG). The seminal Hierarchical Navigable Small World (HNSW) graph index is perhaps the most popular choice in current vector database implementations. In this talk, we share two methods to significantly optimize the HNSW memory consumption and query latency, by removing the hierarchical component of the index and reordering the graph layout. Our extensive benchmark studies show that these methods are simple, easy to productionize, and offer robust performance improvements (on the order of 20-30% peak memory and latency).

Blaise Munyampirwa

Pages

Page Not Found

About Me

Archive Layout with Content

Posts by Category

Posts by Collection

CV

Markdown

Page not in menu

Page Archive

Portfolio

Publications

Sitemap

Posts by Tags

Talk map

Talks and Presentations

Teaching

Terms and Privacy Policy

Blog posts

Jupyter notebook markdown generator

Posts

Reproducing Kernel Hilbert Spaces

Portfolio

Portfolio item number 1

Portfolio item number 2

Publications

Deep learning detects actionable molecular and clinical features directly from head/neck squamous cell carcinoma histopathology slides

Down with the Hierarchy: The H in HNSW stands for Hubs

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization

Talks

Optimizing HNSW in the age of vector databases

Teaching