Page Not Found
Page not found. Your pixels are in another canvas. Read more
A list of all the posts and pages found on the site. For you robots out there is an XML version available for digesting as well.
This is a page not in the main menu. Read more
Published:
In this blog post we will examine what Reproducing Kernel Hilbert Spaces (RKHS) are and the important properties that make them useful in statistical machine learning and functional analysis. Without loss of generality, we will take the general metric space $(\mathcal{X}, d)$ to be the standard Euclidean metric space $(\mathbb{R}^n, d)$. Read more
Short description of portfolio item number 1 Read more
Short description of portfolio item number 2 Read more
Published in International Journal of Radiation Oncology, Biology, Physics, 2020
The purpose of this abstract is to describe the application of deep learning to digital histopathology slide data for detection of clinically relevant features. Deep learning is a form of artificial intelligence which can process graphical data and “learn” to extract hidden features. Here we test the ability of deep learning to detect human papilloma virus, location of origin, and other features. Read more
Recommended citation: Deep learning detects actionable molecular and clinical features directly from head/neck squamous cell carcinoma histopathology slides. J. Dolezal, J.N. Kather, S. Kochanny, J. Schulte, A. Patel, B. Munyampirwa, S. Morin, A. Srisuwananukorn, N. Cipriani, D. Basu, A. Pearson. International Journal of Radiation Oncology, Biology, Physics, Volume 106, Issue 5, 1165 https://www.redjournal.org/article/S0360-3016(19)34202-6/abstract
Published in ICML 2025 VecDB Workshop, 2025
Driven by recent breakthrough advances in neural representation learning, approximate near-neighbor (ANN) search over vector embeddings has emerged as a critical computational workload. With the introduction of the seminal Hierarchical Navigable Small World (HNSW) algorithm, graph-based indexes have established themselves as the overwhelmingly dominant paradigm for efficient and scalable ANN search. Read more
Recommended citation: Blaise Munyampirwa, Vihan Lakshman, Benjamin Coleman. Down with the Hierarchy: The H in HNSW stands for Hubs. https://arxiv.org/pdf/2412.01940
Published in Interspeech, 2025
Even state-of-the-art speaker diarization systems exhibit high variance in error rates across different datasets, representing numerous use cases and domains. Furthermore, comparing across systems requires careful application of best practices such as dataset splits and metric definitions to allow for apples-to-apples comparison. We propose SDBench (Speaker Diarization Benchmark), an open-source benchmark suite that integrates 13 diverse datasets with built-in tooling for consistent and fine-grained analysis of speaker diarization performance for various on-device and server-side systems. SDBench enables reproducible evaluation and easy integration of new systems over time. To demonstrate the efficacy of SDBench, we built SpeakerKit, an inference efficiency-focused system built on top of Pyannote v3. SDBench enabled rapid execution of ablation studies that led to SpeakerKit being 9.6x faster than Pyannote v3 while achieving comparable error rates. Read more
Recommended citation: Berkin Durmus, Blaise Munyampirwa, Eduardo Pacheco, Atila Orhon, Andrey Leonov. SDBench: A Comprehensive Benchmark Suite for Speaker Diarization. https://arxiv.org/pdf/2507.16136
Published:
Near neighbor search over vector embeddings is a linchpin of modern ML infrastructure, forming a core component of established applications in search and retrieval as well as emerging LLM applications via retrieval-augmented generation (RAG). The seminal Hierarchical Navigable Small World (HNSW) graph index is perhaps the most popular choice in current vector database implementations. In this talk, we share two methods that significantly reduce HNSW memory consumption and query latency: removing the hierarchical component of the index and reordering the graph layout. Our extensive benchmark studies show that these methods are simple, easy to productionize, and offer robust performance improvements (on the order of 20-30% in peak memory and latency). Read more