FlatNav Index Module

This module provides interfaces to create and manipulate FlatNav index structures.

create

flatnav.index.create(distance_type: str, dim: int, dataset_size: int, max_edges_per_node: int, index_data_type: flatnav.data_type.DataType = <DataType.float32: 9>, verbose: bool = False, collect_stats: bool = False) → object

Constructs a an in-memory index with the parameters. Args:

distance_type (str): The type of distance metric to use (‘l2’ for Euclidean, ‘angular’ for inner product). dim (int): The number of dimensions in the dataset. dataset_size (int): The number of vectors in the dataset. max_edges_per_node (int): The maximum number of edges per node in the graph. verbose (bool, optional): Enables verbose output. Defaults to False. collect_stats (bool, optional): Collects performance statistics. Defaults to False.

Returns:: Union[IndexL2Float, IndexIPFloat]: The constructed index.

Index Classes

Classes for managing FlatNav indices.

IndexL2Float

class flatnav.index.IndexL2Float

Bases: pybind11_object

add(self: flatnav.index.IndexL2Float, data: numpy.ndarray, ef_construction: int, num_initializations: int = 100, labels: object = None) → None

Add vectors(data) to the index with the given ef_construction parameter and optional labels. ef_construction determines how many vertices are visited while inserting every vector in the underlying graph structure. Args:

data (np.ndarray): The data to add to the index. ef_construction (int): The number of vertices to visit while inserting every vector in the graph. num_initializations (int, optional): The number of initializations to perform. Defaults to 100. labels (Optional[np.ndarray], optional): The labels for the data. Defaults to None.

Returns:: None

allocate_nodes(self: flatnav.index.IndexL2Float, data: numpy.ndarray[numpy.float32]) → flatnav.index.IndexL2Float

Allocate nodes in the underlying graph structure for the given data. Unlike the add method, this method does not construct the edge connectivity. It only allocates memory for each node in the graph. When using this method, you should invoke build_graph_links explicity. `NOTE`: In most cases you should not need to use this method. Args:

data (np.ndarray): The data to add to the index.

Returns:: None

build_graph_links(self: flatnav.index.IndexL2Float, mtx_filename: str) → None

Construct the edge connectivity of the underlying graph. This method should be invoked after allocating nodes using the allocate_nodes method. Args:

mtx_filename (str): The filename of the matrix file.

Returns:: None

get_graph_outdegree_table(self: flatnav.index.IndexL2Float) → List[List[int]]: Returns the outdegree table (adjacency list) representation of the underlying graph. Returns:

List[List[int]]: The outdegree table.

get_query_distance_computations(self: flatnav.index.IndexL2Float) → int: Returns the number of distance computations performed during the last search operation. This method also resets the distance computations counter. Returns:

int: The number of distance computations.

static load_index(filename: str) → flatnav.index.IndexL2Float

Load a FlatNav index from a given file location. Args:

filename (str): The file location to load the index from.

Returns:: Union[L2Inde, IndexIPFloat]: The loaded index.

property max_edges_per_node

property num_threads: Returns the number of threads used for constructing the graph and/or performing KNN search. Returns:

int: The number of threads.

reorder(self: flatnav.index.IndexL2Float, strategies: List[str]) → None

Perform graph re-ordering based on the given sequence of re-ordering strategies. Supported re-ordering strategies include gorder and rcm. Reference:

Graph Reordering for Cache-Efficient Near Neighbor Search: https://arxiv.org/pdf/2104.03221

Args:: strategies (List[str]): The sequence of re-ordering strategies.
Returns:: None

save(self: flatnav.index.IndexL2Float, filename: str) → None

Save a FlatNav index at the given file location. Args:

filename (str): The file location to save the index.

Returns:: None

search(self: flatnav.index.IndexL2Float, queries: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) → Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

This is a batched version of the search_single method. Return top K closest data points for every query in the provided queries. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for every query.

Args:: queries (np.ndarray): The query vectors. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for every query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.
Returns:: Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

search_single(self: flatnav.index.IndexL2Float, query: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) → Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

Return top K closest data points for the given query. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for the query.

Args:: query (np.ndarray): The query vector. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for the query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.
Returns:: Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

set_num_threads(self: flatnav.index.IndexL2Float, num_threads: int) → None

Set the number of threads to use for constructing the graph and/or performing KNN search. Args:

num_threads (int): The number of threads to use.

Returns:: None

IndexL2Uint8

class flatnav.index.IndexL2Uint8

Bases: pybind11_object

add(self: flatnav.index.IndexL2Uint8, data: numpy.ndarray, ef_construction: int, num_initializations: int = 100, labels: object = None) → None

Add vectors(data) to the index with the given ef_construction parameter and optional labels. ef_construction determines how many vertices are visited while inserting every vector in the underlying graph structure. Args:

data (np.ndarray): The data to add to the index. ef_construction (int): The number of vertices to visit while inserting every vector in the graph. num_initializations (int, optional): The number of initializations to perform. Defaults to 100. labels (Optional[np.ndarray], optional): The labels for the data. Defaults to None.

Returns:: None

allocate_nodes(self: flatnav.index.IndexL2Uint8, data: numpy.ndarray[numpy.float32]) → flatnav.index.IndexL2Uint8

Allocate nodes in the underlying graph structure for the given data. Unlike the add method, this method does not construct the edge connectivity. It only allocates memory for each node in the graph. When using this method, you should invoke build_graph_links explicity. `NOTE`: In most cases you should not need to use this method. Args:

data (np.ndarray): The data to add to the index.

Returns:: None

build_graph_links(self: flatnav.index.IndexL2Uint8, mtx_filename: str) → None

Construct the edge connectivity of the underlying graph. This method should be invoked after allocating nodes using the allocate_nodes method. Args:

mtx_filename (str): The filename of the matrix file.

Returns:: None

get_graph_outdegree_table(self: flatnav.index.IndexL2Uint8) → List[List[int]]: Returns the outdegree table (adjacency list) representation of the underlying graph. Returns:

List[List[int]]: The outdegree table.

get_query_distance_computations(self: flatnav.index.IndexL2Uint8) → int: Returns the number of distance computations performed during the last search operation. This method also resets the distance computations counter. Returns:

int: The number of distance computations.

static load_index(filename: str) → flatnav.index.IndexL2Uint8

Load a FlatNav index from a given file location. Args:

filename (str): The file location to load the index from.

Returns:: Union[L2Inde, IndexIPFloat]: The loaded index.

property max_edges_per_node

property num_threads: Returns the number of threads used for constructing the graph and/or performing KNN search. Returns:

int: The number of threads.

reorder(self: flatnav.index.IndexL2Uint8, strategies: List[str]) → None

Perform graph re-ordering based on the given sequence of re-ordering strategies. Supported re-ordering strategies include gorder and rcm. Reference:

Graph Reordering for Cache-Efficient Near Neighbor Search: https://arxiv.org/pdf/2104.03221

Args:: strategies (List[str]): The sequence of re-ordering strategies.
Returns:: None

save(self: flatnav.index.IndexL2Uint8, filename: str) → None

Save a FlatNav index at the given file location. Args:

filename (str): The file location to save the index.

Returns:: None

search(self: flatnav.index.IndexL2Uint8, queries: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) → Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

This is a batched version of the search_single method. Return top K closest data points for every query in the provided queries. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for every query.

Args:: queries (np.ndarray): The query vectors. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for every query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.
Returns:: Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

search_single(self: flatnav.index.IndexL2Uint8, query: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) → Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

Return top K closest data points for the given query. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for the query.

Args:: query (np.ndarray): The query vector. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for the query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.
Returns:: Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

set_num_threads(self: flatnav.index.IndexL2Uint8, num_threads: int) → None

Set the number of threads to use for constructing the graph and/or performing KNN search. Args:

num_threads (int): The number of threads to use.

Returns:: None

IndexIPFloat

class flatnav.index.IndexIPFloat

Bases: pybind11_object

add(self: flatnav.index.IndexIPFloat, data: numpy.ndarray, ef_construction: int, num_initializations: int = 100, labels: object = None) → None

Add vectors(data) to the index with the given ef_construction parameter and optional labels. ef_construction determines how many vertices are visited while inserting every vector in the underlying graph structure. Args:

data (np.ndarray): The data to add to the index. ef_construction (int): The number of vertices to visit while inserting every vector in the graph. num_initializations (int, optional): The number of initializations to perform. Defaults to 100. labels (Optional[np.ndarray], optional): The labels for the data. Defaults to None.

Returns:: None

allocate_nodes(self: flatnav.index.IndexIPFloat, data: numpy.ndarray[numpy.float32]) → flatnav.index.IndexIPFloat

Allocate nodes in the underlying graph structure for the given data. Unlike the add method, this method does not construct the edge connectivity. It only allocates memory for each node in the graph. When using this method, you should invoke build_graph_links explicity. `NOTE`: In most cases you should not need to use this method. Args:

data (np.ndarray): The data to add to the index.

Returns:: None

build_graph_links(self: flatnav.index.IndexIPFloat, mtx_filename: str) → None

Construct the edge connectivity of the underlying graph. This method should be invoked after allocating nodes using the allocate_nodes method. Args:

mtx_filename (str): The filename of the matrix file.

Returns:: None

get_graph_outdegree_table(self: flatnav.index.IndexIPFloat) → List[List[int]]: Returns the outdegree table (adjacency list) representation of the underlying graph. Returns:

List[List[int]]: The outdegree table.

get_query_distance_computations(self: flatnav.index.IndexIPFloat) → int: Returns the number of distance computations performed during the last search operation. This method also resets the distance computations counter. Returns:

int: The number of distance computations.

static load_index(filename: str) → flatnav.index.IndexIPFloat

Load a FlatNav index from a given file location. Args:

filename (str): The file location to load the index from.

Returns:: Union[L2Inde, IndexIPFloat]: The loaded index.

property max_edges_per_node

property num_threads: Returns the number of threads used for constructing the graph and/or performing KNN search. Returns:

int: The number of threads.

reorder(self: flatnav.index.IndexIPFloat, strategies: List[str]) → None

Perform graph re-ordering based on the given sequence of re-ordering strategies. Supported re-ordering strategies include gorder and rcm. Reference:

Graph Reordering for Cache-Efficient Near Neighbor Search: https://arxiv.org/pdf/2104.03221

Args:: strategies (List[str]): The sequence of re-ordering strategies.
Returns:: None

save(self: flatnav.index.IndexIPFloat, filename: str) → None

Save a FlatNav index at the given file location. Args:

filename (str): The file location to save the index.

Returns:: None

search(self: flatnav.index.IndexIPFloat, queries: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) → Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

This is a batched version of the search_single method. Return top K closest data points for every query in the provided queries. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for every query.

Args:: queries (np.ndarray): The query vectors. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for every query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.
Returns:: Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

search_single(self: flatnav.index.IndexIPFloat, query: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) → Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

Return top K closest data points for the given query. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for the query.

Args:: query (np.ndarray): The query vector. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for the query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.
Returns:: Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

set_num_threads(self: flatnav.index.IndexIPFloat, num_threads: int) → None

Set the number of threads to use for constructing the graph and/or performing KNN search. Args:

num_threads (int): The number of threads to use.

Returns:: None

IndexIPUint8

class flatnav.index.IndexIPUint8

Bases: pybind11_object

add(self: flatnav.index.IndexIPUint8, data: numpy.ndarray, ef_construction: int, num_initializations: int = 100, labels: object = None) → None

Add vectors(data) to the index with the given ef_construction parameter and optional labels. ef_construction determines how many vertices are visited while inserting every vector in the underlying graph structure. Args:

data (np.ndarray): The data to add to the index. ef_construction (int): The number of vertices to visit while inserting every vector in the graph. num_initializations (int, optional): The number of initializations to perform. Defaults to 100. labels (Optional[np.ndarray], optional): The labels for the data. Defaults to None.

Returns:: None

allocate_nodes(self: flatnav.index.IndexIPUint8, data: numpy.ndarray[numpy.float32]) → flatnav.index.IndexIPUint8

Allocate nodes in the underlying graph structure for the given data. Unlike the add method, this method does not construct the edge connectivity. It only allocates memory for each node in the graph. When using this method, you should invoke build_graph_links explicity. `NOTE`: In most cases you should not need to use this method. Args:

data (np.ndarray): The data to add to the index.

Returns:: None

build_graph_links(self: flatnav.index.IndexIPUint8, mtx_filename: str) → None

Construct the edge connectivity of the underlying graph. This method should be invoked after allocating nodes using the allocate_nodes method. Args:

mtx_filename (str): The filename of the matrix file.

Returns:: None

get_graph_outdegree_table(self: flatnav.index.IndexIPUint8) → List[List[int]]: Returns the outdegree table (adjacency list) representation of the underlying graph. Returns:

List[List[int]]: The outdegree table.

get_query_distance_computations(self: flatnav.index.IndexIPUint8) → int: Returns the number of distance computations performed during the last search operation. This method also resets the distance computations counter. Returns:

int: The number of distance computations.

static load_index(filename: str) → flatnav.index.IndexIPUint8

Load a FlatNav index from a given file location. Args:

filename (str): The file location to load the index from.

Returns:: Union[L2Inde, IndexIPFloat]: The loaded index.

property max_edges_per_node

property num_threads: Returns the number of threads used for constructing the graph and/or performing KNN search. Returns:

int: The number of threads.

reorder(self: flatnav.index.IndexIPUint8, strategies: List[str]) → None

Perform graph re-ordering based on the given sequence of re-ordering strategies. Supported re-ordering strategies include gorder and rcm. Reference:

Graph Reordering for Cache-Efficient Near Neighbor Search: https://arxiv.org/pdf/2104.03221

Args:: strategies (List[str]): The sequence of re-ordering strategies.
Returns:: None

save(self: flatnav.index.IndexIPUint8, filename: str) → None

Save a FlatNav index at the given file location. Args:

filename (str): The file location to save the index.

Returns:: None

search(self: flatnav.index.IndexIPUint8, queries: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) → Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

This is a batched version of the search_single method. Return top K closest data points for every query in the provided queries. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for every query.

Args:: queries (np.ndarray): The query vectors. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for every query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.
Returns:: Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

search_single(self: flatnav.index.IndexIPUint8, query: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) → Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

Return top K closest data points for the given query. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for the query.

Args:: query (np.ndarray): The query vector. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for the query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.
Returns:: Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

set_num_threads(self: flatnav.index.IndexIPUint8, num_threads: int) → None

Set the number of threads to use for constructing the graph and/or performing KNN search. Args:

num_threads (int): The number of threads to use.

Returns:: None