FlatNav Index Module

This module provides interfaces to create and manipulate FlatNav index structures.

create

flatnav.index.create(distance_type: str, dim: int, dataset_size: int, max_edges_per_node: int, index_data_type: flatnav.data_type.DataType = <DataType.float32: 9>, verbose: bool = False, collect_stats: bool = False) object

Constructs a an in-memory index with the parameters. Args:

distance_type (str): The type of distance metric to use (‘l2’ for Euclidean, ‘angular’ for inner product). dim (int): The number of dimensions in the dataset. dataset_size (int): The number of vectors in the dataset. max_edges_per_node (int): The maximum number of edges per node in the graph. verbose (bool, optional): Enables verbose output. Defaults to False. collect_stats (bool, optional): Collects performance statistics. Defaults to False.

Returns:

Union[IndexL2Float, IndexIPFloat]: The constructed index.

Index Classes

Classes for managing FlatNav indices.

IndexL2Float

class flatnav.index.IndexL2Float

Bases: pybind11_object

add(self: flatnav.index.IndexL2Float, data: numpy.ndarray, ef_construction: int, num_initializations: int = 100, labels: object = None) None

Add vectors(data) to the index with the given ef_construction parameter and optional labels. ef_construction determines how many vertices are visited while inserting every vector in the underlying graph structure. Args:

data (np.ndarray): The data to add to the index. ef_construction (int): The number of vertices to visit while inserting every vector in the graph. num_initializations (int, optional): The number of initializations to perform. Defaults to 100. labels (Optional[np.ndarray], optional): The labels for the data. Defaults to None.

Returns:

None

allocate_nodes(self: flatnav.index.IndexL2Float, data: numpy.ndarray[numpy.float32]) flatnav.index.IndexL2Float

Allocate nodes in the underlying graph structure for the given data. Unlike the add method, this method does not construct the edge connectivity. It only allocates memory for each node in the graph. When using this method, you should invoke build_graph_links explicity. `NOTE`: In most cases you should not need to use this method. Args:

data (np.ndarray): The data to add to the index.

Returns:

None

Construct the edge connectivity of the underlying graph. This method should be invoked after allocating nodes using the allocate_nodes method. Args:

mtx_filename (str): The filename of the matrix file.

Returns:

None

get_graph_outdegree_table(self: flatnav.index.IndexL2Float) List[List[int]]

Returns the outdegree table (adjacency list) representation of the underlying graph. Returns:

List[List[int]]: The outdegree table.

get_query_distance_computations(self: flatnav.index.IndexL2Float) int

Returns the number of distance computations performed during the last search operation. This method also resets the distance computations counter. Returns:

int: The number of distance computations.

static load_index(filename: str) flatnav.index.IndexL2Float

Load a FlatNav index from a given file location. Args:

filename (str): The file location to load the index from.

Returns:

Union[L2Inde, IndexIPFloat]: The loaded index.

property max_edges_per_node
property num_threads

Returns the number of threads used for constructing the graph and/or performing KNN search. Returns:

int: The number of threads.

reorder(self: flatnav.index.IndexL2Float, strategies: List[str]) None

Perform graph re-ordering based on the given sequence of re-ordering strategies. Supported re-ordering strategies include gorder and rcm. Reference:

  1. Graph Reordering for Cache-Efficient Near Neighbor Search: https://arxiv.org/pdf/2104.03221

Args:

strategies (List[str]): The sequence of re-ordering strategies.

Returns:

None

save(self: flatnav.index.IndexL2Float, filename: str) None

Save a FlatNav index at the given file location. Args:

filename (str): The file location to save the index.

Returns:

None

search(self: flatnav.index.IndexL2Float, queries: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

This is a batched version of the search_single method. Return top K closest data points for every query in the provided queries. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for every query.

Args:

queries (np.ndarray): The query vectors. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for every query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.

Returns:

Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

search_single(self: flatnav.index.IndexL2Float, query: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

Return top K closest data points for the given query. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for the query.

Args:

query (np.ndarray): The query vector. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for the query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.

Returns:

Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

set_num_threads(self: flatnav.index.IndexL2Float, num_threads: int) None

Set the number of threads to use for constructing the graph and/or performing KNN search. Args:

num_threads (int): The number of threads to use.

Returns:

None

IndexL2Uint8

class flatnav.index.IndexL2Uint8

Bases: pybind11_object

add(self: flatnav.index.IndexL2Uint8, data: numpy.ndarray, ef_construction: int, num_initializations: int = 100, labels: object = None) None

Add vectors(data) to the index with the given ef_construction parameter and optional labels. ef_construction determines how many vertices are visited while inserting every vector in the underlying graph structure. Args:

data (np.ndarray): The data to add to the index. ef_construction (int): The number of vertices to visit while inserting every vector in the graph. num_initializations (int, optional): The number of initializations to perform. Defaults to 100. labels (Optional[np.ndarray], optional): The labels for the data. Defaults to None.

Returns:

None

allocate_nodes(self: flatnav.index.IndexL2Uint8, data: numpy.ndarray[numpy.float32]) flatnav.index.IndexL2Uint8

Allocate nodes in the underlying graph structure for the given data. Unlike the add method, this method does not construct the edge connectivity. It only allocates memory for each node in the graph. When using this method, you should invoke build_graph_links explicity. `NOTE`: In most cases you should not need to use this method. Args:

data (np.ndarray): The data to add to the index.

Returns:

None

Construct the edge connectivity of the underlying graph. This method should be invoked after allocating nodes using the allocate_nodes method. Args:

mtx_filename (str): The filename of the matrix file.

Returns:

None

get_graph_outdegree_table(self: flatnav.index.IndexL2Uint8) List[List[int]]

Returns the outdegree table (adjacency list) representation of the underlying graph. Returns:

List[List[int]]: The outdegree table.

get_query_distance_computations(self: flatnav.index.IndexL2Uint8) int

Returns the number of distance computations performed during the last search operation. This method also resets the distance computations counter. Returns:

int: The number of distance computations.

static load_index(filename: str) flatnav.index.IndexL2Uint8

Load a FlatNav index from a given file location. Args:

filename (str): The file location to load the index from.

Returns:

Union[L2Inde, IndexIPFloat]: The loaded index.

property max_edges_per_node
property num_threads

Returns the number of threads used for constructing the graph and/or performing KNN search. Returns:

int: The number of threads.

reorder(self: flatnav.index.IndexL2Uint8, strategies: List[str]) None

Perform graph re-ordering based on the given sequence of re-ordering strategies. Supported re-ordering strategies include gorder and rcm. Reference:

  1. Graph Reordering for Cache-Efficient Near Neighbor Search: https://arxiv.org/pdf/2104.03221

Args:

strategies (List[str]): The sequence of re-ordering strategies.

Returns:

None

save(self: flatnav.index.IndexL2Uint8, filename: str) None

Save a FlatNav index at the given file location. Args:

filename (str): The file location to save the index.

Returns:

None

search(self: flatnav.index.IndexL2Uint8, queries: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

This is a batched version of the search_single method. Return top K closest data points for every query in the provided queries. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for every query.

Args:

queries (np.ndarray): The query vectors. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for every query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.

Returns:

Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

search_single(self: flatnav.index.IndexL2Uint8, query: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

Return top K closest data points for the given query. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for the query.

Args:

query (np.ndarray): The query vector. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for the query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.

Returns:

Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

set_num_threads(self: flatnav.index.IndexL2Uint8, num_threads: int) None

Set the number of threads to use for constructing the graph and/or performing KNN search. Args:

num_threads (int): The number of threads to use.

Returns:

None

IndexIPFloat

class flatnav.index.IndexIPFloat

Bases: pybind11_object

add(self: flatnav.index.IndexIPFloat, data: numpy.ndarray, ef_construction: int, num_initializations: int = 100, labels: object = None) None

Add vectors(data) to the index with the given ef_construction parameter and optional labels. ef_construction determines how many vertices are visited while inserting every vector in the underlying graph structure. Args:

data (np.ndarray): The data to add to the index. ef_construction (int): The number of vertices to visit while inserting every vector in the graph. num_initializations (int, optional): The number of initializations to perform. Defaults to 100. labels (Optional[np.ndarray], optional): The labels for the data. Defaults to None.

Returns:

None

allocate_nodes(self: flatnav.index.IndexIPFloat, data: numpy.ndarray[numpy.float32]) flatnav.index.IndexIPFloat

Allocate nodes in the underlying graph structure for the given data. Unlike the add method, this method does not construct the edge connectivity. It only allocates memory for each node in the graph. When using this method, you should invoke build_graph_links explicity. `NOTE`: In most cases you should not need to use this method. Args:

data (np.ndarray): The data to add to the index.

Returns:

None

Construct the edge connectivity of the underlying graph. This method should be invoked after allocating nodes using the allocate_nodes method. Args:

mtx_filename (str): The filename of the matrix file.

Returns:

None

get_graph_outdegree_table(self: flatnav.index.IndexIPFloat) List[List[int]]

Returns the outdegree table (adjacency list) representation of the underlying graph. Returns:

List[List[int]]: The outdegree table.

get_query_distance_computations(self: flatnav.index.IndexIPFloat) int

Returns the number of distance computations performed during the last search operation. This method also resets the distance computations counter. Returns:

int: The number of distance computations.

static load_index(filename: str) flatnav.index.IndexIPFloat

Load a FlatNav index from a given file location. Args:

filename (str): The file location to load the index from.

Returns:

Union[L2Inde, IndexIPFloat]: The loaded index.

property max_edges_per_node
property num_threads

Returns the number of threads used for constructing the graph and/or performing KNN search. Returns:

int: The number of threads.

reorder(self: flatnav.index.IndexIPFloat, strategies: List[str]) None

Perform graph re-ordering based on the given sequence of re-ordering strategies. Supported re-ordering strategies include gorder and rcm. Reference:

  1. Graph Reordering for Cache-Efficient Near Neighbor Search: https://arxiv.org/pdf/2104.03221

Args:

strategies (List[str]): The sequence of re-ordering strategies.

Returns:

None

save(self: flatnav.index.IndexIPFloat, filename: str) None

Save a FlatNav index at the given file location. Args:

filename (str): The file location to save the index.

Returns:

None

search(self: flatnav.index.IndexIPFloat, queries: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

This is a batched version of the search_single method. Return top K closest data points for every query in the provided queries. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for every query.

Args:

queries (np.ndarray): The query vectors. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for every query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.

Returns:

Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

search_single(self: flatnav.index.IndexIPFloat, query: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

Return top K closest data points for the given query. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for the query.

Args:

query (np.ndarray): The query vector. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for the query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.

Returns:

Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

set_num_threads(self: flatnav.index.IndexIPFloat, num_threads: int) None

Set the number of threads to use for constructing the graph and/or performing KNN search. Args:

num_threads (int): The number of threads to use.

Returns:

None

IndexIPUint8

class flatnav.index.IndexIPUint8

Bases: pybind11_object

add(self: flatnav.index.IndexIPUint8, data: numpy.ndarray, ef_construction: int, num_initializations: int = 100, labels: object = None) None

Add vectors(data) to the index with the given ef_construction parameter and optional labels. ef_construction determines how many vertices are visited while inserting every vector in the underlying graph structure. Args:

data (np.ndarray): The data to add to the index. ef_construction (int): The number of vertices to visit while inserting every vector in the graph. num_initializations (int, optional): The number of initializations to perform. Defaults to 100. labels (Optional[np.ndarray], optional): The labels for the data. Defaults to None.

Returns:

None

allocate_nodes(self: flatnav.index.IndexIPUint8, data: numpy.ndarray[numpy.float32]) flatnav.index.IndexIPUint8

Allocate nodes in the underlying graph structure for the given data. Unlike the add method, this method does not construct the edge connectivity. It only allocates memory for each node in the graph. When using this method, you should invoke build_graph_links explicity. `NOTE`: In most cases you should not need to use this method. Args:

data (np.ndarray): The data to add to the index.

Returns:

None

Construct the edge connectivity of the underlying graph. This method should be invoked after allocating nodes using the allocate_nodes method. Args:

mtx_filename (str): The filename of the matrix file.

Returns:

None

get_graph_outdegree_table(self: flatnav.index.IndexIPUint8) List[List[int]]

Returns the outdegree table (adjacency list) representation of the underlying graph. Returns:

List[List[int]]: The outdegree table.

get_query_distance_computations(self: flatnav.index.IndexIPUint8) int

Returns the number of distance computations performed during the last search operation. This method also resets the distance computations counter. Returns:

int: The number of distance computations.

static load_index(filename: str) flatnav.index.IndexIPUint8

Load a FlatNav index from a given file location. Args:

filename (str): The file location to load the index from.

Returns:

Union[L2Inde, IndexIPFloat]: The loaded index.

property max_edges_per_node
property num_threads

Returns the number of threads used for constructing the graph and/or performing KNN search. Returns:

int: The number of threads.

reorder(self: flatnav.index.IndexIPUint8, strategies: List[str]) None

Perform graph re-ordering based on the given sequence of re-ordering strategies. Supported re-ordering strategies include gorder and rcm. Reference:

  1. Graph Reordering for Cache-Efficient Near Neighbor Search: https://arxiv.org/pdf/2104.03221

Args:

strategies (List[str]): The sequence of re-ordering strategies.

Returns:

None

save(self: flatnav.index.IndexIPUint8, filename: str) None

Save a FlatNav index at the given file location. Args:

filename (str): The file location to save the index.

Returns:

None

search(self: flatnav.index.IndexIPUint8, queries: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

This is a batched version of the search_single method. Return top K closest data points for every query in the provided queries. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for every query.

Args:

queries (np.ndarray): The query vectors. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for every query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.

Returns:

Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

search_single(self: flatnav.index.IndexIPUint8, query: numpy.ndarray, K: int, ef_search: int, num_initializations: int = 100) Tuple[numpy.ndarray[numpy.float32], numpy.ndarray[numpy.int32]]

Return top K closest data points for the given query. The results are returned as a Tuple of distances and label ID’s. The ef_search parameter determines how many neighbors are visited while finding the closest neighbors for the query.

Args:

query (np.ndarray): The query vector. K (int): The number of neighbors to return. ef_search (int): The number of neighbors to visit while finding the closest neighbors for the query. num_initializations (int, optional): The number of initializations to perform. Defaults to 100.

Returns:

Tuple[np.ndarray, np.ndarray]: The distances and label ID’s of the closest neighbors.

set_num_threads(self: flatnav.index.IndexIPUint8, num_threads: int) None

Set the number of threads to use for constructing the graph and/or performing KNN search. Args:

num_threads (int): The number of threads to use.

Returns:

None