vocabtree  0.0.1
Dataset Class Reference (abstract)

The Dataset class is an abstract wrapper describing a dataset.

#include <dataset.hpp>


Public Member Functions

 Dataset (const std::string &base_location)
 Constructs a dataset given a base location.
 
 Dataset (const std::string &base_location, const std::string &db_data_location)
 Loads a dataset from the db_data_location.
 
virtual ~Dataset ()
 
virtual bool write (const std::string &db_data_location)=0
 Writes the dataset mapping to the input data location.
 
virtual bool read (const std::string &db_data_location)=0
 Reads the dataset mapping from the input data location.
 
virtual std::shared_ptr< Image > image (uint64_t id) const =0
 Given a unique integer ID, returns the Image associated with that ID.
 
virtual uint64_t num_images () const =0
 Returns the number of images in the dataset.
 
std::string location () const
 Returns the absolute path of the data directory.
 
std::string location (const std::string &relative_path) const
 Returns the absolute path of a file (appends relative_path to the data directory path).
 
virtual bool add_image (const std::shared_ptr< const Image > &image)=0
 Adds the given image to the database. If there is an ID collision, the image is not added and the function returns false; otherwise it returns true.
 
std::vector< std::shared_ptr< const Image > > all_images () const
 Returns a vector of all images in the dataset.
 
std::vector< std::shared_ptr< const Image > > random_images (size_t count) const
 Returns a vector of count randomly selected images from the dataset.
 
std::vector< Dataset > shard (const std::vector< std::string > &new_locations)
 Shards the dataset to the new input locations and returns the sharded datasets.
 

Protected Attributes

std::string data_directory
 

Detailed Description

The Dataset class is an abstract wrapper describing a dataset.

A dataset consists of the actual data plus a way to map images, or frames of a video, to an integer index. At minimum, the dataset should provide an easy way to map image paths to unique integers. For a sample implementation of a Dataset, see the SimpleDataset class.

Combined with the Image class implementation, a Dataset provides a way to find the relevant paths for features and images. Note that an implementation of a Dataset or Image class should store the location of the Image data as a relative path, with the absolute base path being interchangeable.
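
The mapping from relative image paths to unique integers, with a swappable base path, can be sketched as follows. This is a self-contained illustration under stated assumptions, not the SimpleDataset implementation: the class PathIndex and its members add, absolute_path, rebase, and size are hypothetical names introduced here.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in (not part of vocabtree): maps relative image paths
// to dense integer IDs while keeping the base directory swappable.
class PathIndex {
public:
    explicit PathIndex(std::string base) : base_(std::move(base)) {}

    // Registers a relative path and returns its ID (its insertion position).
    uint64_t add(const std::string &relative_path) {
        paths_.push_back(relative_path);
        return paths_.size() - 1;
    }

    // Resolves an ID to an absolute path under the current base directory.
    // Throws std::out_of_range for an unknown ID.
    std::string absolute_path(uint64_t id) const {
        return base_ + "/" + paths_.at(id);
    }

    // Moving the data only requires changing the base; IDs stay valid.
    void rebase(const std::string &new_base) { base_ = new_base; }

    uint64_t size() const { return paths_.size(); }

private:
    std::string base_;
    std::vector<std::string> paths_;
};
```

Because only relative paths are stored, the same index keeps working after the data is copied to another machine or mount point: calling rebase with the new base directory is enough.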

Definition at line 17 of file dataset.hpp.

Constructor & Destructor Documentation

Dataset::Dataset ( const std::string &  base_location)

Constructs a dataset given a base location.

An example base location might be /c/data/. Given this base location, an implementation of the Dataset should find all the data and construct a mapping between the data and the id, for example by searching through base_location + /images/.

Definition at line 7 of file dataset.cxx.

References data_directory.

7  {
8  data_directory = base_location;
9 }

Dataset::Dataset ( const std::string &  base_location, const std::string &  db_data_location )

Loads a dataset from the db_data_location.

The base_location provides the absolute path of data.

Definition at line 11 of file dataset.cxx.

References data_directory.

11  {
12  data_directory = base_location;
13 }
Dataset::~Dataset ( )
virtual

Definition at line 15 of file dataset.cxx.

15 { }

Member Function Documentation

virtual bool Dataset::add_image ( const std::shared_ptr< const Image > &  image)
pure virtual

Adds the given image to the database. If there is an ID collision, the image is not added and the function returns false; otherwise it returns true.

Implemented in SimpleDataset.
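
The collision rule described above can be sketched with a map keyed by ID. This is only an illustration: ImageStub and ImageTable are hypothetical stand-ins for Image and for a concrete Dataset, and rejecting null pointers is an assumption that the documentation does not state.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for Image: just an ID and a relative path.
struct ImageStub {
    uint64_t id;
    std::string relative_path;
};

// Hypothetical container showing the add_image collision rule.
class ImageTable {
public:
    // Returns true if the image was stored, false if its ID already exists.
    // On collision the existing entry is left untouched.
    bool add_image(const std::shared_ptr<const ImageStub> &image) {
        if (!image) return false;  // assumption: null pointers are rejected
        return images_.emplace(image->id, image).second;
    }

    uint64_t num_images() const { return images_.size(); }

private:
    std::unordered_map<uint64_t, std::shared_ptr<const ImageStub> > images_;
};
```

Using emplace keeps the check and the insertion in a single lookup: the second member of its return value reports whether a new entry was created.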

std::vector< std::shared_ptr< const Image > > Dataset::all_images ( ) const

Returns a vector of all images in the dataset.

Definition at line 26 of file dataset.cxx.

References image(), and num_images().

Referenced by compute_bow(), compute_bow_features(), random_images(), and train_index().

26  {
27  std::vector< std::shared_ptr< const Image> > images(this->num_images());
28  for (uint64_t i = 0; i < this->num_images(); i++) {
29  images[i] = this->image(i);
30  }
31  return images;
32 }
virtual std::shared_ptr<Image> Dataset::image ( uint64_t  id) const
pure virtual

Given a unique integer ID, returns an Image associated with that ID.

Implemented in SimpleDataset.

Referenced by MatchesPage::add_match(), all_images(), benchmark_dataset(), compute_features(), InvertedIndex::search(), VocabTree::train(), and BagOfWords::train().

std::string Dataset::location ( ) const

Returns the absolute path of the data directory.

std::string Dataset::location ( const std::string &  relative_path) const

Returns the absolute path of the file (appends the file path to the database path).

Definition at line 21 of file dataset.cxx.

References data_directory.

21  {
22  return data_directory + "/" + relative_path;
23 }
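
The listing above always inserts a "/" between the two parts, so a base location that already ends in a slash, such as /c/data/, yields a doubled separator (/c/data//images/...). POSIX path resolution treats that like a single slash, but it can matter when paths are compared as strings. A minimal sketch of a normalizing join follows; join_path is a hypothetical helper, not part of vocabtree.

```cpp
#include <string>

// Hypothetical helper (not in vocabtree): joins a base directory and a
// relative path with exactly one '/' between them.
std::string join_path(const std::string &base, const std::string &relative_path) {
    if (base.empty()) return relative_path;
    std::string result = base;
    if (result.back() != '/') result += '/';
    // Skip leading slashes on the relative part to avoid "//".
    std::string::size_type start = relative_path.find_first_not_of('/');
    if (start != std::string::npos) result += relative_path.substr(start);
    return result;
}
```
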
virtual uint64_t Dataset::num_images ( ) const
pure virtual

Returns the number of images in the dataset.

Implemented in SimpleDataset.

Referenced by all_images(), benchmark_dataset(), compute_features(), operator<<(), and InvertedIndex::search().

std::vector< std::shared_ptr< const Image > > Dataset::random_images ( size_t  count) const

Returns a vector of count randomly selected images from the dataset.

Definition at line 34 of file dataset.cxx.

References all_images().

Referenced by compute_bow(), main(), train_bow(), and train_tree().

34  {
35  std::vector< std::shared_ptr< const Image> > all = this->all_images();
36  std::random_shuffle(all.begin(), all.end());
37  std::vector< std::shared_ptr< const Image> > images(all.begin(), all.begin() + count);
38  return images;
39 }
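
Two properties of the listing above are worth noting: all.begin() + count is undefined behavior when count exceeds the number of images, and std::random_shuffle was deprecated in C++14 and removed in C++17. A possible alternative is sketched below under those observations. It is not the library's code; random_subset is a hypothetical generic helper, and the caller supplies the random engine.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical generic variant of the listing above: returns up to `count`
// elements chosen at random (without repetition) from `items`.
template <typename T>
std::vector<T> random_subset(std::vector<T> items, std::size_t count,
                             std::mt19937 &rng) {
    std::shuffle(items.begin(), items.end(), rng);
    // Clamp so the kept range never runs past the end of the vector.
    const std::size_t n = std::min(count, items.size());
    items.erase(items.begin() + n, items.end());
    return items;
}
```

Passing the engine by reference lets the caller choose between a fixed seed for reproducible experiments and a seed from std::random_device for varied samples.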
virtual bool Dataset::read ( const std::string &  db_data_location)
pure virtual

Reads the dataset mapping from the input data location.

Returns true if successful, false otherwise.

Implemented in SimpleDataset.

std::vector<Dataset> Dataset::shard ( const std::vector< std::string > &  new_locations)

Shards the dataset to the new input locations and returns the sharded datasets.

virtual bool Dataset::write ( const std::string &  db_data_location)
pure virtual

Writes the dataset mapping to the input data location.

Returns true if successful, false otherwise.

Implemented in SimpleDataset.

Field Documentation

std::string Dataset::data_directory
protected

Definition at line 66 of file dataset.hpp.

Referenced by SimpleDataset::construct_dataset(), Dataset(), and location().


The documentation for this class was generated from the following files: