vocabtree  0.0.1
SimpleDataset Class Reference

SimpleDataset is a sample implementation of a Dataset, where the data is stored as JPEG images in a single folder called images/ and features are stored in a folder feats/<feat_name>.

#include <dataset.hpp>


Data Structures

class SimpleImage
 SimpleImage class used with the SimpleDataset class.
 

Public Member Functions

 SimpleDataset (const std::string &base_location)
 Creates a simple dataset from the images in base_location/images.

 SimpleDataset (const std::string &base_location, const std::string &db_data_location)
 If a dataset file is located at db_data_location, loads the dataset from that file.

 ~SimpleDataset ()

bool write (const std::string &db_data_location)
 Writes the SimpleDataset out to the specified file.

bool read (const std::string &db_data_location)
 Reads the specified SimpleDataset.

std::shared_ptr< Image > image (uint64_t id) const
 Given a unique integer ID, returns the Image associated with that ID.

bool add_image (const std::shared_ptr< const Image > &image)
 Adds the given image to the database. If there is an ID collision, the image is not added and false is returned; otherwise returns true.

uint64_t num_images () const
 Returns the number of images in the dataset.
 
Public Member Functions inherited from Dataset
 Dataset (const std::string &base_location)
 Constructs a dataset given a base location.

 Dataset (const std::string &base_location, const std::string &db_data_location)
 Loads a dataset from the db_data_location.

virtual ~Dataset ()

std::string location () const
 Returns the absolute path of the data directory.

std::string location (const std::string &relative_path) const
 Returns the absolute path of the file (appends the file path to the database path).

std::vector< std::shared_ptr< const Image > > all_images () const
 Returns a vector of all images in the dataset.

std::vector< std::shared_ptr< const Image > > random_images (size_t count) const
 Returns a vector of count random images from the dataset.

std::vector< Dataset > shard (const std::vector< std::string > &new_locations)
 Shards the dataset to the new input locations and returns the sharded datasets.
 

Private Member Functions

void construct_dataset ()
 Constructs the dataset and fills in the image ID map.
 

Private Attributes

boost::bimap< std::string, uint64_t > id_image_map
 

Additional Inherited Members

Protected Attributes inherited from Dataset
std::string data_directory
 

Detailed Description

SimpleDataset is a sample implementation of a Dataset, where the data is stored as JPEG images in a single folder called images/ and features are stored in a folder feats/<feat_name>.

For example, given an absolute base path of /c/data/, image data is found in /c/data/images and SIFT features are found in /c/data/feats/sift/.

Definition at line 76 of file dataset.hpp.
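
To illustrate the directory layout described above, the following is a minimal sketch that composes the image and feature directories from a base path. It is not part of vocabtree: the helper names images_dir and feature_dir are hypothetical, and std::filesystem (C++17) is assumed in place of the library's own filesystem helpers.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: directory that holds the JPEG images.
std::string images_dir(const std::string &base_location) {
    return (fs::path(base_location) / "images").generic_string();
}

// Hypothetical helper: directory that holds one feature type, e.g. "sift".
std::string feature_dir(const std::string &base_location,
                        const std::string &feat_name) {
    return (fs::path(base_location) / "feats" / feat_name).generic_string();
}
```

Using path concatenation rather than string concatenation means the result is the same whether or not base_location ends with a slash.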

Constructor & Destructor Documentation

SimpleDataset::SimpleDataset ( const std::string &  base_location)

Creates a simple dataset from the images in base_location/images.

It is recommended to then call write(...) to save the dataset, so that loading it later does not require traversing the disk again.

Definition at line 46 of file dataset.cxx.

References construct_dataset().

46  : Dataset(base_location) {
47  this->construct_dataset();
48 }
SimpleDataset::SimpleDataset ( const std::string &  base_location,
const std::string &  db_data_location 
)

If a dataset file is located at db_data_location, loads the dataset from that file.

Otherwise, this will create the dataset from base_location/images and call write(db_data_location).

Definition at line 50 of file dataset.cxx.

References construct_dataset(), filesystem::file_exists(), read(), and write().

51  : Dataset(base_location, db_data_location) {
52  if (filesystem::file_exists(db_data_location)) {
53  this->read(db_data_location);
54  }
55  else {
56  this->construct_dataset();
57  this->write(db_data_location);
58  }
59 }
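
The constructor above discards the return values of read() and write(). A caller who wants to detect an unreadable or unwritable dataset file could follow the same load-or-build pattern while propagating those results. The sketch below is one way to do that under stated assumptions: load_or_build and its three callback parameters are hypothetical names, and std::filesystem::exists (C++17) stands in for filesystem::file_exists.

```cpp
#include <filesystem>
#include <functional>
#include <string>

// Hypothetical helper: reuse the cached dataset file when it exists,
// otherwise build the dataset and cache it. Returns the result of the
// read or write step instead of ignoring it.
bool load_or_build(const std::string &db_data_location,
                   const std::function<bool(const std::string &)> &read,
                   const std::function<void()> &construct,
                   const std::function<bool(const std::string &)> &write) {
    if (std::filesystem::exists(db_data_location)) {
        return read(db_data_location);
    }
    construct();
    return write(db_data_location);
}
```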
SimpleDataset::~SimpleDataset ( )

Definition at line 61 of file dataset.cxx.

61 { }

Member Function Documentation

bool SimpleDataset::add_image ( const std::shared_ptr< const Image > &  image)
virtual

Adds the given image to the database. If there is an ID collision, the image is not added and false is returned; otherwise returns true.

Implements Dataset.

Definition at line 147 of file dataset.cxx.

References id_image_map, and image().

Referenced by read().

147  {
148  if (id_image_map.right.find(image->id) != id_image_map.right.end()) return false;
149  const std::shared_ptr<const SimpleDataset::SimpleImage> simage = std::static_pointer_cast<const SimpleDataset::SimpleImage>(image);
150  id_image_map.insert(boost::bimap<std::string, uint64_t>::value_type(simage->location(), simage->id));
151  return true;
152 }
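
The collision rule can be illustrated without Boost. The sketch below is a hypothetical stand-in for the boost::bimap member (the class name PathIdIndex is not part of vocabtree) that keeps two hash maps in sync, one per lookup direction. Unlike the listing above, it also reports a duplicate path as a failure; a default boost::bimap is expected to reject such an insert as well, in which case add_image() would still return true.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for boost::bimap<std::string, uint64_t>.
class PathIdIndex {
public:
    // Refuses the pair when either the id or the path is already present.
    bool add(const std::string &path, uint64_t id) {
        if (path_by_id_.count(id) != 0 || id_by_path_.count(path) != 0) {
            return false;
        }
        path_by_id_.emplace(id, path);
        id_by_path_.emplace(path, id);
        return true;
    }

    // Throws std::out_of_range when the id is unknown.
    const std::string &path(uint64_t id) const { return path_by_id_.at(id); }

    std::size_t size() const { return path_by_id_.size(); }

private:
    std::unordered_map<uint64_t, std::string> path_by_id_;
    std::unordered_map<std::string, uint64_t> id_by_path_;
};
```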
void SimpleDataset::construct_dataset ( )
private

Constructs the dataset and fills in the image ID map.

Definition at line 70 of file dataset.cxx.

References Dataset::data_directory, id_image_map, and filesystem::list_files().

Referenced by SimpleDataset().

70  {
71  const std::vector<std::string> &image_file_paths = filesystem::list_files(data_directory + "/images/", ".jpg");
72  for (size_t i = 0; i < image_file_paths.size(); i++) {
73  id_image_map.insert(boost::bimap<std::string, uint64_t>::value_type( image_file_paths[i].substr(data_directory.size(), image_file_paths[i].size() - data_directory.size()), i));
74  }
75 }
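
The listing stores each image path relative to the data directory and uses the file's position in the listing as its ID. The sketch below isolates that step as a pure function so it can be checked without touching the disk. The name index_images is hypothetical, and every input path is assumed to begin with data_directory.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper mirroring construct_dataset(): strips the data
// directory prefix from each absolute path and pairs the remainder with
// the file's position in the list, which becomes its id.
std::vector<std::pair<std::string, uint64_t>>
index_images(const std::string &data_directory,
             const std::vector<std::string> &image_file_paths) {
    std::vector<std::pair<std::string, uint64_t>> entries;
    entries.reserve(image_file_paths.size());
    for (uint64_t i = 0; i < image_file_paths.size(); i++) {
        entries.emplace_back(image_file_paths[i].substr(data_directory.size()), i);
    }
    return entries;
}
```

Because IDs follow listing order, they stay stable across runs only if the files are listed in the same order each time, which is one reason to persist the dataset with write().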
std::shared_ptr< Image > SimpleDataset::image ( uint64_t  id) const
virtual

Given a unique integer ID, returns an Image associated with that ID.

Implements Dataset.

Definition at line 63 of file dataset.cxx.

References id_image_map.

Referenced by add_image(), main(), and write().

63  {
64  const std::string &image_path = id_image_map.right.at(id);
65 
66  std::shared_ptr<Image> current_image = std::make_shared<SimpleImage>(image_path, id);
67  return current_image;
68 }
uint64_t SimpleDataset::num_images ( ) const
virtual

Returns the number of images in the dataset.

Implements Dataset.

Definition at line 122 of file dataset.cxx.

References id_image_map.

Referenced by read(), and write().

122  {
123  return id_image_map.size();
124 }
bool SimpleDataset::read ( const std::string &  db_data_location)
virtual

Reads the specified SimpleDataset.

See write(const std::string &db_data_location) for more information about the binary format. Returns true on success, false otherwise (checks the ifstream failbit). Also returns false if no file exists at db_data_location.

Implements Dataset.

Definition at line 77 of file dataset.cxx.

References add_image(), filesystem::file_exists(), and num_images().

Referenced by SimpleDataset().

77  {
78  if (!filesystem::file_exists(db_data_location)) return false;
79  std::ifstream ifs(db_data_location, std::ios::binary);
80 
81  uint64_t num_images;
82  ifs.read((char *)&num_images, sizeof(uint64_t));
83 
84  for (uint64_t i = 0; i < num_images; i++) {
85 
86  uint64_t image_id;
87  uint16_t length;
88  ifs.read((char *)&image_id, sizeof(uint64_t));
89  ifs.read((char *)&length, sizeof(uint16_t));
90 
91  std::string image_location;
92  image_location.resize(length);
93 
94  ifs.read((char *)&image_location[0], sizeof(char)* length);
95  std::shared_ptr<const SimpleImage> simage = std::make_shared<const SimpleImage>(image_location, image_id);
96  this->add_image(simage);
97 
98  }
99  return (ifs.rdstate() & std::ifstream::failbit) == 0;
100 }
bool SimpleDataset::write ( const std::string &  db_data_location)
virtual

Writes the SimpleDataset out to the specified file.

If the containing directory does not exist, it is created automatically. The dataset is stored in a binary format: a uint64_t holding num_images(), followed by num_images() entries of the form uint64_t, uint16_t, char[length], corresponding to the image ID, the string length, and the image location string (not null-terminated) respectively. Returns true on success, false otherwise (checks the ofstream failbit).

Implements Dataset.

Definition at line 102 of file dataset.cxx.

References filesystem::create_file_directory(), image(), SimpleDataset::SimpleImage::location(), and num_images().

Referenced by SimpleDataset().

102  {
103  filesystem::create_file_directory(db_data_location);
104 
105  std::ofstream ofs(db_data_location, std::ios::binary | std::ios::trunc);
106  uint64_t num_images = this->num_images();
107  ofs.write((const char *)&num_images, sizeof(uint64_t));
108 
109  for (uint64_t i = 0; i < this->num_images(); i++) {
110  std::shared_ptr<SimpleDataset::SimpleImage> image = std::static_pointer_cast<SimpleDataset::SimpleImage>(this->image(i));
111  const std::string &image_location = image->location();
112  uint64_t image_id = image->id;
113  uint16_t length = image_location.size();
114  ofs.write((const char *)&image_id, sizeof(uint64_t));
115  ofs.write((const char *)&length, sizeof(uint16_t));
116  ofs.write((const char *)&image_location[0], sizeof(char)* length);
117  }
118 
119  return (ofs.rdstate() & std::ofstream::failbit) == 0;
120 }
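
The binary format used by write() and read() can be exercised in isolation. The sketch below is not the library's code: Record, write_records, and read_records are hypothetical names, and it works on any std::ostream or std::istream so that an in-memory std::stringstream can stand in for the file. It differs from the listings in two ways that are assumptions of this sketch: it checks the stream after every read instead of once at the end, and it refuses locations longer than a uint16_t can describe. Like the original, it writes integers in host byte order, so files are not portable across machines with different endianness.

```cpp
#include <cstdint>
#include <limits>
#include <sstream>  // also provides std::istream and std::ostream
#include <string>
#include <utility>
#include <vector>

using Record = std::pair<uint64_t, std::string>;  // image id, image location

// Writes a count followed by (id, length, bytes) records.
bool write_records(std::ostream &os, const std::vector<Record> &records) {
    const uint64_t count = records.size();
    os.write(reinterpret_cast<const char *>(&count), sizeof(count));
    for (const Record &r : records) {
        // The length field is 16 bits, so longer locations cannot be stored.
        if (r.second.size() > std::numeric_limits<uint16_t>::max()) return false;
        const uint16_t length = static_cast<uint16_t>(r.second.size());
        os.write(reinterpret_cast<const char *>(&r.first), sizeof(r.first));
        os.write(reinterpret_cast<const char *>(&length), sizeof(length));
        os.write(r.second.data(), length);
    }
    return !os.fail();
}

// Reads the records back; returns false on the first short or failed read.
bool read_records(std::istream &is, std::vector<Record> &records) {
    uint64_t count = 0;
    if (!is.read(reinterpret_cast<char *>(&count), sizeof(count))) return false;
    for (uint64_t i = 0; i < count; i++) {
        uint64_t id = 0;
        uint16_t length = 0;
        if (!is.read(reinterpret_cast<char *>(&id), sizeof(id))) return false;
        if (!is.read(reinterpret_cast<char *>(&length), sizeof(length))) return false;
        std::string location(length, '\0');
        if (length > 0 && !is.read(&location[0], length)) return false;
        records.emplace_back(id, std::move(location));
    }
    return true;
}
```

Writing a few records to a std::stringstream opened in binary mode and reading them back should yield the same records, while a truncated buffer should make read_records return false.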

Field Documentation

boost::bimap<std::string, uint64_t> SimpleDataset::id_image_map
private

Definition at line 135 of file dataset.hpp.

Referenced by add_image(), construct_dataset(), image(), and num_images().


The documentation for this class was generated from the following files: