vocabtree  0.0.1
BagOfWords Class Reference

Implements a Bag of Words (BoW) based image search.

#include <bag_of_words.hpp>


Data Structures

struct  MatchResults
 Subclass of match results base which also returns scores.
 
struct  SearchParams
 Subclass of search params base which specifies search parameters.
 
struct  TrainParams
 Subclass of train params base which specifies inverted index training parameters.
 

Public Member Functions

 BagOfWords ()
 
 BagOfWords (const std::string &file_path)
 
bool train (Dataset &dataset, const std::shared_ptr< const TrainParamsBase > &params, const std::vector< std::shared_ptr< const Image > > &examples)
 Given a set of training parameters and a list of images, trains the model.
 
bool load (const std::string &file_path)
 Loads a trained search structure from the input filepath.
 
bool save (const std::string &file_path) const
 Saves a trained search structure to the input filepath.
 
std::shared_ptr< MatchResultsBase > search (Dataset &dataset, const std::shared_ptr< const SearchParamsBase > &params, const std::shared_ptr< const Image > &example)
 Given a set of search parameters and a query image, searches for matching images and returns the match.
 
cv::Mat vocabulary () const
 Returns the vocabulary matrix.
 
uint32_t num_clusters () const
 Returns the number of clusters in the vocabulary.
 
Public Member Functions inherited from SearchBase
 SearchBase ()
 
 SearchBase (const std::string &file_path)
 
virtual ~SearchBase ()
 
std::vector< std::shared_ptr< MatchResultsBase > > search (Dataset &dataset, const std::shared_ptr< SearchParamsBase > &params, const std::vector< std::shared_ptr< const Image > > &examples)
 Given a set of search parameters and a list of query images, searches for matching images and returns the result matches.
 

Protected Attributes

cv::Mat vocabulary_matrix
 

Detailed Description

Implements a Bag of Words (BoW) based image search.

Note that search is not implemented here and fails an assertion if called. A naive implementation would have to compute tf-idf against every image in the dataset. Instead, train a BoW model and use it in conjunction with an InvertedIndex search model to perform a query.
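
To illustrate why a direct search is costly, here is a minimal sketch of tf-idf weighting over BoW histograms. It does not use the vocabtree API: `Histogram`, `inverse_document_frequency`, and `tfidf` are hypothetical names, and the weighting shown (tf = count / total, idf = ln(N / df)) is one common variant, not necessarily the one InvertedIndex uses. The idf term needs a document-frequency count over every image, which is the work a precomputed index avoids repeating per query.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type: one visual-word count vector per image
// (length = vocabulary size).
using Histogram = std::vector<uint32_t>;

// idf[w] = ln(N / df[w]), where df[w] is the number of images that
// contain word w. Words that appear in no image get idf 0.
std::vector<double> inverse_document_frequency(const std::vector<Histogram> &images,
                                               std::size_t vocabulary_size) {
    std::vector<uint32_t> df(vocabulary_size, 0);
    for (const Histogram &h : images) {
        assert(h.size() == vocabulary_size);
        for (std::size_t w = 0; w < vocabulary_size; ++w)
            if (h[w] > 0) ++df[w];
    }

    std::vector<double> idf(vocabulary_size, 0.0);
    const double n = static_cast<double>(images.size());
    for (std::size_t w = 0; w < vocabulary_size; ++w)
        if (df[w] > 0) idf[w] = std::log(n / df[w]);
    return idf;
}

// tf-idf vector of a single image: (count / total count) * idf.
std::vector<double> tfidf(const Histogram &h, const std::vector<double> &idf) {
    assert(h.size() == idf.size());
    double total = 0.0;
    for (uint32_t c : h) total += c;

    std::vector<double> weights(h.size(), 0.0);
    if (total == 0.0) return weights;  // image without features
    for (std::size_t w = 0; w < h.size(); ++w)
        weights[w] = (h[w] / total) * idf[w];
    return weights;
}
```

Every call to `inverse_document_frequency` scans all histograms, so running it per query scales with the dataset size.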

Definition at line 9 of file bag_of_words.hpp.

Constructor & Destructor Documentation

BagOfWords::BagOfWords ( )

Definition at line 18 of file bag_of_words.cxx.

18  : SearchBase() {
19 
20 
21 }

BagOfWords::BagOfWords ( const std::string &  file_path)

Definition at line 23 of file bag_of_words.cxx.

References filesystem::file_exists(), and load().

23  : SearchBase(file_path) {
24     if(!filesystem::file_exists(file_path)) {
25         std::cerr << "Error reading bag of words from " << file_path << std::endl;
26         return;
27     }
28     if(!this->load(file_path)) {
29         std::cerr << "Error reading bag of words from " << file_path << std::endl;
30     }
31 }

Member Function Documentation

bool BagOfWords::load ( const std::string &  file_path)
virtual

Loads a trained search structure from the input filepath.

Implements SearchBase.

Definition at line 33 of file bag_of_words.cxx.

References filesystem::load_cvmat(), and vocabulary_matrix.

Referenced by BagOfWords(), and compute_bow().

33  {
34     std::cout << "Reading bag of words from " << file_path << "..." << std::endl;
35 
36     if (!filesystem::load_cvmat(file_path, vocabulary_matrix)) {
37         std::cerr << "Failed to read vocabulary from " << file_path << std::endl;
38         return false;
39     }
40 
41     std::cout << "Done reading bag of words." << std::endl;
42 
43     return true;
44 }

uint32_t BagOfWords::num_clusters ( ) const

Returns the number of clusters in the vocabulary.

Definition at line 156 of file bag_of_words.cxx.

References vocabulary_matrix.

156  {
157     return vocabulary_matrix.rows;
158 }

bool BagOfWords::save ( const std::string &  file_path) const
virtual

Saves a trained search structure to the input filepath.

Implements SearchBase.

Definition at line 47 of file bag_of_words.cxx.

References filesystem::create_file_directory(), vocabulary_matrix, and filesystem::write_cvmat().

Referenced by compute_bow(), main(), and train_bow().

47  {
48     std::cout << "Writing bag of words to " << file_path << "..." << std::endl;
49 
51     if (!filesystem::write_cvmat(file_path, vocabulary_matrix)) {
52         std::cerr << "Failed to write vocabulary to " << file_path << std::endl;
53         return false;
54     }
55 
56     std::cout << "Done writing bag of words." << std::endl;
57     return true;
58 }

std::shared_ptr< MatchResultsBase > BagOfWords::search ( Dataset &  dataset,
const std::shared_ptr< const SearchParamsBase > &  params,
const std::shared_ptr< const Image > &  example 
)
virtual

Given a set of search parameters and a query image, searches for matching images and returns the match.

Search is not valid for bag of words: it would require computing tf-idf over all images in the dataset, and this function will assert(0) should you try to run it. Instead, train a Bag of Words (BoW) model and use it with one of the other search mechanisms, such as the inverted index.
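
Because `assert` is compiled out when `NDEBUG` is defined, a release build would skip the `assert(0)` and fall through to the `return nullptr` in the definition, so callers may want to check the returned pointer as well. A minimal sketch of such a check, using hypothetical stand-ins (`Results`, `unsupported_search`, `has_results`) rather than the real vocabtree classes:

```cpp
#include <memory>

// Hypothetical stand-in for MatchResultsBase.
struct Results {};

// Mimics the release-build behaviour of an unsupported search:
// with assertions disabled, the only signal left is the null pointer.
std::shared_ptr<Results> unsupported_search() {
    return nullptr;
}

// Returns true only when a search produced a usable result object.
bool has_results(const std::shared_ptr<Results> &results) {
    return results != nullptr;
}
```

A caller would test `has_results(...)` before dereferencing whatever the search returned.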

Implements SearchBase.

Definition at line 147 of file bag_of_words.cxx.

147  {
148     assert(0);
149     return nullptr;
150 }

bool BagOfWords::train ( Dataset &  dataset,
const std::shared_ptr< const TrainParamsBase > &  params,
const std::vector< std::shared_ptr< const Image > > &  examples 
)
virtual

Given a set of training parameters and a list of images, trains the model.

Returns true if successful, false if not successful.
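
The definition of this function shuffles the image ids with `std::random_shuffle`, which was deprecated in C++14 and removed in C++17, and then accumulates descriptors until the `numFeatures` budget is exceeded. In the listed loop, `num_features` is incremented before the budget check, so it can end up larger than the number of rows actually merged. The following is a minimal sketch of the sampling step using `std::shuffle`, with hypothetical names (`select_within_budget`, `rows_per_image`) and plain row counts in place of `cv::Mat` descriptors. It counts only rows that are kept, which is an assumption about the intended behaviour.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: visits images in random order and keeps them until
// adding another one would exceed max_features rows (0 means no limit).
// rows_per_image[i] stands in for descriptors.rows of image i.
// Returns the indices of the selected images; total_rows receives the
// number of descriptor rows they contribute.
std::vector<std::size_t> select_within_budget(const std::vector<uint32_t> &rows_per_image,
                                              uint64_t max_features,
                                              std::mt19937 &rng,
                                              uint64_t &total_rows) {
    std::vector<std::size_t> order(rows_per_image.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::shuffle(order.begin(), order.end(), rng);  // instead of std::random_shuffle

    std::vector<std::size_t> selected;
    total_rows = 0;
    for (std::size_t idx : order) {
        const uint64_t rows = rows_per_image[idx];
        // Check the budget before counting, so total_rows matches the selection.
        if (max_features > 0 && total_rows + rows > max_features) break;
        total_rows += rows;
        selected.push_back(idx);
    }
    return selected;
}
```

Passing an explicitly seeded `std::mt19937` also makes the sample reproducible between runs.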

Implements SearchBase.

Definition at line 77 of file bag_of_words.cxx.

References filesystem::file_exists(), Dataset::image(), filesystem::load_cvmat(), Dataset::location(), vision::merge_descriptors(), BagOfWords::TrainParams::numClusters, and vocabulary_matrix.

Referenced by compute_bow(), main(), and train_bow().

77  {
78 
79     const std::shared_ptr<const TrainParams> &ii_params = std::static_pointer_cast<const TrainParams>(params);
80 
81     uint32_t k = ii_params->numClusters;
82     uint32_t n = ii_params->numFeatures;
83 
84     std::vector<uint64_t> all_ids(examples.size());
85     for (uint64_t i = 0; i < examples.size(); i++) {
86         all_ids[i] = examples[i]->id;
87     }
88     std::random_shuffle(all_ids.begin(), all_ids.end());
89 
90     std::vector<cv::Mat> all_descriptors;
91     uint64_t num_features = 0;
92     for (size_t i = 0; i < all_ids.size(); i++) {
93         std::shared_ptr<Image> image = std::static_pointer_cast<Image>(dataset.image(all_ids[i]));
94         if (image == nullptr) continue;
95 
96         const std::string &descriptors_location = dataset.location(image->feature_path("descriptors"));
97         if (!filesystem::file_exists(descriptors_location)) continue;
98 
99         cv::Mat descriptors, descriptorsf;
100         if (filesystem::load_cvmat(descriptors_location, descriptors)) {
101             num_features += descriptors.rows;
102             if (n > 0 && num_features > n) break;
103             descriptors.convertTo(descriptorsf, CV_32FC1);
104             all_descriptors.push_back(descriptorsf);
105         }
106     }
107     const cv::Mat merged_descriptor = vision::merge_descriptors(all_descriptors, true);
108 
109 #if ENABLE_FASTCLUSTER && ENABLE_MPI
110 
111     int rank = MPI::COMM_WORLD.Get_rank();
112 
113     uint32_t D = merged_descriptor.cols;
114     float *dataf = (float *)merged_descriptor.data;
115     vocabulary_matrix = cv::Mat(k, D, CV_32FC1);
116     float *clusters = (float *)vocabulary_matrix.data;
117 
118     // initialize the clusters (random at the moment)
119     if(rank == 0) { // initial clusters get broadcast
120         std::vector<uint64_t> indices(num_features);
121         for(size_t i=0; i<indices.size(); i++) {
122             indices[i] = i;
123         }
124         std::random_shuffle(indices.begin(), indices.end());
125         for(size_t i=0; i<k; i++) {
126             memcpy(&clusters[D * i], &dataf[D * indices[i]], sizeof(float) * D);
127         }
128     }
129 
130     fastcluster::kmeans<float>(load_rows_in_mem, (void *)merged_descriptor.data,
131                                build_nnobj, (void *)merged_descriptor.data,
132                                (float *)clusters, num_features, D, k, 16, 0, (char *)0);
133     return true;
134 #else
135     cv::Mat labels;
136     uint32_t attempts = 1;
137     cv::TermCriteria tc(cv::TermCriteria::COUNT | cv::TermCriteria::EPS, 16, 0.0001);
138     if (k > merged_descriptor.rows) { // k > n
139         std::cerr << "Warning: # clusters > # features, automatically setting #clusters = #features." << std::endl;
140         k = merged_descriptor.rows;
141     }
142     cv::kmeans(merged_descriptor, k, labels, tc, attempts, cv::KMEANS_PP_CENTERS, vocabulary_matrix);
143 #endif
144     return true;
145 }

cv::Mat BagOfWords::vocabulary ( ) const

Returns the vocabulary matrix.
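
Assuming the vocabulary matrix stores one cluster centre per row (consistent with num_clusters() returning `vocabulary_matrix.rows`), a descriptor can be mapped to a visual word by a nearest-centre lookup. The following is a minimal sketch with plain vectors instead of `cv::Mat`. `Row`, `nearest_word`, and `bow_histogram` are hypothetical helpers, not part of vocabtree, and the brute-force squared Euclidean distance is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical type: one descriptor or one cluster centre.
using Row = std::vector<float>;

// Index of the vocabulary row closest to the descriptor
// (squared Euclidean distance, brute force).
std::size_t nearest_word(const std::vector<Row> &vocabulary, const Row &descriptor) {
    assert(!vocabulary.empty());
    std::size_t best = 0;
    float best_dist = std::numeric_limits<float>::max();
    for (std::size_t k = 0; k < vocabulary.size(); ++k) {
        assert(vocabulary[k].size() == descriptor.size());
        float dist = 0.0f;
        for (std::size_t d = 0; d < descriptor.size(); ++d) {
            const float diff = vocabulary[k][d] - descriptor[d];
            dist += diff * diff;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = k;
        }
    }
    return best;
}

// Counts how many descriptors of an image fall on each visual word.
std::vector<uint32_t> bow_histogram(const std::vector<Row> &vocabulary,
                                    const std::vector<Row> &descriptors) {
    std::vector<uint32_t> histogram(vocabulary.size(), 0);
    for (const Row &d : descriptors) ++histogram[nearest_word(vocabulary, d)];
    return histogram;
}
```

The resulting histogram has one bin per cluster, so its length equals the value reported by num_clusters().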

Definition at line 152 of file bag_of_words.cxx.

References vocabulary_matrix.

Referenced by compute_bow(), and main().

152  {
153     return vocabulary_matrix;
154 }

Field Documentation

cv::Mat BagOfWords::vocabulary_matrix
protected

Definition at line 56 of file bag_of_words.hpp.

Referenced by load(), num_clusters(), save(), train(), and vocabulary().


The documentation for this class was generated from the following files: