minor bug fixes, updated readme
[RBC.git] / readme.txt
1 ***Random Ball Cover (RBC) v0.2.5***
2 Lawrence Cayton
3 lcayton@tuebingen.mpg.de
4
5 (C) Copyright 2010, Lawrence Cayton [lcayton@tuebingen.mpg.de]
6
7 This program is free software: you can redistribute it and/or modify
8 it under the terms of the GNU General Public License as published by
9 the Free Software Foundation, either version 3 of the License, or
10 (at your option) any later version.
11
12 This program is distributed in the hope that it will be useful,
13 but WITHOUT ANY WARRANTY; without even the implied warranty of
14 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15 GNU General Public License for more details.
16
17 You should have received a copy of the GNU General Public License
18 along with this program. If not, see <http://www.gnu.org/licenses/>.
19
20
21 ---------------------------------------------------------------------
22 SUMMARY
23
24 This is a C and CUDA implementation of the Random Ball Cover data
25 structure for fast nearest neighbor search on a GPU. The code
26 implements the one-shot algorithm.
27
28 See the following papers for a detailed description of the search
29 algorithm and the theory behind it.
30
31 * L. Cayton, A nearest neighbor data structure for graphics hardware.
32 ADMS, 2010.
33 * L. Cayton, Accelerating nearest neighbor search on manycore systems.
34 Submitted, 2011.
35
36
37 ---------------------------------------------------------------------
38 COMPILATION
39
40 Type make in a shell. Requires GCC and NVCC (CUDA). The code has
41 been developed under GCC 4.4 and CUDA 3.1.
42
43
44 ---------------------------------------------------------------------
45 USE
46
47 A sample driver is provided for the RBC. To try it out, type
48 $ testRBC
49 at the prompt and a list of options will be displayed.
50
51 The sample driver can be used with either text or binary input. The
52 text input format is one database element per line, with features
53 separated by spaces. The binary input format is in floats, but can be
54 changed to doubles.
55
56 The output file format is a list of the queries' NNs,
57 followed by a list of the distances to those NNs. Note that by
58 default, all input and output is stored in single-precision (float)
59 format.
60
61 Basic functionality is provided through this driver, but I recommend
62 integrating the RBC code directly into your code for the best
63 results. For many applications, the RBC needs to be built only once,
64 and then can be queried many times.
65
66 The method requires a single parameter, the number of
67 representatives. This parameter allows you to trade-off between
68 search quality and search speed. The best way to set this parameter
69 is to try a few different values out; a good starting point is
70 generally 5*sqrt(n), where n is the number of database points. Use
71 the eval option (-e) to print out the error rate. See the paper
72 (Cayton, 2011) for detailed information on this parameter.
73
74 The sample_input directory contains examples in both binary and
75 text. The sample_db set contains 1024 points, each of which has 16
76 dimensions. The sample_query set contains 128 sample queries (which
77 of course also have 16 dimensions). To try it out, run
78
79 $ testRBC -X sample_input/sample_db.txt -Q sample_input/sample_queries.txt -n 1024 -m 128 -d 16 -r 128
80
81 or to try it out with the binary files, run
82
83 $ testRBC -x sample_input/sample_db.bin -q sample_input/sample_queries.bin -n 1024 -m 128 -d 16 -r 128
84
85 Note that the -r 128 descibes the number of representatives, which
86 controls the accuracy of the search. You might try varying this
87 parameter to see the effects (there is nothing special about 128).
88 You can print out the accuracy using by adding the -e switch; this
89 will say the average number of the 32 nearest neighbors that were
90 actually returned.
91
92 ---------------------------------------------------------------------
93 FILES
94
95 * brute.{h,cu} -- implementation of brute force search (CPU and GPU
96 versions)
97 * defs.h -- definitions of constants and macros, including the
98 distance metric.
99 * driver.cu -- example code for using the RBC data structure.
100 * kernels.{h,cu} -- implementation of all the (device) kernel functions,
101 except those related to the scan (see sKernels below)
102 * kernelWrap.{h,cu} -- CPU wrapper code around the kernels.
103 * rbc.{h,cu} -- the core of the RBC data structure. Includes the
104 implementation of build and search algorithms.
105 * rbc_include.h -- header file to include in your driver.
106 * sKernel.{h,cu} -- implementation of the kernel functions related to
107 the parallel scan algorithm (used within the build method).
108 * sKernelWrap.{h,cu} -- wrappers for the kernels in sKernel.
109 * utils.{h,cu} -- misc utilities used in the code.
110 * utilsGPU.{h,cu} -- misc utilities related to the GPU.
111
112
113 ---------------------------------------------------------------------
114 MISC NOTES ON THE CODE
115
116 * The code currently computes distance using the L_2 (Euclidean)
117 metric. If you wish to use a different notion of distance, you must
118 modify defs.h. It is quite simple to switch to any metric that
119 operates alongs the coordinates independently (eg, any L_p metric),
120 but more complex metrics will require some aditional work. The L_1
121 metric (manhatten distance) is already defined in defs.h.
122
123 * The k-NN code is currently hard-coded for k=32. It is hard-coded
124 because it uses a manually implemented sorting network. This design
125 allows all sorting to take place in on-chip (shared) memory, and is
126 highly efficient. Note that the NNs are returned in sorted order,
127 so that if one wants only, say, 5 NNs, one can simply ignore the
128 last 27 returned indices. For k>32, contact the author.
129
130 * The code requires that the entire DB and query set fit into the
131 device memory.
132
133 * Currently the software works in single precision. If you wish to
134 switch to double precision, you must edit the defs.h file. Simply
135 uncomment the lines
136
137 typedef double real;
138 #define MAX_REAL DBL_MAX
139
140 and comment out the lines
141
142 typedef float real;
143 #define MAX_REAL FLT_MAX
144
145 Then, you must add the compiler flag
146 -arch=sm_20
147 to the NVCCFLAGS line of the Makefile (or sm_13 for older GPUs).
148
149 Finally, do a
150 $ make clean
151 followed by another make.
152
153 * For the most part, device variables (ie arrays residing on the GPU)
154 begin with a lowercase d. For example, the device version of the
155 DB variable x is dx.
156
157 * The computePlan code is a bit more complex than is needed for the
158 version of the RBC search algorithm described in the paper. The
159 search algorithm described in the paper has two steps: (1) Find the
160 closest representative to the query. (2) Explore the points owned
161 by that representative (ie the s-closest points to the representative
162 in the DB). The computePlan code is more complex to make it easy
163 to try out other options. For example, one could search the points
164 owned by the *two* closest representatives to the query instead. This
165 would require only minor changes to the code, though is currently
166 untested.
167
168 * This software has been tested on the following graphics cards:
169 NVIDIA GTX 285, GT 430, GTX 480, GeForce 320M, Tesla c2050
170
171 * This sotware has been developed under the following software setup:
172 Ubuntu 10.04 (linux)
173 gcc 4.4
174 cuda 3.2
175
176 It has also been tested under Mac OSX. Please share your
177 experience getting it to work under Windows!
178
179 * If you are running this code on a GPU which is also driving your
180 display: A well-known issue with CUDA code in this situation is that
181 a process within the operating system will automatically kill
182 kernels that have been running for more than 5-10 seconds or so.
183 You can get around this in Linux by switching out of X-Windows (often
184 CTRL-ALT-F1 does the trick) and running the code directly from the
185 terminal.