updated readme
[RBC.git] / readme.txt
1 ***Random Ball Cover (RBC) v0.2***
2 Lawrence Cayton
3 lcayton@tuebingen.mpg.de
4
5 (C) Copyright 2010, Lawrence Cayton [lcayton@tuebingen.mpg.de]
6
7 This program is free software: you can redistribute it and/or modify
8 it under the terms of the GNU General Public License as published by
9 the Free Software Foundation, either version 3 of the License, or
10 (at your option) any later version.
11
12 This program is distributed in the hope that it will be useful,
13 but WITHOUT ANY WARRANTY; without even the implied warranty of
14 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15 GNU General Public License for more details.
16
17 You should have received a copy of the GNU General Public License
18 along with this program. If not, see <http://www.gnu.org/licenses/>.
19
20 ---------------------------------------------------------------------
21 SUMMARY
22
23 This is a C and CUDA implementation of the Random Ball Cover data
24 structure for fast nearest neighbor search on a GPU. The code
25 implements the one-shot algorithm.
26
27 See the following papers for a detailed description of the search
28 algorithm and the theory behind it.
29
30 * L. Cayton, A nearest neighbor data structure for graphics hardware.
31 ADMS, 2010.
32 * L. Cayton, Accelerating nearest neighbor search on manycore systems.
33 Submitted.
34
35
36 ---------------------------------------------------------------------
37 COMPILATION
38
39 Type make in a shell. Requires GCC and NVCC (CUDA). The code has
40 been developed under GCC 4.4 and CUDA 3.1.
41
42
43 ---------------------------------------------------------------------
44 USE
45
46 A sample driver is provided for the RBC. To try it out, type
47 $ testRBC
48 at the prompt and a list of options will be displayed.
49
50 The output file format is a list of the queries' NNs,
51 followed by a list of the distances to those NNs. Note that by
52 default, all input and output is stored in single-precision (float)
53 format.
54
55 Basic functionality is provided through this driver, but I recommend
56 integrating the RBC code directly into your code for the best
57 results. In particular, the best way to use the current
58 implementation is to build the RBC once, then query it many times.
59
60 The method requires a single parameter, the number of
61 representatives. This parameter allows one to trade-off between
62 search quality and search speed. The best way to set this parameter
63 is to try a few different values out; a good starting point is
64 generally 5*sqrt(n), where n is the number of database points. Use
65 the eval option (-e) to print out the error rate. See the paper for
66 detailed information on this parameter.
67
68
69 ---------------------------------------------------------------------
70 FILES
71
72 * brute.{h,cu} -- implementation of brute force search (CPU and GPU
73 versions)
74 * defs.h -- definitions of constants and macros, including the
75 distance metric.
76 * driver.cu -- example code for using the RBC data structure.
77 * kernels.{h,cu} -- implementation of all the (device) kernel functions,
78 except those related to the scan (see sKernels below)
79 * kernelWrap.{h,cu} -- CPU wrapper code around the kernels.
80 * rbc.{h,cu} -- the core of the RBC data structure. Includes the
81 implementation of build and search algorithms.
82 * sKernel.{h,cu} -- implementation of the kernel functions related to
83 the parallel scan algorithm (used within the build method).
84 * sKernelWrap.{h,cu} -- wrappers for the kernels in sKernel.
85 * utils.{h,cu} -- misc utilities used in the code.
86 * utilsGPU.{h,cu} -- misc utilities related to the GPU.
87
88
89 ---------------------------------------------------------------------
90 MISC NOTES ON THE CODE
91
92 * The code currently computes distance using the L_1 (manhattan)
93 metric. If you wish to use a different notion of distance, you must
94 modify defs.h. It is quite simple to switch to any metric that
95 operates alongs the coordinates independently (eg, any L_p metric),
96 but more complex metrics will require some aditional work. The L_2
97 metric (standard Euclidean distance) is already defined in defs.h.
98
99 * The k-NN code is currently hard-coded for k=32. It is hard-coded
100 because it uses a manually implemented sorting network. This design
101 allows all sorting to take place in on-chip (shared) memory, and is
102 highly efficient. Note that the NNs are returned in sorted order,
103 so that if one wants only, say, 5 NNs, one can simply ignore the
104 last 27 returned indices. For k>32, contact the author.
105
106 * The code requires that the entire DB and query set fit into the
107 device memory.
108
109 * Currently the software works in single precision. If you wish to
110 switch to double precision, you must edit the defs.h file. Simply
111 uncomment the lines
112
113 typedef double real;
114 #define MAX_REAL DBL_MAX
115
116 and comment out the lines
117
118 typedef float real;
119 #define MAX_REAL FLT_MAX
120
121 Then, you must do a
122
123 make clean
124
125 followed by another make.
126
127 * This software has been tested on the following graphics cards:
128 NVIDIA GTX 285, GT 430, GTX 480
129 NVIDIA Tesla c2050.
130
131 * This sotware has been tested under the following software setup:
132 Ubuntu 10.04 (linux)
133 gcc 4.4
134 cuda 3.1
135
136 It has been reported to work under Mac OSX. Please share your
137 experience getting it to work under Windows!
138
139 * For the most part, device variables (ie arrays residing on the GPU)
140 begin with a lowercase d. For example, the device version of the
141 DB variable x is dx.
142
143 * The computePlan code is a bit more complex than is needed for the
144 version of the RBC search algorithm described in the paper. The
145 search algorithm described in the paper has two steps: (1) Find the
146 closest representative to the query. (2) Explore the points owned
147 by that representative (ie the s-closest points to the representative
148 in the DB). The computePlan code is more complex to make it easy
149 to try out other options. For example, one could search the points
150 owned by the *two* closest representatives to the query instead. This
151 would require only minor changes to the code, though is currently
152 untested.
153
154 * If you are running this code on a GPU which is also driving your
155 display: A well-known issue with CUDA code in this situation is that
156 a process within the operating system will automatically kill
157 kernels that have been running for more than 5-10 seconds or so.
158 You can get around this in Linux by switching out of X-Windows (often
159 CTRL-ALT-F1 does the trick) and running the code directly from the
160 terminal.