<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.12"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Doxygen: External Indexing and Searching</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
</script>
<link href="doxygen_manual.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectalign" style="padding-left: 0.5em;">
<div id="projectname">Doxygen
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.12 -->
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('extsearch.html','');});
</script>
<div id="doc-content">
<div class="header">
<div class="headertitle">
<div class="title">External Indexing and Searching </div> </div>
</div><!--header-->
<div class="contents">
<div class="toc"><h3>Table of Contents</h3>
<ul><li class="level1"><a href="#extsearch_intro">Introduction</a></li>
<li class="level1"><a href="#extsearch_config">Configuring</a><ul><li class="level2"><a href="#extsearch_single">Single project index</a></li>
<li class="level2"><a href="#extsearch_multi">Multi project index</a></li>
</ul>
</li>
<li class="level1"><a href="#extsearch_update">Updating the index</a></li>
<li class="level1"><a href="#extsearch_api">Programming interface</a><ul><li class="level2"><a href="#extsearch_api_index">Indexer input format</a></li>
<li class="level2"><a href="#extsearch_api_search_in">Search URL format</a></li>
<li class="level2"><a href="#extsearch_api_search_out">Search results format</a></li>
</ul>
</li>
</ul>
</div>
<div class="textblock"><h1><a class="anchor" id="extsearch_intro"></a>
Introduction</h1>
<p>Since release 1.8.3, doxygen can search its HTML output using an external indexing tool and search engine. This has several advantages:</p><ul>
<li>For large projects it can have significant performance advantages over doxygen's built-in search engine, as doxygen uses a rather simple indexing algorithm.</li>
<li>It allows combining the search data of multiple projects into one index, allowing a global search across multiple doxygen projects.</li>
<li>It allows adding additional data to the search index, e.g. other web pages not produced by doxygen.</li>
<li>The search engine needs to run on a web server, but clients can still browse the web pages locally.</li>
</ul>
<p>So that not everyone has to write their own indexer and search engine, doxygen provides an example tool for each task: <code>doxyindexer</code> for indexing the data and <code>doxysearch.cgi</code> for searching through the index.</p>
<p>The data flow is shown in the following diagram:</p>
<div class="image">
<img src="extsearch_flow.png" alt="extsearch_flow.png"/>
<div class="caption">
External Search Data Flow</div></div>
<ul>
<li><code>doxygen</code> produces the raw search data.</li>
<li><code>doxyindexer</code> indexes the data into a search database, <code>doxysearch.db</code>.</li>
<li>When a user performs a search from a doxygen-generated HTML page, the CGI binary <code>doxysearch.cgi</code> is invoked.</li>
<li><code>doxysearch.cgi</code> queries the database and returns the results.</li>
<li>The browser shows the search results.</li>
</ul>
<h1><a class="anchor" id="extsearch_config"></a>
Configuring</h1>
<p>The first step is to make the search engine available via a web server. If you use <code>doxysearch.cgi</code>, this means making the <a href="http://en.wikipedia.org/wiki/Common_Gateway_Interface">CGI</a> binary available from the web server (i.e. being able to run it from a browser via a URL starting with http:).</p>
<p>How to set up a web server is outside the scope of this document, but if you have Apache installed, for instance, you can simply copy the <code>doxysearch.cgi</code> file from doxygen's <code>bin</code> directory to the <code>cgi-bin</code> directory of the Apache web server. Read the <a href="http://httpd.apache.org/docs/2.2/howto/cgi.html">Apache documentation</a> for details.</p>
<p>To test whether <code>doxysearch.cgi</code> is accessible, start your web browser, point it to the URL of the binary, and add <code>?test</code> at the end: </p><pre class="fragment">http://yoursite.com/path/to/cgi/doxysearch.cgi?test
</pre><p>You should get the following message: </p><pre class="fragment">Test failed: cannot find search index doxysearch.db
</pre><p>If you use Internet Explorer you may be prompted to download a file, which will then contain this message.</p>
<p>Since we have not yet created or installed <code>doxysearch.db</code>, it is OK for the test to fail for this reason. How to correct this is discussed in the next section.</p>
<p>Before continuing with the next section add the above URL (without the <code>?test</code> part) to the <code>SEARCHENGINE_URL</code> tag in doxygen's configuration file: </p><pre class="fragment">SEARCHENGINE_URL = http://yoursite.com/path/to/cgi/doxysearch.cgi
</pre><h2><a class="anchor" id="extsearch_single"></a>
Single project index</h2>
<p>To use the external search option, make sure the following options are enabled in doxygen's configuration file: </p><pre class="fragment">SEARCHENGINE = YES
SERVER_BASED_SEARCH = YES
EXTERNAL_SEARCH = YES
</pre><p>This will make doxygen generate a file called <code>searchdata.xml</code> in the output directory (configured with <a class="el" href="config.html#cfg_output_directory">OUTPUT_DIRECTORY</a>). You can change the file name (and location) with the <a class="el" href="config.html#cfg_searchdata_file">SEARCHDATA_FILE</a> option.</p>
<p>The next step is to put the raw search data into an index for efficient searching. You can use <code>doxyindexer</code> for this. Simply run it from the command line: </p><pre class="fragment">doxyindexer searchdata.xml
</pre><p>This will create a directory called <code>doxysearch.db</code> with some files in it. By default the directory will be created at the location from which doxyindexer was started, but you can change the directory using the <code>-o</code> option.</p>
<p>Copy the <code>doxysearch.db</code> directory to the directory where <code>doxysearch.cgi</code> is located and rerun the browser test by pointing the browser to </p><pre class="fragment">http://yoursite.com/path/to/cgi/doxysearch.cgi?test
</pre><p>You should now get the following message: </p><pre class="fragment">Test successful.
</pre><p>Now you should be able to search for words and symbols from the HTML output.</p>
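<p>The browser check can also be scripted. The following Python sketch is only an illustration: the helper names <code>self_test_url</code>, <code>index_found</code> and <code>check_search_engine</code> are hypothetical, and it assumes that the reply text starts with one of the two messages quoted above.</p>

```python
import urllib.request


def self_test_url(engine_url):
    """Return the URL that triggers the search engine's self-test."""
    return engine_url + "?test"


def index_found(message):
    """True if the self-test reply reports success."""
    return message.strip().startswith("Test successful")


def check_search_engine(engine_url, timeout=10):
    """Fetch the self-test reply and report whether the index was found."""
    with urllib.request.urlopen(self_test_url(engine_url), timeout=timeout) as reply:
        return index_found(reply.read().decode("utf-8", errors="replace"))
```

<p>Calling <code>check_search_engine</code> with the value of <code>SEARCHENGINE_URL</code> should return <code>True</code> once <code>doxysearch.db</code> is in place.</p>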
<h2><a class="anchor" id="extsearch_multi"></a>
Multi project index</h2>
<p>In case you have more than one doxygen project and these projects are related, it may be desirable to allow searching for words in all projects from within the documentation of any of the projects.</p>
<p>To make this possible all that is needed is to combine the search data for all projects into a single index, e.g. for two projects A and B for which the searchdata.xml is generated in directories project_A and project_B run: </p><pre class="fragment">doxyindexer project_A/searchdata.xml project_B/searchdata.xml
</pre><p>and then copy the resulting <code>doxysearch.db</code> to the directory where <code>doxysearch.cgi</code> is located.</p>
<p>The <code>searchdata.xml</code> file doesn't contain any absolute paths or links, so how can the search results from multiple projects be linked back to the right documentation set? This is where the <code>EXTERNAL_SEARCH_ID</code> and <code>EXTRA_SEARCH_MAPPINGS</code> options come into play.</p>
<p>To be able to identify the different projects, one needs to set a unique ID using <a class="el" href="config.html#cfg_external_search_id">EXTERNAL_SEARCH_ID</a> for each project.</p>
<p>To link the search results to the right project, you need to define a mapping per project using the <a class="el" href="config.html#cfg_extra_search_mappings">EXTRA_SEARCH_MAPPINGS</a> tag. With this option you can define the mapping from the IDs of other projects to the (relative) location of the documentation of those projects.</p>
<p>So for projects A and B the relevant part of the configuration file could look as follows: </p><pre class="fragment">project_A/Doxyfile
------------------
EXTERNAL_SEARCH_ID = A
EXTRA_SEARCH_MAPPINGS = B=../../project_B/html
</pre><p>for project A and for project B </p><pre class="fragment">project_B/Doxyfile
------------------
EXTERNAL_SEARCH_ID = B
EXTRA_SEARCH_MAPPINGS = A=../../project_A/html
</pre><p>With these settings, projects A and B can share the same search database, and the search results will link to the right documentation set.</p>
<h1><a class="anchor" id="extsearch_update"></a>
Updating the index</h1>
<p>When you modify the source code, you should re-run doxygen to bring the documentation up to date. When using external searching, you also need to update the search index by re-running <code>doxyindexer</code>. You could wrap the calls to doxygen and doxyindexer in a script to make this process easier.</p>
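<p>Such a wrapper script could look roughly like the following Python sketch. The function names <code>build_commands</code> and <code>rebuild</code> are hypothetical, and the sketch assumes that <code>doxygen</code> and <code>doxyindexer</code> are on the search path and that the file names passed in match your own setup.</p>

```python
import subprocess


def build_commands(doxyfile, searchdata, index_dir=None):
    """Return the doxygen and doxyindexer command lines, in run order."""
    indexer = ["doxyindexer"]
    if index_dir is not None:
        # -o selects the directory in which doxysearch.db is created
        indexer += ["-o", index_dir]
    indexer.append(searchdata)
    return [["doxygen", doxyfile], indexer]


def rebuild(doxyfile, searchdata, index_dir=None):
    """Regenerate the documentation first, then the search index."""
    for command in build_commands(doxyfile, searchdata, index_dir):
        # check=True raises an error and stops at the first failing step
        subprocess.run(command, check=True)
```

<p>For example, <code>rebuild("Doxyfile", "searchdata.xml")</code> would run both steps from the current directory.</p>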
<h1><a class="anchor" id="extsearch_api"></a>
Programming interface</h1>
<p>Previous sections have assumed you use the tools <code>doxyindexer</code> and <code>doxysearch.cgi</code> to do the indexing and searching, but you could also write your own index and search tools if you like.</p>
<p>For this, three interfaces are important:</p><ul>
<li>The format of the input for the index tool.</li>
<li>The format of the input for the search engine.</li>
<li>The format of the output of the search engine.</li>
</ul>
<p>The next subsections describe these interfaces in more detail.</p>
<h2><a class="anchor" id="extsearch_api_index"></a>
Indexer input format</h2>
<p>The search data produced by doxygen follows the <a href="http://wiki.apache.org/solr/UpdateXmlMessages">Solr XML index message</a> format.</p>
<p>The input for the indexer is an XML file, which consists of one <code><add></code> tag containing multiple <code><doc></code> tags, which in turn contain multiple <code><field></code> tags.</p>
<p>Here is an example of one doc node, which contains the search data and meta data for one method: </p><pre class="fragment"><add>
...
<doc>
<field name="type">function</field>
<field name="name">QXmlReader::setDTDHandler</field>
<field name="args">(QXmlDTDHandler *handler)=0</field>
<field name="tag">qtools.tag</field>
<field name="url">de/df6/class_q_xml_reader.html#a0b24b1fe26a4c32a8032d68ee14d5dba</field>
<field name="keywords">setDTDHandler QXmlReader::setDTDHandler QXmlReader</field>
<field name="text">Sets the DTD handler to handler DTDHandler()</field>
</doc>
...
</add>
</pre><p>Each field has a name. The following field names are supported:</p><ul>
<li><em>type</em>: the type of the search entry; can be one of: source, function, slot, signal, variable, typedef, enum, enumvalue, property, event, related, friend, define, file, namespace, group, package, page, dir</li>
<li><em>name</em>: the name of the search entry; for a method this is the qualified name of the method, for a class it is the name of the class, etc.</li>
<li><em>args</em>: the parameter list (in case of functions or methods)</li>
<li><em>tag</em>: the name of the tag file used for this project.</li>
<li><em>url</em>: the (relative) URL to the HTML documentation for this entry.</li>
<li><em>keywords</em>: important words that are representative for the entry. When searching for such a keyword, this entry should get a higher rank in the search results.</li>
<li><em>text</em>: the documentation associated with the item. Note that only words are present, no markup.</li>
</ul>
<dl class="section note"><dt>Note</dt><dd>Due to the potentially large size of the XML file, it is recommended to use a <a href="http://en.wikipedia.org/wiki/Simple_API_for_XML">SAX based parser</a> to process it.</dd></dl>
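<p>As an illustration of the SAX approach, here is a minimal Python sketch that uses the standard <code>xml.sax</code> module. The class name <code>SearchDataHandler</code> and the function <code>read_search_data</code> are hypothetical, and the sketch assumes that each field name occurs at most once per doc node.</p>

```python
import xml.sax


class SearchDataHandler(xml.sax.ContentHandler):
    """Collects each doc element as a dict mapping field name to text."""

    def __init__(self):
        super().__init__()
        self.docs = []
        self._doc = None     # fields of the doc currently being read
        self._field = None   # name of the field currently being read
        self._text = []      # text chunks of the current field

    def startElement(self, name, attrs):
        if name == "doc":
            self._doc = {}
        elif name == "field" and self._doc is not None:
            self._field = attrs.get("name")
            self._text = []

    def characters(self, content):
        # The parser may deliver the text of one field in several chunks
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "field" and self._field is not None:
            self._doc[self._field] = "".join(self._text)
            self._field = None
        elif name == "doc" and self._doc is not None:
            self.docs.append(self._doc)
            self._doc = None


def read_search_data(path):
    """Parse a search data file and return the list of doc dicts."""
    handler = SearchDataHandler()
    xml.sax.parse(path, handler)
    return handler.docs
```

<p>Collecting all entries in a list keeps the sketch short. For very large files, an indexer would more likely hand each entry to the index as soon as its doc node ends, so that only one entry is held in memory at a time.</p>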
<h2><a class="anchor" id="extsearch_api_search_in"></a>
Search URL format</h2>
<p>When the search engine is invoked from a doxygen-generated HTML page, a number of parameters are passed to it via the <a href="http://en.wikipedia.org/wiki/Query_string">query string</a>.</p>
<p>The following fields are passed:</p><ul>
<li><em>q</em>: the query text as entered by the user</li>
<li><em>n</em>: the number of search results requested.</li>
<li><em>p</em>: the number of the search page for which to return the results, counting from 0. Each page has <em>n</em> results.</li>
<li><em>cb</em>: the name of the callback function, used for JSON with padding, see the next section.</li>
</ul>
<p>From the complete list of search results, the results with indices <code>n*p</code> through <code>n*(p+1)-1</code> should be returned.</p>
<p>Here is an example of what a query looks like: </p><pre class="fragment">http://yoursite.com/path/to/cgi/doxysearch.cgi?q=list&n=20&p=1&cb=dummy
</pre><p>It represents a query for the word 'list' (<code>q=list</code>) requesting 20 search results (<code>n=20</code>), starting with result number 20 (<code>p=1</code>) and using callback 'dummy' (<code>cb=dummy</code>).</p>
<dl class="section note"><dt>Note</dt><dd>The values are <a href="http://en.wikipedia.org/wiki/Percent-encoding">URL encoded</a> so they have to be decoded before they can be used.</dd></dl>
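<p>A custom search engine could decode these parameters along the lines of the following Python sketch. The function names <code>parse_query</code> and <code>result_range</code> are hypothetical, and the default values used for missing fields are assumptions, not something doxygen prescribes.</p>

```python
from urllib.parse import parse_qs


def parse_query(query_string):
    """Decode q, n, p and cb from a query string (without the leading '?')."""
    fields = parse_qs(query_string)  # parse_qs also undoes the URL encoding
    return {
        "q": fields.get("q", [""])[0],
        "n": int(fields.get("n", ["20"])[0]),  # assumed default page size
        "p": int(fields.get("p", ["0"])[0]),   # assumed default page number
        "cb": fields.get("cb", [""])[0],
    }


def result_range(n, p):
    """Return the first and last result index (inclusive) for page p."""
    return n * p, n * (p + 1) - 1
```

<p>For the example query above, <code>result_range(20, 1)</code> gives the indices 20 through 39.</p>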
<h2><a class="anchor" id="extsearch_api_search_out"></a>
Search results format</h2>
<p>When invoking the search engine as shown in the previous subsection, it should reply with the results. The format of the reply is <a href="http://en.wikipedia.org/wiki/JSONP">JSON with padding</a>, which is basically a JavaScript object wrapped in a function call. The name of the function should be the name of the callback (as passed with the <em>cb</em> field in the query).</p>
<p>With the example query shown in the previous subsection, the main structure of the reply should look as follows: </p><pre class="fragment">dummy({
"hits":179,
"first":20,
"count":20,
"page":1,
"pages":9,
"query": "list",
"items":[
...
]})
</pre><p>The fields have the following meaning:</p><ul>
<li><em>hits</em>: the total number of search results (could be more than was requested).</li>
<li><em>first</em>: the index of the first result returned: <img class="formulaInl" alt="$\min(n*p,\mbox{\em hits})$" src="form_6.png"/>.</li>
<li><em>count</em>: the actual number of results returned: <img class="formulaInl" alt="$\min(n,\mbox{\em hits}-\mbox{\em first})$" src="form_7.png"/>.</li>
<li><em>page</em>: the page number of the result: <img class="formulaInl" alt="$p$" src="form_8.png"/>.</li>
<li><em>pages</em>: the total number of pages: <img class="formulaInl" alt="$\lceil\frac{\mbox{\em hits}}{n}\rceil$" src="form_9.png"/>.</li>
<li><em>items</em>: an array containing the search data per result.</li>
</ul>
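<p>The formulas above can be turned into a reply as in the following Python sketch. The function name <code>build_reply</code> and its parameters are hypothetical. The sketch assumes that <code>n</code> is positive and that <code>page_items</code> already holds only the items for the requested page.</p>

```python
import json


def build_reply(callback, query, n, p, hits, page_items):
    """Wrap the result summary and the page's items in a JSONP call."""
    first = min(n * p, hits)
    count = min(n, hits - first)
    reply = {
        "hits": hits,
        "first": first,
        "count": count,
        "page": p,
        "pages": (hits + n - 1) // n,  # integer form of ceil(hits / n)
        "query": query,
        "items": page_items,
    }
    # A real service should also validate the callback name before echoing it
    return callback + "(" + json.dumps(reply) + ")"
```

<p>With 179 hits, <code>n=20</code> and <code>p=1</code>, this yields the values shown in the example reply: <em>first</em> 20, <em>count</em> 20 and <em>pages</em> 9.</p>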
<p>Here is an example of what an element of the <em>items</em> array should look like: </p><pre class="fragment">{"type": "function",
"name": "QDir::entryInfoList(const QString &nameFilter, int filterSpec=DefaultFilter, int sortSpec=DefaultSort) const",
"tag": "qtools.tag",
"url": "d5/d8d/class_q_dir.html#a9439ea6b331957f38dbad981c4d050ef",
"fragments":[
"Returns a <span class=\"hl\">list</span> of QFileInfo objects for all files and directories...",
"... pointer to a QFileInfoList The <span class=\"hl\">list</span> is owned by the QDir object...",
"... to keep the entries of the <span class=\"hl\">list</span> after a subsequent call to this..."
]
},
</pre><p>The fields for such an item have the following meaning:</p><ul>
<li><em>type</em>: the type of the item, as found in the field with name "type" in the raw search data.</li>
<li><em>name</em>: the name of the item, including the parameter list, as found in the fields with name "name" and "args" in the raw search data.</li>
<li><em>tag</em>: the name of the tag file, as found in the field with name "tag" in the raw search data.</li>
<li><em>url</em>: the name of the (relative) URL to the documentation, as found in the field with name "url" in the raw search data.</li>
<li><em>fragments</em>: an array with 0 or more fragments of text containing words that have been searched for. These words should be wrapped in <code><span class="hl"></code> and <code></span></code> tags to highlight them in the output. </li>
</ul>
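<p>One possible way to produce such a fragment is sketched below in Python. The function name <code>highlight</code> is hypothetical. Matching whole words regardless of case and escaping the remaining text are assumptions of this sketch, not requirements stated by doxygen.</p>

```python
import html
import re


def highlight(fragment, words):
    """Wrap each searched word in the fragment in a highlight span."""
    if not words:
        return html.escape(fragment, quote=False)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in words) + r")\b",
        re.IGNORECASE,
    )
    pieces = pattern.split(fragment)
    out = []
    for index, piece in enumerate(pieces):
        text = html.escape(piece, quote=False)
        # With one capture group, re.split puts the matches at odd positions
        if index % 2 == 1:
            text = '<span class="hl">' + text + "</span>"
        out.append(text)
    return "".join(out)
```

<p>Splitting the raw text before escaping it avoids accidentally highlighting letters inside an escaped character.</p>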
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="footer">Generated by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.12 </li>
</ul>
</div>
</body>
</html>