~hww3/caudium

ref: 93d5a75fba08db71b832716377276978cdd7888d caudium/README.Lucene -rw-r--r-- 1.9 KiB
93d5a75fWilliam Welliver storage: method "None" was missing stop function. should fix error on shutdown 1 year, 9 days ago
                                                                                
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
Search Engine powered by Jakarta Lucene
---------------------------------------
akarta Lucene is a high-performance, full-featured text search engine 
written entirely in Java.  It is a technology suitable for nearly any 
application that requires full-text search, especially cross-platform.  

To use the Lucene search tools, you will need a pike that supports Java (see 
pike --features for that). 

How to install and use:

1. Have a pike with java support running Caudium.
2. Create a chase data directory, such as /path/to/caudium/search_data
3. Crate a chase profiles directory, called "profiles" withint the data 
   directory, example: /path/to/caudium/search_data/profiles
4. Create a profile, as demonstrated below, and save it to the profiles 
    directory.
5. Generate an index using the following command:
   ./start-caudium --quiet --program ./bin/indexer.pike -v --profile=/path/to/profile
   (currently requires a version of pike > 7.4.25, or a copy of 
     Web.Crawler from pike CVS) 
6. Add Chase module to a virtual server 
7. Point that module at the chase data directory
8. Use these tags to get a form and search results:

  <chase_form>
  <chase_results>
  <chase_powered>


TODO

Indexer needs web interface
Search module needs more flexible output

EXAMPLE PROFILE

<profile name="blah">
<crawler>
  <startingpoint>http://caudium.net/</startingpoint>
  <allow type="glob">http://caudium.net/*</allow>
  <http_user>user</http_user>
  <http_password>mypass</http_password>
</crawler>
<converters>
  <converter type="converter" 
    mimetype="application/pdf">pdftotext -htmlmeta %f -</converter>
</converters>
<indexer>
  <allowtype>text/*</allowtype>
  <allowtype>application/pdf</allowtype>
  <denytype>image/gif</denytype>
  <denytype>image/jpeg</denytype>
  <denytype>image/png</denytype>
  <temp>/tmp</temp>
</indexer>
<index>
  <location>/tmp/db</location>
  <stopwordsfile>/tmp/stopwords</stopwordsfile>
</index>
</profile>