Solr has a lot of extensive features - PowerPoint PPT Presentation

About This Presentation
Title:

Solr has a lot of extensive features

Description:

Solr Integration and Enhancements Solr has a lot of extensive features Todd Hatcher What is Solr? Solr offers advanced, optimized, scalable searching capabilities ... – PowerPoint PPT presentation

Number of Views:70
Avg rating:3.0/5.0
Slides: 13
Provided by: ToddHa8
Category:

less

Transcript and Presenter's Notes

Title: Solr has a lot of extensive features


1
Solr Integration and Enhancements
  • Solr has a lot of extensive features

Todd Hatcher
2
What is Solr?
  • Solr offers advanced, optimized, scalable
    searching capabilities
  • Communicate with Solr using XML, JSON and HTTP
  • Includes a HTML admin interface
  • Solr is built on top of Lucene
  • Rich features of Lucene can be leveraged when
    using Solr
  • Solr is very configurable

3
Integration with ColdFusion
  • Very little direct integration with ColdFusion
  • ColdFusion communicates with Solr using HTTP
  • Solr runs in its own JVM, does not share with
    ColdFusion
  • Using ColdFusion installation, Solr runs in a
    jetty servlet container on port 8983
    (http//localhost8983/solr)
  • Solr is exposed in production by default
  • Important files located C\ColdFusion9\solr\multic
    ore
  • Solr offers a lot more than what is available
    using cfindex cfcollection cfsearch

4
Solr
  • What is a core? its like a verity collection
    (a searchable data group)
  • Single Core (one index) vs Multicore (multiple
    isolated configurations/schemas/indexes using
    same Solr instance)
  • C\ColdFusion9\solr\multicore\solr.xml is the
    central file that points to locations of the Solr
    cores configuration and data (this what CF
    administrator reads/writes to when creating and
    using Solr collections)
  • You can put your Solr cores under you project
    directory and keep them in source control

5
core/conf/solrconfig.xml
  • Main configuration for solr core
  • ltqueryResponseWriter namejson /gt determines
    the format of the results. ColdFusion uses xslt
    by default
  • You can return JSON, XML, python, ruby, php
  • Multiple query response writers can be
    configured, one can be set as default others can
    be specified by passing parameter wtname (eg.
    wtjson)
  • cfsearch type of methods will not work if the
    response writer is not what ColdFusion is
    expecting

6
core/conf/schema.xml
  • Field Types maps custom types to the solr/lucene
    type
  • type solr.TextField allows for analyzers
  • Analyzers can be run at index time or query time
  • They allow for manipulations of the data
    (typically filtering)
  • The order in which filters are declared is the
    order processed
  • StopFilterFactory removes common words that do
    not help the search results
  • WordDelimiterFilterFactory can adds words like
    WiFi, Wi, Fi by splitting the original into
    subwords

7
core/conf/schema.xml cont.
  • EnglishPorterFilterFactory determines root word
    using word variations like -ing determines root
    word and adds to index
  • SynonymFilterFactory treats words as same
  • DoubleMetaphoneFilterFactory for phonetic logic
    (better than Soundex which Verity uses)
  • TextSpell/TextSpellPhrase feedback did you mean
  • ltcopyField sourcefieldName destd/gt dest
    fieldtype can run different analyzers on source
    field and store result
  • wiki.apache.org/solr/AnalyzersTokenizersTokenFilte
    rs
  • Adobe adds quite a bit to the file to create
    fieldtypes to be compatible with what was in
    verity

8
core/conf/schema.xml cont.
  • Similar to creating a database table. Maps field
    names to types using ltfield /gt
  • Gives you the ability to store additional data
  • Field can be indexed (searchable)
  • Field can be stored (referenced and returned with
    results)
  • Field can be required
  • ltuniqueKeygtfield namelt/uniqueKeygt
  • ltsolrQueryParse defaultOperatorOR /gt

9
Indexing
  • Data is sent using api - HTTP POST to Solr as
    XML/JSON/Binary
  • Commit is an intensive task. Do bulk adds first
    then call commit
  • ltcfindex /gt calls commit after each index
    (confirmed?)
  • Commit after each would noticeably increase index
    time
  • Efficient Process add data (queue), commit,
    optimize

10
Search Syntax
  • fieldterm ( returns everything)
  • A score is generated at query time, the value
    itself doesnt have any meaning, the scores are
    relevant only when relative to each other (a
    scale)
  • fq can filter query based on some supplied
    condition
  • wt is the return type of the results (xml,json,
    etc.)
  • qt is the request handler used to process the
    request (default is standard)
  • fl is the list of fields to return (field must be
    stored)
  • q is the query string
  • You can specify the start value and maxrows

11
DisMaxRequestHandler
  • Declared in solrconfig.xml
  • Allows simplified searching without strict syntax
  • Can be configured with default weighted
    parameters (which can be overriden)
  • Causes the q parameters to be parsed differently

12
Resources
  • Lucene In Action
  • http//wiki.apache.org/solr/
  • http//cfadminsearcher.riaforge.org/
  • http//cfsolrlib.riaforge.org/
  • CF Solr Lib written by Shannon Hicks Wrapper
    for Solr functionality
Write a Comment
User Comments (0)
About PowerShow.com