Machine Learning in GATE

Transcript and Presenter's Notes

Title: Machine Learning in GATE


1
Machine Learning in GATE
  • Valentin Tablan

2
Machine Learning in GATE
  • Uses classification.
  • Attr1, Attr2, Attr3, ..., Attrn → Class
  • Classifies annotations.
  • (Documents can be classified as well using a
    simple trick.)
  • Annotations of a particular type are selected as
    instances.
  • Attributes refer to instance annotations.
  • Attributes have a position relative to the
    instance annotation they refer to.
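
The idea of attributes positioned relative to an instance can be sketched as follows. This is only an illustration under simplifying assumptions, not GATE code: each token is modelled as a plain map from feature name to value, and the names `RelativeAttributes`, `featureAt` and `attributeVector` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not GATE code: tokens are maps of feature name -> value.
class RelativeAttributes {

    // Value of `feature` on the token at instanceIndex + position,
    // or null when that position falls outside the token list.
    static String featureAt(List<Map<String, String>> tokens,
                            int instanceIndex, int position, String feature) {
        int target = instanceIndex + position;
        if (target < 0 || target >= tokens.size()) {
            return null;
        }
        return tokens.get(target).get(feature);
    }

    // Attribute vector for one instance: one value per relative position.
    static List<String> attributeVector(List<Map<String, String>> tokens,
                                        int instanceIndex, int[] positions,
                                        String feature) {
        List<String> vector = new ArrayList<>();
        for (int position : positions) {
            vector.add(featureAt(tokens, instanceIndex, position, feature));
        }
        return vector;
    }

    public static void main(String[] args) {
        List<Map<String, String>> tokens = List.of(
            Map.of("string", "The", "category", "DT"),
            Map.of("string", "cat", "category", "NN"),
            Map.of("string", "sleeps", "category", "VBZ"));
        // Positions -1, 0 and +1 around the middle token (index 1).
        System.out.println(
            attributeVector(tokens, 1, new int[] {-1, 0, 1}, "category"));
    }
}
```

Position 0 is the instance itself, negative positions look to the left and positive positions to the right; positions outside the document yield a missing value (null in this sketch).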

3
Attributes
  • Attributes can be
      • Boolean: the presence or absence of an annotation of
        a particular type that overlaps, at least partially,
        the referred instance annotation.
      • Nominal: the value of a particular feature of the
        referred instance annotation. The complete set of
        acceptable values must be specified a priori.
      • Numeric: the numeric value (converted from String) of
        a particular feature of the referred instance
        annotation.
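
The three attribute kinds can be illustrated with a small sketch. It assumes annotations are represented by start and end offsets and that nominal values are encoded by their index in the predeclared value list; the class and method names are hypothetical and do not come from GATE.

```java
import java.util.List;

// Hypothetical sketch of the three attribute kinds; none of these names are GATE API.
class AttributeKinds {

    // Boolean: does the span [annStart, annEnd) overlap the instance span
    // [instStart, instEnd), even partially?
    static boolean overlaps(long instStart, long instEnd,
                            long annStart, long annEnd) {
        return annStart < instEnd && instStart < annEnd;
    }

    // Nominal: index of the value in the predeclared list of acceptable
    // values, or -1 when the value was not declared.
    static int nominalIndex(List<String> allowedValues, String value) {
        return allowedValues.indexOf(value);
    }

    // Numeric: the feature value is held as a String and converted on demand.
    static double numeric(String featureValue) {
        return Double.parseDouble(featureValue.trim());
    }

    public static void main(String[] args) {
        List<String> posValues = List.of("NN", "NNP", "NNPS");
        System.out.println(overlaps(0, 5, 3, 8));            // spans share offsets 3..5
        System.out.println(nominalIndex(posValues, "NNP"));  // declared value
        System.out.println(nominalIndex(posValues, "VB"));   // undeclared value
        System.out.println(numeric("42"));
    }
}
```

The nominal case shows why the value set has to be known in advance: a value outside the declared list has no valid encoding.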

4
Implementation
  • Machine Learning PR in GATE.
  • Has two functioning modes
      • training
      • application
  • Uses an XML file for configuration
    <?xml version="1.0" encoding="windows-1252"?>
    <ML-CONFIG>
      <DATASET> </DATASET>
      <ENGINE></ENGINE>
    </ML-CONFIG>

5
<DATASET>
  <DATASET>
    <INSTANCE-TYPE>Token</INSTANCE-TYPE>
    <ATTRIBUTE>
      <NAME>POS_category(0)</NAME>
      <TYPE>Token</TYPE>
      <FEATURE>category</FEATURE>
      <POSITION>0</POSITION>
      <VALUES>
        <VALUE>NN</VALUE>
        <VALUE>NNP</VALUE>
        <VALUE>NNPS</VALUE>
      </VALUES>
      <CLASS/>
    </ATTRIBUTE>
  </DATASET>

6
<ENGINE>
  <ENGINE>
    <WRAPPER>gate.creole.ml.weka.Wrapper</WRAPPER>
    <OPTIONS>
      <CLASSIFIER>weka.classifiers.j48.J48</CLASSIFIER>
      <CLASSIFIER-OPTIONS>-K 3</CLASSIFIER-OPTIONS>
      <CONFIDENCE-THRESHOLD>0.85</CONFIDENCE-THRESHOLD>
    </OPTIONS>
  </ENGINE>
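
To make the configuration layout concrete, the sketch below reads the engine settings out of such a file. It is an illustration only: it uses the JDK's built-in DOM parser rather than the JDOM elements GATE passes to its engines, and `EngineConfigReader`, `parse` and `text` are hypothetical names. The sample string repeats the element names from the slides above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of GATE: reads values from an ML-CONFIG document.
class EngineConfigReader {

    // Sample configuration using the element names shown on the slides.
    static final String SAMPLE =
        "<ML-CONFIG><DATASET/><ENGINE>"
        + "<WRAPPER>gate.creole.ml.weka.Wrapper</WRAPPER>"
        + "<OPTIONS>"
        + "<CLASSIFIER>weka.classifiers.j48.J48</CLASSIFIER>"
        + "<CLASSIFIER-OPTIONS>-K 3</CLASSIFIER-OPTIONS>"
        + "<CONFIDENCE-THRESHOLD>0.85</CONFIDENCE-THRESHOLD>"
        + "</OPTIONS></ENGINE></ML-CONFIG>";

    // Parses an XML string with the JDK DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    // Text of the first element with the given tag name, or null if absent.
    static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0
            ? null
            : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("wrapper    = " + text(doc, "WRAPPER"));
        System.out.println("classifier = " + text(doc, "CLASSIFIER"));
        System.out.println("options    = " + text(doc, "CLASSIFIER-OPTIONS"));
        System.out.println("threshold  = "
            + Double.parseDouble(text(doc, "CONFIDENCE-THRESHOLD")));
    }
}
```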

7
Attributes Position
  • Instance type: Token

8
Machine Learning PR
  • Can save a learnt model to an external file for
    later use.
  • Saves the actual model and the collected dataset.
  • Can export the collected dataset in .arff format.
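
For readers unfamiliar with the .arff format, the sketch below writes a tiny dataset in that layout: a relation name, one declaration per attribute, then the data rows. It is a hand-rolled illustration, not the exporter used by the ML PR; the class `ArffSketch`, its methods, and the sample relation and attribute names are hypothetical. Attribute names are quoted as a precaution because they may contain characters such as parentheses.

```java
import java.util.List;

// Hypothetical sketch of the ARFF layout; not the exporter used by GATE.
class ArffSketch {

    // Declaration of a nominal attribute with its closed set of values.
    static String nominalAttribute(String name, List<String> values) {
        return "@attribute '" + name + "' {" + String.join(",", values) + "}";
    }

    // Declaration of a numeric attribute.
    static String numericAttribute(String name) {
        return "@attribute '" + name + "' numeric";
    }

    // Assembles header and data section; each row is one instance.
    static String toArff(String relation, List<String> attributeLines,
                         List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String line : attributeLines) {
            sb.append(line).append("\n");
        }
        sb.append("\n@data\n");
        for (List<String> row : rows) {
            sb.append(String.join(",", row)).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tags = List.of("NN", "NNP", "NNPS");
        String arff = toArff(
            "pos_context",
            List.of(nominalAttribute("POS_category(-1)", tags),
                    numericAttribute("length(0)"),
                    nominalAttribute("POS_category(0)", tags)),
            List.of(List.of("NNP", "5", "NN"),
                    List.of("NN", "7", "NNPS")));
        System.out.print(arff);
    }
}
```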

9
Standard Use Scenario
  • Training
      • Prepare training data by enriching the documents
        with annotations for the attributes (e.g. run
        Tokeniser, POS tagger, Gazetteer, etc.).
      • Run the ML PR in training mode.
      • Export the dataset as .arff and perform
        experiments using the WEKA interface in order to
        find the best attribute set / algorithm /
        algorithm options.
      • Update the configuration file accordingly.
      • Run the ML PR again to collect the actual data.
      • Save the learnt model.
  • Application
      • Prepare data by enriching the documents with
        annotations for the attributes (e.g. run Tokeniser,
        POS tagger, Gazetteer, etc.).
      • Load the previously saved model.
      • Run the ML PR in application mode.
      • Save the learnt model.
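
The train, save, load, apply cycle can be sketched with a toy model. Everything here is an assumption made for illustration: the "model" is just a map from one attribute value to its most frequent class, persistence uses standard Java serialization, and none of the names (`ModelPersistenceSketch`, `train`, `save`, `load`, `roundTrip`) come from GATE or WEKA.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy workflow; GATE's ML PR and WEKA are not involved.
class ModelPersistenceSketch {

    // "Training": for each attribute value, remember its most frequent class.
    // Each example is {attributeValue, classValue}.
    static HashMap<String, String> train(String[][] examples) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String[] ex : examples) {
            counts.computeIfAbsent(ex[0], k -> new HashMap<>())
                  .merge(ex[1], 1, Integer::sum);
        }
        HashMap<String, String> model = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            String best = null;
            int bestCount = -1;
            for (Map.Entry<String, Integer> c : e.getValue().entrySet()) {
                if (c.getValue() > bestCount) {
                    best = c.getKey();
                    bestCount = c.getValue();
                }
            }
            model.put(e.getKey(), best);
        }
        return model;
    }

    // Saves the learnt model to an external file.
    static void save(HashMap<String, String> model, Path file) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads a previously saved model.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> load(Path file) {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(file))) {
            return (HashMap<String, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Saves to a temporary file, loads it back, then removes the file.
    static HashMap<String, String> roundTrip(HashMap<String, String> model) {
        try {
            Path file = Files.createTempFile("toy-model", ".ser");
            try {
                save(model, file);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Training phase: previous tag -> tag of the current token.
        HashMap<String, String> model = train(new String[][] {
            {"DT", "NN"}, {"DT", "NN"}, {"DT", "JJ"}, {"NN", "VBZ"}});
        // Application phase: reload the saved model and classify.
        HashMap<String, String> reloaded = roundTrip(model);
        System.out.println(reloaded.get("DT"));
    }
}
```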

10
An Example
  • Learn POS category from POS context.

11
Using Other ML Libraries
  • The MLEngine Interface
  • Method Summary
      • void addTrainingInstance(List attributes)
        Adds a new training instance to the dataset.
      • Object classifyInstance(List attributes)
        Classifies a new instance.
      • void init()
        This method will be called after an engine is
        created and has its dataset and options set.
      • void setDatasetDefinition(DatasetDefintion definition)
        Sets the definition for the dataset used.
      • void setOptions(org.jdom.Element options)
        Sets the options from an XML JDom element.
      • void setOwnerPR(ProcessingResource pr)
        Registers the PR using the engine with the engine.
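
What plugging in another learner might look like can be sketched with a stand-in interface. The `SimpleEngine` interface below is hypothetical: it mirrors only `init`, `addTrainingInstance` and `classifyInstance` from the method summary and leaves out `setDatasetDefinition`, `setOptions` and `setOwnerPR`, which need GATE and JDOM classes. The sketch also assumes that the class value is the last element of each training list and that lists passed to `classifyInstance` hold only the non-class attributes; the real position of the class attribute comes from the dataset definition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the engine contract; this is NOT gate's MLEngine.
interface SimpleEngine {
    void init();
    void addTrainingInstance(List<Object> attributes);
    Object classifyInstance(List<Object> attributes);
}

// Toy 1-nearest-neighbour learner: picks the class of the stored example
// that shares the most attribute values with the query.
class NearestNeighbourEngine implements SimpleEngine {

    private final List<List<Object>> training = new ArrayList<>();

    @Override
    public void init() {
        training.clear();
    }

    // Assumption: the last element of the list is the class value.
    @Override
    public void addTrainingInstance(List<Object> attributes) {
        if (attributes.isEmpty()) {
            throw new IllegalArgumentException("instance needs a class value");
        }
        training.add(new ArrayList<>(attributes));
    }

    // Returns the predicted class, or null when nothing has been learnt.
    @Override
    public Object classifyInstance(List<Object> attributes) {
        Object bestClass = null;
        int bestMatches = -1;
        for (List<Object> example : training) {
            int comparable = Math.min(attributes.size(), example.size() - 1);
            int matches = 0;
            for (int i = 0; i < comparable; i++) {
                if (Objects.equals(attributes.get(i), example.get(i))) {
                    matches++;
                }
            }
            if (matches > bestMatches) {
                bestMatches = matches;
                bestClass = example.get(example.size() - 1);
            }
        }
        return bestClass;
    }

    public static void main(String[] args) {
        NearestNeighbourEngine engine = new NearestNeighbourEngine();
        engine.init();
        // Attributes: category(-1), category(+1), then the class category(0).
        engine.addTrainingInstance(List.of("DT", "VBZ", "NN"));
        engine.addTrainingInstance(List.of("NN", "DT", "VBZ"));
        System.out.println(engine.classifyInstance(List.of("DT", "VBZ")));
    }
}
```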