D2K Tutorial - PowerPoint PPT Presentation

About This Presentation
Title:

D2K Tutorial

Description:

D2K Tutorial. How to write a module? alg | Automated Learning Group. Outline. Modules and D2K ... http://alg.ncsa.uiuc.edu/do/tools/d2k/tutorials ... – PowerPoint PPT presentation

Slides: 73
Provided by: lorett5

Transcript and Presenter's Notes



1
D2K Tutorial
  • How to write a module?

2
Outline
  • Modules and D2K
  • Command Line D2K and scripts
  • Using Eclipse
  • References
    • http://alg.ncsa.uiuc.edu/do/tools/d2k/documentation
    • http://alg.ncsa.uiuc.edu/do/tools/d2k/tutorials
    • http://alg.ncsa.uiuc.edu/tools/docs/d2k/manual/index.html
    • http://alg.ncsa.uiuc.edu/tools/docs/d2k/principles/index.html
    • http://alg.ncsa.uiuc.edu/tools/docs/d2k/faq/faq.html

3
How Modules Work: Review
How To Write a Module
  • A module's life cycle begins when it receives an input.
  • Each time the infrastructure delivers an input, it checks whether the module is ready to execute.
  • When the module is ready, the infrastructure queues it for execution.
  • The system then assigns the module to a thread when one becomes available.
  • When the module executes, it does its work, provides its outputs, and then exits.
  • If the module is still ready to execute, the system queues it again.
  • Otherwise, it stays idle until it is enabled again.
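The life cycle above can be modeled in a few dozen lines of plain Java. This is a sketch under stated assumptions, not D2K's scheduler. LifecycleModule and LifecycleScheduler are hypothetical classes. The readiness rule is assumed to be the default one (every input holds data), triggers are ignored, and a counter stands in for real outputs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a module; NOT the D2K Module API.
class LifecycleModule {
    private final List<Queue<Object>> inputs = new ArrayList<>();
    private int executions = 0;

    LifecycleModule(int numInputs) {
        for (int k = 0; k < numInputs; k++) {
            inputs.add(new ArrayDeque<>());
        }
    }

    // The infrastructure delivers one parcel of data to an input port.
    synchronized void deliver(int port, Object parcel) {
        inputs.get(port).add(parcel);
    }

    // Assumed default rule: ready when every input holds at least one parcel.
    synchronized boolean isReady() {
        for (Queue<Object> q : inputs) {
            if (q.isEmpty()) return false;
        }
        return true;
    }

    // Check readiness and execute under one lock, so two worker threads can
    // never consume the same set of inputs. Returns false if not ready.
    synchronized boolean executeIfReady() {
        if (!isReady()) return false;
        for (Queue<Object> q : inputs) {
            q.poll();              // "pull" one parcel from each input
        }
        executions++;              // stands in for computing and pushing outputs
        return true;
    }

    synchronized int executions() {
        return executions;
    }
}

// Hypothetical scheduler: checks readiness on every delivery and queues a
// ready module on a fixed-size thread pool.
class LifecycleScheduler {
    private final ExecutorService pool;

    LifecycleScheduler(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    void deliver(LifecycleModule m, int port, Object parcel) {
        m.deliver(port, parcel);
        if (m.isReady()) {
            // Queued until a pool thread is free; keeps running while the
            // module stays ready, then goes idle.
            pool.submit(() -> { while (m.executeIfReady()) { } });
        }
    }

    // Stop accepting work and wait for queued executions to finish.
    void shutdownAndWait() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Checking readiness and executing under the same lock is the key design choice here: without it, two pool threads that both saw the module as ready could try to consume the same inputs. Looping on one pool thread while the module stays ready simplifies "queue it again". A real scheduler might resubmit instead, to give other modules a turn.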

4
How hard is it to write a module?
  • We have an API that defines what a module is.
  • Modules need the following methods implemented:
    • Module info (getModuleInfo)
    • Input and output info (getInputInfo and getOutputInfo)
    • Input and output types (getInputTypes and getOutputTypes)
    • Names (getModuleName, getInputName, getOutputName)
    • Module execution (doit)
  • Other methods can be overridden to provide different functionality.
  • Optional methods exist for providing more information about properties, etc.

5
Objects Passed Between Modules
  • Modules often use Tables.
  • A Table has n rows and m columns.
  • Columns can be of any data type.
  • Many different Table types exist:
    • Table
    • ExampleTable
    • PredictionTable

6
Table API
  • Many methods to access data:
    • public double getDouble()
    • public void setDouble(double)
    • public double getInteger()
    • public void setInteger(Integer)
    • etc.
  • Meta methods:
    • public int getNumRows()
    • public int getNumColumns()
    • public String getColumnLabel(int)
    • public String getColumnType(int)
    • public String getColumnComment(int)
    • public void setColumnIsNominal(boolean value, int position)
    • public void setColumnIsScalar(boolean value, int position)
    • public boolean isColumnScalar(int position)
    • addColumn(Column col)
    • etc.
  • Table factory:
    • public TableFactory getTableFactory(): used to create new tables and new columns for your table
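D2K's own Table classes are not reproduced here. As a rough mental model only, the following hypothetical TableSketch stores each column as an Object[], so different columns can hold different types. It exposes a few accessors named after the ones listed above. It is not the D2K implementation, and its signatures are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, minimal column-oriented table; NOT the D2K Table implementation.
class TableSketch {
    private final List<String> labels = new ArrayList<>();
    private final List<Object[]> columns = new ArrayList<>();   // one array per column
    private final int numRows;

    TableSketch(int numRows) {
        this.numRows = numRows;
    }

    // Append a column; every column must have exactly numRows entries.
    void addColumn(String label, Object[] values) {
        if (values.length != numRows) {
            throw new IllegalArgumentException("column length must equal numRows");
        }
        labels.add(label);
        // Copy into a true Object[] so any value type can be stored later.
        columns.add(Arrays.copyOf(values, values.length, Object[].class));
    }

    int getNumRows() { return numRows; }

    int getNumColumns() { return columns.size(); }

    String getColumnLabel(int position) { return labels.get(position); }

    // Numeric access converts through Number, whatever boxed type is stored.
    double getDouble(int row, int column) {
        return ((Number) columns.get(column)[row]).doubleValue();
    }

    void setDouble(double value, int row, int column) {
        columns.get(column)[row] = value;   // autoboxed to Double
    }

    // A column is numeric if every entry is a Number.
    boolean isColumnNumeric(int position) {
        for (Object v : columns.get(position)) {
            if (!(v instanceof Number)) return false;
        }
        return true;
    }
}
```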

7
Adding a column to a new Table
      // Get the input table and its factory
      Table t = (Table) pullInput(0);
      TableFactory factory = t.getTableFactory();
      // Create an empty table, then a DOUBLE column with 10 rows, and add it
      MutableTable newTbl = (MutableTable) factory.createTable();
      Column col = factory.createColumn(ColumnTypes.DOUBLE);
      col.addRows(10);
      newTbl.addColumn(col);

8
Writing a Module
  • Modules take some number of inputs.
  • They produce some number of outputs.
  • Inputs, outputs, and the module's function must all be documented.
  • Example: finding the mean of all entries in a Table

9
Step 1: Imports
  • Import the necessary packages.
  • Include the base implementation of a Module:
      import ncsa.d2k.core.modules.*;
  • Include the interface definitions of a Table:
      import ncsa.d2k.modules.core.datatype.table.*;

10
Step 2: Define the Class
  • Must extend one of the basic module types:
      public class TableMean extends ComputeModule
  • Can also extend an existing module:
      public class TableMedian extends TableMean

11
Step 3: Inputs
  • Define the inputs for this module.
  • Inputs (and outputs) must be Objects; they cannot be primitive data types. Primitives can be boxed in Integer, Double, etc.
  • Three methods need to be implemented:
      public String[] getInputTypes()
      public String getInputInfo(int i)
      public String getInputName(int i)

12
Step 3a: getInputTypes()
  • Defines the order and data types of the inputs to the module:
      public String[] getInputTypes() {
          String[] in = {"ncsa.d2k.modules.core.datatype.table.Table"};
          return in;
      }
  • If there are multiple inputs, it looks like this:
      public String[] getInputTypes() {
          String[] in = {"ncsa.d2k.modules.core.datatype.table.Table",
                         "ncsa.d2k.modules.core.datatype.table.Table"};
          return in;
      }
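Each entry in the type array is a fully qualified class name, so a mistyped string only surfaces at run time. The hypothetical PortTypeCheck helper below shows one way such strings could be resolved and compared using standard reflection (Class.forName and isAssignableFrom). Whether D2K validates connections this way is an assumption. java.lang classes stand in for the D2K Table type so that the sketch runs on its own.

```java
// Hypothetical helper, NOT part of D2K: type strings are fully qualified
// class names, so they can be resolved and compared with reflection.
class PortTypeCheck {
    // True if an object of the output class can be passed to the input port.
    static boolean compatible(String outputType, String inputType) {
        try {
            Class<?> out = Class.forName(outputType);
            Class<?> in = Class.forName(inputType);
            return in.isAssignableFrom(out);
        } catch (ClassNotFoundException e) {
            return false;   // unknown class name, e.g. a typo in the type string
        }
    }
}
```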

13
Step 3b: getInputInfo(int i)
  • Provide a detailed description of the input; i is the index of the input:
      public String getInputInfo(int i) {
          if (i == 0)
              return "The table we want to analyze.";
          else
              return null;
      }

14
Step 3c: getInputName(int i)
  • Provide a name for each input; i is the index of the input:
      public String getInputName(int i) {
          if (i == 0)
              return "Table";
          else
              return null;
      }
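As ports are added, these if/else chains grow. One possible alternative keeps names and descriptions in arrays indexed by port number. The sketch below is standalone and hypothetical (PortDocs, INPUT_NAMES, and INPUT_INFO are illustrative names, not D2K's). In a real module these would be instance methods overriding the base class.

```java
// Hypothetical alternative to if/else chains: port documentation stored in
// arrays indexed by port number.
class PortDocs {
    private static final String[] INPUT_NAMES = {"Table"};
    private static final String[] INPUT_INFO = {"The table we want to analyze."};

    static String getInputName(int i) { return lookup(INPUT_NAMES, i); }

    static String getInputInfo(int i) { return lookup(INPUT_INFO, i); }

    // Out-of-range indices return null, matching the "else return null" branch.
    private static String lookup(String[] values, int i) {
        return (i >= 0 && i < values.length) ? values[i] : null;
    }
}
```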

15
Step 4: Outputs
  • Define the outputs for this module.
  • Three methods need to be implemented:
      public String[] getOutputTypes()
      public String getOutputInfo(int i)
      public String getOutputName(int i)

16
Step 4a: getOutputTypes()
  • Defines the order and data types of the outputs of the module:
      public String[] getOutputTypes() {
          String[] out = {"ncsa.d2k.modules.core.datatype.table.Table"};
          return out;
      }
  • If there are multiple outputs, it looks like this:
      public String[] getOutputTypes() {
          String[] out = {"ncsa.d2k.modules.core.datatype.table.Table",
                          "java.lang.Double"};
          return out;
      }

17
Step 4b: getOutputInfo(int i)
  • Provide a detailed description of each output; i is the index of the output:
      public String getOutputInfo(int i) {
          if (i == 0)
              return "The table we want to analyze.";
          else if (i == 1)
              return "The mean value of all entries in the table.";
          else
              return null;
      }

18
Step 4c: getOutputName(int i)
  • Provide a name for each output; i is the index of the output:
      public String getOutputName(int i) {
          if (i == 0)
              return "Table";
          else if (i == 1)
              return "Mean Value";
          else
              return null;
      }

19
Step 5: Module Info
  • Provide a description of the module's function:
      public String getModuleInfo() {
          return "This is a module that calculates the mean of all entries in the table.";
      }
  • Provide a name for the module:
      public String getModuleName() {
          return "TableMean";
      }

20
Properties
  • Properties are parameters set at runtime.
  • Example: a debug property
      private boolean debug = false;

      public boolean getDebug() {
          return debug;
      }

      public void setDebug(boolean val) {
          debug = val;
      }

      public PropertyDescription[] getPropertyDescriptions() {
          PropertyDescription[] pds = new PropertyDescription[1];
          pds[0] = new PropertyDescription("debug",
                  "Verbose debugging output", "Print verbose debugging output");
          return pds;
      }

21
doit()
  • Performs the real work of the module.
  • Pull in the inputs:
      public Object pullInput(int i)
  • Push the outputs to the next module:
      public void pushOutput(Object out, int i)

22
doit() example
      public void doit() throws Exception {
          Table table = (Table) pullInput(0);
          double mean = 0;
          // Sum every entry in the table
          for (int i = 0; i < table.getNumRows(); i++)
              for (int j = 0; j < table.getNumColumns(); j++)
                  mean += table.getDouble(i, j);
          // Divide by the number of entries (rows x columns)
          mean /= (double) table.getNumRows() * table.getNumColumns();
          pushOutput(table, 0);
          pushOutput(new Double(mean), 1);
      }
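The same computation can be checked outside D2K. In this sketch, a plain double[][] stands in for the Table (AllEntriesMean is a hypothetical name). The divisor is the number of entries actually summed. Loop counters declared in a for statement go out of scope after the loop, so they cannot serve as the divisor.

```java
// Plain-Java sketch of "mean of all entries", with double[][] (rows x columns)
// standing in for a D2K Table; not D2K code.
class AllEntriesMean {
    static double mean(double[][] rows) {
        double sum = 0;
        long count = 0;
        for (double[] row : rows) {
            for (double v : row) {
                sum += v;
                count++;
            }
        }
        // Divide by the entry count gathered in the loop; NaN for an empty table.
        return count == 0 ? Double.NaN : sum / count;
    }
}
```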

23
Other methods
  • beginExecution()
  • endExecution()
  • isReady()

24
beginExecution()
  • Called when the itinerary begins execution.
  • Perform initializations here:
      public void beginExecution() {
          someStateVariable = false;
      }

25
endExecution()
  • Called when the itinerary finishes execution.
  • Perform clean-up here:
      public void endExecution() {
          largeMemoryObject = null;
      }

26
isReady()
  • D2K modules become ready to execute whenever their enabling criteria are met.
  • There are two types of enabling conditions:
    • receipt of data
    • arrival of triggers
  • Many modules will not need to change their default behavior.
  • By default, a module is enabled when each of its inputs contains data and any attached triggers have also arrived.
  • The getFlags method returns an array of integers with one value for each input: the number of parcels available in the associated input pipe (the first value is for the first input, and so on). Usually we only care that there is at least one parcel, not how many.
  • For example, this isReady() method returns true when either of its two inputs has data:
      public boolean isReady() {
          return this.getFlags()[0] > 0 || this.getFlags()[1] > 0;
      }
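The difference between the default rule and the "any input" rule can be expressed over a plain flags array. This hypothetical sketch assumes flags[k] is the number of parcels waiting on input k, as the slide describes for getFlags(). It ignores triggers.

```java
// Hypothetical sketch of two readiness rules over a flags array, where
// flags[k] is the number of parcels waiting on input k. Triggers are ignored.
class ReadinessRules {
    // Default-style rule: every input has at least one parcel.
    static boolean allInputsReady(int[] flags) {
        for (int f : flags) {
            if (f <= 0) return false;
        }
        return true;
    }

    // Rule from the isReady() example: any input has at least one parcel.
    static boolean anyInputReady(int[] flags) {
        for (int f : flags) {
            if (f > 0) return true;
        }
        return false;
    }
}
```

A module enabled by the "any input" rule should check which input actually has data before calling pullInput, because the other input may be empty.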

27
D2K Command Line (1)
  • D2K provides a command-line interface for executing D2K itineraries.
  • -nogui
    • Disables the graphical user interface. If no itinerary is loaded with the -load argument, a script file should be included to create an itinerary programmatically.
  • -load <file name>
    • Specifies an itinerary to load. If a full path name is not given, D2K looks for the itinerary in the itineraries directory. If -nogui is specified, the itinerary is loaded, any script specified is applied, and the itinerary executes. If -nogui is not specified, the itinerary is simply loaded into the D2K Toolkit workspace.
  • -jini <jini url>
    • Identifies the Jini URL to use when searching for Jini-enabled D2K services. This option overrides the setting in the D2K properties file.
  • -script <file name>
    • Applies the script in the given file to the loaded itinerary or, if no itinerary is loaded, uses the script to create one. If the -nogui option is not specified, the script is ignored.
  • -threads <number of threads>
    • Specifies the number of threads D2K should create and use to execute the itinerary. This value is typically equal to the number of processors on the machine running D2K. This option also overrides the setting in the D2K properties file.
  • -vis <vis file name>
    • Displays a previously saved visualization.

28
D2K Command Line (2)
  • Example: run an itinerary without the GUI and redirect its output to a file:
      java -server -Xmx256M -Xms256M -cp <CLASSPATH> ncsa.d2k.D2K -nogui -load headless.itn > outputfile
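The same headless run can also be launched from Java with the standard ProcessBuilder API. This sketch only builds the command from the example above and redirects standard output to a file, which has the same effect as "> outputfile". HeadlessLauncher is a hypothetical name. The classpath value is a placeholder you must supply. It also assumes `java` is on the PATH.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher for the headless D2K command shown above.
class HeadlessLauncher {
    static ProcessBuilder build(String classpath, String itinerary, File output) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "java", "-server", "-Xmx256M", "-Xms256M",
                "-cp", classpath,
                "ncsa.d2k.D2K", "-nogui", "-load", itinerary));
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectOutput(output);                          // like "> outputfile"
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);  // show errors in this console
        return pb;
    }
}
```

With D2K on the classpath, `HeadlessLauncher.build(cp, "headless.itn", new File("outputfile")).start().waitFor()` would run the itinerary and wait for it to finish.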

29
Scripting Itinerary Modifications
  • D2K supports a number of scripting commands.
  • These commands can be stored in a text file and applied to an existing itinerary, or they can create a completely new itinerary.
  • add <module name> <class name>
    • Adds an instance of the module with the given class name to the itinerary and names it "module name".
  • set <module name> <property name> <value>
    • Sets the property named "property name" of the module named "module name" to "value". The property name is the one determined by the names of the setter/getter methods.
  • remove <module name>
    • Removes the module named "module name" from the itinerary.
  • link <parent module name> <output port index> <child module name> <input port index>
    • Connects the module named "parent module name" to the module named "child module name". The parent's output port is given by "output port index" and the child's input port by "input port index".
  • unlink <parent module name> <output port index>
    • Disconnects the port at "output port index" from the module named "parent module name".
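The `set` command maps a property name to a setter method. This sketch shows one way that mapping could work, using the JavaBeans naming pattern ("minimumSupport" maps to setMinimumSupport) and standard reflection. Both classes are hypothetical: AprioriStandIn is not the real Apriori module. The assumption that D2K performs exactly this lookup rests only on the statement above about setter/getter names.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a module with a bean-style property (no D2K classes).
class AprioriStandIn {
    private double minimumSupport = 10.0;

    public double getMinimumSupport() { return minimumSupport; }

    public void setMinimumSupport(double v) { minimumSupport = v; }
}

// Hypothetical sketch of "set <module> <property> <value>": find the setter
// whose name follows the JavaBeans pattern and invoke it with the converted value.
class PropertySetter {
    static void set(Object module, String property, String value) throws Exception {
        String setterName = "set" + Character.toUpperCase(property.charAt(0))
                + property.substring(1);
        for (Method m : module.getClass().getMethods()) {
            if (m.getName().equals(setterName) && m.getParameterCount() == 1) {
                m.invoke(module, convert(value, m.getParameterTypes()[0]));
                return;
            }
        }
        throw new NoSuchMethodException(setterName);
    }

    // Convert the script's text value to the setter's parameter type.
    private static Object convert(String value, Class<?> type) {
        if (type == double.class || type == Double.class) return Double.valueOf(value);
        if (type == int.class || type == Integer.class) return Integer.valueOf(value);
        if (type == boolean.class || type == Boolean.class) return Boolean.valueOf(value);
        if (type == String.class) return value;
        throw new IllegalArgumentException("unsupported property type: " + type);
    }
}
```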

30
Script Example
  • The following script illustrates how the script commands might be used:
      set "Apriori" minimumSupport 40.0
      set "Compute Confidence" confidence 90.0
      remove "Rule Visualization"
      remove "RuleAssocReport"
      remove "FanOut1"
      add "Headless Rule Assoc Report" "ncsa.d2k.modules.core.discovery.ruleassociation.HeadlessRuleAssocReport"
      link "Compute Confidence" 0 "Headless Rule Assoc Report" 0
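Module names such as "Compute Confidence" contain spaces, so a script line cannot simply be split on whitespace. This hypothetical tokenizer keeps double-quoted names together, strips the quotes, and rejects an unterminated quote. It is an illustration of the idea, not D2K's parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tokenizer for one script line: splits on whitespace but keeps
// double-quoted names (which may contain spaces) together, without the quotes.
class ScriptTokenizer {
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;   // true once the current token has started
        for (int k = 0; k < line.length(); k++) {
            char c = line.charAt(k);
            if (c == '"') {
                inQuotes = !inQuotes;
                hasToken = true;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated quote: " + line);
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```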

31
Using Eclipse (1)
  • In Eclipse, select File -> New -> Project.
  • In the New Project wizard:
    • Select Java Project.
    • Set the project name to (for example) d2ktoolkit.
    • Under project layout, select "create separate source and output folders".
    • Click Next, not Finish.

32
Using Eclipse (2)
  • Select the Libraries tab and click Add External JARs...
  • NOTE: Alternatively, you can create a User Library in the same manner; it can be reused in multiple Eclipse projects.

33
Using Eclipse (3)
  • Navigate to C:\Program Files\D2KToolkit\lib
  • Select all jar and zip files in this directory (that is, all files except the ext/ and plugins/ subdirectories).
  • Click Finish.

34
Using Eclipse (4): Exporting
  • Modules that use D2K should now compile.
  • You need to put the compiled classes into your modules directory.
  • Use the File -> Export utility.
  • Select JAR file.
  • Click Next.

35
Using Eclipse (5): Exporting
  • Expand your project and select the src directory.
  • Select "Export generated class files and resources".
  • Choose a file name under JAR file.
  • Select "Compress the contents of the JAR file".
  • Click Finish.

36
Executing D2K from within Eclipse (1)
  • In Eclipse, select Run -> Run...

37
Executing D2K from within Eclipse (2)
  • Optionally, enter a new name for the run configuration (for example, toolkit).
  • Set the main class to ncsa.d2k.gui.ToolKit (capital T, capital K).

38
Executing D2K from within Eclipse (3)
  • Increase the memory available:
    • Select the Arguments tab.
    • Set VM arguments to -Xmx256M.
  • Set the working directory:
    • Select the Arguments tab.
    • Under Working directory, uncheck the check box and set the directory to C:\Program Files\D2KToolkit
  • Select Apply and then, if desired, Run.

39
Homework
40
Homework Problem
  • Write a module to calculate the average of each (numeric) column in a Table.
  • Output the means in a Table.

41
The ALG Team
  • Staff
  • Bernie Acs
  • Loretta Auvil
  • David Clutter
  • Vered Goren
  • Eugene Grois
  • Luigi Marini
  • Robert McGrath
  • Chris Navarro
  • Greg Pape
  • Barry Sanders
  • Andrew Shirk
  • David Tcheng
  • Michael Welge
  • Students
  • Chen Chen
  • Hong Cheng
  • Yaniv Eytani
  • Fang Guo
  • Govind Kabra
  • Chao Liu
  • Haitao Mo
  • Xuanhui Wang
  • Qian Yang
  • Feida Zhu

42
References
  • http://alg.ncsa.uiuc.edu/do/tools/d2k/documentation
  • http://alg.ncsa.uiuc.edu/do/tools/d2k/tutorials
  • http://alg.ncsa.uiuc.edu/tools/docs/d2k/manual/index.html
  • http://alg.ncsa.uiuc.edu/tools/docs/d2k/principles/index.html
  • http://alg.ncsa.uiuc.edu/tools/docs/d2k/faq/faq.html

43
Appendix: Answer to the Homework Problem
  • Write a module to calculate the average of each (numeric) column in a Table.
  • Output the means in a Table.

44
Details of Homework Problem
  • Problem: find the mean of each column in a Table.
  • Input data: a two-dimensional table
  • Output: a table that has the average of each column
  • Approach:
    • Create a module that reads a Table and outputs a Table with the column averages.
    • Export it to a jar file and install it in the D2K modules directory.
    • Create an itinerary that reads data from a file or other source into a Table, applies the averaging module from the first step, and then outputs the results, e.g., in a TableViewer.
    • Create or choose an appropriate data set, and set the input module to use it.
    • Run the itinerary.

45
A. Creating a Module
  • Imports
  • Define the class
  • Define the inputs
  • Define the outputs
  • Module information
  • Properties
  • Do the real work: doit()

46
Step 1: Imports
  • Import the necessary packages.
  • Include the base implementation of a Module:
      import ncsa.d2k.core.modules.*;
  • Include the interface definitions of a Table:
      import ncsa.d2k.modules.core.datatype.table.*;

47
Step 2: Define the Class
  • Must extend one of the basic module types:
      public class TableMean extends ComputeModule
  • Can also extend an existing module:
      public class TableMedian extends TableMean
  • Hint: You may want to work from an example module. For example, the Principles of Module Development document has several examples at the end. You might copy and adapt one of these, such as ModelPredictive.java:
    • http://alg.ncsa.uiuc.edu/tools/docs/d2k/principles/index.html

48
Step 3: Inputs
  • Define the inputs for this module.
  • Inputs (and outputs) must be Objects; they cannot be primitive data types. Primitives can be boxed in Integer, Double, etc.
  • Three methods need to be implemented:
      public String[] getInputTypes()
      public String getInputInfo(int i)
      public String getInputName(int i)

49
Step 3a: getInputTypes()
  • Defines the order and data types of the inputs to the module:
      public String[] getInputTypes() {
          String[] in = {"ncsa.d2k.modules.core.datatype.table.Table"};
          return in;
      }
  • If there are multiple inputs, it looks like this:
      public String[] getInputTypes() {
          String[] in = {"ncsa.d2k.modules.core.datatype.table.Table",
                         "ncsa.d2k.modules.core.datatype.table.Table"};
          return in;
      }

50
Step 3b: getInputInfo(int i)
  • Provide a detailed description of the input; i is the index of the input:
      public String getInputInfo(int i) {
          if (i == 0)
              return "The table we want to analyze.";
          else
              return null;
      }

51
Step 3c: getInputName(int i)
  • Provide a name for each input; i is the index of the input:
      public String getInputName(int i) {
          if (i == 0)
              return "Table";
          else
              return null;
      }

52
Step 4: Outputs
  • Define the outputs for this module.
  • Three methods need to be implemented:
      public String[] getOutputTypes()
      public String getOutputInfo(int i)
      public String getOutputName(int i)

53
Step 4a: getOutputTypes()
  • Defines the order and data types of the outputs of the module:
      public String[] getOutputTypes() {
          String[] out = {"ncsa.d2k.modules.core.datatype.table.Table"};
          return out;
      }

54
Step 4b: getOutputInfo(int i)
  • Provide a detailed description of the output; i is the index of the output:
      public String getOutputInfo(int i) {
          if (i == 0)
              return "A table containing the mean of each column.";
          else
              return null;
      }

55
Step 4c: getOutputName(int i)
  • Provide a name for the output; i is the index of the output:
      public String getOutputName(int i) {
          if (i == 0)
              return "Table";
          else
              return null;
      }

56
Step 5: Module Info
  • Provide a description of the module's function:
      public String getModuleInfo() {
          return "This is a module that calculates the mean of each attribute in the table.";
      }
  • Provide a name for the module:
      public String getModuleName() {
          return "TableMean";
      }

57
Step 6: Set Up Properties
  • Properties are parameters set at runtime.
  • Example: a debug property
      private boolean debug = false;

      public boolean getDebug() {
          return debug;
      }

      public void setDebug(boolean val) {
          debug = val;
      }

      public PropertyDescription[] getPropertyDescriptions() {
          PropertyDescription[] pds = new PropertyDescription[1];
          pds[0] = new PropertyDescription("debug",
                  "Verbose debugging output", "Print verbose debugging output");
          return pds;
      }

58
Step 7: doit()
  • Performs the real work of the module.
  • Pull in the inputs:
      public Object pullInput(int i)
  • Push the outputs to the next module:
      public void pushOutput(Object out, int i)

59
doit() example (1 of 3)
      public void doit() throws Exception {
          /* read the input (from input 0) into a Table */
          Table t = (Table) pullInput(0);
          /* use a TableFactory to create a table for the output values */
          TableFactory factory = t.getTableFactory();
          MutableTable newTbl = (MutableTable) factory.createTable();

60
doit() example (2 of 3)
          /* For each column, compute the mean across all the rows. */
          for (int i = 0; i < t.getNumColumns(); i++) {
              Column col = factory.createColumn(ColumnTypes.DOUBLE);
              col.addRows(1);
              newTbl.addColumn(col);
              newTbl.setColumnLabel(t.getColumnLabel(i) + "Mean", i);
              if (t.isColumnNumeric(i)) {
                  double mean = 0;
                  for (int j = 0; j < t.getNumRows(); j++)
                      mean += t.getDouble(j, i);
                  mean /= t.getNumRows();
                  newTbl.setDouble(mean, 0, i);
              } else {
                  /* if the column is non-numeric, set the mean to -1 */
                  newTbl.setDouble(-1, 0, i);
              }
          }

61
doit() example (3 of 3)
          /* push the means out to port 0 */
          pushOutput(newTbl, 0);
      }
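The per-column logic can be tested without D2K. In this hypothetical sketch, each column is an Object[] standing in for a Table column. A column counts as numeric when every entry is a Number, which is an assumption about what isColumnNumeric checks. Non-numeric columns get -1, as in the slides.

```java
// Plain-Java sketch of the homework computation; each column is an Object[]
// standing in for a D2K Table column (not D2K code).
class ColumnMeans {
    static double[] means(Object[][] columns) {
        double[] result = new double[columns.length];
        for (int c = 0; c < columns.length; c++) {
            // -1 marks a non-numeric column, following the slides
            result[c] = isNumeric(columns[c]) ? mean(columns[c]) : -1;
        }
        return result;
    }

    private static boolean isNumeric(Object[] column) {
        for (Object v : column) {
            if (!(v instanceof Number)) return false;
        }
        return true;
    }

    private static double mean(Object[] column) {
        double sum = 0;
        for (Object v : column) {
            sum += ((Number) v).doubleValue();
        }
        return sum / column.length;   // NaN for an empty column (0.0 / 0)
    }
}
```

A sentinel of -1 cannot be told apart from a numeric column whose real mean is -1. Returning Double.NaN for non-numeric columns would avoid that ambiguity, at the cost of differing from the slides.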

62
Source code
  • An example solution is available at:
    • http://algdocs.ncsa.uiuc.edu/TableMean.java.txt

63
B. Compile, create jar, install in D2K/modules
  • Compile the module.
  • Create a jar file (e.g., export from Eclipse).
  • Install the jar in D2K/modules.

64
C. Create an Itinerary
  • The itinerary must read the data into a Table, run TableMean, and then output the results.
  • Hint: You might find a similar example itinerary. For example, the D2K Toolkit includes one in:
    • itineraries/DataLoading/FileSupport/DelimitedFileToTable
  • You could copy this itinerary and then add the new module.

65
Step 1: Copy the Itinerary
  • Open the Itineraries tab.
  • Select the itinerary, e.g., DataLoading -> FileSupport -> DelimitedFileToTable.
  • Load the itinerary.
  • Select File -> Save Itinerary As and save a copy of the itinerary with a new name.

66
Step 2: Add the New Module to the Itinerary
  • Open the Modules tab.
  • Find the new module.
  • Drag it onto the work area.
  • Create a connection from the output of ParseTable to the input of the new module.

67
Step 3: Add a Visualization
  • Find the ncsa/d2k/modules/core/vis/TableViewer module.
  • Drag it onto the work area.
  • Create a connection from the output of the new module to the input of the TableViewer.

68
The New Itinerary
69
D. Select a Dataset
  • Determine what data to use.
  • This example needs a table with one or more columns of numbers.
  • Example: in the D2K directory, data/UCI/iris.csv
  • Set the input to the file:
    • Click on the input module.
    • Set the File Name to the dataset.

70
E. Run the Itinerary
  • Click Run.
  • The result will pop up in a TableViewer window.

71
Example Result
72
Done
  • And it's just that simple!