David Newman, UC Irvine Lecture 8: Matlab 1 - PowerPoint PPT Presentation

1
CS 277 Data Mining
Lecture 8: Matlab tutorial
  • David Newman
  • Department of Computer Science
  • University of California, Irvine

2
Notices
  • Project progress report 1 due Tues Oct 30
  • Homework 2 questions?
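  • Apply idf weights computed on the training data to the test data
  • Use $\mathrm{idf}(\mathrm{word}_i) = \log(D/D_i)$, where $D$ is the number of training documents and $D_i$ is the number of training documents containing word $i$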
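The note above asks for idf weights to be computed on the training data and then applied unchanged to the test data. A minimal Python sketch of that idea follows; the course itself uses Matlab. The list-of-word-IDs input format, the helper names, and the convention that test words unseen in training get weight 0 are assumptions made for illustration.

```python
import math

def idf_weights(train_docs):
    """idf(w) = log(D / D_w), computed from the training documents only.

    train_docs is assumed to be a list of documents, each a list of word IDs.
    D is the number of training documents; D_w is how many of them contain w.
    """
    D = len(train_docs)
    doc_freq = {}
    for doc in train_docs:
        for w in set(doc):  # count each word at most once per document
            doc_freq[w] = doc_freq.get(w, 0) + 1
    return {w: math.log(D / n) for w, n in doc_freq.items()}

def tfidf_vector(doc, idf):
    """Weight a document's raw term counts by the precomputed training idf.

    Assumption for this sketch: words never seen in training get weight 0.
    """
    tf = {}
    for w in doc:
        tf[w] = tf.get(w, 0) + 1
    return {w: count * idf.get(w, 0.0) for w, count in tf.items()}

train = [[1, 2, 2, 3], [2, 3], [3, 4]]  # made-up training docs (lists of word IDs)
test = [[2, 4, 5]]                      # made-up test doc; word 5 never appears in training

idf = idf_weights(train)                             # computed once, on training data only
test_vectors = [tfidf_vector(d, idf) for d in test]  # same weights reused on test data
```

Because `idf` is built only from `train`, the test documents never change $D$ or $D_i$. The test vectors simply reuse the training weights.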

3
Project Progress Report 1 Due Tues Oct 30
  • Written progress report
    • Expect at least 3 pages (typed, not handwritten)
    • Hand in the written document in class on Tues Oct 30
  • Also submit 1 PowerPoint or PDF slide by email
    • 1 slide that describes your project
    • It should contain:
      • Your name (top right corner)
      • A clear description of the main task
      • Some visual graphic of data relevant to your task
      • 1 or 2 bullets on the methods you plan to use
      • Preliminary results or results of exploratory data analysis
    • Make it graphical (use text sparingly)
    • Email it as an attachment no later than 12 noon Tues Oct 30
    • Subject line: cs277 yourname_20071030.ppt

4
List of Sections for your Progress Report
  • Clear description of task (reuse original proposal if needed)
  • Discussion of relevant literature
    • Discuss prior published/related work (if it exists)
  • Preliminary data evaluation
    • Exploratory data analysis relevant to your task
    • Include as many plots/graphs as you think are useful/relevant
  • Preliminary algorithm work
    • Summary of your progress on algorithm implementation so far
    • If you are not at this point yet, say so
    • Relevant information about other code/algorithms you have downloaded, any preliminary testing you have done on them, etc.
  • Difficulties encountered so far
  • Plans for the remainder of the quarter
    • Algorithm implementation
    • Experimental methods

5
Sentence re-writes (from last week)
  • However, most recognizers are prone to making errors.
    • However, most recognizers make errors.
  • We are interested in discovering these constraints or patterns in an automatic way.
    • We are interested in automatically discovering these constraints or patterns.
  • A list of keywords that are frequent in pages which are announcing events are fed into the search API.
    • We first find keywords that frequently occur in event pages. Then we feed these keywords into the search API.
  • For each of the models, the parameters that do best on the evaluation set are used for testing.
    • For each model, we use the parameters that produce the highest accuracy on the evaluation set.
  • The goal of modeling network growth by evolution presented in this proposal is to study the process of genome evolution.
    • The goal of modeling network growth by evolution is to understand the process of genome evolution.
  • Old links are deleted when new connections satisfy generated or predicted rules better.
    • Old links are deleted when new connections better satisfy generated or predicted rules.

6
Matlab tutorial (1)
  • Did you all try Matlab?
  • Can you explain these functions?
    • help
    • load
    • textread
    • sparse
    • zeros
    • size
    • length
    • whos
    • sum
    • max
    • find
    • sort
    • repmat
    • : (the colon operator)
    • ./ (element-wise divide)
    • rand
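The following hypothetical pure-Python sketch gives a concrete picture of what a few of the listed Matlab calls return. The helper names are invented. `sparse` is imitated with a dictionary rather than a real sparse matrix, and the returned indices are 0-based, whereas Matlab's are 1-based.

```python
# Hypothetical pure-Python helpers that mimic what a few of the Matlab
# functions above return. Matlab indexes from 1; these return 0-based indices.

def sparse_from_triples(rows, cols, vals):
    """Like S = sparse(i, j, s): values with the same (i, j) are added together.
    Stored here as a dict {(i, j): value} instead of a real sparse matrix."""
    S = {}
    for i, j, v in zip(rows, cols, vals):
        S[(i, j)] = S.get((i, j), 0) + v
    return S

def max_with_index(x):
    """Like [m, k] = max(x): largest value and the index of its first occurrence."""
    k = max(range(len(x)), key=lambda idx: x[idx])
    return x[k], k

def find_nonzero(x):
    """Like find(x) on a vector: indices of the nonzero elements."""
    return [i for i, v in enumerate(x) if v != 0]

def sort_descend(x):
    """Like [s, idx] = sort(x, 'descend'): sorted values plus their original indices."""
    idx = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    return [x[i] for i in idx], idx

def row_normalize(X):
    """Like X ./ repmat(sum(X, 2), 1, size(X, 2)): divide each row by its row sum.
    Assumes every row has a nonzero sum."""
    out = []
    for row in X:
        total = sum(row)
        out.append([v / total for v in row])
    return out

print(sparse_from_triples([1, 1, 2], [5, 5, 7], [2, 3, 4]))  # {(1, 5): 5, (2, 7): 4}
print(max_with_index([3, 9, 9, 1]))                          # (9, 1)
print(find_nonzero([0, 2, 0, 5]))                            # [1, 3]
print(sort_descend([3, 9, 1]))                               # ([9, 3, 1], [1, 0, 2])
print(row_normalize([[1, 3], [2, 2]]))                       # [[0.25, 0.75], [0.5, 0.5]]
```

The `repmat` plus `./` pattern in the last helper is a common way to turn a count matrix into per-row proportions.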

7
Matlab tutorial (2)
  • Play with 450 New York Times news articles
  • Read docID wordID count
  • How many words?
  • Which word occurs most often, and in which doc?
  • second most often?
  • How many words per doc?
  • Word frequency
  • Confirm Zipf's law
  • Top-10 words?
  • Remove stopwords
  • Create a dataset by eliminating words with freq < 10
  • Compute idf weights
  • Top-10 words?
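The exploration steps above can be prototyped in a few lines. The sketch below uses Python with only the standard library, not the Matlab the tutorial uses, and rests on several assumptions:

- The input is whitespace-separated `docID wordID count` lines with no header.
- A tiny made-up sample stands in for the 450-article file.
- Stopwords are given as a set of word IDs.
- The Zipf check is a least-squares slope of log frequency against log rank; Zipf's law predicts roughly -1.

Both readings of "most often" are shown: the single (doc, word) entry with the largest count, and the word with the largest total count. Ranking by total count times idf is only one reading of the second "Top-10 words?" question.

```python
import math
from collections import Counter

def read_triples(text):
    """Parse 'docID wordID count' lines (assumed: whitespace-separated ints, no header)."""
    triples = []
    for line in text.splitlines():
        if line.strip():
            d, w, c = map(int, line.split())
            triples.append((d, w, c))
    return triples

def count_stats(triples):
    """Total count per word, total words per doc, and number of docs containing each word."""
    word_freq, words_per_doc, doc_freq = Counter(), Counter(), Counter()
    for d, w, c in triples:
        word_freq[w] += c
        words_per_doc[d] += c
        doc_freq[w] += 1  # assumes each (docID, wordID) pair appears on one line only
    return word_freq, words_per_doc, doc_freq

def zipf_slope(word_freq):
    """Least-squares slope of log(frequency) vs log(rank); Zipf's law predicts about -1."""
    freqs = sorted(word_freq.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)  # needs at least two distinct words
    return sxy / sxx

def filter_vocab(word_freq, stopwords, min_freq=10):
    """Keep words that are not stopwords and occur at least min_freq times in total."""
    return {w for w, f in word_freq.items() if w not in stopwords and f >= min_freq}

def idf_weights(doc_freq, num_docs, vocab):
    """idf(w) = log(D / D_w) for each word w kept in the vocabulary."""
    return {w: math.log(num_docs / doc_freq[w]) for w in vocab}

# Tiny made-up sample standing in for the 450-article file.
sample = """1 1 5
1 2 3
2 1 4
2 3 1
3 1 2
3 2 6"""

triples = read_triples(sample)
word_freq, words_per_doc, doc_freq = count_stats(triples)
total_tokens = sum(word_freq.values())  # "how many words" as total occurrences
vocab_size = len(word_freq)             # ... or as number of distinct words
# Reading 1 of "most often": the (doc, word, count) entries with the largest counts.
top_entries = sorted(triples, key=lambda t: t[2], reverse=True)[:2]
# Reading 2: the words with the largest total count over all docs.
top_words = word_freq.most_common(2)
slope = zipf_slope(word_freq)
vocab = filter_vocab(word_freq, stopwords={1}, min_freq=2)  # toy thresholds, not 10
idf = idf_weights(doc_freq, len(words_per_doc), vocab)
# One possible ranking for the later "Top-10 words?": total count times idf.
top10 = sorted(vocab, key=lambda w: word_freq[w] * idf[w], reverse=True)[:10]
```

On the real data you would pass the actual stopword IDs and keep `min_freq=10`, as the slide asks.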

8
Matlab tutorial (3)
  • Perform tf-idf retrieval
  • What is the average overlap between doc i and doc j?
  • Compute conditional probabilities
  • prob(word | doc), prob(doc | word)
  • Confirm Bayes' rule
  • Other tasks?
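A compact sketch of these tasks on a made-up three-document corpus follows, again in Python rather than Matlab. It makes three assumptions:

- Retrieval scores documents by cosine similarity between tf-idf vectors.
- "Overlap" means the number of distinct words two documents share.
- Probabilities are estimated from raw counts.

With n(w, d) the count of word w in doc d, the estimates are p(w | d) = n(w, d)/n(d), p(d | w) = n(w, d)/n(w), p(d) = n(d)/N and p(w) = n(w)/N. Under these estimates Bayes' rule p(d | w) = p(w | d) p(d) / p(w) holds exactly, which the last helper checks numerically. The corpus, the query and all names are hypothetical.

```python
import math
from itertools import combinations

# Made-up toy corpus: document name -> {word: count}.
docs = {
    "d1": {"game": 3, "team": 2, "vote": 1},
    "d2": {"vote": 4, "senate": 2, "game": 1},
    "d3": {"team": 1, "game": 1, "senate": 1},
}

def idf_weights(docs):
    """idf(w) = log(D / D_w), with D documents and D_w of them containing w."""
    df = {}
    for counts in docs.values():
        for w in counts:
            df[w] = df.get(w, 0) + 1
    return {w: math.log(len(docs) / n) for w, n in df.items()}

def tfidf(counts, idf):
    """Multiply each raw count by its idf weight (unknown words get 0)."""
    return {w: c * idf.get(w, 0.0) for w, c in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_words, docs, idf):
    """Rank documents by cosine similarity of their tf-idf vectors to the query's."""
    q = tfidf({w: query_words.count(w) for w in set(query_words)}, idf)
    scores = {name: cosine(q, tfidf(c, idf)) for name, c in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def average_overlap(docs):
    """Mean number of distinct words shared by a pair of documents."""
    pairs = list(combinations(docs.values(), 2))
    return sum(len(a.keys() & b.keys()) for a, b in pairs) / len(pairs)

# Counts needed for the probabilities: n(d), n(w), and the corpus total N.
doc_total = {d: sum(c.values()) for d, c in docs.items()}
word_total = {}
for counts in docs.values():
    for w, n in counts.items():
        word_total[w] = word_total.get(w, 0) + n
N = sum(doc_total.values())

def p_word_given_doc(w, d):
    return docs[d].get(w, 0) / doc_total[d]   # n(w, d) / n(d)

def p_doc_given_word(d, w):
    return docs[d].get(w, 0) / word_total[w]  # n(w, d) / n(w)

def bayes_holds(d, w):
    """Check p(d | w) == p(w | d) p(d) / p(w), with p(d) = n(d)/N and p(w) = n(w)/N."""
    rhs = p_word_given_doc(w, d) * (doc_total[d] / N) / (word_total[w] / N)
    return math.isclose(p_doc_given_word(d, w), rhs)

idf = idf_weights(docs)
ranking = retrieve(["vote"], docs, idf)  # "game" occurs in every doc, so its idf is 0
```

Because "game" appears in all three documents, its idf is log(3/3) = 0, so it contributes nothing to retrieval scores. This is the intended effect of idf weighting.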