Text Mining SAS-L Topics

Provided by: lho8
Learn more at: https://ipsr.ku.edu

1
Text Mining SAS-L Topics
  • Larry Hoyle, Policy Research Institute,
    University of Kansas

2
SAS-L topics
  • Read each weekly topic list from
    http://www.listserv.uga.edu/archives/sas-l.html
  • Parse topic, HTMLdecode
  • Strip "Re:" prefixes
      /* strip variations of re: */
      topicRE = prxparse('/ *[Rr][Ee]: *(.*)/');
      if prxmatch(topicRE, topic) then do;
        topic = prxposn(topicRE, 1, topic);
      end;
  • Proc SQL to aggregate topic counts across weeks
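The aggregation step might look roughly like the sketch below. It is an illustration only, not the presenter's code: the input table work.weekly_topics and its columns week, topic, and messages are hypothetical names assumed here, with one row per topic line per weekly index page.

```sas
/* Assumed (hypothetical) input: work.weekly_topics             */
/*   week     - weekly archive id, e.g. 0501a                   */
/*   topic    - subject line after HTML decoding and Re: strip  */
/*   messages - number of messages for that topic in that week  */
proc sql;
  create table work.topic_counts as
  select topic,
         count(distinct week) as weeks_active,   /* weeks in which the thread appears */
         sum(messages)        as total_messages  /* messages summed across weeks      */
  from work.weekly_topics
  group by topic
  order by total_messages desc;
quit;
```

Grouping on the cleaned topic is what merges a thread that spans several weekly lists into a single row.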

3
SAS-L 2005
  • 35,324 thread/topic lines in the HTML files
  • 7,081 threads after merging across weeks and a
    little cleaning

4
SAS-L Top Threads in Number of Messages
5
Text Miner on the SAS-L topics
10
Largest clusters
11
Smaller Clusters
12
Message Content
13
Web scraping with tmfilter
    options noxwait;
    %macro aweek(week=0501a);
      x "md C:\ddrive\projects\sugs\sugi31\SASLBOF\posts\&week";
      x "md C:\ddrive\projects\sugs\sugi31\SASLBOF\filteredposts\&week";
      libname sugi31 'C:\ddrive\projects\sugs\sugi31\SASLBOF\datasets';
      %tmfilter(
        dataset=sugi31.SL&week.,
        dir=C:\ddrive\projects\sugs\sugi31\SASLBOF\posts\&week,
        destdir=C:\ddrive\projects\sugs\sugi31\SASLBOF\filteredPosts\&week,
        URL=http://listserv.uga.edu/cgi-bin/wa?A1=ind&week.%NRSTR(&L=sas-l),
        depth=1,
        links=sugi31.SL&week.L,
        norestrict=' ',
        numchars=2000);
    %mend aweek;
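A macro like aweek would typically be called once per weekly archive. The sketch below shows one possible driver. The macro name allweeks is hypothetical, and the week identifiers other than 0501a are assumed to follow the same naming pattern; neither comes from the slides.

```sas
/* Hypothetical driver: call %aweek once for each id in a space-separated list */
%macro allweeks(weeks=);
  %local i w;
  %let i = 1;
  %let w = %scan(&weeks, &i, %str( ));
  %do %while (%length(&w) > 0);
    %aweek(week=&w);
    %let i = %eval(&i + 1);
    %let w = %scan(&weeks, &i, %str( ));
  %end;
%mend allweeks;

/* Week ids after 0501a are assumptions for illustration */
%allweeks(weeks=0501a 0501b 0501c);
```

Each call creates its own posts and filtered-posts folders and its own pair of output data sets, so weeks can be re-run individually if a download fails.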

14
Parse date and sender
Should parse this out
15
Using a 10% sample of message text
16
Using a 10% sample of message text
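The slides do not say how the sample was drawn. One common way to take a 10% simple random sample in SAS is PROC SURVEYSELECT, sketched below. The data set names work.posts and work.posts_sample and the seed value are hypothetical.

```sas
/* Hypothetical input: work.posts, one row per filtered message */
proc surveyselect data=work.posts
                  out=work.posts_sample
                  method=srs       /* simple random sampling without replacement */
                  samprate=0.10    /* keep about 10% of the rows                 */
                  seed=20060326;   /* arbitrary seed, fixed for repeatability    */
run;
```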
17
Filter out overly common terms, such as "listserv"
18
Filter out overly common terms, such as "listserv"