Improving the Web Design Mining Web Data at Cityjob.com - PowerPoint PPT Presentation

About This Presentation
Title: Improving the Web Design Mining Web Data at Cityjob.com

Description: Mining Web Data at Cityjob.com. Hing-Po Lo, Linda Lu, Miriam Chan, Department of Management Sciences, City University of Hong Kong, Hong Kong

Slides: 32
Provided by: msg68
Transcript and Presenter's Notes

Title: Improving the Web Design Mining Web Data at Cityjob.com


1
Improving the Web Design
Mining Web Data at Cityjob.com
Hing-Po Lo, Linda Lu, Miriam Chan
Department of Management Sciences
City University of Hong Kong, Hong Kong
mshplo_at_cityu.edu.hk
2
I. Introduction
Customer Relationship Management
Data Mining
The Web
3
A. The Web
  • More than 200 million surfers per day
  • Huge volume of data captured from the Web
  • Only 2% of web data analyzed

USB
4
B. Customer Relationship Management
  • DOT COM companies
  • work in an information-intensive and ultra-competitive mode
  • require the use of CRM to establish a personalized relationship with their customers

5
C. Data Mining Tools
  • Many software packages and web vendors can help to explore and mine web log files.
  • Most study the clickstream at the session level. In order to conduct CRM, one has to analyze the web log file at the customer level.
  • Tailor-made software using SAS macros and Enterprise Miner has been developed.

6
Cityjob.COM
  • It offers information on almost all posts
    available from major companies in HK.
  • It receives several thousand visitors per day on average.

7
II. The Data
  • Study period: 11 December 2000 to 4 February 2001
  • Three types of data files: web log files, subscribers profiles, and jobs profiles

8
  1. Web log files

#Software: Microsoft Internet Information Server 4.0
#Version: 1.0
#Date: 2000-12-11 00:00:00
#Fields: date time c-ip cs-username s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-win32-status sc-bytes cs-bytes time-taken cs(Cookie)

2000-12-11 00:00:00 208.223.166.3 - W3SVC4 PROD5_WEB 202.130.170.225 GET /default.asp - 200 0 15838 645 1297 RMIDd0dfa603398e0850CityjobID
LASTUPD20001130LOGINslooIND000OPN000CT
Y091RDBc80200000000000000020028311b1b000000000
0000000ASPSESSIO
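The log excerpt above follows the W3C extended log layout, in which a "#Fields:" directive names the columns of the records that follow. As a rough illustration only (the study itself used SAS macros), a reader for such a file might look like the sketch below. The function name parse_w3c_log is hypothetical, and the cookie value in the sample line is replaced by "-" because the cookie in the transcript is truncated.

```python
def parse_w3c_log(lines):
    """Parse W3C extended log lines into a list of dicts keyed by field name."""
    fields = []
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line; only #Fields changes how records are read.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip lines that do not match the declared fields
        records.append(dict(zip(fields, values)))
    return records


sample = [
    "#Software: Microsoft Internet Information Server 4.0",
    "#Version: 1.0",
    "#Date: 2000-12-11 00:00:00",
    "#Fields: date time c-ip cs-username s-sitename s-computername s-ip "
    "cs-method cs-uri-stem cs-uri-query sc-status sc-win32-status "
    "sc-bytes cs-bytes time-taken cs(Cookie)",
    # Cookie value replaced by "-" here (assumption, for illustration only).
    "2000-12-11 00:00:00 208.223.166.3 - W3SVC4 PROD5_WEB 202.130.170.225 "
    "GET /default.asp - 200 0 15838 645 1297 -",
]

records = parse_w3c_log(sample)
print(records[0]["cs-uri-stem"], records[0]["sc-status"])
```

Each record becomes a dictionary, so later steps can refer to columns such as c-ip or cs-uri-stem by name rather than by position.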
9
2. Subscribers profiles
User ID Age Sex Ed. level P. income H. income Country Marital Status Em. Status Occ.
cityjob94290 27 F SEC HK S FT CUS
cityjob94293 26 M DIP 2 HK S FT FIN
cityjob94338 28 F SEC HK S FT ACC
cityjob94345 34 M UC 8 9 HK M FT MGT
(Cont'd)
Ind | Reg. Date | Interest
HOT | 20001030 | MKT
BNK | 20001030 | BANK, FIN, INVEST, MKT
OMF | 20001030 | ENTER, GAME, HKNEWS, PROPOMF
DPT | 20001030 | CNEWS, COMPU, ECON, ENTER, HKNEWS,
10
3. Jobs profiles
Job ID | Title | Type | Work Exp. | Quali. | Industry | Level
cityjobB7200 | ORG. MANAGER | IT | 4 | UC | BANK | MID
cityjobAVU10 | EXECUTIVE OFFICER II | LEG | 3 | DIP | GOV | JUN
cityjobB7040 | ASST. ACCOUNTANT | ACC | 5 | SEC | RET | PRO
cityjobB7530 | SALES EXECUTIVE | SAL | 4 | UC | TDG | JUN
11
Web log files
Jobs files
Subscribers files
12
SAS macros were written to perform the following tasks:
A. Reading the web log files
B. Cleaning the data files
C. Creating new variables
D. Merging the data files
E. Preparing different SAS data files
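The cleaning, variable-creation, and merging steps listed above can be pictured with a small Python sketch. This is not the authors' SAS code. It assumes each log hit has already been reduced to a record with hypothetical "user_id" and "job_id" keys (how those are extracted from the cookie or URL is not shown in the slides), and the sample hits are made up for illustration.

```python
from collections import defaultdict


def summarize_by_customer(log_records, subscribers):
    """Aggregate log hits per user and merge in the subscriber profile."""
    page_views = defaultdict(int)
    jobs_seen = defaultdict(set)
    for rec in log_records:
        uid = rec.get("user_id")
        if not uid:
            continue  # cleaning step: drop hits with no identifiable user
        page_views[uid] += 1
        if rec.get("job_id"):
            jobs_seen[uid].add(rec["job_id"])

    rows = []
    for uid in sorted(page_views):
        row = {
            "user_id": uid,
            "page_views": page_views[uid],          # new variable
            "distinct_jobs": len(jobs_seen[uid]),   # new variable
        }
        row.update(subscribers.get(uid, {}))        # merge with profile
        rows.append(row)
    return rows


# Hypothetical hits; the user and job IDs reuse values shown in the slides.
log_records = [
    {"user_id": "cityjob94290", "job_id": "cityjobB7200"},
    {"user_id": "cityjob94290", "job_id": "cityjobB7040"},
    {"user_id": "cityjob94290", "job_id": None},
    {"user_id": None, "job_id": "cityjobB7200"},
    {"user_id": "cityjob94293", "job_id": "cityjobB7200"},
]
subscribers = {
    "cityjob94290": {"age": 27, "sex": "F"},
    "cityjob94293": {"age": 26, "sex": "M"},
}

rows = summarize_by_customer(log_records, subscribers)
for row in rows:
    print(row)
```

Grouping by user rather than by session is what moves the analysis from the session level to the customer level.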
13
Useful Summary Information
  • A. Subscribers profiles
  • B. Jobs profiles
  • C. Web log files
  • D. Web log files by User ID
  • E. Web log files by Job ID

14
(No Transcript)
15
The most popular jobs
Job ID | Title | Industry | Visit No. | Popularity Index
cityjobCM070 | OFFICER - CORPORATE BANKING | BNK | 7748 | 100.0
cityjobC8570 | ADMINISTRATIVE ASSISTANT | GOV | 6552 | 84.6
cityjobCDU20 | EXECUTIVE TRAINEE - INVESTMENT PRODUCTS | BNK | 5148 | 64.9
cityjobCL580 | CONTRACT HOUSING OFFICER | GOV | 4944 | 63.8
cityjobCK570 | EXECUTIVES FOR CORPORATE FINANCE | BNK | 4664 | 60.2
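The slides do not state how the popularity index is computed. One reading that matches most of the rows above is the number of visits expressed as a percentage of the visits to the most-visited job. The sketch below works under that assumption; the function name popularity_index is hypothetical. Note that this formula would give 66.4 rather than the listed 64.9 for cityjobCDU20, so it should be treated as a guess, not as the authors' definition.

```python
def popularity_index(visits_by_job):
    """Scale visit counts so the most-visited job scores 100.0 (assumed formula)."""
    top = max(visits_by_job.values())
    return {job: round(100.0 * n / top, 1) for job, n in visits_by_job.items()}


# Visit counts taken from the table of most popular jobs.
visits = {
    "cityjobCM070": 7748,
    "cityjobC8570": 6552,
    "cityjobCL580": 4944,
    "cityjobCK570": 4664,
}

index = popularity_index(visits)
for job, score in index.items():
    print(job, score)
```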
16
  • Collaborative Filtering

1. By Association Rules
  • Whenever a visitor enquires about a particular job, we can cross-sell similar jobs by recommending other jobs that have the highest association with the original one.
  • The association is based on the click history of all visitors to the Web site.

17
For example, if
  • Job A: cityjobCF520
  • Title: Assistant Accountant; Qualification: Diploma; Working experience: one year

then
  • Job B: cityjobCF180
  • Title: Assistant Accountant; Qualification: Diploma; Working experience: three years
  • Job C: cityjobCF100
  • Title: Assistant Accountant; Qualification: University/College; Working experience: not specified
  • Job D: cityjobCEUJ0
  • Title: Assistant Accountant; Qualification: not specified; Working experience: two years

18
  • This group of 4 jobs has a
  • Confidence value of 50.3%: given that a visitor enquires about job A, the probability that he or she would also enquire about jobs B, C, and D is 0.503.
  • Lift value of 298.46: a visitor who has enquired about job A is almost 300 times more likely to enquire about jobs B, C, and D than a visitor chosen at random.
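Confidence and lift, as described above, can be computed directly from visitors' click histories. The sketch below uses the standard association-rule definitions (confidence = P(B, C, D | A), lift = confidence / P(B, C, D)). The function name and the toy click histories are hypothetical and are not the Cityjob.com data.

```python
def confidence_and_lift(baskets, antecedent, consequent):
    """Return (confidence, lift) for the rule antecedent -> consequent.

    baskets is a list of sets, one set of job IDs per visitor.
    Returns None if either side of the rule never occurs.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent <= b)
    n_c = sum(1 for b in baskets if consequent <= b)
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    if n == 0 or n_a == 0 or n_c == 0:
        return None
    confidence = n_both / n_a          # P(consequent | antecedent)
    lift = confidence / (n_c / n)      # relative to a visitor chosen at random
    return confidence, lift


# Toy click histories: 8 visitors, 2 of whom enquired about job A.
baskets = [
    {"A", "B", "C", "D"},
    {"A"},
    {"E"}, {"E"}, {"E"}, {"E"}, {"E"}, {"E"},
]

confidence, lift = confidence_and_lift(baskets, {"A"}, {"B", "C", "D"})
print(confidence, lift)
```

Under the same definitions, the figures on the slide would imply that roughly 0.503 / 298.46, or about 0.17%, of all visitors enquired about jobs B, C, and D together.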

19
2. By Popularity Index
For example, if
  • Job A: cityjobCDU20
  • Title: EXECUTIVE TRAINEE - INVESTMENT PRODUCTS, Type: FIN, Working Experience: 0, Qualification: UC, Industry: BNK, Level: JUN, Index of popularity: 64.9.

then (with the same type, industry and qualification)
  • Job B: cityjobCM470
  • Title: ASSOCIATE (TREASURY), Type: FIN, Working Experience: 3, Qualification: UC, Industry: BNK, Level: JUN, Index of popularity: 59.2.
  • Job C: cityjobCM470
  • Title: ASSOCIATES (CRM), Type: FIN, Working Experience: 2, Qualification: UC, Industry: BNK, Level: JUN, Index of popularity: 44.6.
  • Job D: cityjobCFLC0
  • Title: DEALER INVESTOR ADVISOR, Type: FIN, Working Experience: 3, Qualification: UC, Industry: BNK, Level: PRO, Index of popularity: 36.6.
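The popularity-based recommendation in this example amounts to filtering the job list to those sharing the enquired job's type, industry, and qualification, then ranking by popularity index. A minimal sketch under that reading follows. The function name and dictionary keys are hypothetical, jobs are identified by title because two of the job IDs on the slide coincide, and the type and qualification of the last catalogue entry are placeholders not given in the slides.

```python
def recommend_by_popularity(job, catalogue, k=3):
    """Return up to k jobs matching on type, industry and qualification,
    ordered from most to least popular."""
    match_on = ("type", "industry", "qualification")
    peers = [
        other for other in catalogue
        if other["title"] != job["title"]
        and all(other[key] == job[key] for key in match_on)
    ]
    peers.sort(key=lambda other: other["popularity"], reverse=True)
    return peers[:k]


job_a = {"title": "EXECUTIVE TRAINEE - INVESTMENT PRODUCTS", "type": "FIN",
         "industry": "BNK", "qualification": "UC", "popularity": 64.9}

catalogue = [
    job_a,
    {"title": "DEALER INVESTOR ADVISOR", "type": "FIN",
     "industry": "BNK", "qualification": "UC", "popularity": 36.6},
    {"title": "ASSOCIATE (TREASURY)", "type": "FIN",
     "industry": "BNK", "qualification": "UC", "popularity": 59.2},
    {"title": "ASSOCIATES (CRM)", "type": "FIN",
     "industry": "BNK", "qualification": "UC", "popularity": 44.6},
    # Type and qualification below are placeholders (not given in the slides).
    {"title": "ADMINISTRATIVE ASSISTANT", "type": "ADM",
     "industry": "GOV", "qualification": "SEC", "popularity": 84.6},
]

recommended = [r["title"] for r in recommend_by_popularity(job_a, catalogue)]
print(recommended)
```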

20
  • Predictive Models
  • Churn (Attrition) model
  • To identify subscribers with a high likelihood of ceasing to visit the Web site, so that Cityjob.com can take action to retain them. It is often less expensive to retain subscribers than to win them back.
  • Popular job model
  • What are the characteristics of jobs that attract more visitors? Are they related to job type and job industry?

21
  • 1. The Churn (Attrition) Model
  • Sample: All subscribers of Cityjob.com.
  • Dependent variable: Visit = 1 if the subscriber has visited Cityjob.com during the study period; Visit = 0 otherwise.

22
  • Factors used: Gender, Age, Educational level, dummy variables for interest and country, no. of days since registration.
  • Sampling procedure: Stratified sampling based on the variable Visit is used to obtain an equal number of observations from the two groups of subscribers (Y = 1 and Y = 0).
  • Data partition: Training data 70%, Validation data 30%
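The sampling and partition steps above can be sketched in a few lines of Python. This is an illustration rather than the Enterprise Miner configuration used in the study: the function names are hypothetical, the toy rows are made up, and drawing the smaller group's size from each stratum is one assumed way of obtaining equal numbers.

```python
import random


def balanced_sample(rows, target="Visit", seed=0):
    """Draw the same number of rows from the target = 1 and target = 0 groups."""
    rng = random.Random(seed)
    ones = [r for r in rows if r[target] == 1]
    zeros = [r for r in rows if r[target] == 0]
    n = min(len(ones), len(zeros))
    sample = rng.sample(ones, n) + rng.sample(zeros, n)
    rng.shuffle(sample)
    return sample


def partition(rows, train_fraction=0.7, seed=0):
    """Split rows at random into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Toy data: 30 subscribers who visited and 70 who did not.
rows = [{"id": i, "Visit": 1} for i in range(30)]
rows += [{"id": i, "Visit": 0} for i in range(30, 100)]

sample = balanced_sample(rows)
train, valid = partition(sample)
print(len(sample), len(train), len(valid))
```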

23
(No Transcript)
24
  • Lift Chart
  • Churn model (logistic regression)
  • Important factors
  • No. of days since registration
  • Educational level
  • Gender
  • Whether the subscriber has an interest in computer games
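The lift chart itself is not reproduced in the transcript. As a rough guide to how such a chart is usually built, the sketch below ranks observations by model score, splits them into equal-sized groups, and divides each group's response rate by the overall rate. Whether the original chart was cumulative or per group is not stated, so the per-group form here is an assumption, and the scores and outcomes are made up.

```python
def lift_by_group(scores, outcomes, groups=10):
    """Per-group lift: response rate in each score-ranked group / overall rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall = sum(outcomes) / n
    if overall == 0:
        raise ValueError("no positive outcomes, lift is undefined")
    lifts = []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        rate = sum(outcome for _, outcome in chunk) / len(chunk)
        lifts.append(rate / overall)
    return lifts


# Made-up model scores and observed outcomes for ten subscribers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

lifts = lift_by_group(scores, outcomes, groups=2)
print(lifts)
```

A lift above 1 in the top-ranked groups indicates that the model concentrates the outcome of interest among the subscribers it scores highest.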

25
  • 2. The Popular Job Model
  • Sample: All jobs advertised on Cityjob.com.
  • Dependent variable: Popular = 1 if the job has been visited at least 20 times; Popular = 0 otherwise.

26
  • Factors used: Dummy variables for different job types, job industries, job level, qualification required, working experience.
  • Data partition: Training data 70%, Validation data 30%
  • Missing values: missing values for working experience and qualification required were replaced by 0 and 3 (Secondary school completed) respectively.
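The target definition, missing-value rule, and dummy coding described above can be illustrated as follows. The function names and dictionary keys are hypothetical, the visit counts are made up, and the numeric qualification coding is assumed only to the extent that the slides say 3 means "Secondary school completed".

```python
def prepare_job(job, visits, min_visits=20):
    """Build the Popular target and fill missing values as described in the slides."""
    work_exp = job.get("work_exp")
    qualification = job.get("qualification")
    return {
        "job_id": job["job_id"],
        "Popular": 1 if visits >= min_visits else 0,
        "work_exp": 0 if work_exp is None else work_exp,              # missing -> 0
        "qualification": 3 if qualification is None else qualification,  # missing -> 3
    }


def dummy_code(prefix, value, categories):
    """One 0/1 indicator per category, e.g. industry_BANK, industry_GOV."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


# Made-up visit counts; job IDs and industries reuse values from the jobs profile.
complete = prepare_job({"job_id": "cityjobB7200", "work_exp": 4, "qualification": 5}, visits=25)
incomplete = prepare_job({"job_id": "cityjobB7040", "work_exp": None, "qualification": None}, visits=7)
industry = dummy_code("industry", "BANK", ["BANK", "GOV", "RET"])

print(complete)
print(incomplete)
print(industry)
```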

27
(No Transcript)
28
  • Lift Chart
  • Popular job model (logistic regression)
  • Important factors
  • 1. Higher qualification (more likely)
  • 2. Higher level (more likely)
  • 3. Job industries: accounting, banking, building, construction (more likely)
  • 4. Job types: art/design/creative, engineering, sales (less likely)
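To show what "more likely" and "less likely" mean in a logistic regression, the sketch below fits a one-variable model by plain gradient descent. It stands in for the SAS Enterprise Miner fit used in the study and is not the authors' procedure. The data are invented: the single input is a hypothetical 0/1 dummy for "higher qualification" and the target is the Popular flag.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Invented data: 3 of 4 higher-qualification jobs are popular, 1 of 4 others.
X = [[1], [1], [1], [1], [0], [0], [0], [0]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

w, b = fit_logistic(X, y)
print(w, b, predict(w, b, [1]), predict(w, b, [0]))
```

A positive fitted weight corresponds to a factor labelled "more likely" on the slide, and a negative weight to one labelled "less likely".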

29
  • Recommendation

1. Web Design
a. To develop a collaborative filtering system
b. To include a popularity index

2. Marketing Strategies
a. To develop appropriate marketing strategies for customer retention
b. To develop Cityjob.com's own web monitor system
30
  • Unexpected Discovery

There was a user who came every day during the study period at exactly the same time (4:00 a.m. HK time) and stayed for one to three hours, browsing more than 500 pages each time (average 5 sec. per page).
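A visit pattern like this one can be picked out of customer-level summaries with a simple rule. The sketch below is only an illustration: the slides do not say how the user was found or what the user was, the function name and keys are hypothetical, and the thresholds simply reuse the figures quoted above (more than 500 pages, about 5 seconds per page).

```python
def looks_automated(session, min_pages=500, max_seconds_per_page=5.0):
    """Flag sessions with very many pages viewed at a very fast, steady pace."""
    pages = session["pages"]
    if pages < min_pages:
        return False
    return session["duration_seconds"] / pages <= max_seconds_per_page


# Made-up sessions: one hour at 5 seconds per page, and an ordinary visit.
fast_session = {"pages": 720, "duration_seconds": 3600}
ordinary_session = {"pages": 12, "duration_seconds": 600}

print(looks_automated(fast_session), looks_automated(ordinary_session))
```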
31
The End