Title: Towards a typology of web registers: A multidimensional analysis
1. Towards a typology of web registers: A multi-dimensional analysis
- Douglas Biber
- Northern Arizona University
- (Collaborating researchers: Jerry Kurjian and James K. Jones)
2. Methodology for construction of the web corpus
- Two Google categories chosen for analysis: Home and Science
- Multiple Google sub-categories under each top-level category
- Home
  - Apartment Living, Consumer Information, Cooking, Do-It-Yourself, Domestic Services, Emergency Preparation, Entertaining, Family, Gardens, Home Automation, Home Business, Home Buyers, Home Improvement, Homemaking, Homeowners, Moving and Relocating, News and Media, Personal Finance, Personal Organization, Pets, Rural Living, Seniors, Shopping, Software, Urban Living
- Science
  - Agriculture, Anomalies and Alternative Science, Astronomy, Biology, Chats and Forums, Chemistry, Conferences, Earth Sciences, Educational Resources, Employment, Environment, History of Science, Institutions, Instruments and Supplies, Math, Methods and Techniques, Museums, News, Philosophy of Science, Physics, Publications, Reference, Science in Society, Social Sciences, Software, Technology, Women
3. Construction of the web corpus (cont.)
- Download method
  - For each sub-category, webpages from two websites were saved.
  - Each website contributed approximately 50 webpages.
  - Thus, each sub-category contributed approximately 100 webpages to its corpus.
- The Science sub-corpus consists of webpages from 81 websites.
- The Home sub-corpus consists of webpages from 63 websites.
4. Construction of the web corpus (cont.)
- Website selection
  - Used the Google list of linked websites, ordered by rank of relevance. Chose the first and last website from the list.
  - Sites near the top of the list were nearly always large "authoritative" commercial or governmental sites; sites toward the end of the list were often smaller, more personal sites.
- Sampling method
  - An automatic browser downloaded c. 200 webpages per site.
  - Every 4th webpage was selected.
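The sampling step above can be sketched in Python. This is only an illustration, not the tool the authors used: the file names and the `every_nth` helper are hypothetical, and the slides do not say which page the count started from, so starting at the first page is an assumption.

```python
def every_nth(pages, n=4, start=0):
    """Return every n-th item of `pages`, beginning at index `start`.

    `start=0` (begin with the first downloaded page) is an assumption;
    the slides do not specify the starting point.
    """
    return pages[start::n]


# Hypothetical stand-in for the c. 200 pages downloaded from one website.
downloaded = [f"page_{i:03d}.html" for i in range(200)]

sample = every_nth(downloaded, n=4)
print(len(sample))  # 50
```

Taking every 4th page out of about 200 yields about 50 pages per site, which matches the per-website figure given for the download method.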
5. Composition of the corpus of web documents
- The corpus extracted from the Web

                                  Home   Science
total documents                   2426      2678
documents 200 words               1765      1905
unproblematic documents           1400      1576
(i.e., adjectives)

- The corpus used for subsequent analyses

           No. of documents   No. of words        Average length of document
Home       1400               1.68 million        1201 words
Science    1576               2.06 million        1308 words
Total      2976 documents     3.74 million words
6. Results of the factor analysis: 4-factor solution, Promax rotation

               Factor 1   Factor 2   Factor 3   Factor 4
Factor 1 Features
Positive
mentalv         0.52866    0.13219    0.48879    0.15881
that_del        0.50498   -0.05699    0.36928    0.07394
pro3            0.49600   -0.08733   -0.00962   -0.07585
pro1            0.48887   -0.08592    0.34927   -0.03674
fact_vth        0.48566   -0.04446    0.13415    0.06093
factadvl        0.45301    0.17506    0.10033    0.01065
commv           0.43627    0.07929    0.09676    0.25554
nonf_vth        0.40773    0.02291   -0.04169    0.11524
perfects        0.39889   -0.06459   -0.14479   -0.05132
lkly_vth        0.36778    0.03769    0.13411    0.12921
sub_othr        0.32987    0.22566   -0.01789   -0.12808
it              0.30174    0.28261    0.01546   -0.08610
all_nth         0.29096    0.14375   -0.09966    0.19167
Negative
(nouns         -0.55720   -0.56705    0.05097   -0.00803
7. Table 2. Results of the factor analysis (cont.)

               Factor 1   Factor 2   Factor 3   Factor 4
Factor 3 Features
Positive
pro2           -0.20975    0.23060    0.67108   -0.08567
vprogrsv        0.10331   -0.03972    0.40924    0.04331
dsre_vto        0.13164    0.04493    0.40622    0.05607
groupn          0.05023   -0.17888    0.36500    0.13152
actv            0.13322    0.16240    0.32304   -0.15965
wh_cl           0.16565    0.07859    0.28940    0.05465
Negative
prep            0.18466   -0.05765   -0.48977    0.14476
allpasv        -0.02260    0.22100   -0.45995    0.05585

Factor 4 Features
Positive
n_nom          -0.12713   -0.09568    0.09601    0.80425
abstrctn        0.00280    0.15802    0.08745    0.65036
wrdlngth       -0.19665   -0.21485   -0.10344    0.66341
cognitn         0.22940    0.17306   -0.03108    0.41358
8. Inter-Factor Correlations

               Factor1     Factor2     Factor3     Factor4
Factor1        1.00000     0.30424     0.12491    -0.28968
Factor2        0.30424     1.00000     0.40607    -0.23135
Factor3        0.12491     0.40607     1.00000    -0.32306
Factor4       -0.28968    -0.23135    -0.32306     1.00000
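Promax is an oblique rotation, so the factors are allowed to correlate, which is why this matrix is reported at all. As a small reading aid (not part of the original analysis), the following Python sketch checks that the reported matrix is a well-formed correlation matrix and ranks the factor pairs by strength of association. The variable names are illustrative.

```python
from itertools import combinations

labels = ["Factor1", "Factor2", "Factor3", "Factor4"]

# Inter-factor correlations as reported on the slide.
corr = [
    [1.00000, 0.30424, 0.12491, -0.28968],
    [0.30424, 1.00000, 0.40607, -0.23135],
    [0.12491, 0.40607, 1.00000, -0.32306],
    [-0.28968, -0.23135, -0.32306, 1.00000],
]

size = len(labels)

# A correlation matrix is symmetric with ones on the diagonal.
is_symmetric = all(corr[i][j] == corr[j][i] for i in range(size) for j in range(size))
unit_diagonal = all(corr[i][i] == 1.0 for i in range(size))

# Rank the six factor pairs by absolute correlation, strongest first.
ranked = sorted(
    combinations(range(size), 2),
    key=lambda pair: abs(corr[pair[0]][pair[1]]),
    reverse=True,
)

for i, j in ranked:
    print(f"{labels[i]} / {labels[j]}: {corr[i][j]:+.5f}")
```

The strongest association is between Factor 2 and Factor 3 (0.40607), and Factor 4 correlates negatively with each of the other three factors.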
9. Table 3: Summary of the factorial structure
- Dimension 1: Personal, Involved (Stance-focused) Narration
  - Features with positive loadings: past tense, mental verbs, that-deletions, 3rd person pronouns, 1st person pronouns, certainty/mental verb that-clause, certainty adverbials, communication verbs, communication verb that-clause, perfect aspect, likelihood/mental verb that-clause, other adverbial clause, pronoun it, indefinite pronouns, noun that-clause
  - Features with negative loadings: nouns
10. Table 3: Summary of the factorial structure (cont.)
- Dimension 2: Persuasive/argumentative discourse
  - Features with positive loadings: present tense, possibility modals, main verb be, predicative adjectives, conditional adverbial clauses, linking adverbials, necessity modals, demonstrative pronouns, prediction modals, split auxiliaries
  - Features with negative loadings: nouns, past tense
11. Table 3: Summary of the factorial structure (cont.)
- Dimension 3: Advice ??
  - Features with positive loadings: 2nd person pronouns, progressive verbs, desire verb to-clause, group nouns, activity verbs, WH clauses
  - Features with negative loadings: prepositions, passive verbs
12. Table 3: Summary of the factorial structure (cont.)
- Dimension 4: Abstract/technical discourse
  - Features with positive loadings: nominalizations, abstract nouns, long words, cognitive nouns, topic adjectives, attributive adjectives
  - Features with negative loadings: concrete nouns
13. Distribution of texts from two Google categories on Dimension 1: Personal Narration
14. Distribution of texts from two Google categories on Dimension 2: Persuasion
15. Distribution of texts from two Google categories on Dimension 3: Advice
16. Distribution of texts from two Google categories on Dimension 4: Technical Discourse
17. Distribution of texts from Google sub-categories on Dimension 1
18. Duncan's test for Home subcategories: Dimension 1
(Means with the same letter are not significantly different)

Duncan Grouping         Mean     N    Category
A                    -169.61    58    family
A                    -177.58    62    seniors
B A                  -194.86    64    personalorg
B A C                -201.70    96    urban
B D C                -227.00   112    smallbiz
E D C                -242.62    74    ruralliv
E D F                -254.00    98    finance
E D F                -260.13    54    domestic
E D F                -261.62    57    shopping
E G D F              -271.36    65    emergency
E G H F              -280.42    85    pets
I E G H F            -287.48    37    realest
I J G H F            -293.83    60    cook
I J G H F            -297.46    80    entertain
I J G H F            -297.91    73    consuminfo
I J G H              -312.85    48    homeowner
I J H K              -321.90    49    diy
I J H K              -328.79    16    moving
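The grouping letters are read as follows: two subcategories do not differ significantly when they share at least one letter. The Python sketch below only illustrates how to read the table; it does not run Duncan's test itself. The letter sets are copied from a few rows of the table, and the helper name is hypothetical.

```python
# Duncan grouping letters for selected Home subcategories (copied from the table).
groups = {
    "family": {"A"},
    "seniors": {"A"},
    "personalorg": {"A", "B"},
    "urban": {"A", "B", "C"},
    "smallbiz": {"B", "C", "D"},
    "moving": {"H", "I", "J", "K"},
}


def not_significantly_different(a, b):
    """True when the two subcategories share at least one grouping letter."""
    return bool(groups[a] & groups[b])


print(not_significantly_different("family", "seniors"))        # True (both A)
print(not_significantly_different("family", "smallbiz"))       # False (A vs. B, C, D)
print(not_significantly_different("personalorg", "smallbiz"))  # True (share B)
```

Read this way, family and seniors pattern together at the top of Dimension 1, while both differ significantly from subcategories such as smallbiz and moving.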
19. Plot of web documents along Dimensions 1 and 4
[Figure: scatter plot of web documents, with Dimension 1 on the vertical axis; each document is plotted as a digit from 1 to 8.]
20. Table 4: Summary of the Cluster Analysis

                      Maximum Distance
                      from Seed           Nearest   Distance Between
Cluster   Frequency   to Observation      Cluster   Cluster Centroids
--------------------------------------------------------------------
1         428         27.83               4         14.72
2         490         22.09               3         17.24
3         599         24.33               2         17.24
4         503         25.36               1         14.72
5         620         21.50               1         18.22
6         244         30.02               3         18.06
7         21          23.32               5         22.22
8         71          24.09               6         19.52
21. Cluster means for each dimension

Cluster       Dim. 1       Dim. 2       Dim. 3       Dim. 4
         Pers. Narr.   Persuasion       Advice    Technical
-----------------------------------------------------------
1              -3.84        -7.38        -9.03        -6.72
2               4.20         5.64        -3.31         7.07
3               5.44         6.93         5.77        -7.46
4              -6.48        -4.76         4.29        -1.71
5              -9.18        -9.56        -8.10        10.54
6              14.18        17.48        17.49        -6.41
7              -9.65        -9.44        -9.36        32.72
8              28.30         7.04        11.51       -12.53
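The cluster means can be related back to the "Nearest Cluster" and "Distance Between Cluster Centroids" columns of Table 4. The Python sketch below treats each row of means as a centroid in the four-dimensional space and computes Euclidean distances between centroids. Treating the reported distances as Euclidean is an assumption, since the slides do not name the distance measure, but the computed values agree with Table 4 to within rounding.

```python
import math

# Cluster means on Dimensions 1-4, copied from the table of cluster means.
centroids = {
    1: (-3.84, -7.38, -9.03, -6.72),
    2: (4.20, 5.64, -3.31, 7.07),
    3: (5.44, 6.93, 5.77, -7.46),
    4: (-6.48, -4.76, 4.29, -1.71),
    5: (-9.18, -9.56, -8.10, 10.54),
    6: (14.18, 17.48, 17.49, -6.41),
    7: (-9.65, -9.44, -9.36, 32.72),
    8: (28.30, 7.04, 11.51, -12.53),
}


def nearest_cluster(cluster_id):
    """Return (id, distance) of the closest other centroid.

    Uses Euclidean distance, which is an assumption about how the
    reported centroid distances were computed.
    """
    here = centroids[cluster_id]
    others = (
        (other, math.dist(here, there))
        for other, there in centroids.items()
        if other != cluster_id
    )
    return min(others, key=lambda pair: pair[1])


for cid in centroids:
    other, distance = nearest_cluster(cid)
    print(f"cluster {cid}: nearest cluster = {other}, distance = {distance:.2f}")
```

Because the means are rounded to two decimals, the computed distances can differ from Table 4 in the second decimal place (for example, about 14.71 against the reported 14.72 for clusters 1 and 4).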
24. Breakdown of Home and Science Web documents across the 8 text types

CLUSTER                   Web Category
                                Home   Science     Total
--------------------------------------------------------
1                                150       278       428
                                10.7      17.6
--------------------------------------------------------
2                                149       341       490
                                10.6      21.6
--------------------------------------------------------
3                                385       214       599
                                27.5      13.5
--------------------------------------------------------
4                                344       159       503
                                24.5      10.1
--------------------------------------------------------
5 Informational discourse        139       481       620
                                 9.9      30.5
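The second line in each cluster row appears to be a column percentage: the share of each category's documents that falls in that cluster. This is an inference from the figures, not something the slide states. The Python sketch below recomputes those percentages from the counts, using the category totals given in the corpus composition (1400 Home and 1576 Science documents).

```python
# Documents retained per top-level category (from the corpus composition slide).
category_totals = {"Home": 1400, "Science": 1576}

# Cluster membership counts from the breakdown table (clusters 1-5 as shown).
counts = {
    1: {"Home": 150, "Science": 278},
    2: {"Home": 149, "Science": 341},
    3: {"Home": 385, "Science": 214},
    4: {"Home": 344, "Science": 159},
    5: {"Home": 139, "Science": 481},
}


def column_percent(cluster, category):
    """Percentage of a category's documents assigned to the given cluster."""
    return 100.0 * counts[cluster][category] / category_totals[category]


for cluster in counts:
    home = column_percent(cluster, "Home")
    science = column_percent(cluster, "Science")
    print(f"cluster {cluster}: Home {home:.1f}%  Science {science:.1f}%")
```

The recomputed values agree with the slide to within 0.1 percentage point; the small differences (for example 13.6 against the reported 13.5) suggest the slide's figures were not all rounded the same way.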
25. Breakdown of selected Web sub-categories across the 8 text types

CLUSTER         Web Sub-categories (within Home and Science)
                    altsci  earthsci    family   finance      hist homeowner   seniors      tech     Total
                                                                                                (all docs)
----------------------------------------------------------------------------------------------------------
1                        6        13         3         1        38         0         1         4       428
                      11.1      23.2       5.1       1.0      54.3       0.0       1.6       5.3
----------------------------------------------------------------------------------------------------------
2                       15        11         0        20         7         2         6        12       490
                      27.7      19.6       0.0      20.4      10.0       4.1       9.6      16.0
----------------------------------------------------------------------------------------------------------
3                       13         4        16        37         8        25        11         9       599
                      24.0       7.1      27.5      37.7      11.4      52.0      17.7      12.0
----------------------------------------------------------------------------------------------------------
4                        1         8        12        21         8        21        14         7       503
                       1.8      14.2      20.6      21.4      11.4      43.7      22.5       9.3
----------------------------------------------------------------------------------------------------------
5                       10        17         6         5         8         0         4        39       620
Informational         18.5      30.3      10.3       5.1      11.4       0.0       6.4      52.0
26. Home / Family Web Page: Text Type 5, Informational discourse
- General Science Information
- Amino Acids - Symbols, formulas and 3D images.
- Bird Species - Pictures and scientific names will help improve your identification skills. Includes herons, sparrows, warblers, woodpeckers, owls and more.
- Chemicool Periodic Table - Search and learn about the elements.
- Entomology for Beginners - the basics of insect study.
- Grasshopper - science links and a list of cool museums to visit.
- Human Anatomy 1994 - These x-rays have labeled body parts.
- K-12 WWW Links - links to sites for answers to any science question.
- Mad Scientist Network,The - answers to science questions.
- Microworlds - This interactive tour uses graphics, photos and text to explore the structure of materials.
- SciEd
- Science Bytes, from UT
- Science Education Gate-Way - K-12 science education resource center for teachers and students with learning adventures in Earth and Space science from a NASA-sponsored partnership of museums, researchers and educators.
- Science Learning Center - Access to exhibits, publications, museums and more.
27. Home / Family Web Page: Text Type 6, Persuasive Advice
- What is the Mom Team??
- The Mom Team is an organization that is dedicated to assisting, training and supporting others who would like to work from home with their own business.
- What kind of business is it?
- All members of the MOM Team are simply customers of a wonderful company where we save time, money, provide a safer environment for our homes and improve our health.
- Everyone also has the option to own their own business to add to their income, replace an income or more depending on their own personal goals.
- It was just announced this morning that for the month of June, you can join our awesome group and begin living the dream of working from home for only ONE DOLLAR!! This is incredible and we didn't want you miss the opportunity to take advantage of this awesome promotion.
- How much income can I earn?
- It's up to you. You can earn a few hundred dollars a month or even thousands each month depending on you and your own personal goals.
- Do you have to sell products?
- No. We don't sell, or stock any products. We don't have to deliver anything or collect any money.
- What do I need to be able to run this business from my home?
- You need a computer (or access to one), a telephone, and a willingness to become part of our team and use our proven system.
- How much does this cost to get started?
- You can get started for just 29.00 US.
28. Science / Alternative Science Web Page (Part 1): Text Type 7, Technical discourse
- Father Jerome's SPECIALIZED DICTIONARY of PSYCHOSOCIOLOGICAL KEYWORDS/PHRASES used in his QUALIA III Monograph.
- This SPECIALIZED DICTIONARY (and the factual sociological content of the QUALIA III Monograph) is derived from a book by one of the 20th Century's greatest and foremost Sociologists, Dr. Pierre Bourdieu, of one of France's premiere graduate Institutes, the College de France, Paris, where Dr. Bourdieu is the President of the College of Sociology. His book is Reproduction in Education, Society and Culture, where 'Reproduction' means the reproducing, producing, and continuing, of the existing 'status quo', or social means, or mechanisms, or, in reality, the underlying structure, of any society, by which the 'Rulers', the Nobility, the Aristocracy (i.e., the Rich), actually 'rule' and control that society, and continue their societal rule, from generation to generation, by 'training' and placing their progeny, their children, in the 'positions of power' throughout society which enable them to inherit not only the 'riches' but also the 'power', as passed on to them by that ruling class.
29. Science / Alternative Science Web Page (Part 2): Text Type 7, Technical discourse
- DEFINITIONS of Keywords/Phrases used in the QUALIA III Monograph
- a full account of the selection process
- a negatively constructive societal system
- absolute societal control
- academia's essential internal function
- academia's ideological function
- academic autonomy and class relations
- academic consecration
-
- theodicy
- theoretical construction
- title
- traditional economic conduct
- tyrannical positivity
- U.S. statistics
- ultimate rationale
- ultimate truth
- unconscious sanctions anticipation
- violence
30. Home / Family Web Page: Text Type 8, Personal Narrative
- Shelters in My Storm
- by Cyd
- Shalece
- The biggest help in my experience of foster care was a three year old. Shalece taught me so much about how to love truly and without asking anything in return. She taught me what it means to be family, when from the day I walked into her house, I was her big sister. She never let me forget that even when I had to leave. To this day she is excited when I come to see her. She has never let me down. I love and trust her more than anyone else. Her parents were also of great help to me, but they could never have reached me like that tiny little girl with the large heart did from day one. Some day when she is old enough to understand, I think I will show her this to let her know I really feel grateful to her. I think people have a hard time seeing that from me.