Demonstration of the Java version of the Pinkham-Pearson index for the comparison of community structure

Demonstration of the Java version of the
Pinkham-Pearson index for the comparison of
community structure By Pinkham, C. F. A., J.
Gareth Pearson, Brian P. Reid, and Victor T.
Chevalier
Figure 2 shows the screen from BioSim2 after
entering the data. This screen is obtained by
opening the program, typing in the titles for the
overall analysis, rows and columns, selecting
which version of B to use, determining what to do
with low denominators, and pasting the data in
the data field. These data are then processed by
selecting "Process Data" from the "Actions"
option on the menu bar. The entire process
should only take seconds. Note the scroll bar at
the bottom of the data field, indicating the
compressed data matrix is wider than the screen
display. A similar scroll bar appears at the
side if the compressed data matrix is longer than
the screen display. The only differences between
the opening screen and the screen after the data
are processed are that the message "Data
successfully processed." replaces "Welcome to
BioSim2.beta.30!" and the color of the
background bar changes from gray to
green. Notice the tabs at the bottom of the
screen. These will be sequentially selected
below to observe the output of the program.
Abstract The Pinkham-Pearson index of similarity
has been evaluated by EPA as one of the more
powerful tools for comparing community structure
in its rapid bioassessment protocol. However, its
use has been limited because the program that ran
it, BioSim1, was only available in DOS format. A
user-friendly, beta version of BioSim2 is now
available in a Java format that can run on
Windows, Mac OS, Linux, or anything with Java
v1.4 or higher. Its use and power will be
demonstrated using real data. If you have data
you'd like to try, bring them in an xls, csv, or
other common spreadsheet format. Copies of the
beta version of BioSim2 will be available.
Finally, Figure 9 shows a transposed,
double-dendrogram-ordered original data matrix as
it is printed out by BioSim2, with embedded, colored
annotations and annotations to the side and
bottom that must be done by hand. The former is
obtained by selecting the "Print" option from the
menu bar and selecting "Transposed Reordered
Data". The data displayed on all these screens
can similarly be printed out as separate
pages. Note in this figure that the red vertical
line separates the two divisions indicated by the
first joining branch in the site dendrogram at an
average B of 0.271. The division to the right of
this line is further separated by a green
vertical line into the two divisions indicated by
the second joining branch at an average B of
0.430. Switching to the divisions indicated by
the taxa dendrogram, note that the red horizontal
line separates the two major divisions indicated
by the first joining branch in the taxa
dendrogram at an average B of 0.169. The second
joining branch further separates the division
above the 0.169 line into two divisions at an
average B of 0.233. The third joining branch
subdivides the division below the red line into
two divisions at an average B of 0.268. Finally,
the fourth joining branch subdivides the division
above the top green line into two divisions at an
average B of 0.278. Thus there are three
divisions indicated by the two major joining
branches of the site dendrogram and five
divisions indicated by the four major joining
branches in the taxa dendrogram, forming 15
rectangles of sites with taxa having similar
abundances in the community structure. If we
classify 0 as being Absent, 1-10 as being Rare,
11-50 as being Occasional, 51-150 as being
Common, and greater than 150 as being Dominant
(indicated by the pink letters in each
rectangle), it becomes clear that the absence of
DT-Dicr, DCD-Paga and DCO-Lapp from the CC sites,
compared to their mostly rare occurrence in the
PC and CPC sites, together with the dominance of
DS-Pros, DS-Meta and EB-Baet in the CC sites,
compared to their mostly occasional occurrence
in the PC and CPC sites, are the major reasons
the CC sites are separating from the other two
groups of sites. Now that different taxa have been
identified that have different places in the
community structures of these sites, the
underlying biological explanation for the
differences can be pursued. In addition to
printing each of these outputs directly,
BioSim2 generates and saves a single HTML page
containing all tables and images of the
dendrograms and plots. Tables are saved
individually in csv format (comma-separated
values, ASCII text) appropriate for spreadsheets,
and the images are saved as jpg files.
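The abundance classes used to annotate the rectangles above can be written down as a small lookup. This is only a sketch of the classification stated in the text (0, 1-10, 11-50, 51-150, greater than 150); the class name AbundanceClass and the method classify are hypothetical and are not part of BioSim2.

```java
final class AbundanceClass {
    /**
     * Maps a raw count to the abundance classes named in the text.
     * Hypothetical helper for illustration; not a BioSim2 function.
     */
    static String classify(int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
        if (count == 0) return "Absent";
        if (count <= 10) return "Rare";
        if (count <= 50) return "Occasional";
        if (count <= 150) return "Common";
        return "Dominant"; // greater than 150
    }
}
```

Applying classify to every cell of a rectangle and reporting the most frequent class is one way to arrive at a single letter per rectangle, although the annotations in Figure 9 were added by hand.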
Introduction The index of similarity, B (Pinkham
and Pearson, 1976), was proposed as a means for
determining the impact of pollution on
communities. Pinkham and Pearson showed that the
index overcame many of the shortcomings inherent
in other indexes used for the same purpose and
was more versatile. Its use was coded in a DOS
program (BioSim) (Pinkham et al., 1975). Since
its publication, it has been widely used for
diverse investigations. In 1989, Plafkin et al.
included it in EPA's rapid bioassessment
protocols for use in streams and rivers. In
1990, it was identified by EPA as one of six
commonly used community similarity indexes in a
manual describing guidelines and standardized
procedures for using benthic macroinvertebrates
to evaluate the biological integrity of surface
waters (Klemm et al., 1990). At least one state,
Vermont, has adopted B as a legal requirement for
assessing surface water quality (Vermont
Department of Environmental Conservation, 1990).
Barbour et al. (1992), in a systematic comparison
of the metrics proposed in EPA's rapid
bioassessment protocol (Plafkin et al., 1989),
concluded that B "may be the most appropriate
metric to serve as a measure of community
similarity." In almost all published cases of
its use, however, it was not being used to its
full potential. Recognizing this, Pearson and
Pinkham (1992) published a strategy for using it
in an improved DOS version (BioSim1) (Gonzales
et al., 1993). However, this strategy also
failed to encourage a widespread use of its full
capabilities. It soon became apparent that the
major reason for this shortcoming was the DOS
platform of BioSim1. This paper introduces the
beta version of the Java program (Reid et al.,
in prep.) that finally overcomes that shortcoming.
Figure 5. The dendrogram of taxa
Figure 2. Appearance of the screen after
entering the data
Figure 6. Cophenetic correlation plot for the
dendrogram of taxa
Figure 3 shows the resulting row dendrogram. A
new feature is the addition of the average
B-value at each joining branch. Note that the
conditions selected in the first screen are
printed at the top and the cophenetic correlation
coefficient for the entire dendrogram is also
given (rcs = 0.892).
Method BioSim2 was written with an entirely new
appearance and a simpler approach while retaining
most of the features discussed in Pearson and
Pinkham (1992). A major change was to drop the
agglomerative clustering method for forming
dendrograms used in former versions of BioSim
(Bonham-Carter, 1967) in favor of the simpler,
average link method (Pankhurst, 1991). This
method searches through each possible pair of
unlinked parameters and calculates an average
B-value for the pair (see Pearson and Pinkham
(1992) for a definition of these terms). This
average B-value is determined in three possible
situations. 1) Both parameters are not found in
any other cluster formed already. This normally
happens toward the beginning of the process. In
this case, the average is their single B-value.
2) One of these parameters is already part of
another cluster. All B-values between the
unlinked parameter and each member of this other
cluster are averaged. 3) Both parameters belong
to existing clusters. All B-values between the
two clusters are averaged. After searching the
complete set of unlinked pairs, the pair with the
highest average B-value is linked at that average
value. In addition to the above, the authors
decided 1) to reduce the presentation of the
program to a single screen that would
sequentially display the original data and then
the results, 2) to make the conditions of the
original data matrix flexible enough that most
presently-used spreadsheet formats would be
acceptable, 3) to provide a plot of the actual
B-values used to calculate each average B-value
found on the dendrogram, to visualize how well
each joining branch of the dendrogram represents
the actual distribution of B-values that formed
that joining branch, 4) to reorient the
dendrogram 180° so that the plot resulting from
3) could be easily compared with the dendrogram,
5) to eliminate the matrix of cophenetic correlation
coefficients (Kaesler, 1970) in favor of the
simpler and more valuable cophenetic correlation
coefficient for the entire dendrogram, and 6) to
eliminate many of the choices that needed to be
made in BioSim1 by making most options automatic
in BioSim2.
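The average link method described above can be sketched as follows. The three situations in the text (two unlinked parameters, one unlinked parameter and one cluster, two clusters) all reduce to averaging every B-value between the members of two groups, so the sketch treats an unlinked parameter as a cluster of one. The class AverageLink, its Join record, and the method names are hypothetical; this is not the BioSim2 source, and it assumes a modern Java with generics rather than v1.4.

```java
import java.util.ArrayList;
import java.util.List;

final class AverageLink {
    /** One joining branch: the two groups linked and their average B-value. */
    static final class Join {
        final List<Integer> left;
        final List<Integer> right;
        final double averageB;

        Join(List<Integer> left, List<Integer> right, double averageB) {
            this.left = left;
            this.right = right;
            this.averageB = averageB;
        }
    }

    /** Averages all B-values between the members of two groups. */
    static double averageB(double[][] b, List<Integer> g1, List<Integer> g2) {
        double sum = 0.0;
        for (int i : g1) {
            for (int j : g2) {
                sum += b[i][j];
            }
        }
        return sum / (g1.size() * g2.size());
    }

    /** Links groups in order of highest average B until one group remains. */
    static List<Join> cluster(double[][] b) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < b.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i); // every parameter starts unlinked
            groups.add(single);
        }
        List<Join> joins = new ArrayList<>();
        while (groups.size() > 1) {
            int bestP = 0;
            int bestQ = 1;
            double best = Double.NEGATIVE_INFINITY;
            for (int p = 0; p < groups.size(); p++) {
                for (int q = p + 1; q < groups.size(); q++) {
                    double avg = averageB(b, groups.get(p), groups.get(q));
                    if (avg > best) {
                        best = avg;
                        bestP = p;
                        bestQ = q;
                    }
                }
            }
            List<Integer> left = groups.get(bestP);
            List<Integer> right = groups.get(bestQ);
            joins.add(new Join(new ArrayList<>(left), new ArrayList<>(right), best));
            List<Integer> merged = new ArrayList<>(left);
            merged.addAll(right);
            groups.remove(bestQ); // remove the higher index first
            groups.remove(bestP);
            groups.add(merged);
        }
        return joins;
    }
}
```

The returned list holds the joining branches in the order they were formed, each with the average B-value that would be printed at that branch of the dendrogram.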
Figure 7 shows the matrix of Bs for both the
rows and columns. Notice the scroll bars that
appear automatically to accommodate large
matrices. Although not usually examined, these
matrices are useful for looking up the B-values
calculated between any pair of parameters when
it is necessary to check how tightly one
parameter links to another.
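A matrix of Bs such as the one in Figure 7 can be sketched as below, assuming the basic unweighted form of the index, in which B for two samples is the mean over the k compared items of min(x_a, x_b) / max(x_a, x_b). The treatment of items absent from both samples (a 0/0 ratio) is exposed here as a flag, which is an assumption for illustration; the options BioSim2 actually offers for the version of B and for low denominators are not reproduced. The class and method names are hypothetical.

```java
final class PinkhamPearson {
    /**
     * Unweighted similarity between two abundance vectors.
     * countMutualAbsence = true scores a 0/0 pair as a perfect match (1.0);
     * false leaves that item out of the average. This flag is an assumption.
     */
    static double b(double[] x, double[] y, boolean countMutualAbsence) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        double sum = 0.0;
        int k = 0;
        for (int i = 0; i < x.length; i++) {
            double max = Math.max(x[i], y[i]);
            if (max == 0.0) {
                if (countMutualAbsence) {
                    sum += 1.0;
                    k++;
                }
                continue;
            }
            sum += Math.min(x[i], y[i]) / max;
            k++;
        }
        return k == 0 ? Double.NaN : sum / k; // NaN when nothing was compared
    }

    /** Symmetric matrix of B between every pair of rows. */
    static double[][] matrix(double[][] rows, boolean countMutualAbsence) {
        int n = rows.length;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double value = b(rows[i], rows[j], countMutualAbsence);
                m[i][j] = value;
                m[j][i] = value;
            }
        }
        return m;
    }
}
```

Calling matrix on the data as entered gives the row matrix; calling it on the transposed data gives the column matrix.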
Figure 3. The dendrogram of sites
Figure 9. Printout of the original (compressed)
data matrix rearranged in double-dendrogram
order, transposed, and annotated by being broken
into subdivisions of sites and taxa having
similar abundances in community structure.
Figure 4 shows the cophenetic correlation plot
for this diagram. Note that it is not the next
item on the tabs, but it is appropriate to show
it here in line with the dendrogram that it
represents. Each joining branch on the
dendrogram has its appropriate location on the
cophenetic correlation plot and they are lined up
in this presentation so the scatter associated
with each joining branch can readily be assessed.
All points are distributed close to the line, so
this dendrogram is a very good representation of
the relationships among the sites.
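The single cophenetic correlation coefficient for a dendrogram can be sketched as the Pearson correlation between each pair's actual B-value and the average B-value of the joining branch at which that pair is first linked. The sketch below assumes both sets of values are already available as symmetric matrices; the class name Cophenetic and its method are hypothetical and do not come from BioSim2.

```java
final class Cophenetic {
    /**
     * Pearson correlation over all pairs i < j between the original B-values
     * and the cophenetic values (the average B at which i and j first join).
     */
    static double correlation(double[][] original, double[][] cophenetic) {
        int n = original.length;
        double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
        int m = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double x = original[i][j];
                double y = cophenetic[i][j];
                sx += x;
                sy += y;
                sxx += x * x;
                syy += y * y;
                sxy += x * y;
                m++;
            }
        }
        double cov = sxy - sx * sy / m;
        double varX = sxx - sx * sx / m;
        double varY = syy - sy * sy / m;
        return cov / Math.sqrt(varX * varY); // NaN if either set is constant
    }
}
```

A value near 1 corresponds to points lying close to the line in the cophenetic correlation plot; wide scatter about a joining branch pulls the coefficient down.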
Conclusion BioSim2 is a major improvement over
prior versions. Most importantly, it is
compatible with most available platforms. In
addition, it offers a more user-friendly
interface and eliminates the need to make many of
the decisions required before processing. This
beta version is available on Pinkham's web site
(http://www2.norwich.edu/pinkhamc/) and a user's
manual (Pearson et al., in prep.) is in
preparation. Users are requested to forward
their comments/problems to either Pinkham or
Pearson at the addresses given so the usefulness
of BioSim2 can continue to grow.
Figure 7. Row and Column Matrix of Bs.
Figure 8 shows the final, and most desirable,
output: the original compressed data matrix
rearranged in a two-way table (Pearson and
Pinkham, 1992) representing the original data
rearranged in the order indicated by both
dendrograms (double dendrogram ordered original
data matrix). Note that it is provided in the
manner that the original data were entered and in
a transposed manner. Often one way will fit into
a document better than the other.
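The two-way table amounts to reading the original matrix in the leaf order of each dendrogram, and the transposed version swaps rows and columns. A minimal sketch, assuming the row and column orders are already known as index arrays; the class TwoWayTable and its methods are hypothetical names, not BioSim2 code.

```java
final class TwoWayTable {
    /** Rearranges data so rows follow rowOrder and columns follow colOrder. */
    static double[][] reorder(double[][] data, int[] rowOrder, int[] colOrder) {
        double[][] out = new double[rowOrder.length][colOrder.length];
        for (int r = 0; r < rowOrder.length; r++) {
            for (int c = 0; c < colOrder.length; c++) {
                out[r][c] = data[rowOrder[r]][colOrder[c]];
            }
        }
        return out;
    }

    /** Swaps rows and columns of a rectangular matrix. */
    static double[][] transpose(double[][] data) {
        int rows = data.length;
        int cols = rows == 0 ? 0 : data[0].length;
        double[][] out = new double[cols][rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[c][r] = data[r][c];
            }
        }
        return out;
    }
}
```

The row and column labels would need to be reordered with the same index arrays so that each value keeps its site and taxon.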
Literature Cited Bonham-Carter, G. F. 1967.
Fortran IV program for Q-mode cluster analysis of
non-quantitative data using IBM 7090/7094
computers. Kans. Geol. Surv., Computer
Contribution No. 17. Barbour, M.T., J.L.
Plafkin, B.P. Bradley, C.G. Graves, and R.W.
Wisseman. 1992. Evaluation of EPA's rapid
bioassessment benthic metrics: metric redundancy
and variability among reference stream sites.
Environmental Toxicology and Chemistry, 11(4):
437-449. Gonzales, D.A., J.G. Pearson and C.F.A.
Pinkham. 1993. User's manual for BIOSIM1, beta
version 1.0. EPA Environmental Monitoring
Systems Laboratory, Las Vegas, NV. Kaesler, R.L.
1970. The cophenetic correlation coefficient in
paleoecology. Geological Society
of America Bulletin. Lawrence, KS. pp 1261-1266.
Results/Demonstration The best way to provide a
demonstration is to walk through a sample
study. Figure 1 shows the compressed data matrix
(Pearson and Pinkham, 1992) for a study conducted
in Alaska by one of us (Pinkham, in prep.) in
2002. Note that this is an Excel spreadsheet.
It was merely copied and pasted into the data
field of BioSim2. Also note it is necessary to
have headings for all columns, including column 1.
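The heading requirement can be illustrated with a small parser for pasted spreadsheet text. Cells copied from a spreadsheet normally arrive on the clipboard as tab-separated lines, and the sketch below assumes that layout, with the first line holding the headings and the first cell of each later line holding the row label. How BioSim2 itself reads the data field is not described here, so the class PastedMatrix and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PastedMatrix {
    final String[] columnHeadings; // includes the heading of column 1
    final String[] rowLabels;
    final double[][] values;

    private PastedMatrix(String[] columnHeadings, String[] rowLabels, double[][] values) {
        this.columnHeadings = columnHeadings;
        this.rowLabels = rowLabels;
        this.values = values;
    }

    /** Parses tab-separated text; rejects any column without a heading. */
    static PastedMatrix parse(String pasted) {
        String[] lines = pasted.split("\\r?\\n");
        String[] headings = lines[0].split("\t", -1);
        for (int c = 0; c < headings.length; c++) {
            headings[c] = headings[c].trim();
            if (headings[c].isEmpty()) {
                throw new IllegalArgumentException("Column " + (c + 1) + " has no heading");
            }
        }
        List<String> labels = new ArrayList<>();
        List<double[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] cells = lines[i].split("\t", -1);
            if (cells.length != headings.length) {
                throw new IllegalArgumentException("Line " + (i + 1) + " has " + cells.length
                        + " cells, expected " + headings.length);
            }
            labels.add(cells[0].trim());
            double[] row = new double[cells.length - 1];
            for (int c = 1; c < cells.length; c++) {
                row[c - 1] = Double.parseDouble(cells[c].trim());
            }
            rows.add(row);
        }
        return new PastedMatrix(headings, labels.toArray(new String[0]),
                rows.toArray(new double[0][]));
    }
}
```

A missing heading over column 1 is reported as an error rather than silently shifting the remaining headings one place to the left.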
Figure 4. Cophenetic correlation plot for the
dendrogram of sites
Klemm, D.J., P.A. Lewis and J.M. Lazorchak. 1990.
Macroinvertebrate Field and Laboratory Methods
for Evaluating the Biological Integrity of
Surface Waters. Aquatic Biology Branch, Quality
Assurance Research Division, Environmental
Monitoring Systems Laboratory. Cincinnati, OH.
EPA-600/0-90-000, U.S.EPA, Washington,
D.C. Pankhurst, Richard J. 1991. Practical
Taxonomic Computing. Cambridge University Press,
Cambridge, 201 pp. Pearson, J.G., and C.F.A.
Pinkham. 1992. Strategy for data analysis in
environmental surveys emphasizing the index of
biotic similarity and BIOSIM1. Water Environ.
Res., 64: 901-909. Pearson, J.G., Brian P. Reid,
Carlos F. A. Pinkham, Victor T. Chevalier. In
prep. User's Manual for BioSim2, a Java-based
computer program to calculate the Pinkham-Pearson
index of similarity. Plafkin, J.L., M.T.
Barbour, K.D. Porter, S.K. Gross, and R.M.
Hughes. 1989. Rapid Bioassessment Protocols for
Use in Streams and Rivers: Benthic
Macroinvertebrates and Fish, US EPA, Washington,
DC, EPA 444/4-89-001. Pinkham, C.F.A., and
Pearson, J.G. 1976. Applications of a new
coefficient of similarity to pollution surveys.
J. Water Pollut. Control Fed., 48, 717. Pinkham,
C.F.A., J.G. Pearson, W.L. Clontz and A.E. Asaki.
1975. A Computer Program for Calculations of
Measures of Biotic Similarity Between Samples and
the Plotting of the Relationship Between These
Measures. EATR EB-TR-75013, Mar - Sep 74,
41pp. Pinkham, C.F.A. In prep. Studies of the
differences in community structure of stream
macroinvertebrates in a Long-Term Ecological
Research Watershed in Alaska, Part 1, differences
around a confluence of two streams.
Figures 5 and 6 show the resulting column
dendrogram and cophenetic correlation plot,
respectively. The resulting scatter is still
fairly close to the line, with the scatter about
the joining branch at 0.278 being the greatest
and thus suggesting that the most caution should
be applied to interpreting conditions revealed by
this joining branch.
Figure 1. The compressed data matrix used for
this demonstration.
Figure 8. Original (compressed) data matrix
rearranged in double-dendrogram order.
Current Address Biology Department, Norwich
University, 158 Harmon Drive, Northfield, VT
05663; USEPA, PO Box 93478, Las Vegas, NV
89193-3478; DMS Computing, Dartmouth Medical
School, 1 Rope Ferry Road, Hanover, New Hampshire
03755; 431 Isom Road, Suite 125, San Antonio, TX
78216
Acknowledgements This project was supported in
part by EPSCoR Baccalaureate College Summer
Research Program under NSF Grant Number
EPS-0236976 and by the generous donation of Dr.
Reid's time for the Java program from the
Academic Computing Team, Information Technology
Department of Norwich University.
Reid, Brian P., Victor T. Chevalier, C. F. A.
Pinkham, and J. Gareth Pearson. In prep.
BioSim2, a Java-based computer program to
calculate the Pinkham-Pearson index of
similarity. Vermont Department of Environmental
Conservation. 1990. Indirect Discharge Rule.
Chap. 14 Environmental Protection Rules. Agency
of Natural Resources, Waterbury, Vermont.