Is your software worth your money? Introducing GCDkit family of tools for presentation and interpretation of compositional data in igneous geochemistry

1
Is your software worth your money? Introducing
GCDkit family of tools for presentation and
interpretation of compositional data in igneous
geochemistry
Vojtěch Janoušek, Czech Geological Survey and
Charles University, Prague
Vojtěch Erban, Czech Geological Survey, Prague
Colin M. Farrow, Computing Service, University of
Glasgow
Lucie Tajčmanová, Eliška Žáčková, Axel Renno,
Christian Bertoldi
2
(No Transcript)
3
Spreadsheets
Advantages
  • Widespread
  • Easy to use
  • Zero extra costs
  • DIY (mostly)

Disadvantages
  • Scarcity of dedicated applications - DIY (mostly)
  • Complex, prone to errors
  • Low efficiency for repeated tasks
  • Limited protection of the primary data
  • Low quality of graphical output

4
Dedicated software
  • DOS/MS Windows

5
Dedicated software
Disadvantages
  • Lack of documentation (black box)
  • Incomplete and difficult to modify (source not
    available, legal problems, programming
    required)
  • Complicated data input/import
  • Poor quality of graphical output
  • (User interface)
  • (Price)

6
Solution ?!
  • Complex statistical computing environments
    (S-Plus, Statistica, MatLab, Mathematica & Co.)

7
(No Transcript)
8
R: an open source alternative
  • What is R? A bit of history
  • R programming language/statistical environment
    designed by Ihaka & Gentleman (1996), Dept. of
    Statistics, Auckland
  • Since 1997 developed by the R Core Team of about
    a dozen people, CRAN (http://www.r-project.org)
  • Syntax based on the S (S-PLUS) language by Becker,
    Chambers & Wilks, AT&T Bell Labs (Becker et al.
    1988)
  • Free/open-source software distributed under GNU
    copyleft
  • Current version 2.6 (1.0 published on 29 Feb 2000)

9
R: an open source alternative
  • Environment for statistics and graphics
  • Large collection of statistical and database
    tools
  • Graphical facilities for data exploration and
    plotting, high-level graphical output
  • Effective programming language, modular (most
    functions of R are written in R)
  • Great number of additional packages
  • Compatible with commercially successful S/S-PLUS

10
R: an open source alternative
  • Features of open source software
  • complicated syntax, steep learning curve
  • fragmentary documentation, but rapidly improving
    (e.g., Wikipedia, new books, documentation to
    S-PLUS)
  • excellent control over individual functions,
    power
  • freeware, frequently updated, growing community
  • available for all main OS (Mac, MS Windows,
    Unix...)

11
(No Transcript)
12
Geochemical Data Toolkit (GCDkit)
OUR AIMS
  • Build a more human interface to the wealth of
    functions in R
  • Add standard geochemical calculations and plots
  • Provide a whole family of tools: interpretation
    of whole-rock trace- and major-element data and
    Sr-Nd isotopic data (GCDkit proper);
    recalculation and classification of mineral
    (electron microprobe) analyses (GCDkit-Mineral);
    (watch out - more GCDkits to come?)
  • Windows-like GUI: no programming necessary!!
  • Data ready for further handling under R (dot
    prompt veterans)

13
(No Transcript)
14
GCDkit (s.s.)
MAIN FEATURES
  • Standard geochemical calculations involving
    major-, trace-element as well as Sr-Nd data
  • Effective data management (searching, grouping)
  • Common plots (binary, ternary, spider)
  • Publication quality graphics
  • Modular architecture (easily expandable and
    modifiable)
  • Transparent functionality and availability (open
    source, WWW)

15
GCDkit-Mineral (under development)
MAIN FEATURES
  • Recalculation of mineral formulae (a.p.f.u.)
    using several possible schemes (fixed no. of O
    equivalents, cations, negative charges in the
    whole formula, specified crystallographic
    position, given cations)
  • Estimation of FeII/FeIII ratio when iron
    oxidation state is unknown
  • Export of structural formulae into HTML (correct
    formatting)
  • Assignment of atoms to appropriate structural
    positions
  • Recasting the individual analyses to end members

16
GCDkit-Mineral (under development)
MAIN FEATURES
  • User-defined parameters (formulae, Mg/(Mg+Fe_A),
    external scripts)
  • Classifications based on IMA rules (so far
    available only for pyroxenes, feldspars and
    amphiboles; micas in progress)
  • Control over the recalculation algorithms (stored
    in lucid external databases)
  • GCDkit-like usability (searching, subsetting,
    grouping, flexible and publication quality
    graphics, modularity, transparent functionality,
    open source availability)

17
Data handling: Importing data
  • Input data in tab-delimited text files or pasting
    from clipboard (free-form, missing values
    allowed)
  • Import from XLS, MDB, DBF (Excel, Access, other
    databases)
  • Import from geochemical databases such as GEOROC
    and PETDB
  • Loading data from other popular geochemical
    packages (NewPet, IgPet, PetroGraph)
  • Adding user-defined variables
  • Editing variables and samples in a spreadsheet

18
Data format for GCDkit (text and XLS files)
19
Correct data format
  • Text files are delimited by tabs, commas or
    semicolons
  • The 1st line contains unique labels for the data
    columns (e.g. SiO2, Fe2O3, Nd), the 1st
    column unique sample IDs
  • Decimal commas are converted to decimal points if
    appropriate
  • Missing values are allowed anywhere in the data
    file
  • Values 0, or any of NA, N.A., -, b.d., bd are
    also interpreted as missing, or can be
    replaced by half of the detection limit
  • Total iron as ferrous oxide: FeOt or FeO
  • Structurally bound water: H2O.PLUS, H2O,
    H2OPLUS, H2O_PLUS
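
The format rules above can be illustrated with a minimal sketch in Python (GCDkit itself is written in R, and this is not its import code). The function names, the sample IDs and the values are invented for illustration; the set of missing-value tokens is copied from the slide and assumed to be complete.

```python
import csv
import io

# Tokens treated as missing, copied from the slide (assumed complete here).
MISSING_TOKENS = {"", "NA", "N.A.", "-", "b.d.", "bd"}


def parse_value(text):
    """Return a float, None for a missing value, or the raw text label."""
    token = text.strip()
    if token in MISSING_TOKENS:
        return None
    try:
        # Decimal commas are converted to decimal points.
        return float(token.replace(",", "."))
    except ValueError:
        # Not numeric: keep it as a textual label (e.g. a rock type).
        return token


def read_table(text, delimiter="\t"):
    """First row holds column labels, first column holds sample IDs."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = rows[0][1:]
    table = {}
    for row in rows[1:]:
        sample_id, cells = row[0], row[1:]
        table[sample_id] = {
            name: parse_value(cell) for name, cell in zip(header, cells)
        }
    return table


# Hypothetical tab-delimited input with a decimal comma and two missing values.
raw = "ID\tSiO2\tFeOt\tNd\nKoz-1\t65,2\t3.1\tb.d.\nKoz-2\t70.4\tNA\t25\n"
data = read_table(raw)
print(data["Koz-1"])
```

Decimal-comma conversion is only safe here because the example file is tab-delimited; with a comma as the field separator the two uses of the comma would have to be told apart first.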

20
Correct data format
  • Column Symbol (if any): plotting symbols (as
    codes or single characters)
  • A column whose name starts with Col (if any):
    specification of plotting colour (code or English
    name)
  • Column cex (if any): magnification of plotting
    symbols
  • Column Mineral (if any): shows which mineral
    species the analysis belongs to (if not
    specified, the user is prompted for this info
    interactively); GCDkit-Mineral only
  • Avoid special symbols in the column names, and,
    if possible, accented characters throughout the
    file!
  • Avoid the symbol # which is used in R to
    indicate comments!

21
Plotting symbols and colours
22
Plotting symbols and colours
Use codes from the table or single-character
vectors such as B, s
23
Plotting symbols and colours
24
Specifying a variable in GCDkit
25
Specifying a single variable
  • Enter complete name of a variable (e.g., SiO2)
  • Type only part of the variable name. If the
    result is ambiguous, the desired variable has to
    be selected from the list of the multiple matches
    by mouse (applies also for empty patterns)
  • Specify the variable sequence number (2 for the
    second one).
  • Often, if a formula is entered, it is
    interpreted and computed by the calculation core.

26
Data handling: Grouping
  • GROUP samples that belong together on the
    basis of
  • identical label: the same rock type, intrusion,
    locality, ...
  • value of a numeric variable: SiO2 <65, SiO2
    65-70, SiO2 >70
  • position in a classification diagram, e.g. TAS
    diagram: rhyolites, basalts...
  • cluster analysis
  • groups by outline (defined interactively on a
    diagram)
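
Grouping by the value of a numeric variable can be sketched as follows in Python (an illustration only, not GCDkit's R implementation). The break points 65 and 70 come from the slide; the function names and sample values are invented, and placing the boundary values 65 and 70 in the middle group is an assumption, since the slide does not say where they belong.

```python
from collections import defaultdict


def silica_group(sio2, low=65.0, high=70.0):
    """Return the group label for one SiO2 value (boundaries go to the middle group)."""
    if sio2 < low:
        return f"SiO2 <{low:g}"
    if sio2 <= high:
        return f"SiO2 {low:g}-{high:g}"
    return f"SiO2 >{high:g}"


def group_samples(sio2_by_sample):
    """Map each group label to the list of sample names that fall into it."""
    groups = defaultdict(list)
    for name, value in sio2_by_sample.items():
        groups[silica_group(value)].append(name)
    return dict(groups)


# Hypothetical silica contents (wt. %).
samples = {"Bl-1": 62.3, "Bl-2": 68.0, "Koz-3": 74.1, "Koz-5": 70.0}
groups = group_samples(samples)
print(groups)
```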

Statistics, plotting symbols, plotting colours,
special diagrams
27
Data handling: Grouping
28
Data handling: Grouping
29
Data handling: Grouping
30
Data handling: Searching subsets
  • Range of samples
  • Boolean conditions
  • Regular expressions in sample names or textual
    labels
  • Subsets by diagram

Suite="Sázava".AND.SiO2>55
31
Data handling: Searching subsets
  • Subsets by diagram

32
Sample selection (Searching and subsetting)
33
Searching and subsetting
  • First, the search pattern is tested to see whether
    it can be interpreted as a query of the sample
    name(s). A list of exact sample names separated
    by commas is allowed.
  • Next, the pattern is assumed to correspond to a
    selection of sample sequence numbers.
  • Lastly, the search pattern is interpreted as a
    Boolean condition.
  • Entering an empty pattern usually returns all the
    samples in the data set.

34
Searching and subsetting
1. By sample name
  • Search pattern oz: samples with names Koz,
    KozD-5, Roz-5
  • Search pattern Bl-1,Bl-2,Koz-3: samples with
    names Bl-1, Bl-2, Koz-3
  • Regular expressions are implemented (advanced
    technique). Search pattern B: samples with names
    Bl-1, BlD-2 ...

35
Searching and subsetting
2. By sample range
In this case the search pattern is treated as a
selection of sample sequence numbers (effectively
a list separated by commas that may also contain
ranges expressed by colons).
  • Search pattern 1:5: first to fifth samples in
    the data set
  • Search pattern 1,10: first and tenth samples
  • Search pattern 1:5, 10:11, 25: samples number
    1, 2, ...5, 10, 11, 25
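
How such a range pattern expands into sample sequence numbers can be sketched in Python (an illustration, not GCDkit's R code). The function name and the `n_samples` argument are hypothetical; numbering is 1-based, as on the slide.

```python
def parse_sample_range(pattern, n_samples):
    """Expand a pattern such as '1:5, 10:11, 25' into 1-based sample numbers."""
    selected = []
    for part in pattern.split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            # A colon denotes an inclusive range, e.g. 1:5 means 1, 2, 3, 4, 5.
            start, end = (int(x) for x in part.split(":"))
            numbers = range(start, end + 1)
        else:
            numbers = [int(part)]
        for number in numbers:
            if not 1 <= number <= n_samples:
                raise ValueError(f"sample number {number} is out of range")
            if number not in selected:
                selected.append(number)
    return selected


print(parse_sample_range("1:5, 10:11, 25", n_samples=30))
```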

36
Searching and subsetting
3. By Boolean conditions
  • Patterns may employ variable names and the
    comparison operators common in R (see Table).
  • Character strings should be quoted.
  • The conditions can be combined by logical and,
    or and brackets.
  • Logical and can be expressed as .and., .AND. or &
  • Logical or can be expressed as .or., .OR. or |
  • Regular expressions can be employed

37
Searching and subsetting
3. By Boolean conditions
  • Search pattern Intrusion="Rhum": finds all
    analyses from Rhum
  • Search pattern Intrusion="Rhum".and.SiO2>65
    Search pattern Intrusion="Rhum".AND.SiO2>65
    Search pattern Intrusion="Rhum"&SiO2>65
    All analyses from Rhum with silica greater than
    65 (all three expressions are equivalent)
  • Search pattern
    MgO>10&(Locality="Skye"|Locality="Islay"):
    all analyses from Skye or Islay with MgO greater
    than 10
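
A rough sketch of how such Boolean patterns could be evaluated, written in Python rather than R and not taken from GCDkit. It assumes that equality may be written as `=` or `==` and that `&` and `|` are accepted alongside `.and.` and `.or.`; the function names and the three samples are invented. The translation is purely textual, so it would also rewrite these symbols inside quoted strings, and `eval` is acceptable only for trusted, interactive input.

```python
import re


def to_python_condition(pattern):
    """Translate a search pattern into a Python Boolean expression."""
    expr = re.sub(r"\.and\.", " and ", pattern, flags=re.IGNORECASE)
    expr = re.sub(r"\.or\.", " or ", expr, flags=re.IGNORECASE)
    expr = expr.replace("&", " and ").replace("|", " or ")
    # A single "=" means equality; ==, <=, >= and != are left untouched.
    expr = re.sub(r"(?<![<>!=])=(?!=)", "==", expr)
    return expr


def select(samples, pattern):
    """Return names of the samples for which the condition holds."""
    expr = to_python_condition(pattern)
    return [
        name
        for name, row in samples.items()
        if eval(expr, {"__builtins__": {}}, dict(row))
    ]


# Hypothetical data set: column names double as variable names.
samples = {
    "A": {"Intrusion": "Rhum", "Locality": "Skye", "SiO2": 66.0, "MgO": 11.0},
    "B": {"Intrusion": "Rhum", "Locality": "Islay", "SiO2": 60.0, "MgO": 8.0},
    "C": {"Intrusion": "Other", "Locality": "Islay", "SiO2": 70.0, "MgO": 12.0},
}
print(select(samples, 'Intrusion="Rhum".and.SiO2>65'))
```

Missing values are not handled in this sketch; a comparison against a missing value would need an explicit rule.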

38
Descriptive statistics
  • Histograms
  • Box-and-whiskers plots
  • Correlation matrices
  • Principal components
  • Cluster analysis
  • ...many others (including standard R functions)

SiO2,A/CNK,mg

39
Plotting
  • Binary plots
  • Ternary plots
  • 3D plots
  • XYZ plots
  • Coplots
  • Anomaly plots

40
Spiderplots
  • Selected samples: 17 standards, by sample, by
    average
  • By groups: shaded fields, an extra window for
    each
  • Spider boxplots (normalization by a sample)

41
Classification
42
Classification
43
Figaro: Plot editing
  • A set of graphical utilities for R implemented in
    GCDkit
  • Tools to create figure objects, containing both
    data and methods to make subsequent changes to
    the plot
  • Allows a degree of interactive editing before
    committing to hardcopy.

44
Figaro: Plot editing
  • edit title, subtitle
  • edit axis labels
  • zoom in graph
  • add legend
  • interactive identification
  • export (.ps, .wmf) (e.g., for CorelDraw!)

45
Figaro: Plot editing
46
Saving graphics in GCDkit
47
Saving graphics in GCDkit
  • Menu GCDkit > Save all graphics: saves all
    open graphs in a single PostScript (.ps) or
    .pdf file
  • Menu File offers various export filters for
    the currently open graphical window
  • A similar selection is accessible via the context
    menu (right-click with the mouse on the plot
    window)

48
Saving graphics in GCDkit
Each of the graphs is saved separately; the graphs
are numbered
Single PDF file with multiple pages
The great spring cleaning
49
Multiple plots
50
Specifying multiple variables
51
Specifying multiple variables
  • List of column name(s), in full, separated by
    commas (e.g. SiO2,TiO2,K2O,MgO)
  • Sequence numbers of variables or their ranges
    (1,10:15)
  • Name of a built-in list, such as LILE, REE,
    major and HFSE, or their combinations with the
    column names
  • User-defined list: a simple character vector.
    Currently only a single, stand-alone user-defined
    list can be employed as a search criterion
  • For empty patterns, the correct name(s) has to be
    selected by mouse click(s) (+ Shift/Ctrl) from
    the list of the available variables

52
Specifying multiple variables
  • Search pattern major: SiO2, TiO2, Al2O3, Fe2O3,
    FeO, MnO, MgO, CaO, Na2O, K2O, P2O5
  • Search pattern LILE: Rb, Sr, Ba, K, Cs, Li
  • Search pattern HFSE: Nb, Zr, Hf, Ti, Ta, La, Ce,
    Y, Ga, Sc, Th, U
  • Search pattern REE: La, Ce, Pr, Nd, Sm, Eu, Gd,
    Tb, Dy, Ho, Er, Tm, Yb, Lu
  • Search pattern 1:5,7: numeric data columns
    number 1, 2, ...5, 7
  • User-defined list: my.elems<-c("Rb","Sr","Ba")
    Search pattern my.elems: Rb, Sr, Ba
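
The expansion of such patterns into column names can be sketched in Python (illustrative only, not GCDkit's R code). The contents of the built-in lists are copied from the slide; the function name, the `user_lists` argument and the column layout are hypothetical. Names absent from the data set are silently skipped here, which is an assumption, and the sketch does not enforce the rule that a user-defined list must stand alone.

```python
BUILT_IN_LISTS = {
    "major": ["SiO2", "TiO2", "Al2O3", "Fe2O3", "FeO", "MnO", "MgO",
              "CaO", "Na2O", "K2O", "P2O5"],
    "LILE": ["Rb", "Sr", "Ba", "K", "Cs", "Li"],
    "HFSE": ["Nb", "Zr", "Hf", "Ti", "Ta", "La", "Ce", "Y", "Ga", "Sc",
             "Th", "U"],
    "REE": ["La", "Ce", "Pr", "Nd", "Sm", "Eu", "Gd", "Tb", "Dy", "Ho",
            "Er", "Tm", "Yb", "Lu"],
}


def expand_variables(pattern, columns, user_lists=None):
    """Expand names, 1-based numbers, ranges and list names into column names."""
    lists = dict(BUILT_IN_LISTS)
    lists.update(user_lists or {})
    selected = []
    for part in pattern.split(","):
        part = part.strip()
        if not part:
            continue
        if part in lists:
            names = lists[part]
        elif ":" in part:
            start, end = (int(x) for x in part.split(":"))
            names = columns[start - 1:end]
        elif part.isdigit():
            names = [columns[int(part) - 1]]
        else:
            names = [part]
        for name in names:
            # Keep only columns that exist, without duplicates.
            if name in columns and name not in selected:
                selected.append(name)
    return selected


# Hypothetical column layout of a data set.
columns = ["SiO2", "TiO2", "Al2O3", "FeOt", "MgO", "CaO", "Na2O", "K2O",
           "Rb", "Sr", "Ba"]
print(expand_variables("LILE,SiO2", columns))
```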

53
Calculations
Na2O+K2O
SiO2,FeOt/TiO2,3*CaO,MgO
Built-in calculation engine
54
Formulae and calculation core
55
Formulae and calculation core
A formula can involve any combination of names of
existing numerical columns, with constants,
brackets, arithmetic operators +-*/^ and R
functions.
Examples of valid formulae
  • (Na2O+K2O)/CaO
  • Rb^2
  • log10(Sr)
  • mean(SiO2)/10
56
Calculations
  • Common recalculations
  • Millications, anhydrous basis
  • Various indices (Larsen's, Kuno's)
  • Norms (Niggli's values, CIPW, Catanorm, Granite
    Mesonorm etc.)
  • Custom variables: formulae (scripts)
  • Ready for standard R functions

57
Calculations
CIPW normative compositions
Appending calculation results to the data
58
Calculations
59
Calculations
60
Plugins
  • Zircon, monazite and apatite saturation
    calculations
  • Sr-Nd isotopic data (initial ratios, model ages,
    isochrons)
  • Tetrad effect
  • R code files stored in directory Plugin
  • All executed upon loading new data
  • Additional functions, accessible via newly
    appended menu items
  • A platform for DIY additions written by
    R-literate geochemists

61
Plugins
  • Contour plots
  • isolines

Andean volcanism
62
Future?
Our mission
  • Write new plugins, e.g. for modelling of
    petrogenetic processes in igneous geochemistry
    (such as crystallization of the magma)
  • Release eventually a platform-independent system,
    using Tcl/Tk-based interface on Unix/Linux and
    Mac
  • Trigger user feedback (bug reports, contributed
    code)

63
Future? GCDkit-Mineral released (at last)
64
Future?
... is bright!
65
8 February 8 October 2007
66
Acknowledgements
Financial support from Austrian Science
Foundation (FWF 15133GEO), Czech Grant Agency
(GACR 205/01/0331) and Czech Geological Survey
(3314) is gratefully acknowledged.
  • YOU for attention (?!)
  • R Development Core Team (for R)
  • Brave beta testers (for brevity)
  • Sample data for Andean igneous rocks (n > 2000)
    from database GEOROC, University of Mainz
  • Mineral recalculation schemes: Lucie Tajčmanová
    (Prague), Eliška Žáčková (Prague), Axel Renno
    (Freiberg), Christian Bertoldi (Salzburg)

http://www.r-project.org
67
http://www.gla.ac.uk/gcdkit