The CCPN Project - PowerPoint PPT Presentation

Provided by: TimSt56

1
The CCPN Project
  • Tim Stevens and Wayne Boucher
  • October 2005

2
CCPN at Göteborg Day 1
  • Introduction to CCPN
  • The CcpNmr applications
  • Analysis basics
  • Future developments
  • Analysis advanced

3
CCPN at Göteborg Day 2
  • An overview of the data model
  • API Tutorial
  • Analysis Macros
  • Widgets and Popups

4
  • CCPN Overview

5
The CCPN Project
  • Collaborative Computing Project for NMR
  • Started in 1999
  • Collaborators in several countries
  • Developers at University of Cambridge and EBI
  • Unifying platform for NMR software
  • Similar to CCP4 (X-ray)
  • Main goals
  • Data standards and data exchange
  • Software development and distribution
  • Meetings to determine and disseminate best
    practice
  • Open source access

6
People
  • Cambridge
  • Ernest Laue
  • Rasmus Fogh
  • Dan O'Donovan
  • EBI, Hinxton
  • Kim Henrick
  • John Ionides
  • Wim Vranken
  • Anne Pajon

7
History
  • Workshops
  • EBI (2000, 2001)
  • Washington (2000)
  • Funding
  • BBSRC (2000-2003, 2003-2006)
  • NMRQUAL (2001-2004)
  • TEMBLOR (2002-2005)
  • NMR-EXTEND (2005-2008)

8
NMR Software
  • Problem - Heterogeneous development
  • Lots of proprietary data formats
  • Lots of stand-alone programs
  • Data is lost along the way
  • Dedicated converters needed
  • Not acceptable for structural genomics projects
  • Solution - Unity
  • Data standards
  • Ease of transfer between programs
  • Completeness, integrity, deposition, data mining
  • Libraries

9
Data Format vs. Data Model
  • Data format - How data is stored
  • STAR
  • XML
  • SQL
  • Tab-separated ascii
  • Data model - What data means
  • RCSB (PDB) mmCIF
  • XML DTD or schemas
  • SQL schema

10
CCPN Approach
  • Data model rather than data format
  • Format independent
  • Language independent
  • Scientifically descriptive (NMR)
  • Library (API): in-memory manipulation
  • Create, update, delete, query objects
  • One for each language
  • Error checking
  • I/O modules: load/store data from/to disk
  • One for each (storage format, language)
  • Bookkeeping
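
The split between an in-memory API and separate I/O modules can be sketched in a few lines. This is only an illustration under stated assumptions: the `Peak` and `JsonStore` classes are hypothetical, and JSON stands in for the XML or SQL storage named on the slide.

```python
# Illustrative sketch only; these classes are hypothetical, not the CCPN API.
import json


class Peak:
    """In-memory object with a checked attribute."""

    def __init__(self, serial, height=0.0):
        self.serial = serial
        self.height = height

    @property
    def height(self):
        return self._height

    @height.setter
    def height(self, value):
        # Error checking lives in the API layer, not in the application.
        if not isinstance(value, (int, float)):
            raise TypeError("height must be a number")
        self._height = float(value)


class JsonStore:
    """One I/O module per storage format; JSON stands in for XML or SQL."""

    def dumps(self, peaks):
        return json.dumps([{"serial": p.serial, "height": p.height} for p in peaks])

    def loads(self, text):
        return [Peak(d["serial"], d["height"]) for d in json.loads(text)]


store = JsonStore()
peaks = [Peak(1, 2.5), Peak(2, 7)]
restored = store.loads(store.dumps(peaks))
print([(p.serial, p.height) for p in restored])
```

The application code only touches `Peak`; swapping the storage format would mean writing another store class, not changing the application.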

11
Application View
[Diagram, top to bottom: User; GUI; Application1, Application2, Application3; API; In-Memory Representation (Python, Java, C, Perl); I/O; Data Store (XML, SQL)]

12
Model-Driven Architecture
  • UML: Unified Modelling Language
  • Abstract representation of semantics
  • Pictorial
  • Mapping from UML to anything
  • Multi-language
  • Multi-format
  • Architecture neutral (e.g. distributed or not)
  • Power: good and bad
  • CCPN uses Object Domain as its UML tool
  • Python as scripting language
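
The idea of mapping one abstract model to concrete code can be shown with a toy generator. This is a minimal sketch, assuming a model that is just a dictionary of class names and attribute names; the `MODEL` layout and `generate_python` function are invented for illustration and are not how the CCPN generation scripts are written.

```python
# Hypothetical miniature of model-driven code generation; the model layout
# and the generator are invented for illustration.
MODEL = {"Atom": ["name", "elementSymbol"], "Residue": ["seqCode", "ccpCode"]}


def generate_python(model):
    """Return Python source defining one class per model entry."""
    lines = []
    for class_name, attributes in model.items():
        lines.append(f"class {class_name}:")
        args = ", ".join(attributes)
        lines.append(f"    def __init__(self, {args}):")
        for attr in attributes:
            lines.append(f"        self._{attr} = {attr}")
        for attr in attributes:
            cap = attr[0].upper() + attr[1:]
            lines.append(f"    def get{cap}(self):")
            lines.append(f"        return self._{attr}")
            lines.append(f"    def set{cap}(self, value):")
            lines.append(f"        self._{attr} = value")
        lines.append("")
    return "\n".join(lines)


source = generate_python(MODEL)
namespace = {}
exec(source, namespace)  # compile the generated API into a namespace

atom = namespace["Atom"]("CA", "C")
atom.setName("CB")
print(atom.getName(), atom.getElementSymbol())
```

A second generator function reading the same `MODEL` could emit another language or a storage schema, which is the point of keeping the model abstract.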

13
Autogeneration
[Diagram. Labels: UML Model (Package 1, Package 2, Package 3); MEMOPS framework; Documentation; APIs (Python, Java, Perl); Storage (SQL, XML); Domain Experts; Program Developers; User; Application; Deposition]

14
Data Model Packages
[Diagram of data model packages. Labels: Reference; CcpNmr Programs; Citations; Experimental; Laboratory; NMR; Protocols; Samples; Nuclei and Isotopes; Structure; Molecule; Targets; Sequence; Structure and Coordinates; Compound; Source; Preparation; Molecular System; Residue Template; Project Tracking; Organisms, Taxonomy; X-ray; Crystallisation; Crystallography]

15
UML Example
16
CCPN API
  • Classes for developers
  • Mainly getters and setters
  • More than just code stubs
  • Constraints (e.g. cardinality) enforced
  • Links: the hard part
  • Mostly (> 99%) auto-generated from UML
  • Some helper functions and constraints hand coded
  • Currently around 360k lines in Python and 650k
    lines in Java
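
To make "constraints enforced" and "links" concrete, here is a small sketch of a two-way link with a cardinality check. The `Chain` and `Residue` classes below are hypothetical stand-ins written for this example; they are not the generated CCPN classes.

```python
# Hypothetical classes illustrating constraint checks; not the CCPN API.
class Chain:
    def __init__(self, code):
        self.code = code
        self._residues = []

    @property
    def residues(self):
        return tuple(self._residues)  # read-only view for callers


class Residue:
    def __init__(self, chain, seq_code):
        # Cardinality 1..1: every residue must belong to exactly one chain.
        if not isinstance(chain, Chain):
            raise ValueError("a Residue requires exactly one Chain")
        self.seq_code = seq_code
        self._chain = chain
        chain._residues.append(self)  # keep both ends of the link in step

    @property
    def chain(self):
        return self._chain

    def delete(self):
        # Removing the object also removes the reverse link.
        self._chain._residues.remove(self)
        self._chain = None


chain = Chain("A")
r1 = Residue(chain, 1)
r2 = Residue(chain, 2)
r1.delete()
print([r.seq_code for r in chain.residues])
```

Keeping both ends of every link consistent on creation and deletion is the bookkeeping that, according to the slide, makes links the hard part.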

17
Developer Benefits
  • Specified data model and API
  • No I/O code
  • Concentrate on science, not bookkeeping
  • Extendible
  • Application data can be assigned to any object
  • UML model can be extended (packages)
  • Notification system
  • Register interest when specified attribute
    changes (class, not object, level)
  • Undo/Redo (in future)
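
Class-level notification, as opposed to per-object notification, might look like the following. This is a sketch under assumptions: `register_notify` and the `Notifying` base class are invented names, not the CCPN notifier calls.

```python
# Hypothetical notifier registry; names are invented for illustration.
from collections import defaultdict

_notifiers = defaultdict(list)  # (class name, attribute) -> callbacks


def register_notify(cls, attribute, callback):
    """Register interest in an attribute for every instance of cls."""
    _notifiers[(cls.__name__, attribute)].append(callback)


class Notifying:
    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        for callback in _notifiers[(type(self).__name__, name)]:
            callback(self, value)


class Peak(Notifying):
    def __init__(self, serial):
        self.serial = serial


log = []
register_notify(Peak, "height", lambda peak, value: log.append((peak.serial, value)))

a, b = Peak(1), Peak(2)
a.height = 10.0
b.height = 3.5
print(log)
```

One registration covers both `a` and `b`, because interest is recorded against the class rather than against individual objects.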

18
Current Status of API
  • Stable and released
  • Python and XML code generation
  • NMR, molecule description and structure data
    model
  • In testing stages
  • Java and SQL database code generation
  • Protein production data model
  • Preliminary
  • X-ray crystallography data model

19
  • CcpNmr Applications

20
Structural Biology Pipeline
NMR machine
21
NMR Applications
CcpNmr Processing
CcpNmr Analysis
Validation software
ARIA 2.0
CCPN Data Model
Reference data
NMRStar 3.0
CcpNmr FormatConverter
Other formats (NmrView, XEasy, ...)
22
Main CcpNmr Applications
  • Format Converter
  • Conversion to and from legacy formats
  • Analysis
  • Graphical analysis (e.g. assignment) program
  • Processing (coming soon)
  • Azara process wrapped in data model

23
CcpNmr Format Converter
  • Import/export of data formats to the Data Model
  • For harvesting/deposition purposes
  • Allow people to use or try out the data model
  • Interaction with existing programs
  • Fully or partially handles
  • Ansig, Auremol, Autoassign, Azara, Bruker,
    Charmm, CNS/XPLOR/ARIA, Concoord,
    Diana/Dyana/Cyana, Discover, Fasta, Felix,
    Module, .mol, Molmol, Monte, NmrDraw, NMRPipe,
    NMR-STAR (v2.1.1, v3.0), NmrView, Pdb, Pipp,
    Pistachio, Pronto, Sparky, Talos, Varian, XEasy
  • Sequences, chemical compounds, coordinates, NMR
    measurements, constraints and peak lists,
    processing and acquisition parameters.
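
Converting through one shared representation, rather than writing a converter for every pair of formats, can be sketched as below. The two text formats here are made up for the example and do not correspond to any of the real formats listed; the function names are likewise assumptions.

```python
# Toy formats invented for illustration; real peak-list formats differ.
def read_csv_peaks(text):
    """Reader for a made-up comma-separated peak format."""
    peaks = []
    for line in text.strip().splitlines():
        serial, ppm1, ppm2 = line.split(",")
        peaks.append({"serial": int(serial), "position": (float(ppm1), float(ppm2))})
    return peaks


def write_tab_peaks(peaks):
    """Writer for a made-up tab-separated peak format."""
    rows = []
    for peak in peaks:
        ppm = "\t".join(f"{value:.3f}" for value in peak["position"])
        rows.append(f"{peak['serial']}\t{ppm}")
    return "\n".join(rows)


READERS = {"csv": read_csv_peaks}
WRITERS = {"tab": write_tab_peaks}


def convert(text, source, target):
    """Go through the shared representation instead of pairwise converters."""
    return WRITERS[target](READERS[source](text))


result = convert("1,8.25,120.1\n2,7.9,118.45", "csv", "tab")
print(result)
```

With N formats this needs N readers and N writers instead of roughly N squared dedicated converters, which is the motivation given earlier in the talk.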

24
Format Converter - The NMR Translator
[Diagram: format-specific readers (XEasy, NmrView, Bruker, Varian, ...) take peaks, chemical shifts and acquisition parameters into generic peak, chemical shift and acquisition parameter converters, which create entries in the CCPN Data Model; format-specific writers (XEasy, NmrView, NMRPipe, Azara, ...) export chemical shifts, peaks and processing parameters.]

25
Format Converter Design
  • Wim Vranken (EBI)
  • Set of Python scripts
  • Accessed via
  • Tkinter (Tcl/Tk)
  • custom Python scripts
  • http://www.ebi.ac.uk/msd-srv/docs/NMR/NMRtoolkit/main.html

26
CcpNmr Analysis
  • Requirements
  • Cross platform
  • Scalable
  • Extensible
  • Open and easy scripting language
  • Modern graphical user interface
  • Uses CCPN data model and API
  • Software
  • Python, Tcl/Tk, C, OpenGL
  • (Java, X, Motif)
  • OS
  • Linux, Sun, SGI, OSX (Windows)

27
Spectrum Windows
  • N-dim. windows
  • Multiple spectra
  • Automatic mapping
  • Contours on fly
  • Aliasing
  • Strips cells
  • Mouse and key
  • Blocked data
  • Azara
  • Felix
  • NMRPipe
  • UCSF

28
Graphical Interface
  • Menus and popup dialogues
  • CcpNmr widgets
  • Main objects
  • Spectra
  • Windows
  • Peaks
  • Resonances
  • Molecules
  • Structures

29
Assignment
  • Peak finding and fitting
  • Rich assignment model
  • Mainly mouse-driven
  • Can assign to atoms
  • Ambiguous contributions
  • Existing structure
  • Short resonance list
  • Multiple peaks easily
  • Navigation

30
The CLOUDS Protocol
Spectra
Pick Peaks, Link Shifts Combine
Pick Peaks Normalise
  • Automated assignment & structure determination
  • Miguel Llinas, Alex Grishaev, et al.
  • Spatial distribution of anonymous resonances
    generated with NOEs
  • Integrated within CCPN
  • An Analysis module
  • Data Model glues modules
  • Functional platform
  • Distribution network

Spin Systems
NOE intensities
Relaxation Matrix Optimisation
Distance Constraints
Hydrogen Atom Molecular Dynamics
Proton Clouds
Chain Fitting Molecular Replacement
Chain Assignment
Full Structure Calculation
Protein Structure
31
The CLOUDS Protocol
A fitted protein backbone
A family of Clouds
32
Other Features
  • Works with FormatConverter
  • Chemical compounds database
  • NMR reference information
  • Hard copy
  • PostScript
  • PDF
  • Table export
  • Rate analysis
  • Macros
  • Structures

33
CcpNmr Analysis Tutorial Part I
34
CCPN Future
35
Extend-NMR
  • EU STREP application funded to fully integrate
    software from
  • Bruker (TOPSPIN, acquisition)
  • Billeter, Orekhov (Garant, Munin, MDD)
  • Kalbitzer (Auremol)
  • Llinas (CLOUDS)
  • Nilges (Inferential Structure Determination)
  • Bonvin (Haddock, RECOORD)
  • Vriend, Vuister (Queen, What-Check)
  • Henrick, Vranken (NMR database)
  • Focus on complexes and development of better
    software methodology

36
LIMS Collaborations
  • PIMS project collaboration
  • Protein production LIMS (with EBI, Sport
    Consortia, OPPF and Poupon)
  • EU STREP application (SFGLIMS) to work with
  • Poupon (Protein Production)
  • Perrakis (Biophysical methods, crystallisation)
  • Bricogne (X-ray data collection and structure
    generation)
  • Prilusky, Sussman (Bioinformatics, data mining)

37
Data Model Extensions
  • EXTEND-NMR
  • New NMR applications
  • Solid state NMR
  • PIMS
  • LIMS for protein production
  • SFGLIMS
  • LIMS for NMR and X-ray structure determination
  • X-ray
  • Chemoinformatics
  • (Metabolomics?)

38
Code Generation Plans
  • C/C++/FORTRAN code
  • Needed for Extend-NMR and for CcpNmr Processing
  • Needed for interface to CYANA, NMRPIPE, AUTOPSY,
    etc.
  • Java/Database code
  • Extend for LIMS, high-throughput projects,
    NMRVIEW
  • Basic Machinery
  • Upgrades for long-term extensibility/maintainability
    and performance

39
API Languages and Formats
Language: Python, Java, C, Perl
Format: XML, SQL
  • For all languages
  • Metamodel
  • Documentation
  • For all formats
  • Schemas
  • I/O mappings

40
New Core API technology
  • Reduce burden of adding new languages, formats
  • Languages (Python, Java, C, Perl)
  • Storage formats (XML, SQL)

Most of the logic
Language & Format independent
Language dependent only
Format dependent only
Language & Format dependent
Code required for new language
Code required for new format
41
Core API technology, cont.
  • Remodelling of implementation details
  • Storages, collection types, root objects, etc.
  • Complex data types
  • e.g. rotation matrix
  • Client/Server architecture
  • For PIMS and SFGLIMS

42
Analysis Development
  • Beyond CLOUDS
  • Large proteins, homologues
  • Processing linked in
  • Couplings (RDCs, TROSY), dihedral constraints
  • Titrations (Ka, Kd)
  • Chain states (alternate conformations)
  • Solid State NMR
  • Organic chemistry NMR (1D)
  • Publication-ready diagrams and tables
  • Windows version

43
Developments in Extend-NMR
  • Integrated Bayesian, maximum entropy, methods
    for data-processing, analysis and structure
    calculation
  • Molecular replacement for NMR
  • Further RECOORD development
  • Databank for Experimental NMR spectra (DEN)
  • MSD database analysis

44
Licenses
  • GPL
  • Data model
  • Scripts which produce APIs
  • LGPL
  • Generic libraries
  • Widget libraries
  • Format Converter
  • CCPN
  • Analysis

45
Resources, 1
  • SourceForge
  • CVS repository for code
  • API and FormatConverter releases
  • http://sourceforge.net/projects/ccpn
  • CCPN
  • Meetings, workshops
  • API, FormatConverter and Analysis releases
  • http://www.ccpn.ac.uk

46
Resources, 2
  • EBI
  • Format Converter
  • Databases (MSD group)
  • http://www.ebi.ac.uk/msd-srv/docs/NMR/NMRtoolkit/main.html
  • JISCMAIL
  • Email list
  • http://www.jiscmail.ac.uk/lists/ccpnmr.html
  • (http://www.jiscmail.ac.uk/lists/nmrgen.html)

47
CcpNmr Analysis Tutorial Part II
48
CCPN at Göteborg Day 2
  • An overview of the data model
  • API Tutorial
  • Analysis Macros
  • Widgets and Popups

49
Major Data Model Packages
50
CCPN Packages
  • Groupings of related data
  • e.g. NMR, X-ray, Molecular description
  • Connections between packages
  • e.g. NMR loads Nucleus (isotope) information
  • Allows lazy loading
  • Only load relevant data
  • Only load when a link is queried
  • Save only modified
  • Reference packages
  • Chemical compound, Reference chemical shifts
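
Lazy loading and "save only modified" can be illustrated with a small project class. This is a sketch only: the `Project` class, its methods and the loader functions are hypothetical, and the package names are borrowed from the slide purely as labels.

```python
# Hypothetical lazy-loading project; package names are used only as labels.
class Project:
    def __init__(self, loaders):
        self._loaders = loaders   # package name -> function returning its data
        self._loaded = {}
        self._modified = set()

    def get(self, package):
        # Load a package only the first time something asks for it.
        if package not in self._loaded:
            self._loaded[package] = self._loaders[package]()
        return self._loaded[package]

    def touch(self, package):
        # Mark a package as changed (loading it first if necessary).
        self.get(package)
        self._modified.add(package)

    def save(self):
        # Only modified packages need writing back.
        return sorted(self._modified)


calls = []


def make_loader(name):
    def load():
        calls.append(name)
        return {"name": name}
    return load


project = Project({name: make_loader(name) for name in ("Nmr", "Nucleus", "ChemComp")})
project.get("Nmr")
project.touch("Nucleus")
project.get("Nmr")          # already in memory, not loaded again
print(calls, project.save())
```

In this run the ChemComp package is never read, and only the Nucleus package would be written on save.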

Molecule
ChemComp
People
MolSystem
Nucleus
Sample
Coordinates
Nmr
51
ChemElement
52
ChemElement - Details
53
Coordinates
54
Analysis
55
Implementation
56
Molecules and MolSystems
  • Molecules
  • Templates for specifying molecular connectivity.
  • Sequences, chemical components, protonation state
    etc.
  • A kind of reference, e.g. Lysozyme
  • MolSystems
  • Contain chains, which contain residues, which
    contain atoms.
  • The objects you assign to.
  • Built using molecule templates, e.g. a
    homo-oligomer is built using the same template to
    make different chains.
  • Stored in different packages
  • Molecule.xml, MolSystem.xml
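
The relationship between a molecule template and the chains built from it can be sketched as follows. The classes and the three-residue sequence are hypothetical and chosen only to show a homo-oligomer whose chains share one template.

```python
# Hypothetical classes; they mirror the slide's vocabulary, not the real API.
class Molecule:
    """Template: a named sequence of residue codes."""

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = list(sequence)


class Chain:
    def __init__(self, code, molecule):
        self.code = code
        self.molecule = molecule
        # Each chain gets its own residue records built from the template.
        self.residues = [(i + 1, ccp_code) for i, ccp_code in enumerate(molecule.sequence)]


class MolSystem:
    def __init__(self, name):
        self.name = name
        self.chains = []

    def new_chain(self, code, molecule):
        chain = Chain(code, molecule)
        self.chains.append(chain)
        return chain


template = Molecule("example peptide", ["Ala", "Gly", "Ser"])
dimer = MolSystem("homodimer")
a = dimer.new_chain("A", template)
b = dimer.new_chain("B", template)
print(a.molecule is b.molecule, a.residues is b.residues, b.residues)
```

Both chains point at the same template, but each has its own residues, so assignments made to chain A do not leak into chain B.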

57
MolSystem
58
Molecule
59
ChemComp
60
Experiment, Spectrum & Shift List Objects
  • Experiment
  • The set-up under particular conditions at a
    particular time, not a class of experiment.
  • Spectrum
  • Known as Data Source in the data model. A pointer
    to a chunk of data that results from an
    experiment. Several spectra may result from the
    same experiment if they are processed
    differently.
  • Peak List
  • A set of crosspeaks that have been picked for a
    spectrum. A spectrum can have several peak lists.
    The user can separate peaks into classes, e.g.
    picked in different ways.
  • Shift List
  • A set of chemical shifts, which are derived from
    peaks and may be linked to atoms. Valid for a set
    of experiments with similar conditions that give
    similar chemical shifts. Using different shift
    lists doesn't change assignments, but it does
    change which peaks are used in the calculation of
    a shift value.
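
The last point, that a shift list selects which peaks contribute to a shift value without changing assignments, can be shown with made-up numbers. The data layout, resonance label and ppm values below are all assumptions for illustration, and a plain mean is used as the simplest possible way of combining peak positions.

```python
# Hypothetical data layout: each peak contribution names a resonance, the
# experiment it came from, and the observed position in ppm.
from statistics import mean

contributions = [
    {"resonance": "Ala5 H", "experiment": "HSQC pH 7", "ppm": 8.20},
    {"resonance": "Ala5 H", "experiment": "NOESY pH 7", "ppm": 8.22},
    {"resonance": "Ala5 H", "experiment": "HSQC pH 5", "ppm": 8.41},
]

# Each shift list covers a set of experiments recorded under similar conditions.
shift_lists = {
    "pH 7": {"HSQC pH 7", "NOESY pH 7"},
    "pH 5": {"HSQC pH 5"},
}


def shift_value(resonance, shift_list):
    """Average only the peaks from experiments that belong to the shift list."""
    experiments = shift_lists[shift_list]
    values = [c["ppm"] for c in contributions
              if c["resonance"] == resonance and c["experiment"] in experiments]
    return mean(values) if values else None


print(round(shift_value("Ala5 H", "pH 7"), 2), shift_value("Ala5 H", "pH 5"))
```

The resonance keeps the same assignment in both cases; only the set of peaks feeding the average differs between the two shift lists.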

61
Nmr
62
Nmr.Peak
63
Resonances and Assignment
  • Resonances
  • The centre of the NMR data model
  • Connect to peaks
  • Different peaks may be caused by the same thing.
  • Connect to atoms
  • A connection to NMR equivalent atoms. Need not be
    set if anonymous.
  • Have chemical shifts
  • May have different shifts under different
    conditions.

Experiment Spectra Conditions
Constraint Distance Dihedral
Measurement Chemical Shift Relaxation Coupling
Peak Dimensions
Resonance
Annotation Spin System Connectivity Residue Type
Structure Co-ordinates
Molecule Atoms Residues Chains
64
Nmr.Resonance
65
NmrConstraints
66
Python API coding tutorial
67
Development in the CCPN framework
  • CcpNmr Macros
  • Small home-use Python functions
  • Additions to function library
  • Functions incorporated in software release
  • Community sharing
  • Embedded options
  • Extension to CcpNmr application
  • Stand-alone applications
  • Built on CCPN libraries and API
  • CcpNmr Clouds has examples of all of these

68
The Python interface to the CCPN Data Model
  • Find the number of assigned peaks in a spectrum

    count = 0
    for peakList in spectrum.peakLists:
        for peak in peakList.peaks:
            for peakDim in peak.peakDims:
                if peakDim.peakDimContribs:
                    count += 1
                    break

  • Find all H-C partners in a residue

    pairs = []
    for atom in residue.atoms:
        if atom.chemAtom.elementSymbol == 'C':
            for bond in atom.chemAtom.chemBonds:
                chemAtoms = list(bond.chemAtoms)
                chemAtoms.remove(atom.chemAtom)
                if chemAtoms[0].elementSymbol == 'H':

69
CcpNmr Analysis Macros
  • Python scripts/functions
  • Accessible from Analysis and embeddable
  • Argument server
  • An interface to the Analysis program
  • Access to objects
  • Selected peaks
  • Cursor position
  • Spectra
  • Windows
  • Etc
  • High-level function library
  • Windows, Assignment, Molecules, Constraints
  • Documented

70
Macro 1 - Simple stuff
  • Python language
  • Function anatomy
  • Import library functions
  • ArgumentServer
  • Simple program

    def addMarksToPeaks(argServer, peaks=None):
        """Descrn: Adds position line markers to the selected peaks.
           Inputs: ArgumentServer, List of Nmr.Peaks
           Output: None
        """
        from ccpnmr.analysis.MarkBasic import createPeakMark

        if not peaks:
            peaks = argServer.getCurrentPeaks()

        # no peaks - nothing happens
        for peak in peaks:

71
Macro 2 - Ask the user

    def calcAveragePeakListIntensity(argServer, peakList=None, intensityType='height'):
        """Descrn: Find the average height of peaks in a peak list.
           Inputs: ArgumentServer, Nmr.PeakList
           Output: Float
        """
        from ccpnmr.analysis.ConstraintBasic import getMeanPeakIntensity

        if not peakList:
            peakList = argServer.getPeakList()

        if not peakList:
            argServer.showWarning('No peak list selected')
            return

        answer = argServer.askYesNo('Use peak volumes? Height will be used otherwise.')
        if answer:
            intensityType = 'volume'

        spec = peakList.dataSource
        expt = spec.experiment
        intensity = getMeanPeakIntensity(peakList.peaks, intensityType=intensityType)

        data = (intensityType, expt.name, spec.name, peakList.serial, intensity)
        argServer.showInfo('Mean peak %s for %s %s peak list %d is %e' % data)

        return intensity

72
Macro 3 - Popup loader

    def openMyPopup(argServer):
        """Descrn: Opens an example popup.
           Inputs: ArgumentServer
           Output: None
        """
        peakList = argServer.getPeakList()
        popup = MyPopup(argServer.parent, peakList)

    from memops.gui.BasePopup import BasePopup
    from memops.gui.ButtonList import ButtonList
    from memops.gui.ScrolledGraph import ScrolledGraph
    from ccpnmr.analysis.PeakBasic import getPeakHeight, getPeakVolume

73
Macro 3 - The popup

    class MyPopup(BasePopup):

        def __init__(self, parent, peakList, *args, **kw):
            self.peakList = peakList
            self.colours = ['red', 'green']
            self.dataSets = []
            BasePopup.__init__(self, parent=parent, title='Test Popup', **kw)

        def body(self, guiParent):
            row = 0
            self.graph = ScrolledGraph(guiParent)
            self.graph.grid(row=row, column=0, sticky='NSEW')

            row += 1
            texts = ['Draw graph', 'Goodbye']
            commands = [self.draw, self.destroy]
            buttons = ButtonList(guiParent, texts=texts, commands=commands)
            buttons.grid(row=row, column=0, sticky='NSEW')

        def draw(self):
            self.dataSets = self.getData()
            self.graph.update(self.dataSets, self.colours)

        def getData(self):
            peakData = [(getPeakVolume(peak) or 0.0, peak) for peak in self.peakList.peaks]
            peakData.sort()
            heights = []
            volumes = []
            i = 0
            for volume, peak in peakData:
                heights.append([i, getPeakHeight(peak) or 0.0])
                volumes.append([i, volume])
                i += 1

74
CcpNmr Graphical Widgets
  • A library for any developer to use

ColorList PulldownMenu ScrolledMatrix LabelFrame CheckButton Button Label Entry ButtonList
75
CcpNmr Mega Widgets
  • Build them into your own code!
  • ScrolledMatrix
  • ScrolledGraph
  • StructureFrame

76
Ccp Stand-Alone AppTemplate
  • Menu System
  • Project handling
  • New
  • Load
  • Save
  • Backup
  • Popup template
  • Widgets
  • Geometry
  • Plumbing

77
Popup Constructors and Notifiers
External Influence
User Influence
  • Init
  • Setup local variables
  • Subclass popup window
  • Body
  • Arrange Graphical elements
  • Set up Data Model notifiers
  • Set initial state
  • Update
  • Process updated values
  • Redraw widgets based on status
  • Widget callback
  • From entry, buttons etc
  • User functions
  • Data Model change
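
The init, body, update and callback stages listed above can be traced in a GUI-free sketch. Everything here is an assumption made for illustration: the class and method names are invented, a string stands in for a label widget, and a plain listener list stands in for Data Model notifiers.

```python
# GUI-free sketch; the class and method names are assumptions for illustration.
class Model:
    def __init__(self):
        self.peaks = []
        self.listeners = []

    def add_peak(self, serial):
        self.peaks.append(serial)
        for listener in self.listeners:
            listener()  # tell every registered popup that the model changed


class PeakCountPopup:
    def __init__(self, model):
        # Init: set up local variables, then build the body.
        self.model = model
        self.label = ""
        self.body()

    def body(self):
        # Body: arrange widgets (a string stands in for a label widget),
        # register a data-model notifier, and set the initial state.
        self.model.listeners.append(self.update)
        self.update()

    def update(self):
        # Update: process current values and redraw.
        self.label = f"{len(self.model.peaks)} peaks"

    def on_add_button(self):
        # Widget callback: the user action changes the model, which notifies.
        self.model.add_peak(len(self.model.peaks) + 1)


model = Model()
popup = PeakCountPopup(model)
popup.on_add_button()      # user influence
model.add_peak(99)         # external influence
print(popup.label)
```

Both routes end in the same `update` method, so the popup redraws the same way whether the change came from its own widgets or from elsewhere.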

Initialisation
Widgets
Body
Data Model
Notifiers
Update Filter
Update
78
Aftercare
  • www.ccpn.ac.uk
  • Downloads
  • Data Model documentation
  • Analysis documentation
  • Tutorials
  • Mailing List
  • http://www.jiscmail.ac.uk/lists/CCPNMR.html
  • Quick response
  • Bugs
  • Requests