Suite of Python MadMapper scripts for quality control of genetic markers, group analysis and inference of linear order of markers on linkage groups

1
Suite of Python MadMapper scripts for quality control of genetic markers, group analysis and inference of linear order of markers on linkage groups
Visualization and validation of genetic maps using two-dimensional CheckMatrix heat-plots
Alexander Kozik and Richard Michelmore, UC Davis Genome Center
http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
2
Mapping using Recombinant Inbred Lines
Genetic Cross
Genotyping
Raw Marker Scores
Mapping: Inference of Linear Order of Markers
3
MadMapper and CheckMatrix are Python scripts and can be used on any computer platform: UNIX, Windows, Mac OS-X.
Grouping can be done on a set of 2,000 markers; map construction works in a reasonable timeframe with up to 500 markers.
http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
MadMapper_RECBIT - quality control of genetic markers and group analysis
MadMapper_XDELTA - inference of linear order of markers on linkage groups
CheckMatrix (py_matrix_2D_V248_RECBIT.py) - visualization and validation of genetic maps using two-dimensional heat-plots and graphical genotyping
4
MadMapper_RECBIT input and output files
One input file: locus file with raw marker scores
82 output files
LOG file (.x_log_file): information about run parameters
Marker Scores Info (.x_scores_stat): detailed information about scores and linkage
Marker Summary (.z_marker_sum): a quality class is assigned to each marker; useful for selection of core markers
Group Info Summary file (.x_tree_clust): summary of clustering results for all 16 iterations. Distinct linkage groups can be inferred by analysis of this clustering / grouping information
Non-Redundant Marker Scores (.z_nr_scores.loc): locus file with non-redundant raw marker scores
Trio Analysis (.z_trio_good, .z_trio_best, .z_trios_bad): analysis of all trios (triplets) for the non-redundant set of markers
5
MadMapper BIT scoring system

EXAMPLES OF SCORING
(dots mark the positions where the two markers differ)

POSITIVE LINKAGE

AAAAAAAAAAAAAAAAAAAA   BIT SCORE: 6*20 = 120
AAAAAAAAAAAAAAAAAAAA   REC SCORE: 0 (0.0)

                  ..
AAAAAAAAAAAAAAAAAAAA   BIT SCORE: 6*18 - 6*2 = 96
AAAAAAAAAAAAAAAAAABB   REC SCORE: 2 (2/20 = 0.1)

AAAAAAAAAABBBBBBBBBB   BIT SCORE: 6*10 + 6*10 = 120
AAAAAAAAAABBBBBBBBBB   REC SCORE: 0 (0.0)

         ..
AAAAAAAAABABBBBBBBBB   BIT SCORE: 6*18 - 6*2 = 96
AAAAAAAAAABBBBBBBBBB   REC SCORE: 2 (2/20 = 0.1)

NO LINKAGE

          ..........
AAAAAAAAAAAAAAAAAAAA   BIT SCORE: 6*10 - 6*10 = 0
AAAAAAAAAABBBBBBBBBB   REC SCORE: 10 (10/20 = 0.5)

 . . . . . . . . . .
BBBAABBAAAAAAABAABBB   BIT SCORE: 6*10 - 6*10 = 0
BABBAABBABABABBBAABA   REC SCORE: 10 (10/20 = 0.5)

NEGATIVE LINKAGE

  ..................
AAAAAAAAAAAAAAAAAAAA   BIT SCORE: 6*2 - 6*18 = -96
AABBBBBBBBBBBBBBBBBB   REC SCORE: 18 (18/20 = 0.9)

  ..................
ABABABABABABABABABAB   BIT SCORE: 6*2 - 6*18 = -96
ABBABABABABABABABABA   REC SCORE: 18 (18/20 = 0.9)

SCORING SYSTEM

GENOTYPES
A - 1st
B - 2nd
C - NOT A (H or B)
D - NOT B (H or A)
H - A and B

Pairwise scores (BIT value in the upper row, REC value in the lower row of each cell):

            A      B      C      D      H      -
A   BIT     6     -6     -4      4     -2      0
    REC     0      1      1      0    0.5      0
B   BIT    -6      6      4     -4     -2      0
    REC     1      0      0      1    0.5      0
C   BIT    -4      4      4     -4      0      0
    REC     1      0      0      1      0      0
D   BIT     4     -4     -4      4      0      0
    REC     0      1      1      0      0      0
H   BIT    -2     -2      0      0      2      0
    REC   0.5    0.5      0      0      0      0
-   BIT     0      0      0      0      0      0
    REC     0      0      0      0      0      0
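The scoring examples on this slide can be reproduced with a short sketch. It assumes that the BIT and REC scores of a marker pair are simply the sums of the per-line table values, which is what the worked examples imply. The function name `pair_scores` is hypothetical, and dividing the REC sum by the total number of lines is an assumption taken from the examples (which contain no missing data); this is not MadMapper_RECBIT's own code.

```python
# Sketch of pairwise BIT / REC scoring using the table from the slide.
# Column order of every row below is A, B, C, D, H, - (missing).
GENOTYPES = "ABCDH-"

BIT = {
    "A": [6, -6, -4, 4, -2, 0],
    "B": [-6, 6, 4, -4, -2, 0],
    "C": [-4, 4, 4, -4, 0, 0],
    "D": [4, -4, -4, 4, 0, 0],
    "H": [-2, -2, 0, 0, 2, 0],
    "-": [0, 0, 0, 0, 0, 0],
}

REC = {
    "A": [0, 1, 1, 0, 0.5, 0],
    "B": [1, 0, 0, 1, 0.5, 0],
    "C": [1, 0, 0, 1, 0, 0],
    "D": [0, 1, 1, 0, 0, 0],
    "H": [0.5, 0.5, 0, 0, 0, 0],
    "-": [0, 0, 0, 0, 0, 0],
}


def pair_scores(marker1, marker2):
    """Return (BIT score, REC score, REC fraction) for two score strings."""
    if len(marker1) != len(marker2):
        raise ValueError("markers must be scored on the same set of lines")
    bit = 0
    rec = 0.0
    for g1, g2 in zip(marker1, marker2):
        col = GENOTYPES.index(g2)  # raises ValueError on an unknown genotype code
        bit += BIT[g1][col]
        rec += REC[g1][col]
    # Assumption: the fraction is taken over all lines, as in the slide examples.
    return bit, rec, rec / len(marker1)


if __name__ == "__main__":
    print(pair_scores("A" * 20, "A" * 18 + "BB"))
```

Calling `pair_scores` on the slide's own marker pairs is a quick way to check that the table was transcribed correctly.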
6
Arabidopsis Genetic Map: Comparison of Different Scoring Systems
Panels: JoinMap LOD scores; JoinMap REC scores; MadMapper BIT scores; MadMapper REC scores
7
MadMapper_RECBIT Clustering: Group Info Summary (.x_tree_clust file)
8
MadMapper_RECBIT BIN Analysis
M_1 A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B
M_2 A A A B B B A - A A B B B B A A A A A B B B B A B B A A B A A A B B B B
M_3 A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B
M_4 A A A B B B A A A A B B A B A A A A A B B - B A B B A A B A A A B B B B
Graph labels: Linked Group; Diluted Node; Saturated Node
Example of Complete Graph: all nodes are saturated
9
MadMapper_RECBIT Marker Summary (.z_marker_sum file)
10
MadMapper_RECBIT Trio (Triplet) Analysis
M_1 A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B   (flanking marker 1)
            X                     X           X                         X
M_M A A A B A B A - A A B B B B A B A A A B B A B A B B A A B A A A B B A B   (middle marker)
            X                     X           X                         X
M_2 A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B   (flanking marker 2)
MARKER_Flank_1 REC1 BIT1 D_FR1 MARKER_Middle REC2 BIT2 D_FR2 MARKER_Flank_2   REC_Flank BIT_F D_FR_F   D_REC   D_REC_Flank
COR47 0.1 336 0.6931 CAT3 0.0857 348 0.6931 G2395 0.0253 450 0.7822 5 0
G2395 0.0857 348 0.6931 CAT3 0.1 336 0.6931 COR47 0.0253 450 0.7822 5 0
LK141 0.1522 192 0.4554 GUT15 0.193 210 0.5644 MI238 0.1231 294 0.6436 4 0
MI238 0.193 210 0.5644 GUT15 0.1522 192 0.4554 LK141 0.1231 294 0.6436 4 0
MI204 0.051 528 0.9703 MI51 0.0806 312 0.6139 SGCSNP41 0.0161 360 0.6139 4 0
SGCSNP41 0.0806 312 0.6139 MI51 0.051 528 0.9703 MI204 0.0161 360 0.6139 4 0

M336 0.0494 438 0.802 COR15 0.0385 432 0.7723 VE018 0.0879 450 0.901 0 0
VE018 0.0385 432 0.7723 COR15 0.0494 438 0.802 M336 0.0879 450 0.901 0 0
ARR7 0.0115 510 0.8614 COR47 0 282 0.4653 F15571 0.0179 324 0.5545 0 0
F15571 0 282 0.4653 COR47 0.0115 510 0.8614 ARR7 0.0179 324 0.5545 0 0
PAP3 0.0227 504 0.8713 COR78 0.0577 276 0.5149 PDC2 0.0588 270 0.505 0 0
PDC2 0.0577 276 0.5149 COR78 0.0227 504 0.8713 PAP3 0.0588 270 0.505 0 0
Bad Trios (first block of rows): the number of double crossovers is high for bad trios.
Good Trios (second block of rows): the number of double crossovers should be low for good trios.
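A minimal sketch of the idea behind the trio check, using the three example markers from this slide. It assumes that a double crossover is a line where both flanking markers carry the same A or B score and the middle marker carries the other one, and that lines with a missing score in any of the three markers are skipped. The function name `count_double_crossovers` is hypothetical; this is not how MadMapper_RECBIT itself is implemented, and it does not reproduce the REC, BIT or D_FR columns of the table.

```python
def count_double_crossovers(flank1, middle, flank2):
    """Count lines where the middle marker disagrees with two agreeing flanks."""
    scored = {"A", "B"}
    count = 0
    for f1, m, f2 in zip(flank1, middle, flank2):
        if f1 in scored and m in scored and f2 in scored:
            if f1 == f2 and m != f1:
                count += 1
    return count


# Marker scores copied from the slide (spaces removed, "-" is a missing score).
M_1 = "AAABBBAAAABBBBAA-AABBBBABBAABAAABBBB"  # flanking marker 1
M_M = "AAABABA-AABBBBABAAABBABABBAABAAABBAB"  # middle marker
M_2 = "AAABBBAAAABB-BAAAAABBBBABBAABAAABBBB"  # flanking marker 2

if __name__ == "__main__":
    print(count_double_crossovers(M_1, M_M, M_2))
```

A middle marker that fits well between its flanks should give a count near zero, while a misplaced or badly scored marker gives a high count.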
11
CheckMatrix Usage
CheckMatrix (py_matrix_2D_V248_RECBIT.py): three input files are required; upon program execution, three output files are generated.
HEAT PLOT: helps validate the quality of the constructed genetic map and identify markers in the wrong position
GRAPHICAL GENOTYPING: visualization of haplotypes per recombinant line (suspicious double crossovers are highlighted)
CIRCULAR GRAPH: helps validate the genetic map and identify markers with spurious linkage
12
Genetic Map Visualization using CheckMatrix: good map
13
Genetic Map Visualization using CheckMatrix: wrong map
14
Genetic Map Visualization using CheckMatrix: disordered markers
15
Minimum Entropy Approach to Infer Linear Order Using the MadMapper_XDELTA program
MadMapper_XDELTA analyzes two-dimensional matrices of all pairwise scores and finds the best map: the one with the minimal total sum of differences between adjacent cells (the map with the lowest entropy).
Heat-plot panels: random order (high entropy); partially wrong order; right order (low entropy)
Visualization of numerical data using CheckMatrix
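The "total sum of differences between adjacent cells" can be sketched as follows. This is one plausible reading of the slide, assuming absolute differences between horizontally and vertically neighbouring cells of the pairwise matrix arranged in map order; the exact formula used by MadMapper_XDELTA is not given here, and the names `ordered_matrix` and `adjacent_difference_sum` are hypothetical.

```python
def ordered_matrix(order, dist):
    """Arrange pairwise values dist[a][b] as a 2D matrix following `order`."""
    return [[dist[a][b] for b in order] for a in order]


def adjacent_difference_sum(matrix):
    """Sum of absolute differences between neighbouring cells (rows and columns)."""
    n = len(matrix)
    total = 0
    for i in range(n):
        for j in range(n - 1):
            total += abs(matrix[i][j] - matrix[i][j + 1])  # horizontal neighbours
            total += abs(matrix[j][i] - matrix[j + 1][i])  # vertical neighbours
    return total


if __name__ == "__main__":
    # Toy data: five markers at evenly spaced positions, distance = |difference|.
    positions = {"M1": 0, "M2": 1, "M3": 2, "M4": 3, "M5": 4}
    dist = {a: {b: abs(pa - pb) for b, pb in positions.items()}
            for a, pa in positions.items()}
    right = ["M1", "M2", "M3", "M4", "M5"]
    shuffled = ["M1", "M4", "M2", "M5", "M3"]
    print(adjacent_difference_sum(ordered_matrix(right, dist)))
    print(adjacent_difference_sum(ordered_matrix(shuffled, dist)))
```

On this toy data the correct order produces a smooth matrix and therefore a lower sum than the shuffled order, which matches the low-entropy versus high-entropy panels of the slide.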
16
MINIMUM ENTROPY APPROACH TO INFER LINEAR ORDER OF MARKERS
Two-dimensional matrix of recombination pairwise scores: numerical data generated by MadMapper
Visualization of numerical data using CheckMatrix
Figure labels: CheckMatrix Color Scheme; adjacent cells (values)
17
MadMapper_XDELTA Usage
  • MadMapper_XDELTA takes three files as input:
  • Matrix (pairwise distances between markers)
  • List of frame markers
  • List of markers to map

First step: find the best map for the frame markers by checking all possible combinations (up to 10 markers). Optionally, an unlimited list of frame markers with a fixed order can be used.
Best-Fit extension: take one marker from the list of markers to map and insert it into the 2-dimensional matrix of the current best map. Check all possible positions. Calculate delta and find the map with the lowest delta value (lowest entropy).
Move to the next marker to map until all markers are mapped. Optionally, shuffle (ripple) after several steps.
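The two steps described above (exhaustive search over the frame markers, then best-fit insertion of the remaining markers) can be sketched as below. The `delta` function is an assumed stand-in (sum of absolute differences between adjacent matrix cells), and `best_frame`, `best_fit_extension` and `build_map` are hypothetical names; the optional ripple step and the fixed-order frame option are not modelled. Treat this as an illustration of the procedure, not as the MadMapper_XDELTA implementation.

```python
from itertools import permutations


def delta(order, dist):
    """Assumed entropy measure: adjacent-cell differences of the ordered matrix."""
    m = [[dist[a][b] for b in order] for a in order]
    n = len(order)
    total = 0
    for i in range(n):
        for j in range(n - 1):
            total += abs(m[i][j] - m[i][j + 1])  # horizontal neighbours
            total += abs(m[j][i] - m[j + 1][i])  # vertical neighbours
    return total


def best_frame(frame, dist):
    """Step 1: try every ordering of the frame markers (feasible for ~10 markers)."""
    return list(min(permutations(frame), key=lambda order: delta(order, dist)))


def best_fit_extension(current, marker, dist):
    """Step 2: try the new marker at every position and keep the lowest delta."""
    candidates = [current[:k] + [marker] + current[k:]
                  for k in range(len(current) + 1)]
    return min(candidates, key=lambda order: delta(order, dist))


def build_map(frame, to_map, dist):
    order = best_frame(frame, dist)
    for marker in to_map:
        order = best_fit_extension(order, marker, dist)
    return order


# Toy data (not from the slides): markers at evenly spaced positions.
positions = {"M1": 0, "M2": 1, "M3": 2, "M4": 3, "M5": 4}
toy_dist = {a: {b: abs(pa - pb) for b, pb in positions.items()}
            for a, pa in positions.items()}

if __name__ == "__main__":
    print(build_map(["M1", "M3", "M5"], ["M2", "M4"], toy_dist))
```

Because each marker is inserted greedily, the result depends on the frame and on the insertion order, so the final map is not guaranteed to be the global minimum of the chosen measure.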
18
Example of Best-Fit Extension

MATRIX (ALL PAIRS):   madmapper_test_small.out.pairs_all
MARKERS TO MAP:       madmapper_test_small.list
FRAME MARKERS LIST:   madmapper_test_small.frame
OUTPUT MAP FILE:      madmapper_test_small.xdelta
MAX FRAME LENGTH:     12
FIXED FRAME ORDER:    FALSE
LINKAGE GROUP ID:     LG DUMMY
DEBUG:                TRUE

Each row lists a candidate marker order, its delta, the delta divided by the number of markers, and the candidate number.

Map calculated by checking all possible combinations:
GM02 GM06 GM10                   1.52   0.5067   1
GM02 GM10 GM06                   1.92   0.64     2
GM06 GM02 GM10                   1.68   0.56     3

Marker GM03 was inserted:
GM03 GM02 GM06 GM10              2.16   0.54     1
GM02 GM03 GM06 GM10              2.0    0.5      2
GM02 GM06 GM03 GM10              2.64   0.66     3
GM02 GM06 GM10 GM03              3.2    0.8      4

Marker GM08 was inserted:
GM08 GM02 GM03 GM06 GM10         3.64   0.728    1
GM02 GM08 GM03 GM06 GM10         4.32   0.864    2
GM02 GM03 GM08 GM06 GM10         3.28   0.656    3
GM02 GM03 GM06 GM08 GM10         2.56   0.512    4
GM02 GM03 GM06 GM10 GM08         3.16   0.632    5

Marker GM09 was inserted:
GM09 GM02 GM03 GM06 GM08 GM10    4.8    0.8      1
GM02 GM09 GM03 GM06 GM08 GM10    5.92   0.9867   2
GM02 GM03 GM09 GM06 GM08 GM10    4.72   0.7867   3
GM02 GM03 GM06 GM09 GM08 GM10    3.76   0.6267   4
GM02 GM03 GM06 GM08 GM09 GM10    3.12   0.52     5
GM02 GM03 GM06 GM08 GM10 GM09    3.52   0.5867   6
19
MadMapper_XDELTA Map Output
A: marker above
B: middle marker
C: marker below
DST1: distance A-B
DST2: distance B-C
DST3: distance A-C
SUMM: (A-B) + (B-C)
DIFF: (A-B + B-C) - A-C
LG MARKER POS 1 DST1 2 DST2 3 DST3 S SUMM D DIFF STATUS CLASS
2 G4553 0 1 0 2 NNNNNN 3 NNNNNN S NNNNNN D NNNNNN NNNNNN NNNNN
2 M246 1 1 0.043 2 0.0213 3 0.0778 S 0.0643 D -0.0135 GOOD __0__
2 MI320 2 1 0.0213 2 0 3 0.0211 S 0.0213 D 0.0002 GOOD __0__
.. .. .. .. .. .. .. ..
2 NGA1126 26 1 0.0225 2 0.0645 3 0.0702 S 0.087 D 0.0168 GOOD __0__
2 SGCSNP135 27 1 0.0645 2 0.0645 3 0.0842 S 0.129 D 0.0448 GOOD __0__
2 MI54 28 1 0.0645 2 0.0211 3 0.0968 S 0.0856 D -0.0112 GOOD __0__
2 VE014 29 1 0.0211 2 0.0532 3 0.0745 S 0.0743 D -0.0002 GOOD __0__
2 M283 30 1 0.0532 2 0.1803 3 0.1833 S 0.2335 D 0.0502 GOOD __1__
2 SGCSNP333 31 1 0.1803 2 0.1167 3 0.0968 S 0.297 D 0.2002 GOOD LARGE
2 SGCSNP210 32 1 0.1167 2 0 3 0.1154 S 0.1167 D 0.0013 GOOD __0__
2 COP1 33 1 0 2 0.0196 3 0.0263 S 0.0196 D -0.0067 GOOD __0__
2 SPL3 34 1 0.0196 2 0.04 3 0.0339 S 0.0596 D 0.0257 GOOD __0__
2 C4H 35 1 0.04 2 0.0227 3 0.0303 S 0.0627 D 0.0324 GOOD __0__
.. .. .. .. .. .. .. ..
2 M336 54 1 0.0519 2 0.0625 3 0 S 0.1144 D 0.1144 GOOD __X__
2 UBIQUE 55 1 0.0625 2 0.0526 3 0.0619 S 0.1151 D 0.0532 GOOD __1__
2 MI79A 56 1 0.0526 2 0.0781 3 0.0645 S 0.1307 D 0.0662 GOOD __1__
2 ATHB7 57 1 0.0781 2 0.1579 3 0.1698 S 0.236 D 0.0662 GOOD __1__
2 SGCSNP214 58 1 0.1579 2 0.1429 3 0.1667 S 0.3008 D 0.1341 GOOD __X__
2 SGCSNP198 59 1 0.1429 2 NNNNNN 3 NNNNNN S NNNNNN D NNNNNN NNNNNN NNNNN
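The SUMM and DIFF columns of the map output follow from the three distances of each marker triplet, as the rows above show (for M246, 0.043 + 0.0213 = 0.0643 and 0.0643 - 0.0778 = -0.0135). A small sketch of that arithmetic is given below; the function name `triplet_rows`, the nested-dictionary layout of the distances and the rounding to four decimals are assumptions, and the STATUS and CLASS columns are left out because their thresholds are not stated on the slide.

```python
def triplet_rows(order, dist):
    """For each interior marker B with neighbours A and C, compute SUMM and DIFF."""
    rows = []
    for k in range(1, len(order) - 1):
        a, b, c = order[k - 1], order[k], order[k + 1]
        dst1 = dist[a][b]  # distance A-B
        dst2 = dist[b][c]  # distance B-C
        dst3 = dist[a][c]  # distance A-C
        summ = dst1 + dst2
        diff = summ - dst3
        rows.append((b, k, dst1, dst2, dst3, round(summ, 4), round(diff, 4)))
    return rows


# Distances for the first triplet of the slide's table (G4553, M246, MI320).
example_dist = {
    "G4553": {"M246": 0.043, "MI320": 0.0778},
    "M246": {"G4553": 0.043, "MI320": 0.0213},
    "MI320": {"G4553": 0.0778, "M246": 0.0213},
}

if __name__ == "__main__":
    for row in triplet_rows(["G4553", "M246", "MI320"], example_dist):
        print(*row)
```

A DIFF close to zero means the three distances are nearly additive, which is what a well-placed middle marker should show.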
20
21
Side-by-side comparison of the linear order of markers on the Arabidopsis genome inferred by three different approaches (mapping programs), and comparison with the physical order of markers (Col-0 genomic sequence): MadMapper_XDELTA (minimum entropy approach), JoinMap (maximum likelihood) and RECORD (minimum number of recombination events).
The diagonal dot-plot was created using GenoPix_2D_Plotter: http://www.atgc.org/GenoPix_2D_Plotter/
Axis labels: Physical order of markers (based on BLAST search); Inferred order of markers by mapping programs
Panels: MadMapper; JoinMap; RECORD
22
Arabidopsis Genetic Map constructed by MadMapper and visualized with CheckMatrix: 2D Heat Plot
Axis labels: Linkage groups I, II, III, IV, V
Annotations: Main Diagonal with Linked Markers; Regions with Negative Linkage; Regions with Quasi Linkage; High Density of Markers; Low Density of Markers; Allele Composition per Marker
23
Arabidopsis Genetic Map constructed by MadMapper and visualized with CheckMatrix: Graphical Genotyping
24
REFERENCES AND DATA SOURCES
1. Dean and Lister Arabidopsis Genetic Map and Raw Data: http://www.arabidopsis.info/new_ri_map.html
2. MadMapper: http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
3. JoinMap: http://www.kyazma.nl/index.php/mc.JoinMap
4. RECORD: http://www.dpw.wau.nl/pv/pub/recORD/index.htm
5. GenoPix_2D_Plotter: http://www.atgc.org/GenoPix_2D_Plotter/

CREDITS
This work was funded by NSF grant 0421630 to Compositae Genome Consortium: http://compgenomics.ucdavis.edu/

PAG-14 POSTERS WITH EXAMPLES OF MADMAPPER USAGE
P751: High-Density Haplotyping With Microarray-Based Single Feature Polymorphism Markers In Arabidopsis
P761: Gene Expression Markers: Using Transcript Levels Obtained From Microarrays To Genotype A Segregating Population
P957: MadMapper And CheckMatrix - Python Scripts To Infer Orders Of Genetic Markers And For Visualization And Validation Of Genetic Maps And Haplotypes