Mutation Nomenclature - PowerPoint PPT Presentation

About This Presentation
Title:

Mutation Nomenclature

Description:

AMP Program Committee (Karen Weck, Jeff Kant) AMP Genetics Subdivision (Paul Rothberg, Vicky Pratt) Johan den Dunnen and Human Genome Variation Society (HGVS) ... – PowerPoint PPT presentation

Slides: 60
Provided by: shuji


Transcript and Presenter's Notes

Title: Mutation Nomenclature


1
Mutation Nomenclature
  • Shuji Ogino, M.D., Ph.D.
  • Brigham and Women's Hospital
  • Dana-Farber Cancer Institute
  • Harvard Medical School

2
Special Thanks!
  • AMP TE Committee (Tom Prior)
  • AMP Program Committee (Karen Weck, Jeff Kant)
  • AMP Genetics Subdivision (Paul Rothberg, Vicky
    Pratt)
  • Johan den Dunnen and Human Genome Variation
    Society (HGVS)
  • CAP Molecular Pathology Committee (Peggy Gulley,
    Jeff Kant)

3
Why Standardize Nomenclature?
  • The human genome is a big world
  • Precise map and address are necessary
  • We should all use the same language (as we do with English)
  • Successful examples
    • Metric system
    • CD system
    • HLA system

4
Users of Genetic Information
  • Laboratorians
  • Researchers
  • Clinicians / health care providers
  • Patients / family members
  • Ultimate goals
    • To decrease ambiguity and miscommunication
    • To facilitate research and better practice

5
Standard nomenclature: cons
  • Difficult for experts
  • Solution
    • Describe standard names along with colloquial names

6
Standard nomenclature: pros
  • Easier to locate mutations
  • Easier literature searches
  • Friendly to newcomers
    • Every expert was once a newcomer
    • Every expert is a newcomer outside his or her own area of expertise

7
Variant vs. Mutation vs. Polymorphism
  • Variant (change): neutral term
  • Mutation: implies a harmful change
  • Polymorphism: different definitions are confusing

8
Genetic (DNA) changes
  • Somatic vs. germline
  • Small changes (discussed today)
    • DNA sequence change
  • Large changes
    • DNA (gene) copy number change
    • Large position change (translocation)

9
Basic Structure of Standard Nomenclature
Factor V Leiden (so-called 1691G>A or R506Q)
F5 NM_000130.3:c.1601G>A (p.Arg534Gln)
  • F5: HGNC official gene symbol
  • NM_000130.3: reference sequence (GenBank accession no. and version no.)
  • c.: coding DNA sequence
  • 1601G>A: guanine-to-adenine at position 1601
  • (p.Arg534Gln): protein; Arg-to-Gln at codon 534
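The parts labelled above can also be separated mechanically. The following Python sketch uses a hypothetical helper (parse_substitution) that is not part of any HGVS tool. It assumes a simple coding-DNA substitution written as gene symbol, accession.version, then c.<position><ref>><alt>, with an optional p. description. It does not cover the rest of the HGVS syntax, such as deletions, intronic offsets or UTR positions.

```python
import re
from typing import NamedTuple, Optional


class Substitution(NamedTuple):
    gene: str
    accession: str
    version: int
    position: int
    ref: str
    alt: str
    protein: Optional[str]


# Hypothetical pattern: "GENE ACCESSION.VERSION:c.POSREF>ALT (p.CHANGE)".
# Only simple coding-DNA substitutions are covered; the colon is optional.
PATTERN = re.compile(
    r"(?P<gene>[A-Z0-9]+)\s+"
    r"(?P<acc>[A-Z]{2}_\d+)\.(?P<ver>\d+):?"
    r"c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])"
    r"(?:\s*\((?P<prot>p\.[^)]+)\))?"
)


def parse_substitution(text: str) -> Substitution:
    m = PATTERN.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a simple c. substitution: {text!r}")
    return Substitution(
        gene=m["gene"],
        accession=m["acc"],
        version=int(m["ver"]),
        position=int(m["pos"]),
        ref=m["ref"],
        alt=m["alt"],
        protein=m["prot"],  # None when no p. description is given
    )


v = parse_substitution("F5 NM_000130.3:c.1601G>A (p.Arg534Gln)")
print(v.gene, v.accession, v.version, v.position, v.ref, v.alt, v.protein)
# F5 NM_000130 3 1601 G A p.Arg534Gln
```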
10
Rules
  • HUGO Gene Nomenclature Committee (http://www.gene.ucl.ac.uk/nomenclature/index.html)
  • HGNC-approved official gene symbols should be used
    • No Greek letters or Roman numerals (as in IGF-II and NF??)
    • No h or m prefix (as in hMLH1 or mTOR)
    • No hyphens (as in K-ras)
    • All capital letters
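The formatting rules above can be approximated with a few string checks. This Python sketch is a heuristic only, and the function name is hypothetical:

- It does not query the HGNC database, so a symbol can pass every check without being an approved symbol.
- The rules themselves have exceptions among approved symbols; HLA-A, for example, contains a hyphen.
- The h/m prefix rule cannot be tested without the official list. Lowercase prefixes such as the one in hMLH1 are, however, caught by the capital-letters check.

```python
import re
from typing import List

# Heuristic checks only: this does not consult the HGNC database.
GREEK = set("αβγδεζηθικλμνξοπρστυφχψω")
ROMAN_SUFFIX = re.compile(r"-(?:I|II|III|IV|V|VI|VII|VIII|IX|X)$")


def style_problems(symbol: str) -> List[str]:
    """Return the slide's formatting rules that `symbol` appears to break."""
    problems = []
    if any(ch.lower() in GREEK for ch in symbol):
        problems.append("contains a Greek letter")
    if ROMAN_SUFFIX.search(symbol):
        problems.append("ends in a Roman numeral")
    if "-" in symbol:
        problems.append("contains a hyphen")
    if symbol != symbol.upper():
        problems.append("not all capital letters")  # also catches hMLH1, mTOR
    return problems


for s in ["IGF-II", "K-ras", "hMLH1", "β-catenin", "KRAS", "CTNNB1"]:
    print(s, style_problems(s) or "OK")
```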

11
HGNC-Approved Official Symbols
  • IGF2 (IGF-II)
  • CTNNB1 (β-catenin)
  • NFKBIB (NF??)
  • ABL1 (ABL)
  • F2 (prothrombin)
  • KRAS (K-ras)
  • CDKN2A (p16)

Giving both the official symbol and the colloquial name may be helpful.
12
(GenBank record screenshot: the reference sequence number is shown here)
13
Great news! RefSeqGene project
  • It is often difficult to decide which reference sequence to use
  • Ongoing NCBI project
    • Compiling a full-length reference sequence for each gene
  • The use of standard nomenclature will be much easier

14
After completion of the RefSeqGene project...
F5 NM_000130.3:c.1601G>A (p.Arg534Gln) → F5 c.1601G>A (p.Arg534Gln)
The reference sequence number will not be necessary, because each gene will have a single coding DNA reference sequence.
15
Reference Sequence
  • Coding DNA (c.)
  • Genomic DNA (g.)
  • Amino acid protein (p.)
  • Mitochondrial DNA (m.)
  • RNA (r.)

16
Reference Sequence
  • Coding DNA (c.) (derived from cDNA)
    • Doesn't cover all transcripts, introns, or intergenic regions
    • Good for exons, 5′-UTR, 3′-UTR
  • Genomic DNA (g.)
    • Sequence as present in the human genome
    • Complete, but cumbersome (e.g., g.21895321_21895324del)
    • Good for introns and intergenic regions

17
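Coding DNA Ref Seq Numbering
Schematic of a transcript (transcription start site to poly(A) addition site) with exon 1, intron 1, and exon 2:
  • 5′-UTR: c.-30 to c.-1
  • Exon 1, from codon 1 (ATG): c.1 to c.36
  • Intron 1: c.36+1, c.36+2, ... (exon 1 side) and ..., c.37-2, c.37-1 (exon 2 side)
  • Exon 2, up to the end of the stop codon: c.37 to c.96
  • 3′-UTR: c.*1 to c.*170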
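The numbering scheme in this schematic can be captured in a small classifier. The Python sketch below makes three assumptions:

- Positions are written without the "c." prefix.
- Intronic offsets are plain integers.
- Uncertain positions such as *5-? are out of scope.

The function name is illustrative.

```python
import re

# Position syntax from the schematic: optional "*" (3'-UTR), a signed number
# (negative = 5'-UTR; there is no c.0), and an optional intronic "+n"/"-n" offset.
POSITION = re.compile(r"(?P<star>\*)?(?P<base>-?\d+)(?P<offset>[+-]\d+)?")


def region(pos: str) -> str:
    """Classify a c. position written without the 'c.' prefix."""
    m = POSITION.fullmatch(pos)
    if m is None:
        raise ValueError(f"unsupported position: {pos!r}")
    base = int(m["base"])
    if base == 0 or (m["star"] and base < 0):
        raise ValueError(f"invalid position: {pos!r}")
    if m["offset"]:
        return "intron"
    if m["star"]:
        return "3'-UTR"
    return "5'-UTR" if base < 0 else "coding"


for p in ["-30", "-1", "1", "36", "36+1", "37-1", "96", "*1", "*170"]:
    print(f"c.{p}: {region(p)}")
```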
18
Reference sequence (F5, the Factor V gene) NM_000130.3
1 gcaagaactg caggggagga ggacgctgcc acccacagcc tctagagctc attgcagctg
61 ggacagcccg gagtgtggtt agcagctcgg caagcgctgc ccaggtcctg gggtggtggc
121 agccagcggg agcaggaaag gaagcatgtt cccaggctgc ccacgcctct gggtcctggt
The ATG is at positions 146-148.
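Converting between positions in the mRNA record and c. numbers is simple arithmetic once the position of the A of the ATG is known (146 here). A Python sketch under that assumption follows. It covers only the 5′-UTR and the coding region; positions after the stop codon would need c.* numbering and are not handled. The string holds positions 1 to 180 of NM_000130.3 as shown above.

```python
# Positions 1-180 of NM_000130.3 as shown above; the A of the ATG is position 146.
MRNA = "".join("""
gcaagaactg caggggagga ggacgctgcc acccacagcc tctagagctc attgcagctg
ggacagcccg gagtgtggtt agcagctcgg caagcgctgc ccaggtcctg gggtggtggc
agccagcggg agcaggaaag gaagcatgtt cccaggctgc ccacgcctct gggtcctggt
""".split())
CDS_START = 146


def mrna_to_c(pos: int, cds_start: int = CDS_START) -> int:
    """1-based mRNA position -> c. number (5'-UTR and coding region only)."""
    if pos >= cds_start:
        return pos - cds_start + 1
    return pos - cds_start  # no c.0: the base just before the ATG is c.-1


def c_to_mrna(c: int, cds_start: int = CDS_START) -> int:
    """c. number (5'-UTR and coding region only) -> 1-based mRNA position."""
    if c == 0:
        raise ValueError("there is no c.0")
    return c + cds_start - 1 if c > 0 else c + cds_start


print(MRNA[c_to_mrna(1) - 1:c_to_mrna(3)])  # atg
print(mrna_to_c(145), mrna_to_c(1))          # -1 -145
print(c_to_mrna(1601))                       # 1746 (Factor V Leiden, c.1601)
```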
19
To get the coding DNA reference sequence, click CDS in the GenBank reference sequence record.
20
Coding DNA reference sequence (F5, the Factor V gene) NM_000130.3
1 atgttcccag gctgcccacg cctctgggtc ctggtggtct tgggcaccag ctgggtaggc
61 tgggggagcc aagggacaga agcggcacag ctaaggcagt tctacgtggc tgctcagggc
121 atcagttgga gctaccgacc tgagcccaca aactcaagtt tgaatctttc tgtaacttcc
Numbering now starts with the ATG.
21
Important Rule
  • All variants should be described at the DNA level (along with the amino acid change)
  • Description at the amino acid level only is ambiguous
  • A DNA change may affect transcription, splicing, or RNA stability, which cannot be described by an amino acid change

22
Amino acid 1- vs. 3-letter code
  • One-letter
    • Simple
    • NCBI uses this
    • Currently in wide use
    • The prefix p. makes the nomenclature unequivocal
  • Three-letter
    • Recommended by HGVS
    • Prevents mistakes and confusion (especially among lay people)
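Rewriting a one-letter change in three-letter form is a table lookup. The Python sketch below uses a hypothetical helper name. It handles only simple single-residue substitutions such as R534Q and writes the stop codon as Ter.

It changes only the amino acid code, not the numbering. The legacy Factor V name R506Q must still be renumbered from the initiator Met (codon 1), which gives p.Arg534Gln as on the Factor V slide.

```python
import re

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",  # stop codon
}
CHANGE = re.compile(r"(?:p\.)?([ACDEFGHIKLMNPQRSTVWY*])(\d+)([ACDEFGHIKLMNPQRSTVWY*])")


def to_three_letter(change: str) -> str:
    """'R534Q' or 'p.R534Q' -> 'p.Arg534Gln' (single-residue substitutions only)."""
    m = CHANGE.fullmatch(change)
    if m is None:
        raise ValueError(f"unsupported change: {change!r}")
    ref, pos, alt = m.groups()
    return f"p.{THREE_LETTER[ref]}{pos}{THREE_LETTER[alt]}"


print(to_three_letter("R534Q"))   # p.Arg534Gln
print(to_three_letter("p.E7V"))   # p.Glu7Val
```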

23
Exon / Intron Numbering
  • Not fixed
  • Many genes have an exon 0, exon 2a, exon 2b, etc.
  • Why is the 3rd exon not exon 3 (but exon 2, or exon 2b)?
  • Exon/intron numbering should not be used for standard nomenclature
    • c.23-2C>T, instead of c.IVS1-2C>T

24
2 variants in 1 gene
  • Compound heterozygote (trans)
    • c.[515C>T]+[2598delT]
    • c.[515C>T]+[?]: second mutation unknown
  • 2 variants in 1 allele (cis)
    • c.[515C>T;2598delT]
  • 2 variants, alleles unknown
    • c.515C>T(+)2598delT

25
SNP
  • AF478696.1:g.21538G>A
    • HGVS nomenclature
  • rs2306220
    • NCBI database SNP ID (http://www.ncbi.nlm.nih.gov/sites/entrez)

26
Mutalyzer
  • www.lovd.nl/mutalyzer/
  • Helps with naming variants

27
Current issues
  • Intronic nucleotides do not exist in the full-length coding DNA reference sequence
    • Where is the variant, and which nucleotide is it?
    • (e.g., CFTR NM_000492.3:c.3718-2477C>T)
  • Solutions
    1. Use of a genomic reference sequence: CFTR AY848832.1:g.40725C>T
    2. RefSeqGene project

28
Q&A
29
Question 1. Deletion
CFTR NM_000492.3
1501 accattaaag aaaatatcat ctttggtgtt tcctatgatg aatatagata cagaagcgtc
1561 atcaaagcat gccaactaga agaggacatc tccaagtttg cagagaaaga caatatagtt
Codon:       506 507 508 509 510
Wild-type:   ATC ATC TTT GGT GTT
             Ile Ile Phe Gly Val
Mutant:      ATC ATT GGT GTT
             Ile Ile Gly Val
30
Question 1
CFTR NM_000492.3
1501 accattaaag aaaatatcat ctttggtgtt tcctatgatg aatatagata cagaagcgtc
1561 atcaaagcat gccaactaga agaggacatc tccaagtttg cagagaaaga caatatagtt
Codon:       506 507 508 509 510
Wild-type:   ATC ATC TTT GGT GTT
             Ile Ile Phe Gly Val
Mutant:      ATC ATT GGT GTT
             Ile Ile Gly Val
Marked positions: 1521 and 1523
31
Rules
  • c. based on coding DNA ref seq
  • g. based on genomic DNA ref seq
  • A range of nucleotide numbers is indicated by _
  • del for deletion

32
Answer 1
CFTR NM_000492.3
1501 accattaaag aaaatatcat ctttggtgtt tcctatgatg aatatagata cagaagcgtc
1561 atcaaagcat gccaactaga agaggacatc tccaagtttg cagagaaaga caatatagtt
Codon:       506 507 508 509 510
Wild-type:   ATC ATC TTT GGT GTT
             Ile Ile Phe Gly Val
Mutant:      ATC ATT GGT GTT
             Ile Ile Gly Val
CFTR NM_000492.3:c.1521_1523del (p.F508del)
33
Answer 1
CFTR NM_000492.3
1501 accattaaag aaaatatcat ctttggtgtt tcctatgatg aatatagata cagaagcgtc
1561 atcaaagcat gccaactaga agaggacatc tccaagtttg cagagaaaga caatatagtt
Codon:       506 507 508 509 510
Wild-type:   ATC ATC TTT GGT GTT
             Ile Ile Phe Gly Val
Mutant:      ATC ATT GGT GTT
             Ile Ile Gly Val
CFTR NM_000492.3:c.1521_1523del is not a deletion of codon 508 (TTT): the deleted CTT is the last C of codon 507 plus the first two T's of codon 508.
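The deletion can be checked by applying it to the reference sequence and reading the codons again. The Python sketch below assumes the slide's sequence starts at c.1501. It uses the standard genetic code, encoded as the usual 64-character string in TCAG order, and prints one-letter amino acids.

It also shows that deleting codon 508 itself (c.1522_1524) gives the same protein but a different DNA sequence. This is why the DNA-level description matters.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, one-letter amino acids ("*" = stop).
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# c.1501-c.1620 of CFTR NM_000492.3, copied from the slide.
SEQ = "".join("""
accattaaag aaaatatcat ctttggtgtt tcctatgatg aatatagata cagaagcgtc
atcaaagcat gccaactaga agaggacatc tccaagtttg cagagaaaga caatatagtt
""".split()).upper()
FIRST_C = 1501  # c. number of SEQ[0]


def delete(seq: str, c_start: int, c_end: int) -> str:
    """Apply c.{c_start}_{c_end}del to a sequence whose first base is c.FIRST_C."""
    return seq[:c_start - FIRST_C] + seq[c_end - FIRST_C + 1:]


def translate(seq: str, first_codon: int, n_codons: int) -> str:
    """Translate n_codons starting at codon `first_codon` (codon n starts at c.3n-2)."""
    start = 3 * first_codon - 2 - FIRST_C
    window = seq[start:start + 3 * n_codons]
    return "".join(CODE[window[i:i + 3]] for i in range(0, len(window) - 2, 3))


mutant = delete(SEQ, 1521, 1523)          # the answer: c.1521_1523del
codon508_gone = delete(SEQ, 1522, 1524)   # deleting the TTT of codon 508 instead
print(translate(SEQ, 506, 5))             # IIFGV
print(translate(mutant, 506, 4))          # IIGV
print(translate(codon508_gone, 506, 4))   # IIGV: same protein...
print(mutant == codon508_gone)            # False: ...but a different DNA change
```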
34
Question 2. Hemoglobin βS (sickle cell anemia) (Glu6Val, E6V)
Coding DNA reference sequence NM_000518.4
1 atggtgcatc tgactcctga ggagaagtct gccgttactg ccctgtgggg caaggtgaac
Codons (numbered from the start codon): 1 atg, 2 gtg, 3 cat, 4 ctg, 5 act, 6 cct, 7 gag
Mutation: A>T (codon 7, GAG>GTG)
What is wrong with Glu6Val (in terms of the standard)?
What is the official gene symbol for hemoglobin beta?
What is the nomenclature for this common variant?
35
Answer 2. Hemoglobin βS (sickle cell anemia) (Glu6Val, E6V)
Coding DNA reference sequence NM_000518.4
1 atggtgcatc tgactcctga ggagaagtct gccgttactg ccctgtgggg caaggtgaac
Codons (numbered from the start codon): 1 atg, 2 gtg, 3 cat, 4 ctg, 5 act, 6 cct, 7 gag
Mutation: A>T (codon 7, GAG>GTG)
What is wrong with Glu6Val (in terms of the standard)? Counting the initiator Met as codon 1, this is codon 7 in the primary translation product.
What is the official gene symbol for hemoglobin beta? HBB
What is the nomenclature for this common variant? HBB NM_000518.4:c.20A>T (p.Glu7Val) (so-called hemoglobin βS)
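The codon number for any coding position follows from the fact that codon n occupies c.(3n-2) to c.3n. The Python sketch below uses hypothetical helper names and the first 60 nucleotides of NM_000518.4 shown above. It confirms that c.20 is the second base of codon 7, and that A>T at that position turns GAG into GTG.

```python
from typing import Tuple

# c.1-c.60 of HBB NM_000518.4 as shown above (c.1 = A of the ATG).
HBB = "".join(
    "atggtgcatc tgactcctga ggagaagtct gccgttactg ccctgtgggg caaggtgaac".split()
).upper()


def codon_of(c: int) -> Tuple[int, int]:
    """Codon number and position within the codon (1-3) for coding position c."""
    if c < 1:
        raise ValueError("only coding positions (c.1 and above) have a codon")
    return (c + 2) // 3, (c - 1) % 3 + 1


def substitute(seq: str, c: int, ref: str, alt: str) -> str:
    """Apply c.{c}{ref}>{alt}, checking the reference base first."""
    if seq[c - 1] != ref:
        raise ValueError(f"c.{c} is {seq[c - 1]}, not {ref}")
    return seq[:c - 1] + alt + seq[c:]


codon, position = codon_of(20)
mutant = substitute(HBB, 20, "A", "T")
first = 3 * codon - 3  # 0-based index of the codon's first base
print(codon, position)                                      # 7 2
print(HBB[first:first + 3], "->", mutant[first:first + 3])  # GAG -> GTG
print(codon_of(1601))                                       # (534, 2): F5 c.1601
```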
36
Question 3. Mutation in 3′-UTR
Codon 177
Wild-type: GCC AAT CTT GCT AGC TAG AGT TTT GGT TCT
           Ala Asn Leu Ala Ser STOP
Mutant (sequence after the stop codon): AGT AAG GTT CT
GenBank accession number: NM_00001, version 3. Gene symbol: AAA1.
What is the nomenclature for this mutation, based on the coding DNA reference sequence?
37
Rules
  • c. based on coding DNA ref seq
  • For 3′-UTR numbering, c.*1 indicates the first nucleotide after the stop codon
  • A range of nucleotide numbers is indicated by _
  • delins for deletion-insertion

38
Answer 3. Mutation in 3′-UTR
Codon 177
Wild-type: GCC AAT CTT GCT AGC TAG AGT TTT GGT TCT
           Ala Asn Leu Ala Ser STOP
Mutant (sequence after the stop codon): AGT AAG GTT CT
Marked positions: c.*4 and c.*6
GenBank accession number: NM_00001, version 3. Gene symbol: AAA1.
AAA1 NM_00001.3:c.*4_*6delinsAA
NOT AAA1 NM_00001.3:c.*4_*6TTT>AA (">" is only for a single-nucleotide substitution)
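A description like the one above can be derived by trimming the bases that the wild-type and mutant sequences share at their start and end. The Python sketch below assumes both strings begin at c.*1, the first base after the stop codon; the helper name is hypothetical.

It handles only substitutions and deletion-insertions. It skips the 3′ shifting that HGVS requires for pure deletions or insertions in repetitive sequence.

```python
def describe_after_stop(wild: str, mutant: str) -> str:
    """Name a substitution or deletion-insertion in sequence that starts at c.*1.

    Sketch only: shared leading and trailing bases are trimmed; pure deletions or
    insertions (which need HGVS 3' shifting) are not handled.
    """
    shortest = min(len(wild), len(mutant))
    p = 0  # length of the common prefix
    while p < shortest and wild[p] == mutant[p]:
        p += 1
    s = 0  # length of the common suffix, not overlapping the prefix
    while s < shortest - p and wild[-1 - s] == mutant[-1 - s]:
        s += 1
    deleted = wild[p:len(wild) - s]
    inserted = mutant[p:len(mutant) - s]
    if not deleted or not inserted:
        raise ValueError("pure deletion or insertion: not handled in this sketch")
    first, last = p + 1, len(wild) - s  # c.* numbers of the changed bases
    where = f"*{first}" if first == last else f"*{first}_*{last}"
    if len(deleted) == 1 and len(inserted) == 1:
        return f"c.{where}{deleted}>{inserted}"
    return f"c.{where}delins{inserted}"


# Sequence after the TAG stop codon, from the slide.
print(describe_after_stop("AGTTTTGGTTCT", "AGTAAGGTTCT"))  # c.*4_*6delinsAA
```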
39
Question 4. Intronic duplication mutation
Intron end | exon start: CTTAATAG | GCC AAT CTT GCT (Ala Asn Leu Ala)
p. (protein) amino acid numbers: the first Ala (GCC) is codon 61
c. (coding) nt numbers: the G of GCC is c.181
Mutation: insertion of an A
GenBank accession number: NM_00001, version 3. Gene symbol: AAA1.
What is the nomenclature for this mutation, based on the coding DNA reference sequence?
40
Rules
  • c. based on coding DNA ref seq
  • An intronic nucleotide can be indicated as c.1209+1 or c.1210-3 (examples)
  • dup for duplication
  • The dup designation is preferred to ins, because it is simpler and implies the mechanism (duplication)

41
Answer 4
Intron end | exon start: CTTAATAG | GCC AAT CTT GCT (Ala Asn Leu Ala)
p. (protein) amino acid numbers: the first Ala (GCC) is codon 61
c. (coding) nt numbers: the G of GCC is c.181
Mutation: insertion of an A
GenBank accession number: NM_00001, version 3. Gene symbol: AAA1.
AAA1 NM_00001.3:c.181-2dup
dup is preferred to AAA1 NM_00001.3:c.181-2_181-1insA because it is simpler and implies the mechanism of mutation.
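Choosing between dup and ins can be automated in two steps. First, shift the insertion as far 3′ as the sequence allows. Then check whether the inserted bases repeat the bases immediately before the insertion point.

The Python sketch below works for the slide's layout only. The intron and exon strings are the bases shown on the slide, and the c_name helper, which turns a string index into a c. position, is hypothetical.

```python
INTRON = "CTTAATAG"      # last intronic bases shown on the slide
EXON = "GCCAATCTTGCT"    # first exonic bases; the G is c.181
REF = INTRON + EXON


def c_name(i: int) -> str:
    """Hypothetical helper: c. position of REF[i] for this slide's layout."""
    exon_start = len(INTRON)  # index of c.181
    if i < exon_start:
        return f"181-{exon_start - i}"
    return str(181 + i - exon_start)


def name_insertion(seq: str, index: int, inserted: str) -> str:
    """Name the insertion of `inserted` just before seq[index] (0-based)."""
    # Shift 3' while the mutant sequence stays the same: stepping over seq[index]
    # is equivalent when it equals the first inserted base (the insert rotates).
    while index < len(seq) and seq[index] == inserted[0]:
        inserted = inserted[1:] + seq[index]
        index += 1
    n = len(inserted)
    if index >= n and seq[index - n:index] == inserted:
        first, last = index - n, index - 1  # the bases that were copied
        span = c_name(first) if n == 1 else f"{c_name(first)}_{c_name(last)}"
        return f"c.{span}dup"
    return f"c.{c_name(index - 1)}_{c_name(index)}ins{inserted}"


print(name_insertion(REF, 7, "A"))  # c.181-2dup
print(name_insertion(REF, 7, "C"))  # c.181-2_181-1insC
```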
42
Q5. SNP in MGMT 5′-UTR (non-coding exon)
End of exon 1 | start of intron 1 (intron 1: 69 kb):
Caccgtttgcgacttg | gtgagtgtctgggtcgcctcgctcccggaagagtg
C>T change at the capital C
The cDNA reference sequence MGMT NM_002412.2 includes exon 1 and exon 2; the ATG (codon 1) lies 40 bp into exon 2.
Ogino et al. Carcinogenesis 2007 shows that this common SNP is strongly linked to MGMT methylation in colorectal cancer.
43
Rules
  • c. based on coding DNA ref seq
  • A nucleotide in the 5′-UTR can be indicated as c.-1 or c.-23 (examples)

44
A5. SNP in MGMT 5′-UTR (non-coding exon)
MGMT NM_002412.2:c.-56C>T (rs16906252)
End of exon 1 | start of intron 1 (intron 1: 69 kb):
Caccgtttgcgacttg | gtgagtgtctgggtcgcctcgctcccggaagagtg
The cDNA reference sequence MGMT NM_002412.2 includes exon 1 and exon 2; the ATG (codon 1) lies 40 bp into exon 2.
Ogino et al. Carcinogenesis 2007
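The answer follows from counting backwards from the ATG across the exon 1/exon 2 junction. Intron 1 is skipped because c. numbering follows the spliced cDNA. The Python sketch below assumes, as the figure indicates, two things:

- Exon 2 contributes 40 bp upstream of the ATG.
- The capital C is the SNP, at the start of the 16 exon 1 bases shown.

The helper name is hypothetical.

```python
# Assumed layout, read from the figure: the 5'-UTR of NM_002412.2 is all of exon 1
# plus the 40 bp of exon 2 that precede the ATG. Intron 1 is not counted, because
# c. numbering follows the spliced cDNA.
EXON2_BEFORE_ATG = 40
EXON1_TAIL = "Caccgtttgcgacttg"  # last exon 1 bases shown; the capital C is the SNP


def c_number_in_exon1_tail(i: int) -> int:
    """c. number of EXON1_TAIL[i] (0-based) under the layout above."""
    from_exon1_end = len(EXON1_TAIL) - i  # 1 for the last base of exon 1
    return -(EXON2_BEFORE_ATG + from_exon1_end)


print(f"c.{c_number_in_exon1_tail(0)}C>T")           # c.-56C>T
print(c_number_in_exon1_tail(len(EXON1_TAIL) - 1))  # -41, the last base of exon 1
```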
45
Question 6. First codon mutation
(p. numbers = protein amino acid numbers; c. numbers = coding nucleotide numbers)
p. numbers:       1   2   3   4   5   6   7   8   9   10  11
Amino acids:      Met Ala Asn Leu Ala Ser Pro Arg Phe Gly Ser
Sequence:     CAA ATG GCC AAT CTT GCT AGC CCT AGA TTT GGT TCT
c. numbers:       1   4   7   10  13  16  19  22  25  28  31
Mutation: T>G
GenBank accession number: NM_00001, version 3. Gene symbol: AAA1.
What is the nomenclature for this mutation, based on the coding DNA reference sequence?
46
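Answer 6
(p. numbers = protein amino acid numbers; c. numbers = coding nucleotide numbers)
p. numbers:       1   2   3   4   5   6   7   8   9   10  11
Amino acids:      Met Ala Asn Leu Ala Ser Pro Arg Phe Gly Ser
Sequence:     CAA ATG GCC AAT CTT GCT AGC CCT AGA TTT GGT TCT
c. numbers:       1   4   7   10  13  16  19  22  25  28  31
Mutation: T>G
GenBank accession number: NM_00001, version 3. Gene symbol: AAA1.
AAA1 NM_00001.3:c.2T>G (p.? or p.0)
p.Met1Arg doesn't make sense: the change destroys the initiation codon (ATG>AGG), so it does not simply replace Met with Arg.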
47
Question 7. Inversion
1 atgcagaggt cgcctctgga aaaggccagc gttgtctcca aacttttttt cagctggacc
61 agaccaattt tgaggaaagg atacagacag cgcctggaat tgtcagacat ataccaaatc
121 ccttctgttg attctgctga caatctatct gaaaaattgg aaagagaatg ggatagagag
181 ctggcttcaa agaaaaatcc taaactcatt aatgcccttc ggcgatgttt tttctggaga
241 tttatgttct atggaatctt tttatattta ggggaagtca ccaaagcagt acagcctctc
301 ttactgggaa gaatcatagc ttcctatgac ccggataaca aggaggaacg ctctatcgcg
361 atttatctag gcataggctt atgccttctc tttattgtca ggacactgct cctacaccca
Change to actgtt
Coding DNA reference sequence for the AAA1 gene, GenBank No. NM_00001, version 3
48
Answer 7. Inversion
1 atgcagaggt cgcctctgga aaaggccagc gttgtctcca aacttttttt cagctggacc
61 agaccaattt tgaggaaagg atacagacag cgcctggaat tgtcagacat ataccaaatc
121 ccttctgttg attctgctga caatctatct gaaaaattgg aaagagaatg ggatagagag
181 ctggcttcaa agaaaaatcc taaactcatt aatgcccttc ggcgatgttt tttctggaga
241 tttatgttct atggaatctt tttatattta ggggaagtca ccaaagcagt acagcctctc
301 ttactgggaa gaatcatagc ttcctatgac ccggataaca aggaggaacg ctctatcgcg
361 atttatctag gcataggctt atgccttctc tttattgtca ggacactgct cctacaccca
Change to actgtt
Coding DNA reference sequence for the AAA1 gene, GenBank No. NM_00001, version 3
AAA1 NM_00001.3:c.395_400inv
AAA1 NM_00001.3:c.395_400delinsACTGTT is also possible, but the inv description is simpler and can imply the mechanism of this mutation.
(Strictly, HGVS reserves inv for replacement by the reverse complement: c.395_400 is ttgtca, whose reverse complement is tgacaa. The change drawn here, ttgtca to actgtt, is a simple reversal, which is accurately described as c.395_400delinsACTGTT.)
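In HGVS, inv denotes replacement of a segment by its reverse complement, which is not the same as simple reversal. The Python sketch below compares the two for c.395_400 of the sequence above. The helper names are illustrative.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


# c.361-c.420 of the AAA1 coding sequence (NM_00001.3) from the slide.
LINE_361 = "".join(
    "atttatctag gcataggctt atgccttctc tttattgtca ggacactgct cctacaccca".split()
)


def segment(c_start: int, c_end: int) -> str:
    """Bases c.{c_start} to c.{c_end} (inclusive) of LINE_361."""
    return LINE_361[c_start - 361:c_end - 360]


original = segment(395, 400)
print(original)                      # ttgtca
print(original[::-1])                # actgtt  (simple reversal)
print(reverse_complement(original))  # tgacaa  (reverse complement)
```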
49
Question 8. Known polymorphisms in a repeat sequence
Genomic reference sequence AJ574948.1 (CFTR intron 8, 5T/7T/9T)
Exon 9 starts at g.164 (no. 1210 in the coding DNA ref seq NM_000492.3)
1 tataattatg tactataaag taataatgta tacagtgtaa tggatcatgg gccatgtgct
61 tttcaaacta attgtacata aaacaagcat ctattgaaaa tatctgacaa actcatcttt
121 tatttttgat gtgtgtgtgt gtgtgtgtgt gtttttttaa cagggatttg gggaattatt
181 tgagaaagca aaacaaaaca ataacaatag aaaaacttct aatggtgatg acagcctctt
241 cttcagtaat ttctcacttc ttggtactcc tgtcctgaaa gatattaatt tcaagataga
301 aagaggacag ttgttggcgg ttgctggatc cactggagca ggcaaggtag ttcttttgtt
361 cttcactatt aagaacttaa tttggtgtcc atgtctcttt ttttttctag tttgtagtgc
421 tggaaggtat ttttggagaa attcttacat gagcattagg agaatgt
What is the collective nomenclature for the repeat polymorphisms?
What is the nomenclature for the 5T variant?
What is the nomenclature for the 9T variant?
50
Rules for repeat polymorphism
  • Use the number of the first nucleotide
  • Followed by the repeated unit (e.g., T or CA)
  • Describe the number of repeats:
    • [8]: a particular polymorphism (8 repeats)
    • (5_10): uncertain, between 5 and 10 repeats

51
Answer 8. CFTR intron 8, 5T/7T/9T
The T repeat starts at g.152 of AJ574948.1, which is c.1210-12 of NM_000492.3; exon 9 starts at g.164 (c.1210)
1 tataattatg tactataaag taataatgta tacagtgtaa tggatcatgg gccatgtgct
61 tttcaaacta attgtacata aaacaagcat ctattgaaaa tatctgacaa actcatcttt
121 tatttttgat gtgtgtgtgt gtgtgtgtgt gtttttttaa cagggatttg gggaattatt
181 tgagaaagca aaacaaaaca ataacaatag aaaaacttct aatggtgatg acagcctctt
241 cttcagtaat ttctcacttc ttggtactcc tgtcctgaaa gatattaatt tcaagataga
301 aagaggacag ttgttggcgg ttgctggatc cactggagca ggcaaggtag ttcttttgtt
361 cttcactatt aagaacttaa tttggtgtcc atgtctcttt ttttttctag tttgtagtgc
421 tggaaggtat ttttggagaa attcttacat gagcattagg agaatgt
The repeat polymorphisms are collectively described as CFTR AJ574948.1:g.152T(5_9) or CFTR NM_000492.3:c.1210-12T(5_9).
The 5T is CFTR AJ574948.1:g.152T[5] or CFTR NM_000492.3:c.1210-12T[5].
The 9T is CFTR AJ574948.1:g.152T[9] or CFTR NM_000492.3:c.1210-12T[9].
52
Question 9. Known dinucleotide repeat polymorphisms
Genomic DNA reference sequence AJ00001.1
1 tataattatg tactataaag taataatgta tacagtgtaa tggatcatgg gccatgtgct
61 tttcaaacta attgtacata aaacaagcat ctattgaaaa tatctgacaa actcatcttt
121 tatttttgat gtgtgtgtgt gtgtgtgtgt gtttttttaa cagggatttg gggaattatt
181 tgagaaagca aaacaaaaca ataacaatag aaaaacttct aatggtgatg acagcctctt
241 cttcagtaat ttctcacttc ttggtactcc tgtcctgaaa gatattaatt tcaagataga
301 aagaggacag ttgttggcgg ttgctggatc cactggagca ggcaaggtag ttcttttgtt
361 cttcactatt aagaacttaa tttggtgtcc atgtctcttt ttttttctag tttgtagtgc
421 tggaaggtat ttttggagaa attcttacat gagcattagg agaatgt
This TG repeat polymorphism ranges from 8 to 12 repeats in the general population.
What is the collective nomenclature for the repeat polymorphisms?
What is the nomenclature for the 8-repeat allele?
What is the nomenclature for the 11-repeat allele?
53
Answer 9. Known dinucleotide repeat polymorphisms
The TG repeat starts at position 130 (g.130)
1 tataattatg tactataaag taataatgta tacagtgtaa tggatcatgg gccatgtgct
61 tttcaaacta attgtacata aaacaagcat ctattgaaaa tatctgacaa actcatcttt
121 tatttttgat gtgtgtgtgt gtgtgtgtgt gtttttttaa cagggatttg gggaattatt
181 tgagaaagca aaacaaaaca ataacaatag aaaaacttct aatggtgatg acagcctctt
241 cttcagtaat ttctcacttc ttggtactcc tgtcctgaaa gatattaatt tcaagataga
301 aagaggacag ttgttggcgg ttgctggatc cactggagca ggcaaggtag ttcttttgtt
361 cttcactatt aagaacttaa tttggtgtcc atgtctcttt ttttttctag tttgtagtgc
421 tggaaggtat ttttggagaa attcttacat gagcattagg agaatgt
The repeat polymorphisms are collectively described as AJ00001.1:g.130TG(8_12).
The 8-repeat allele is described as AJ00001.1:g.130TG[8].
The 11-repeat allele is described as AJ00001.1:g.130TG[11].
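The repeat lengths can be read off the sequence by counting copies of the repeated unit, starting from its first nucleotide. The Python sketch below uses positions 121-180 of the sequence shown in Questions 8 and 9; the helper name is hypothetical. In that reference sequence it finds 7 T's at g.152 and 11 TG units at g.130.

```python
# Positions 121-180 of the sequence shown in Questions 8 and 9.
SEQ_121 = "".join(
    "tatttttgat gtgtgtgtgt gtgtgtgtgt gtttttttaa cagggatttg gggaattatt".split()
)
FIRST = 121  # g. number of SEQ_121[0]


def tandem_count(start: int, unit: str) -> int:
    """Count consecutive copies of `unit` beginning at g.{start}."""
    i = start - FIRST
    n = 0
    while SEQ_121[i:i + len(unit)] == unit:
        n += 1
        i += len(unit)
    return n


print(f"g.152T[{tandem_count(152, 't')}]")     # g.152T[7]
print(f"g.130TG[{tandem_count(130, 'tg')}]")   # g.130TG[11]
```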
54
Question 10. Prothrombin G20210A
DNA reference sequence AF478696.1
21421 attgatcagt ttggagagta gggggccact catattctgg gctcctggaa ccaatcccgt
21481 gaaagaatta tttttgtgtt tctaaaacta tggttcccaa taaaagtgac tctcagcgag
Marked on the sequence: the stop codon, the 3′-UTR, and the G>A change
What is wrong with G20210A (for the nucleotide change)?
What is the official gene symbol for the prothrombin gene?
What is the nomenclature for this common variant?
55
Answer 10. Prothrombin G20210A
DNA reference sequence AF478696.1
21421 attgatcagt ttggagagta gggggccact catattctgg gctcctggaa ccaatcccgt
21481 gaaagaatta tttttgtgtt tctaaaacta tggttcccaa taaaagtgac tctcagcgag
Marked on the sequence: the stop codon, the 3′-UTR, and the G>A change
What is wrong with G20210A (for the nucleotide change)? It can be misread as a Gly-to-Ala change at codon 20,210.
What is the official gene symbol for the prothrombin gene? F2
What is the nomenclature for this common variant? F2 AF478696.1:g.21538G>A or F2 AF478696.1:c.*97G>A (so-called prothrombin G20210A)
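The c.* number follows from the genomic position by counting from the last base of the stop codon. The Python sketch below assumes, from the marked stop codon, that the TAG occupies g.21439 to g.21441 of AF478696.1; this is consistent with c.*97 above. The helper names are hypothetical.

```python
# Positions 21421-21540 of AF478696.1 as shown above.
SEQ = "".join("""
attgatcagt ttggagagta gggggccact catattctgg gctcctggaa ccaatcccgt
gaaagaatta tttttgtgtt tctaaaacta tggttcccaa taaaagtgac tctcagcgag
""".split())
FIRST_G = 21421
STOP_END = 21441  # assumed: last base of the TAG stop codon (g.21439-g.21441)


def base_at(g: int) -> str:
    return SEQ[g - FIRST_G].upper()


def g_to_c_star(g: int) -> str:
    """3'-UTR position: c.*1 is the first base after the stop codon."""
    if g <= STOP_END:
        raise ValueError("position is not 3' of the stop codon")
    return f"c.*{g - STOP_END}"


print("".join(base_at(g) for g in range(21439, 21442)))  # TAG
print(g_to_c_star(21538), base_at(21538))                # c.*97 G
```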
56
Q11. Deletion of a non-coding exon
The stop codon ends at c.885 (NM_00001.1); gene symbol AAA1
Exon 7 ends at c.*4; exon 8 spans c.*5 to c.*573
Exon 8 is deleted, but the exact start and end positions of the deletion are unknown (?)
What is the nomenclature for this exon 8 deletion?
57
Rules
  • For uncertain positions, use ?, but describe them as specifically as possible
  • Exon/intron numbers should not be used
    • Exon/intron numbering is neither uniform nor permanent

58
A11. Deletion of a non-coding exon
The stop codon ends at c.885 (NM_00001.1); gene symbol AAA1
Exon 7 ends at c.*4; exon 8 spans c.*5 to c.*573
Exon 8 is deleted, but the exact start and end positions of the deletion are unknown (?)
What is the nomenclature for this exon 8 deletion? AAA1 NM_00001.1:c.*5-?_*573+?del
59
Take home messages
  • All DNA changes should be described at the DNA level (with amino acid changes, if known)
  • Use HGNC-approved official gene symbols
  • Standard names can be accompanied by colloquial names (in parentheses)