Mutation Nomenclature

About This Presentation

Title:

Mutation Nomenclature

Description:

AMP Program Committee (Karen Weck, Jeff Kant) AMP Genetics Subdivision (Paul Rothberg, Vicky Pratt) Johan den Dunnen and Human Genome Variation Society (HGVS) ... – PowerPoint PPT presentation

Slides: 60

Provided by: shuji

Transcript and Presenter's Notes

Title: Mutation Nomenclature

1
Mutation Nomenclature

Shuji Ogino, M.D., Ph.D.
Brigham and Women's Hospital
Dana-Farber Cancer Institute
Harvard Medical School

2
Special Thanks!

AMP T&E Committee (Tom Prior)
AMP Program Committee (Karen Weck, Jeff Kant)
AMP Genetics Subdivision (Paul Rothberg, Vicky
Pratt)
Johan den Dunnen and Human Genome Variation
Society (HGVS)
CAP Molecular Pathology Committee (Peggy Gulley,
Jeff Kant)

3
Why Standardize Nomenclature?

The human genome is a big world
Precise map and address are necessary
We all should use the same language (like
English)
Successful examples
Metric system
CD system
HLA system

4
Users of Genetic Information

Laboratorians
Researchers
Clinicians / health care providers
Patients / family members
Ultimate goals
To decrease ambiguity and miscommunications
To facilitate research and better practice

5
Standard nomenclature: Cons

Difficult for experts
Solution
Describe standard names along with colloquial
names

6
Standard nomenclature: Pros

Easier to locate mutations
Easier in literature search
Friendly to newcomers
Every expert was once a newcomer
Every expert is a newcomer outside of expertise

7
Variant vs. Mutation vs. Polymorphism

Variant (change): neutral term
Mutation: implies bad change
Polymorphism: different definitions are confusing

8
Genetic (DNA) changes

Somatic vs. germline
Small changes (discussed today)
DNA sequence change
Large changes
DNA (gene) copy number change
Large position change (translocation)

9
Factor V Leiden (so-called 1691G>A or R506Q)
Basic Structure of Standard Nomenclature
F5 NM_000130.3:c.1601G>A (p.Arg534Gln)
F5: HGNC official gene symbol
NM_000130.3: reference sequence (Genbank accession No. and version No.)
c.: coding DNA sequence
1601G>A: guanine-to-adenine at position 1601
p.: protein
Arg534Gln: Arg-to-Gln at codon 534
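As a rough illustration of this structure, the sketch below splits a description into its parts. It assumes the description is written with a colon after the version number and ">" for the substitution, and it only handles simple coding DNA substitutions; the pattern and the function name parse_substitution are made up for this example.

```python
import re

# Assumed layout: GENE ACCESSION.VERSION:c.POSITION REF>ALT (p.Xxx000Yyy)
# Only simple coding DNA substitutions are covered by this sketch.
PATTERN = re.compile(
    r"(?P<gene>[A-Z0-9]+)\s+"
    r"(?P<accession>[A-Z]{2}_\d+)\.(?P<version>\d+):"
    r"c\.(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])"
    r"(?:\s+\(p\.(?P<aa_ref>[A-Z][a-z]{2})(?P<codon>\d+)(?P<aa_alt>[A-Z][a-z]{2})\))?"
)

def parse_substitution(description):
    """Split a simple substitution description into named parts."""
    match = PATTERN.fullmatch(description.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {description!r}")
    parts = match.groupdict()
    # Convert the numeric fields; codon is None when no protein part is given.
    for key in ("version", "position", "codon"):
        if parts[key] is not None:
            parts[key] = int(parts[key])
    return parts

parts = parse_substitution("F5 NM_000130.3:c.1601G>A (p.Arg534Gln)")
print(parts["gene"], parts["accession"], parts["version"])
print(parts["position"], parts["ref"], parts["alt"])
print(parts["aa_ref"], parts["codon"], parts["aa_alt"])
```

Deletions, duplications, intronic and UTR positions would each need their own pattern; a dedicated tool is a better fit for anything beyond a quick check.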
10
Rule

HUGO Gene Nomenclature Committee
(http://www.gene.ucl.ac.uk/nomenclature/index.html)
HGNC-approved official gene symbols should be
used
No Greek letters or Roman numerals (as in IGF-II
and NF??)
No h or m prefix (as in hMLH1 or mTOR)
No - (as in K-ras)
All capital letters
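The rules on this slide can be approximated with a very small check. This is only a sketch of the slide's rules (capital letters and digits, no hyphen, no Greek letters, no lowercase prefix), not the full HGNC guidelines, and the function name follows_slide_rules is made up here. It cannot detect Roman numerals written in capitals.

```python
import re

# Capital Latin letters and digits only, starting with a letter.
# This rejects hyphens, Greek letters and lowercase prefixes such as h or m.
SYMBOL_RE = re.compile(r"[A-Z][A-Z0-9]*")

def follows_slide_rules(symbol):
    """Return True if the symbol passes the rough rules listed on the slide."""
    return SYMBOL_RE.fullmatch(symbol) is not None

for name in ["IGF2", "KRAS", "MLH1", "IGF-II", "K-ras", "hMLH1", "mTOR"]:
    print(name, follows_slide_rules(name))
```

Passing this check does not make a name an approved symbol; the approved symbol still has to be looked up at HGNC.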

11
HGNC-Approved Official Symbols

IGF2 (IGF-II)
CTNNB1 (β-catenin)
NFKBIB (NF??)
ABL1 (ABL)
F2 (prothrombin)
KRAS (K-ras)
CDKN2A (p16)

Describing both official and colloquial names
may be good
12
Here Reference sequence No.
13
Great news! RefSeqGene project

It is often difficult to decide which reference
sequence to use
Ongoing NCBI project
Compiling full length reference sequence for each
gene
The use of standard nomenclature will be much
easier

14
After completion of the RefSeqGene project...
F5 NM_000130.3:c.1601G>A (p.Arg534Gln)
becomes
F5 c.1601G>A (p.Arg534Gln)
The reference sequence is not necessary because each gene has a single coding DNA reference sequence
15
Reference Sequence

Coding DNA (c.)
Genomic DNA (g.)
Amino acid protein (p.)
Mitochondrial DNA (m.)
RNA (r.)

16
Reference Sequence

Coding DNA (c.) (derived from cDNA)
Doesn't cover all transcripts, introns or
intergenic regions
Good for exons, 5'-UTR, 3'-UTR
Genomic DNA (g.)
Sequence present as in the human genome
Complete, but cumbersome (e.g.,
g.21895321_21895324del)
Good for introns and intergenic regions

17
Coding DNA Ref Seq Numbering
[Figure: gene diagram running from the transcription start site to the poly A addition site]
5'-UTR: -30 to -1
Exon 1 (from codon 1, ATG): 1 to 36
Intron 1: 36+1 to 36+… and 37-… to 37-1
Exon 2 (to the end of the stop codon): 37 to 96
3'-UTR: *1 to *170
18
Reference sequence (F5, the Factor V
gene) NM_000130.3
1 gcaagaactg caggggagga ggacgctgcc acccacagcc
tctagagctc attgcagctg 61 ggacagcccg gagtgtggtt
agcagctcgg caagcgctgc ccaggtcctg gggtggtggc 121
agccagcggg agcaggaaag gaagcatgtt cccaggctgc
ccacgcctct gggtcctggt
ATG is 146-148
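The step from mRNA numbering to coding DNA numbering can be sketched as below. This is a minimal sketch that ignores introns; the function name transcript_to_coding is made up, the toy values come from the numbering diagram (30 nt of 5'-UTR, coding region 1 to 96), and the F5 stop codon position is a placeholder assumption, not a value from the slides.

```python
def transcript_to_coding(pos, cds_start, cds_end):
    """Convert a 1-based mRNA position to coding DNA (c.) numbering.

    cds_start is the position of the A of the ATG; cds_end is the last
    nucleotide of the stop codon. Intronic positions are out of scope.
    """
    if pos < cds_start:
        return f"c.-{cds_start - pos}"   # 5'-UTR: counts back from c.-1
    if pos > cds_end:
        return f"c.*{pos - cds_end}"     # 3'-UTR: counts up from c.*1
    return f"c.{pos - cds_start + 1}"    # coding region: A of ATG is c.1

# Toy transcript matching the numbering diagram: ATG at 31, stop codon ends at 126.
for pos in (1, 30, 31, 126, 127, 296):
    print(pos, transcript_to_coding(pos, cds_start=31, cds_end=126))

# F5 NM_000130.3: the ATG is at 146-148 according to the slide.
F5_CDS_END_ASSUMED = 6820  # placeholder for illustration only, not from the slides
print(transcript_to_coding(146, 146, F5_CDS_END_ASSUMED))
print(transcript_to_coding(145, 146, F5_CDS_END_ASSUMED))
```

There is no position c.0: numbering jumps from c.-1 directly to c.1.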
19
To get the coding DNA reference sequence, click CDS
in the Genbank reference sequence record
Click this
20
Coding DNA reference sequence (F5, the Factor V
gene) NM_000130.3
1 atgttcccag gctgcccacg cctctgggtc ctggtggtct
tgggcaccag ctgggtaggc 61 tgggggagcc aagggacaga
agcggcacag ctaaggcagt tctacgtggc tgctcagggc 121
atcagttgga gctaccgacc tgagcccaca aactcaagtt
tgaatctttc tgtaacttcc
Now start with ATG
21
Important Rule

All variants should be described at the DNA level
(along with amino acid change)
Description at only amino acid level is ambiguous
DNA change may have effect on transcription,
splicing, or RNA stability (which cannot be
described by amino acid change)

22
Amino acid 1 vs. 3 letter code

One-letter
Simple
NCBI uses this
Currently in wide use
The prefix p. makes nomenclature unequivocal

Three-letter
HGVS recommends
Prevent mistakes and confusion (especially by lay
people)
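Converting a one-letter description to the three-letter form is mechanical, as in the sketch below. It covers only simple substitutions of the 20 standard amino acids; the function name to_three_letter is made up for this example.

```python
import re

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

def to_three_letter(change):
    """Rewrite e.g. 'R506Q' or 'p.R506Q' as 'p.Arg506Gln'."""
    match = re.fullmatch(r"(?:p\.)?([A-Z])(\d+)([A-Z])", change)
    if match is None:
        raise ValueError(f"not a simple amino acid substitution: {change!r}")
    ref, number, alt = match.groups()
    if ref not in ONE_TO_THREE or alt not in ONE_TO_THREE:
        raise ValueError(f"unknown amino acid code in {change!r}")
    return f"p.{ONE_TO_THREE[ref]}{number}{ONE_TO_THREE[alt]}"

print(to_three_letter("R506Q"))
print(to_three_letter("p.E6V"))
# Without a prefix, a nucleotide change can be misread as an amino acid change:
print(to_three_letter("G20210A"))
```

The last call shows why the prefix matters: the same string reads as a Gly-to-Ala change at codon 20210 when nothing marks it as a DNA change.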

23
Exon / Intron Numbering

Not solid
Many exon 0, exon 2a, exon 2b, etc.
Why is 3rd exon not exon 3 (but exon 2, or
exon 2b)?
Exon/intron numbering should not be used for
standard nomenclature
c.23-2C>T, instead of c.IVS1-2C>T

24
2 variants in 1 gene

Compound heterozygote (trans)
[c.515C>T]+[c.2598delT]
[c.515C>T]+[?] (unknown mutation)
2 variants in 1 allele (cis)
[c.515C>T; c.2598delT]
2 variants, alleles unknown
c.515C>T(+)c.2598delT

25
SNP

AF478696.1:g.21538G>A (HGVS nomenclature)
rs2306220 (Genbank database SNP ID)
(http://www.ncbi.nlm.nih.gov/sites/entrez)

26
Mutalyzer

www.lovd.nl/mutalyzer/
Helps naming of variants

27
Current issues

Intronic nucleotides do not exist in the full-length
coding DNA reference sequence
Where is the nucleotide, and which one is it?
(e.g., CFTR NM_000492.3:c.3718-2477C>T)
Solutions
1. Use of genomic reference sequence
CFTR AY848832.1:g.40725C>T
2. RefSeqGene project

28
Q & A
29
Question 1. Deletion
CFTR NM_000492.3
1501 accattaaag aaaatatcat ctttggtgtt tcctatgatg aatatagata cagaagcgtc
1561 atcaaagcat gccaactaga agaggacatc tccaagtttg cagagaaaga caatatagtt
Codon: 506 507 508 509 510
Wild-type: ATC ATC TTT GGT GTT (Ile Ile Phe Gly Val)
Mutant: ATC ATT GGT GTT (Ile Ile Gly Val)
30
Question 1
CFTR NM_000492.3
1501 accattaaag aaaatatcat ctttggtgtt tcctatgatg aatatagata cagaagcgtc
1561 atcaaagcat gccaactaga agaggacatc tccaagtttg cagagaaaga caatatagtt
Codon: 506 507 508 509 510
Wild-type: ATC ATC TTT GGT GTT (Ile Ile Phe Gly Val)
Mutant: ATC ATT GGT GTT (Ile Ile Gly Val)
Marked nucleotides: 1521 to 1523
31
Rules

c. based on coding DNA ref seq
g. based on genomic DNA ref seq
A range of nucleotide numbers is indicated by _
del for deletion

32
Answer 1
CFTR NM_000492.3
1501 accattaaag aaaatatcat ctttggtgtt tcctatgatg aatatagata cagaagcgtc
1561 atcaaagcat gccaactaga agaggacatc tccaagtttg cagagaaaga caatatagtt
Codon: 506 507 508 509 510
Wild-type: ATC ATC TTT GGT GTT (Ile Ile Phe Gly Val)
Mutant: ATC ATT GGT GTT (Ile Ile Gly Val)
CFTR NM_000492.3:c.1521_1523del (p.F508del)
33
Answer 1
CFTR NM_000492.3
1501 accattaaag aaaatatcat ctttggtgtt tcctatgatg aatatagata cagaagcgtc
1561 atcaaagcat gccaactaga agaggacatc tccaagtttg cagagaaaga caatatagtt
Codon: 506 507 508 509 510
Wild-type: ATC ATC TTT GGT GTT (Ile Ile Phe Gly Val)
Mutant: ATC ATT GGT GTT (Ile Ile Gly Val)
CFTR NM_000492.3:c.1521_1523del ≠ codon 508
(TTT) del
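The answer can be checked against the reference fragment on the slide. The sketch below applies a deletion to the first 60 nucleotides shown (positions 1501 to 1560) and reads codons 506 onward; the helper names delete_range and codons_from are made up for this example.

```python
FRAGMENT_START = 1501  # position of the first nucleotide shown on the slide
FRAGMENT = ("accattaaag" "aaaatatcat" "ctttggtgtt"
            "tcctatgatg" "aatatagata" "cagaagcgtc").upper()

def delete_range(seq, seq_start, first, last):
    """Remove nucleotides first..last (1-based, inclusive) from the fragment."""
    i = first - seq_start
    j = last - seq_start + 1
    return seq[:i] + seq[j:]

def codons_from(seq, seq_start, first_codon, count):
    """Return `count` codons, starting at codon number `first_codon`."""
    start = (first_codon - 1) * 3 + 1 - seq_start
    return [seq[start + 3 * k: start + 3 * k + 3] for k in range(count)]

wild_type = codons_from(FRAGMENT, FRAGMENT_START, 506, 5)
mutant = codons_from(delete_range(FRAGMENT, FRAGMENT_START, 1521, 1523),
                     FRAGMENT_START, 506, 4)
codon_508_deleted = codons_from(delete_range(FRAGMENT, FRAGMENT_START, 1522, 1524),
                                FRAGMENT_START, 506, 4)

print("wild-type      ", " ".join(wild_type))
print("c.1521_1523del ", " ".join(mutant))
print("codon 508 (TTT)", " ".join(codon_508_deleted))
```

Deleting 1521 to 1523 (CTT) reproduces the mutant row of the slide, ATC ATT GGT GTT. Deleting codon 508 itself (TTT, 1522 to 1524) gives ATC ATC GGT GTT: the same protein, but a different DNA change.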
34
Question 2. Hemoglobin βS (sickle cell anemia) (Glu6Val, E6V)
Coding DNA reference sequence NM_000518.4
A>T (codon 7: GAG>GTG)
Start
1 atggtgcatc tgactcctga ggagaagtct gccgttactg ccctgtgggg caaggtgaac
Codons 1 to 7: atg gtg cat ctg act cct gag
What is wrong with Glu6Val (in terms of standard)?
What is the official gene symbol for the hemoglobin beta?
What is nomenclature for this common variant?
35
Answer 2. Hemoglobin βS (sickle cell anemia) (Glu6Val, E6V)
Coding DNA reference sequence NM_000518.4
A>T (codon 7: GAG>GTG)
Start
1 atggtgcatc tgactcctga ggagaagtct gccgttactg ccctgtgggg caaggtgaac
Codons 1 to 7: atg gtg cat ctg act cct gag
What is wrong with Glu6Val (in terms of standard)? This is codon 7 in the primary transcript.
What is the official gene symbol for the hemoglobin beta? HBB
What is nomenclature for this common variant? HBB NM_000518.4:c.20A>T (p.Glu7Val) (so-called hemoglobin βS)
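The codon arithmetic behind this answer can be sketched as follows, using the first 30 nucleotides of the coding sequence shown on the slide. The helper names are made up for this example.

```python
CDS = "atggtgcatctgactcctgaggagaagtct".upper()  # c.1 to c.30 from the slide

def codon_number(c_pos):
    """Codon that contains coding position c_pos (codon 1 is the ATG)."""
    return (c_pos - 1) // 3 + 1

def codon_at(seq, number):
    """Return the three nucleotides of the given codon."""
    return seq[(number - 1) * 3: number * 3]

def substitute(seq, c_pos, ref, alt):
    """Apply a single nucleotide substitution after checking the reference base."""
    if seq[c_pos - 1] != ref:
        raise ValueError(f"reference base at c.{c_pos} is {seq[c_pos - 1]}, not {ref}")
    return seq[:c_pos - 1] + alt + seq[c_pos:]

number = codon_number(20)
before = codon_at(CDS, number)
after = codon_at(substitute(CDS, 20, "A", "T"), number)
print(f"c.20A>T falls in codon {number}: {before}>{after}")
```

Counting from the ATG places the change in codon 7, which is why the standard name is p.Glu7Val rather than the traditional Glu6Val.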
36
Question 3. Mutation in 3'-UTR
Codon 177
Amino acids: Ala Asn Leu Ala Ser STOP
Wild-type: CAA GCC AAT CTT GCT AGC TAG AGT TTT GGT TCT
Mutant (after the stop codon): AGT AAG GTT CT C
Genbank accession number is NM_00001 version 3. Gene symbol AAA1
What is nomenclature for this mutation based on coding DNA reference sequence?
37
Rules

c. based on coding DNA ref seq
* for 3'-UTR numbering
c.*1 indicates the first nucleotide after the
stop codon
A range of nucleotide numbers is indicated by _
delins for deletion-insertion

38
Answer 3. Mutation in 3'-UTR
Codon 177
Amino acids: Ala Asn Leu Ala Ser STOP
Wild-type: CAA GCC AAT CTT GCT AGC TAG AGT TTT GGT TCT
Marked nucleotides after the stop codon: *4 to *6 (TTT)
Mutant (after the stop codon): AGT AAG GTT CT C
Genbank accession number is NM_00001 version 3. Gene symbol AAA1
AAA1 NM_00001.3:c.*4_*6delinsAA
NOT AAA1 NM_00001.3:c.*4_*6TTT>AA (> is only for a single nucleotide substitution)
39
Question 4. Intronic duplication mutation
Amino acid number (p.): 61 at the first Ala
Amino acids: Ala Asn Leu Ala
Nucleotides: CTTAATAG (end of intron) GCC AAT CTT GCT
Nucleotide number (c.): 181 at the first nucleotide of GCC
Insertion of A
Genbank accession number is NM_00001 version 3. Gene symbol AAA1
What is nomenclature for this mutation based on coding DNA reference sequence?
40
Rules

c. based on coding DNA ref seq
An intron nucleotide can be indicated as
c.1209+1 or c.1210-3 (examples)
dup for duplication
The dup designation is preferred to ins,
because it is simpler, and implicates mechanism
(duplication)

41
Answer 4
Amino acids: Ala Asn Leu Ala
Nucleotides: CTTAATAG (end of intron) GCC AAT CTT GCT
Nucleotide number (c.): 181 at the first nucleotide of GCC
Insertion of A
Genbank accession number is NM_00001 version 3. Gene symbol AAA1
AAA1 NM_00001.3:c.181-2dup
dup is preferred to AAA1 NM_00001.3:c.181-2_181-1insA because it is simpler and implies the mechanism of mutation
42
Q5. SNP in MGMT 5'-UTR (non-coding exon)
69 Kb
C>T change
Intron 1
Exon 1
Caccgtttgcgacttg
gtgagtgtctgggtcgcctcgctcccggaagagtg
cDNA reference sequence MGMT NM_002412.2 Includes

Exon 1
Exon 2
ATG codon 1
40 bp
Ogino et al. Carcinogenesis 2007 shows this
common SNP is strongly linked to MGMT
methylation in colorectal cancer
43
Rules

c. based on coding DNA ref seq
A nucleotide in 5'-UTR can be indicated as c.-1
or c.-23 (examples)

44
A5. SNP in MGMT 5'-UTR (non-coding exon)
MGMT NM_002412.2:c.-56C>T (rs16906252)
Intron 1
Exon 1
Caccgtttgcgacttg
gtgagtgtctgggtcgcctcgctcccggaagagtg
cDNA reference sequence MGMT NM_002412.2 Includes

Exon 1
Exon 2
ATG codon 1
40 bp
Ogino et al. Carcinogenesis 2007
45
Question 6. First codon mutation
Amino acid numbers (p.): 1 2 3 4 5 6 7 8 9 10 11
Amino acids: Met Ala Asn Leu Ala Ser Pro Arg Phe Gly Ser
Nucleotides: CAA ATG GCC AAT CTT GCT AGC CCT AGA TTT GGT TCT
Nucleotide numbers (c.), first nucleotide of each codon: 1 4 7 10 13 16 19 22 25 28 31
Mutation: T>G
Genbank accession number is NM_00001 version 3. Gene symbol AAA1
What is nomenclature for this mutation based on coding DNA reference sequence?
46
Answer 6
Amino acid numbers (p.): 1 2 3 4 5 6 7 8 9 10 11
Amino acids: Met Ala Asn Leu Ala Ser Pro Arg Phe Gly Ser
Nucleotides: CAA ATG GCC AAT CTT GCT AGC CCT AGA TTT GGT TCT
Nucleotide numbers (c.), first nucleotide of each codon: 1 4 7 10 13 16 19 22 25 28 31
Mutation: T>G
Genbank accession number NM_00001 version 3. Gene symbol AAA1
AAA1 NM_00001.3:c.2T>G (p.? or p.0)
p.Met1Arg doesn't make sense.
47
Question 7. Inversion
1 atgcagaggt cgcctctgga aaaggccagc gttgtctcca aacttttttt cagctggacc
61 agaccaattt tgaggaaagg atacagacag cgcctggaat tgtcagacat ataccaaatc
121 ccttctgttg attctgctga caatctatct gaaaaattgg aaagagaatg ggatagagag
181 ctggcttcaa agaaaaatcc taaactcatt aatgcccttc ggcgatgttt tttctggaga
241 tttatgttct atggaatctt tttatattta ggggaagtca ccaaagcagt acagcctctc
301 ttactgggaa gaatcatagc ttcctatgac ccggataaca aggaggaacg ctctatcgcg
361 atttatctag gcataggctt atgccttctc tttattgtca ggacactgct cctacaccca
Change to actgtt
Coding DNA reference sequence for the AAA1 gene
Genbank No. NM_00001 Version 3
48
Answer 7. Inversion
1 atgcagaggt cgcctctgga aaaggccagc gttgtctcca aacttttttt cagctggacc
61 agaccaattt tgaggaaagg atacagacag cgcctggaat tgtcagacat ataccaaatc
121 ccttctgttg attctgctga caatctatct gaaaaattgg aaagagaatg ggatagagag
181 ctggcttcaa agaaaaatcc taaactcatt aatgcccttc ggcgatgttt tttctggaga
241 tttatgttct atggaatctt tttatattta ggggaagtca ccaaagcagt acagcctctc
301 ttactgggaa gaatcatagc ttcctatgac ccggataaca aggaggaacg ctctatcgcg
361 atttatctag gcataggctt atgccttctc tttattgtca ggacactgct cctacaccca
Change to actgtt
Coding DNA reference sequence for the AAA1 gene
Genbank No. NM_00001 Version 3
AAA1 NM_00001.3:c.395_400inv
AAA1 NM_00001.3:c.395_400delinsACTGTT
But the inv description is simpler and can implicate the mechanism of this mutation
49
Question 8. Known polymorphisms in repeat sequence
Genomic reference sequence AJ574948.1 (CFTR intron 8, 5T/7T/9T)
Exon 9 starts here (No. 1210 in coding DNA ref seq NM_000492.3)
1 tataattatg tactataaag taataatgta tacagtgtaa tggatcatgg gccatgtgct
61 tttcaaacta attgtacata aaacaagcat ctattgaaaa tatctgacaa actcatcttt
121 tatttttgat gtgtgtgtgt gtgtgtgtgt gtttttttaa cagggatttg gggaattatt
181 tgagaaagca aaacaaaaca ataacaatag aaaaacttct aatggtgatg acagcctctt
241 cttcagtaat ttctcacttc ttggtactcc tgtcctgaaa gatattaatt tcaagataga
301 aagaggacag ttgttggcgg ttgctggatc cactggagca ggcaaggtag ttcttttgtt
361 cttcactatt aagaacttaa tttggtgtcc atgtctcttt ttttttctag tttgtagtgc
421 tggaaggtat ttttggagaa attcttacat gagcattagg agaatgt
What is collective nomenclature for the repeat polymorphisms?
What is nomenclature for the 5T variant?
What is nomenclature for the 9T variant?
50
Rules for repeat polymorphism

Use the number of the first nucleotide
Followed by repeated unit (e.g., T or CA)
Describe the number of repeats in square brackets
[8]: a particular polymorphism (8 repeats)
(5_10): uncertain between 5 and 10 repeats

51
Answer 8. CFTR intron 8, 5T/7T/9T
g.152 or c.1210-12
AJ574948.1
Exon 9 starts here
1 tataattatg tactataaag taataatgta tacagtgtaa tggatcatgg gccatgtgct
61 tttcaaacta attgtacata aaacaagcat ctattgaaaa tatctgacaa actcatcttt
121 tatttttgat gtgtgtgtgt gtgtgtgtgt gtttttttaa cagggatttg gggaattatt
181 tgagaaagca aaacaaaaca ataacaatag aaaaacttct aatggtgatg acagcctctt
241 cttcagtaat ttctcacttc ttggtactcc tgtcctgaaa gatattaatt tcaagataga
301 aagaggacag ttgttggcgg ttgctggatc cactggagca ggcaaggtag ttcttttgtt
361 cttcactatt aagaacttaa tttggtgtcc atgtctcttt ttttttctag tttgtagtgc
421 tggaaggtat ttttggagaa attcttacat gagcattagg agaatgt
Repeat polymorphisms are collectively described as CFTR AJ574948.1:g.152T(5_9) or CFTR NM_000492.3:c.1210-12T(5_9)
The 5T is CFTR AJ574948.1:g.152T[5] or CFTR NM_000492.3:c.1210-12T[5]
The 9T is CFTR AJ574948.1:g.152T[9] or CFTR NM_000492.3:c.1210-12T[9]
52
Question 9. Known dinucleotide repeat polymorphisms
Genomic DNA reference sequence AJ00001.1
1 tataattatg tactataaag taataatgta tacagtgtaa tggatcatgg gccatgtgct
61 tttcaaacta attgtacata aaacaagcat ctattgaaaa tatctgacaa actcatcttt
121 tatttttgat gtgtgtgtgt gtgtgtgtgt gtttttttaa cagggatttg gggaattatt
181 tgagaaagca aaacaaaaca ataacaatag aaaaacttct aatggtgatg acagcctctt
241 cttcagtaat ttctcacttc ttggtactcc tgtcctgaaa gatattaatt tcaagataga
301 aagaggacag ttgttggcgg ttgctggatc cactggagca ggcaaggtag ttcttttgtt
361 cttcactatt aagaacttaa tttggtgtcc atgtctcttt ttttttctag tttgtagtgc
421 tggaaggtat ttttggagaa attcttacat gagcattagg agaatgt
This TG repeat polymorphism ranges from 8 to 12 repeats in the general population
What is collective nomenclature for the repeat polymorphisms?
What is nomenclature for the 8-repeat allele?
What is nomenclature for the 11-repeat allele?
53
Answer 9. Known dinucleotide repeat polymorphisms
The repeat starts at position 130
1 tataattatg tactataaag taataatgta tacagtgtaa tggatcatgg gccatgtgct
61 tttcaaacta attgtacata aaacaagcat ctattgaaaa tatctgacaa actcatcttt
121 tatttttgat gtgtgtgtgt gtgtgtgtgt gtttttttaa cagggatttg gggaattatt
181 tgagaaagca aaacaaaaca ataacaatag aaaaacttct aatggtgatg acagcctctt
241 cttcagtaat ttctcacttc ttggtactcc tgtcctgaaa gatattaatt tcaagataga
301 aagaggacag ttgttggcgg ttgctggatc cactggagca ggcaaggtag ttcttttgtt
361 cttcactatt aagaacttaa tttggtgtcc atgtctcttt ttttttctag tttgtagtgc
421 tggaaggtat ttttggagaa attcttacat gagcattagg agaatgt
The repeat polymorphisms are collectively described as AJ00001.1:g.130TG(8_12)
The 8-repeat allele is described as AJ00001.1:g.130TG[8]
The 11-repeat allele is described as AJ00001.1:g.130TG[11]
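The start position and repeat count can be read off the reference fragment with a short script. This sketch uses the 60 nucleotides shown from position 121; the function names are made up, and the output puts the count in square brackets, which is an assumption about the notation (the older HGVS style for a specific allele).

```python
FRAGMENT_START = 121  # position of the first nucleotide in this fragment
FRAGMENT = ("tatttttgat" "gtgtgtgtgt" "gtgtgtgtgt"
            "gtttttttaa" "cagggatttg" "gggaattatt")

def count_repeats(seq, seq_start, first, unit):
    """Count consecutive copies of `unit` beginning at position `first`."""
    i = first - seq_start
    n = 0
    while seq.startswith(unit, i):
        n += 1
        i += len(unit)
    return n

def describe_repeat(seq, seq_start, first, unit):
    """Format the reference allele as g.<first><UNIT>[<count>]."""
    n = count_repeats(seq, seq_start, first, unit)
    return f"g.{first}{unit.upper()}[{n}]"

print(describe_repeat(FRAGMENT, FRAGMENT_START, 130, "tg"))
print(describe_repeat(FRAGMENT, FRAGMENT_START, 152, "t"))
```

In this fragment the TG tract starts at 130 with 11 copies and the T tract starts at 152 with 7 copies, so the sequence shown carries the 11-repeat TG allele and the 7T allele.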
54
Question 10. Prothrombin G20210A
DNA reference sequence AF478696.1
Stop
3'-UTR
21421 attgatcagt ttggagagta gggggccact catattctgg gctcctggaa ccaatcccgt
21481 gaaagaatta tttttgtgtt tctaaaacta tggttcccaa taaaagtgac tctcagcgag
G>A
What is wrong with G20210A (for the nucleotide change)?
What is the official gene symbol for the prothrombin gene?
What is nomenclature for this common variant?
55
Answer 10. Prothrombin G20210A
DNA reference sequence AF478696.1
Stop
3'-UTR
21421 attgatcagt ttggagagta gggggccact catattctgg gctcctggaa ccaatcccgt
21481 gaaagaatta tttttgtgtt tctaaaacta tggttcccaa taaaagtgac tctcagcgag
G>A
What is wrong with G20210A (for the nucleotide change)? This can mean a Gly-to-Ala change at codon 20,210.
What is the official gene symbol for the prothrombin gene? F2
What is nomenclature for this common variant?
F2 AF478696.1:g.21538G>A
F2 AF478696.1:c.*97G>A (so-called prothrombin G20210A)
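The link between the genomic position and the 3'-UTR position can be checked on the fragment shown. This sketch assumes that the "Stop" label refers to the tag at 21439 to 21441 (read off the sequence, not stated on the slide) and that no intron lies between the stop codon and the variant.

```python
FRAGMENT_START = 21421  # position of the first nucleotide shown on the slide
FRAGMENT = ("attgatcagt" "ttggagagta" "gggggccact" "catattctgg"
            "gctcctggaa" "ccaatcccgt" "gaaagaatta" "tttttgtgtt"
            "tctaaaacta" "tggttcccaa" "taaaagtgac" "tctcagcgag")

STOP_CODON_START = 21439  # assumption: the 'tag' marked "Stop" on the slide
VARIANT_POSITION = 21538

def base_at(pos):
    """Reference nucleotide at a genomic position within the fragment."""
    return FRAGMENT[pos - FRAGMENT_START]

stop_codon = "".join(base_at(STOP_CODON_START + k) for k in range(3))
stop_codon_end = STOP_CODON_START + 2

# Positions after the stop codon are numbered *1, *2, ... in c. notation.
utr_offset = VARIANT_POSITION - stop_codon_end

print("stop codon:", stop_codon)
print("reference base at g.21538:", base_at(VARIANT_POSITION))
print(f"g.{VARIANT_POSITION} corresponds to c.*{utr_offset}")
```

Under these assumptions the reference base is g and the offset is 97, which agrees with the two descriptions given in the answer.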
56
Q11. Deletion of non-coding exon
Gene symbol AAA1 (NM_00001.1); stop codon ends at c.885
Exon 7 ends at c.*4
Exon 8 spans c.*5 to c.*573
Deletion breakpoints: ? (before exon 8) and ? (after exon 8)
Exon 8 deletion, but exact positions of start and end of deletion unknown
What is nomenclature for this exon 8 deletion?
57
Rules

For uncertain positions, use ?, but be as
specific as possible
Exon/intron numbers should not be used
Exon/intron numbering is neither uniform nor
permanent

58
A11. Deletion of non-coding exon
Gene symbol AAA1 (NM_00001.1); stop codon ends at c.885
Exon 7 ends at c.*4
Exon 8 spans c.*5 to c.*573
Deletion breakpoints: ? (before exon 8) and ? (after exon 8)
Exon 8 deletion, but exact positions of start and end of deletion unknown
What is nomenclature for this exon 8 deletion?
AAA1 NM_00001.1:c.*5-?_*573+?del
59
Take home messages

All DNA changes should be described at DNA level
(with amino acid changes if known)
Use HGNC-approved official gene symbols
Standard names can be accompanied by (colloquial
names)
