Title: Structure Prediction and Modeling of a Eukaryotic Member of the Major Facilitator Superfamily
1Structure Prediction and Modeling of a Eukaryotic
Member of the Major Facilitator Superfamily
2Major Facilitator Superfamily (MFS)
- MEMBRANE TRANSPORT
- Largest secondary transporter protein family known so far, with more than 1000 members identified. [1]
- Use a solute gradient to drive the translocation of substrates such as ions, sugars, amino acids, peptides and other hydrophilic solutes. [2]
- Typically 400-600 amino acids long.
- 12 transmembrane α-helices, with both the N- and C-termini in the cytosol. [3]
- Two six-helix halves connected by a central loop.
- Found in all three kingdoms of living organisms.
3Identifying Templates and Targets
- TEMPLATES: two known structures
- Lactose Permease (LacY), E. coli
- Glycerol-3-Phosphate Transporter (GlpT), E. coli
- Sequence identity between the two is negligible (9%).
- CE algorithm for structural alignment indicates that they superimpose over most of their chain length (RMSD = 3.7 Å).
- 1st GOAL: To find a eukaryotic member of the MFS that shows enough sequence identity with one of the known structures to allow reasonable alignment.
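The RMSD quoted for the structural superposition can be illustrated with a minimal sketch. It assumes the two structures have already been superimposed and that equivalent atoms are paired in order. CE performs its own alignment and superposition, which this sketch does not reproduce, and the coordinates are hypothetical toy values rather than atoms from 1PV6 or 1PW4.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of already-superimposed (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical toy coordinates in Å (not taken from any PDB entry)
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0), (2.0, 0.0, 3.0)]
toy_rmsd = rmsd(toy_a, toy_b)
print(toy_rmsd)
```

Every toy atom pair is 3 Å apart, so the value printed is 3.0.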
4Function and Mechanism of LacY and GlpT
Both use a solute gradient to drive translocation of substrate.
- LacY mediates the coupled transport of lactose and H+.
- GlpT catalyzes the exchange of glycerol-3-phosphate for phosphate.
- Alternating-Access Model
- Outward-facing conformation exposed to the extracellular side.
- Inward-facing conformation exposed to the cytoplasm.
- Ribbon Representation
- Amino-terminal domain (blue).
- Carboxyl-terminal domain (green).
- Bends and other irregularities in the α-helices are indicated by deviations from an ideally straight and continuous helical ribbon.
5Identifying Templates and Targets
- Lactose Permease (LacY)
- Obtained the protein PDB file from the Protein Data Bank (1PV6) and extracted the amino acid sequence in FASTA format. www.rcsb.org/pdb
- Searched for a TARGET with high sequence identity using NCBI BLAST. www.ncbi.nlm.nih.gov
- 1. General search against all organisms: 2 iterations, threshold 0.005. Hits were mainly bacterial proteins.
- 2. Saved the results as a profile (PSSM).
- 3. More sensitive search using the original sequence as well as the saved profile as input, while limiting the search to eukaryotes: 2 iterations, threshold 0.01.
- Unable to identify a suitable target.
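The step of extracting a FASTA sequence from a PDB file can be sketched as follows. This is a minimal illustration that reads SEQRES records for one chain. The function name, the header text and the toy record are assumptions made for the example, and it is not necessarily how the sequence was extracted here (later slides mention extracting the template sequence from the structure using Insight).

```python
# Standard three-letter to one-letter amino acid codes
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def seqres_to_fasta(pdb_text, chain_id, header):
    """Build a FASTA record from the SEQRES records of one chain."""
    residues = []
    for line in pdb_text.splitlines():
        # In the PDB format the chain ID is in column 12 and residue
        # names start at column 20 (1-based) of each SEQRES record.
        if line.startswith("SEQRES") and line[11:12] == chain_id:
            residues.extend(line[19:].split())
    # Unknown or modified residues are written as X
    sequence = "".join(THREE_TO_ONE.get(name, "X") for name in residues)
    return ">" + header + "\n" + sequence

# Hypothetical one-line toy record (not the real 1PV6 or 1PW4 SEQRES section)
toy_pdb = "SEQRES   1 A    5  PHE LYS PRO ALA PRO"
toy_fasta = seqres_to_fasta(toy_pdb, "A", "toy")
print(toy_fasta)
```

SEQRES lists the full construct, so residues missing from the coordinates still appear in the sequence. A sequence taken from the ATOM records would instead show those residues as chain breaks.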
6Identifying Templates and Targets
- Glycerol-3-Phosphate Transporter (GlpT)
- Obtained the protein PDB file from the Protein Data Bank (1PW4) and extracted the amino acid sequence in FASTA format. www.rcsb.org/pdb
- Searched for a TARGET with high sequence identity using NCBI BLAST. www.ncbi.nlm.nih.gov
- 1. General search against all organisms: 2 iterations, threshold 0.005.
- 2. Obtained a suitable TARGET: Glucose-6-Phosphate Translocase, Homo sapiens.
- 3. Utilized BLink to identify several close eukaryotic targets for use in multiple sequence alignments.
7Multiple sequence alignment
- Only template and target: initial review
- Both templates, target and close targets
- 15 proteins similar to the target selected from different species to get a better alignment
- Only template and target extracted
- Around 30% similarity between template and target
- Well-distributed alignment
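A percentage such as the template-target similarity above is usually computed column by column over a pairwise alignment. The sketch below computes percent identity (exact matches only, which is stricter than similarity) over columns where neither sequence has a gap. The choice of denominator is an assumption, since alignment tools differ on it. The two ten-residue fragments are taken from the template and target alignment shown in the following slides.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Short fragment of the GlpT / G6PT alignment (a well-conserved stretch)
template_fragment = "DLGFALSGIS"
target_fragment = "DLGFITSSQS"
fragment_identity = percent_identity(template_fragment, target_fragment)
print(fragment_identity)
```

This fragment scores well above the overall figure because it comes from a conserved region. The value for the full alignment has to be computed over all columns.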
8Alignment using FUGUE
                        10        20        30        40        50
hs1pw4a (5)     fkpaphkarlpaaeidptYrrlrwqIflGIffGyaAYylVRkNFALAMpy
QUERY g6pt      -------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPS
                aaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaa

                        60        70        80        90       100
hs1pw4a (55)    L-veqgfsrgDLGfALSGISiAygfSkfimgsvSdrsnPrvfLPaGLilA
QUERY g6pt      LVEEIPLDKDDLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLV
                aaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa

                       110       120       130       140       150
hs1pw4a (104)   AavMlfMGfvpwATssiavMfvlLflCGwfQGmGwpPCgrTmvhwwsqke
QUERY g6pt      GLVNIFFAWSSTV----PVFAALWFLNGLAQGLGWPPCGKVLRKWFEPSQ
                aaaaaaaaa aaaa aaaaaaaaaaaaaaa aaaaaaaaa a

                       160       170       180       190       200
hs1pw4a (154)   rggivsVwncAhNvggGiPPllFllGmawfndwhAALYmPAfcAilvAlf
QUERY g6pt      FGTWWAILSTSMNLAGGLGPILATILAQSY-SWRSTLALSGALCVVVSFL
                aaaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaaaa

                       210       220       230       240       250
hs1pw4a (204)   AfamMrdTpqsCglppiee-----ykndtakqifmqyVlpnklLwyIAiA
QUERY g6pt      CLLLIHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLLSPYLWVLSTG
                aaaa aaaaaa aaaaaaaaa

                       260       270       280       290       300
hs1pw4a (262)   NvfVyLLRYGiLDwSPtylkevKhfaldkSSwAYflYEyagipGTllCgw
QUERY g6pt      YLVVFGVKTCCTDWGQFFLIQEKGQSALVGSSYMSALEVGGLVGSIAAGY
                aaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa

                       310       320       330       340       350
hs1pw4a (312)   msdkv----------frgnrGaTGvfFMtlVtiaTivywmnpagNptvdm
QUERY g6pt      LSDRAMAKAGLSNYGNPRHGLLLFMMAGMTVSMYLFRVTVTSDSPKLWIL
                aaaa aaaaaaaaaaaaaaaaaa aaaaa

                       360       370       380       390       400
hs1pw4a (352)   iCmivIGflIyGPvmLIglHAleLApkkAagtAagfTglfGylgGSvaAs
QUERY g6pt      VLGAVFGFSSYGPIALFGVIANESAPPNLCGTSHAIVGLMANVGGFL-AG
                aaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa

                       410       420       430       440       450
hs1pw4a (402)   aiVGytvdffgwdgGfmvMigGSilAvilLivVmigekrrheqllqelvp
QUERY g6pt      LPFSTIAKHYSWSTAFWVAEVICAASTAAFFLLRNIRTKMGRVSKKAE--
                aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa33333
9MPSA - only template and target
P_1P4W         FKPAPHKARLPAAEIDPTYRRLRWQIFLGIFFGYAAYYLVRKNFALAMPYLVEQG-FSRG
GLUCOSE6HUMAN  -------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPSLVEEIPLDKD
               . .. ..

P_1P4W         DLGFALSGISIAYGFSKFIMGSVSDRSNPRVFLPAGLILAAAVMLFMGFVPWATSSIAVM
GLUCOSE6HUMAN  DLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLVGLVNIFFAWS----STVPVF
               . . .. ... . .

P_1P4W         FVLLFLCGWFQGMGWPPCGRTMVHWWSQKERGGIVSVWNCAHNVGGGIPPLLFLLGMAWF
GLUCOSE6HUMAN  AALWFLNGLAQGLGWPPCGKVLRKWFEPSQFGTWWAILSTSMNLAGGLGPILATI-LAQS
               . . . . . .

P_1P4W         NDWHAALYMPAFCAILVALFAFAMMRDTPQSCGLP-----PIEEYKNDTAKQIFMQYVLP
GLUCOSE6HUMAN  YSWRSTLALSGALCVVVSFLCLLLIHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLL
               . .. .. . ..

P_1P4W         NKLLWYIAIANVFVYLLRYGILDWSPTYLKEVKHFALDKSSWAYFLYEYAGIPGTLLCGW
GLUCOSE6HUMAN  SPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQEKGQSALVGSSYMSALEVGGLVGSIAAGY
               . . . . . . .

P_1P4W         MSDKVFRGN--------RGATGVFFMTLVTIATIVYWMNPAGN--PTVDMICMIVIGFLI
GLUCOSE6HUMAN  LSDRAMAKAGLSNYGNPRHGLLLFMMAGMTVSMYLFRVTVTSDSPKLWILVLGAVFGFSS
               . . . .

P_1P4W         YGPVMLIGLHALELAPKKAAGTAAGFTGLFGYLGGSVAASAIVGYTVDFFGWDGGFMVMI
GLUCOSE6HUMAN  YGPIALFGVIANESAPPNLCGTSHAIVGLMANVGGFLAGLPFSTIAKHYSWSTAFWVAEV
               . ... . . . . .

P_1P4W         GGSILAVILLIVVMIGEKRRHEQLLQELVP
GLUCOSE6HUMAN  ICAASTAAFFLLRNIRTKMGRVSKKAE---
               . .
10Extracted template-target
P_1PW4      -------FKPAPHKARLPAAEIDPTYRRLRWQIFLGIFFGYAAYYLVRKNFALAMPYLVE
gi2765461e  --------------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPSLVE
            .

P_1PW4      QGFS---RGDLGFALSGISIAYGFSKFIMGSVSDRSNPRVFLPAGLILAAAVMLFMGFVP
gi2765461e  EIPLD--KDDLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLVGLVNIFFAWSS
            . . .

P_1PW4      WATSS--IAVMFVLLFLCGWFQGMGWPPCGRTMVHWWSQKERGGIVSVWNCAHN--VGGG
gi2765461e  TVP------VFAALWFLNGLAQGLGWPPCGKVLRKWFEPSQFGTWWAILSTSMN--LAGG
            . . ..

P_1PW4      IPP-------LLFLLGMAWFN-----------DWHAALYMPAFCAILVALFAFAMMRDTP
gi2765461e  LGP-------ILATILAQSYS------------WRSTLALSGALCVVVSFLCLLLIHNEP
            . ..

P_1PW4      QSCGLPPIEEYKNDT-------------------AKQIFMQYVLPNKLLWYIAIANVFVY
gi2765461e  ADVGLRNLDPMPSEG--------------KKGSLKEESTLQELLLSPYLWVLSTGYLVVF
            . . .

P_1PW4      LLRYGILDWSPTYLKEVKHFALDK-SSWAYFLYEYAGIPGTLLCGWMSDKVFR-------
gi2765461e  GVKTCCTDWGQFFLIQEKGQSALV-GSSYMSALEVGGLVGSIAAGYLSDRAMAKAGLSNY
            . .

P_1PW4      -GNRGATGVFFMTLVTIATIVYWMNPAG---------------NPTVDMICMIVIGFLIY
gi2765461e  GNPRHGLLLFMMAGMTVSMYLFRVTVTSD-----------S--PKLWILVLGAVFGFSSY

P_1PW4      GP-VMLIGLHALELAPKKAAGTAAGFTGLFGYLGGSVAASAIVGYTVDF-FGWDGGFMVM
gi2765461e  GP-IALFGVIANESAPPNLCGTSHAIVGLMANVG-GFLAGLPFSTIAKH-YSWSTAFWVA

P_1PW4      IGGSILAVILLIVVMIGEKRRHEQLLQELVP-----------------------------
gi2765461e  EVICAASTAAFFLLRNIRTKMGRVSKKAE-------------------------------
11Checking alignment in MODELER
- Using chk_align.top script
_aln.pos         210       220       230       240       250       260       270
1PW4      MRDTPQSCGLPPIEEYKND/T-----AKQIFMQYVLPNKLLWYIAIANVFVYLLRYGILDWSPTYLKE
G6PT      IHNEPADVGLRNLDPMPSE-GKKGSLKEESTLQELLLSPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQ
_consrvd
- Problem near chain break
_aln.pos         210       220       230       240       250       260       270
1PW4      MRDTPQSCGLPPIEEYKND/----TAKQIFMQYVLPNKLLWYIAIANVFVYLLRYGILDWSPTYLKEV
G6PT      IHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLLSPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQE
_consrvd
12Modeler Runs
- Using extracted template and target alignment
- Sequence for template extracted from structure using Insight
- Missing residues in structure appear as chain breaks
- Parameters
- OUTPUT_CONTROL = 1 1 1 1 1
- STARTING_MODEL = 1
- ENDING_MODEL = 5
- LIBRARY_SCHEDULE = 4
- MD_LEVEL = 'refine_1'
13PROSA 2 runs
- Used to evaluate models
- Models with best scores from MODELER were compared using PROSA
- Z value used for initial comparison
- Graph used to identify location of major violations
14Model Selection Criteria
- MODELER log file
- Minimum energy
- Number of violations
- Number of really bad violations
- Location of violations with respect to alignment and structure
- PROSA 2 log file
- Z score closest to template
- Peaks and troughs in graph relative to template
15Adjusting the alignment
- Comparison of structures obtained from Modeler in Insight
- Alignment violations clearly visible
- Criteria for modifying alignment
- Unequal number of residues in loop
- Unsatisfied structural similarity constraints
- Residues violating constraints as generated by Modeler
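The "unequal number of residues in loop" criterion can be checked by locating gap runs in each aligned sequence. The sketch below is a minimal illustration. The function name is hypothetical, and the two fragments are the region around the chain break ("/") taken from the chk_align output on the earlier slide.

```python
def gap_runs(aligned, gap="-"):
    """Return (start_index, length) for every run of gap characters."""
    runs = []
    start = None
    for i, ch in enumerate(aligned):
        if ch == gap and start is None:
            start = i  # a new gap run begins
        elif ch != gap and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        # The sequence ends inside a gap run
        runs.append((start, len(aligned) - start))
    return runs

# Fragments around the chain break in the first chk_align alignment
template_region = "MRDTPQSCGLPPIEEYKND/T-----AKQIFMQ"
target_region = "IHNEPADVGLRNLDPMPSE-GKKGSLKEESTLQ"
template_gaps = gap_runs(template_region)
target_gaps = gap_runs(target_region)
print(template_gaps, target_gaps)
```

Here the template has a five-residue gap just after the chain break while the target has a single gap opposite the break, which is the kind of loop mismatch that motivated adjusting the alignment by hand.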
171st run - adjustment in Insight
19Loop Modeling
- Modeler Run 2
- Loop Modeling Run 1
20Loop modeling
- Generate models based on adjusted alignment
- 25 models obtained
- Models selected based on minimum energy and constraint violations
- Parameters
- OUTPUT_CONTROL = 1 1 1 1 1
- STARTING_MODEL = 1
- ENDING_MODEL = 5
- LIBRARY_SCHEDULE = 2
- MD_LEVEL = 'refine_3'
- DO_LOOPS = 1
- LOOP_ENDING_MODEL = 5
- LOOP_MD_LEVEL = 'refine_3'
21Loop Modeling Run 1: Best 4 Models Picked
- ID1, ID2: 1 5; Current energy: 192; PROSA Z score: -6.60 (Z score of template: -7.3)
- ID1, ID2: 3 2; Current energy: 387; PROSA Z score: -6.57
- ID1, ID2: 4 2; Current energy: 363; PROSA Z score: -6.76
- ID1, ID2: 5 4; Current energy: 242; PROSA Z score: -6.3
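The selection criteria listed earlier (minimum energy, Z score closest to the template) can be applied to these four models with a short sketch. The dictionary layout is an assumption for illustration, and the numbers are the ones reported on this slide.

```python
# Values from the Loop Modeling Run 1 slide
models = [
    {"id": (1, 5), "energy": 192.0, "z": -6.60},
    {"id": (3, 2), "energy": 387.0, "z": -6.57},
    {"id": (4, 2), "energy": 363.0, "z": -6.76},
    {"id": (5, 4), "energy": 242.0, "z": -6.3},
]
TEMPLATE_Z = -7.3  # PROSA Z score of the template

# Criterion 1: lowest MODELER energy
lowest_energy = min(models, key=lambda m: m["energy"])
# Criterion 2: PROSA Z score closest to that of the template
closest_z = min(models, key=lambda m: abs(m["z"] - TEMPLATE_Z))

print(lowest_energy["id"], closest_z["id"])
```

The two criteria pick different models here (1 5 by energy, 4 2 by Z score), which is why the violations in the log file and the PROSA graph are also inspected before choosing.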
22Violations - MODELER log file
ID1, ID2: 1 5
Current energy: 192.1849

RESTRAINT_GROUP                           NUM  NUMVI  NUMVP    RMS_1    RMS_2  MOL.PDF    S_i
25 Phi/Psi pair of dihedral restraints     64     44     11   36.170  140.638   79.036  1.000

Feature 25: Phi/Psi pair of dihedral restraints
List of the RVIOL violations larger than 6.5000

  #  ICSR  RESNO1/2   ATM1/2  INDATM1/2     FEAT    restr    viol  rviol    RESTR    VIOL  RVIOL
  7  1360  45D  46K   C  N     368  370   -68.99   -70.20   30.80   2.20   -62.90  150.55  19.23
  7        46K  46K   N  CA    370  371   109.62   140.40                  -40.80
  8  1361  46K  47D   C  N     377  379   173.18    54.50  123.21  12.43   -63.30  132.44  18.20
  8        47D  47D   N  CA    379  380     7.79    40.90                  -40.00
  9  1362  47D  48D   C  N     385  387  -138.58   -63.30   76.02  11.52   -63.30   76.02  11.52
  9        48D  48D   N  CA    387  388   -29.45   -40.00                  -40.00
 12  1369  103F 104A  C  N     811  813   -69.81   -68.20   21.24   1.77   -62.50  165.18  26.73
 12        104A 104A  N  CA    813  814   124.12   145.30                  -40.90
 13  1370  104A 105A  C  N     816  818  -169.75   -62.50  107.58  21.02   -62.50  107.58  21.02
 13        105A 105A  N  CA    818  819   -49.29   -40.90                  -40.90
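The violations listed in this log can be grouped into contiguous sequence regions to see where the model needs attention. The sketch below is a minimal illustration: the tuples are transcribed by hand from the RESNO1/2 and RVIOL columns above rather than parsed from a real log file, and the grouping rule (consecutive residue numbers form one region) is an assumption.

```python
import re

# (residue 1, residue 2, RVIOL) transcribed from the log above
violations = [
    ("45D", "46K", 19.23),
    ("46K", "47D", 18.20),
    ("47D", "48D", 11.52),
    ("103F", "104A", 26.73),
    ("104A", "105A", 21.02),
]

def residue_number(label):
    """Extract the leading residue number from a label such as '104A'."""
    return int(re.match(r"\d+", label).group())

# Largest relative violation first
worst_first = sorted(violations, key=lambda v: v[2], reverse=True)

# Merge consecutive residue numbers into [first, last] regions
positions = sorted({residue_number(r) for v in violations for r in v[:2]})
regions = []
for p in positions:
    if regions and p == regions[-1][1] + 1:
        regions[-1][1] = p
    else:
        regions.append([p, p])

print(worst_first[0], regions)
```

The two regions found (residues 45-48 and 103-105) contain residues 46 and 104, which are the sites highlighted on the next slide.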
231st loop model - violations in Insight
Residue 104
Residue 46
24Loop Model Run 1 - adjustment
25Loop Modeling 2
- Refinement of Loop Model 1
- Loop Modeling 2
- Modeler Run 3
26Loop Modeling Run 2: Best 5 Models
- ID1, ID2: 5 1; Current energy: 237.4322; PROSA Z score: -5.82
- ID1, ID2: 3 1; Current energy: 222.2522; PROSA Z score: -6.27
- ID1, ID2: 1 1; Current energy: 195.7286; PROSA Z score: -6.32
- ID1, ID2: 2 4; Current energy: 226.8002; PROSA Z score: -6.09
- ID1, ID2: 2 2; Current energy: 198.0359; PROSA Z score: -6.15
27Violations - MODELER log file
ID1, ID2: 1 1
Current energy: 195.7286

RESTRAINT_GROUP                           NUM  NUMVI  NUMVP    RMS_1    RMS_2  MOL.PDF    S_i
 4 Stereochemical improper torsion pot    156      1      2    1.943    1.943   16.723  1.000
25 Phi/Psi pair of dihedral restraints     67     40     11   34.260  132.074   73.358  1.000

Feature 25: Phi/Psi pair of dihedral restraints
List of the RVIOL violations larger than 6.5000

  #  ICSR  RESNO1/2   ATM1/2  INDATM1/2     FEAT    restr    viol  rviol    RESTR    VIOL  RVIOL
  3  1430  45D  46K   C  N     368  370  -103.79  -118.00   33.92   1.76   -62.90  154.80  22.53
  3        46K  46K   N  CA    370  371   169.89   139.10                  -40.80
  4  1431  46K  47D   C  N     377  379   -95.02   -70.90   59.16   2.00   -63.30  119.95  16.85
  4        47D  47D   N  CA    379  380  -155.68   150.30                  -40.00
  5  1432  47D  48D   C  N     385  387   -63.33   -70.90   31.08   1.19   -63.30  160.16  19.77
  5        48D  48D   N  CA    387  388   120.16   150.30                  -40.00
  9  1441  103F 104A  C  N     811  813  -122.41  -134.00   20.39   1.24   -62.50  166.47  30.50
  9        104A 104A  N  CA    813  814   163.78   147.00                  -40.90
 10  1442  104A 105A  C  N     816  818   -64.90   -68.20   29.69   2.28   -62.50  156.71  25.57
 10        105A 105A  N  CA    818  819   115.80   145.30                  -40.90
28Loop Model Violation Sites
33Refinements in Final Model
- Some regions can be realigned and refined further, taking into consideration their energy violations.
- Other tools, such as PROCHECK, could be used in addition to Modeler and PROSA to get further insight into energy details.
- Structural alignment of the model with other known transport protein structures might be of some help.