Converting Passport Data to the FAO/IPGRI MCPD standard

1
Converting Passport Data to the FAO/IPGRI MCPD
standard
  • An approach using the Perl programming language
  • T. Metz
  • September 2003

2
FAO/IPGRI MCPD: an emerging de facto standard
  • EPGRIS input to the MCPD revision
  • Wide adoption
  • Standards are compromises
  • Database standard vs. data exchange standard
  • Tools and methodologies for implementing/conforming
  • Cookbook approach vs. textbook approach

3
Approaches to converting data
  • GUI
  • Script

4
Choices for moving from A to B
  • GUI
  • Script

5
Data flow: data conversion
  • Genebank documentation system
  • ASCII file with tab-delimited passport data,
    formatted as per genebank documentation system
  • Perl conversion program
  • ASCII file with tab-delimited passport data,
    formatted as per FAO/IPGRI MCPD standard
  • NI
  • EURISCO

6
The Perl interpreter software
C:\perl>dir
PERL     EXE        16,384  09-23-99   5:34p PERL.EXE
PERL     DLL       741,376  09-23-99   5:34p PERL.DLL
C:\perl>

7
Running the conversion program
C:\passport2mcpd>dir
DATA1IN  DAT     3,080,192  06-22-03   6:06p data1in.dat
CONVERT1 PL          1,272  09-08-03  11:40p convert1.pl
C:\passport2mcpd>c:\perl\perl convert1.pl
C:\passport2mcpd>dir
DATA1OUT DAT     3,735,554  09-09-03  12:43a data1out.dat
DATA1IN  DAT     3,080,192  06-22-03   6:06p data1in.dat
CONVERT1 PL          1,272  09-08-03  11:40p convert1.pl
C:\passport2mcpd>

8
Input data (data1in.dat)
  • 1 Hordeum vulgare ABC001
  • 2 ZEA Mays ABC002
  • 3 Curcurbis melo ABC003
  • 4 Daucus carota ABC004
  • 1 Hordeum vulgare ABC001
  • 2 ZEA Mays ABC002
  • 3 Curcurbis melo ABC003
  • 4 Daucus carota ABC004
  • 1 Hordeum vulgare ABC001
  • 2 ZEA Mays ABC002

9
Elements of the conversion program - files for
input and output -
  • open(INPUT, "<data1in.dat");
  • open(OUTPUT, ">data1out.dat");
  • close(INPUT);
  • close(OUTPUT);
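
The `open` calls on this slide do not check whether the files could actually be opened. A minimal sketch of the same step with error checking, assuming Perl 5.6 or later (three-argument `open` and lexical filehandles); the sub name `open_files` is hypothetical and not part of the original program:

```perl
use strict;
use warnings;

# Hypothetical helper: open the input file for reading and the output
# file for writing, stopping with a clear message if either step fails.
sub open_files {
    my ($in_name, $out_name) = @_;
    open(my $in,  '<', $in_name)  or die "Cannot read $in_name: $!\n";
    open(my $out, '>', $out_name) or die "Cannot write $out_name: $!\n";
    return ($in, $out);
}

# Usage with the file names from the slides:
#   my ($in, $out) = open_files('data1in.dat', 'data1out.dat');
#   ... read from $in, print to $out ...
#   close($in);
#   close($out) or die "Cannot close data1out.dat: $!\n";
```

Checking the return value of `open` matters most for unattended runs, where a missing input file would otherwise produce an empty output file without any message.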

10
Elements of the conversion program - reading
input and writing output -
  • while(defined ($linein = <INPUT>)) {
  • chomp($linein);
  • ($row, $genus, $species, $accenumb) =
    split("\t", $linein);
  • $lineout = join("\t", $mcpd_instcode,
    $mcpd_accenumb,
    $mcpd_genus,
    $mcpd_species) . "\n";
  • print OUTPUT $lineout;
  • }
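
One detail worth knowing when reading tab-delimited rows: Perl's `split` drops trailing empty fields unless it is given a negative limit, so a row whose last columns are empty comes back with fewer fields than expected. A small sketch of the difference; the sub name `split_row` is hypothetical:

```perl
use strict;
use warnings;

# Split one tab-delimited line into fields, keeping trailing empty ones.
sub split_row {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;           # remove the line ending (also handles CRLF)
    return split(/\t/, $line, -1);  # limit -1 keeps trailing empty fields
}

my @default = split(/\t/, "4\tDaucus\t\t");
my @kept    = split_row("4\tDaucus\t\t\n");
printf "default split: %d fields, split_row: %d fields\n",
    scalar(@default), scalar(@kept);
# prints "default split: 2 fields, split_row: 4 fields"
```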

11
Elements of the conversion program - converting
input to output -
  • $mcpd_accenumb = $accenumb;
  • $mcpd_genus = FirstCapital($genus);
  • $mcpd_species = lc($species);
  • $mcpd_instcode = "NLD001";

12
Elements of the conversion program - user-defined
conversion function -
  • # FirstCapital
  • # This function expects one character string as input.
  • # It converts the entire string first to lower case
  • # characters and then the first character to upper case.
  • sub FirstCapital {
  • my($inword) = @_;
  • return ucfirst(lc($inword));
  • }
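
Put together, the elements from the preceding slides might form a complete program along the following lines. This is a sketch, not the original `convert1.pl`: the subs `convert_line` and `convert_file`, the header line (taken from the output sample), and the error checks are assumptions added for illustration.

```perl
use strict;
use warnings;

my $INSTCODE = 'NLD001';   # institute code used in the slides

# Convert a string to lower case, then its first character to upper case.
sub FirstCapital {
    my ($inword) = @_;
    return ucfirst(lc($inword));
}

# Hypothetical helper: turn one input line into one MCPD output line.
sub convert_line {
    my ($linein) = @_;
    chomp($linein);
    my ($row, $genus, $species, $accenumb) = split("\t", $linein);
    return join("\t", $INSTCODE, $accenumb,
                FirstCapital($genus), lc($species)) . "\n";
}

# Hypothetical helper: convert a whole file, writing a header line first.
sub convert_file {
    my ($in_name, $out_name) = @_;
    open(my $in,  '<', $in_name)  or die "Cannot read $in_name: $!\n";
    open(my $out, '>', $out_name) or die "Cannot write $out_name: $!\n";
    print $out join("\t", qw(instcode accenumb genus species)), "\n";
    while (defined(my $linein = <$in>)) {
        print $out convert_line($linein);
    }
    close($in);
    close($out) or die "Cannot close $out_name: $!\n";
}

# Run the conversion only when the input file is present.
convert_file('data1in.dat', 'data1out.dat') if -e 'data1in.dat';
```

Keeping the per-line conversion in its own sub makes it possible to check single rows by hand, for example `convert_line("2\tZEA\tMays\tABC002\n")`, before running the program on a full export.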

13
Elements of the conversion program - other
conversion functions -
  • Splitting a field
  • Merging two fields
  • Removing blank spaces
  • Recoding
  • Unit conversion
  • Converting geo-reference values (longitude,
    latitude)
  • Converting date values
  • Transliteration / character set conversion
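
Several of the conversions listed on this slide can be sketched as small Perl subs. None of these are from the original program: all sub names, the local status codes `W`, `L` and `C`, and the assumed input formats (decimal degrees, `DD/MM/YYYY` dates, elevation in feet) are hypothetical. The output formats follow the MCPD conventions of `DDMMSS` plus hemisphere for latitude, `DDDMMSS` plus hemisphere for longitude, and `YYYYMMDD` for dates.

```perl
use strict;
use warnings;

# Removing blank spaces: strip leading and trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Splitting a field: "Hordeum vulgare" -> ("Hordeum", "vulgare").
sub split_name {
    my ($name) = @_;
    return split(' ', trim($name), 2);
}

# Merging two (or more) fields, skipping empty parts.
sub merge_fields {
    my (@parts) = @_;
    return join(' ', grep { defined($_) && length($_) } @parts);
}

# Recoding: map hypothetical local codes to MCPD SAMPSTAT codes.
my %SAMPSTAT = (W => 100, L => 300, C => 500);
sub recode_sampstat {
    my ($code) = @_;
    return exists $SAMPSTAT{$code} ? $SAMPSTAT{$code} : '';
}

# Unit conversion: elevation in feet to whole metres.
sub feet_to_metres {
    my ($feet) = @_;
    return sprintf('%.0f', $feet * 0.3048);
}

# Geo-reference: decimal degrees to DDMMSS[N|S] or DDDMMSS[E|W].
sub decimal_to_dms {
    my ($value, $is_longitude) = @_;
    my $hemisphere = $is_longitude ? ($value < 0 ? 'W' : 'E')
                                   : ($value < 0 ? 'S' : 'N');
    my $total = int(abs($value) * 3600 + 0.5);   # total seconds, rounded
    my $deg   = int($total / 3600);
    my $min   = int(($total % 3600) / 60);
    my $sec   = $total % 60;
    my $format = $is_longitude ? '%03d%02d%02d%s' : '%02d%02d%02d%s';
    return sprintf($format, $deg, $min, $sec, $hemisphere);
}

# Date: DD/MM/YYYY (assumed local format) to YYYYMMDD; '' if not matched.
sub date_to_yyyymmdd {
    my ($date) = @_;
    my ($d, $m, $y) = $date =~ /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/
        or return '';
    return sprintf('%04d%02d%02d', $y, $m, $d);
}

print join('|', split_name('  ZEA Mays ')), "\n";   # prints "ZEA|Mays"
print decimal_to_dms(52.5, 0), "\n";                # prints "523000N"
print decimal_to_dms(-5.25, 1), "\n";               # prints "0051500W"
print date_to_yyyymmdd('22/06/2003'), "\n";         # prints "20030622"
```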

14
Elements of the conversion program - advanced
functionality -
  • Direct database query (SQL)
  • Result uploading (FTP, HTTP)
  • Scheduling (CRON)

15
Input data (data1in.dat)
  • 1 Hordeum vulgare ABC001
  • 2 ZEA Mays ABC002
  • 3 Curcurbis melo ABC003
  • 4 Daucus carota ABC004
  • 1 Hordeum vulgare ABC001
  • 2 ZEA Mays ABC002
  • 3 Curcurbis melo ABC003
  • 4 Daucus carota ABC004
  • 1 Hordeum vulgare ABC001
  • 2 ZEA Mays ABC002

16
Output data (data1out.dat)
  • instcode accenumb genus species
  • NLD001 ABC001 Hordeum vulgare
  • NLD001 ABC002 Zea mays
  • NLD001 ABC003 Curcurbis melo
  • NLD001 ABC004 Daucus carota
  • NLD001 ABC001 Hordeum vulgare
  • NLD001 ABC002 Zea mays
  • NLD001 ABC003 Curcurbis melo
  • NLD001 ABC004 Daucus carota
  • NLD001 ABC001 Hordeum vulgare
  • NLD001 ABC002 Zea mays

17
GUI vs. programming approach
  • Programming requires a higher level of IT skills
    than a GUI
  • A GUI is convenient for small datasets and one-off
    conversions
  • Programming is suited to large datasets and
    repeated conversions
  • Remote diagnosis and support are easier with
    programming
  • Solutions to problems are more easily transferred
    with programming

18
GUI vs. programming approach
  • The programming approach has high reliability and
    repeatability
  • Programming has a higher initial investment; a GUI
    has a higher repeat cost
  • Programming is more resilient to staff changes
    and skills erosion
  • The programming approach has no data size limitation

19
Why Perl?
  • Free open source software
  • Available on almost any combination of hardware
    and software
  • Programs are portable
  • Developed for text manipulation
  • Perl is easy to start with, especially for
    programming beginners
  • Huge user and developer community, hence long-term
    availability

20
Conclusions
  • NIs and EURISCO should be complete and up-to-date
    (not snapshots)
  • Repeated tasks (e.g. data transformation) should be
    automated
  • A programming approach can help to reduce
    manual transformation work
  • Beware of the 80/20 rule
  • Perl is a suitable solution for the programming
    approach

21
Now what?
  • Training manual (cookbook) under development
  • Early adopters?
  • Remote assistance and support

22
  • The End