Introducing Voyager with Unicode - PowerPoint PPT Presentation

Provided by: libUwater

Transcript and Presenter's Notes

Title: Introducing Voyager with Unicode


1
Voyager with Unicode: A Cataloger's Session
Connie Braun, Training Consultant
2
Agenda
  • Introduction
  • Your Work Environment
  • Conversion
  • New Features
  • Learning More
  • Q&A
3
Release Update
  • General release occurred October 6, 2004!
  • 4 production partners
  • 1 Windows Server, 3 Solaris
  • 8 test server partners
  • 4 Task Force members (large non-roman
    collections)
  • 1 large consortium with Universal Borrowing and
    Universal Catalog
  • 2 European customers
  • As of 01/20/05, 71 customers have upgraded and
    are functioning in a production environment with
    Voyager with Unicode. Approximately 50 upgrades
    are scheduled between now and May 2005.

4
Why Unicode in Voyager?
  • Brings Voyager up to current IT standards
  • Finds and displays records in the native
    language
  • Create and edit any MARC record using UTF-8
  • Import and export of records with any supported
    character set
  • Operators may select a Unicode-compliant font of
    their choice
  • Display Unicode characters in OPAC without
    proprietary software

5
Implementing Voyager with Unicode
For our customers, it's business as usual, but
with some interesting changes and improvements,
especially in Cataloging. Helping everyone to
implement a Unicode-compliant system is
Endeavor's aim. The Unicode standard is an
important step towards realizing that
goal. Implementing the Unicode standard is an
extension of Endeavor's original mission: access
to information regardless of location or format.
6
Following Standards
  • Follows Standards (not proprietary)
  • See http://www.unicode.org for much more detail
    on these standards.
  • See
    http://lcweb.loc.gov/marc/specifications/speccharucs.html
    for details on LC's format of MARC records that
    use Unicode. Voyager follows this specification.
  • Specifics on the Code Tables may be viewed at
    http://www.loc.gov/marc/specifications/specchartables.html
  • The Voyager implementation of the Unicode
    standard gives libraries and their users greater
    flexibility when accessing collection materials
    that contain both Roman and non-Roman text.

7
Multilingual Input and Display
  • By introducing improved multilingual input and
    display capabilities in Voyager, characters now
    display correctly according to the Unicode and
    MARC standards.
  • Greater script coverage for cataloging items in
    your collections, published in languages around
    the world.
  • How many? The original UTF-8 design can encode
    up to 2,147,483,648 (2^31) code points! The
    Unicode standard itself defines 1,114,112 of them
    (U+0000 through U+10FFFF).

8
Preview Server
  • Anyone interested in trying out Voyager with
    Unicode before your upgrade? You can!
  • http://support.endinfosys.com/cust/voy/upgrade/unicode/testwv_pre.html
    provides all the details necessary to get you
    started
  • Preview Server uses the Voyager training database
    that has been augmented with numerous records in
    both Roman and non-Roman languages
  • Try keyword searches:
  • non roman script japanese
  • non roman script arabic
  • roman script french
  • roman script italian

9
Agenda
  • Introduction
  • Your Work Environment
  • Workstation Requirements
  • Setting Up For Languages Other Than English
  • Tag Tables
  • Session Defaults and Preferences
  • Conversion
  • New Features
  • Q&A

10
Workstation Requirements
  • In order to enjoy the full range of benefits, PCs
    must have up-to-date operating systems and
    productivity software.
  • This means that staff PCs will need:
  • Windows 2000 or XP operating system
  • Unicode standard compliant Internet browser
  • IE 6
  • Netscape 6
  • Unicode-compliant font: Lucida Sans Unicode and
    Arial Unicode MS

11
MS Windows
  • Voyager is more integrated with Windows in terms
    of:
  • Standard Windows 2000/XP Unicode support
  • Standard Unicode fonts
  • Standard input using Input Method Editors (IMEs)
  • Standard browser support

12
Setting Up for Languages Other Than English
  • Workstations need to be specifically configured
    to work with languages other than English
  • Likely will require technical IT assistance to
    install needed languages on staff PCs
  • Best to install all languages so that cataloger
    may easily include new ones as necessary

13
Adding Languages to PCs
  • Regional and language options are specific to
    each PC
  • Among options available via Start > Settings >
    Control Panel
  • Details button on Languages tab lets operator
    view or change languages and methods to enter
    text
  • Can include supplemental language support, too

14
Choosing Languages
  • Languages added to PCs will match languages for
    items found in your collections
  • Add and remove according to your needs, as few or
    as many as necessary
  • May also set preferences for language bar and key
    settings

15
Tag Tables
  • MARC Tag Tables have been completely revised and
    rewritten for Voyager with Unicode

16
Tag Tables
  • Ability to modify tag table configuration remains
    the same as in earlier releases
  • But, may not specify anything for Leader position
    9 since that byte is now hard-coded to identify
    records that have been converted to UTF-8
  • May want to consider whether or not library will
    need or want to revise Tag Tables for local use
  • See Appendix A of the Cataloging User's Guide for
    full details on revising, maintaining and
    updating the Tag Tables

17
Record Validation
  • MARC validation
  • MARC21 character set validation
  • Authority control validation
  • Decomposition of accented characters for MARC21

18
Session Defaults and Preferences: Record
Validation
  • Bypass MARC21 Character set validation
  • Uses MARC21 Repertoire.cfg to control validation
    of the MARC21 character set
  • Helps to enforce MARC21 standard
  • Bypass Decomposition of accented characters for
    MARC21
  • Allows records to be saved to the database
    without decomposing the characters
  • IMPORTANT: If you select this option, MARC21
    rules are ignored. We strongly recommend that
    this check box be un-checked, in order to comply
    with the MARC21 standard.
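
As a rough illustration of what "decomposing" an accented character means, the sketch below uses Python's standard unicodedata module, not any Voyager code. It treats Unicode canonical decomposition (NFD) as a stand-in for the MARC21 rule, which is an assumption; the function name is_decomposed is hypothetical.

```python
import unicodedata

def decompose(text):
    """Split precomposed letters into base letter + combining mark(s) (NFD)."""
    return unicodedata.normalize("NFD", text)

def is_decomposed(text):
    """True when the text contains no precomposed characters that NFD would split."""
    return text == decompose(text)

composed = "Espa\u00f1a"           # n-with-tilde stored as one code point
decomposed = decompose(composed)   # n followed by U+0303 COMBINING TILDE

print(is_decomposed(composed), is_decomposed(decomposed))
print([f"U+{ord(ch):04X}" for ch in decomposed])
```

A record saved with the bypass option checked would correspond to keeping the composed form as typed.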

19
Session Defaults and Preferences: Mapping Tab
  • Expected Character Set of Imported Records now
    has six options

20
Session Defaults and Preferences: Colors/Fonts Tab
21
Agenda
  • Introduction
  • Your Work Environment
  • Conversion
  • Data Conversion
  • Conversion Error Logging
  • Conversion Details
  • Identifying Non-Unicode Data
  • The Rest of Voyager
  • New Features
  • Learning More
  • Q&A

22
Data Conversion
  • Conversion process during upgrade treats data
    differently than when importing records through
    Cataloging client or via BulkImport
  • MARC records are converted from VRLIN (Voyager
    legacy encoding) to MARC21 compliant UTF-8
    encoding
  • Leader position 9 becomes "a"
  • Conversion Log Created
  • UTF-8 allows for variable length characters. The
    majority of characters in the database occupy the
    same amount of space as before conversion.
  • Note: All indexes and database columns with MARC
    data are regenerated after conversion.
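
To make "variable length characters" concrete, here is a small Python sketch that reports how many bytes a few sample characters occupy in UTF-8. The sample characters are chosen for illustration only and are not taken from any Voyager database.

```python
def utf8_length(ch):
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

samples = {
    "A": "plain ASCII letter",
    "\u00e9": "precomposed e with acute",
    "\u0301": "combining acute accent",
    "\u05d0": "Hebrew alef",
    "\u4e2d": "CJK ideograph",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {utf8_length(ch)} byte(s)")
```

Plain ASCII text keeps its one-byte-per-character size, which is consistent with most data occupying the same space after conversion.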

23
Conversion Details
  • IMPORTANT! NO RECORDS ARE LOST
  • Each field in the record handled individually.
  • As each field is processed, it may change length,
    requiring adjustments to the leader and directory
    of the record. 
  • Records are saved to the database with leader
    position 9 set to "a".
  • Both record-level and field-level checking are
    performed. In rare cases an entire record might
    fail conversion; it is more likely that an
    individual field fails to be converted.
  • Records may not convert if they contain text that
    cannot be mapped into Unicode according to the
    standard MARC-8 to Unicode mappings.
  • Records that do not convert are stored in the
    database as is, without being converted to
    Unicode.
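
The leader and directory adjustment described above can be sketched in Python. This is a generic ISO 2709 (MARC exchange format) illustration, not Voyager's conversion code: the helper names are hypothetical, and Latin-1 is used only as a stand-in source encoding because the Voyager legacy mapping is not described here.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"

def build_record(leader, fields):
    """Assemble a record, recomputing every length from the field data.

    leader: 24-character leader; positions 0-4 and 12-16 are overwritten.
    fields: (tag, data) pairs, data given without the field terminator.
    """
    directory = b""
    body = b""
    for tag, data in fields:
        field = data + FIELD_TERMINATOR
        # directory entry: tag (3) + field length (4) + start offset (5)
        directory += f"{tag}{len(field):04d}{len(body):05d}".encode("ascii")
        body += field
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(body) + len(RECORD_TERMINATOR)
    new_leader = (f"{record_length:05d}{leader[5:12]}"
                  f"{base_address:05d}{leader[17:24]}")
    return new_leader.encode("ascii") + directory + body + RECORD_TERMINATOR

def to_utf8_record(leader, fields, source_encoding="latin-1"):
    """Re-encode each field individually, then flag leader position 9 as 'a'."""
    converted = [(tag, data.decode(source_encoding).encode("utf-8"))
                 for tag, data in fields]
    utf8_leader = leader[:9] + "a" + leader[10:]
    return build_record(utf8_leader, converted)

legacy_leader = "00000nam  2200000 a 4500"
legacy_fields = [("245", b"10\x1faEspa\xf1a")]  # one accented letter, 1 byte

before = build_record(legacy_leader, legacy_fields)
after = to_utf8_record(legacy_leader, legacy_fields)
print(before[:24])
print(after[:24])
```

The accented letter grows from one byte to two, so the field length in the directory and the record length in the leader each increase by one.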

24
Conversion Error Logging
  • Libraries need to know the details about the
    results of the conversion process.
  • Full error checking and logging is included as
    part of the upgrade
  • Technical User's Guide, Chapter 4
  • Cataloging User's Guide, Appendix C
  • Library designates should review the conversion
    log file to plan for correcting any records that
    have errors
25
Sample from Conversion Log File

26
Conversion Log Details 1
  • 11 secs read=982 changed=791 880=0 okay=982
    errors=0 written=982
  • 21 secs read=1931 changed=1558 880=0 okay=1931
    errors=0 written=1931
  • 29 secs read=2848 changed=2087 880=0 okay=2848
    errors=0 written=2848
  • 36 secs read=3699 changed=2533 880=0 okay=3699
    errors=0 written=3699
  • 43 secs read=4607 changed=3076 880=0 okay=4607
    errors=0 written=4607
  • 51 secs read=5519 changed=3610 880=0 okay=5519
    errors=0 written=5519

Legend (items in order of appearance on each line)
  • 1: number of seconds used by job so far
  • 2: read = number of records processed
  • 3: changed = number of records changed
  • 4: 880 = how many records contain 880s
  • 5: okay = records processed successfully
  • 6: errors = records not processed due to errors
  • 7: written = records written to the database
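
For libraries that want to scan a long log, a progress line could be parsed along these lines. This Python sketch assumes each counter is written as name=value and that the elapsed time appears as "N secs" at the start of the line; both are assumptions about the log layout, and parse_progress is a hypothetical helper.

```python
import re

def parse_progress(line):
    """Return the counters on one progress line as a dict of integers."""
    elapsed = re.match(r"\s*(\d+)\s+secs", line)
    if elapsed is None:
        raise ValueError(f"not a progress line: {line!r}")
    counters = {name: int(value)
                for name, value in re.findall(r"(\w+)=(\d+)", line)}
    counters["secs"] = int(elapsed.group(1))
    return counters

sample = "11 secs read=982 changed=791 880=0 okay=982 errors=0 written=982"
progress = parse_progress(sample)
print(progress)

# Sanity check a reviewer might apply: every record read is either okay or in error.
balanced = progress["okay"] + progress["errors"] == progress["read"]
print(balanced)
```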
27
Conversion Log Details 2
  • bib 6213 17(700) c->8 loose char page=0 at
    20 '091e ..
  • bib 35322 14(856) c->8 undefined char page=0
    at 61 'fc7220486973746f .r Histo
  • bib 35516 23(856) c->8 no char to combine to
    page=0 at 82 '1e .

Legend (items in order of appearance on each line)
  • 1: record type and id
  • 2: index within record of field that generated
    error
  • 3: tag that generated error
  • 4: c->8 indicates conversion to UTF-8 encoding
  • 5: description of error (also items 9 and 10, on
    the second and third lines)
  • 6: page = subset to which source character
    belongs
  • 7: at = position of source character that caused
    error
  • 8: hex dump of source character
28
Conversion Log Details 3
  • loose char: a warning message indicating that a
    character not strictly part of Voyager encoding
    has been converted (e.g. unexpected carriage
    return)
  • no char to combine to: a warning message
    indicating that a combining character appeared
    but it lacks a base character with which to
    combine (e.g. umlaut but no a, o, u base letter)
  • undefined char: an error message indicating that
    there is a single character that cannot be mapped
    to UTF-8
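
The "no char to combine to" condition can be illustrated on already-converted Unicode text, where a combining mark follows its base letter. The Python sketch below is not the converter's own check (which works on the legacy encoding); orphan_combining_marks is a hypothetical helper built on the standard unicodedata module.

```python
import unicodedata

def orphan_combining_marks(text):
    """Return indexes of combining marks that have no base character to attach to."""
    orphans = []
    has_base = False
    for index, ch in enumerate(text):
        if unicodedata.combining(ch):
            if not has_base:
                orphans.append(index)
            # a further mark may stack on the same base, so has_base is unchanged
        else:
            # separators and control characters cannot carry a mark
            has_base = unicodedata.category(ch)[0] not in ("Z", "C")
    return orphans

print(orphan_combining_marks("u\u0308ber"))   # umlaut attached to u
print(orphan_combining_marks("\u0308ber"))    # umlaut with nothing before it
```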
29
Identifying non-Unicode data
  • To identify a non-Unicode record in the
    Cataloging client, select a color for Conversion
    records in Session Defaults and Preferences >
    Colors/Fonts tab.

30
Identifying non-Unicode data
  • Any non-converted record displays in the color
    selected in Options/Preferences.

31
Identifying non-Unicode data
  • There are other ways to identify records that
    have conversion errors.

Records that cannot be converted to Unicode are
viewable in the Cataloging module with "nc" (not
converted) displayed in the Title Bar.
Any characters that cannot be matched or
recognized are replaced with a Unicode
substitution character.
32
Fonts and Unicode
  • A MARC record may contain non-Roman characters
    even though you cannot see them.
  • Records are sure to display correctly if a
    Unicode-compliant font has been selected.
  • Lucida Sans Unicode: installed by default with
    Windows
  • Arial Unicode MS
  • Good choice for libraries with mixed cataloging
  • Included with Microsoft Office and other
    Microsoft products

33
The Rest of Voyager
  • Non-MARC data is not converted
  • Acquisitions data
  • Circulation data (patron info, etc.)
  • Item data
  • Reporter
  • Not Unicode standard compliant
  • Translates data to LATIN1
  • Dots appear where you used to see squares

34
Agenda
  • Introduction
  • Your Work Environment
  • Conversion
  • New Features
  • Cataloging
  • Diacritics and Special Characters, Importing
    Records, New Record Views, Search URIs
  • WebVoyáge
  • Browsers, Searching, Displaying
  • Interacting with Other Systems
  • Learning More
  • Q&A

35
Diacritic and Special Character Entry
  • Cataloging practices then and now
  • Pre-Unicode input in Cataloging: accent
    character (diacritic) precedes the base
    character.
  • Example: E s p a [tilde] n a
  • Post-Unicode input in Cataloging: accent
    character (diacritic) follows the base character.
  • Example: E s p a n [tilde] a
  • Ability to display combined characters is an
    improvement over past versions and a way to
    ensure accurate entry
  • Example: España
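
Why the entry order matters can be shown with Unicode composition in Python. This is an illustration using the standard unicodedata module, not the Cataloging client's own logic: in Unicode a combining mark attaches to the character before it, so typing the tilde in the old position lands it on the wrong letter.

```python
import unicodedata

TILDE = "\u0303"  # COMBINING TILDE

post_unicode_entry = "Espan" + TILDE + "a"   # mark follows its base letter n
pre_unicode_habit = "Espa" + TILDE + "na"    # mark typed before n, as in the old practice

# NFC composes base + mark into a single displayed character where one exists.
correct = unicodedata.normalize("NFC", post_unicode_entry)
misplaced = unicodedata.normalize("NFC", pre_unicode_habit)

print(correct)     # tilde on the n
print(misplaced)   # tilde on the preceding a
```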

36
SpecialCharacters.cfg
SpecialCharacters.cfg, located in the
C:\Voyager\Catalog folder, defines the content of
the special character entry dialog box. Operators
may define their most frequently used characters
here.
37
Special Character Entry
This is what the dialog box in Cataloging looks
like.
The key press column identifies the keyboard
equivalent that may be used instead of turning on
Special Character Mode in Cataloging.
38
Finding Little Used Characters
  • For situations where a character not part of the
    Special Characters list is needed, operator can
    use Character Map from MS Windows
  • Start > Programs > Accessories > System Tools >
    Character Map
  • Locate character or perform search
  • Select and Copy character, then paste into
    position in bib record

39
Cataloging Input of Non-Roman Text
Voyager with Unicode allows Cataloging operators
to use all of the standard Microsoft Windows
keyboard and input method editors (IMEs). With
this functionality in place, operators may search
for, display, and edit the contents of all MARC
records using the full range of UTF-8
characters. The entire JACKPHY group is part of the
UTF-8 character set, which includes the right-to-left
input needed for Arabic, Persian, Hebrew and
Yiddish. Reminder: JACKPHY = Japanese, Arabic,
Chinese, Korean, Persian, Hebrew, Yiddish.
40
Linking in a MARC21 Record
Tag  I1  I2  Subfield Data
100  1       $6 880-01 $a An, Zhen.
245  1   0   $6 880-02 $a Ri yue yun yan / $c An Zhen zhu.
250          $6 880-03 $a Di 1 ban.
260          $6 880-04 $a Changchun Shi $b Changchun chu ban she, $c 1997.
300          $a 4, 2, 291 p. $c 21 cm.
440      0   $6 880-05 $a Zhongguo li dai wang chao xing shuai qu shi lu
500          $a Non-Roman script Chinese
651      0   $a China $x History $y Ming dynasty, 1368-1644.
880  1       $6 100-01/1 $a ? ?.
880  1   0   $6 245-02/1 $a ?? ?? / $c ? ? ?.
880          $6 250-03/1 $a ?1?.
880          $6 260-04/1 $a ??? $b ?? ???, $c 1997.
880      0   $6 440-05/1 $a ?? ?? ?? ?? ???
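
The linkage shown in this record can be followed programmatically. The Python sketch below pairs each romanized field with its 880 counterpart by reading the linking subfield (tag, occurrence number, optional script code). The tuple layout and the name pair_linked_fields are hypothetical, and the "?" strings are kept exactly as they appear in the record above.

```python
import re

# linking tag, "-", two-digit occurrence number, then optional "/script" and "/orientation"
LINK = re.compile(r"^(?P<tag>\d{3})-(?P<occ>\d{2})(?:/(?P<script>[^/]+))?(?:/(?P<orient>.+))?$")

def pair_linked_fields(fields):
    """fields: (tag, linking subfield or None, text) tuples.

    Returns {(tag, occurrence): (romanized text, 880 text or None)}.
    """
    regular = {}
    alternate = {}
    for tag, link, text in fields:
        match = LINK.match(link or "")
        if match is None or match["occ"] == "00":
            continue  # field carries no linkage
        if tag == "880":
            alternate[(match["tag"], match["occ"])] = text
        else:
            regular[(tag, match["occ"])] = text
    return {key: (text, alternate.get(key)) for key, text in regular.items()}

record = [
    ("100", "880-01", "An, Zhen."),
    ("250", "880-03", "Di 1 ban."),
    ("300", None, "4, 2, 291 p. 21 cm."),
    ("880", "100-01/1", "? ?."),
    ("880", "250-03/1", "?1?."),
]
pairs = pair_linked_fields(record)
print(pairs)
```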
41
Using On-Screen Keyboard
  • Typically, the path is Start > Programs >
    Accessories > Accessibility > On-Screen Keyboard

42
Importing Records
  • Conversion process is separate and distinct from
    the process of importing records
  • Important distinction for operators who import
    records through the Cataloging client or via
    BulkImport
  • Expected character set needs to be accurately
    identified if records are to be imported
    correctly
  • Some experimentation may be necessary to
    determine the correct character set
  • Let's look at some details to help everyone
    understand what is happening

43
Record Exchange Scenarios
44
Voyager 2001.2 and earlier
  • In Voyager 2001.2 and earlier, there were several
    options from which to choose regarding the
    character set
  • Latin1
  • OCLC
  • RLIN legacy
  • MARC21 MARC8
  • Until now it has been quite simple to choose the
    correct option when importing records through the
    Cataloging client or processing large numbers of
    records through BulkImport.

45
After Upgrade to Voyager 2003.1
  • From Voyager 2003.1 forward, there are numerous
    options from which to choose regarding the
    character set
  • Latin1 (non-Unicode)
  • MARC21 MARC8 (non-Unicode)
  • MARC21 UTF8
  • OCLC (non-Unicode)
  • RLIN legacy (non-Unicode)
  • Voyager legacy (non-Unicode)
  • With Voyager 2003.1 and beyond, it is very
    important to determine the character set of
    records before importing records through the
    Cataloging client or processing large numbers of
    records through BulkImport. Some experimentation
    may be necessary.
  • transition to MARC21 UTF8 occurs as Unicode
    standard becomes pervasive

46
One Year From Now
  • In Voyager 2003.1 and beyond, numerous options
    for character sets will continue to be needed
  • Latin1 (non-Unicode)
  • MARC21 MARC8 (non-Unicode)
  • MARC21 UTF8
  • OCLC (non-Unicode)
  • RLIN legacy (non-Unicode)
  • Voyager legacy (non-Unicode)
  • But, the Unicode standard will be much more
    pervasive, having been adopted and deployed by
    bibliographic utilities, vendors who massage
    records, vendors who supply records, and others.
  • This means that selecting the correct option will
    again be simpler, even though knowing the
    character sets will continue to be very
    important.

47
Bulk Import
  • Bulk Import of MARC Records
  • Fundamentally the same as before
  • Leader byte 9 is checked against the incoming
    character set identified in the import rule.
  • Blank = non-Unicode: record is converted, then
    imported
  • "a" = Unicode: record is imported
  • Neither blank nor "a": record errors out and is
    not imported
  • See log.imp.yyyymmdd for details on import
    success
  • Records that cannot be converted are not
    imported; they are found in err.imp.yyyymmdd
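
The leader check described on this slide amounts to a three-way decision on one byte. The Python sketch below only mirrors that decision table; import_action and its return labels are hypothetical names, and the comparison with the import rule's expected character set is not modelled.

```python
def import_action(record):
    """Decide what to do with a raw MARC record based on leader position 9."""
    if len(record) < 24:
        raise ValueError("record is shorter than a 24-byte leader")
    flag = record[9:10]
    if flag == b" ":
        return "convert-then-import"   # non-Unicode record
    if flag == b"a":
        return "import"                # already Unicode (UTF-8)
    return "reject"                    # neither blank nor 'a'

legacy = b"00049nam  2200037 a 4500"
unicode_record = b"00050nam a2200037 a 4500"
unknown = b"00050nam x2200037 a 4500"

for leader in (legacy, unicode_record, unknown):
    print(leader[9:10], import_action(leader))
```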

48
Bulk Import and Expected Character Set
  • Character set mapping for Bulk Import is
    designated in the Bulk Import rule in SysAdmin >
    Cataloging > Bulk Import Rules.

49
MARC Export
  • Default export character set is MARC21 UTF-8
  • Use the a option to choose different character
    set (in the command line)
  • See page 10-8 in the Technical User's Guide for
    more detail
  • LATIN1 records will get a dot exported for
    characters outside the LATIN1 character set
  • If mapping for a composed character is not found,
    it decomposes and Voyager attempts to find a
    match for each part.
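
The fallback described in the last two bullets can be approximated in Python. This is a sketch of the general idea only, not the export program's code: it assumes Unicode canonical decomposition (NFD) as the way a composed character is split, and it assumes any part that still has no Latin-1 equivalent is replaced by a dot. The name to_latin1 is hypothetical.

```python
import unicodedata

def to_latin1(text, substitute="."):
    """Encode text as Latin-1, decomposing unmapped characters before giving up."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("latin-1")
            continue
        except UnicodeEncodeError:
            pass
        # no direct mapping: try each part of the decomposed character
        for part in unicodedata.normalize("NFD", ch):
            try:
                out += part.encode("latin-1")
            except UnicodeEncodeError:
                out += substitute.encode("latin-1")
    return bytes(out)

print(to_latin1("Espa\u00f1a"))   # n-with-tilde exists in Latin-1
print(to_latin1("\u015b"))        # s-with-acute: base letter kept, accent becomes a dot
print(to_latin1("\u4e2d"))        # CJK ideograph: no mapping at all
```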

50
New ISBN Indexes
  • For improved duplicate detection
  • New ISBN Index
  • 020N: 020 $a, number only
  • 020R: 020 $z, number only
  • Example: 020 $a 1234567890 (Knopf) and
    020 $a 1234567890 are both indexed as 1234567890
  • Check Bibliographic and Authority duplicate
    detection profiles in System Administration!
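
A "number only" key of this kind might be derived as follows. The Python sketch is an assumption about the normalization, not Voyager's indexing code: it keeps the leading ISBN, drops any trailing qualifier, and additionally removes hyphens, which the slide does not mention. The name isbn_number_only is hypothetical.

```python
import re

def isbn_number_only(subfield):
    """Return the leading ISBN from an 020 subfield, without qualifier or hyphens."""
    match = re.match(r"\s*([0-9][0-9\-]*[0-9Xx])", subfield)
    if match is None:
        return ""
    return match.group(1).replace("-", "").upper()

with_qualifier = isbn_number_only("1234567890 (Knopf)")
plain = isbn_number_only("1234567890")
print(with_qualifier, plain, with_qualifier == plain)
```

Two records whose 020 fields differ only in the qualifier then produce the same key, which is what makes them candidates for duplicate detection.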

51
HTTP Posting
  • Much easier access to WebVoyáge display from
    clients
  • Available in Cataloging, Acquisitions and
    Circulation
  • Toggle record view from staff client to WebVoyáge
  • Record menu in Cataloging contains a Send Record
    To option
  • Send Record To WebVoyáge
  • LinkFinderPlus available in Cataloging,
    Acquisitions and Circulation
  • Record menu in Cataloging contains a Send Record
    To option
  • Send Record To LinkFinderPlus
  • Configured in the [MARC POSTing] stanza of the
    voyager.ini file

52
Enabling HTTP Posting
  • To enable HTTP posting, a stanza is added to
    the voyager.ini file. An example is shown below.
  • [MARC POSTing]
  • WebVoyage="http://train20031-c1db.comet.endinfosys.com/cgi-bin/Pbibredirect.cgi"
  • LinkfinderPlus="http://207.56.64.116/cgi-bin/Phttplinkresolver.cgi"

53
Easier Access to OPAC Display
  • Send Record To... in Cataloging
  • Send Record To... in Acquisitions

54
Search URI
  • Staff Client Search URI in Cataloging,
    Circulation and Acquisitions
  • Drive searches to resources on the web
  • Add new button to search interface in staff
    clients
  • Click the button: a browser is opened and the
    search is executed
  • This is PC specific (voyager.ini)
  • Possible applications
  • Link to another OPAC
  • Link to one of your vendors
  • Link to an online book seller

55
Presenting Search URI
Staff client search URI
Available in Cataloging, Circulation, and
Acquisitions
56
Adding Search URIs
  • clipped from voyager.ini
  • [SearchURI]
  • Name=Google
  • URI=http://www.google.com
  • Copy=Y
  • SearchSyntax=/search?q=<searchtext>
  • Name=BarnesNoble
  • URI=http://search.barnesandnoble.com
  • Copy=Y
  • SearchSyntax=/booksearch/results.asp?WRD=<searchtext>
  • Name=Gale Group
  • URI=http://www.galegroup.com
  • Copy=Y
  • SearchSyntax=/servlet/SearchPageServlet?region=9&imprint=<searchtext>
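
How a URI and SearchSyntax pair might turn into the address the browser opens can be sketched in Python. This assumes the placeholder in the stanza is written <searchtext> and that the search text is percent-encoded as UTF-8; how the staff client actually encodes it is not stated in the slides, and build_search_url is a hypothetical helper.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "<searchtext>"

def build_search_url(uri, search_syntax, search_text):
    """Join the base URI and the syntax template, filling in the search text."""
    filled = search_syntax.replace(PLACEHOLDER, quote_plus(search_text))
    return uri.rstrip("/") + filled

url = build_search_url("http://www.google.com",
                       "/search?q=<searchtext>",
                       "Ri yue yun yan")
print(url)

accented = build_search_url("http://www.google.com",
                            "/search?q=<searchtext>",
                            "Espa\u00f1a")
print(accented)
```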

57
WebVoyáge and Unicode
  • MARC data supplied to the browser in UTF-8
  • IE 6 generally displays Unicode characters
    correctly. Some characters do not display
    correctly unless a Unicode-compliant font is
    selected.
  • Netscape 6 figures out that it needs to display
    Unicode characters without any special settings
  • Consider new help text in your OPAC to help
    patrons understand about language options,
    especially if there are records using different
    languages in your database
  • New UTF-8 download/save format

58
Searching in WebVoyáge
  • Search and display in native languages for staff
    and users.
  • WebVoyáge and Cataloging allow Unicode character
    input, so you can search for and retrieve records
    in native languages.
  • Record display includes non-Latin scripts,
    including right-to-left scripts like Arabic and
    Hebrew. Voyager takes advantage of the web
    browser's native rendering support.

59
Records with Other Languages in the OPAC
60
Displaying Records in WebVoyáge
61
Linking in a MARC21 Record
Tag  I1  I2  Subfield Data
100  1       $6 880-01 $a An, Zhen.
245  1   0   $6 880-02 $a Ri yue yun yan / $c An Zhen zhu.
250          $6 880-03 $a Di 1 ban.
260          $6 880-04 $a Changchun Shi $b Changchun chu ban she, $c 1997.
300          $a 4, 2, 291 p. $c 21 cm.
440      0   $6 880-05 $a Zhongguo li dai wang chao xing shuai qu shi lu
500          $a Non-Roman script Chinese
651      0   $a China $x History $y Ming dynasty, 1368-1644.
880  1       $6 100-01/1 $a ? ?.
880  1   0   $6 245-02/1 $a ?? ?? / $c ? ? ?.
880          $6 250-03/1 $a ?1?.
880          $6 260-04/1 $a ??? $b ?? ???, $c 1997.
880      0   $6 440-05/1 $a ?? ?? ?? ?? ???
62
Interacting with Other Systems
  • Incoming Z39.50 Connections
  • Records in Unicode databases are UTF8 encoded
  • z3950svr may send MARC8-encoded records,
    UTF8-encoded records, or both
  • Default is set to send MARC8 encoded records
  • But, two different z3950svr ports can be
    configured to provide records in both formats,
    thereby accommodating all sites connecting to
    database

63
Interacting with Other Systems
  • Outgoing Z39.50 Connections
  • Retrieves and displays records of any type in
    UTF-8
  • Converts incoming records based on new Database
    Definitions setting in System Administration
    called Source Character Set
  • Latin1 (non Unicode)
  • MARC 21 MARC8 (non Unicode)
  • MARC21 UTF8
  • OCLC (non Unicode)
  • RLIN legacy (non Unicode)
  • Voyager legacy (non Unicode)

64
Agenda
  • Introduction
  • Your Work Environment
  • Conversion
  • New Features
  • Learning More
  • Final Q&A
65
If you want to know more about...
  • Coded Character Sets: EndUser 2004 Session 29
  • Title: Coded Character Sets: A Technical Primer
    for Librarians
  • Presenters: Michael Doran, Systems Librarian,
    University of Texas at Arlington; Dan Sweeney,
    Business Analyst II, Endeavor Information Systems
  • Great Website: http://rocky.uta.edu/doran/charsets/
  • Strategies and Tools for Cleaning Up Your Data:
    EndUser 2004 Session 45
  • Title: Transitioning To Unicode: Strategies for
    Tidying Your Data
  • Presenters: Fran Budde, Acquisitions Cataloging
    Specialist, Pacific Lutheran University;
    Francesca Lane Rasmus, Director, Technical
    Services, Pacific Lutheran University; Layne
    Nordgren, Director of Instructional
    Technologies/Library Systems, Pacific Lutheran
    University
66
If you want to know more about...
  • Special Character Input/Issues: EndUser 2004
    Session 65
  • Title: Why Unicode?
  • Presenter: Martin Heijdra, Chinese Bibliographer/
    Head of Public Services, East Asian Library,
    Princeton University
  • Preparing for Unicode Conversion Cataloging
    Issues: EndUser 2004 Session 74
  • Title: Unicode Conversion at the Library of
    Congress
  • Presenter: Ann Della Porta, Assistant
    Coordinator, Integrated Systems Office, Library
    of Congress
  • SupportWeb: KnowledgeBase, EndUser archives
  • http://support.endinfosys.com/cust/index.html

67
If you want to know more about...
  • 880 Alternate Graphic Representation (R)
  • http://www.loc.gov/marc/bibliographic/ecbdhold.html#mrcb880
  • OCLC Character Sets
  • http://www.oclc.org/support/documentation/worldcat/records/subscription/5/5.pdf
  • Original Scripts in RLG Databases
  • http://www.rlg.org/origscripts.html
  • MARC 21 Concise Bibliographic: Control Subfields
  • http://www.loc.gov/marc/bibliographic/ecbdcntf.html
  • MARC 21 Concise Bibliographic: Multiscript
    Records
  • http://www.loc.gov/marc/bibliographic/ecbdmulti.html

68
Thank you!