CSC Online Error Monitoring with the DDU

1
CSC Online Error Monitoring with the DDU
J. Gilmore CSC-DPG 41 July 17, 2008
2
DDU Overview
  • Functions
  • Merge data from 15 CSCs
  • Perform online data unpacking and status
    monitoring in real-time (CRC, word count, format
    quality, BXN, L1A number, buffer status, link
    status)
  • Send CSC status to FMM
  • Large Buffer Capacity
  • 2.5 MB buffer
  • Average DDU data volume estimated to be 0.4 kB per
    L1A at LHC (at 10^34 luminosity)
  • Buffer can hold over
    6000 events
  • Status info accessed via VME
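
The "over 6000 events" figure follows from the two numbers above. A quick check, assuming decimal units (1 MB = 1,000,000 bytes, 1 kB = 1,000 bytes) and average-size events; the function name is only illustrative:

```python
# Back-of-envelope check of the buffer depth quoted on the slide.
BUFFER_BYTES = 2_500_000   # 2.5 MB buffer (decimal megabytes assumed)
BYTES_PER_L1A = 400        # 0.4 kB average DDU data volume per L1A


def events_in_buffer(buffer_bytes: int, bytes_per_event: int) -> int:
    """Number of whole average-size events that fit in the buffer."""
    return buffer_bytes // bytes_per_event


print(events_in_buffer(BUFFER_BYTES, BYTES_PER_L1A))  # 6250
```

Real events vary in size, so this is only an average-case estimate.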

Diagram labels: 15 optical fiber inputs (reads a 20-degree
slice through an endcap); GbE/SPY to local DAQ
3
Data Unpacking in the DDU
  • Scan data for evidence of SEUs, determine if
    Reset is needed
  • Data errors are an indicator for SEU
  • Requires Hard Reset, report it to FMM
  • Monitor front-end data for event sync loss
  • Requires Sync Reset, report it to FMM
  • Watch for buffer warning signals, avoid
    Overflows!
  • Set FMM Warning as needed, at half-to-3/4 full
    (many events!)
  • Beyond 90% full, DDU will set FMM Busy
  • As buffers get near empty, DDU returns to FMM
    Ready
  • Note that Buffer Overflows will lead to other
    errors if not Reset
  • Sync loss, Data corruption, Timeout errors
  • Diagnose cause and source of problems
  • Track which CSCs have set which error types
  • Report Reset Required states via VME Interrupt
  • Tracking for chronic problems in offline log
    files
  • Provide VME registers for diagnostics and
    monitoring
  • Include status and error information in the DDU
    Trailer
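
The buffer-level handling above (Warning, Busy, return to Ready) can be sketched as a small state machine with hysteresis. This is not the DDU firmware logic. The exact Warning threshold, the "near empty" release point, and the choice to hold Warning/Busy until the buffer drains are assumptions for illustration, as are all names.

```python
from enum import Enum


class FmmState(Enum):
    READY = "Ready"
    WARNING = "Warning"
    BUSY = "Busy"


# Thresholds as fractions of buffer capacity. The slide gives "half-to-3/4
# full" for Warning and "beyond 90%" for Busy; the exact values and the
# "near empty" release point below are assumed.
WARN_AT = 0.5
BUSY_AT = 0.9
READY_BELOW = 0.1


def next_fmm_state(current: FmmState, fill: float) -> FmmState:
    """Return the FMM state for a buffer fill fraction in [0, 1]."""
    if fill > BUSY_AT:
        return FmmState.BUSY
    if current is FmmState.READY:
        return FmmState.WARNING if fill >= WARN_AT else FmmState.READY
    # Already in Warning or Busy: hold until the buffer is near empty
    # (assumed hysteresis, so the state does not chatter at a threshold).
    if fill < READY_BELOW:
        return FmmState.READY
    return current


def trace(fills):
    """Run a sequence of fill fractions through the state machine."""
    state = FmmState.READY
    history = []
    for fill in fills:
        state = next_fmm_state(state, fill)
        history.append(state.value)
    return history


print(trace([0.2, 0.6, 0.95, 0.6, 0.05]))
# ['Ready', 'Warning', 'Busy', 'Busy', 'Ready']
```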

4
Reported Error Categories I
  • Configuration failures
  • Constants loaded on a board are not correct
  • Caused by communication errors, bad timing or
    hardware
  • Often leads to data errors: Timeout, bad DAV,
    sync loss, buffer overflow, dead or hot channels,
    format errors, data corruption
  • Format error, Consistency error or Not Present
  • An expected format marker is not detected in the
    proper position
  • Can cause DDU to misidentify a board
    header/trailer word
  • May show as missing board in event
  • May show as bad L1A, CRC or word count
  • Caused by config fail, bad hardware or signal
    timing/quality
  • Hot/dead channels or Empty/Missing CSC
  • Caused by HV, config fail, bad hardware or signal
    timing/quality
  • Can lead to buffer overflows
  • Missing CSCs are caused by LV-off or disabled
    CSCs
  • DAV-LCT mismatch
  • A CFEB was triggered but it failed to send data
  • Caused by config fail, bad hardware or signal
    timing/quality
  • Can lead to buffer overflows or Timeout errors

5
Reported Error Categories II
  • Full FIFO at DMB (ALCT or CFEB buffer overflow)
  • Caused by config fail, bad hardware or signal
    timing/quality
  • Can cause Sync loss, Data corruption, or Timeout
  • L1A Number Mismatch Errors
  • Fundamental sign of sync loss
  • Caused by problem with hardware or signal
    timing/quality
  • Possibly SEU related
  • CRC error: bit error detected in transmission
  • Generally a minor concern, affecting only one
    event
  • Only serious if it affects multiple
    Header/Trailer bits
  • May be an indicator of a deeper problem
  • CSC electronics have a CRC at every level to
    detect bit errors
  • CFEB, ALCT, TMB, DMB and DDU
  • Overall severity of an error is hard to predict
  • Cases that appear as Critical require a Reset
    as they usually lead to more errors, but
    sometimes may be self-correcting
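
As an illustration of how a CRC flags a transmission bit error, the sketch below flips one bit in a payload and recomputes the checksum. The slides do not give the CRC polynomial or width used by the CSC electronics, so CRC-32 from Python's `zlib` stands in here; the payload and helper names are hypothetical.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def crc_ok(payload: bytes, expected_crc: int) -> bool:
    """Compare the recomputed CRC with the one sent alongside the data."""
    return zlib.crc32(payload) == expected_crc


payload = bytes(range(16))         # stand-in for one board's event data
sent_crc = zlib.crc32(payload)     # CRC computed by the sender

corrupted = flip_bit(payload, 37)  # single bit error in transmission

print(crc_ok(payload, sent_crc))    # True
print(crc_ok(corrupted, sent_crc))  # False
```

A CRC of this kind detects every single-bit error, which fits the slide's view that an isolated CRC error usually affects only one event.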

6
Event Quality Indicators from DDU
  • The Single Error flag in DDU trailer: Do Not
    Analyze Event
  • Any events with non-perfect data checks will get
    this
  • Minor bit errors or format problems, SCA Full
  • Single Warning if problem might not affect the
    data payload
  • Clean single-bit error in a header/trailer-word
    marker
  • Fiber receiver/link error that may have occurred
    between events
  • DCM phase-lock-loss that may occur between events
  • The Critical Error "Sync Lost" case: Data
    Integrity Failure
  • L1A mismatch detected twice on one CSC
  • Two different boards in the same event
  • Separate occurrences in two different events
  • Buffer Overflow at DMB or DDU
  • Note: offline analysis might not see the loss in
    data integrity
  • At the full point, a buffer still has many good
    events to read out before the compromised data is
    observed, and sTTS actions can conceal all this
  • The Critical Error "Hard Reset" case: Unpacker
    Failure Likely
  • Anything that corrupts the data irreversibly
  • Violation of event boundaries, can't determine
    end-of-CSC data stream
  • Anything that looks like an SEU, e.g. repeated
    trivial errors
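
The "L1A mismatch detected twice on one CSC" rule can be sketched as a per-CSC counter that persists until a reset. This only illustrates the rule as stated on the slide, not the DDU implementation; the class, its methods, and the board labels are hypothetical.

```python
from collections import Counter


class L1aMismatchTracker:
    """Counts L1A mismatches per CSC; two or more means Sync Lost."""

    def __init__(self):
        self.counts = Counter()

    def record_event(self, mismatches):
        """Record one event's mismatches as (csc_id, board) pairs.

        Returns the set of CSC ids currently flagged as Sync Lost.
        Two different boards in one event, or one mismatch in each of
        two different events, both reach the threshold of two.
        """
        for csc_id, _board in set(mismatches):  # ignore exact repeats
            self.counts[csc_id] += 1
        return {csc for csc, n in self.counts.items() if n >= 2}

    def reset(self):
        """Clear all counts, as a Sync or Hard Reset would."""
        self.counts.clear()


tracker = L1aMismatchTracker()
print(tracker.record_event([(3, "CFEB")]))  # set()
print(tracker.record_event([(3, "TMB")]))   # {3}
```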

7
Summary
  • The DDU performs online CSC error monitoring in
    real-time
  • The monitor status is in the DDU Trailer for
    every event
  • The DDU monitoring results are useful for offline
    data quality checking
  • Details of DDU monitoring status can be found
    here
  • http://www.physics.ohio-state.edu/cms/ddu/ddu2_pro.html#tr-1

8
DDU Error Table I

(1) Error bits resulting in RESET REQUIRED
persist until the RESET occurs. Questionable
cases (in gold) indicate that a reset is only
required for mitigation of recurring errors. TBD:
sync/hard reset distinctions.
(2) Found inside an event, i.e. between
Beginning-Of-Event (Header1 signature) and
End-Of-Event (combination of Trailer1 and Trailer2
signatures), at least one of the following:
Extra DMB_Header1, Extra DMB_Header2, Lone Word,
Extra TMB/ALCT_Trailer, Extra DMB_Trailer1,
DMB_Trailer2.
(3) Missing TMB/ALCT_Trailer word, missing DMB
Header word, Wrong First word, or Extra Control
words.
9
DDU Error Table II

10
DDU Error Table III
  • Footnotes for the error table
  • (1) Error bits resulting in RESET REQUIRED
    persist until the RESET occurs. Questionable
    cases (in gold) indicate that a reset is only
    required for mitigation of recurring errors. TBD:
    sync/hard reset distinctions.
  • (2) Found inside an event, i.e. between
    Beginning-Of-Event (Header1 signature) and
    End-Of-Event (combination of Trailer1 and Trailer2
    signatures), at least one of the following:
    Extra DMB_Header1, Extra DMB_Header2, Lone Word,
    Extra TMB/ALCT_Trailer, Extra DMB_Trailer1,
    DMB_Trailer2.
  • (3) Missing TMB/ALCT_Trailer word, missing DMB
    Header word, Wrong First word, or Extra Control
    words.