Anomaly Recovery and the Mars Exploration Rovers
  • Beth Dewell and Jacob Matijevic

The Mars Exploration Rover
  • Currently: MER-A (Spirit) at Sol 856; MER-B
    (Opportunity) at Sol 836
  • Each Rover has spent over two years on the
    surface of Mars
  • Variation in terrain, changing environment, and
    problems in operation from the surface of another
    planet were taken into account in the system
    design
  • A long-range planning team prepares outlines of
    future activities
  • Strategic plan guides the operation through the
    creation of near-term objectives for the vehicle
  • Each day a separate team (one for each vehicle)
    prepares a set of sequences of commands that
    implement the objectives for a short period
    (commonly 1-3 days) of the strategic plan
  • Tactical plan is a set of sequences prepared
    based on the best knowledge of the state of the
    vehicle
  • Constraints: times of communication, data
    storage, time and energy available

System Design
  • Computer built around a RAD-6000 CPU (Rad6k),
    RAM and non-volatile memory(FLASH and EEPROM)
  • Energy
  • Collected from the solar array
  • Channeled along the system power bus that is
    supported by two Li-ion batteries
  • Power not required to support loads recharges
    the batteries
  • Batteries support loads drawing power in excess
    of that supplied by the solar panel
  • Power in excess of loads and recharge required
    by the batteries is channeled to an external
    shunt radiator
  • Regulation and distribution of power is managed
    by 2 battery control boards (BCBs)
  • Batteries also supply power to the mission clock,
    which has a programmable alarm clock feature
  • Mobility
  • Six-wheel-drive, four-wheel-steered vehicle
  • Communications
  • X-band: a Small Deep Space Transponder and two
    Solid State Power Amplifiers, supported by a
    body-fixed, monopole Low Gain Antenna (LGA) and a
    High Gain Antenna (HGA)
  • UHF: a transceiver, supported by a body-fixed,
    monopole antenna
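
The energy routing described in the Energy bullets above can be sketched as a single power-balance step. This is only an illustration, not flight logic: the function name `route_power`, its parameters, and the numbers in the usage line are all hypothetical.

```python
def route_power(solar_w, load_w, recharge_demand_w):
    """Split solar array output among loads, battery recharge, and the shunt.

    All names and units (watts) are illustrative assumptions.
    Returns (battery_w, shunt_w): battery_w > 0 means the batteries are
    charging, battery_w < 0 means they are discharging to cover the loads.
    """
    surplus = solar_w - load_w
    if surplus <= 0:
        # Loads draw more than the array supplies: batteries cover the gap.
        return surplus, 0.0
    # Surplus first recharges the batteries, up to what they can accept.
    to_battery = min(surplus, recharge_demand_w)
    # Whatever is left over is sent to the external shunt radiator.
    return to_battery, surplus - to_battery


# Hypothetical numbers: 100 W from the array, 60 W of loads, 25 W recharge.
battery_w, shunt_w = route_power(100.0, 60.0, 25.0)
```

With these made-up inputs, 25 W goes to the batteries and the remaining 15 W is shunted; if the loads exceeded the array output, the first value would be negative and nothing would be shunted.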

Flight Software Design
  • Autonomous Operations
  • Maintains the vehicle in the state needed to
    receive and act upon commands, execute sequences
    of commands when available, and collect and
    format data for transmission
  • Support wakeups (i.e., boot of the CPU) and
    shutdowns as part of normal operations
  • Wakeup is scheduled once each day when energy
    production from the solar array can support the
    load associated with the CPU and supporting
    hardware
  • BCB determines whether the available energy is
    sufficient
  • A shutdown is controlled by parameters
    established to ensure a power and thermal balance
  • Communications
  • Timed events maintained in an onboard
    communication windows table (X-band and UHF)
  • At any given time the table contains about 6 weeks
    of timed events, covering uplink and downlink
  • Sequenced Control
  • Sequences of commands are stored on board and
    control vehicle activities, including wakeup/shutdown

Flight Software Telemetry and Exception Response
  • 3 types of flight software telemetry
  • Event reports (EVRs)
  • Engineering data (EHA)
  • Data products
  • Exception Response
  • Warning: an EVR (warning) may be written in the
    record to note the occurrence
  • Fault: the ongoing process or sequence is ended
  • EVRs, EHA, and perhaps a fault data product are
    generated for the record
  • Fatal: FSW autonomously reboots when an
    unrecoverable problem is encountered
  • No time to document conditions at the time of the
    exception
  • FSW temporarily stores a small number of EVRs in
    EEPROM during execution to be recovered after the
    reboot has been accomplished
  • Autonomous mode results
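
The three exception levels above can be summarized as a small dispatch table. This is a sketch of the behavior as the slide describes it, not the flight software's actual interface: the `Severity` enum, the `respond` function, and the action strings are hypothetical names introduced for illustration.

```python
from enum import Enum


class Severity(Enum):
    # Hypothetical labels for the three levels named on the slide.
    WARNING = 1
    FAULT = 2
    FATAL = 3


def respond(severity):
    """Return the actions the slide associates with each severity level."""
    if severity is Severity.WARNING:
        # A warning EVR may be written; nothing is interrupted.
        return ["write warning EVR"]
    if severity is Severity.FAULT:
        # The ongoing process or sequence ends and the event is recorded.
        return ["end ongoing process or sequence",
                "generate EVRs and EHA",
                "generate fault data product if available"]
    if severity is Severity.FATAL:
        # No time to document conditions: reboot, then recover the few
        # EVRs that were saved in EEPROM during execution.
        return ["reboot FSW",
                "recover EVRs stored in EEPROM",
                "enter autonomous mode"]
    raise ValueError(f"unknown severity: {severity!r}")


actions = respond(Severity.FATAL)
```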

When a problem occurs
  • Outline the strategy for resolving the problem
    and continue, as possible
  • Types of Problems
  • Simple: a sequence did not fully execute
  • Component performance unexpected or incomplete
  • Additional data may be requested
  • An engineering test is scheduled
  • Typically, other parts of the rover are
    unaffected by the component anomaly, so
    corrective action can begin
  • For a persistent component problem, an anomaly
    team is formed for a multi-sol investigation
  • Moves the corrective action from a tactical
    response to a strategic response, often requiring
    experts to help in diagnosis and recovery
  • Tactical process otherwise continues doing
    science as possible while recovery strategy is
    developed and demonstrated

Major MER Anomalies
  • Software Anomalies
  • A race condition, initialization counter
  • Another race condition, imaging interface
  • Corrupted command conjunction test
  • Exception in evaluation of a DDI during mobility
  • Upload fault during forward link commanding
  • Hardware Anomalies
  • Stuck-on Heater
  • RF drive actuator
  • RF Steering actuator
  • IDD azimuth actuator
  • Environmentally Induced Anomalies
  • Clock Fault
  • 'Potato' Rock
  • Embedding in terrain

A race condition, initialization counter
  • Sol 131 on Spirit
  • Vehicle was unexpectedly in autonomous operation;
    no sequences were active on board
  • Fatal exception from the FSW initialization
    module was noted in the EVR log in telemetry
  • Problem
  • Vulnerability that occurs when the
    initialization module attempts to increment
    the initialization counter
  • Counter resides in non-volatile memory
  • Writing to this memory required permission from a
    separate software service that managed access to
    the memory
  • Between the request and the grant of access to
    write to the memory location, another software
    module had requested access, been granted it, and
    written to non-volatile memory
  • The initialization module, finding it could not
    write the initialization counter, declared an
    exception resulting in a fatal condition
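
The sequence of events above is a check-then-act race: the request and the write are separate steps, and another module can slip in between them. The sketch below replays that interleaving deterministically. How the real memory service arbitrated access is not described in the slides, so the ticket/version rule used here is an assumption, and every name (`NvmAccessService`, `FatalError`, `other_module`) is hypothetical.

```python
class FatalError(Exception):
    """Stands in for the FSW fatal exception (hypothetical name)."""


class NvmAccessService:
    """Toy stand-in for the service that manages non-volatile memory writes."""

    def __init__(self):
        self.version = 0  # bumped on every completed write
        self.memory = {"init_counter": 0}

    def request(self):
        # Step 1: ask for write access; the ticket records the current state.
        return self.version

    def write(self, ticket, key, value):
        # Steps 2-3: the write is honored only if nobody else has written
        # since the request was made (assumed arbitration rule).
        if ticket != self.version:
            return False
        self.memory[key] = value
        self.version += 1
        return True


def other_module(svc):
    # Another module requests access, is granted it, and writes.
    svc.write(svc.request(), "other_data", 1)


def increment_init_counter(svc, interloper=None):
    ticket = svc.request()
    if interloper is not None:
        interloper(svc)  # forced interleaving between request and write
    new_value = svc.memory["init_counter"] + 1
    if not svc.write(ticket, "init_counter", new_value):
        raise FatalError("could not write the initialization counter")
    return new_value
```

Calling `increment_init_counter(svc)` alone succeeds, while passing `interloper=other_module` raises `FatalError` and leaves the counter unchanged. On the vehicle the interleaving depends on scheduling; here it is forced so the failure is reproducible.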

Initialization counter, contd
  • All processes time-share the use of the single
    CPU
  • No guarantee that the three actions desired by
    the initialization module (i.e., request, being
    granted write access, and writing to memory)
    occur contiguously
  • Vulnerability viewed as a function of the number
    of processes in operation at the time of the
    write of the initialization counter
  • Vulnerability duration: a few microseconds to
    perform the three actions, within about a 4-minute
    window during initialization
  • Advisory issued, with an added restriction on IDD
    use during the 4-minute window
  • Likelihood of recurrence was deemed so slight as
    to not warrant further action
  • No FSW change
  • Understood as a race condition between software
    modules: a race that the initialization module
    won for many initializations (over 560 at that
    time) and for many sols since this occurrence

Initialization counter: Recurrence and Resolution
  • Recurrence
  • Spirit on sol 209
  • Opportunity on sol 596
  • Opportunity on sol 622
  • Occurred during initialization in preparation for
    an afternoon UHF communication window
  • The FSW response caused the loss of that
    communication window
  • No telemetry, leaving the recovery team to sift
    through many possibilities for the problem
  • At the next uplink window, on sol 623, commands
    were issued and sequence control was regained
  • Because recovery from the sol 622 event was
    delayed by a sol, the team enforced a 'keep out
    zone' for operations after wakeup
  • Due to energy considerations, this was enforced
    only during the wakeup prior to an afternoon UHF pass
  • Ensured that a recurrence would not jeopardize
    the return of engineering and science data needed
    to plan for the next sol

Stuck-on Heater
  • Sol 2 on Opportunity
  • First overnight UHF pass on sol 2 at 0330 LST:
    nighttime loads from sol 1 2300 LST to sol 2
    0330 LST were 0.5 A larger than predicted
  • Next communication session showed that the
    additional load had remained on until 10 LST
  • Dissipated 180 W-hrs
  • Highest-likelihood fault: a Rover Power
    Distribution Unit (RPDU) load unexpectedly
    powered on
  • Load size, on/off times, and temperature narrowed
    the fault down to an IDD heater circuit
  • Off/on times correlate to predicted thermostat
    box switch times
  • Temp. sensor on MI near the IDD heater circuit
    recorded overnight rise
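
The dissipated-energy figure above can be sanity-checked with a rough calculation: energy is excess current times bus voltage times duration. The slides give only the 0.5 A excess load and the 180 W-hr total; the 30 V bus voltage and 12-hour duration used below are assumptions chosen for illustration, not values from the presentation.

```python
def energy_whr(current_a, bus_voltage_v, hours):
    """Energy in watt-hours drawn by a constant load."""
    return current_a * bus_voltage_v * hours


# 0.5 A is from the slides; 30 V and 12 h are assumed for illustration.
heater_energy = energy_whr(0.5, 30.0, 12.0)
```

Under these assumed values the result is 180 W-hr, which shows that the reported figure is consistent with a half-amp load left on for roughly half a sol.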

Stuck-on Heater Implications
  • Rover still completely functional as designed
  • Energy drain reduces energy available for science
  • Especially important in winter
  • Spacecraft survival an issue

Stuck-on Heater Recovery: Deep Sleep
  • Deep Sleep
  • Remove batteries from power bus
  • Leaves the battery control boards (BCBs) unpowered
  • Only mission clock and alarm clock powered
  • Faulty heater circuit turned off at night
  • At dawn, BCBs awakened when sufficient light
    hits solar arrays
  • Net savings: 180 W-hr/sol
  • Implemented for the first time on sols 101-102;
    made permanent on sol 206
  • Could be temporarily disabled on a nightly basis
    by command
  • Survival heaters for the miniTES and Rover
    Electronics Module, which are normally left on
    during the night, were taken off-line
  • Colder temperatures on the miniTES (routinely
    below the acceptable flight temperature limits)
    have undoubtedly contributed to a degradation of
    this instrument on Opportunity
  • This degradation has not been experienced by the
    miniTES on Spirit

'Potato' Rock
  • Sol 339 on Spirit
  • A planned drive up Husband Hill
  • At the first turn command in the drive sequence,
    the rover was commanded to turn in place within
    18 seconds
  • In the last 2 seconds, the right rear wheel
    current spiked and the drive motor actuator
    stalled
  • Sol 339 data revealed that a rock was lodged
    between the inner ring of the wheel and the
    actuators

'Potato' Rock Recovery
  • Sol 340
  • - Rock dislodged from the actuator by spinning the
    right rear wheel in the opposite direction
  • - Rock was now inside the wheel
  • Sol 343
  • - Right rear wheel was straightened, and then the
    drive and steering actuators were temporarily
    disabled
  • - Two small 0.3 meter arcing drives were
    attempted using the remaining 5 wheels
  • - Rock remained inside the wheel
  • Sol 344
  • - The rover was commanded to back down the hill
    (all actuators enabled this time)
  • - Rock remained inside the wheel

'Potato' Rock Recovery, contd
  • Sol 345
  • - Rover commanded to turn in place, to drive back
    down the hill, and to perform a final turn in place
  • - Rock remained inside the wheel
  • Sol 346
  • - Two drives and one more turn in place were
    performed
  • - Afternoon images confirmed the rock was out of
    the wheel
  • Sols 348-350
  • - Nominal operations resumed

'Potato' Rock Analysis
  • Rock jamming in the mobility mechanisms had
    occurred on prior rover systems
  • MER system had additional external clearances
    around the wheels and the drive actuator
  • Wheel wells not 'closed out' due to weight
  • Despite these provisions, vehicle geometry, loose
    rocks, and regolith of the terrain on Husband
    Hill made this rock jamming possible

Closing Thoughts
  • The system was designed to anticipate a number of
    likely faults and conditions due to sequence
    execution and environmental change, which led to
    an operational flexibility that reduced recovery
    time
  • Less than 5% of operations (about 30 sols out of
    850) has been devoted to recovery on each of the
    vehicles.
  • These considerations will likely be present in
    any future surface mission.