Title: MBUF Problems and solutions on VxWorks


1
MBUF Problems and solutions on VxWorks
  • Dave Thompson and cast of many.

2
MBUF Problems
  • This is usually how it lands in my inbox:
  • On Tue, 2003-05-06 at 20:38, Kay-Uwe Kasemir
    wrote:
  • > Hi
  • >
  • > Neither ics-accl-srv1 nor the CA gateway were
    able to get to dtl-hprf-ioc3.
  • >
  • > Via "cu", the IOC looked fine except for error
    messages:
  • > (CA_TCP) CAS: Client accept error was
    "S_errno_ENOBUFS"
  • (CA_online) ../online_notify.c: CA beacon error
    was "S_errno_ENOBUFS"
  • This has been a problem since before our front
    end commissioning, even though we are using
    PowerPC IOCs and a fully switched, full-duplex,
    100 Mb/s Cisco-based network infrastructure.
  • The error is coming from the Channel Access
    Server (a minimal sketch of how this error
    surfaces at the socket level follows below).
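
A minimal sketch, for orientation only, of how ENOBUFS surfaces at
the socket level (S_errno_ENOBUFS is simply the VxWorks symbolic
name for the BSD ENOBUFS errno). This is NOT the actual CA server
code; the accept loop below is purely illustrative.

  /* Sketch of a TCP accept loop reporting ENOBUFS (printed by
   * VxWorks as S_errno_ENOBUFS).  Not the CA server's real code. */
  #include <stdio.h>
  #include <string.h>
  #include <errno.h>
  #include <sys/socket.h>
  #include <netinet/in.h>

  void acceptLoop(int listenFd)
  {
      for (;;) {
          struct sockaddr_in peer;
          int len = sizeof(peer);   /* VxWorks 5.x uses int, POSIX socklen_t */
          int fd = accept(listenFd, (struct sockaddr *)&peer, &len);
          if (fd < 0) {
              if (errno == ENOBUFS) {
                  /* Out of mbufs/clusters: the condition behind the
                   * "CAS: Client accept error" message above. */
                  printf("accept error: %s\n", strerror(errno));
              }
              continue;
          }
          /* ... hand the new connection to a per-client task ... */
      }
  }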

3
Contributing Circumstances
  • (According to Jeff Hill)
  • The total number of connected clients is high.
  • The server's sustained (data) production rate is
    higher than the client's sustained consumption
    rate.
  • Clients subscribe for monitor events but do not
    call ca_pend_event() or ca_poll() to process
    their CA input queue (see the sketch after this
    list).
  • The server does not get a chance to run.
  • The server has multiple stale connections.
  • And also, probably:
  • tNetTask does not get to run.
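
For reference, a minimal sketch of a CA monitor client that does
process its input queue, assuming the standard CA client library
(cadef.h); the pathological case above is a client that subscribes
but never reaches the ca_pend_event() loop. The PV name "my:pv" is
only a placeholder.

  /* Minimal CA monitor client sketch.  The essential part is the
   * ca_pend_event() loop at the end: without it, monitor updates
   * back up and tie down buffers on the server (IOC) side. */
  #include <stdio.h>
  #include <cadef.h>

  static void onEvent(struct event_handler_args args)
  {
      if (args.status == ECA_NORMAL)
          printf("%s = %f\n", ca_name(args.chid),
                 *(const double *) args.dbr);
  }

  int main(void)
  {
      chid ch;

      SEVCHK(ca_context_create(ca_disable_preemptive_callback), "context");
      SEVCHK(ca_create_channel("my:pv", NULL, NULL, 10, &ch), "channel");
      SEVCHK(ca_pend_io(5.0), "connect");
      SEVCHK(ca_create_subscription(DBR_DOUBLE, 1, ch, DBE_VALUE,
                                    onEvent, NULL, NULL), "subscribe");
      for (;;)
          ca_pend_event(1.0);   /* keep draining the CA input queue */
      return 0;
  }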

4
Contributing Circumstances
  • SNS now has a number of different IOCs:
  • 21 VxWorks IOCs
  • 21 +/- Windows IOCs
  • 1 Linux IOC
  • 4 OPIs in the control room and many others on site
  • Servers running CA clients like the archiver
  • Users remotely logged in, running edm via ssh's X
    tunnel
  • CA Gateway
  • Other IP clients and services running on VxWorks
    and on servers.
  • Other IP applications running on IOCs, such as log
    tasks, etherIP, and serial devices running over IP.

5
Our experience to date
  • At SNS we have seen all of the contributing
    circumstances that Jeff mentions.
  • At BNL, Larry Hoff saw the problem on an IOC
    where the network tasks were being starved.
  • Many of our IOCs have heavy connection loads.
  • There are some CA client and Java CA client
    applications which need to be checked.
  • IOCs get hard reboots to fix problems and thus
    leave stale connections.
  • Other network problems have existed and been
    fixed, including a CA gateway loopback problem.

6
Late breaking
  • Jeff Hill was at ORNL last week.
  • One of the things he suspected was that noise on
    the Ethernet wiring causes the link to
    re-negotiate speed and full/half duplex
    operation.
  • He confirmed that the combination of the MV2100
    and the Cisco switches is prone to frequent
    auto-negotiation, shutting down Ethernet I/O on
    the IOC.
  • This is not JUST a boot-up problem.

7
What is an mbuf anyway?
VxWorks uses this structure to avoid calls to the
heap functions malloc() and free() from within
the network driver.


  • mBlks are the nodes that make up a linked list
    of clusters.
  • The clusters store the data while it is in the
    network stack.
  • There is a fixed number of clusters of differing
    sizes.
  • Since a given cluster block can exist on more
    than one list, you need twice (2x) as many mBlks
    as clusters (see the sketch below).
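
A simplified sketch of that relationship, for orientation only; the
real structures live in the VxWorks netBufLib headers and carry more
bookkeeping fields than shown here.

  /* Simplified sketch of the mBlk/cluster relationship (NOT the real
   * netBufLib definitions).  Each mBlk is a list node; the payload
   * lives in a fixed-size cluster, and a cluster can be referenced
   * from more than one chain, hence the reference count. */
  struct clusterSketch {
      int   refCount;   /* how many chains reference this cluster  */
      int   size;       /* 64, 128, 256, ... 8192 bytes            */
      char *data;       /* the actual packet data                  */
  };

  struct mBlkSketch {
      struct mBlkSketch    *next;      /* next buffer in this packet */
      struct mBlkSketch    *nextPkt;   /* next packet in the queue   */
      char                 *data;      /* points into the cluster    */
      int                   len;       /* bytes used in the cluster  */
      struct clusterSketch *cluster;   /* the fixed-size data buffer */
  };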

8
Mbuf and cluster pools
  • Each network interface has its own mbuf pool:
  • netStackDataPoolShow() (aka mbufShow)
  • The system has a separate mbuf/cluster pool used
    for routing, socket information, and the ARP
    table:
  • netStackSysPoolShow() (both shown below)
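
Both are standard VxWorks network show routines; assuming the netShow
facility (INCLUDE_NET_SHOW) is configured into the kernel, they can be
run straight from the target shell:

  -> netStackDataPoolShow
  -> netStackSysPoolShow

The first prints the data pool statistics shown on the next slide;
the second prints the system pool used for routes, sockets, and the
ARP table.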

9
Output from mbufShow
number of mbufs: 400
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0

size     clusters    free      usage
-------------------------------------
  64        200       199       1746
 128        400       400     190088
 256         80        80        337
 512         80        80          0
1024         50        50          1
2048         50        50          0
4096         50        50          0
8192         50        50          0

Callouts on the slide:
  • High turnover rate
  • Added at SNS
  • This one is mis-configured. Why?
10
Our Default Net Pool Sizes
You should add these lines to config.h or maybe configAll.h:

  #define NUM_64       100     /* no. 64 byte clusters   */
  #define NUM_128      200
  #define NUM_256       40     /* no. 256 byte clusters  */
  #define NUM_512       40     /* no. 512 byte clusters  */
  #define NUM_1024      25     /* no. 1024 byte clusters */
  #define NUM_2048      25     /* no. 2048 byte clusters */
  #define NUM_CL_BLKS  (NUM_64 + NUM_128 + NUM_256 + \
                        NUM_512 + NUM_1024 + NUM_2048 + \
                        NUM_4096 + NUM_8192)
  #define NUM_NET_MBLKS (2 * (NUM_CL_BLKS))

These will override the definitions in usrNetwork.c.
11
What we are doing at SNS
  • We are using a kernel addition that provides for
    setting the network stack pool sizes on the boot
    line.
  • 4x the VxWorks default sizes are working well.
  • We see high use rates for the 128 byte clusters,
    so that allocation is set extra high.
  • Use huge numbers only if trying to diagnose a
    problem such as a resource leak.
  • We are configuring the network interfaces to
    disable auto-negotiation of speed and duplex.
  • Code for the kernel addition is available at
    http://ics-web1.sns.ornl.gov/EPICS-S2003