Title: MBUF Problems and solutions on VxWorks


1
MBUF Problems and solutions on VxWorks
  • Dave Thompson and cast of many.

2
MBUF Problems
  • This is usually how it lands in my inbox:
  • On Tue, 2003-05-06 at 20:38, Kay-Uwe Kasemir
    wrote:
  • > Hi
  • >
  • > Neither ics-accl-srv1 nor the CA gateway were
    able to get to dtl-hprf-ioc3.
  • >
  • > Via "cu", the IOC looked fine except for error
    messages:
  • > (CA_TCP) CAS: Client accept error was
    "S_errno_ENOBUFS"
  • (CA_online) ../online_notify.c: CA beacon error
    was "S_errno_ENOBUFS"
  • This has been a problem since before our front
    end commissioning, even though we are using
    PowerPC IOCs and a fully switched, full-duplex,
    100 Mb/s Cisco-based network infrastructure.
  • The error is coming from the Channel Access
    Server (a minimal sketch of how this error
    surfaces at the socket level follows below).
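
A minimal sketch, for orientation only, of how ENOBUFS surfaces at
the socket level (S_errno_ENOBUFS is simply the VxWorks symbolic
name for the BSD ENOBUFS errno). This is NOT the actual CA server
code; the accept loop below is purely illustrative.

  /* Sketch of a TCP accept loop reporting ENOBUFS (printed by
   * VxWorks as S_errno_ENOBUFS).  Not the CA server's real code. */
  #include <stdio.h>
  #include <string.h>
  #include <errno.h>
  #include <sys/socket.h>
  #include <netinet/in.h>

  void acceptLoop(int listenFd)
  {
      for (;;) {
          struct sockaddr_in peer;
          int len = sizeof(peer);   /* VxWorks 5.x uses int, POSIX socklen_t */
          int fd = accept(listenFd, (struct sockaddr *)&peer, &len);
          if (fd < 0) {
              if (errno == ENOBUFS) {
                  /* Out of mbufs/clusters: the condition behind the
                   * "CAS: Client accept error" message above. */
                  printf("accept error: %s\n", strerror(errno));
              }
              continue;
          }
          /* ... hand the new connection to a per-client task ... */
      }
  }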

3
Contributing Circumstances
  • (According to Jeff Hill)
  • The total number of connected clients is high.
  • The server's sustained (data) production rate is
    higher than the client's sustained consumption
    rate.
  • Clients subscribe for monitor events but do not
    call ca_pend_event() or ca_poll() to process
    their CA input queue (see the sketch after this
    list).
  • The server does not get a chance to run.
  • The server has multiple stale connections.
  • And also, probably:
  • tNetTask does not get to run.
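
For reference, a minimal sketch of a CA monitor client that does
process its input queue, assuming the standard CA client library
(cadef.h); the pathological case above is a client that subscribes
but never reaches the ca_pend_event() loop. The PV name "my:pv" is
only a placeholder.

  /* Minimal CA monitor client sketch.  The essential part is the
   * ca_pend_event() loop at the end: without it, monitor updates
   * back up and tie down buffers on the server (IOC) side. */
  #include <stdio.h>
  #include <cadef.h>

  static void onEvent(struct event_handler_args args)
  {
      if (args.status == ECA_NORMAL)
          printf("%s = %f\n", ca_name(args.chid),
                 *(const double *) args.dbr);
  }

  int main(void)
  {
      chid ch;

      SEVCHK(ca_context_create(ca_disable_preemptive_callback), "context");
      SEVCHK(ca_create_channel("my:pv", NULL, NULL, 10, &ch), "channel");
      SEVCHK(ca_pend_io(5.0), "connect");
      SEVCHK(ca_create_subscription(DBR_DOUBLE, 1, ch, DBE_VALUE,
                                    onEvent, NULL, NULL), "subscribe");
      for (;;)
          ca_pend_event(1.0);   /* keep draining the CA input queue */
      return 0;
  }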

4
Contributing Circumstances
  • SNS now has a number of different IOCs:
  • 21 VxWorks IOCs
  • 21 +/- Windows IOCs
  • 1 Linux IOC
  • 4 OPIs in the control room and many others on site
  • Servers running CA clients like the archiver
  • Users remotely logged in, running edm via ssh's X
    tunnel
  • CA Gateway
  • Other IP clients and services running on VxWorks
    and on servers.
  • Other IP applications running on IOCs, such as log
    tasks, etherIP, and serial devices running over IP.

5
Our experience to date
  • At SNS we have seen all of the contributing
    circumstances that Jeff mentions.
  • At BNL, Larry Hoff saw the problem on an IOC
    where the network tasks were being starved.
  • Many of our IOCs have heavy connection loads.
  • There are some CA client and Java CA client
    applications which need to be checked.
  • IOCs get hard reboots to fix problems and thus
    leave stale connections.
  • Other network problems have existed and been
    fixed, including a CA gateway loopback problem.

6
Late breaking
  • Jeff Hill was at ORNL last week.
  • One of the things he suspected was that noise on
    the Ethernet wiring causes the link to
    re-negotiate speed and full/half duplex
    operation.
  • He confirmed that the combination of the MV2100
    and the Cisco switches is prone to frequent
    auto-negotiation, shutting down Ethernet I/O on
    the IOC.
  • This is not JUST a boot-up problem.

7
What is an mbuf anyway?
VxWorks uses this structure to avoid calls to the
heap functions malloc() and free() from within
the network driver.


  • mBlks are the nodes that make up a linked list
    of clusters.
  • The clusters store the data while it is in the
    network stack.
  • There is a fixed number of clusters of differing
    sizes.
  • Since a given cluster block can exist on more
    than one list, you need twice (2x) as many mBlks
    as clusters (see the sketch below).
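
A simplified sketch of that relationship, for orientation only; the
real structures live in the VxWorks netBufLib headers and carry more
bookkeeping fields than shown here.

  /* Simplified sketch of the mBlk/cluster relationship (NOT the real
   * netBufLib definitions).  Each mBlk is a list node; the payload
   * lives in a fixed-size cluster, and a cluster can be referenced
   * from more than one chain, hence the reference count. */
  struct clusterSketch {
      int   refCount;   /* how many chains reference this cluster  */
      int   size;       /* 64, 128, 256, ... 8192 bytes            */
      char *data;       /* the actual packet data                  */
  };

  struct mBlkSketch {
      struct mBlkSketch    *next;      /* next buffer in this packet */
      struct mBlkSketch    *nextPkt;   /* next packet in the queue   */
      char                 *data;      /* points into the cluster    */
      int                   len;       /* bytes used in the cluster  */
      struct clusterSketch *cluster;   /* the fixed-size data buffer */
  };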

8
Mbuf and cluster pools
  • Each network interface has its own mbuf pool:
  • netStackDataPoolShow() (aka mbufShow)
  • The system has a separate mbuf/cluster pool used
    for routing, socket information, and the ARP
    table:
  • netStackSysPoolShow() (both shown below)
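
Both are standard VxWorks network show routines; assuming the netShow
facility (INCLUDE_NET_SHOW) is configured into the kernel, they can be
run straight from the target shell:

  -> netStackDataPoolShow
  -> netStackSysPoolShow

The first prints the data pool statistics shown on the next slide;
the second prints the system pool used for routes, sockets, and the
ARP table.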

9
Output from mbufShow
number of mbufs: 400
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0

size     clusters    free      usage
-------------------------------------
  64        200       199       1746
 128        400       400     190088
 256         80        80        337
 512         80        80          0
1024         50        50          1
2048         50        50          0
4096         50        50          0
8192         50        50          0

Callouts on the slide:
  • High turnover rate
  • Added at SNS
  • This one is mis-configured. Why?
10
Our Default Net Pool Sizes
You should add these lines to config.h or maybe configAll.h:

  #define NUM_64       100     /* no. 64 byte clusters   */
  #define NUM_128      200
  #define NUM_256       40     /* no. 256 byte clusters  */
  #define NUM_512       40     /* no. 512 byte clusters  */
  #define NUM_1024      25     /* no. 1024 byte clusters */
  #define NUM_2048      25     /* no. 2048 byte clusters */
  #define NUM_CL_BLKS  (NUM_64 + NUM_128 + NUM_256 + \
                        NUM_512 + NUM_1024 + NUM_2048 + \
                        NUM_4096 + NUM_8192)
  #define NUM_NET_MBLKS (2 * (NUM_CL_BLKS))

These will override the definitions in usrNetwork.c.
11
What we are doing at SNS
  • We are using a kernel addition that provides for
    setting the network stack pool sizes on the boot
    line.
  • 4x the VxWorks default sizes are working well.
  • We see high use rates for the 128 byte clusters,
    so that allocation is set extra high.
  • Use huge numbers only if trying to diagnose a
    problem such as a resource leak.
  • We are configuring the network interfaces to
    disable auto-negotiation of speed and duplex.
  • Code for the kernel addition is available at
    http://ics-web1.sns.ornl.gov/EPICS-S2003