Our recv1000.c driver

Slides: 26
Provided by: CRU7
Learn more at: https://www.cs.usfca.edu


1
Our recv1000.c driver
  • Implementing a packet-receive capability with
    the Intel 82573L network interface controller

2
Similarities
  • There exist quite a few similarities between
    implementing the transmit-capability and the
    receive-capability in a device-driver for
    Intel's 82573L Ethernet controller
  • Identical device-discovery and ioremap steps
  • Same steps for global reset of the hardware
  • Comparable data-structure initializations
  • Parallel setups for the TX and RX registers
  • But there also are a few fundamental differences
    (such as active versus passive roles for the
    driver)

3
push versus pull
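[Diagram: Host memory holds the transmit packet buffer and the receive packet buffer; the Ethernet controller holds the transmit-FIFO and the receive-FIFO, which connect to/from the LAN. The transmit path (transmit packet buffer to transmit-FIFO) is labelled "push"; the receive path (receive-FIFO to receive packet buffer) is labelled "pull".]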
The write() routine in our xmit1000.c driver
could transfer data at any time, but the
read() routine in our recv1000.c driver has
to wait for data to arrive. So to avoid doing
any wasteful busy-waiting, our recv1000.c
driver can use the Linux kernel's sleep/wakeup
mechanism if it enables the NIC's interrupts!
4
Sleep/wakeup
  • We will need to employ a wait-queue, we will need
    to enable device-interrupts, and we will need to
    write and install the code for an interrupt
    service routine (ISR)
  • So our recv1000.c driver will have a few
    additional code and data components that were
    absent in our xmit1000.c driver

5
Driver's components
  • wait_queue_head
  • my_isr(): This function will awaken any sleeping
    reader-task
  • my_fops: This struct holds one function-pointer
    (read, which points to my_read())
  • my_read(): This function will program the actual
    data-transfer
  • my_get_info(): This function will allow us to
    inspect the receive-descriptors
  • module_init(): This function will detect and
    configure the hardware, define page-mappings,
    allocate and initialize the descriptors, install
    our ISR and enable interrupts, start the receive
    engine, create the pseudo-file and register my_fops
  • module_exit(): This function will do needed cleanup
    when it's time to unload our driver: turn off the
    receive engine, disable interrupts and remove our
    ISR, free memory, delete page-table entries, the
    pseudo-file, and the my_fops
6
How the NIC's interrupts work
  • There are four interrupt-related registers which
    are essential for us to understand

Register   Offset   Name
ICR        0x00C0   Interrupt Cause Read
ICS        0x00C8   Interrupt Cause Set
IMS        0x00D0   Interrupt Mask Set/Read
IMC        0x00D8   Interrupt Mask Clear
7
Interrupt event-types
[Register layout for the 82573L: bits 31 down to 0; bits 30-18 and 14-10 are shown as reserved]
31: INT_ASSERTED (1=yes, 0=no)
17: ACK (Rx-ACK Frame detected)
16: SRPD (Small Rx-Packet detected)
15: TXD_LOW (Tx-Descr Low Thresh hit)
9: MDAC (MDI/O Access Completed)
7: RXT0 (Receiver Timer expired)
6: RXO (Receiver Overrun)
4: RXDMT0 (Rx-Desc Min Thresh hit)
2: LSC (Link Status Change)
1: TXQE (Transmit Queue Empty)
0: TXDW (Transmit Descriptor Written Back)
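As an illustration only, the event-types listed on this slide could be turned into bit-masks and a small decoding helper like the one below. The macro names (INT_TXDW and so on), the cause_names table and decode_icr() are hypothetical names chosen for this sketch; they are not taken from recv1000.c. The helper runs in user space, so it can be tried without any hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions copied from the slide's list of interrupt event-types.
   The macro names themselves are illustrative assumptions. */
#define INT_TXDW      (UINT32_C(1) << 0)   /* Transmit Descriptor Written Back */
#define INT_TXQE      (UINT32_C(1) << 1)   /* Transmit Queue Empty */
#define INT_LSC       (UINT32_C(1) << 2)   /* Link Status Change */
#define INT_RXDMT0    (UINT32_C(1) << 4)   /* Rx-Desc Min Thresh hit */
#define INT_RXO       (UINT32_C(1) << 6)   /* Receiver Overrun */
#define INT_RXT0      (UINT32_C(1) << 7)   /* Receiver Timer expired */
#define INT_MDAC      (UINT32_C(1) << 9)   /* MDI/O Access Completed */
#define INT_TXD_LOW   (UINT32_C(1) << 15)  /* Tx-Descr Low Thresh hit */
#define INT_SRPD      (UINT32_C(1) << 16)  /* Small Rx-Packet detected */
#define INT_ACK       (UINT32_C(1) << 17)  /* Rx-ACK Frame detected */
#define INT_ASSERTED  (UINT32_C(1) << 31)  /* interrupt asserted */

struct cause_name { uint32_t mask; const char *name; };

/* Ordered from bit 0 upward, so decoded names come out in that order. */
static const struct cause_name cause_names[] = {
    { INT_TXDW, "TXDW" },       { INT_TXQE, "TXQE" },
    { INT_LSC, "LSC" },         { INT_RXDMT0, "RXDMT0" },
    { INT_RXO, "RXO" },         { INT_RXT0, "RXT0" },
    { INT_MDAC, "MDAC" },       { INT_TXD_LOW, "TXD_LOW" },
    { INT_SRPD, "SRPD" },       { INT_ACK, "ACK" },
    { INT_ASSERTED, "INT_ASSERTED" },
};

/* Stores the name of every set cause-bit into out[] (at most max entries)
   and returns how many names were stored. */
size_t decode_icr(uint32_t icr, const char *out[], size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    for (size_t i = 0; i < sizeof cause_names / sizeof cause_names[0]; i++) {
        if ((icr & cause_names[i].mask) != 0 && n < max)
            out[n++] = cause_names[i].name;
    }
    return n;
}
```

A driver that logs its interrupt causes could pass the value it read from ICR to a helper of this kind and print each returned name.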
8
Interrupt Mask Set/Read
  • This register is used to enable a selection of
    the device's interrupts which the driver will be
    prepared to recognize and handle
  • A particular interrupt becomes enabled if
    software writes a 1 to the corresponding bit of
    this Interrupt Mask Set register
  • Writing 0 to any register-bit has no effect, so
    interrupts can be enabled one-at-a-time

9
Interrupt Mask Clear
  • Your driver can discover which interrupts have
    been enabled by reading IMS, but your driver
    cannot disable any interrupts by writing to
    that register
  • Instead a specific interrupt can be disabled by
    writing a 1 to the corresponding bit in the
    Interrupt Mask Clear register
  • Writing 0 to a register-bit has no effect on
    the controller's Interrupt Mask
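
The set/clear behaviour described for IMS and IMC can be modelled in a few lines of user-space C. This is only a sketch of the semantics stated on the slides, not driver code: struct nic_model and the helpers write_ims(), write_imc() and read_ims() are hypothetical names invented here, and the real Interrupt Mask lives inside the controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software stand-in for the controller's internal Interrupt Mask
   (an assumption for illustration only). */
struct nic_model {
    uint32_t mask;
};

/* Writing to IMS: every 1-bit enables that interrupt, 0-bits change nothing. */
void write_ims(struct nic_model *nic, uint32_t value)
{
    assert(nic != NULL);
    nic->mask |= value;
}

/* Writing to IMC: every 1-bit disables that interrupt, 0-bits change nothing. */
void write_imc(struct nic_model *nic, uint32_t value)
{
    assert(nic != NULL);
    nic->mask &= ~value;
}

/* Reading IMS reports which interrupts are currently enabled. */
uint32_t read_ims(const struct nic_model *nic)
{
    assert(nic != NULL);
    return nic->mask;
}
```

Because a 0-bit never changes anything, a driver needs no read-modify-write sequence to enable or disable a single interrupt.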

10
Interrupt Cause Read
  • Whenever interrupts occur, your driver's
    interrupt service routine can discover the
    specific conditions that triggered them if it
    reads the Interrupt Cause Read register
  • In this case your driver can clear any selection
    of these bits (except bit 31) by writing 1s to
    them (writing 0s to this register will have no
    effect)
  • In case no interrupt has occurred, reading this
    register may have the side-effect of clearing it

11
Interrupt Cause Set
  • For testing your driver's interrupt-handler, you
    can artificially trigger any particular
    combination of interrupts by writing 1s into
    the corresponding register-bits of this Interrupt
    Cause Set register (assuming your combination of
    bits corresponds to interrupts that are enabled
    by 1s being present for them in the Interrupt
    Mask)
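
To make the interplay of the cause and mask registers concrete, here is a simplified user-space model, offered as a sketch under stated assumptions rather than a description of the real hardware. It assumes that a write to ICS sets cause bits, that the interrupt is asserted only while some cause bit is also enabled in the mask, and that writing 1s to ICR clears those cause bits. It deliberately does not model the possible clear-on-read side-effect. All names (struct nic_model, write_ics(), read_icr(), write_icr(), model_isr()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INT_ASSERTED (UINT32_C(1) << 31)

/* Software stand-in for the controller's cause and mask state. */
struct nic_model {
    uint32_t mask;   /* bits enabled through IMS */
    uint32_t cause;  /* pending cause bits */
};

/* ICS: writing 1s artificially raises those causes (bit 31 is not settable). */
void write_ics(struct nic_model *nic, uint32_t value)
{
    assert(nic != NULL);
    nic->cause |= value & ~INT_ASSERTED;
}

/* The interrupt is asserted only if a pending cause is also enabled. */
bool irq_line_asserted(const struct nic_model *nic)
{
    assert(nic != NULL);
    return (nic->cause & nic->mask) != 0;
}

/* ICR read: pending causes, plus bit 31 when the interrupt is asserted. */
uint32_t read_icr(const struct nic_model *nic)
{
    uint32_t value = nic->cause;

    if (irq_line_asserted(nic))
        value |= INT_ASSERTED;
    return value;
}

/* ICR write: every 1-bit clears that cause, 0-bits change nothing. */
void write_icr(struct nic_model *nic, uint32_t value)
{
    assert(nic != NULL);
    nic->cause &= ~value;
}

/* Same shape as an interrupt handler: read the causes, bail out if there
   are none, otherwise acknowledge them. Returns true for "handled". */
bool model_isr(struct nic_model *nic, uint32_t *seen)
{
    uint32_t cause = read_icr(nic);

    if (cause == 0)
        return false;
    *seen = cause;
    write_icr(nic, cause);
    return true;
}
```

With bit 7 enabled in the mask, calling write_ics() with bit 7 and then model_isr() exercises the handler path without any packet arriving.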

12
Our interrupt-handler
  • We decided to enable all possible causes (and we
    log them via printk() messages we've omitted
    in the code-fragment here)

irqreturn_t my_isr( int irq, void *dev_id )
{
    int  intr_cause = ioread32( io + E1000_ICR );
    if ( intr_cause == 0 ) return IRQ_NONE;
    wake_up_interruptible( &wq_rd );
    iowrite32( intr_cause, io + E1000_ICR );
    return IRQ_HANDLED;
}
13
We tweak our packet-format
  • Our xmit1000.c driver elected to have the NIC
    append padding to any short packets
  • But this prevents a receiver from knowing how
    many bytes represent actual data
  • To solve this problem, we added our own count
    field to each packet's payload

Offset   Field
0        destination MAC-address
6        source MAC-address
12       Type/Len
14       count
16       actual bytes of user-data
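The layout above can be sketched as a small user-space routine that assembles such a frame. This is an illustration, not the code from xmit1000.c: build_frame() and the OFF_* constants are hypothetical names, and the sketch assumes the count is stored in the host's byte order, since the receiving side reads it back as a plain unsigned short. The Type/Len field is written high byte first, as is usual for Ethernet headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets of the fields in our tweaked packet-format. */
enum {
    OFF_DST   = 0,   /* destination MAC-address (6 bytes) */
    OFF_SRC   = 6,   /* source MAC-address (6 bytes) */
    OFF_TYPE  = 12,  /* Type/Len (2 bytes) */
    OFF_COUNT = 14,  /* our own count field (2 bytes) */
    OFF_DATA  = 16   /* actual bytes of user-data */
};

/* Fills frame[] and returns the number of bytes used,
   or 0 if the frame buffer is too small. */
size_t build_frame(uint8_t *frame, size_t cap,
                   const uint8_t dst[6], const uint8_t src[6],
                   uint16_t type_len, const void *data, uint16_t count)
{
    assert(frame != NULL && dst != NULL && src != NULL);
    assert(data != NULL || count == 0);

    if (cap < (size_t)OFF_DATA + count)
        return 0;

    memcpy(frame + OFF_DST, dst, 6);
    memcpy(frame + OFF_SRC, src, 6);
    frame[OFF_TYPE]     = (uint8_t)(type_len >> 8);    /* high byte first */
    frame[OFF_TYPE + 1] = (uint8_t)(type_len & 0xFF);
    memcpy(frame + OFF_COUNT, &count, sizeof count);   /* host byte order (assumed) */
    if (count > 0)
        memcpy(frame + OFF_DATA, data, count);
    return (size_t)OFF_DATA + count;
}
```

Even if the NIC later pads a short frame, the count at offset 14 still tells the receiver how many of the bytes starting at offset 16 are real data.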
14
Our read() method
ssize_t my_read( struct file *file, char *buf, size_t len, loff_t *pos )
{
    static int     rxhead = 0;  // to remember where we left off
    unsigned char  *from = phys_to_virt( rxdesc[ rxhead ].base_addr );
    unsigned int   count;

    // go to sleep if no new data-packets have been received yet
    if ( ioread32( io + E1000_RDH ) == rxhead )
        if ( wait_event_interruptible( wq_rd,
                ioread32( io + E1000_RDH ) != rxhead ) ) return -EINTR;

    // get the number of actual data-bytes in the new (possibly padded) data-packet
    count = *(unsigned short*)(from + 14);  // data-count as stored by xmit1000.c
    if ( count > len ) count = len;  // can't transfer more bytes than buffer can hold
    if ( copy_to_user( buf, from+16, count ) ) return -EFAULT;

    // advance our static array-index variable to the next receive-descriptor
    rxhead = (1 + rxhead) % 8;  // this index wraps-around after 8 descriptors
    return count;  // tell kernel how many bytes were transferred
}
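The two pieces of arithmetic in this read() method (clamping the count and wrapping the index) can be tried in user space with the sketch below. It is not the driver's code: extract_payload() and next_rxhead() are hypothetical helpers, memcpy() stands in for copy_to_user(), and no sleeping is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_RX_DESC 8   /* number of receive-descriptors in the ring */

/* Copies at most len bytes of user-data out of a received frame and
   returns how many bytes were copied. The count is read from offset 14
   and the data starts at offset 16, as in our packet-format. */
size_t extract_payload(const uint8_t *frame, uint8_t *buf, size_t len)
{
    uint16_t count;
    size_t n;

    assert(frame != NULL && buf != NULL);
    memcpy(&count, frame + 14, sizeof count);
    n = count;
    if (n > len)
        n = len;              /* excess bytes are simply not copied */
    memcpy(buf, frame + 16, n);
    return n;
}

/* Advances a ring index, wrapping back to 0 after the last descriptor. */
unsigned next_rxhead(unsigned rxhead)
{
    assert(rxhead < N_RX_DESC);
    return (rxhead + 1) % N_RX_DESC;
}
```

Note that when len is smaller than the stored count, the remaining bytes of that packet are never delivered, because the index still moves on to the next descriptor.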
15
Hardware's initialization
  • We allocate and initialize a minimum-size Receive
    Descriptor Queue (8 descriptors)
  • We perform a global reset via the RST-bit in
    the NIC's Device Control register (with a
    side-effect of zeroing both RDH and RDT)
  • We configure the receive engine (RCTL) plus a
    few additional registers that affect the
    network-controller's reception-options (namely
    RXCSUM, RFCTL, PSRCTL)

16
Receive Control (0x0100)
Bit(s)   Field    Meaning
31       R        reserved (0)
30-27    FLXBUF   Flexible Buffer size
26       SECRC    Strip Ethernet CRC
25       BSEX     Buffer Size Extension
24       R        reserved (0)
23       PMCF     Pass MAC Control Frames
22       DPF      Discard Pause Frames
21       R        reserved (0)
20       CFI      Canonical Form Indicator bit-value
19       CFIEN    Canonical Form Indicator Enable
18       VFE      VLAN Filter Enable
17-16    BSIZE    Receive Buffer Size
15       BAM      Broadcast Accept Mode
14       R        reserved (0)
13-12    MO       Multicast Offset
11-10    DTYP     Descriptor Type
9-8      RDMTS    Rx-Descriptor Minimum Threshold Size
7-6      LBM      Loopback Mode
5        LPE      Long Packet reception Enable
4        MPE      Multicast Promiscuous Enable
3        UPE      Unicast Promiscuous Enable
2        SBP      Store Bad Packets
1        EN       Receive Enable
0        R        reserved (0)
Our driver initially will program this register
with the value 0x0400801C. Then later, when
everything is ready, it will turn on bit 1 to
start the receive engine
82573L
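As a cross-check on the value 0x0400801C, the sketch below rebuilds it from individual bit-fields. It assumes the bit positions of the Receive Control register layout (EN at bit 1, SBP at bit 2, UPE at bit 3, MPE at bit 4, BAM at bit 15, SECRC at bit 26); the macro names and the helpers rctl_initial() and rctl_start() are hypothetical and do not come from recv1000.c.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions assumed from the Receive Control register layout. */
#define RCTL_EN     (UINT32_C(1) << 1)   /* Receive Enable */
#define RCTL_SBP    (UINT32_C(1) << 2)   /* Store Bad Packets */
#define RCTL_UPE    (UINT32_C(1) << 3)   /* Unicast Promiscuous Enable */
#define RCTL_MPE    (UINT32_C(1) << 4)   /* Multicast Promiscuous Enable */
#define RCTL_BAM    (UINT32_C(1) << 15)  /* Broadcast Accept Mode */
#define RCTL_SECRC  (UINT32_C(1) << 26)  /* Strip Ethernet CRC */

/* The initial setting: every field is chosen, but the engine is still off. */
uint32_t rctl_initial(void)
{
    return RCTL_SECRC | RCTL_BAM | RCTL_MPE | RCTL_UPE | RCTL_SBP;
}

/* Later, when everything is ready, bit 1 is turned on to start receiving. */
uint32_t rctl_start(uint32_t rctl)
{
    assert((rctl & RCTL_EN) == 0);   /* engine expected to be off so far */
    return rctl | RCTL_EN;
}
```

Under these assumptions the initial value accepts broadcast frames, is promiscuous for unicast and multicast, stores bad packets, and strips the Ethernet CRC; all the multi-bit fields (DTYP, BSIZE, and so on) are left at zero.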
17
Packet-Split Rx Control (0x2170)
Bit(s)   Field
31-30    0
29-24    BSIZE3 (in KB)
23-22    0
21-16    BSIZE2 (in KB)
15-14    0
13-8     BSIZE1 (in KB)
7        0
6-0      BSIZE0 (in 1/8 KB)
If the controller is configured to use the
packet-split feature (RCTL.DTYP=1), then this
register controls the sizes of the four
receive-buffers, so there are certain
requirements that nonzero values appear in
several of these fields. But our recv1000.c
driver will use the legacy receive-descriptor
format (i.e., RCTL.DTYP=0), so this register
will be disregarded by the NIC and therefore we
are allowed to program it with the value
0x00000000.
18
Receive Filter Control (0x5008)
Bit(s)   Field
31-16    reserved
15       EXSTEN
14       IPFRSP_DIS
13       ACKD_DIS
12       ACK_DIS
11       IPv6XSUM_DIS
10       IPv6_DIS
9-8      NFS_VER
7        NFSR_DIS
6        NFSW_DIS
5-1      iSCSI_DWC
0        iSCSI_DIS
Our driver writes 0x00000000 to this register,
which among other effects will cause the
Ethernet controller NOT to write Extended Status
information into our device-driver's
legacy-format Receive Descriptors (bit 15:
EXSTEN=0)
19
RX Checksum Control (0x5000)
Bit(s)   Field
31-10    reserved
9        TCP/UDP Checksum Off-load enabled (1=yes, 0=no)
8        IP Checksum Off-load enabled (1=yes, 0=no)
7-0      packet checksum start (this field controls the
         starting byte for the Packet Checksum calculation)
Our driver programs this register with the value
0x00000000, which disables Checksum Off-loading
for TCP/UDP packets (which we won't be
receiving) and for IP packets (which likewise
won't be sent by our xmit1000.c driver); all
Packet-Checksums will be calculated starting
from the very first byte
20
Rx-Descriptor Control (0x2828)
Bit(s)   Field
31-25    0
24       GRAN
23-22    0
21-16    WTHRESH (Writeback Threshold)
15-14    0
13-8     HTHRESH (Host Threshold)
7-6      0
5-0      PTHRESH (Prefetch Threshold)
"This register controls the fetching and write
back of receive descriptors. The three
threshold values are used to determine when
descriptors are read from, and written to, host
memory. Their values can be in units of cache
lines or of descriptors (each descriptor is 16
bytes), based on the value of the GRAN bit
(0=cache lines, 1=descriptors). When GRAN = 1,
all descriptors are written back (even if not
requested)." --Intel manual
Recommended for 82573: 0x01010000 (GRAN=1,
WTHRESH=1)
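The recommended value can be assembled from its fields as a quick sanity check. The sketch below assumes the field positions of the Rx-Descriptor Control layout (PTHRESH in bits 5-0, HTHRESH in bits 13-8, WTHRESH in bits 21-16, GRAN in bit 24), each threshold being six bits wide; rxdctl_value() is a hypothetical helper, not something taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Packs the three thresholds and the granularity bit into one
   register value, using the assumed field positions. */
uint32_t rxdctl_value(unsigned pthresh, unsigned hthresh,
                      unsigned wthresh, unsigned gran)
{
    /* each threshold is a 6-bit field, GRAN is a single bit */
    assert(pthresh < 64 && hthresh < 64 && wthresh < 64 && gran < 2);

    return  (uint32_t)pthresh
         | ((uint32_t)hthresh << 8)
         | ((uint32_t)wthresh << 16)
         | ((uint32_t)gran << 24);
}
```

Calling rxdctl_value( 0, 0, 1, 1 ) yields the recommended setting: descriptor granularity with a writeback threshold of one descriptor.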
21
Maximum-size buffers
  • We use a minimal number of maximum-size
    receive-buffers (eight buffers of 1536 bytes each)

[Diagram: a ring of eight rx-descriptors, each one pointing to one of buffer 0 through buffer 7 in kernel memory]
22
NIC owns our rx-descriptors
[Diagram: the ring of rx-descriptors, descriptor 0 through descriptor 7, plus the position labelled descriptor 8 that RDT points to]
  • RDBAH/RDBAL: the base-address of the ring (descriptor 0)
  • RDLEN: 0x80
  • RDH: This register gets initialized to 0, then gets
    changed by the controller as new packets are received
  • RDT: This register gets initialized to 8, then never
    gets changed
  • rxhead: Our static variable
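The numbers on this slide follow from the ring's geometry: eight 16-byte descriptors occupy 0x80 bytes. The user-space sketch below illustrates that arithmetic and the initial register values. It is an assumption-laden illustration, not the driver's initialization code: only base_addr is a field name used elsewhere in these slides, while the other field names, struct ring_regs and init_rx_ring() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_RX_DESC  8      /* descriptors in the ring */
#define RX_BUFSIZ  1536   /* bytes in each receive-buffer */

/* Legacy-format receive-descriptor (16 bytes). */
struct rx_descriptor {
    uint64_t base_addr;   /* physical address of this descriptor's buffer */
    uint16_t length;      /* filled in by the controller */
    uint16_t checksum;
    uint8_t  status;
    uint8_t  errors;
    uint16_t special;
};

_Static_assert(sizeof(struct rx_descriptor) == 16,
               "a legacy rx-descriptor occupies 16 bytes");

/* Values the driver would program into RDLEN, RDH and RDT. */
struct ring_regs {
    uint32_t rdlen;
    uint32_t rdh;
    uint32_t rdt;
};

/* Points each descriptor at its own buffer inside one contiguous block
   that starts at physical address buffers_phys, and zeroes the rest. */
struct ring_regs init_rx_ring(struct rx_descriptor ring[N_RX_DESC],
                              uint64_t buffers_phys)
{
    assert(ring != NULL);
    for (int i = 0; i < N_RX_DESC; i++) {
        ring[i] = (struct rx_descriptor){
            .base_addr = buffers_phys + (uint64_t)i * RX_BUFSIZ
        };
    }
    return (struct ring_regs){
        .rdlen = (uint32_t)(N_RX_DESC * sizeof(struct rx_descriptor)),
        .rdh   = 0,   /* changed by the controller as packets arrive */
        .rdt   = 8    /* as on the slide: set once, never changed */
    };
}
```

The eight buffers together need 8 x 1536 = 12288 bytes of kernel memory, in addition to the 128 bytes taken by the descriptors themselves.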
23
Driver defects
  • If an application tries to read from our
    device-file /dev/nic, but the controller
    received a packet that contains more bytes of
    data than the user requested, excess bytes get
    lost (i.e., discarded)
  • If an application delays reading packets while
    the controller continues receiving, then an
    earlier packet gets overwritten

24
In-class exercise 1
  • Discuss with your nearest class-member your ideas
    for how these driver defects might be overcome,
    so that packet-data being received will be
    protected against getting lost and/or being
    overwritten

25
In-class exercise 2
  • Log in to a pair of machines on the anchor
    cluster and install our xmit1000.ko and our
    recv1000.ko modules (one on each)
  • Try transferring a textfile from one of the
    machines to the other, by using cat
  • anchor01$ cat textfile > /dev/nic
  • anchor02$ cat /dev/nic > recv1000.out
  • How large a textfile can you successfully
    transfer using our simple driver-modules?