Discussion:
[dpdk-dev] IPv6 Offload Capabilities
Gal Sagie
2015-01-05 07:56:41 UTC
Hello All,

I noticed that in version 1.8, there are no flags to indicate IPv6 checksum
offloading
(only DEV_TX_OFFLOAD_IPV4_CKSUM)
which means TSO offloading is also not supported for IPv6.

Are there any plans/road map to support this?

Thanks
Gal.
Matthew Hall
2015-01-05 08:09:20 UTC
Post by Gal Sagie
I noticed that in version 1.8, there are no flags to indicate IPv6 checksum
offloading
(only DEV_TX_OFFLOAD_IPV4_CKSUM)
which means TSO offloading is also not supported for IPv6.
I need that feature too. Right now I disabled the IP checksum offloading because I was making some greenfield code which does both protocol versions cleanly, so it's not nice or polite to use real strange asymmetric logic in there.

Then I went looking and DPDK doesn't offer an accelerated user-space routine for it. Which seems like it could work out quite poorly for people trying to use ARM and PPC where the offloads might not be present. I had to steal an unaccelerated one from *BSD just to get things running until I could figure out a better way, which worked right for IPv6 and ICMP datagrams so everything can use 100% the same clean code.

I think a bit more thought is needed around the various crypto / checksum / hash features in DPDK in general for the future versions.

1) The hash table and LPM table have real strict limits about what kinds of keys and values can be used. I have much bigger keys than the usual classic packet keys (which I also need to support) and these won't work in the DPDK's tables. It's a real bummer because I could use these for implementing high speed logging and management protocols where I need to access some funky keys and values at a very high perf rate, not just extremely small ones at line-rate perf rate, as they've got now. It'd also be good if they could work on bigger stuff like L4-L7 security indicators (IPs work; domains, URLs, emails, MD5's, SHA256's, etc. don't normally fit in DPDK's extremely locked down tables).

2) The checksum operations are kind of a hodgepodge and don't always have a consistent vision to them... some things like the 16-bit-based IP checksum appear to be missing any routine, including any accelerated one when the offload doesn't work (like for ICMPv4, ICMPv6, and any IPv6 datagrams, or other weird crap like IPv6 pseudo headers, even contemplating those gives me a headache, but at least my greenfield code for it works now).

3) There isn't a real flexible choice of hash functions for the things which use hashes... for example, something which offered bidirectional programming of the Flow Director hash algo by stock / default (as seen in a paper one of the Intel guys posted recently) would be super awesome.

Matthew.
Thomas Monjalon
2015-01-05 08:36:54 UTC
Hi Gal and Matthew,
Post by Matthew Hall
Post by Gal Sagie
I noticed that in version 1.8, there are no flags to indicate IPv6 checksum
offloading
(only DEV_TX_OFFLOAD_IPV4_CKSUM)
which means TSO offloading is also not supported for IPv6.
I need that feature too. Right now I disabled the IP checksum offloading
because I was making some greenfield code which does both protocol versions
cleanly, so it's not nice or polite to use real strange asymmetric logic in
there.
Which checksum are you talking about? IPv6 checksum doesn't exist.
Post by Matthew Hall
Then I went looking and DPDK doesn't offer an accelerated user-space routine
for it. Which seems like it could work out quite poorly for people trying to
use ARM and PPC where the offloads might not be present. I had to steal an
unaccelerated one from *BSD just to get things running until I could figure
out a better way, which worked right for IPv6 and ICMP datagrams so
everything can use 100% the same clean code.
What are you talking about?
Post by Matthew Hall
I think a bit more thought is needed around the various crypto / checksum /
hash features in DPDK in general for the future versions.
1) The hash table and LPM table have real strict limits about what kinds of
keys and values can be used. I have much bigger keys than the usual classic
packet keys (which I also need to support) and these won't work in the
DPDK's tables. It's a real bummer because I could use these for implementing
high speed logging and management protocols where I need to access some
funky keys and values at a very high perf rate, not just extremely small
ones at line-rate perf rate, as they've got now. It'd also be good if they
could work on bigger stuff like L4-L7 security indicators (IPs work,
domains, URLs, emails, MD5's, SHA256's, etc. don't normally fit in DPDK's
extremely locked down tables).
Can we have the same performance with extended tables?
Maybe you just want to implement your own tables.
Post by Matthew Hall
2) The checksum operations are kind of a hodgepodge and don't always have a
consistent vision to them... some things like the 16-bit-based IP checksum
appear to be missing any routine, including any accelerated one when the
offload doesn't work (like for ICMPv4, ICMPv6, and any IPv6 datagrams, or
other weird crap like IPv6 pseudo headers, even contemplating those gives me
a headache, but at least my greenfield code for it works now).
Please detail which function is missing for which usage.
Post by Matthew Hall
3) There isn't a real flexible choice of hash functions for the things which
use hashes... for example, something which offered bidirectional programming
of the Flow Director hash algo by stock / default (as seen in a paper one of
the Intel guys posted recently) would be super awesome.
Again, a reference to the paper would help.
--
Thomas
Matthew Hall
2015-01-06 05:25:37 UTC
Post by Thomas Monjalon
Which checksum are you talking about? IPv6 checksum doesn't exist.
The same computation algorithm must be reused to calculate the IPv6
pseudo-header checksum when generating ICMPv6, UDPv6, and other L4 protocols
whose definitions were retroactively modified to include the IPv6
pseudo-header, and which happen to use the same checksum in L4 that IP used in L3.
Post by Thomas Monjalon
Post by Matthew Hall
Then I went looking and DPDK doesn't offer an accelerated user-space routine
for it. Which seems like it could work out quite poorly for people trying to
use ARM and PPC where the offloads might not be present. I had to steal an
unaccelerated one from *BSD just to get things running until I could figure
out a better way, which worked right for IPv6 and ICMP datagrams so
everything can use 100% the same clean code.
What are you talking about?
Yeah, this is referring to the IP checksum algorithm, "the ones' complement of
the ones' complement sum of some 16-bit words". I didn't find a speedy version
of it for manually hacking together IPv6-based frames anyplace inside DPDK.
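
For readers who want to see the algorithm being described, here is a minimal, unaccelerated sketch of the 16-bit ones' complement checksum, in the style of BSD's in_cksum. The function names (cksum_add, cksum_fold, ip_cksum) are made up for this illustration and are not DPDK API; the result is returned in host order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum, reading them as big-endian 16-bit words.
 * Hypothetical helper for illustration only. */
static uint64_t cksum_add(uint64_t sum, const void *data, size_t len)
{
    const uint8_t *p = data;

    assert(p != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;   /* odd trailing byte, zero padded */
    return sum;
}

/* Fold the carries back into the low 16 bits, then complement. */
static uint16_t cksum_fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Ones' complement of the ones' complement sum of the buffer. */
static uint16_t ip_cksum(const void *data, size_t len)
{
    return cksum_fold(cksum_add(0, data, len));
}
```

Computing ip_cksum over a header whose checksum field is zero gives the value to store in that field; computing it over a header that already carries a correct checksum gives 0.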
Post by Thomas Monjalon
Can we have the same performance with extended tables?
Maybe you just want to implement your own tables.
One thing is for sure. People using DPDK are not going to be Intel
acceleration experts. If we were we wouldn't need to use DPDK. ;)

Therefore any table that comes with DPDK is definitely going to be using
better optimizations than whatever we come up with on our own, not to mention
reinventing the wheel incompatibly is a bad thing, despite that many C
developers like to do so. ;)

I'm a security expert but I'm not an Intel-friendly hash table expert. It
would be totally OK if the table didn't run as fast when bigger stuff was
used, but right now big stuff is just prohibited with a bunch of hard-coded
sizes and this seems like a bad thing.
Post by Thomas Monjalon
Post by Matthew Hall
2) The checksum operations are kind of a hodgepodge and don't always have a
consistent vision to them... some things like the 16-bit-based IP checksum
appear to be missing any routine, including any accelerated one when the
offload doesn't work (like for ICMPv4, ICMPv6, and any IPv6 datagrams, or
other weird crap like IPv6 pseudo headers, even contemplating those gives me
a headache, but at least my greenfield code for it works now).
Please detail which function is missing for which usage.
rte_hash_crc exists, rte_hash_crc_4byte exists, there is no rte_hash_ip_cksum
to use when checksum offloading doesn't work for some reason (in BSD it's
called in_cksum). The jhash and CRC API's don't look to be consistent /
compatible. An expandable API with some enum of hash algorithms and a standard
calling convention for accelerated / special algorithms (like ones which
assume 4-byte input) would make this more generic.
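
To make the proposal concrete, here is a rough sketch of what an enum-selected hash interface with one calling convention might look like. Everything in it (the demo_ prefix, the enum values, the choice of FNV-1a as the second algorithm) is hypothetical and invented for this illustration; none of it exists in DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical algorithm selector; not a DPDK type. */
enum demo_hash_algo {
    DEMO_HASH_FNV1A,
    DEMO_HASH_IP_CKSUM,
    DEMO_HASH_ALGO_MAX
};

/* One calling convention shared by every algorithm. */
typedef uint32_t (*demo_hash_fn)(const void *data, size_t len, uint32_t init);

/* 32-bit FNV-1a; init is XORed into the offset basis (0 gives plain FNV-1a). */
static uint32_t demo_fnv1a(const void *data, size_t len, uint32_t init)
{
    const uint8_t *p = data;
    uint32_t h = 2166136261u ^ init;

    while (len--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* 16-bit ones' complement checksum; init seeds the running sum. */
static uint32_t demo_ip_cksum(const void *data, size_t len, uint32_t init)
{
    const uint8_t *p = data;
    uint64_t sum = init;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Dispatch table indexed by the enum. */
static const demo_hash_fn demo_hash_table[DEMO_HASH_ALGO_MAX] = {
    [DEMO_HASH_FNV1A] = demo_fnv1a,
    [DEMO_HASH_IP_CKSUM] = demo_ip_cksum,
};

static uint32_t demo_hash(enum demo_hash_algo algo, const void *data,
                          size_t len, uint32_t init)
{
    assert((unsigned)algo < DEMO_HASH_ALGO_MAX);
    return demo_hash_table[algo](data, len, init);
}
```

A caller would pick the algorithm at run time, for example demo_hash(DEMO_HASH_IP_CKSUM, hdr, hdr_len, 0), and an accelerated variant could be slotted into the table without touching call sites.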
Post by Thomas Monjalon
Post by Matthew Hall
3) There isn't a real flexible choice of hash functions for the things which
use hashes... for example, something which offered bidirectional programming
of the Flow Director hash algo by stock / default (as seen in a paper one of
the Intel guys posted recently) would be super awesome.
Again, a reference to the paper would help.
http://www.ndsl.kaist.edu/~shinae/papers/TR-symRSS.pdf

Mentioned by jim at netgate.com (Jim Thompson).

To sum up the paper, there is a special way to set up the Flow Director hash,
which barely changes packet evenness from the default setting, which will get
both directions of L4 flows routed into the same CPU cores.
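
As a rough illustration of why such a setup works: as far as I can tell, the linked report is about the Toeplitz hash used for RSS, and its trick is a key made of one 16-bit pattern (0x6d5a) repeated. The sketch below is a plain software Toeplitz hash with that key, not NIC configuration code; the function names and the 12-byte IPv4/TCP tuple layout are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash: XOR together a sliding 32-bit window of the key for every
 * set bit of the input, MSB first. Requires key_len >= in_len + 4. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *in, size_t in_len)
{
    uint32_t hash = 0;
    uint32_t window;
    size_t bit = 0;

    assert(key_len >= in_len + 4);
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | key[3];

    for (size_t i = 0; i < in_len; i++) {
        for (int b = 7; b >= 0; b--, bit++) {
            size_t next = bit + 32;   /* key bit that enters the window */

            if (in[i] & (1u << b))
                hash ^= window;
            window <<= 1;
            if (key[next / 8] & (0x80u >> (next % 8)))
                window |= 1;
        }
    }
    return hash;
}

/* Key with a 16-bit period: swapping 32-bit addresses or 16-bit ports
 * lines every input bit up with an identical key window. */
static void symmetric_key(uint8_t *key, size_t key_len)
{
    for (size_t i = 0; i < key_len; i++)
        key[i] = (i % 2 == 0) ? 0x6d : 0x5a;
}

/* Assumed input layout: src IP, dst IP, src port, dst port, big-endian. */
static void ipv4_tuple(uint8_t out[12], uint32_t src_ip, uint32_t dst_ip,
                       uint16_t src_port, uint16_t dst_port)
{
    out[0] = (uint8_t)(src_ip >> 24);
    out[1] = (uint8_t)(src_ip >> 16);
    out[2] = (uint8_t)(src_ip >> 8);
    out[3] = (uint8_t)src_ip;
    out[4] = (uint8_t)(dst_ip >> 24);
    out[5] = (uint8_t)(dst_ip >> 16);
    out[6] = (uint8_t)(dst_ip >> 8);
    out[7] = (uint8_t)dst_ip;
    out[8] = (uint8_t)(src_port >> 8);
    out[9] = (uint8_t)src_port;
    out[10] = (uint8_t)(dst_port >> 8);
    out[11] = (uint8_t)dst_port;
}
```

Hashing a tuple and its reversed counterpart (addresses and ports swapped) with this key yields the same value, which is what lets both directions of a flow land on the same queue.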

But the larger architectural point was my proposed goal that all of the
various kinds of hashes (flow hashes, checksums / packet hashes, table lookup
hashes, etc.) could use a consistent pluggable API so we could easily move
back and forth between them and write clean consistent code any time a hash is
being used.

Matthew.
Matthew Hall
2015-01-06 05:30:27 UTC
Post by Matthew Hall
The same computation algorithm must be reused to calculate the IPv6
pseudo-header checksum when generating ICMPv6, UDPv6, and other L4 protocols
whose definitions were retroactively modified to include the IPv6
pseudo-header, and which happen to use the same checksum in L4 that IP used in L3.
To clarify, this is the part of the RFC which mentions it:

https://tools.ietf.org/html/rfc2460#section-8.1
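
As a sketch of what that section asks for: the checksum covers a pseudo-header (source address, destination address, 32-bit upper-layer length, three zero bytes, next-header value) followed by the L4 header and payload. The function names below are invented for this example and are not DPDK API; the checksum field inside the L4 data is assumed to be zero when computing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum len bytes as big-endian 16-bit words (hypothetical helper). */
static uint64_t sum16(uint64_t sum, const uint8_t *p, size_t len)
{
    assert(p != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;   /* odd trailing byte, zero padded */
    return sum;
}

/* Checksum over the IPv6 pseudo-header plus the L4 header and payload.
 * Returns the value in host order. For UDP, a computed result of 0 is
 * sent as 0xffff; that substitution is left to the caller. */
static uint16_t ipv6_l4_cksum(const uint8_t src[16], const uint8_t dst[16],
                              uint8_t next_hdr,
                              const uint8_t *l4, uint32_t l4_len)
{
    /* Tail of the pseudo-header: 32-bit length, 3 zero bytes, next header. */
    const uint8_t tail[8] = {
        (uint8_t)(l4_len >> 24), (uint8_t)(l4_len >> 16),
        (uint8_t)(l4_len >> 8), (uint8_t)l4_len,
        0, 0, 0, next_hdr
    };
    uint64_t sum = 0;

    sum = sum16(sum, src, 16);
    sum = sum16(sum, dst, 16);
    sum = sum16(sum, tail, sizeof tail);
    sum = sum16(sum, l4, l4_len);

    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The same routine serves ICMPv6 (next header 58), UDP (17) and TCP (6), since only the next-header byte and the L4 bytes differ.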

Also, somebody else mentioned using TSO (TCP Segmentation Offload).

I did look at it but since it only seemed to work in TCP if I read everything
right, that'd mean I had inconsistent code for IPv4 versus IPv6 stack, and
inconsistent behavior for TCP from that for ICMP and UDP.

I was trying to avoid writing too much of this messy code if possible.

Matthew.
Thomas Monjalon
2015-01-14 11:29:55 UTC
Hi Matthew,
Post by Matthew Hall
Post by Thomas Monjalon
Post by Matthew Hall
2) The checksum operations are kind of a hodgepodge and don't always have a
consistent vision to them... some things like the 16-bit-based IP checksum
appear to be missing any routine, including any accelerated one when the
offload doesn't work (like for ICMPv4, ICMPv6, and any IPv6 datagrams, or
other weird crap like IPv6 pseudo headers, even contemplating those gives me
a headache, but at least my greenfield code for it works now).
Please detail which function is missing for which usage.
rte_hash_crc exists, rte_hash_crc_4byte exists, there is no rte_hash_ip_cksum
to use when checksum offloading doesn't work for some reason (in BSD it's
called in_cksum). The jhash and CRC API's don't look to be consistent /
compatible. An expandable API with some enum of hash algorithms and a standard
calling convention for accelerated / special algorithms (like ones which
assume 4-byte input) would make this more generic.
[...]
Post by Matthew Hall
But the larger architectural point was my proposed goal that all of the
various kinds of hashes (flow hashes, checksums / packet hashes, table lookup
hashes, etc.) could use a consistent pluggable API so we could easily move
back and forth between them and write clean consistent code any time a hash is
being used.
Thank you for your detailed comments.
Are you saying that you want to work on such a hash API for DPDK?
--
Thomas
Olivier MATZ
2015-01-05 08:33:15 UTC
Hello,
Post by Gal Sagie
I noticed that in version 1.8, there are no flags to indicate IPv6 checksum
offloading
(only DEV_TX_OFFLOAD_IPV4_CKSUM)
There is no L3 checksum field in IPv6 header, that's why there is no
DEV_TX_OFFLOAD_IPV6_CKSUM flag.
Post by Gal Sagie
which means TSO offloading is also not supported for IPv6.
TSO is supported for IPv6. Please see the test report sent on the
mailing list:
http://dpdk.org/ml/archives/dev/2014-November/007991.html

Regards,
Olivier