Discussion:
[dpdk-dev] [dpdk-announce] DPDK Features for Q1 2015
O'driscoll, Tim
2014-10-22 13:48:36 UTC
We're starting to plan our DPDK features for next year. We're planning to have a DPDK 2.0 release at the end of March, and we'd like to inform the community of the features that we hope to submit to that release. The current list of features, along with brief descriptions, is included below.

There will naturally be some changes to this list as work on these features progresses. Some will inevitably turn out to be bigger than anticipated and will need to be deferred to a later date, while other priorities will arise and need to be accommodated. So, this list will be subject to change, and should be taken as guidance on what we hope to submit, not a commitment.

Our aim in providing this information now is to solicit input from the community. We'd like to make sure we avoid duplication or conflicts with work that others are planning, so we'd be interested in hearing any plans that others in the community have for contributions to DPDK in this timeframe. This will allow us to build a complete picture and ensure we avoid duplication of effort.

I'm sure people will have questions, and will be looking for more information on these features. Further details will be provided by the individual developers over the next few months. We aim to make better use of the RFC process in this release, and provide early outlines of the features so that we can obtain community feedback as soon as possible.

We also said at the recent DPDK Summit that we would investigate holding regular community conference calls. We'll be scheduling the first of these calls soon, and will use this to discuss the 2.0 (Q1 2015) features in a bit more detail.


2.0 (Q1 2015) DPDK Features:
Bifurcated Driver: With the Bifurcated Driver, the kernel will retain direct control of the NIC, and will assign specific queue pairs to DPDK. Configuration of the NIC is controlled by the kernel via ethtool.

Support the new Intel SoC platform, along with the embedded 10GbE NIC.

Packet Reordering: Assign a sequence number to packets on Rx, and then provide the ability to reorder on Tx to preserve the original order.
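
To make the Rx-tag/Tx-reorder flow concrete, here is a minimal C sketch under stated assumptions: the sequence number is kept in the mbuf's hash.usr field purely for illustration, the window size is arbitrary, and window-overflow and failed-transmit handling are omitted. This is not a committed API.

    #include <rte_mbuf.h>
    #include <rte_ethdev.h>

    #define WIN 1024                          /* reorder window (power of two) */
    static struct rte_mbuf *win[WIN];         /* slot (seq % WIN) holds that packet */
    static uint32_t rx_seq, tx_seq;

    /* Rx side: tag each packet with its arrival sequence number
     * (hash.usr is used only for illustration here). */
    static inline void tag_on_rx(struct rte_mbuf *m)
    {
        m->hash.usr = rx_seq++;
    }

    /* Tx side: park out-of-order packets and transmit from the head of the
     * window whenever the next expected sequence number has arrived. */
    static inline void reorder_and_tx(uint8_t port, uint16_t q, struct rte_mbuf *m)
    {
        win[m->hash.usr & (WIN - 1)] = m;
        while (win[tx_seq & (WIN - 1)] != NULL) {
            struct rte_mbuf *head = win[tx_seq & (WIN - 1)];
            win[tx_seq & (WIN - 1)] = NULL;
            rte_eth_tx_burst(port, q, &head, 1);   /* return value ignored for brevity */
            tx_seq++;
        }
    }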

Packet Distributor (phase 2): Implement the following enhancements to the Packet Distributor that was originally delivered in the DPDK 1.7 release: performance improvements; the ability for packets from a flow to be processed by multiple worker cores in parallel and then reordered on Tx using the Packet Reordering feature; the ability to have multiple Distributors which share Worker cores.
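
For context, here is a rough sketch of the existing (DPDK 1.7) Packet Distributor API that these enhancements would build on; setup, error handling and the worker shutdown path are omitted, and handle_packet() is a placeholder for application processing.

    #include <rte_ethdev.h>
    #include <rte_distributor.h>

    /* Distributor core: pull bursts from the NIC and hand them to workers.
     * d comes from rte_distributor_create("dist", rte_socket_id(), num_workers).
     * The distributor keys on each mbuf's flow tag, so packets of one flow all
     * go to the same worker (the phase 2 work relaxes this). */
    static void run_distributor(struct rte_distributor *d, uint8_t port)
    {
        struct rte_mbuf *bufs[32];
        for (;;) {
            uint16_t n = rte_eth_rx_burst(port, 0, bufs, 32);
            rte_distributor_process(d, bufs, n);
        }
    }

    /* Worker core: return the previous packet and fetch the next one. */
    static void run_worker(struct rte_distributor *d, unsigned worker_id)
    {
        struct rte_mbuf *pkt = NULL;
        for (;;) {
            pkt = rte_distributor_get_pkt(d, worker_id, pkt);
            handle_packet(pkt);               /* application work (placeholder) */
        }
    }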

Support Multiple Threads per Core: Use Linux cgroups to allow multiple threads to run on a single core. This would be useful in situations where a DPDK thread does not require the full resources of a core.

Support the Fedora 21 OS.

Support the host interface for Intel's next generation Ethernet switch. This only covers basic support for the host interface. Support for additional features will be added later.

Cuckoo Hash: A new hash algorithm was implemented as part of the Cuckoo Switch project (see http://www.cs.cmu.edu/~dongz/papers/cuckooswitch.pdf), and shows some promising performance results. This needs to be modified to make it more generic, and then incorporated into DPDK.
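
For readers unfamiliar with the scheme, here is a generic sketch of a cuckoo-hash lookup: every key has exactly two candidate buckets, so a lookup touches at most two buckets regardless of table load. The bucket width and signature scheme below are arbitrary illustrations, not the eventual DPDK API.

    #include <stdint.h>

    #define ENTRIES_PER_BUCKET 4

    struct bucket {
        uint32_t sig[ENTRIES_PER_BUCKET];      /* short signatures of stored keys */
        uint32_t key_idx[ENTRIES_PER_BUCKET];  /* index into a separate key/data array */
    };

    /* Look up 'sig' in its two candidate buckets; returns key index or -1.
     * h1/h2 would come from two independent hash functions of the full key. */
    static int cuckoo_lookup(const struct bucket *tbl, uint32_t mask,
                             uint32_t h1, uint32_t h2, uint32_t sig)
    {
        const struct bucket *b1 = &tbl[h1 & mask];
        const struct bucket *b2 = &tbl[h2 & mask];
        for (int i = 0; i < ENTRIES_PER_BUCKET; i++) {
            if (b1->sig[i] == sig) return (int)b1->key_idx[i];
            if (b2->sig[i] == sig) return (int)b2->key_idx[i];
        }
        return -1;   /* miss; a full implementation would also compare full keys */
    }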

Provide DPDK support for uio_pci_generic.

Integrated Qemu Userspace vHost: Modify Userspace vHost to use Qemu version 2.1, and remove the need for the kernel module (cuse.ko).

PCI Hot Plug: When you migrate a VM, you need hot plug support, because the new VF on the hardware you are running on post-migration needs to be initialized. With an emulated NIC, migration is seamless, as all configuration for the NIC is within the RAM of the VM and the hypervisor. With a VF, you have actual hardware in the picture, which needs to be set up properly.

Additional XL710/X710 40-Gigabit Ethernet Controller Features: Support for additional XL710/X710 40-Gigabit Ethernet Controller features, including bandwidth and QoS management, NVGRE and other network overlay support, TSO, IEEE 1588, DCB support, SR-IOV switching, and port mirroring.

Single Virtio Driver: Merge existing Virtio drivers into a single implementation, incorporating the best features from each of the existing drivers.

X32 ABI: This is an application binary interface project for the Linux kernel that allows programs to take advantage of the benefits of x86-64 (larger number of CPU registers, better floating-point performance, faster position-independent code shared libraries, function parameters passed via registers, faster syscall instruction) while using 32-bit pointers and thus avoiding the overhead of 64-bit pointers.
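
As a quick illustration of the pointer-size difference (assuming a toolchain with x32 support, e.g. building with gcc -mx32 versus gcc -m64):

    #include <stdio.h>

    int main(void)
    {
        /* Prints 4 when built with -mx32 (32-bit pointers, full x86-64
         * register set); prints 8 when built with -m64. */
        printf("sizeof(void *) = %zu\n", sizeof(void *));
        return 0;
    }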

AVX2 ACL: Modify the ACL library to use AVX2 instructions to improve performance.

Interrupt mode for PMD: Allow a DPDK process to transition to interrupt mode when load is low, so that other processes can run or power can be saved. This will increase latency/jitter.
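
A rough sketch of the hybrid receive loop this implies is shown below; wait_for_rx_interrupt() is a hypothetical placeholder (no such API existed in DPDK at the time), process_burst() stands in for application work, and the idle threshold is arbitrary.

    #include <rte_ethdev.h>

    #define IDLE_THRESHOLD 1000               /* empty polls before blocking */

    static void rx_loop(uint8_t port, uint16_t queue)
    {
        struct rte_mbuf *bufs[32];
        unsigned idle = 0;
        for (;;) {
            uint16_t n = rte_eth_rx_burst(port, queue, bufs, 32);
            if (n > 0) {
                idle = 0;
                process_burst(bufs, n);       /* application work (placeholder) */
            } else if (++idle > IDLE_THRESHOLD) {
                /* Load is low: arm an Rx interrupt and block until traffic
                 * arrives, freeing the core or letting it sleep. Waking up
                 * is where the extra latency/jitter comes from. */
                wait_for_rx_interrupt(port, queue);   /* hypothetical */
                idle = 0;
            }
        }
    }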

DPDK Headroom: Provide a mechanism to indicate how much headroom (spare capacity) exists in a DPDK process.
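
One way such an indicator could be derived is sketched below, purely as an illustration: compare cycles spent on useful work against total cycles in the poll loop over a reporting interval (process_burst() is again a placeholder).

    #include <rte_cycles.h>
    #include <rte_ethdev.h>

    static uint64_t busy_cycles, total_cycles;

    static void poll_once(uint8_t port, uint16_t queue)
    {
        struct rte_mbuf *bufs[32];
        uint64_t start = rte_rdtsc();
        uint16_t n = rte_eth_rx_burst(port, queue, bufs, 32);
        if (n > 0)
            process_burst(bufs, n);           /* application work (placeholder) */
        uint64_t spent = rte_rdtsc() - start;
        total_cycles += spent;
        if (n > 0)
            busy_cycles += spent;
        /* Headroom over an interval, as a percentage:
         *   100 * (total_cycles - busy_cycles) / total_cycles */
    }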


Thanks,
Tim
Thomas Monjalon
2014-10-22 14:20:53 UTC
Thanks Tim for sharing your plan.
It's really helpful to improve community collaboration.

I'm sure it's going to generate some interesting discussions.
Please take care to discuss such announcements on the dev list only.
The ***@dpdk.org list is moderated to keep traffic low.

I would like to open a discussion about a really important feature:
Post by O'driscoll, Tim
Bifurcated Driver: With the Bifurcated Driver, the kernel will retain
direct control of the NIC, and will assign specific queue pairs to DPDK.
Configuration of the NIC is controlled by the kernel via ethtool.
This design allows keeping the configuration code in one place: the kernel.
Meanwhile, we are adding a lot of code to configure the NICs in DPDK,
which looks like a duplication of effort.
Why should we have two ways of configuring e.g. flow director?

Since you at Intel will be supporting your code, I am fine with the duplication,
but I feel it's worth discussing why both should be available instead of just one.
--
Thomas
Zhou, Danny
2014-10-22 14:44:06 UTC
Thomas,

In terms of the bifurcated driver, it is actually the same thing. Specifically, the bifurcated
driver PMD in DPDK depends on kernel code changes (af_packet and the 10G/40G NIC drivers). Once the
kernel patches are upstreamed, the corresponding DPDK PMD patches will be
submitted to dpdk.org. John Fastabend and John Ronciak are working very
closely on this to achieve the same goal.

-Danny
Liang, Cunming
2014-10-22 15:05:32 UTC
Post by Thomas Monjalon
This design allows to keep the configuration code in one place: the kernel.
In the meantime, we are trying to add a lot of code to configure the NICs,
which looks to be a duplication of effort.
Why should we have two ways of configuring e.g. flow director?
[Liang, Cunming] The HW sometimes provides capabilities beyond the existing abstraction API.
In that case (the HW capability is a superset of the abstraction wrapper, e.g. flow director), we need to provide another choice.
Ethtool is good, but it can't expose everything the NIC supports.
The bifurcated driver also focuses heavily on reusing the existing rx/tx routines.
We'll send an RFC patch soon if the kernel patches move quickly.
Zhu, Heqing
2014-10-22 15:36:11 UTC
Post by Thomas Monjalon
This design allows keeping the configuration code in one place: the kernel.
Meanwhile, we are adding a lot of code to configure the NICs in DPDK,
which looks like a duplication of effort.
Why should we have two ways of configuring e.g. flow director?
[heqing] There will be multiple choices for the DPDK usage model if/after this feature is available;
the customer can choose DPDK with or without the bifurcated driver.
Luke Gorrie
2014-10-22 16:10:13 UTC
Hi Tim,
Post by O'driscoll, Tim
Bifurcated Driver: With the Bifurcated Driver, the kernel will retain
direct control of the NIC, and will assign specific queue pairs to DPDK.
Configuration of the NIC is controlled by the kernel via ethtool.
That sounds awesome and potentially really useful for other people writing
userspace data planes too. If I understand correctly, this way the messy
details can be contained in one place (kernel) and the application (DPDK
PMD or otherwise) will access the NIC TX/RX queue via the ABI defined in
the hardware data sheet.

Post by O'driscoll, Tim
Single Virtio Driver: Merge existing Virtio drivers into a single
implementation, incorporating the best features from each of the existing
drivers.
Cool. Do you have a strategy in mind already for zero-copy optimisation
with VMDq? I have seen some patches floating around for this and it's an
area of active interest for myself and others. I see a lot of potential for
making this work more effectively with some modest extensions to Virtio and
guest behaviour, and would love to meet kindred spirits who are thinking
along these lines too.
O'driscoll, Tim
2014-10-23 12:29:04 UTC
Post by O'driscoll, Tim
Single Virtio Driver: Merge existing Virtio drivers into a single implementation, incorporating
the best features from each of the existing drivers.
Cool. Do you have a strategy in mind already for zero-copy optimisation with VMDq? I have
seen some patches floating around for this and it's an area of active interest for myself and
others. I see a lot of potential for making this work more effectively with some modest
extensions to Virtio and guest behaviour, and would love to meet kindred spirits who are
thinking along these lines too.
At the moment, we're not planning any additional work in this area. We would be interested in hearing more details on your thoughts for improvements here though, and I'm sure others in the community would be interested too. Have you thought about submitting an RFC to prompt a discussion?
Matthew Hall
2014-10-22 19:22:19 UTC
Post by O'driscoll, Tim
Single Virtio Driver: Merge existing Virtio drivers into a single
implementation, incorporating the best features from each of the existing
drivers.
Tim,

There is a lot of good stuff in there.

Specifically, in the virtio-net case above, I have discovered, and Sergio at
Intel just reproduced today, that neither virtio PMD works at all inside
VirtualBox. One can't init, and the other gets into an infinite loop. Yet the
DPDK Supported NICs page claims VirtualBox support, though it doesn't seem it
could ever have worked.

So I'd like to request an initiative alongside any virtio-net and/or vmxnet3
type of changes, to make some kind of a Virtualization Test Lab, where we
support VMWare ESXi, QEMU, Xen, VBox, and the other popular VM systems.

Otherwise it's hard for us community / app developers to make DPDK
available to end users in simple, elegant ways, such as packaging it into
Vagrant VMs, Amazon AMIs, etc. which are prebaked and ready-to-run.

Note that personally, of course, I prefer using things like the 82599... but that
hardware only comes into the picture after customers have begun to adopt and test
in the virtual environment, decided they like it, and want to scale up
to bigger boxes.

Another thing which would help in this area would be additional improvements
to the NUMA / socket / core / number of NICs / number of queues
auto-detection. To write a single app which can run on a virtual card, a
hardware card without RSS available, and a hardware card with RSS available,
in a thread-safe, flow-safe way, is somewhat complex at the present time.

I'm running into this in VM-based environments because most VNICs don't
have RSS, and that complicates keeping consistent state of the
flows among the cores.

Thanks,
Matthew.
O'driscoll, Tim
2014-10-24 08:10:40 UTC
Post by O'driscoll, Tim
Single Virtio Driver: Merge existing Virtio drivers into a single
implementation, incorporating the best features from each of the
existing drivers.
Specifically, in the virtio-net case above, I have discovered, and Sergio at Intel
just reproduced today, that neither virtio PMD works at all inside of
VirtualBox. One can't init, and the other gets into an infinite loop. But yet it's
claiming support for VBox on the DPDK Supported NICs page though it
doesn't seem it ever could have worked.
At the moment, within Intel we test with KVM, Xen and ESXi. We've never tested with VirtualBox. So, maybe this is an error on the Supported NICs page, or maybe somebody else is testing that configuration.
So I'd like to request an initiative alongside any virtio-net and/or vmxnet3
type of changes, to make some kind of a Virtualization Test Lab, where we
support VMWare ESXi, QEMU, Xen, VBox, and the other popular VM
systems.
Otherwise it's hard for us community / app developers to make the DPDK
available to end users in simple, elegant ways, such as packaging it into
Vagrant VM's, Amazon AMI's etc. which are prebaked and ready-to-run.
Expanding the scope of virtualization testing is a good idea, especially given industry trends like NFV. We're in the process of getting our DPDK Test Suite ready to push to dpdk.org soon. The hope is that others will use it to validate changes they're making to DPDK, and contribute test cases so that we can build up a more comprehensive set over time.

One area where this does need further work is in virtualization. At the moment, our virtualization tests are manual, so they won't be included in the initial DPDK Test Suite release. We will look into automating our current virtualization tests and adding these to the test suite in future.
Another thing which would help in this area would be additional
improvements to the NUMA / socket / core / number of NICs / number of
queues autodetections. To write a single app which can run on a virtual card,
a hardware card without RSS available, and a hardware card with RSS
available, in a thread-safe, flow-safe way, is somewhat complex at the
present time.
I'm running into this in the VM based environments because most VNIC's
don't have RSS and it complicates the process of keeping consistent state of
the flows among the cores.
This is interesting. Do you have more details on what you're thinking here, that perhaps could be used as the basis for an RFC?


Tim
Thomas Monjalon
2014-10-24 10:10:20 UTC
Post by O'driscoll, Tim
Specifically, in the virtio-net case above, I have discovered, and Sergio at Intel
just reproduced today, that neither virtio PMD works at all inside of
VirtualBox. One can't init, and the other gets into an infinite loop. But yet it's
claiming support for VBox on the DPDK Supported NICs page though it
doesn't seem it ever could have worked.
At the moment, within Intel we test with KVM, Xen and ESXi. We've never
tested with VirtualBox. So, maybe this is an error on the Supported NICs
page, or maybe somebody else is testing that configuration.
I'm the author of this page. I think I've written VirtualBox to show where
virtio is implemented. You interpreted this as "supported environment", so
I'm removing it.
Thanks for testing and reporting.
--
Thomas
Matthew Hall
2014-10-24 19:02:21 UTC
Post by Thomas Monjalon
I'm the author of this page. I think I've written VirtualBox to show where
virtio is implemented. You interpreted this as "supported environment", so
I'm removing it. Thanks for testing and reporting.
Of course, I'm very sorry to see VirtualBox go, but happy to have accurate
documentation.

Thanks Thomas.

Matthew.
Matthew Hall
2014-10-24 19:01:26 UTC
Post by O'driscoll, Tim
At the moment, within Intel we test with KVM, Xen and ESXi. We've never
tested with VirtualBox. So, maybe this is an error on the Supported NICs
page, or maybe somebody else is testing that configuration.
So, one of the most popular ways developers test out new code these days is
using Vagrant or Docker. Vagrant by default creates machines using VirtualBox.
VirtualBox runs on nearly everything out there (Linux, Windows, OS X, and
more). Docker uses Linux LXC, so it isn't multiplatform. There is also a system
called CoreOS, which is still under development; it requires bare metal with a
custom Linux on top.

https://www.vagrantup.com/
https://www.docker.com/
https://coreos.com/

As an open source DPDK app developer, who previously used it successfully in
some commercial big-iron projects, I'm now trying to drive
adoption of the technology among security programmers. I'm doing it because I
think DPDK is better than everything else I've seen for packet processing.

So it would help to drive adoption if there were a multiplatform
virtualization environment that worked with the best-performing DPDK drivers.
Then I could make it easy for developers to download, install, and run, so
they'll get excited, learn more about all the great work you've done, and
use it to build more DPDK apps.

I don't care if it's VBox necessarily. But we should support at least one
end-developer-friendly virtualization environment so I can make it easy to
deploy and run an app and get people excited to work with DPDK. A low
barrier to entry is important.
Post by O'driscoll, Tim
One area where this does need further work is in virtualization. At the
moment, our virtualization tests are manual, so they won't be included in
the initial DPDK Test Suite release. We will look into automating our
current virtualization tests and adding these to the test suite in future.
Sounds good. Then we could help you make it work and keep it working on more
platforms.
Post by O'driscoll, Tim
Post by Matthew Hall
Another thing which would help in this area would be additional
improvements to the NUMA / socket / core / number of NICs / number of
queues autodetections. To write a single app which can run on a virtual card,
a hardware card without RSS available, and a hardware card with RSS
available, in a thread-safe, flow-safe way, is somewhat complex at the
present time.
I'm running into this in the VM based environments because most VNIC's
don't have RSS and it complicates the process of keeping consistent state of
the flows among the cores.
This is interesting. Do you have more details on what you're thinking here,
that perhaps could be used as the basis for an RFC?
It's something I am actually still trying to figure out how to deal with,
hence all the virtio-net and PCI bus questions I've been asking
on the list over the last few weeks. It would be good if you had a contact
for virtual DPDK at Intel or 6WIND who could help me figure out the
solution pattern.

I think it might involve making an app or some DPDK helper code which has
something like this algorithm:

At load-time, app autodetects if RSS is available or not, and if NUMA is
present or not.

If RSS is available, and NUMA is not available, enable RSS and create 1 RX
queue for each lcore.

If RSS is available, and NUMA is available, find the NUMA socket of the NIC,
and make 1 RX queue for each connected lcore on that NUMA socket.

If RSS is not available, and NUMA is not available, then configure the
distributor framework. (I never used it so I am not sure if this part is
right). Create 1 Load Balance on master lcore that does RX from all NICs,
and hashes up and distributes packets to every other lcore.

If RSS is not available, and NUMA is available, then configure the distributor
framework. (Again this might not be right). Create 1 Load Balance on the first
lcore on each socket that does RX from all NUMA-connected NICs, and hashes up
and distributes packets to the other NUMA-connected lcores.
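
If it helps, here is a rough C sketch of the detection side of that algorithm using existing ethdev/lcore calls; treating max_rx_queues > 1 as "RSS available" is only a crude stand-in for a proper capability query, and the distributor fallback itself is left out.

    #include <rte_ethdev.h>
    #include <rte_lcore.h>

    static unsigned plan_rx_queues(uint8_t port)
    {
        struct rte_eth_dev_info info;
        rte_eth_dev_info_get(port, &info);

        int has_rss  = info.max_rx_queues > 1;        /* crude stand-in check */
        int nic_node = rte_eth_dev_socket_id(port);   /* -1 if unknown / no NUMA */
        int has_numa = nic_node >= 0;

        unsigned lcore, n_rx_queues = 0;
        RTE_LCORE_FOREACH(lcore) {
            if (has_rss &&
                (!has_numa || rte_lcore_to_socket_id(lcore) == (unsigned)nic_node))
                n_rx_queues++;        /* one RSS Rx queue per (NUMA-local) lcore */
        }
        if (!has_rss)
            n_rx_queues = 1;          /* single queue feeding a distributor stage */
        return n_rx_queues;
    }
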
Thanks,
Matthew.
Tetsuya Mukawa
2014-10-23 03:06:49 UTC
Hi,
Post by O'driscoll, Tim
PCI Hot Plug: When you migrate a VM, you need hot plug support as the new VF on the new hardware you are running on post-migration needs to be initialized. With an emulated NIC migration is seamless as all configuration for the NIC is within the RAM of the VM and the hypervisor. With a VF you have actual hardware in the picture which needs to be set up properly.
I have a patch series for that feature.
The patches add the ability for DPDK apps to attach and detach physical
NIC ports and virtual device ports at runtime.
Also I have patches for testpmd to attach and detach ports dynamically.
Also I have patches for testpmd to attach and detach ports dynamically.

For example, after patching, we can type following commands.
testpmd> port attach p 0000:02:00.0
testpmd> port attach v eth_pcap0,iface=eth0
testpmd> port detach p <detaching port_id>
testpmd> port detach v <detaching port_id>
(These are just RFC.)

Now I am collecting the patches to submit to dpdk.org, so I can send the RFC
patches soon, hopefully next week.

Thanks,
Tetsuya
O'driscoll, Tim
2014-10-23 10:04:50 UTC
Post by O'driscoll, Tim
Post by O'driscoll, Tim
PCI Hot Plug: When you migrate a VM, you need hot plug support as the
new VF on the new hardware you are running on post-migration needs to be
initialized. With an emulated NIC migration is seamless as all configuration for
the NIC is within the RAM of the VM and the hypervisor. With a VF you have
actual hardware in the picture which needs to be set up properly.
I have patch series for that feature.
The patches add feature that DPDK apps can attach and detach physical NIC
ports and virtual device ports at runtime.
Also I have patches for testpmd to attach and detach ports dynamically.
For example, after patching, we can type following commands.
testpmd> port attach p 0000:02:00.0
testpmd> port attach v eth_pcap0,iface=eth0
testpmd> port detach p <detaching port_id>
testpmd> port detach v <detaching port_id>
(These are just RFC.)
Now I am collecting up patches to submit to dpdk.org. So I can send RFC
patches soon. Hopefully next week.
That's great. I also heard privately from another person who is also working on patches for this. If you can submit an RFC, then we should be able to have a discussion on it and avoid duplication of effort.


Tim
Tetsuya Mukawa
2014-10-23 03:17:28 UTC
Hi All,
Post by O'driscoll, Tim
Single Virtio Driver: Merge existing Virtio drivers into a single implementation, incorporating the best features from each of the existing drivers.
It's a nice plan. We should do it.
In my understanding, the following drivers could be merged into a single
virtio PMD since they consist of similar code for handling the virtio ring.

- librte_pmd_virtio
- librte_pmd_xenvirt
- librte_vhost (cuse)

librte_vhost is not a PMD, but we can easily write a librte_pmd_vhost
based on librte_vhost.
Before doing it, we need to consider vhost-user extension for librte_vhost.

BTW, I have a RFC patch for librte_vhost to handle vhost-user.
It may be the first step to merge all virtio drivers.

My patch introduces an abstraction layer to hide the differences between
legacy cuse vhost and vhost-user from DPDK apps.
Also in my patch, the virtio negotiation and initialization code and the virtio
handling code are separated, so legacy cuse vhost and vhost-user can share
the virtio handling code.

Anyway, I will send an RFC patch soon as the first step of merging all the
virtio drivers.

Thanks,
Tetsuya
O'driscoll, Tim
2014-10-23 11:27:41 UTC
Post by O'driscoll, Tim
Post by O'driscoll, Tim
Single Virtio Driver: Merge existing Virtio drivers into a single
implementation, incorporating the best features from each of the existing
drivers.
It's nice plan. We should do it.
In my understanding, the following drivers could be merged into a single
virtio PMD since they consist of similar code for handling the virtio ring.
- librte_pmd_virtio
- librte_pmd_xenvirt
- librte_vhost (cuse)
librte_vhost is not a PMD, but we can easily write a librte_pmd_vhost based
on librte_vhost.
Before doing it, we need to consider vhost-user extension for librte_vhost.
BTW, I have a RFC patch for librte_vhost to handle vhost-user.
It may be the first step to merge all virtio drivers.
My patch introduces an abstraction layer to hide differences between legacy
cuse vhost and vhost-user from DPDK apps.
Also in my patch, virtio negotiation and initialization code and virtio handling
code is separated.
So, legacy cuse vhost and vhost-user can share virtio handling code
Anyway, I will send a RFC patch soon as the first step of merging all virtio
drivers.
That's great Tetsuya. There was some discussion on the mailing list previously on vhost-user in response to an RFC from Huawei Xie (http://dpdk.org/ml/archives/dev/2014-August/004875.html). If you're planning an additional RFC on this, that should help to progress things and to make sure we're not duplicating work.


Tim
Xie, Huawei
2014-10-31 22:05:24 UTC
Hi Tetsuya:
I am implementing vhost-user, and the functionality works now.
During this work, I have refactored the vhost code a bit for better modularization: basically
the virtio part, the control message part (vhost-user, vhost cuse) and the data part. :)
Let us see your patch; if its modularization goes further, I will generate the
vhost-user patch based on yours rather than mine. :)
Tetsuya Mukawa
2014-11-02 12:50:50 UTC
Hi Xie
Post by Xie, Huawei
I am implementing vhost-user, and the functionality works now.
During this work, I have refactored vhost code a bit for better modularization, basically
virtio part, control message part(vhost-user, vhost cuse) and data part. :).
Let us see your patch, if its modularization is further, I will generate the
vhost-user patch based on yours rather than mine, :).
Sure. My patches are based on your old librte_vhost patches,
so I will rebase them and send them in a few days.

Thanks,
Tetsuya

Jay Rolette
2014-10-23 14:18:26 UTC
Tim,

Thanks for sharing this. If nothing else, I wanted to at least provide some
feedback on the parts that look useful to me for my applications/product.
Bits that make me interested in the release:



> 2.0 (Q1 2015) DPDK Features:
> Bifurcated Driver: With the Bifurcated Driver, the kernel will retain
> direct control of the NIC, and will assign specific queue pairs to DPDK.
> Configuration of the NIC is controlled by the kernel via ethtool.

Having NIC configuration, port stats, etc. available via the normal Linux
tools is very helpful - particularly on new products just getting started
with DPDK.


> Packet Reordering: Assign a sequence number to packets on Rx, and then
> provide the ability to reorder on Tx to preserve the original order.

This could be extremely useful but it depends on where it goes. The current
design being discussed seems fundamentally flawed to me. See the thread on
the RFC for details.


> Packet Distributor (phase 2): Implement the following enhancements to
> the Packet Distributor that was originally delivered in the DPDK 1.7
> release: performance improvements; the ability for packets from a flow to
> be processed by multiple worker cores in parallel and then reordered on Tx
> using the Packet Reordering feature; the ability to have multiple
> Distributors which share Worker cores.

TBD on this for me. The 1.0 version of our product is based on DPDK 1.6 and
I haven't had a chance to look at what is happening with Packet Distributor
yet. An area of potential interest at least.


> Cuckoo Hash: A new hash algorithm was implemented as part of the Cuckoo
> Switch project (see http://www.cs.cmu.edu/~dongz/papers/cuckooswitch.pdf),
> and shows some promising performance results. This needs to be modified to
> make it more generic, and then incorporated into DPDK.

More performance == creamy goodness, especially if it is in the plumbing
and doesn't require significant app changes.


> Interrupt mode for PMD: Allow DPDK process to transition to interrupt
> mode when load is low so that other processes can run, or else power can be
> saved. This will increase latency/jitter.

Yes! I don't care about power savings, but I do care about giving a good
product impression in the lab during evals without having to sacrifice
overall system performance when under load. Hybrid drivers that use
interrupts when load is low and poll-mode when loaded are ideal, IMO.

It seems an odd thing, but during lab testing, it is normal for customers
to fire the box up and just start running pings or some other low volume
traffic through the box. If the PMDs are configured to batch in sizes
optimal for best performance under load, the system can look *really* bad
in these initial tests. We go through a fair bit of gymnastics right now to
work around this without just giving up on batching in the PMDs.


> DPDK Headroom: Provide a mechanism to indicate how much headroom (spare
> capacity) exists in a DPDK process.

Very helpful in the field. Anything that helps customers understand how
much headroom is left on their box before they need to take action is a
huge win. CPU utilization is a bad indicator, especially with a PMD
architecture.

Hope this type of feedback is helpful.

Regards,
Jay
Alex Markuze
2014-10-23 14:52:42 UTC
Post by Jay Rolette
> Interrupt mode for PMD: Allow DPDK process to transition to interrupt
> mode when load is low so that other processes can run, or else power can be
> saved. This will increase latency/jitter.
Yes! I don't care about power savings, but I do care about giving a good
product impression in the lab during evals without having to sacrifice
overall system performance when under load. Hybrid drivers that use
interrupts when load is low and poll-mode when loaded are ideal, IMO.
It seems an odd thing, but during lab testing, it is normal for customers
to fire the box up and just start running pings or some other low volume
traffic through the box. If the PMDs are configured to batch in sizes
optimal for best performance under load, the system can look *really* bad
in these initial tests. We go through a fair bit of gymnastics right now to
work around this without just giving up on batching in the PMDs.
I second this. DPDK is great for kernel bypass and zero-copy, but not
all apps are network bound, so interrupt mode is something that would be
extremely helpful.