Discussion:
[dpdk-dev] [RFC PATCH v2] Add VHOST PMD
Tetsuya Mukawa
2015-08-31 03:55:25 UTC
This patch introduces a new PMD, implemented as a thin wrapper around
librte_vhost. For the PMD to work correctly, the patches below are needed.

- [PATCH 1/3] vhost: Fix return value of GET_VRING_BASE message
- [PATCH 2/3] vhost: Fix RESET_OWNER handling not to close callfd
- [PATCH 3/3] vhost: Fix RESET_OWNER handling not to free virtqueue


PATCH v2 changes:
- Fix issues reported by checkpatch.pl
(Thanks to Stephen Hemminger)

Tetsuya Mukawa (1):
vhost: Add VHOST PMD

config/common_linuxapp | 6 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 61 +++
drivers/net/vhost/rte_eth_vhost.c | 640 ++++++++++++++++++++++++++++
drivers/net/vhost/rte_pmd_vhost_version.map | 4 +
mk/rte.app.mk | 8 +-
6 files changed, 722 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
--
2.1.4
Tetsuya Mukawa
2015-08-31 03:55:26 UTC
This patch introduces a new PMD, implemented as a thin wrapper around
librte_vhost; this means librte_vhost is also needed to compile the PMD.
The PMD takes an 'iface' parameter, as shown below, to specify the path of
the socket used to connect to a virtio-net device.

$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i

To connect to the above testpmd instance, here is an example QEMU command:

$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
config/common_linuxapp | 6 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 61 +++
drivers/net/vhost/rte_eth_vhost.c | 640 ++++++++++++++++++++++++++++
drivers/net/vhost/rte_pmd_vhost_version.map | 4 +
mk/rte.app.mk | 8 +-
6 files changed, 722 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..7310240 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -446,6 +446,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

#
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
#Compile Xen domain0 support
#
CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5ebf963..e46a38e 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -49,5 +49,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
include $(RTE_SDK)/mk/rte.sharelib.mk
include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..018edde
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,61 @@
+# BSD LICENSE
+#
+# Copyright (c) 2010-2015 Intel Corporation.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include +=
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..679e893
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,640 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2010-2015 Intel Corporation.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#define ETH_VHOST_IFACE_ARG "iface"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+ ETH_VHOST_IFACE_ARG,
+ NULL
+};
+
+static struct ether_addr base_eth_addr = {
+ .addr_bytes = {
+ 0x56 /* V */,
+ 0x48 /* H */,
+ 0x4F /* O */,
+ 0x53 /* S */,
+ 0x54 /* T */,
+ 0x00
+ }
+};
+
+struct vhost_queue {
+ struct virtio_net *device;
+ struct pmd_internal *internal;
+ struct rte_mempool *mb_pool;
+ rte_atomic64_t rx_pkts;
+ rte_atomic64_t tx_pkts;
+ rte_atomic64_t err_pkts;
+ rte_atomic16_t rx_executing;
+ rte_atomic16_t tx_executing;
+};
+
+struct pmd_internal {
+ TAILQ_ENTRY(pmd_internal) next;
+ char *dev_name;
+ char *iface_name;
+ unsigned nb_rx_queues;
+ unsigned nb_tx_queues;
+ rte_atomic16_t xfer;
+
+ struct vhost_queue rx_vhost_queues[RTE_PMD_RING_MAX_RX_RINGS];
+ struct vhost_queue tx_vhost_queues[RTE_PMD_RING_MAX_TX_RINGS];
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+ TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct rte_eth_link pmd_link = {
+ .link_speed = 10000,
+ .link_duplex = ETH_LINK_FULL_DUPLEX,
+ .link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(r->internal == NULL))
+ return 0;
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ return 0;
+
+ rte_atomic16_set(&r->rx_executing, 1);
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ goto out;
+
+ nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+ VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+ rte_atomic64_add(&(r->rx_pkts), nb_rx);
+
+out:
+ rte_atomic16_set(&r->rx_executing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(r->internal == NULL))
+ return 0;
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ return 0;
+
+ rte_atomic16_set(&r->tx_executing, 1);
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ goto out;
+
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ VIRTIO_RXQ, bufs, nb_bufs);
+
+ rte_atomic64_add(&(r->tx_pkts), nb_tx);
+ rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
+
+out:
+ rte_atomic16_set(&r->tx_executing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ return rte_vhost_driver_register(internal->iface_name);
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ rte_vhost_driver_unregister(internal->iface_name);
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ internal->rx_vhost_queues[rx_queue_id].mb_pool = mb_pool;
+ dev->data->rx_queues[rx_queue_id] = &internal->rx_vhost_queues[rx_queue_id];
+ return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev->data->tx_queues[tx_queue_id] = &internal->tx_vhost_queues[tx_queue_id];
+ return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev_info->driver_name = drivername;
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)-1;
+ dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+ dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+ dev_info->min_rx_bufsize = 0;
+ dev_info->pci_dev = dev->pci_dev;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+ unsigned i;
+ unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+ const struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_rx_queues; i++) {
+ igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i].rx_pkts.cnt;
+ rx_total += igb_stats->q_ipackets[i];
+ }
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_tx_queues; i++) {
+ igb_stats->q_opackets[i] = internal->tx_vhost_queues[i].tx_pkts.cnt;
+ igb_stats->q_errors[i] = internal->tx_vhost_queues[i].err_pkts.cnt;
+ tx_total += igb_stats->q_opackets[i];
+ tx_err_total += igb_stats->q_errors[i];
+ }
+
+ igb_stats->ipackets = rx_total;
+ igb_stats->opackets = tx_total;
+ igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+ unsigned i;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++)
+ internal->rx_vhost_queues[i].rx_pkts.cnt = 0;
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ internal->tx_vhost_queues[i].tx_pkts.cnt = 0;
+ internal->tx_vhost_queues[i].err_pkts.cnt = 0;
+ }
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+ .dev_start = eth_dev_start,
+ .dev_stop = eth_dev_stop,
+ .dev_configure = eth_dev_configure,
+ .dev_infos_get = eth_dev_info,
+ .rx_queue_setup = eth_rx_queue_setup,
+ .tx_queue_setup = eth_tx_queue_setup,
+ .rx_queue_release = eth_queue_release,
+ .tx_queue_release = eth_queue_release,
+ .link_update = eth_link_update,
+ .stats_get = eth_stats_get,
+ .stats_reset = eth_stats_reset,
+};
+
+static struct eth_driver rte_vhost_pmd = {
+ .pci_drv = {
+ .name = "rte_vhost_pmd",
+ .drv_flags = RTE_PCI_DRV_DETACHABLE,
+ },
+};
+
+static struct rte_pci_id id_table;
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+ int found = 0;
+ struct pmd_internal *internal;
+
+ if (ifname == NULL)
+ return NULL;
+
+ pthread_mutex_lock(&internal_list_lock);
+
+ TAILQ_FOREACH(internal, &internals_list, next) {
+ if (!strcmp(internal->iface_name, ifname)) {
+ found = 1;
+ break;
+ }
+ }
+
+ pthread_mutex_unlock(&internal_list_lock);
+
+ if (!found)
+ return NULL;
+
+ return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "invalid argument\n");
+ return -1;
+ }
+
+ internal = find_internal_resource(dev->ifname);
+ if (internal == NULL) {
+ RTE_LOG(INFO, PMD, "invalid device name\n");
+ return -1;
+ }
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failure to find ethdev\n");
+ return -1;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->priv = eth_dev;
+
+ eth_dev->data->dev_link.link_status = 1;
+ rte_atomic16_set(&internal->xfer, 1);
+
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "invalid argument\n");
+ return;
+ }
+
+ eth_dev = (struct rte_eth_dev *)dev->priv;
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failure to find an ethdev\n");
+ return;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ /* Wait until rx/tx_pkt_burst stops accessing the vhost device */
+ rte_atomic16_set(&internal->xfer, 0);
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ while (rte_atomic16_read(&vq->rx_executing))
+ rte_pause();
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ while (rte_atomic16_read(&vq->tx_executing))
+ rte_pause();
+ }
+
+ eth_dev->data->dev_link.link_status = 0;
+
+ dev->priv = NULL;
+ dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ vq->device = NULL;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ vq->device = NULL;
+ }
+
+ RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+ static struct virtio_net_device_ops *vhost_ops;
+
+ vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+ if (vhost_ops == NULL)
+ rte_panic("Can't allocate memory\n");
+
+ /* set vhost arguments */
+ vhost_ops->new_device = new_device;
+ vhost_ops->destroy_device = destroy_device;
+ if (rte_vhost_driver_callback_register(vhost_ops) < 0)
+ rte_panic("Can't register callbacks\n");
+
+ /* start event handling */
+ rte_vhost_driver_session_start();
+
+ rte_free(vhost_ops);
+ pthread_exit(0);
+}
+
+static pthread_once_t once_cont = PTHREAD_ONCE_INIT;
+static pthread_t session_th;
+
+static void vhost_driver_session_start(void)
+{
+ int ret;
+
+ ret = pthread_create(&session_th, NULL, vhost_driver_session, NULL);
+ if (ret)
+ rte_panic("Can't create a thread\n");
+}
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+ char *iface_name,
+ const unsigned numa_node)
+{
+ struct rte_eth_dev_data *data = NULL;
+ struct rte_pci_device *pci_dev = NULL;
+ struct pmd_internal *internal = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ struct ether_addr *eth_addr = NULL;
+ uint16_t nb_rx_queues = 1;
+ uint16_t nb_tx_queues = 1;
+
+ RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+ numa_node);
+
+ /* now do all data allocation - for eth_dev structure, dummy pci driver
+ * and internal (private) data
+ */
+ data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+ if (data == NULL)
+ goto error;
+
+ pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
+ if (pci_dev == NULL)
+ goto error;
+
+ internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+ if (internal == NULL)
+ goto error;
+
+ eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+ if (eth_addr == NULL)
+ goto error;
+ *eth_addr = base_eth_addr;
+ eth_addr->addr_bytes[5] = index;
+
+ /* reserve an ethdev entry */
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ goto error;
+
+ /* now put it all together
+ * - store queue data in internal,
+ * - store numa_node info in pci_driver
+ * - point eth_dev_data to internal and pci_driver
+ * - and point eth_dev structure to new eth_dev_data structure
+ */
+ internal->nb_rx_queues = nb_rx_queues;
+ internal->nb_tx_queues = nb_tx_queues;
+ internal->dev_name = strdup(name);
+ if (internal->dev_name == NULL)
+ goto error;
+ internal->iface_name = strdup(iface_name);
+ if (internal->iface_name == NULL)
+ goto error;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_INSERT_TAIL(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ rte_vhost_pmd.pci_drv.name = drivername;
+ rte_vhost_pmd.pci_drv.id_table = &id_table;
+
+ pci_dev->numa_node = numa_node;
+ pci_dev->driver = &rte_vhost_pmd.pci_drv;
+
+ data->dev_private = internal;
+ data->port_id = eth_dev->data->port_id;
+ memmove(data->name, eth_dev->data->name, sizeof(data->name));
+ data->nb_rx_queues = (uint16_t)nb_rx_queues;
+ data->nb_tx_queues = (uint16_t)nb_tx_queues;
+ data->dev_link = pmd_link;
+ data->mac_addrs = eth_addr;
+
+ /* We'll replace the 'data' originally allocated by eth_dev, so the
+ * vhost PMD resources won't be shared between multiple processes.
+ */
+ eth_dev->data = data;
+ eth_dev->driver = &rte_vhost_pmd;
+ eth_dev->dev_ops = &ops;
+ eth_dev->pci_dev = pci_dev;
+
+ /* finally assign rx and tx ops */
+ eth_dev->rx_pkt_burst = eth_vhost_rx;
+ eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+ /* start vhost driver session. It should be called only once */
+ pthread_once(&once_cont, vhost_driver_session_start);
+
+ return data->port_id;
+
+error:
+ rte_free(data);
+ rte_free(pci_dev);
+ rte_free(internal);
+ rte_free(eth_addr);
+
+ return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **iface_name = extra_args;
+
+ if (value == NULL)
+ return -1;
+
+ *iface_name = value;
+
+ return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ int ret = 0;
+ int index;
+ char *iface_name;
+
+ RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+ kvlist = rte_kvargs_parse(params, valid_arguments);
+ if (kvlist == NULL)
+ return -1;
+
+ if (strlen(name) < strlen("eth_vhost"))
+ return -1;
+
+ index = strtol(name + strlen("eth_vhost"), NULL, 0);
+ if (errno == ERANGE)
+ return -1;
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+ &open_iface, &iface_name);
+ if (ret < 0)
+ goto out_free;
+
+ eth_dev_vhost_create(name, index, iface_name, rte_socket_id());
+ }
+
+out_free:
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+ struct rte_eth_dev *eth_dev = NULL;
+ struct pmd_internal *internal;
+
+ RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+ if (name == NULL)
+ return -EINVAL;
+
+ /* find an ethdev entry */
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev == NULL)
+ return -ENODEV;
+
+ internal = eth_dev->data->dev_private;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_REMOVE(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ eth_dev_stop(eth_dev);
+
+ if ((internal) && (internal->dev_name))
+ free(internal->dev_name);
+ if ((internal) && (internal->iface_name))
+ free(internal->iface_name);
+ rte_free(eth_dev->data->dev_private);
+ rte_free(eth_dev->data);
+ rte_free(eth_dev->pci_dev);
+
+ rte_eth_dev_release_port(eth_dev);
+ return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..5151684
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,4 @@
+DPDK_2.2 {
+
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3871205..1c42fb1 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -144,7 +144,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null

-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost
+
+endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)

endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
--
2.1.4
Loftus, Ciara
2015-09-23 17:47:21 UTC
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The PMD can have 'iface' parameter like below to specify a path to connect
to a virtio-net device.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
To connect above testpmd, here is qemu command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0
<snip>
+struct pmd_internal {
+ TAILQ_ENTRY(pmd_internal) next;
+ char *dev_name;
+ char *iface_name;
+ unsigned nb_rx_queues;
+ unsigned nb_tx_queues;
+ rte_atomic16_t xfer;
Is this flag just used to indicate the state of the virtio_net device?
I.e., if =0 then virtio_dev=NULL, and if =1 then virtio_net !=NULL & the VIRTIO_DEV_RUNNING flag is set?
Post by Tetsuya Mukawa
+
+ struct vhost_queue rx_vhost_queues[RTE_PMD_RING_MAX_RX_RINGS];
+ struct vhost_queue tx_vhost_queues[RTE_PMD_RING_MAX_TX_RINGS];
+};
<snip>
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(r->internal == NULL))
+ return 0;
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ return 0;
+
+ rte_atomic16_set(&r->tx_executing, 1);
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ goto out;
+
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ VIRTIO_RXQ, bufs, nb_bufs);
+
+ rte_atomic64_add(&(r->tx_pkts), nb_tx);
+ rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
We may not always want to free these mbufs. For example, if a call is made to rte_eth_tx_burst with buffers from another (non DPDK) source, they may not be ours to free.
Post by Tetsuya Mukawa
+
+out:
+ rte_atomic16_set(&r->tx_executing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ return rte_vhost_driver_register(internal->iface_name);
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ rte_vhost_driver_unregister(internal->iface_name);
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ internal->rx_vhost_queues[rx_queue_id].mb_pool = mb_pool;
+ dev->data->rx_queues[rx_queue_id] = &internal->rx_vhost_queues[rx_queue_id];
+ return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev->data->tx_queues[tx_queue_id] = &internal->tx_vhost_queues[tx_queue_id];
+ return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev_info->driver_name = drivername;
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)-1;
+ dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+ dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+ dev_info->min_rx_bufsize = 0;
+ dev_info->pci_dev = dev->pci_dev;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+ unsigned i;
+ unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+ const struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_rx_queues; i++) {
+ igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i].rx_pkts.cnt;
+ rx_total += igb_stats->q_ipackets[i];
+ }
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_tx_queues; i++) {
+ igb_stats->q_opackets[i] = internal->tx_vhost_queues[i].tx_pkts.cnt;
+ igb_stats->q_errors[i] = internal->tx_vhost_queues[i].err_pkts.cnt;
+ tx_total += igb_stats->q_opackets[i];
+ tx_err_total += igb_stats->q_errors[i];
+ }
+
+ igb_stats->ipackets = rx_total;
+ igb_stats->opackets = tx_total;
+ igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+ unsigned i;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++)
+ internal->rx_vhost_queues[i].rx_pkts.cnt = 0;
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ internal->tx_vhost_queues[i].tx_pkts.cnt = 0;
+ internal->tx_vhost_queues[i].err_pkts.cnt = 0;
+ }
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+ .dev_start = eth_dev_start,
+ .dev_stop = eth_dev_stop,
+ .dev_configure = eth_dev_configure,
+ .dev_infos_get = eth_dev_info,
+ .rx_queue_setup = eth_rx_queue_setup,
+ .tx_queue_setup = eth_tx_queue_setup,
+ .rx_queue_release = eth_queue_release,
+ .tx_queue_release = eth_queue_release,
+ .link_update = eth_link_update,
+ .stats_get = eth_stats_get,
+ .stats_reset = eth_stats_reset,
+};
+
+static struct eth_driver rte_vhost_pmd = {
+ .pci_drv = {
+ .name = "rte_vhost_pmd",
+ .drv_flags = RTE_PCI_DRV_DETACHABLE,
+ },
+};
+
+static struct rte_pci_id id_table;
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+ int found = 0;
+ struct pmd_internal *internal;
+
+ if (ifname == NULL)
+ return NULL;
+
+ pthread_mutex_lock(&internal_list_lock);
+
+ TAILQ_FOREACH(internal, &internals_list, next) {
+ if (!strcmp(internal->iface_name, ifname)) {
+ found = 1;
+ break;
+ }
+ }
+
+ pthread_mutex_unlock(&internal_list_lock);
+
+ if (!found)
+ return NULL;
+
+ return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "invalid argument\n");
+ return -1;
+ }
+
+ internal = find_internal_resource(dev->ifname);
+ if (internal == NULL) {
+ RTE_LOG(INFO, PMD, "invalid device name\n");
+ return -1;
+ }
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
Typo: Failure. Same for the destroy_device function
Post by Tetsuya Mukawa
+ return -1;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->priv = eth_dev;
+
+ eth_dev->data->dev_link.link_status = 1;
+ rte_atomic16_set(&internal->xfer, 1);
+
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
Some freedom is taken away if the new_device and destroy_device callbacks are implemented in the driver.
For example if one wishes to call the rte_vhost_enable_guest_notification function when a new device is brought up. They cannot now as there is no scope to modify these callbacks, as is done in for example the vHost sample app. Is this correct?
Post by Tetsuya Mukawa
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "invalid argument\n");
+ return;
+ }
+
+ eth_dev = (struct rte_eth_dev *)dev->priv;
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failuer to find a ethdev\n");
+ return;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ /* Wait until rx/tx_pkt_burst stops accessing vhost device */
+ rte_atomic16_set(&internal->xfer, 0);
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ while (rte_atomic16_read(&vq->rx_executing))
+ rte_pause();
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ while (rte_atomic16_read(&vq->tx_executing))
+ rte_pause();
+ }
+
+ eth_dev->data->dev_link.link_status = 0;
+
+ dev->priv = NULL;
+ dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ vq->device = NULL;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ vq->device = NULL;
+ }
+
+ RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+ static struct virtio_net_device_ops *vhost_ops;
+
+ vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+ if (vhost_ops == NULL)
+ rte_panic("Can't allocate memory\n");
+
+ /* set vhost arguments */
+ vhost_ops->new_device = new_device;
+ vhost_ops->destroy_device = destroy_device;
+ if (rte_vhost_driver_callback_register(vhost_ops) < 0)
+ rte_panic("Can't register callbacks\n");
+
+ /* start event handling */
+ rte_vhost_driver_session_start();
+
+ rte_free(vhost_ops);
+ pthread_exit(0);
+}
+
+static pthread_once_t once_cont = PTHREAD_ONCE_INIT;
+static pthread_t session_th;
+
+static void vhost_driver_session_start(void)
+{
+ int ret;
+
+ ret = pthread_create(&session_th, NULL, vhost_driver_session, NULL);
+ if (ret)
+ rte_panic("Can't create a thread\n");
+}
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+ char *iface_name,
+ const unsigned numa_node)
+{
+ struct rte_eth_dev_data *data = NULL;
+ struct rte_pci_device *pci_dev = NULL;
+ struct pmd_internal *internal = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ struct ether_addr *eth_addr = NULL;
+ uint16_t nb_rx_queues = 1;
+ uint16_t nb_tx_queues = 1;
+
+ RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+ numa_node);
+
+ /* now do all data allocation - for eth_dev structure, dummy pci driver
+ * and internal (private) data
+ */
+ data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+ if (data == NULL)
+ goto error;
+
+ pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
+ if (pci_dev == NULL)
+ goto error;
+
+ internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+ if (internal == NULL)
+ goto error;
+
+ eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+ if (eth_addr == NULL)
+ goto error;
+ *eth_addr = base_eth_addr;
+ eth_addr->addr_bytes[5] = index;
+
+ /* reserve an ethdev entry */
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ goto error;
+
+ /* now put it all together
+ * - store queue data in internal,
+ * - store numa_node info in pci_driver
+ * - point eth_dev_data to internal and pci_driver
+ * - and point eth_dev structure to new eth_dev_data structure
+ */
+ internal->nb_rx_queues = nb_rx_queues;
+ internal->nb_tx_queues = nb_tx_queues;
+ internal->dev_name = strdup(name);
+ if (internal->dev_name == NULL)
+ goto error;
+ internal->iface_name = strdup(iface_name);
+ if (internal->iface_name == NULL)
+ goto error;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_INSERT_TAIL(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ rte_vhost_pmd.pci_drv.name = drivername;
+ rte_vhost_pmd.pci_drv.id_table = &id_table;
+
+ pci_dev->numa_node = numa_node;
+ pci_dev->driver = &rte_vhost_pmd.pci_drv;
+
+ data->dev_private = internal;
+ data->port_id = eth_dev->data->port_id;
+ memmove(data->name, eth_dev->data->name, sizeof(data->name));
+ data->nb_rx_queues = (uint16_t)nb_rx_queues;
+ data->nb_tx_queues = (uint16_t)nb_tx_queues;
+ data->dev_link = pmd_link;
+ data->mac_addrs = eth_addr;
+
+ /* We'll replace the 'data' originally allocated by eth_dev. So the
+ * vhost PMD resources won't be shared between multi processes.
+ */
+ eth_dev->data = data;
+ eth_dev->driver = &rte_vhost_pmd;
+ eth_dev->dev_ops = &ops;
+ eth_dev->pci_dev = pci_dev;
+
+ /* finally assign rx and tx ops */
+ eth_dev->rx_pkt_burst = eth_vhost_rx;
+ eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+ /* start vhost driver session. It should be called only once */
+ pthread_once(&once_cont, vhost_driver_session_start);
+
+ return data->port_id;
+
+error:
+ rte_free(data);
+ rte_free(pci_dev);
+ rte_free(internal);
+ rte_free(eth_addr);
+
+ return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **iface_name = extra_args;
+
+ if (value == NULL)
+ return -1;
+
+ *iface_name = value;
+
+ return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ int ret = 0;
+ int index;
+ char *iface_name;
+
+ RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+ kvlist = rte_kvargs_parse(params, valid_arguments);
+ if (kvlist == NULL)
+ return -1;
+
+ if (strlen(name) < strlen("eth_vhost"))
+ return -1;
+
+ index = strtol(name + strlen("eth_vhost"), NULL, 0);
+ if (errno == ERANGE)
+ return -1;
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+ &open_iface, &iface_name);
+ if (ret < 0)
+ goto out_free;
+
+ eth_dev_vhost_create(name, index, iface_name, rte_socket_id());
+ }
+
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+ struct rte_eth_dev *eth_dev = NULL;
+ struct pmd_internal *internal;
+
+ RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+ if (name == NULL)
+ return -EINVAL;
+
+ /* find an ethdev entry */
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev == NULL)
+ return -ENODEV;
+
+ internal = eth_dev->data->dev_private;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_REMOVE(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ eth_dev_stop(eth_dev);
+
+ if ((internal) && (internal->dev_name))
+ free(internal->dev_name);
+ if ((internal) && (internal->iface_name))
+ free(internal->iface_name);
+ rte_free(eth_dev->data->dev_private);
+ rte_free(eth_dev->data);
+ rte_free(eth_dev->pci_dev);
+
+ rte_eth_dev_release_port(eth_dev);
+ return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..5151684
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,4 @@
+DPDK_2.2 {
+
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3871205..1c42fb1 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -144,7 +144,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null
-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost
+
+endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
--
2.1.4
Tetsuya Mukawa
2015-10-16 08:40:36 UTC
Post by Loftus, Ciara
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The PMD can have 'iface' parameter like below to specify a path to connect
to a virtio-net device.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
To connect above testpmd, here is qemu command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0
---
config/common_linuxapp | 6 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 61 +++
drivers/net/vhost/rte_eth_vhost.c | 640
++++++++++++++++++++++++++++
drivers/net/vhost/rte_pmd_vhost_version.map | 4 +
mk/rte.app.mk | 8 +-
6 files changed, 722 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
+struct pmd_internal {
+ TAILQ_ENTRY(pmd_internal) next;
+ char *dev_name;
+ char *iface_name;
+ unsigned nb_rx_queues;
+ unsigned nb_tx_queues;
+ rte_atomic16_t xfer;
Is this flag just used to indicate the state of the virtio_net device?
Ie. if =0 then virtio_dev=NULL and if =1 then virtio_net !=NULL & the VIRTIO_DEV_RUNNING flag is set?
Hi Clara,

I am sorry for the very late reply.

Yes, it is. Probably we can optimize it more.
I will change this implementation a bit in next patches.
Could you please check it?
Post by Loftus, Ciara
Post by Tetsuya Mukawa
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(r->internal == NULL))
+ return 0;
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ return 0;
+
+ rte_atomic16_set(&r->tx_executing, 1);
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ goto out;
+
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ VIRTIO_RXQ, bufs, nb_bufs);
+
+ rte_atomic64_add(&(r->tx_pkts), nb_tx);
+ rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
We may not always want to free these mbufs. For example, if a call is made to rte_eth_tx_burst with buffers from another (non DPDK) source, they may not be ours to free.
Sorry, I am not sure what type of buffer you want to transfer.

This is a PMD that wraps librte_vhost.
And I guess other PMDs cannot handle buffers from another non DPDK source.
Should we take care such buffers?

I have also checked af_packet PMD.
It seems the tx function of af_packet PMD just frees mbuf.
Post by Loftus, Ciara
Post by Tetsuya Mukawa
+
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
Typo: Failure. Same for the destroy_device function
Thanks, I will fix it in next patches.
Post by Loftus, Ciara
Post by Tetsuya Mukawa
+ return -1;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->priv = eth_dev;
+
+ eth_dev->data->dev_link.link_status = 1;
+ rte_atomic16_set(&internal->xfer, 1);
+
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
Some freedom is taken away if the new_device and destroy_device callbacks are implemented in the driver.
For example if one wishes to call the rte_vhost_enable_guest_notification function when a new device is brought up. They cannot now as there is no scope to modify these callbacks, as is done in for example the vHost sample app. Is this correct?
So how about adding one more parameter to be able to choose guest
notification behavior?

ex)
./testpmd --vdev 'eth_vhost0,iface=/tmp/sock0,guest_notification=0'

In above case, all queues in this device will have VRING_USED_F_NO_NOTIFY.

Thanks,
Tetsuya
Loftus, Ciara
2015-10-20 14:13:08 UTC
Post by Tetsuya Mukawa
Post by Loftus, Ciara
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin
wrapper
Post by Loftus, Ciara
Post by Tetsuya Mukawa
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The PMD can have 'iface' parameter like below to specify a path to
connect
Post by Loftus, Ciara
Post by Tetsuya Mukawa
to a virtio-net device.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
To connect above testpmd, here is qemu command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0
---
config/common_linuxapp | 6 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 61 +++
drivers/net/vhost/rte_eth_vhost.c | 640
++++++++++++++++++++++++++++
drivers/net/vhost/rte_pmd_vhost_version.map | 4 +
mk/rte.app.mk | 8 +-
6 files changed, 722 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
+struct pmd_internal {
+ TAILQ_ENTRY(pmd_internal) next;
+ char *dev_name;
+ char *iface_name;
+ unsigned nb_rx_queues;
+ unsigned nb_tx_queues;
+ rte_atomic16_t xfer;
Is this flag just used to indicate the state of the virtio_net device?
Ie. if =0 then virtio_dev=NULL and if =1 then virtio_net !=NULL & the
VIRTIO_DEV_RUNNING flag is set?
Hi Clara,
I am sorry for very late reply.
Yes, it is. Probably we can optimize it more.
I will change this implementation a bit in next patches.
Could you please check it?
Of course, thanks.
Post by Tetsuya Mukawa
Post by Loftus, Ciara
Post by Tetsuya Mukawa
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(r->internal == NULL))
+ return 0;
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ return 0;
+
+ rte_atomic16_set(&r->tx_executing, 1);
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ goto out;
+
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ VIRTIO_RXQ, bufs, nb_bufs);
+
+ rte_atomic64_add(&(r->tx_pkts), nb_tx);
+ rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
We may not always want to free these mbufs. For example, if a call is made
to rte_eth_tx_burst with buffers from another (non DPDK) source, they may
not be ours to free.
Sorry, I am not sure what type of buffer you want to transfer.
This is a PMD that wraps librte_vhost.
And I guess other PMDs cannot handle buffers from another non DPDK source.
Should we take care such buffers?
I have also checked af_packet PMD.
It seems the tx function of af_packet PMD just frees mbuf.
For example if using the PMD with an application that receives buffers from another source. Eg. a virtual switch receiving packets from an interface using the kernel driver.
I see that af_packet also frees the mbuf. I've checked the ixgbe and ring PMDs though, and they don't seem to free the buffers, although I may have missed something; the code for these is rather large and I am unfamiliar with most of it. If I am correct though, I wonder whether this behaviour should vary from PMD to PMD?
Post by Tetsuya Mukawa
Post by Loftus, Ciara
Post by Tetsuya Mukawa
+
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
Typo: Failure. Same for the destroy_device function
Thanks, I will fix it in next patches.
Post by Loftus, Ciara
Post by Tetsuya Mukawa
+ return -1;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->priv = eth_dev;
+
+ eth_dev->data->dev_link.link_status = 1;
+ rte_atomic16_set(&internal->xfer, 1);
+
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
Some freedom is taken away if the new_device and destroy_device
callbacks are implemented in the driver.
Post by Loftus, Ciara
For example if one wishes to call the rte_vhost_enable_guest_notification
function when a new device is brought up. They cannot now as there is no
scope to modify these callbacks, as is done in for example the vHost sample
app. Is this correct?
So how about adding one more parameter to be able to choose guest
notification behavior?
ex)
./testpmd --vdev 'eth_vhost0,iface=/tmp/sock0,guest_notification=0'
In above case, all queues in this device will have
VRING_USED_F_NO_NOTIFY.
I'm not too concerned about this particular function, I was just making an example. The main concern I was expressing here and in the other thread with Bruce, is the risk that we will lose some functionality available in the library but not in the PMD. This function is an example of that. If we could find some way to retain the functionality available in the library, it would be ideal.

Thanks for the response! I will review and test further patches if they become available.

Ciara
Post by Tetsuya Mukawa
Thanks,
Tetsuya
Tetsuya Mukawa
2015-10-21 04:30:54 UTC
Post by Loftus, Ciara
Post by Loftus, Ciara
Post by Loftus, Ciara
Post by Tetsuya Mukawa
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(r->internal == NULL))
+ return 0;
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ return 0;
+
+ rte_atomic16_set(&r->tx_executing, 1);
+
+ if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+ goto out;
+
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ VIRTIO_RXQ, bufs, nb_bufs);
+
+ rte_atomic64_add(&(r->tx_pkts), nb_tx);
+ rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
We may not always want to free these mbufs. For example, if a call is made
to rte_eth_tx_burst with buffers from another (non DPDK) source, they may
not be ours to free.
Sorry, I am not sure what type of buffer you want to transfer.
This is a PMD that wraps librte_vhost.
And I guess other PMDs cannot handle buffers from another non DPDK source.
Should we take care such buffers?
I have also checked af_packet PMD.
It seems the tx function of af_packet PMD just frees mbuf.
For example if using the PMD with an application that receives buffers from another source. Eg. a virtual switch receiving packets from an interface using the kernel driver.
For example, a software switch on the host may try to send data to a DPDK
application on a guest using the vhost PMD and the virtio-net PMD.
Also, let's assume the data on the software switch comes from a kernel
driver.
In this case, the data on the software switch will be copied and
transferred to the virtio-net PMD through the virtqueue.
Because of this, we can free the data after sending.
Could you please also check the API documentation of rte_eth_tx_burst?
(Freeing the buffers is the default behavior.)
Post by Loftus, Ciara
I see that af_packet also frees the mbuf. I've checked the ixgbe and ring pmds though and they don't seem to free the buffers, although I may have missed something, the code for these is rather large and I am unfamiliar with most of it. If I am correct though, should this behaviour vary from PMD to PMD I wonder?
I guess the ring PMD is something special.
Because we don't want to copy data with this PMD, the RX function doesn't
allocate buffers, and the TX function doesn't free buffers.
But other normal PMDs allocate buffers when RX is called, and free
buffers when TX is called.
Post by Loftus, Ciara
Post by Loftus, Ciara
Post by Loftus, Ciara
Post by Tetsuya Mukawa
+
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
Typo: Failure. Same for the destroy_device function
Thanks, I will fix it in next patches.
Post by Loftus, Ciara
Post by Tetsuya Mukawa
+ return -1;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->priv = eth_dev;
+
+ eth_dev->data->dev_link.link_status = 1;
+ rte_atomic16_set(&internal->xfer, 1);
+
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
Some freedom is taken away if the new_device and destroy_device
callbacks are implemented in the driver.
Post by Loftus, Ciara
For example if one wishes to call the rte_vhost_enable_guest_notification
function when a new device is brought up. They cannot now as there is no
scope to modify these callbacks, as is done in for example the vHost sample
app. Is this correct?
So how about adding one more parameter to be able to choose guest
notification behavior?
ex)
./testpmd --vdev 'eth_vhost0,iface=/tmp/sock0,guest_notification=0'
In above case, all queues in this device will have
VRING_USED_F_NO_NOTIFY.
I'm not too concerned about this particular function, I was just making an example. The main concern I was expressing here and in the other thread with Bruce, is the risk that we will lose some functionality available in the library but not in the PMD. This function is an example of that. If we could find some way to retain the functionality available in the library, it would be ideal.
I will reply to an other thread.
Anyway, I am going to keep current vhost library APIs.

Thanks,
Tetsuya
Bruce Richardson
2015-10-21 10:09:39 UTC
Post by Tetsuya Mukawa
Post by Loftus, Ciara
I see that af_packet also frees the mbuf. I've checked the ixgbe and ring pmds though and they don't seem to free the buffers, although I may have missed something, the code for these is rather large and I am unfamiliar with most of it. If I am correct though, should this behaviour vary from PMD to PMD I wonder?
I guess ring PMD is something special.
Because we don't want to copy data with this PMD, RX function doesn't
allocate buffers, also TX function doesn't free buffers.
But other normal PMD will allocate buffers when RX is called, and free
buffers when TX is called.
Yes, this is correct. Ring pmd is the exception since it automatically recycles
buffers, and so does not need to alloc/free mbufs. (ixgbe frees the buffers
post-TX as part of the TX ring cleanup)

/Bruce
Bruce Richardson
2015-10-16 12:52:54 UTC
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The PMD can have 'iface' parameter like below to specify a path to connect
to a virtio-net device.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
To connect above testpmd, here is qemu command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0
With this PMD in place, is there any need to keep the existing vhost library
around as a separate entity? Can the existing library be subsumed/converted into
a standard PMD?

/Bruce
Tetsuya Mukawa
2015-10-19 01:51:00 UTC
Post by Bruce Richardson
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The PMD can have 'iface' parameter like below to specify a path to connect
to a virtio-net device.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
To connect above testpmd, here is qemu command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0
With this PMD in place, is there any need to keep the existing vhost library
around as a separate entity? Can the existing library be subsumed/converted into
a standard PMD?
/Bruce
Hi Bruce,

I am concerned about whether the PMD has all the features of librte_vhost,
because librte_vhost provides more features and freedom than the ethdev API
provides.
In some cases, the user needs to choose a limited implementation without
librte_vhost.
I am going to eliminate such cases while implementing the PMD.
But I don't have a strong belief that we can remove librte_vhost now.

So how about keeping the current separation in the next DPDK?
I guess people will try to replace librte_vhost with the vhost PMD, because
apparently using ethdev APIs will be useful in many cases.
And we will get feedback like "the vhost PMD needs to support this usage".
(Or we will not get feedback, but that's also OK.)
Then, we will be able to merge librte_vhost and the vhost PMD.

Thanks,
Tetsuya
Loftus, Ciara
2015-10-19 09:32:50 UTC
Post by Tetsuya Mukawa
Post by Bruce Richardson
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin
wrapper
Post by Bruce Richardson
Post by Tetsuya Mukawa
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The PMD can have 'iface' parameter like below to specify a path to
connect
Post by Bruce Richardson
Post by Tetsuya Mukawa
to a virtio-net device.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
To connect above testpmd, here is qemu command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0
With this PMD in place, is there any need to keep the existing vhost library
around as a separate entity? Can the existing library be
subsumed/converted into
Post by Bruce Richardson
a standard PMD?
/Bruce
Hi Bruce,
I concern about whether the PMD has all features of librte_vhost,
because librte_vhost provides more features and freedom than ethdev API
provides.
In some cases, user needs to choose limited implementation without
librte_vhost.
I am going to eliminate such cases while implementing the PMD.
But I don't have strong belief that we can remove librte_vhost now.
So how about keeping current separation in next DPDK?
I guess people will try to replace librte_vhost to vhost PMD, because
apparently using ethdev APIs will be useful in many cases.
And we will get feedbacks like "vhost PMD needs to support like this usage".
(Or we will not have feedbacks, but it's also OK.)
Then, we will be able to merge librte_vhost and vhost PMD.
I agree with the above. One of the concerns I had when reviewing the patch was that the PMD removes some freedom that is available with the library, e.g. the ability to implement the new_device and destroy_device callbacks. If using the PMD you are constrained to the implementations of these in the PMD driver, but if using librte_vhost, you can implement your own with whatever functionality you like - a good example of this can be seen in the vhost sample app.
On the other hand, the PMD is useful in that it removes a lot of complexity for the user and may work for some more general use cases. So I would be in favour of having both options available too.

Ciara
Bruce Richardson
2015-10-19 09:45:32 UTC
Thanks.
However, just because the libraries are merged does not mean that you need
to be limited to PMD functionality. Many PMDs provide additional library-specific
functions over and above their PMD capabilities. The bonded PMD is a good example
here, as it has a whole set of extra functions to create and manipulate bonded
devices - things that are obviously not part of the general ethdev API. Other
vPMDs similarly include functions that allow them to be created on the fly too.

regards,
/Bruce
Tetsuya Mukawa
2015-10-19 10:50:26 UTC
Hi Bruce,

Thank you for pointing out a good example; I had not noticed that PMD.
I will study the bonding PMD and try to remove librte_vhost without
losing the freedom and features of the library.

Regards,
Tetsuya
Panu Matilainen
2015-10-19 13:26:21 UTC
Hi,

Just a gentle reminder - if you consider removing (even if just by
replacing/renaming) an entire library, it needs to go through the ABI
deprecation process.

It seems obvious enough. But for all the ABI policing here, somehow we
all failed to notice the two compatibility-breaking rename-elephants in
the room during 2.1 development:
- libintel_dpdk was renamed to libdpdk
- librte_pmd_virtio_uio was renamed to librte_pmd_virtio

Of course these cases are easy to work around with symlinks, and are
unrelated to the matter at hand. Just wanting to make sure such things
don't happen again.

- Panu -
Richardson, Bruce
2015-10-19 13:27:39 UTC
Still doesn't hurt to remind us, Panu! Thanks. :-)
Tetsuya Mukawa
2015-10-21 04:35:56 UTC
Hi,

Thanks for the reminder. I've checked the DPDK documentation and will
submit a deprecation notice to follow the DPDK deprecation process.
(We will probably be able to remove the vhost library in DPDK 2.3 or later.)

BTW, I plan to merge the vhost library and the PMD as follows.
Step1. Move vhost library under vhost PMD.
Step2. Rename current APIs.
Step3. Add a function to get a pointer to a "struct virtio_net" device
from a port number.

The last step allows us to convert a port number into a pointer to the
corresponding virtio_net device, so we can still use the features and
freedom that the vhost library APIs provided.

Thanks,
Tetsuya
Panu Matilainen
2015-10-21 06:25:12 UTC
Just wondering, is that *really* worth the price of breaking every
single vhost library user out there?

I mean, this is not about removing some bitrotten function or two that
nobody cares about anymore, but removing (by renaming) one of the more
widely used (AFAICS) libraries and its entire API.

If current APIs are kept then compatibility is largely a matter of
planting a strategic symlink or two, but it might make the API look
inconsistent.

But I am just wondering about the benefit of this merge, compared to
just adding a vhost PMD and leaving the library be. The ABI process is
not there to make life miserable for DPDK developers; it's there to help
make DPDK nicer for *other* developers. And the first and foremost
rule is simply: don't break backwards compatibility. Not unless there's a
damn good reason for doing so, and I fail to see that reason here.

- Panu -
Bruce Richardson
2015-10-21 10:22:38 UTC
Good question, and I'll accept that maybe it's not worth doing. I'm not that
much of an expert on the internals and APIs of vhost library.

However, the merge I was looking for was more from a code locality point
of view, to have all the vhost code in one directory (under drivers/net),
than spread across multiple ones. What APIs need to be deprecated
or not as part of that work, is a separate question, and so in theory we could
create a combined vhost library that does not deprecate anything (though to
avoid a build-up of technical debt, we'll probably want to deprecate some
functions).

I'll leave it up to the vhost experts to decide what's best, but for me, any
library that handles transmission and reception of packets outside of a DPDK
app should be a PMD library using ethdev rx/tx burst routines, and located
under drivers/net. (KNI is another obvious target for such a move and conversion).

Regards,
/Bruce
Tetsuya Mukawa
2015-10-22 09:50:02 UTC
Permalink
Post by Bruce Richardson
Post by Tetsuya Mukawa
-----Original Message-----
Sent: Monday, October 19, 2015 2:26 PM
Subject: Re: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD
Post by Tetsuya Mukawa
Post by Bruce Richardson
Post by Loftus, Ciara
Post by Tetsuya Mukawa
Post by Bruce Richardson
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin
wrapper
Post by Bruce Richardson
Post by Tetsuya Mukawa
of librte_vhost. It means librte_vhost is also needed to compile
the PMD.
Post by Tetsuya Mukawa
Post by Bruce Richardson
Post by Loftus, Ciara
Post by Tetsuya Mukawa
Post by Bruce Richardson
Post by Tetsuya Mukawa
The PMD can have 'iface' parameter like below to specify a path to
connect
Post by Bruce Richardson
Post by Tetsuya Mukawa
to a virtio-net device.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
To connect above testpmd, here is qemu command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0
With this PMD in place, is there any need to keep the existing
vhost library around as a separate entity? Can the existing
library be
subsumed/converted into
Post by Bruce Richardson
a standard PMD?
/Bruce
Hi Bruce,
I am concerned about whether the PMD has all the features of librte_vhost,
because librte_vhost provides more features and freedom than the ethdev
API provides.
In some cases, the user needs to choose a limited implementation without
librte_vhost.
I am going to eliminate such cases while implementing the PMD.
But I don't have a strong belief that we can remove librte_vhost now.
So how about keeping the current separation in the next DPDK?
I guess people will try to replace librte_vhost with the vhost PMD,
because apparently using the ethdev APIs will be useful in many cases.
And we will get feedback like "the vhost PMD needs to support usage
like this".
Post by Tetsuya Mukawa
Post by Bruce Richardson
Post by Loftus, Ciara
Post by Tetsuya Mukawa
(Or we will not get feedback, but that's also OK.) Then we will be
able to merge librte_vhost and the vhost PMD.
I agree with the above. One of the concerns I had when reviewing the
patch was that the PMD removes some freedom that is available with the
library, e.g. the ability to implement the new_device and destroy_device
callbacks. If using the PMD you are constrained to the implementations of
these in the PMD driver, but if using librte_vhost, you can implement your
own with whatever functionality you like; a good example of this can be
seen in the vhost sample app.
Post by Tetsuya Mukawa
Post by Bruce Richardson
Post by Loftus, Ciara
On the other hand, the PMD is useful in that it removes a lot of
complexity for the user and may work for some more general use cases. So I
would be in favour of having both options available too.
Post by Tetsuya Mukawa
Post by Bruce Richardson
Post by Loftus, Ciara
Ciara
Thanks.
However, just because the libraries are merged does not mean that you
need to be limited by PMD functionality. Many PMDs provide additional
library-specific functions over and above their PMD capabilities. The
bonded PMD is a good example here, as it has a whole set of extra
functions to create and manipulate bonded devices - things that are
obviously not part of the general ethdev API. Other vPMDs similarly
include functions to allow them to be created on the fly too.
Post by Tetsuya Mukawa
Post by Bruce Richardson
regards,
/Bruce
Hi Bruce,
I appreciate you showing a good example. I hadn't noticed that PMD.
I will check the bonding PMD, and try to remove librte_vhost without
losing the freedom and features of the library.
Hi,

I have submitted the latest patches.
I will keep the vhost library until we have agreement to merge it into
the vhost PMD.

Regards,
Tetsuya
Traynor, Kevin
2015-10-27 13:44:36 UTC
Permalink
-----Original Message-----
[snip]
Hi,
I have submitted latest patches.
I will keep vhost library until we will have agreement to merge it to
vhost PMD.
Longer term there are pros and cons to keeping the vhost library. Personally
I think it would make sense to remove it at some point, as trying to maintain
two APIs has a cost, but I think adding a deprecation notice in DPDK 2.2 for
removal in DPDK 2.3 is very premature. Until it's proven *in the field* that
the vhost PMD is a suitable, fully functioning replacement for the vhost
library and users have had time to migrate, please don't remove it.
Regards,
Tetsuya
Tetsuya Mukawa
2015-10-28 02:24:16 UTC
Permalink
Post by Traynor, Kevin
-----Original Message-----
[snip]
Hi,
I have submitted latest patches.
I will keep vhost library until we will have agreement to merge it to
vhost PMD.
Longer term there are pros and cons to keeping the vhost library. Personally
I think it would make sense to remove sometime as trying to maintain two API's
has a cost, but I think adding a deprecation notice in DPDK 2.2 for removal in
DPDK 2.3 is very premature. Until it's proven *in the field* that the vhost PMD
is a suitable fully functioning replacement for the vhost library and users
have time to migrate, then please don't remove.
Hi Kevin,

Thanks for commenting. I agree it's not the time to add a deprecation notice.
(I haven't included one in the vhost PMD patches.)

Tetsuya
Tetsuya Mukawa
2015-10-22 09:45:48 UTC
Permalink
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost.

I've submitted the patches below in former patch sets, but it seems some
issues have already been fixed.

- [PATCH 1/3] vhost: Fix return value of GET_VRING_BASE message
- [PATCH 2/3] vhost: Fix RESET_OWNER handling not to close callfd
- [PATCH 3/3] vhost: Fix RESET_OWNER handling not to free virtqueue

I've still seen some resource leaks in the vhost library, but in this RFC,
I focused on the vhost PMD.
After I get agreement, I will submit a patch for the leak issue as a separate
patch. So please check the direction of the vhost PMD.

PATCH v3 changes:
- Optimize performance.
In RX/TX functions, change code to access only per core data.
- Add the below API to allow users to use vhost library APIs for a port
 managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
- rte_eth_vhost_portid2vdev()
To support this functionality, vhost library is also changed.
Anyway, if users don't use the vhost PMD, they can fully use the vhost library APIs.
- Add code to support vhost multiple queues.
Actually, the multiple queues functionality is not enabled so far.

Tetsuya Mukawa (2):
vhost: Add callback and private data for vhost PMD
vhost: Add VHOST PMD

config/common_linuxapp | 6 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 735 ++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
lib/librte_vhost/rte_virtio_net.h | 3 +
lib/librte_vhost/vhost_user/virtio-net-user.c | 8 +-
lib/librte_vhost/virtio-net.c | 40 +-
lib/librte_vhost/virtio-net.h | 3 +-
mk/rte.app.mk | 8 +-
11 files changed, 934 insertions(+), 8 deletions(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
--
2.1.4
Tetsuya Mukawa
2015-10-22 09:45:49 UTC
Permalink
These variables are needed to be able to manage a virtio device using
both the vhost library APIs and the vhost PMD.
For example, if the vhost PMD used the current callback handler and private
data provided by the vhost library, a DPDK application that links the vhost
library could not use some of the vhost library APIs. To avoid this, a
separate callback and private data for the vhost PMD are needed.

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 8 +++---
lib/librte_vhost/virtio-net.c | 40 +++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 3 +-
4 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 93d3e27..ec84c9b 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -108,6 +108,7 @@ struct virtio_net {
uint32_t virt_qp_nb;
uint32_t mem_idx; /** Used in set memory layout, unique for each queue within virtio device. */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost PMD */
} __rte_cache_aligned;

/**
@@ -198,6 +199,8 @@ int rte_vhost_driver_unregister(const char *dev_name);

/* Register callbacks. */
int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
/* Start vhost driver session blocking loop. */
int rte_vhost_driver_session_start(void);

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 6a12d96..a75697f 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -288,7 +288,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

if (virtio_is_ready(dev) &&
!(dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->new_device(dev);
+ notify_new_device(dev);
}

/*
@@ -302,7 +302,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,

/* We have to stop the queue (virtio) if it is running. */
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

/* Here we are safe to get the last used index */
ops->get_vring_base(ctx, state->index, state);
@@ -333,7 +333,7 @@ user_reset_owner(struct vhost_device_ctx ctx,

/* We have to stop the queue (virtio) if it is running. */
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

RTE_LOG(INFO, VHOST_CONFIG,
"reset owner --- state idx:%d state num:%d\n", state->index, state->num);
@@ -379,7 +379,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
uint32_t i;

if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

for (i = 0; i < dev->virt_qp_nb; i++)
if (dev && dev->mem_arr[i]) {
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 3131719..eec3c22 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -64,6 +64,8 @@ struct virtio_net_config_ll {

/* device ops to add/remove device to/from data core. */
struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
/* root address of the linked list of managed virtio devices */
static struct virtio_net_config_ll *ll_root;

@@ -84,6 +86,29 @@ static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;

static uint64_t VHOST_PROTOCOL_FEATURES = VHOST_SUPPORTED_PROTOCOL_FEATURES;

+int
+notify_new_device(struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+ int ret = pmd_notify_ops->new_device(dev);
+ if (ret != 0)
+ return ret;
+ }
+ if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+ return notify_ops->new_device(dev);
+
+ return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+ pmd_notify_ops->destroy_device(dev);
+ if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+ notify_ops->destroy_device(dev);
+}
+
/*
* Converts QEMU virtual address to Vhost virtual address. This function is
* used to convert the ring addresses to our address space.
@@ -421,7 +446,7 @@ destroy_device(struct vhost_device_ctx ctx)
* the function to remove it from the data core.
*/
if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(&(ll_dev_cur->dev));
+ notify_destroy_device(&(ll_dev_cur->dev));
ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
ll_dev_last);
} else {
@@ -884,7 +909,7 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
dev->virtqueue[VIRTIO_RXQ]->enabled = 1;
dev->virtqueue[VIRTIO_TXQ]->enabled = 1;
}
- return notify_ops->new_device(dev);
+ return notify_new_device(dev);
}
/* Otherwise we remove it. */
} else
@@ -1006,3 +1031,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op

return 0;
}
+
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+ pmd_notify_ops = ops;
+
+ return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index ef6efae..f92ed73 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -39,7 +39,8 @@

#define VHOST_USER_PROTOCOL_F_VRING_FLAG 2

-struct virtio_net_device_ops const *notify_ops;
struct virtio_net *get_device(struct vhost_device_ctx ctx);

+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
#endif
--
2.1.4
Tetsuya Mukawa
2015-10-27 06:12:52 UTC
Permalink
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost. The patch will work on the below patch series.
- [PATCH v5 00/28] remove pci driver from vdevs

* Known issue.
We may see issues while handling the RESET_OWNER message.
This handling is done in the vhost library, so it is not a part of the vhost PMD.
So far, we are waiting for a QEMU fix.

PATCH v4 changes:
- Support vhost multiple queues.
- Rebase on "remove pci driver from vdevs".
- Optimize RX/TX functions.
- Fix resource leaks.
- Fix compile issue.
- Add patch to fix vhost library.

PATCH v3 changes:
- Optimize performance.
In RX/TX functions, change code to access only per core data.
- Add the below API to allow users to use vhost library APIs for a port
 managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
- rte_eth_vhost_portid2vdev()
To support this functionality, vhost library is also changed.
Anyway, if users don't use the vhost PMD, they can fully use the vhost library APIs.
- Add code to support vhost multiple queues.
Actually, the multiple queues functionality is not enabled so far.

PATCH v2 changes:
- Fix issues reported by checkpatch.pl
(Thanks to Stephen Hemminger)


Tetsuya Mukawa (3):
vhost: Fix wrong handling of virtqueue array index
vhost: Add callback and private data for vhost PMD
vhost: Add VHOST PMD

config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/vhost.rst | 82 +++
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 765 ++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
lib/librte_vhost/rte_vhost_version.map | 6 +
lib/librte_vhost/rte_virtio_net.h | 3 +
lib/librte_vhost/vhost_user/virtio-net-user.c | 33 +-
lib/librte_vhost/virtio-net.c | 61 +-
lib/librte_vhost/virtio-net.h | 4 +-
mk/rte.app.mk | 8 +-
15 files changed, 1085 insertions(+), 25 deletions(-)
create mode 100644 doc/guides/nics/vhost.rst
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
--
2.1.4
Tetsuya Mukawa
2015-10-27 06:12:53 UTC
Permalink
The patch fixes wrong handling of virtqueue array index.

GET_VRING_BASE:
The vhost backend will receive the message per virtqueue.
Also, we should call the destroy callback only when both the RXQ and TXQ
have received the message.

SET_BACKEND:
Because the vhost library supports multiple queues, the index may be over 2.
Also, a vhost frontend (QEMU) may send such an index.

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
lib/librte_vhost/vhost_user/virtio-net-user.c | 22 +++++++++++-----------
lib/librte_vhost/virtio-net.c | 5 +++--
2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index a998ad8..3e8dfea 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -283,12 +283,10 @@ user_get_vring_base(struct vhost_device_ctx ctx,
struct vhost_vring_state *state)
{
struct virtio_net *dev = get_device(ctx);
+ uint16_t base_idx = state->index / VIRTIO_QNUM;

if (dev == NULL)
return -1;
- /* We have to stop the queue (virtio) if it is running. */
- if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);

/* Here we are safe to get the last used index */
ops->get_vring_base(ctx, state->index, state);
@@ -300,15 +298,17 @@ user_get_vring_base(struct vhost_device_ctx ctx,
* sent and only sent in vhost_vring_stop.
* TODO: cleanup the vring, it isn't usable since here.
*/
- if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
- close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
- dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
- }
- if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
- close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
- dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
+ if (dev->virtqueue[state->index]->kickfd >= 0) {
+ close(dev->virtqueue[state->index]->kickfd);
+ dev->virtqueue[state->index]->kickfd = -1;
}

+ /* We have to stop the queue (virtio) if it is running. */
+ if ((dev->flags & VIRTIO_DEV_RUNNING) &&
+ (dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
+ (dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
+ notify_ops->destroy_device(dev);
+
return 0;
}

@@ -321,7 +321,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
struct vhost_vring_state *state)
{
struct virtio_net *dev = get_device(ctx);
- uint16_t base_idx = state->index;
+ uint16_t base_idx = state->index / VIRTIO_QNUM;
int enable = (int)state->num;

RTE_LOG(INFO, VHOST_CONFIG,
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 97213c5..ee2e84d 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -778,6 +778,7 @@ static int
set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
{
struct virtio_net *dev;
+ uint32_t base_idx = file->index / VIRTIO_QNUM;

dev = get_device(ctx);
if (dev == NULL)
@@ -791,8 +792,8 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
* we add the device.
*/
if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
- if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
- ((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
+ if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
+ ((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
return notify_ops->new_device(dev);
}
/* Otherwise we remove it. */
--
2.1.4
Yuanhan Liu
2015-10-27 06:29:25 UTC
Permalink
Post by Tetsuya Mukawa
The patch fixes wrong handling of virtqueue array index.
The vhost backend will receive the message per virtqueue.
No, that's not right, we will get GET_VRING_BASE for each queue pair,
including RX and TX virt queue, but not each virt queue.
Post by Tetsuya Mukawa
Also we should call a destroy callback when both RXQ and TXQ receives
the message.
Because vhost library supports multiple queue, the index may be over 2.
Also a vhost frontend(QEMU) may send such a index.
---
lib/librte_vhost/vhost_user/virtio-net-user.c | 22 +++++++++++-----------
lib/librte_vhost/virtio-net.c | 5 +++--
2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index a998ad8..3e8dfea 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -283,12 +283,10 @@ user_get_vring_base(struct vhost_device_ctx ctx,
struct vhost_vring_state *state)
{
struct virtio_net *dev = get_device(ctx);
+ uint16_t base_idx = state->index / VIRTIO_QNUM;
For the Nth queue (pair), "state->index" equals N * 2.
So, dividing it by VIRTIO_QNUM (2) is wrong here.

--yliu
Post by Tetsuya Mukawa
if (dev == NULL)
return -1;
- /* We have to stop the queue (virtio) if it is running. */
- if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
/* Here we are safe to get the last used index */
ops->get_vring_base(ctx, state->index, state);
@@ -300,15 +298,17 @@ user_get_vring_base(struct vhost_device_ctx ctx,
* sent and only sent in vhost_vring_stop.
* TODO: cleanup the vring, it isn't usable since here.
*/
- if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
- close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
- dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
- }
- if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
- close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
- dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
+ if (dev->virtqueue[state->index]->kickfd >= 0) {
+ close(dev->virtqueue[state->index]->kickfd);
+ dev->virtqueue[state->index]->kickfd = -1;
}
+ /* We have to stop the queue (virtio) if it is running. */
+ if ((dev->flags & VIRTIO_DEV_RUNNING) &&
+ (dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
+ (dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
+ notify_ops->destroy_device(dev);
+
return 0;
}
@@ -321,7 +321,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
struct vhost_vring_state *state)
{
struct virtio_net *dev = get_device(ctx);
- uint16_t base_idx = state->index;
+ uint16_t base_idx = state->index / VIRTIO_QNUM;
int enable = (int)state->num;
RTE_LOG(INFO, VHOST_CONFIG,
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 97213c5..ee2e84d 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -778,6 +778,7 @@ static int
set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
{
struct virtio_net *dev;
+ uint32_t base_idx = file->index / VIRTIO_QNUM;
dev = get_device(ctx);
if (dev == NULL)
@@ -791,8 +792,8 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
* we add the device.
*/
if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
- if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
- ((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
+ if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
+ ((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
return notify_ops->new_device(dev);
}
/* Otherwise we remove it. */
--
2.1.4
Yuanhan Liu
2015-10-27 06:33:28 UTC
Permalink
Post by Yuanhan Liu
Post by Tetsuya Mukawa
The patch fixes wrong handling of virtqueue array index.
The vhost backend will receive the message per virtqueue.
No, that's not right, we will get GET_VRING_BASE for each queue pair,
including RX and TX virt queue, but not each virt queue.
Oops, you are right. And I was right the first time, till Huawei pointed
out that I had made some unexpected change; I then checked the code and
made a wrong decision (a bit too tired recently :(

So, I will look at this patch, again.

Sorry for that.

--yliu
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Also we should call a destroy callback when both RXQ and TXQ receives
the message.
Because vhost library supports multiple queue, the index may be over 2.
Also a vhost frontend(QEMU) may send such a index.
---
lib/librte_vhost/vhost_user/virtio-net-user.c | 22 +++++++++++-----------
lib/librte_vhost/virtio-net.c | 5 +++--
2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index a998ad8..3e8dfea 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -283,12 +283,10 @@ user_get_vring_base(struct vhost_device_ctx ctx,
struct vhost_vring_state *state)
{
struct virtio_net *dev = get_device(ctx);
+ uint16_t base_idx = state->index / VIRTIO_QNUM;
For the Nth queue (pair), the "state->index" equals to N * 2.
So, dividing it by VIRTIO_QNUM (2) is wrong here.
--yliu
Post by Tetsuya Mukawa
if (dev == NULL)
return -1;
- /* We have to stop the queue (virtio) if it is running. */
- if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
/* Here we are safe to get the last used index */
ops->get_vring_base(ctx, state->index, state);
@@ -300,15 +298,17 @@ user_get_vring_base(struct vhost_device_ctx ctx,
* sent and only sent in vhost_vring_stop.
* TODO: cleanup the vring, it isn't usable since here.
*/
- if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
- close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
- dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
- }
- if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
- close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
- dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
+ if (dev->virtqueue[state->index]->kickfd >= 0) {
+ close(dev->virtqueue[state->index]->kickfd);
+ dev->virtqueue[state->index]->kickfd = -1;
}
+ /* We have to stop the queue (virtio) if it is running. */
+ if ((dev->flags & VIRTIO_DEV_RUNNING) &&
+ (dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
+ (dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
+ notify_ops->destroy_device(dev);
+
return 0;
}
@@ -321,7 +321,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
struct vhost_vring_state *state)
{
struct virtio_net *dev = get_device(ctx);
- uint16_t base_idx = state->index;
+ uint16_t base_idx = state->index / VIRTIO_QNUM;
int enable = (int)state->num;
RTE_LOG(INFO, VHOST_CONFIG,
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 97213c5..ee2e84d 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -778,6 +778,7 @@ static int
set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
{
struct virtio_net *dev;
+ uint32_t base_idx = file->index / VIRTIO_QNUM;
dev = get_device(ctx);
if (dev == NULL)
@@ -791,8 +792,8 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
* we add the device.
*/
if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
- if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
- ((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
+ if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
+ ((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
return notify_ops->new_device(dev);
}
/* Otherwise we remove it. */
--
2.1.4
Yuanhan Liu
2015-10-27 06:47:20 UTC
Permalink
Post by Tetsuya Mukawa
The patch fixes wrong handling of virtqueue array index.
The vhost backend will receive the message per virtqueue.
Also we should call a destroy callback when both RXQ and TXQ receives
the message.
Because vhost library supports multiple queue, the index may be over 2.
Also a vhost frontend(QEMU) may send such a index.
Note that only vhost-user supports MQ. vhost-cuse does not.
Post by Tetsuya Mukawa
---
lib/librte_vhost/vhost_user/virtio-net-user.c | 22 +++++++++++-----------
lib/librte_vhost/virtio-net.c | 5 +++--
2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index a998ad8..3e8dfea 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -283,12 +283,10 @@ user_get_vring_base(struct vhost_device_ctx ctx,
struct vhost_vring_state *state)
{
struct virtio_net *dev = get_device(ctx);
+ uint16_t base_idx = state->index / VIRTIO_QNUM;
So, fixing what my 1st reply said, for Nth queue pair, state->index
is "N * 2 + is_tx". So, the base should be "state->index / 2 * 2".
Post by Tetsuya Mukawa
if (dev == NULL)
return -1;
- /* We have to stop the queue (virtio) if it is running. */
- if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
/* Here we are safe to get the last used index */
ops->get_vring_base(ctx, state->index, state);
@@ -300,15 +298,17 @@ user_get_vring_base(struct vhost_device_ctx ctx,
* sent and only sent in vhost_vring_stop.
* TODO: cleanup the vring, it isn't usable since here.
*/
- if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
- close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
- dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
- }
- if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
- close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
- dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
+ if (dev->virtqueue[state->index]->kickfd >= 0) {
+ close(dev->virtqueue[state->index]->kickfd);
+ dev->virtqueue[state->index]->kickfd = -1;
}
+ /* We have to stop the queue (virtio) if it is running. */
+ if ((dev->flags & VIRTIO_DEV_RUNNING) &&
+ (dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
+ (dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
+ notify_ops->destroy_device(dev);
This is a proper fix then. (You just need to fix base_idx.)
Post by Tetsuya Mukawa
return 0;
}
@@ -321,7 +321,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
struct vhost_vring_state *state)
{
struct virtio_net *dev = get_device(ctx);
- uint16_t base_idx = state->index;
+ uint16_t base_idx = state->index / VIRTIO_QNUM;
user_set_vring_enable is sent per queue pair (I'm sure this time), so
base_idx equals state->index. No need to fix here.
Post by Tetsuya Mukawa
int enable = (int)state->num;
RTE_LOG(INFO, VHOST_CONFIG,
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 97213c5..ee2e84d 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -778,6 +778,7 @@ static int
set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
{
struct virtio_net *dev;
+ uint32_t base_idx = file->index / VIRTIO_QNUM;
As stated, vhost-cuse does not support MQ.

--yliu
Post by Tetsuya Mukawa
dev = get_device(ctx);
if (dev == NULL)
@@ -791,8 +792,8 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
* we add the device.
*/
if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
- if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
- ((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
+ if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
+ ((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
return notify_ops->new_device(dev);
}
/* Otherwise we remove it. */
--
2.1.4
Tetsuya Mukawa
2015-10-27 07:28:58 UTC
Permalink
Hi Yuanhan,

I appreciate your checking.
I hadn't noticed SET_BACKEND is only supported by vhost-cuse. :-(
I will follow your comments, then submit again.

Thanks,
Tetsuya
Yuanhan Liu
2015-10-27 07:34:10 UTC
Permalink
Post by Tetsuya Mukawa
Hi Yuanhan,
I appreciate your checking.
Welcome! And thank you for catching my mistakes.

--yliu
Post by Tetsuya Mukawa
I haven't noticed SET_BACKEND is only supported by vhost-cuse. :-(
I will follow your comments, then submit again.
Thanks,
Tetsuya
Tetsuya Mukawa
2015-10-27 06:12:54 UTC
Permalink
These variables are needed to be able to manage a virtio device using
both the vhost library APIs and the vhost PMD.
For example, if the vhost PMD used the current callback handler and
private data provided by the vhost library, a DPDK application linked
against the vhost library could not use some of the library's APIs. To
avoid this, a separate callback and private data for the vhost PMD are
needed.

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
lib/librte_vhost/virtio-net.c | 56 +++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
rte_vhost_driver_unregister;

} DPDK_2.0;
+
+DPDK_2.2 {
+ global:
+
+ rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 426a70d..08e77af 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -106,6 +106,7 @@ struct virtio_net {
char ifname[IF_NAME_SZ]; /**< Name of the tap device or socket path. */
uint32_t virt_qp_nb; /**< number of queue pair we have allocated */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost PMD */
struct vhost_virtqueue *virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX]; /**< Contains all virtqueue information. */
} __rte_cache_aligned;

@@ -202,6 +203,8 @@ int rte_vhost_driver_unregister(const char *dev_name);

/* Register callbacks. */
int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
/* Start vhost driver session blocking loop. */
int rte_vhost_driver_session_start(void);

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 3e8dfea..dad083b 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -111,7 +111,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

/* Remove from the data plane. */
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

if (dev->mem) {
free_mem_region(dev);
@@ -272,7 +272,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

if (virtio_is_ready(dev) &&
!(dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->new_device(dev);
+ notify_new_device(dev);
}

/*
@@ -307,7 +307,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
if ((dev->flags & VIRTIO_DEV_RUNNING) &&
(dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
(dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

return 0;
}
@@ -328,10 +328,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
"set queue enable: %d to qp idx: %d\n",
enable, state->index);

- if (notify_ops->vring_state_changed) {
- notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
- enable);
- }
+ notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM, enable);

dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
@@ -345,7 +342,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
struct virtio_net *dev = get_device(ctx);

if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

if (dev && dev->mem) {
free_mem_region(dev);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index ee2e84d..de5d8ff 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -65,6 +65,8 @@ struct virtio_net_config_ll {

/* device ops to add/remove device to/from data core. */
struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
/* root address of the linked list of managed virtio devices */
static struct virtio_net_config_ll *ll_root;

@@ -80,6 +82,43 @@ static struct virtio_net_config_ll *ll_root;
static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;


+int
+notify_new_device(struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+ int ret = pmd_notify_ops->new_device(dev);
+ if (ret != 0)
+ return ret;
+ }
+ if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+ return notify_ops->new_device(dev);
+
+ return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+ pmd_notify_ops->destroy_device(dev);
+ if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+ notify_ops->destroy_device(dev);
+}
+
+int
+notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->vring_state_changed != NULL)) {
+ int ret = pmd_notify_ops->vring_state_changed(dev, queue_id, enable);
+ if (ret != 0)
+ return ret;
+ }
+ if ((notify_ops != NULL) && (notify_ops->vring_state_changed != NULL))
+ return notify_ops->vring_state_changed(dev, queue_id, enable);
+
+ return 0;
+}
+
/*
* Converts QEMU virtual address to Vhost virtual address. This function is
* used to convert the ring addresses to our address space.
@@ -377,7 +416,7 @@ destroy_device(struct vhost_device_ctx ctx)
* the function to remove it from the data core.
*/
if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(&(ll_dev_cur->dev));
+ notify_destroy_device(&(ll_dev_cur->dev));
ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
ll_dev_last);
} else {
@@ -794,12 +833,12 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
- return notify_ops->new_device(dev);
+ return notify_new_device(dev);
}
/* Otherwise we remove it. */
} else
if (file->fd == VIRTIO_DEV_STOPPED)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);
return 0;
}

@@ -883,3 +922,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op

return 0;
}
+
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+ pmd_notify_ops = ops;
+
+ return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index 75fb57e..0816e71 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -37,7 +37,9 @@
#include "vhost-net.h"
#include "rte_virtio_net.h"

-struct virtio_net_device_ops const *notify_ops;
struct virtio_net *get_device(struct vhost_device_ctx ctx);

+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
+int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable);
#endif
--
2.1.4
Loftus, Ciara
2015-10-30 17:49:35 UTC
Permalink
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs. To avoid it, callback and private
data for vhost PMD are needed.
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
lib/librte_vhost/virtio-net.c | 56 +++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 70 insertions(+), 12 deletions(-)
Hi Tetsuya,

Thanks for implementing this. I haven't had a chance to actually test it, but if these changes allow users of the PMD to implement their own new_ and destroy_ device functions etc, that's good news.

Thanks,
Ciara
Tetsuya Mukawa
2015-11-02 03:15:03 UTC
Permalink
Post by Loftus, Ciara
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs. To avoid it, callback and private
data for vhost PMD are needed.
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
lib/librte_vhost/virtio-net.c | 56 +++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 70 insertions(+), 12 deletions(-)
Hi Tetsuya,
Thanks for implementing this. I haven't had a chance to actually test it, but if these changes allow users of the PMD to implement their own new_ and destroy_ device functions etc, that's good news.
Thanks,
Ciara
Hi Ciara,

Yes, the patch works like you said.

Thanks,
Tetsuya
Tetsuya Mukawa
2015-10-27 06:12:55 UTC
Permalink
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.

The PMD has 2 parameters.
- iface: The parameter is used to specify a path to connect to a
virtio-net device.
- queues: The parameter is used to specify the number of the queues
virtio-net device has.
(Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect above testpmd, here is qemu command example.

$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
-device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/vhost.rst | 82 +++
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 765 ++++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
mk/rte.app.mk | 8 +-
10 files changed, 1002 insertions(+), 1 deletion(-)
create mode 100644 doc/guides/nics/vhost.rst
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..7310240 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -446,6 +446,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

#
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
#Compile Xen domain0 support
#
CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index d1a92f8..44792fe 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -46,6 +46,7 @@ Network Interface Controller Drivers
intel_vf
mlx4
virtio
+ vhost
vmxnet3
pcap_ring

diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst
new file mode 100644
index 0000000..2ec8d79
--- /dev/null
+++ b/doc/guides/nics/vhost.rst
@@ -0,0 +1,82 @@
+.. BSD LICENSE
+ Copyright(c) 2015 IGEL Co., Ltd.. All rights reserved.
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in
+ the documentation and/or other materials provided with the
+ distribution.
+ * Neither the name of IGEL Co., Ltd. nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Poll Mode Driver that wraps vhost library
+=========================================
+
+This PMD is a thin wrapper around the DPDK vhost library.
+The user can handle virtqueues as a normal DPDK port.
+
+Vhost Implementation in DPDK
+----------------------------
+
+Please refer to the "Vhost Library" chapter of the Programmer's Guide for details of vhost.
+
+Features and Limitations of vhost PMD
+-------------------------------------
+
+In this release, the vhost PMD provides the basic functionality of packet reception and transmission.
+
+* It provides a function to convert a port_id to a pointer to the underlying virtio_net device,
+  which allows the user to use the vhost library APIs in parallel with the PMD.
+
+* It supports multiple queues.
+
+* It supports Port Hotplug functionality.
+
+* There is no need to stop RX/TX when the user stops a guest or the virtio-net driver in the guest.
+
+Vhost PMD with testpmd application
+----------------------------------
+
+This section demonstrates the vhost PMD with the testpmd DPDK sample application.
+
+#. Launch the testpmd with vhost PMD:
+
+ .. code-block:: console
+
+ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
+
+   Other basic DPDK preparations, such as hugepage enabling, are required here.
+ Please refer to the *DPDK Getting Started Guide* for detailed instructions.
+
+#. Launch the QEMU:
+
+ .. code-block:: console
+
+      qemu-system-x86_64 <snip> \
+ -chardev socket,id=chr0,path=/tmp/sock0 \
+ -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
+ -device virtio-net-pci,netdev=net0
+
+   This command generates one virtio-net device for QEMU.
+   Once the device is recognized by the guest, the user can handle it as a
+   normal virtio-net device.
+   When the initialization between the virtio-net driver and the vhost library is done, the port status of testpmd will change to linked up.
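The port-to-device conversion that the documentation above advertises boils down to a driver-name prefix check followed by a lookup of the first RX queue's device pointer (see `rte_eth_vhost_portid2vdev()` later in this patch). A self-contained sketch of that lookup follows; the `fake_*` struct names are hypothetical stand-ins, since the real function walks `rte_eth_devices[]`:

```c
#include <stddef.h>
#include <string.h>

/* Minimal, hypothetical stand-ins for the ethdev fields the lookup touches. */
struct fake_vhost_internal {
	void *rx_queue0_device;	/* stands in for rx_vhost_queues[0]->device */
};

struct fake_dev_data {
	const char *drv_name;
	void *dev_private;
};

/* Return the vhost device pointer only when the port is driven by the
 * vhost PMD, i.e. its driver name starts with "eth_vhost". */
static void *
portid2vdev_sketch(const struct fake_dev_data *data)
{
	if (data == NULL || data->drv_name == NULL)
		return NULL;
	if (strncmp("eth_vhost", data->drv_name, strlen("eth_vhost")) != 0)
		return NULL;
	return ((struct fake_vhost_internal *)data->dev_private)
			->rx_queue0_device;
}
```

The prefix match is what keeps the helper safe to call on any valid port_id: for ports owned by other PMDs it returns NULL instead of misinterpreting their private data.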
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 639f129..5930e70 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -11,6 +11,8 @@ New Features

* **Added vhost-user multiple queue support.**

+* **Added vhost PMD.**
+
* **Removed the PCI device from vdev PMD's.**

* This change required modifications to librte_ether and all vdev and pdev PMD's.
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5ebf963..e46a38e 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -49,5 +49,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
include $(RTE_SDK)/mk/rte.sharelib.mk
include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8186a80
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+# BSD LICENSE
+#
+# Copyright (c) 2010-2015 Intel Corporation.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..004fdaf
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,765 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015 IGEL Co., Ltd.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co.,Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <errno.h>
+#include <limits.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG "iface"
+#define ETH_VHOST_QUEUES_ARG "queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+ ETH_VHOST_IFACE_ARG,
+ ETH_VHOST_QUEUES_ARG,
+ NULL
+};
+
+static struct ether_addr base_eth_addr = {
+ .addr_bytes = {
+ 0x56 /* V */,
+ 0x48 /* H */,
+ 0x4F /* O */,
+ 0x53 /* S */,
+ 0x54 /* T */,
+ 0x00
+ }
+};
+
+struct vhost_queue {
+ struct virtio_net *device;
+ struct pmd_internal *internal;
+ struct rte_mempool *mb_pool;
+ rte_atomic32_t allow_queuing;
+ rte_atomic32_t while_queuing;
+ uint64_t rx_pkts;
+ uint64_t tx_pkts;
+ uint64_t err_pkts;
+};
+
+struct pmd_internal {
+ TAILQ_ENTRY(pmd_internal) next;
+ char *dev_name;
+ char *iface_name;
+ unsigned nb_rx_queues;
+ unsigned nb_tx_queues;
+
+ struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+ struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+ volatile uint16_t once;
+ pthread_t session_th;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+ TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct rte_eth_link pmd_link = {
+ .link_speed = 10000,
+ .link_duplex = ETH_LINK_FULL_DUPLEX,
+ .link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+ VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+ r->rx_pkts += nb_rx;
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ VIRTIO_RXQ, bufs, nb_bufs);
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+ int found = 0;
+ struct pmd_internal *internal;
+
+ if (ifname == NULL)
+ return NULL;
+
+ pthread_mutex_lock(&internal_list_lock);
+
+ TAILQ_FOREACH(internal, &internals_list, next) {
+ if (!strcmp(internal->iface_name, ifname)) {
+ found = 1;
+ break;
+ }
+ }
+
+ pthread_mutex_unlock(&internal_list_lock);
+
+ if (!found)
+ return NULL;
+
+ return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "invalid argument\n");
+ return -1;
+ }
+
+ internal = find_internal_resource(dev->ifname);
+ if (internal == NULL) {
+ RTE_LOG(INFO, PMD, "invalid device name\n");
+ return -1;
+ }
+
+ if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
+ (dev->virt_qp_nb < internal->nb_tx_queues)) {
+ RTE_LOG(INFO, PMD, "Not enough queues\n");
+ return -1;
+ }
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "Failed to find ethdev\n");
+ return -1;
+ }
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = dev;
+ vq->internal = internal;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = dev;
+ vq->internal = internal;
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->pmd_priv = eth_dev;
+ eth_dev->data->dev_link.link_status = 1;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "invalid argument\n");
+ return;
+ }
+
+ eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+ if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find ethdev\n");
+ return;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+
+ eth_dev->data->dev_link.link_status = 0;
+
+ dev->pmd_priv = NULL;
+ dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = NULL;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = NULL;
+ }
+
+ RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+ static struct virtio_net_device_ops *vhost_ops;
+
+ vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+ if (vhost_ops == NULL)
+ rte_panic("Can't allocate memory\n");
+
+ /* set vhost arguments */
+ vhost_ops->new_device = new_device;
+ vhost_ops->destroy_device = destroy_device;
+ if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+ rte_panic("Can't register callbacks\n");
+
+ /* start event handling */
+ rte_vhost_driver_session_start();
+
+ rte_free(vhost_ops);
+ pthread_exit(0);
+}
+
+static void vhost_driver_session_start(struct pmd_internal *internal)
+{
+ int ret;
+
+ ret = pthread_create(&internal->session_th,
+ NULL, vhost_driver_session, NULL);
+ if (ret)
+ rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(struct pmd_internal *internal)
+{
+ int ret;
+
+ ret = pthread_cancel(internal->session_th);
+ if (ret)
+ rte_panic("Can't cancel the thread\n");
+
+ ret = pthread_join(internal->session_th, NULL);
+ if (ret)
+ rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+ int ret;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+ ret = rte_vhost_driver_register(internal->iface_name);
+ if (ret)
+ return ret;
+
+ vhost_driver_session_start(internal);
+ }
+ return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
+ rte_vhost_driver_unregister(internal->iface_name);
+ vhost_driver_session_stop(internal);
+ }
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ if (internal->rx_vhost_queues[rx_queue_id] != NULL)
+ rte_free(internal->rx_vhost_queues[rx_queue_id]);
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+ return -ENOMEM;
+ }
+
+ vq->mb_pool = mb_pool;
+ internal->rx_vhost_queues[rx_queue_id] = vq;
+ dev->data->rx_queues[rx_queue_id] = vq;
+ return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ if (internal->tx_vhost_queues[tx_queue_id] != NULL)
+ rte_free(internal->tx_vhost_queues[tx_queue_id]);
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+ return -ENOMEM;
+ }
+
+ internal->tx_vhost_queues[tx_queue_id] = vq;
+ dev->data->tx_queues[tx_queue_id] = vq;
+ return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev_info->driver_name = drivername;
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)-1;
+ dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+ dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+ unsigned i;
+ unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+ const struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+ rx_total += igb_stats->q_ipackets[i];
+ }
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+ igb_stats->q_errors[i] = internal->tx_vhost_queues[i]->err_pkts;
+ tx_total += igb_stats->q_opackets[i];
+ tx_err_total += igb_stats->q_errors[i];
+ }
+
+ igb_stats->ipackets = rx_total;
+ igb_stats->opackets = tx_total;
+ igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+ unsigned i;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ internal->rx_vhost_queues[i]->rx_pkts = 0;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ internal->tx_vhost_queues[i]->tx_pkts = 0;
+ internal->tx_vhost_queues[i]->err_pkts = 0;
+ }
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+ .dev_start = eth_dev_start,
+ .dev_stop = eth_dev_stop,
+ .dev_configure = eth_dev_configure,
+ .dev_infos_get = eth_dev_info,
+ .rx_queue_setup = eth_rx_queue_setup,
+ .tx_queue_setup = eth_tx_queue_setup,
+ .rx_queue_release = eth_queue_release,
+ .tx_queue_release = eth_queue_release,
+ .link_update = eth_link_update,
+ .stats_get = eth_stats_get,
+ .stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+ char *iface_name,
+ int16_t queues,
+ const unsigned numa_node)
+{
+ struct rte_eth_dev_data *data = NULL;
+ struct pmd_internal *internal = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ struct ether_addr *eth_addr = NULL;
+
+ RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+ numa_node);
+
+ /* now do all data allocation - for eth_dev structure, dummy pci driver
+ * and internal (private) data
+ */
+ data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+ if (data == NULL)
+ goto error;
+
+ internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+ if (internal == NULL)
+ goto error;
+
+ eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+ if (eth_addr == NULL)
+ goto error;
+ *eth_addr = base_eth_addr;
+ eth_addr->addr_bytes[5] = index;
+
+ /* reserve an ethdev entry */
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ goto error;
+
+ /* now put it all together
+ * - store queue data in internal,
+ * - store numa_node info in ethdev data
+ * - point eth_dev_data to internals
+ * - and point eth_dev structure to new eth_dev_data structure
+ */
+ internal->nb_rx_queues = queues;
+ internal->nb_tx_queues = queues;
+ internal->dev_name = strdup(name);
+ if (internal->dev_name == NULL)
+ goto error;
+ internal->iface_name = strdup(iface_name);
+ if (internal->iface_name == NULL)
+ goto error;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_INSERT_TAIL(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ data->dev_private = internal;
+ data->port_id = eth_dev->data->port_id;
+ memmove(data->name, eth_dev->data->name, sizeof(data->name));
+ data->nb_rx_queues = queues;
+ data->nb_tx_queues = queues;
+ data->dev_link = pmd_link;
+ data->mac_addrs = eth_addr;
+
+ /* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multiple processes.
+ */
+ eth_dev->data = data;
+ eth_dev->dev_ops = &ops;
+ eth_dev->driver = NULL;
+ eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+ eth_dev->data->kdrv = RTE_KDRV_NONE;
+ eth_dev->data->drv_name = internal->dev_name;
+ eth_dev->data->numa_node = numa_node;
+
+ /* finally assign rx and tx ops */
+ eth_dev->rx_pkt_burst = eth_vhost_rx;
+ eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+ return data->port_id;
+
+error:
+ rte_free(data);
+ rte_free(internal);
+ rte_free(eth_addr);
+
+ return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **iface_name = extra_args;
+
+ if (value == NULL)
+ return -1;
+
+ *iface_name = value;
+
+ return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint16_t *q = extra_args;
+
+ if ((value == NULL) || (extra_args == NULL))
+ return -EINVAL;
+
+ *q = (uint16_t)strtoul(value, NULL, 0);
+ if ((*q == USHRT_MAX) && (errno == ERANGE))
+ return -1;
+
+ if (*q > RTE_MAX_QUEUES_PER_PORT)
+ return -1;
+
+ return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ int ret = 0;
+ int index;
+ char *iface_name;
+ uint16_t queues;
+
+ RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+ kvlist = rte_kvargs_parse(params, valid_arguments);
+ if (kvlist == NULL)
+ return -1;
+
+	/* Don't leak kvlist on the error paths below. */
+	if (strlen(name) < strlen("eth_vhost")) {
+		ret = -1;
+		goto out_free;
+	}
+
+	errno = 0;
+	index = strtol(name + strlen("eth_vhost"), NULL, 0);
+	if (errno == ERANGE) {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		/* 'iface' is mandatory; without it iface_name would be
+		 * used uninitialized below. */
+		ret = -1;
+		goto out_free;
+	}
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+ &open_queues, &queues);
+ if (ret < 0)
+ goto out_free;
+
+ } else
+ queues = 1;
+
+	if (eth_dev_vhost_create(name, index,
+			iface_name, queues, rte_socket_id()) < 0)
+		ret = -1;
+
+out_free:
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+ struct rte_eth_dev *eth_dev = NULL;
+ struct pmd_internal *internal;
+ unsigned int i;
+
+ RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+ if (name == NULL)
+ return -EINVAL;
+
+ /* find an ethdev entry */
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev == NULL)
+ return -ENODEV;
+
+ internal = eth_dev->data->dev_private;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_REMOVE(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ eth_dev_stop(eth_dev);
+
+ if ((internal) && (internal->dev_name))
+ free(internal->dev_name);
+ if ((internal) && (internal->iface_name))
+ free(internal->iface_name);
+
+ rte_free(eth_dev->data->mac_addrs);
+ rte_free(eth_dev->data);
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] != NULL)
+ rte_free(internal->rx_vhost_queues[i]);
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] != NULL)
+ rte_free(internal->tx_vhost_queues[i]);
+ }
+ rte_free(internal);
+
+ rte_eth_dev_release_port(eth_dev);
+
+ return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+struct virtio_net *
+rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+ struct rte_eth_dev *eth_dev;
+
+ if (rte_eth_dev_is_valid_port(port_id) == 0)
+ return NULL;
+
+ eth_dev = &rte_eth_devices[port_id];
+ if (strncmp("eth_vhost", eth_dev->data->drv_name,
+ strlen("eth_vhost")) == 0) {
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+
+ internal = eth_dev->data->dev_private;
+ vq = internal->rx_vhost_queues[0];
+ if ((vq != NULL) && (vq->device != NULL))
+ return vq->device;
+ }
+
+ return NULL;
+}
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
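The RX/TX paths in this file guard against device teardown with a pair of per-queue flags: `allow_queuing` gates new bursts, `while_queuing` marks a burst in flight, and `destroy_device()` clears the former and spins until the latter drops. A minimal, single-threaded sketch of that handshake in plain C11 atomics follows; the names are hypothetical and independent of DPDK:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for struct vhost_queue's synchronization fields. */
struct queue_sync {
	atomic_int allow_queuing;	/* set/cleared by the control path */
	atomic_int while_queuing;	/* set while a burst is running */
};

/* Datapath side: mirrors the checks in eth_vhost_rx()/eth_vhost_tx(). */
static bool
burst_enter(struct queue_sync *q)
{
	if (atomic_load(&q->allow_queuing) == 0)
		return false;
	atomic_store(&q->while_queuing, 1);
	/* Re-check: teardown may have started between the two steps. */
	if (atomic_load(&q->allow_queuing) == 0) {
		atomic_store(&q->while_queuing, 0);
		return false;
	}
	return true;
}

static void
burst_exit(struct queue_sync *q)
{
	atomic_store(&q->while_queuing, 0);
}

/* Control side: mirrors the wait loop in destroy_device(). */
static void
quiesce(struct queue_sync *q)
{
	atomic_store(&q->allow_queuing, 0);
	while (atomic_load(&q->while_queuing))
		; /* the real driver calls rte_pause() here */
}
```

The double check in `burst_enter()` is the important part: setting `while_queuing` before re-reading `allow_queuing` closes the window in which `quiesce()` could observe no burst in flight while one was just starting.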
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..22a880f
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 IGEL Co., Ltd.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co., Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts the specified port_id to a virtio device structure.
+ * The returned device can be used with the vhost library APIs.
+ * To use the vhost library APIs and the vhost PMD in parallel, the below
+ * API should not be called, because it is already called by the vhost PMD.
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by the vhost PMD, the below API should not be called.
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, call the Port Hotplug APIs instead.
+ *
+ * @param port_id
+ * port number
+ * @return
+ * virtio net device structure corresponding to the specified port
+ * NULL will be returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+ global:
+
+ rte_eth_vhost_portid2vdev;
+
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 9e1909e..806e45c 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,7 +143,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null

-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)

endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
--
2.1.4
Tetsuya Mukawa
2015-11-02 03:58:55 UTC
Permalink
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost. The patch applies on top of the below patch series.
- [PATCH v7 00/28] remove pci driver from vdevs

* Known issue.
We may see issues while handling the RESET_OWNER message.
This handling is done in the vhost library, so it is not a part of the vhost PMD.
So far, we are waiting for a QEMU fix.

PATCH v2 changes:
- Remove a below patch that fixes vhost library.
The patch was applied as a separate patch.
- vhost: fix crash with multiqueue enabled
- Fix typos.
(Thanks to Thomas Monjalon)
- Rebase on the latest tree with Bernard's patches above.

PATCH v1 changes:
- Support vhost multiple queues.
- Rebase on "remove pci driver from vdevs".
- Optimize RX/TX functions.
- Fix resource leaks.
- Fix compile issue.
- Add patch to fix vhost library.

RFC PATCH v3 changes:
- Optimize performance.
In RX/TX functions, change code to access only per core data.
- Add the below API to allow users to use vhost library APIs for a port managed
by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
- rte_eth_vhost_portid2vdev()
To support this functionality, the vhost library is also changed.
If users don't use the vhost PMD, they can still fully use the vhost library APIs.
- Add code to support vhost multiple queues.
The multiple queues functionality is not actually enabled yet.

RFC PATCH v2 changes:
- Fix issues reported by checkpatch.pl
(Thanks to Stephen Hemminger)


Tetsuya Mukawa (2):
vhost: Add callback and private data for vhost PMD
vhost: Add VHOST PMD

config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/vhost.rst | 82 +++
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 765 ++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
lib/librte_vhost/rte_vhost_version.map | 6 +
lib/librte_vhost/rte_virtio_net.h | 3 +
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +-
lib/librte_vhost/virtio-net.c | 56 +-
lib/librte_vhost/virtio-net.h | 4 +-
mk/rte.app.mk | 8 +-
15 files changed, 1072 insertions(+), 13 deletions(-)
create mode 100644 doc/guides/nics/vhost.rst
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
--
2.1.4
Tetsuya Mukawa
2015-11-02 03:58:56 UTC
Permalink
These variables are needed to be able to manage a virtio device with both
the vhost library APIs and the vhost PMD.
For example, if the vhost PMD used the callback handler and private data
currently provided by the vhost library, a DPDK application that links the
vhost library could not use some of the vhost library APIs. To avoid this,
a separate callback and private data for the vhost PMD are needed.
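The idea of keeping two callback tables and fanning notifications out to both can be sketched in isolation. The following is a simplified, hypothetical mirror of `virtio_net_device_ops` and the `notify_new_device()`-style helpers this patch introduces; none of these exact names exist in the library:

```c
#include <stddef.h>

/* One callback table for the application, one for the vhost PMD,
 * instead of a single shared slot. */
struct ops_sketch {
	int (*new_device)(void *dev);
};

static const struct ops_sketch *app_ops;	/* app's table */
static const struct ops_sketch *pmd_ops;	/* vhost PMD's table */

static void register_app_ops(const struct ops_sketch *ops) { app_ops = ops; }
static void register_pmd_ops(const struct ops_sketch *ops) { pmd_ops = ops; }

/* Counters so the fan-out is observable in a test. */
static int app_seen, pmd_seen;
static int app_new_device(void *dev) { (void)dev; app_seen++; return 0; }
static int pmd_new_device(void *dev) { (void)dev; pmd_seen++; return 0; }

/* notify_new_device()-style helper: invoke both tables when present,
 * so the PMD and a plain vhost-library user can coexist. */
static int
notify_new_device_sketch(void *dev)
{
	int ret = 0;

	if (app_ops != NULL && app_ops->new_device != NULL)
		ret |= app_ops->new_device(dev);
	if (pmd_ops != NULL && pmd_ops->new_device != NULL)
		ret |= pmd_ops->new_device(dev);
	return ret;
}
```

With this shape, an application's `rte_vhost_driver_callback_register()` and the PMD's internal registration no longer overwrite each other.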

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
lib/librte_vhost/virtio-net.c | 56 +++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
rte_vhost_driver_unregister;

} DPDK_2.0;
+
+DPDK_2.2 {
+ global:
+
+ rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index b6386f9..033edde 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -121,6 +121,7 @@ struct virtio_net {
char ifname[IF_NAME_SZ]; /**< Name of the tap device or socket path. */
uint32_t virt_qp_nb; /**< number of queue pair we have allocated */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost PMD */
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2]; /**< Contains all virtqueue information. */
} __rte_cache_aligned;

@@ -217,6 +218,8 @@ int rte_vhost_driver_unregister(const char *dev_name);

/* Register callbacks. */
int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
/* Start vhost driver session blocking loop. */
int rte_vhost_driver_session_start(void);

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..d8ae2fc 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -111,7 +111,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

/* Remove from the data plane. */
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

if (dev->mem) {
free_mem_region(dev);
@@ -272,7 +272,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

if (virtio_is_ready(dev) &&
!(dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->new_device(dev);
+ notify_new_device(dev);
}

/*
@@ -288,7 +288,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
return -1;
/* We have to stop the queue (virtio) if it is running. */
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

/* Here we are safe to get the last used index */
ops->get_vring_base(ctx, state->index, state);
@@ -324,10 +324,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
"set queue enable: %d to qp idx: %d\n",
enable, state->index);

- if (notify_ops->vring_state_changed) {
- notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
- enable);
- }
+ notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM, enable);

dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
@@ -341,7 +338,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
struct virtio_net *dev = get_device(ctx);

if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

if (dev && dev->mem) {
free_mem_region(dev);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 3e82605..ee54beb 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -65,6 +65,8 @@ struct virtio_net_config_ll {

/* device ops to add/remove device to/from data core. */
struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
/* root address of the linked list of managed virtio devices */
static struct virtio_net_config_ll *ll_root;

@@ -80,6 +82,43 @@ static struct virtio_net_config_ll *ll_root;
static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;


+int
+notify_new_device(struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+ int ret = pmd_notify_ops->new_device(dev);
+ if (ret != 0)
+ return ret;
+ }
+ if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+ return notify_ops->new_device(dev);
+
+ return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+ pmd_notify_ops->destroy_device(dev);
+ if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+ notify_ops->destroy_device(dev);
+}
+
+int
+notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->vring_state_changed != NULL)) {
+ int ret = pmd_notify_ops->vring_state_changed(dev, queue_id, enable);
+ if (ret != 0)
+ return ret;
+ }
+ if ((notify_ops != NULL) && (notify_ops->vring_state_changed != NULL))
+ return notify_ops->vring_state_changed(dev, queue_id, enable);
+
+ return 0;
+}
+
/*
* Converts QEMU virtual address to Vhost virtual address. This function is
* used to convert the ring addresses to our address space.
@@ -377,7 +416,7 @@ destroy_device(struct vhost_device_ctx ctx)
* the function to remove it from the data core.
*/
if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(&(ll_dev_cur->dev));
+ notify_destroy_device(&(ll_dev_cur->dev));
ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
ll_dev_last);
} else {
@@ -793,12 +832,12 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
- return notify_ops->new_device(dev);
+ return notify_new_device(dev);
}
/* Otherwise we remove it. */
} else
if (file->fd == VIRTIO_DEV_STOPPED)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);
return 0;
}

@@ -882,3 +921,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op

return 0;
}
+
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+ pmd_notify_ops = ops;
+
+ return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index 75fb57e..0816e71 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -37,7 +37,9 @@
#include "vhost-net.h"
#include "rte_virtio_net.h"

-struct virtio_net_device_ops const *notify_ops;
struct virtio_net *get_device(struct vhost_device_ctx ctx);

+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
+int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable);
#endif
--
2.1.4
Tetsuya Mukawa
2015-11-09 05:16:59 UTC
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost.

* Known issue.
We may see issues while handling the RESET_OWNER message. This handling is
done in the vhost library, so it is not part of the vhost PMD. For now, we
are waiting for a fix on the QEMU side.

PATCH v3 changes:
- Rebase on latest master
- Specify correct queue_id in RX/TX function.

PATCH v2 changes:
- Remove a below patch that fixes vhost library.
The patch was applied as a separate patch.
- vhost: fix crash with multiqueue enabled
- Fix typos.
(Thanks to Thomas Monjalon)
- Rebase on latest tree with Bernard's patches above.

PATCH v1 changes:
- Support vhost multiple queues.
- Rebase on "remove pci driver from vdevs".
- Optimize RX/TX functions.
- Fix resource leaks.
- Fix compile issue.
- Add patch to fix vhost library.

RFC PATCH v3 changes:
- Optimize performance.
In RX/TX functions, change code to access only per core data.
- Add the below API to allow users to use vhost library APIs for a port
managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
- rte_eth_vhost_portid2vdev()
To support this functionality, the vhost library is also changed.
If users do not use the vhost PMD, they can still fully use the vhost
library APIs.
- Add code to support vhost multiple queues.
The multiple queues functionality is not actually enabled yet.

RFC PATCH v2 changes:
- Fix issues reported by checkpatch.pl
(Thanks to Stephen Hemminger)


Tetsuya Mukawa (2):
vhost: Add callback and private data for vhost PMD
vhost: Add VHOST PMD

config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 768 ++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
lib/librte_vhost/rte_vhost_version.map | 6 +
lib/librte_vhost/rte_virtio_net.h | 3 +
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +-
lib/librte_vhost/virtio-net.c | 56 +-
lib/librte_vhost/virtio-net.h | 4 +-
mk/rte.app.mk | 8 +-
14 files changed, 993 insertions(+), 13 deletions(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
--
2.1.4
Tetsuya Mukawa
2015-11-09 05:17:00 UTC
These variables are needed to manage a virtio device through both the
vhost library APIs and the vhost PMD.
For example, if the vhost PMD used the callback handler and private data
that the vhost library already provides, a DPDK application linked against
the vhost library could no longer use some of the vhost library APIs. To
avoid this, a separate callback and private data pointer are added for the
vhost PMD.

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
lib/librte_vhost/virtio-net.c | 56 +++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
rte_vhost_driver_unregister;

} DPDK_2.0;
+
+DPDK_2.2 {
+ global:
+
+ rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
char ifname[IF_NAME_SZ]; /**< Name of the tap device or socket path. */
uint32_t virt_qp_nb; /**< number of queue pair we have allocated */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost PMD */
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2]; /**< Contains all virtqueue information. */
} __rte_cache_aligned;

@@ -224,6 +225,8 @@ int rte_vhost_driver_unregister(const char *dev_name);

/* Register callbacks. */
int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
/* Start vhost driver session blocking loop. */
int rte_vhost_driver_session_start(void);

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..d8ae2fc 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -111,7 +111,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

/* Remove from the data plane. */
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

if (dev->mem) {
free_mem_region(dev);
@@ -272,7 +272,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

if (virtio_is_ready(dev) &&
!(dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->new_device(dev);
+ notify_new_device(dev);
}

/*
@@ -288,7 +288,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
return -1;
/* We have to stop the queue (virtio) if it is running. */
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

/* Here we are safe to get the last used index */
ops->get_vring_base(ctx, state->index, state);
@@ -324,10 +324,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
"set queue enable: %d to qp idx: %d\n",
enable, state->index);

- if (notify_ops->vring_state_changed) {
- notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
- enable);
- }
+ notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM, enable);

dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
@@ -341,7 +338,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
struct virtio_net *dev = get_device(ctx);

if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

if (dev && dev->mem) {
free_mem_region(dev);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 14278de..a5aef08 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -65,6 +65,8 @@ struct virtio_net_config_ll {

/* device ops to add/remove device to/from data core. */
struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
/* root address of the linked list of managed virtio devices */
static struct virtio_net_config_ll *ll_root;

@@ -81,6 +83,43 @@ static struct virtio_net_config_ll *ll_root;
static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;


+int
+notify_new_device(struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+ int ret = pmd_notify_ops->new_device(dev);
+ if (ret != 0)
+ return ret;
+ }
+ if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+ return notify_ops->new_device(dev);
+
+ return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+ pmd_notify_ops->destroy_device(dev);
+ if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+ notify_ops->destroy_device(dev);
+}
+
+int
+notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->vring_state_changed != NULL)) {
+ int ret = pmd_notify_ops->vring_state_changed(dev, queue_id, enable);
+ if (ret != 0)
+ return ret;
+ }
+ if ((notify_ops != NULL) && (notify_ops->vring_state_changed != NULL))
+ return notify_ops->vring_state_changed(dev, queue_id, enable);
+
+ return 0;
+}
+
/*
* Converts QEMU virtual address to Vhost virtual address. This function is
* used to convert the ring addresses to our address space.
@@ -378,7 +417,7 @@ destroy_device(struct vhost_device_ctx ctx)
* the function to remove it from the data core.
*/
if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(&(ll_dev_cur->dev));
+ notify_destroy_device(&(ll_dev_cur->dev));
ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
ll_dev_last);
} else {
@@ -794,12 +833,12 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
- return notify_ops->new_device(dev);
+ return notify_new_device(dev);
}
/* Otherwise we remove it. */
} else
if (file->fd == VIRTIO_DEV_STOPPED)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);
return 0;
}

@@ -883,3 +922,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op

return 0;
}
+
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+ pmd_notify_ops = ops;
+
+ return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index 75fb57e..0816e71 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -37,7 +37,9 @@
#include "vhost-net.h"
#include "rte_virtio_net.h"

-struct virtio_net_device_ops const *notify_ops;
struct virtio_net *get_device(struct vhost_device_ctx ctx);

+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
+int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable);
#endif
--
2.1.4
Aaron Conole
2015-11-09 18:16:39 UTC
Greetings,
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs. To avoid it, callback and private
data for vhost PMD are needed.
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
lib/librte_vhost/virtio-net.c | 56 +++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 70 insertions(+), 12 deletions(-)
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
rte_vhost_driver_unregister;
} DPDK_2.0;
+
+DPDK_2.2 {
+
+ rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
char ifname[IF_NAME_SZ]; /**< Name of the tap device or socket path. */
uint32_t virt_qp_nb; /**< number of queue pair we have allocated */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost PMD */
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2]; /**< Contains all virtqueue information. */
} __rte_cache_aligned;
Sorry if I'm missing something, but this is an ABI breaker, isn't it? I
think this needs the RTE_NEXT_ABI tag around it.
Tetsuya Mukawa
2015-11-10 03:13:54 UTC
Post by Aaron Conole
Greetings,
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs. To avoid it, callback and private
data for vhost PMD are needed.
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
lib/librte_vhost/virtio-net.c | 56 +++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 70 insertions(+), 12 deletions(-)
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
rte_vhost_driver_unregister;
} DPDK_2.0;
+
+DPDK_2.2 {
+
+ rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
char ifname[IF_NAME_SZ]; /**< Name of the tap device or socket path. */
uint32_t virt_qp_nb; /**< number of queue pair we have allocated */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost PMD */
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2]; /**< Contains all virtqueue information. */
} __rte_cache_aligned;
Sorry if I'm missing something, but this is an ABI breaker, isn't it? I
think this needs the RTE_NEXT_ABI tag around it.
Hi Aaron,

Thanks for reviewing. Yes, you are correct.
I guess I can implement vhost PMD without this variable, so I will
remove it.

Thanks,
Tetsuya
Panu Matilainen
2015-11-10 07:16:07 UTC
Post by Tetsuya Mukawa
Post by Aaron Conole
Greetings,
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs. To avoid it, callback and private
data for vhost PMD are needed.
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
lib/librte_vhost/virtio-net.c | 56 +++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 70 insertions(+), 12 deletions(-)
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
rte_vhost_driver_unregister;
} DPDK_2.0;
+
+DPDK_2.2 {
+
+ rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
char ifname[IF_NAME_SZ]; /**< Name of the tap device or socket path. */
uint32_t virt_qp_nb; /**< number of queue pair we have allocated */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost PMD */
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2]; /**< Contains all virtqueue information. */
} __rte_cache_aligned;
Sorry if I'm missing something, but this is an ABI breaker, isn't it? I
think this needs the RTE_NEXT_ABI tag around it.
Hi Aaron,
Thanks for reviewing. Yes, your are correct.
I guess I can implement vhost PMD without this variable, so I will
remove it.
No need to.

The librte_vhost ABI has already been broken during the DPDK 2.2 cycle
by the multiqueue changes, but that's okay since it was announced during
2.1 cycle (in commit 3c848bd7b1c6f4f681b833322a748fdefbb5fb2d).

What is missing right now is bumping the library version, and that must
happen before 2.2 is released.

- Panu -
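
For reference, the library version bump Panu refers to is the LIBABIVER
variable in the library Makefile. A sketch of the change, with illustrative
surrounding context only:

```makefile
# lib/librte_vhost/Makefile (illustrative fragment)
LIB = librte_vhost.a

EXPORT_MAP := rte_vhost_version.map

# Bumped from 1 to 2 for the DPDK 2.2 ABI break announced in the 2.1 cycle
LIBABIVER := 2
```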
Tetsuya Mukawa
2015-11-10 09:48:04 UTC
Post by Panu Matilainen
Post by Tetsuya Mukawa
Post by Aaron Conole
Greetings,
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs. To avoid it, callback and private
data for vhost PMD are needed.
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
lib/librte_vhost/virtio-net.c | 56
+++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 70 insertions(+), 12 deletions(-)
diff --git a/lib/librte_vhost/rte_vhost_version.map
b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
rte_vhost_driver_unregister;
} DPDK_2.0;
+
+DPDK_2.2 {
+
+ rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h
b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
char ifname[IF_NAME_SZ]; /**< Name of the tap
device or socket path. */
uint32_t virt_qp_nb; /**< number of queue pair we
have allocated */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost
PMD */
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS *
2]; /**< Contains all virtqueue information. */
} __rte_cache_aligned;
Sorry if I'm missing something, but this is an ABI breaker, isn't it? I
think this needs the RTE_NEXT_ABI tag around it.
Hi Aaron,
Thanks for reviewing. Yes, your are correct.
I guess I can implement vhost PMD without this variable, so I will
remove it.
No need to.
The librte_vhost ABI has already been broken during the DPDK 2.2 cycle
by the multiqueue changes, but that's okay since it was announced
during 2.1 cycle (in commit 3c848bd7b1c6f4f681b833322a748fdefbb5fb2d).
What is missing right now is bumping the library version, and that
must happen before 2.2 is released.
- Panu -
Hi Panu,

Thank you so much. Let me make sure what you mean.
I guess I need to add RTE_NEXT_ABI tags where pmd_priv is used, so that we
do not break the DPDK-2.1 ABI.
The tag will then be removed when DPDK-2.2 is released, and we can use the
vhost PMD.
Is this correct?

Thanks,
Tetsuya
Panu Matilainen
2015-11-10 10:05:17 UTC
Post by Tetsuya Mukawa
Post by Panu Matilainen
Post by Tetsuya Mukawa
Post by Aaron Conole
Greetings,
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs. To avoid it, callback and private
data for vhost PMD are needed.
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
lib/librte_vhost/virtio-net.c | 56
+++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 70 insertions(+), 12 deletions(-)
diff --git a/lib/librte_vhost/rte_vhost_version.map
b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
rte_vhost_driver_unregister;
} DPDK_2.0;
+
+DPDK_2.2 {
+
+ rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h
b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
char ifname[IF_NAME_SZ]; /**< Name of the tap
device or socket path. */
uint32_t virt_qp_nb; /**< number of queue pair we
have allocated */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost
PMD */
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS *
2]; /**< Contains all virtqueue information. */
} __rte_cache_aligned;
Sorry if I'm missing something, but this is an ABI breaker, isn't it? I
think this needs the RTE_NEXT_ABI tag around it.
Hi Aaron,
Thanks for reviewing. Yes, your are correct.
I guess I can implement vhost PMD without this variable, so I will
remove it.
No need to.
The librte_vhost ABI has already been broken during the DPDK 2.2 cycle
by the multiqueue changes, but that's okay since it was announced
during 2.1 cycle (in commit 3c848bd7b1c6f4f681b833322a748fdefbb5fb2d).
What is missing right now is bumping the library version, and that
must happen before 2.2 is released.
- Panu -
Hi Panu,
Thank you so much. Let me make sure what you mean.
I guess I need to add RTE_NEXT_ABI tags where pmd_priv is used. This is
because we don't break DPDK-2.1 ABI.
Anyway, the tag will be removed when DPDK-2.2 is released, then we can
use vhost PMD.
Is this correct?
Not quite. Because the ABI has already been broken between 2.1 and 2.2,
you can ride the same wave without messing with NEXT_ABI and such.

Like said, librte_vhost is pending a LIBABIVER bump to 2, but that is
regardless of this patch.

- Panu -
Tetsuya Mukawa
2015-11-10 10:15:32 UTC
Post by Panu Matilainen
Post by Tetsuya Mukawa
Post by Panu Matilainen
Post by Tetsuya Mukawa
Post by Aaron Conole
Greetings,
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs. To avoid it, callback and private
data for vhost PMD are needed.
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
lib/librte_vhost/virtio-net.c | 56
+++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 70 insertions(+), 12 deletions(-)
diff --git a/lib/librte_vhost/rte_vhost_version.map
b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
rte_vhost_driver_unregister;
} DPDK_2.0;
+
+DPDK_2.2 {
+
+ rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h
b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
char ifname[IF_NAME_SZ]; /**< Name of the tap
device or socket path. */
uint32_t virt_qp_nb; /**< number of queue pair we
have allocated */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost
PMD */
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS *
2]; /**< Contains all virtqueue information. */
} __rte_cache_aligned;
Sorry if I'm missing something, but this is an ABI breaker, isn't it? I
think this needs the RTE_NEXT_ABI tag around it.
Hi Aaron,
Thanks for reviewing. Yes, your are correct.
I guess I can implement vhost PMD without this variable, so I will
remove it.
No need to.
The librte_vhost ABI has already been broken during the DPDK 2.2 cycle
by the multiqueue changes, but that's okay since it was announced
during 2.1 cycle (in commit 3c848bd7b1c6f4f681b833322a748fdefbb5fb2d).
What is missing right now is bumping the library version, and that
must happen before 2.2 is released.
- Panu -
Hi Panu,
Thank you so much. Let me make sure what you mean.
I guess I need to add RTE_NEXT_ABI tags where pmd_priv is used. This is
because we don't break DPDK-2.1 ABI.
Anyway, the tag will be removed when DPDK-2.2 is released, then we can
use vhost PMD.
Is this correct?
Not quite. Because the ABI has already been broken between 2.1 and
2.2, you can ride the same wave without messing with NEXT_ABI and such.
Like said, librte_vhost is pending a LIBABIVER bump to 2, but that is
regardless of this patch.
- Panu -
Thanks. Now I clearly understand.

Tetsuya
Tetsuya Mukawa
2015-11-09 05:17:01 UTC
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost, which means librte_vhost is also needed to compile the PMD.
Vhost messages are handled only while a port is started, so start the port
first, then invoke QEMU.

The PMD has 2 parameters.
- iface: The parameter is used to specify a path to connect to a
virtio-net device.
- queues: The parameter is used to specify the number of queues the
virtio-net device has.
(Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect above testpmd, here is qemu command example.

$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
-device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 768 ++++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
mk/rte.app.mk | 8 +-
9 files changed, 923 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 7248262..a264c11 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -458,6 +458,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

#
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
#Compile Xen domain0 support
#
CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2d4936d..57d1041 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -47,6 +47,7 @@ Network Interface Controller Drivers
mlx4
mlx5
virtio
+ vhost
vmxnet3
pcap_ring

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 59dda59..4b5644d 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -90,6 +90,8 @@ New Features

* **Added vhost-user multiple queue support.**

+* **Added vhost PMD.**
+
* **Added port hotplug support to vmxnet3.**

* **Added port hotplug support to xenvirt.**
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6da1ce2..66eb63d 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -50,5 +50,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
include $(RTE_SDK)/mk/rte.sharelib.mk
include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8186a80
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+# BSD LICENSE
+#
+# Copyright (c) 2010-2015 Intel Corporation.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..ff983b5
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,768 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015 IGEL Co., Ltd.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co.,Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG "iface"
+#define ETH_VHOST_QUEUES_ARG "queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+ ETH_VHOST_IFACE_ARG,
+ ETH_VHOST_QUEUES_ARG,
+ NULL
+};
+
+static struct ether_addr base_eth_addr = {
+ .addr_bytes = {
+ 0x56 /* V */,
+ 0x48 /* H */,
+ 0x4F /* O */,
+ 0x53 /* S */,
+ 0x54 /* T */,
+ 0x00
+ }
+};
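
As an aside, the base address above spells "VHOST" in ASCII across its first
five bytes, leaving the last byte free for the device index taken from the
"eth_vhostN" name. A minimal standalone sketch of that scheme (the function
name is invented for illustration and is not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Standalone sketch (invented name): build the PMD's MAC address as a
 * 48-bit integer. The top five bytes spell "VHOST" in ASCII; the low
 * byte carries the device index parsed from "eth_vhostN". */
static uint64_t vhost_mac48(uint8_t index)
{
	const uint64_t base = 0x56484F5354ULL;	/* 'V' 'H' 'O' 'S' 'T' */

	return (base << 8) | index;
}
```

One consequence of this scheme is that only the low byte varies, so at most
256 vhost ports get distinct addresses.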
+
+struct vhost_queue {
+ rte_atomic32_t allow_queuing;
+ rte_atomic32_t while_queuing;
+ struct virtio_net *device;
+ struct pmd_internal *internal;
+ struct rte_mempool *mb_pool;
+ uint16_t virtqueue_id;
+ uint64_t rx_pkts;
+ uint64_t tx_pkts;
+ uint64_t err_pkts;
+};
+
+struct pmd_internal {
+ TAILQ_ENTRY(pmd_internal) next;
+ char *dev_name;
+ char *iface_name;
+ unsigned nb_rx_queues;
+ unsigned nb_tx_queues;
+
+ struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+ struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+ volatile uint16_t once;
+ pthread_t session_th;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+ TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct rte_eth_link pmd_link = {
+ .link_speed = 10000,
+ .link_duplex = ETH_LINK_FULL_DUPLEX,
+ .link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+ r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+ r->rx_pkts += nb_rx;
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ r->virtqueue_id, bufs, nb_bufs);
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+ int found = 0;
+ struct pmd_internal *internal;
+
+ if (ifname == NULL)
+ return NULL;
+
+ pthread_mutex_lock(&internal_list_lock);
+
+ TAILQ_FOREACH(internal, &internals_list, next) {
+ if (!strcmp(internal->iface_name, ifname)) {
+ found = 1;
+ break;
+ }
+ }
+
+ pthread_mutex_unlock(&internal_list_lock);
+
+ if (!found)
+ return NULL;
+
+ return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid argument\n");
+ return -1;
+ }
+
+ internal = find_internal_resource(dev->ifname);
+ if (internal == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid device name\n");
+ return -1;
+ }
+
+ if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
+ (dev->virt_qp_nb < internal->nb_tx_queues)) {
+ RTE_LOG(INFO, PMD, "Not enough queues\n");
+ return -1;
+ }
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
+ return -1;
+ }
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = dev;
+ vq->internal = internal;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = dev;
+ vq->internal = internal;
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->pmd_priv = eth_dev;
+ eth_dev->data->dev_link.link_status = 1;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid argument\n");
+ return;
+ }
+
+ eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
+ return;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ /* Wait until rx/tx_pkt_burst stops accessing vhost device */
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+
+ eth_dev->data->dev_link.link_status = 0;
+
+ dev->pmd_priv = NULL;
+ dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = NULL;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = NULL;
+ }
+
+ RTE_LOG(INFO, PMD, "Connection closed\n");
+}
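
The two flags used above form a lock-free handshake: the datapath
(eth_vhost_rx/tx) raises while_queuing, re-checks allow_queuing, and only then
touches the vhost device, while destroy_device clears allow_queuing and spins
until while_queuing drops. Below is a standalone model of that pattern using
C11 atomics and invented names; the counter stands in for a real dequeue, and
the real PMD uses rte_atomic32 and rte_pause() instead:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int allow_queuing = 1;	/* control path's permission flag */
static atomic_int while_queuing = 0;	/* datapath's "I am inside" flag */
static atomic_long pkts = 0;		/* stands in for real dequeued mbufs */

/* Datapath side, mirroring eth_vhost_rx(): mark busy, re-check permission,
 * then do the work strictly inside the while_queuing window. */
static int poll_once(void)
{
	int n = 0;

	if (!atomic_load(&allow_queuing))
		return 0;
	atomic_store(&while_queuing, 1);
	if (atomic_load(&allow_queuing)) {
		n = 1;				/* "dequeue" one packet */
		atomic_fetch_add(&pkts, 1);
	}
	atomic_store(&while_queuing, 0);
	return n;
}

static void *poller(void *arg)
{
	(void)arg;
	while (atomic_load(&allow_queuing))
		poll_once();
	/* After revocation every further attempt must be rejected. */
	for (int i = 0; i < 100; i++)
		assert(poll_once() == 0);
	return NULL;
}

/* Control side, mirroring destroy_device(): revoke permission, then spin
 * until the datapath has left its critical window. */
static void stop_queuing(void)
{
	atomic_store(&allow_queuing, 0);
	while (atomic_load(&while_queuing))
		;				/* rte_pause() equivalent */
}

/* Returns 0 iff no packet was counted after stop_queuing() returned. */
static int run_demo(void)
{
	pthread_t th;

	if (pthread_create(&th, NULL, poller, NULL) != 0)
		return -1;
	while (atomic_load(&pkts) < 1000)
		;				/* let some traffic flow */
	stop_queuing();
	long snapshot = atomic_load(&pkts);
	pthread_join(th, NULL);
	return atomic_load(&pkts) == snapshot ? 0 : -1;
}
```

The key detail the model preserves is that all device access happens between
the two while_queuing stores, so once the spin loop in stop_queuing() exits,
the device pointer can be torn down safely.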
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+ static struct virtio_net_device_ops *vhost_ops;
+
+ vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+ if (vhost_ops == NULL)
+ rte_panic("Can't allocate memory\n");
+
+ /* set vhost arguments */
+ vhost_ops->new_device = new_device;
+ vhost_ops->destroy_device = destroy_device;
+ if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+ rte_panic("Can't register callbacks\n");
+
+ /* start event handling */
+ rte_vhost_driver_session_start();
+
+ rte_free(vhost_ops);
+ pthread_exit(0);
+}
+
+static void vhost_driver_session_start(struct pmd_internal *internal)
+{
+ int ret;
+
+ ret = pthread_create(&internal->session_th,
+ NULL, vhost_driver_session, NULL);
+ if (ret)
+ rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(struct pmd_internal *internal)
+{
+ int ret;
+
+ ret = pthread_cancel(internal->session_th);
+ if (ret)
+ rte_panic("Can't cancel the thread\n");
+
+ ret = pthread_join(internal->session_th, NULL);
+ if (ret)
+ rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+ int ret;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+ ret = rte_vhost_driver_register(internal->iface_name);
+ if (ret)
+ return ret;
+
+ vhost_driver_session_start(internal);
+ }
+ return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
+ rte_vhost_driver_unregister(internal->iface_name);
+ vhost_driver_session_stop(internal);
+ }
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ if (internal->rx_vhost_queues[rx_queue_id] != NULL)
+ rte_free(internal->rx_vhost_queues[rx_queue_id]);
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+ return -ENOMEM;
+ }
+
+ vq->mb_pool = mb_pool;
+ vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+ internal->rx_vhost_queues[rx_queue_id] = vq;
+ dev->data->rx_queues[rx_queue_id] = vq;
+ return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ if (internal->tx_vhost_queues[tx_queue_id] != NULL)
+ rte_free(internal->tx_vhost_queues[tx_queue_id]);
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+ return -ENOMEM;
+ }
+
+ vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+ internal->tx_vhost_queues[tx_queue_id] = vq;
+ dev->data->tx_queues[tx_queue_id] = vq;
+ return 0;
+}
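
The virtqueue_id arithmetic in the two setup functions follows the virtio
split-ring layout, where each queue pair occupies two consecutive vrings: the
PMD's RX queue drains the guest's TX vring (odd index) and the PMD's TX queue
fills the guest's RX vring (even index). A standalone sketch with the constant
values used by rte_virtio_net.h (helper names invented here):

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in rte_virtio_net.h: one queue pair = two vrings. */
enum { VIRTIO_RXQ = 0, VIRTIO_TXQ = 1, VIRTIO_QNUM = 2 };

/* PMD RX queue i reads from the guest's TX vring. */
static uint16_t rxq_to_vring(uint16_t rx_queue_id)
{
	return rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
}

/* PMD TX queue i writes to the guest's RX vring. */
static uint16_t txq_to_vring(uint16_t tx_queue_id)
{
	return tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
}
```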
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev_info->driver_name = drivername;
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)-1;
+ dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+ dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+ unsigned i;
+ unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+ const struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+ rx_total += igb_stats->q_ipackets[i];
+ }
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+ igb_stats->q_errors[i] = internal->tx_vhost_queues[i]->err_pkts;
+ tx_total += igb_stats->q_opackets[i];
+ tx_err_total += igb_stats->q_errors[i];
+ }
+
+ igb_stats->ipackets = rx_total;
+ igb_stats->opackets = tx_total;
+ igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+ unsigned i;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ internal->rx_vhost_queues[i]->rx_pkts = 0;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ internal->tx_vhost_queues[i]->tx_pkts = 0;
+ internal->tx_vhost_queues[i]->err_pkts = 0;
+ }
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+ .dev_start = eth_dev_start,
+ .dev_stop = eth_dev_stop,
+ .dev_configure = eth_dev_configure,
+ .dev_infos_get = eth_dev_info,
+ .rx_queue_setup = eth_rx_queue_setup,
+ .tx_queue_setup = eth_tx_queue_setup,
+ .rx_queue_release = eth_queue_release,
+ .tx_queue_release = eth_queue_release,
+ .link_update = eth_link_update,
+ .stats_get = eth_stats_get,
+ .stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+ char *iface_name,
+ int16_t queues,
+ const unsigned numa_node)
+{
+ struct rte_eth_dev_data *data = NULL;
+ struct pmd_internal *internal = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ struct ether_addr *eth_addr = NULL;
+
+ RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+ numa_node);
+
+ /* now do all data allocation - for eth_dev structure, dummy pci driver
+ * and internal (private) data
+ */
+ data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+ if (data == NULL)
+ goto error;
+
+ internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+ if (internal == NULL)
+ goto error;
+
+ eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+ if (eth_addr == NULL)
+ goto error;
+ *eth_addr = base_eth_addr;
+ eth_addr->addr_bytes[5] = index;
+
+ /* reserve an ethdev entry */
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ goto error;
+
+ /* now put it all together
+ * - store queue data in internal,
+ * - store numa_node info in ethdev data
+ * - point eth_dev_data to internals
+ * - and point eth_dev structure to new eth_dev_data structure
+ */
+ internal->nb_rx_queues = queues;
+ internal->nb_tx_queues = queues;
+ internal->dev_name = strdup(name);
+ if (internal->dev_name == NULL)
+ goto error;
+ internal->iface_name = strdup(iface_name);
+ if (internal->iface_name == NULL)
+ goto error;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_INSERT_TAIL(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ data->dev_private = internal;
+ data->port_id = eth_dev->data->port_id;
+ memmove(data->name, eth_dev->data->name, sizeof(data->name));
+ data->nb_rx_queues = queues;
+ data->nb_tx_queues = queues;
+ data->dev_link = pmd_link;
+ data->mac_addrs = eth_addr;
+
+ /* We'll replace the 'data' originally allocated by eth_dev. So the
+ * vhost PMD resources won't be shared between multi processes.
+ */
+ eth_dev->data = data;
+ eth_dev->dev_ops = &ops;
+ eth_dev->driver = NULL;
+ eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+ eth_dev->data->kdrv = RTE_KDRV_NONE;
+ eth_dev->data->drv_name = internal->dev_name;
+ eth_dev->data->numa_node = numa_node;
+
+ /* finally assign rx and tx ops */
+ eth_dev->rx_pkt_burst = eth_vhost_rx;
+ eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+ return data->port_id;
+
+error:
+ rte_free(data);
+ rte_free(internal);
+ rte_free(eth_addr);
+
+ return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **iface_name = extra_args;
+
+ if (value == NULL)
+ return -1;
+
+ *iface_name = value;
+
+ return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint16_t *q = extra_args;
+
+ if ((value == NULL) || (extra_args == NULL))
+ return -EINVAL;
+
+ *q = (uint16_t)strtoul(value, NULL, 0);
+ if ((*q == USHRT_MAX) && (errno == ERANGE))
+ return -1;
+
+ if (*q > RTE_MAX_QUEUES_PER_PORT)
+ return -1;
+
+ return 0;
+}
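
A standalone model of the parsing done by open_queues() is sketched below,
with MAX_QUEUES standing in for RTE_MAX_QUEUES_PER_PORT. Note that it clears
errno before strtoul(); strtoul() only ever sets errno, so the ERANGE check in
the patch is reliable only if errno happens to be clean beforehand:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_QUEUES 1024	/* invented stand-in for RTE_MAX_QUEUES_PER_PORT */

/* Parse the "queues" devarg value; reject garbage and out-of-range input. */
static int parse_queues(const char *value, unsigned short *q)
{
	char *end;
	unsigned long v;

	if (value == NULL || q == NULL)
		return -EINVAL;
	errno = 0;
	v = strtoul(value, &end, 0);
	if (errno == ERANGE || end == value || v > MAX_QUEUES)
		return -1;
	*q = (unsigned short)v;
	return 0;
}
```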
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ int ret = 0;
+ int index;
+ char *iface_name;
+ uint16_t queues;
+
+ RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+ kvlist = rte_kvargs_parse(params, valid_arguments);
+ if (kvlist == NULL)
+ return -1;
+
+ if (strlen(name) < strlen("eth_vhost"))
+ return -1;
+
+ index = strtol(name + strlen("eth_vhost"), NULL, 0);
+ if (errno == ERANGE)
+ return -1;
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+ &open_iface, &iface_name);
+ if (ret < 0)
+ goto out_free;
+ }
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+ &open_queues, &queues);
+ if (ret < 0)
+ goto out_free;
+
+ } else
+ queues = 1;
+
+ eth_dev_vhost_create(name, index,
+ iface_name, queues, rte_socket_id());
+
+out_free:
+ rte_kvargs_free(kvlist);
+ return ret;
+}
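
The device index used for the MAC address comes from the vdev name itself
("eth_vhost0" gives index 0). A standalone sketch of that name parsing, with
the helper name invented for illustration:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Extract N from "eth_vhostN"; return -1 if the prefix does not match. */
static int vhost_dev_index(const char *name)
{
	const char *prefix = "eth_vhost";
	size_t plen = strlen(prefix);

	if (name == NULL || strncmp(name, prefix, plen) != 0)
		return -1;
	return (int)strtol(name + plen, NULL, 10);
}
```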
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+ struct rte_eth_dev *eth_dev = NULL;
+ struct pmd_internal *internal;
+ unsigned int i;
+
+ RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+ if (name == NULL)
+ return -EINVAL;
+
+ /* find an ethdev entry */
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev == NULL)
+ return -ENODEV;
+
+ internal = eth_dev->data->dev_private;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_REMOVE(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ eth_dev_stop(eth_dev);
+
+ if ((internal) && (internal->dev_name))
+ free(internal->dev_name);
+ if ((internal) && (internal->iface_name))
+ free(internal->iface_name);
+
+ rte_free(eth_dev->data->mac_addrs);
+ rte_free(eth_dev->data);
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] != NULL)
+ rte_free(internal->rx_vhost_queues[i]);
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] != NULL)
+ rte_free(internal->tx_vhost_queues[i]);
+ }
+ rte_free(internal);
+
+ rte_eth_dev_release_port(eth_dev);
+
+ return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+struct
+virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+ struct rte_eth_dev *eth_dev;
+
+ if (rte_eth_dev_is_valid_port(port_id) == 0)
+ return NULL;
+
+ eth_dev = &rte_eth_devices[port_id];
+ if (strncmp("eth_vhost", eth_dev->data->drv_name,
+ strlen("eth_vhost")) == 0) {
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+
+ internal = eth_dev->data->dev_private;
+ vq = internal->rx_vhost_queues[0];
+ if ((vq != NULL) && (vq->device != NULL))
+ return vq->device;
+ }
+
+ return NULL;
+}
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..22a880f
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 IGEL Co., Ltd.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co., Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts the specified port_id to a virtio device structure.
+ * The returned device can be used with the vhost library APIs.
+ * To use the vhost library APIs and the vhost PMD in parallel, the API
+ * below should not be called, because it is called by the vhost PMD:
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by the vhost PMD, the API below should not be
+ * called either:
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, use the Port Hotplug APIs instead.
+ *
+ * @param port_id
+ * port number
+ * @return
+ * virtio net device structure corresponding to the specified port;
+ * NULL is returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+ global:
+
+ rte_eth_vhost_portid2vdev;
+
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 724efa7..1af4bb3 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -148,7 +148,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null

-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost
+
+endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)

endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
--
2.1.4
Yuanhan Liu
2015-11-09 06:21:42 UTC
Hi Tetsuya,

Here are just a few minor nits from a very rough first pass.

On Mon, Nov 09, 2015 at 02:17:01PM +0900, Tetsuya Mukawa wrote:
...
Post by Tetsuya Mukawa
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+ r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
Unnecessary cast, as rte_vhost_dequeue_burst is defined with a uint16_t
return type.
Post by Tetsuya Mukawa
+
+ r->rx_pkts += nb_rx;
+
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ r->virtqueue_id, bufs, nb_bufs);
Ditto.
Post by Tetsuya Mukawa
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
+
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
Personally, I would not sacrifice readability just to save a few lines
of code.
Post by Tetsuya Mukawa
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ if (internal->rx_vhost_queues[rx_queue_id] != NULL)
+ rte_free(internal->rx_vhost_queues[rx_queue_id]);
Such a NULL check is unnecessary; rte_free() handles NULL itself.
Post by Tetsuya Mukawa
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+ return -ENOMEM;
+ }
+
+ vq->mb_pool = mb_pool;
+ vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+ internal->rx_vhost_queues[rx_queue_id] = vq;
+ dev->data->rx_queues[rx_queue_id] = vq;
+ return 0;
+}
+
+static int
Tetsuya Mukawa
2015-11-09 06:27:21 UTC
Hi Liu,

Thank you so much for your reviewing.
I will fix them and resubmit this week.

Thanks,
Tetsuya
Post by Bruce Richardson
Hi Tetsuya,
Here I just got some minor nits after a very rough glimpse.
...
Post by Tetsuya Mukawa
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+ r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
Unnecessary cast, as rte_vhost_dequeue_burst is defined with a uint16_t
return type.
Post by Tetsuya Mukawa
+
+ r->rx_pkts += nb_rx;
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ r->virtqueue_id, bufs, nb_bufs);
Ditto.
Post by Tetsuya Mukawa
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
I personally would prefer not to save a few lines of code at the cost of
readability.
Post by Tetsuya Mukawa
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ if (internal->rx_vhost_queues[rx_queue_id] != NULL)
+ rte_free(internal->rx_vhost_queues[rx_queue_id]);
Such a NULL check is unnecessary; rte_free() handles NULL.
Post by Tetsuya Mukawa
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+ return -ENOMEM;
+ }
+
+ vq->mb_pool = mb_pool;
+ vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+ internal->rx_vhost_queues[rx_queue_id] = vq;
+ dev->data->rx_queues[rx_queue_id] = vq;
+ return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ if (internal->tx_vhost_queues[tx_queue_id] != NULL)
+ rte_free(internal->tx_vhost_queues[tx_queue_id]);
Ditto.
Post by Tetsuya Mukawa
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+ return -ENOMEM;
+ }
+
+ vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+ internal->tx_vhost_queues[tx_queue_id] = vq;
+ dev->data->tx_queues[tx_queue_id] = vq;
+ return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev_info->driver_name = drivername;
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)-1;
+ dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+ dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+ unsigned i;
+ unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+ const struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+ rx_total += igb_stats->q_ipackets[i];
+ }
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+ igb_stats->q_errors[i] = internal->tx_vhost_queues[i]->err_pkts;
+ tx_total += igb_stats->q_opackets[i];
+ tx_err_total += igb_stats->q_errors[i];
+ }
+
+ igb_stats->ipackets = rx_total;
+ igb_stats->opackets = tx_total;
+ igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+ unsigned i;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ internal->rx_vhost_queues[i]->rx_pkts = 0;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ internal->tx_vhost_queues[i]->tx_pkts = 0;
+ internal->tx_vhost_queues[i]->err_pkts = 0;
+ }
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused) { return 0; }
Ditto.
Post by Tetsuya Mukawa
+
+static const struct eth_dev_ops ops = {
+ .dev_start = eth_dev_start,
+ .dev_stop = eth_dev_stop,
+ .dev_configure = eth_dev_configure,
+ .dev_infos_get = eth_dev_info,
+ .rx_queue_setup = eth_rx_queue_setup,
+ .tx_queue_setup = eth_tx_queue_setup,
+ .rx_queue_release = eth_queue_release,
+ .tx_queue_release = eth_queue_release,
+ .link_update = eth_link_update,
+ .stats_get = eth_stats_get,
+ .stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+ char *iface_name,
+ int16_t queues,
+ const unsigned numa_node)
+{
+ struct rte_eth_dev_data *data = NULL;
+ struct pmd_internal *internal = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ struct ether_addr *eth_addr = NULL;
+
+ RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+ numa_node);
+
+ /* now do all data allocation - for eth_dev structure, dummy pci driver
+ * and internal (private) data
+ */
+ data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+ if (data == NULL)
+ goto error;
+
+ internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+ if (internal == NULL)
+ goto error;
+
+ eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+ if (eth_addr == NULL)
+ goto error;
+ *eth_addr = base_eth_addr;
+ eth_addr->addr_bytes[5] = index;
+
+ /* reserve an ethdev entry */
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ goto error;
+
+ /* now put it all together
+ * - store queue data in internal,
+ * - store numa_node info in ethdev data
+ * - point eth_dev_data to internals
+ * - and point eth_dev structure to new eth_dev_data structure
+ */
+ internal->nb_rx_queues = queues;
+ internal->nb_tx_queues = queues;
+ internal->dev_name = strdup(name);
+ if (internal->dev_name == NULL)
+ goto error;
+ internal->iface_name = strdup(iface_name);
+ if (internal->iface_name == NULL)
+ goto error;
If allocation failed here, you will find that internal->dev_name is not
freed.
Post by Tetsuya Mukawa
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_INSERT_TAIL(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ data->dev_private = internal;
+ data->port_id = eth_dev->data->port_id;
+ memmove(data->name, eth_dev->data->name, sizeof(data->name));
+ data->nb_rx_queues = queues;
+ data->nb_tx_queues = queues;
+ data->dev_link = pmd_link;
+ data->mac_addrs = eth_addr;
+
+ /* We'll replace the 'data' originally allocated by eth_dev. So the
+ * vhost PMD resources won't be shared between multi processes.
+ */
+ eth_dev->data = data;
+ eth_dev->dev_ops = &ops;
+ eth_dev->driver = NULL;
+ eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+ eth_dev->data->kdrv = RTE_KDRV_NONE;
+ eth_dev->data->drv_name = internal->dev_name;
+ eth_dev->data->numa_node = numa_node;
+
+ /* finally assign rx and tx ops */
+ eth_dev->rx_pkt_burst = eth_vhost_rx;
+ eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+ return data->port_id;
+
+error:
+ rte_free(data);
+ rte_free(internal);
+ rte_free(eth_addr);
+
+ return -1;
+}
...
...
Post by Tetsuya Mukawa
+
+ if ((internal) && (internal->dev_name))
+ free(internal->dev_name);
+ if ((internal) && (internal->iface_name))
+ free(internal->iface_name);
+
+ rte_free(eth_dev->data->mac_addrs);
+ rte_free(eth_dev->data);
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] != NULL)
+ rte_free(internal->rx_vhost_queues[i]);
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] != NULL)
+ rte_free(internal->tx_vhost_queues[i]);
Ditto.
(Hopefully I could have a detailed review later, say next week).
--yliu
Stephen Hemminger
2015-11-09 22:22:05 UTC
On Mon, 9 Nov 2015 14:17:01 +0900
Post by Tetsuya Mukawa
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
Your special two-variable custom locking here is buggy.
If you hit the second atomic test, you will leave while_queuing set.
Tetsuya Mukawa
2015-11-10 03:14:19 UTC
Post by Stephen Hemminger
On Mon, 9 Nov 2015 14:17:01 +0900
Post by Tetsuya Mukawa
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
Your special two-variable custom locking here is buggy.
If you hit the second atomic test, you will leave while_queuing set.
Hi Stephen,

Thanks for reviewing.
I clear while_queuing at the out: label, as shown below.

+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}

Thanks,
tetsuya
Tetsuya Mukawa
2015-11-13 05:37:58 UTC
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.
* Known issue.
We may see issues while handling the RESET_OWNER message.
This handling is done in the vhost library, so it is not part of the vhost PMD.
So far, we are waiting for a QEMU fix.
Fix patches have already been applied. Please help test :)
--yliu
Thanks!
I have checked it, and it worked!

Tetsuya
Tetsuya Mukawa
2015-11-13 03:09:49 UTC
Post by Bruce Richardson
Hi Tetsuya,
In my test I created 2 vdevs using "--vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' --vdev 'eth_vhost1,iface=/tmp/sock1,queues=1'", and the QEMU messages got handled in the wrong order.
The reason is that two threads are created to handle the messages from the two sockets, but their fds are SHARED, so each thread is reading from both sockets.
This can lead to incorrect behavior; in my case the VHOST_USER_SET_MEM_TABLE was sometimes handled after VRING initialization, which led to destroy_device().
Detailed log shown below: threads 69351 & 69352 are both reading fd 25. Thanks Yuanhan for helping with debugging!
Hi Zhihong and Yuanhan,

Thank you so much for debugging the issue.
I will fix the vhost PMD so that it does not create multiple message-handling threads.

I am going to submit the PMD today.
Could you please check it again using the latest one?

Tetsuya
Post by Bruce Richardson
Thanks
Zhihong
-----------------------------------------------------------------------------------------------------------------
----> debug: setting up new vq conn for fd: 23, tid: 69352
VHOST_CONFIG: new virtio connection is 25
VHOST_CONFIG: new device, handle is 0
----> debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
----> debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
----> debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:26
----> debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:27
----> debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:28
----> debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:26
----> debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
----> debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
----> debug: device_fh: 0: user_set_mem_table
VHOST_CONFIG: mapped region 0 fd:27 to 0x7ff6c0000000 sz:0xa0000 off:0x0
VHOST_CONFIG: mapped region 1 fd:29 to 0x7ff680000000 sz:0x40000000 off:0xc0000
----> debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
----> debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
----> debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
----> debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:0 file:30
----> debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: virtio is not ready for processing.
----> debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
----> debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
----> debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:1 file:31
VHOST_CONFIG: virtio is now ready for processing.
PMD: New connection established
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
-----------------------------------------------------------------------------------------------------------------
Post by Tetsuya Mukawa
...
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+ static struct virtio_net_device_ops *vhost_ops;
+
+ vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+ if (vhost_ops == NULL)
+ rte_panic("Can't allocate memory\n");
+
+ /* set vhost arguments */
+ vhost_ops->new_device = new_device;
+ vhost_ops->destroy_device = destroy_device;
+ if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+ rte_panic("Can't register callbacks\n");
+
+ /* start event handling */
+ rte_vhost_driver_session_start();
+
+ rte_free(vhost_ops);
+ pthread_exit(0);
+}
+
+static void vhost_driver_session_start(struct pmd_internal *internal)
+{
+ int ret;
+
+ ret = pthread_create(&internal->session_th,
+ NULL, vhost_driver_session, NULL);
+ if (ret)
+ rte_panic("Can't create a thread\n");
+}
+
...
Wang, Zhihong
2015-11-13 03:50:44 UTC
-----Original Message-----
Sent: Friday, November 13, 2015 11:10 AM
Subject: Re: [dpdk-dev] [PATCH v3 2/2] vhost: Add VHOST PMD
Post by Bruce Richardson
Hi Tetsuya,
In my test I created 2 vdev using "--vdev
'eth_vhost0,iface=/tmp/sock0,queues=1' --vdev
'eth_vhost1,iface=/tmp/sock1,queues=1'", and the qemu message got handled
in wrong order.
Post by Bruce Richardson
The reason is that two threads are created to handle the messages from the two sockets, but
their fds are SHARED, so each thread is reading from both sockets.
Post by Bruce Richardson
This can lead to incorrect behavior; in my case the
VHOST_USER_SET_MEM_TABLE was sometimes handled after VRING initialization, which
led to destroy_device().
Post by Bruce Richardson
Detailed log as shown below: thread 69351 & 69352 are both reading fd 25.
Thanks Yuanhan for helping debugging!
Hi Zhihong and Yuanhan,
Thank you so much for debugging the issue.
I will fix vhost PMD not to create multiple message handling threads.
I am going to submit the PMD today.
Could you please check it again using latest one?
Looking forward to it!
Tetsuya
Tetsuya Mukawa
2015-11-13 04:29:35 UTC
+ if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+ &open_iface, &iface_name);
+ if (ret < 0)
+ goto out_free;
+ }
I noticed that the strdup in eth_dev_vhost_create crashes if you don't pass
the iface option, so this should probably return an error if the option
doesn't exist.
Hi Lane,

Yes, you are correct. Thanks for checking!
I will fix it also.

Tetsuya
Tetsuya Mukawa
2015-11-13 06:50:16 UTC
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.
* Known issue.
We may see issues while handling the RESET_OWNER message.
This handling is done in the vhost library, so it is not part of the vhost PMD.
So far, we are waiting for a QEMU fix.
Fix patches have already been applied. Please help test :)
--yliu
Hi Yuanhan,

It seems there might be another issue related to "vq->callfd" in the
vhost library.
We may be missing something in how the value is handled.

Anyway, here are the steps.
1. Apply the vhost PMD patch.
(You probably don't need it to reproduce the issue, but using the PMD
may make reproduction easier.)
2. Start testpmd on host with vhost-user PMD.
3. Start QEMU with virtio-net device.
4. Login QEMU.
5. Bind the virtio-net device to igb_uio.
6. Start testpmd in QEMU.
7. Quit testpmd in QEMU.
8. Start testpmd again in QEMU.

It seems that when the last command is executed, testpmd on the host doesn't
receive the SET_VRING_CALL message from QEMU.
Because of this, testpmd on the host assumes the virtio-net device is not ready.
(I confirmed that virtio_is_ready() failed on the host.)

According to the QEMU source code, SET_VRING_KICK is sent when a
virtqueue starts, but SET_VRING_CALL is sent when the virtqueue is
initialized.
I am not sure exactly, but perhaps "vq->callfd" should remain valid while the
connection is established?

Also, I've found a workaround.
Please execute the following after step 7.

8. Bind the virtio-net device to virtio-pci kernel driver.
9. Bind the virtio-net device to igb_uio.
10. Start testpmd in QEMU.

When step 8 is executed, the connection will be re-established, and testpmd
on the host will be able to receive SET_VRING_CALL.
Then testpmd on the host can start.

Thanks,
Tetsuya
Yuanhan Liu
2015-11-17 13:26:36 UTC
Post by Tetsuya Mukawa
Hi Yuanhan,
It seems there might be an another issue related with "vq->callfd" in
vhost library.
We may miss something to handle the value correctly.
Anyway, here are steps.
1. Apply vhost PMD patch.
(I guess you don't need it to reproduce the issue, but to reproduce it,
using the PMD may be easy)
2. Start testpmd on host with vhost-user PMD.
3. Start QEMU with virtio-net device.
4. Login QEMU.
5. Bind the virtio-net device to igb_uio.
6. Start testpmd in QEMU.
7. Quit testpmd in QEMU.
8. Start testpmd again in QEMU.
It seems when last command is executed, testpmd on host doesn't receive
SET_VRING_CALL message from QEMU.
Because of this, testpmd on host assumes virtio-net device is not ready.
(I made sure virtio_is_ready() was failed on host).
According to QEMU source code, SET_VRING_KICK will be called when
virtqueue starts, but SET_VRING_CALL will be called when virtqueue is
initialized.
Not sure exactly, might be "vq->call" will be valid while connection is
established?
Yes, it will stay valid as long as we don't reset it from another
set_vring_call. So, we should not reset it in reset_device().

--yliu
Tetsuya Mukawa
2015-11-19 01:20:48 UTC
Post by Yuanhan Liu
Yes, it would be valid as far as we don't reset it from another
set_vring_call. So, we should not reset it on reset_device().
--yliu
Hi Yuanhan,

Thanks for checking.
I will submit the patch for this today.

Tetsuya
Tetsuya Mukawa
2015-11-13 05:20:31 UTC
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
around librte_vhost, so librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.

The PMD has 2 parameters.
- iface: The parameter is used to specify a path to connect to a
virtio-net device.
- queues: The parameter is used to specify the number of queues the
virtio-net device has.
(Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the above testpmd, here is a QEMU command example.

$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
-device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 783 ++++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
mk/rte.app.mk | 8 +-
9 files changed, 938 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 52173d5..1ea23ef 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -461,6 +461,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

#
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
#Compile Xen domain0 support
#
CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2d4936d..57d1041 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -47,6 +47,7 @@ Network Interface Controller Drivers
mlx4
mlx5
virtio
+ vhost
vmxnet3
pcap_ring

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 1c02ff6..c2284d3 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -94,6 +94,8 @@ New Features

* **Added vhost-user multiple queue support.**

+* **Added vhost PMD.**
+
* **Added port hotplug support to vmxnet3.**

* **Added port hotplug support to xenvirt.**
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6da1ce2..66eb63d 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -50,5 +50,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
include $(RTE_SDK)/mk/rte.sharelib.mk
include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8186a80
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+# BSD LICENSE
+#
+# Copyright (c) 2010-2015 Intel Corporation.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..7fb30fe
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,783 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015 IGEL Co., Ltd.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co.,Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG "iface"
+#define ETH_VHOST_QUEUES_ARG "queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+ ETH_VHOST_IFACE_ARG,
+ ETH_VHOST_QUEUES_ARG,
+ NULL
+};
+
+static struct ether_addr base_eth_addr = {
+ .addr_bytes = {
+ 0x56 /* V */,
+ 0x48 /* H */,
+ 0x4F /* O */,
+ 0x53 /* S */,
+ 0x54 /* T */,
+ 0x00
+ }
+};
+
+struct vhost_queue {
+ rte_atomic32_t allow_queuing;
+ rte_atomic32_t while_queuing;
+ struct virtio_net *device;
+ struct pmd_internal *internal;
+ struct rte_mempool *mb_pool;
+ uint16_t virtqueue_id;
+ uint64_t rx_pkts;
+ uint64_t tx_pkts;
+ uint64_t err_pkts;
+};
+
+struct pmd_internal {
+ TAILQ_ENTRY(pmd_internal) next;
+ char *dev_name;
+ char *iface_name;
+ unsigned nb_rx_queues;
+ unsigned nb_tx_queues;
+
+ struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+ struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+ volatile uint16_t once;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+ TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+ .link_speed = 10000,
+ .link_duplex = ETH_LINK_FULL_DUPLEX,
+ .link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = rte_vhost_dequeue_burst(r->device,
+ r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+ r->rx_pkts += nb_rx;
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = rte_vhost_enqueue_burst(r->device,
+ r->virtqueue_id, bufs, nb_bufs);
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+ return 0;
+}
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+ int found = 0;
+ struct pmd_internal *internal;
+
+ if (ifname == NULL)
+ return NULL;
+
+ pthread_mutex_lock(&internal_list_lock);
+
+ TAILQ_FOREACH(internal, &internals_list, next) {
+ if (!strcmp(internal->iface_name, ifname)) {
+ found = 1;
+ break;
+ }
+ }
+
+ pthread_mutex_unlock(&internal_list_lock);
+
+ if (!found)
+ return NULL;
+
+ return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid argument\n");
+ return -1;
+ }
+
+ internal = find_internal_resource(dev->ifname);
+ if (internal == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid device name\n");
+ return -1;
+ }
+
+ if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
+ (dev->virt_qp_nb < internal->nb_tx_queues)) {
+ RTE_LOG(INFO, PMD, "Not enough queues\n");
+ return -1;
+ }
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
+ return -1;
+ }
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = dev;
+ vq->internal = internal;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = dev;
+ vq->internal = internal;
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->pmd_priv = eth_dev;
+ eth_dev->data->dev_link.link_status = 1;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid argument\n");
+ return;
+ }
+
+ eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
+ return;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ /* Wait until rx/tx_pkt_burst stops accessing vhost device */
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+
+ eth_dev->data->dev_link.link_status = 0;
+
+ dev->pmd_priv = NULL;
+ dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = NULL;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = NULL;
+ }
+
+ RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+ static struct virtio_net_device_ops *vhost_ops;
+
+ vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+ if (vhost_ops == NULL)
+ rte_panic("Can't allocate memory\n");
+
+ /* set vhost arguments */
+ vhost_ops->new_device = new_device;
+ vhost_ops->destroy_device = destroy_device;
+ if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+ rte_panic("Can't register callbacks\n");
+
+ /* start event handling */
+ rte_vhost_driver_session_start();
+
+ rte_free(vhost_ops);
+ pthread_exit(0);
+}
+
+static void vhost_driver_session_start(void)
+{
+ int ret;
+
+ ret = pthread_create(&session_th,
+ NULL, vhost_driver_session, NULL);
+ if (ret)
+ rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(void)
+{
+ int ret;
+
+ ret = pthread_cancel(session_th);
+ if (ret)
+ rte_panic("Can't cancel the thread\n");
+
+ ret = pthread_join(session_th, NULL);
+ if (ret)
+ rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+ int ret;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+ ret = rte_vhost_driver_register(internal->iface_name);
+ if (ret)
+ return ret;
+ }
+
+ /* We need only one message handling thread */
+ if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+ vhost_driver_session_start();
+
+ return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 1, 0))
+ rte_vhost_driver_unregister(internal->iface_name);
+
+ if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+ vhost_driver_session_stop();
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ rte_free(internal->rx_vhost_queues[rx_queue_id]);
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+ return -ENOMEM;
+ }
+
+ vq->mb_pool = mb_pool;
+ vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+ internal->rx_vhost_queues[rx_queue_id] = vq;
+ dev->data->rx_queues[rx_queue_id] = vq;
+ return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ rte_free(internal->tx_vhost_queues[tx_queue_id]);
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+ return -ENOMEM;
+ }
+
+ vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+ internal->tx_vhost_queues[tx_queue_id] = vq;
+ dev->data->tx_queues[tx_queue_id] = vq;
+ return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev_info->driver_name = drivername;
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)-1;
+ dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+ dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+ unsigned i;
+ unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+ const struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+ rx_total += igb_stats->q_ipackets[i];
+ }
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+ igb_stats->q_errors[i] = internal->tx_vhost_queues[i]->err_pkts;
+ tx_total += igb_stats->q_opackets[i];
+ tx_err_total += igb_stats->q_errors[i];
+ }
+
+ igb_stats->ipackets = rx_total;
+ igb_stats->opackets = tx_total;
+ igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+ unsigned i;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ internal->rx_vhost_queues[i]->rx_pkts = 0;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ internal->tx_vhost_queues[i]->tx_pkts = 0;
+ internal->tx_vhost_queues[i]->err_pkts = 0;
+ }
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+ return;
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = eth_dev_start,
+ .dev_stop = eth_dev_stop,
+ .dev_configure = eth_dev_configure,
+ .dev_infos_get = eth_dev_info,
+ .rx_queue_setup = eth_rx_queue_setup,
+ .tx_queue_setup = eth_tx_queue_setup,
+ .rx_queue_release = eth_queue_release,
+ .tx_queue_release = eth_queue_release,
+ .link_update = eth_link_update,
+ .stats_get = eth_stats_get,
+ .stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+ char *iface_name,
+ int16_t queues,
+ const unsigned numa_node)
+{
+ struct rte_eth_dev_data *data = NULL;
+ struct pmd_internal *internal = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ struct ether_addr *eth_addr = NULL;
+
+ RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+ numa_node);
+
+ /* now do all data allocation - for eth_dev structure, dummy pci driver
+ * and internal (private) data
+ */
+ data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+ if (data == NULL)
+ goto error;
+
+ internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+ if (internal == NULL)
+ goto error;
+
+ eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+ if (eth_addr == NULL)
+ goto error;
+ *eth_addr = base_eth_addr;
+ eth_addr->addr_bytes[5] = index;
+
+ /* reserve an ethdev entry */
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ goto error;
+
+ /* now put it all together
+ * - store queue data in internal,
+ * - store numa_node info in ethdev data
+ * - point eth_dev_data to internals
+ * - and point eth_dev structure to new eth_dev_data structure
+ */
+ internal->nb_rx_queues = queues;
+ internal->nb_tx_queues = queues;
+ internal->dev_name = strdup(name);
+ if (internal->dev_name == NULL)
+ goto error;
+ internal->iface_name = strdup(iface_name);
+ if (internal->iface_name == NULL) {
+ free(internal->dev_name);
+ goto error;
+ }
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_INSERT_TAIL(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ data->dev_private = internal;
+ data->port_id = eth_dev->data->port_id;
+ memmove(data->name, eth_dev->data->name, sizeof(data->name));
+ data->nb_rx_queues = queues;
+ data->nb_tx_queues = queues;
+ data->dev_link = pmd_link;
+ data->mac_addrs = eth_addr;
+
+ /* We'll replace the 'data' originally allocated by eth_dev. So the
+ * vhost PMD resources won't be shared between multi processes.
+ */
+ eth_dev->data = data;
+ eth_dev->dev_ops = &ops;
+ eth_dev->driver = NULL;
+ eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+ eth_dev->data->kdrv = RTE_KDRV_NONE;
+ eth_dev->data->drv_name = internal->dev_name;
+ eth_dev->data->numa_node = numa_node;
+
+ /* finally assign rx and tx ops */
+ eth_dev->rx_pkt_burst = eth_vhost_rx;
+ eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+ return data->port_id;
+
+error:
+ rte_free(data);
+ rte_free(internal);
+ rte_free(eth_addr);
+
+ return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **iface_name = extra_args;
+
+ if (value == NULL)
+ return -1;
+
+ *iface_name = value;
+
+ return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint16_t *q = extra_args;
+
+ if ((value == NULL) || (extra_args == NULL))
+ return -EINVAL;
+
+ *q = (uint16_t)strtoul(value, NULL, 0);
+ if ((*q == USHRT_MAX) && (errno == ERANGE))
+ return -1;
+
+ if (*q > RTE_MAX_QUEUES_PER_PORT)
+ return -1;
+
+ return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ int ret = 0;
+ int index;
+ char *iface_name;
+ uint16_t queues;
+
+ RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+ if (strlen(name) < strlen("eth_vhost"))
+ return -1;
+
+ index = strtol(name + strlen("eth_vhost"), NULL, 0);
+ if (errno == ERANGE)
+ return -1;
+
+ kvlist = rte_kvargs_parse(params, valid_arguments);
+ if (kvlist == NULL)
+ return -1;
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+ &open_iface, &iface_name);
+ if (ret < 0)
+ goto out_free;
+ } else {
+ ret = -1;
+ goto out_free;
+ }
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+ &open_queues, &queues);
+ if (ret < 0)
+ goto out_free;
+
+ } else
+ queues = 1;
+
+ eth_dev_vhost_create(name, index,
+ iface_name, queues, rte_socket_id());
+
+out_free:
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+ struct rte_eth_dev *eth_dev = NULL;
+ struct pmd_internal *internal;
+ unsigned int i;
+
+ RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+ if (name == NULL)
+ return -EINVAL;
+
+ /* find an ethdev entry */
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev == NULL)
+ return -ENODEV;
+
+ internal = eth_dev->data->dev_private;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_REMOVE(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ eth_dev_stop(eth_dev);
+
+ if ((internal) && (internal->dev_name))
+ free(internal->dev_name);
+ if ((internal) && (internal->iface_name))
+ free(internal->iface_name);
+
+ rte_free(eth_dev->data->mac_addrs);
+ rte_free(eth_dev->data);
+
+ for (i = 0; i < internal->nb_rx_queues; i++)
+ rte_free(internal->rx_vhost_queues[i]);
+ for (i = 0; i < internal->nb_tx_queues; i++)
+ rte_free(internal->tx_vhost_queues[i]);
+ rte_free(internal);
+
+ rte_eth_dev_release_port(eth_dev);
+
+ return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+struct
+virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+ struct rte_eth_dev *eth_dev;
+
+ if (rte_eth_dev_is_valid_port(port_id) == 0)
+ return NULL;
+
+ eth_dev = &rte_eth_devices[port_id];
+ if (strncmp("eth_vhost", eth_dev->data->drv_name,
+ strlen("eth_vhost")) == 0) {
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+
+ internal = eth_dev->data->dev_private;
+ vq = internal->rx_vhost_queues[0];
+ if ((vq != NULL) && (vq->device != NULL))
+ return vq->device;
+ }
+
+ return NULL;
+}
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..22a880f
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 IGEL Co., Ltd.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co., Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts the specified port_id to a virtio device structure.
+ * The returned device can be used with the vhost library APIs.
+ * To use the vhost library APIs and the vhost PMD in parallel, the API below
+ * should not be called, because it is invoked by the vhost PMD itself.
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by the vhost PMD, the API below should not be called.
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, call the Port Hotplug APIs instead.
+ *
+ * @param port_id
+ * port number
+ * @return
+ * virtio net device structure corresponding to the specified port
+ * NULL will be returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+ global:
+
+ rte_eth_vhost_portid2vdev;
+
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 724efa7..1af4bb3 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -148,7 +148,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null

-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)

endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
--
2.1.4
Wang, Zhihong
2015-11-16 01:57:05 UTC
A quick glimpse and the bug is gone now :)
Will run more tests later on.
-----Original Message-----
Sent: Friday, November 13, 2015 1:21 PM
Subject: [PATCH v4 2/2] vhost: Add VHOST PMD
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost, so librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started, so start
a port first, then invoke QEMU.
The PMD has 2 parameters.
- iface: specifies the path of the socket used to connect to a
virtio-net device.
- queues: specifies the number of queues the virtio-net device has.
(Default: 1)
Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
To connect to the above testpmd instance, here is an example QEMU command.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
-device virtio-net-pci,netdev=net0
---
Yuanhan Liu
2015-11-20 11:43:04 UTC
On Fri, Nov 13, 2015 at 02:20:31PM +0900, Tetsuya Mukawa wrote:
....
Post by Tetsuya Mukawa
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+pthread_t session_th;
static?
Post by Tetsuya Mukawa
+
+static struct rte_eth_link pmd_link = {
+ .link_speed = 10000,
+ .link_duplex = ETH_LINK_FULL_DUPLEX,
+ .link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = rte_vhost_dequeue_burst(r->device,
+ r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+ r->rx_pkts += nb_rx;
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = rte_vhost_enqueue_burst(r->device,
+ r->virtqueue_id, bufs, nb_bufs);
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
We should free up to nb_bufs here, not just nb_tx, right?
Post by Tetsuya Mukawa
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+ return 0;
+}
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+ int found = 0;
+ struct pmd_internal *internal;
+
+ if (ifname == NULL)
+ return NULL;
+
+ pthread_mutex_lock(&internal_list_lock);
+
+ TAILQ_FOREACH(internal, &internals_list, next) {
+ if (!strcmp(internal->iface_name, ifname)) {
+ found = 1;
+ break;
+ }
+ }
+
+ pthread_mutex_unlock(&internal_list_lock);
+
+ if (!found)
+ return NULL;
+
+ return internal;
+}
+
...
Post by Tetsuya Mukawa
+static void *vhost_driver_session(void *param __rte_unused)
static void *
vhost_driver_session_start(..)
Post by Tetsuya Mukawa
+{
+ static struct virtio_net_device_ops *vhost_ops;
+
+ vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+ if (vhost_ops == NULL)
+ rte_panic("Can't allocate memory\n");
Why not make them static?
Post by Tetsuya Mukawa
+
+ /* set vhost arguments */
+ vhost_ops->new_device = new_device;
+ vhost_ops->destroy_device = destroy_device;
+ if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+ rte_panic("Can't register callbacks\n");
+
+ /* start event handling */
+ rte_vhost_driver_session_start();
+
+ rte_free(vhost_ops);
+ pthread_exit(0);
+}
+
+static void vhost_driver_session_start(void)
ditto.
Post by Tetsuya Mukawa
+{
+ int ret;
+
+ ret = pthread_create(&session_th,
+ NULL, vhost_driver_session, NULL);
+ if (ret)
+ rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(void)
Ditto.
Post by Tetsuya Mukawa
+{
+ int ret;
+
+ ret = pthread_cancel(session_th);
+ if (ret)
+ rte_panic("Can't cancel the thread\n");
+
+ ret = pthread_join(session_th, NULL);
+ if (ret)
+ rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
...
Post by Tetsuya Mukawa
+ internal->nb_rx_queues = queues;
+ internal->nb_tx_queues = queues;
+ internal->dev_name = strdup(name);
+ if (internal->dev_name == NULL)
+ goto error;
+ internal->iface_name = strdup(iface_name);
+ if (internal->iface_name == NULL) {
+ free(internal->dev_name);
+ goto error;
+ }
You still didn't resolve my comments from last email: if allocation
failed here, internal->dev_name is not freed.
Post by Tetsuya Mukawa
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_INSERT_TAIL(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
...
Post by Tetsuya Mukawa
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+struct
+virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
struct virtio_net *
rte_eth_vhost_portid2vdev()

BTW, why make a special eth API for virtio? This doesn't make much
sense to me.

Besides those minor nits, this patch looks good to me. Thanks for the
work!

--yliu
Tetsuya Mukawa
2015-11-24 02:48:04 UTC
Permalink
Post by Yuanhan Liu
....
Post by Tetsuya Mukawa
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+pthread_t session_th;
static?
Hi Yuanhan,

I appreciate your careful review.
I will fix the issues you commented on, and submit the patch again.

I added two comments below.
Could you please check them?
Post by Yuanhan Liu
Post by Tetsuya Mukawa
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = rte_vhost_enqueue_burst(r->device,
+ r->virtqueue_id, bufs, nb_bufs);
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
We should free up to nb_bufs here, not just nb_tx, right?
I don't think we need to free all the packet buffers here.
Could you please check l2fwd_send_burst() in the l2fwd example?
It seems the DPDK application frees the packet buffers that failed to send.
Post by Yuanhan Liu
Post by Tetsuya Mukawa
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+struct
+virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
struct virtio_net *
rte_eth_vhost_portid2vdev()
BTW, why make a special eth API for virtio? This doesn't make much
sense to me.
This is a kind of helper function.

I assume that DPDK applications want to know the relation between a
port_id and the virtio device structure.
But in the "new" callback handler that a DPDK application registers,
the application receives the virtio device structure, which doesn't
tell it which port it belongs to.

To find out, here are the steps a DPDK application would probably need
to take.

1. Store the interface name that is specified when the vhost PMD is
invoked. (For example, record that /tmp/socket0 is for port0 and
/tmp/socket1 is for port1.)
2. Compare that interface name with dev->ifname stored in the virtio
device structure; the application can then tell which port it is.

If the DPDK application uses Port Hotplug, I guess the above steps are
easy. But if it doesn't, the interface name will be specified in the
"--vdev" EAL command line option, so handling the interface name in the
DPDK application is probably not so easy.
This is why I added the function.

Thanks,
Tetsuya
Yuanhan Liu
2015-11-24 03:40:57 UTC
Permalink
Post by Tetsuya Mukawa
Post by Yuanhan Liu
....
Post by Tetsuya Mukawa
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+pthread_t session_th;
static?
Hi Yuanhan,
I appreciate your careful review.
I will fix the issues you commented on, and submit the patch again.
I added two comments below.
Could you please check them?
Post by Yuanhan Liu
Post by Tetsuya Mukawa
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = rte_vhost_enqueue_burst(r->device,
+ r->virtqueue_id, bufs, nb_bufs);
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
We should free up to nb_bufs here, not just nb_tx, right?
I don't think we need to free all the packet buffers here.
Could you please check l2fwd_send_burst() in the l2fwd example?
It seems the DPDK application frees the packet buffers that failed to send.
Yes, you are right. I was thinking it's just a vhost app, and forgot
that this is for rte_eth_tx_burst, sigh ...
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+struct
+virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
struct virtio_net *
rte_eth_vhost_portid2vdev()
BTW, why make a special eth API for virtio? This doesn't make much
sense to me.
This is a kind of helper function.
Yeah, I know that. I was thinking that an API prefixed with rte_eth_
should be a common interface for all eth drivers. Here this one is
for vhost PMD only, though.

I then had a quick check of DPDK code, and found a similar example,
bond, such as rte_eth_bond_create(). So, it might be okay to introduce
PMD specific eth APIs?

Anyway, I would suggest you put it into another patch, so that
it can be reworked (or even dropped) if someone else doesn't like
it (or doesn't think it's necessary).

--yliu
Post by Tetsuya Mukawa
I assume that DPDK applications want to know the relation between a
port_id and the virtio device structure.
But in the "new" callback handler that a DPDK application registers,
the application receives the virtio device structure, which doesn't
tell it which port it belongs to.
To find out, here are the steps a DPDK application would probably need
to take.
1. Store the interface name that is specified when the vhost PMD is
invoked. (For example, record that /tmp/socket0 is for port0 and
/tmp/socket1 is for port1.)
2. Compare that interface name with dev->ifname stored in the virtio
device structure; the application can then tell which port it is.
If the DPDK application uses Port Hotplug, I guess the above steps are
easy. But if it doesn't, the interface name will be specified in the
"--vdev" EAL command line option, so handling the interface name in the
DPDK application is probably not so easy.
This is why I added the function.
Thanks,
Tetsuya
Tetsuya Mukawa
2015-11-24 03:44:28 UTC
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
....
Post by Tetsuya Mukawa
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+pthread_t session_th;
static?
Hi Yuanhan,
I appreciate your careful review.
I will fix the issues you commented on, and submit the patch again.
I added two comments below.
Could you please check them?
Post by Yuanhan Liu
Post by Tetsuya Mukawa
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = rte_vhost_enqueue_burst(r->device,
+ r->virtqueue_id, bufs, nb_bufs);
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
We should free up to nb_bufs here, not just nb_tx, right?
I don't think we need to free all the packet buffers here.
Could you please check l2fwd_send_burst() in the l2fwd example?
It seems the DPDK application frees the packet buffers that failed to send.
Yes, you are right. I was thinking it's just a vhost app, and forgot
that this is for rte_eth_tx_burst, sigh ...
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+struct
+virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
struct virtio_net *
rte_eth_vhost_portid2vdev()
BTW, why make a special eth API for virtio? This doesn't make much
sense to me.
This is a kind of helper function.
Yeah, I know that. I was thinking that an API prefixed with rte_eth_
should be a common interface for all eth drivers. Here this one is
for vhost PMD only, though.
I then had a quick check of DPDK code, and found a similar example,
bond, such as rte_eth_bond_create(). So, it might be okay to introduce
PMD specific eth APIs?
Yes, I guess so.
Post by Yuanhan Liu
Anyway, I would suggest you put it into another patch, so that
it can be reworked (or even dropped) if someone else doesn't like
it (or doesn't think it's necessary).
Sure, it's nice idea.
I will split the patch.

Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
I assume that DPDK applications want to know the relation between a
port_id and the virtio device structure.
But in the "new" callback handler that a DPDK application registers,
the application receives the virtio device structure, which doesn't
tell it which port it belongs to.
To find out, here are the steps a DPDK application would probably need
to take.
1. Store the interface name that is specified when the vhost PMD is
invoked. (For example, record that /tmp/socket0 is for port0 and
/tmp/socket1 is for port1.)
2. Compare that interface name with dev->ifname stored in the virtio
device structure; the application can then tell which port it is.
If the DPDK application uses Port Hotplug, I guess the above steps are
easy. But if it doesn't, the interface name will be specified in the
"--vdev" EAL command line option, so handling the interface name in the
DPDK application is probably not so easy.
This is why I added the function.
Thanks,
Tetsuya
Rich Lane
2015-11-21 00:15:41 UTC
Post by Tetsuya Mukawa
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
...
Post by Tetsuya Mukawa
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = rte_vhost_enqueue_burst(r->device,
+ r->virtqueue_id, bufs, nb_bufs);
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
I don't think a full TX queue is counted as an error by physical NIC PMDs
like ixgbe and i40e. It is counted as an error by the af_packet, pcap, and
ring PMDs. I'd suggest not counting it as an error because it's a common
and expected condition, and the application might just retry the TX later.

Are the byte counts left out because it would be a performance hit? It
seems like it would be a minimal cost given how much we're already touching
each packet.
Post by Tetsuya Mukawa
+static int
+new_device(struct virtio_net *dev)
+{
...
Post by Tetsuya Mukawa
+
+ if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
+ (dev->virt_qp_nb < internal->nb_tx_queues)) {
+ RTE_LOG(INFO, PMD, "Not enough queues\n");
+ return -1;
+ }
Would it make sense to take the minimum of the guest and host queue pairs
and use that below in place of nb_rx_queues/nb_tx_queues? That way the host
can support a large maximum number of queues and each guest can choose how
many it wants to use. The host application will receive vring_state_changed
callbacks for each queue the guest activates.
Tetsuya Mukawa
2015-11-24 04:41:42 UTC
Post by Rich Lane
Post by Tetsuya Mukawa
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
...
Post by Tetsuya Mukawa
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = rte_vhost_enqueue_burst(r->device,
+ r->virtqueue_id, bufs, nb_bufs);
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
I don't think a full TX queue is counted as an error by physical NIC PMDs
like ixgbe and i40e. It is counted as an error by the af_packet, pcap, and
ring PMDs. I'd suggest not counting it as an error because it's a common
and expected condition, and the application might just retry the TX later.
Hi Rich,

Thanks for commenting.
I will count it as "imissed".
Post by Rich Lane
Are the byte counts left out because it would be a performance hit? It
seems like it would be a minimal cost given how much we're already touching
each packet.
I ignored it for performance reasons.
But you are correct; I will add it.
Post by Rich Lane
Post by Tetsuya Mukawa
+static int
+new_device(struct virtio_net *dev)
+{
...
Post by Tetsuya Mukawa
+
+ if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
+ (dev->virt_qp_nb < internal->nb_tx_queues)) {
+ RTE_LOG(INFO, PMD, "Not enough queues\n");
+ return -1;
+ }
Would it make sense to take the minimum of the guest and host queue pairs
and use that below in place of nb_rx_queues/nb_tx_queues? That way the host
can support a large maximum number of queues and each guest can choose how
many it wants to use. The host application will receive vring_state_changed
callbacks for each queue the guest activates.
Thanks for checking this.
I agree with you.

After reading your comment, here is my guess about this PMD's behavior.

This PMD should assume that the virtio-net device (QEMU) has at least
as many queues as specified in the vhost PMD option.
If that assumption is broken, the application should handle the
vring_state_changed callback correctly.
(Then stop accessing disabled queues, to avoid wasting CPU cycles.)

Anyway, I will just remove the above if-condition, because of the above
PMD assumption.

Thanks,
Tetsuya
Tetsuya Mukawa
2015-11-24 09:00:00 UTC
The patch introduces a new PMD. This PMD is implemented as a thin
wrapper of librte_vhost.

PATCH v5 changes:
- Rebase on latest master.
- Fix RX/TX routine to count RX/TX bytes.
- Fix RX/TX routines not to count packets as errors when enqueue/dequeue
cannot send all of them.
- Fix if-condition checking for multiqueues.
- Add "static" to pthread variable.
- Fix format.
- Change default behavior not to receive queueing event from driver.
- Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
- Rebase on latest DPDK tree.
- Fix coding style.
- Fix code not to invoke multiple message handling threads.
- Fix code to handle vdev parameters correctly.
- Remove needless cast.
- Remove needless if-condition before rte_free().

PATCH v3 changes:
- Rebase on latest master.
- Specify correct queue_id in RX/TX function.

PATCH v2 changes:
- Remove a below patch that fixes vhost library.
The patch was applied as a separate patch.
- vhost: fix crash with multiqueue enabled
- Fix typos.
(Thanks to Thomas Monjalon)
- Rebase on latest tree with Bernard's patches above.

PATCH v1 changes:
- Support vhost multiple queues.
- Rebase on "remove pci driver from vdevs".
- Optimize RX/TX functions.
- Fix resource leaks.
- Fix compile issue.
- Add patch to fix vhost library.

RFC PATCH v3 changes:
- Optimize performance.
In RX/TX functions, change code to access only per-core data.
- Add below API to allow user to use vhost library APIs for a port managed
by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
- rte_eth_vhost_portid2vdev()
To support this functionality, vhost library is also changed.
Anyway, users who don't use the vhost PMD can still fully use the vhost library APIs.
- Add code to support vhost multiple queues.
Actually, multiple queues functionality is not enabled so far.

RFC PATCH v2 changes:
- Fix issues reported by checkpatch.pl
(Thanks to Stephen Hemminger)


Tetsuya Mukawa (3):
vhost: Add callback and private data for vhost PMD
vhost: Add VHOST PMD
vhost: Add helper function to convert port id to virtio device pointer

config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 ++
drivers/net/vhost/rte_eth_vhost.c | 796 ++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
lib/librte_vhost/rte_vhost_version.map | 6 +
lib/librte_vhost/rte_virtio_net.h | 3 +
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +-
lib/librte_vhost/virtio-net.c | 60 +-
lib/librte_vhost/virtio-net.h | 4 +-
mk/rte.app.mk | 8 +-
14 files changed, 1024 insertions(+), 14 deletions(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
--
2.1.4
Tetsuya Mukawa
2015-11-24 09:00:01 UTC
The vhost PMD will be a wrapper of the vhost library, but some vhost
library APIs cannot be mapped to ethdev library APIs.
Because of this, in some cases, we still need to use vhost library APIs
for a port created by the vhost PMD.

Currently, when a virtio device is created or destroyed, the vhost
library calls one of the callback handlers. The vhost PMD needs to use
this pair of callback handlers to know which virtio devices are
actually connected.
Because only one pair of callbacks can be registered with the vhost
library, if the PMD uses it, DPDK applications have no way to know
about the events.

This may break legacy DPDK applications that use the vhost library. To
prevent that, this patch adds one more pair of callbacks to the vhost
library, specifically for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling for virtio device creation and
destruction.

For example, a legacy application can call
rte_vhost_enable_guest_notification() in the callbacks to change the
setting.

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++---
lib/librte_vhost/virtio-net.c | 60 +++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 73 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
rte_vhost_driver_unregister;

} DPDK_2.0;
+
+DPDK_2.2 {
+ global:
+
+ rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
char ifname[IF_NAME_SZ]; /**< Name of the tap device or socket path. */
uint32_t virt_qp_nb; /**< number of queue pair we have allocated */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost PMD */
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2]; /**< Contains all virtqueue information. */
} __rte_cache_aligned;

@@ -224,6 +225,8 @@ int rte_vhost_driver_unregister(const char *dev_name);

/* Register callbacks. */
int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
/* Start vhost driver session blocking loop. */
int rte_vhost_driver_session_start(void);

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..d8ae2fc 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -111,7 +111,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

/* Remove from the data plane. */
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

if (dev->mem) {
free_mem_region(dev);
@@ -272,7 +272,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

if (virtio_is_ready(dev) &&
!(dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->new_device(dev);
+ notify_new_device(dev);
}

/*
@@ -288,7 +288,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
return -1;
/* We have to stop the queue (virtio) if it is running. */
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

/* Here we are safe to get the last used index */
ops->get_vring_base(ctx, state->index, state);
@@ -324,10 +324,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
"set queue enable: %d to qp idx: %d\n",
enable, state->index);

- if (notify_ops->vring_state_changed) {
- notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
- enable);
- }
+ notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM, enable);

dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
@@ -341,7 +338,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
struct virtio_net *dev = get_device(ctx);

if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

if (dev && dev->mem) {
free_mem_region(dev);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 8364938..dc977b7 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -65,6 +65,8 @@ struct virtio_net_config_ll {

/* device ops to add/remove device to/from data core. */
struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
/* root address of the linked list of managed virtio devices */
static struct virtio_net_config_ll *ll_root;

@@ -81,6 +83,45 @@ static struct virtio_net_config_ll *ll_root;
static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;


+int
+notify_new_device(struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+ int ret = pmd_notify_ops->new_device(dev);
+
+ if (ret != 0)
+ return ret;
+ }
+ if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+ return notify_ops->new_device(dev);
+
+ return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+ pmd_notify_ops->destroy_device(dev);
+ if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+ notify_ops->destroy_device(dev);
+}
+
+int
+notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->vring_state_changed != NULL)) {
+ int ret = pmd_notify_ops->vring_state_changed(dev, queue_id, enable);
+
+ if (ret != 0)
+ return ret;
+ }
+ if ((notify_ops != NULL) && (notify_ops->vring_state_changed != NULL))
+ return notify_ops->vring_state_changed(dev, queue_id, enable);
+
+ return 0;
+}
+
/*
* Converts QEMU virtual address to Vhost virtual address. This function is
* used to convert the ring addresses to our address space.
@@ -393,7 +434,7 @@ destroy_device(struct vhost_device_ctx ctx)
* the function to remove it from the data core.
*/
if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(&(ll_dev_cur->dev));
+ notify_destroy_device(&(ll_dev_cur->dev));
ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
ll_dev_last);
} else {
@@ -451,7 +492,7 @@ reset_owner(struct vhost_device_ctx ctx)
return -1;

if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

cleanup_device(dev, 0);
reset_device(dev);
@@ -809,12 +850,12 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
- return notify_ops->new_device(dev);
+ return notify_new_device(dev);
}
/* Otherwise we remove it. */
} else
if (file->fd == VIRTIO_DEV_STOPPED)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);
return 0;
}

@@ -898,3 +939,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op

return 0;
}
+
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+ pmd_notify_ops = ops;
+
+ return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index 75fb57e..0816e71 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -37,7 +37,9 @@
#include "vhost-net.h"
#include "rte_virtio_net.h"

-struct virtio_net_device_ops const *notify_ops;
struct virtio_net *get_device(struct vhost_device_ctx ctx);

+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
+int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable);
#endif
--
2.1.4
Yuanhan Liu
2015-12-17 11:42:23 UTC
Post by Tetsuya Mukawa
The vhost PMD will be a wrapper of the vhost library, but some vhost
library APIs cannot be mapped to ethdev library APIs.
Because of this, in some cases, we still need to use vhost library APIs
for a port created by the vhost PMD.
Currently, when a virtio device is created or destroyed, the vhost
library calls one of the callback handlers. The vhost PMD needs to use
this pair of callback handlers to know which virtio devices are
actually connected.
Because only one pair of callbacks can be registered with the vhost
library, if the PMD uses it, DPDK applications have no way to know
about the events.
This may break legacy DPDK applications that use the vhost library. To
prevent that, this patch adds one more pair of callbacks to the vhost
library, specifically for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling for virtio device creation and
destruction.
For example, a legacy application can call
rte_vhost_enable_guest_notification() in the callbacks to change the
setting.
TBH, I never liked it from the beginning. Introducing two callbacks
for one event is a bit messy, and therefore error-prone.

I have been thinking about this occasionally over the last few weeks,
and have come up with something: we may introduce another layer of
callbacks based on the vhost PMD itself, via a new API:

rte_eth_vhost_register_callback().

We would then call those new callbacks inside the vhost PMD's
new_device() and destroy_device() implementations.

We could have the same callbacks the vhost library has, but I'm
thinking that new_device() and destroy_device() don't sound like good
names for a PMD driver. Maybe a name like "link_state_changed" is
better?

What do you think of that?


On the other hand, I'm still wondering whether it is really necessary
to let the application call vhost functions like
rte_vhost_enable_guest_notification() with the vhost PMD driver.

--yliu
Tetsuya Mukawa
2015-12-18 03:15:42 UTC
Post by Yuanhan Liu
Post by Tetsuya Mukawa
The vhost PMD will be a wrapper of the vhost library, but some vhost
library APIs cannot be mapped to ethdev library APIs.
Because of this, in some cases, we still need to use vhost library APIs
for a port created by the vhost PMD.
Currently, when a virtio device is created or destroyed, the vhost
library calls one of the callback handlers. The vhost PMD needs to use
this pair of callback handlers to know which virtio devices are
actually connected.
Because only one pair of callbacks can be registered with the vhost
library, if the PMD uses it, DPDK applications have no way to know
about the events.
This may break legacy DPDK applications that use the vhost library. To
prevent that, this patch adds one more pair of callbacks to the vhost
library, specifically for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling for virtio device creation and
destruction.
For example, a legacy application can call
rte_vhost_enable_guest_notification() in the callbacks to change the
setting.
TBH, I never liked it from the beginning. Introducing two callbacks
for one event is a bit messy, and therefore error-prone.
I agree with you.
Post by Yuanhan Liu
I have been thinking about this occasionally over the last few weeks,
and have come up with something: we may introduce another layer of
callbacks based on a new API,
rte_eth_vhost_register_callback().
We would then call those new callbacks inside the vhost PMD's
new_device() and destroy_device() implementations.
We could have the same callbacks the vhost library has, but I'm
thinking that new_device() and destroy_device() don't sound like good
names for a PMD driver. Maybe a name like "link_state_changed" is better?
What do you think of that?
Yes, "link_state_changed" will be good.

BTW, I thought it was OK for a DPDK app that uses the vhost PMD to call
vhost library APIs directly.
But perhaps that seems strange to you. Is that correct?

If so, how about implementing the legacy link status interrupt
mechanism in the vhost PMD?
For example, a DPDK app could register a callback handler as in
"examples/link_status_interrupt".

Also, if the app doesn't call vhost library APIs directly,
rte_eth_vhost_portid2vdev() becomes unnecessary, because the app no
longer needs to handle the virtio device structure.
Post by Yuanhan Liu
On the other hand, I'm still wondering whether it is really necessary
to let the application call vhost functions like
rte_vhost_enable_guest_notification() with the vhost PMD driver.
The basic concept of my patch is that the vhost PMD provides the
features that the vhost library provides.

How about removing rte_vhost_enable_guest_notification() from the
vhost library?
(I am also not sure what its use cases are.)
If we can do this, the vhost PMD doesn't need to take care of it
either. Or, if rte_vhost_enable_guest_notification() is to be removed
in the future, the vhost PMD can simply ignore it.


Let me confirm my understanding of your suggestions.
- Change the concept of the patch not to call vhost library APIs
directly; these should be wrapped by ethdev APIs.
- Remove rte_eth_vhost_portid2vdev(), because of the above concept
change.
- Implement the legacy link status changed interrupt in the vhost PMD
instead of using our own callback mechanism.
- Check whether we can remove rte_vhost_enable_guest_notification()
from the vhost library.


Hi Xie,

Do you know the use cases of rte_vhost_enable_guest_notification()?

Thanks,
Tetsuya
Tetsuya Mukawa
2015-11-24 09:00:02 UTC
The patch introduces a new PMD. This PMD is implemented as a thin
wrapper of librte_vhost, which means librte_vhost is also needed to
compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.

The PMD has 2 parameters.
- iface: The parameter is used to specify a path to connect to a
virtio-net device.
- queues: The parameter is used to specify the number of the queues
virtio-net device has.
(Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect above testpmd, here is qemu command example.

$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
-device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 57 ++
drivers/net/vhost/rte_eth_vhost.c | 771 ++++++++++++++++++++++++++++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
mk/rte.app.mk | 8 +-
8 files changed, 856 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index f72c46d..0140a8e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -466,6 +466,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

#
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
#Compile Xen domain0 support
#
CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 0a0b724..26db9b7 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -48,6 +48,7 @@ Network Interface Controller Drivers
mlx5
szedata2
virtio
+ vhost
vmxnet3
pcap_ring

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 8c77768..b6071ab 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -111,6 +111,8 @@ New Features

* **Added vhost-user multiple queue support.**

+* **Added vhost PMD.**
+
* **Added port hotplug support to vmxnet3.**

* **Added port hotplug support to xenvirt.**
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index cddcd57..18d03cf 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -51,5 +51,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
include $(RTE_SDK)/mk/rte.sharelib.mk
include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8bec47a
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,57 @@
+# BSD LICENSE
+#
+# Copyright (c) 2010-2015 Intel Corporation.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..9ef05bc
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,771 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015 IGEL Co., Ltd.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co.,Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#define ETH_VHOST_IFACE_ARG "iface"
+#define ETH_VHOST_QUEUES_ARG "queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+ ETH_VHOST_IFACE_ARG,
+ ETH_VHOST_QUEUES_ARG,
+ NULL
+};
+
+static struct ether_addr base_eth_addr = {
+ .addr_bytes = {
+ 0x56 /* V */,
+ 0x48 /* H */,
+ 0x4F /* O */,
+ 0x53 /* S */,
+ 0x54 /* T */,
+ 0x00
+ }
+};
+
+struct vhost_queue {
+ rte_atomic32_t allow_queuing;
+ rte_atomic32_t while_queuing;
+ struct virtio_net *device;
+ struct pmd_internal *internal;
+ struct rte_mempool *mb_pool;
+ uint16_t virtqueue_id;
+ uint64_t rx_pkts;
+ uint64_t tx_pkts;
+ uint64_t missed_pkts;
+ uint64_t rx_bytes;
+ uint64_t tx_bytes;
+};
+
+struct pmd_internal {
+ TAILQ_ENTRY(pmd_internal) next;
+ char *dev_name;
+ char *iface_name;
+ unsigned nb_rx_queues;
+ unsigned nb_tx_queues;
+
+ struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+ struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+ volatile uint16_t once;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+ TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+static pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+ .link_speed = 10000,
+ .link_duplex = ETH_LINK_FULL_DUPLEX,
+ .link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_rx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = rte_vhost_dequeue_burst(r->device,
+ r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+ r->rx_pkts += nb_rx;
+
+ for (i = 0; likely(i < nb_rx); i++)
+ r->rx_bytes += bufs[i]->pkt_len;
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = rte_vhost_enqueue_burst(r->device,
+ r->virtqueue_id, bufs, nb_bufs);
+
+ r->tx_pkts += nb_tx;
+ r->missed_pkts += nb_bufs - nb_tx;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ r->tx_bytes += bufs[i]->pkt_len;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+ return 0;
+}
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+ int found = 0;
+ struct pmd_internal *internal;
+
+ if (ifname == NULL)
+ return NULL;
+
+ pthread_mutex_lock(&internal_list_lock);
+
+ TAILQ_FOREACH(internal, &internals_list, next) {
+ if (!strcmp(internal->iface_name, ifname)) {
+ found = 1;
+ break;
+ }
+ }
+
+ pthread_mutex_unlock(&internal_list_lock);
+
+ if (!found)
+ return NULL;
+
+ return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid argument\n");
+ return -1;
+ }
+
+ internal = find_internal_resource(dev->ifname);
+ if (internal == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid device name\n");
+ return -1;
+ }
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+ return -1;
+ }
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = dev;
+ vq->internal = internal;
+ rte_vhost_enable_guest_notification(
+ dev, vq->virtqueue_id, 0);
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = dev;
+ vq->internal = internal;
+ rte_vhost_enable_guest_notification(
+ dev, vq->virtqueue_id, 0);
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->pmd_priv = eth_dev;
+ eth_dev->data->dev_link.link_status = 1;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid argument\n");
+ return;
+ }
+
+ eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+ return;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ /* Wait until rx/tx_pkt_burst stops accessing vhost device */
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+
+ eth_dev->data->dev_link.link_status = 0;
+
+ dev->pmd_priv = NULL;
+ dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = NULL;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = NULL;
+ }
+
+ RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *
+vhost_driver_session(void *param __rte_unused)
+{
+ static struct virtio_net_device_ops vhost_ops;
+
+ /* set vhost arguments */
+ vhost_ops.new_device = new_device;
+ vhost_ops.destroy_device = destroy_device;
+ if (rte_vhost_driver_pmd_callback_register(&vhost_ops) < 0)
+ rte_panic("Can't register callbacks\n");
+
+ /* start event handling */
+ rte_vhost_driver_session_start();
+
+ pthread_exit(0);
+}
+
+static void
+vhost_driver_session_start(void)
+{
+ int ret;
+
+ ret = pthread_create(&session_th,
+ NULL, vhost_driver_session, NULL);
+ if (ret)
+ rte_panic("Can't create a thread\n");
+}
+
+static void
+vhost_driver_session_stop(void)
+{
+ int ret;
+
+ ret = pthread_cancel(session_th);
+ if (ret)
+ rte_panic("Can't cancel the thread\n");
+
+ ret = pthread_join(session_th, NULL);
+ if (ret)
+ rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+ int ret;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+ ret = rte_vhost_driver_register(internal->iface_name);
+ if (ret)
+ return ret;
+ }
+
+ /* We need only one message handling thread */
+ if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+ vhost_driver_session_start();
+
+ return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 1, 0))
+ rte_vhost_driver_unregister(internal->iface_name);
+
+ if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+ vhost_driver_session_stop();
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ rte_free(internal->rx_vhost_queues[rx_queue_id]);
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+ return -ENOMEM;
+ }
+
+ vq->mb_pool = mb_pool;
+ vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+ internal->rx_vhost_queues[rx_queue_id] = vq;
+ dev->data->rx_queues[rx_queue_id] = vq;
+ return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ rte_free(internal->tx_vhost_queues[tx_queue_id]);
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+ return -ENOMEM;
+ }
+
+ vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+ internal->tx_vhost_queues[tx_queue_id] = vq;
+ dev->data->tx_queues[tx_queue_id] = vq;
+ return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev_info->driver_name = drivername;
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)-1;
+ dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+ dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+ unsigned i;
+ unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
+ unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
+ const struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+ rx_total += igb_stats->q_ipackets[i];
+
+ igb_stats->q_ibytes[i] = internal->rx_vhost_queues[i]->rx_bytes;
+ rx_total_bytes += igb_stats->q_ibytes[i];
+ }
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+ tx_missed_total += internal->tx_vhost_queues[i]->missed_pkts;
+ tx_total += igb_stats->q_opackets[i];
+
+ igb_stats->q_obytes[i] = internal->tx_vhost_queues[i]->tx_bytes;
+ tx_total_bytes += igb_stats->q_obytes[i];
+ }
+
+ igb_stats->ipackets = rx_total;
+ igb_stats->opackets = tx_total;
+ igb_stats->imissed = tx_missed_total;
+ igb_stats->ibytes = rx_total_bytes;
+ igb_stats->obytes = tx_total_bytes;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+ unsigned i;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ internal->rx_vhost_queues[i]->rx_pkts = 0;
+ internal->rx_vhost_queues[i]->rx_bytes = 0;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ internal->tx_vhost_queues[i]->tx_pkts = 0;
+ internal->tx_vhost_queues[i]->tx_bytes = 0;
+ internal->tx_vhost_queues[i]->missed_pkts = 0;
+ }
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+ return;
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = eth_dev_start,
+ .dev_stop = eth_dev_stop,
+ .dev_configure = eth_dev_configure,
+ .dev_infos_get = eth_dev_info,
+ .rx_queue_setup = eth_rx_queue_setup,
+ .tx_queue_setup = eth_tx_queue_setup,
+ .rx_queue_release = eth_queue_release,
+ .tx_queue_release = eth_queue_release,
+ .link_update = eth_link_update,
+ .stats_get = eth_stats_get,
+ .stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+ char *iface_name,
+ int16_t queues,
+ const unsigned numa_node)
+{
+ struct rte_eth_dev_data *data = NULL;
+ struct pmd_internal *internal = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ struct ether_addr *eth_addr = NULL;
+
+ RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+ numa_node);
+
+ /* now do all data allocation - for the eth_dev structure
+ * and internal (private) data
+ */
+ data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+ if (data == NULL)
+ goto error;
+
+ internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+ if (internal == NULL)
+ goto error;
+
+ eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+ if (eth_addr == NULL)
+ goto error;
+ *eth_addr = base_eth_addr;
+ eth_addr->addr_bytes[5] = index;
+
+ /* reserve an ethdev entry */
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ goto error;
+
+ /* now put it all together
+ * - store queue data in internal,
+ * - store numa_node info in ethdev data
+ * - point eth_dev_data to internals
+ * - and point eth_dev structure to new eth_dev_data structure
+ */
+ internal->nb_rx_queues = queues;
+ internal->nb_tx_queues = queues;
+ internal->dev_name = strdup(name);
+ if (internal->dev_name == NULL)
+ goto error;
+ internal->iface_name = strdup(iface_name);
+ if (internal->iface_name == NULL) {
+ free(internal->dev_name);
+ goto error;
+ }
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_INSERT_TAIL(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ data->dev_private = internal;
+ data->port_id = eth_dev->data->port_id;
+ memmove(data->name, eth_dev->data->name, sizeof(data->name));
+ data->nb_rx_queues = queues;
+ data->nb_tx_queues = queues;
+ data->dev_link = pmd_link;
+ data->mac_addrs = eth_addr;
+
+ /* We'll replace the 'data' originally allocated by eth_dev, so the
+ * vhost PMD resources won't be shared between multiple processes.
+ */
+ eth_dev->data = data;
+ eth_dev->dev_ops = &ops;
+ eth_dev->driver = NULL;
+ eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+ eth_dev->data->kdrv = RTE_KDRV_NONE;
+ eth_dev->data->drv_name = internal->dev_name;
+ eth_dev->data->numa_node = numa_node;
+
+ /* finally assign rx and tx ops */
+ eth_dev->rx_pkt_burst = eth_vhost_rx;
+ eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+ return data->port_id;
+
+error:
+ rte_free(data);
+ rte_free(internal);
+ rte_free(eth_addr);
+
+ return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **iface_name = extra_args;
+
+ if (value == NULL)
+ return -1;
+
+ *iface_name = value;
+
+ return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint16_t *q = extra_args;
+
+ if ((value == NULL) || (extra_args == NULL))
+ return -EINVAL;
+
+ *q = (uint16_t)strtoul(value, NULL, 0);
+ if ((*q == USHRT_MAX) && (errno == ERANGE))
+ return -1;
+
+ if (*q > RTE_MAX_QUEUES_PER_PORT)
+ return -1;
+
+ return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ int ret = 0;
+ int index;
+ char *iface_name;
+ uint16_t queues;
+
+ RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+ if (strlen(name) < strlen("eth_vhost"))
+ return -1;
+
+ index = strtol(name + strlen("eth_vhost"), NULL, 0);
+ if (errno == ERANGE)
+ return -1;
+
+ kvlist = rte_kvargs_parse(params, valid_arguments);
+ if (kvlist == NULL)
+ return -1;
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+ &open_iface, &iface_name);
+ if (ret < 0)
+ goto out_free;
+ } else {
+ ret = -1;
+ goto out_free;
+ }
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+ &open_queues, &queues);
+ if (ret < 0)
+ goto out_free;
+
+ } else
+ queues = 1;
+
+ eth_dev_vhost_create(name, index,
+ iface_name, queues, rte_socket_id());
+
+out_free:
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+ struct rte_eth_dev *eth_dev = NULL;
+ struct pmd_internal *internal;
+ unsigned int i;
+
+ RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+ if (name == NULL)
+ return -EINVAL;
+
+ /* find an ethdev entry */
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev == NULL)
+ return -ENODEV;
+
+ internal = eth_dev->data->dev_private;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_REMOVE(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ eth_dev_stop(eth_dev);
+
+ if ((internal) && (internal->dev_name))
+ free(internal->dev_name);
+ if ((internal) && (internal->iface_name))
+ free(internal->iface_name);
+
+ rte_free(eth_dev->data->mac_addrs);
+ rte_free(eth_dev->data);
+
+ for (i = 0; i < internal->nb_rx_queues; i++)
+ rte_free(internal->rx_vhost_queues[i]);
+ for (i = 0; i < internal->nb_tx_queues; i++)
+ rte_free(internal->tx_vhost_queues[i]);
+ rte_free(internal);
+
+ rte_eth_dev_release_port(eth_dev);
+
+ return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+ global:
+
+ rte_eth_vhost_portid2vdev;
+
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 148653e..542df30 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -151,7 +151,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null

-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)

endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
--
2.1.4
Tetsuya Mukawa
2015-11-24 09:00:03 UTC
This helper function converts a port id to a virtio device pointer. To
use this function, the port must be managed by the vhost PMD. The
returned virtio device pointer can then be used to call vhost library
APIs. However, some library APIs must not be called together with the
vhost PMD:

- rte_vhost_driver_session_start()
- rte_vhost_driver_unregister()

The above APIs will not work with the vhost PMD.

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
drivers/net/vhost/Makefile | 5 +++
drivers/net/vhost/rte_eth_vhost.c | 25 +++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++++++++++++++++++++++++++++++++++++++
3 files changed, 95 insertions(+)
create mode 100644 drivers/net/vhost/rte_eth_vhost.h

diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
index 8bec47a..8186a80 100644
--- a/drivers/net/vhost/Makefile
+++ b/drivers/net/vhost/Makefile
@@ -48,6 +48,11 @@ LIBABIVER := 1
#
SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c

+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
# this lib depends upon:
DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 9ef05bc..bfe1f18 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -41,6 +41,8 @@
#include <rte_kvargs.h>
#include <rte_virtio_net.h>

+#include "rte_eth_vhost.h"
+
#define ETH_VHOST_IFACE_ARG "iface"
#define ETH_VHOST_QUEUES_ARG "queues"

@@ -768,4 +770,27 @@ static struct rte_driver pmd_vhost_drv = {
.uninit = rte_pmd_vhost_devuninit,
};

+struct virtio_net *
+rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+ struct rte_eth_dev *eth_dev;
+
+ if (rte_eth_dev_is_valid_port(port_id) == 0)
+ return NULL;
+
+ eth_dev = &rte_eth_devices[port_id];
+ if (strncmp("eth_vhost", eth_dev->data->drv_name,
+ strlen("eth_vhost")) == 0) {
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+
+ internal = eth_dev->data->dev_private;
+ vq = internal->rx_vhost_queues[0];
+ if ((vq != NULL) && (vq->device != NULL))
+ return vq->device;
+ }
+
+ return NULL;
+}
+
PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..22a880f
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 IGEL Co., Ltd.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co., Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts the specified port_id to a virtio device structure.
+ * The returned device can be used with vhost library APIs.
+ * To use vhost library APIs and the vhost PMD in parallel, the API below
+ * should not be called, because it is already called by the vhost PMD.
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by the vhost PMD, the API below should not be called.
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, call the Port Hotplug APIs instead.
+ *
+ * @param port_id
+ * port number
+ * @return
+ * virtio net device structure corresponding to the specified port
+ * NULL will be returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--
2.1.4
Yuanhan Liu
2015-12-17 11:47:50 UTC
Post by Tetsuya Mukawa
This helper function is used to convert port id to virtio device
pointer. To use this function, a port should be managed by vhost PMD.
After getting virtio device pointer, it can be used for calling vhost
library APIs.
I'm thinking why is that necessary. I mean, hey, can we simply treat
it as a normal pmd driver, and don't consider any vhost lib functions
any more while using vhost pmd?

--yliu
Tetsuya Mukawa
2015-12-18 03:15:49 UTC
Post by Yuanhan Liu
Post by Tetsuya Mukawa
This helper function is used to convert port id to virtio device
pointer. To use this function, a port should be managed by vhost PMD.
After getting virtio device pointer, it can be used for calling vhost
library APIs.
I'm thinking why is that necessary. I mean, hey, can we simply treat
it as a normal pmd driver, and don't consider any vhost lib functions
any more while using vhost pmd?
I guess the vhost PMD cannot hide some of the vhost features.
Because of this, we may need to add ethdev APIs to wrap these features.
I described this in more detail in another email. Could you please take
a look at that one as well?

Thanks,
Tetsuya

Tetsuya Mukawa
2015-12-08 01:12:52 UTC
Hi Xie and Yuanhan,

Please let me make sure whether this patch is differed.
If it is differed, I guess I may need to add ABI breakage notice before
releasing DPDK-2.2, because the patches change the virtio_net structure.

Tetsuya,
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.
- Rebase on latest master.
- Fix RX/TX routine to count RX/TX bytes.
- Fix RX/TX routine not to count as error packets if enqueue/dequeue
cannot send all packets.
- Fix if-condition checking for multiqueues.
- Add "static" to pthread variable.
- Fix format.
- Change default behavior not to receive queueing event from driver.
- Split the patch to separate rte_eth_vhost_portid2vdev().
- Rebase on latest DPDK tree.
- Fix coding style.
- Fix code not to invoke multiple messaging handling threads.
- Fix code to handle vdev parameters correctly.
- Remove needless cast.
- Remove needless if-condition before rt_free().
- Rebase on latest master.
- Specify correct queue_id in RX/TX function.
- Remove a below patch that fixes vhost library.
The patch was applied as a separate patch.
- vhost: fix crash with multiqueue enabled
- Fix typos.
(Thanks to Thomas, Monjalon)
- Rebase on latest tree with above bernard's patches.
- Support vhost multiple queues.
- Rebase on "remove pci driver from vdevs".
- Optimize RX/TX functions.
- Fix resource leaks.
- Fix compile issue.
- Add patch to fix vhost library.
- Optimize performance.
In RX/TX functions, change code to access only per-core data.
- Add below API to allow user to use vhost library APIs for a port managed
by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
- rte_eth_vhost_portid2vdev()
To support this functionality, vhost library is also changed.
Anyway, if users don't use the vhost PMD, they can fully use the vhost library APIs.
- Add code to support vhost multiple queues.
Actually, multiple queues functionality is not enabled so far.
- Fix issues reported by checkpatch.pl
(Thanks to Stephen Hemminger)
vhost: Add callback and private data for vhost PMD
vhost: Add VHOST PMD
vhost: Add helper function to convert port id to virtio device pointer
config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 ++
drivers/net/vhost/rte_eth_vhost.c | 796 ++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
lib/librte_vhost/rte_vhost_version.map | 6 +
lib/librte_vhost/rte_virtio_net.h | 3 +
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +-
lib/librte_vhost/virtio-net.c | 60 +-
lib/librte_vhost/virtio-net.h | 4 +-
mk/rte.app.mk | 8 +-
14 files changed, 1024 insertions(+), 14 deletions(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
Yuanhan Liu
2015-12-08 02:03:48 UTC
Permalink
Post by Tetsuya Mukawa
Hi Xie and Yuanhan,
Please let me make sure whether this patch is differed.
If it is differed, I guess I may need to add ABI breakage notice before
Tetsuya,

What do you mean by "differed"? Do you mean "delayed"?

Per my understanding, it's a bit late for v2.2 (even a few weeks
before). On the other hand, I'm still waiting for comments from
Huawei, as there are still one or two issues that need more discussion.
Post by Tetsuya Mukawa
releasing DPDK-2.2, because the patches change the virtio_net structure.
I had sent a patch (which is just applied by Thomas) for reserving
some spaces for both virtio_net and vhost_virtqueue structure, so
it will not break anything if you simply add a few more fields :)

--yliu
Tetsuya Mukawa
2015-12-08 02:10:31 UTC
Permalink
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Hi Xie and Yuanhan,
Please let me make sure whether this patch is differed.
If it is differed, I guess I may need to add ABI breakage notice before
Tetsuya,
What do you mean by "differed"? Do you mean "delayed"?
Hi Yuanhan,

I just guess the patch will not be merged in DPDK-2.2.
Post by Yuanhan Liu
Per my understanding, it's a bit late for v2.2 (even a few weeks
before). On the other hand, I'm still waiting for comments from
Huawei, as there are still one or two issues that need more discussion.
Yes, I agree with you.
Post by Yuanhan Liu
Post by Tetsuya Mukawa
releasing DPDK-2.2, because the patches change the virtio_net structure.
I had sent a patch (which is just applied by Thomas) for reserving
some spaces for both virtio_net and vhost_virtqueue structure, so
it will not break anything if you simply add a few more fields :)
Sounds like a great idea!
Thanks for handling the virtio things.

Tetsuya,
Tetsuya Mukawa
2015-11-13 05:20:30 UTC
Permalink
These variables are needed to be able to manage one of the virtio devices
using both the vhost library APIs and the vhost PMD.
For example, if the vhost PMD uses the current callback handler and private
data provided by the vhost library, a DPDK application that links the vhost
library cannot use some of the vhost library APIs. To avoid this, a callback
and private data dedicated to the vhost PMD are needed.

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
lib/librte_vhost/rte_vhost_version.map | 6 +++
lib/librte_vhost/rte_virtio_net.h | 3 ++
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++---
lib/librte_vhost/virtio-net.c | 60 +++++++++++++++++++++++++--
lib/librte_vhost/virtio-net.h | 4 +-
5 files changed, 73 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
rte_vhost_driver_unregister;

} DPDK_2.0;
+
+DPDK_2.2 {
+ global:
+
+ rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
char ifname[IF_NAME_SZ]; /**< Name of the tap device or socket path. */
uint32_t virt_qp_nb; /**< number of queue pair we have allocated */
void *priv; /**< private context */
+ void *pmd_priv; /**< private context for vhost PMD */
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2]; /**< Contains all virtqueue information. */
} __rte_cache_aligned;

@@ -224,6 +225,8 @@ int rte_vhost_driver_unregister(const char *dev_name);

/* Register callbacks. */
int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
/* Start vhost driver session blocking loop. */
int rte_vhost_driver_session_start(void);

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..d8ae2fc 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -111,7 +111,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

/* Remove from the data plane. */
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

if (dev->mem) {
free_mem_region(dev);
@@ -272,7 +272,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

if (virtio_is_ready(dev) &&
!(dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->new_device(dev);
+ notify_new_device(dev);
}

/*
@@ -288,7 +288,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
return -1;
/* We have to stop the queue (virtio) if it is running. */
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

/* Here we are safe to get the last used index */
ops->get_vring_base(ctx, state->index, state);
@@ -324,10 +324,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
"set queue enable: %d to qp idx: %d\n",
enable, state->index);

- if (notify_ops->vring_state_changed) {
- notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
- enable);
- }
+ notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM, enable);

dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
@@ -341,7 +338,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
struct virtio_net *dev = get_device(ctx);

if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

if (dev && dev->mem) {
free_mem_region(dev);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index cc917da..886c104 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -65,6 +65,8 @@ struct virtio_net_config_ll {

/* device ops to add/remove device to/from data core. */
struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
/* root address of the linked list of managed virtio devices */
static struct virtio_net_config_ll *ll_root;

@@ -81,6 +83,45 @@ static struct virtio_net_config_ll *ll_root;
static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;


+int
+notify_new_device(struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+ int ret = pmd_notify_ops->new_device(dev);
+
+ if (ret != 0)
+ return ret;
+ }
+ if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+ return notify_ops->new_device(dev);
+
+ return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+ pmd_notify_ops->destroy_device(dev);
+ if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+ notify_ops->destroy_device(dev);
+}
+
+int
+notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
+{
+ if ((pmd_notify_ops != NULL) && (pmd_notify_ops->vring_state_changed != NULL)) {
+ int ret = pmd_notify_ops->vring_state_changed(dev, queue_id, enable);
+
+ if (ret != 0)
+ return ret;
+ }
+ if ((notify_ops != NULL) && (notify_ops->vring_state_changed != NULL))
+ return notify_ops->vring_state_changed(dev, queue_id, enable);
+
+ return 0;
+}
+
/*
* Converts QEMU virtual address to Vhost virtual address. This function is
* used to convert the ring addresses to our address space.
@@ -374,7 +415,7 @@ destroy_device(struct vhost_device_ctx ctx)
* the function to remove it from the data core.
*/
if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
- notify_ops->destroy_device(&(ll_dev_cur->dev));
+ notify_destroy_device(&(ll_dev_cur->dev));
ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
ll_dev_last);
} else {
@@ -432,7 +473,7 @@ reset_owner(struct vhost_device_ctx ctx)
return -1;

if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);

cleanup_device(dev);
reset_device(dev);
@@ -790,12 +831,12 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
- return notify_ops->new_device(dev);
+ return notify_new_device(dev);
}
/* Otherwise we remove it. */
} else
if (file->fd == VIRTIO_DEV_STOPPED)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);
return 0;
}

@@ -879,3 +920,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op

return 0;
}
+
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+ pmd_notify_ops = ops;
+
+ return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index 75fb57e..0816e71 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -37,7 +37,9 @@
#include "vhost-net.h"
#include "rte_virtio_net.h"

-struct virtio_net_device_ops const *notify_ops;
struct virtio_net *get_device(struct vhost_device_ctx ctx);

+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
+int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable);
#endif
--
2.1.4
Yuanhan Liu
2015-11-17 13:29:33 UTC
Permalink
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs.
Can you be more specific about this?

--yliu
Tetsuya Mukawa
2015-11-19 02:03:50 UTC
Permalink
Post by Yuanhan Liu
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs.
Can you be more specific about this?
--yliu
How about like below?

commit log:
Currently, when a virtio device is created or destroyed, the vhost
library calls one of the callback handlers.
The vhost PMD needs to use this pair of callback handlers to know
which virtio devices are actually connected.
Because we can register only one pair of callbacks with the vhost
library, if the PMD uses it, DPDK applications have no way to know
about these events. This may break legacy DPDK applications that use
the vhost library.
To prevent this, this patch adds one more pair of callbacks to the
vhost library, especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if
they need additional specific handling for virtio device creation
and destruction.
For example, a legacy application can call
rte_vhost_enable_guest_notification() in the callbacks to change the
setting.

Tetsuya
Yuanhan Liu
2015-11-19 02:18:50 UTC
Permalink
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs.
Can you be more specific about this?
--yliu
How about like below?
Currently, when virtio device is created and destroyed, vhost library
will call one of callback handlers.
The vhost PMD need to use this pair of callback handlers to know which
virtio devices are connected actually.
Because we can register only one pair of callbacks to vhost library, if
the PMD use it, DPDK applications
cannot have a way to know the events.
Will (and why) the two co-exist at the same time?

--yliu
Post by Tetsuya Mukawa
This may break legacy DPDK
applications that uses vhost library.
To prevent it, this patch adds one more pair of callbacks to vhost
library especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling
for virtio device creation and destruction.
For example, legacy application can call
rte_vhost_enable_guest_notification() in callbacks to change setting.
Tetsuya
Tetsuya Mukawa
2015-11-19 03:13:38 UTC
Permalink
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs.
Can you be more specific about this?
--yliu
How about like below?
Currently, when virtio device is created and destroyed, vhost library
will call one of callback handlers.
The vhost PMD need to use this pair of callback handlers to know which
virtio devices are connected actually.
Because we can register only one pair of callbacks to vhost library, if
the PMD use it, DPDK applications
cannot have a way to know the events.
Will (and why) the two co-exist at same time?
Yes, they will. Sure, I will describe the following in the commit log.

Because we cannot map some of the vhost library APIs to ethdev APIs, in
some cases we still need to use vhost library APIs for a port created by
the vhost PMD. One example is
rte_vhost_enable_guest_notification().

Thanks,
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
This may break legacy DPDK
applications that uses vhost library.
To prevent it, this patch adds one more pair of callbacks to vhost
library especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling
for virtio device creation and destruction.
For example, legacy application can call
rte_vhost_enable_guest_notification() in callbacks to change setting.
Tetsuya
Yuanhan Liu
2015-11-19 03:33:30 UTC
Permalink
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs.
Can you be more specific about this?
--yliu
How about like below?
Currently, when virtio device is created and destroyed, vhost library
will call one of callback handlers.
The vhost PMD need to use this pair of callback handlers to know which
virtio devices are connected actually.
Because we can register only one pair of callbacks to vhost library, if
the PMD use it, DPDK applications
cannot have a way to know the events.
Will (and why) the two co-exist at same time?
Yes it is. Sure, I will describe below in commit log.
Because we cannot map some of vhost library APIs to ethdev APIs, in some
cases, we still
need to use vhost library APIs for a port created by the vhost PMD. One
of example is
rte_vhost_enable_guest_notification().
I don't get why it has something to do with a standalone PMD callback.
And if you don't call rte_vhost_enable_guest_notification() inside the
vhost PMD, where else can you call it? I mean, you can't start vhost-pmd
and vhost-switch at the same time, right?

And the PMD callback and the old notify callback will not exist at the
same time in any one case, right? If so, why is that needed?

BTW, if it's a MUST, would you provide a specific example?


--yliu
Post by Tetsuya Mukawa
Thanks,
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
This may break legacy DPDK
applications that uses vhost library.
To prevent it, this patch adds one more pair of callbacks to vhost
library especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling
for virtio device creation and destruction.
For example, legacy application can call
rte_vhost_enable_guest_notification() in callbacks to change setting.
Tetsuya
Tetsuya Mukawa
2015-11-19 05:14:13 UTC
Permalink
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs.
Can you be more specific about this?
--yliu
How about like below?
Currently, when virtio device is created and destroyed, vhost library
will call one of callback handlers.
The vhost PMD need to use this pair of callback handlers to know which
virtio devices are connected actually.
Because we can register only one pair of callbacks to vhost library, if
the PMD use it, DPDK applications
cannot have a way to know the events.
Will (and why) the two co-exist at same time?
Yes it is. Sure, I will describe below in commit log.
Because we cannot map some of vhost library APIs to ethdev APIs, in some
cases, we still
need to use vhost library APIs for a port created by the vhost PMD. One
of example is
rte_vhost_enable_guest_notification().
I don't get it why it has something to do with a standalone PMD callback.
And if you don't call rte_vhost_enable_guest_notification() inside vhost
PMD, where else can you call that? I mean, you can't start vhost-pmd
and vhost-swithc in the mean time, right?
No, that's not true; even after connecting to the virtio-net device, you
can change the flag.
It's just a hint for the virtio-net driver, and it will be used while
queuing.
(We may be able to change the flag even while sending or receiving
packets.)
Post by Yuanhan Liu
And, pmd callback and the old notify callback will not exist at same
time in one case, right? If so, why is that needed?
BTW, if it's a MUST, would you provide a specific example?
Actually, this patch is not a MUST.

But the users still need callback handlers to know when a virtio-net
device is connected or disconnected.
This is because the user can call rte_vhost_enable_guest_notification()
only while a connection is established.

Probably we can use the link status changed callback of the PMD for
this purpose.
(The vhost PMD will notify the DPDK application using the link status
callback.)

But I am not sure whether we need to implement a link status changed
callback for this purpose.
While processing this callback handler, the users will only call vhost
library APIs that ethdev APIs cannot map, or store some variables
related to the vhost library.
If so, this callback handler itself is specific to using the vhost
library, and it may be OK for the callback itself to be implemented as
one of the vhost library APIs.

Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
Thanks,
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
This may break legacy DPDK
applications that uses vhost library.
To prevent it, this patch adds one more pair of callbacks to vhost
library especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling
for virtio device creation and destruction.
For example, legacy application can call
rte_vhost_enable_guest_notification() in callbacks to change setting.
Tetsuya
Yuanhan Liu
2015-11-19 05:45:36 UTC
Permalink
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs.
Can you be more specific about this?
--yliu
How about like below?
Currently, when virtio device is created and destroyed, vhost library
will call one of callback handlers.
The vhost PMD need to use this pair of callback handlers to know which
virtio devices are connected actually.
Because we can register only one pair of callbacks to vhost library, if
the PMD use it, DPDK applications
cannot have a way to know the events.
Will (and why) the two co-exist at same time?
Yes it is. Sure, I will describe below in commit log.
Because we cannot map some of vhost library APIs to ethdev APIs, in some
cases, we still
need to use vhost library APIs for a port created by the vhost PMD. One
of example is
rte_vhost_enable_guest_notification().
I don't get it why it has something to do with a standalone PMD callback.
And if you don't call rte_vhost_enable_guest_notification() inside vhost
PMD, where else can you call that? I mean, you can't start vhost-pmd
and vhost-swithc in the mean time, right?
No it's not true, even after connecting to virtio-net device, you can
change the flag.
It's just a hint for virtio-net driver, and it will be used while queuing.
(We may be able to change the flag, even while sending or receiving packets)
Post by Yuanhan Liu
And, pmd callback and the old notify callback will not exist at same
time in one case, right? If so, why is that needed?
BTW, if it's a MUST, would you provide a specific example?
Actually, this patch is not a MUST.
But still the users need callback handlers to know when virtio-net
device is connected or disconnected.
This is because the user can call rte_vhost_enable_guest_notification()
only while connection is established.
What does "the user" mean? Is there a second user of the vhost lib
besides the vhost PMD who has to interact with those connected devices?
If so, how?

--yliu
Post by Tetsuya Mukawa
Probably we can use link status changed callback of the PMD for this
purpose.
(The vhost PMD will notice DPDK application using link status callback)
But I am not sure whether we need to implement link status changed
callback for this purpose.
While processing this callback handler, the users will only calls vhost
library APIs that ethdev API cannot map, or store some variables related
with vhost library.
If so, this callback handler itself is specific for using vhost library.
And it may be ok that callback itself is implemented as one of vhost
library APIs.
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
Thanks,
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
This may break legacy DPDK
applications that uses vhost library.
To prevent it, this patch adds one more pair of callbacks to vhost
library especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling
for virtio device creation and destruction.
For example, legacy application can call
rte_vhost_enable_guest_notification() in callbacks to change setting.
Tetsuya
Tetsuya Mukawa
2015-11-19 05:58:56 UTC
Permalink
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs.
Can you be more specific about this?
--yliu
How about like below?
Currently, when virtio device is created and destroyed, vhost library
will call one of callback handlers.
The vhost PMD need to use this pair of callback handlers to know which
virtio devices are connected actually.
Because we can register only one pair of callbacks to vhost library, if
the PMD use it, DPDK applications
cannot have a way to know the events.
Will (and why) the two co-exist at same time?
Yes it is. Sure, I will describe below in commit log.
Because we cannot map some of vhost library APIs to ethdev APIs, in some
cases, we still
need to use vhost library APIs for a port created by the vhost PMD. One
of example is
rte_vhost_enable_guest_notification().
I don't get it why it has something to do with a standalone PMD callback.
And if you don't call rte_vhost_enable_guest_notification() inside vhost
PMD, where else can you call that? I mean, you can't start vhost-pmd
and vhost-swithc in the mean time, right?
No it's not true, even after connecting to virtio-net device, you can
change the flag.
It's just a hint for virtio-net driver, and it will be used while queuing.
(We may be able to change the flag, even while sending or receiving packets)
Post by Yuanhan Liu
And, pmd callback and the old notify callback will not exist at same
time in one case, right? If so, why is that needed?
BTW, if it's a MUST, would you provide a specific example?
Actually, this patch is not a MUST.
But still the users need callback handlers to know when virtio-net
device is connected or disconnected.
This is because the user can call rte_vhost_enable_guest_notification()
only while connection is established.
What does "the user" mean? Is there a second user of vhost lib besides
vhost PMD, that he has to interact with those connected devices? If so,
how?
Sorry, my English was wrong.
Not a second user.

For example, if a DPDK application has a port created by the vhost PMD,
it then needs to call rte_vhost_enable_guest_notification() on the port.
The DPDK application needs to know when the virtio-net device is
connected or disconnected, because the function is only valid while
connected.
But without a callback handler, the DPDK application cannot know this.

This is what I wanted to explain.

Thanks,
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
Probably we can use link status changed callback of the PMD for this
purpose.
(The vhost PMD will notice DPDK application using link status callback)
But I am not sure whether we need to implement link status changed
callback for this purpose.
While processing this callback handler, the users will only calls vhost
library APIs that ethdev API cannot map, or store some variables related
with vhost library.
If so, this callback handler itself is specific for using vhost library.
And it may be ok that callback itself is implemented as one of vhost
library APIs.
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
Thanks,
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
This may break legacy DPDK
applications that uses vhost library.
To prevent it, this patch adds one more pair of callbacks to vhost
library especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling
for virtio device creation and destruction.
For example, legacy application can call
rte_vhost_enable_guest_notification() in callbacks to change setting.
Tetsuya
Yuanhan Liu
2015-11-19 06:31:37 UTC
Permalink
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of virtio devices
using both vhost library APIs and vhost PMD.
For example, if vhost PMD uses current callback handler and private data
provided by vhost library, A DPDK application that links vhost library
cannot use some of vhost library APIs.
Can you be more specific about this?
--yliu
How about like below?
Currently, when virtio device is created and destroyed, vhost library
will call one of callback handlers.
The vhost PMD need to use this pair of callback handlers to know which
virtio devices are connected actually.
Because we can register only one pair of callbacks to vhost library, if
the PMD use it, DPDK applications
cannot have a way to know the events.
Will (and why) the two co-exist at same time?
Yes it is. Sure, I will describe below in commit log.
Because we cannot map some of vhost library APIs to ethdev APIs, in some
cases, we still
need to use vhost library APIs for a port created by the vhost PMD. One
of example is
rte_vhost_enable_guest_notification().
I don't get it why it has something to do with a standalone PMD callback.
And if you don't call rte_vhost_enable_guest_notification() inside vhost
PMD, where else can you call that? I mean, you can't start vhost-pmd
and vhost-swithc in the mean time, right?
No it's not true, even after connecting to virtio-net device, you can
change the flag.
It's just a hint for virtio-net driver, and it will be used while queuing.
(We may be able to change the flag, even while sending or receiving packets)
Post by Yuanhan Liu
And, pmd callback and the old notify callback will not exist at same
time in one case, right? If so, why is that needed?
BTW, if it's a MUST, would you provide a specific example?
Actually, this patch is not a MUST.
But still the users need callback handlers to know when virtio-net
device is connected or disconnected.
This is because the user can call rte_vhost_enable_guest_notification()
only while connection is established.
What does "the user" mean? Is there a second user of vhost lib besides
vhost PMD, that he has to interact with those connected devices? If so,
how?
Sorry, my English is wrong.
Not a second user.
For example, If DPDK application has a port created by vhost PMD, then
needs to call rte_vhost_enable_guest_notification() to the port.
So, you are mixing the usage of vhost PMD and vhost lib in a DPDK
application? Say,

DPDK application
start_vhost_pmd
rte_vhost_driver_pmd_callback_register
rte_vhost_driver_callback_register

I know little about the PMD, and I'm not quite sure it's a good combo.

Huawei, comments?

--yliu
Post by Tetsuya Mukawa
DPDK application needs to know when virtio-net device is connected or
disconnected, because the function is only valid while connecting.
But without callback handler, DPDK application cannot know it.
This is what I wanted to explain.
Thanks,
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
Probably we can use the link status changed callback of the PMD for this
purpose.
(The vhost PMD would notify the DPDK application using the link status callback.)
But I am not sure whether we need to implement a link status changed
callback for this purpose.
While processing this callback handler, the user will only call vhost
library APIs that the ethdev API cannot map, or store some variables related
to the vhost library.
If so, this callback handler itself is specific to using the vhost library.
And it may be OK that the callback itself is implemented as one of the vhost
library APIs.
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
Thanks,
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
This may break legacy DPDK
applications that use the vhost library.
To prevent that, this patch adds one more pair of callbacks to the vhost
library, especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling
for virtio device creation and destruction.
For example, a legacy application can call
rte_vhost_enable_guest_notification() in the callbacks to change the setting.
Tetsuya
Tetsuya Mukawa
2015-11-19 06:37:50 UTC
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
Post by Yuanhan Liu
Post by Tetsuya Mukawa
These variables are needed to be able to manage one of the virtio devices
using both vhost library APIs and the vhost PMD.
For example, if the vhost PMD uses the current callback handler and private data
provided by the vhost library, a DPDK application that links the vhost library
cannot use some of the vhost library APIs.
Can you be more specific about this?
--yliu
How about like below?
Currently, when a virtio device is created or destroyed, the vhost library
will call one of the callback handlers.
The vhost PMD needs to use this pair of callback handlers to know which
virtio devices are actually connected.
Because we can register only one pair of callbacks to the vhost library, if
the PMD uses them, DPDK applications
have no way to know the events.
Will (and why) the two co-exist at the same time?
Yes, they will. Sure, I will describe this below in the commit log.
Because we cannot map some of the vhost library APIs to ethdev APIs, in some
cases we still need to use vhost library APIs for a port created by the vhost
PMD. One example is rte_vhost_enable_guest_notification().
I don't get why it has something to do with a standalone PMD callback.
And if you don't call rte_vhost_enable_guest_notification() inside the vhost
PMD, where else can you call it? I mean, you can't start vhost-pmd
and vhost-switch at the same time, right?
No, that's not true; even after connecting to the virtio-net device, you can
change the flag.
It's just a hint for the virtio-net driver, and it is used while queuing.
(We may be able to change the flag even while sending or receiving packets.)
Post by Yuanhan Liu
And, the PMD callback and the old notify callback will not exist at the same
time in one case, right? If so, why is that needed?
BTW, if it's a MUST, would you provide a specific example?
Actually, this patch is not a MUST.
But the users still need callback handlers to know when a virtio-net
device is connected or disconnected.
This is because the user can call rte_vhost_enable_guest_notification()
only while a connection is established.
What does "the user" mean? Is there a second user of the vhost lib besides
the vhost PMD that has to interact with those connected devices? If so,
how?
Sorry, my English was wrong.
Not a second user.
For example, if a DPDK application has a port created by the vhost PMD, it
needs to call rte_vhost_enable_guest_notification() on that port.
So, you are mixing the usage of vhost PMD and vhost lib in a DPDK
application? Say,
Yes, that is my intention.
Using ethdev (PMD) APIs and some library-specific APIs on the same port is
done in the bonding PMD as well.

Thanks,
Tetsuya
Post by Yuanhan Liu
DPDK application
start_vhost_pmd
rte_vhost_driver_pmd_callback_register
rte_vhost_driver_callback_register
I know little about PMDs, and I'm not quite sure it's a good combo.
Huawei, comments?
--yliu
Post by Tetsuya Mukawa
A DPDK application needs to know when the virtio-net device is connected or
disconnected, because the function is only valid while connected.
But without a callback handler, the DPDK application cannot know it.
This is what I wanted to explain.
Thanks,
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
Probably we can use the link status changed callback of the PMD for this
purpose.
(The vhost PMD would notify the DPDK application using the link status callback.)
But I am not sure whether we need to implement a link status changed
callback for this purpose.
While processing this callback handler, the user will only call vhost
library APIs that the ethdev API cannot map, or store some variables related
to the vhost library.
If so, this callback handler itself is specific to using the vhost library.
And it may be OK that the callback itself is implemented as one of the vhost
library APIs.
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
Thanks,
Tetsuya
Post by Yuanhan Liu
--yliu
Post by Tetsuya Mukawa
This may break legacy DPDK
applications that use the vhost library.
To prevent that, this patch adds one more pair of callbacks to the vhost
library, especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling
for virtio device creation and destruction.
For example, a legacy application can call
rte_vhost_enable_guest_notification() in the callbacks to change the setting.
Tetsuya
Yuanhan Liu
2015-11-09 05:42:35 UTC
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.
* Known issue.
We may see issues while handling the RESET_OWNER message.
This handling is done in the vhost library, so it is not a part of the vhost PMD.
So far, we are waiting for a QEMU fix.
I will try to fix them this week.

--yliu
Post by Tetsuya Mukawa
- Rebase on latest master
- Specify correct queue_id in RX/TX function.
- Remove a below patch that fixes vhost library.
The patch was applied as a separate patch.
- vhost: fix crash with multiqueue enabled
- Fix typos.
(Thanks to Thomas Monjalon)
- Rebase on latest tree with above Bernard's patches.
- Support vhost multiple queues.
- Rebase on "remove pci driver from vdevs".
- Optimize RX/TX functions.
- Fix resource leaks.
- Fix compile issue.
- Add patch to fix vhost library.
- Optimize performance.
In RX/TX functions, change code to access only per core data.
- Add below API to allow the user to use vhost library APIs for a port managed
by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
- rte_eth_vhost_portid2vdev()
To support this functionality, the vhost library is also changed.
Anyway, if users don't use the vhost PMD, they can fully use the vhost library APIs.
- Add code to support vhost multiple queues.
Actually, the multiple queues functionality is not enabled so far.
- Fix issues reported by checkpatch.pl
(Thanks to Stephen Hemminger)
vhost: Add callback and private data for vhost PMD
vhost: Add VHOST PMD
config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 768 ++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
lib/librte_vhost/rte_vhost_version.map | 6 +
lib/librte_vhost/rte_virtio_net.h | 3 +
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +-
lib/librte_vhost/virtio-net.c | 56 +-
lib/librte_vhost/virtio-net.h | 4 +-
mk/rte.app.mk | 8 +-
14 files changed, 993 insertions(+), 13 deletions(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
--
2.1.4
Tetsuya Mukawa
2015-11-02 03:58:57 UTC
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost, which means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.

The PMD has 2 parameters.
- iface: The parameter is used to specify a path to connect to a
virtio-net device.
- queues: The parameter is used to specify the number of the queues
virtio-net device has.
(Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the above testpmd, here is a QEMU command example.

$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
-device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/vhost.rst | 82 +++
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 765 ++++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
mk/rte.app.mk | 8 +-
10 files changed, 1002 insertions(+), 1 deletion(-)
create mode 100644 doc/guides/nics/vhost.rst
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index c1d4bbd..fd103e7 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -457,6 +457,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

#
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
#Compile Xen domain0 support
#
CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2d4936d..57d1041 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -47,6 +47,7 @@ Network Interface Controller Drivers
mlx4
mlx5
virtio
+ vhost
vmxnet3
pcap_ring

diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst
new file mode 100644
index 0000000..2ec8d79
--- /dev/null
+++ b/doc/guides/nics/vhost.rst
@@ -0,0 +1,82 @@
+.. BSD LICENSE
+ Copyright(c) 2015 IGEL Co., Ltd.
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in
+ the documentation and/or other materials provided with the
+ distribution.
+ * Neither the name of IGEL Co., Ltd. nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Poll Mode Driver that wraps vhost library
+=========================================
+
+This PMD is a thin wrapper of the DPDK vhost library.
+Users can handle virtqueues as normal DPDK ports.
+
+Vhost Implementation in DPDK
+----------------------------
+
+Please refer to the chapter "Vhost Library" of the Programmer's Guide for details of vhost.
+
+Features and Limitations of vhost PMD
+-------------------------------------
+
+In this release, the vhost PMD provides the basic functionality of packet reception and transmission.
+
+* It provides the function to convert a port_id to a pointer to the virtio_net device.
+ It allows the user to use the vhost library with the PMD in parallel.
+
+* It supports multiple queues.
+
+* It supports Port Hotplug functionality.
+
+* RX/TX need not be stopped when the user wants to stop the guest or the virtio-net driver on the guest.
+
+Vhost PMD with testpmd application
+----------------------------------
+
+This section demonstrates vhost PMD with testpmd DPDK sample application.
+
+#. Launch the testpmd with vhost PMD:
+
+ .. code-block:: console
+
+ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
+
+ Perform other basic DPDK preparations, such as enabling hugepages, here.
+ Please refer to the *DPDK Getting Started Guide* for detailed instructions.
+
+#. Launch the QEMU:
+
+ .. code-block:: console
+
+ qemu-system-x86_64 <snip>
+ -chardev socket,id=chr0,path=/tmp/sock0 \
+ -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
+ -device virtio-net-pci,netdev=net0
+
+ This command generates one virtio-net device for QEMU.
+ Once the device is recognized by the guest, the user can handle it as a normal
+ virtio-net device.
+ When the initialization process between the virtio-net driver and the vhost library is done, the port status in testpmd will change to linked up.
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 429dfe6..466c1de 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -58,6 +58,8 @@ New Features
* **Added port hotplug support to xenvirt.**


+* **Added vhost PMD.**
+
* **Removed the PCI device from vdev PMD's.**

* This change required modifications to librte_ether and all vdev and pdev PMD's.
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6da1ce2..66eb63d 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -50,5 +50,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
include $(RTE_SDK)/mk/rte.sharelib.mk
include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8186a80
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+# BSD LICENSE
+#
+# Copyright (c) 2010-2015 Intel Corporation.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..5e6da9a
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,765 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015 IGEL Co., Ltd.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co.,Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG "iface"
+#define ETH_VHOST_QUEUES_ARG "queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+ ETH_VHOST_IFACE_ARG,
+ ETH_VHOST_QUEUES_ARG,
+ NULL
+};
+
+static struct ether_addr base_eth_addr = {
+ .addr_bytes = {
+ 0x56 /* V */,
+ 0x48 /* H */,
+ 0x4F /* O */,
+ 0x53 /* S */,
+ 0x54 /* T */,
+ 0x00
+ }
+};
+
+struct vhost_queue {
+ struct virtio_net *device;
+ struct pmd_internal *internal;
+ struct rte_mempool *mb_pool;
+ rte_atomic32_t allow_queuing;
+ rte_atomic32_t while_queuing;
+ uint64_t rx_pkts;
+ uint64_t tx_pkts;
+ uint64_t err_pkts;
+};
+
+struct pmd_internal {
+ TAILQ_ENTRY(pmd_internal) next;
+ char *dev_name;
+ char *iface_name;
+ unsigned nb_rx_queues;
+ unsigned nb_tx_queues;
+
+ struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+ struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+ volatile uint16_t once;
+ pthread_t session_th;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+ TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct rte_eth_link pmd_link = {
+ .link_speed = 10000,
+ .link_duplex = ETH_LINK_FULL_DUPLEX,
+ .link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+ VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+ r->rx_pkts += nb_rx;
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ VIRTIO_RXQ, bufs, nb_bufs);
+
+ r->tx_pkts += nb_tx;
+ r->err_pkts += nb_bufs - nb_tx;
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+ int found = 0;
+ struct pmd_internal *internal;
+
+ if (ifname == NULL)
+ return NULL;
+
+ pthread_mutex_lock(&internal_list_lock);
+
+ TAILQ_FOREACH(internal, &internals_list, next) {
+ if (!strcmp(internal->iface_name, ifname)) {
+ found = 1;
+ break;
+ }
+ }
+
+ pthread_mutex_unlock(&internal_list_lock);
+
+ if (!found)
+ return NULL;
+
+ return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid argument\n");
+ return -1;
+ }
+
+ internal = find_internal_resource(dev->ifname);
+ if (internal == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid device name\n");
+ return -1;
+ }
+
+ if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
+ (dev->virt_qp_nb < internal->nb_tx_queues)) {
+ RTE_LOG(INFO, PMD, "Not enough queues\n");
+ return -1;
+ }
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
+ return -1;
+ }
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = dev;
+ vq->internal = internal;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = dev;
+ vq->internal = internal;
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->pmd_priv = eth_dev;
+ eth_dev->data->dev_link.link_status = 1;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "Invalid argument\n");
+ return;
+ }
+
+ eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
+ return;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ /* Wait until rx/tx_pkt_burst stops accessing vhost device */
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+
+ eth_dev->data->dev_link.link_status = 0;
+
+ dev->pmd_priv = NULL;
+ dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = internal->rx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = NULL;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = internal->tx_vhost_queues[i];
+ if (vq == NULL)
+ continue;
+ vq->device = NULL;
+ }
+
+ RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+ static struct virtio_net_device_ops *vhost_ops;
+
+ vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+ if (vhost_ops == NULL)
+ rte_panic("Can't allocate memory\n");
+
+ /* set vhost arguments */
+ vhost_ops->new_device = new_device;
+ vhost_ops->destroy_device = destroy_device;
+ if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+ rte_panic("Can't register callbacks\n");
+
+ /* start event handling */
+ rte_vhost_driver_session_start();
+
+ rte_free(vhost_ops);
+ pthread_exit(0);
+}
+
+static void vhost_driver_session_start(struct pmd_internal *internal)
+{
+ int ret;
+
+ ret = pthread_create(&internal->session_th,
+ NULL, vhost_driver_session, NULL);
+ if (ret)
+ rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(struct pmd_internal *internal)
+{
+ int ret;
+
+ ret = pthread_cancel(internal->session_th);
+ if (ret)
+ rte_panic("Can't cancel the thread\n");
+
+ ret = pthread_join(internal->session_th, NULL);
+ if (ret)
+ rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+ int ret;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+ ret = rte_vhost_driver_register(internal->iface_name);
+ if (ret)
+ return ret;
+
+ vhost_driver_session_start(internal);
+ }
+ return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
+ rte_vhost_driver_unregister(internal->iface_name);
+ vhost_driver_session_stop(internal);
+ }
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ if (internal->rx_vhost_queues[rx_queue_id] != NULL)
+ rte_free(internal->rx_vhost_queues[rx_queue_id]);
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+ return -ENOMEM;
+ }
+
+ vq->mb_pool = mb_pool;
+ internal->rx_vhost_queues[rx_queue_id] = vq;
+ dev->data->rx_queues[rx_queue_id] = vq;
+ return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+ struct vhost_queue *vq;
+
+ if (internal->tx_vhost_queues[tx_queue_id] != NULL)
+ rte_free(internal->tx_vhost_queues[tx_queue_id]);
+
+ vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (vq == NULL) {
+ RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+ return -ENOMEM;
+ }
+
+ internal->tx_vhost_queues[tx_queue_id] = vq;
+ dev->data->tx_queues[tx_queue_id] = vq;
+ return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev_info->driver_name = drivername;
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)-1;
+ dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+ dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+ unsigned i;
+ unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+ const struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+ rx_total += igb_stats->q_ipackets[i];
+ }
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+ igb_stats->q_errors[i] = internal->tx_vhost_queues[i]->err_pkts;
+ tx_total += igb_stats->q_opackets[i];
+ tx_err_total += igb_stats->q_errors[i];
+ }
+
+ igb_stats->ipackets = rx_total;
+ igb_stats->opackets = tx_total;
+ igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+ unsigned i;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ if (internal->rx_vhost_queues[i] == NULL)
+ continue;
+ internal->rx_vhost_queues[i]->rx_pkts = 0;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ if (internal->tx_vhost_queues[i] == NULL)
+ continue;
+ internal->tx_vhost_queues[i]->tx_pkts = 0;
+ internal->tx_vhost_queues[i]->err_pkts = 0;
+ }
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+ .dev_start = eth_dev_start,
+ .dev_stop = eth_dev_stop,
+ .dev_configure = eth_dev_configure,
+ .dev_infos_get = eth_dev_info,
+ .rx_queue_setup = eth_rx_queue_setup,
+ .tx_queue_setup = eth_tx_queue_setup,
+ .rx_queue_release = eth_queue_release,
+ .tx_queue_release = eth_queue_release,
+ .link_update = eth_link_update,
+ .stats_get = eth_stats_get,
+ .stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+ char *iface_name,
+ int16_t queues,
+ const unsigned numa_node)
+{
+ struct rte_eth_dev_data *data = NULL;
+ struct pmd_internal *internal = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ struct ether_addr *eth_addr = NULL;
+
+ RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+ numa_node);
+
+ /* now do all data allocation - for eth_dev structure, dummy pci driver
+ * and internal (private) data
+ */
+ data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+ if (data == NULL)
+ goto error;
+
+ internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+ if (internal == NULL)
+ goto error;
+
+ eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+ if (eth_addr == NULL)
+ goto error;
+ *eth_addr = base_eth_addr;
+ eth_addr->addr_bytes[5] = index;
+
+ /* reserve an ethdev entry */
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ goto error;
+
+ /* now put it all together
+ * - store queue data in internal,
+ * - store numa_node info in ethdev data
+ * - point eth_dev_data to internals
+ * - and point eth_dev structure to new eth_dev_data structure
+ */
+ internal->nb_rx_queues = queues;
+ internal->nb_tx_queues = queues;
+ internal->dev_name = strdup(name);
+ if (internal->dev_name == NULL)
+ goto error;
+ internal->iface_name = strdup(iface_name);
+ if (internal->iface_name == NULL)
+ goto error;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_INSERT_TAIL(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ data->dev_private = internal;
+ data->port_id = eth_dev->data->port_id;
+ memmove(data->name, eth_dev->data->name, sizeof(data->name));
+ data->nb_rx_queues = queues;
+ data->nb_tx_queues = queues;
+ data->dev_link = pmd_link;
+ data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev, so the
+	 * vhost PMD resources won't be shared between multiple processes.
+	 */
+ eth_dev->data = data;
+ eth_dev->dev_ops = &ops;
+ eth_dev->driver = NULL;
+ eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+ eth_dev->data->kdrv = RTE_KDRV_NONE;
+ eth_dev->data->drv_name = internal->dev_name;
+ eth_dev->data->numa_node = numa_node;
+
+ /* finally assign rx and tx ops */
+ eth_dev->rx_pkt_burst = eth_vhost_rx;
+ eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+ return data->port_id;
+
+error:
+ rte_free(data);
+ rte_free(internal);
+ rte_free(eth_addr);
+
+ return -1;
+}
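The MAC assignment above derives each port's address from a fixed base address, with the vdev index written into the last byte (`eth_addr->addr_bytes[5] = index`), so eth_vhost0, eth_vhost1, … get distinct addresses. A standalone sketch of that scheme; the base address value used here is an assumption (the patch defines `base_eth_addr` elsewhere, outside this excerpt):

```c
#include <stdint.h>

/* Local stand-in for DPDK's struct ether_addr. */
struct ether_addr_s {
	uint8_t addr_bytes[6];
};

/* Assumed base address ("VHOST" in ASCII plus a zero byte); the real
 * value lives outside this excerpt. */
static const struct ether_addr_s base_eth_addr = {
	{ 0x56, 0x48, 0x4f, 0x53, 0x54, 0x00 }
};

/* Derive the per-port MAC: copy the base, then overwrite the last
 * byte with the vdev index, as eth_dev_vhost_create() does above. */
static struct ether_addr_s
vhost_mac_for_index(uint8_t index)
{
	struct ether_addr_s a = base_eth_addr;

	a.addr_bytes[5] = index;
	return a;
}
```

One consequence of this scheme is that only 256 distinct vdev indexes map to distinct addresses, which is plenty for a vdev count but worth knowing.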
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **iface_name = extra_args;
+
+ if (value == NULL)
+ return -1;
+
+ *iface_name = value;
+
+ return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint16_t *q = extra_args;
+
+ if ((value == NULL) || (extra_args == NULL))
+ return -EINVAL;
+
+	errno = 0;
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if ((*q == USHRT_MAX) && (errno == ERANGE))
+		return -1;
+
+ if (*q > RTE_MAX_QUEUES_PER_PORT)
+ return -1;
+
+ return 0;
+}
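The validation in open_queues() above can be exercised standalone. This is a minimal sketch, not the PMD code itself: `MAX_QUEUES` stands in for `RTE_MAX_QUEUES_PER_PORT` (whose real value is build-configuration dependent), and `errno` is reset before `strtoul()` so a stale `ERANGE` from an earlier call cannot leak in:

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: stand-in for RTE_MAX_QUEUES_PER_PORT. */
#define MAX_QUEUES 1024

/* Parse and validate a "queues=N" kvarg value, mirroring the
 * open_queues() logic above. Returns 0 on success, negative on error. */
static int
parse_queues(const char *value, uint16_t *q)
{
	if (value == NULL || q == NULL)
		return -EINVAL;

	errno = 0;
	*q = (uint16_t)strtoul(value, NULL, 0);
	if (*q == USHRT_MAX && errno == ERANGE)
		return -1;

	if (*q > MAX_QUEUES)
		return -1;

	return 0;
}
```

Note that the cast to `uint16_t` truncates large in-range `unsigned long` values (e.g. "99999999" becomes 57599) without setting `errno`; those are caught only by the `MAX_QUEUES` bound, which is why the upper-bound check matters.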
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ int ret = 0;
+ int index;
+ char *iface_name;
+ uint16_t queues;
+
+ RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (strlen(name) < strlen("eth_vhost")) {
+		ret = -1;
+		goto out_free;
+	}
+
+	errno = 0;
+	index = strtol(name + strlen("eth_vhost"), NULL, 0);
+	if (errno == ERANGE) {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		/* 'iface' is mandatory; bail out rather than use it
+		 * uninitialized below.
+		 */
+		ret = -1;
+		goto out_free;
+	}
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+ &open_queues, &queues);
+ if (ret < 0)
+ goto out_free;
+
+	} else {
+		queues = 1;
+	}
+
+	if (eth_dev_vhost_create(name, index,
+				 iface_name, queues, rte_socket_id()) < 0) {
+		RTE_LOG(ERR, PMD, "Failed to create vhost device %s\n", name);
+		ret = -1;
+	}
+
+out_free:
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+ struct rte_eth_dev *eth_dev = NULL;
+ struct pmd_internal *internal;
+ unsigned int i;
+
+ RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+ if (name == NULL)
+ return -EINVAL;
+
+ /* find an ethdev entry */
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev == NULL)
+ return -ENODEV;
+
+ internal = eth_dev->data->dev_private;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_REMOVE(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+	eth_dev_stop(eth_dev);
+
+	free(internal->dev_name);
+	free(internal->iface_name);
+
+	for (i = 0; i < internal->nb_rx_queues; i++)
+		rte_free(internal->rx_vhost_queues[i]);
+	for (i = 0; i < internal->nb_tx_queues; i++)
+		rte_free(internal->tx_vhost_queues[i]);
+	rte_free(internal);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+
+ rte_eth_dev_release_port(eth_dev);
+
+ return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+struct virtio_net *
+rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+ struct rte_eth_dev *eth_dev;
+
+ if (rte_eth_dev_is_valid_port(port_id) == 0)
+ return NULL;
+
+ eth_dev = &rte_eth_devices[port_id];
+ if (strncmp("eth_vhost", eth_dev->data->drv_name,
+ strlen("eth_vhost")) == 0) {
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+
+ internal = eth_dev->data->dev_private;
+ vq = internal->rx_vhost_queues[0];
+ if ((vq != NULL) && (vq->device != NULL))
+ return vq->device;
+ }
+
+ return NULL;
+}
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..22a880f
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 IGEL Co., Ltd.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co., Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function convert specified port_id to virtio device structure.
+ * The retured device can be used for vhost library APIs.
+ * To use vhost library APIs and vhost PMD parallely, below API should
+ * not be called, because the API will be called by vhost PMD.
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by vhost PMD, below API should not be called.
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, call Port Hotplug APIs.
+ *
+ * @param port_id
+ * port number
+ * @return
+ * virtio net device structure corresponding to the specified port
+ * NULL will be returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+ global:
+
+ rte_eth_vhost_portid2vdev;
+
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 724efa7..1af4bb3 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -148,7 +148,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null

-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)

endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
--
2.1.4
Yuanhan Liu
2015-11-06 02:22:35 UTC
Permalink
On Mon, Nov 02, 2015 at 12:58:57PM +0900, Tetsuya Mukawa wrote:
...
Post by Tetsuya Mukawa
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+ VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+ r->rx_pkts += nb_rx;
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ VIRTIO_RXQ, bufs, nb_bufs);
+
Michael, I'm wondering here might be the better place to do "automatic
receive steering in multiqueue mode". I mean, as a library function,
queueing/dequeueing packets to/from a specific virt queue is reasonable
to me. It's up to the caller to pick the right queue and do the queue
steering.

As an eth dev, I guess that's the proper place to do things like that.

Or, I'm thinking we could introduce another vhost function, for not
breaking current API, to do that, returning the right queue, so that
other applications (instead of the vhost pmd only) can use that as well.

Tetsuya, just in case you missed the early discussion about automatic
receive steering, here is a link:

http://dpdk.org/ml/archives/dev/2015-October/025779.html


--yliu
Tetsuya Mukawa
2015-11-06 03:54:34 UTC
Permalink
Post by Yuanhan Liu
...
Post by Tetsuya Mukawa
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+ VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+ r->rx_pkts += nb_rx;
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ VIRTIO_RXQ, bufs, nb_bufs);
+
Michael, I'm wondering here might be the better place to do "automatic
receive steering in multiqueue mode". I mean, as a library function,
queueing/dequeueing packets to/from a specific virt queue is reasonable
to me. It's up to the caller to pick the right queue and do the queue
steering.
Hi Liu,

Oops, I've found a bug here.
To support multiple queues in vhost PMD, I needed to store "queue_id" in
"vhost_queue" structure.
Then, I should call rte_vhost_enqueue_burst() with the value.
Post by Yuanhan Liu
As an eth dev, I guess that's the proper place to do things like that.
Or, I'm thinking we could introduce another vhost function, for not
breaking current API, to do that, returning the right queue, so that
other applications (instead of the vhost pmd only) can use that as well.
I may not understand the steering function well enough, but if we support
the steering function in the vhost library or the vhost PMD, how can we
handle the "queue_id" parameter of the TX functions?
Probably, we would need to ignore the value in some cases.
This may confuse users, because they cannot observe the packets in the
queue they specified.

So I guess it may be the application's responsibility to return packets to
the correct queue.
(But we should document this clearly.)
Post by Yuanhan Liu
Tetsuya, just in case you missed the early discussion about automatic
http://dpdk.org/ml/archives/dev/2015-October/025779.html
Thanks, I've checked it!

Tetsuya
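The allow_queuing/while_queuing handshake quoted above (check the gate, announce entry, re-check the gate, do the burst, announce exit) can be sketched standalone. This is an illustrative sketch only: it uses C11 `<stdatomic.h>` in place of DPDK's `rte_atomic32_*` so it compiles without DPDK, and the "work" is a stand-in for the real `rte_vhost_dequeue_burst()` call:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Minimal stand-in for the relevant fields of struct vhost_queue. */
struct vhost_queue_state {
	atomic_int allow_queuing;   /* cleared by the control path */
	atomic_int while_queuing;   /* set by the data path while inside a burst */
};

/* Data-path side of the handshake. Returns the number of packets
 * "processed" (0 when the queue is gated off). */
static uint16_t
burst_if_allowed(struct vhost_queue_state *s, uint16_t nb_bufs)
{
	uint16_t done = 0;

	if (atomic_load(&s->allow_queuing) == 0)
		return 0;

	atomic_store(&s->while_queuing, 1);

	/* Re-check after announcing entry: the control path clears
	 * allow_queuing first, then spins until while_queuing drops to 0,
	 * so this second check closes the race window. */
	if (atomic_load(&s->allow_queuing) != 0)
		done = nb_bufs;   /* stand-in for rte_vhost_dequeue_burst() */

	atomic_store(&s->while_queuing, 0);
	return done;
}
```

The point of the double check is that the control path can safely tear a queue down by clearing `allow_queuing` and waiting for `while_queuing` to reach 0, without any lock on the hot path.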
Tetsuya Mukawa
2015-11-05 02:17:52 UTC
Permalink
Hi,

Could someone please review below patch series?

Regards,
Tetsuya
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. The patch will work on below patch series.
- [PATCH v7 00/28] remove pci driver from vdevs
* Known issue.
We may see issues while handling RESET_OWNER message.
These handlings are done in vhost library, so not a part of vhost PMD.
So far, we are waiting for QEMU fixing.
- Remove a below patch that fixes vhost library.
The patch was applied as a separate patch.
- vhost: fix crash with multiqueue enabled
- Fix typos.
(Thanks to Thomas, Monjalon)
- Rebase on latest tree with above bernard's patches.
- Support vhost multiple queues.
- Rebase on "remove pci driver from vdevs".
- Optimize RX/TX functions.
- Fix resource leaks.
- Fix compile issue.
- Add patch to fix vhost library.
- Optimize performance.
In RX/TX functions, change code to access only per core data.
- Add below API to allow user to use vhost library APIs for a port managed
by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
- rte_eth_vhost_portid2vdev()
To support this functionality, vhost library is also changed.
Anyway, if users doesn't use vhost PMD, can fully use vhost library APIs.
- Add code to support vhost multiple queues.
Actually, multiple queues functionality is not enabled so far.
- Fix issues reported by checkpatch.pl
(Thanks to Stephen Hemminger)
vhost: Add callback and private data for vhost PMD
vhost: Add VHOST PMD
config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/vhost.rst | 82 +++
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 765 ++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
lib/librte_vhost/rte_vhost_version.map | 6 +
lib/librte_vhost/rte_virtio_net.h | 3 +
lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +-
lib/librte_vhost/virtio-net.c | 56 +-
lib/librte_vhost/virtio-net.h | 4 +-
mk/rte.app.mk | 8 +-
15 files changed, 1072 insertions(+), 13 deletions(-)
create mode 100644 doc/guides/nics/vhost.rst
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
Stephen Hemminger
2015-11-09 22:25:05 UTC
Permalink
On Tue, 27 Oct 2015 15:12:55 +0900
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.
The PMD has 2 parameters.
- iface: The parameter is used to specify a path to connect to a
virtio-net device.
- queues: The parameter is used to specify the number of the queues
virtio-net device has.
(Default: 1)
Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
To connect above testpmd, here is qemu command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
-device virtio-net-pci,netdev=net0
Brocade developed a much simpler vhost PMD, without all the atomics and
locking.


/*-
* BSD LICENSE
*
* Copyright (C) Brocade Communications Systems, Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Brocade Communications Systems, Inc.
* nor the names of its contributors may be used to endorse
* or promote products derived from this software without specific
* prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ethdev.h>
#include <rte_malloc.h>
#include <rte_memcpy.h>
#include <rte_dev.h>
#include <rte_log.h>

#include "../librte_vhost/rte_virtio_net.h"
#include "../librte_vhost/virtio-net.h"

struct pmd_internals;

struct vhost_queue {
struct pmd_internals *internals;

struct rte_mempool *mb_pool;

uint64_t pkts;
uint64_t bytes;
};

struct pmd_internals {
struct virtio_net *dev;
unsigned numa_node;
struct eth_driver *eth_drv;

unsigned nb_rx_queues;
unsigned nb_tx_queues;

struct vhost_queue rx_queues[1];
struct vhost_queue tx_queues[1];
uint8_t port_id;
};


static const char *drivername = "Vhost PMD";

static struct rte_eth_link pmd_link = {
.link_speed = 10000,
.link_duplex = ETH_LINK_FULL_DUPLEX,
.link_status = 0
};

static uint16_t
eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
{
int ret, i;
struct vhost_queue *h = q;

ret = rte_vhost_dequeue_burst(h->internals->dev,
VIRTIO_TXQ, h->mb_pool, bufs, nb_bufs);

for (i = 0; i < ret ; i++) {
struct rte_mbuf *m = bufs[i];

m->port = h->internals->port_id;
++h->pkts;
h->bytes += rte_pktmbuf_pkt_len(m);
}
return ret;
}

static uint16_t
eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
{
int ret, i;
struct vhost_queue *h = q;

ret = rte_vhost_enqueue_burst(h->internals->dev,
VIRTIO_RXQ, bufs, nb_bufs);

for (i = 0; i < ret; i++) {
struct rte_mbuf *m = bufs[i];

++h->pkts;
h->bytes += rte_pktmbuf_pkt_len(m);
rte_pktmbuf_free(m);
}

return ret;
}

static int
eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
{
return 0;
}

static int
eth_dev_start(struct rte_eth_dev *dev)
{
struct pmd_internals *internals = dev->data->dev_private;

dev->data->dev_link.link_status = 1;
RTE_LOG(INFO, PMD, "vhost(%s): link up\n", internals->dev->ifname);
return 0;
}

static void
eth_dev_stop(struct rte_eth_dev *dev)
{
struct pmd_internals *internals = dev->data->dev_private;

dev->data->dev_link.link_status = 0;
RTE_LOG(INFO, PMD, "vhost(%s): link down\n", internals->dev->ifname);
}

static int
eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
uint16_t nb_rx_desc __rte_unused,
unsigned int socket_id __rte_unused,
const struct rte_eth_rxconf *rx_conf __rte_unused,
struct rte_mempool *mb_pool)
{
struct pmd_internals *internals = dev->data->dev_private;

internals->rx_queues[rx_queue_id].mb_pool = mb_pool;
dev->data->rx_queues[rx_queue_id] =
&internals->rx_queues[rx_queue_id];
internals->rx_queues[rx_queue_id].internals = internals;

return 0;
}

static int
eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
uint16_t nb_tx_desc __rte_unused,
unsigned int socket_id __rte_unused,
const struct rte_eth_txconf *tx_conf __rte_unused)
{
struct pmd_internals *internals = dev->data->dev_private;

dev->data->tx_queues[tx_queue_id] =
&internals->tx_queues[tx_queue_id];
internals->tx_queues[tx_queue_id].internals = internals;

return 0;
}


static void
eth_dev_info(struct rte_eth_dev *dev,
struct rte_eth_dev_info *dev_info)
{
struct pmd_internals *internals = dev->data->dev_private;

dev_info->driver_name = drivername;
dev_info->max_mac_addrs = 1;
dev_info->max_rx_pktlen = -1;
dev_info->max_rx_queues = (uint16_t)internals->nb_rx_queues;
dev_info->max_tx_queues = (uint16_t)internals->nb_tx_queues;
dev_info->min_rx_bufsize = 0;
dev_info->pci_dev = NULL;
}

static void
eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
{
const struct pmd_internals *internal = dev->data->dev_private;
unsigned i;

for (i = 0; i < internal->nb_rx_queues; i++) {
const struct vhost_queue *h = &internal->rx_queues[i];

stats->ipackets += h->pkts;
stats->ibytes += h->bytes;

if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
stats->q_ibytes[i] = h->bytes;
stats->q_ipackets[i] = h->pkts;
}
}

for (i = 0; i < internal->nb_tx_queues; i++) {
const struct vhost_queue *h = &internal->tx_queues[i];

stats->opackets += h->pkts;
stats->obytes += h->bytes;

if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
stats->q_obytes[i] = h->bytes;
stats->q_opackets[i] = h->pkts;
}
}
}
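eth_stats_get() above sums the per-queue counters into the port-level totals and mirrors the first RTE_ETHDEV_QUEUE_STAT_CNTRS queues into the per-queue arrays. A standalone sketch of that aggregation for the RX side; `QUEUE_STAT_CNTRS` and the struct layouts here are simplified stand-ins, not the DPDK definitions:

```c
#include <stdint.h>

/* Assumption: stand-in for RTE_ETHDEV_QUEUE_STAT_CNTRS. */
#define QUEUE_STAT_CNTRS 16

struct q_stats {
	uint64_t pkts;
	uint64_t bytes;
};

struct port_stats {
	uint64_t ipackets, ibytes;
	uint64_t q_ipackets[QUEUE_STAT_CNTRS];
	uint64_t q_ibytes[QUEUE_STAT_CNTRS];
};

/* Accumulate per-queue RX counters into the port totals; only the
 * first QUEUE_STAT_CNTRS queues get individual entries, matching the
 * bound check in eth_stats_get() above. */
static void
aggregate_rx(const struct q_stats *rxq, unsigned int nb_rxq,
	     struct port_stats *stats)
{
	unsigned int i;

	for (i = 0; i < nb_rxq; i++) {
		stats->ipackets += rxq[i].pkts;
		stats->ibytes += rxq[i].bytes;
		if (i < QUEUE_STAT_CNTRS) {
			stats->q_ipackets[i] = rxq[i].pkts;
			stats->q_ibytes[i] = rxq[i].bytes;
		}
	}
}
```

Queues beyond the counter limit still contribute to the totals; only their per-queue breakdown is lost.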

static void
eth_stats_reset(struct rte_eth_dev *dev)
{
unsigned i;
struct pmd_internals *internal = dev->data->dev_private;

for (i = 0; i < internal->nb_rx_queues; i++) {
internal->rx_queues[i].pkts = 0;
internal->rx_queues[i].bytes = 0;
}

for (i = 0; i < internal->nb_tx_queues; i++) {
internal->tx_queues[i].pkts = 0;
internal->tx_queues[i].bytes = 0;
}
}

static struct eth_driver rte_vhost_pmd = {
.pci_drv = {
.name = "rte_vhost_pmd",
.drv_flags = RTE_PCI_DRV_DETACHABLE,
},
};

static void
eth_queue_release(void *q __rte_unused)
{
}

static int
eth_link_update(struct rte_eth_dev *dev __rte_unused,
int wait_to_complete __rte_unused)
{
return 0;
}

static struct eth_dev_ops eth_ops = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
.dev_configure = eth_dev_configure,
.dev_infos_get = eth_dev_info,
.rx_queue_setup = eth_rx_queue_setup,
.tx_queue_setup = eth_tx_queue_setup,
.rx_queue_release = eth_queue_release,
.tx_queue_release = eth_queue_release,
.link_update = eth_link_update,
.stats_get = eth_stats_get,
.stats_reset = eth_stats_reset,
};

static int
eth_dev_vhost_create(const char *name, const unsigned numa_node)
{
const unsigned nb_rx_queues = 1;
const unsigned nb_tx_queues = 1;
struct rte_eth_dev_data *data = NULL;
struct rte_pci_device *pci_dev = NULL;
struct pmd_internals *internals = NULL;
struct rte_eth_dev *eth_dev = NULL;
struct virtio_net *vhost_dev = NULL;
struct eth_driver *eth_drv = NULL;
struct rte_pci_id *id_table = NULL;
struct ether_addr *eth_addr = NULL;

if (name == NULL)
return -EINVAL;

vhost_dev = get_device_by_name(name);

if (vhost_dev == NULL)
return -EINVAL;

RTE_LOG(INFO, PMD, "Creating vhost ethdev on numa socket %u\n",
numa_node);

data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
if (data == NULL)
goto error;

pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
if (pci_dev == NULL)
goto error;

id_table = rte_zmalloc_socket(name, sizeof(*id_table), 0, numa_node);
if (id_table == NULL)
goto error;

internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
if (internals == NULL)
goto error;

eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
if (eth_addr == NULL)
goto error;

eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
if (eth_dev == NULL)
goto error;

eth_drv = rte_zmalloc_socket(name, sizeof(*eth_drv), 0, numa_node);
if (eth_drv == NULL)
goto error;

internals->nb_rx_queues = nb_rx_queues;
internals->nb_tx_queues = nb_tx_queues;
internals->numa_node = numa_node;
internals->dev = vhost_dev;

internals->port_id = eth_dev->data->port_id;

eth_drv->pci_drv.name = drivername;
eth_drv->pci_drv.id_table = id_table;
internals->eth_drv = eth_drv;

pci_dev->numa_node = numa_node;
pci_dev->driver = &eth_drv->pci_drv;

data->dev_private = internals;
data->port_id = eth_dev->data->port_id;
data->nb_rx_queues = (uint16_t)nb_rx_queues;
data->nb_tx_queues = (uint16_t)nb_tx_queues;
data->dev_link = pmd_link;
eth_random_addr(&eth_addr->addr_bytes[0]);
data->mac_addrs = eth_addr;
strncpy(data->name, eth_dev->data->name, strlen(eth_dev->data->name));

eth_dev->data = data;
eth_dev->dev_ops = &eth_ops;
eth_dev->pci_dev = pci_dev;
eth_dev->driver = &rte_vhost_pmd;
eth_dev->rx_pkt_burst = eth_vhost_rx;
eth_dev->tx_pkt_burst = eth_vhost_tx;
TAILQ_INIT(&(eth_dev->link_intr_cbs));

return 0;

error:
rte_free(data);
rte_free(pci_dev);
rte_free(id_table);
rte_free(eth_drv);
rte_free(eth_addr);
rte_free(internals);

return -1;
}

static int
rte_pmd_vhost_devinit(const char *name,
const char *params __attribute__((unused)))
{
unsigned numa_node;

if (name == NULL)
return -EINVAL;

RTE_LOG(DEBUG, PMD, "Initializing pmd_vhost for %s\n", name);

numa_node = rte_socket_id();

return eth_dev_vhost_create(name, numa_node);
}

static int
rte_pmd_vhost_devuninit(const char *name)
{
struct rte_eth_dev *eth_dev = NULL;
struct pmd_internals *internals = NULL;

if (name == NULL)
return -EINVAL;

RTE_LOG(DEBUG, PMD, "Closing vhost ethdev on numa socket %u\n",
rte_socket_id());

/* reserve an ethdev entry */
eth_dev = rte_eth_dev_allocated(name);
if (eth_dev == NULL)
return -1;

internals = (struct pmd_internals *)eth_dev->data->dev_private;
rte_free(internals->eth_drv->pci_drv.id_table);
rte_free(internals->eth_drv);
rte_free(eth_dev->data->dev_private);
rte_free(eth_dev->data->mac_addrs);
rte_free(eth_dev->data);
rte_free(eth_dev->pci_dev);

rte_eth_dev_release_port(eth_dev);

return 0;
}

static struct rte_driver pmd_vhost_drv = {
.name = "vhost",
.type = PMD_VDEV,
.init = rte_pmd_vhost_devinit,
.uninit = rte_pmd_vhost_devuninit,
};

PMD_REGISTER_DRIVER(pmd_vhost_drv);
Tetsuya Mukawa
2015-11-10 03:27:28 UTC
Permalink
Post by Stephen Hemminger
On Tue, 27 Oct 2015 15:12:55 +0900
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.
The PMD has 2 parameters.
- iface: The parameter is used to specify a path to connect to a
virtio-net device.
- queues: The parameter is used to specify the number of the queues
virtio-net device has.
(Default: 1)
Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
To connect above testpmd, here is qemu command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
-device virtio-net-pci,netdev=net0
Brocade developed a much simpler vhost PMD, without all the atomics and
locking.
Hi Stephen,

With your PMD, it seems we need to call some vhost library APIs before we
start sending and receiving.
It means we need to manage virtio-net device connections in the DPDK
application anyway.

Also, I guess any PMD should be replaceable by another PMD without heavy
modification of the DPDK application.
This is why I tried to manage virtio-net device connections inside the
vhost PMD.

Thanks,
Tetsuya
Post by Stephen Hemminger
/*-
* BSD LICENSE
*
* Copyright (C) Brocade Communications Systems, Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Brocade Communications Systems, Inc.
* nor the names of its contributors may be used to endorse
* or promote products derived from this software without specific
* prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ethdev.h>
#include <rte_malloc.h>
#include <rte_memcpy.h>
#include <rte_dev.h>
#include <rte_log.h>
#include "../librte_vhost/rte_virtio_net.h"
#include "../librte_vhost/virtio-net.h"
struct pmd_internals;
struct vhost_queue {
struct pmd_internals *internals;
struct rte_mempool *mb_pool;
uint64_t pkts;
uint64_t bytes;
};
struct pmd_internals {
struct virtio_net *dev;
unsigned numa_node;
struct eth_driver *eth_drv;
unsigned nb_rx_queues;
unsigned nb_tx_queues;
struct vhost_queue rx_queues[1];
struct vhost_queue tx_queues[1];
uint8_t port_id;
};
static const char *drivername = "Vhost PMD";
static struct rte_eth_link pmd_link = {
.link_speed = 10000,
.link_duplex = ETH_LINK_FULL_DUPLEX,
.link_status = 0
};
static uint16_t
eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
{
int ret, i;
struct vhost_queue *h = q;
ret = rte_vhost_dequeue_burst(h->internals->dev,
VIRTIO_TXQ, h->mb_pool, bufs, nb_bufs);
for (i = 0; i < ret ; i++) {
struct rte_mbuf *m = bufs[i];
m->port = h->internals->port_id;
++h->pkts;
h->bytes += rte_pktmbuf_pkt_len(m);
}
return ret;
}
static uint16_t
eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
{
int ret, i;
struct vhost_queue *h = q;
ret = rte_vhost_enqueue_burst(h->internals->dev,
VIRTIO_RXQ, bufs, nb_bufs);
for (i = 0; i < ret; i++) {
struct rte_mbuf *m = bufs[i];
++h->pkts;
h->bytes += rte_pktmbuf_pkt_len(m);
rte_pktmbuf_free(m);
}
return ret;
}
static int
eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
{
	return 0;
}

static int
eth_dev_start(struct rte_eth_dev *dev)
{
	struct pmd_internals *internals = dev->data->dev_private;

	dev->data->dev_link.link_status = 1;
	RTE_LOG(INFO, PMD, "vhost(%s): link up\n", internals->dev->ifname);
	return 0;
}

static void
eth_dev_stop(struct rte_eth_dev *dev)
{
	struct pmd_internals *internals = dev->data->dev_private;

	dev->data->dev_link.link_status = 0;
	RTE_LOG(INFO, PMD, "vhost(%s): link down\n", internals->dev->ifname);
}

static int
eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
		uint16_t nb_rx_desc __rte_unused,
		unsigned int socket_id __rte_unused,
		const struct rte_eth_rxconf *rx_conf __rte_unused,
		struct rte_mempool *mb_pool)
{
	struct pmd_internals *internals = dev->data->dev_private;

	internals->rx_queues[rx_queue_id].mb_pool = mb_pool;
	dev->data->rx_queues[rx_queue_id] =
		&internals->rx_queues[rx_queue_id];
	internals->rx_queues[rx_queue_id].internals = internals;
	return 0;
}

static int
eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
		uint16_t nb_tx_desc __rte_unused,
		unsigned int socket_id __rte_unused,
		const struct rte_eth_txconf *tx_conf __rte_unused)
{
	struct pmd_internals *internals = dev->data->dev_private;

	dev->data->tx_queues[tx_queue_id] =
		&internals->tx_queues[tx_queue_id];
	internals->tx_queues[tx_queue_id].internals = internals;
	return 0;
}
static void
eth_dev_info(struct rte_eth_dev *dev,
		struct rte_eth_dev_info *dev_info)
{
	struct pmd_internals *internals = dev->data->dev_private;

	dev_info->driver_name = drivername;
	dev_info->max_mac_addrs = 1;
	dev_info->max_rx_pktlen = (uint32_t)-1;
	dev_info->max_rx_queues = (uint16_t)internals->nb_rx_queues;
	dev_info->max_tx_queues = (uint16_t)internals->nb_tx_queues;
	dev_info->min_rx_bufsize = 0;
	dev_info->pci_dev = NULL;
}
static void
eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
{
	const struct pmd_internals *internal = dev->data->dev_private;
	unsigned i;

	for (i = 0; i < internal->nb_rx_queues; i++) {
		const struct vhost_queue *h = &internal->rx_queues[i];

		stats->ipackets += h->pkts;
		stats->ibytes += h->bytes;
		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
			stats->q_ibytes[i] = h->bytes;
			stats->q_ipackets[i] = h->pkts;
		}
	}
	for (i = 0; i < internal->nb_tx_queues; i++) {
		const struct vhost_queue *h = &internal->tx_queues[i];

		stats->opackets += h->pkts;
		stats->obytes += h->bytes;
		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
			stats->q_obytes[i] = h->bytes;
			stats->q_opackets[i] = h->pkts;
		}
	}
}

static void
eth_stats_reset(struct rte_eth_dev *dev)
{
	unsigned i;
	struct pmd_internals *internal = dev->data->dev_private;

	for (i = 0; i < internal->nb_rx_queues; i++) {
		internal->rx_queues[i].pkts = 0;
		internal->rx_queues[i].bytes = 0;
	}
	for (i = 0; i < internal->nb_tx_queues; i++) {
		internal->tx_queues[i].pkts = 0;
		internal->tx_queues[i].bytes = 0;
	}
}
static struct eth_driver rte_vhost_pmd = {
	.pci_drv = {
		.name = "rte_vhost_pmd",
		.drv_flags = RTE_PCI_DRV_DETACHABLE,
	},
};

static void
eth_queue_release(void *q __rte_unused)
{
}

static int
eth_link_update(struct rte_eth_dev *dev __rte_unused,
		int wait_to_complete __rte_unused)
{
	return 0;
}

static struct eth_dev_ops eth_ops = {
	.dev_start = eth_dev_start,
	.dev_stop = eth_dev_stop,
	.dev_configure = eth_dev_configure,
	.dev_infos_get = eth_dev_info,
	.rx_queue_setup = eth_rx_queue_setup,
	.tx_queue_setup = eth_tx_queue_setup,
	.rx_queue_release = eth_queue_release,
	.tx_queue_release = eth_queue_release,
	.link_update = eth_link_update,
	.stats_get = eth_stats_get,
	.stats_reset = eth_stats_reset,
};
static int
eth_dev_vhost_create(const char *name, const unsigned numa_node)
{
	const unsigned nb_rx_queues = 1;
	const unsigned nb_tx_queues = 1;
	struct rte_eth_dev_data *data = NULL;
	struct rte_pci_device *pci_dev = NULL;
	struct pmd_internals *internals = NULL;
	struct rte_eth_dev *eth_dev = NULL;
	struct virtio_net *vhost_dev = NULL;
	struct eth_driver *eth_drv = NULL;
	struct rte_pci_id *id_table = NULL;
	struct ether_addr *eth_addr = NULL;

	if (name == NULL)
		return -EINVAL;

	vhost_dev = get_device_by_name(name);
	if (vhost_dev == NULL)
		return -EINVAL;

	RTE_LOG(INFO, PMD, "Creating vhost ethdev on numa socket %u\n",
		numa_node);

	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
	if (data == NULL)
		goto error;

	pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
	if (pci_dev == NULL)
		goto error;

	id_table = rte_zmalloc_socket(name, sizeof(*id_table), 0, numa_node);
	if (id_table == NULL)
		goto error;

	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
	if (internals == NULL)
		goto error;

	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
	if (eth_addr == NULL)
		goto error;

	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
	if (eth_dev == NULL)
		goto error;

	eth_drv = rte_zmalloc_socket(name, sizeof(*eth_drv), 0, numa_node);
	if (eth_drv == NULL)
		goto error;

	internals->nb_rx_queues = nb_rx_queues;
	internals->nb_tx_queues = nb_tx_queues;
	internals->numa_node = numa_node;
	internals->dev = vhost_dev;
	internals->port_id = eth_dev->data->port_id;

	eth_drv->pci_drv.name = drivername;
	eth_drv->pci_drv.id_table = id_table;
	internals->eth_drv = eth_drv;

	pci_dev->numa_node = numa_node;
	pci_dev->driver = &eth_drv->pci_drv;

	data->dev_private = internals;
	data->port_id = eth_dev->data->port_id;
	data->nb_rx_queues = (uint16_t)nb_rx_queues;
	data->nb_tx_queues = (uint16_t)nb_tx_queues;
	data->dev_link = pmd_link;

	eth_random_addr(&eth_addr->addr_bytes[0]);
	data->mac_addrs = eth_addr;
	snprintf(data->name, sizeof(data->name), "%s", eth_dev->data->name);

	eth_dev->data = data;
	eth_dev->dev_ops = &eth_ops;
	eth_dev->pci_dev = pci_dev;
	eth_dev->driver = &rte_vhost_pmd;
	eth_dev->rx_pkt_burst = eth_vhost_rx;
	eth_dev->tx_pkt_burst = eth_vhost_tx;
	TAILQ_INIT(&(eth_dev->link_intr_cbs));
	return 0;

error:
	rte_free(data);
	rte_free(pci_dev);
	rte_free(id_table);
	rte_free(eth_drv);
	rte_free(eth_addr);
	rte_free(internals);
	return -1;
}
static int
rte_pmd_vhost_devinit(const char *name,
		const char *params __attribute__((unused)))
{
	unsigned numa_node;

	if (name == NULL)
		return -EINVAL;

	RTE_LOG(DEBUG, PMD, "Initializing pmd_vhost for %s\n", name);

	numa_node = rte_socket_id();
	return eth_dev_vhost_create(name, numa_node);
}

static int
rte_pmd_vhost_devuninit(const char *name)
{
	struct rte_eth_dev *eth_dev = NULL;
	struct pmd_internals *internals = NULL;

	if (name == NULL)
		return -EINVAL;

	RTE_LOG(DEBUG, PMD, "Closing vhost ethdev on numa socket %u\n",
		rte_socket_id());

	/* find the ethdev entry */
	eth_dev = rte_eth_dev_allocated(name);
	if (eth_dev == NULL)
		return -1;

	internals = (struct pmd_internals *)eth_dev->data->dev_private;

	rte_free(internals->eth_drv->pci_drv.id_table);
	rte_free(internals->eth_drv);
	rte_free(eth_dev->data->dev_private);
	rte_free(eth_dev->data->mac_addrs);
	rte_free(eth_dev->data);
	rte_free(eth_dev->pci_dev);

	rte_eth_dev_release_port(eth_dev);
	return 0;
}

static struct rte_driver pmd_vhost_drv = {
	.name = "vhost",
	.type = PMD_VDEV,
	.init = rte_pmd_vhost_devinit,
	.uninit = rte_pmd_vhost_devuninit,
};

PMD_REGISTER_DRIVER(pmd_vhost_drv);
Tetsuya Mukawa
2015-10-27 07:54:28 UTC
Permalink
The patch below has been submitted as a separate patch.

- [dpdk-dev,1/3] vhost: Fix wrong handling of virtqueue array index
(http://dpdk.org/dev/patchwork/patch/8038/)

Tetsuya
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
around librte_vhost. The patch applies on top of the patch series below.
- [PATCH v5 00/28] remove pci driver from vdevs
* Known issue:
We may see issues while handling the RESET_OWNER message.
This handling is done in the vhost library, so it is not part of the vhost PMD.
For now, we are waiting for a fix in QEMU.
- Support vhost multiple queues.
- Rebase on "remove pci driver from vdevs".
- Optimize RX/TX functions.
- Fix resource leaks.
- Fix compile issue.
- Add patch to fix vhost library.
- Optimize performance.
In RX/TX functions, change code to access only per core data.
- Add the API below to allow users to use vhost library APIs for a port managed
by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
- rte_eth_vhost_portid2vdev()
To support this functionality, the vhost library is also changed.
Note that users who do not use the vhost PMD can still fully use the vhost library APIs.
- Add code to support vhost multiple queues.
The multiple queues functionality is not actually enabled yet.
- Fix issues reported by checkpatch.pl
(Thanks to Stephen Hemminger)
vhost: Fix wrong handling of virtqueue array index
vhost: Add callback and private data for vhost PMD
vhost: Add VHOST PMD
config/common_linuxapp | 6 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/vhost.rst | 82 +++
doc/guides/rel_notes/release_2_2.rst | 2 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 765 ++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
lib/librte_vhost/rte_vhost_version.map | 6 +
lib/librte_vhost/rte_virtio_net.h | 3 +
lib/librte_vhost/vhost_user/virtio-net-user.c | 33 +-
lib/librte_vhost/virtio-net.c | 61 +-
lib/librte_vhost/virtio-net.h | 4 +-
mk/rte.app.mk | 8 +-
15 files changed, 1085 insertions(+), 25 deletions(-)
create mode 100644 doc/guides/nics/vhost.rst
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
Thomas Monjalon
2015-10-30 18:30:52 UTC
Permalink
Post by Tetsuya Mukawa
Below patch has been submitted as a separate patch.
- [dpdk-dev,1/3] vhost: Fix wrong handling of virtqueue array index
(http://dpdk.org/dev/patchwork/patch/8038/)
Please could you rebase only the two last patches?
Thanks

PS:
WARNING:TYPO_SPELLING: 'failuer' may be misspelled - perhaps 'failure'?
#606: FILE: drivers/net/vhost/rte_eth_vhost.c:272:
+ RTE_LOG(INFO, PMD, "failuer to find a ethdev\n");
WARNING:TYPO_SPELLING: 'accesing' may be misspelled - perhaps 'accessing'?
#612: FILE: drivers/net/vhost/rte_eth_vhost.c:278:
+ /* Wait until rx/tx_pkt_burst stops accesing vhost device */
Tetsuya Mukawa
2015-11-02 03:15:15 UTC
Permalink
Post by Thomas Monjalon
Post by Tetsuya Mukawa
Below patch has been submitted as a separate patch.
- [dpdk-dev,1/3] vhost: Fix wrong handling of virtqueue array index
(http://dpdk.org/dev/patchwork/patch/8038/)
Please could you rebase only the two last patches?
Thanks
WARNING:TYPO_SPELLING: 'failuer' may be misspelled - perhaps 'failure'?
+ RTE_LOG(INFO, PMD, "failuer to find a ethdev\n");
WARNING:TYPO_SPELLING: 'accesing' may be misspelled - perhaps 'accessing'?
+ /* Wait until rx/tx_pkt_burst stops accesing vhost device */
Hi Thomas,

Thank you so much for checking my patches.
I have fixed a few typos, and rebased on latest tree (with Bernard's patch).
I will submit again soon.

Regards,
Tetsuya
Tetsuya Mukawa
2015-10-22 09:45:50 UTC
Permalink
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
around librte_vhost, which means librte_vhost is also needed to compile the PMD.
Vhost messages are handled only while a port is started, so start the port
first, then invoke QEMU.

The PMD has 2 parameters.
- iface: The parameter specifies a path used to connect to a
virtio-net device.
- queues: The parameter specifies the number of queues the
virtio-net device has.
(Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=3' -- -i

To connect to the testpmd instance above, here is an example QEMU command.

$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
config/common_linuxapp | 6 +
drivers/net/Makefile | 4 +
drivers/net/vhost/Makefile | 62 +++
drivers/net/vhost/rte_eth_vhost.c | 735 ++++++++++++++++++++++++++++
drivers/net/vhost/rte_eth_vhost.h | 65 +++
drivers/net/vhost/rte_pmd_vhost_version.map | 8 +
mk/rte.app.mk | 8 +-
7 files changed, 887 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/vhost/Makefile
create mode 100644 drivers/net/vhost/rte_eth_vhost.c
create mode 100644 drivers/net/vhost/rte_eth_vhost.h
create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..7310240 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -446,6 +446,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

#
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
#Compile Xen domain0 support
#
CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5ebf963..e46a38e 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -49,5 +49,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
include $(RTE_SDK)/mk/rte.sharelib.mk
include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8186a80
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+# BSD LICENSE
+#
+# Copyright (c) 2010-2015 Intel Corporation.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..66bfc2b
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,735 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2010-2015 Intel Corporation.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <errno.h>
+#include <limits.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG "iface"
+#define ETH_VHOST_QUEUES_ARG "queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+ ETH_VHOST_IFACE_ARG,
+ ETH_VHOST_QUEUES_ARG,
+ NULL
+};
+
+static struct ether_addr base_eth_addr = {
+ .addr_bytes = {
+ 0x56 /* V */,
+ 0x48 /* H */,
+ 0x4F /* O */,
+ 0x53 /* S */,
+ 0x54 /* T */,
+ 0x00
+ }
+};
+
+struct vhost_queue {
+ struct virtio_net *device;
+ struct pmd_internal *internal;
+ struct rte_mempool *mb_pool;
+ rte_atomic32_t allow_queuing;
+ rte_atomic32_t while_queuing;
+ rte_atomic64_t rx_pkts;
+ rte_atomic64_t tx_pkts;
+ rte_atomic64_t err_pkts;
+};
+
+struct pmd_internal {
+ TAILQ_ENTRY(pmd_internal) next;
+ char *dev_name;
+ char *iface_name;
+ unsigned nb_rx_queues;
+ unsigned nb_tx_queues;
+
+ struct vhost_queue rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+ struct vhost_queue tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+ volatile uint16_t once;
+ pthread_t session_th;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+ TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct rte_eth_link pmd_link = {
+ .link_speed = 10000,
+ .link_duplex = ETH_LINK_FULL_DUPLEX,
+ .link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(r->internal == NULL))
+ return 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+ VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+ rte_atomic64_add(&(r->rx_pkts), nb_rx);
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(r->internal == NULL))
+ return 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ VIRTIO_RXQ, bufs, nb_bufs);
+
+ rte_atomic64_add(&(r->tx_pkts), nb_tx);
+ rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
+
+out:
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+ int found = 0;
+ struct pmd_internal *internal;
+
+ if (ifname == NULL)
+ return NULL;
+
+ pthread_mutex_lock(&internal_list_lock);
+
+ TAILQ_FOREACH(internal, &internals_list, next) {
+ if (!strcmp(internal->iface_name, ifname)) {
+ found = 1;
+ break;
+ }
+ }
+
+ pthread_mutex_unlock(&internal_list_lock);
+
+ if (!found)
+ return NULL;
+
+ return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ uint16_t queues;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "invalid argument\n");
+ return -1;
+ }
+
+ internal = find_internal_resource(dev->ifname);
+ if (internal == NULL) {
+ RTE_LOG(INFO, PMD, "invalid device name\n");
+ return -1;
+ }
+
+ /*
+ * Todo: To support multi queue, get the number of queues here.
+ * So far, vhost provides only one queue.
+ */
+ queues = 1;
+
+ if ((queues < internal->nb_rx_queues) ||
+ (queues < internal->nb_tx_queues)) {
+ RTE_LOG(INFO, PMD, "Not enough queues\n");
+ return -1;
+ }
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
+ return -1;
+ }
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->pmd_priv = eth_dev;
+ eth_dev->data->dev_link.link_status = 1;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "invalid argument\n");
+ return;
+ }
+
+ eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failuer to find a ethdev\n");
+ return;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ /* Wait until rx/tx_pkt_burst stops accesing vhost device */
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+
+ eth_dev->data->dev_link.link_status = 0;
+
+ dev->pmd_priv = NULL;
+ dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ vq->device = NULL;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ vq->device = NULL;
+ }
+
+ RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+ static struct virtio_net_device_ops *vhost_ops;
+
+ vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+ if (vhost_ops == NULL)
+ rte_panic("Can't allocate memory\n");
+
+ /* set vhost arguments */
+ vhost_ops->new_device = new_device;
+ vhost_ops->destroy_device = destroy_device;
+ if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+ rte_panic("Can't register callbacks\n");
+
+ /* start event handling */
+ rte_vhost_driver_session_start();
+
+ rte_free(vhost_ops);
+ pthread_exit(0);
+}
+
+static void vhost_driver_session_start(struct pmd_internal *internal)
+{
+ int ret;
+
+ ret = pthread_create(&internal->session_th,
+ NULL, vhost_driver_session, NULL);
+ if (ret)
+ rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(struct pmd_internal *internal)
+{
+ int ret;
+
+ ret = pthread_cancel(internal->session_th);
+ if (ret)
+ rte_panic("Can't cancel the thread\n");
+
+ ret = pthread_join(internal->session_th, NULL);
+ if (ret)
+ rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+ int ret;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+ ret = rte_vhost_driver_register(internal->iface_name);
+ if (ret)
+ return ret;
+
+ vhost_driver_session_start(internal);
+ }
+ return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
+ rte_vhost_driver_unregister(internal->iface_name);
+ vhost_driver_session_stop(internal);
+ }
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ internal->rx_vhost_queues[rx_queue_id].mb_pool = mb_pool;
+ dev->data->rx_queues[rx_queue_id] = &internal->rx_vhost_queues[rx_queue_id];
+ return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev->data->tx_queues[tx_queue_id] = &internal->tx_vhost_queues[tx_queue_id];
+ return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev_info->driver_name = drivername;
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)-1;
+ dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+ dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+ dev_info->min_rx_bufsize = 0;
+ dev_info->pci_dev = dev->pci_dev;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+ unsigned i;
+ unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+ const struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_rx_queues; i++) {
+ igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i].rx_pkts.cnt;
+ rx_total += igb_stats->q_ipackets[i];
+ }
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_tx_queues; i++) {
+ igb_stats->q_opackets[i] = internal->tx_vhost_queues[i].tx_pkts.cnt;
+ igb_stats->q_errors[i] = internal->tx_vhost_queues[i].err_pkts.cnt;
+ tx_total += igb_stats->q_opackets[i];
+ tx_err_total += igb_stats->q_errors[i];
+ }
+
+ igb_stats->ipackets = rx_total;
+ igb_stats->opackets = tx_total;
+ igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+ unsigned i;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++)
+ internal->rx_vhost_queues[i].rx_pkts.cnt = 0;
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ internal->tx_vhost_queues[i].tx_pkts.cnt = 0;
+ internal->tx_vhost_queues[i].err_pkts.cnt = 0;
+ }
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+ .dev_start = eth_dev_start,
+ .dev_stop = eth_dev_stop,
+ .dev_configure = eth_dev_configure,
+ .dev_infos_get = eth_dev_info,
+ .rx_queue_setup = eth_rx_queue_setup,
+ .tx_queue_setup = eth_tx_queue_setup,
+ .rx_queue_release = eth_queue_release,
+ .tx_queue_release = eth_queue_release,
+ .link_update = eth_link_update,
+ .stats_get = eth_stats_get,
+ .stats_reset = eth_stats_reset,
+};
+
+static struct eth_driver rte_vhost_pmd = {
+ .pci_drv = {
+ .name = "rte_vhost_pmd",
+ .drv_flags = RTE_PCI_DRV_DETACHABLE,
+ },
+};
+
+static struct rte_pci_id id_table;
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+ char *iface_name,
+ int16_t queues,
+ const unsigned numa_node)
+{
+ struct rte_eth_dev_data *data = NULL;
+ struct rte_pci_device *pci_dev = NULL;
+ struct pmd_internal *internal = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ struct ether_addr *eth_addr = NULL;
+
+ RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+ numa_node);
+
+ /* now do all data allocation - for eth_dev structure, dummy pci driver
+ * and internal (private) data
+ */
+ data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+ if (data == NULL)
+ goto error;
+
+ pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
+ if (pci_dev == NULL)
+ goto error;
+
+ internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+ if (internal == NULL)
+ goto error;
+
+ eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+ if (eth_addr == NULL)
+ goto error;
+ *eth_addr = base_eth_addr;
+ eth_addr->addr_bytes[5] = index;
+
+ /* reserve an ethdev entry */
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ goto error;
+
+ /* now put it all together
+ * - store queue data in internal,
+ * - store numa_node info in pci_driver
+ * - point eth_dev_data to internal and pci_driver
+ * - and point eth_dev structure to new eth_dev_data structure
+ */
+ internal->nb_rx_queues = queues;
+ internal->nb_tx_queues = queues;
+ internal->dev_name = strdup(name);
+ if (internal->dev_name == NULL)
+ goto error;
+ internal->iface_name = strdup(iface_name);
+ if (internal->iface_name == NULL)
+ goto error;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_INSERT_TAIL(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ rte_vhost_pmd.pci_drv.name = drivername;
+ rte_vhost_pmd.pci_drv.id_table = &id_table;
+
+ pci_dev->numa_node = numa_node;
+ pci_dev->driver = &rte_vhost_pmd.pci_drv;
+
+ data->dev_private = internal;
+ data->port_id = eth_dev->data->port_id;
+ memmove(data->name, eth_dev->data->name, sizeof(data->name));
+ data->nb_rx_queues = queues;
+ data->nb_tx_queues = queues;
+ data->dev_link = pmd_link;
+ data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by the ethdev layer,
+	 * so the vhost PMD resources won't be shared between multiple
+	 * processes.
+	 */
+ eth_dev->data = data;
+ eth_dev->driver = &rte_vhost_pmd;
+ eth_dev->dev_ops = &ops;
+ eth_dev->pci_dev = pci_dev;
+
+ /* finally assign rx and tx ops */
+ eth_dev->rx_pkt_burst = eth_vhost_rx;
+ eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+ return data->port_id;
+
+error:
+ rte_free(data);
+ rte_free(pci_dev);
+ rte_free(internal);
+ rte_free(eth_addr);
+
+ return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **iface_name = extra_args;
+
+ if (value == NULL)
+ return -1;
+
+ *iface_name = value;
+
+ return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint16_t *q = extra_args;
+
+ if ((value == NULL) || (extra_args == NULL))
+ return -EINVAL;
+
+ *q = (uint16_t)strtoul(value, NULL, 0);
+ if ((*q == USHRT_MAX) && (errno == ERANGE))
+ return -1;
+
+ if (*q > RTE_MAX_QUEUES_PER_PORT)
+ return -1;
+
+ return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ int ret = 0;
+ int index;
+ char *iface_name;
+ uint16_t queues;
+
+ RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (strlen(name) < strlen("eth_vhost")) {
+		ret = -1;
+		goto out_free;
+	}
+
+	errno = 0;
+	index = strtol(name + strlen("eth_vhost"), NULL, 0);
+	if (errno == ERANGE) {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+			&open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		/* the 'iface' argument is mandatory */
+		ret = -1;
+		goto out_free;
+	}
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+ &open_queues, &queues);
+ if (ret < 0)
+ goto out_free;
+
+ } else
+ queues = 1;
+
+ eth_dev_vhost_create(name, index,
+ iface_name, queues, rte_socket_id());
+
+out_free:
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+ struct rte_eth_dev *eth_dev = NULL;
+ struct pmd_internal *internal;
+
+ RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+ if (name == NULL)
+ return -EINVAL;
+
+ /* find an ethdev entry */
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev == NULL)
+ return -ENODEV;
+
+ internal = eth_dev->data->dev_private;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_REMOVE(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ eth_dev_stop(eth_dev);
+
+ if ((internal) && (internal->dev_name))
+ free(internal->dev_name);
+ if ((internal) && (internal->iface_name))
+ free(internal->iface_name);
+ rte_free(eth_dev->data->dev_private);
+ rte_free(eth_dev->data);
+ rte_free(eth_dev->pci_dev);
+
+ rte_eth_dev_release_port(eth_dev);
+ return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+struct virtio_net *
+rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+ struct rte_eth_dev *eth_dev;
+
+ if (rte_eth_dev_is_valid_port(port_id) == 0)
+ return NULL;
+
+ eth_dev = &rte_eth_devices[port_id];
+ if (eth_dev->driver == &rte_vhost_pmd) {
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+
+ internal = eth_dev->data->dev_private;
+ vq = &internal->rx_vhost_queues[0];
+ if (vq->device)
+ return vq->device;
+ }
+
+ return NULL;
+}
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..0c4d4b5
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts the specified port_id to a virtio device structure.
+ * The returned device can be used for vhost library APIs.
+ * To use vhost library APIs and the vhost PMD in parallel, the below API
+ * should not be called, because it will be called by the vhost PMD.
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by the vhost PMD, the below API should not be
+ * called.
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, call the Port Hotplug APIs.
+ *
+ * @param port_id
+ * port number
+ * @return
+ * virtio net device structure corresponding to the specified port
+ * NULL will be returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+ global:
+
+ rte_eth_vhost_portid2vdev;
+
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3871205..1c42fb1 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -144,7 +144,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null

-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost
+
+endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)

endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
--
2.1.4
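Not part of the patch: the `open_queues()` handler above leans on `strtoul()` with `errno` checking. As a reading aid, here is a self-contained sketch of the same validation pattern (all `demo_*` names are illustrative, and clearing `errno` before the call plus rejecting zero are refinements beyond what the patch does):

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_QUEUES 256 /* stands in for RTE_MAX_QUEUES_PER_PORT */

/* Parse a "queues" kvarg value. errno is cleared before strtoul() so a
 * stale ERANGE cannot leak in; base 0 accepts decimal, octal and hex.
 * Returns 0 on success, -1 on a malformed or out-of-range value. */
static int
demo_parse_queues(const char *value, uint16_t *q)
{
	char *end;
	unsigned long v;

	if (value == NULL || q == NULL)
		return -1;

	errno = 0;
	v = strtoul(value, &end, 0);
	if (errno == ERANGE || end == value || *end != '\0')
		return -1;
	/* rejecting 0 queues is an extra check, not in the patch */
	if (v == 0 || v > DEMO_MAX_QUEUES)
		return -1;

	*q = (uint16_t)v;
	return 0;
}
```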
Bruce Richardson
2015-10-22 12:49:21 UTC
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.
The PMD has 2 parameters.
- iface: The parameter is used to specify a path connect to a
virtio-net device.
- queues: The parameter is used to specify the number of the queues
virtio-net device has.
(Default: 1)
Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=3' -- -i
To connect above testpmd, here is qemu command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0
Hi Tetsuya,

a few comments inline below.

/Bruce
Post by Tetsuya Mukawa
---
config/common_linuxapp | 6 +
<snip>
Post by Tetsuya Mukawa
index 0000000..66bfc2b
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,735 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2010-2015 Intel Corporation.
This is probably not the copyright line you want on your new files.
Post by Tetsuya Mukawa
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG "iface"
+#define ETH_VHOST_QUEUES_ARG "queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+ ETH_VHOST_IFACE_ARG,
+ ETH_VHOST_QUEUES_ARG,
+ NULL
+};
+
+static struct ether_addr base_eth_addr = {
+ .addr_bytes = {
+ 0x56 /* V */,
+ 0x48 /* H */,
+ 0x4F /* O */,
+ 0x53 /* S */,
+ 0x54 /* T */,
+ 0x00
+ }
+};
+
+struct vhost_queue {
+ struct virtio_net *device;
+ struct pmd_internal *internal;
+ struct rte_mempool *mb_pool;
+ rte_atomic32_t allow_queuing;
+ rte_atomic32_t while_queuing;
+ rte_atomic64_t rx_pkts;
+ rte_atomic64_t tx_pkts;
+ rte_atomic64_t err_pkts;
+};
+
+struct pmd_internal {
+ TAILQ_ENTRY(pmd_internal) next;
+ char *dev_name;
+ char *iface_name;
+ unsigned nb_rx_queues;
+ unsigned nb_tx_queues;
+
+ struct vhost_queue rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+ struct vhost_queue tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+ volatile uint16_t once;
+ pthread_t session_th;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+ TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct rte_eth_link pmd_link = {
+ .link_speed = 10000,
+ .link_duplex = ETH_LINK_FULL_DUPLEX,
+ .link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(r->internal == NULL))
+ return 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+ VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+ rte_atomic64_add(&(r->rx_pkts), nb_rx);
Do we really need to use atomics here? It will slow things down a lot. For
other PMDs the assumption is always that only a single thread can access each
queue at a time - it's up to the app to use locks to enforce that restriction
if necessary.
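To illustrate the point: under the one-thread-per-queue assumption other PMDs make, per-queue counters can be plain integers. A minimal, hypothetical sketch (the `demo_*` names are not DPDK APIs):

```c
#include <stdint.h>

/* Per-queue stats as plain integers. This assumes, as is standard for
 * DPDK PMDs, that each queue is polled by a single lcore at a time,
 * so no atomic operations are needed on the fast path. */
struct demo_queue {
	uint64_t rx_pkts;
	uint64_t tx_pkts;
	uint64_t err_pkts;
};

static uint16_t
demo_rx_burst(struct demo_queue *q, uint16_t nb_rx)
{
	/* plain add replaces rte_atomic64_add() */
	q->rx_pkts += nb_rx;
	return nb_rx;
}
```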
Post by Tetsuya Mukawa
+
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t i, nb_tx = 0;
+
+ if (unlikely(r->internal == NULL))
+ return 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Enqueue packets to guest RX queue */
+ nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+ VIRTIO_RXQ, bufs, nb_bufs);
+
+ rte_atomic64_add(&(r->tx_pkts), nb_tx);
+ rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
+
+ for (i = 0; likely(i < nb_tx); i++)
+ rte_pktmbuf_free(bufs[i]);
+
+ rte_atomic32_set(&r->while_queuing, 0);
+
+ return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+ int found = 0;
+ struct pmd_internal *internal;
+
+ if (ifname == NULL)
+ return NULL;
+
+ pthread_mutex_lock(&internal_list_lock);
+
+ TAILQ_FOREACH(internal, &internals_list, next) {
+ if (!strcmp(internal->iface_name, ifname)) {
+ found = 1;
+ break;
+ }
+ }
+
+ pthread_mutex_unlock(&internal_list_lock);
+
+ if (!found)
+ return NULL;
+
+ return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ uint16_t queues;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "invalid argument\n");
+ return -1;
+ }
+
+ internal = find_internal_resource(dev->ifname);
+ if (internal == NULL) {
+ RTE_LOG(INFO, PMD, "invalid device name\n");
+ return -1;
+ }
+
+ /*
+ * Todo: To support multi queue, get the number of queues here.
+ * So far, vhost provides only one queue.
+ */
+ queues = 1;
+
+ if ((queues < internal->nb_rx_queues) ||
+ (queues < internal->nb_tx_queues)) {
+ RTE_LOG(INFO, PMD, "Not enough queues\n");
+ return -1;
+ }
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
typo "failure". Probably should also be written just as "Failed to find ethdev".
Post by Tetsuya Mukawa
+ return -1;
+ }
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ vq->device = dev;
+ vq->internal = internal;
+ }
+
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ dev->pmd_priv = eth_dev;
+ eth_dev->data->dev_link.link_status = 1;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ rte_atomic32_set(&vq->allow_queuing, 1);
+ }
+ RTE_LOG(INFO, PMD, "New connection established\n");
+
+ return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "invalid argument\n");
+ return;
+ }
+
+ eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failuer to find a ethdev\n");
+ return;
+ }
+
+ internal = eth_dev->data->dev_private;
+
+ /* Wait until rx/tx_pkt_burst stops accessing vhost device */
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ rte_atomic32_set(&vq->allow_queuing, 0);
+ while (rte_atomic32_read(&vq->while_queuing))
+ rte_pause();
+ }
+
+ eth_dev->data->dev_link.link_status = 0;
+
+ dev->pmd_priv = NULL;
+ dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+ for (i = 0; i < internal->nb_rx_queues; i++) {
+ vq = &internal->rx_vhost_queues[i];
+ vq->device = NULL;
+ }
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ vq = &internal->tx_vhost_queues[i];
+ vq->device = NULL;
+ }
+
+ RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+ static struct virtio_net_device_ops *vhost_ops;
+
+ vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+ if (vhost_ops == NULL)
+ rte_panic("Can't allocate memory\n");
+
+ /* set vhost arguments */
+ vhost_ops->new_device = new_device;
+ vhost_ops->destroy_device = destroy_device;
+ if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+ rte_panic("Can't register callbacks\n");
+
+ /* start event handling */
+ rte_vhost_driver_session_start();
+
+ rte_free(vhost_ops);
+ pthread_exit(0);
+}
+
+static void vhost_driver_session_start(struct pmd_internal *internal)
+{
+ int ret;
+
+ ret = pthread_create(&internal->session_th,
+ NULL, vhost_driver_session, NULL);
+ if (ret)
+ rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(struct pmd_internal *internal)
+{
+ int ret;
+
+ ret = pthread_cancel(internal->session_th);
+ if (ret)
+ rte_panic("Can't cancel the thread\n");
+
+ ret = pthread_join(internal->session_th, NULL);
+ if (ret)
+ rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+ int ret;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+ ret = rte_vhost_driver_register(internal->iface_name);
+ if (ret)
+ return ret;
+
+ vhost_driver_session_start(internal);
+ }
+ return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
+ rte_vhost_driver_unregister(internal->iface_name);
+ vhost_driver_session_stop(internal);
+ }
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ internal->rx_vhost_queues[rx_queue_id].mb_pool = mb_pool;
+ dev->data->rx_queues[rx_queue_id] = &internal->rx_vhost_queues[rx_queue_id];
+ return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev->data->tx_queues[tx_queue_id] = &internal->tx_vhost_queues[tx_queue_id];
+ return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ dev_info->driver_name = drivername;
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)-1;
+ dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+ dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+ dev_info->min_rx_bufsize = 0;
+ dev_info->pci_dev = dev->pci_dev;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+ unsigned i;
+ unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+ const struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_rx_queues; i++) {
+ igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i].rx_pkts.cnt;
+ rx_total += igb_stats->q_ipackets[i];
+ }
+
+ for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+ i < internal->nb_tx_queues; i++) {
+ igb_stats->q_opackets[i] = internal->tx_vhost_queues[i].tx_pkts.cnt;
+ igb_stats->q_errors[i] = internal->tx_vhost_queues[i].err_pkts.cnt;
+ tx_total += igb_stats->q_opackets[i];
+ tx_err_total += igb_stats->q_errors[i];
+ }
+
+ igb_stats->ipackets = rx_total;
+ igb_stats->opackets = tx_total;
+ igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+ unsigned i;
+ struct pmd_internal *internal = dev->data->dev_private;
+
+ for (i = 0; i < internal->nb_rx_queues; i++)
+ internal->rx_vhost_queues[i].rx_pkts.cnt = 0;
+ for (i = 0; i < internal->nb_tx_queues; i++) {
+ internal->tx_vhost_queues[i].tx_pkts.cnt = 0;
+ internal->tx_vhost_queues[i].err_pkts.cnt = 0;
+ }
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+ .dev_start = eth_dev_start,
+ .dev_stop = eth_dev_stop,
+ .dev_configure = eth_dev_configure,
+ .dev_infos_get = eth_dev_info,
+ .rx_queue_setup = eth_rx_queue_setup,
+ .tx_queue_setup = eth_tx_queue_setup,
+ .rx_queue_release = eth_queue_release,
+ .tx_queue_release = eth_queue_release,
+ .link_update = eth_link_update,
+ .stats_get = eth_stats_get,
+ .stats_reset = eth_stats_reset,
+};
+
+static struct eth_driver rte_vhost_pmd = {
+ .pci_drv = {
+ .name = "rte_vhost_pmd",
+ .drv_flags = RTE_PCI_DRV_DETACHABLE,
+ },
+};
If you base this patchset on top of Bernard's patchset to remove the PCI devices
then you shouldn't need these pci_dev and id_table structures.
Post by Tetsuya Mukawa
+
+static struct rte_pci_id id_table;
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+ char *iface_name,
+ int16_t queues,
+ const unsigned numa_node)
+{
+ struct rte_eth_dev_data *data = NULL;
+ struct rte_pci_device *pci_dev = NULL;
+ struct pmd_internal *internal = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ struct ether_addr *eth_addr = NULL;
+
+ RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+ numa_node);
+
+ /* now do all data allocation - for eth_dev structure, dummy pci driver
+ * and internal (private) data
+ */
+ data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+ if (data == NULL)
+ goto error;
+
+ pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
+ if (pci_dev == NULL)
+ goto error;
+
+ internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+ if (internal == NULL)
+ goto error;
+
+ eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+ if (eth_addr == NULL)
+ goto error;
+ *eth_addr = base_eth_addr;
+ eth_addr->addr_bytes[5] = index;
+
+ /* reserve an ethdev entry */
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ goto error;
+
+ /* now put it all together
+ * - store queue data in internal,
+ * - store numa_node info in pci_driver
+ * - point eth_dev_data to internal and pci_driver
+ * - and point eth_dev structure to new eth_dev_data structure
+ */
+ internal->nb_rx_queues = queues;
+ internal->nb_tx_queues = queues;
+ internal->dev_name = strdup(name);
+ if (internal->dev_name == NULL)
+ goto error;
+ internal->iface_name = strdup(iface_name);
+ if (internal->iface_name == NULL)
+ goto error;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_INSERT_TAIL(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ rte_vhost_pmd.pci_drv.name = drivername;
+ rte_vhost_pmd.pci_drv.id_table = &id_table;
+
+ pci_dev->numa_node = numa_node;
+ pci_dev->driver = &rte_vhost_pmd.pci_drv;
+
+ data->dev_private = internal;
+ data->port_id = eth_dev->data->port_id;
+ memmove(data->name, eth_dev->data->name, sizeof(data->name));
+ data->nb_rx_queues = queues;
+ data->nb_tx_queues = queues;
+ data->dev_link = pmd_link;
+ data->mac_addrs = eth_addr;
+
+ /* We'll replace the 'data' originally allocated by eth_dev. So the
+ * vhost PMD resources won't be shared between multiple processes.
+ */
+ eth_dev->data = data;
+ eth_dev->driver = &rte_vhost_pmd;
+ eth_dev->dev_ops = &ops;
+ eth_dev->pci_dev = pci_dev;
+
+ /* finally assign rx and tx ops */
+ eth_dev->rx_pkt_burst = eth_vhost_rx;
+ eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+ return data->port_id;
+
+error:
+ rte_free(data);
+ rte_free(pci_dev);
+ rte_free(internal);
+ rte_free(eth_addr);
+
+ return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **iface_name = extra_args;
+
+ if (value == NULL)
+ return -1;
+
+ *iface_name = value;
+
+ return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint16_t *q = extra_args;
+
+ if ((value == NULL) || (extra_args == NULL))
+ return -EINVAL;
+
+ *q = (uint16_t)strtoul(value, NULL, 0);
+ if ((*q == USHRT_MAX) && (errno == ERANGE))
+ return -1;
+
+ if (*q > RTE_MAX_QUEUES_PER_PORT)
+ return -1;
+
+ return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ int ret = 0;
+ int index;
+ char *iface_name;
+ uint16_t queues;
+
+ RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+ kvlist = rte_kvargs_parse(params, valid_arguments);
+ if (kvlist == NULL)
+ return -1;
+
+ if (strlen(name) < strlen("eth_vhost"))
+ return -1;
+
+ index = strtol(name + strlen("eth_vhost"), NULL, 0);
+ if (errno == ERANGE)
+ return -1;
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+ &open_iface, &iface_name);
+ if (ret < 0)
+ goto out_free;
+ }
+
+ if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+ &open_queues, &queues);
+ if (ret < 0)
+ goto out_free;
+
+ } else
+ queues = 1;
+
+ eth_dev_vhost_create(name, index,
+ iface_name, queues, rte_socket_id());
+
+out_free:
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+ struct rte_eth_dev *eth_dev = NULL;
+ struct pmd_internal *internal;
+
+ RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+ if (name == NULL)
+ return -EINVAL;
+
+ /* find an ethdev entry */
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev == NULL)
+ return -ENODEV;
+
+ internal = eth_dev->data->dev_private;
+
+ pthread_mutex_lock(&internal_list_lock);
+ TAILQ_REMOVE(&internals_list, internal, next);
+ pthread_mutex_unlock(&internal_list_lock);
+
+ eth_dev_stop(eth_dev);
+
+ if ((internal) && (internal->dev_name))
+ free(internal->dev_name);
+ if ((internal) && (internal->iface_name))
+ free(internal->iface_name);
+ rte_free(eth_dev->data->dev_private);
+ rte_free(eth_dev->data);
+ rte_free(eth_dev->pci_dev);
+
+ rte_eth_dev_release_port(eth_dev);
+ return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+ .name = "eth_vhost",
+ .type = PMD_VDEV,
+ .init = rte_pmd_vhost_devinit,
+ .uninit = rte_pmd_vhost_devuninit,
+};
+
+struct virtio_net *
+rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+ struct rte_eth_dev *eth_dev;
+
+ if (rte_eth_dev_is_valid_port(port_id) == 0)
+ return NULL;
+
+ eth_dev = &rte_eth_devices[port_id];
+ if (eth_dev->driver == &rte_vhost_pmd) {
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+
+ internal = eth_dev->data->dev_private;
+ vq = &internal->rx_vhost_queues[0];
+ if (vq->device)
+ return vq->device;
+ }
+
+ return NULL;
+}
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..0c4d4b5
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts the specified port_id to a virtio device structure.
+ * The returned device can be used for vhost library APIs.
+ * To use vhost library APIs and the vhost PMD in parallel, the below API
+ * should not be called, because it will be called by the vhost PMD.
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by the vhost PMD, the below API should not be
+ * called.
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, call the Port Hotplug APIs.
+ *
+ * @param port_id
+ *  port number
+ * @return
+ *  virtio net device structure corresponding to the specified port
+ *  NULL will be returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+ global:
+
+ rte_eth_vhost_portid2vdev;
+
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3871205..1c42fb1 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -144,7 +144,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null
-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost
+
+endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
--
2.1.4
Tetsuya Mukawa
2015-10-23 03:48:13 UTC
Post by Bruce Richardson
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.
The PMD has 2 parameters.
- iface: The parameter is used to specify a path connect to a
virtio-net device.
- queues: The parameter is used to specify the number of the queues
virtio-net device has.
(Default: 1)
Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=3' -- -i
To connect above testpmd, here is qemu command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0
Hi Tetsuya,
a few comments inline below.
/Bruce
Post by Tetsuya Mukawa
---
config/common_linuxapp | 6 +
<snip>
Post by Tetsuya Mukawa
index 0000000..66bfc2b
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,735 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2010-2015 Intel Corporation.
This is probably not the copyright line you want on your new files.
Hi Bruce,

I appreciate your comments.
Yes, I will change the above.
Post by Bruce Richardson
Post by Tetsuya Mukawa
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+ struct vhost_queue *r = q;
+ uint16_t nb_rx = 0;
+
+ if (unlikely(r->internal == NULL))
+ return 0;
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ return 0;
+
+ rte_atomic32_set(&r->while_queuing, 1);
+
+ if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ goto out;
+
+ /* Dequeue packets from guest TX queue */
+ nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+ VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+ rte_atomic64_add(&(r->rx_pkts), nb_rx);
Do we really need to use atomics here? It will slow things down a lot. For
other PMDs the assumption is always that only a single thread can access each
queue at a time - it's up to the app to use locks to enforce that restriction
if necessary.
I agree we don't need to use atomic here.
I will change it in next patches.
Post by Bruce Richardson
Post by Tetsuya Mukawa
+static int
+new_device(struct virtio_net *dev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internal *internal;
+ struct vhost_queue *vq;
+ uint16_t queues;
+ unsigned i;
+
+ if (dev == NULL) {
+ RTE_LOG(INFO, PMD, "invalid argument\n");
+ return -1;
+ }
+
+ internal = find_internal_resource(dev->ifname);
+ if (internal == NULL) {
+ RTE_LOG(INFO, PMD, "invalid device name\n");
+ return -1;
+ }
+
+ /*
+ * Todo: To support multi queue, get the number of queues here.
+ * So far, vhost provides only one queue.
+ */
+ queues = 1;
+
+ if ((queues < internal->nb_rx_queues) ||
+ (queues < internal->nb_tx_queues)) {
+ RTE_LOG(INFO, PMD, "Not enough queues\n");
+ return -1;
+ }
+
+ eth_dev = rte_eth_dev_allocated(internal->dev_name);
+ if (eth_dev == NULL) {
+ RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
typo "failure". Probably should also be written just as "Failed to find ethdev".
Thanks, I will fix it.
Post by Bruce Richardson
Post by Tetsuya Mukawa
+static struct eth_driver rte_vhost_pmd = {
+ .pci_drv = {
+ .name = "rte_vhost_pmd",
+ .drv_flags = RTE_PCI_DRV_DETACHABLE,
+ },
+};
If you base this patchset on top of Bernard's patchset to remove the PCI devices
then you shouldn't need these pci_dev and id_table structures.
Sure, I will check his latest patches and will rebase on them.

Regards,
Tetsuya
Xie, Huawei
2015-10-29 14:25:00 UTC
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.
Hi Tetsuya:
I haven't had the bandwidth to review the details of this patch, but I think
it is the right thing to do. It is still an RFC patch. Is your goal to
make it into 2.2?
Tetsuya Mukawa
2015-10-30 01:18:32 UTC
Post by Xie, Huawei
Post by Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.
I haven't had the bandwidth to review the details of this patch, but I think
it is the right thing to do. It is still an RFC patch. Is your goal to
make it into 2.2?
Hi Xie,

Thanks for taking care of it. Yes, I want to merge it into DPDK 2.2.
I've already sent the non-RFC patches. Could you please check below?

Subject: [PATCH 0/3] Add VHOST PMD
Date: Tue, 27 Oct 2015 15:12:52 +0900
Message-Id: <1445926375-18986-1-git-send-email-***@igel.co.jp>

The following patch, included in the above patch series, was submitted as a separate patch,
so please ignore it here.
- [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index

Thanks,
Tetsuya