Discussion:
[dpdk-dev] [RFC PATCH 0/2] Virtio-net PMD Extension to work on host.
Tetsuya Mukawa
2015-11-19 10:57:28 UTC
THIS IS A PoC IMPLEMENTATION.

[Abstraction]

Normally, the virtio-net PMD only works in a VM, because there is no virtio-net device on the host.
This RFC patch extends the virtio-net PMD so that it can also work on the host as a virtual PMD.
However, we did not implement a virtio-net device as part of the virtio-net PMD itself.
To provide a virtio-net device for the PMD, start a QEMU process in its special QTest mode, then connect to it from the virtio-net PMD through a unix domain socket.

The PMD can connect to anything a QEMU virtio-net device can.
For example, it can connect to the vhost-net kernel module or a vhost-user backend application.
As with the virtio-net PMD running in a guest, the memory of the application that uses the virtio-net PMD is shared with the vhost backend application.
However, the vhost backend application's memory is not shared with the application.

The main target of this PMD is containers such as docker, rkt and lxc.
The related processes (the virtio-net PMD process, QEMU and the vhost-user backend process) can each be isolated in a container.
However, a shared directory is needed so that they can communicate through the unix domain sockets.


[How to use]

So far, a QEMU patch is needed to connect to a vhost-user backend.
Please check the known issues section below.
Because of this, the example here uses the vhost-net kernel module.

- Compile
Set "CONFIG_RTE_VIRTIO_VDEV=y" in config/common_linuxapp, then compile DPDK.

- Start QEMU like below.
$ sudo qemu-system-x86_64 -qtest unix:/tmp/qtest0,server -machine accel=qtest \
-display none -qtest-log /dev/null \
-netdev type=tap,script=/etc/qemu-ifup,id=net0,vhost=on \
-device virtio-net-pci,netdev=net0 \
-chardev socket,id=chr1,path=/tmp/ivshmem0,server \
-device ivshmem,size=1G,chardev=chr1,vectors=1

- Start the DPDK application like below.
$ sudo ./testpmd -c f -n 1 -m 1024 --shm \
--vdev="eth_cvio0,qtest=/tmp/qtest0,ivshmem=/tmp/ivshmem0" -- \
--disable-hw-vlan --txqflags=0xf00 -i

- Check the created tap device.

(*1) Please specify the same memory size on the QEMU and DPDK command lines.


[Detailed Description]

- virtio-net device implementation
The PMD uses a QEMU virtio-net device. To do that, the QEMU QTest functionality is used.
QTest is a test framework for QEMU devices. It allows us to implement a device driver outside of QEMU.
With QTest, we can implement the DPDK application and virtio-net PMD as a standalone process on the host.
When QEMU is invoked in QTest mode, no guest code runs.
To learn more about QTest, see:
http://wiki.qemu.org/Features/QTest

- probing devices
QTest provides a unix domain socket. Through this socket, the driver process can access the I/O ports and memory of the QEMU virtual machine.
The PMD sends I/O port accesses over this socket to probe the PCI devices.
If the virtio-net and ivshmem devices are found, they are initialized.
The normal I/O port accesses of the virtio-net PMD are also sent through the socket, so the PMD can correctly initialize the virtio-net device in QEMU.
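
For reference, here is a rough sketch of the exchange on the qtest socket when the PMD reads a PCI vendor/device ID through the 0xcf8/0xcfc configuration ports (message format as implemented in qtest.c of this series; the concrete values are only illustrative):

  PMD -> QEMU : outl 0xcf8 0x80001000    (select bus 0, device 2, function 0, offset 0)
  QEMU -> PMD : OK
  PMD -> QEMU : inl 0xcfc
  QEMU -> PMD : OK 0x10001af4            (device ID 0x1000 / vendor ID 0x1af4 -> virtio-net)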

- ivshmem device to share memory
To share the memory that the virtio-net PMD process uses, an ivshmem device is used.
Because an ivshmem device can only handle one file descriptor, the shared memory must consist of a single file.
To allocate such memory, EAL has a new option called "--shm".
If the option is specified, EAL opens one file and allocates its memory from hugepages.
While initializing the ivshmem device, we can set its BAR (Base Address Register).
The BAR determines at which address the QEMU vcpu sees this shared memory.
We specify the host physical address of the shared memory as this address.
This is very useful because no QEMU patch is needed to translate address offsets.
(For example, if the virtio-net PMD process allocates memory from the shared memory and writes its physical address to a virtio-net register, the QEMU virtio-net device can use it without calculating any address offset.)
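
As a minimal sketch of this idea (error handling omitted; names taken from qtest.c later in this series), the ivshmem memory BAR is simply programmed with the host physical address of the single EAL memory segment:

  /* the single file-backed EAL segment that holds all DPDK memory */
  const struct rte_memseg *ms = rte_eal_get_physmem_layout();

  /* point the 64bit ivshmem memory BAR at the host physical address of
   * that segment, so no address offset translation is ever needed */
  ivshmem->bar[1].addr = REG_ADDR_BAR2;
  ivshmem->bar[1].type = QTEST_PCI_BAR_MEMORY_64;
  ivshmem->bar[1].region_start = ms[0].phys_addr;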

- Known limitation
So far, the PMD does not handle interrupts from QEMU devices.
Because of this, the VIRTIO_NET_F_STATUS functionality is dropped.
All other virtio-net functions can still be used without it.

- Known issues
So far, to use vhost-user, we need to apply patches to QEMU and to the DPDK vhost library.
This is because QEMU does not send the memory information and file descriptor of the ivshmem device to the vhost-user backend.
(The vhost-net kernel module does receive this information, so only the vhost-user behavior is incorrect. I will submit the patch to QEMU soon.)
Also, we may have an issue in the DPDK vhost library when handling kickfd and callfd; a patch for it is needed.
(Let me check it more.)
If someone wants to check the vhost-user behavior, I will describe it in more detail in a later email.


[Addition]

The same approach can be applied to handle any kind of QEMU device from a DPDK application.
So far, I have no ideas beyond the virtio-net device, but someone else might.


Tetsuya Mukawa (2):
EAL: Add new EAL "--shm" option.
virtio: Extend virtio-net PMD to support container environment

config/common_linuxapp | 5 +
drivers/net/virtio/Makefile | 4 +
drivers/net/virtio/qtest.c | 590 +++++++++++++++++++++++++++++
drivers/net/virtio/virtio_ethdev.c | 214 ++++++++++-
drivers/net/virtio/virtio_ethdev.h | 16 +
drivers/net/virtio/virtio_pci.h | 25 ++
lib/librte_eal/common/eal_common_options.c | 5 +
lib/librte_eal/common/eal_internal_cfg.h | 1 +
lib/librte_eal/common/eal_options.h | 2 +
lib/librte_eal/common/include/rte_memory.h | 5 +
lib/librte_eal/linuxapp/eal/eal_memory.c | 71 ++++
11 files changed, 917 insertions(+), 21 deletions(-)
create mode 100644 drivers/net/virtio/qtest.c
--
2.1.4
Tetsuya Mukawa
2015-11-19 10:57:29 UTC
The patch adds a new EAL "--shm" option. If the option is specified,
EAL allocates one file on hugetlbfs. This memory is used for sharing
memory between the DPDK application and the QEMU ivshmem device.

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
lib/librte_eal/common/eal_common_options.c | 5 +++
lib/librte_eal/common/eal_internal_cfg.h | 1 +
lib/librte_eal/common/eal_options.h | 2 +
lib/librte_eal/common/include/rte_memory.h | 5 +++
lib/librte_eal/linuxapp/eal/eal_memory.c | 71 ++++++++++++++++++++++++++++++
5 files changed, 84 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 79db608..67b4e52 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -82,6 +82,7 @@ eal_long_options[] = {
{OPT_NO_HUGE, 0, NULL, OPT_NO_HUGE_NUM },
{OPT_NO_PCI, 0, NULL, OPT_NO_PCI_NUM },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM },
+ {OPT_SHM, 0, NULL, OPT_SHM_NUM },
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM },
{OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM },
{OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM },
@@ -723,6 +724,10 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_hugetlbfs = 1;
break;

+ case OPT_SHM_NUM:
+ conf->shm = 1;
+ break;
+
case OPT_NO_PCI_NUM:
conf->no_pci = 1;
break;
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index 5f1367e..362ce12 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -66,6 +66,7 @@ struct internal_config {
volatile unsigned no_hugetlbfs; /**< true to disable hugetlbfs */
unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
+ volatile unsigned shm; /**< true to create shared memory for ivshmem */
volatile unsigned no_pci; /**< true to disable PCI */
volatile unsigned no_hpet; /**< true to disable HPET */
volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 4245fd5..263b4f8 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -55,6 +55,8 @@ enum {
OPT_HUGE_DIR_NUM,
#define OPT_HUGE_UNLINK "huge-unlink"
OPT_HUGE_UNLINK_NUM,
+#define OPT_SHM "shm"
+ OPT_SHM_NUM,
#define OPT_LCORES "lcores"
OPT_LCORES_NUM,
#define OPT_LOG_LEVEL "log-level"
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 1bed415..9c1effc 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -100,6 +100,7 @@ struct rte_memseg {
int32_t socket_id; /**< NUMA socket ID. */
uint32_t nchannel; /**< Number of channels. */
uint32_t nrank; /**< Number of ranks. */
+ int fd; /**< fd used for share this memory */
#ifdef RTE_LIBRTE_XEN_DOM0
/**< store segment MFNs */
uint64_t mfn[DOM0_NUM_MEMBLOCK];
@@ -128,6 +129,10 @@ int rte_mem_lock_page(const void *virt);
*/
phys_addr_t rte_mem_virt2phy(const void *virt);

+
+int
+rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void **paddr);
+
/**
* Get the layout of the available physical memory.
*
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 657d19f..c46c2cf 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -143,6 +143,21 @@ rte_mem_lock_page(const void *virt)
return mlock((void*)aligned, page_size);
}

+int
+rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void **paddr)
+{
+ struct rte_mem_config *mcfg;
+ mcfg = rte_eal_get_configuration()->mem_config;
+
+ if (pfd != NULL)
+ *pfd = mcfg->memseg[index].fd;
+ if (psize != NULL)
+ *psize = (uint64_t)mcfg->memseg[index].len;
+ if (paddr != NULL)
+ *paddr = (void *)(uint64_t)mcfg->memseg[index].addr;
+ return 0;
+}
+
/*
* Get physical address of any mapped virtual address in the current process.
*/
@@ -1068,6 +1083,41 @@ calc_num_pages_per_socket(uint64_t * memory,
return total_num_pages;
}

+static void *
+rte_eal_shm_create(int *pfd, const char *hugedir)
+{
+ int ret, fd;
+ char filepath[256];
+ void *vaddr;
+ uint64_t size = internal_config.memory;
+
+ sprintf(filepath, "%s/%s_cvio", hugedir,
+ internal_config.hugefile_prefix);
+
+ fd = open(filepath, O_CREAT | O_RDWR, 0600);
+ if (fd < 0)
+ rte_panic("open %s failed: %s\n", filepath, strerror(errno));
+
+ ret = flock(fd, LOCK_EX);
+ if (ret < 0) {
+ close(fd);
+ rte_panic("flock %s failed: %s\n", filepath, strerror(errno));
+ }
+
+ ret = ftruncate(fd, size);
+ if (ret < 0)
+ rte_panic("ftruncate failed: %s\n", strerror(errno));
+
+ vaddr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ if (vaddr != MAP_FAILED) {
+ memset(vaddr, 0, size);
+ *pfd = fd;
+ }
+ memset(vaddr, 0, size);
+
+ return vaddr;
+}
+
/*
* Prepare physical memory mapping: fill configuration structure with
* these infos, return 0 on success.
@@ -1120,6 +1170,27 @@ rte_eal_hugepage_init(void)
return 0;
}

+ /* create shared memory consist of only one file */
+ if (internal_config.shm) {
+ int fd;
+ struct hugepage_info *hpi;
+
+ hpi = &internal_config.hugepage_info[0];
+ addr = rte_eal_shm_create(&fd, hpi->hugedir);
+ if (addr == MAP_FAILED) {
+ RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
+ strerror(errno));
+ return -1;
+ }
+ mcfg->memseg[0].phys_addr = rte_mem_virt2phy(addr);
+ mcfg->memseg[0].addr = addr;
+ mcfg->memseg[0].hugepage_sz = hpi->hugepage_sz;
+ mcfg->memseg[0].len = internal_config.memory;
+ mcfg->memseg[0].socket_id = 0;
+ mcfg->memseg[0].fd = fd;
+ return 0;
+ }
+
/* check if app runs on Xen Dom0 */
if (internal_config.xen_dom0_support) {
#ifdef RTE_LIBRTE_XEN_DOM0
--
2.1.4
Tetsuya Mukawa
2015-12-16 08:37:27 UTC
[Change log]

PATCH v1:
(Listing only functionality changes and important bug fixes)
* Support virtio-net interrupt handling (see the sketch after this list).
(This means the virtio-net PMD on the host and in the guest have the same virtio-net features.)
* Fix the memory allocation method so that contiguous memory is allocated correctly.
* Support Port Hotplug.
* Rebase on DPDK-2.2.
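
As a rough sketch of how the new interrupt handling works (based on qtest.c in patch 2): at startup the PMD asks the qtest server to intercept ioapic interrupts, and QEMU then reports each interrupt as an asynchronous message on the same socket, which the PMD relays to the registered link-status callback.

  PMD -> QEMU : irq_intercept_in ioapic
  QEMU -> PMD : OK
  ...                              (later, when the device raises its interrupt line)
  QEMU -> PMD : IRQ raise 10       (IRQ 10 is the line routed to virtio-net via piix3)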


[Abstraction]

Normally, the virtio-net PMD only works in a VM, because there is no virtio-net device on the host.
This RFC patch extends the virtio-net PMD so that it can also work on the host as a virtual PMD.
However, we did not implement a virtio-net device as part of the virtio-net PMD itself.
To provide a virtio-net device for the PMD, start a QEMU process in its special QTest mode, then connect to it from the virtio-net PMD through a unix domain socket.

The virtio-net PMD on the host is fully compatible with the PMD in a guest.
We can use the same functionality and connect to anything a QEMU virtio-net device can.
For example, the PMD can use the virtio-net multiqueue function, and it can connect to the vhost-net kernel module or a vhost-user backend application.
As with the virtio-net PMD running in a guest, the memory of the application that uses the virtio-net PMD is shared with the vhost backend application, but the vhost backend application's memory is not shared with the application.

The main target of this PMD is containers such as docker, rkt and lxc.
The related processes (the virtio-net PMD process, QEMU and the vhost-user backend process) can each be isolated in a container.
However, a shared directory is needed so that they can communicate through the unix domain sockets.


[How to use]

So far, a QEMU patch is needed to connect to a vhost-user backend.
See the patch below.
- http://patchwork.ozlabs.org/patch/552549/
To see how to use it, check the commit log.


[Detailed Description]

- virtio-net device implementation
This host-mode PMD uses a QEMU virtio-net device. To do that, the QEMU QTest functionality is used.
QTest is a test framework for QEMU devices. It allows us to implement a device driver outside of QEMU.
With QTest, we can implement the DPDK application and virtio-net PMD as a standalone process on the host.
When QEMU is invoked in QTest mode, no guest code runs.
To learn more about QTest, see:
- http://wiki.qemu.org/Features/QTest

- probing devices
QTest provides a unix domain socket. Through this socket, the driver process can access the I/O ports and memory of the QEMU virtual machine.
The PMD sends I/O port accesses over this socket to probe the PCI devices.
If the virtio-net and ivshmem devices are found, they are initialized.
The normal I/O port accesses of the virtio-net PMD are also sent through the socket, so the PMD can correctly initialize the virtio-net device in QEMU.

- ivshmem device to share memory
To share the memory that the virtio-net PMD process uses, an ivshmem device is used.
Because an ivshmem device can only handle one file descriptor, the shared memory must consist of a single file.
To allocate such memory, EAL has a new option called "--contig-mem".
If the option is specified, EAL opens one file and allocates its memory from hugepages.
While initializing the ivshmem device, we can set its BAR (Base Address Register).
The BAR determines at which address the QEMU vcpu sees this shared memory.
We specify the host physical address of the shared memory as this address.
This is very useful because no QEMU patch is needed to translate address offsets.
(For example, if the virtio-net PMD process allocates memory from the shared memory and writes its physical address to a virtio-net register, the QEMU virtio-net device can use it without calculating any address offset.)
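
Below is a minimal sketch of how the shared memory is actually handed over to the ivshmem device (see qtest_setup_shared_memory() in patch 2; error handling omitted). The PMD re-opens the hugepage file that backs the EAL segment and passes its file descriptor to QEMU over the ivshmem server socket with SCM_RIGHTS, preceded by the ivshmem protocol version and a client id:

  /* re-open the hugepage file backing the single EAL segment */
  shm_fd = qtest_open_shared_memory();

  /* ivshmem server protocol: protocol version, client id, then the fd */
  qtest_send_message_to_ivshmem(s->ivshmem_socket, IVSHMEM_PROTOCOL_VERSION, -1);
  qtest_send_message_to_ivshmem(s->ivshmem_socket, 0, -1);
  qtest_send_message_to_ivshmem(s->ivshmem_socket, -1, shm_fd);

  close(shm_fd);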


[Known issues]

- vhost-user
So far, to use vhost-user, we need to apply a patch to QEMU.
This is because QEMU does not send the memory information and file descriptor of the ivshmem device to the vhost-user backend.
I have submitted the patch to QEMU.
See "http://patchwork.ozlabs.org/patch/552549/".
Also, we may have an issue in the DPDK vhost library when handling kickfd and callfd.
A patch for this issue is needed; I have a workaround patch, but let me check it more.
If someone wants to check the vhost-user behavior, I will describe it in more detail in a later email.




Tetsuya Mukawa (2):
EAL: Add new EAL "--contig-mem" option
virtio: Extend virtio-net PMD to support container environment

config/common_linuxapp | 1 +
drivers/net/virtio/Makefile | 4 +
drivers/net/virtio/qtest.c | 1107 ++++++++++++++++++++++++++++
drivers/net/virtio/virtio_ethdev.c | 341 ++++++++-
drivers/net/virtio/virtio_ethdev.h | 12 +
drivers/net/virtio/virtio_pci.h | 25 +
lib/librte_eal/common/eal_common_options.c | 7 +
lib/librte_eal/common/eal_internal_cfg.h | 1 +
lib/librte_eal/common/eal_options.h | 2 +
lib/librte_eal/linuxapp/eal/eal_memory.c | 77 +-
10 files changed, 1543 insertions(+), 34 deletions(-)
create mode 100644 drivers/net/virtio/qtest.c
--
2.1.4
Tetsuya Mukawa
2015-12-16 08:37:28 UTC
This option allocates physically contiguous memory for EAL.
EAL will provide only one file descriptor for that memory.
So far, this memory is used by the virtio-net PMD on the host or in a container.

DPDK already has the "RTE_EAL_SINGLE_FILE_SEGMENTS" compile option.
It creates one file descriptor for each contiguous memory region.
However, even with that option, DPDK may allocate memory that consists of
multiple contiguous memory regions.

The patch adds the "--contig-mem" option. It is only valid if
"RTE_EAL_SINGLE_FILE_SEGMENTS" is enabled.
If this option is specified, EAL memory will consist of
only one contiguous region.

To implement this option, the EAL implementation is changed as below.
- In calc_num_pages_per_socket(), EAL checks whether memory of sufficient
size can be allocated as a single contiguous region.
- In unmap_unneeded_hugepages(), EAL unmaps regions that are not large
enough.

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
lib/librte_eal/common/eal_common_options.c | 7 +++
lib/librte_eal/common/eal_internal_cfg.h | 1 +
lib/librte_eal/common/eal_options.h | 2 +
lib/librte_eal/linuxapp/eal/eal_memory.c | 77 ++++++++++++++++++++++++++++--
4 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 29942ea..55d537e 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -95,6 +95,7 @@ eal_long_options[] = {
{OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM },
{OPT_VMWARE_TSC_MAP, 0, NULL, OPT_VMWARE_TSC_MAP_NUM },
{OPT_XEN_DOM0, 0, NULL, OPT_XEN_DOM0_NUM },
+ {OPT_CONTIG_MEM, 0, NULL, OPT_CONTIG_MEM_NUM },
{0, 0, NULL, 0 }
};

@@ -854,6 +855,12 @@ eal_parse_common_option(int opt, const char *optarg,
conf->process_type = eal_parse_proc_type(optarg);
break;

+#ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
+ case OPT_CONTIG_MEM_NUM:
+ conf->contig_mem = 1;
+ break;
+#endif
+
case OPT_MASTER_LCORE_NUM:
if (eal_parse_master_lcore(optarg) < 0) {
RTE_LOG(ERR, EAL, "invalid parameter for --"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index 5f1367e..c02220d 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -66,6 +66,7 @@ struct internal_config {
volatile unsigned no_hugetlbfs; /**< true to disable hugetlbfs */
unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
+ volatile unsigned contig_mem; /**< true to create contiguous eal memory */
volatile unsigned no_pci; /**< true to disable PCI */
volatile unsigned no_hpet; /**< true to disable HPET */
volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index a881c62..a58e371 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -55,6 +55,8 @@ enum {
OPT_HUGE_DIR_NUM,
#define OPT_HUGE_UNLINK "huge-unlink"
OPT_HUGE_UNLINK_NUM,
+#define OPT_CONTIG_MEM "contig-mem"
+ OPT_CONTIG_MEM_NUM,
#define OPT_LCORES "lcores"
OPT_LCORES_NUM,
#define OPT_LOG_LEVEL "log-level"
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 846fd31..63e5296 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -851,9 +851,21 @@ unmap_unneeded_hugepages(struct hugepage_file *hugepg_tbl,
/* find a page that matches the criteria */
if ((hp->size == hpi[size].hugepage_sz) &&
(hp->socket_id == (int) socket)) {
+#ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
+ int nr_pg_left = hpi[size].num_pages[socket] - pages_found;

+ /*
+ * if contig_mem is enabled and the page doesn't have
+ * requested space, unmap it.
+ * Also, if we skipped enough pages, unmap the rest.
+ */
+ if ((pages_found == hpi[size].num_pages[socket]) ||
+ ((internal_config.contig_mem) &&
+ (hp->repeated < nr_pg_left))) {
+#else
/* if we skipped enough pages, unmap the rest */
if (pages_found == hpi[size].num_pages[socket]) {
+#endif
uint64_t unmap_len;

#ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
@@ -875,9 +887,6 @@ unmap_unneeded_hugepages(struct hugepage_file *hugepg_tbl,
#ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
/* else, check how much do we need to map */
else {
- int nr_pg_left =
- hpi[size].num_pages[socket] - pages_found;
-
/* if we need enough memory to fit into the segment */
if (hp->repeated <= nr_pg_left) {
pages_found += hp->repeated;
@@ -949,7 +958,9 @@ static int
calc_num_pages_per_socket(uint64_t * memory,
struct hugepage_info *hp_info,
struct hugepage_info *hp_used,
- unsigned num_hp_info)
+ unsigned num_hp_info,
+ struct hugepage_file *hugepg_tbl __rte_unused,
+ unsigned nr_hugefiles __rte_unused)
{
unsigned socket, j, i = 0;
unsigned requested, available;
@@ -960,6 +971,46 @@ calc_num_pages_per_socket(uint64_t * memory,
if (num_hp_info == 0)
return -1;

+#ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
+ /*
+ * If contiguous memory is required, but specific memory amounts
+ * per socket weren't requested
+ */
+ if ((internal_config.force_sockets == 0)
+ && (internal_config.contig_mem == 1)) {
+ size_t max_contig_memory_per_socket[RTE_MAX_NUMA_NODES];
+ size_t total_size, max_contig_memory = 0;
+
+ memset(max_contig_memory_per_socket, 0,
+ sizeof(max_contig_memory_per_socket));
+
+ /* Calculate maximum contiguous memory size */
+ for (i = 0; i < nr_hugefiles; i++) {
+ socket = hugepg_tbl[i].socket_id;
+
+ max_contig_memory_per_socket[socket] =
+ RTE_MAX(max_contig_memory_per_socket[socket],
+ (hugepg_tbl[i].size * hugepg_tbl[i].repeated));
+ max_contig_memory = RTE_MAX(max_contig_memory,
+ max_contig_memory_per_socket[socket]);
+ }
+
+ total_size = internal_config.memory;
+
+ /* If no enough contiguous memory */
+ if (max_contig_memory < total_mem) {
+ /* To display warning, set how much we can find */
+ total_mem -= max_contig_memory;
+ goto out;
+ }
+
+ /* Find suitable contiguous memory */
+ for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++) {
+ if (total_size <= max_contig_memory_per_socket[socket])
+ memory[socket] = total_size;
+ }
+ } else
+#endif
/* if specific memory amounts per socket weren't requested */
if (internal_config.force_sockets == 0) {
int cpu_per_socket[RTE_MAX_NUMA_NODES];
@@ -1009,6 +1060,18 @@ calc_num_pages_per_socket(uint64_t * memory,
for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_mem != 0; socket++) {
/* skips if the memory on specific socket wasn't requested */
for (i = 0; i < num_hp_info && memory[socket] != 0; i++){
+#ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
+ if (internal_config.contig_mem) {
+ size_t memory_size;
+
+ memory_size = hp_info[i].num_pages[socket] *
+ hp_info[i].hugepage_sz;
+ /* If memory size isn't enough, skip it */
+ if (memory[socket] > memory_size)
+ continue;
+ }
+#endif
+
hp_used[i].hugedir = hp_info[i].hugedir;
hp_used[i].num_pages[socket] = RTE_MIN(
memory[socket] / hp_info[i].hugepage_sz,
@@ -1064,6 +1127,9 @@ calc_num_pages_per_socket(uint64_t * memory,
}
}

+#ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
+out:
+#endif
/* if we didn't satisfy total memory requirements */
if (total_mem > 0) {
requested = (unsigned) (internal_config.memory / 0x100000);
@@ -1268,7 +1334,8 @@ rte_eal_hugepage_init(void)
/* calculate final number of pages */
nr_hugepages = calc_num_pages_per_socket(memory,
internal_config.hugepage_info, used_hp,
- internal_config.num_hugepage_sizes);
+ internal_config.num_hugepage_sizes,
+ tmp_hp, nr_hugefiles);

/* error if not enough memory available */
if (nr_hugepages < 0)
--
2.1.4
Tetsuya Mukawa
2015-12-16 08:37:29 UTC
The patch adds a new virtio-net PMD configuration that allows the PMD to
work on the host as if it were running in a VM.
Here is the new configuration for the virtio-net PMD.
- CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE
To use this mode, EAL needs physically contiguous memory. To allocate
such memory, enable the option below and add the "--contig-mem" option to
the application command line.
- CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS

To prepare a virtio-net device on the host, the user needs to invoke a QEMU
process in the special qtest mode. This mode is mainly used for testing QEMU
devices from an outside process. In this mode, no guest runs.
Here is the QEMU command line.

$ qemu-system-x86_64 \
-machine pc-i440fx-1.4,accel=qtest \
-display none -qtest-log /dev/null \
-qtest unix:/tmp/socket,server \
-netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1 \
-device virtio-net-pci,netdev=net0,mq=on \
-chardev socket,id=chr1,path=/tmp/ivshmem,server \
-device ivshmem,size=1G,chardev=chr1,vectors=1

* One QEMU process is needed per port.
* In most cases, just using the above command is enough.
* Vhost backends such as vhost-net and vhost-user can be specified.
* Only the "pc-i440fx-1.4" machine has been checked, but other machines
may work. It depends on whether the machine has a piix3 south bridge;
if it does not, the virtio-net PMD cannot receive status-change
interrupts.
* Do not add "--enable-kvm" to the QEMU command line.

After invoking QEMU, the PMD can connect to the QEMU process using unix
domain sockets. Over these sockets, the virtio-net, ivshmem and piix3
devices in QEMU are probed by the PMD.
Here is an example command line.

$ testpmd -c f -n 1 -m 1024 --contig-mem \
--vdev="eth_virtio_net0,qtest=/tmp/socket,ivshmem=/tmp/ivshmem" \
-- --disable-hw-vlan --txqflags=0xf00 -i

Please specify the same unix domain sockets and memory size in both the QEMU and
DPDK command lines, as above.
The shared memory size must be a power of 2, because ivshmem only accepts
such memory sizes.

Also, the "--contig-mem" option is needed for the PMD, as above. This option
allocates contiguous memory and creates one hugepage file on hugetlbfs.
If there is not enough contiguous memory, initialization will fail.

This contiguous memory is used as shared memory between the DPDK application
and the ivshmem device in QEMU.

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
config/common_linuxapp | 1 +
drivers/net/virtio/Makefile | 4 +
drivers/net/virtio/qtest.c | 1107 ++++++++++++++++++++++++++++++++++++
drivers/net/virtio/virtio_ethdev.c | 341 ++++++++++-
drivers/net/virtio/virtio_ethdev.h | 12 +
drivers/net/virtio/virtio_pci.h | 25 +
6 files changed, 1461 insertions(+), 29 deletions(-)
create mode 100644 drivers/net/virtio/qtest.c

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..eaa720c 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -269,6 +269,7 @@ CONFIG_RTE_LIBRTE_PMD_SZEDATA2=n
# Compile burst-oriented VIRTIO PMD driver
#
CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
+CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE=n
CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_INIT=n
CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 43835ba..697e629 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -52,6 +52,10 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c

+ifeq ($(CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE),y)
+ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += qtest.c
+endif
+
# this lib depends upon:
DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
diff --git a/drivers/net/virtio/qtest.c b/drivers/net/virtio/qtest.c
new file mode 100644
index 0000000..4ffdefb
--- /dev/null
+++ b/drivers/net/virtio/qtest.c
@@ -0,0 +1,1107 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 IGEL Co., Ltd. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co., Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/queue.h>
+#include <signal.h>
+#include <pthread.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include <rte_memory.h>
+#include <rte_malloc.h>
+#include <rte_common.h>
+#include <rte_interrupts.h>
+
+#include "virtio_pci.h"
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+
+#define NB_BUS 256
+#define NB_DEVICE 32
+#define NB_BAR 6
+
+/* PCI common configuration registers */
+#define REG_ADDR_VENDOR_ID 0x0
+#define REG_ADDR_DEVICE_ID 0x2
+#define REG_ADDR_COMMAND 0x4
+#define REG_ADDR_STATUS 0x6
+#define REG_ADDR_REVISION_ID 0x8
+#define REG_ADDR_CLASS_CODE 0x9
+#define REG_ADDR_CACHE_LINE_S 0xc
+#define REG_ADDR_LAT_TIMER 0xd
+#define REG_ADDR_HEADER_TYPE 0xe
+#define REG_ADDR_BIST 0xf
+#define REG_ADDR_BAR0 0x10
+#define REG_ADDR_BAR1 0x14
+#define REG_ADDR_BAR2 0x18
+#define REG_ADDR_BAR3 0x1c
+#define REG_ADDR_BAR4 0x20
+#define REG_ADDR_BAR5 0x24
+
+/* PCI common configuration register values */
+#define REG_VAL_COMMAND_IO 0x1
+#define REG_VAL_COMMAND_MEMORY 0x2
+#define REG_VAL_COMMAND_MASTER 0x4
+#define REG_VAL_HEADER_TYPE_ENDPOINT 0x0
+#define REG_VAL_BAR_MEMORY 0x0
+#define REG_VAL_BAR_IO 0x1
+#define REG_VAL_BAR_LOCATE_32 0x0
+#define REG_VAL_BAR_LOCATE_UNDER_1MB 0x2
+#define REG_VAL_BAR_LOCATE_64 0x4
+
+/* PIIX3 configuration registers */
+#define PIIX3_REG_ADDR_PIRQA 0x60
+#define PIIX3_REG_ADDR_PIRQB 0x61
+#define PIIX3_REG_ADDR_PIRQC 0x62
+#define PIIX3_REG_ADDR_PIRQD 0x63
+
+/* Device information */
+#define VIRTIO_NET_DEVICE_ID 0x1000
+#define VIRTIO_NET_VENDOR_ID 0x1af4
+#define VIRTIO_NET_IO_START 0xc000
+#define VIRTIO_NET_IRQ_NUM 10
+#define IVSHMEM_DEVICE_ID 0x1110
+#define IVSHMEM_VENDOR_ID 0x1af4
+#define IVSHMEM_MEMORY_START 0x1000
+#define IVSHMEM_PROTOCOL_VERSION 0
+#define PIIX3_DEVICE_ID 0x7000
+#define PIIX3_VENDOR_ID 0x8086
+
+#define PCI_CONFIG_ADDR(_bus, _device, _function, _offset) ( \
+ (1 << 31) | ((_bus) & 0xff) << 16 | ((_device) & 0x1f) << 11 | \
+ ((_function) & 0xf) << 8 | ((_offset) & 0xfc))
+
+static char interrupt_message[32];
+
+enum qtest_pci_bar_type {
+ QTEST_PCI_BAR_DISABLE = 0,
+ QTEST_PCI_BAR_IO,
+ QTEST_PCI_BAR_MEMORY_UNDER_1MB,
+ QTEST_PCI_BAR_MEMORY_32,
+ QTEST_PCI_BAR_MEMORY_64
+};
+
+struct qtest_pci_bar {
+ enum qtest_pci_bar_type type;
+ uint8_t addr;
+ uint64_t region_start;
+ uint64_t region_size;
+};
+
+struct qtest_session;
+TAILQ_HEAD(qtest_pci_device_list, qtest_pci_device);
+struct qtest_pci_device {
+ TAILQ_ENTRY(qtest_pci_device) next;
+ const char *name;
+ uint16_t device_id;
+ uint16_t vendor_id;
+ uint8_t bus_addr;
+ uint8_t device_addr;
+ struct qtest_pci_bar bar[NB_BAR];
+ int (*init)(struct qtest_session *s, struct qtest_pci_device *dev);
+};
+
+union qtest_pipefds {
+ struct {
+ int pipefd[2];
+ };
+ struct {
+ int readfd;
+ int writefd;
+ };
+};
+
+struct qtest_session {
+ int qtest_socket;
+ pthread_mutex_t qtest_session_lock;
+
+ struct qtest_pci_device_list head;
+ int ivshmem_socket;
+
+ pthread_t event_th;
+ union qtest_pipefds msgfds;
+
+ pthread_t intr_th;
+ union qtest_pipefds irqfds;
+ rte_atomic16_t enable_intr;
+ rte_intr_callback_fn cb;
+ void *cb_arg;
+};
+
+static int
+qtest_write(int fd, char *buf, size_t count)
+{
+ size_t len = count;
+ size_t total_len = 0;
+ int ret = 0;
+
+ while (len > 0) {
+ ret = write(fd, buf, len);
+ if (ret == (int)len)
+ break;
+ if (ret == -1) {
+ if (errno == EINTR)
+ continue;
+ return ret;
+ }
+ total_len += ret;
+ buf += ret;
+ len -= ret;
+ }
+ return total_len + ret;
+}
+
+static int
+qtest_read(int fd, char *buf, size_t count)
+{
+ size_t len = count;
+ size_t total_len = 0;
+ int ret = 0;
+
+ while (len > 0) {
+ ret = read(fd, buf, len);
+ if (ret == (int)len)
+ break;
+ if (*(buf + ret - 1) == '\n')
+ break;
+ if (ret == -1) {
+ if (errno == EINTR)
+ continue;
+ return ret;
+ }
+ total_len += ret;
+ buf += ret;
+ len -= ret;
+ }
+ return total_len + ret;
+}
+
+/*
+ * To know QTest protocol specification, see below QEMU source code.
+ * - qemu/qtest.c
+ */
+static uint32_t
+qtest_in(struct qtest_session *s, uint16_t addr, char type)
+{
+ char buf[1024];
+ int ret;
+
+ if ((type != 'l') && (type != 'w') && (type != 'b'))
+ rte_panic("Invalid value\n");
+
+ snprintf(buf, sizeof(buf), "in%c 0x%x\n", type, addr);
+ /* write to qtest socket */
+ ret = qtest_write(s->qtest_socket, buf, strlen(buf));
+ /* read reply from event handler */
+ ret = qtest_read(s->msgfds.readfd, buf, sizeof(buf));
+ buf[ret] = '\0';
+ return strtoul(buf + strlen("OK "), NULL, 16);
+}
+
+static void
+qtest_out(struct qtest_session *s, uint16_t addr, uint32_t val, char type)
+{
+ char buf[1024];
+ int ret __rte_unused;
+
+ if ((type != 'l') && (type != 'w') && (type != 'b'))
+ rte_panic("Invalid value\n");
+
+ snprintf(buf, sizeof(buf), "out%c 0x%x 0x%x\n", type, addr, val);
+ /* write to qtest socket */
+ ret = qtest_write(s->qtest_socket, buf, strlen(buf));
+ /* read reply from event handler */
+ ret = qtest_read(s->msgfds.readfd, buf, sizeof(buf));
+}
+
+/*
+ * qtest_pci_read/write are based on PCI configuration space specification.
+ * Accroding to the spec, access size of read()/write() should be 4 bytes.
+ */
+static int
+qtest_pci_readb(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset)
+{
+ uint32_t tmp;
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ tmp = qtest_in(s, 0xcfc, 'l');
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+
+ return (tmp >> ((offset & 0x3) * 8)) & 0xff;
+}
+
+static void
+qtest_pci_writeb(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset, uint8_t value)
+{
+ uint32_t addr, tmp, pos;
+
+ addr = PCI_CONFIG_ADDR(bus, device, function, offset);
+ pos = (offset % 4) * 8;
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, addr, 'l');
+ tmp = qtest_in(s, 0xcfc, 'l');
+ tmp = (tmp & ~(0xff << pos)) | (value << pos);
+
+ qtest_out(s, 0xcf8, addr, 'l');
+ qtest_out(s, 0xcfc, tmp, 'l');
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+}
+
+static uint32_t
+qtest_pci_readl(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset)
+{
+ uint32_t tmp;
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ tmp = qtest_in(s, 0xcfc, 'l');
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+
+ return tmp;
+}
+
+static void
+qtest_pci_writel(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset, uint32_t value)
+{
+ uint32_t tmp;
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ qtest_out(s, 0xcfc, value, 'l');
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+}
+
+static uint64_t
+qtest_pci_readq(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset)
+{
+ uint32_t tmp;
+ uint64_t val;
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ val = (uint64_t)qtest_in(s, 0xcfc, 'l');
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset + 4);
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ val |= (uint64_t)qtest_in(s, 0xcfc, 'l') << 32;
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+
+ return val;
+}
+
+static void
+qtest_pci_writeq(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset, uint64_t value)
+{
+ uint32_t tmp;
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ qtest_out(s, 0xcfc, (uint32_t)(value & 0xffffffff), 'l');
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset + 4);
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ qtest_out(s, 0xcfc, (uint32_t)(value >> 32), 'l');
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+}
+
+/*
+ * virtio_ioport_read/write are Used by virtio-net PMD
+ */
+void
+virtio_ioport_write(struct virtio_hw *hw, uint64_t addr, uint64_t val, char type)
+{
+ struct qtest_session *s = (struct qtest_session *)hw->qsession;
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, VIRTIO_NET_IO_START + (uint16_t)addr, val, type);
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+}
+
+uint32_t
+virtio_ioport_read(struct virtio_hw *hw, uint64_t addr, char type)
+{
+ struct qtest_session *s = (struct qtest_session *)hw->qsession;
+ uint32_t val;
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ val = qtest_in(s, VIRTIO_NET_IO_START + (uint16_t)addr, type);
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ return val;
+}
+
+int
+qtest_intr_enable(void *data)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+
+ s = (struct qtest_session *)hw->qsession;
+ rte_atomic16_set(&s->enable_intr, 1);
+
+ return 0;
+}
+
+int
+qtest_intr_disable(void *data)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+
+ s = (struct qtest_session *)hw->qsession;
+ rte_atomic16_set(&s->enable_intr, 0);
+
+ return 0;
+}
+
+void
+qtest_intr_callback_register(void *data,
+ rte_intr_callback_fn cb, void *cb_arg)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+
+ s = (struct qtest_session *)hw->qsession;
+ s->cb = cb;
+ s->cb_arg = cb_arg;
+ rte_atomic16_set(&s->enable_intr, 1);
+}
+
+void
+qtest_intr_callback_unregister(void *data,
+ rte_intr_callback_fn cb __rte_unused,
+ void *cb_arg __rte_unused)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+
+ s = (struct qtest_session *)hw->qsession;
+ rte_atomic16_set(&s->enable_intr, 0);
+ s->cb = NULL;
+ s->cb_arg = NULL;
+}
+
+static void *
+qtest_intr_handler(void *data) {
+ struct qtest_session *s = (struct qtest_session *)data;
+ char buf[1];
+ int ret;
+
+ for (;;) {
+ ret = qtest_read(s->irqfds.readfd, buf, sizeof(buf));
+ if (ret < 0)
+ return NULL;
+ s->cb(NULL, s->cb_arg);
+ }
+ return NULL;
+}
+
+static int
+qtest_intr_initialize(void *data)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+ char buf[1024];
+ int ret;
+
+ s = (struct qtest_session *)hw->qsession;
+
+ /* This message will come when interrupt occurs */
+ snprintf(interrupt_message, sizeof(interrupt_message),
+ "IRQ raise %d", VIRTIO_NET_IRQ_NUM);
+
+ snprintf(buf, sizeof(buf), "irq_intercept_in ioapic\n");
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ /* To enable interrupt, send "irq_intercept_in" message to QEMU */
+ ret = qtest_write(s->qtest_socket, buf, strlen(buf));
+ if (ret < 0) {
+ pthread_mutex_unlock(&s->qtest_session_lock);
+ return -1;
+ }
+
+ /* just ignore QEMU response */
+ ret = qtest_read(s->msgfds.readfd, buf, sizeof(buf));
+ if (ret < 0) {
+ pthread_mutex_unlock(&s->qtest_session_lock);
+ return -1;
+ }
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ return 0;
+}
+
+static void
+qtest_handle_one_message(struct qtest_session *s, char *buf)
+{
+ int ret;
+
+ if (strncmp(buf, interrupt_message, strlen(interrupt_message)) == 0) {
+ if (rte_atomic16_read(&s->enable_intr) == 0)
+ return;
+
+ /* relay interrupt to pipe */
+ ret = write(s->irqfds.writefd, "1", 1);
+ if (ret < 0)
+ rte_panic("cannot relay interrupt\n");
+ } else {
+ /* relay normal message to pipe */
+ ret = qtest_write(s->msgfds.writefd, buf, strlen(buf));
+ if (ret < 0)
+ rte_panic("cannot relay normal message\n");
+ }
+}
+
+static char *
+qtest_get_next_message(char *p)
+{
+ p = strchr(p, '\n');
+ if ((p == NULL) || (*(p + 1) == '\0'))
+ return NULL;
+ return p + 1;
+}
+
+static void
+qtest_close_one_socket(int *fd)
+{
+ if (*fd > 0) {
+ close(*fd);
+ *fd = -1;
+ }
+}
+
+static void
+qtest_close_sockets(struct qtest_session *s)
+{
+ qtest_close_one_socket(&s->qtest_socket);
+ qtest_close_one_socket(&s->msgfds.readfd);
+ qtest_close_one_socket(&s->msgfds.writefd);
+ qtest_close_one_socket(&s->irqfds.readfd);
+ qtest_close_one_socket(&s->irqfds.writefd);
+ qtest_close_one_socket(&s->ivshmem_socket);
+}
+
+/*
+ * This thread relays QTest response using pipe.
+ * The function is needed because we need to separate IRQ message from others.
+ */
+static void *
+qtest_event_handler(void *data) {
+ struct qtest_session *s = (struct qtest_session *)data;
+ char buf[1024];
+ char *p;
+ int ret;
+
+ for (;;) {
+ memset(buf, 0, sizeof(buf));
+ ret = qtest_read(s->qtest_socket, buf, sizeof(buf));
+ if (ret < 0) {
+ qtest_close_sockets(s);
+ return NULL;
+ }
+
+ /* may receive multiple messages at the same time */
+ p = buf;
+ do {
+ qtest_handle_one_message(s, p);
+ } while ((p = qtest_get_next_message(p)) != NULL);
+ }
+ return NULL;
+}
+
+static int
+qtest_init_piix3_device(struct qtest_session *s, struct qtest_pci_device *dev)
+{
+ uint8_t bus, device, virtio_net_slot = 0;
+ struct qtest_pci_device *tmpdev;
+ uint8_t pcislot2regaddr[] = { 0xff,
+ 0xff,
+ 0xff,
+ PIIX3_REG_ADDR_PIRQC,
+ PIIX3_REG_ADDR_PIRQD,
+ PIIX3_REG_ADDR_PIRQA,
+ PIIX3_REG_ADDR_PIRQB};
+
+ bus = dev->bus_addr;
+ device = dev->device_addr;
+
+ PMD_DRV_LOG(INFO,
+ "Find %s on virtual PCI bus: %04x:%02x:00.0\n",
+ dev->name, bus, device);
+
+ /* Get slot id that is connected to virtio-net */
+ TAILQ_FOREACH(tmpdev, &s->head, next) {
+ if (strcmp(tmpdev->name, "virtio-net") == 0) {
+ virtio_net_slot = tmpdev->device_addr;
+ break;
+ }
+ }
+
+ if (virtio_net_slot == 0)
+ return -1;
+
+ /*
+ * Set interrupt routing for virtio-net device.
+ * Here is i440fx/piix3 connection settings
+ * ---------------------------------------
+ * PCI Slot3 -> PIRQC
+ * PCI Slot4 -> PIRQD
+ * PCI Slot5 -> PIRQA
+ * PCI Slot6 -> PIRQB
+ */
+ if (pcislot2regaddr[virtio_net_slot] != 0xff) {
+ qtest_pci_writeb(s, bus, device, 0,
+ pcislot2regaddr[virtio_net_slot],
+ VIRTIO_NET_IRQ_NUM);
+ }
+
+ return 0;
+}
+
+/*
+ * Common initialization of PCI device.
+ * To know detail, see pci specification.
+ */
+static int
+qtest_init_pci_device(struct qtest_session *s, struct qtest_pci_device *dev)
+{
+ uint8_t i, bus, device;
+ uint32_t val;
+ uint64_t val64;
+
+ bus = dev->bus_addr;
+ device = dev->device_addr;
+
+ PMD_DRV_LOG(INFO,
+ "Find %s on virtual PCI bus: %04x:%02x:00.0\n",
+ dev->name, bus, device);
+
+ /* Check header type */
+ val = qtest_pci_readb(s, bus, device, 0, REG_ADDR_HEADER_TYPE);
+ if (val != REG_VAL_HEADER_TYPE_ENDPOINT) {
+ PMD_DRV_LOG(ERR, "Unexpected header type %d\n", val);
+ return -1;
+ }
+
+ /* Check BAR type */
+ for (i = 0; i < NB_BAR; i++) {
+ val = qtest_pci_readl(s, bus, device, 0, dev->bar[i].addr);
+
+ switch (dev->bar[i].type) {
+ case QTEST_PCI_BAR_IO:
+ if ((val & 0x1) != REG_VAL_BAR_IO)
+ goto error;
+ break;
+ case QTEST_PCI_BAR_MEMORY_UNDER_1MB:
+ if ((val & 0x1) != REG_VAL_BAR_MEMORY)
+ goto error;
+ if ((val & 0x6) != REG_VAL_BAR_LOCATE_UNDER_1MB)
+ goto error;
+ break;
+ case QTEST_PCI_BAR_MEMORY_32:
+ if ((val & 0x1) != REG_VAL_BAR_MEMORY)
+ goto error;
+ if ((val & 0x6) != REG_VAL_BAR_LOCATE_32)
+ goto error;
+ break;
+ case QTEST_PCI_BAR_MEMORY_64:
+ if ((val & 0x1) != REG_VAL_BAR_MEMORY)
+ goto error;
+ if ((val & 0x6) != REG_VAL_BAR_LOCATE_64)
+ goto error;
+ break;
+ case QTEST_PCI_BAR_DISABLE:
+ break;
+ }
+ }
+
+ /* Enable device */
+ val = qtest_pci_readl(s, bus, device, 0, REG_ADDR_COMMAND);
+ val |= REG_VAL_COMMAND_IO | REG_VAL_COMMAND_MEMORY | REG_VAL_COMMAND_MASTER;
+ qtest_pci_writel(s, bus, device, 0, REG_ADDR_COMMAND, val);
+
+ /* Calculate BAR size */
+ for (i = 0; i < NB_BAR; i++) {
+ switch (dev->bar[i].type) {
+ case QTEST_PCI_BAR_IO:
+ case QTEST_PCI_BAR_MEMORY_UNDER_1MB:
+ case QTEST_PCI_BAR_MEMORY_32:
+ qtest_pci_writel(s, bus, device, 0,
+ dev->bar[i].addr, 0xffffffff);
+ val = qtest_pci_readl(s, bus, device,
+ 0, dev->bar[i].addr);
+ dev->bar[i].region_size = ~(val & 0xfffffff0) + 1;
+ break;
+ case QTEST_PCI_BAR_MEMORY_64:
+ qtest_pci_writeq(s, bus, device, 0,
+ dev->bar[i].addr, 0xffffffffffffffff);
+ val64 = qtest_pci_readq(s, bus, device,
+ 0, dev->bar[i].addr);
+ dev->bar[i].region_size =
+ ~(val64 & 0xfffffffffffffff0) + 1;
+ break;
+ case QTEST_PCI_BAR_DISABLE:
+ break;
+ }
+ }
+
+ /* Set BAR region */
+ for (i = 0; i < NB_BAR; i++) {
+ switch (dev->bar[i].type) {
+ case QTEST_PCI_BAR_IO:
+ case QTEST_PCI_BAR_MEMORY_UNDER_1MB:
+ case QTEST_PCI_BAR_MEMORY_32:
+ qtest_pci_writel(s, bus, device, 0, dev->bar[i].addr,
+ dev->bar[i].region_start);
+ PMD_DRV_LOG(INFO, "Set BAR of %s device: 0x%lx - 0x%lx\n",
+ dev->name, dev->bar[i].region_start,
+ dev->bar[i].region_start + dev->bar[i].region_size);
+ break;
+ case QTEST_PCI_BAR_MEMORY_64:
+ qtest_pci_writeq(s, bus, device, 0, dev->bar[i].addr,
+ dev->bar[i].region_start);
+ PMD_DRV_LOG(INFO, "Set BAR of %s device: 0x%lx - 0x%lx\n",
+ dev->name, dev->bar[i].region_start,
+ dev->bar[i].region_start + dev->bar[i].region_size);
+ break;
+ case QTEST_PCI_BAR_DISABLE:
+ break;
+ }
+ }
+
+ return 0;
+
+error:
+ PMD_DRV_LOG(ERR, "Unexpected BAR type\n");
+ return -1;
+}
+
+static void
+qtest_find_pci_device(struct qtest_session *s, uint16_t bus, uint8_t device)
+{
+ struct qtest_pci_device *dev;
+ uint32_t val;
+
+ val = qtest_pci_readl(s, bus, device, 0, 0);
+ TAILQ_FOREACH(dev, &s->head, next) {
+ if (val == ((uint32_t)dev->device_id << 16 | dev->vendor_id)) {
+ /* device is found, then store it */
+ dev->bus_addr = bus;
+ dev->device_addr = device;
+ return;
+ }
+ }
+}
+
+static int
+qtest_init_pci_devices(struct qtest_session *s)
+{
+ struct qtest_pci_device *dev;
+ uint16_t bus;
+ uint8_t device;
+ int ret;
+
+ /* Find devices */
+ bus = 0;
+ do {
+ device = 0;
+ do {
+ qtest_find_pci_device(s, bus, device);
+ } while (device++ != NB_DEVICE - 1);
+ } while (bus++ != NB_BUS - 1);
+
+ /* Initialize devices */
+ TAILQ_FOREACH(dev, &s->head, next) {
+ ret = dev->init(s, dev);
+ if (ret != 0)
+ return ret;
+ }
+
+ return 0;
+}
+
+struct rte_pci_id
+qtest_get_pci_id_of_virtio_net(void)
+{
+ struct rte_pci_id id = {VIRTIO_NET_DEVICE_ID,
+ VIRTIO_NET_VENDOR_ID, PCI_ANY_ID, PCI_ANY_ID};
+
+ return id;
+}
+
+static int
+qtest_register_target_devices(struct qtest_session *s)
+{
+ struct qtest_pci_device *virtio_net, *ivshmem, *piix3;
+ const struct rte_memseg *ms;
+
+ ms = rte_eal_get_physmem_layout();
+ /* if EAL memory size isn't pow of 2, ivshmem refuse it */
+ if ((ms[0].len & (ms[0].len - 1)) != 0) {
+ PMD_DRV_LOG(ERR, "memory size must be power of 2\n");
+ return -1;
+ }
+
+ virtio_net = malloc(sizeof(*virtio_net));
+ if (virtio_net == NULL)
+ return -1;
+
+ ivshmem = malloc(sizeof(*ivshmem));
+ if (ivshmem == NULL)
+ return -1;
+
+ piix3 = malloc(sizeof(*piix3));
+ if (piix3 == NULL)
+ return -1;
+
+ memset(virtio_net, 0, sizeof(*virtio_net));
+ memset(ivshmem, 0, sizeof(*ivshmem));
+
+ TAILQ_INIT(&s->head);
+
+ virtio_net->name = "virtio-net";
+ virtio_net->device_id = VIRTIO_NET_DEVICE_ID;
+ virtio_net->vendor_id = VIRTIO_NET_VENDOR_ID;
+ virtio_net->init = qtest_init_pci_device;
+ virtio_net->bar[0].addr = REG_ADDR_BAR0;
+ virtio_net->bar[0].type = QTEST_PCI_BAR_IO;
+ virtio_net->bar[0].region_start = VIRTIO_NET_IO_START;
+ TAILQ_INSERT_TAIL(&s->head, virtio_net, next);
+
+ ivshmem->name = "ivshmem";
+ ivshmem->device_id = IVSHMEM_DEVICE_ID;
+ ivshmem->vendor_id = IVSHMEM_VENDOR_ID;
+ ivshmem->init = qtest_init_pci_device;
+ ivshmem->bar[0].addr = REG_ADDR_BAR0;
+ ivshmem->bar[0].type = QTEST_PCI_BAR_MEMORY_32;
+ ivshmem->bar[0].region_start = IVSHMEM_MEMORY_START;
+ ivshmem->bar[1].addr = REG_ADDR_BAR2;
+ ivshmem->bar[1].type = QTEST_PCI_BAR_MEMORY_64;
+ /* In host mode, only one memory segment is vaild */
+ ivshmem->bar[1].region_start = ms[0].phys_addr;
+ TAILQ_INSERT_TAIL(&s->head, ivshmem, next);
+
+ /* piix3 is needed to route irqs from virtio-net to ioapic */
+ piix3->name = "piix3";
+ piix3->device_id = PIIX3_DEVICE_ID;
+ piix3->vendor_id = PIIX3_VENDOR_ID;
+ piix3->init = qtest_init_piix3_device;
+ TAILQ_INSERT_TAIL(&s->head, piix3, next);
+
+ return 0;
+}
+
+static int
+qtest_send_message_to_ivshmem(int sock_fd, uint64_t client_id, int shm_fd)
+{
+ struct iovec iov;
+ struct msghdr msgh;
+ size_t fdsize = sizeof(int);
+ char control[CMSG_SPACE(fdsize)];
+ struct cmsghdr *cmsg;
+ int ret;
+
+ memset(&msgh, 0, sizeof(msgh));
+ iov.iov_base = &client_id;
+ iov.iov_len = sizeof(client_id);
+
+ msgh.msg_iov = &iov;
+ msgh.msg_iovlen = 1;
+
+ if (shm_fd >= 0) {
+ msgh.msg_control = &control;
+ msgh.msg_controllen = sizeof(control);
+ cmsg = CMSG_FIRSTHDR(&msgh);
+ cmsg->cmsg_len = CMSG_LEN(fdsize);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ memcpy(CMSG_DATA(cmsg), &shm_fd, fdsize);
+ }
+
+ do {
+ ret = sendmsg(sock_fd, &msgh, 0);
+ } while (ret < 0 && errno == EINTR);
+
+ if (ret < 0) {
+ PMD_DRV_LOG(ERR, "sendmsg error\n");
+ return ret;
+ }
+
+ return ret;
+}
+
+static int
+qtest_open_shared_memory(void)
+{
+ const struct rte_memseg *ms;
+ int shm_fd = -1;
+ uint64_t vaddr;
+ char buf[1024];
+ char *p;
+ FILE *f;
+
+ ms = rte_eal_get_physmem_layout();
+ f = fopen("/proc/self/maps", "r");
+ if (f == NULL)
+ return -1;
+
+ /* parse maps */
+ while (fgets(buf, sizeof(buf), f) != NULL) {
+ /* get vaddr */
+ vaddr = strtoul(buf, NULL, 16);
+
+ /* check if this region is EAL memory */
+ if (vaddr == ms[0].addr_64) {
+ p = strchr(buf, '/');
+ if (p == NULL)
+ return -1;
+ buf[strlen(buf) - 1] = '\0';
+ shm_fd = open(p, O_RDWR);
+ break;
+ }
+ }
+ fclose(f);
+
+ return shm_fd;
+}
+
+static int
+qtest_setup_shared_memory(struct qtest_session *s)
+{
+ int shm_fd, ret;
+
+ /* To share DPDK EAL memory, open EAL memory again */
+ shm_fd = qtest_open_shared_memory();
+ if (shm_fd < 0) {
+ PMD_DRV_LOG(ERR,
+ "Failed to open EAL memory\n");
+ return -1;
+ }
+
+ /* send our protocol version first */
+ ret = qtest_send_message_to_ivshmem(s->ivshmem_socket,
+ IVSHMEM_PROTOCOL_VERSION, -1);
+ if (ret < 0) {
+ PMD_DRV_LOG(ERR,
+ "Failed to send protocol version to ivshmem\n");
+ return -1;
+ }
+
+ /* send client id */
+ ret = qtest_send_message_to_ivshmem(s->ivshmem_socket, 0, -1);
+ if (ret < 0) {
+ PMD_DRV_LOG(ERR, "Failed to send VMID to ivshmem\n");
+ return -1;
+ }
+
+ /* send message to ivshmem */
+ ret = qtest_send_message_to_ivshmem(s->ivshmem_socket, -1, shm_fd);
+ if (ret < 0) {
+ PMD_DRV_LOG(ERR, "Failed to file descriptor to ivshmem\n");
+ return -1;
+ }
+
+ /* close EAL memory again */
+ close(shm_fd);
+
+ return 0;
+}
+
+int
+qtest_vdev_init(struct rte_eth_dev_data *data,
+ int qtest_socket, int ivshmem_socket)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+ int ret;
+
+ s = rte_zmalloc(NULL, sizeof(*s), RTE_CACHE_LINE_SIZE);
+
+ ret = pipe(s->msgfds.pipefd);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize message pipe\n");
+ return -1;
+ }
+
+ ret = pipe(s->irqfds.pipefd);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize irq pipe\n");
+ return -1;
+ }
+
+ ret = qtest_register_target_devices(s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize qtest session\n");
+ return -1;
+ }
+
+ ret = pthread_mutex_init(&s->qtest_session_lock, NULL);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize mutex\n");
+ return -1;
+ }
+
+ rte_atomic16_set(&s->enable_intr, 0);
+ s->qtest_socket = qtest_socket;
+ s->ivshmem_socket = ivshmem_socket;
+ hw->qsession = (void *)s;
+
+ ret = pthread_create(&s->event_th, NULL, qtest_event_handler, s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to create event handler\n");
+ return -1;
+ }
+
+ ret = pthread_create(&s->intr_th, NULL, qtest_intr_handler, s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to create interrupt handler\n");
+ return -1;
+ }
+
+ ret = qtest_intr_initialize(data);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize interrupt\n");
+ return -1;
+ }
+
+ ret = qtest_setup_shared_memory(s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to setup shared memory\n");
+ return -1;
+ }
+
+ ret = qtest_init_pci_devices(s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize devices\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static void
+qtest_remove_target_devices(struct qtest_session *s)
+{
+ struct qtest_pci_device *dev, *next;
+
+ for (dev = TAILQ_FIRST(&s->head); dev != NULL; dev = next) {
+ next = TAILQ_NEXT(dev, next);
+ TAILQ_REMOVE(&s->head, dev, next);
+ free(dev);
+ }
+}
+
+void
+qtest_vdev_uninit(struct rte_eth_dev_data *data)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+
+ s = (struct qtest_session *)hw->qsession;
+
+ qtest_close_sockets(s);
+
+ pthread_cancel(s->event_th);
+ pthread_join(s->event_th, NULL);
+
+ pthread_cancel(s->intr_th);
+ pthread_join(s->intr_th, NULL);
+
+ pthread_mutex_destroy(&s->qtest_session_lock);
+
+ qtest_remove_target_devices(s);
+
+ rte_free(s);
+}
diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index d928339..234b561 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -36,6 +36,11 @@
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#endif
#ifdef RTE_EXEC_ENV_LINUXAPP
#include <dirent.h>
#include <fcntl.h>
@@ -56,6 +61,10 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_dev.h>
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+#include <rte_eal_memconfig.h>
+#include <rte_kvargs.h>
+#endif

#include "virtio_ethdev.h"
#include "virtio_pci.h"
@@ -491,8 +500,12 @@ virtio_dev_close(struct rte_eth_dev *dev)
PMD_INIT_LOG(DEBUG, "virtio_dev_close");

/* reset the NIC */
- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ if (((dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) ||
+ ((dev->dev_type == RTE_ETH_DEV_VIRTUAL) &&
+ (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))) {
vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
+ }
vtpci_reset(hw);
hw->started = 0;
virtio_dev_free_mbufs(dev);
@@ -1233,15 +1246,22 @@ virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
isr = vtpci_isr(hw);
PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);

- if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
- PMD_DRV_LOG(ERR, "interrupt enable failed");
+ if (dev->dev_type == RTE_ETH_DEV_PCI) {
+ if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
+ PMD_DRV_LOG(ERR, "interrupt enable failed");
+ }
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (dev->dev_type == RTE_ETH_DEV_VIRTUAL) {
+ if (qtest_intr_enable(dev->data) < 0)
+ PMD_DRV_LOG(ERR, "interrupt enable failed");
+ }
+#endif

if (isr & VIRTIO_PCI_ISR_CONFIG) {
if (virtio_dev_link_update(dev, 0) == 0)
_rte_eth_dev_callback_process(dev,
RTE_ETH_EVENT_INTR_LSC);
}
-
}

static void
@@ -1264,7 +1284,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
struct virtio_hw *hw = eth_dev->data->dev_private;
struct virtio_net_config *config;
struct virtio_net_config local_config;
- struct rte_pci_device *pci_dev;
+ struct rte_pci_device *pci_dev = eth_dev->pci_dev;
+ struct rte_pci_id id;

RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));

@@ -1285,13 +1306,20 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
return -ENOMEM;
}

- pci_dev = eth_dev->pci_dev;
+ if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
+ if (virtio_resource_init(pci_dev) < 0)
+ return -1;

- if (virtio_resource_init(pci_dev) < 0)
- return -1;
-
- hw->use_msix = virtio_has_msix(&pci_dev->addr);
- hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
+ hw->use_msix = virtio_has_msix(&pci_dev->addr);
+ hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
+ }
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (eth_dev->dev_type == RTE_ETH_DEV_VIRTUAL) {
+ hw->use_msix = 0;
+ hw->io_base = 0;
+ eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
+ }
+#endif

/* Reset the device although not necessary at startup */
vtpci_reset(hw);
@@ -1304,8 +1332,14 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
virtio_negotiate_features(hw);

/* If host does not support status then disable LSC */
- if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
- pci_dev->driver->drv_flags &= ~RTE_PCI_DRV_INTR_LSC;
+ if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
+ if (eth_dev->dev_type == RTE_ETH_DEV_PCI)
+ pci_dev->driver->drv_flags &= ~RTE_PCI_DRV_INTR_LSC;
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (eth_dev->dev_type == RTE_ETH_DEV_VIRTUAL)
+ eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
+#endif
+ }

rte_eth_copy_pci_info(eth_dev, pci_dev);

@@ -1383,14 +1417,30 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d hw->max_tx_queues=%d",
hw->max_rx_queues, hw->max_tx_queues);
+
+ memset(&id, 0, sizeof(id));
+ if (eth_dev->dev_type == RTE_ETH_DEV_PCI)
+ id = pci_dev->id;
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (eth_dev->dev_type == RTE_ETH_DEV_VIRTUAL)
+ id = qtest_get_pci_id_of_virtio_net();
+#endif
+
PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
- eth_dev->data->port_id, pci_dev->id.vendor_id,
- pci_dev->id.device_id);
+ eth_dev->data->port_id,
+ id.vendor_id, id.device_id);

/* Setup interrupt callback */
- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ if ((eth_dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC))
rte_intr_callback_register(&pci_dev->intr_handle,
- virtio_interrupt_handler, eth_dev);
+ virtio_interrupt_handler, eth_dev);
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if ((eth_dev->dev_type == RTE_ETH_DEV_VIRTUAL) &&
+ (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))
+ qtest_intr_callback_register(eth_dev->data,
+ virtio_interrupt_handler, eth_dev);
+#endif

virtio_dev_cq_start(eth_dev);

@@ -1424,10 +1474,17 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
eth_dev->data->mac_addrs = NULL;

/* reset interrupt callback */
- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ if ((eth_dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC))
rte_intr_callback_unregister(&pci_dev->intr_handle,
virtio_interrupt_handler,
eth_dev);
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if ((eth_dev->dev_type == RTE_ETH_DEV_VIRTUAL) &&
+ (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))
+ qtest_intr_callback_unregister(eth_dev->data,
+ virtio_interrupt_handler, eth_dev);
+#endif

PMD_INIT_LOG(DEBUG, "dev_uninit completed");

@@ -1491,11 +1548,15 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return -ENOTSUP;
}

- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ if (((dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) ||
+ ((dev->dev_type == RTE_ETH_DEV_VIRTUAL) &&
+ (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))) {
if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
PMD_DRV_LOG(ERR, "failed to set config vector");
return -EBUSY;
}
+ }

return 0;
}
@@ -1510,15 +1571,31 @@ virtio_dev_start(struct rte_eth_dev *dev)

/* check if lsc interrupt feature is enabled */
if (dev->data->dev_conf.intr_conf.lsc) {
- if (!(pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
- PMD_DRV_LOG(ERR, "link status not supported by host");
- return -ENOTSUP;
- }
+ if (dev->dev_type == RTE_ETH_DEV_PCI) {
+ if (!(pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
+ PMD_DRV_LOG(ERR,
+ "link status not supported by host");
+ return -ENOTSUP;
+ }

- if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) {
- PMD_DRV_LOG(ERR, "interrupt enable failed");
- return -EIO;
+ if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) {
+ PMD_DRV_LOG(ERR, "interrupt enable failed");
+ return -EIO;
+ }
}
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (dev->dev_type == RTE_ETH_DEV_VIRTUAL) {
+ if (!(dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)) {
+ PMD_DRV_LOG(ERR,
+ "link status not supported by host");
+ return -ENOTSUP;
+ }
+ if (qtest_intr_enable(dev->data) < 0) {
+ PMD_DRV_LOG(ERR, "interrupt enable failed");
+ return -EIO;
+ }
+ }
+#endif
}

/* Initialize Link state */
@@ -1615,8 +1692,15 @@ virtio_dev_stop(struct rte_eth_dev *dev)

PMD_INIT_LOG(DEBUG, "stop");

- if (dev->data->dev_conf.intr_conf.lsc)
- rte_intr_disable(&dev->pci_dev->intr_handle);
+ if (dev->data->dev_conf.intr_conf.lsc) {
+ if (dev->dev_type == RTE_ETH_DEV_PCI)
+ rte_intr_disable(&dev->pci_dev->intr_handle);
+
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ if (dev->dev_type == RTE_ETH_DEV_VIRTUAL)
+ qtest_intr_disable(dev->data);
+#endif
+ }

memset(&link, 0, sizeof(link));
virtio_dev_atomic_write_link_status(dev, &link);
@@ -1661,7 +1745,13 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
{
struct virtio_hw *hw = dev->data->dev_private;

- dev_info->driver_name = dev->driver->pci_drv.name;
+ if (dev->dev_type == RTE_ETH_DEV_PCI)
+ dev_info->driver_name = dev->driver->pci_drv.name;
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (dev->dev_type == RTE_ETH_DEV_VIRTUAL)
+ dev_info->driver_name = dev->data->drv_name;
+#endif
+
dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
@@ -1689,3 +1779,196 @@ static struct rte_driver rte_virtio_driver = {
};

PMD_REGISTER_DRIVER(rte_virtio_driver);
+
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+
+#define ETH_VIRTIO_NET_ARG_QTEST_PATH "qtest"
+#define ETH_VIRTIO_NET_ARG_IVSHMEM_PATH "ivshmem"
+
+static const char *valid_args[] = {
+ ETH_VIRTIO_NET_ARG_QTEST_PATH,
+ ETH_VIRTIO_NET_ARG_IVSHMEM_PATH,
+ NULL
+};
+
+static int
+get_string_arg(const char *key __rte_unused,
+ const char *value, void *extra_args)
+{
+ int ret, fd, loop = 3;
+ int *pfd = extra_args;
+ struct sockaddr_un sa = {0};
+
+ if ((value == NULL) || (extra_args == NULL))
+ return -EINVAL;
+
+ fd = socket(AF_UNIX, SOCK_STREAM, 0);
+ if (fd < 0)
+ return -1;
+
+ sa.sun_family = AF_UNIX;
+ strncpy(sa.sun_path, value, sizeof(sa.sun_path));
+
+ while (loop--) {
+ /*
+ * may need to wait until the qtest and ivshmem
+ * sockets have been prepared by QEMU.
+ */
+ ret = connect(fd, (struct sockaddr *)&sa,
+ sizeof(struct sockaddr_un));
+ if (ret != 0)
+ sleep(1);
+ else
+ break;
+ }
+
+ if (ret != 0) {
+ close(fd);
+ return -1;
+ }
+
+ *pfd = fd;
+
+ return 0;
+}
+
+static struct rte_eth_dev *
+virtio_net_eth_dev_alloc(const char *name)
+{
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct virtio_hw *hw;
+
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ rte_panic("cannot alloc rte_eth_dev\n");
+
+ data = eth_dev->data;
+
+ hw = rte_zmalloc(NULL, sizeof(*hw), 0);
+ if (!hw)
+ rte_panic("malloc virtio_hw failed\n");
+
+ data->dev_private = hw;
+ eth_dev->driver = &rte_virtio_pmd;
+ return eth_dev;
+}
+
+/*
+ * Initialization when "CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE" is enabled.
+ */
+static int
+rte_virtio_net_pmd_init(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ int ret, qtest_sock, ivshmem_sock;
+ struct rte_mem_config *mcfg;
+
+ if (params == NULL || params[0] == '\0')
+ goto error;
+
+ /* get pointer to global configuration */
+ mcfg = rte_eal_get_configuration()->mem_config;
+
+ /* Check if EAL memory consists of one memory segment */
+ if ((RTE_MAX_MEMSEG > 1) && (mcfg->memseg[1].addr != NULL)) {
+ PMD_INIT_LOG(ERR, "Non contigious memory");
+ goto error;
+ }
+
+ kvlist = rte_kvargs_parse(params, valid_args);
+ if (!kvlist) {
+ PMD_INIT_LOG(ERR, "error when parsing param");
+ goto error;
+ }
+
+ if (rte_kvargs_count(kvlist, ETH_VIRTIO_NET_ARG_IVSHMEM_PATH) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VIRTIO_NET_ARG_IVSHMEM_PATH,
+ &get_string_arg, &ivshmem_sock);
+ if (ret != 0) {
+ PMD_INIT_LOG(ERR,
+ "Failed to connect to ivshmem socket");
+ goto error;
+ }
+ } else {
+ PMD_INIT_LOG(ERR, "No argument specified for %s",
+ ETH_VIRTIO_NET_ARG_IVSHMEM_PATH);
+ goto error;
+ }
+
+ if (rte_kvargs_count(kvlist, ETH_VIRTIO_NET_ARG_QTEST_PATH) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VIRTIO_NET_ARG_QTEST_PATH,
+ &get_string_arg, &qtest_sock);
+ if (ret != 0) {
+ PMD_INIT_LOG(ERR,
+ "Failed to connect to qtest socket");
+ goto error;
+ }
+ } else {
+ PMD_INIT_LOG(ERR, "No argument specified for %s",
+ ETH_VIRTIO_NET_ARG_QTEST_PATH);
+ goto error;
+ }
+
+ eth_dev = virtio_net_eth_dev_alloc(name);
+
+ qtest_vdev_init(eth_dev->data, qtest_sock, ivshmem_sock);
+
+ /* originally, this will be called in rte_eal_pci_probe() */
+ eth_virtio_dev_init(eth_dev);
+
+ eth_dev->driver = NULL;
+ eth_dev->data->dev_flags |= RTE_ETH_DEV_DETACHABLE;
+ eth_dev->data->kdrv = RTE_KDRV_NONE;
+ eth_dev->data->drv_name = "rte_virtio_pmd";
+
+ rte_kvargs_free(kvlist);
+ return 0;
+
+error:
+ rte_kvargs_free(kvlist);
+ return -EFAULT;
+}
+
+/*
+ * Finalization when "CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE" is enabled.
+ */
+static int
+rte_virtio_net_pmd_uninit(const char *name)
+{
+ struct rte_eth_dev *eth_dev = NULL;
+ int ret;
+
+ if (name == NULL)
+ return -EINVAL;
+
+ /* find the ethdev entry */
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev == NULL)
+ return -ENODEV;
+
+ ret = eth_virtio_dev_uninit(eth_dev);
+ if (ret != 0)
+ return -EFAULT;
+
+ qtest_vdev_uninit(eth_dev->data);
+ rte_free(eth_dev->data->dev_private);
+
+ ret = rte_eth_dev_release_port(eth_dev);
+ if (ret != 0)
+ return -EFAULT;
+
+ return 0;
+}
+
+static struct rte_driver rte_virtio_net_driver = {
+ .name = "eth_virtio_net",
+ .type = PMD_VDEV,
+ .init = rte_virtio_net_pmd_init,
+ .uninit = rte_virtio_net_pmd_uninit,
+};
+
+PMD_REGISTER_DRIVER(rte_virtio_net_driver);
+
+#endif /* RTE_LIBRTE_VIRTIO_HOST_MODE */
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index ae2d47d..eefc7be 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -122,5 +122,17 @@ uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
#define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)

+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+int qtest_vdev_init(struct rte_eth_dev_data *data,
+ int qtest_socket, int ivshmem_socket);
+void qtest_vdev_uninit(struct rte_eth_dev_data *data);
+void qtest_intr_callback_register(void *data,
+ rte_intr_callback_fn cb, void *cb_arg);
+void qtest_intr_callback_unregister(void *data,
+ rte_intr_callback_fn cb, void *cb_arg);
+int qtest_intr_enable(void *data);
+int qtest_intr_disable(void *data);
+struct rte_pci_id qtest_get_pci_id_of_virtio_net(void);
+#endif /* RTE_LIBRTE_VIRTIO_HOST_MODE */

#endif /* _VIRTIO_ETHDEV_H_ */
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 47f722a..d4ede73 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -165,6 +165,9 @@ struct virtqueue;

struct virtio_hw {
struct virtqueue *cvq;
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ void *qsession;
+#endif
uint32_t io_base;
uint32_t guest_features;
uint32_t max_tx_queues;
@@ -226,6 +229,26 @@ outl_p(unsigned int data, unsigned int port)
}
#endif

+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+
+uint32_t virtio_ioport_read(struct virtio_hw *, uint64_t, char type);
+void virtio_ioport_write(struct virtio_hw *, uint64_t, uint64_t, char type);
+
+#define VIRTIO_READ_REG_1(hw, reg) \
+ virtio_ioport_read(hw, reg, 'b')
+#define VIRTIO_WRITE_REG_1(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'b')
+#define VIRTIO_READ_REG_2(hw, reg) \
+ virtio_ioport_read(hw, reg, 'w')
+#define VIRTIO_WRITE_REG_2(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'w')
+#define VIRTIO_READ_REG_4(hw, reg) \
+ virtio_ioport_read(hw, reg, 'l')
+#define VIRTIO_WRITE_REG_4(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'l')
+
+#else /* RTE_LIBRTE_VIRTIO_HOST_MODE */
+
#define VIRTIO_PCI_REG_ADDR(hw, reg) \
(unsigned short)((hw)->io_base + (reg))

@@ -244,6 +267,8 @@ outl_p(unsigned int data, unsigned int port)
#define VIRTIO_WRITE_REG_4(hw, reg, value) \
outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))

+#endif /* RTE_LIBRTE_VIRTIO_HOST_MODE */
+
static inline int
vtpci_with_feature(struct virtio_hw *hw, uint32_t bit)
{
--
2.1.4
Pavel Fedin
2015-12-28 11:57:41 UTC
Hello!
-----Original Message-----
Sent: Wednesday, December 16, 2015 11:37 AM
Subject: [dpdk-dev] [PATCH v1 2/2] virtio: Extend virtio-net PMD to support container environment
The patch adds a new virtio-net PMD configuration that allows the PMD to
work on the host as if it were running in a VM.
Here is the new configuration option for virtio-net PMD.
- CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE
To use this mode, EAL needs physically contiguous memory. To allocate
such memory, enable the option below and add the "--contig-mem" option
to the application command line.
- CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS
To prepare a virtio-net device on the host, the user needs to invoke a QEMU
process in the special qtest mode. This mode is mainly used for testing QEMU
devices from an outer process. In this mode, no guest runs.
Here is the QEMU command line.
$ qemu-system-x86_64 \
-machine pc-i440fx-1.4,accel=qtest \
-display none -qtest-log /dev/null \
-qtest unix:/tmp/socket,server \
-netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1 \
-device virtio-net-pci,netdev=net0,mq=on \
-chardev socket,id=chr1,path=/tmp/ivshmem,server \
-device ivshmem,size=1G,chardev=chr1,vectors=1
* One QEMU process is needed per port.
* In most cases, just using the above command is enough.
* vhost backends such as vhost-net and vhost-user can be specified.
* Only the "pc-i440fx-1.4" machine has been checked, but other machines
may also work, as long as the machine has a PIIX3 south bridge.
Without it, virtio-net PMD cannot receive status change interrupts.
* Do not add "--enable-kvm" to the QEMU command line.
After invoking QEMU, the PMD can connect to the QEMU process using unix
domain sockets. Over these sockets, the virtio-net, ivshmem and piix3
devices in QEMU are probed by the PMD.
Here is an example command line.
$ testpmd -c f -n 1 -m 1024 --contig-mem \
--vdev="eth_virtio_net0,qtest=/tmp/socket,ivshmem=/tmp/ivshmem" \
-- --disable-hw-vlan --txqflags=0xf00 -i
Please specify the same unix domain sockets and memory size in both the QEMU
and DPDK command lines, as above.
The shared memory size should be a power of 2, because ivshmem only accepts
such memory sizes.
Also, the "--contig-mem" option is needed for the PMD, as above. This option
allocates contiguous memory and creates one hugepage file on hugetlbfs.
If there is not enough contiguous memory, initialization will fail.
This contiguous memory is used as shared memory between the DPDK application
and the ivshmem device in QEMU.
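For reference, the qtest socket speaks a simple line-based text protocol, so
the PCI probing described above is just a matter of sending "out*"/"in*"
commands and parsing the "OK ..." replies. Below is a minimal sketch
(illustrative only, not part of the patch; the helper name and the lack of
error handling are made up) of how a vendor/device ID could be read through
the 0xcf8/0xcfc configuration ports:

/* sketch: read a PCI vendor/device ID over an already-connected qtest socket */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static uint32_t
read_pci_vendor_device(int qtest_fd, unsigned int bus, unsigned int dev)
{
	char buf[64];
	ssize_t n;

	/* select config space offset 0 (vendor/device ID) via port 0xcf8 */
	snprintf(buf, sizeof(buf), "outl 0xcf8 0x%x\n",
		(1u << 31) | (bus << 16) | (dev << 11));
	write(qtest_fd, buf, strlen(buf));
	read(qtest_fd, buf, sizeof(buf));          /* reply: "OK" */

	/* read the selected register back through port 0xcfc */
	write(qtest_fd, "inl 0xcfc\n", strlen("inl 0xcfc\n"));
	n = read(qtest_fd, buf, sizeof(buf) - 1);  /* reply: "OK 0x<value>" */
	buf[n > 0 ? n : 0] = '\0';
	return (uint32_t)strtoul(buf + strlen("OK "), NULL, 16);
}

The qtest_pci_read*/qtest_in/qtest_out helpers in qtest.c below follow the
same pattern, with an extra relay thread so that asynchronous "IRQ raise"
messages can be separated from normal replies.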
---
config/common_linuxapp | 1 +
drivers/net/virtio/Makefile | 4 +
drivers/net/virtio/qtest.c | 1107 ++++++++++++++++++++++++++++++++++++
drivers/net/virtio/virtio_ethdev.c | 341 ++++++++++-
drivers/net/virtio/virtio_ethdev.h | 12 +
drivers/net/virtio/virtio_pci.h | 25 +
6 files changed, 1461 insertions(+), 29 deletions(-)
create mode 100644 drivers/net/virtio/qtest.c
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..eaa720c 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -269,6 +269,7 @@ CONFIG_RTE_LIBRTE_PMD_SZEDATA2=n
# Compile burst-oriented VIRTIO PMD driver
#
CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
+CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE=n
CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_INIT=n
CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 43835ba..697e629 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -52,6 +52,10 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
+ifeq ($(CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE),y)
+ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += qtest.c
+endif
+
DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
diff --git a/drivers/net/virtio/qtest.c b/drivers/net/virtio/qtest.c
new file mode 100644
index 0000000..4ffdefb
--- /dev/null
+++ b/drivers/net/virtio/qtest.c
@@ -0,0 +1,1107 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 IGEL Co., Ltd. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co., Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/queue.h>
+#include <signal.h>
+#include <pthread.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include <rte_memory.h>
+#include <rte_malloc.h>
+#include <rte_common.h>
+#include <rte_interrupts.h>
+
+#include "virtio_pci.h"
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+
+#define NB_BUS 256
+#define NB_DEVICE 32
+#define NB_BAR 6
+
+/* PCI common configuration registers */
+#define REG_ADDR_VENDOR_ID 0x0
+#define REG_ADDR_DEVICE_ID 0x2
+#define REG_ADDR_COMMAND 0x4
+#define REG_ADDR_STATUS 0x6
+#define REG_ADDR_REVISION_ID 0x8
+#define REG_ADDR_CLASS_CODE 0x9
+#define REG_ADDR_CACHE_LINE_S 0xc
+#define REG_ADDR_LAT_TIMER 0xd
+#define REG_ADDR_HEADER_TYPE 0xe
+#define REG_ADDR_BIST 0xf
+#define REG_ADDR_BAR0 0x10
+#define REG_ADDR_BAR1 0x14
+#define REG_ADDR_BAR2 0x18
+#define REG_ADDR_BAR3 0x1c
+#define REG_ADDR_BAR4 0x20
+#define REG_ADDR_BAR5 0x24
+
+/* PCI common configuration register values */
+#define REG_VAL_COMMAND_IO 0x1
+#define REG_VAL_COMMAND_MEMORY 0x2
+#define REG_VAL_COMMAND_MASTER 0x4
+#define REG_VAL_HEADER_TYPE_ENDPOINT 0x0
+#define REG_VAL_BAR_MEMORY 0x0
+#define REG_VAL_BAR_IO 0x1
+#define REG_VAL_BAR_LOCATE_32 0x0
+#define REG_VAL_BAR_LOCATE_UNDER_1MB 0x2
+#define REG_VAL_BAR_LOCATE_64 0x4
+
+/* PIIX3 configuration registers */
+#define PIIX3_REG_ADDR_PIRQA 0x60
+#define PIIX3_REG_ADDR_PIRQB 0x61
+#define PIIX3_REG_ADDR_PIRQC 0x62
+#define PIIX3_REG_ADDR_PIRQD 0x63
+
+/* Device information */
+#define VIRTIO_NET_DEVICE_ID 0x1000
+#define VIRTIO_NET_VENDOR_ID 0x1af4
+#define VIRTIO_NET_IO_START 0xc000
+#define VIRTIO_NET_IRQ_NUM 10
+#define IVSHMEM_DEVICE_ID 0x1110
+#define IVSHMEM_VENDOR_ID 0x1af4
+#define IVSHMEM_MEMORY_START 0x1000
+#define IVSHMEM_PROTOCOL_VERSION 0
+#define PIIX3_DEVICE_ID 0x7000
+#define PIIX3_VENDOR_ID 0x8086
+
+#define PCI_CONFIG_ADDR(_bus, _device, _function, _offset) ( \
+ (1 << 31) | ((_bus) & 0xff) << 16 | ((_device) & 0x1f) << 11 | \
+ ((_function) & 0xf) << 8 | ((_offset) & 0xfc))
+
+static char interrupt_message[32];
+
+enum qtest_pci_bar_type {
+ QTEST_PCI_BAR_DISABLE = 0,
+ QTEST_PCI_BAR_IO,
+ QTEST_PCI_BAR_MEMORY_UNDER_1MB,
+ QTEST_PCI_BAR_MEMORY_32,
+ QTEST_PCI_BAR_MEMORY_64
+};
+
+struct qtest_pci_bar {
+ enum qtest_pci_bar_type type;
+ uint8_t addr;
+ uint64_t region_start;
+ uint64_t region_size;
+};
+
+struct qtest_session;
+TAILQ_HEAD(qtest_pci_device_list, qtest_pci_device);
+struct qtest_pci_device {
+ TAILQ_ENTRY(qtest_pci_device) next;
+ const char *name;
+ uint16_t device_id;
+ uint16_t vendor_id;
+ uint8_t bus_addr;
+ uint8_t device_addr;
+ struct qtest_pci_bar bar[NB_BAR];
+ int (*init)(struct qtest_session *s, struct qtest_pci_device *dev);
+};
+
+union qtest_pipefds {
+ struct {
+ int pipefd[2];
+ };
+ struct {
+ int readfd;
+ int writefd;
+ };
+};
+
+struct qtest_session {
+ int qtest_socket;
+ pthread_mutex_t qtest_session_lock;
+
+ struct qtest_pci_device_list head;
+ int ivshmem_socket;
+
+ pthread_t event_th;
+ union qtest_pipefds msgfds;
+
+ pthread_t intr_th;
+ union qtest_pipefds irqfds;
+ rte_atomic16_t enable_intr;
+ rte_intr_callback_fn cb;
+ void *cb_arg;
+};
+
+static int
+qtest_write(int fd, char *buf, size_t count)
+{
+ size_t len = count;
+ size_t total_len = 0;
+ int ret = 0;
+
+ while (len > 0) {
+ ret = write(fd, buf, len);
+ if (ret == (int)len)
+ break;
+ if (ret == -1) {
+ if (errno == EINTR)
+ continue;
+ return ret;
+ }
+ total_len += ret;
+ buf += ret;
+ len -= ret;
+ }
+ return total_len + ret;
+}
+
+static int
+qtest_read(int fd, char *buf, size_t count)
+{
+ size_t len = count;
+ size_t total_len = 0;
+ int ret = 0;
+
+ while (len > 0) {
+ ret = read(fd, buf, len);
+ if (ret == -1) {
+ if (errno == EINTR)
+ continue;
+ return ret;
+ }
+ if (ret == (int)len)
+ break;
+ /* stop once a newline-terminated reply has been received */
+ if ((ret > 0) && (*(buf + ret - 1) == '\n'))
+ break;
+ total_len += ret;
+ buf += ret;
+ len -= ret;
+ }
+ return total_len + ret;
+}
+
+/*
+ * To know QTest protocol specification, see below QEMU source code.
+ * - qemu/qtest.c
+ */
+static uint32_t
+qtest_in(struct qtest_session *s, uint16_t addr, char type)
+{
+ char buf[1024];
+ int ret;
+
+ if ((type != 'l') && (type != 'w') && (type != 'b'))
+ rte_panic("Invalid value\n");
+
+ snprintf(buf, sizeof(buf), "in%c 0x%x\n", type, addr);
+ /* write to qtest socket */
+ ret = qtest_write(s->qtest_socket, buf, strlen(buf));
+ /* read reply from event handler */
+ ret = qtest_read(s->msgfds.readfd, buf, sizeof(buf));
+ buf[ret] = '\0';
+ return strtoul(buf + strlen("OK "), NULL, 16);
+}
+
+static void
+qtest_out(struct qtest_session *s, uint16_t addr, uint32_t val, char type)
+{
+ char buf[1024];
+ int ret __rte_unused;
+
+ if ((type != 'l') && (type != 'w') && (type != 'b'))
+ rte_panic("Invalid value\n");
+
+ snprintf(buf, sizeof(buf), "out%c 0x%x 0x%x\n", type, addr, val);
+ /* write to qtest socket */
+ ret = qtest_write(s->qtest_socket, buf, strlen(buf));
+ /* read reply from event handler */
+ ret = qtest_read(s->msgfds.readfd, buf, sizeof(buf));
+}
+
+/*
+ * qtest_pci_read/write are based on PCI configuration space specification.
+ * According to the spec, the access size of read()/write() should be 4 bytes.
+ */
+static int
+qtest_pci_readb(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset)
+{
+ uint32_t tmp;
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ tmp = qtest_in(s, 0xcfc, 'l');
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+
+ return (tmp >> ((offset & 0x3) * 8)) & 0xff;
+}
+
+static void
+qtest_pci_writeb(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset, uint8_t value)
+{
+ uint32_t addr, tmp, pos;
+
+ addr = PCI_CONFIG_ADDR(bus, device, function, offset);
+ pos = (offset % 4) * 8;
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, addr, 'l');
+ tmp = qtest_in(s, 0xcfc, 'l');
+ tmp = (tmp & ~(0xff << pos)) | (value << pos);
+
+ qtest_out(s, 0xcf8, addr, 'l');
+ qtest_out(s, 0xcfc, tmp, 'l');
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+}
+
+static uint32_t
+qtest_pci_readl(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset)
+{
+ uint32_t tmp;
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ tmp = qtest_in(s, 0xcfc, 'l');
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+
+ return tmp;
+}
+
+static void
+qtest_pci_writel(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset, uint32_t value)
+{
+ uint32_t tmp;
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ qtest_out(s, 0xcfc, value, 'l');
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+}
+
+static uint64_t
+qtest_pci_readq(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset)
+{
+ uint32_t tmp;
+ uint64_t val;
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ val = (uint64_t)qtest_in(s, 0xcfc, 'l');
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset + 4);
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ val |= (uint64_t)qtest_in(s, 0xcfc, 'l') << 32;
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+
+ return val;
+}
+
+static void
+qtest_pci_writeq(struct qtest_session *s, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset, uint64_t value)
+{
+ uint32_t tmp;
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ qtest_out(s, 0xcfc, (uint32_t)(value & 0xffffffff), 'l');
+
+ tmp = PCI_CONFIG_ADDR(bus, device, function, offset + 4);
+
+ qtest_out(s, 0xcf8, tmp, 'l');
+ qtest_out(s, 0xcfc, (uint32_t)(value >> 32), 'l');
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot unlock mutex\n");
+}
+
+/*
+ * virtio_ioport_read/write are used by virtio-net PMD
+ */
+void
+virtio_ioport_write(struct virtio_hw *hw, uint64_t addr, uint64_t val, char type)
+{
+ struct qtest_session *s = (struct qtest_session *)hw->qsession;
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ qtest_out(s, VIRTIO_NET_IO_START + (uint16_t)addr, val, type);
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+}
+
+uint32_t
+virtio_ioport_read(struct virtio_hw *hw, uint64_t addr, char type)
+{
+ struct qtest_session *s = (struct qtest_session *)hw->qsession;
+ uint32_t val;
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ val = qtest_in(s, VIRTIO_NET_IO_START + (uint16_t)addr, type);
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ return val;
+}
+
+int
+qtest_intr_enable(void *data)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+
+ s = (struct qtest_session *)hw->qsession;
+ rte_atomic16_set(&s->enable_intr, 1);
+
+ return 0;
+}
+
+int
+qtest_intr_disable(void *data)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+
+ s = (struct qtest_session *)hw->qsession;
+ rte_atomic16_set(&s->enable_intr, 0);
+
+ return 0;
+}
+
+void
+qtest_intr_callback_register(void *data,
+ rte_intr_callback_fn cb, void *cb_arg)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+
+ s = (struct qtest_session *)hw->qsession;
+ s->cb = cb;
+ s->cb_arg = cb_arg;
+ rte_atomic16_set(&s->enable_intr, 1);
+}
+
+void
+qtest_intr_callback_unregister(void *data,
+ rte_intr_callback_fn cb __rte_unused,
+ void *cb_arg __rte_unused)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+
+ s = (struct qtest_session *)hw->qsession;
+ rte_atomic16_set(&s->enable_intr, 0);
+ s->cb = NULL;
+ s->cb_arg = NULL;
+}
+
+static void *
+qtest_intr_handler(void *data)
+{
+ struct qtest_session *s = (struct qtest_session *)data;
+ char buf[1];
+ int ret;
+
+ for (;;) {
+ ret = qtest_read(s->irqfds.readfd, buf, sizeof(buf));
+ if (ret < 0)
+ return NULL;
+ s->cb(NULL, s->cb_arg);
+ }
+ return NULL;
+}
+
+static int
+qtest_intr_initialize(void *data)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+ char buf[1024];
+ int ret;
+
+ s = (struct qtest_session *)hw->qsession;
+
+ /* This message will come when interrupt occurs */
+ snprintf(interrupt_message, sizeof(interrupt_message),
+ "IRQ raise %d", VIRTIO_NET_IRQ_NUM);
+
+ snprintf(buf, sizeof(buf), "irq_intercept_in ioapic\n");
+
+ if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ /* To enable interrupt, send "irq_intercept_in" message to QEMU */
+ ret = qtest_write(s->qtest_socket, buf, strlen(buf));
+ if (ret < 0) {
+ pthread_mutex_unlock(&s->qtest_session_lock);
+ return -1;
+ }
+
+ /* just ignore QEMU response */
+ ret = qtest_read(s->msgfds.readfd, buf, sizeof(buf));
+ if (ret < 0) {
+ pthread_mutex_unlock(&s->qtest_session_lock);
+ return -1;
+ }
+
+ if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+ rte_panic("Cannot lock mutex\n");
+
+ return 0;
+}
+
+static void
+qtest_handle_one_message(struct qtest_session *s, char *buf)
+{
+ int ret;
+
+ if (strncmp(buf, interrupt_message, strlen(interrupt_message)) == 0) {
+ if (rte_atomic16_read(&s->enable_intr) == 0)
+ return;
+
+ /* relay interrupt to pipe */
+ ret = write(s->irqfds.writefd, "1", 1);
+ if (ret < 0)
+ rte_panic("cannot relay interrupt\n");
+ } else {
+ /* relay normal message to pipe */
+ ret = qtest_write(s->msgfds.writefd, buf, strlen(buf));
+ if (ret < 0)
+ rte_panic("cannot relay normal message\n");
+ }
+}
+
+static char *
+qtest_get_next_message(char *p)
+{
+ p = strchr(p, '\n');
+ if ((p == NULL) || (*(p + 1) == '\0'))
+ return NULL;
+ return p + 1;
+}
+
+static void
+qtest_close_one_socket(int *fd)
+{
+ if (*fd > 0) {
+ close(*fd);
+ *fd = -1;
+ }
+}
+
+static void
+qtest_close_sockets(struct qtest_session *s)
+{
+ qtest_close_one_socket(&s->qtest_socket);
+ qtest_close_one_socket(&s->msgfds.readfd);
+ qtest_close_one_socket(&s->msgfds.writefd);
+ qtest_close_one_socket(&s->irqfds.readfd);
+ qtest_close_one_socket(&s->irqfds.writefd);
+ qtest_close_one_socket(&s->ivshmem_socket);
+}
+
+/*
+ * This thread relays QTest response using pipe.
+ * The function is needed because we need to separate IRQ message from others.
+ */
+static void *
+qtest_event_handler(void *data)
+{
+ struct qtest_session *s = (struct qtest_session *)data;
+ char buf[1024];
+ char *p;
+ int ret;
+
+ for (;;) {
+ memset(buf, 0, sizeof(buf));
+ ret = qtest_read(s->qtest_socket, buf, sizeof(buf));
+ if (ret < 0) {
+ qtest_close_sockets(s);
+ return NULL;
+ }
+
+ /* may receive multiple messages at the same time */
+ p = buf;
+ do {
+ qtest_handle_one_message(s, p);
+ } while ((p = qtest_get_next_message(p)) != NULL);
+ }
+ return NULL;
+}
+
+static int
+qtest_init_piix3_device(struct qtest_session *s, struct qtest_pci_device *dev)
+{
+ uint8_t bus, device, virtio_net_slot = 0;
+ struct qtest_pci_device *tmpdev;
+ uint8_t pcislot2regaddr[] = { 0xff,
+ 0xff,
+ 0xff,
+ PIIX3_REG_ADDR_PIRQC,
+ PIIX3_REG_ADDR_PIRQD,
+ PIIX3_REG_ADDR_PIRQA,
+ PIIX3_REG_ADDR_PIRQB};
+
+ bus = dev->bus_addr;
+ device = dev->device_addr;
+
+ PMD_DRV_LOG(INFO,
+ "Find %s on virtual PCI bus: %04x:%02x:00.0\n",
+ dev->name, bus, device);
+
+ /* Get slot id that is connected to virtio-net */
+ TAILQ_FOREACH(tmpdev, &s->head, next) {
+ if (strcmp(tmpdev->name, "virtio-net") == 0) {
+ virtio_net_slot = tmpdev->device_addr;
+ break;
+ }
+ }
+
+ if (virtio_net_slot == 0)
+ return -1;
+
+ /*
+ * Set interrupt routing for virtio-net device.
+ * Here is i440fx/piix3 connection settings
+ * ---------------------------------------
+ * PCI Slot3 -> PIRQC
+ * PCI Slot4 -> PIRQD
+ * PCI Slot5 -> PIRQA
+ * PCI Slot6 -> PIRQB
+ */
+ if (pcislot2regaddr[virtio_net_slot] != 0xff) {
+ qtest_pci_writeb(s, bus, device, 0,
+ pcislot2regaddr[virtio_net_slot],
+ VIRTIO_NET_IRQ_NUM);
+ }
+
+ return 0;
+}
+
+/*
+ * Common initialization of PCI device.
+ * To know detail, see pci specification.
+ */
+static int
+qtest_init_pci_device(struct qtest_session *s, struct qtest_pci_device *dev)
+{
+ uint8_t i, bus, device;
+ uint32_t val;
+ uint64_t val64;
+
+ bus = dev->bus_addr;
+ device = dev->device_addr;
+
+ PMD_DRV_LOG(INFO,
+ "Find %s on virtual PCI bus: %04x:%02x:00.0\n",
+ dev->name, bus, device);
+
+ /* Check header type */
+ val = qtest_pci_readb(s, bus, device, 0, REG_ADDR_HEADER_TYPE);
+ if (val != REG_VAL_HEADER_TYPE_ENDPOINT) {
+ PMD_DRV_LOG(ERR, "Unexpected header type %d\n", val);
+ return -1;
+ }
+
+ /* Check BAR type */
+ for (i = 0; i < NB_BAR; i++) {
+ val = qtest_pci_readl(s, bus, device, 0, dev->bar[i].addr);
+
+ switch (dev->bar[i].type) {
+ case QTEST_PCI_BAR_IO:
+ if ((val & 0x1) != REG_VAL_BAR_IO)
+ goto error;
+ break;
+ case QTEST_PCI_BAR_MEMORY_UNDER_1MB:
+ if ((val & 0x1) != REG_VAL_BAR_MEMORY)
+ goto error;
+ if ((val & 0x6) != REG_VAL_BAR_LOCATE_UNDER_1MB)
+ goto error;
+ break;
+ case QTEST_PCI_BAR_MEMORY_32:
+ if ((val & 0x1) != REG_VAL_BAR_MEMORY)
+ goto error;
+ if ((val & 0x6) != REG_VAL_BAR_LOCATE_32)
+ goto error;
+ break;
+ case QTEST_PCI_BAR_MEMORY_64:
+ if ((val & 0x1) != REG_VAL_BAR_MEMORY)
+ goto error;
+ if ((val & 0x6) != REG_VAL_BAR_LOCATE_64)
+ goto error;
+ break;
+ default:
+ break;
+ }
+ }
+
+ /* Enable device */
+ val = qtest_pci_readl(s, bus, device, 0, REG_ADDR_COMMAND);
+ val |= REG_VAL_COMMAND_IO | REG_VAL_COMMAND_MEMORY | REG_VAL_COMMAND_MASTER;
+ qtest_pci_writel(s, bus, device, 0, REG_ADDR_COMMAND, val);
+
+ /* Calculate BAR size */
+ for (i = 0; i < NB_BAR; i++) {
+ switch (dev->bar[i].type) {
+ case QTEST_PCI_BAR_IO:
+ case QTEST_PCI_BAR_MEMORY_UNDER_1MB:
+ case QTEST_PCI_BAR_MEMORY_32:
+ qtest_pci_writel(s, bus, device, 0,
+ dev->bar[i].addr, 0xffffffff);
+ val = qtest_pci_readl(s, bus, device,
+ 0, dev->bar[i].addr);
+ dev->bar[i].region_size = ~(val & 0xfffffff0) + 1;
+ break;
+ case QTEST_PCI_BAR_MEMORY_64:
+ qtest_pci_writeq(s, bus, device, 0,
+ dev->bar[i].addr, 0xffffffffffffffff);
+ val64 = qtest_pci_readq(s, bus, device,
+ 0, dev->bar[i].addr);
+ dev->bar[i].region_size =
+ ~(val64 & 0xfffffffffffffff0) + 1;
+ break;
+ default:
+ break;
+ }
+ }
+
+ /* Set BAR region */
+ for (i = 0; i < NB_BAR; i++) {
+ switch (dev->bar[i].type) {
+ case QTEST_PCI_BAR_IO:
+ case QTEST_PCI_BAR_MEMORY_UNDER_1MB:
+ case QTEST_PCI_BAR_MEMORY_32:
+ qtest_pci_writel(s, bus, device, 0, dev->bar[i].addr,
+ dev->bar[i].region_start);
+ PMD_DRV_LOG(INFO, "Set BAR of %s device: 0x%lx - 0x%lx\n",
+ dev->name, dev->bar[i].region_start,
+ dev->bar[i].region_start + dev->bar[i].region_size);
+ break;
+ case QTEST_PCI_BAR_MEMORY_64:
+ qtest_pci_writeq(s, bus, device, 0, dev->bar[i].addr,
+ dev->bar[i].region_start);
+ PMD_DRV_LOG(INFO, "Set BAR of %s device: 0x%lx - 0x%lx\n",
+ dev->name, dev->bar[i].region_start,
+ dev->bar[i].region_start + dev->bar[i].region_size);
+ break;
+ default:
+ break;
+ }
+ }
+
+ return 0;
+
+error:
+ PMD_DRV_LOG(ERR, "Unexpected BAR type\n");
+ return -1;
+}
+
+static void
+qtest_find_pci_device(struct qtest_session *s, uint16_t bus, uint8_t device)
+{
+ struct qtest_pci_device *dev;
+ uint32_t val;
+
+ val = qtest_pci_readl(s, bus, device, 0, 0);
+ TAILQ_FOREACH(dev, &s->head, next) {
+ if (val == ((uint32_t)dev->device_id << 16 | dev->vendor_id)) {
+ /* device is found, then store it */
+ dev->bus_addr = bus;
+ dev->device_addr = device;
+ return;
+ }
+ }
+}
+
+static int
+qtest_init_pci_devices(struct qtest_session *s)
+{
+ struct qtest_pci_device *dev;
+ uint16_t bus;
+ uint8_t device;
+ int ret;
+
+ /* Find devices */
+ bus = 0;
+ do {
+ device = 0;
+ do {
+ qtest_find_pci_device(s, bus, device);
+ } while (device++ != NB_DEVICE - 1);
+ } while (bus++ != NB_BUS - 1);
+
+ /* Initialize devices */
+ TAILQ_FOREACH(dev, &s->head, next) {
+ ret = dev->init(s, dev);
+ if (ret != 0)
+ return ret;
+ }
+
+ return 0;
+}
+
+struct rte_pci_id
+qtest_get_pci_id_of_virtio_net(void)
+{
+ struct rte_pci_id id = {VIRTIO_NET_DEVICE_ID,
+ VIRTIO_NET_VENDOR_ID, PCI_ANY_ID, PCI_ANY_ID};
+
+ return id;
+}
+
+static int
+qtest_register_target_devices(struct qtest_session *s)
+{
+ struct qtest_pci_device *virtio_net, *ivshmem, *piix3;
+ const struct rte_memseg *ms;
+
+ ms = rte_eal_get_physmem_layout();
+ /* if the EAL memory size isn't a power of 2, ivshmem refuses it */
+ if ((ms[0].len & (ms[0].len - 1)) != 0) {
+ PMD_DRV_LOG(ERR, "memory size must be power of 2\n");
+ return -1;
+ }
+
+ virtio_net = malloc(sizeof(*virtio_net));
+ if (virtio_net == NULL)
+ return -1;
+
+ ivshmem = malloc(sizeof(*ivshmem));
+ if (ivshmem == NULL)
+ return -1;
+
+ piix3 = malloc(sizeof(*piix3));
+ if (piix3 == NULL)
+ return -1;
+
+ memset(virtio_net, 0, sizeof(*virtio_net));
+ memset(ivshmem, 0, sizeof(*ivshmem));
+ memset(piix3, 0, sizeof(*piix3));
+
+ TAILQ_INIT(&s->head);
+
+ virtio_net->name = "virtio-net";
+ virtio_net->device_id = VIRTIO_NET_DEVICE_ID;
+ virtio_net->vendor_id = VIRTIO_NET_VENDOR_ID;
+ virtio_net->init = qtest_init_pci_device;
+ virtio_net->bar[0].addr = REG_ADDR_BAR0;
+ virtio_net->bar[0].type = QTEST_PCI_BAR_IO;
+ virtio_net->bar[0].region_start = VIRTIO_NET_IO_START;
+ TAILQ_INSERT_TAIL(&s->head, virtio_net, next);
+
+ ivshmem->name = "ivshmem";
+ ivshmem->device_id = IVSHMEM_DEVICE_ID;
+ ivshmem->vendor_id = IVSHMEM_VENDOR_ID;
+ ivshmem->init = qtest_init_pci_device;
+ ivshmem->bar[0].addr = REG_ADDR_BAR0;
+ ivshmem->bar[0].type = QTEST_PCI_BAR_MEMORY_32;
+ ivshmem->bar[0].region_start = IVSHMEM_MEMORY_START;
+ ivshmem->bar[1].addr = REG_ADDR_BAR2;
+ ivshmem->bar[1].type = QTEST_PCI_BAR_MEMORY_64;
+ /* In host mode, only one memory segment is valid */
+ ivshmem->bar[1].region_start = ms[0].phys_addr;
+ TAILQ_INSERT_TAIL(&s->head, ivshmem, next);
+
+ /* piix3 is needed to route irqs from virtio-net to ioapic */
+ piix3->name = "piix3";
+ piix3->device_id = PIIX3_DEVICE_ID;
+ piix3->vendor_id = PIIX3_VENDOR_ID;
+ piix3->init = qtest_init_piix3_device;
+ TAILQ_INSERT_TAIL(&s->head, piix3, next);
+
+ return 0;
+}
+
+static int
+qtest_send_message_to_ivshmem(int sock_fd, uint64_t client_id, int shm_fd)
+{
+ struct iovec iov;
+ struct msghdr msgh;
+ size_t fdsize = sizeof(int);
+ char control[CMSG_SPACE(fdsize)];
+ struct cmsghdr *cmsg;
+ int ret;
+
+ memset(&msgh, 0, sizeof(msgh));
+ iov.iov_base = &client_id;
+ iov.iov_len = sizeof(client_id);
+
+ msgh.msg_iov = &iov;
+ msgh.msg_iovlen = 1;
+
+ if (shm_fd >= 0) {
+ msgh.msg_control = &control;
+ msgh.msg_controllen = sizeof(control);
+ cmsg = CMSG_FIRSTHDR(&msgh);
+ cmsg->cmsg_len = CMSG_LEN(fdsize);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ memcpy(CMSG_DATA(cmsg), &shm_fd, fdsize);
+ }
+
+ do {
+ ret = sendmsg(sock_fd, &msgh, 0);
+ } while (ret < 0 && errno == EINTR);
+
+ if (ret < 0) {
+ PMD_DRV_LOG(ERR, "sendmsg error\n");
+ return ret;
+ }
+
+ return ret;
+}
+
+static int
+qtest_open_shared_memory(void)
+{
+ const struct rte_memseg *ms;
+ int shm_fd = -1;
+ uint64_t vaddr;
+ char buf[1024];
+ char *p;
+ FILE *f;
+
+ ms = rte_eal_get_physmem_layout();
+ f = fopen("/proc/self/maps", "r");
+ if (f == NULL)
+ return -1;
+
+ /* parse maps */
+ while (fgets(buf, sizeof(buf), f) != NULL) {
+ /* get vaddr */
+ vaddr = strtoul(buf, NULL, 16);
+
+ /* check if this region is EAL memory */
+ if (vaddr == ms[0].addr_64) {
+ p = strchr(buf, '/');
+ if (p == NULL)
+ return -1;
+ buf[strlen(buf) - 1] = '\0';
+ shm_fd = open(p, O_RDWR);
+ break;
+ }
+ }
+ fclose(f);
+
+ return shm_fd;
+}
+
+static int
+qtest_setup_shared_memory(struct qtest_session *s)
+{
+ int shm_fd, ret;
+
+ /* To share DPDK EAL memory, open EAL memory again */
+ shm_fd = qtest_open_shared_memory();
+ if (shm_fd < 0) {
+ PMD_DRV_LOG(ERR,
+ "Failed to open EAL memory\n");
+ return -1;
+ }
+
+ /* send our protocol version first */
+ ret = qtest_send_message_to_ivshmem(s->ivshmem_socket,
+ IVSHMEM_PROTOCOL_VERSION, -1);
+ if (ret < 0) {
+ PMD_DRV_LOG(ERR,
+ "Failed to send protocol version to ivshmem\n");
+ return -1;
+ }
+
+ /* send client id */
+ ret = qtest_send_message_to_ivshmem(s->ivshmem_socket, 0, -1);
+ if (ret < 0) {
+ PMD_DRV_LOG(ERR, "Failed to send VMID to ivshmem\n");
+ return -1;
+ }
+
+ /* send message to ivshmem */
+ ret = qtest_send_message_to_ivshmem(s->ivshmem_socket, -1, shm_fd);
+ if (ret < 0) {
+ PMD_DRV_LOG(ERR, "Failed to file descriptor to ivshmem\n");
+ return -1;
+ }
+
+ /* close EAL memory again */
+ close(shm_fd);
+
+ return 0;
+}
+
+int
+qtest_vdev_init(struct rte_eth_dev_data *data,
+ int qtest_socket, int ivshmem_socket)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+ int ret;
+
+ s = rte_zmalloc(NULL, sizeof(*s), RTE_CACHE_LINE_SIZE);
+ if (s == NULL) {
+ PMD_DRV_LOG(ERR, "Failed to allocate qtest session\n");
+ return -1;
+ }
+
+ ret = pipe(s->msgfds.pipefd);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize message pipe\n");
+ return -1;
+ }
+
+ ret = pipe(s->irqfds.pipefd);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize irq pipe\n");
+ return -1;
+ }
+
+ ret = qtest_register_target_devices(s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize qtest session\n");
+ return -1;
+ }
+
+ ret = pthread_mutex_init(&s->qtest_session_lock, NULL);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize mutex\n");
+ return -1;
+ }
+
+ rte_atomic16_set(&s->enable_intr, 0);
+ s->qtest_socket = qtest_socket;
+ s->ivshmem_socket = ivshmem_socket;
+ hw->qsession = (void *)s;
+
+ ret = pthread_create(&s->event_th, NULL, qtest_event_handler, s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to create event handler\n");
+ return -1;
+ }
+
+ ret = pthread_create(&s->intr_th, NULL, qtest_intr_handler, s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to create interrupt handler\n");
+ return -1;
+ }
+
+ ret = qtest_intr_initialize(data);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize interrupt\n");
+ return -1;
+ }
+
+ ret = qtest_setup_shared_memory(s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to setup shared memory\n");
+ return -1;
+ }
+
+ ret = qtest_init_pci_devices(s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize devices\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static void
+qtest_remove_target_devices(struct qtest_session *s)
+{
+ struct qtest_pci_device *dev, *next;
+
+ for (dev = TAILQ_FIRST(&s->head); dev != NULL; dev = next) {
+ next = TAILQ_NEXT(dev, next);
+ TAILQ_REMOVE(&s->head, dev, next);
+ free(dev);
+ }
+}
+
+void
+qtest_vdev_uninit(struct rte_eth_dev_data *data)
+{
+ struct virtio_hw *hw = ((struct rte_eth_dev_data *)data)->dev_private;
+ struct qtest_session *s;
+
+ s = (struct qtest_session *)hw->qsession;
+
+ qtest_close_sockets(s);
+
+ pthread_cancel(s->event_th);
+ pthread_join(s->event_th, NULL);
+
+ pthread_cancel(s->intr_th);
+ pthread_join(s->intr_th, NULL);
+
+ pthread_mutex_destroy(&s->qtest_session_lock);
+
+ qtest_remove_target_devices(s);
+
+ rte_free(s);
+}
diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index d928339..234b561 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -36,6 +36,11 @@
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#endif
#ifdef RTE_EXEC_ENV_LINUXAPP
#include <dirent.h>
#include <fcntl.h>
@@ -56,6 +61,10 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_dev.h>
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+#include <rte_eal_memconfig.h>
+#include <rte_kvargs.h>
+#endif
#include "virtio_ethdev.h"
#include "virtio_pci.h"
@@ -491,8 +500,12 @@ virtio_dev_close(struct rte_eth_dev *dev)
PMD_INIT_LOG(DEBUG, "virtio_dev_close");
/* reset the NIC */
- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ if (((dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) ||
+ ((dev->dev_type == RTE_ETH_DEV_VIRTUAL) &&
+ (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))) {
vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
+ }
vtpci_reset(hw);
hw->started = 0;
virtio_dev_free_mbufs(dev);
@@ -1233,15 +1246,22 @@ virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
isr = vtpci_isr(hw);
PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
- if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
- PMD_DRV_LOG(ERR, "interrupt enable failed");
+ if (dev->dev_type == RTE_ETH_DEV_PCI) {
+ if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
+ PMD_DRV_LOG(ERR, "interrupt enable failed");
+ }
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (dev->dev_type == RTE_ETH_DEV_VIRTUAL) {
+ if (qtest_intr_enable(dev->data) < 0)
+ PMD_DRV_LOG(ERR, "interrupt enable failed");
+ }
+#endif
if (isr & VIRTIO_PCI_ISR_CONFIG) {
if (virtio_dev_link_update(dev, 0) == 0)
_rte_eth_dev_callback_process(dev,
RTE_ETH_EVENT_INTR_LSC);
}
-
}
static void
@@ -1264,7 +1284,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
struct virtio_hw *hw = eth_dev->data->dev_private;
struct virtio_net_config *config;
struct virtio_net_config local_config;
- struct rte_pci_device *pci_dev;
+ struct rte_pci_device *pci_dev = eth_dev->pci_dev;
+ struct rte_pci_id id;
RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));
@@ -1285,13 +1306,20 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
return -ENOMEM;
}
- pci_dev = eth_dev->pci_dev;
+ if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
+ if (virtio_resource_init(pci_dev) < 0)
+ return -1;
- if (virtio_resource_init(pci_dev) < 0)
- return -1;
-
- hw->use_msix = virtio_has_msix(&pci_dev->addr);
- hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
+ hw->use_msix = virtio_has_msix(&pci_dev->addr);
+ hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
+ }
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (eth_dev->dev_type == RTE_ETH_DEV_VIRTUAL) {
+ hw->use_msix = 0;
+ hw->io_base = 0;
+ eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
+ }
+#endif
/* Reset the device although not necessary at startup */
vtpci_reset(hw);
@@ -1304,8 +1332,14 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
virtio_negotiate_features(hw);
/* If host does not support status then disable LSC */
- if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
- pci_dev->driver->drv_flags &= ~RTE_PCI_DRV_INTR_LSC;
+ if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
+ if (eth_dev->dev_type == RTE_ETH_DEV_PCI)
+ pci_dev->driver->drv_flags &= ~RTE_PCI_DRV_INTR_LSC;
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (eth_dev->dev_type == RTE_ETH_DEV_VIRTUAL)
+ eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
+#endif
+ }
rte_eth_copy_pci_info(eth_dev, pci_dev);
@@ -1383,14 +1417,30 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d hw->max_tx_queues=%d",
hw->max_rx_queues, hw->max_tx_queues);
+
+ memset(&id, 0, sizeof(id));
+ if (eth_dev->dev_type == RTE_ETH_DEV_PCI)
+ id = pci_dev->id;
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (eth_dev->dev_type == RTE_ETH_DEV_VIRTUAL)
+ id = qtest_get_pci_id_of_virtio_net();
+#endif
+
PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
- eth_dev->data->port_id, pci_dev->id.vendor_id,
- pci_dev->id.device_id);
+ eth_dev->data->port_id,
+ id.vendor_id, id.device_id);
/* Setup interrupt callback */
- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ if ((eth_dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC))
rte_intr_callback_register(&pci_dev->intr_handle,
- virtio_interrupt_handler, eth_dev);
+ virtio_interrupt_handler, eth_dev);
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if ((eth_dev->dev_type == RTE_ETH_DEV_VIRTUAL) &&
+ (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))
+ qtest_intr_callback_register(eth_dev->data,
+ virtio_interrupt_handler, eth_dev);
+#endif
virtio_dev_cq_start(eth_dev);
@@ -1424,10 +1474,17 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
eth_dev->data->mac_addrs = NULL;
/* reset interrupt callback */
- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ if ((eth_dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC))
rte_intr_callback_unregister(&pci_dev->intr_handle,
virtio_interrupt_handler,
eth_dev);
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if ((eth_dev->dev_type == RTE_ETH_DEV_VIRTUAL) &&
+ (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))
+ qtest_intr_callback_unregister(eth_dev->data,
+ virtio_interrupt_handler, eth_dev);
+#endif
PMD_INIT_LOG(DEBUG, "dev_uninit completed");
@@ -1491,11 +1548,15 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return -ENOTSUP;
}
- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ if (((dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) ||
+ ((dev->dev_type == RTE_ETH_DEV_VIRTUAL) &&
+ (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))) {
if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
PMD_DRV_LOG(ERR, "failed to set config vector");
return -EBUSY;
}
+ }
return 0;
}
@@ -1510,15 +1571,31 @@ virtio_dev_start(struct rte_eth_dev *dev)
/* check if lsc interrupt feature is enabled */
if (dev->data->dev_conf.intr_conf.lsc) {
- if (!(pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
- PMD_DRV_LOG(ERR, "link status not supported by host");
- return -ENOTSUP;
- }
+ if (dev->dev_type == RTE_ETH_DEV_PCI) {
+ if (!(pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
+ PMD_DRV_LOG(ERR,
+ "link status not supported by host");
+ return -ENOTSUP;
+ }
- if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) {
- PMD_DRV_LOG(ERR, "interrupt enable failed");
- return -EIO;
+ if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) {
+ PMD_DRV_LOG(ERR, "interrupt enable failed");
+ return -EIO;
+ }
}
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (dev->dev_type == RTE_ETH_DEV_VIRTUAL) {
+ if (!(dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)) {
+ PMD_DRV_LOG(ERR,
+ "link status not supported by host");
+ return -ENOTSUP;
+ }
+ if (qtest_intr_enable(dev->data) < 0) {
+ PMD_DRV_LOG(ERR, "interrupt enable failed");
+ return -EIO;
+ }
+ }
+#endif
}
/* Initialize Link state */
@@ -1615,8 +1692,15 @@ virtio_dev_stop(struct rte_eth_dev *dev)
PMD_INIT_LOG(DEBUG, "stop");
- if (dev->data->dev_conf.intr_conf.lsc)
- rte_intr_disable(&dev->pci_dev->intr_handle);
+ if (dev->data->dev_conf.intr_conf.lsc) {
+ if (dev->dev_type == RTE_ETH_DEV_PCI)
+ rte_intr_disable(&dev->pci_dev->intr_handle);
+
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ if (dev->dev_type == RTE_ETH_DEV_VIRTUAL)
+ qtest_intr_disable(dev->data);
+#endif
+ }
memset(&link, 0, sizeof(link));
virtio_dev_atomic_write_link_status(dev, &link);
@@ -1661,7 +1745,13 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
{
struct virtio_hw *hw = dev->data->dev_private;
- dev_info->driver_name = dev->driver->pci_drv.name;
+ if (dev->dev_type == RTE_ETH_DEV_PCI)
+ dev_info->driver_name = dev->driver->pci_drv.name;
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ else if (dev->dev_type == RTE_ETH_DEV_VIRTUAL)
+ dev_info->driver_name = dev->data->drv_name;
+#endif
+
dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
@@ -1689,3 +1779,196 @@ static struct rte_driver rte_virtio_driver = {
};
PMD_REGISTER_DRIVER(rte_virtio_driver);
+
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+
+#define ETH_VIRTIO_NET_ARG_QTEST_PATH "qtest"
+#define ETH_VIRTIO_NET_ARG_IVSHMEM_PATH "ivshmem"
+
+static const char *valid_args[] = {
+ ETH_VIRTIO_NET_ARG_QTEST_PATH,
+ ETH_VIRTIO_NET_ARG_IVSHMEM_PATH,
+ NULL
+};
+
+static int
+get_string_arg(const char *key __rte_unused,
+ const char *value, void *extra_args)
+{
+ int ret, fd, loop = 3;
+ int *pfd = extra_args;
+ struct sockaddr_un sa = {0};
+
+ if ((value == NULL) || (extra_args == NULL))
+ return -EINVAL;
+
+ fd = socket(AF_UNIX, SOCK_STREAM, 0);
+ if (fd < 0)
+ return -1;
+
+ sa.sun_family = AF_UNIX;
+ strncpy(sa.sun_path, value, sizeof(sa.sun_path));
+
+ while (loop--) {
+ /*
+ * may need to wait until the qtest and ivshmem
+ * sockets have been prepared by QEMU.
+ */
+ ret = connect(fd, (struct sockaddr *)&sa,
+ sizeof(struct sockaddr_un));
+ if (ret != 0)
+ sleep(1);
+ else
+ break;
+ }
+
+ if (ret != 0) {
+ close(fd);
+ return -1;
+ }
+
+ *pfd = fd;
+
+ return 0;
+}
+
+static struct rte_eth_dev *
+virtio_net_eth_dev_alloc(const char *name)
+{
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct virtio_hw *hw;
+
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ rte_panic("cannot alloc rte_eth_dev\n");
+
+ data = eth_dev->data;
+
+ hw = rte_zmalloc(NULL, sizeof(*hw), 0);
+ if (!hw)
+ rte_panic("malloc virtio_hw failed\n");
+
+ data->dev_private = hw;
+ eth_dev->driver = &rte_virtio_pmd;
+ return eth_dev;
+}
+
+/*
+ * Initialization when "CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE" is enabled.
+ */
+static int
+rte_virtio_net_pmd_init(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ int ret, qtest_sock, ivshmem_sock;
+ struct rte_mem_config *mcfg;
+
+ if (params == NULL || params[0] == '\0')
+ goto error;
+
+ /* get pointer to global configuration */
+ mcfg = rte_eal_get_configuration()->mem_config;
+
+ /* Check if EAL memory consists of one memory segment */
+ if ((RTE_MAX_MEMSEG > 1) && (mcfg->memseg[1].addr != NULL)) {
+ PMD_INIT_LOG(ERR, "Non contigious memory");
+ goto error;
+ }
+
+ kvlist = rte_kvargs_parse(params, valid_args);
+ if (!kvlist) {
+ PMD_INIT_LOG(ERR, "error when parsing param");
+ goto error;
+ }
+
+ if (rte_kvargs_count(kvlist, ETH_VIRTIO_NET_ARG_IVSHMEM_PATH) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VIRTIO_NET_ARG_IVSHMEM_PATH,
+ &get_string_arg, &ivshmem_sock);
+ if (ret != 0) {
+ PMD_INIT_LOG(ERR,
+ "Failed to connect to ivshmem socket");
+ goto error;
+ }
+ } else {
+ PMD_INIT_LOG(ERR, "No argument specified for %s",
+ ETH_VIRTIO_NET_ARG_IVSHMEM_PATH);
+ goto error;
+ }
+
+ if (rte_kvargs_count(kvlist, ETH_VIRTIO_NET_ARG_QTEST_PATH) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_VIRTIO_NET_ARG_QTEST_PATH,
+ &get_string_arg, &qtest_sock);
+ if (ret != 0) {
+ PMD_INIT_LOG(ERR,
+ "Failed to connect to qtest socket");
+ goto error;
+ }
+ } else {
+ PMD_INIT_LOG(ERR, "No argument specified for %s",
+ ETH_VIRTIO_NET_ARG_QTEST_PATH);
+ goto error;
+ }
+
+ eth_dev = virtio_net_eth_dev_alloc(name);
+
+ qtest_vdev_init(eth_dev->data, qtest_sock, ivshmem_sock);
+
+ /* originally, this will be called in rte_eal_pci_probe() */
+ eth_virtio_dev_init(eth_dev);
+
+ eth_dev->driver = NULL;
+ eth_dev->data->dev_flags |= RTE_ETH_DEV_DETACHABLE;
+ eth_dev->data->kdrv = RTE_KDRV_NONE;
+ eth_dev->data->drv_name = "rte_virtio_pmd";
+
+ rte_kvargs_free(kvlist);
+ return 0;
+
+error:
+ rte_kvargs_free(kvlist);
+ return -EFAULT;
+}
+
+/*
+ * Finalization when "CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE" is enabled.
+ */
+static int
+rte_virtio_net_pmd_uninit(const char *name)
+{
+ struct rte_eth_dev *eth_dev = NULL;
+ int ret;
+
+ if (name == NULL)
+ return -EINVAL;
+
+ /* find the ethdev entry */
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev == NULL)
+ return -ENODEV;
+
+ ret = eth_virtio_dev_uninit(eth_dev);
+ if (ret != 0)
+ return -EFAULT;
+
+ qtest_vdev_uninit(eth_dev->data);
+ rte_free(eth_dev->data->dev_private);
+
+ ret = rte_eth_dev_release_port(eth_dev);
+ if (ret != 0)
+ return -EFAULT;
+
+ return 0;
+}
+
+static struct rte_driver rte_virtio_net_driver = {
+ .name = "eth_virtio_net",
+ .type = PMD_VDEV,
+ .init = rte_virtio_net_pmd_init,
+ .uninit = rte_virtio_net_pmd_uninit,
+};
+
+PMD_REGISTER_DRIVER(rte_virtio_net_driver);
+
+#endif /* RTE_LIBRTE_VIRTIO_HOST_MODE */
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index ae2d47d..eefc7be 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -122,5 +122,17 @@ uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf
**tx_pkts,
#define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+int qtest_vdev_init(struct rte_eth_dev_data *data,
+ int qtest_socket, int ivshmem_socket);
+void qtest_vdev_uninit(struct rte_eth_dev_data *data);
+void qtest_intr_callback_register(void *data,
+ rte_intr_callback_fn cb, void *cb_arg);
+void qtest_intr_callback_unregister(void *data,
+ rte_intr_callback_fn cb, void *cb_arg);
+int qtest_intr_enable(void *data);
+int qtest_intr_disable(void *data);
+struct rte_pci_id qtest_get_pci_id_of_virtio_net(void);
+#endif /* RTE_LIBRTE_VIRTIO_HOST_MODE */
#endif /* _VIRTIO_ETHDEV_H_ */
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 47f722a..d4ede73 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -165,6 +165,9 @@ struct virtqueue;
struct virtio_hw {
struct virtqueue *cvq;
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ void *qsession;
+#endif
uint32_t io_base;
uint32_t guest_features;
uint32_t max_tx_queues;
@@ -226,6 +229,26 @@ outl_p(unsigned int data, unsigned int port)
}
#endif
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+
+uint32_t virtio_ioport_read(struct virtio_hw *, uint64_t, char type);
+void virtio_ioport_write(struct virtio_hw *, uint64_t, uint64_t, char type);
+
+#define VIRTIO_READ_REG_1(hw, reg) \
+ virtio_ioport_read(hw, reg, 'b')
+#define VIRTIO_WRITE_REG_1(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'b')
+#define VIRTIO_READ_REG_2(hw, reg) \
+ virtio_ioport_read(hw, reg, 'w')
+#define VIRTIO_WRITE_REG_2(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'w')
+#define VIRTIO_READ_REG_4(hw, reg) \
+ virtio_ioport_read(hw, reg, 'l')
+#define VIRTIO_WRITE_REG_4(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'l')
+
+#else /* RTE_LIBRTE_VIRTIO_HOST_MODE */
+
I have a concern about such compile-time switches. What if we want the same code to work for both 'real' virtio and socket-based devices?
Shouldn't we introduce some function pointers here to be able to switch them at runtime?
#define VIRTIO_PCI_REG_ADDR(hw, reg) \
(unsigned short)((hw)->io_base + (reg))
@@ -244,6 +267,8 @@ outl_p(unsigned int data, unsigned int port)
#define VIRTIO_WRITE_REG_4(hw, reg, value) \
outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
+#endif /* RTE_LIBRTE_VIRTIO_HOST_MODE */
+
static inline int
vtpci_with_feature(struct virtio_hw *hw, uint32_t bit)
{
--
2.1.4
Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Tetsuya Mukawa
2016-01-06 03:57:07 UTC
Permalink
Post by Pavel Fedin
Hello!
Post by Tetsuya Mukawa
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 47f722a..d4ede73 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -165,6 +165,9 @@ struct virtqueue;
struct virtio_hw {
struct virtqueue *cvq;
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ void *qsession;
+#endif
uint32_t io_base;
uint32_t guest_features;
uint32_t max_tx_queues;
@@ -226,6 +229,26 @@ outl_p(unsigned int data, unsigned int port)
}
#endif
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+
+uint32_t virtio_ioport_read(struct virtio_hw *, uint64_t, char type);
+void virtio_ioport_write(struct virtio_hw *, uint64_t, uint64_t, char type);
+
+#define VIRTIO_READ_REG_1(hw, reg) \
+ virtio_ioport_read(hw, reg, 'b')
+#define VIRTIO_WRITE_REG_1(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'b')
+#define VIRTIO_READ_REG_2(hw, reg) \
+ virtio_ioport_read(hw, reg, 'w')
+#define VIRTIO_WRITE_REG_2(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'w')
+#define VIRTIO_READ_REG_4(hw, reg) \
+ virtio_ioport_read(hw, reg, 'l')
+#define VIRTIO_WRITE_REG_4(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'l')
+
+#else /* RTE_LIBRTE_VIRTIO_HOST_MODE */
+
I have a concern against such compile-time switches. What if we want the same code to work for both 'real' virtio and socket-based?
Shouldn't we introduce some function pointers here to be able to switch them at runtime?
Hi Pavel,

Thanks for commenting.
In that case, you would run QEMU and then create containers in the guest.
Do you have a use case for this?

Anyway, such a feature depends on how shared memory is allocated.
So far, this patch allows you to run both 'real' and 'virtual' virtio-net
PMDs on a guest, but it will be changed to remove the contiguous-memory
restriction.
Could you please see the other thread where we discuss the restriction?
(I will add you to CC.)

Thanks,
Tetsuya
Tan, Jianfeng
2016-01-06 05:56:41 UTC
Permalink
Post by Tetsuya Mukawa
Post by Pavel Fedin
Hello!
Post by Tetsuya Mukawa
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 47f722a..d4ede73 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -165,6 +165,9 @@ struct virtqueue;
struct virtio_hw {
struct virtqueue *cvq;
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ void *qsession;
+#endif
uint32_t io_base;
uint32_t guest_features;
uint32_t max_tx_queues;
@@ -226,6 +229,26 @@ outl_p(unsigned int data, unsigned int port)
}
#endif
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+
+uint32_t virtio_ioport_read(struct virtio_hw *, uint64_t, char type);
+void virtio_ioport_write(struct virtio_hw *, uint64_t, uint64_t, char type);
+
+#define VIRTIO_READ_REG_1(hw, reg) \
+ virtio_ioport_read(hw, reg, 'b')
+#define VIRTIO_WRITE_REG_1(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'b')
+#define VIRTIO_READ_REG_2(hw, reg) \
+ virtio_ioport_read(hw, reg, 'w')
+#define VIRTIO_WRITE_REG_2(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'w')
+#define VIRTIO_READ_REG_4(hw, reg) \
+ virtio_ioport_read(hw, reg, 'l')
+#define VIRTIO_WRITE_REG_4(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'l')
+
+#else /* RTE_LIBRTE_VIRTIO_HOST_MODE */
+
I have a concern against such compile-time switches. What if we want the same code to work for both 'real' virtio and socket-based?
Shouldn't we introduce some function pointers here to be able to switch them at runtime?
Hi Pavel,
Thanks for commenting.
In that case, you will run QEMU, then create containers in the guest.
Do you have an use case for this usage?
Anyway, such a feature depends on how to allocate share memory.
So far, this patch allow you to run both virtio-net 'real' and 'virtual'
PMDs on guest, but it will be changed to remove contiguous memory
restriction.
Could you please see an other thread that we talk about the restriction
in? (I will add you to CC.)
Thanks,
Tetsuya
Hi Tetsuya,

I would prefer a single compiled library that works well in both VM and container.

We can address this issue using Yuanhan's approach for virtio 1.0 support
(he introduces struct virtio_pci_ops).

Thanks,
Jianfeng
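To illustrate the runtime-switch idea being discussed, here is a rough sketch of what a virtio_pci_ops-style table could look like. The names and the io_ops field below are illustrative only, not the actual DPDK API.

```c
#include <stdint.h>

struct virtio_hw;	/* defined in virtio_pci.h */

/* Illustrative only: a per-device ops table chosen at probe time
 * instead of a compile-time #ifdef. */
struct virtio_io_ops {
	uint32_t (*reg_read)(struct virtio_hw *hw, uint64_t reg, char type);
	void (*reg_write)(struct virtio_hw *hw, uint64_t reg,
			  uint64_t val, char type);
};

/* One instance would wrap inb/inw/inl on hw->io_base for a real PCI
 * device, the other the qtest socket accessors; both names here are
 * hypothetical. The register macros then go through hw->io_ops (a new,
 * hypothetical field) instead of being selected by #ifdef. */
extern const struct virtio_io_ops virtio_pio_ops;
extern const struct virtio_io_ops virtio_qtest_ops;

#define VIRTIO_READ_REG_4(hw, reg) \
	((hw)->io_ops->reg_read((hw), (reg), 'l'))
```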
Tetsuya Mukawa
2016-01-06 07:27:53 UTC
Permalink
Post by Tan, Jianfeng
Post by Tetsuya Mukawa
Post by Pavel Fedin
Hello!
Post by Tetsuya Mukawa
diff --git a/drivers/net/virtio/virtio_pci.h
b/drivers/net/virtio/virtio_pci.h
index 47f722a..d4ede73 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -165,6 +165,9 @@ struct virtqueue;
struct virtio_hw {
struct virtqueue *cvq;
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+ void *qsession;
+#endif
uint32_t io_base;
uint32_t guest_features;
uint32_t max_tx_queues;
@@ -226,6 +229,26 @@ outl_p(unsigned int data, unsigned int port)
}
#endif
+#ifdef RTE_LIBRTE_VIRTIO_HOST_MODE
+
+uint32_t virtio_ioport_read(struct virtio_hw *, uint64_t, char type);
+void virtio_ioport_write(struct virtio_hw *, uint64_t, uint64_t, char type);
+
+#define VIRTIO_READ_REG_1(hw, reg) \
+ virtio_ioport_read(hw, reg, 'b')
+#define VIRTIO_WRITE_REG_1(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'b')
+#define VIRTIO_READ_REG_2(hw, reg) \
+ virtio_ioport_read(hw, reg, 'w')
+#define VIRTIO_WRITE_REG_2(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'w')
+#define VIRTIO_READ_REG_4(hw, reg) \
+ virtio_ioport_read(hw, reg, 'l')
+#define VIRTIO_WRITE_REG_4(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value, 'l')
+
+#else /* RTE_LIBRTE_VIRTIO_HOST_MODE */
+
I have a concern against such compile-time switches. What if we
want the same code to work for both 'real' virtio and socket-based?
Shouldn't we introduce some function pointers here to be able to
switch them at runtime?
Hi Pavel,
Thanks for commenting.
In that case, you will run QEMU, then create containers in the guest.
Do you have an use case for this usage?
Anyway, such a feature depends on how to allocate share memory.
So far, this patch allow you to run both virtio-net 'real' and 'virtual'
PMDs on guest, but it will be changed to remove contiguous memory
restriction.
Could you please see an other thread that we talk about the restriction
in? (I will add you to CC.)
Thanks,
Tetsuya
Hi Tetsuya,
I prefer to a compiled library to work well in both VM and container.
For this issue, we can address this issue using Yuanhan's way to
address virtio 1.0 support.
(He introduces struct virtio_pci_ops)
Thanks,
Jianfeng
Sounds great!
I will check it.

Tetsuya
Tan, Jianfeng
2015-12-24 14:05:13 UTC
Permalink
Hi Tetsuya,

After several days of studying your patch, I have some questions as follows:

1. Is physically-contiguous memory really necessary?
This is too strong a requirement IMHO. IVSHMEM doesn't require this in its original meaning. So what do you think of
Huawei Xie's idea of using virtual addresses for address translation? (In addition, the virtual address of mem_table could be
different in the application and QTest, but this can be addressed because the SET_MEM_TABLE msg will be intercepted by
QTest.)

2. Is root privilege OK in the container case?
Another reason we'd like to give up the physically-contiguous feature is that it needs root privilege to read the /proc/self/pagemap
file (see the sketch below). Containers have already been widely criticized for poor security isolation, and requiring root privilege will make it worse.
On the other hand, it's not easy to remove root privilege either. If we use vhost-net as the backend, the kernel will definitely
require root privilege to create a tap device/raw socket. We tend to move such work, which requires root, into the runtime
preparation of a container. Do you agree?

3. Is one QTest process per virtio device too heavy?
Although we can foresee that each container usually owns only one virtio device, taking possible high density
into consideration, hundreds or even thousands of containers would require the same number of QTest processes. Since
you mentioned that port hotplug is supported, is it possible to use just one QTest process to emulate all virtio
devices?

As you know, we have another solution along these lines (which is under heavy internal review). But I think we have lots
of common problems to be solved, right?

Thanks for your great work!

Thanks,
Jianfeng
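For reference, a minimal sketch of the /proc/self/pagemap translation mentioned in question 2 above; this is the mechanism that needs root (or CAP_SYS_ADMIN) to expose page frame numbers. The helper name is illustrative and error handling is trimmed.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: translate a virtual address to a physical one
 * by reading the pagemap entry for the page containing 'va'. */
static uint64_t
virt2phys(const void *va)
{
	uint64_t entry, pagesz = (uint64_t)sysconf(_SC_PAGESIZE);
	off_t off = (off_t)((uintptr_t)va / pagesz * sizeof(entry));
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return UINT64_MAX;
	if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry)) {
		close(fd);
		return UINT64_MAX;
	}
	close(fd);
	if (!(entry & (1ULL << 63)))	/* page not present */
		return UINT64_MAX;
	/* bits 0-54 hold the page frame number */
	return (entry & ((1ULL << 55) - 1)) * pagesz + (uintptr_t)va % pagesz;
}
```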
-----Original Message-----
Sent: Wednesday, December 16, 2015 4:37 PM
Subject: [PATCH v1 0/2] Virtio-net PMD Extension to work on host
[Change log]
(Just listing functionality changes and important bug fix)
* Support virtio-net interrupt handling.
(It means virtio-net PMD on host and guest have same virtio-net features)
* Fix memory allocation method to allocate contiguous memory correctly.
* Port Hotplug is supported.
* Rebase on DPDK-2.2.
[Abstraction]
Normally, virtio-net PMD only works on VM, because there is no virtio-net device on host.
This RFC patch extends virtio-net PMD to be able to work on host as virtual PMD.
But we didn't implement virtio-net device as a part of virtio-net PMD.
To prepare virtio-net device for the PMD, start QEMU process with special
QTest mode, then connect it from virtio-net PMD through unix domain
socket.
The virtio-net PMD on host is fully compatible with the PMD on guest.
We can use same functionalities, and connect to anywhere QEMU virtio-net device can.
For example, the PMD can use virtio-net multi queues function. Also it can
connects to vhost-net kernel module and vhost-user backend application.
Similar to virtio-net PMD on QEMU, application memory that uses virtio-net
PMD will be shared between vhost backend application. But vhost backend
application memory will not be shared.
Main target of this PMD is container like docker, rkt, lxc and etc.
We can isolate related processes(virtio-net PMD process, QEMU and vhost-
user backend process) by container.
But, to communicate through unix domain socket, shared directory will be needed.
[How to use]
So far, we need QEMU patch to connect to vhost-user backend.
See below patch.
- http://patchwork.ozlabs.org/patch/552549/
To know how to use, check commit log.
[Detailed Description]
- virtio-net device implementation
This host mode PMD uses QEMU virtio-net device. To do that, QEMU QTest
functionality is used.
QTest is a test framework of QEMU devices. It allows us to implement a
device driver outside of QEMU.
With QTest, we can implement DPDK application and virtio-net PMD as
standalone process on host.
When QEMU is invoked as QTest mode, any guest code will not run.
To know more about QTest, see below.
- http://wiki.qemu.org/Features/QTest
- probing devices
QTest provides a unix domain socket. Through this socket, driver process can
access to I/O port and memory of QEMU virtual machine.
The PMD will send I/O port accesses to probe pci devices.
If we can find virtio-net and ivshmem device, initialize the devices.
Also, I/O port accesses of virtio-net PMD will be sent through socket, and
virtio-net PMD can initialize vitio-net device on QEMU correctly.
- ivshmem device to share memory
To share memory that virtio-net PMD process uses, ivshmem device will be used.
Because ivshmem device can only handle one file descriptor, shared memory
should be consist of one file.
To allocate such a memory, EAL has new option called "--contig-mem".
If the option is specified, EAL will open a file and allocate memory from hugepages.
While initializing ivshmem device, we can set BAR(Base Address Register).
It represents which memory QEMU vcpu can access to this shared memory.
We will specify host physical address of shared memory as this address.
It is very useful because we don't need to apply patch to QEMU to calculate address offset.
(For example, if virtio-net PMD process will allocate memory from shared
memory, then specify the physical address of it to virtio-net register, QEMU
virtio-net device can understand it without calculating address offset.)
[Known issues]
- vhost-user
So far, to use vhost-user, we need to apply a patch to QEMU.
This is because, QEMU will not send memory information and file descriptor
of ivshmem device to vhost-user backend.
I have submitted the patch to QEMU.
See "http://patchwork.ozlabs.org/patch/552549/".
Also, we may have an issue in DPDK vhost library to handle kickfd and callfd.
The patch for this issue is needed. I have a workaround patch, but let me check it more.
If someone wants to check vhost-user behavior, I will describe it more in later email.
EAL: Add new EAL "--contig-mem" option
virtio: Extend virtio-net PMD to support container environment
config/common_linuxapp | 1 +
drivers/net/virtio/Makefile | 4 +
drivers/net/virtio/qtest.c | 1107
++++++++++++++++++++++++++++
drivers/net/virtio/virtio_ethdev.c | 341 ++++++++-
drivers/net/virtio/virtio_ethdev.h | 12 +
drivers/net/virtio/virtio_pci.h | 25 +
lib/librte_eal/common/eal_common_options.c | 7 +
lib/librte_eal/common/eal_internal_cfg.h | 1 +
lib/librte_eal/common/eal_options.h | 2 +
lib/librte_eal/linuxapp/eal/eal_memory.c | 77 +-
10 files changed, 1543 insertions(+), 34 deletions(-)
create mode 100644 drivers/net/virtio/qtest.c
--
2.1.4
Tetsuya Mukawa
2015-12-28 11:06:19 UTC
Permalink
Post by Tan, Jianfeng
Hi Tetsuya,
1. Is physically-contig memory really necessary?
This is a too strong requirement IMHO. IVSHMEM doesn't require this in its original meaning. So how do you think of
Huawei Xie's idea of using virtual address for address translation? (In addition, virtual address of mem_table could be
different in application and QTest, but this can be addressed because SET_MEM_TABLE msg will be intercepted by
QTest)
Hi Jianfeng,

Thanks for your suggestion.
Huawei's idea may solve the contig-mem restriction.
Let me take some time to check it further.
Post by Tan, Jianfeng
2. Is root privilege OK in container's case?
Another reason we'd like to give up physically-contig feature is that it needs root privilege to read /proc/self/pagemap
file. Container has already been widely criticized for bad security isolation. Enabling root privilege will make it worse.
I haven't checked how to invoke a DPDK application in a non-privileged
container.
But if we can invoke it, that's great.

I guess if we allocate memory the way you did, we will probably not need to read
"/proc/self/pagemap".
Then we will be able to invoke a DPDK application in a non-privileged container.
Is this correct?
Post by Tan, Jianfeng
On the other hand, it's not easy to remove root privilege too. If we use vhost-net as the backend, kernel will definitely
require root privilege to create a tap device/raw socket. We tend to pick such work, which requires root, into runtime
preparation of a container. Do you agree?
Yes, I agree. It's not easy to remove root privilege in some cases.
I guess that if we can remove it in the vhost-user case, it will be enough for
DPDK users.
What do you think?
Post by Tan, Jianfeng
3.Is one Qtest process per virtio device too heavy?
Although we can foresee that each container always owns only one virtio device, but take its possible high density
into consideration, hundreds or even thousands of container requires the same number of QTest processes. As
you mentioned that port hotplug is supported, is it possible to use just one QTest process for all virtio devices
emulation?
Yes, we can use PCI hotplug for that purpose.
But it may depend on security policy.
The shared QEMU process knows the file descriptors of all DPDK application
memories.
Because of this, I guess some users won't want to share a QEMU process.

If vhost-user is used, the QEMU process doesn't consume CPU resources.
So I am not sure a sleeping QEMU process amounts to much overhead.

BTW, if we use PCI hotplug, we need a (virtual) PCI bridge to
cascade PCI devices.
So the implementation will be more complex.
Honestly, I am not sure I will be able to finish it by the next DPDK release.
How about starting from this implementation?
If we really need this feature, we can add it later.
Post by Tan, Jianfeng
As you know, we have another solution according to this (which under heavy internal review). But I think we have lots
of common problems to be solved, right?
Yes, I think so. And thanks for the good suggestion.

Tetsuya,
Post by Tan, Jianfeng
Thanks for your great work!
Thanks,
Jianfeng
-----Original Message-----
Sent: Wednesday, December 16, 2015 4:37 PM
Subject: [PATCH v1 0/2] Virtio-net PMD Extension to work on host
[Change log]
(Just listing functionality changes and important bug fix)
* Support virtio-net interrupt handling.
(It means virtio-net PMD on host and guest have same virtio-net features)
* Fix memory allocation method to allocate contiguous memory correctly.
* Port Hotplug is supported.
* Rebase on DPDK-2.2.
[Abstraction]
Normally, virtio-net PMD only works on VM, because there is no virtio-net device on host.
This RFC patch extends virtio-net PMD to be able to work on host as virtual PMD.
But we didn't implement virtio-net device as a part of virtio-net PMD.
To prepare virtio-net device for the PMD, start QEMU process with special
QTest mode, then connect it from virtio-net PMD through unix domain
socket.
The virtio-net PMD on host is fully compatible with the PMD on guest.
We can use same functionalities, and connect to anywhere QEMU virtio-net device can.
For example, the PMD can use virtio-net multi queues function. Also it can
connects to vhost-net kernel module and vhost-user backend application.
Similar to virtio-net PMD on QEMU, application memory that uses virtio-net
PMD will be shared between vhost backend application. But vhost backend
application memory will not be shared.
Main target of this PMD is container like docker, rkt, lxc and etc.
We can isolate related processes(virtio-net PMD process, QEMU and vhost-
user backend process) by container.
But, to communicate through unix domain socket, shared directory will be needed.
[How to use]
So far, we need QEMU patch to connect to vhost-user backend.
See below patch.
- http://patchwork.ozlabs.org/patch/552549/
To know how to use, check commit log.
[Detailed Description]
- virtio-net device implementation
This host mode PMD uses QEMU virtio-net device. To do that, QEMU QTest
functionality is used.
QTest is a test framework of QEMU devices. It allows us to implement a
device driver outside of QEMU.
With QTest, we can implement DPDK application and virtio-net PMD as
standalone process on host.
When QEMU is invoked as QTest mode, any guest code will not run.
To know more about QTest, see below.
- http://wiki.qemu.org/Features/QTest
- probing devices
QTest provides a unix domain socket. Through this socket, driver process can
access to I/O port and memory of QEMU virtual machine.
The PMD will send I/O port accesses to probe pci devices.
If we can find virtio-net and ivshmem device, initialize the devices.
Also, I/O port accesses of virtio-net PMD will be sent through socket, and
virtio-net PMD can initialize vitio-net device on QEMU correctly.
- ivshmem device to share memory
To share memory that virtio-net PMD process uses, ivshmem device will be used.
Because ivshmem device can only handle one file descriptor, shared memory
should be consist of one file.
To allocate such a memory, EAL has new option called "--contig-mem".
If the option is specified, EAL will open a file and allocate memory from hugepages.
While initializing ivshmem device, we can set BAR(Base Address Register).
It represents which memory QEMU vcpu can access to this shared memory.
We will specify host physical address of shared memory as this address.
It is very useful because we don't need to apply patch to QEMU to calculate
address offset.
(For example, if virtio-net PMD process will allocate memory from shared
memory, then specify the physical address of it to virtio-net register, QEMU
virtio-net device can understand it without calculating address offset.)
[Known issues]
- vhost-user
So far, to use vhost-user, we need to apply a patch to QEMU.
This is because, QEMU will not send memory information and file descriptor
of ivshmem device to vhost-user backend.
I have submitted the patch to QEMU.
See "http://patchwork.ozlabs.org/patch/552549/".
Also, we may have an issue in DPDK vhost library to handle kickfd and callfd.
The patch for this issue is needed. I have a workaround patch, but let me check it more.
If someone wants to check vhost-user behavior, I will describe it more in later email.
EAL: Add new EAL "--contig-mem" option
virtio: Extend virtio-net PMD to support container environment
config/common_linuxapp | 1 +
drivers/net/virtio/Makefile | 4 +
drivers/net/virtio/qtest.c | 1107
++++++++++++++++++++++++++++
drivers/net/virtio/virtio_ethdev.c | 341 ++++++++-
drivers/net/virtio/virtio_ethdev.h | 12 +
drivers/net/virtio/virtio_pci.h | 25 +
lib/librte_eal/common/eal_common_options.c | 7 +
lib/librte_eal/common/eal_internal_cfg.h | 1 +
lib/librte_eal/common/eal_options.h | 2 +
lib/librte_eal/linuxapp/eal/eal_memory.c | 77 +-
10 files changed, 1543 insertions(+), 34 deletions(-)
create mode 100644 drivers/net/virtio/qtest.c
--
2.1.4
Tetsuya Mukawa
2016-01-06 03:57:27 UTC
Permalink
Post by Tetsuya Mukawa
Post by Tan, Jianfeng
Hi Tetsuya,
1. Is physically-contig memory really necessary?
This is a too strong requirement IMHO. IVSHMEM doesn't require this in its original meaning. So how do you think of
Huawei Xie's idea of using virtual address for address translation? (In addition, virtual address of mem_table could be
different in application and QTest, but this can be addressed because SET_MEM_TABLE msg will be intercepted by
QTest)
Hi Jianfeng,
Thanks for your suggestion.
Huawei's idea may solve contig-mem restriction.
Let me have time to check it more.
Hi Jianfeng,

I confirmed that we can remove the restriction with Huawei's idea.
One thing I am concerned about is below.
If we don't use contiguous memory, this PMD will not work alongside other
'physical' PMDs like the e1000 PMD, virtio-net PMD, etc.
(This is because the allocated memory may not be physically contiguous.)

One example: if we implement it like the above, in a QEMU guest we
can handle a host NIC directly, but in a container we will not be able to
handle the device.
This will be a restriction introduced by the change to virtual addressing.

Do you know of a use case where the user wants to handle a 'physical' PMD and
the 'virtual' virtio-net PMD together?

Tetsuya,
Tan, Jianfeng
2016-01-06 05:42:40 UTC
Permalink
Post by Tetsuya Mukawa
Post by Tetsuya Mukawa
Post by Tan, Jianfeng
Hi Tetsuya,
1. Is physically-contig memory really necessary?
This is a too strong requirement IMHO. IVSHMEM doesn't require this in its original meaning. So how do you think of
Huawei Xie's idea of using virtual address for address translation? (In addition, virtual address of mem_table could be
different in application and QTest, but this can be addressed because SET_MEM_TABLE msg will be intercepted by
QTest)
Hi Jianfeng,
Thanks for your suggestion.
Huawei's idea may solve contig-mem restriction.
Let me have time to check it more.
Hi Jianfeng,
I made sure we can remove the restriction with Huawei's idea.
One thing I concern is below.
If we don't use contiguous memory, this PMD will not work with other
'physical' PMDs like e1000 PMD, virtio-net PMD, and etc.
(This is because allocated memory may not be physically contiguous.)
One of examples is that if we implement like above, in QEMU guest, we
can handle a host NIC directly, but in container, we will not be able to
handle the device.
This will be a restriction for this virtual addressing changing.
Do you know an use case that the user wants to handle 'physical' PMD and
'virtual' virtio-net PMD together?
Tetsuya,
Hi Tetsuya,

I have no use case in hand that handles 'physical' PMDs and the 'virtual'
virtio-net PMD together.
(Pavel Fedin once tried to run OVS in a container, but that case just uses
virtual virtio devices; I
don't know if he plans to add 'physical' PMDs as well.)

Actually, it's not completely contradictory to make them work together.
Like this:
a. containers with root privilege
We can initialize memory as legacy way. (TODO: besides
physical-contiguous, we try allocate
virtual-contiguous big area for all memsegs as well.)
a.1 For vhost-net, before sending memory regions into the kernel, we can
merge the virtually-contiguous regions into one region (see the sketch below).
a.2 For vhost-user, we can merge memory regions in vhost. The
blocker is that, for now, the maximum fd number is restricted
by VHOST_MEMORY_MAX_NREGIONS=8 (so in the 2M-hugepage case, 16M of shared
memory is nowhere near enough).

b. containers without root privilege
No need to worry about this problem, because such containers lack the privilege to
construct physically-contiguous memory.

Thanks,
Jianfeng
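A minimal sketch of the region-merging idea from a.1 above. Types and names are illustrative only, not DPDK API; the point is just that virtually-contiguous memsegs collapse into far fewer regions before being handed to the vhost backend.

```c
#include <stdint.h>

struct mem_region {
	uint64_t va;    /* start virtual address */
	uint64_t len;   /* length in bytes */
};

/* 'in' must be sorted by va; writes merged regions to 'out' and
 * returns how many remain. */
static unsigned int
merge_virt_contig(const struct mem_region *in, unsigned int n,
		  struct mem_region *out)
{
	unsigned int i, m = 0;

	for (i = 0; i < n; i++) {
		if (m > 0 && out[m - 1].va + out[m - 1].len == in[i].va)
			out[m - 1].len += in[i].len;  /* extend previous */
		else
			out[m++] = in[i];             /* start a new region */
	}
	return m;
}
```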
Tetsuya Mukawa
2016-01-06 07:35:00 UTC
Permalink
Post by Tan, Jianfeng
Post by Tetsuya Mukawa
Post by Tetsuya Mukawa
Post by Tan, Jianfeng
Hi Tetsuya,
1. Is physically-contig memory really necessary?
This is a too strong requirement IMHO. IVSHMEM doesn't require this
in its original meaning. So how do you think of
Huawei Xie's idea of using virtual address for address translation?
(In addition, virtual address of mem_table could be
different in application and QTest, but this can be addressed
because SET_MEM_TABLE msg will be intercepted by
QTest)
Hi Jianfeng,
Thanks for your suggestion.
Huawei's idea may solve contig-mem restriction.
Let me have time to check it more.
Hi Jianfeng,
I made sure we can remove the restriction with Huawei's idea.
One thing I concern is below.
If we don't use contiguous memory, this PMD will not work with other
'physical' PMDs like e1000 PMD, virtio-net PMD, and etc.
(This is because allocated memory may not be physically contiguous.)
One of examples is that if we implement like above, in QEMU guest, we
can handle a host NIC directly, but in container, we will not be able to
handle the device.
This will be a restriction for this virtual addressing changing.
Do you know an use case that the user wants to handle 'physical' PMD and
'virtual' virtio-net PMD together?
Tetsuya,
Hi Tetsuya,
I have no use case in hand, which handles 'physical' PMDs and
'virtual' virtio-net PMD together.
(Pavel Fedin once tried to run ovs in container, but that case just
uses virtual virtio devices, I
don't know if he has plan to add 'physical' PMDs as well.)
Actually, it's not completely contradictory to make them work
a. containers with root privilege
We can initialize memory as legacy way. (TODO: besides
physical-contiguous, we try allocate
virtual-contiguous big area for all memsegs as well.)
Hi Jianfeng,

Yes, I agree with you.
If the feature is really needed, we will be able to find a workaround.
Post by Tan, Jianfeng
a.1 For vhost-net, before sending memory regions into kernel, we can
merge those virtual-contiguous regions into one region.
a.2 For vhost-user, we can merge memory regions in the vhost. The
blocker is that for now, maximum fd num was restricted
by VHOST_MEMORY_MAX_NREGIONS=8 (so in 2M-hugepage's case, 16M shared
memory is not nearly enough).
With your current implementation, when the 'virtual' virtio-net PMD is used,
'phys_addr' will be a virtual address at the EAL layer.

struct rte_memseg {
phys_addr_t phys_addr; /**< Start physical address. */
union {
void *addr; /**< Start virtual address. */
uint64_t addr_64; /**< Makes sure addr is always 64
bits */
};
.......
};

How about choosing it in the virtio-net PMD?
(In the 'virtual' case, just use 'addr' instead of 'phys_addr'.)
For example, port0 may use physical addresses, while port1 uses virtual
addresses.

With this, of course, we don't have an issue with the 'physical' virtio-net PMD.
Also, with the 'virtual' virtio-net PMD, we can use virtual addresses and an fd
that represents the big virtual address space.
(TODO: need to change rte_memseg and EAL to keep the fd and offset?)
Then you don't need to worry about VHOST_MEMORY_MAX_NREGIONS, because we have
only one fd.
Post by Tan, Jianfeng
b. containers without root privilege
No need to worry about this problem, because it lacks of privilege to
construct physical-contiguous memory.
Yes, we cannot run 'physical' PMDs in this type of container.
Anyway, I will check it more, if we really need it.

Thanks,
Tetsuya
Tan, Jianfeng
2016-01-11 05:31:41 UTC
Permalink
Hi Tetsuya,
Post by Tetsuya Mukawa
With current your implementation, when 'virtual' virtio-net PMD is used,
'phys_addr' will be virtual address in EAL layer.
struct rte_memseg {
phys_addr_t phys_addr; /**< Start physical address. */
union {
void *addr; /**< Start virtual address. */
uint64_t addr_64; /**< Makes sure addr is always 64
bits */
};
.......
};
That's not true. It does not affect the EAL layer at all. We just fill in the virtual
address in the virtio PMD when:
1) setting the base address;
2) preparing RX descriptors;
3) transmitting packets, where the CVA is filled into TX descriptors;
4) in the TX and CQ headers, where the CVA is used.
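As a rough sketch of that per-port choice (the field names come from the rte_memseg definition quoted above; the helper itself is hypothetical):

```c
#include <stdint.h>
#include <rte_memory.h>

/* Hypothetical helper: a 'physical' virtio device needs the physical
 * address, while a qtest/container-backed port can hand over the
 * process virtual address, since the vhost backend maps the same file. */
static inline uint64_t
vq_dma_addr(int is_virtual_dev, const struct rte_memseg *seg)
{
	return is_virtual_dev ? seg->addr_64 : seg->phys_addr;
}
```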
Post by Tetsuya Mukawa
How about choosing it in virtio-net PMD?
My current implementation works as you say.
Post by Tetsuya Mukawa
(In the case of 'virtual', just use 'addr' instead of using 'phys_addr'.)
For example, port0 may use physical address, but port1 may use virtual
address.
With this, of course, we don't have an issue with 'physical' virtio-net PMD.
Also, with 'virtual' virtio-net PMD, we can use virtual address and fd
that represents the big virtual address space.
(TODO: Need to change rte_memseg and EAL to keep fd and offset?)
I suppose you mean that when initializing memory, we maintain just one fd
in the end, and
mmap all memsegs inside it. This sounds like a good idea to solve the
limitation of
VHOST_MEMORY_MAX_NREGIONS.

Besides, Sergio and I are discussing using VA instead of PA in
VFIO to avoid the
physically-contiguous requirement for physical devices.


Thanks,
Jianfeng
Post by Tetsuya Mukawa
Then, you don't worry about VHOST_MEMORY_MAX_NREGIONS, because we have
only one fd.
Post by Tan, Jianfeng
b. containers without root privilege
No need to worry about this problem, because it lacks of privilege to
construct physical-contiguous memory.
Yes, we cannot run 'physical' PMDs in this type of container.
Anyway, I will check it more, if we really need it.
Thanks,
Tetsuya
Tetsuya Mukawa
2015-11-19 10:57:30 UTC
Permalink
This patch extends the virtio-net PMD to work on the host. Since we don't have
virtio-net devices on the host, this PMD, called the "cvio PMD", is a
virtual-device PMD.

To prepare a virtio-net device on the host, the user needs to invoke a QEMU process
in the special qtest mode. In this mode, no guest runs; the mode is mainly
used for testing QEMU devices from an outer process.
Here is an example command line.

$ qemu-system-x86_64 -machine accel=qtest -display none \
-qtest unix:/tmp/qtest.sock,server \
-netdev type=tap,script=/etc/qemu-ifup,id=net0 \
-device virtio-net-pci,netdev=net0 \
-chardev socket,id=chr0,path=/tmp/ivshmem.sock,server \
-device ivshmem,size=1G,chardev=chr0,vectors=1

After invoking QEMU, the cvio PMD can connect to the QEMU process using unix
domain sockets. Over these sockets, the virtio-net device and ivshmem device
in QEMU are probed by the cvio PMD. Here is an example command line.

$ testpmd -c f -n 1 -m 1024 --shm \
--vdev="eth_cvio0,qtest=/tmp/qtest.sock,ivshmem=/tmp/ivshmem.sock" \
-- --disable-hw-vlan --txqflags=0xf00 -i

Please specify the same unix domain sockets and memory size in both command
lines, as shown above.

Also, "--shm" option is needed for cvio PMD like above. This option creates
one hugepage file on hugetlbfs. It means we need enough contiguous memory.
If there is no enough contiguous memory, initialization will be failed.

This contiguous memory is used to share memory between the DPDK application
and the ivshmem device in QEMU.

Signed-off-by: Tetsuya Mukawa <***@igel.co.jp>
---
config/common_linuxapp | 5 +
drivers/net/virtio/Makefile | 4 +
drivers/net/virtio/qtest.c | 590 +++++++++++++++++++++++++++++++++++++
drivers/net/virtio/virtio_ethdev.c | 214 ++++++++++++--
drivers/net/virtio/virtio_ethdev.h | 16 +
drivers/net/virtio/virtio_pci.h | 25 ++
6 files changed, 833 insertions(+), 21 deletions(-)
create mode 100644 drivers/net/virtio/qtest.c

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 7248262..11e8fd1 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -478,3 +478,8 @@ CONFIG_RTE_APP_TEST=y
CONFIG_RTE_TEST_PMD=y
CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
+
+#
+# Enable virtio support for container
+#
+CONFIG_RTE_VIRTIO_VDEV=n
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 43835ba..5d6f69c 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -52,6 +52,10 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c

+ifeq ($(CONFIG_RTE_VIRTIO_VDEV),y)
+ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += qtest.c
+endif
+
# this lib depends upon:
DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
diff --git a/drivers/net/virtio/qtest.c b/drivers/net/virtio/qtest.c
new file mode 100644
index 0000000..005fa24
--- /dev/null
+++ b/drivers/net/virtio/qtest.c
@@ -0,0 +1,590 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 IGEL Co., Ltd. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of IGEL Co., Ltd. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/queue.h>
+#include <signal.h>
+
+#include "virtio_pci.h"
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+
+#define IVSHMEM_PROTOCOL_VERSION 0
+
+#define NB_BUS 256
+#define NB_DEVICE 32
+#define NB_PCI_BAR 6
+
+#define REG_ADDR_VENDOR_ID 0x0
+#define REG_ADDR_DEVICE_ID 0x2
+#define REG_ADDR_COMMAND 0x4
+#define REG_ADDR_STATUS 0x6
+#define REG_ADDR_REVISION_ID 0x8
+#define REG_ADDR_CLASS_CODE 0x9
+#define REG_ADDR_CACHE_LINE_S 0xc
+#define REG_ADDR_LAT_TIMER 0xd
+#define REG_ADDR_HEADER_TYPE 0xe
+#define REG_ADDR_BIST 0xf
+#define REG_ADDR_BAR0 0x10
+#define REG_ADDR_BAR1 0x14
+#define REG_ADDR_BAR2 0x18
+#define REG_ADDR_BAR3 0x1c
+#define REG_ADDR_BAR4 0x20
+#define REG_ADDR_BAR5 0x24
+
+#define REG_VAL_COMMAND_IO 0x1
+#define REG_VAL_COMMAND_MEMORY 0x2
+#define REG_VAL_COMMAND_MASTER 0x4
+
+#define REG_VAL_HEADER_TYPE_ENDPOINT 0x0
+
+#define REG_VAL_BAR_MEMORY 0x0
+#define REG_VAL_BAR_IO 0x1
+#define REG_VAL_BAR_LOCATE_32 0x0
+#define REG_VAL_BAR_LOCATE_UNDER_1MB 0x2
+#define REG_VAL_BAR_LOCATE_64 0x4
+
+#define VIRTIO_NET_IO_START 0xc000
+
+enum qtest_pci_bar_type {
+ QTEST_PCI_BAR_DISABLE = 0,
+ QTEST_PCI_BAR_IO,
+ QTEST_PCI_BAR_MEMORY_UNDER_1MB,
+ QTEST_PCI_BAR_MEMORY_32,
+ QTEST_PCI_BAR_MEMORY_64
+};
+
+struct qtest_pci_bar {
+ enum qtest_pci_bar_type type;
+ uint8_t addr;
+ uint64_t region_start;
+ uint64_t region_size;
+};
+
+TAILQ_HEAD(qtest_pci_device_list, qtest_pci_device);
+struct qtest_pci_device {
+ TAILQ_ENTRY(qtest_pci_device) next;
+ const char *name;
+ uint16_t device_id;
+ uint16_t vendor_id;
+ uint8_t bus_addr;
+ uint8_t device_addr;
+ struct qtest_pci_bar bar[NB_PCI_BAR];
+ int (*init)(int fd, struct qtest_pci_device *dev);
+};
+
+struct qtest_session {
+ int qtest_socket;
+ int ivshmem_socket;
+ int shm_fd;
+ struct qtest_pci_device_list head;
+};
+
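+/*
+ * QTest accepts ASCII commands terminated by '\n' over the socket,
+ * e.g. "inl 0xADDR" and "outl 0xADDR 0xVAL". Replies to reads start
+ * with "OK " followed by the value in hex, which is parsed below.
+ */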
+static uint32_t
+qtest_inl(int fd, uint16_t addr)
+{
+ char buf[1024];
+ int ret;
+
+ snprintf(buf, sizeof(buf), "inl 0x%x\n", addr);
+	ret = write(fd, buf, strlen(buf));
+	if (ret < 0)
+		return 0;
+
+	ret = read(fd, buf, sizeof(buf) - 1);
+	if (ret < 0)
+		return 0;
+	buf[ret] = '\0';
+ return strtoul(buf + strlen("OK "), NULL, 16);
+}
+
+static void
+qtest_outl(int fd, uint16_t addr, uint32_t val)
+{
+ char buf[1024];
+ int ret;
+
+ snprintf(buf, sizeof(buf), "outl 0x%x 0x%x\n", addr, val);
+ ret = write(fd, buf, strlen(buf));
+ if (ret < 0)
+ return;
+
+ ret = read(fd, buf, sizeof(buf));
+ if (ret < 0)
+ return;
+}
+
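+/*
+ * PCI configuration space is reached through the legacy mechanism:
+ * write the target address (enable bit 0x80000000 | bus << 16 |
+ * device << 11 | function << 8 | dword-aligned offset) to port 0xcf8,
+ * then read or write the data at port 0xcfc. Sub-dword accesses
+ * extract the relevant byte from the 32-bit value.
+ */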
+static int
+qtest_pci_readb(int fd, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset)
+{
+ uint32_t tmp;
+
+ tmp = 0x80000000 | (bus & 0xff) << 16 | (device & 0x1f) << 11 |
+ (function & 0xf) << 8 | (offset & 0xfc);
+
+ qtest_outl(fd, 0xcf8, tmp);
+ tmp = qtest_inl(fd, 0xcfc);
+
+ return (tmp >> ((offset & 0x3) * 8)) & 0xff;
+}
+
+static uint32_t
+qtest_pci_readl(int fd, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset)
+{
+ uint32_t tmp;
+
+ tmp = 0x80000000 | (bus & 0xff) << 16 | (device & 0x1f) << 11 |
+ (function & 0xf) << 8 | (offset & 0xfc);
+
+ qtest_outl(fd, 0xcf8, tmp);
+ return qtest_inl(fd, 0xcfc);
+}
+
+static void
+qtest_pci_writel(int fd, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset, uint32_t value)
+{
+ uint32_t tmp;
+
+ tmp = 0x80000000 | (bus & 0xff) << 16 | (device & 0x1f) << 11 |
+ (function & 0xf) << 8 | (offset & 0xfc);
+
+ qtest_outl(fd, 0xcf8, tmp);
+ qtest_outl(fd, 0xcfc, value);
+ return;
+}
+
+static uint64_t
+qtest_pci_readq(int fd, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset)
+{
+ uint32_t tmp;
+ uint64_t val;
+
+ tmp = 0x80000000 | (bus & 0xff) << 16 | (device & 0x1f) << 11 |
+ (function & 0xf) << 8 | (offset & 0xfc);
+
+ qtest_outl(fd, 0xcf8, tmp);
+ val = (uint64_t)qtest_inl(fd, 0xcfc);
+
+ tmp = 0x80000000 | (bus & 0xff) << 16 | (device & 0x1f) << 11 |
+ (function & 0xf) << 8 | ((offset + 4)& 0xfc);
+
+ qtest_outl(fd, 0xcf8, tmp);
+ val |= (uint64_t)qtest_inl(fd, 0xcfc) << 32;
+
+ return val;
+}
+
+static void
+qtest_pci_writeq(int fd, uint8_t bus, uint8_t device,
+ uint8_t function, uint8_t offset, uint64_t value)
+{
+ uint32_t tmp;
+
+ tmp = 0x80000000 | (bus & 0xff) << 16 | (device & 0x1f) << 11 |
+ (function & 0xf) << 8 | (offset & 0xfc);
+
+ qtest_outl(fd, 0xcf8, tmp);
+ qtest_outl(fd, 0xcfc, (uint32_t)(value & 0xffffffff));
+
+ tmp = 0x80000000 | (bus & 0xff) << 16 | (device & 0x1f) << 11 |
+ (function & 0xf) << 8 | ((offset + 4) & 0xfc);
+
+ qtest_outl(fd, 0xcf8, tmp);
+ qtest_outl(fd, 0xcfc, (uint32_t)(value >> 32));
+ return;
+}
+
+static int
+qtest_init_pci_device(int fd, struct qtest_pci_device *dev)
+{
+ uint8_t i, bus, device;
+ uint32_t val;
+ uint64_t val64;
+
+ bus = dev->bus_addr;
+ device = dev->device_addr;
+
+ PMD_DRV_LOG(INFO,
+ "Find %s on virtual PCI bus: %04x:%02x:00.0\n",
+ dev->name, bus, device);
+
+ /* Check header type */
+ val = qtest_pci_readb(fd, bus, device, 0, REG_ADDR_HEADER_TYPE);
+ if (val != REG_VAL_HEADER_TYPE_ENDPOINT) {
+ PMD_DRV_LOG(ERR, "Unexpected header type %d\n", val);
+ return -1;
+ }
+
+ /* Check BAR type */
+ for (i = 0; i < NB_PCI_BAR; i++) {
+ val = qtest_pci_readl(fd, bus, device, 0, dev->bar[i].addr);
+
+ switch (dev->bar[i].type) {
+ case QTEST_PCI_BAR_IO:
+ if ((val & 0x1) != REG_VAL_BAR_IO)
+ goto error;
+ break;
+ case QTEST_PCI_BAR_MEMORY_UNDER_1MB:
+ if ((val & 0x1) != REG_VAL_BAR_MEMORY)
+ goto error;
+ if ((val & 0x6) != REG_VAL_BAR_LOCATE_UNDER_1MB)
+ goto error;
+ break;
+ case QTEST_PCI_BAR_MEMORY_32:
+ if ((val & 0x1) != REG_VAL_BAR_MEMORY)
+ goto error;
+ if ((val & 0x6) != REG_VAL_BAR_LOCATE_32)
+ goto error;
+ break;
+ case QTEST_PCI_BAR_MEMORY_64:
+ if ((val & 0x1) != REG_VAL_BAR_MEMORY)
+ goto error;
+ if ((val & 0x6) != REG_VAL_BAR_LOCATE_64)
+ goto error;
+ break;
+ case QTEST_PCI_BAR_DISABLE:
+ break;
+ }
+ }
+
+ /* Enable device */
+ val = qtest_pci_readl(fd, bus, device, 0, REG_ADDR_COMMAND);
+ val |= REG_VAL_COMMAND_IO | REG_VAL_COMMAND_MEMORY | REG_VAL_COMMAND_MASTER;
+ qtest_pci_writel(fd, bus, device, 0, REG_ADDR_COMMAND, val);
+
+ /* Calculate BAR size */
+ for (i = 0; i < NB_PCI_BAR; i++) {
+ switch (dev->bar[i].type) {
+ case QTEST_PCI_BAR_IO:
+ case QTEST_PCI_BAR_MEMORY_UNDER_1MB:
+ case QTEST_PCI_BAR_MEMORY_32:
+ qtest_pci_writel(fd, bus, device, 0,
+ dev->bar[i].addr, 0xffffffff);
+ val = qtest_pci_readl(fd, bus, device,
+ 0, dev->bar[i].addr);
+ dev->bar[i].region_size = ~(val & 0xfffffff0) + 1;
+ break;
+ case QTEST_PCI_BAR_MEMORY_64:
+ qtest_pci_writeq(fd, bus, device, 0,
+ dev->bar[i].addr, 0xffffffffffffffff);
+ val64 = qtest_pci_readq(fd, bus, device,
+ 0, dev->bar[i].addr);
+ dev->bar[i].region_size =
+ ~(val64 & 0xfffffffffffffff0) + 1;
+ break;
+ case QTEST_PCI_BAR_DISABLE:
+ break;
+ }
+ }
+
+ /* Set BAR region */
+ for (i = 0; i < NB_PCI_BAR; i++) {
+ switch (dev->bar[i].type) {
+ case QTEST_PCI_BAR_IO:
+ case QTEST_PCI_BAR_MEMORY_UNDER_1MB:
+ case QTEST_PCI_BAR_MEMORY_32:
+ qtest_pci_writel(fd, bus, device, 0, dev->bar[i].addr,
+ dev->bar[i].region_start);
+ PMD_DRV_LOG(INFO, "Set BAR of %s device: 0x%lx - 0x%lx\n",
+ dev->name, dev->bar[i].region_start,
+ dev->bar[i].region_start + dev->bar[i].region_size);
+ break;
+ case QTEST_PCI_BAR_MEMORY_64:
+ qtest_pci_writeq(fd, bus, device, 0, dev->bar[i].addr,
+ dev->bar[i].region_start);
+ PMD_DRV_LOG(INFO, "Set BAR of %s device: 0x%lx - 0x%lx\n",
+ dev->name, dev->bar[i].region_start,
+ dev->bar[i].region_start + dev->bar[i].region_size);
+ break;
+ case QTEST_PCI_BAR_DISABLE:
+ break;
+ }
+
+ }
+
+ return 0;
+
+error:
+ PMD_DRV_LOG(ERR, "Unexpected BAR type\n");
+ return -1;
+}
+
+static int
+qtest_try_init_pci_device(struct qtest_session *s, uint16_t bus, uint8_t device)
+{
+ struct qtest_pci_device *dev;
+ uint32_t val;
+
+ val = qtest_pci_readl(s->qtest_socket, bus, device, 0, 0);
+ TAILQ_FOREACH(dev, &s->head, next) {
+ if (val == ((uint32_t)dev->device_id << 16 | dev->vendor_id)) {
+ dev->bus_addr = bus;
+ dev->device_addr = device;
+ return dev->init(s->qtest_socket, dev);
+ }
+ }
+
+ return 0;
+}
+
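+/*
+ * Brute-force scan of the virtual PCI bus: probe every bus/device slot,
+ * read its vendor/device ID, and initialize the slots that match a
+ * device registered in the session's list (virtio-net and ivshmem).
+ */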
+static int
+qtest_init_pci_devices(struct qtest_session *s)
+{
+ uint16_t bus;
+ uint8_t device;
+ int ret;
+
+ bus = 0;
+ do {
+ device = 0;
+ do {
+ ret = qtest_try_init_pci_device(s, bus, device);
+ if (ret != 0)
+ return ret;
+ }
+ while (device++ != NB_DEVICE - 1);
+ } while (bus++ != NB_BUS - 1);
+
+ return 0;
+}
+
+static void
+qtest_close_session(struct qtest_session *s)
+{
+ close(s->qtest_socket);
+ close(s->ivshmem_socket);
+}
+
+static int
+qtest_register_target_devices(struct qtest_session *s)
+{
+ struct qtest_pci_device *virtio_net, *ivshmem;
+ void *addr;
+
+ virtio_net = malloc(sizeof(*virtio_net));
+ if (virtio_net == NULL)
+ return -1;
+
+	ivshmem = malloc(sizeof(*ivshmem));
+	if (ivshmem == NULL) {
+		free(virtio_net);
+		return -1;
+	}
+
+ memset(virtio_net, 0, sizeof(*virtio_net));
+ memset(ivshmem, 0, sizeof(*ivshmem));
+
+ TAILQ_INIT(&s->head);
+
+ virtio_net->name = "virtio-net";
+ virtio_net->device_id = 0x1000;
+ virtio_net->vendor_id = 0x1af4;
+ virtio_net->init = qtest_init_pci_device;
+ virtio_net->bar[0].addr = REG_ADDR_BAR0;
+ virtio_net->bar[0].type = QTEST_PCI_BAR_IO;
+ virtio_net->bar[0].region_start = VIRTIO_NET_IO_START;
+
+ TAILQ_INSERT_TAIL(&s->head, virtio_net, next);
+
+ ivshmem->name = "ivshmem";
+ ivshmem->device_id = 0x1110;
+ ivshmem->vendor_id = 0x1af4;
+ ivshmem->init = qtest_init_pci_device;
+ ivshmem->bar[0].addr = REG_ADDR_BAR0;
+ ivshmem->bar[0].type = QTEST_PCI_BAR_MEMORY_32;
+ ivshmem->bar[0].region_start = 0;
+ ivshmem->bar[1].addr = REG_ADDR_BAR2;
+ ivshmem->bar[1].type = QTEST_PCI_BAR_MEMORY_64;
+
+ rte_memseg_info_get(0, NULL, NULL, &addr);
+ ivshmem->bar[1].region_start = rte_mem_virt2phy(addr);
+
+ TAILQ_INSERT_TAIL(&s->head, ivshmem, next);
+
+ return 0;
+}
+
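+/*
+ * ivshmem server protocol helper: send one 64-bit value, optionally
+ * with a file descriptor attached as SCM_RIGHTS ancillary data.
+ * Used below to send the protocol version, the client id and the
+ * shared memory fd.
+ */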
+static int
+qtest_send_message_to_ivshmem(int sock_fd, uint64_t client_id, int shm_fd)
+{
+ struct iovec iov;
+ struct msghdr msgh;
+ size_t fdsize = sizeof(int);
+ char control[CMSG_SPACE(fdsize)];
+ struct cmsghdr *cmsg;
+ int ret;
+
+ memset(&msgh, 0, sizeof(msgh));
+ iov.iov_base = &client_id;
+ iov.iov_len = sizeof(client_id);
+
+ msgh.msg_iov = &iov;
+ msgh.msg_iovlen = 1;
+
+ if (shm_fd >= 0) {
+ msgh.msg_control = &control;
+ msgh.msg_controllen = sizeof(control);
+ cmsg = CMSG_FIRSTHDR(&msgh);
+ cmsg->cmsg_len = CMSG_LEN(fdsize);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ memcpy(CMSG_DATA(cmsg), &shm_fd, fdsize);
+ }
+
+ do {
+ ret = sendmsg(sock_fd, &msgh, 0);
+ } while (ret < 0 && errno == EINTR);
+
+ if (ret < 0) {
+ PMD_DRV_LOG(ERR, "sendmsg error\n");
+ return ret;
+ }
+
+ return ret;
+}
+
+static int
+qtest_setup_shared_memory(struct qtest_session *s)
+{
+ int ret;
+
+ /* send our protocol version first */
+ ret = qtest_send_message_to_ivshmem(s->ivshmem_socket,
+ IVSHMEM_PROTOCOL_VERSION, -1);
+ if (ret < 0) {
+ PMD_DRV_LOG(ERR,
+ "Failed to send protocol version to ivshmem\n");
+ return -1;
+ }
+
+	/* send client id */
+ ret = qtest_send_message_to_ivshmem(s->ivshmem_socket, 0, -1);
+ if (ret < 0) {
+ PMD_DRV_LOG(ERR, "Failed to send VMID to ivshmem\n");
+ return -1;
+ }
+
+ /* send shm_fd */
+ ret = qtest_send_message_to_ivshmem(s->ivshmem_socket, -1, s->shm_fd);
+ if (ret < 0) {
+		PMD_DRV_LOG(ERR, "Failed to send file descriptor to ivshmem\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static void
+qtest_remove_target_devices(struct qtest_session *s)
+{
+ struct qtest_pci_device *dev, *next;
+
+ for (dev = TAILQ_FIRST(&s->head); dev != NULL; dev = next) {
+ next = TAILQ_NEXT(dev, next);
+ TAILQ_REMOVE(&s->head, dev, next);
+ free(dev);
+ }
+}
+
+int
+qtest_vdev_init(struct rte_eth_dev_data *data,
+ int qtest_socket, int ivshmem_socket)
+{
+ struct virtio_hw *hw = data->dev_private;
+ struct qtest_session *s;
+ uint64_t size = 0;
+ int ret;
+ int shm_fd;
+
+	s = malloc(sizeof(*s));
+	if (s == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to allocate qtest session\n");
+		return -1;
+	}
+
+ ret = qtest_register_target_devices(s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize qtest session\n");
+ return -1;
+ }
+
+ rte_memseg_info_get(0, &shm_fd, &size, NULL);
+
+ s->qtest_socket = qtest_socket;
+ s->ivshmem_socket = ivshmem_socket;
+ s->shm_fd = shm_fd;
+
+ ret = qtest_setup_shared_memory(s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to setup shared memory\n");
+ return -1;
+ }
+
+ ret = qtest_init_pci_devices(s);
+ if (ret != 0) {
+ PMD_DRV_LOG(ERR, "Failed to initialize devices\n");
+ return -1;
+ }
+
+ hw->qsession = (void *)s;
+
+ return 0;
+}
+
+void
+qtest_vdev_uninit(struct rte_eth_dev_data *data)
+{
+ struct virtio_hw *hw = data->dev_private;
+ struct qtest_session *s;
+
+ s = (struct qtest_session *)hw->qsession;
+ qtest_close_session(s);
+ qtest_remove_target_devices(s);
+}
+
+void
+virtio_ioport_write(struct virtio_hw *hw, uint64_t addr, uint64_t val)
+{
+ struct qtest_session *s = (struct qtest_session *)hw->qsession;
+
+	qtest_outl(s->qtest_socket,
+ VIRTIO_NET_IO_START + (uint16_t)addr, val);
+}
+
+uint32_t
+virtio_ioport_read(struct virtio_hw *hw, uint64_t addr)
+{
+ struct qtest_session *s = (struct qtest_session *)hw->qsession;
+
+ return qtest_inl(s->qtest_socket,
+ VIRTIO_NET_IO_START + (uint16_t)addr);
+}
+
diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 74c00ee..ea42ef1 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -36,6 +36,11 @@
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
+#ifdef RTE_VIRTIO_VDEV
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#endif
#ifdef RTE_EXEC_ENV_LINUXAPP
#include <dirent.h>
#include <fcntl.h>
@@ -56,6 +61,9 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_dev.h>
+#ifdef RTE_VIRTIO_VDEV
+#include <rte_kvargs.h>
+#endif

#include "virtio_ethdev.h"
#include "virtio_pci.h"
@@ -491,8 +499,10 @@ virtio_dev_close(struct rte_eth_dev *dev)
PMD_INIT_LOG(DEBUG, "virtio_dev_close");

/* reset the NIC */
- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ if ((dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
+ }
vtpci_reset(hw);
hw->started = 0;
virtio_dev_free_mbufs(dev);
@@ -1266,7 +1276,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
struct virtio_hw *hw = eth_dev->data->dev_private;
struct virtio_net_config *config;
struct virtio_net_config local_config;
- struct rte_pci_device *pci_dev;
+ struct rte_pci_device *pci_dev = eth_dev->pci_dev;

RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));

@@ -1287,15 +1297,18 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
return -ENOMEM;
}

- pci_dev = eth_dev->pci_dev;
+ if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
+ rte_eth_copy_pci_info(eth_dev, pci_dev);

- rte_eth_copy_pci_info(eth_dev, pci_dev);
+ if (virtio_resource_init(pci_dev) < 0)
+ return -1;

- if (virtio_resource_init(pci_dev) < 0)
- return -1;
-
- hw->use_msix = virtio_has_msix(&pci_dev->addr);
- hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
+ hw->use_msix = virtio_has_msix(&pci_dev->addr);
+ hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
+ } else {
+ hw->use_msix = 0;
+ hw->io_base = 0;
+ }

/* Reset the device although not necessary at startup */
vtpci_reset(hw);
@@ -1308,8 +1321,10 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
virtio_negotiate_features(hw);

/* If host does not support status then disable LSC */
- if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
+ if ((eth_dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))) {
pci_dev->driver->drv_flags &= ~RTE_PCI_DRV_INTR_LSC;
+ }

rx_func_get(eth_dev);

@@ -1385,14 +1400,16 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d hw->max_tx_queues=%d",
hw->max_rx_queues, hw->max_tx_queues);
- PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
- eth_dev->data->port_id, pci_dev->id.vendor_id,
- pci_dev->id.device_id);
+ if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
+ PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
+ eth_dev->data->port_id, pci_dev->id.vendor_id,
+ pci_dev->id.device_id);

- /* Setup interrupt callback */
- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
- rte_intr_callback_register(&pci_dev->intr_handle,
- virtio_interrupt_handler, eth_dev);
+ /* Setup interrupt callback */
+ if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ rte_intr_callback_register(&pci_dev->intr_handle,
+ virtio_interrupt_handler, eth_dev);
+ }

virtio_dev_cq_start(eth_dev);

@@ -1426,10 +1443,12 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
eth_dev->data->mac_addrs = NULL;

/* reset interrupt callback */
- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ if ((eth_dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
rte_intr_callback_unregister(&pci_dev->intr_handle,
virtio_interrupt_handler,
eth_dev);
+ }

PMD_INIT_LOG(DEBUG, "dev_uninit completed");

@@ -1493,11 +1512,13 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return -ENOTSUP;
}

- if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+ if ((dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
PMD_DRV_LOG(ERR, "failed to set config vector");
return -EBUSY;
}
+ }

return 0;
}
@@ -1511,7 +1532,8 @@ virtio_dev_start(struct rte_eth_dev *dev)
struct rte_pci_device *pci_dev = dev->pci_dev;

/* check if lsc interrupt feature is enabled */
- if (dev->data->dev_conf.intr_conf.lsc) {
+ if ((dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (dev->data->dev_conf.intr_conf.lsc)) {
if (!(pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
PMD_DRV_LOG(ERR, "link status not supported by host");
return -ENOTSUP;
@@ -1617,8 +1639,10 @@ virtio_dev_stop(struct rte_eth_dev *dev)

PMD_INIT_LOG(DEBUG, "stop");

- if (dev->data->dev_conf.intr_conf.lsc)
+ if ((dev->dev_type == RTE_ETH_DEV_PCI) &&
+ (dev->data->dev_conf.intr_conf.lsc)) {
rte_intr_disable(&dev->pci_dev->intr_handle);
+ }

memset(&link, 0, sizeof(link));
virtio_dev_atomic_write_link_status(dev, &link);
@@ -1691,3 +1715,151 @@ static struct rte_driver rte_virtio_driver = {
};

PMD_REGISTER_DRIVER(rte_virtio_driver);
+
+#ifdef RTE_VIRTIO_VDEV
+
+#define ETH_CVIO_ARG_QTEST_PATH "qtest"
+#define ETH_CVIO_ARG_IVSHMEM_PATH "ivshmem"
+
+/*TODO: specify mac addr */
+static const char *valid_args[] = {
+ ETH_CVIO_ARG_QTEST_PATH,
+ ETH_CVIO_ARG_IVSHMEM_PATH,
+ NULL
+};
+
+static int
+get_string_arg(const char *key __rte_unused,
+ const char *value, void *extra_args)
+{
+ int ret, fd, loop = 3;
+ int *pfd = extra_args;
+ struct sockaddr_un sa = {0};
+
+ if ((value == NULL) || (extra_args == NULL))
+ return -EINVAL;
+
+ fd = socket(AF_UNIX, SOCK_STREAM, 0);
+ if (fd < 0)
+ return -1;
+
+ sa.sun_family = AF_UNIX;
+ strncpy(sa.sun_path, value, sizeof(sa.sun_path) - 1);
+
+ /* Wait until QEMU has prepared the socket */
+ while (loop--) {
+ ret = connect(fd, (struct sockaddr*)&sa,
+ sizeof(struct sockaddr_un));
+ if (ret != 0)
+ sleep(1);
+ else
+ break;
+ }
+
+ if (ret != 0) {
+ close(fd);
+ return -1;
+ }
+
+ *pfd = fd;
+
+ return 0;
+}
+
+static struct rte_eth_dev *
+cvio_eth_dev_alloc(const char *name)
+{
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct virtio_hw *hw;
+
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+ if (eth_dev == NULL)
+ rte_panic("cannot alloc rte_eth_dev\n");
+
+ data = eth_dev->data;
+
+ hw = rte_zmalloc(NULL, sizeof(*hw), 0);
+ if (!hw)
+ rte_panic("malloc virtio_hw failed\n");
+
+ data->dev_private = hw;
+ /* will be used in virtio_dev_info_get() */
+ eth_dev->driver = &rte_virtio_pmd;
+ /* TAILQ_INIT(&(eth_dev->link_intr_cbs)); */
+ return eth_dev;
+}
+
+/*
+ * Dev initialization routine.
+ * Invoked once for each virtio vdev at EAL init time,
+ * See rte_eal_dev_init().
+ * Returns 0 on success.
+ */
+static int
+rte_cvio_pmd_devinit(const char *name, const char *params)
+{
+ struct rte_kvargs *kvlist = NULL;
+ struct rte_eth_dev *eth_dev = NULL;
+ int ret, qtest_sock, ivshmem_sock;
+
+ if (params == NULL || params[0] == '\0') {
+ rte_panic("param is null\n");
+ }
+
+ kvlist = rte_kvargs_parse(params, valid_args);
+ if (!kvlist)
+ rte_panic("error when parsing param\n");
+
+ if (rte_kvargs_count(kvlist, ETH_CVIO_ARG_IVSHMEM_PATH) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_CVIO_ARG_IVSHMEM_PATH,
+ &get_string_arg, &ivshmem_sock);
+ if (ret != 0) {
+ PMD_INIT_LOG(ERR,
+ "Failed to connect to ivshmem socket");
+ return -1;
+ }
+ } else {
+ rte_panic("no arg: %s\n", ETH_CVIO_ARG_IVSHMEM_PATH);
+ }
+
+ if (rte_kvargs_count(kvlist, ETH_CVIO_ARG_QTEST_PATH) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_CVIO_ARG_QTEST_PATH,
+ &get_string_arg, &qtest_sock);
+ if (ret != 0) {
+ PMD_INIT_LOG(ERR,
+ "Failed to connect to qtest socket");
+ return -1;
+ }
+ } else {
+ rte_panic("no arg: %s\n", ETH_CVIO_ARG_QTEST_PATH);
+ }
+
+ eth_dev = cvio_eth_dev_alloc(name);
+
+ qtest_vdev_init(eth_dev->data, qtest_sock, ivshmem_sock);
+
+ /* for PCI devices, this is normally called from rte_eal_pci_probe() */
+ eth_virtio_dev_init(eth_dev);
+
+ return 0;
+}
+
+static int
+rte_cvio_pmd_devuninit(const char *name)
+{
+ /* TODO: Support port hotplug */
+ rte_panic("%s", name);
+ return 0;
+}
+
+static struct rte_driver rte_cvio_driver = {
+ .name = "eth_cvio",
+ .type = PMD_VDEV,
+ .init = rte_cvio_pmd_devinit,
+ .uninit = rte_cvio_pmd_devuninit,
+};
+
+PMD_REGISTER_DRIVER(rte_cvio_driver);
+
+#endif /* RTE_VIRTIO_VDEV */
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index ae2d47d..4f12c53 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -56,6 +56,16 @@
#define VIRTIO_MAX_RX_PKTLEN 9728

/* Features desired/implemented by this driver. */
+#ifdef RTE_VIRTIO_VDEV
+#define VIRTIO_PMD_GUEST_FEATURES \
+ (1u << VIRTIO_NET_F_MAC | \
+ 1u << VIRTIO_NET_F_MQ | \
+ 1u << VIRTIO_NET_F_CTRL_MAC_ADDR | \
+ 1u << VIRTIO_NET_F_CTRL_VQ | \
+ 1u << VIRTIO_NET_F_CTRL_RX | \
+ 1u << VIRTIO_NET_F_CTRL_VLAN | \
+ 1u << VIRTIO_NET_F_MRG_RXBUF)
+#else /* RTE_VIRTIO_VDEV */
#define VIRTIO_PMD_GUEST_FEATURES \
(1u << VIRTIO_NET_F_MAC | \
1u << VIRTIO_NET_F_STATUS | \
@@ -65,6 +75,7 @@
1u << VIRTIO_NET_F_CTRL_RX | \
1u << VIRTIO_NET_F_CTRL_VLAN | \
1u << VIRTIO_NET_F_MRG_RXBUF)
+#endif /* RTE_VIRTIO_VDEV */

/*
* CQ function prototype
@@ -122,5 +133,10 @@ uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
#define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)

+#ifdef RTE_VIRTIO_VDEV
+int qtest_vdev_init(struct rte_eth_dev_data *data,
+ int qtest_socket, int ivshmem_socket);
+void qtest_vdev_uninit(struct rte_eth_dev_data *data);
+#endif

#endif /* _VIRTIO_ETHDEV_H_ */
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 47f722a..fe884de 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -165,6 +165,9 @@ struct virtqueue;

struct virtio_hw {
struct virtqueue *cvq;
+#ifdef RTE_VIRTIO_VDEV
+ void *qsession;
+#endif
uint32_t io_base;
uint32_t guest_features;
uint32_t max_tx_queues;
@@ -226,6 +229,26 @@ outl_p(unsigned int data, unsigned int port)
}
#endif

+#ifdef RTE_VIRTIO_VDEV
+
+uint32_t virtio_ioport_read(struct virtio_hw *, uint64_t);
+void virtio_ioport_write(struct virtio_hw *, uint64_t, uint64_t);
+
+#define VIRTIO_READ_REG_1(hw, reg) \
+ virtio_ioport_read(hw, reg)
+#define VIRTIO_WRITE_REG_1(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value)
+#define VIRTIO_READ_REG_2(hw, reg) \
+ virtio_ioport_read(hw, reg)
+#define VIRTIO_WRITE_REG_2(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value)
+#define VIRTIO_READ_REG_4(hw, reg) \
+ virtio_ioport_read(hw, reg)
+#define VIRTIO_WRITE_REG_4(hw, reg, value) \
+ virtio_ioport_write(hw, reg, value)
+
+#else /* RTE_VIRTIO_VDEV */
+
#define VIRTIO_PCI_REG_ADDR(hw, reg) \
(unsigned short)((hw)->io_base + (reg))

@@ -244,6 +267,8 @@ outl_p(unsigned int data, unsigned int port)
#define VIRTIO_WRITE_REG_4(hw, reg, value) \
outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))

+#endif /* RTE_VIRTIO_VDEV */
+
static inline int
vtpci_with_feature(struct virtio_hw *hw, uint32_t bit)
{
--
2.1.4
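
The virtio_pci.h hunk above routes every register access through virtio_ioport_read()/virtio_ioport_write() when RTE_VIRTIO_VDEV is set; the patch implements those helpers in the new qtest.c. As a rough sketch of the idea only, not the patch's actual code, a 4-byte port read forwarded over the QTest socket might look like this, assuming the ASCII QTest protocol where "inl 0x<port>" is answered with "OK 0x<value>":

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only: forward an "inl" access to
 * QEMU over an already-connected QTest socket and parse the reply. */
static uint32_t
qtest_ioport_inl(int qtest_fd, uint16_t port)
{
	char buf[64];
	ssize_t len;

	len = snprintf(buf, sizeof(buf), "inl 0x%x\n", port);
	if (write(qtest_fd, buf, len) != len)
		return 0;

	/* Read one reply line; a real implementation would buffer input
	 * and skip asynchronous "IRQ ..." event lines. */
	len = read(qtest_fd, buf, sizeof(buf) - 1);
	if (len <= 0)
		return 0;
	buf[len] = '\0';

	if (strncmp(buf, "OK ", 3) != 0)
		return 0;
	return (uint32_t)strtoul(buf + 3, NULL, 16);
}

Writes work the same way with "outl 0x<port> 0x<value>", which is why the PMD can drive the QEMU virtio-net device without mapping any real I/O ports.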
Rich Lane
2015-11-19 18:16:24 UTC
Permalink
What's the reason for using qemu as a middleman? Couldn't the new PMD
itself open /dev/vhost-net or the vhost-user socket and send the commands
to set up virtqueues? That was the approach taken by Jianfeng's earlier RFC.
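For illustration only, a bare-bones sketch of that direct approach, with the PMD itself speaking the vhost-user protocol over the unix socket, might look like the following; the message layout and the VHOST_USER_GET_FEATURES constant follow the vhost-user specification, and the helper below is hypothetical rather than code from any posted patch:

#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define VHOST_USER_GET_FEATURES 1
#define VHOST_USER_VERSION      0x1

/* Minimal vhost-user message: 12-byte header plus a u64 payload. */
struct vhost_user_msg {
	uint32_t request;
	uint32_t flags;   /* low two bits carry the protocol version */
	uint32_t size;    /* number of payload bytes that follow */
	uint64_t u64;
} __attribute__((packed));

/* Hypothetical helper: connect to a vhost-user backend and fetch its
 * feature bits, with no QEMU involved. */
static int
vhost_user_get_features(const char *path, uint64_t *features)
{
	struct sockaddr_un sa = { .sun_family = AF_UNIX };
	struct vhost_user_msg msg = {
		.request = VHOST_USER_GET_FEATURES,
		.flags = VHOST_USER_VERSION,
		.size = 0,
	};
	int fd;

	strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		goto err;

	/* Send the 12-byte header only; GET_FEATURES has no payload. */
	if (write(fd, &msg, 12) != 12)
		goto err;
	/* The reply carries the feature bits as a u64 payload. */
	if (read(fd, &msg, sizeof(msg)) < (ssize_t)sizeof(msg))
		goto err;

	*features = msg.u64;
	close(fd);
	return 0;
err:
	close(fd);
	return -1;
}

Setting up the virtqueues would additionally need SET_OWNER, SET_MEM_TABLE (passing memory fds via SCM_RIGHTS) and the SET_VRING_* requests, which is essentially the approach referred to above.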
Post by Tetsuya Mukawa
[...]
Xie, Huawei
2015-11-20 02:00:51 UTC
Permalink
Post by Rich Lane
What's the reason for using qemu as a middleman? Couldn't the new PMD
itself open /dev/vhost-net or the vhost-user socket and send the commands
to set up virtqueues? That was the approach taken by Jianfeng's earlier RFC.
Rich:
Our initial POC also has a device simulation layer, but it is linked
with the DPDK driver as a library.
Since I based that device simulation on lkvm and rewriting it from
scratch would take too much effort, we decided to release a simple
version without device simulation first.
Without device simulation, the PMD is pretty simple and standalone, with
no dependency on a qtest process.
With device simulation, we could easily implement other virtio devices
in DPDK, like virtio-crypt.
Maybe we should provide the simple implementation option anyway, for
customers who don't want the extra complexity of launching a secondary
process in their container.
[...]
Tetsuya Mukawa
2015-11-20 02:35:34 UTC
Permalink
Post by Xie, Huawei
Post by Rich Lane
What's the reason for using qemu as a middleman? Couldn't the new PMD
itself open /dev/vhost-net or the vhost-user socket and send the commands
to set up virtqueues? That was the approach taken by Jianfeng's earlier RFC.
Our initial POC also has a device simulation layer, but it is linked
with the DPDK driver as a library.
Since I based that device simulation on lkvm and rewriting it from
scratch would take too much effort, we decided to release a simple
version without device simulation first.
Without device simulation, the PMD is pretty simple and standalone, with
no dependency on a qtest process.
With device simulation, we could easily implement other virtio devices
in DPDK, like virtio-crypt.
Hi Rich and Xie,

Probably the main difference between Jianfeng's RFC and ours is how the
virtio-net device is prepared.
The reasons I chose this approach are below.

1. Ease of maintenance
If we implement our own virtio-net device, we need to spend time
maintaining it.
The QEMU virtio-net code is exercised by more virtio-net drivers and
more users, so it should have fewer bugs.
Also, if we use the QEMU virtio-net code, we only need to maintain the
QTest-related implementation, and QTest itself is very stable.
We will probably hit some bugs at first, but after that there should be
little left to maintain.

2. Extendability
The virtio-net and vhost specifications will be extended in the future.
With our own device implementation, we would have more code to maintain.
Post by Xie, Huawei
Maybe we should provide the simple implementation option anyway, for
customers who don't want the extra complexity of launching a secondary
process in their container.
[...]
For example, for users who are fine with invoking two processes in the
same container, a shell script that launches the QTest process and the
vhost-user backend process should be enough.

Users who don't want to invoke two processes in one container are
already using some kind of orchestration tool anyway, so launching one
more process/container is not difficult.

I guess invoking and connecting multiple processes across containers is
nothing special for container users (much like deploying a load
balancer, a web server and a DB).

Tetsuya
Tetsuya Mukawa
2015-11-20 02:53:36 UTC
Permalink
Post by Tetsuya Mukawa
[...]
But yes, it may be nice to have an option for users who only need
limited features.
Actually, I am not a container user myself, so I am not sure which is
preferred.
If we add such an option, I guess we need to keep its feature set very
limited so that it stays easy to maintain.

Thanks,
Tetsuya
Qiu, Michael
2015-12-28 05:15:58 UTC
Permalink
Hi, Tetsuya

I have a question about your solution: as I understand it, you plan to
run both QEMU and DPDK in a container, right?

If so, I think it's a bit tricky. DPDK is a library and QEMU is an
application, and it doesn't seem right for a library to depend on an
application.

Also, so far I don't see any use case for running QEMU inside a container.

Thanks,
Michael
Post by Tetsuya Mukawa
[...]
Tetsuya Mukawa
2015-12-28 11:06:37 UTC
Permalink
Post by Qiu, Michael
Hi, Tetsuya
I have a question about your solution: as I understand it, you plan to
run both QEMU and DPDK in a container, right?
Hi Michael,

Thanks for your comments.

It depends on the use case.
For example, if we want to use vhost-user, we need to invoke three
processes (a DPDK app that uses the virtio-net PMD, QEMU, and a DPDK app
that uses the vhost-user PMD).
These three processes can run either in containers or on the host.
Some users may want to run the three processes in different containers.
Some users may want to run QEMU and the vhost-user PMD process in one
container.
Anyway, in some cases the DPDK app and QEMU will run in one container,
like you said.
Post by Qiu, Michael
If so, I think it's a bit tricky. DPDK is a library and QEMU is an
application, and it doesn't seem right for a library to depend on an
application.
If we implement it the way Jianfeng did, the virtio-net PMD depends on
a vhost-user PMD process, and without a vhost-user process it doesn't
work.
In my case, without a QEMU process it doesn't work.
Post by Qiu, Michael
Also, so far I don't see any use case for running QEMU inside a container.
As described above, if you don't want to run QEMU in a container, you
don't need to; you can run QEMU on the host.

Actually, QEMU handles a file descriptor for the virtio-net PMD
process's memory, so someone may want to isolate it.
In that case, they can run QEMU in a container.

Thanks,
Tetsuya
Post by Qiu, Michael
Thanks,
Michael
Post by Tetsuya Mukawa
[...]