Xie, Huawei
2015-08-20 10:14:55 UTC
I read your mail; it seems what we did is quite similar. Here I wrote a
quick mail to describe our design. Let me know if it is the same thing.
We don't have a high-performance networking interface in containers for
NFV. The current veth-pair-based interface cannot easily be accelerated.
The design consists of:
1. a DPDK-based virtio PMD driver in the container;
2. a device simulation framework in the container;
3. DPDK (or kernel) vhost running in the host.
How is virtio created?
A: There is no "real" virtio-pci device in the container environment.
1) The host maintains pools of memory and shares memory with the container.
This can be accomplished by the host sharing a hugepage file with the container.
2) The container creates virtio rings on the shared memory.
3) The container creates mbuf memory pools on the shared memory.
4) The container sends the memory and vring information to vhost through
vhost messages. This can be done either through ioctl calls or vhost-user
messages.
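As a minimal sketch of steps 1) and 2) above, assuming the host exposes a
hugepage file at a path known to the container (the path and size below are
made up for illustration, not part of the design):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHARED_HUGEFILE "/dev/hugepages/container_virtio" /* hypothetical */
#define SHARED_SIZE     (256UL * 1024 * 1024)             /* hypothetical */

static void *shared_base;

/* Map the hugepage file the host shared with us; the vrings and the mbuf
 * pool are then carved out of this single region, and its addresses are
 * what gets reported to vhost in step 4). */
static int map_shared_memory(void)
{
    int fd = open(SHARED_HUGEFILE, O_RDWR);
    if (fd < 0) {
        perror("open hugepage file");
        return -1;
    }
    shared_base = mmap(NULL, SHARED_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    close(fd);
    if (shared_base == MAP_FAILED) {
        perror("mmap shared memory");
        return -1;
    }
    return 0;
}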
How are the vhost messages sent?
A: There are two alternative ways to do this.
1) The customized virtio PMD is responsible for all vring creation and
vhost message sending.
2) We could do this through a lightweight device simulation framework.
The device simulation creates a simple PCI bus. On that PCI bus,
virtio-net PCI devices are created. The device simulation provides an
IOAPI for MMIO/IO access (a sketch follows below).
2.1 The virtio PMD configures the pseudo virtio device just as it does in a
KVM guest environment.
2.2 Rather than using I/O instructions, the virtio PMD uses the IOAPI for
I/O operations on the virtio-net PCI device.
2.3 The device simulation is responsible for simulating the device state
machine.
2.4 The device simulation is responsible for talking to vhost.
With this approach, the virtio PMD modifications are minimized; the virtio
PMD works as if it were configuring a real virtio-net PCI device.
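To make option 2) concrete, the IOAPI could be a small table of callbacks
that the virtio PMD calls instead of issuing I/O instructions; the device
simulation implements them and drives the device state machine and the vhost
messaging behind them. All names here are illustrative, not an existing API:

#include <stddef.h>
#include <stdint.h>

struct pseudo_pci_dev; /* emulated virtio-net PCI device (device simulation) */

struct io_ops {
    uint64_t (*io_read)(struct pseudo_pci_dev *dev,
                        uint64_t offset, size_t len);
    void     (*io_write)(struct pseudo_pci_dev *dev,
                         uint64_t offset, uint64_t val, size_t len);
};

/* What the PMD would call where it used to do port/MMIO access directly: */
static inline void
virtio_write_reg(struct pseudo_pci_dev *dev, const struct io_ops *ops,
                 uint64_t reg, uint32_t val)
{
    ops->io_write(dev, reg, val, sizeof(val));
}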
Memory mapping?
A: In the KVM environment QEMU can access the whole guest memory; we need to
fill that gap here.
The container maps the shared memory into the container's virtual address
space and the host maps it into the host's virtual address space, so there
is a fixed-offset mapping between the two.
The container creates the shared vrings in that memory. The container also
creates the mbuf memory pool in the shared memory.
In the VHOST_SET_MEM_TABLE message, we send the memory mapping information
for the shared memory. As we require the mbuf pool to be created in the
shared memory, and buffers are allocated from that mbuf pool, DPDK vhost can
translate the GPA in a vring descriptor to a host virtual address.
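Roughly, the mapping information for the single shared region and the
vhost-side translation could look like the following (the structure and
function are illustrative sketches, not the exact DPDK definitions):

#include <stdint.h>

struct shm_region {
    uint64_t guest_phys_addr;  /* address space the container reports */
    uint64_t size;             /* size of the shared hugepage region */
    uint64_t host_virt_addr;   /* where the host mmap'ed the same file */
};

/* vhost-side translation; valid because the vrings and all mbufs live
 * inside this one shared region. */
static inline uint64_t
gpa_to_hva(const struct shm_region *r, uint64_t gpa)
{
    if (gpa < r->guest_phys_addr || gpa >= r->guest_phys_addr + r->size)
        return 0; /* outside the shared region */
    return gpa - r->guest_phys_addr + r->host_virt_addr;
}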
GPA or CVA in the vring desc?
To ease the memory translation, rather than using the GPA we use the
CVA (container virtual address). This is the tricky part.
1) The virtio PMD writes the vring's VFN rather than its PFN to the PFN
register through the IOAPI.
2) The device simulation framework uses the VFN as the PFN.
3) The device simulation sends SET_VRING_ADDR with the CVA.
4) The virtio PMD fills vring descriptors with the CVA of the mbuf data
pointer rather than the GPA.
So when the host sees a CVA, it can translate it to an HVA (host virtual
address).
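On the container side, the scheme amounts to reporting virtual addresses
everywhere a physical address would normally go. A hedged sketch, with
hypothetical names and a simplified descriptor layout:

#include <stdint.h>

#define PAGE_SHIFT 12

struct vring_desc_sketch {
    uint64_t addr;  /* CVA of the mbuf data, not a GPA */
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Step 1): report the vring location as a VFN (CVA >> PAGE_SHIFT)
 * instead of a PFN. */
static inline uint64_t
vring_vfn(const void *ring_cva)
{
    return (uint64_t)(uintptr_t)ring_cva >> PAGE_SHIFT;
}

/* Step 4): fill a descriptor with the CVA of the packet buffer. */
static inline void
fill_desc(struct vring_desc_sketch *d, void *buf_cva, uint32_t len)
{
    d->addr = (uint64_t)(uintptr_t)buf_cva;
    d->len = len;
}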
The virtio interface in the container follows the vhost message format and
is compliant with the DPDK vhost implementation, i.e., no DPDK vhost
modification is needed.
vhost isn't aware of whether the incoming virtio device comes from a KVM
guest or from a container.
This pretty much covers the high-level design. There are quite a few
low-level issues. For example, a 32-bit PFN is enough for a KVM guest, but
since we use a 64-bit VFN (virtual page frame number), a trick is done here
through a special IOAPI.
/huawei