Discussion:
[dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK
(too old to reply)
Anatoly Burakov
2018-09-04 13:11:35 UTC
Permalink
This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from externall heaps

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not eve have
wanted this memory to be accessible in the secondary in the first
place.

Using external memory in multiprocess is *hard*, because not only
memory space needs to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (16):
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
test: add unit tests for external memory support

config/common_base | 1 +
config/rte_config.h | 1 +
drivers/bus/fslmc/fslmc_vfio.c | 7 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx4/mlx4_mr.c | 3 +
drivers/net/mlx5/mlx5.c | 5 +-
drivers/net/mlx5/mlx5_mr.c | 3 +
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 9 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 6 +-
lib/librte_eal/common/include/rte_malloc.h | 181 +++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_heap.c | 287 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 383 ++++++++++++++++-
lib/librte_eal/linuxapp/eal/eal.c | 3 +
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 +-
lib/librte_eal/rte_eal_version.map | 7 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 31 +-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 384 ++++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
34 files changed, 1346 insertions(+), 84 deletions(-)
create mode 100644 test/test/test_external_mem.c
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:36 UTC
Permalink
Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
drivers/bus/pci/linux/pci.c | 2 +-
lib/librte_eal/bsdapp/eal/eal_memory.c | 2 ++
lib/librte_eal/common/eal_common_memory.c | 5 ++---
lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 3 ++-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +++-
6 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
static int
find_max_end_va(const struct rte_memseg_list *msl, void *arg)
{
- size_t sz = msl->memseg_arr.len * msl->page_sz;
+ size_t sz = msl->len;
void *end_va = RTE_PTR_ADD(msl->base_va, sz);
void **max_va = arg;

diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
}
msl->base_va = addr;
msl->page_sz = page_sz;
+ msl->len = internal_config.memory;
msl->socket_id = 0;

/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
return -1;
}
msl->base_va = addr;
+ msl->len = mem_sz;

return 0;
}
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index fbfb1b055..0868bf681 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)

/* a memseg list was specified, check if it's the right one */
start = msl->base_va;
- end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+ end = RTE_PTR_ADD(start, msl->len);

if (addr < start || addr >= end)
return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
msl = &mcfg->memsegs[msl_idx];

start = msl->base_va;
- end = RTE_PTR_ADD(start,
- (size_t)msl->page_sz * msl->memseg_arr.len);
+ end = RTE_PTR_ADD(start, msl->len);
if (addr >= start && addr < end)
break;
}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
uint64_t addr_64;
/**< Makes sure addr is always 64-bits */
};
+ size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index aa95551a8..d040a2f71 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -828,7 +828,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
int msl_idx, seg_idx, ret, dir_fd = -1;

start_addr = (uintptr_t) msl->base_va;
- end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+ end_addr = start_addr + msl->len;

if ((uintptr_t)wa->ms->addr < start_addr ||
(uintptr_t)wa->ms->addr >= end_addr)
@@ -1314,6 +1314,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
return -1;
}
local_msl->base_va = primary_msl->base_va;
+ local_msl->len = primary_msl->len;

return 0;
}
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index dbf19499e..c522538bf 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -857,6 +857,7 @@ alloc_va_space(struct rte_memseg_list *msl)
return -1;
}
msl->base_va = addr;
+ msl->len = mem_sz;

return 0;
}
@@ -1365,6 +1366,7 @@ eal_legacy_hugepage_init(void)
msl->base_va = addr;
msl->page_sz = page_sz;
msl->socket_id = 0;
+ msl->len = internal_config.memory;

/* populate memsegs. each memseg is one page long */
for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1611,7 +1613,7 @@ eal_legacy_hugepage_init(void)
if (msl->memseg_arr.count > 0)
continue;
/* this is an unused list, deallocate it */
- mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+ mem_sz = msl->len;
munmap(msl->base_va, mem_sz);
msl->base_va = NULL;
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:37 UTC
Permalink
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense. Mempools is a
special case, because we may be asked to allocate a mempool on
a specific socket, and we need to ignore all page sizes on
other heaps or other sockets. Previously, this assumption of
knowing all page sizes was not a problem, but it will be now,
so we have to match socket ID with page size when calculating
minimum page size for a mempool.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so

drivers/bus/fslmc/fslmc_vfio.c | 7 +++--
drivers/net/mlx4/mlx4_mr.c | 3 ++
drivers/net/mlx5/mlx5.c | 5 ++-
drivers/net/mlx5/mlx5_mr.c | 3 ++
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 +++--
lib/librte_eal/common/eal_common_memory.c | 4 +++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 ++++++
lib/librte_eal/common/malloc_heap.c | 9 ++++--
lib/librte_eal/linuxapp/eal/eal.c | 3 ++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 ++++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 +++++++---
lib/librte_mempool/rte_mempool.c | 31 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
17 files changed, 102 insertions(+), 20 deletions(-)

diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
}

static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *n_segs = arg;
int ret;

+ if (msl->external)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;

+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ec63bc6e2..d9ed15880 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;

static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;

+ if (msl->external)
+ return 0;
+
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;

+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b2444096c..885c59c8a 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
uint32_t region_nr;
};
static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, size_t len, void *arg)
{
struct walk_arg *wa = arg;
struct vhost_memory_region *mr;
void *start_addr;

+ if (msl->external)
+ return 0;
+
if (wa->region_nr >= max_regions)
return -1;

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;

+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;

diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;

+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0868bf681..55a11bf4d 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;

+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;

return 0;
@@ -547,6 +550,7 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
return ret;
}

+
/* init memory subsystem */
int
rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..76faf9a4a 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ bool external; /**< true if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index c4b7f4cff..b381d1cb6 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 12aaf2d72..8c37b9d7c 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;

+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];

/* msl is const, so find it */
@@ -756,8 +759,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;

- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;

/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..729ae2060 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;

+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}

diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index d040a2f71..8b0bbe43f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1250,6 +1250,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;

+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1298,6 +1301,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;

+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1328,6 +1334,9 @@ secondary_lock_list_create_walk(const struct rte_memseg_list *msl,
int msl_idx;
int *data;

+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}

static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;

+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}

static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;

+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;

+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..4eae7bec6 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}

+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;

- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && !msl->external;
+
+ if (!valid)
+ return 0;
+
+ if (msl->page_sz < wa->min)
+ wa->min = msl->page_sz;

return 0;
}

static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;

- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;

- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}


@@ -470,7 +485,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;

+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}

diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;

+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:41 UTC
Permalink
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_pipeline/rte_pipeline.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 0cb8b804e..2c047a8a4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
}

/* socket */
- if ((params->socket_id < 0) ||
- (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+ if (params->socket_id < 0) {
RTE_LOG(ERR, PIPELINE,
"%s: Incorrect value for parameter socket_id\n",
__func__);
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:45 UTC
Permalink
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_eal/common/include/rte_malloc.h | 19 ++++++++
lib/librte_eal/common/malloc_heap.c | 30 +++++++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 52 ++++++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
5 files changed, 105 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8870732a6..182afab1c 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);

+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ * socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on successful creation
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * EEXIST - heap by name of ``heap_name`` already exists
+ * ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 813961f0c..2742f7b11 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
#include "malloc_heap.h"
#include "malloc_mp.h"

+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
@@ -1006,6 +1010,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}

+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+ static uint32_t next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
+ /* prevent overflow. did you really create 2 billion heaps??? */
+ if (next_socket_id > INT32_MAX) {
+ RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ /* initialize empty heap */
+ heap->alloc_count = 0;
+ heap->first = NULL;
+ heap->last = NULL;
+ LIST_INIT(heap->free_head);
+ rte_spinlock_init(&heap->lock);
+ heap->total_size = 0;
+ heap->socket_id = next_socket_id++;
+
+ /* set up name */
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
size_t align, bool contig);

+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
int
malloc_heap_free(struct malloc_elem *elem);

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b789333b3..ade5af406 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_launch.h>
@@ -286,3 +287,54 @@ rte_malloc_virt2iova(const void *addr)

return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int i, ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* check if there is space in the heap list, or if heap with this name
+ * already exists.
+ */
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+ /* existing heap */
+ if (strncmp(heap_name, tmp->name,
+ RTE_HEAP_NAME_MAX_LEN) == 0) {
+ RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+ heap_name);
+ rte_errno = EEXIST;
+ ret = -1;
+ goto unlock;
+ }
+ /* empty heap */
+ if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+ heap = tmp;
+ break;
+ }
+ }
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+ rte_errno = ENOSPC;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* we're sure that we can create a new heap, so do it */
+ ret = malloc_heap_create(heap, heap_name);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 6fd729b8b..c93dcf1a3 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -311,6 +311,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_create;
rte_malloc_heap_get_socket;
rte_mem_alloc_validator_register;
rte_mem_alloc_validator_unregister;
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:51 UTC
Permalink
Add simple unit tests to test external memory support.
The tests are pretty basic and mostly consist of checking
if invalid API calls are handled correctly, plus a simple
allocation/deallocation test for malloc and memzone.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 384 ++++++++++++++++++++++++++++++++++
4 files changed, 396 insertions(+), 4 deletions(-)
create mode 100644 test/test/test_external_mem.c

diff --git a/test/test/Makefile b/test/test/Makefile
index e6967bab6..074ac6e03 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c
SRCS-y += test_reciprocal_division.c
SRCS-y += test_reciprocal_division_perf.c
SRCS-y += test_fbarray.c
+SRCS-y += test_external_mem.c

SRCS-y += test_ring.c
SRCS-y += test_ring_perf.c
diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py
index f68d9b111..51f8e1689 100644
--- a/test/test/autotest_data.py
+++ b/test/test/autotest_data.py
@@ -477,10 +477,16 @@
"Report": None,
},
{
- "Name": "Fbarray autotest",
- "Command": "fbarray_autotest",
- "Func": default_autotest,
- "Report": None,
+ "Name": "Fbarray autotest",
+ "Command": "fbarray_autotest",
+ "Func": default_autotest,
+ "Report": None,
+ },
+ {
+ "Name": "External memory autotest",
+ "Command": "external_mem_autotest",
+ "Func": default_autotest,
+ "Report": None,
},
#
#Please always keep all dump tests at the end and together!
diff --git a/test/test/meson.build b/test/test/meson.build
index b1dd6eca2..3abf02b71 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -155,6 +155,7 @@ test_names = [
'eventdev_common_autotest',
'eventdev_octeontx_autotest',
'eventdev_sw_autotest',
+ 'external_mem_autotest',
'func_reentrancy_autotest',
'flow_classify_autotest',
'hash_scaling_autotest',
diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c
new file mode 100644
index 000000000..5edb5c348
--- /dev/null
+++ b/test/test/test_external_mem.c
@@ -0,0 +1,384 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+
+#include "test.h"
+
+#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */
+
+static int
+test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova,
+ int n_pages)
+{
+ static const char * const names[] = {
+ NULL, /* NULL name */
+ "", /* empty name */
+ "this heap name is definitely way too long to be valid"
+ };
+ const char *valid_name = "valid heap name";
+ unsigned int i;
+
+ /* check invalid name handling */
+ for (i = 0; i < RTE_DIM(names); i++) {
+ const char *name = names[i];
+
+ /* these calls may fail for other reasons, so check errno */
+ if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) {
+ printf("%s():%i: Created heap with invalid name\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) {
+ printf("%s():%i: Destroyed heap with invalid name\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ if (rte_malloc_heap_get_socket(name) >= 0 ||
+ rte_errno != EINVAL) {
+ printf("%s():%i: Found socket for heap with invalid name\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ if (rte_malloc_heap_memory_add(name, addr, len,
+ NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+ printf("%s():%i: Added memory to heap with invalid name\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 ||
+ rte_errno != EINVAL) {
+ printf("%s():%i: Removed memory from heap with invalid name\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 ||
+ rte_errno != EINVAL) {
+ printf("%s():%i: Attached memory to heap with invalid name\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (rte_malloc_heap_memory_detach(name, addr, len) >= 0 ||
+ rte_errno != EINVAL) {
+ printf("%s():%i: Detached memory from heap with invalid name\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ }
+
+ /* do same as above, but with a valid heap name */
+
+ /* skip create call */
+ if (rte_malloc_heap_destroy(valid_name) >= 0 || rte_errno != ENOENT) {
+ printf("%s():%i: Destroyed heap with invalid name\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (rte_malloc_heap_get_socket(valid_name) >= 0 ||
+ rte_errno != ENOENT) {
+ printf("%s():%i: Found socket for heap with invalid name\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* these calls may fail for other reasons, so check errno */
+ if (rte_malloc_heap_memory_add(valid_name, addr, len,
+ NULL, 0, pgsz) >= 0 || rte_errno != ENOENT) {
+ printf("%s():%i: Added memory to non-existent heap\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (rte_malloc_heap_memory_remove(valid_name, addr, len) >= 0 ||
+ rte_errno != ENOENT) {
+ printf("%s():%i: Removed memory from non-existent heap\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ if (rte_malloc_heap_memory_attach(valid_name, addr, len) >= 0 ||
+ rte_errno != ENOENT) {
+ printf("%s():%i: Attached memory to non-existent heap\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (rte_malloc_heap_memory_detach(valid_name, addr, len) >= 0 ||
+ rte_errno != ENOENT) {
+ printf("%s():%i: Detached memory from non-existent heap\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* create a valid heap but test other invalid parameters */
+ if (rte_malloc_heap_create(valid_name) != 0) {
+ printf("%s():%i: Failed to create valid heap\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* zero length */
+ if (rte_malloc_heap_memory_add(valid_name, addr, 0,
+ NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+ printf("%s():%i: Added memory with invalid parameters\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ if (rte_malloc_heap_memory_remove(valid_name, addr, 0) >= 0 ||
+ rte_errno != EINVAL) {
+ printf("%s():%i: Removed memory with invalid parameters\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ if (rte_malloc_heap_memory_attach(valid_name, addr, 0) >= 0 ||
+ rte_errno != EINVAL) {
+ printf("%s():%i: Attached memory with invalid parameters\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (rte_malloc_heap_memory_detach(valid_name, addr, 0) >= 0 ||
+ rte_errno != EINVAL) {
+ printf("%s():%i: Detached memory with invalid parameters\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* zero address */
+ if (rte_malloc_heap_memory_add(valid_name, NULL, len,
+ NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+ printf("%s():%i: Added memory with invalid parameters\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ if (rte_malloc_heap_memory_remove(valid_name, NULL, len) >= 0 ||
+ rte_errno != EINVAL) {
+ printf("%s():%i: Removed memory with invalid parameters\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ if (rte_malloc_heap_memory_attach(valid_name, NULL, len) >= 0 ||
+ rte_errno != EINVAL) {
+ printf("%s():%i: Attached memory with invalid parameters\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (rte_malloc_heap_memory_detach(valid_name, NULL, len) >= 0 ||
+ rte_errno != EINVAL) {
+ printf("%s():%i: Detached memory with invalid parameters\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* wrong page count */
+ if (rte_malloc_heap_memory_add(valid_name, addr, len,
+ iova, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+ printf("%s():%i: Added memory with invalid parameters\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (rte_malloc_heap_memory_add(valid_name, addr, len,
+ iova, n_pages - 1, pgsz) >= 0 || rte_errno != EINVAL) {
+ printf("%s():%i: Added memory with invalid parameters\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (rte_malloc_heap_memory_add(valid_name, addr, len,
+ iova, n_pages + 1, pgsz) >= 0 || rte_errno != EINVAL) {
+ printf("%s():%i: Added memory with invalid parameters\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* tests passed, destroy heap */
+ if (rte_malloc_heap_destroy(valid_name) != 0) {
+ printf("%s():%i: Failed to destroy valid heap\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ return 0;
+fail:
+ rte_malloc_heap_destroy(valid_name);
+ return -1;
+}
+
+static int
+test_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, int n_pages)
+{
+ const char *heap_name = "heap";
+ void *ptr = NULL;
+ int socket_id, i;
+ const struct rte_memzone *mz = NULL;
+
+ /* create heap */
+ if (rte_malloc_heap_create(heap_name) != 0) {
+ printf("%s():%i: Failed to create malloc heap\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* get socket ID corresponding to this heap */
+ socket_id = rte_malloc_heap_get_socket(heap_name);
+ if (socket_id < 0) {
+ printf("%s():%i: cannot find socket for external heap\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* heap is empty, so any allocation should fail */
+ ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+ if (ptr != NULL) {
+ printf("%s():%i: Allocated from empty heap\n", __func__,
+ __LINE__);
+ goto fail;
+ }
+
+ /* add memory to heap */
+ if (rte_malloc_heap_memory_add(heap_name, addr, len,
+ iova, n_pages, pgsz) != 0) {
+ printf("%s():%i: Failed to add memory to heap\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* check that we can get this memory from EAL now */
+ for (i = 0; i < n_pages; i++) {
+ const struct rte_memseg *ms;
+
+ ms = rte_mem_virt2memseg(RTE_PTR_ADD(addr, pgsz * i), NULL);
+ if (ms == NULL) {
+ printf("%s():%i: Failed to retrieve memseg for external mem\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (ms->iova != iova[i]) {
+ printf("%s():%i: IOVA mismatch\n", __func__, __LINE__);
+ goto fail;
+ }
+ }
+
+ /* allocate - this now should succeed */
+ ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+ if (ptr == NULL) {
+ printf("%s():%i: Failed to allocate from external heap\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* check if address is in expected range */
+ if (ptr < addr || ptr >= RTE_PTR_ADD(addr, len)) {
+ printf("%s():%i: Allocated from unexpected address space\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* we've allocated something - removing memory should fail */
+ if (rte_malloc_heap_memory_remove(heap_name, addr, len) >= 0 ||
+ rte_errno != EBUSY) {
+ printf("%s():%i: Removing memory succeeded when memory is not free\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (rte_malloc_heap_destroy(heap_name) >= 0 || rte_errno != EBUSY) {
+ printf("%s():%i: Destroying heap succeeded when memory is not free\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ /* try allocating an IOVA-contiguous memzone - this should succeed
+ * because we've set up a contiguous IOVA table.
+ */
+ mz = rte_memzone_reserve("heap_test", pgsz * 2, socket_id,
+ RTE_MEMZONE_IOVA_CONTIG);
+ if (mz == NULL) {
+ printf("%s():%i: Failed to reserve memzone\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ rte_malloc_dump_stats(stdout, NULL);
+ rte_malloc_dump_heaps(stdout);
+
+ /* free memory - removing it should now succeed */
+ rte_free(ptr);
+ ptr = NULL;
+
+ rte_memzone_free(mz);
+ mz = NULL;
+
+ if (rte_malloc_heap_memory_remove(heap_name, addr, len) != 0) {
+ printf("%s():%i: Removing memory from heap failed\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+ if (rte_malloc_heap_destroy(heap_name) != 0) {
+ printf("%s():%i: Destroying heap failed\n",
+ __func__, __LINE__);
+ goto fail;
+ }
+
+ return 0;
+fail:
+ rte_memzone_free(mz);
+ rte_free(ptr);
+ /* even if something failed, attempt to clean up */
+ rte_malloc_heap_memory_remove(heap_name, addr, len);
+ rte_malloc_heap_destroy(heap_name);
+
+ return -1;
+}
+
+/* we need to test attach/detach in secondary processes. */
+static int
+test_external_mem(void)
+{
+ size_t len = EXTERNAL_MEM_SZ;
+ size_t pgsz = RTE_PGSIZE_4K;
+ rte_iova_t iova[len / pgsz];
+ void *addr;
+ int ret, n_pages;
+
+ /* create external memory area */
+ n_pages = RTE_DIM(iova);
+ addr = mmap(NULL, len, PROT_WRITE | PROT_READ,
+ MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+ if (addr == MAP_FAILED) {
+ printf("%s():%i: Failed to create dummy memory area\n",
+ __func__, __LINE__);
+ return -1;
+ }
+ for (int i = 0; i < n_pages; i++) {
+ /* arbitrary IOVA */
+ rte_iova_t tmp = 0x100000000 + i * pgsz;
+ iova[i] = tmp;
+ }
+
+ ret = test_invalid_param(addr, len, pgsz, iova, n_pages);
+ ret |= test_basic(addr, len, pgsz, iova, n_pages);
+
+ munmap(addr, len);
+
+ return ret;
+}
+
+REGISTER_TEST_COMMAND(external_mem_autotest, test_external_mem);
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:47 UTC
Permalink
Add an API to add externally allocated memory to malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_eal/common/include/rte_malloc.h | 39 ++++++++++++
lib/librte_eal/common/malloc_heap.c | 74 ++++++++++++++++++++++
lib/librte_eal/common/malloc_heap.h | 4 ++
lib/librte_eal/common/rte_malloc.c | 51 +++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
5 files changed, 169 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8a8cc1e6d..47f867a05 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,45 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);

+/**
+ * Add memory chunk to a heap with specified name.
+ *
+ * @note Multiple memory chunks can be added to the same heap
+ *
+ * @note Memory must be previously allocated for DPDK to be able to use it as a
+ * malloc heap. Failing to do so will result in undefined behavior, up to and
+ * including segmentation faults.
+ *
+ * @note Calling this function will erase any contents already present at the
+ * supplied memory address.
+ *
+ * @param heap_name
+ * Name of the heap to add memory chunk to
+ * @param va_addr
+ * Start of virtual area to add to the heap
+ * @param len
+ * Length of virtual area to add to the heap
+ * @param iova_addrs
+ * Array of page IOVA addresses corresponding to each page in this memory
+ * area. Can be NULL, in which case page IOVA addresses will be set to
+ * RTE_BAD_IOVA.
+ * @param n_pages
+ * Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ * is NULL.
+ * @param page_sz
+ * Page size of the underlying memory
+ *
+ * @return
+ * - 0 on success
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - one of the parameters was invalid
+ * EPERM - attempted to add memory to a reserved heap
+ * ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+ rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
/**
* Creates a new empty malloc heap with a specified name.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 471094cd1..af2476504 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1010,6 +1010,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}

+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+ rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ char fbarray_name[RTE_FBARRAY_NAME_LEN];
+ struct rte_memseg_list *msl = NULL;
+ struct rte_fbarray *arr;
+ size_t seg_len = n_pages * page_sz;
+ unsigned int i;
+
+ /* first, find a free memseg list */
+ for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+ struct rte_memseg_list *tmp = &mcfg->memsegs[i];
+ if (tmp->base_va == NULL) {
+ msl = tmp;
+ break;
+ }
+ }
+ if (msl == NULL) {
+ RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
+ heap->name, va_addr);
+
+ /* create the backing fbarray */
+ if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
+ sizeof(struct rte_memseg)) < 0) {
+ RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
+ return -1;
+ }
+ arr = &msl->memseg_arr;
+
+ /* fbarray created, fill it up */
+ for (i = 0; i < n_pages; i++) {
+ struct rte_memseg *ms;
+
+ rte_fbarray_set_used(arr, i);
+ ms = rte_fbarray_get(arr, i);
+ ms->addr = RTE_PTR_ADD(va_addr, n_pages * page_sz);
+ ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i];
+ ms->hugepage_sz = page_sz;
+ ms->len = page_sz;
+ ms->nchannel = rte_memory_get_nchannel();
+ ms->nrank = rte_memory_get_nrank();
+ ms->socket_id = heap->socket_id;
+ }
+
+ /* set up the memseg list */
+ msl->base_va = va_addr;
+ msl->page_sz = page_sz;
+ msl->socket_id = heap->socket_id;
+ msl->len = seg_len;
+ msl->version = 0;
+ msl->external = true;
+
+ /* erase contents of new memory */
+ memset(va_addr, 0, seg_len);
+
+ /* now, add newly minted memory to the malloc heap */
+ malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+
+ heap->total_size += seg_len;
+
+ /* all done! */
+ RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
+ heap->name, va_addr);
+
+ return 0;
+}
+
int
malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
{
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 75278da3c..237ce9dc2 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,6 +39,10 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
int
malloc_heap_destroy(struct malloc_heap *heap);

+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+ rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
int
malloc_heap_free(struct malloc_elem *elem);

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index d135f9730..329524ac9 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -303,6 +303,57 @@ find_named_heap(const char *name)
return NULL;
}

+int
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+ rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ unsigned int n;
+ int ret;
+
+ if (heap_name == NULL || va_addr == NULL ||
+ page_sz == 0 || !rte_is_power_of_2(page_sz) ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ ret = -1;
+ goto unlock;
+ }
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ /* find our heap */
+ heap = find_named_heap(heap_name);
+ if (heap == NULL) {
+ rte_errno = ENOENT;
+ ret = -1;
+ goto unlock;
+ }
+ if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+ /* cannot add memory to internal heaps */
+ rte_errno = EPERM;
+ ret = -1;
+ goto unlock;
+ }
+ n = len / page_sz;
+ if (n != n_pages && iova_addrs != NULL) {
+ rte_errno = EINVAL;
+ ret = -1;
+ goto unlock;
+ }
+
+ rte_spinlock_lock(&heap->lock);
+ ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
+ page_sz);
+ rte_spinlock_unlock(&heap->lock);
+
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
+
int
rte_malloc_heap_create(const char *heap_name)
{
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 1d3ca0716..0d052d20a 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -314,6 +314,7 @@ EXPERIMENTAL {
rte_malloc_heap_create;
rte_malloc_heap_destroy;
rte_malloc_heap_get_socket;
+ rte_malloc_heap_memory_add;
rte_mem_alloc_validator_register;
rte_mem_alloc_validator_unregister;
rte_mem_event_callback_register;
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:42 UTC
Permalink
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_sched/rte_sched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9269e5c71..d4e2189c7 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params)
return -1;

/* socket */
- if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES))
+ if (params->socket < 0)
return -3;

/* rate */
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:46 UTC
Permalink
Add an API to destroy specified heap.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_eal/common/include/rte_malloc.h | 23 +++++++++
lib/librte_eal/common/malloc_heap.c | 22 ++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 58 ++++++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
5 files changed, 107 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 182afab1c..8a8cc1e6d 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket,
int __rte_experimental
rte_malloc_heap_create(const char *heap_name);

+/**
+ * Destroys a previously created malloc heap with specified name.
+ *
+ * @note This function will return a failure result if not all memory allocated
+ * from the heap has been freed back to the heap
+ *
+ * @note This function will return a failure result if not all memory segments
+ * were removed from the heap prior to its destruction
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on success
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * ENOENT - heap by the name of ``heap_name`` was not found
+ * EPERM - attempting to destroy reserved heap
+ * EBUSY - heap still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_destroy(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 2742f7b11..471094cd1 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1036,6 +1036,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
return 0;
}

+int
+malloc_heap_destroy(struct malloc_heap *heap)
+{
+ if (heap->alloc_count != 0) {
+ RTE_LOG(ERR, EAL, "Heap is still in use\n");
+ rte_errno = EBUSY;
+ return -1;
+ }
+ if (heap->first != NULL || heap->last != NULL) {
+ RTE_LOG(ERR, EAL, "Heap still contains memory segments\n");
+ rte_errno = EBUSY;
+ return -1;
+ }
+ if (heap->total_size != 0)
+ RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n");
+
+ /* after this, the lock will be dropped */
+ memset(heap, 0, sizeof(*heap));
+
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index eebee16dc..75278da3c 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
int
malloc_heap_create(struct malloc_heap *heap, const char *heap_name);

+int
+malloc_heap_destroy(struct malloc_heap *heap);
+
int
malloc_heap_free(struct malloc_elem *elem);

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index ade5af406..d135f9730 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -288,6 +288,21 @@ rte_malloc_virt2iova(const void *addr)
return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}

+static struct malloc_heap *
+find_named_heap(const char *name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+ if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN))
+ return heap;
+ }
+ return NULL;
+}
+
int
rte_malloc_heap_create(const char *heap_name)
{
@@ -338,3 +353,46 @@ rte_malloc_heap_create(const char *heap_name)

return ret;
}
+
+int
+rte_malloc_heap_destroy(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ /* start from non-socket heaps */
+ heap = find_named_heap(heap_name);
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name);
+ rte_errno = ENOENT;
+ ret = -1;
+ goto unlock;
+ }
+ /* we shouldn't be able to destroy internal heaps */
+ if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+ rte_errno = EPERM;
+ ret = -1;
+ goto unlock;
+ }
+ /* sanity checks done, now we can destroy the heap */
+ rte_spinlock_lock(&heap->lock);
+ ret = malloc_heap_destroy(heap);
+
+ /* if we failed, lock is still active */
+ if (ret < 0)
+ rte_spinlock_unlock(&heap->lock);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index c93dcf1a3..1d3ca0716 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -312,6 +312,7 @@ EXPERIMENTAL {
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
rte_malloc_heap_create;
+ rte_malloc_heap_destroy;
rte_malloc_heap_get_socket;
rte_mem_alloc_validator_register;
rte_mem_alloc_validator_unregister;
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:49 UTC
Permalink
In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_eal/common/include/rte_malloc.h | 32 +++++++++
lib/librte_eal/common/rte_malloc.c | 83 ++++++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
3 files changed, 116 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 9bbe8e3af..37af0e481 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket,
*
* @note Multiple memory chunks can be added to the same heap
*
+ * @note Before accessing this memory in other processes, it needs to be
+ * attached in each of those processes by calling
+ * ``rte_malloc_heap_memory_attach`` in each other process.
+ *
* @note Memory must be previously allocated for DPDK to be able to use it as a
* malloc heap. Failing to do so will result in undefined behavior, up to and
* including segmentation faults.
@@ -329,12 +333,38 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
int __rte_experimental
rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);

+/**
+ * Attach to an already existing chunk of external memory in another process.
+ *
+ * @note This function must be called before any attempt is made to use an
+ * already existing external memory chunk. This function does *not* need to
+ * be called if a call to ``rte_malloc_heap_memory_add`` was made in the
+ * current process.
+ *
+ * @param heap_name
+ * Heap name to which this chunk of memory belongs
+ * @param va_addr
+ * Start address of memory chunk to attach to
+ * @param len
+ * Length of memory chunk to attach to
+ * @return
+ * 0 on successful attach
+ * -1 on unsuccessful attach, with rte_errno set to indicate cause for error:
+ * EINVAL - one of the parameters was invalid
+ * EPERM - attempted to attach memory to a reserved heap
+ * ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
+
/**
* Creates a new empty malloc heap with a specified name.
*
* @note Heaps created via this call will automatically get assigned a unique
* socket ID, which can be found using ``rte_malloc_heap_get_socket()``
*
+ * @note This function has to only be called in one process.
+ *
* @param heap_name
* Name of the heap to create.
*
@@ -357,6 +387,8 @@ rte_malloc_heap_create(const char *heap_name);
* @note This function will return a failure result if not all memory segments
* were removed from the heap prior to its destruction
*
+ * @note This function has to only be called in one process.
+ *
* @param heap_name
* Name of the heap to create.
*
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5093c4a46..2ed173466 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -393,6 +393,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
return ret;
}

+struct sync_mem_walk_arg {
+ void *va_addr;
+ size_t len;
+ int result;
+};
+
+static int
+attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct sync_mem_walk_arg *wa = arg;
+ size_t len = msl->page_sz * msl->memseg_arr.len;
+
+ if (msl->base_va == wa->va_addr &&
+ len == wa->len) {
+ struct rte_memseg_list *found_msl;
+ int msl_idx, ret;
+
+ /* msl is const */
+ msl_idx = msl - mcfg->memsegs;
+ found_msl = &mcfg->memsegs[msl_idx];
+
+ ret = rte_fbarray_attach(&found_msl->memseg_arr);
+
+ if (ret < 0)
+ wa->result = -rte_errno;
+ else
+ wa->result = 0;
+ return 1;
+ }
+ return 0;
+}
+
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ struct sync_mem_walk_arg wa;
+ int ret;
+
+ if (heap_name == NULL || va_addr == NULL || len == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+ /* find our heap */
+ heap = find_named_heap(heap_name);
+ if (heap == NULL) {
+ rte_errno = ENOENT;
+ ret = -1;
+ goto unlock;
+ }
+ /* we shouldn't be able to attach to internal heaps */
+ if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+ rte_errno = EPERM;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* find corresponding memseg list to attach to */
+ wa.va_addr = va_addr;
+ wa.len = len;
+ wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+
+ /* we're already holding a read lock */
+ rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+
+ if (wa.result < 0) {
+ rte_errno = -wa.result;
+ ret = -1;
+ } else {
+ ret = 0;
+ }
+unlock:
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+ return ret;
+}
+
int
rte_malloc_heap_create(const char *heap_name)
{
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index f10c34130..822c5693a 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -315,6 +315,7 @@ EXPERIMENTAL {
rte_malloc_heap_destroy;
rte_malloc_heap_get_socket;
rte_malloc_heap_memory_add;
+ rte_malloc_heap_memory_attach;
rte_malloc_heap_memory_remove;
rte_mem_alloc_validator_register;
rte_mem_alloc_validator_unregister;
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:44 UTC
Permalink
When we will be creating external heaps, they will have their own
"fake" socket ID, so add a function that will map the heap name
to its socket ID.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_eal/common/include/rte_malloc.h | 14 ++++++++
lib/librte_eal/common/rte_malloc.c | 37 ++++++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
3 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index a9fb7e452..8870732a6 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,20 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);

+/**
+ * Find socket ID corresponding to a named heap.
+ *
+ * @param name
+ * Heap name to find socket ID for
+ * @return
+ * Socket ID in case of success (a non-negative number)
+ * -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``name`` was NULL
+ * ENOENT - heap identified by the name ``name`` was not found
+ */
+int __rte_experimental
+rte_malloc_heap_get_socket(const char *name);
+
/**
* Dump statistics.
*
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 0515d47f3..b789333b3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -8,6 +8,7 @@
#include <string.h>
#include <sys/queue.h>

+#include <rte_errno.h>
#include <rte_memcpy.h>
#include <rte_memory.h>
#include <rte_eal.h>
@@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f)
rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
}

+int
+rte_malloc_heap_get_socket(const char *name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ unsigned int idx;
+ int ret;
+
+ if (name == NULL ||
+ strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+ for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+ if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN)) {
+ heap = tmp;
+ break;
+ }
+ }
+
+ if (heap != NULL) {
+ ret = heap->socket_id;
+ } else {
+ rte_errno = ENOENT;
+ ret = -1;
+ }
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
+
/*
* Print stats on memory type. If type is NULL, info on all types is printed
*/
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 344a43d32..6fd729b8b 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -311,6 +311,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_get_socket;
rte_mem_alloc_validator_register;
rte_mem_alloc_validator_unregister;
rte_mem_event_callback_register;
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:48 UTC
Permalink
Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++
lib/librte_eal/common/malloc_heap.c | 54 ++++++++++++++++++++++
lib/librte_eal/common/malloc_heap.h | 4 ++
lib/librte_eal/common/rte_malloc.c | 39 ++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
5 files changed, 125 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 47f867a05..9bbe8e3af 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -302,6 +302,33 @@ int __rte_experimental
rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);

+/**
+ * Remove memory chunk from heap with specified name.
+ *
+ * @note Memory chunk being removed must be the same as one that was added;
+ * partially removing memory chunks is not supported
+ *
+ * @note Memory area must not contain any allocated elements to allow its
+ * removal from the heap
+ *
+ * @param heap_name
+ * Name of the heap to remove memory from
+ * @param va_addr
+ * Virtual address to remove from the heap
+ * @param len
+ * Length of virtual area to remove from the heap
+ *
+ * @return
+ * - 0 on success
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - one of the parameters was invalid
+ * EPERM - attempted to remove memory from a reserved heap
+ * ENOENT - heap or memory chunk was not found
+ * EBUSY - memory chunk still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
+
/**
* Creates a new empty malloc heap with a specified name.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index af2476504..7d1d4a290 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1010,6 +1010,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}

+static int
+destroy_seg(struct malloc_elem *elem, size_t len)
+{
+ struct malloc_heap *heap = elem->heap;
+ struct rte_memseg_list *msl;
+
+ msl = elem->msl;
+
+ /* this element can be removed */
+ malloc_elem_free_list_remove(elem);
+ malloc_elem_hide_region(elem, elem, len);
+
+ heap->total_size -= len;
+
+ memset(elem, 0, sizeof(*elem));
+
+ /* destroy the fbarray backing this memory */
+ if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+ return -1;
+
+ /* reset the memseg list */
+ memset(msl, 0, sizeof(*msl));
+
+ return 0;
+}
+
int
malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
@@ -1084,6 +1110,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
return 0;
}

+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+ size_t len)
+{
+ struct malloc_elem *elem = heap->first;
+
+ /* find element with specified va address */
+ while (elem != NULL && elem != va_addr) {
+ elem = elem->next;
+ /* stop if we've blown past our VA */
+ if (elem > (struct malloc_elem *)va_addr) {
+ rte_errno = ENOENT;
+ return -1;
+ }
+ }
+ /* check if element was found */
+ if (elem == NULL || elem->msl->len != len) {
+ rte_errno = ENOENT;
+ return -1;
+ }
+ /* if element's size is not equal to segment len, segment is busy */
+ if (elem->state == ELEM_BUSY || elem->size != len) {
+ rte_errno = EBUSY;
+ return -1;
+ }
+ return destroy_seg(elem, len);
+}
+
int
malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
{
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 237ce9dc2..e48996d52 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -43,6 +43,10 @@ int
malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);

+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+ size_t len);
+
int
malloc_heap_free(struct malloc_elem *elem);

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 329524ac9..5093c4a46 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -354,6 +354,45 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
return ret;
}

+int
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int ret;
+
+ if (heap_name == NULL || va_addr == NULL || len == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+ /* find our heap */
+ heap = find_named_heap(heap_name);
+ if (heap == NULL) {
+ rte_errno = ENOENT;
+ ret = -1;
+ goto unlock;
+ }
+ if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+ /* cannot remove memory from internal heaps */
+ rte_errno = EPERM;
+ ret = -1;
+ goto unlock;
+ }
+
+ rte_spinlock_lock(&heap->lock);
+ ret = malloc_heap_remove_external_memory(heap, va_addr, len);
+ rte_spinlock_unlock(&heap->lock);
+
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
+
int
rte_malloc_heap_create(const char *heap_name)
{
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 0d052d20a..f10c34130 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -315,6 +315,7 @@ EXPERIMENTAL {
rte_malloc_heap_destroy;
rte_malloc_heap_get_socket;
rte_malloc_heap_memory_add;
+ rte_malloc_heap_memory_remove;
rte_mem_alloc_validator_register;
rte_mem_alloc_validator_unregister;
rte_mem_event_callback_register;
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:38 UTC
Permalink
Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be order of their creation.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
config/common_base | 1 +
config/rte_config.h | 1 +
.../common/include/rte_eal_memconfig.h | 4 +-
.../common/include/rte_malloc_heap.h | 1 +
lib/librte_eal/common/malloc_heap.c | 85 +++++++++++++------
lib/librte_eal/common/malloc_heap.h | 3 +
lib/librte_eal/common/rte_malloc.c | 41 ++++++---
7 files changed, 94 insertions(+), 42 deletions(-)

diff --git a/config/common_base b/config/common_base
index 4bcbaf923..e96c52054 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
CONFIG_RTE_LIBRTE_EAL=y
CONFIG_RTE_MAX_LCORE=128
CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
CONFIG_RTE_MAX_MEMSEG_LISTS=64
# each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
# or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index a8e479774..1f330c24e 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -21,6 +21,7 @@
/****** library defines ********/

/* EAL defines */
+#define RTE_MAX_HEAPS 32
#define RTE_MAX_MEMSEG_LISTS 128
#define RTE_MAX_MEMSEG_PER_LIST 8192
#define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 76faf9a4a..5c6bd4bc3 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {

struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */

- /* Heaps of Malloc per socket */
- struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+ /* Heaps of Malloc */
+ struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];

/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..e7ac32d42 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -27,6 +27,7 @@ struct malloc_heap {

unsigned alloc_count;
size_t total_size;
+ unsigned int socket_id;
} __rte_cache_aligned;

#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 8c37b9d7c..0a868f61d 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
return check_flag & flags;
}

+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int i;
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+ if (heap->socket_id == socket_id)
+ return i;
+ }
+ return -1;
+}
+
/*
* Expand the heap with a memory area.
*/
@@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg_list *found_msl;
struct malloc_heap *heap;
- int msl_idx;
+ int msl_idx, heap_idx;

if (msl->external)
return 0;

- heap = &mcfg->malloc_heaps[msl->socket_id];
+ heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+ heap = &mcfg->malloc_heaps[heap_idx];

/* msl is const, so find it */
msl_idx = msl - mcfg->memsegs;
@@ -111,6 +127,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);

heap->total_size += len;
+ heap->socket_id = msl->socket_id;

RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -563,12 +580,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,

/* this will try lower page sizes first */
static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
- unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+ unsigned int heap_id, unsigned int flags, size_t align,
+ size_t bound, bool contig)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+ int socket_id;
void *ret;

rte_spinlock_lock(&(heap->lock));
@@ -590,8 +609,13 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
if (ret != NULL)
goto alloc_unlock;

- if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
- contig)) {
+ socket_id = rte_socket_id_by_idx(heap_id);
+ /* if socket ID is invalid, this is an external heap */
+ if (socket_id < 0)
+ goto alloc_unlock;
+
+ if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+ bound, contig)) {
ret = heap_alloc(heap, type, size, flags, align, bound, contig);

/* this should have succeeded */
@@ -607,7 +631,7 @@ void *
malloc_heap_alloc(const char *type, size_t size, int socket_arg,
unsigned int flags, size_t align, size_t bound, bool contig)
{
- int socket, i, cur_socket;
+ int socket, heap_id, i;
void *ret;

/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -622,22 +646,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
else
socket = socket_arg;

- /* Check socket parameter */
- if (socket >= RTE_MAX_NUMA_NODES)
+ /* turn socket ID into heap ID */
+ heap_id = malloc_socket_to_heap_id(socket);
+ /* if heap id is negative, socket ID was invalid */
+ if (heap_id < 0)
return NULL;

- ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
- contig);
+ ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+ bound, contig);
if (ret != NULL || socket_arg != SOCKET_ID_ANY)
return ret;

- /* try other heaps */
+ /* try other heaps. we are only iterating through native DPDK sockets,
+ * so external heaps won't be included.
+ */
for (i = 0; i < (int) rte_socket_count(); i++) {
- cur_socket = rte_socket_id_by_idx(i);
- if (cur_socket == socket)
+ if (i == heap_id)
continue;
- ret = heap_alloc_on_socket(type, size, cur_socket, flags,
- align, bound, contig);
+ ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+ bound, contig);
if (ret != NULL)
return ret;
}
@@ -645,11 +672,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
}

static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
- size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+ unsigned int flags, size_t align, bool contig)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
void *ret;

rte_spinlock_lock(&(heap->lock));
@@ -667,7 +694,7 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
size_t align, bool contig)
{
- int socket, i, cur_socket;
+ int socket, i, cur_socket, heap_id;
void *ret;

/* return NULL if align is not power-of-2 */
@@ -682,11 +709,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
else
socket = socket_arg;

- /* Check socket parameter */
- if (socket >= RTE_MAX_NUMA_NODES)
+ /* turn socket ID into heap ID */
+ heap_id = malloc_socket_to_heap_id(socket);
+ /* if heap id is negative, socket ID was invalid */
+ if (heap_id < 0)
return NULL;

- ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+ ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
contig);
if (ret != NULL || socket_arg != SOCKET_ID_ANY)
return ret;
@@ -696,8 +725,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
cur_socket = rte_socket_id_by_idx(i);
if (cur_socket == socket)
continue;
- ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
- align, contig);
+ ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+ contig);
if (ret != NULL)
return ret;
}
@@ -919,7 +948,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
}

/*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
*/
int
malloc_heap_get_stats(struct malloc_heap *heap,
@@ -957,7 +986,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
}

/*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
*/
void
malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
void
malloc_heap_dump(struct malloc_heap *heap, FILE *f);

+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
int
rte_eal_malloc_heap_init(void);

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..dfcdf380a 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int heap_idx, ret = -1;

- if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
- return -1;
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);

- return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+ heap_idx = malloc_socket_to_heap_id(socket);
+ if (heap_idx < 0)
+ goto unlock;
+
+ ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+ socket_stats);
+unlock:
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
}

/*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
unsigned int idx;

- for (idx = 0; idx < rte_socket_count(); idx++) {
- unsigned int socket = rte_socket_id_by_idx(idx);
- fprintf(f, "Heap on socket %i:\n", socket);
- malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+ for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+ fprintf(f, "Heap id: %u\n", idx);
+ malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
}

+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
}

/*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
void
rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
{
- unsigned int socket;
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int heap_id;
struct rte_malloc_socket_stats sock_stats;
+
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
/* Iterate through all initialised heaps */
- for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
- if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
- continue;
+ for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];

- fprintf(f, "Socket:%u\n", socket);
+ malloc_heap_get_stats(heap, &sock_stats);
+
+ fprintf(f, "Heap id:%u\n", heap_id);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
}
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
return;
}
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:40 UTC
Permalink
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_flow_classify/rte_flow_classify.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
index 4c3469da1..fb652a2b7 100644
--- a/lib/librte_flow_classify/rte_flow_classify.c
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params)
}

/* socket */
- if ((params->socket_id < 0) ||
- (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+ if (params->socket_id < 0) {
RTE_FLOW_CLASSIFY_LOG(ERR,
"%s: Incorrect value for parameter socket_id\n",
__func__);
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:43 UTC
Permalink
We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++
lib/librte_eal/common/malloc_heap.c | 15 ++++++++++++++-
lib/librte_eal/common/rte_malloc.c | 1 +
3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@

/* Number of free lists per heap, grouped by size. */
#define RTE_HEAP_NUM_FREELISTS 13
+#define RTE_HEAP_NAME_MAX_LEN 32

/* dummy definition, for pointers */
struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
unsigned alloc_count;
size_t total_size;
unsigned int socket_id;
+ char name[RTE_HEAP_NAME_MAX_LEN];
} __rte_cache_aligned;

#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 0a868f61d..813961f0c 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);

heap->total_size += len;
- heap->socket_id = msl->socket_id;

RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -1011,6 +1010,20 @@ int
rte_eal_malloc_heap_init(void)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ /* assign names to default DPDK heaps */
+ for (i = 0; i < rte_socket_count(); i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+ char heap_name[RTE_HEAP_NAME_MAX_LEN];
+ int socket_id = rte_socket_id_by_idx(i);
+
+ snprintf(heap_name, sizeof(heap_name) - 1,
+ "socket_%i", socket_id);
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ heap->socket_id = socket_id;
+ }
+

if (register_mp_requests()) {
RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 458c44ba6..0515d47f3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
malloc_heap_get_stats(heap, &sock_stats);

fprintf(f, "Heap id:%u\n", heap_id);
+ fprintf(f, "\tHeap name:%s\n", heap->name);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:50 UTC
Permalink
Add API to detach from existing chunk of external memory in a
process.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_eal/common/include/rte_malloc.h | 27 ++++++++++++++++++++++
lib/librte_eal/common/rte_malloc.c | 27 ++++++++++++++++++----
lib/librte_eal/rte_eal_version.map | 1 +
3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 37af0e481..0794f58e5 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
* @note Memory area must not contain any allocated elements to allow its
* removal from the heap
*
+ * @note All other processes must detach from the memory chunk prior to it being
+ * removed from the heap.
+ *
* @param heap_name
* Name of the heap to remove memory from
* @param va_addr
@@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
int __rte_experimental
rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);

+/**
+ * Detach from a chunk of external memory in secondary process.
+ *
+ * @note This function must be called in before any attempt is made to remove
+ * external memory from the heap in another process. This function does *not*
+ * need to be called if a call to ``rte_malloc_heap_memory_remove`` will be
+ * called in current process.
+ *
+ * @param heap_name
+ * Heap name to which this chunk of memory belongs
+ * @param va_addr
+ * Start address of memory chunk to attach to
+ * @param len
+ * Length of memory chunk to attach to
+ * @return
+ * 0 on successful detach
+ * -1 on unsuccessful detach, with rte_errno set to indicate cause for error:
+ * EINVAL - one of the parameters was invalid
+ * EPERM - attempted to detach memory from a reserved heap
+ * ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len);
+
/**
* Creates a new empty malloc heap with a specified name.
*
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 2ed173466..08571e5a0 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -397,10 +397,11 @@ struct sync_mem_walk_arg {
void *va_addr;
size_t len;
int result;
+ bool attach;
};

static int
-attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct sync_mem_walk_arg *wa = arg;
@@ -415,7 +416,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
msl_idx = msl - mcfg->memsegs;
found_msl = &mcfg->memsegs[msl_idx];

- ret = rte_fbarray_attach(&found_msl->memseg_arr);
+ if (wa->attach)
+ ret = rte_fbarray_attach(&found_msl->memseg_arr);
+ else
+ ret = rte_fbarray_detach(&found_msl->memseg_arr);

if (ret < 0)
wa->result = -rte_errno;
@@ -426,8 +430,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
return 0;
}

-int
-rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+static int
+sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct malloc_heap *heap = NULL;
@@ -461,9 +465,10 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
wa.va_addr = va_addr;
wa.len = len;
wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+ wa.attach = attach;

/* we're already holding a read lock */
- rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+ rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);

if (wa.result < 0) {
rte_errno = -wa.result;
@@ -476,6 +481,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
return ret;
}

+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+ return sync_memory(heap_name, va_addr, len, true);
+}
+
+int
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len)
+{
+ return sync_memory(heap_name, va_addr, len, false);
+}
+
int
rte_malloc_heap_create(const char *heap_name)
{
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 822c5693a..73fecb912 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -316,6 +316,7 @@ EXPERIMENTAL {
rte_malloc_heap_get_socket;
rte_malloc_heap_memory_add;
rte_malloc_heap_memory_attach;
+ rte_malloc_heap_memory_detach;
rte_malloc_heap_memory_remove;
rte_mem_alloc_validator_register;
rte_mem_alloc_validator_unregister;
--
2.17.1
Anatoly Burakov
2018-09-04 13:11:39 UTC
Permalink
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <***@intel.com>
---
lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
lib/librte_eal/common/rte_malloc.c | 4 ----
2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
return NULL;
}

- if ((socket_id != SOCKET_ID_ANY) &&
- (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+ if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
rte_errno = EINVAL;
return NULL;
}

- if (!rte_eal_has_hugepages())
+ /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+ * external heap.
+ */
+ if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
socket_id = SOCKET_ID_ANY;

contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index dfcdf380a..458c44ba6 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
if (!rte_eal_has_hugepages())
socket_arg = SOCKET_ID_ANY;

- /* Check socket parameter */
- if (socket_arg >= RTE_MAX_NUMA_NODES)
- return NULL;
-
return malloc_heap_alloc(type, size, socket_arg, 0,
align == 0 ? 1 : align, 0, false);
}
--
2.17.1
Shahaf Shuler
2018-09-13 07:44:15 UTC
Permalink
Hi Anatoly,

First thanks for the patchset, it is a great enhancement.

See question below.
Subject: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in
DPDK
This is a proposal to enable using externally allocated memory in DPDK.
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from externall heaps
The responsibility to ensure memory is accessible before using it is on the
shoulders of the user - there is no checking done with regards to validity of
the memory (nor could there be...).
That makes sense. However who should be in-charge of mapping this memory for dma access?
The user or internally be the PMD when encounter the first packet or while traversing the existing mempools?
The general approach is to create heap and add memory into it. For any other
process wishing to use the same memory, said memory must first be
attached (otherwise some things will not work).
A design decision was made to make multiprocess synchronization a manual
process. Due to underlying issues with attaching to fbarrays in secondary
processes, this design was deemed to be better because we don't want to
fail to create external heap in the primary because something in the
secondary has failed when in fact we may not eve have wanted this memory
to be accessible in the secondary in the first place.
Using external memory in multiprocess is *hard*, because not only memory
space needs to be preallocated, but it also needs to be attached in each
process to allow other processes to access the page table. The attach API call
may or may not succeed, depending on memory layout, for reasons similar to
other multiprocess failures. This is treated as a "known issue" for this release.
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
test: add unit tests for external memory support
config/common_base | 1 +
config/rte_config.h | 1 +
drivers/bus/fslmc/fslmc_vfio.c | 7 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx4/mlx4_mr.c | 3 +
drivers/net/mlx5/mlx5.c | 5 +-
drivers/net/mlx5/mlx5_mr.c | 3 +
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 9 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 6 +-
lib/librte_eal/common/include/rte_malloc.h | 181 +++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_heap.c | 287 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 383 ++++++++++++++++-
lib/librte_eal/linuxapp/eal/eal.c | 3 +
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 +-
lib/librte_eal/rte_eal_version.map | 7 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 31 +-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 384 ++++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
34 files changed, 1346 insertions(+), 84 deletions(-) create mode 100644
test/test/test_external_mem.c
--
2.17.1
Burakov, Anatoly
2018-09-17 10:07:17 UTC
Permalink
Post by Shahaf Shuler
Hi Anatoly,
First thanks for the patchset, it is a great enhancement.
See question below.
Subject: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in
DPDK
This is a proposal to enable using externally allocated memory in DPDK.
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from externall heaps
The responsibility to ensure memory is accessible before using it is on the
shoulders of the user - there is no checking done with regards to validity of
the memory (nor could there be...).
That makes sense. However who should be in-charge of mapping this memory for dma access?
The user or internally be the PMD when encounter the first packet or while traversing the existing mempools?
Hi Shahaf,

There are two ways this can be solved. The first way is to perform VFIO
mapping automatically on adding/attaching memory. The second is to force
user to do it manually. For now, the latter is chosen because user knows
best if they intend to do DMA on that memory, but i'm open to suggestions.

There is an issue with some devices and buses (i.e. bus/fslmc) bypassing
EAL VFIO infrastructure and performing their own VFIO/DMA mapping magic,
but solving that problem is outside the scope of this patchset. Those
devices/buses should fix themselves :)

When not using VFIO, it's out of our hands anyway.
--
Thanks,
Anatoly
Shahaf Shuler
2018-09-17 12:16:40 UTC
Permalink
Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory
in DPDK
[...]
Post by Shahaf Shuler
Post by Anatoly Burakov
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
That makes sense. However who should be in-charge of mapping this
memory for dma access?
Post by Shahaf Shuler
The user or internally be the PMD when encounter the first packet or while
traversing the existing mempools?
Hi Shahaf,
There are two ways this can be solved. The first way is to perform VFIO
mapping automatically on adding/attaching memory. The second is to force
user to do it manually. For now, the latter is chosen because user knows best
if they intend to do DMA on that memory, but i'm open to suggestions.
I agree with that approach, and will add not only if the mempool is for dma or not but also which ports will use this mempool (this can effect on the mapping).
However I don't think this is generic enough to use only VFIO. As you said, there are some devices not using VFIO for mapping rather some proprietary driver utility.
IMO DPDK should introduce generic and device agnostic APIs to the user.

My suggestion is instead of doing vfio_dma_map that or vfio_dma_unmap that have a generic dma_map(uint8_t port, address, len). Each driver will register with its own mapping callback (can be vfio_dma_map).
It can be outside of this series, just wondering the people opinion on such approach.
There is an issue with some devices and buses (i.e. bus/fslmc) bypassing EAL
VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
solving that problem is outside the scope of this patchset. Those
devices/buses should fix themselves :)
When not using VFIO, it's out of our hands anyway.
Why?
VFIO is not a must requirement for devices in DPDK
Burakov, Anatoly
2018-09-17 13:00:20 UTC
Permalink
Post by Shahaf Shuler
Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory
in DPDK
[...]
Post by Shahaf Shuler
Post by Anatoly Burakov
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
That makes sense. However who should be in-charge of mapping this
memory for dma access?
Post by Shahaf Shuler
The user or internally be the PMD when encounter the first packet or while
traversing the existing mempools?
Hi Shahaf,
There are two ways this can be solved. The first way is to perform VFIO
mapping automatically on adding/attaching memory. The second is to force
user to do it manually. For now, the latter is chosen because user knows best
if they intend to do DMA on that memory, but i'm open to suggestions.
I agree with that approach, and will add not only if the mempool is for dma or not but also which ports will use this mempool (this can effect on the mapping).
That is perhaps too hardware-specific - this should probably be handled
inside the driver callbacks.
Post by Shahaf Shuler
However I don't think this is generic enough to use only VFIO. As you said, there are some devices not using VFIO for mapping rather some proprietary driver utility.
IMO DPDK should introduce generic and device agnostic APIs to the user.
My suggestion is instead of doing vfio_dma_map that or vfio_dma_unmap that have a generic dma_map(uint8_t port, address, len). Each driver will register with its own mapping callback (can be vfio_dma_map).
It can be outside of this series, just wondering the people opinion on such approach.
I don't disagree. I don't like bus/net/etc drivers doing their own thing
with regards to mapping, and i would by far prefer generic way to set up
DMA maps, to which VFIO will be a subscriber.
Post by Shahaf Shuler
There is an issue with some devices and buses (i.e. bus/fslmc) bypassing EAL
VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
solving that problem is outside the scope of this patchset. Those
devices/buses should fix themselves :)
When not using VFIO, it's out of our hands anyway.
Why?
VFIO is not a must requirement for devices in DPDK.
When i say "out of our hands", what i mean to say is, currently as far
as EAL API is concerned, there is no DMA mapping outside of VFIO.
Post by Shahaf Shuler
--
Thanks,
Anatoly
--
Thanks,
Anatoly
Shreyansh Jain
2018-09-18 12:29:56 UTC
Permalink
Post by Burakov, Anatoly
Post by Shahaf Shuler
Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory
in DPDK
[...]
Post by Shahaf Shuler
Post by Anatoly Burakov
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
That makes sense. However who should be in-charge of mapping this
memory for dma access?
Post by Shahaf Shuler
The user or internally be the PMD when encounter the first packet or while
traversing the existing mempools?
Hi Shahaf,
There are two ways this can be solved. The first way is to perform VFIO
mapping automatically on adding/attaching memory. The second is to force
user to do it manually. For now, the latter is chosen because user knows best
if they intend to do DMA on that memory, but i'm open to suggestions.
I agree with that approach, and will add not only if the mempool is
for dma or not but also which ports will use this mempool (this can
effect on the mapping).
That is perhaps too hardware-specific - this should probably be handled
inside the driver callbacks.
Post by Shahaf Shuler
However I don't think this is generic enough to use only VFIO. As you
said, there are some devices not using VFIO for mapping rather some
proprietary driver utility.
IMO DPDK should introduce generic and device agnostic APIs to the user.
My suggestion is instead of doing vfio_dma_map that or vfio_dma_unmap
that have a generic dma_map(uint8_t port, address, len). Each driver
will register with its own mapping callback (can be vfio_dma_map).
It can be outside of this series, just wondering the people opinion on such approach.
I don't disagree. I don't like bus/net/etc drivers doing their own thing
with regards to mapping, and i would by far prefer generic way to set up
DMA maps, to which VFIO will be a subscriber.
Post by Shahaf Shuler
There is an issue with some devices and buses (i.e. bus/fslmc) bypassing EAL
VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
solving that problem is outside the scope of this patchset. Those
devices/buses should fix themselves :)
DMA mapping is a very common principle and can be easily be a candidate
for lets-make-generic-movement, but, being close to hardware (or
hardware specific), it does require the driver to have some flexibility
in terms of its eventual implementation.

I maintain one of those drivers (bus/fslmc) in DPDK which needs to have
special VFIO layer - and from that experience, I can say that VFIO
mapping does require some flexibility. SoC semantics are sometimes too
complex to pin to general-universally-agreed-standard concept. (or, one
can easily call it a 'bug', while it is a 'feature' for others :D)

In fact, NXP has another driver (bus/dpaa) which doesn't even work with
VFIO - loves to work directly with Phys_addr. And, it is not at a lower
priority than one with VFIO.

Thus, I really don't think a strongly controlled VFIO mapping should be
EAL's responsibility. Failure because of lack of mapping is a driver's
problem.
Post by Burakov, Anatoly
Post by Shahaf Shuler
When not using VFIO, it's out of our hands anyway.
Why?
VFIO is not a must requirement for devices in DPDK.
When i say "out of our hands", what i mean to say is, currently as far
as EAL API is concerned, there is no DMA mapping outside of VFIO.
Post by Shahaf Shuler
--
Thanks,
Anatoly
Burakov, Anatoly
2018-09-18 15:15:17 UTC
Permalink
Post by Shreyansh Jain
Post by Burakov, Anatoly
Post by Shahaf Shuler
Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory
in DPDK
[...]
Post by Shahaf Shuler
Post by Anatoly Burakov
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
That makes sense. However who should be in-charge of mapping this
memory for dma access?
Post by Shahaf Shuler
The user or internally be the PMD when encounter the first packet or while
traversing the existing mempools?
Hi Shahaf,
There are two ways this can be solved. The first way is to perform VFIO
mapping automatically on adding/attaching memory. The second is to force
user to do it manually. For now, the latter is chosen because user knows best
if they intend to do DMA on that memory, but i'm open to suggestions.
I agree with that approach, and will add not only if the mempool is
for dma or not but also which ports will use this mempool (this can
effect on the mapping).
That is perhaps too hardware-specific - this should probably be
handled inside the driver callbacks.
Post by Shahaf Shuler
However I don't think this is generic enough to use only VFIO. As you
said, there are some devices not using VFIO for mapping rather some
proprietary driver utility.
IMO DPDK should introduce generic and device agnostic APIs to the user.
My suggestion is instead of doing vfio_dma_map that or vfio_dma_unmap
that have a generic dma_map(uint8_t port, address, len). Each driver
will register with its own mapping callback (can be vfio_dma_map).
It can be outside of this series, just wondering the people opinion on such approach.
I don't disagree. I don't like bus/net/etc drivers doing their own
thing with regards to mapping, and i would by far prefer generic way
to set up DMA maps, to which VFIO will be a subscriber.
Post by Shahaf Shuler
There is an issue with some devices and buses (i.e. bus/fslmc) bypassing EAL
VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
solving that problem is outside the scope of this patchset. Those
devices/buses should fix themselves :)
DMA mapping is a very common principle and can be easily be a candidate
for lets-make-generic-movement, but, being close to hardware (or
hardware specific), it does require the driver to have some flexibility
in terms of its eventual implementation.
Perhaps i didn't word my response clearly enough. I didn't mean to say
(or imply) that EAL must handle all DMA mappings itself. Rather, EAL
should provide a generic infrastructure of maintaining current mappings
etc., and provide a subscription mechanism for other users (e.g.
drivers) so that the details of implementation of exactly how to map
things for DMA is up to the drivers.

In other words, we agree :)
Post by Shreyansh Jain
I maintain one of those drivers (bus/fslmc) in DPDK which needs to have
special VFIO layer - and from that experience, I can say that VFIO
mapping does require some flexibility. SoC semantics are sometimes too
complex to pin to general-universally-agreed-standard concept. (or, one
can easily call it a 'bug', while it is a 'feature' for others :D)
In fact, NXP has another driver (bus/dpaa) which doesn't even work with
VFIO - loves to work directly with Phys_addr. And, it is not at a lower
priority than one with VFIO.
Thus, I really don't think a strongly controlled VFIO mapping should be
EAL's responsibility. Failure because of lack of mapping is a driver's
problem.
While EAL doesn't necessarily need to be involved with mapping things
for VFIO, i believe it does need to be the authority on what gets
mapped. The user needs a way to make arbitrary memory available for DMA
- this is where EAL comes in. VFIO itself can be factored out into a
separate subsystem (DMA drivers, anyone? :D ), but given that memory
cometh and goeth (external memory included), and given that some things
tend to be a bit complicated [*], EAL needs to know when something is
supposed to be mapped or unmapped, and when to notify subscribers that
they may have to refresh their DMA maps.

[*] for example, VFIO can only do mappings whenever there are devices
actually attached to a VFIO container, so we have to maintain all maps
between hotplug events to ensure that memory set up for DMA doesn't
silently get unmapped on device detach and subsequent attach.
--
Thanks,
Anatoly
Andrew Rybchenko
2018-09-20 09:30:40 UTC
Permalink
Post by Anatoly Burakov
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools is a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
A couple of minor questions/suggestions below, but it is OK to
go as is even if rejected.

Acked-by: Andrew Rybchenko <***@solarflare.com>

<...>
Post by Anatoly Burakov
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..d61c77da3 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ valid = msl->socket_id == wa->socket_id;
Is it intended that we accept externally allocated segment
if it is on requested socket? If so, it would be good to add
comment to explain why.
Post by Anatoly Burakov
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (!valid)
+ return 0;
+
+ if (msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
I'd suggest to keep single return (it is just a bit shorter)
if (valid && msl->page_sz < wa->min)
         wa->min = msl->page_sz;

<...>
Burakov, Anatoly
2018-09-20 09:54:20 UTC
Permalink
Post by Andrew Rybchenko
Post by Anatoly Burakov
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools is a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
A couple of minor questions/suggestions below, but it is OK to
go as is even if rejected.
<...>
Post by Anatoly Burakov
diff --git a/lib/librte_mempool/rte_mempool.c
b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..d61c77da3 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size)
      return new_obj_size * RTE_MEMPOOL_ALIGN;
  }
+struct pagesz_walk_arg {
+    int socket_id;
+    size_t min;
+};
+
  static int
  find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
  {
-    size_t *min = arg;
+    struct pagesz_walk_arg *wa = arg;
+    bool valid;
-    if (msl->page_sz < *min)
-        *min = msl->page_sz;
+    valid = msl->socket_id == wa->socket_id;
Is it intended that we accept externally allocated segment
if it is on requested socket? If so, it would be good to add
comment to explain why.
Accepting externally allocated segments is precisely the point here - we
want to find page size of underlying memory, regardless of whether it's
internal or external. We use socket ID to identify valid page sizes for
a particular heap (since socket ID is technically a heap identifier, as
far as external code is concerned), but within that heap there can be
multiple segment lists corresponding to that socket ID, each with its
own page size.
Post by Andrew Rybchenko
Post by Anatoly Burakov
+    valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+    if (!valid)
+        return 0;
+
+    if (msl->page_sz < wa->min)
+        wa->min = msl->page_sz;
I'd suggest to keep single return (it is just a bit shorter)
if (valid && msl->page_sz < wa->min)
         wa->min = msl->page_sz;
Sure. If there will be other comments that warrant a v3 respin, i'll
incorporate this feedback :)

Thanks for the review!
Post by Andrew Rybchenko
<...>
--
Thanks,
Anatoly
Continue reading on narkive:
Loading...