[dpdk-dev] [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint
Anatoly Burakov
2018-06-01 17:15:09 UTC
This patchset adds a new command-line option, "--in-memory",
which builds on the old debug options "--huge-unlink" and
"--no-shconf" and extends them with additional
functionality. It allows DPDK to reserve hugepages
anonymously instead of using hugetlbfs mountpoints. Because
this option also effectively enables both "--no-shconf" and
"--huge-unlink" modes, DPDK will be able to run entirely in
memory and not create any shared files while running:
neither hugepage files nor any runtime data.

This will, of course, disable secondary processes, but for
the use cases this is targeted at (containers etc.), that is
not a problem.

Older revisions required kernel 4.14+ and a fairly new
glibc. Because this revision uses mmap() instead of memfd,
the minimum supported kernel version has dropped to 3.8.

RFC->v1 changes:
- Dropped memfd, using anonymous mmap() instead
- Do not deprecate old command-line parameters, instead
use them as they are, and add a deprecation notice to
remove them in the next release.

Anatoly Burakov (9):
fbarray: support no-shconf mode
ipc: add support for no-shconf mode
eal: add support for no-shconf for hugepage info
eal: add support for no-shconf in hugepage data file
eal: do not create runtime dir in no-shconf mode
mem: add support for hugepage-unlink mode
eal: add --in-memory option
doc: add deprecation notice for EAL command line options
mem: support in-memory mode

doc/guides/rel_notes/deprecation.rst | 5 +
lib/librte_eal/bsdapp/eal/eal.c | 3 +-
lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 4 +
lib/librte_eal/common/eal_common_fbarray.c | 71 +++++----
lib/librte_eal/common/eal_common_options.c | 21 ++-
lib/librte_eal/common/eal_common_proc.c | 25 ++++
lib/librte_eal/common/eal_internal_cfg.h | 4 +
lib/librte_eal/common/eal_options.h | 2 +
lib/librte_eal/linuxapp/eal/eal.c | 3 +-
.../linuxapp/eal/eal_hugepage_info.c | 95 ++++++++----
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 140 ++++++++++++------
lib/librte_eal/linuxapp/eal/eal_memory.c | 16 +-
12 files changed, 272 insertions(+), 117 deletions(-)
--
2.17.0
Anatoly Burakov
2018-06-01 17:15:10 UTC
When using the --no-shconf option, the expectation is that multiprocess
will not be supported, as no shared files are created. However, fbarray
still creates some shared files that prevent multiple processes with the
same prefix from starting.

Fix this by not creating shared files whenever the --no-shconf option is
specified. Since virtual areas we get from eal_get_virtual_area() are
read-only, remap them as writable.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only
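
The remap step described in the commit message can be tried in isolation.
This is a minimal sketch, not the fbarray code: reserve_area() and
remap_writable() are hypothetical helpers, and a plain read-only anonymous
mapping stands in for whatever eal_get_virtual_area() returns.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Reserve an address range read-only. Hypothetical helper standing in
 * for a read-only virtual-area reservation; not a DPDK function.
 */
static void *reserve_area(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Replace the reservation in place with a writable anonymous mapping.
 * MAP_FIXED makes the kernel reuse exactly the same address range.
 * Returns 0 on success, -1 on failure.
 */
static int remap_writable(void *area, size_t len)
{
	void *p = mmap(area, len, PROT_READ | PROT_WRITE,
			MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	memset(p, 0, len);	/* writing is allowed from here on */
	return 0;
}
```

Because no file backs the mapping, nothing is left behind for a second
process with the same prefix to collide with.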

lib/librte_eal/common/eal_common_fbarray.c | 71 +++++++++++++---------
1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 019f84c18..2c8b2c218 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -434,39 +434,52 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
if (data == NULL)
goto fail;

- eal_get_fbarray_path(path, sizeof(path), name);
+ if (internal_config.no_shconf) {
+ /* remap virtual area as writable */
+ void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE,
+ MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (new_data == MAP_FAILED) {
+ RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n",
+ __func__, strerror(errno));
+ goto fail;
+ }
+ } else {
+ eal_get_fbarray_path(path, sizeof(path), name);

- /*
- * Each fbarray is unique to process namespace, i.e. the filename
- * depends on process prefix. Try to take out a lock and see if we
- * succeed. If we don't, someone else is using it already.
- */
- fd = open(path, O_CREAT | O_RDWR, 0600);
- if (fd < 0) {
- RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", __func__,
- path, strerror(errno));
- rte_errno = errno;
- goto fail;
- } else if (flock(fd, LOCK_EX | LOCK_NB)) {
- RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__,
- path, strerror(errno));
- rte_errno = EBUSY;
- goto fail;
- }
+ /*
+ * Each fbarray is unique to process namespace, i.e. the
+ * filename depends on process prefix. Try to take out a lock
+ * and see if we succeed. If we don't, someone else is using it
+ * already.
+ */
+ fd = open(path, O_CREAT | O_RDWR, 0600);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n",
+ __func__, path, strerror(errno));
+ rte_errno = errno;
+ goto fail;
+ } else if (flock(fd, LOCK_EX | LOCK_NB)) {
+ RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n",
+ __func__, path, strerror(errno));
+ rte_errno = EBUSY;
+ goto fail;
+ }

- /* take out a non-exclusive lock, so that other processes could still
- * attach to it, but no other process could reinitialize it.
- */
- if (flock(fd, LOCK_SH | LOCK_NB)) {
- rte_errno = errno;
- goto fail;
- }
+ /* take out a non-exclusive lock, so that other processes could
+ * still attach to it, but no other process could reinitialize
+ * it.
+ */
+ if (flock(fd, LOCK_SH | LOCK_NB)) {
+ rte_errno = errno;
+ goto fail;
+ }

- if (resize_and_map(fd, data, mmap_len))
- goto fail;
+ if (resize_and_map(fd, data, mmap_len))
+ goto fail;

- /* we've mmap'ed the file, we can now close the fd */
- close(fd);
+ /* we've mmap'ed the file, we can now close the fd */
+ close(fd);
+ }

/* initialize the data */
memset(data, 0, mmap_len);
--
2.17.0
Anatoly Burakov
2018-06-01 17:15:11 UTC
IPC is an inter-process communication mechanism. Since no secondaries
can ever be expected to run in no-shconf mode, IPC will be useless, so
do not enable it in the first place. In the interests of API usage
convenience, we will still allow registering callbacks, but obviously
they won't ever be triggered.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only
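
The early-return pattern from the commit message, sketched outside the EAL.
The structure and function names are hypothetical stand-ins; only the shape
of the check is taken from the patch. Zeroing the reply count is a choice
made for this sketch.

```c
#include <stdio.h>

/* Hypothetical stand-in for the relevant bits of the EAL configuration. */
struct ipc_sketch_config {
	int no_shconf;		/* non-zero: no shared files, so no secondaries */
	int ipc_started;	/* set once the channel has really been set up */
};

/* Returns 0 on success. In no-shconf mode it succeeds without doing work. */
static int ipc_sketch_init(struct ipc_sketch_config *cfg)
{
	if (cfg->no_shconf) {
		fprintf(stderr, "no shared files mode, IPC disabled\n");
		return 0;
	}
	/* a real implementation would create its socket and threads here */
	cfg->ipc_started = 1;
	return 0;
}

/*
 * Returns 0 on success, -1 on error. With IPC disabled there is nobody
 * to talk to, so the call succeeds and reports zero replies.
 */
static int ipc_sketch_request(const struct ipc_sketch_config *cfg,
		int *nb_replies)
{
	*nb_replies = 0;
	if (cfg->no_shconf)
		return 0;
	if (!cfg->ipc_started)
		return -1;
	/* a real implementation would send the request and wait here */
	return 0;
}
```

Callers therefore need no special case for no-shconf mode: the calls
succeed, they just never produce a reply.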

lib/librte_eal/common/eal_common_proc.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index 707d8ab30..31b5394cc 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -626,6 +626,14 @@ rte_mp_channel_init(void)
int dir_fd;
pthread_t mp_handle_tid, async_reply_handle_tid;

+ /* in no shared files mode, we do not have secondary processes support,
+ * so no need to initialize IPC.
+ */
+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC will be disabled\n");
+ return 0;
+ }
+
/* create filter path */
create_socket_path("*", path, sizeof(path));
strlcpy(mp_filter, basename(path), sizeof(mp_filter));
@@ -988,6 +996,12 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,

if (check_input(req) == false)
return -1;
+
+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+ return 0;
+ }
+
if (gettimeofday(&now, NULL) < 0) {
RTE_LOG(ERR, EAL, "Faile to get current time\n");
rte_errno = errno;
@@ -1072,6 +1086,12 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,

if (check_input(req) == false)
return -1;
+
+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+ return 0;
+ }
+
if (gettimeofday(&now, NULL) < 0) {
RTE_LOG(ERR, EAL, "Faile to get current time\n");
rte_errno = errno;
@@ -1213,5 +1233,10 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
return -1;
}

+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+ return 0;
+ }
+
return mp_send(msg, peer, MP_REP);
}
--
2.17.0
Anatoly Burakov
2018-06-01 17:15:13 UTC
Do not create a shared hugepage data file if we were asked to
not create any shared files.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only
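
A standalone sketch of the two paths, anonymous versus file-backed.
create_region() and its no_shared_files parameter are hypothetical; the
real function reads the flag from internal_config instead.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns a zero-filled region of mem_size bytes,
 * or NULL on failure. With no_shared_files set, the filesystem is never
 * touched and the memory is private to the calling process.
 */
static void *create_region(const char *filename, size_t mem_size,
		int no_shared_files)
{
	void *p;
	int fd;

	if (no_shared_files) {
		p = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		return p == MAP_FAILED ? NULL : p;
	}

	fd = open(filename, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)mem_size) < 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after the fd is closed */
	return p == MAP_FAILED ? NULL : p;
}
```

In the anonymous case the filename is ignored entirely, so the caller can
keep passing the usual path without any file being created.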

lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index c917de1c2..cb784e1c3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -521,7 +521,18 @@ static void *
create_shared_memory(const char *filename, const size_t mem_size)
{
void *retval;
- int fd = open(filename, O_CREAT | O_RDWR, 0666);
+ int fd;
+
+ /* if no shared files mode is used, create anonymous memory instead */
+ if (internal_config.no_shconf) {
+ retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (retval == MAP_FAILED)
+ return NULL;
+ return retval;
+ }
+
+ fd = open(filename, O_CREAT | O_RDWR, 0666);
if (fd < 0)
return NULL;
if (ftruncate(fd, mem_size) < 0) {
--
2.17.0
Anatoly Burakov
2018-06-01 17:15:12 UTC
Do not create any shared hugepage size info files if we were
asked to not create any shared files.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 4 ++++
lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
index 836feb672..1e8f5df23 100644
--- a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
@@ -101,6 +101,10 @@ eal_hugepage_info_init(void)
hpi->num_pages[0] = num_buffers;
hpi->lock_descriptor = fd;

+ /* for no shared files mode, do not create shared memory config */
+ if (internal_config.no_shconf)
+ return 0;
+
tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
sizeof(internal_config.hugepage_info));
if (tmp_hpi == NULL ) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7eca711ba..7f8e2fd9c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -446,6 +446,10 @@ eal_hugepage_info_init(void)
if (hugepage_info_init() < 0)
return -1;

+ /* for no shared files mode, we're done */
+ if (internal_config.no_shconf)
+ return 0;
+
hpi = &internal_config.hugepage_info[0];

tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
--
2.17.0
Anatoly Burakov
2018-06-01 17:15:15 UTC
Unlink hugepages after creating them, to honor the hugepage-unlink mode.
We cannot resize non-existing files, so make single file segments
explicitly unsupported.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --huge-unlink only
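
The create, size, unlink, map sequence can be sketched outside the EAL.
Assumptions: map_and_unlink() is a hypothetical helper, and a regular
temporary file created by mkstemp() stands in for a hugetlbfs file.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a file from a mkstemp() template, size it, unlink it and map
 * it shared. Returns the mapping, or NULL on failure. path_template is
 * rewritten in place by mkstemp() with the name that was used.
 */
static void *map_and_unlink(char *path_template, size_t len)
{
	void *p;
	int fd = mkstemp(path_template);

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) < 0) {
		unlink(path_template);
		close(fd);
		return NULL;
	}
	/* the name disappears, the open file lives on until unmapped */
	if (unlink(path_template) < 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	return p == MAP_FAILED ? NULL : p;
}
```

Once the name is gone, the file can no longer be reopened by path, which
is the situation the commit message refers to when it rules out single
file segments.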

lib/librte_eal/linuxapp/eal/eal_memalloc.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 8c11f98c9..f1b6d9744 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -512,6 +512,13 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
__func__, strerror(errno));
goto resized;
}
+ if (internal_config.hugepage_unlink) {
+ if (unlink(path)) {
+ RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+ __func__, strerror(errno));
+ goto resized;
+ }
+ }
}

/*
@@ -592,7 +599,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
/* ignore failure, can't make it any worse */
} else {
/* only remove file if we can take out a write lock */
- if (lock(fd, LOCK_EX) == 1)
+ if (internal_config.hugepage_unlink == 0 &&
+ lock(fd, LOCK_EX) == 1)
unlink(path);
close(fd);
}
@@ -617,6 +625,12 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
return -1;
}

+ /* if we've already unlinked the page, nothing needs to be done */
+ if (internal_config.hugepage_unlink) {
+ memset(ms, 0, sizeof(*ms));
+ return 0;
+ }
+
/* if we are not in single file segments mode, we're going to unmap the
* segment and thus drop the lock on original fd, but hugepage dir is
* now locked so we can take out another one without races.
--
2.17.0
Anatoly Burakov
2018-06-01 17:15:14 UTC
Now that the rest of the EAL is adjusted to not create any shared
files, prevent runtime directory from ever being created.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/bsdapp/eal/eal.c | 3 ++-
lib/librte_eal/linuxapp/eal/eal.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index dc279542d..13b6f8ae1 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -601,7 +601,8 @@ rte_eal_init(int argc, char **argv)
}

/* create runtime data directory */
- if (eal_create_runtime_dir() < 0) {
+ if (internal_config.no_shconf == 0 &&
+ eal_create_runtime_dir() < 0) {
rte_eal_init_alert("Cannot create runtime directory\n");
rte_errno = EACCES;
return -1;
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 8655b8691..a8d291520 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -818,7 +818,8 @@ rte_eal_init(int argc, char **argv)
}

/* create runtime data directory */
- if (eal_create_runtime_dir() < 0) {
+ if (internal_config.no_shconf == 0 &&
+ eal_create_runtime_dir() < 0) {
rte_eal_init_alert("Cannot create runtime directory\n");
rte_errno = EACCES;
return -1;
--
2.17.0
Anatoly Burakov
2018-06-01 17:15:16 UTC
This command-line option will cause DPDK to operate entirely in
memory and not create any shared files at runtime, including any
shared configuration or hugetlbfs files. This is useful for debug
purposes, as well as for certain use cases like containers or
automatic memory cleanup.

Currently, this option acts as a strict superset of the --no-shconf and
--huge-unlink options.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Do not deprecate old options, instead just co-opt them
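
The superset behaviour can be sketched with getopt_long() alone. The
structure, enum values, and function below are hypothetical and much
smaller than the real EAL option table; only the three option names come
from the patch.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical, trimmed-down configuration; not the real internal_config. */
struct opt_sketch_config {
	int no_shconf;
	int hugepage_unlink;
	int in_memory;
};

enum { SK_NO_SHCONF = 256, SK_HUGE_UNLINK, SK_IN_MEMORY };

static const struct option opt_sketch_options[] = {
	{ "no-shconf",   no_argument, NULL, SK_NO_SHCONF },
	{ "huge-unlink", no_argument, NULL, SK_HUGE_UNLINK },
	{ "in-memory",   no_argument, NULL, SK_IN_MEMORY },
	{ NULL, 0, NULL, 0 }
};

/* Returns 0 on success, -1 on an unknown option. */
static int opt_sketch_parse(int argc, char **argv,
		struct opt_sketch_config *conf)
{
	int opt;

	optind = 1;	/* restart scanning so the function can be reused */
	while ((opt = getopt_long(argc, argv, "", opt_sketch_options,
			NULL)) != -1) {
		switch (opt) {
		case SK_NO_SHCONF:
			conf->no_shconf = 1;
			break;
		case SK_HUGE_UNLINK:
			conf->hugepage_unlink = 1;
			break;
		case SK_IN_MEMORY:
			/* in-memory implies both older flags */
			conf->in_memory = 1;
			conf->no_shconf = 1;
			conf->hugepage_unlink = 1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Setting the two older flags at parse time means code that already checks
them needs no knowledge of the new option.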

lib/librte_eal/common/eal_common_options.c | 21 +++++++++++++++++++--
lib/librte_eal/common/eal_internal_cfg.h | 4 ++++
lib/librte_eal/common/eal_options.h | 2 ++
3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index ecebb2923..b175b1446 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -66,6 +66,7 @@ eal_long_options[] = {
{OPT_NO_HUGE, 0, NULL, OPT_NO_HUGE_NUM },
{OPT_NO_PCI, 0, NULL, OPT_NO_PCI_NUM },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM },
+ {OPT_IN_MEMORY, 0, NULL, OPT_IN_MEMORY_NUM },
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM },
{OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM },
{OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM },
@@ -1165,6 +1166,13 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_shconf = 1;
break;

+ case OPT_IN_MEMORY_NUM:
+ conf->in_memory = 1;
+ /* in-memory is a superset of noshconf and huge-unlink */
+ conf->no_shconf = 1;
+ conf->hugepage_unlink = 1;
+ break;
+
case OPT_PROC_TYPE_NUM:
conf->process_type = eal_parse_proc_type(optarg);
break;
@@ -1316,12 +1324,19 @@ eal_check_common_options(struct internal_config *internal_cfg)
"be specified together with --"OPT_NO_HUGE"\n");
return -1;
}
-
- if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+ if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink &&
+ !internal_cfg->in_memory) {
RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
"be specified together with --"OPT_NO_HUGE"\n");
return -1;
}
+ if (internal_cfg->single_file_segments &&
+ internal_cfg->hugepage_unlink) {
+ RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
+ "not compatible with neither --"OPT_IN_MEMORY" nor "
+ "--"OPT_HUGE_UNLINK"\n");
+ return -1;
+ }

return 0;
}
@@ -1370,6 +1385,8 @@ eal_common_usage(void)
" Set specific log level\n"
" -v Display version information on startup\n"
" -h, --help This help\n"
+ " --"OPT_IN_MEMORY" Operate entirely in memory. This will \n"
+ " disable secondary process support\n"
"\nEAL options for DEBUG use only:\n"
" --"OPT_HUGE_UNLINK" Unlink hugepage files after init\n"
" --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index c4cbf3acd..f90d94206 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -41,6 +41,10 @@ struct internal_config {
volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
* instead of native TSC */
volatile unsigned no_shconf; /**< true if there is no shared config */
+ volatile unsigned in_memory;
+ /**< true if DPDK should operate entirely in-memory and not create any
+ * shared files or runtime data.
+ */
volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */
volatile enum rte_proc_type_t process_type; /**< multi-process proc type */
/** true to try allocating memory on specific sockets */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 211ae06ae..dcde4054e 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -45,6 +45,8 @@ enum {
OPT_NO_PCI_NUM,
#define OPT_NO_SHCONF "no-shconf"
OPT_NO_SHCONF_NUM,
+#define OPT_IN_MEMORY "in-memory"
+ OPT_IN_MEMORY_NUM,
#define OPT_SOCKET_MEM "socket-mem"
OPT_SOCKET_MEM_NUM,
#define OPT_SYSLOG "syslog"
--
2.17.0
Anatoly Burakov
2018-06-01 17:15:18 UTC
Implement the final piece of the in-memory mode puzzle - enable running
DPDK entirely in memory, without creating any files.

To do it, use mmap with MAP_HUGETLB and size flags to enable DPDK to work
without hugetlbfs mountpoints. In order to enable this, a few things needed
to be changed.

First of all, we need to allow empty hugetlbfs mountpoints in
hugepage_info, and handle them correctly (by not trying to create any
files and lock any directories).

Next, we need to reorder the mapping sequence, because the page is not
really allocated until the page fault, and we cannot get its IOVA
address before we trigger the page fault.

Finally, decide at compile time whether we are going to be supporting
anonymous hugepages or not, because we cannot check for it at runtime.
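
The ordering constraint (fault first, translate second) can be observed
with ordinary pages. This is a sketch under assumptions: page_present() and
fault_then_check() are hypothetical helpers, regular 4K pages stand in for
hugepages, and /proc/self/pagemap must be readable, which is not the case
on every system.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if the page containing addr is present in RAM, 0 if it is
 * not, and -1 if /proc/self/pagemap cannot be read. Bit 63 of each
 * 64-bit pagemap entry is the "page present" flag.
 */
static int page_present(const void *addr)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	uint64_t entry;
	off_t off;
	int fd, ret = -1;

	if (pgsz <= 0)
		return -1;
	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;
	off = (off_t)((uintptr_t)addr / (uintptr_t)pgsz * sizeof(entry));
	if (pread(fd, &entry, sizeof(entry), off) == (ssize_t)sizeof(entry))
		ret = (int)((entry >> 63) & 1);
	close(fd);
	return ret;
}

/*
 * Map anonymous memory, sample the present flag, touch the page the
 * same way the patch does, then sample again. Returns 0 on success.
 */
static int fault_then_check(size_t len, int *before, int *after)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	*before = page_present(p);
	/* a write that keeps the current value, enough to force the fault */
	*(volatile int *)p = *(volatile int *)p;
	*after = page_present(p);
	munmap(p, len);
	return 0;
}
```

Until the page is present there is no physical frame behind the address,
so an address translation attempted before the fault has nothing to
return.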

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Drop memfd and instead use mmap() with MAP_HUGETLB. This will drop the
kernel requirements down to 3.8, and does not impose any restrictions
on glibc (as far as I know).

Unfortunately, there's a bit of an issue with this approach, because
mmap() is stupid and will happily ignore unsupported arguments. This
means that if the binary were to be compiled on a 3.8+ kernel but run
on a pre-3.8 kernel (such as currently supported minimum of 3.2), then
most likely the memory would be allocated using regular pages, causing
unthinkable performance degradation. No solution to this problem is
currently known to me.
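
How the size is encoded in the mmap() flags, as a standalone sketch. The
helper names are hypothetical, the fallback value 26 for MAP_HUGE_SHIFT
mirrors the one used in the patch, and try_map_hugepage() is expected to
return NULL on machines with no hugepages reserved.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26	/* same fallback as the patch uses */
#endif

/* log2 of a power-of-two page size */
static int page_size_log2(size_t sz)
{
	int n = 0;

	while (sz > 1) {
		sz >>= 1;
		n++;
	}
	return n;
}

/* mmap() flags requesting an anonymous huge page of exactly page_sz bytes */
static int hugepage_mmap_flags(size_t page_sz)
{
	unsigned int size_bits =
		(unsigned int)page_size_log2(page_sz) << MAP_HUGE_SHIFT;

	return (int)size_bits | MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS;
}

/*
 * Try to map one huge page. Returns NULL if the kernel cannot provide
 * one, for example because none of that size are reserved.
 */
static void *try_map_hugepage(size_t page_sz)
{
	void *p = mmap(NULL, page_sz, PROT_READ | PROT_WRITE,
			hugepage_mmap_flags(page_sz), -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

The sketch has no way to detect the situation described in the note above,
where an older kernel ignores the size bits; it only shows how the request
is formed.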

.../linuxapp/eal/eal_hugepage_info.c | 91 +++++++-----
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 130 +++++++++++-------
lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +-
3 files changed, 139 insertions(+), 85 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7f8e2fd9c..3a7d4b222 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -18,6 +18,8 @@
#include <sys/queue.h>
#include <sys/stat.h>

+#include <linux/mman.h> /* for hugetlb-related flags */
+
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_launch.h>
@@ -313,11 +315,49 @@ compare_hpi(const void *a, const void *b)
return hpi_b->hugepage_sz - hpi_a->hugepage_sz;
}

+static void
+calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent)
+{
+ uint64_t total_pages = 0;
+ unsigned int i;
+
+ /*
+ * first, try to put all hugepages into relevant sockets, but
+ * if first attempts fails, fall back to collecting all pages
+ * in one socket and sorting them later
+ */
+ total_pages = 0;
+ /* we also don't want to do this for legacy init */
+ if (!internal_config.legacy_mem)
+ for (i = 0; i < rte_socket_count(); i++) {
+ int socket = rte_socket_id_by_idx(i);
+ unsigned int num_pages =
+ get_num_hugepages_on_node(
+ dirent->d_name, socket);
+ hpi->num_pages[socket] = num_pages;
+ total_pages += num_pages;
+ }
+ /*
+ * we failed to sort memory from the get go, so fall
+ * back to old way
+ */
+ if (total_pages == 0) {
+ hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+
+#ifndef RTE_ARCH_64
+ /* for 32-bit systems, limit number of hugepages to
+ * 1GB per page size */
+ hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
+ RTE_PGSIZE_1G / hpi->hugepage_sz);
+#endif
+ }
+}
+
static int
hugepage_info_init(void)
{ const char dirent_start_text[] = "hugepages-";
const size_t dirent_start_len = sizeof(dirent_start_text) - 1;
- unsigned int i, total_pages, num_sizes = 0;
+ unsigned int i, num_sizes = 0;
DIR *dir;
struct dirent *dirent;

@@ -355,6 +395,22 @@ hugepage_info_init(void)
"%" PRIu64 " reserved, but no mounted "
"hugetlbfs found for that size\n",
num_pages, hpi->hugepage_sz);
+ /* if we have kernel support for reserving hugepages
+ * through mmap, and we're in in-memory mode, treat this
+ * page size as valid. we cannot be in legacy mode at
+ * this point because we've checked this earlier in the
+ * init process.
+ */
+#ifdef MAP_HUGE_SHIFT
+ if (internal_config.in_memory) {
+ RTE_LOG(DEBUG, EAL, "In-memory mode enabled, "
+ "hugepages of size %" PRIu64 " bytes "
+ "will be allocated anonymously\n",
+ hpi->hugepage_sz);
+ calc_num_pages(hpi, dirent);
+ num_sizes++;
+ }
+#endif
continue;
}

@@ -371,35 +427,7 @@ hugepage_info_init(void)
if (clear_hugedir(hpi->hugedir) == -1)
break;

- /*
- * first, try to put all hugepages into relevant sockets, but
- * if first attempts fails, fall back to collecting all pages
- * in one socket and sorting them later
- */
- total_pages = 0;
- /* we also don't want to do this for legacy init */
- if (!internal_config.legacy_mem)
- for (i = 0; i < rte_socket_count(); i++) {
- int socket = rte_socket_id_by_idx(i);
- unsigned int num_pages =
- get_num_hugepages_on_node(
- dirent->d_name, socket);
- hpi->num_pages[socket] = num_pages;
- total_pages += num_pages;
- }
- /*
- * we failed to sort memory from the get go, so fall
- * back to old way
- */
- if (total_pages == 0)
- hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
-
-#ifndef RTE_ARCH_64
- /* for 32-bit systems, limit number of hugepages to
- * 1GB per page size */
- hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
- RTE_PGSIZE_1G / hpi->hugepage_sz);
-#endif
+ calc_num_pages(hpi, dirent);

num_sizes++;
}
@@ -423,8 +451,7 @@ hugepage_info_init(void)

for (j = 0; j < RTE_MAX_NUMA_NODES; j++)
num_pages += hpi->num_pages[j];
- if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0 &&
- num_pages > 0)
+ if (num_pages > 0)
return 0;
}

diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index f1b6d9744..19c53e7af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -28,6 +28,7 @@
#include <numaif.h>
#endif
#include <linux/falloc.h>
+#include <linux/mman.h> /* for hugetlb-related mmap flags */

#include <rte_common.h>
#include <rte_log.h>
@@ -40,6 +41,15 @@
#include "eal_internal_cfg.h"
#include "eal_memalloc.h"

+const int anonymous_hugepages_supported =
+#ifdef MAP_HUGE_SHIFT
+ 1;
+#define RTE_MAP_HUGE_SHIFT MAP_HUGE_SHIFT
+#else
+ 0;
+#define RTE_MAP_HUGE_SHIFT 26
+#endif
+
/*
* not all kernel version support fallocate on hugetlbfs, so fall back to
* ftruncate and disallow deallocation if fallocate is not supported.
@@ -486,47 +496,63 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
int cur_socket_id = 0;
#endif
uint64_t map_offset;
+ rte_iova_t iova;
+ void *va;
char path[PATH_MAX];
int ret = 0;
int fd;
size_t alloc_sz;

- /* takes out a read lock on segment or segment list */
- fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
- if (fd < 0) {
- RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
- return -1;
- }
-
alloc_sz = hi->hugepage_sz;
- if (internal_config.single_file_segments) {
- map_offset = seg_idx * alloc_sz;
- ret = resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
- alloc_sz, true);
- if (ret < 0)
- goto resized;
+ if (internal_config.in_memory && anonymous_hugepages_supported) {
+ int log2, flags;
+
+ log2 = rte_log2_u32(alloc_sz);
+ /* as per mmap() manpage, all page sizes are log2 of page size
+ * shifted by MAP_HUGE_SHIFT
+ */
+ flags = (log2 << RTE_MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_FIXED |
+ MAP_PRIVATE | MAP_ANONYMOUS;
+ fd = -1;
+ va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, flags, -1, 0);
} else {
- map_offset = 0;
- if (ftruncate(fd, alloc_sz) < 0) {
- RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
- __func__, strerror(errno));
- goto resized;
+ /* takes out a read lock on segment or segment list */
+ fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
+ if (fd < 0) {
+ RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
+ return -1;
}
- if (internal_config.hugepage_unlink) {
- if (unlink(path)) {
- RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+
+ if (internal_config.single_file_segments) {
+ map_offset = seg_idx * alloc_sz;
+ ret = resize_hugefile(fd, path, list_idx, seg_idx,
+ map_offset, alloc_sz, true);
+ if (ret < 0)
+ goto resized;
+ } else {
+ map_offset = 0;
+ if (ftruncate(fd, alloc_sz) < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
__func__, strerror(errno));
goto resized;
}
+ if (internal_config.hugepage_unlink) {
+ if (unlink(path)) {
+ RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+ __func__, strerror(errno));
+ goto resized;
+ }
+ }
}
- }

- /*
- * map the segment, and populate page tables, the kernel fills this
- * segment with zeros if it's a new page.
- */
- void *va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd, map_offset);
+ /*
+ * map the segment, and populate page tables, the kernel fills
+ * this segment with zeros if it's a new page.
+ */
+ va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd,
+ map_offset);
+ }

if (va == MAP_FAILED) {
RTE_LOG(DEBUG, EAL, "%s(): mmap() failed: %s\n", __func__,
@@ -539,24 +565,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
goto resized;
}

- rte_iova_t iova = rte_mem_virt2iova(addr);
- if (iova == RTE_BAD_PHYS_ADDR) {
- RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
- __func__);
- goto mapped;
- }
-
-#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
- move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
-
- if (cur_socket_id != socket_id) {
- RTE_LOG(DEBUG, EAL,
- "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
- __func__, socket_id, cur_socket_id);
- goto mapped;
- }
-#endif
-
/* In linux, hugetlb limitations, like cgroup, are
* enforced at fault time instead of mmap(), even
* with the option of MAP_POPULATE. Kernel will send
@@ -569,9 +577,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
(unsigned int)(alloc_sz >> 20));
goto mapped;
}
- /* for non-single file segments, we can close fd here */
- if (!internal_config.single_file_segments)
- close(fd);

/* we need to trigger a write to the page to enforce page fault and
* ensure that page is accessible to us, but we can't overwrite value
@@ -580,6 +585,28 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
*/
*(volatile int *)addr = *(volatile int *)addr;

+ iova = rte_mem_virt2iova(addr);
+ if (iova == RTE_BAD_PHYS_ADDR) {
+ RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
+ __func__);
+ goto mapped;
+ }
+
+#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
+ move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
+
+ if (cur_socket_id != socket_id) {
+ RTE_LOG(DEBUG, EAL,
+ "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
+ __func__, socket_id, cur_socket_id);
+ goto mapped;
+ }
+#endif
+ /* for non-single file segments that aren't in-memory, we can close fd
+ * here */
+ if (!internal_config.single_file_segments && !internal_config.in_memory)
+ close(fd);
+
ms->addr = addr;
ms->hugepage_sz = alloc_sz;
ms->len = alloc_sz;
@@ -600,6 +627,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
} else {
/* only remove file if we can take out a write lock */
if (internal_config.hugepage_unlink == 0 &&
+ internal_config.in_memory == 0 &&
lock(fd, LOCK_EX) == 1)
unlink(path);
close(fd);
@@ -709,7 +737,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg)
* during init, we already hold a write lock, so don't try to take out
* another one.
*/
- if (wa->hi->lock_descriptor == -1) {
+ if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
dir_fd = open(wa->hi->hugedir, O_RDONLY);
if (dir_fd < 0) {
RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
@@ -813,7 +841,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
* during init, we already hold a write lock, so don't try to take out
* another one.
*/
- if (wa->hi->lock_descriptor == -1) {
+ if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
dir_fd = open(wa->hi->hugedir, O_RDONLY);
if (dir_fd < 0) {
RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index cb784e1c3..a98d8c036 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1060,8 +1060,7 @@ get_socket_mem_size(int socket)

for (i = 0; i < internal_config.num_hugepage_sizes; i++){
struct hugepage_info *hpi = &internal_config.hugepage_info[i];
- if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0)
- size += hpi->hugepage_sz * hpi->num_pages[socket];
+ size += hpi->hugepage_sz * hpi->num_pages[socket];
}

return size;
--
2.17.0
Anatoly Burakov
2018-06-01 17:15:17 UTC
Options --no-shconf and --huge-unlink will be removed and
replaced with the --in-memory option, which will be a superset
of these two, and an officially supported method to run DPDK
entirely in memory.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Add this patch

doc/guides/rel_notes/deprecation.rst | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1ce692eac..c8344f42f 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,6 +8,11 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------

+* eal: command-line options ``--no-shconf`` and ``--huge-unlink`` will be
+ removed, and replaced with a single option ``--in-memory``, which will
+ enable DPDK to operate entirely in memory, without creating any files on any
+ filesystems.
+
* eal: DPDK runtime configuration file (located at
``/var/run/.<prefix>_config``) will be moved. The new path will be as follows:
--
2.17.0
Anatoly Burakov
2018-07-13 10:27:06 UTC
This patchset adds a new command-line option, "--in-memory",
which builds on the old debug options "--huge-unlink" and
"--no-shconf" and extends them with additional
functionality. It allows DPDK to reserve hugepages
anonymously instead of using hugetlbfs mountpoints. Because
this option also effectively enables both "--no-shconf" and
"--huge-unlink" modes, DPDK will be able to run entirely in
memory and not create any shared files while running:
neither hugepage files nor any runtime data.

This will, of course, disable secondary processes, but for
the use cases this is targeted at (containers etc.), that is
not a problem.

Older revisions had kernel support at 4.14+ and also
required a fairly new glibc, but now due to not using memfd
and using mmap() instead, minimum supported kernel version
has dropped to 3.8.

v1->v2 changes:
- Rebase on latest master
- Fix patch 5 to include check from patch 6 as commit message
states

RFC->v1 changes:
- Dropped memfd, using anonymous mmap() instead
- Do not deprecate old command-line parameters, instead
use them as they are, and add a deprecation notice to
remove them in the next release.

Anatoly Burakov (9):
fbarray: support no-shconf mode
ipc: add support for no-shconf mode
eal: add support for no-shconf for hugepage info
eal: add support for no-shconf in hugepage data file
eal: do not create runtime dir in no-shconf mode
mem: add support for hugepage-unlink mode
eal: add --in-memory option
doc: add deprecation notice for EAL command line options
mem: support in-memory mode

doc/guides/rel_notes/deprecation.rst | 5 +
lib/librte_eal/bsdapp/eal/eal.c | 3 +-
lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 4 +
lib/librte_eal/common/eal_common_fbarray.c | 71 +++++----
lib/librte_eal/common/eal_common_options.c | 20 ++-
lib/librte_eal/common/eal_common_proc.c | 25 ++++
lib/librte_eal/common/eal_internal_cfg.h | 4 +
lib/librte_eal/common/eal_options.h | 2 +
lib/librte_eal/linuxapp/eal/eal.c | 3 +-
.../linuxapp/eal/eal_hugepage_info.c | 95 ++++++++----
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 140 ++++++++++++------
lib/librte_eal/linuxapp/eal/eal_memory.c | 16 +-
12 files changed, 271 insertions(+), 117 deletions(-)
--
2.17.1
Anatoly Burakov
2018-07-13 10:27:07 UTC
When using the --no-shconf option, the expectation is that multiprocess
will not be supported, as no shared files are created. However, fbarray still
creates some shared files that prevent multiple processes with the same
prefix from starting.

Fix this by not creating shared files whenever the --no-shconf option is
specified. Since virtual areas we get from eal_get_virtual_area() are
read-only, remap them as writable.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/common/eal_common_fbarray.c | 71 +++++++++++++---------
1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 977174c4f..43caf3ced 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -705,39 +705,52 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
if (data == NULL)
goto fail;

- eal_get_fbarray_path(path, sizeof(path), name);
+ if (internal_config.no_shconf) {
+ /* remap virtual area as writable */
+ void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE,
+ MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (new_data == MAP_FAILED) {
+ RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n",
+ __func__, strerror(errno));
+ goto fail;
+ }
+ } else {
+ eal_get_fbarray_path(path, sizeof(path), name);

- /*
- * Each fbarray is unique to process namespace, i.e. the filename
- * depends on process prefix. Try to take out a lock and see if we
- * succeed. If we don't, someone else is using it already.
- */
- fd = open(path, O_CREAT | O_RDWR, 0600);
- if (fd < 0) {
- RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", __func__,
- path, strerror(errno));
- rte_errno = errno;
- goto fail;
- } else if (flock(fd, LOCK_EX | LOCK_NB)) {
- RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__,
- path, strerror(errno));
- rte_errno = EBUSY;
- goto fail;
- }
+ /*
+ * Each fbarray is unique to process namespace, i.e. the
+ * filename depends on process prefix. Try to take out a lock
+ * and see if we succeed. If we don't, someone else is using it
+ * already.
+ */
+ fd = open(path, O_CREAT | O_RDWR, 0600);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n",
+ __func__, path, strerror(errno));
+ rte_errno = errno;
+ goto fail;
+ } else if (flock(fd, LOCK_EX | LOCK_NB)) {
+ RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n",
+ __func__, path, strerror(errno));
+ rte_errno = EBUSY;
+ goto fail;
+ }

- /* take out a non-exclusive lock, so that other processes could still
- * attach to it, but no other process could reinitialize it.
- */
- if (flock(fd, LOCK_SH | LOCK_NB)) {
- rte_errno = errno;
- goto fail;
- }
+ /* take out a non-exclusive lock, so that other processes could
+ * still attach to it, but no other process could reinitialize
+ * it.
+ */
+ if (flock(fd, LOCK_SH | LOCK_NB)) {
+ rte_errno = errno;
+ goto fail;
+ }

- if (resize_and_map(fd, data, mmap_len))
- goto fail;
+ if (resize_and_map(fd, data, mmap_len))
+ goto fail;

- /* we've mmap'ed the file, we can now close the fd */
- close(fd);
+ /* we've mmap'ed the file, we can now close the fd */
+ close(fd);
+ }

/* initialize the data */
memset(data, 0, mmap_len);
--
2.17.1
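
The remap step in the fbarray patch above can be tried outside DPDK. What follows is a minimal sketch under stated assumptions, not the patch's code: reserve_area() is a hypothetical stand-in for eal_get_virtual_area(), and remap_writable() and demo_anonymous_remap() are illustrative names. It reserves a read-only range, then replaces it in place with writable anonymous memory using the same mmap() flags as the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for eal_get_virtual_area(): reserve a read-only
 * range of address space. */
static void *reserve_area(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Replace the reservation in place with writable anonymous memory.
 * MAP_FIXED asks the kernel to reuse exactly the reserved range. */
static void *remap_writable(void *area, size_t len)
{
	void *p;

	assert(area != NULL && len > 0);
	p = mmap(area, len, PROT_READ | PROT_WRITE,
			MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Returns 0 if the remapped area sits at the reserved address and
 * accepts writes, -1 otherwise. */
static int demo_anonymous_remap(void)
{
	const size_t len = 1 << 16;
	void *area = reserve_area(len);
	void *data;
	int ok;

	if (area == NULL)
		return -1;
	data = remap_writable(area, len);
	if (data != area) {
		munmap(area, len);
		return -1;
	}
	/* this write would fault on the original read-only reservation */
	memset(data, 0, len);
	((char *)data)[0] = 42;
	ok = ((char *)data)[0] == 42;
	munmap(data, len);
	return ok ? 0 : -1;
}
```

Because the mapping is private and anonymous, nothing appears on any filesystem, which is the point of the no-shconf path.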
Anatoly Burakov
2018-07-13 10:27:08 UTC
IPC is an inter-process communication mechanism. Since no secondaries
can ever be expected to run in no-shconf mode, IPC will be useless, so
do not enable it in the first place. In the interests of API usage
convenience, we will still allow registering callbacks, but obviously
they won't ever be triggered.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/common/eal_common_proc.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index f010ef59e..c19b4b406 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -626,6 +626,14 @@ rte_mp_channel_init(void)
int dir_fd;
pthread_t mp_handle_tid, async_reply_handle_tid;

+ /* in no shared files mode, we do not have secondary processes support,
+ * so no need to initialize IPC.
+ */
+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC will be disabled\n");
+ return 0;
+ }
+
/* create filter path */
create_socket_path("*", path, sizeof(path));
strlcpy(mp_filter, basename(path), sizeof(mp_filter));
@@ -988,6 +996,12 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,

if (check_input(req) == false)
return -1;
+
+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+ return 0;
+ }
+
if (gettimeofday(&now, NULL) < 0) {
RTE_LOG(ERR, EAL, "Faile to get current time\n");
rte_errno = errno;
@@ -1072,6 +1086,12 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,

if (check_input(req) == false)
return -1;
+
+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+ return 0;
+ }
+
if (gettimeofday(&now, NULL) < 0) {
RTE_LOG(ERR, EAL, "Faile to get current time\n");
rte_errno = errno;
@@ -1213,5 +1233,10 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
return -1;
}

+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+ return 0;
+ }
+
return mp_send(msg, peer, MP_REP);
}
--
2.17.1
Anatoly Burakov
2018-07-13 10:27:15 UTC
Implement the final piece of the in-memory mode puzzle - enable running
DPDK entirely in memory, without creating any files.

To do it, use mmap with MAP_HUGETLB and size flags to enable DPDK to work
without hugetlbfs mountpoints. In order to enable this, a few things needed
to be changed.

First of all, we need to allow empty hugetlbfs mountpoints in
hugepage_info, and handle them correctly (by not trying to create any
files or lock any directories).

Next, we need to reorder the mapping sequence, because the page is not
really allocated until the page fault, and we cannot get its IOVA
address before we trigger the page fault.

Finally, decide at compile time whether we are going to be supporting
anonymous hugepages or not, because we cannot check for it at runtime.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Drop memfd and instead use mmap() with MAP_HUGETLB. This will drop the
kernel requirements down to 3.8, and does not impose any restrictions
on glibc (as far as I know).

Unfortunately, there's a bit of an issue with this approach, because
mmap() is stupid and will happily ignore unsupported arguments. This
means that if the binary were to be compiled on a 3.8+ kernel but run
on a pre-3.8 kernel (such as the currently supported minimum of 3.2), then
most likely the memory would be allocated using regular pages, causing
unthinkable performance degradation. No solution to this problem is
currently known to me.

.../linuxapp/eal/eal_hugepage_info.c | 91 +++++++-----
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 130 +++++++++++-------
lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +-
3 files changed, 139 insertions(+), 85 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7f8e2fd9c..3a7d4b222 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -18,6 +18,8 @@
#include <sys/queue.h>
#include <sys/stat.h>

+#include <linux/mman.h> /* for hugetlb-related flags */
+
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_launch.h>
@@ -313,11 +315,49 @@ compare_hpi(const void *a, const void *b)
return hpi_b->hugepage_sz - hpi_a->hugepage_sz;
}

+static void
+calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent)
+{
+ uint64_t total_pages = 0;
+ unsigned int i;
+
+ /*
+ * first, try to put all hugepages into relevant sockets, but
+ * if first attempts fails, fall back to collecting all pages
+ * in one socket and sorting them later
+ */
+ total_pages = 0;
+ /* we also don't want to do this for legacy init */
+ if (!internal_config.legacy_mem)
+ for (i = 0; i < rte_socket_count(); i++) {
+ int socket = rte_socket_id_by_idx(i);
+ unsigned int num_pages =
+ get_num_hugepages_on_node(
+ dirent->d_name, socket);
+ hpi->num_pages[socket] = num_pages;
+ total_pages += num_pages;
+ }
+ /*
+ * we failed to sort memory from the get go, so fall
+ * back to old way
+ */
+ if (total_pages == 0) {
+ hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+
+#ifndef RTE_ARCH_64
+ /* for 32-bit systems, limit number of hugepages to
+ * 1GB per page size */
+ hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
+ RTE_PGSIZE_1G / hpi->hugepage_sz);
+#endif
+ }
+}
+
static int
hugepage_info_init(void)
{ const char dirent_start_text[] = "hugepages-";
const size_t dirent_start_len = sizeof(dirent_start_text) - 1;
- unsigned int i, total_pages, num_sizes = 0;
+ unsigned int i, num_sizes = 0;
DIR *dir;
struct dirent *dirent;

@@ -355,6 +395,22 @@ hugepage_info_init(void)
"%" PRIu64 " reserved, but no mounted "
"hugetlbfs found for that size\n",
num_pages, hpi->hugepage_sz);
+ /* if we have kernel support for reserving hugepages
+ * through mmap, and we're in in-memory mode, treat this
+ * page size as valid. we cannot be in legacy mode at
+ * this point because we've checked this earlier in the
+ * init process.
+ */
+#ifdef MAP_HUGE_SHIFT
+ if (internal_config.in_memory) {
+ RTE_LOG(DEBUG, EAL, "In-memory mode enabled, "
+ "hugepages of size %" PRIu64 " bytes "
+ "will be allocated anonymously\n",
+ hpi->hugepage_sz);
+ calc_num_pages(hpi, dirent);
+ num_sizes++;
+ }
+#endif
continue;
}

@@ -371,35 +427,7 @@ hugepage_info_init(void)
if (clear_hugedir(hpi->hugedir) == -1)
break;

- /*
- * first, try to put all hugepages into relevant sockets, but
- * if first attempts fails, fall back to collecting all pages
- * in one socket and sorting them later
- */
- total_pages = 0;
- /* we also don't want to do this for legacy init */
- if (!internal_config.legacy_mem)
- for (i = 0; i < rte_socket_count(); i++) {
- int socket = rte_socket_id_by_idx(i);
- unsigned int num_pages =
- get_num_hugepages_on_node(
- dirent->d_name, socket);
- hpi->num_pages[socket] = num_pages;
- total_pages += num_pages;
- }
- /*
- * we failed to sort memory from the get go, so fall
- * back to old way
- */
- if (total_pages == 0)
- hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
-
-#ifndef RTE_ARCH_64
- /* for 32-bit systems, limit number of hugepages to
- * 1GB per page size */
- hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
- RTE_PGSIZE_1G / hpi->hugepage_sz);
-#endif
+ calc_num_pages(hpi, dirent);

num_sizes++;
}
@@ -423,8 +451,7 @@ hugepage_info_init(void)

for (j = 0; j < RTE_MAX_NUMA_NODES; j++)
num_pages += hpi->num_pages[j];
- if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0 &&
- num_pages > 0)
+ if (num_pages > 0)
return 0;
}

diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index d610923b8..10c959da4 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -28,6 +28,7 @@
#include <numaif.h>
#endif
#include <linux/falloc.h>
+#include <linux/mman.h> /* for hugetlb-related mmap flags */

#include <rte_common.h>
#include <rte_log.h>
@@ -41,6 +42,15 @@
#include "eal_memalloc.h"
#include "eal_private.h"

+const int anonymous_hugepages_supported =
+#ifdef MAP_HUGE_SHIFT
+ 1;
+#define RTE_MAP_HUGE_SHIFT MAP_HUGE_SHIFT
+#else
+ 0;
+#define RTE_MAP_HUGE_SHIFT 26
+#endif
+
/*
* not all kernel version support fallocate on hugetlbfs, so fall back to
* ftruncate and disallow deallocation if fallocate is not supported.
@@ -461,6 +471,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
int cur_socket_id = 0;
#endif
uint64_t map_offset;
+ rte_iova_t iova;
+ void *va;
char path[PATH_MAX];
int ret = 0;
int fd;
@@ -468,43 +480,57 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
int flags;
void *new_addr;

- /* takes out a read lock on segment or segment list */
- fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
- if (fd < 0) {
- RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
- return -1;
- }
-
alloc_sz = hi->hugepage_sz;
- if (internal_config.single_file_segments) {
- map_offset = seg_idx * alloc_sz;
- ret = resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
- alloc_sz, true);
- if (ret < 0)
- goto resized;
+ if (internal_config.in_memory && anonymous_hugepages_supported) {
+ int log2, flags;
+
+ log2 = rte_log2_u32(alloc_sz);
+ /* as per mmap() manpage, all page sizes are log2 of page size
+ * shifted by MAP_HUGE_SHIFT
+ */
+ flags = (log2 << RTE_MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_FIXED |
+ MAP_PRIVATE | MAP_ANONYMOUS;
+ fd = -1;
+ va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, flags, -1, 0);
} else {
- map_offset = 0;
- if (ftruncate(fd, alloc_sz) < 0) {
- RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
- __func__, strerror(errno));
- goto resized;
+ /* takes out a read lock on segment or segment list */
+ fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
+ if (fd < 0) {
+ RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
+ return -1;
}
- if (internal_config.hugepage_unlink) {
- if (unlink(path)) {
- RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+
+ if (internal_config.single_file_segments) {
+ map_offset = seg_idx * alloc_sz;
+ ret = resize_hugefile(fd, path, list_idx, seg_idx,
+ map_offset, alloc_sz, true);
+ if (ret < 0)
+ goto resized;
+ } else {
+ map_offset = 0;
+ if (ftruncate(fd, alloc_sz) < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
__func__, strerror(errno));
goto resized;
}
+ if (internal_config.hugepage_unlink) {
+ if (unlink(path)) {
+ RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+ __func__, strerror(errno));
+ goto resized;
+ }
+ }
}
+
+ /*
+ * map the segment, and populate page tables, the kernel fills
+ * this segment with zeros if it's a new page.
+ */
+ va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd,
+ map_offset);
}

- /*
- * map the segment, and populate page tables, the kernel fills this
- * segment with zeros if it's a new page.
- */
- void *va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd, map_offset);
-
if (va == MAP_FAILED) {
RTE_LOG(DEBUG, EAL, "%s(): mmap() failed: %s\n", __func__,
strerror(errno));
@@ -519,24 +545,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
goto resized;
}

- rte_iova_t iova = rte_mem_virt2iova(addr);
- if (iova == RTE_BAD_PHYS_ADDR) {
- RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
- __func__);
- goto mapped;
- }
-
-#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
- move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
-
- if (cur_socket_id != socket_id) {
- RTE_LOG(DEBUG, EAL,
- "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
- __func__, socket_id, cur_socket_id);
- goto mapped;
- }
-#endif
-
/* In linux, hugetlb limitations, like cgroup, are
* enforced at fault time instead of mmap(), even
* with the option of MAP_POPULATE. Kernel will send
@@ -549,9 +557,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
(unsigned int)(alloc_sz >> 20));
goto mapped;
}
- /* for non-single file segments, we can close fd here */
- if (!internal_config.single_file_segments)
- close(fd);

/* we need to trigger a write to the page to enforce page fault and
* ensure that page is accessible to us, but we can't overwrite value
@@ -560,6 +565,28 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
*/
*(volatile int *)addr = *(volatile int *)addr;

+ iova = rte_mem_virt2iova(addr);
+ if (iova == RTE_BAD_PHYS_ADDR) {
+ RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
+ __func__);
+ goto mapped;
+ }
+
+#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
+ move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
+
+ if (cur_socket_id != socket_id) {
+ RTE_LOG(DEBUG, EAL,
+ "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
+ __func__, socket_id, cur_socket_id);
+ goto mapped;
+ }
+#endif
+ /* for non-single file segments that aren't in-memory, we can close fd
+ * here */
+ if (!internal_config.single_file_segments && !internal_config.in_memory)
+ close(fd);
+
ms->addr = addr;
ms->hugepage_sz = alloc_sz;
ms->len = alloc_sz;
@@ -595,6 +622,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
} else {
/* only remove file if we can take out a write lock */
if (internal_config.hugepage_unlink == 0 &&
+ internal_config.in_memory == 0 &&
lock(fd, LOCK_EX) == 1)
unlink(path);
close(fd);
@@ -705,7 +733,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg)
* during init, we already hold a write lock, so don't try to take out
* another one.
*/
- if (wa->hi->lock_descriptor == -1) {
+ if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
dir_fd = open(wa->hi->hugedir, O_RDONLY);
if (dir_fd < 0) {
RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
@@ -809,7 +837,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
* during init, we already hold a write lock, so don't try to take out
* another one.
*/
- if (wa->hi->lock_descriptor == -1) {
+ if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
dir_fd = open(wa->hi->hugedir, O_RDONLY);
if (dir_fd < 0) {
RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ddfa8b133..dbf19499e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1088,8 +1088,7 @@ get_socket_mem_size(int socket)

for (i = 0; i < internal_config.num_hugepage_sizes; i++){
struct hugepage_info *hpi = &internal_config.hugepage_info[i];
- if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0)
- size += hpi->hugepage_sz * hpi->num_pages[socket];
+ size += hpi->hugepage_sz * hpi->num_pages[socket];
}

return size;
--
2.17.1
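
The flag arithmetic used in alloc_seg() above can be checked in isolation. This is a rough sketch, assuming a Linux system with MAP_HUGETLB available: page_size_log2() is a hypothetical stand-in for rte_log2_u32(), hugepage_size_flag() and map_anonymous_hugepage() are illustrative names, and the fallback value of 26 for MAP_HUGE_SHIFT mirrors the one in the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Same fallback the patch uses when the headers lack MAP_HUGE_SHIFT. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif

/* log2 of a power-of-two page size (hypothetical helper standing in
 * for rte_log2_u32()). */
static unsigned int page_size_log2(uint64_t page_sz)
{
	unsigned int log2 = 0;

	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	while (page_sz > 1) {
		page_sz >>= 1;
		log2++;
	}
	return log2;
}

/* Page-size selector bits for mmap(): log2(page size) << MAP_HUGE_SHIFT. */
static int hugepage_size_flag(uint64_t page_sz)
{
	return (int)(page_size_log2(page_sz) << MAP_HUGE_SHIFT);
}

/* Try to map one anonymous hugepage and fault it in. Returns NULL if
 * mmap() fails, for example when no hugepages of that size are reserved. */
static void *map_anonymous_hugepage(uint64_t page_sz)
{
	int flags = hugepage_size_flag(page_sz) | MAP_HUGETLB |
			MAP_PRIVATE | MAP_ANONYMOUS;
	void *va = mmap(NULL, (size_t)page_sz, PROT_READ | PROT_WRITE,
			flags, -1, 0);

	if (va == MAP_FAILED)
		return NULL;
	/* write the value back to itself so the page is really allocated
	 * before anyone asks for its physical address */
	*(volatile int *)va = *(volatile int *)va;
	return va;
}
```

For a 2 MB page, hugepage_size_flag() yields 21 << MAP_HUGE_SHIFT. The self-assignment in map_anonymous_hugepage() is the same trick the patch relies on: the address lookup only makes sense after the page fault has happened.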
Thomas Monjalon
2018-07-13 12:15:44 UTC
There is a compilation error:

../lib/librte_eal/linuxapp/eal/eal_memalloc.c: In function ‘alloc_seg’:
../lib/librte_eal/linuxapp/eal/eal_memalloc.c:619:3: error:
‘map_offset’ may be used uninitialized in this function
Post by Anatoly Burakov
Implement the final piece of the in-memory mode puzzle - enable running
DPDK entirely in memory, without creating any files.
To do it, use mmap with MAP_HUGETLB and size flags to enable DPDK to work
without hugetlbfs mountpoints. In order to enable this, a few things needed
to be changed.
First of all, we need to allow empty hugetlbfs mountpoints in
hugepage_info, and handle them correctly (by not trying to create any
files and lock any directories).
Next, we need to reorder the mapping sequence, because the page is not
really allocated until the page fault, and we cannot get its IOVA
address before we trigger the page fault.
Finally, decide at compile time whether we are going to be supporting
anonymous hugepages or not, because we cannot check for it at runtime.
---
- Drop memfd and instead use mmap() with MAP_HUGETLB. This will drop the
kernel requirements down to 3.8, and does not impose any restrictions
on glibc (as far as I know).
Unfortunately, there's a bit of an issue with this approach, because
mmap() is stupid and will happily ignore unsupported arguments. This
means that if the binary were to be compiled on a 3.8+ kernel but run
on a pre-3.8 kernel (such as the currently supported minimum of 3.2), then
most likely the memory would be allocated using regular pages, causing
unthinkable performance degradation. No solution to this problem is
currently known to me.
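
In the patch as posted, map_offset is assigned only in the file-backed branch of alloc_seg(), which is consistent with the warning reported above. One common way to silence this class of warning is to give the variable a defined value at declaration. The sketch below is a hypothetical reduction of the control flow (pick_map_offset() and its parameters are made up for illustration), and is not necessarily the fix that was applied.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduction of the branch structure in alloc_seg(). */
static uint64_t pick_map_offset(int in_memory, int single_file_segments,
		uint64_t seg_idx, uint64_t page_sz)
{
	/* defined value on every path, including the in-memory one,
	 * where no file offset is ever computed */
	uint64_t map_offset = 0;

	assert(page_sz > 0);
	if (!in_memory && single_file_segments)
		map_offset = seg_idx * page_sz;
	return map_offset;
}
```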
Anatoly Burakov
2018-07-13 10:27:10 UTC
Do not create a shared hugepage data file if we were asked to
not create any shared files.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5d3c8831b..ddfa8b133 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -521,7 +521,18 @@ static void *
create_shared_memory(const char *filename, const size_t mem_size)
{
void *retval;
- int fd = open(filename, O_CREAT | O_RDWR, 0666);
+ int fd;
+
+ /* if no shared files mode is used, create anonymous memory instead */
+ if (internal_config.no_shconf) {
+ retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (retval == MAP_FAILED)
+ return NULL;
+ return retval;
+ }
+
+ fd = open(filename, O_CREAT | O_RDWR, 0666);
if (fd < 0)
return NULL;
if (ftruncate(fd, mem_size) < 0) {
--
2.17.1
Anatoly Burakov
2018-07-13 10:27:14 UTC
Options --no-shconf and --huge-unlink will be removed, and
replaced with the --in-memory option, which will be a superset
of these two, and an officially supported method to run DPDK
entirely in memory.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Add this patch

doc/guides/rel_notes/deprecation.rst | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 5de59833d..dd1b5c5d8 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,6 +8,11 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------

+* eal: command-line options ``--no-shconf`` and ``--huge-unlink`` will be
+ removed, and replaced with a single option ``--in-memory``, which will
+ enable DPDK to operate entirely in memory, without creating any files on any
+ filesystems.
+
* eal: DPDK runtime configuration file (located at
``/var/run/.<prefix>_config``) will be moved. The new path will be as follows:
--
2.17.1
Thomas Monjalon
2018-07-13 12:13:57 UTC
Post by Anatoly Burakov
Options --no-shconf and --huge-unlink will be removed, and
replaced with the --in-memory option, which will be a superset
of these two, and an officially supported method to run DPDK
entirely in memory.
The deprecation notice should be sent separately in order to wait
for enough agreement.
Burakov, Anatoly
2018-07-13 12:29:24 UTC
Post by Thomas Monjalon
Post by Anatoly Burakov
Options --no-shconf and --huge-unlink will be removed, and
replaced with the --in-memory option, which will be a superset
of these two, and an officially supported method to run DPDK
entirely in memory.
The deprecation notice should be sent separately in order to wait
for enough agreement.
Really, we don't have to deprecate old options. It would be nice to
remove them as half-measures and replace them with a proper
implementation, but it's not strictly necessary, so I'll make it a
separate patch.
--
Thanks,
Anatoly
Anatoly Burakov
2018-07-13 10:27:12 UTC
Unlink hugepages after creating them, to honor the hugepage-unlink mode.
We cannot resize non-existing files, so make single file segments
explicitly unsupported.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
v1->v2:
- Move check for hugepage unlink into this patch, to be
consistent with commit message

RFC->v1:
- Use --huge-unlink only

lib/librte_eal/common/eal_common_options.c | 6 ++++++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 16 +++++++++++++++-
2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 45ea01a8b..df5d53648 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -1332,6 +1332,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
" is only supported in non-legacy memory mode\n");
return -1;
}
+ if (internal_cfg->single_file_segments &&
+ internal_cfg->hugepage_unlink) {
+ RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
+ "not compatible with --"OPT_HUGE_UNLINK"\n");
+ return -1;
+ }

return 0;
}
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 69604f823..d610923b8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -489,6 +489,13 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
__func__, strerror(errno));
goto resized;
}
+ if (internal_config.hugepage_unlink) {
+ if (unlink(path)) {
+ RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+ __func__, strerror(errno));
+ goto resized;
+ }
+ }
}

/*
@@ -587,7 +594,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
/* ignore failure, can't make it any worse */
} else {
/* only remove file if we can take out a write lock */
- if (lock(fd, LOCK_EX) == 1)
+ if (internal_config.hugepage_unlink == 0 &&
+ lock(fd, LOCK_EX) == 1)
unlink(path);
close(fd);
}
@@ -612,6 +620,12 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
return -1;
}

+ /* if we've already unlinked the page, nothing needs to be done */
+ if (internal_config.hugepage_unlink) {
+ memset(ms, 0, sizeof(*ms));
+ return 0;
+ }
+
/* if we are not in single file segments mode, we're going to unmap the
* segment and thus drop the lock on original fd, but hugepage dir is
* now locked so we can take out another one without races.
--
2.17.1
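
The ordering the patch uses (size the file, unlink it, then map it) works because an open descriptor keeps the file alive after its name is gone. A minimal sketch of that ordering follows, assuming a writable /tmp and using a regular temporary file in place of hugetlbfs. map_unlinked_file() and the path template are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a backing file, size it, unlink it, then map it. On success,
 * *path_exists reports whether the name was still visible after the
 * unlink (it should not be). Returns NULL on any failure. */
static void *map_unlinked_file(size_t len, int *path_exists)
{
	char path[] = "/tmp/unlink_demo_XXXXXX";
	void *va;
	int fd;

	assert(len > 0 && path_exists != NULL);
	fd = mkstemp(path);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) < 0) {
		unlink(path);
		close(fd);
		return NULL;
	}
	/* remove the name; the open descriptor still references the file */
	if (unlink(path) < 0) {
		close(fd);
		return NULL;
	}
	*path_exists = access(path, F_OK) == 0;

	va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	return va == MAP_FAILED ? NULL : va;
}
```

Once the name is gone there is no file left to resize by path or to remove at free time, which matches the patch skipping the unlink on cleanup and rejecting single-file segments in this mode.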
Anatoly Burakov
2018-07-13 10:27:09 UTC
Do not create any shared hugepage size info files if we were
asked to not create any shared files.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 4 ++++
lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
index 836feb672..1e8f5df23 100644
--- a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
@@ -101,6 +101,10 @@ eal_hugepage_info_init(void)
hpi->num_pages[0] = num_buffers;
hpi->lock_descriptor = fd;

+ /* for no shared files mode, do not create shared memory config */
+ if (internal_config.no_shconf)
+ return 0;
+
tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
sizeof(internal_config.hugepage_info));
if (tmp_hpi == NULL ) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7eca711ba..7f8e2fd9c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -446,6 +446,10 @@ eal_hugepage_info_init(void)
if (hugepage_info_init() < 0)
return -1;

+ /* for no shared files mode, we're done */
+ if (internal_config.no_shconf)
+ return 0;
+
hpi = &internal_config.hugepage_info[0];

tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
--
2.17.1
Anatoly Burakov
2018-07-13 10:27:13 UTC
This command-line option will cause DPDK to operate entirely in
memory and not create any shared files at runtime, including any
shared configuration or hugetlbfs files. This is useful for debug
purposes, as well as for certain use cases like containers or
automatic memory cleanup.

Currently, this option acts as a strict superset of the --no-shconf and
--huge-unlink options.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Do not deprecate old options, instead just coopt them

lib/librte_eal/common/eal_common_options.c | 18 ++++++++++++++----
lib/librte_eal/common/eal_internal_cfg.h | 4 ++++
lib/librte_eal/common/eal_options.h | 2 ++
3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index df5d53648..f308b57c3 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -66,6 +66,7 @@ eal_long_options[] = {
{OPT_NO_HUGE, 0, NULL, OPT_NO_HUGE_NUM },
{OPT_NO_PCI, 0, NULL, OPT_NO_PCI_NUM },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM },
+ {OPT_IN_MEMORY, 0, NULL, OPT_IN_MEMORY_NUM },
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM },
{OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM },
{OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM },
@@ -1170,6 +1171,13 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_shconf = 1;
break;

+ case OPT_IN_MEMORY_NUM:
+ conf->in_memory = 1;
+ /* in-memory is a superset of noshconf and huge-unlink */
+ conf->no_shconf = 1;
+ conf->hugepage_unlink = 1;
+ break;
+
case OPT_PROC_TYPE_NUM:
conf->process_type = eal_parse_proc_type(optarg);
break;
@@ -1321,8 +1329,8 @@ eal_check_common_options(struct internal_config *internal_cfg)
"be specified together with --"OPT_NO_HUGE"\n");
return -1;
}
-
- if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+ if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink &&
+ !internal_cfg->in_memory) {
RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
"be specified together with --"OPT_NO_HUGE"\n");
return -1;
@@ -1330,12 +1338,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
if (internal_config.force_socket_limits && internal_config.legacy_mem) {
RTE_LOG(ERR, EAL, "Option --"OPT_SOCKET_LIMIT
" is only supported in non-legacy memory mode\n");
- return -1;
}
if (internal_cfg->single_file_segments &&
internal_cfg->hugepage_unlink) {
RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
- "not compatible with --"OPT_HUGE_UNLINK"\n");
+ "not compatible with neither --"OPT_IN_MEMORY" nor "
+ "--"OPT_HUGE_UNLINK"\n");
return -1;
}

@@ -1386,6 +1394,8 @@ eal_common_usage(void)
" Set specific log level\n"
" -v Display version information on startup\n"
" -h, --help This help\n"
+ " --"OPT_IN_MEMORY" Operate entirely in memory. This will \n"
+ " disable secondary process support\n"
"\nEAL options for DEBUG use only:\n"
" --"OPT_HUGE_UNLINK" Unlink hugepage files after init\n"
" --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index d66cd0313..00ee6e06e 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -41,6 +41,10 @@ struct internal_config {
volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
* instead of native TSC */
volatile unsigned no_shconf; /**< true if there is no shared config */
+ volatile unsigned in_memory;
+ /**< true if DPDK should operate entirely in-memory and not create any
+ * shared files or runtime data.
+ */
volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */
volatile enum rte_proc_type_t process_type; /**< multi-process proc type */
/** true to try allocating memory on specific sockets */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 6d92f64a8..96e166787 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -45,6 +45,8 @@ enum {
OPT_NO_PCI_NUM,
#define OPT_NO_SHCONF "no-shconf"
OPT_NO_SHCONF_NUM,
+#define OPT_IN_MEMORY "in-memory"
+ OPT_IN_MEMORY_NUM,
#define OPT_SOCKET_MEM "socket-mem"
OPT_SOCKET_MEM_NUM,
#define OPT_SOCKET_LIMIT "socket-limit"
--
2.17.1
Thomas Monjalon
2018-07-13 12:13:14 UTC
Post by Anatoly Burakov
This command-line option will cause DPDK to operate entirely in
memory and not create any shared files at runtime, including any
shared configuration or hugetlbfs files. This is useful for debug
purposes, as well as for certain use cases like containers or
automatic memory cleanup.
Currently, this option acts as a strict superset of --no-shconf and
--huge-unlink commands.

I would like to see some support or review for this feature.
Burakov, Anatoly
2018-07-13 12:27:56 UTC
Post by Thomas Monjalon
Post by Anatoly Burakov
This command-line option will cause DPDK to operate entirely in
memory and not create any shared files at runtime, including any
shared configuration or hugetlbfs files. This is useful for debug
purposes, as well as for certain use cases like containers or
automatic memory cleanup.
Currently, this option acts as a strict superset of --no-shconf and
--huge-unlink commands.

I would like to see some support or review for this feature.

While the justification for it can be use cases like running a DPDK
process without worrying about cleaning up its hugepages afterwards
(somewhat less of a concern since 18.05, but still a concern if the
primary crashes), it is really about fixing the no-shconf/huge-unlink
options so that they are no longer half-measures.

Both of these options effectively disable secondary processes, but don't
do it in a consistent way: huge-unlink cleans up hugepages after
allocating them, but leaves shared config on. No-shconf disables shared
config, but leaves hugepages in place. Since 18.05, huge-unlink hasn't
worked anyway (it wasn't implemented, which was my omission), and because
EAL now relies on fbarrays to store some data, no-shconf wasn't working
correctly either: even though shared config wasn't created, two
primaries still couldn't share a prefix with --no-shconf (see the first
patch).

So, this patchset is really an acknowledgement of the fact that both
huge-unlink and no-shconf options are really there to disable secondary
processes and stop leaving files on the file system. I just went one
step further: instead of allocating hugepage files and then removing
them, we don't create them in the first place, and map hugepages
anonymously instead.
--
Thanks,
Anatoly
Anatoly Burakov
2018-07-13 10:27:11 UTC
Now that the rest of the EAL is adjusted to not create any shared
files, prevent runtime directory from ever being created.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/bsdapp/eal/eal.c | 3 ++-
lib/librte_eal/linuxapp/eal/eal.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index dc279542d..13b6f8ae1 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -601,7 +601,8 @@ rte_eal_init(int argc, char **argv)
}

/* create runtime data directory */
- if (eal_create_runtime_dir() < 0) {
+ if (internal_config.no_shconf == 0 &&
+ eal_create_runtime_dir() < 0) {
rte_eal_init_alert("Cannot create runtime directory\n");
rte_errno = EACCES;
return -1;
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index ec7cea55d..191960caa 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -832,7 +832,8 @@ rte_eal_init(int argc, char **argv)
}

/* create runtime data directory */
- if (eal_create_runtime_dir() < 0) {
+ if (internal_config.no_shconf == 0 &&
+ eal_create_runtime_dir() < 0) {
rte_eal_init_alert("Cannot create runtime directory\n");
rte_errno = EACCES;
return -1;
--
2.17.1
Anatoly Burakov
2018-07-13 12:47:56 UTC
This patchset adds a new command-line option "--in-memory",
which takes old debug options "--huge-unlink" and
"--no-shconf", and enhances them with additional
functionality. This will allow DPDK to reserve hugepages
anonymously instead of using hugetlbfs mountpoints. Coupled
with the fact that this option also effectively enables both
"--no-shconf" and "--huge-unlink" modes, DPDK will be able
to run entirely in memory and not create any shared files
while running - neither hugepages nor any runtime data.

This will, of course, disable secondary processes, but for
use-cases this is targeted at (containers etc.), this is
not a problem.

Older revisions had kernel support at 4.14+ and also
required a fairly new glibc, but now due to not using memfd
and using mmap() instead, minimum supported kernel version
has dropped to 3.8.

v2->v3 changes:
- Fix compile issue in patch 9 (now 8)
- Drop deprecation notice (will be sent separately)

v1->v2 changes:
- Rebase on latest master
- Fix patch 5 to include check from patch 6 as commit message
states

RFC->v1 changes:
- Dropped memfd, using anonymous mmap() instead
- Do not deprecate old command-line parameters, instead
use them as they are, and add a deprecation notice to
remove them in the next release.

Anatoly Burakov (8):
fbarray: support no-shconf mode
ipc: add support for no-shconf mode
eal: add support for no-shconf for hugepage info
eal: add support for no-shconf in hugepage data file
eal: do not create runtime dir in no-shconf mode
mem: add support for hugepage-unlink mode
eal: add --in-memory option
mem: support in-memory mode

lib/librte_eal/bsdapp/eal/eal.c | 3 +-
lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 4 +
lib/librte_eal/common/eal_common_fbarray.c | 71 +++++----
lib/librte_eal/common/eal_common_options.c | 20 ++-
lib/librte_eal/common/eal_common_proc.c | 25 +++
lib/librte_eal/common/eal_internal_cfg.h | 4 +
lib/librte_eal/common/eal_options.h | 2 +
lib/librte_eal/linuxapp/eal/eal.c | 3 +-
.../linuxapp/eal/eal_hugepage_info.c | 95 +++++++----
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 150 ++++++++++++------
lib/librte_eal/linuxapp/eal/eal_memory.c | 16 +-
11 files changed, 276 insertions(+), 117 deletions(-)
--
2.17.1
Anatoly Burakov
2018-07-13 12:47:57 UTC
When using --no-shconf option, the expectation is that no multiprocess
will be supported as no shared files are created. However, fbarray still
creates some shared files that prevent multiple processes with the same
prefix from starting.

Fix this by not creating shared files whenever the --no-shconf option is
specified. Since virtual areas we get from eal_get_virtual_area() are
read-only, remap them as writable.
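
The remap step described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the patch code:
reserve_area() is a hypothetical stand-in for eal_get_virtual_area(), and
remap_writable() is a hypothetical helper name. The point it shows is that
mmap() with MAP_FIXED replaces an existing read-only reservation in place
with writable anonymous memory, so no backing file is needed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical stand-in for eal_get_virtual_area(): reserve a read-only
 * range of address space. Returns NULL on failure.
 */
static void *reserve_area(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Replace the reservation with writable anonymous memory. MAP_FIXED asks
 * the kernel to reuse exactly the same address range, so the caller's
 * pointer stays valid. Returns NULL on failure.
 */
static void *remap_writable(void *area, size_t len)
{
	void *p;

	assert(area != NULL && len > 0);

	p = mmap(area, len, PROT_READ | PROT_WRITE,
		 MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

A caller would reserve the area once, call remap_writable() on it, and then
treat the returned pointer (equal to the reserved address) as ordinary
zero-filled memory.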

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/common/eal_common_fbarray.c | 71 +++++++++++++---------
1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 977174c4f..43caf3ced 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -705,39 +705,52 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
if (data == NULL)
goto fail;

- eal_get_fbarray_path(path, sizeof(path), name);
+ if (internal_config.no_shconf) {
+ /* remap virtual area as writable */
+ void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE,
+ MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (new_data == MAP_FAILED) {
+ RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n",
+ __func__, strerror(errno));
+ goto fail;
+ }
+ } else {
+ eal_get_fbarray_path(path, sizeof(path), name);

- /*
- * Each fbarray is unique to process namespace, i.e. the filename
- * depends on process prefix. Try to take out a lock and see if we
- * succeed. If we don't, someone else is using it already.
- */
- fd = open(path, O_CREAT | O_RDWR, 0600);
- if (fd < 0) {
- RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", __func__,
- path, strerror(errno));
- rte_errno = errno;
- goto fail;
- } else if (flock(fd, LOCK_EX | LOCK_NB)) {
- RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__,
- path, strerror(errno));
- rte_errno = EBUSY;
- goto fail;
- }
+ /*
+ * Each fbarray is unique to process namespace, i.e. the
+ * filename depends on process prefix. Try to take out a lock
+ * and see if we succeed. If we don't, someone else is using it
+ * already.
+ */
+ fd = open(path, O_CREAT | O_RDWR, 0600);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n",
+ __func__, path, strerror(errno));
+ rte_errno = errno;
+ goto fail;
+ } else if (flock(fd, LOCK_EX | LOCK_NB)) {
+ RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n",
+ __func__, path, strerror(errno));
+ rte_errno = EBUSY;
+ goto fail;
+ }

- /* take out a non-exclusive lock, so that other processes could still
- * attach to it, but no other process could reinitialize it.
- */
- if (flock(fd, LOCK_SH | LOCK_NB)) {
- rte_errno = errno;
- goto fail;
- }
+ /* take out a non-exclusive lock, so that other processes could
+ * still attach to it, but no other process could reinitialize
+ * it.
+ */
+ if (flock(fd, LOCK_SH | LOCK_NB)) {
+ rte_errno = errno;
+ goto fail;
+ }

- if (resize_and_map(fd, data, mmap_len))
- goto fail;
+ if (resize_and_map(fd, data, mmap_len))
+ goto fail;

- /* we've mmap'ed the file, we can now close the fd */
- close(fd);
+ /* we've mmap'ed the file, we can now close the fd */
+ close(fd);
+ }

/* initialize the data */
memset(data, 0, mmap_len);
--
2.17.1
Anatoly Burakov
2018-07-13 12:47:58 UTC
IPC is an inter-process communication mechanism. Since no secondaries
can ever be expected to run in no-shconf mode, IPC will be useless, so
do not enable it in the first place. In the interests of API usage
convenience, we will still allow registering callbacks, but obviously
they won't ever be triggered.
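
As a toy model of the behaviour described above (all names here are
hypothetical and unrelated to the real rte_mp_* API), the sketch below
keeps callback registration working in no-shared-files mode while channel
init and requests become successful no-ops. The loopback dispatch in
ipc_request() is an assumption made purely so the example is
self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTIONS 8

typedef int (*ipc_action_t)(const char *msg);

struct ipc_state {
	bool no_shconf;     /* true: no shared files, IPC stays disabled */
	bool channel_open;  /* true once the channel has been set up */
	ipc_action_t actions[MAX_ACTIONS];
	size_t num_actions;
};

/* Channel init succeeds without doing anything when IPC is disabled. */
static int ipc_channel_init(struct ipc_state *st)
{
	assert(st != NULL);
	if (st->no_shconf)
		return 0;
	st->channel_open = true;
	return 0;
}

/* Registration is always accepted, so callers need no special-casing. */
static int ipc_action_register(struct ipc_state *st, ipc_action_t cb)
{
	assert(st != NULL);
	if (cb == NULL || st->num_actions == MAX_ACTIONS)
		return -1;
	st->actions[st->num_actions++] = cb;
	return 0;
}

/*
 * Toy loopback request: runs every registered callback and returns how
 * many ran. With IPC disabled it returns 0 and nothing is triggered.
 */
static int ipc_request(struct ipc_state *st, const char *msg)
{
	size_t i;

	assert(st != NULL);
	if (st->no_shconf)
		return 0;
	if (!st->channel_open)
		return -1;
	for (i = 0; i < st->num_actions; i++)
		st->actions[i](msg);
	return (int)st->num_actions;
}

/* Sample callback that only counts how often it was invoked. */
static int hits;

static int count_hits(const char *msg)
{
	(void)msg;
	hits++;
	return 0;
}
```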

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/common/eal_common_proc.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index f010ef59e..c19b4b406 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -626,6 +626,14 @@ rte_mp_channel_init(void)
int dir_fd;
pthread_t mp_handle_tid, async_reply_handle_tid;

+ /* in no shared files mode, we do not have secondary processes support,
+ * so no need to initialize IPC.
+ */
+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC will be disabled\n");
+ return 0;
+ }
+
/* create filter path */
create_socket_path("*", path, sizeof(path));
strlcpy(mp_filter, basename(path), sizeof(mp_filter));
@@ -988,6 +996,12 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,

if (check_input(req) == false)
return -1;
+
+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+ return 0;
+ }
+
if (gettimeofday(&now, NULL) < 0) {
RTE_LOG(ERR, EAL, "Faile to get current time\n");
rte_errno = errno;
@@ -1072,6 +1086,12 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,

if (check_input(req) == false)
return -1;
+
+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+ return 0;
+ }
+
if (gettimeofday(&now, NULL) < 0) {
RTE_LOG(ERR, EAL, "Faile to get current time\n");
rte_errno = errno;
@@ -1213,5 +1233,10 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
return -1;
}

+ if (internal_config.no_shconf) {
+ RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+ return 0;
+ }
+
return mp_send(msg, peer, MP_REP);
}
--
2.17.1
Anatoly Burakov
2018-07-13 12:48:04 UTC
Implement the final piece of the in-memory mode puzzle - enable running
DPDK entirely in memory, without creating any files.

To do it, use mmap with MAP_HUGETLB and size flags to enable DPDK to work
without hugetlbfs mountpoints. In order to enable this, a few things needed
to be changed.

First of all, we need to allow empty hugetlbfs mountpoints in
hugepage_info, and handle them correctly (by not trying to create any
files or lock any directories).

Next, we need to reorder the mapping sequence, because the page is not
really allocated until the page fault, and we cannot get its IOVA
address before we trigger the page fault.

Finally, decide at compile time whether we are going to be supporting
anonymous hugepages or not, because we cannot check for it at runtime.
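
Two of the points above can be sketched in isolation. The helper names
below are hypothetical, and the fallback value of 26 for MAP_HUGE_SHIFT is
an assumption for headers that do not define it. The first helper encodes
the page size into the mmap() flags as log2(page size) shifted by
MAP_HUGE_SHIFT; the second forces the page fault without changing the
page contents, which has to happen before the physical address of the
page can be looked up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26 /* assumed value when the headers lack it */
#endif

/* log2 of a power-of-two page size, e.g. 21 for 2 MB pages. */
static unsigned int page_size_log2(uint64_t page_sz)
{
	unsigned int n = 0;

	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	while ((page_sz >>= 1) != 0)
		n++;
	return n;
}

/* Flags for an anonymous hugepage mapping of the given page size. */
static int anon_hugepage_flags(uint64_t page_sz)
{
	return (int)(page_size_log2(page_sz) << MAP_HUGE_SHIFT) |
		MAP_HUGETLB | MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS;
}

/*
 * Write the first word back to itself: this triggers the page fault that
 * actually allocates the page, while leaving its contents untouched.
 */
static void touch_page(void *addr)
{
	*(volatile int *)addr = *(volatile int *)addr;
}
```

The resulting flags would be passed to mmap() with fd set to -1. Whether
that call succeeds depends on hugepages of that size being reserved on
the system, so the sketch stops short of making the mapping itself.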

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Drop memfd and instead use mmap() with MAP_HUGETLB. This will drop the
kernel requirements down to 3.8, and does not impose any restrictions
on glibc (as far as I know).

Unfortunately, there's a bit of an issue with this approach, because
mmap() is stupid and will happily ignore unsupported arguments. This
means that if the binary were to be compiled on a 3.8+ kernel but run
on a pre-3.8 kernel (such as the currently supported minimum of 3.2),
then most likely the memory would be allocated using regular pages,
causing severe performance degradation. No solution to this problem is
currently known to me.

.../linuxapp/eal/eal_hugepage_info.c | 91 ++++++++----
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 140 +++++++++++-------
lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +-
3 files changed, 149 insertions(+), 85 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7f8e2fd9c..3a7d4b222 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -18,6 +18,8 @@
#include <sys/queue.h>
#include <sys/stat.h>

+#include <linux/mman.h> /* for hugetlb-related flags */
+
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_launch.h>
@@ -313,11 +315,49 @@ compare_hpi(const void *a, const void *b)
return hpi_b->hugepage_sz - hpi_a->hugepage_sz;
}

+static void
+calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent)
+{
+ uint64_t total_pages = 0;
+ unsigned int i;
+
+ /*
+ * first, try to put all hugepages into relevant sockets, but
+ * if first attempts fails, fall back to collecting all pages
+ * in one socket and sorting them later
+ */
+ total_pages = 0;
+ /* we also don't want to do this for legacy init */
+ if (!internal_config.legacy_mem)
+ for (i = 0; i < rte_socket_count(); i++) {
+ int socket = rte_socket_id_by_idx(i);
+ unsigned int num_pages =
+ get_num_hugepages_on_node(
+ dirent->d_name, socket);
+ hpi->num_pages[socket] = num_pages;
+ total_pages += num_pages;
+ }
+ /*
+ * we failed to sort memory from the get go, so fall
+ * back to old way
+ */
+ if (total_pages == 0) {
+ hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+
+#ifndef RTE_ARCH_64
+ /* for 32-bit systems, limit number of hugepages to
+ * 1GB per page size */
+ hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
+ RTE_PGSIZE_1G / hpi->hugepage_sz);
+#endif
+ }
+}
+
static int
hugepage_info_init(void)
{ const char dirent_start_text[] = "hugepages-";
const size_t dirent_start_len = sizeof(dirent_start_text) - 1;
- unsigned int i, total_pages, num_sizes = 0;
+ unsigned int i, num_sizes = 0;
DIR *dir;
struct dirent *dirent;

@@ -355,6 +395,22 @@ hugepage_info_init(void)
"%" PRIu64 " reserved, but no mounted "
"hugetlbfs found for that size\n",
num_pages, hpi->hugepage_sz);
+ /* if we have kernel support for reserving hugepages
+ * through mmap, and we're in in-memory mode, treat this
+ * page size as valid. we cannot be in legacy mode at
+ * this point because we've checked this earlier in the
+ * init process.
+ */
+#ifdef MAP_HUGE_SHIFT
+ if (internal_config.in_memory) {
+ RTE_LOG(DEBUG, EAL, "In-memory mode enabled, "
+ "hugepages of size %" PRIu64 " bytes "
+ "will be allocated anonymously\n",
+ hpi->hugepage_sz);
+ calc_num_pages(hpi, dirent);
+ num_sizes++;
+ }
+#endif
continue;
}

@@ -371,35 +427,7 @@ hugepage_info_init(void)
if (clear_hugedir(hpi->hugedir) == -1)
break;

- /*
- * first, try to put all hugepages into relevant sockets, but
- * if first attempts fails, fall back to collecting all pages
- * in one socket and sorting them later
- */
- total_pages = 0;
- /* we also don't want to do this for legacy init */
- if (!internal_config.legacy_mem)
- for (i = 0; i < rte_socket_count(); i++) {
- int socket = rte_socket_id_by_idx(i);
- unsigned int num_pages =
- get_num_hugepages_on_node(
- dirent->d_name, socket);
- hpi->num_pages[socket] = num_pages;
- total_pages += num_pages;
- }
- /*
- * we failed to sort memory from the get go, so fall
- * back to old way
- */
- if (total_pages == 0)
- hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
-
-#ifndef RTE_ARCH_64
- /* for 32-bit systems, limit number of hugepages to
- * 1GB per page size */
- hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
- RTE_PGSIZE_1G / hpi->hugepage_sz);
-#endif
+ calc_num_pages(hpi, dirent);

num_sizes++;
}
@@ -423,8 +451,7 @@ hugepage_info_init(void)

for (j = 0; j < RTE_MAX_NUMA_NODES; j++)
num_pages += hpi->num_pages[j];
- if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0 &&
- num_pages > 0)
+ if (num_pages > 0)
return 0;
}

diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index d610923b8..79443c56a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -28,6 +28,7 @@
#include <numaif.h>
#endif
#include <linux/falloc.h>
+#include <linux/mman.h> /* for hugetlb-related mmap flags */

#include <rte_common.h>
#include <rte_log.h>
@@ -41,6 +42,15 @@
#include "eal_memalloc.h"
#include "eal_private.h"

+const int anonymous_hugepages_supported =
+#ifdef MAP_HUGE_SHIFT
+ 1;
+#define RTE_MAP_HUGE_SHIFT MAP_HUGE_SHIFT
+#else
+ 0;
+#define RTE_MAP_HUGE_SHIFT 26
+#endif
+
/*
* not all kernel version support fallocate on hugetlbfs, so fall back to
* ftruncate and disallow deallocation if fallocate is not supported.
@@ -461,6 +471,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
int cur_socket_id = 0;
#endif
uint64_t map_offset;
+ rte_iova_t iova;
+ void *va;
char path[PATH_MAX];
int ret = 0;
int fd;
@@ -468,43 +480,66 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
int flags;
void *new_addr;

- /* takes out a read lock on segment or segment list */
- fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
- if (fd < 0) {
- RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
- return -1;
- }
-
alloc_sz = hi->hugepage_sz;
- if (internal_config.single_file_segments) {
- map_offset = seg_idx * alloc_sz;
- ret = resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
- alloc_sz, true);
- if (ret < 0)
- goto resized;
- } else {
+ if (internal_config.in_memory && anonymous_hugepages_supported) {
+ int log2, flags;
+
+ log2 = rte_log2_u32(alloc_sz);
+ /* as per mmap() manpage, all page sizes are log2 of page size
+ * shifted by MAP_HUGE_SHIFT
+ */
+ flags = (log2 << RTE_MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_FIXED |
+ MAP_PRIVATE | MAP_ANONYMOUS;
+ fd = -1;
+ va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, flags, -1, 0);
+
+ /* single-file segments codepath will never be active because
+ * in-memory mode is incompatible with it and it's stopped at
+ * EAL initialization stage, however the compiler doesn't know
+ * that and complains about map_offset being used uninitialized
+ * on failure codepaths while having in-memory mode enabled. so,
+ * assign a value here.
+ */
map_offset = 0;
- if (ftruncate(fd, alloc_sz) < 0) {
- RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
- __func__, strerror(errno));
- goto resized;
+ } else {
+ /* takes out a read lock on segment or segment list */
+ fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
+ if (fd < 0) {
+ RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
+ return -1;
}
- if (internal_config.hugepage_unlink) {
- if (unlink(path)) {
- RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+
+ if (internal_config.single_file_segments) {
+ map_offset = seg_idx * alloc_sz;
+ ret = resize_hugefile(fd, path, list_idx, seg_idx,
+ map_offset, alloc_sz, true);
+ if (ret < 0)
+ goto resized;
+ } else {
+ map_offset = 0;
+ if (ftruncate(fd, alloc_sz) < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
__func__, strerror(errno));
goto resized;
}
+ if (internal_config.hugepage_unlink) {
+ if (unlink(path)) {
+ RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+ __func__, strerror(errno));
+ goto resized;
+ }
+ }
}
+
+ /*
+ * map the segment, and populate page tables, the kernel fills
+ * this segment with zeros if it's a new page.
+ */
+ va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd,
+ map_offset);
}

- /*
- * map the segment, and populate page tables, the kernel fills this
- * segment with zeros if it's a new page.
- */
- void *va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd, map_offset);
-
if (va == MAP_FAILED) {
RTE_LOG(DEBUG, EAL, "%s(): mmap() failed: %s\n", __func__,
strerror(errno));
@@ -519,24 +554,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
goto resized;
}

- rte_iova_t iova = rte_mem_virt2iova(addr);
- if (iova == RTE_BAD_PHYS_ADDR) {
- RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
- __func__);
- goto mapped;
- }
-
-#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
- move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
-
- if (cur_socket_id != socket_id) {
- RTE_LOG(DEBUG, EAL,
- "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
- __func__, socket_id, cur_socket_id);
- goto mapped;
- }
-#endif
-
/* In linux, hugetlb limitations, like cgroup, are
* enforced at fault time instead of mmap(), even
* with the option of MAP_POPULATE. Kernel will send
@@ -549,9 +566,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
(unsigned int)(alloc_sz >> 20));
goto mapped;
}
- /* for non-single file segments, we can close fd here */
- if (!internal_config.single_file_segments)
- close(fd);

/* we need to trigger a write to the page to enforce page fault and
* ensure that page is accessible to us, but we can't overwrite value
@@ -560,6 +574,28 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
*/
*(volatile int *)addr = *(volatile int *)addr;

+ iova = rte_mem_virt2iova(addr);
+ if (iova == RTE_BAD_PHYS_ADDR) {
+ RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
+ __func__);
+ goto mapped;
+ }
+
+#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
+ move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
+
+ if (cur_socket_id != socket_id) {
+ RTE_LOG(DEBUG, EAL,
+ "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
+ __func__, socket_id, cur_socket_id);
+ goto mapped;
+ }
+#endif
+ /* for non-single file segments that aren't in-memory, we can close fd
+ * here */
+ if (!internal_config.single_file_segments && !internal_config.in_memory)
+ close(fd);
+
ms->addr = addr;
ms->hugepage_sz = alloc_sz;
ms->len = alloc_sz;
@@ -588,6 +624,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
RTE_LOG(CRIT, EAL, "Can't mmap holes in our virtual address space\n");
}
resized:
+ /* in-memory mode will never be single-file-segments mode */
if (internal_config.single_file_segments) {
resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
alloc_sz, false);
@@ -595,6 +632,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
} else {
/* only remove file if we can take out a write lock */
if (internal_config.hugepage_unlink == 0 &&
+ internal_config.in_memory == 0 &&
lock(fd, LOCK_EX) == 1)
unlink(path);
close(fd);
@@ -705,7 +743,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg)
* during init, we already hold a write lock, so don't try to take out
* another one.
*/
- if (wa->hi->lock_descriptor == -1) {
+ if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
dir_fd = open(wa->hi->hugedir, O_RDONLY);
if (dir_fd < 0) {
RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
@@ -809,7 +847,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
* during init, we already hold a write lock, so don't try to take out
* another one.
*/
- if (wa->hi->lock_descriptor == -1) {
+ if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
dir_fd = open(wa->hi->hugedir, O_RDONLY);
if (dir_fd < 0) {
RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ddfa8b133..dbf19499e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1088,8 +1088,7 @@ get_socket_mem_size(int socket)

for (i = 0; i < internal_config.num_hugepage_sizes; i++){
struct hugepage_info *hpi = &internal_config.hugepage_info[i];
- if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0)
- size += hpi->hugepage_sz * hpi->num_pages[socket];
+ size += hpi->hugepage_sz * hpi->num_pages[socket];
}

return size;
--
2.17.1
Anatoly Burakov
2018-07-13 12:48:00 UTC
Do not create a shared hugepage data file if we were asked to
not create any shared files.
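
A minimal sketch of the idea, assuming a hypothetical create_memory()
helper with an explicit no_shared_files parameter (the patch itself reads
the flag from internal_config): when no shared files are wanted, the
memory comes from an anonymous private mapping and no file is ever
opened; otherwise a file is created, sized and mapped as shared.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 'size' bytes of zeroed memory, backed by
 * 'filename' only when shared files are allowed. Returns NULL on failure.
 */
static void *create_memory(const char *filename, size_t size,
			   int no_shared_files)
{
	void *p;
	int fd;

	assert(size > 0);

	if (no_shared_files) {
		/* nothing touches the filesystem on this path */
		p = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		return p == MAP_FAILED ? NULL : p;
	}

	fd = open(filename, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	/* the mapping keeps the file contents reachable after close() */
	close(fd);
	return p == MAP_FAILED ? NULL : p;
}
```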

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5d3c8831b..ddfa8b133 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -521,7 +521,18 @@ static void *
create_shared_memory(const char *filename, const size_t mem_size)
{
void *retval;
- int fd = open(filename, O_CREAT | O_RDWR, 0666);
+ int fd;
+
+ /* if no shared files mode is used, create anonymous memory instead */
+ if (internal_config.no_shconf) {
+ retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (retval == MAP_FAILED)
+ return NULL;
+ return retval;
+ }
+
+ fd = open(filename, O_CREAT | O_RDWR, 0666);
if (fd < 0)
return NULL;
if (ftruncate(fd, mem_size) < 0) {
--
2.17.1
Anatoly Burakov
2018-07-13 12:48:03 UTC
This command-line option will cause DPDK to operate entirely in
memory and not create any shared files at runtime, including any
shared configuration or hugetlbfs files. This is useful for debug
purposes, as well as for certain use cases like containers or
automatic memory cleanup.

Currently, this option acts as a strict superset of --no-shconf and
--huge-unlink commands.
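
The "superset" relationship can be sketched as below. The struct and
function names are hypothetical simplifications, not the EAL option
parser: selecting in-memory mode also sets the two older flags, so any
check written against those flags (such as the single-file-segments
conflict) applies to in-memory mode automatically.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct mem_opts {
	bool in_memory;
	bool no_shconf;
	bool hugepage_unlink;
	bool single_file_segments;
};

/* Apply one command-line flag; returns -1 for an unknown flag. */
static int parse_opt(struct mem_opts *o, const char *arg)
{
	assert(o != NULL && arg != NULL);

	if (strcmp(arg, "--in-memory") == 0) {
		o->in_memory = true;
		/* in-memory implies both older modes */
		o->no_shconf = true;
		o->hugepage_unlink = true;
	} else if (strcmp(arg, "--no-shconf") == 0) {
		o->no_shconf = true;
	} else if (strcmp(arg, "--huge-unlink") == 0) {
		o->hugepage_unlink = true;
	} else if (strcmp(arg, "--single-file-segments") == 0) {
		o->single_file_segments = true;
	} else {
		return -1;
	}
	return 0;
}

/* Returns -1 when the selected combination cannot work. */
static int check_opts(const struct mem_opts *o)
{
	assert(o != NULL);

	/* covers --in-memory too, because it sets hugepage_unlink */
	if (o->single_file_segments && o->hugepage_unlink)
		return -1;
	return 0;
}
```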

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Do not deprecate old options, instead just coopt them

lib/librte_eal/common/eal_common_options.c | 18 ++++++++++++++----
lib/librte_eal/common/eal_internal_cfg.h | 4 ++++
lib/librte_eal/common/eal_options.h | 2 ++
3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index df5d53648..f308b57c3 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -66,6 +66,7 @@ eal_long_options[] = {
{OPT_NO_HUGE, 0, NULL, OPT_NO_HUGE_NUM },
{OPT_NO_PCI, 0, NULL, OPT_NO_PCI_NUM },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM },
+ {OPT_IN_MEMORY, 0, NULL, OPT_IN_MEMORY_NUM },
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM },
{OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM },
{OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM },
@@ -1170,6 +1171,13 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_shconf = 1;
break;

+ case OPT_IN_MEMORY_NUM:
+ conf->in_memory = 1;
+ /* in-memory is a superset of noshconf and huge-unlink */
+ conf->no_shconf = 1;
+ conf->hugepage_unlink = 1;
+ break;
+
case OPT_PROC_TYPE_NUM:
conf->process_type = eal_parse_proc_type(optarg);
break;
@@ -1321,8 +1329,8 @@ eal_check_common_options(struct internal_config *internal_cfg)
"be specified together with --"OPT_NO_HUGE"\n");
return -1;
}
-
- if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+ if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink &&
+ !internal_cfg->in_memory) {
RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
"be specified together with --"OPT_NO_HUGE"\n");
return -1;
@@ -1330,12 +1338,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
if (internal_config.force_socket_limits && internal_config.legacy_mem) {
RTE_LOG(ERR, EAL, "Option --"OPT_SOCKET_LIMIT
" is only supported in non-legacy memory mode\n");
- return -1;
}
if (internal_cfg->single_file_segments &&
internal_cfg->hugepage_unlink) {
RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
- "not compatible with --"OPT_HUGE_UNLINK"\n");
+ "not compatible with neither --"OPT_IN_MEMORY" nor "
+ "--"OPT_HUGE_UNLINK"\n");
return -1;
}

@@ -1386,6 +1394,8 @@ eal_common_usage(void)
" Set specific log level\n"
" -v Display version information on startup\n"
" -h, --help This help\n"
+ " --"OPT_IN_MEMORY" Operate entirely in memory. This will \n"
+ " disable secondary process support\n"
"\nEAL options for DEBUG use only:\n"
" --"OPT_HUGE_UNLINK" Unlink hugepage files after init\n"
" --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index d66cd0313..00ee6e06e 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -41,6 +41,10 @@ struct internal_config {
volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
* instead of native TSC */
volatile unsigned no_shconf; /**< true if there is no shared config */
+ volatile unsigned in_memory;
+ /**< true if DPDK should operate entirely in-memory and not create any
+ * shared files or runtime data.
+ */
volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */
volatile enum rte_proc_type_t process_type; /**< multi-process proc type */
/** true to try allocating memory on specific sockets */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 6d92f64a8..96e166787 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -45,6 +45,8 @@ enum {
OPT_NO_PCI_NUM,
#define OPT_NO_SHCONF "no-shconf"
OPT_NO_SHCONF_NUM,
+#define OPT_IN_MEMORY "in-memory"
+ OPT_IN_MEMORY_NUM,
#define OPT_SOCKET_MEM "socket-mem"
OPT_SOCKET_MEM_NUM,
#define OPT_SOCKET_LIMIT "socket-limit"
--
2.17.1
Anatoly Burakov
2018-07-13 12:47:59 UTC
Do not create any shared hugepage size info files if we were
asked to not create any shared files.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 4 ++++
lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
index 836feb672..1e8f5df23 100644
--- a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
@@ -101,6 +101,10 @@ eal_hugepage_info_init(void)
hpi->num_pages[0] = num_buffers;
hpi->lock_descriptor = fd;

+ /* for no shared files mode, do not create shared memory config */
+ if (internal_config.no_shconf)
+ return 0;
+
tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
sizeof(internal_config.hugepage_info));
if (tmp_hpi == NULL ) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7eca711ba..7f8e2fd9c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -446,6 +446,10 @@ eal_hugepage_info_init(void)
if (hugepage_info_init() < 0)
return -1;

+ /* for no shared files mode, we're done */
+ if (internal_config.no_shconf)
+ return 0;
+
hpi = &internal_config.hugepage_info[0];

tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
--
2.17.1
Anatoly Burakov
2018-07-13 12:48:02 UTC
Unlink hugepage files right after creating them, to honor the
hugepage-unlink mode. We cannot resize files that no longer exist, so make
single-file segments explicitly unsupported in combination with this mode.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
v1->v2:
- Move check for hugepage unlink into this patch, to be
consistent with commit message

RFC->v1:
- Use --huge-unlink only

lib/librte_eal/common/eal_common_options.c | 6 ++++++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 16 +++++++++++++++-
2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 45ea01a8b..df5d53648 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -1332,6 +1332,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
" is only supported in non-legacy memory mode\n");
return -1;
}
+ if (internal_cfg->single_file_segments &&
+ internal_cfg->hugepage_unlink) {
+ RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
+ "not compatible with --"OPT_HUGE_UNLINK"\n");
+ return -1;
+ }

return 0;
}
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 69604f823..d610923b8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -489,6 +489,13 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
__func__, strerror(errno));
goto resized;
}
+ if (internal_config.hugepage_unlink) {
+ if (unlink(path)) {
+ RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+ __func__, strerror(errno));
+ goto resized;
+ }
+ }
}

/*
@@ -587,7 +594,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
/* ignore failure, can't make it any worse */
} else {
/* only remove file if we can take out a write lock */
- if (lock(fd, LOCK_EX) == 1)
+ if (internal_config.hugepage_unlink == 0 &&
+ lock(fd, LOCK_EX) == 1)
unlink(path);
close(fd);
}
@@ -612,6 +620,12 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
return -1;
}

+ /* if we've already unlinked the page, nothing needs to be done */
+ if (internal_config.hugepage_unlink) {
+ memset(ms, 0, sizeof(*ms));
+ return 0;
+ }
+
/* if we are not in single file segments mode, we're going to unmap the
* segment and thus drop the lock on original fd, but hugepage dir is
* now locked so we can take out another one without races.
--
2.17.1
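
The reason unlinking right after creation is safe can be shown with a
small standalone sketch. It is not taken from the patch:
demo_unlink_then_map() is a hypothetical helper, and a regular
temporary file under /tmp stands in for a hugetlbfs file (an
assumption made so the example runs without hugepages). The point is
only that a mapping stays usable after its backing file's name has
been removed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create and size a file, map it, unlink it, then use the mapping.
 * Returns 0 if the mapping was still usable after unlink() and the
 * path no longer exists, -1 otherwise.
 */
int
demo_unlink_then_map(size_t len)
{
	char path[] = "/tmp/demo_seg_XXXXXX"; /* assumes /tmp is writable */
	static const char msg[] = "still mapped";
	int unlinked = 0;
	int ret = -1;
	void *va;
	int fd;

	if (len < sizeof(msg))
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;

	/* size the file while it still has a name */
	if (ftruncate(fd, (off_t)len) != 0)
		goto out;

	va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (va == MAP_FAILED)
		goto out;

	/* the name goes away, but the fd and mapping keep the data alive */
	if (unlink(path) == 0) {
		unlinked = 1;
		memcpy(va, msg, sizeof(msg));
		if (memcmp(va, msg, sizeof(msg)) == 0 &&
				access(path, F_OK) != 0)
			ret = 0;
	}

	munmap(va, len);
out:
	if (!unlinked)
		unlink(path); /* best-effort cleanup on the error paths */
	close(fd);
	return ret;
}
```

Once the path is gone, nothing can reopen the file by name, which is
consistent with the patch rejecting --single-file-segments together
with --huge-unlink and with free_seg() having no file left to remove.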
Anatoly Burakov
2018-07-13 12:48:01 UTC
Now that the rest of the EAL is adjusted to not create any shared
files, prevent the runtime directory from ever being created.

Signed-off-by: Anatoly Burakov <***@intel.com>
---

Notes:
RFC->v1:
- Use --no-shconf only

lib/librte_eal/bsdapp/eal/eal.c | 3 ++-
lib/librte_eal/linuxapp/eal/eal.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index dc279542d..13b6f8ae1 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -601,7 +601,8 @@ rte_eal_init(int argc, char **argv)
}

/* create runtime data directory */
- if (eal_create_runtime_dir() < 0) {
+ if (internal_config.no_shconf == 0 &&
+ eal_create_runtime_dir() < 0) {
rte_eal_init_alert("Cannot create runtime directory\n");
rte_errno = EACCES;
return -1;
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index ec7cea55d..191960caa 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -832,7 +832,8 @@ rte_eal_init(int argc, char **argv)
}

/* create runtime data directory */
- if (eal_create_runtime_dir() < 0) {
+ if (internal_config.no_shconf == 0 &&
+ eal_create_runtime_dir() < 0) {
rte_eal_init_alert("Cannot create runtime directory\n");
rte_errno = EACCES;
return -1;
--
2.17.1
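
The guard added in both eal.c files relies on && short-circuiting:
when no_shconf is set, the directory helper is never called at all.
A minimal sketch of that pattern follows. The names demo_rt_config,
demo_create_runtime_dir(), demo_init() and demo_dir_calls are
hypothetical stand-ins, and the helper here only counts calls instead
of touching the filesystem.

```c
#include <stdio.h>

/* Hypothetical, reduced stand-in for the relevant config flag. */
struct demo_rt_config {
	unsigned no_shconf; /* non-zero: do not create any shared files */
};

/* Counts how many times directory creation was attempted. */
int demo_dir_calls;

/* Stand-in for eal_create_runtime_dir(); always succeeds here. */
int
demo_create_runtime_dir(void)
{
	demo_dir_calls++;
	return 0;
}

/* Returns 0 on success, -1 if the runtime directory was needed but failed. */
int
demo_init(const struct demo_rt_config *cfg)
{
	/* the right-hand side only runs when no_shconf is 0 */
	if (cfg->no_shconf == 0 && demo_create_runtime_dir() < 0) {
		fprintf(stderr, "Cannot create runtime directory\n");
		return -1;
	}
	return 0;
}
```

With no_shconf set to 1, demo_init() returns 0 and leaves
demo_dir_calls untouched. With no_shconf set to 0, the helper runs
once per call.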
Thomas Monjalon
2018-07-13 13:41:11 UTC
Post by Anatoly Burakov
fbarray: support no-shconf mode
ipc: add support for no-shconf mode
eal: add support for no-shconf for hugepage info
eal: add support for no-shconf in hugepage data file
eal: do not create runtime dir in no-shconf mode
mem: add support for hugepage-unlink mode
eal: add --in-memory option
mem: support in-memory mode
Applied, thanks.
