Discussion:
[dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Alejandro Lucero
2018-10-05 12:45:21 UTC
I sent a patchset about this to be applied on 17.11 stable. The memory
code has changed significantly since that version, so here is the patchset
adjusted to the current master repo.

This patchset adds, mainly, a check for ensuring IOVAs are within a
restricted range due to addressing limitations with some devices. There
are two known cases: NFP and IOMMU VT-d emulation.

With this check, IOVAs out of range are detected and PMDs can abort
initialization. For the VT-d case, IOVA VA mode is allowed as long as
IOVAs are within the supported range, avoiding having to forbid IOVA VA
mode by default.

For the known addressing limitation cases, there are just 40 (NFP) or
39 (VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is likely enough for
most systems. On machines using more memory, the added check will
ensure IOVAs are within the range.
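
For illustration only (this is not part of the patchset), the relation
between the number of addressing bits, the resulting DMA mask and the
upper memory limit can be seen with a small standalone program; the 1TB
and 512GB figures above come from 2^40 and 2^39 bytes:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	unsigned int bits;

	for (bits = 39; bits <= 40; bits++) {
		/* bits above the supported range must be zero in any IOVA */
		uint64_t mask = ~((1ULL << bits) - 1);
		/* first address that falls out of range */
		uint64_t limit = 1ULL << bits;

		printf("%u bits: mask 0x%016" PRIx64 ", limit %" PRIu64 " GB\n",
		       bits, mask, limit >> 30);
	}
	return 0;
}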

With IOVA VA, and because of the way the Linux kernel serves mmap calls
on 64-bit systems, 39 or 40 bits are not enough. It is possible to
give an address hint with a lower starting address than the default one
used by the kernel, and then to ensure the mmap uses that hint or the hint
plus some offset. On 64-bit systems, the process virtual address space is
large enough for doing the hugepage mmapping within the supported range
when those addressing limitations exist. This patchset also adds a change
for using such a hint, making the use of IOVA VA very likely possible
when those addressing limitations exist.

The check is not done by default but only when it is required. This
patchset adds the check to NFP initialization and to setting the IOVA
mode when an emulated VT-d is detected. Also, because of the recent patchset
adding dynamic memory allocation, the check is also invoked to ensure
new memsegs are within the required range.

This patchset could be applied to stable 18.05.

v2:
- change logs from INFO to DEBUG
- only keeps dma mask if device capable of addressing allocated memory
- add ABI changes
- change hint address increment to page size
- split pci/bus commit in two
- fix commits

v3:
- remove previous code about keeping dma mask
Alejandro Lucero
2018-10-05 12:45:22 UTC
A device can suffer addressing limitations. This function checks that
memsegs have IOVAs within the supported range based on a dma mask.

PMDs should use this function during initialization if the device
suffers addressing limitations, returning an error if this function
reports memsegs out of range.

Another usage is for emulated IOMMU hardware with addressing
limitations.

It is necessary to save the most restricted dma mask for checking
memory allocated dynamically after initialization.
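
As a minimal sketch of the intended usage (illustrative only; the NFP
patch later in this series does something very similar, and the function
name example_pmd_init here is made up):

#include <errno.h>
#include <rte_log.h>
#include <rte_memory.h>

#define EXAMPLE_DMA_BITS 40	/* device can only address 40 bits */

static int
example_pmd_init(void)
{
	/* abort initialization if any memseg IOVA is beyond 40 bits */
	if (rte_eal_check_dma_mask(EXAMPLE_DMA_BITS) != 0) {
		RTE_LOG(ERR, PMD,
			"allocated memory not addressable with a %u-bit DMA mask\n",
			EXAMPLE_DMA_BITS);
		return -ENODEV;
	}

	/* ... rest of the device initialization ... */
	return 0;
}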

Signed-off-by: Alejandro Lucero <***@netronome.com>
Reviewed-by: Anatoly Burakov <***@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 10 ++++
lib/librte_eal/common/eal_common_memory.c | 60 +++++++++++++++++++++++
lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_memory.h | 3 ++
lib/librte_eal/common/malloc_heap.c | 12 +++++
lib/librte_eal/linuxapp/eal/eal.c | 2 +
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 91 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 2133a5b..c806dc6 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -104,6 +104,14 @@ New Features
the specified port. The port must be stopped before the command call in order
to reconfigure queues.

+* **Added check for ensuring allocated memory addressable by devices.**
+
+ Some devices can have addressing limitations so a new function,
+ ``rte_eal_check_dma_mask``, has been added for checking allocated memory is
+ not out of the device range. Because now memory can be dynamically allocated
+ after initialization, a dma mask is kept and any new allocated memory will be
+ checked out against that dma mask and rejected if out of range. If more than
+ one device has addressing limitations, the dma mask is the more restricted one.

API Changes
-----------
@@ -156,6 +164,8 @@ ABI Changes
``rte_config`` structure on account of improving DPDK usability when
using either ``--legacy-mem`` or ``--single-file-segments`` flags.

+* eal: added ``dma_maskbits`` to ``rte_mem_config`` for keeping more restricted
+ dma mask based on devices addressing limitations.

Removed Items
-------------
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804..c482f0d 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -385,6 +385,66 @@ struct virtiova {
rte_memseg_walk(dump_memseg, f);
}

+static int
+check_iova(const struct rte_memseg_list *msl __rte_unused,
+ const struct rte_memseg *ms, void *arg)
+{
+ uint64_t *mask = arg;
+ rte_iova_t iova;
+
+ /* higher address within segment */
+ iova = (ms->iova + ms->len) - 1;
+ if (!(iova & *mask))
+ return 0;
+
+ RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of range\n",
+ ms->iova, ms->len);
+
+ RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
+ return 1;
+}
+
+#if defined(RTE_ARCH_64)
+#define MAX_DMA_MASK_BITS 63
+#else
+#define MAX_DMA_MASK_BITS 31
+#endif
+
+/* check memseg iovas are within the required range based on dma mask */
+int __rte_experimental
+rte_eal_check_dma_mask(uint8_t maskbits)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint64_t mask;
+
+ /* sanity check */
+ if (maskbits > MAX_DMA_MASK_BITS) {
+ RTE_LOG(ERR, EAL, "wrong dma mask size %u (Max: %u)\n",
+ maskbits, MAX_DMA_MASK_BITS);
+ return -1;
+ }
+
+ /* create dma mask */
+ mask = ~((1ULL << maskbits) - 1);
+
+ if (rte_memseg_walk(check_iova, &mask))
+ /*
+ * Dma mask precludes hugepage usage.
+ * This device can not be used and we do not need to keep
+ * the dma mask.
+ */
+ return 1;
+
+ /*
+ * we need to keep the more restricted maskbit for checking
+ * potential dynamic memory allocation in the future.
+ */
+ mcfg->dma_maskbits = mcfg->dma_maskbits == 0 ? maskbits :
+ RTE_MIN(mcfg->dma_maskbits, maskbits);
+
+ return 0;
+}
+
/* return the number of memory channels */
unsigned rte_memory_get_nchannel(void)
{
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 62a21c2..b5dff70 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -81,6 +81,9 @@ struct rte_mem_config {
/* legacy mem and single file segments options are shared */
uint32_t legacy_mem;
uint32_t single_file_segments;
+
+ /* keeps the more restricted dma mask */
+ uint8_t dma_maskbits;
} __attribute__((__packed__));


diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277..c349d6c 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -454,6 +454,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
*/
unsigned rte_memory_get_nrank(void);

+/* check memsegs iovas are within a range based on dma mask */
+int rte_eal_check_dma_mask(uint8_t maskbits);
+
/**
* Drivers based on uio will not load unless physical
* addresses are obtainable. It is only possible to get
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3..3b5b2b6 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -259,11 +259,13 @@ struct malloc_elem *
int socket, unsigned int flags, size_t align, size_t bound,
bool contig, struct rte_memseg **ms, int n_segs)
{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg_list *msl;
struct malloc_elem *elem = NULL;
size_t alloc_sz;
int allocd_pages;
void *ret, *map_addr;
+ uint64_t mask;

alloc_sz = (size_t)pg_sz * n_segs;

@@ -291,6 +293,16 @@ struct malloc_elem *
goto fail;
}

+ if (mcfg->dma_maskbits) {
+ mask = ~((1ULL << mcfg->dma_maskbits) - 1);
+ if (rte_eal_check_dma_mask(mask)) {
+ RTE_LOG(ERR, EAL,
+ "%s(): couldn't allocate memory due to DMA mask\n",
+ __func__);
+ goto fail;
+ }
+ }
+
/* add newly minted memsegs to malloc heap */
elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 4a55d3b..dfe1b8c 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -263,6 +263,8 @@ enum rte_iova_mode
* processes could later map the config into this exact location */
rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;

+ rte_config.mem_config->dma_maskbits = 0;
+
}

/* attach to an existing shared memory config */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..2baefce 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -291,6 +291,7 @@ EXPERIMENTAL {
rte_devargs_parsef;
rte_devargs_remove;
rte_devargs_type_count;
+ rte_eal_check_dma_mask;
rte_eal_cleanup;
rte_eal_hotplug_add;
rte_eal_hotplug_remove;
--
1.9.1
Tu, Lijuan
2018-10-10 08:56:05 UTC
Hi
-----Original Message-----
Sent: Friday, October 5, 2018 8:45 PM
Subject: [dpdk-dev] [PATCH v3 1/6] mem: add function for checking
memsegs IOVAs addresses
A device can suffer addressing limitations. This function checks memsegs
have iovas within the supported range based on dma mask.
PMDs should use this function during initialization if device suffers
addressing limitations, returning an error if this function returns memsegs
out of range.
Another usage is for emulated IOMMU hardware with addressing limitations.
It is necessary to save the most restricted dma mask for checking out
memory allocated dynamically after initialization.
---
doc/guides/rel_notes/release_18_11.rst | 10 ++++
lib/librte_eal/common/eal_common_memory.c | 60
+++++++++++++++++++++++
lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_memory.h | 3 ++
lib/librte_eal/common/malloc_heap.c | 12 +++++
lib/librte_eal/linuxapp/eal/eal.c | 2 +
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 91 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst
b/doc/guides/rel_notes/release_18_11.rst
index 2133a5b..c806dc6 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -104,6 +104,14 @@ New Features
the specified port. The port must be stopped before the command call in order
to reconfigure queues.
+* **Added check for ensuring allocated memory addressable by devices.**
+
+ Some devices can have addressing limitations so a new function,
+ ``rte_eal_check_dma_mask``, has been added for checking allocated
+ memory is not out of the device range. Because now memory can be
+ dynamically allocated after initialization, a dma mask is kept and
+ any new allocated memory will be checked out against that dma mask
+ and rejected if out of range. If more than one device has addressing
limitations, the dma mask is the more restricted one.
API Changes
-----------
@@ -156,6 +164,8 @@ ABI Changes
``rte_config`` structure on account of improving DPDK usability when
using either ``--legacy-mem`` or ``--single-file-segments`` flags.
+* eal: added ``dma_maskbits`` to ``rte_mem_config`` for keeping more restricted
+ dma mask based on devices addressing limitations.
Removed Items
-------------
diff --git a/lib/librte_eal/common/eal_common_memory.c
b/lib/librte_eal/common/eal_common_memory.c
index 0b69804..c482f0d 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -385,6 +385,66 @@ struct virtiova {
rte_memseg_walk(dump_memseg, f);
}
+static int
+check_iova(const struct rte_memseg_list *msl __rte_unused,
+ const struct rte_memseg *ms, void *arg) {
+ uint64_t *mask = arg;
+ rte_iova_t iova;
+
+ /* higher address within segment */
+ iova = (ms->iova + ms->len) - 1;
+ if (!(iova & *mask))
+ return 0;
+
+	RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of range\n",
+		ms->iova, ms->len);
+
+ RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
+ return 1;
+}
+
+#if defined(RTE_ARCH_64)
+#define MAX_DMA_MASK_BITS 63
+#else
+#define MAX_DMA_MASK_BITS 31
+#endif
+
+/* check memseg iovas are within the required range based on dma mask */
+int __rte_experimental
+rte_eal_check_dma_mask(uint8_t maskbits)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint64_t mask;
+
+ /* sanity check */
+ if (maskbits > MAX_DMA_MASK_BITS) {
+ RTE_LOG(ERR, EAL, "wrong dma mask size %u (Max: %u)\n",
+ maskbits, MAX_DMA_MASK_BITS);
+ return -1;
+ }
+
+ /* create dma mask */
+ mask = ~((1ULL << maskbits) - 1);
+
+ if (rte_memseg_walk(check_iova, &mask))
[Lijuan]In my environment, testpmd halts at rte_memseg_walk() when maskbits is 0.
+ /*
+ * Dma mask precludes hugepage usage.
+ * This device can not be used and we do not need to keep
+ * the dma mask.
+ */
+ return 1;
<snip>
Alejandro Lucero
2018-10-11 09:26:00 UTC
Post by Alejandro Lucero
Hi
-----Original Message-----
Sent: Friday, October 5, 2018 8:45 PM
Subject: [dpdk-dev] [PATCH v3 1/6] mem: add function for checking
memsegs IOVAs addresses
A device can suffer addressing limitations. This function checks memsegs
have iovas within the supported range based on dma mask.
PMDs should use this function during initialization if device suffers
addressing limitations, returning an error if this function returns
memsegs
out of range.
Another usage is for emulated IOMMU hardware with addressing limitations.
It is necessary to save the most restricted dma mask for checking out
memory allocated dynamically after initialization.
---
doc/guides/rel_notes/release_18_11.rst | 10 ++++
lib/librte_eal/common/eal_common_memory.c | 60
+++++++++++++++++++++++
lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_memory.h | 3 ++
lib/librte_eal/common/malloc_heap.c | 12 +++++
lib/librte_eal/linuxapp/eal/eal.c | 2 +
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 91 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst
b/doc/guides/rel_notes/release_18_11.rst
index 2133a5b..c806dc6 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -104,6 +104,14 @@ New Features
the specified port. The port must be stopped before the command call
in
order
to reconfigure queues.
+* **Added check for ensuring allocated memory addressable by devices.**
+
+ Some devices can have addressing limitations so a new function,
+ ``rte_eal_check_dma_mask``, has been added for checking allocated
+ memory is not out of the device range. Because now memory can be
+ dynamically allocated after initialization, a dma mask is kept and
+ any new allocated memory will be checked out against that dma mask
+ and rejected if out of range. If more than one device has addressing
limitations, the dma mask is the more restricted one.
API Changes
-----------
@@ -156,6 +164,8 @@ ABI Changes
``rte_config`` structure on account of improving DPDK usability when
using either ``--legacy-mem`` or ``--single-file-segments``
flags.
+* eal: added ``dma_maskbits`` to ``rte_mem_config`` for keeping more restricted
+ dma mask based on devices addressing limitations.
Removed Items
-------------
diff --git a/lib/librte_eal/common/eal_common_memory.c
b/lib/librte_eal/common/eal_common_memory.c
index 0b69804..c482f0d 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -385,6 +385,66 @@ struct virtiova {
rte_memseg_walk(dump_memseg, f);
}
+static int
+check_iova(const struct rte_memseg_list *msl __rte_unused,
+ const struct rte_memseg *ms, void *arg) {
+ uint64_t *mask = arg;
+ rte_iova_t iova;
+
+ /* higher address within segment */
+ iova = (ms->iova + ms->len) - 1;
+ if (!(iova & *mask))
+ return 0;
+
+	RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of range\n",
+		ms->iova, ms->len);
+
+ RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
+ return 1;
+}
+
+#if defined(RTE_ARCH_64)
+#define MAX_DMA_MASK_BITS 63
+#else
+#define MAX_DMA_MASK_BITS 31
+#endif
+
+/* check memseg iovas are within the required range based on dma mask */
+int __rte_experimental
+rte_eal_check_dma_mask(uint8_t maskbits)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint64_t mask;
+
+ /* sanity check */
+ if (maskbits > MAX_DMA_MASK_BITS) {
+ RTE_LOG(ERR, EAL, "wrong dma mask size %u (Max: %u)\n",
+ maskbits, MAX_DMA_MASK_BITS);
+ return -1;
+ }
+
+ /* create dma mask */
+ mask = ~((1ULL << maskbits) - 1);
+
+ if (rte_memseg_walk(check_iova, &mask))
[Lijuan]In my environment, testpmd halts at rte_memseg_walk() when maskbits is 0.
Can you explain this further?
Who is calling rte_eal_check_dma_mask with maskbits 0? Is this an x86_64 system?
The only explanation I can find is the IOMMU hardware reporting mgaw=0, which
I would say is something completely wrong.
Post by Alejandro Lucero
+ /*
+ * Dma mask precludes hugepage usage.
+ * This device can not be used and we do not need to keep
+ * the dma mask.
+ */
+ return 1;
<snip>
Thomas Monjalon
2018-10-28 21:03:48 UTC
Post by Alejandro Lucero
A device can suffer addressing limitations. This function checks
memsegs have iovas within the supported range based on dma mask.
PMDs should use this function during initialization if device
suffers addressing limitations, returning an error if this function
returns memsegs out of range.
Another usage is for emulated IOMMU hardware with addressing
limitations.
It is necessary to save the most restricted dma mask for checking out
memory allocated dynamically after initialization.
I am really disappointed that no more people looked at this patch,
even after asking in several maintainers meetings:
http://mails.dpdk.org/archives/dev/2018-October/116337.html
http://mails.dpdk.org/archives/dev/2018-October/117092.html

This series will enter 18.11-rc1 because there was no strong objection
(only one issue raised, without any follow-up).

I had no time to review it recently, but after a quick look, I see
some details which can be improved. Please, Alejandro,
consider comments below for a separate patch in -rc2. Thanks

[...]
Post by Alejandro Lucero
+static int
+check_iova(const struct rte_memseg_list *msl __rte_unused,
+ const struct rte_memseg *ms, void *arg)
+{
+ uint64_t *mask = arg;
+ rte_iova_t iova;
+
+ /* higher address within segment */
+ iova = (ms->iova + ms->len) - 1;
This black magic needs to be better commented.
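For instance (wording only a suggestion), the comment could spell out what
is being computed:

	/*
	 * A segment covers the IOVA range [iova, iova + len), so its last
	 * byte is at iova + len - 1. If any bit of that address falls
	 * outside the DMA mask, the segment is not fully addressable by
	 * the device.
	 */
	iova = (ms->iova + ms->len) - 1;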
Post by Alejandro Lucero
+ if (!(iova & *mask))
+ return 0;
+
+ RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of range\n",
+ ms->iova, ms->len);
+
+ RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
+ return 1;
+}
[...]
Post by Alejandro Lucero
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
+/* check memsegs iovas are within a range based on dma mask */
+int rte_eal_check_dma_mask(uint8_t maskbits);
You need to insert a doxygen comment, explaining the format
of the maskbits parameter, and the behaviour of the function.
The doxygen comment must also state the function is experimental.

About the name of the function, the trend is to use rte_mem_ prefix
instead of rte_eal_ in this case.

The __rte_experimental tag must be added to the function prototype; I will do it
when applying the patch.
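
Something along these lines (wording only a suggestion, following the
usual EAL doxygen style) would cover the points above:

/**
 * @warning
 * @b EXPERIMENTAL: this API may change without prior notice
 *
 * Check that the IOVAs of all currently allocated memsegs fit within a
 * DMA mask of the given width.
 *
 * @param maskbits
 *   Number of address bits the device can use for DMA (e.g. 40).
 *
 * @return
 *   0 if all memseg IOVAs are within the mask, 1 if at least one is not,
 *   -1 if maskbits is larger than the supported maximum.
 */
int __rte_experimental rte_eal_check_dma_mask(uint8_t maskbits);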
Alejandro Lucero
2018-10-05 12:45:23 UTC
The Linux kernel uses a really high address as the starting address for
serving mmap calls. If there are addressing limitations and the
IOVA mode is VA, this starting address is likely too high for
those devices. However, it is possible to use a lower address in
the process virtual address space, as with 64 bits there is a lot
of available space.

This patch adds an address hint as the starting address for 64-bit
systems and increments the hint for the next invocations. If the mmap
call does not use the hint address, the mmap call is repeated using
the hint address incremented by the page size.
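
As a standalone illustration of the hint-and-retry idea (simplified;
this is not the EAL code itself, and it assumes a 64-bit system with a
4 kB page size):

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t sz = 1UL << 21;            /* a single 2MB mapping */
	void *hint = (void *)0x100000000; /* 4GB, well below 39/40-bit limits */
	void *addr;

	/* ask for the hint; if the kernel places the mapping elsewhere,
	 * drop it and retry with the hint moved up by one page
	 */
	for (;;) {
		addr = mmap(hint, sz, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (addr == MAP_FAILED || addr == hint)
			break;
		munmap(addr, sz);
		hint = (char *)hint + 4096;
	}

	if (addr != MAP_FAILED)
		printf("mapped %zu bytes at %p\n", sz, addr);
	return 0;
}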

Signed-off-by: Alejandro Lucero <***@netronome.com>
Reviewed-by: Anatoly Burakov <***@intel.com>
---
lib/librte_eal/common/eal_common_memory.c | 34 ++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index c482f0d..853c44c 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -37,6 +37,23 @@
static void *next_baseaddr;
static uint64_t system_page_sz;

+#ifdef RTE_ARCH_64
+/*
+ * Linux kernel uses a really high address as starting address for serving
+ * mmaps calls. If there exists addressing limitations and IOVA mode is VA,
+ * this starting address is likely too high for those devices. However, it
+ * is possible to use a lower address in the process virtual address space
+ * as with 64 bits there is a lot of available space.
+ *
+ * Current known limitations are 39 or 40 bits. Setting the starting address
+ * at 4GB implies there are 508GB or 1020GB for mapping the available
+ * hugepages. This is likely enough for most systems, although a device with
+ * addressing limitations should call rte_eal_check_dma_mask for ensuring all
+ * memory is within supported range.
+ */
+static uint64_t baseaddr = 0x100000000;
+#endif
+
void *
eal_get_virtual_area(void *requested_addr, size_t *size,
size_t page_sz, int flags, int mmap_flags)
@@ -60,6 +77,11 @@
rte_eal_process_type() == RTE_PROC_PRIMARY)
next_baseaddr = (void *) internal_config.base_virtaddr;

+#ifdef RTE_ARCH_64
+ if (next_baseaddr == NULL && internal_config.base_virtaddr == 0 &&
+ rte_eal_process_type() == RTE_PROC_PRIMARY)
+ next_baseaddr = (void *) baseaddr;
+#endif
if (requested_addr == NULL && next_baseaddr != NULL) {
requested_addr = next_baseaddr;
requested_addr = RTE_PTR_ALIGN(requested_addr, page_sz);
@@ -91,7 +113,17 @@
mmap_flags, -1, 0);
if (mapped_addr == MAP_FAILED && allow_shrink)
*size -= page_sz;
- } while (allow_shrink && mapped_addr == MAP_FAILED && *size > 0);
+
+ if (mapped_addr != MAP_FAILED && addr_is_hint &&
+ mapped_addr != requested_addr) {
+ /* hint was not used. Try with another offset */
+ munmap(mapped_addr, map_sz);
+ mapped_addr = MAP_FAILED;
+ next_baseaddr = RTE_PTR_ADD(next_baseaddr, page_sz);
+ requested_addr = next_baseaddr;
+ }
+ } while ((allow_shrink || addr_is_hint) &&
+ mapped_addr == MAP_FAILED && *size > 0);

/* align resulting address - if map failed, we will ignore the value
* anyway, so no need to add additional checks.
--
1.9.1
Dariusz Stojaczyk
2018-10-29 16:08:06 UTC
On Fri, Oct 5, 2018 at 2:47 PM Alejandro Lucero
Post by Alejandro Lucero
Linux kernel uses a really high address as starting address for
serving mmaps calls. If there exist addressing limitations and
IOVA mode is VA, this starting address is likely too high for
those devices. However, it is possible to use a lower address in
the process virtual address space as with 64 bits there is a lot
of available space.
This patch adds an address hint as starting address for 64 bits
systems and increments the hint for next invocations. If the mmap
call does not use the hint address, repeat the mmap call using
the hint address incremented by page size.
---
lib/librte_eal/common/eal_common_memory.c | 34 ++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index c482f0d..853c44c 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -37,6 +37,23 @@
static void *next_baseaddr;
static uint64_t system_page_sz;
+#ifdef RTE_ARCH_64
+/*
+ * Linux kernel uses a really high address as starting address for serving
+ * mmaps calls. If there exists addressing limitations and IOVA mode is VA,
+ * this starting address is likely too high for those devices. However, it
+ * is possible to use a lower address in the process virtual address space
+ * as with 64 bits there is a lot of available space.
+ *
+ * Current known limitations are 39 or 40 bits. Setting the starting address
+ * at 4GB implies there are 508GB or 1020GB for mapping the available
+ * hugepages. This is likely enough for most systems, although a device with
+ * addressing limitations should call rte_eal_check_dma_mask for ensuring all
+ * memory is within supported range.
+ */
+static uint64_t baseaddr = 0x100000000;
+#endif
This breaks running with ASAN unless a custom --base-virtaddr option
is specified. The default base-virtaddr introduced by this patch falls
into an area that's already reserved by ASAN.

See here: https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm
The only available address space starts at 0x10007fff8000, which
unfortunately doesn't fit in 39 bits.

Right now the very first eal_get_virtual_area() in EAL initialization
is used with 4KB pagesize, meaning that DPDK will try to mmap at each
4KB-aligned offset all the way from 0x100000000 to 0x10007fff8000,
which takes quite a long, long time.

I'm not sure about the solution to this problem, but I verified that
starting DPDK 18.11-rc1 with `--base-virtaddr 0x200000000000` works
just fine under ASAN.

D.
Post by Alejandro Lucero
<snip>
Alejandro Lucero
2018-10-29 16:40:54 UTC
Hi Dariusz,
Post by Dariusz Stojaczyk
On Fri, Oct 5, 2018 at 2:47 PM Alejandro Lucero
Post by Alejandro Lucero
<snip>
This breaks running with ASAN unless a custom --base-virtaddr option
is specified. The default base-virtaddr introduced by this patch falls
into an area that's already reserved by ASAN.
https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm
The only available address space starts at 0x10007fff8000, which
unfortunately doesn't fit in 39 bits.
Right now the very first eal_get_virtual_area() in EAL initialization
is used with 4KB pagesize, meaning that DPDK will try to mmap at each
4KB-aligned offset all the way from 0x100000000 to 0x10007fff8000,
which takes quite a long, long time.
I'm not sure about the solution to this problem, but I verify that
starting DPDK 18.11-rc1 with `--base-virtaddr 0x200000000000` works
just fine under ASAN.
Do we have documentation about using Address Sanitizer?
I understand the goal, but what is the cost? Do you have numbers about the
impact on performance?

Solving this is not trivial. I would say someone interested in this but
using hardware with addressing limitations needs to choose.
Would it be possible to modify the virtual addresses used by default? I
guess the shadow regions could be higher than the default ones.
Post by Dariusz Stojaczyk
D.
Post by Alejandro Lucero
<snip>
Alejandro Lucero
2018-10-05 12:45:24 UTC
Current code checks if the IOMMU hardware reports enough addressing
bits for using IOVA mode, but it repeats the same check for every
PCI device present. This is not necessary because the IOMMU hardware
is the same for all of them.

This patch only checks the IOMMU using the first PCI device found.

Signed-off-by: Alejandro Lucero <***@netronome.com>
Acked-by: Anatoly Burakov <***@intel.com>
---
drivers/bus/pci/linux/pci.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac..a871549 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -620,8 +620,11 @@
FOREACH_DEVICE_ON_PCIBUS(dev) {
if (!rte_pci_match(drv, dev))
continue;
- if (!pci_one_device_iommu_support_va(dev))
- return false;
+ /*
+ * just one PCI device needs to be checked out because
+ * the IOMMU hardware is the same for all of them.
+ */
+ return pci_one_device_iommu_support_va(dev);
}
}
return true;
--
1.9.1
Alejandro Lucero
2018-10-05 12:45:25 UTC
Currently the code precludes IOVA VA mode if the IOMMU hardware reports
fewer addressing bits than necessary for the full virtual memory range.

Although VT-d emulation currently only supports 39 bits, the IOVAs for
the allocated memory could still be within that supported range.
This patch allows IOVA VA mode in such a case, adding a call to
rte_eal_check_dma_mask using the addressing bits reported by the
IOMMU hardware.

Indeed, the memory initialization code has been modified to use lower
virtual addresses than those used by the kernel for 64-bit processes
by default, and therefore memseg IOVAs can use 39 bits or less on
most systems. And this is likely 100% true for VMs.

Signed-off-by: Alejandro Lucero <***@netronome.com>
Acked-by: Anatoly Burakov <***@intel.com>
---
drivers/bus/pci/linux/pci.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index a871549..5cf78d7 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -588,10 +588,8 @@
fclose(fp);

mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) + 1;
- if (mgaw < X86_VA_WIDTH)
- return false;

- return true;
+ return rte_eal_check_dma_mask(mgaw) == 0 ? true : false;
}
#elif defined(RTE_ARCH_PPC_64)
static bool
--
1.9.1
Alejandro Lucero
2018-10-05 12:45:26 UTC
NFP devices can not handle DMA addresses requiring more than
40 bits. This patch uses rte_eal_check_dma_mask with 40 bits
and avoids device initialization if memory is out of the NFP range.

Signed-off-by: Alejandro Lucero <***@netronome.com>
---
drivers/net/nfp/nfp_net.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 170b5d6..3910980 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -2680,6 +2680,14 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)

pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);

+ /* NFP can not handle DMA addresses requiring more than 40 bits */
+ if (rte_eal_check_dma_mask(40)) {
+ RTE_LOG(ERR, PMD, "device %s can not be used:",
+ pci_dev->device.name);
+ RTE_LOG(ERR, PMD, "\trestricted dma mask to 40 bits!\n");
+ return -ENODEV;
+ };
+
if ((pci_dev->id.device_id == PCI_DEVICE_ID_NFP4000_PF_NIC) ||
(pci_dev->id.device_id == PCI_DEVICE_ID_NFP6000_PF_NIC)) {
port = get_pf_port_number(eth_dev->data->name);
--
1.9.1
Alejandro Lucero
2018-10-05 12:45:27 UTC
NFP can handle IOVA as VA. It requires checking that those IOVAs
are within the supported range, which is done during initialization.

Signed-off-by: Alejandro Lucero <***@netronome.com>
---
drivers/net/nfp/nfp_net.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 3910980..cc73b0b 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -3268,14 +3268,16 @@ static int eth_nfp_pci_remove(struct rte_pci_device *pci_dev)

static struct rte_pci_driver rte_nfp_net_pf_pmd = {
.id_table = pci_id_nfp_pf_net_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+ RTE_PCI_DRV_IOVA_AS_VA,
.probe = nfp_pf_pci_probe,
.remove = eth_nfp_pci_remove,
};

static struct rte_pci_driver rte_nfp_net_vf_pmd = {
.id_table = pci_id_nfp_vf_net_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+ RTE_PCI_DRV_IOVA_AS_VA,
.probe = eth_nfp_pci_probe,
.remove = eth_nfp_pci_remove,
};
--
1.9.1
Thomas Monjalon
2018-10-28 21:04:18 UTC
Post by Alejandro Lucero
<snip>
Applied, thanks
Yao, Lei A
2018-10-29 08:23:41 UTC
Hi, Lucero, Thomas

This patch set will cause a deadlock during memory initialization.
rte_memseg_walk and try_expand_heap both take the lock
&mcfg->memory_hotplug_lock, so a deadlock will occur.

#0 rte_memseg_walk
#1 <-rte_eal_check_dma_mask
#2 <-alloc_pages_on_heap
#3 <-try_expand_heap_primary
#4 <-try_expand_heap

Log as following:
EAL: TSC frequency is ~2494156 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
[New Thread 0x7ffff5e0d700 (LWP 330350)]
EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0

Could you have a look at this? A lot of test cases from our validation
team fail because of this. Thanks a lot!

BRs
Lei
-----Original Message-----
Sent: Monday, October 29, 2018 5:04 AM
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA
mask
Post by Alejandro Lucero
<snip>
Applied, thanks
Thomas Monjalon
2018-10-29 08:42:43 UTC
Post by Yao, Lei A
Hi, Lucero, Thomas
This patch set will cause deadlock during memory initialization.
rte_memseg_walk and try_expand_heap both will lock
the file &mcfg->memory_hotplug_lock. So dead lock will occur.
#0 rte_memseg_walk
#1 <-rte_eal_check_dma_mask
#2 <-alloc_pages_on_heap
#3 <-try_expand_heap_primary
#4 <-try_expand_heap
EAL: TSC frequency is ~2494156 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
[New Thread 0x7ffff5e0d700 (LWP 330350)]
EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
Could you have a check on this? A lot of test cases in our validation
team fail because of this. Thanks a lot!
Can we just call rte_memseg_walk_thread_unsafe()?

+Cc Anatoly
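
As a minimal sketch of that change (assuming rte_memseg_walk_thread_unsafe()
takes the same callback and argument as rte_memseg_walk()), the walk inside
rte_eal_check_dma_mask() would become:

	/*
	 * The walk can now be reached from the allocator with
	 * memory_hotplug_lock already held, so use the variant that does
	 * not take the lock again.
	 */
	if (rte_memseg_walk_thread_unsafe(check_iova, &mask))
		return 1;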
Post by Yao, Lei A
Post by Thomas Monjalon
Post by Alejandro Lucero
<snip>
Applied, thanks
Thomas Monjalon
2018-10-29 09:07:24 UTC
One more comment about this issue,

There was no reply to the question asked by Alejandro on October 11th:
http://mails.dpdk.org/archives/dev/2018-October/115402.html
and there were no more reviews despite all my requests:
http://mails.dpdk.org/archives/dev/2018-October/117475.html
Without any further comment, I had to apply the patchset.

Now we need to find a solution. Please suggest.
Post by Thomas Monjalon
Post by Yao, Lei A
Hi, Lucero, Thomas
This patch set will cause deadlock during memory initialization.
rte_memseg_walk and try_expand_heap both will lock
the file &mcfg->memory_hotplug_lock. So dead lock will occur.
#0 rte_memseg_walk
#1 <-rte_eal_check_dma_mask
#2 <-alloc_pages_on_heap
#3 <-try_expand_heap_primary
#4 <-try_expand_heap
EAL: TSC frequency is ~2494156 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
[New Thread 0x7ffff5e0d700 (LWP 330350)]
EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
Could you have a check on this? A lot of test cases in our validation
team fail because of this. Thanks a lot!
Can we just call rte_memseg_walk_thread_unsafe()?
+Cc Anatoly
Post by Yao, Lei A
Post by Thomas Monjalon
Post by Alejandro Lucero
<snip>
Applied, thanks
Alejandro Lucero
2018-10-29 09:25:48 UTC
Can we have the configuration triggering this issue?
Post by Thomas Monjalon
One more comment about this issue,
http://mails.dpdk.org/archives/dev/2018-October/115402.html
http://mails.dpdk.org/archives/dev/2018-October/117475.html
Without any more comment, I had to apply the patchset.
Now we need to find a solution. Please suggest.
Post by Thomas Monjalon
Post by Yao, Lei A
Hi, Lucero, Thomas
This patch set will cause deadlock during memory initialization.
rte_memseg_walk and try_expand_heap both will lock
the file &mcfg->memory_hotplug_lock. So dead lock will occur.
#0 rte_memseg_walk
#1 <-rte_eal_check_dma_mask
#2 <-alloc_pages_on_heap
#3 <-try_expand_heap_primary
#4 <-try_expand_heap
EAL: TSC frequency is ~2494156 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
[New Thread 0x7ffff5e0d700 (LWP 330350)]
EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
Could you have a check on this? A lot of test cases in our validation
team fail because of this. Thanks a lot!
Can we just call rte_memseg_walk_thread_unsafe()?
+Cc Anatoly
Post by Yao, Lei A
Post by Thomas Monjalon
Post by Alejandro Lucero
<snip>
Applied, thanks
Yao, Lei A
2018-10-29 09:44:22 UTC
Hi, Lucero

My server info:
Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
Hugepage: 1G
Kernel: 4.15.0
OS: Ubuntu

Steps are simple:

1. Bind one i40e/ixgbe NIC to igb_uio

2. Launch testpmd:

./x86_64-native-linuxapp-gcc/app/testpmd -c 0x03 -n 4 --log-level=eal,8 -- -i

BRs
Lei


From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Monday, October 29, 2018 5:26 PM
To: Thomas Monjalon <***@monjalon.net>
Cc: Yao, Lei A <***@intel.com>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Lin, Xueqin <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>; Richardson, Bruce <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask

Can we have the configuration triggering this issue?

On Mon, Oct 29, 2018 at 9:07 AM Thomas Monjalon <***@monjalon.net> wrote:
One more comment about this issue,

There was no reply to the question asked by Alejandro on October 11th:
http://mails.dpdk.org/archives/dev/2018-October/115402.html
and there were no more reviews despite all my requests:
http://mails.dpdk.org/archives/dev/2018-October/117475.html
Without any more comment, I had to apply the patchset.

Now we need to find a solution. Please suggest.
Post by Thomas Monjalon
Post by Yao, Lei A
Hi, Lucero, Thomas
This patch set will cause a deadlock during memory initialization.
rte_memseg_walk and try_expand_heap both take the lock
&mcfg->memory_hotplug_lock, so a deadlock will occur.
#0 rte_memseg_walk
#1 <-rte_eal_check_dma_mask
#2 <-alloc_pages_on_heap
#3 <-try_expand_heap_primary
#4 <-try_expand_heap
EAL: TSC frequency is ~2494156 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
[New Thread 0x7ffff5e0d700 (LWP 330350)]
EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
Could you have a check on this? A lot of test cases in our validation
team fail because of this. Thanks a lot!
Can we just call rte_memseg_walk_thread_unsafe()?
+Cc Anatoly
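
To make the reported deadlock easier to follow: both functions take the same
rwlock in the same thread, and DPDK's rte_rwlock cannot be acquired
recursively. A minimal sketch of the pattern (generic rte_rwlock usage only,
not the actual EAL code) is:

#include <rte_rwlock.h>

static rte_rwlock_t hotplug_lock = RTE_RWLOCK_INITIALIZER;

/* roughly the rte_memseg_walk() side: take the lock for reading */
static void walk_memsegs(void)
{
        rte_rwlock_read_lock(&hotplug_lock);
        /* ... visit memsegs ... */
        rte_rwlock_read_unlock(&hotplug_lock);
}

/* roughly the heap-expansion side of the call chain above: take the lock
 * for writing, then end up walking memsegs through the new DMA mask check */
static void expand_heap(void)
{
        rte_rwlock_write_lock(&hotplug_lock);
        walk_memsegs();         /* blocks forever: the read lock cannot be
                                 * taken while this same thread holds the
                                 * write lock */
        rte_rwlock_write_unlock(&hotplug_lock);
}

int main(void)
{
        expand_heap();          /* never returns */
        return 0;
}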
Yao, Lei A
2018-10-29 09:36:18 UTC
Permalink
-----Original Message-----
Sent: Monday, October 29, 2018 4:43 PM
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA
mask
Post by Yao, Lei A
Hi, Lucero, Thomas
This patch set will cause deadlock during memory initialization.
rte_memseg_walk and try_expand_heap both will lock
the file &mcfg->memory_hotplug_lock. So dead lock will occur.
#0 rte_memseg_walk
#1 <-rte_eal_check_dma_mask
#2 <-alloc_pages_on_heap
#3 <-try_expand_heap_primary
#4 <-try_expand_heap
EAL: TSC frequency is ~2494156 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
[New Thread 0x7ffff5e0d700 (LWP 330350)]
EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
Could you have a check on this? A lot of test cases in our validation
team fail because of this. Thanks a lot!
Can we just call rte_memseg_walk_thread_unsafe()?
+Cc Anatoly
Hi, Thomas

I changed to rte_memseg_walk_thread_unsafe(), but it still
doesn't work.

EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: memseg iova 140000000, len 40000000, out of range
EAL: using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 1
EAL: Restoring previous memory policy: 0
EAL: memseg iova 1bc0000000, len 40000000, out of range
EAL: using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
PANIC in main():

BRs
Lei
Thomas Monjalon
2018-10-29 09:48:25 UTC
Permalink
Post by Yao, Lei A
Post by Thomas Monjalon
Post by Yao, Lei A
Hi, Lucero, Thomas
This patch set will cause deadlock during memory initialization.
rte_memseg_walk and try_expand_heap both will lock
the file &mcfg->memory_hotplug_lock. So dead lock will occur.
#0 rte_memseg_walk
#1 <-rte_eal_check_dma_mask
#2 <-alloc_pages_on_heap
#3 <-try_expand_heap_primary
#4 <-try_expand_heap
EAL: TSC frequency is ~2494156 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
[New Thread 0x7ffff5e0d700 (LWP 330350)]
EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
Could you have a check on this? A lot of test cases in our validation
team fail because of this. Thanks a lot!
Can we just call rte_memseg_walk_thread_unsafe()?
+Cc Anatoly
Hi, Thomas
I change to rte_memseg_walk_thread_unsafe(), still
Can't work.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: memseg iova 140000000, len 40000000, out of range
EAL: using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 1
EAL: Restoring previous memory policy: 0
EAL: memseg iova 1bc0000000, len 40000000, out of range
EAL: using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
I think it is showing there are at least 2 issues:
1/ deadlock
2/ allocation does not comply with mask check (out of range)
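
For reference, the "out of range" message comes down to a simple bit test: a
segment fails when its IOVA range has bits set above the DMA mask width. A
small standalone illustration (plain C, not DPDK code, using the values from
the log above) also shows why a printed mask of ffffffffffffffff rejects
every segment:

#include <stdint.h>
#include <stdio.h>

/* same kind of arithmetic as the patch's check, applied to the segment
 * reported above (iova 0x140000000, len 0x40000000) */
static int iova_fits(uint64_t iova, uint64_t len, uint8_t maskbits)
{
        uint64_t mask = ~((1ULL << maskbits) - 1);

        /* the segment's highest address must have no bits above maskbits */
        return ((iova + len - 1) & mask) == 0;
}

int main(void)
{
        /* with a 40-bit mask (the NFP case discussed in this thread) the
         * reported segment is fine */
        printf("40 bits: %s\n",
               iova_fits(0x140000000ULL, 0x40000000ULL, 40) ?
               "within range" : "out of range");

        /* a mask of 0xffffffffffffffff is what maskbits == 0 produces in
         * the formula above, and then any non-zero IOVA is flagged as out
         * of range */
        printf("0 bits:  %s\n",
               iova_fits(0x140000000ULL, 0x40000000ULL, 0) ?
               "within range" : "out of range");
        return 0;
}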
Alejandro Lucero
2018-10-29 10:11:09 UTC
Permalink
I know what is going on.

In patchset version 3 I forgot to remove some old code. Anatoly spotted that
and I was going to send another version fixing it. Before sending the
new version I saw the report about a problem with dma_mask, and I'm afraid
I did not send another version with the fix ...

Yao, can you try with the following patch?

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index ef656bbad..26adf46c0 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -458,10 +458,6 @@ rte_eal_check_dma_mask(uint8_t maskbits)
                 return -1;
         }
 
-        /* keep the more restricted maskbit */
-        if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
-                mcfg->dma_maskbits = maskbits;
-
         /* create dma mask */
         mask = ~((1ULL << maskbits) - 1);
Post by Thomas Monjalon
Post by Yao, Lei A
Post by Thomas Monjalon
Post by Yao, Lei A
Hi, Lucero, Thomas
This patch set will cause deadlock during memory initialization.
rte_memseg_walk and try_expand_heap both will lock
the file &mcfg->memory_hotplug_lock. So dead lock will occur.
#0 rte_memseg_walk
#1 <-rte_eal_check_dma_mask
#2 <-alloc_pages_on_heap
#3 <-try_expand_heap_primary
#4 <-try_expand_heap
EAL: TSC frequency is ~2494156 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
[New Thread 0x7ffff5e0d700 (LWP 330350)]
EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
Could you have a check on this? A lot of test cases in our validation
team fail because of this. Thanks a lot!
Can we just call rte_memseg_walk_thread_unsafe()?
+Cc Anatoly
Hi, Thomas
I change to rte_memseg_walk_thread_unsafe(), still
Can't work.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: memseg iova 140000000, len 40000000, out of range
EAL: using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 1
EAL: Restoring previous memory policy: 0
EAL: memseg iova 1bc0000000, len 40000000, out of range
EAL: using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
1/ deadlock
2/ allocation does not comply with mask check (out of range)
Alejandro Lucero
2018-10-29 10:15:58 UTC
Permalink
Apologies. Forget my previous email. Just using the wrong repo.

Looking at solving this asap.

On Mon, Oct 29, 2018 at 10:11 AM Alejandro Lucero <
Post by Alejandro Lucero
I know what is going on.
In patchset version 3 I forgot to remove an old code. Anatoly spotted that
and I was going to send another version for fixing it. Before sending the
new version I saw that report about a problem with dma_mask and I'm afraid
I did not send another version with the fix ...
*diff --git a/lib/librte_eal/common/eal_common_memory.c
b/lib/librte_eal/common/eal_common_memory.c*
*index ef656bbad..26adf46c0 100644*
*--- a/lib/librte_eal/common/eal_common_memory.c*
*+++ b/lib/librte_eal/common/eal_common_memory.c*
@@ -458,10 +458,6 @@ rte_eal_check_dma_mask(uint8_t maskbits)
return -1;
}
- /* keep the more restricted maskbit */
- if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
- mcfg->dma_maskbits = maskbits;
-
/* create dma mask */
mask = ~((1ULL << maskbits) - 1);
Post by Yao, Lei A
Post by Yao, Lei A
Post by Thomas Monjalon
Post by Yao, Lei A
Hi, Lucero, Thomas
This patch set will cause deadlock during memory initialization.
rte_memseg_walk and try_expand_heap both will lock
the file &mcfg->memory_hotplug_lock. So dead lock will occur.
#0 rte_memseg_walk
#1 <-rte_eal_check_dma_mask
#2 <-alloc_pages_on_heap
#3 <-try_expand_heap_primary
#4 <-try_expand_heap
EAL: TSC frequency is ~2494156 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
[New Thread 0x7ffff5e0d700 (LWP 330350)]
EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
Could you have a check on this? A lot of test cases in our
validation
Post by Yao, Lei A
Post by Thomas Monjalon
Post by Yao, Lei A
team fail because of this. Thanks a lot!
Can we just call rte_memseg_walk_thread_unsafe()?
+Cc Anatoly
Hi, Thomas
I change to rte_memseg_walk_thread_unsafe(), still
Can't work.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: memseg iova 140000000, len 40000000, out of range
EAL: using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 1
EAL: Restoring previous memory policy: 0
EAL: memseg iova 1bc0000000, len 40000000, out of range
EAL: using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
1/ deadlock
2/ allocation does not comply with mask check (out of range)
Alejandro Lucero
2018-10-29 11:39:14 UTC
Permalink
I got a patch that solves a bug where rte_eal_check_dma_mask was being called
with the mask instead of the maskbits. However, this does not solve the
deadlock.

Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when it is called inside
rte_eal_check_dma_mask, but if you modify the call like this:

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 12dcedf5c..69b26e464 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -462,7 +462,7 @@ rte_eal_check_dma_mask(uint8_t maskbits)
         /* create dma mask */
         mask = ~((1ULL << maskbits) - 1);
 
-        if (rte_memseg_walk(check_iova, &mask))
+        if (!rte_memseg_walk(check_iova, &mask))
                 /*
                  * Dma mask precludes hugepage usage.
                  * This device can not be used and we do not need to keep

it works, although the value returned to the invoker changes, of course.
But the point here is that calling rte_memseg_walk should behave the same
as before, and it does not.


Anatoly, maybe you can see something I can not.



On Mon, Oct 29, 2018 at 10:15 AM Alejandro Lucero <
Post by Alejandro Lucero
Apologies. Forget my previous email. Just using the wrong repo.
Looking at solving this asap.
On Mon, Oct 29, 2018 at 10:11 AM Alejandro Lucero <
Post by Alejandro Lucero
I know what is going on.
In patchset version 3 I forgot to remove an old code. Anatoly spotted
that and I was going to send another version for fixing it. Before sending
the new version I saw that report about a problem with dma_mask and I'm
afraid I did not send another version with the fix ...
*diff --git a/lib/librte_eal/common/eal_common_memory.c
b/lib/librte_eal/common/eal_common_memory.c*
*index ef656bbad..26adf46c0 100644*
*--- a/lib/librte_eal/common/eal_common_memory.c*
*+++ b/lib/librte_eal/common/eal_common_memory.c*
@@ -458,10 +458,6 @@ rte_eal_check_dma_mask(uint8_t maskbits)
return -1;
}
- /* keep the more restricted maskbit */
- if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
- mcfg->dma_maskbits = maskbits;
-
/* create dma mask */
mask = ~((1ULL << maskbits) - 1);
Post by Yao, Lei A
Post by Yao, Lei A
Post by Thomas Monjalon
Post by Yao, Lei A
Hi, Lucero, Thomas
This patch set will cause deadlock during memory initialization.
rte_memseg_walk and try_expand_heap both will lock
the file &mcfg->memory_hotplug_lock. So dead lock will occur.
#0 rte_memseg_walk
#1 <-rte_eal_check_dma_mask
#2 <-alloc_pages_on_heap
#3 <-try_expand_heap_primary
#4 <-try_expand_heap
EAL: TSC frequency is ~2494156 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
[New Thread 0x7ffff5e0d700 (LWP 330350)]
EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
Could you have a check on this? A lot of test cases in our
validation
Post by Yao, Lei A
Post by Thomas Monjalon
Post by Yao, Lei A
team fail because of this. Thanks a lot!
Can we just call rte_memseg_walk_thread_unsafe()?
+Cc Anatoly
Hi, Thomas
I change to rte_memseg_walk_thread_unsafe(), still
Can't work.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: memseg iova 140000000, len 40000000, out of range
EAL: using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 1
EAL: Restoring previous memory policy: 0
EAL: memseg iova 1bc0000000, len 40000000, out of range
EAL: using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
1/ deadlock
2/ allocation does not comply with mask check (out of range)
Thomas Monjalon
2018-10-29 11:46:45 UTC
Permalink
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the deadlock.
The deadlock is a bigger concern I think.
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but if
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires saving the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.

PS: please do not top post and avoid HTML emails, thanks
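
As a concrete illustration of that style point (a sketch only, not the
committed change, assuming the 18.11-era rte_memseg_walk_t callback
signature): keep the walk's return value in a variable and test it
explicitly, since the memseg walk functions return 0 when the whole list was
walked, a positive value when a callback stopped the walk, and a negative
value on error.

#include <stdint.h>
#include <rte_common.h>
#include <rte_eal_memconfig.h>
#include <rte_memory.h>

/* trivial callback for the sketch, same idea as the patch's check_iova():
 * stop at the first segment whose IOVA range exceeds the mask */
static int
seg_out_of_range(const struct rte_memseg_list *msl __rte_unused,
                 const struct rte_memseg *ms, void *arg)
{
        const uint64_t *mask = arg;

        return ((ms->iova + ms->len - 1) & *mask) ? 1 : 0;
}

static int
dma_mask_check_callsite(uint64_t mask)
{
        int ret;

        ret = rte_memseg_walk_thread_unsafe(seg_out_of_range, &mask);
        if (ret != 0)
                return -1;      /* a segment does not fit the DMA mask */

        return 0;
}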
Alejandro Lucero
2018-10-29 12:55:34 UTC
Permalink
Post by Thomas Monjalon
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits instead
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?
Post by Thomas Monjalon
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but
if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
Yao, Lei A
2018-10-29 13:18:06 UTC
Permalink
From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <***@monjalon.net>
Cc: Yao, Lei A <***@intel.com>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Lin, Xueqin <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the deadlock.
The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?

Hi, Lucero

This patch can fix the issue on my side. Thanks a lot
for your quick action.

BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but if
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.

PS: please do not top post and avoid HTML emails,
Alejandro Lucero
2018-10-29 13:40:42 UTC
Permalink
*Sent:* Monday, October 29, 2018 8:56 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits instead
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!

I will send an official patch with the changes.

I have to say that I tested the patchset, but I think that was when
legacy_mem was still in place, and therefore the dynamic memory allocation
code was not used during memory initialization.

There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem in some situations,
although those situations are unlikely.

Usually, calling rte_eal_check_dma_mask happens during initialization, and
then it is safe to use the unsafe function for walking memsegs. But with
device hotplug and dynamic memory allocation there is a potential race
condition when the primary process is allocating more memory while,
concurrently, a device is hotplugged and a secondary process does the device
initialization. For now this is just a problem with the NFP, and the window
for that race condition is really small, but I will work on this asap.
BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
Thomas Monjalon
2018-10-29 14:18:21 UTC
Permalink
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits instead
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
Please, do not forget my other request to better comment functions.
Post by Alejandro Lucero
I have to say that I tested the patchset, but I think it was where
legacy_mem was still there and therefore dynamic memory allocation code not
used during memory initialization.
There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem under some situations
although those situations being unlikely.
Usually, calling rte_eal_check_dma_mask happens during initialization. Then
it is safe to use the unsafe function for walking memsegs, but with device
hotplug and dynamic memory allocation, there exists a potential race
condition when the primary process is allocating more memory and
concurrently a device is hotplugged and a secondary process does the device
initialization. By now, this is just a problem with the NFP, and the
potential race condition window really unlikely, but I will work on this
asap.
Yes, this is what concerns me.
You can add a comment explaining the unsafe case which is not handled.
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
Alejandro Lucero
2018-10-29 14:35:04 UTC
Permalink
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using
the
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits
instead
Post by Alejandro Lucero
Post by Alejandro Lucero
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
Please, do not forget my other request to better comment functions.
Sure.
Post by Thomas Monjalon
Post by Alejandro Lucero
I have to say that I tested the patchset, but I think it was where
legacy_mem was still there and therefore dynamic memory allocation code
not
Post by Alejandro Lucero
used during memory initialization.
There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem under some situations
although those situations being unlikely.
Usually, calling rte_eal_check_dma_mask happens during initialization.
Then
Post by Alejandro Lucero
it is safe to use the unsafe function for walking memsegs, but with
device
Post by Alejandro Lucero
hotplug and dynamic memory allocation, there exists a potential race
condition when the primary process is allocating more memory and
concurrently a device is hotplugged and a secondary process does the
device
Post by Alejandro Lucero
initialization. By now, this is just a problem with the NFP, and the
potential race condition window really unlikely, but I will work on this
asap.
Yes, this is what concerns me.
You can add a comment explaining the unsafe which is not handled.
I'll do that.

Thanks!
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of
course.
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a
variable,
Post by Alejandro Lucero
Post by Alejandro Lucero
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
Yongseok Koh
2018-10-29 18:54:34 UTC
Permalink
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits instead
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
Please, do not forget my other request to better comment functions.
Alejandro,

This patchset has been merged to stable/17.11 per your request for the last release.
You must send a fix to stable/17.11 as well, if you think the same issue is there.

Thanks,
Yongseok
Post by Thomas Monjalon
Post by Alejandro Lucero
I have to say that I tested the patchset, but I think it was where
legacy_mem was still there and therefore dynamic memory allocation code not
used during memory initialization.
There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem under some situations
although those situations being unlikely.
Usually, calling rte_eal_check_dma_mask happens during initialization. Then
it is safe to use the unsafe function for walking memsegs, but with device
hotplug and dynamic memory allocation, there exists a potential race
condition when the primary process is allocating more memory and
concurrently a device is hotplugged and a secondary process does the device
initialization. By now, this is just a problem with the NFP, and the
potential race condition window really unlikely, but I will work on this
asap.
Yes, this is what concerns me.
You can add a comment explaining the unsafe which is not handled.
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
Alejandro Lucero
2018-10-29 19:37:56 UTC
Permalink
Post by Thomas Monjalon
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using
the
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits
instead
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
Please, do not forget my other request to better comment functions.
Alejandro,
This patchset has been merged to stable/17.11 per your request for the last release.
You must send a fix to stable/17.11 as well, if you think there's a same issue there.
The patchset for 17.11 was much simpler. There have been a lot of
changes to the memory code since 17.11, and this problem should not be
present in stable 17.11.

Having said that, if there are any reports about a problem with this
patchset in 17.11, I will work on it as a priority.

Thanks.
Post by Thomas Monjalon
Thanks,
Yongseok
Post by Thomas Monjalon
Post by Alejandro Lucero
I have to say that I tested the patchset, but I think it was where
legacy_mem was still there and therefore dynamic memory allocation code
not
Post by Thomas Monjalon
Post by Alejandro Lucero
used during memory initialization.
There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem under some situations
although those situations being unlikely.
Usually, calling rte_eal_check_dma_mask happens during initialization.
Then
Post by Thomas Monjalon
Post by Alejandro Lucero
it is safe to use the unsafe function for walking memsegs, but with
device
Post by Thomas Monjalon
Post by Alejandro Lucero
hotplug and dynamic memory allocation, there exists a potential race
condition when the primary process is allocating more memory and
concurrently a device is hotplugged and a secondary process does the
device
Post by Thomas Monjalon
Post by Alejandro Lucero
initialization. By now, this is just a problem with the NFP, and the
potential race condition window really unlikely, but I will work on this
asap.
Yes, this is what concerns me.
You can add a comment explaining the unsafe which is not handled.
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of
course.
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a
variable,
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
Burakov, Anatoly
2018-10-30 10:10:11 UTC
Permalink
Post by Alejandro Lucero
Post by Thomas Monjalon
Post by Alejandro Lucero
On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask
using the
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the
maskbits instead
Post by Thomas Monjalon
Post by Alejandro Lucero
of the mask, calling rte_memseg_walk_thread_unsafe avoids the
deadlock.
Post by Thomas Monjalon
Post by Alejandro Lucero
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
Please, do not forget my other request to better comment functions.
Alejandro,
This patchset has been merged to stable/17.11 per your request for
the last release.
You must send a fix to stable/17.11 as well, if you think there's a
same issue there.
The patchset for 17.11 was much more simpler. There have been a lot of
changes to the memory code since 17.11, and this problem should not be
present in stable 17.11.
Once I have said that, if there are any reports about a problem with
this patchset in 17.11, I will work on it as a priority.
Thanks.
17.11 will definitely be immune to the deadlock issue since there are no
memseg walks there :) It however may still have incorrect mask handling.
--
Thanks,
Anatoly
Burakov, Anatoly
2018-10-30 10:11:55 UTC
Permalink
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits instead
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
Please, do not forget my other request to better comment functions.
Post by Alejandro Lucero
I have to say that I tested the patchset, but I think it was where
legacy_mem was still there and therefore dynamic memory allocation code not
used during memory initialization.
There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem under some situations
although those situations being unlikely.
Usually, calling rte_eal_check_dma_mask happens during initialization. Then
it is safe to use the unsafe function for walking memsegs, but with device
hotplug and dynamic memory allocation, there exists a potential race
condition when the primary process is allocating more memory and
concurrently a device is hotplugged and a secondary process does the device
initialization. By now, this is just a problem with the NFP, and the
potential race condition window really unlikely, but I will work on this
asap.
Yes, this is what concerns me.
You can add a comment explaining the unsafe which is not handled.
The issue here is that this code is called from both memory-locked and
memory-unlocked contexts. Virtio had a similar issue with their mem table
update code - they solved it by manually locking the memory before doing
everything else, and using the thread_unsafe version of the walk.

Could something like that be done here?
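
A rough sketch of that pattern, under the assumptions that the hotplug lock
is reachable through the shared mem config (as in this DPDK generation) and
that the callback follows the patch's check_iova logic; callers that already
hold the lock, such as the heap expansion path, would use the unsafe variant
directly:

#include <stdint.h>
#include <rte_common.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
#include <rte_memory.h>
#include <rte_rwlock.h>

/* simplified version of the patch's walk callback */
static int
check_iova_sketch(const struct rte_memseg_list *msl __rte_unused,
                  const struct rte_memseg *ms, void *arg)
{
        const uint64_t *mask = arg;

        return ((ms->iova + ms->len - 1) & *mask) ? 1 : 0;
}

/* caller already holds memory_hotplug_lock (e.g. the allocation path) */
static int
dma_mask_check_unsafe(uint64_t mask)
{
        return rte_memseg_walk_thread_unsafe(check_iova_sketch, &mask) != 0 ?
               -1 : 0;
}

/* public entry point: take the lock once, then use the unsafe walk, so the
 * lock is never taken twice in the same thread */
static int
dma_mask_check(uint64_t mask)
{
        struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
        int ret;

        rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
        ret = dma_mask_check_unsafe(mask);
        rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);

        return ret;
}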
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
--
Thanks,
Anatoly
Alejandro Lucero
2018-10-30 10:19:35 UTC
Permalink
Post by Thomas Monjalon
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using
the
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits
instead
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
Please, do not forget my other request to better comment functions.
Post by Alejandro Lucero
I have to say that I tested the patchset, but I think it was where
legacy_mem was still there and therefore dynamic memory allocation code
not
Post by Thomas Monjalon
Post by Alejandro Lucero
used during memory initialization.
There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem under some situations
although those situations being unlikely.
Usually, calling rte_eal_check_dma_mask happens during initialization.
Then
Post by Thomas Monjalon
Post by Alejandro Lucero
it is safe to use the unsafe function for walking memsegs, but with
device
Post by Thomas Monjalon
Post by Alejandro Lucero
hotplug and dynamic memory allocation, there exists a potential race
condition when the primary process is allocating more memory and
concurrently a device is hotplugged and a secondary process does the
device
Post by Thomas Monjalon
Post by Alejandro Lucero
initialization. By now, this is just a problem with the NFP, and the
potential race condition window really unlikely, but I will work on this
asap.
Yes, this is what concerns me.
You can add a comment explaining the unsafe which is not handled.
The issue here is that this code is called from both memory-locked and
memory-unlocked context. Virtio had a similar issue with their mem table
update code - they solved it by manually locking the memory before doing
everything else, and using thread_unsafe version of the walk.
Could something like that be done here?
I have a patch adding safe and unsafe versions of the dma mask check.
However, because of the multiprocess problem reported, I think the fix
requires a different kind of work.

The problem I see now is that calling rte_eal_check_dma_mask from the
set_iova_mode code path is wrong. It cannot be done at that point because
the memory has not been initialized yet.
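
A sketch of the kind of ordering guard that implies; the eal_memory_ready
flag below is hypothetical (the real EAL would need to expose an equivalent
signal), and the only point is that the check must not touch the memseg
arrays before they have been set up, or attached to in a secondary process:

#include <stdbool.h>
#include <stdint.h>

/* hypothetical flag, set once memory initialization/attach has completed;
 * it does not exist under this name in DPDK */
static bool eal_memory_ready;

/* wrap the real check so callers running before memory init, such as the
 * set_iova_mode decision path mentioned above, do not walk memsegs that are
 * not there yet */
static int
check_dma_mask_when_ready(uint8_t maskbits, int (*real_check)(uint8_t))
{
        if (!eal_memory_ready)
                return 0;       /* nothing to validate yet: defer the check */

        return real_check(maskbits);
}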
Post by Thomas Monjalon
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of
course.
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
Post by Alejandro Lucero
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a
variable,
Post by Thomas Monjalon
Post by Alejandro Lucero
Post by Alejandro Lucero
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
--
Thanks,
Anatoly
Lin, Xueqin
2018-10-30 03:20:18 UTC
Permalink
Hi Lucero&Thomas,

We find the patch can't fix the multi-process cases.
Steps:

1. Setup primary process successfully

./hotplug_mp --proc-type=auto

2. Fail to setup secondary process

./hotplug_mp --proc-type=auto
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
Segmentation fault (core dumped)

More information as below:

Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
264         for (idx = first; idx < msk->n_masks; idx++) {
#0  0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
#1  0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
    used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
#2  0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
#3  0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,
    arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
#4  0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
    at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
#5  0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
    at /root/dpdk/drivers/bus/pci/linux/pci.c:593
#6  0x00000000005b9738 in pci_devices_iommu_support_va ()
    at /root/dpdk/drivers/bus/pci/linux/pci.c:626
#7  0x00000000005b97a7 in rte_pci_get_iommu_class ()
    at /root/dpdk/drivers/bus/pci/linux/pci.c:650
#8  0x000000000058f1ce in rte_bus_get_iommu_class ()
    at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
#9  0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
    at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
    at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28


Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Monday, October 29, 2018 9:41 PM
To: Yao, Lei A <***@intel.com>
Cc: Thomas Monjalon <***@monjalon.net>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Lin, Xueqin <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <***@intel.com> wrote:


From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <***@monjalon.net>
Cc: Yao, Lei A <***@intel.com>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Lin, Xueqin <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the deadlock.
The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?

Hi, Lucero

This patch can fix the issue at my side. Thanks a lot
for you quick action.


Great!

I will send an official patch with the changes.

I have to say that I tested the patchset, but I think it was where legacy_mem was still there and therefore dynamic memory allocation code not used during memory initialization.

There is something that concerns me though. Using rte_memseg_walk_thread_unsafe could be a problem under some situations although those situations being unlikely.

Usually, calling rte_eal_check_dma_mask happens during initialization. Then it is safe to use the unsafe function for walking memsegs, but with device hotplug and dynamic memory allocation, there exists a potential race condition when the primary process is allocating more memory and concurrently a device is hotplugged and a secondary process does the device initialization. By now, this is just a problem with the NFP, and the potential race condition window really unlikely, but I will work on this asap.

BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but if
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.

PS: please do not top post an
Alejandro Lucero
2018-10-30 09:41:05 UTC
Permalink
Post by Lin, Xueqin
Hi Lucero&Thomas,
Find the patch can’t fix multi-process cases.
Hi,

I think it is not specifically about multiprocess but about hotplug with
multiprocess, because I can execute symmetric_mp successfully with a
secondary process.

Working on this as a priority.

Thanks.
Post by Lin, Xueqin
1. Setup primary process successfully
./hotplug_mp --proc-type=auto
2. Fail to setup secondary process
./hotplug_mp --proc-type=auto
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
Segmentation fault (core dumped)
Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
264 for (idx = first; idx < msk->n_masks; idx++) {
#0 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
#1 0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
used=true) at
/root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
#2 0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
#3 0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,
arg=0x7fffffffcc38) at
/root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
#4 0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
#5 0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
at /root/dpdk/drivers/bus/pci/linux/pci.c:593
#6 0x00000000005b9738 in pci_devices_iommu_support_va ()
at /root/dpdk/drivers/bus/pci/linux/pci.c:626
#7 0x00000000005b97a7 in rte_pci_get_iommu_class ()
at /root/dpdk/drivers/bus/pci/linux/pci.c:650
#8 0x000000000058f1ce in rte_bus_get_iommu_class ()
at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
#9 0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28
Best regards,
Xueqin
*Sent:* Monday, October 29, 2018 9:41 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
*Sent:* Monday, October 29, 2018 8:56 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits instead
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
I have to say that I tested the patchset, but I think it was where
legacy_mem was still there and therefore dynamic memory allocation code not
used during memory initialization.
There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem under some situations
although those situations being unlikely.
Usually, calling rte_eal_check_dma_mask happens during initialization.
Then it is safe to use the unsafe function for walking memsegs, but with
device hotplug and dynamic memory allocation, there exists a potential race
condition when the primary process is allocating more memory and
concurrently a device is hotplugged and a secondary process does the device
initialization. By now, this is just a problem with the NFP, and the
potential race condition window really unlikely, but I will work on this
asap.
BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
Lin, Xueqin
2018-10-30 10:33:59 UTC
Permalink
Hi Lucero,

No, we have reproduced the multi-process issues (including symmetric_mp, simple_mp, hotplug_mp, the multi-process unit tests…) on most of our servers.
It is also strange that 1~2 servers don't have the issue.

Bind two NNT ports or FVL ports

./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1

EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]

Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0 0x00000000005566b5 in rte_fbarray_find_next_used ()
#1 0x000000000054da9c in rte_eal_check_dma_mask ()
#2 0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3 0x0000000000573988 in rte_pci_get_iommu_class ()
#4 0x000000000054f743 in rte_bus_get_iommu_class ()
#5 0x000000000053c123 in rte_eal_init ()
#6 0x000000000046be2b in main ()

Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Tuesday, October 30, 2018 5:41 PM
To: Lin, Xueqin <***@intel.com>
Cc: Yao, Lei A <***@intel.com>; Thomas Monjalon <***@monjalon.net>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>; Zhang, Qi Z <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin <***@intel.com> wrote:
Hi Lucero&Thomas,

Find the patch can’t fix multi-process cases.

Hi,

I think it is not specifically about multiprocess but about hotplug with multiprocess because I can execute the symmetric_mp successfully with a secondary process.

Working on this as a priority.

Thanks.

Steps:

1. Setup primary process successfully

./hotplug_mp --proc-type=auto



2. Fail to setup secondary process

./hotplug_mp --proc-type=auto

EAL: Detected 88 lcore(s)

EAL: Detected 2 NUMA nodes

EAL: Auto-detected process type: SECONDARY

EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23

Segmentation fault (core dumped)


More information as below:

Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.

0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

264 for (idx = first; idx < msk->n_masks; idx++) {

#0 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

#1 0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,

used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001

#2 0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018

#3 0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,

arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589

#4 0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')

at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465

#5 0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)

at /root/dpdk/drivers/bus/pci/linux/pci.c:593

#6 0x00000000005b9738 in pci_devices_iommu_support_va ()

at /root/dpdk/drivers/bus/pci/linux/pci.c:626

#7 0x00000000005b97a7 in rte_pci_get_iommu_class ()

at /root/dpdk/drivers/bus/pci/linux/pci.c:650

#8 0x000000000058f1ce in rte_bus_get_iommu_class ()

at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237

#9 0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)

at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919

#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)

at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28


Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Monday, October 29, 2018 9:41 PM
To: Yao, Lei A <***@intel.com>
Cc: Thomas Monjalon <***@monjalon.net>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Lin, Xueqin <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <***@intel.com> wrote:


From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <***@monjalon.net>
Cc: Yao, Lei A <***@intel.com>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Lin, Xueqin <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the deadlock.
The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?

Hi, Lucero

This patch can fix the issue at my side. Thanks a lot
for you quick action.


Great!

I will send an official patch with the changes.

I have to say that I tested the patchset, but I think it was where legacy_mem was still there and therefore dynamic memory allocation code not used during memory initialization.

There is something that concerns me though. Using rte_memseg_walk_thread_unsafe could be a problem under some situations although those situations being unlikely.

Usually, calling rte_eal_check_dma_mask happens during initialization. Then it is safe to use the unsafe function for walking memsegs, but with device hotplug and dynamic memory allocation, there exists a potential race condition when the primary process is allocating more memory and concurrently a device is hotplugged and a secondary process does the device initialization. By now, this is just a problem with the NFP, and the potential race condition window really unlikely, but I will work on this asap.

BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but if
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.

PS: please do not top post and avoid HTML emails, thanks
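To make the locking discussion quoted above concrete, here is a hedged sketch of the direction being discussed: let the caller choose between the locked and the thread-unsafe memseg walk, depending on whether the memory hotplug lock is already held (as it is when the check runs from the dynamic allocation path). It reuses the check_iova callback sketched earlier in this thread; the function name and the bool parameter are illustrative, not the final API.

#include <stdbool.h>
#include <stdint.h>
#include <rte_memory.h>

/* Illustrative only: callers that already hold the memory hotplug lock
 * (e.g. the malloc/memseg allocation path) pass thread_unsafe = true to
 * avoid the deadlock; callers at device initialization time pass false
 * and get the properly locked walk, avoiding the hotplug race. */
static int
check_dma_mask_sketch(uint8_t maskbits, bool thread_unsafe)
{
	uint64_t mask = ~((1ULL << maskbits) - 1);
	int ret;

	if (thread_unsafe)
		ret = rte_memseg_walk_thread_unsafe(check_iova, &mask);
	else
		ret = rte_memseg_walk(check_iova, &mask);

	return ret != 0 ? -1 : 0;
}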
Alejandro Lucero
2018-10-30 10:38:23 UTC
Permalink
Post by Lin, Xueqin
Hi Lucero,
No, we have reproduced multi-process issues(include symmetric_mp,
simple_mp, hotplug_mp, multi-process unit test… )on most of our servers.
It is also strange that 1~2 servers don’t have the issue.
Yes, you are right. I could execute it, but that was due to how this problem
triggers.
I think I can fix this and, at the same time, properly solve the initial
issue without any limitation like the potential race condition I mentioned.
I can give you a patch to try in a couple of hours.

Thanks
Post by Lin, Xueqin
Bind two NNT ports or FVL ports
./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]
Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0 0x00000000005566b5 in rte_fbarray_find_next_used ()
#1 0x000000000054da9c in rte_eal_check_dma_mask ()
#2 0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3 0x0000000000573988 in rte_pci_get_iommu_class ()
#4 0x000000000054f743 in rte_bus_get_iommu_class ()
#5 0x000000000053c123 in rte_eal_init ()
#6 0x000000000046be2b in main ()
Best regards,
Xueqin
*Sent:* Tuesday, October 30, 2018 5:41 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Hi Lucero&Thomas,
Find the patch can’t fix multi-process cases.
Hi,
I think it is not specifically about multiprocess but about hotplug with
multiprocess because I can execute the symmetric_mp successfully with a
secondary process.
Working on this as a priority.
Thanks.
1. Setup primary process successfully
./hotplug_mp --proc-type=auto
2. Fail to setup secondary process
./hotplug_mp --proc-type=auto
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
Segmentation fault (core dumped)
Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
264 for (idx = first; idx < msk->n_masks; idx++) {
#0 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
#1 0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
used=true) at
/root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
#2 0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
#3 0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,
arg=0x7fffffffcc38) at
/root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
#4 0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
#5 0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
at /root/dpdk/drivers/bus/pci/linux/pci.c:593
#6 0x00000000005b9738 in pci_devices_iommu_support_va ()
at /root/dpdk/drivers/bus/pci/linux/pci.c:626
#7 0x00000000005b97a7 in rte_pci_get_iommu_class ()
at /root/dpdk/drivers/bus/pci/linux/pci.c:650
#8 0x000000000058f1ce in rte_bus_get_iommu_class ()
at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
#9 0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28
Best regards,
Xueqin
*Sent:* Monday, October 29, 2018 9:41 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
*Sent:* Monday, October 29, 2018 8:56 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits instead
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
I have to say that I tested the patchset, but I think it was where
legacy_mem was still there and therefore dynamic memory allocation code not
used during memory initialization.
There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem under some situations
although those situations being unlikely.
Usually, calling rte_eal_check_dma_mask happens during initialization.
Then it is safe to use the unsafe function for walking memsegs, but with
device hotplug and dynamic memory allocation, there exists a potential race
condition when the primary process is allocating more memory and
concurrently a device is hotplugged and a secondary process does the device
initialization. By now, this is just a problem with the NFP, and the
potential race condition window really unlikely, but I will work on this
asap.
BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
Lin, Xueqin
2018-10-30 12:21:59 UTC
Permalink
Something we found on some of our servers:
If we do not add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg and reboot to make it effective:
18.11 rc1: testpmd and the secondary process set up successfully.

If we do add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg and reboot to make it effective:
18.11 rc1: testpmd and the secondary process fail to set up.
18.11 rc1 + dma_mask_fix patch: testpmd sets up successfully, but the secondary process still fails.

Maybe whether ”intel_iommu=on iommu=pt” is enabled or not explains our test gap.
Most of our team's servers should enable the IOMMU for VT-d and vfio tests.

Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Tuesday, October 30, 2018 6:38 PM
To: Lin, Xueqin <***@intel.com>
Cc: Yao, Lei A <***@intel.com>; Thomas Monjalon <***@monjalon.net>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>; Zhang, Qi Z <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin <***@intel.com<mailto:***@intel.com>> wrote:
Hi Lucero,

No, we have reproduced multi-process issues(include symmetric_mp, simple_mp, hotplug_mp, multi-process unit test… )on most of our servers.
It is also strange that 1~2 servers don’t have the issue.


Yes, you are right. I could execute it but it was due to how this problem triggers.
I think I can fix this and at the same time solving properly the initial issue without any limitation like that potential race condition I mentioned.
I can give you a patch to try in a couple of hours.

Thanks

Bind two NNT ports or FVL ports

./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1

EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]

Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0 0x00000000005566b5 in rte_fbarray_find_next_used ()
#1 0x000000000054da9c in rte_eal_check_dma_mask ()
#2 0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3 0x0000000000573988 in rte_pci_get_iommu_class ()
#4 0x000000000054f743 in rte_bus_get_iommu_class ()
#5 0x000000000053c123 in rte_eal_init ()
#6 0x000000000046be2b in main ()

Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com<mailto:***@netronome.com>]
Sent: Tuesday, October 30, 2018 5:41 PM
To: Lin, Xueqin <***@intel.com<mailto:***@intel.com>>
Cc: Yao, Lei A <***@intel.com<mailto:***@intel.com>>; Thomas Monjalon <***@monjalon.net<mailto:***@monjalon.net>>; dev <***@dpdk.org<mailto:***@dpdk.org>>; Xu, Qian Q <***@intel.com<mailto:***@intel.com>>; Burakov, Anatoly <***@intel.com<mailto:***@intel.com>>; Yigit, Ferruh <***@intel.com<mailto:***@intel.com>>; Zhang, Qi Z <***@intel.com<mailto:***@intel.com>>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin <***@intel.com<mailto:***@intel.com>> wrote:
Hi Lucero&Thomas,

Find the patch can’t fix multi-process cases.

Hi,

I think it is not specifically about multiprocess but about hotplug with multiprocess because I can execute the symmetric_mp successfully with a secondary process.

Working on this as a priority.

Thanks.

Steps:

1. Setup primary process successfully

./hotplug_mp --proc-type=auto



2. Fail to setup secondary process

./hotplug_mp --proc-type=auto

EAL: Detected 88 lcore(s)

EAL: Detected 2 NUMA nodes

EAL: Auto-detected process type: SECONDARY

EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23

Segmentation fault (core dumped)


More information as below:

Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.

0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

264 for (idx = first; idx < msk->n_masks; idx++) {

#0 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

#1 0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,

used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001

#2 0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018

#3 0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,

arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589

#4 0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')

at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465

#5 0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)

at /root/dpdk/drivers/bus/pci/linux/pci.c:593

#6 0x00000000005b9738 in pci_devices_iommu_support_va ()

at /root/dpdk/drivers/bus/pci/linux/pci.c:626

#7 0x00000000005b97a7 in rte_pci_get_iommu_class ()

at /root/dpdk/drivers/bus/pci/linux/pci.c:650

#8 0x000000000058f1ce in rte_bus_get_iommu_class ()

at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237

#9 0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)

at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919

#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)

at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28


Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com<mailto:***@netronome.com>]
Sent: Monday, October 29, 2018 9:41 PM
To: Yao, Lei A <***@intel.com<mailto:***@intel.com>>
Cc: Thomas Monjalon <***@monjalon.net<mailto:***@monjalon.net>>; dev <***@dpdk.org<mailto:***@dpdk.org>>; Xu, Qian Q <***@intel.com<mailto:***@intel.com>>; Lin, Xueqin <***@intel.com<mailto:***@intel.com>>; Burakov, Anatoly <***@intel.com<mailto:***@intel.com>>; Yigit, Ferruh <***@intel.com<mailto:***@intel.com>>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <***@intel.com<mailto:***@intel.com>> wrote:


From: Alejandro Lucero [mailto:***@netronome.com<mailto:***@netronome.com>]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <***@monjalon.net<mailto:***@monjalon.net>>
Cc: Yao, Lei A <***@intel.com<mailto:***@intel.com>>; dev <***@dpdk.org<mailto:***@dpdk.org>>; Xu, Qian Q <***@intel.com<mailto:***@intel.com>>; Lin, Xueqin <***@intel.com<mailto:***@intel.com>>; Burakov, Anatoly <***@intel.com<mailto:***@intel.com>>; Yigit, Ferruh <***@intel.com<mailto:***@intel.com>>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the deadlock.
The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?

Hi, Lucero

This patch can fix the issue at my side. Thanks a lot
for you quick action.


Great!

I will send an official patch with the changes.

I have to say that I tested the patchset, but I think it was where legacy_mem was still there and therefore dynamic memory allocation code not used during memory initialization.

There is something that concerns me though. Using rte_memseg_walk_thread_unsafe could be a problem under some situations although those situations being unlikely.

Usually, calling rte_eal_check_dma_mask happens during initialization. Then it is safe to use the unsafe function for walking memsegs, but with device hotplug and dynamic memory allocation, there exists a potential race condition when the primary process is allocating more memory and concurrently a device is hotplugged and a secondary process does the device initialization. By now, this is just a problem with the NFP, and the potential race condition window really unlikely, but I will work on this asap.

BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but if
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.

PS: please do not top post and avoid HTML emails, thanks
Alejandro Lucero
2018-10-30 12:37:11 UTC
Permalink
Post by Lin, Xueqin
If not add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then
reboot to make it effective.
18.11 rc1: Success to setup testpmd and secondary process.
If add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then
reboot to make it effective.
18.11 rc1: Fail to setup testpmd and secondary process.
18.11 rc1+ dma_mask_fix patch: success to setup testpmd, but fail to setup
secondary process.
Maybe ”intel_iommu=on iommu=pt” enable or not result in our test gap.
Most of our team servers should enable the IOMMU for VT-d and vfio test.
It makes sense, because the problem happens when the IOVA mode is set inside
drivers/bus/pci/linux/pci.c, and if there is no IOMMU there is no call to
rte_eal_check_dma_mask at all.
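A rough sketch of the control flow Alejandro is describing may help: the DMA mask check hangs off the VT-d detection in pci.c, so on machines booted without an IOMMU the check is never reached. The read_vtd_cap_from_sysfs helper below is a hypothetical stub added only to keep the sketch self-contained, and the MGAW extraction (bits 21:16 of the VT-d capability register, per the VT-d spec) is a reconstruction, not the exact pci.c code.

#include <stdbool.h>
#include <stdint.h>
#include <rte_common.h>
#include <rte_memory.h>
#include <rte_bus_pci.h>

/* Hypothetical stub: the real code reads the VT-d capability register
 * from the device's sysfs iommu directory; returning -1 here just keeps
 * the sketch compilable. Not a real DPDK function. */
static int
read_vtd_cap_from_sysfs(const struct rte_pci_device *dev __rte_unused,
		uint64_t *vtd_cap __rte_unused)
{
	return -1;
}

static bool
device_iommu_supports_va_sketch(const struct rte_pci_device *dev)
{
	uint64_t vtd_cap;
	uint8_t mgaw;

	/* Only reached when an (emulated) VT-d IOMMU is present. */
	if (read_vtd_cap_from_sysfs(dev, &vtd_cap) < 0)
		return false;

	/* MGAW: maximum guest address width supported by the IOMMU. */
	mgaw = ((vtd_cap >> 16) & 0x3fULL) + 1;

	/* The DMA mask check only runs on this IOMMU path, which is why
	 * the failure shows up only with intel_iommu=on. */
	if (rte_eal_check_dma_mask(mgaw) != 0)
		return false;

	return true;
}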
Post by Lin, Xueqin
Best regards,
Xueqin
*Sent:* Tuesday, October 30, 2018 6:38 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Hi Lucero,
No, we have reproduced multi-process issues(include symmetric_mp,
simple_mp, hotplug_mp, multi-process unit test… )on most of our servers.
It is also strange that 1~2 servers don’t have the issue.
Yes, you are right. I could execute it but it was due to how this problem triggers.
I think I can fix this and at the same time solving properly the initial
issue without any limitation like that potential race condition I
mentioned.
I can give you a patch to try in a couple of hours.
Thanks
Bind two NNT ports or FVL ports
./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]
Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0 0x00000000005566b5 in rte_fbarray_find_next_used ()
#1 0x000000000054da9c in rte_eal_check_dma_mask ()
#2 0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3 0x0000000000573988 in rte_pci_get_iommu_class ()
#4 0x000000000054f743 in rte_bus_get_iommu_class ()
#5 0x000000000053c123 in rte_eal_init ()
#6 0x000000000046be2b in main ()
Best regards,
Xueqin
*Sent:* Tuesday, October 30, 2018 5:41 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Hi Lucero&Thomas,
Find the patch can’t fix multi-process cases.
Hi,
I think it is not specifically about multiprocess but about hotplug with
multiprocess because I can execute the symmetric_mp successfully with a
secondary process.
Working on this as a priority.
Thanks.
1. Setup primary process successfully
./hotplug_mp --proc-type=auto
2. Fail to setup secondary process
./hotplug_mp --proc-type=auto
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
Segmentation fault (core dumped)
Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
264 for (idx = first; idx < msk->n_masks; idx++) {
#0 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
#1 0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
used=true) at
/root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
#2 0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
#3 0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,
arg=0x7fffffffcc38) at
/root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
#4 0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
#5 0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
at /root/dpdk/drivers/bus/pci/linux/pci.c:593
#6 0x00000000005b9738 in pci_devices_iommu_support_va ()
at /root/dpdk/drivers/bus/pci/linux/pci.c:626
#7 0x00000000005b97a7 in rte_pci_get_iommu_class ()
at /root/dpdk/drivers/bus/pci/linux/pci.c:650
#8 0x000000000058f1ce in rte_bus_get_iommu_class ()
at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
#9 0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28
Best regards,
Xueqin
*Sent:* Monday, October 29, 2018 9:41 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
*Sent:* Monday, October 29, 2018 8:56 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits instead
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
I have to say that I tested the patchset, but I think it was where
legacy_mem was still there and therefore dynamic memory allocation code not
used during memory initialization.
There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem under some situations
although those situations being unlikely.
Usually, calling rte_eal_check_dma_mask happens during initialization.
Then it is safe to use the unsafe function for walking memsegs, but with
device hotplug and dynamic memory allocation, there exists a potential race
condition when the primary process is allocating more memory and
concurrently a device is hotplugged and a secondary process does the device
initialization. By now, this is just a problem with the NFP, and the
potential race condition window really unlikely, but I will work on this
asap.
BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
Alejandro Lucero
2018-10-30 14:04:34 UTC
Permalink
On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero <
Post by Alejandro Lucero
Post by Lin, Xueqin
If not add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then
reboot to make it effective.
18.11 rc1: Success to setup testpmd and secondary process.
If add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then
reboot to make it effective.
18.11 rc1: Fail to setup testpmd and secondary process.
18.11 rc1+ dma_mask_fix patch: success to setup testpmd, but fail to
setup secondary process.
Maybe ”intel_iommu=on iommu=pt” enable or not result in our test gap.
Most of our team servers should enable the IOMMU for VT-d and vfio test.
It makes sense because the problem is when the IOVA mode is set inside
drivers/bus/pci/linux/pci.c and if there is not IOMMU, not call to
rte_eal_check_dma_mask at all.
Post by Lin, Xueqin
Best regards,
Xueqin
*Sent:* Tuesday, October 30, 2018 6:38 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Hi Lucero,
No, we have reproduced multi-process issues(include symmetric_mp,
simple_mp, hotplug_mp, multi-process unit test… )on most of our servers.
It is also strange that 1~2 servers don’t have the issue.
Yes, you are right. I could execute it but it was due to how this problem triggers.
I think I can fix this and at the same time solving properly the initial
issue without any limitation like that potential race condition I
mentioned.
I can give you a patch to try in a couple of hours.
Hi Lin,

Can you try the patch attached?

Thanks
Post by Alejandro Lucero
Thanks
Post by Lin, Xueqin
Bind two NNT ports or FVL ports
./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]
Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0 0x00000000005566b5 in rte_fbarray_find_next_used ()
#1 0x000000000054da9c in rte_eal_check_dma_mask ()
#2 0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3 0x0000000000573988 in rte_pci_get_iommu_class ()
#4 0x000000000054f743 in rte_bus_get_iommu_class ()
#5 0x000000000053c123 in rte_eal_init ()
#6 0x000000000046be2b in main ()
Best regards,
Xueqin
*Sent:* Tuesday, October 30, 2018 5:41 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Hi Lucero&Thomas,
Find the patch can’t fix multi-process cases.
Hi,
I think it is not specifically about multiprocess but about hotplug with
multiprocess because I can execute the symmetric_mp successfully with a
secondary process.
Working on this as a priority.
Thanks.
1. Setup primary process successfully
./hotplug_mp --proc-type=auto
2. Fail to setup secondary process
./hotplug_mp --proc-type=auto
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
Segmentation fault (core dumped)
Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
264 for (idx = first; idx < msk->n_masks; idx++) {
#0 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
#1 0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
used=true) at
/root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
#2 0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
#3 0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,
arg=0x7fffffffcc38) at
/root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
#4 0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
#5 0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
at /root/dpdk/drivers/bus/pci/linux/pci.c:593
#6 0x00000000005b9738 in pci_devices_iommu_support_va ()
at /root/dpdk/drivers/bus/pci/linux/pci.c:626
#7 0x00000000005b97a7 in rte_pci_get_iommu_class ()
at /root/dpdk/drivers/bus/pci/linux/pci.c:650
#8 0x000000000058f1ce in rte_bus_get_iommu_class ()
at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
#9 0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28
Best regards,
Xueqin
*Sent:* Monday, October 29, 2018 9:41 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
*Sent:* Monday, October 29, 2018 8:56 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits
instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the
deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
I have to say that I tested the patchset, but I think it was where
legacy_mem was still there and therefore dynamic memory allocation code not
used during memory initialization.
There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem under some situations
although those situations being unlikely.
Usually, calling rte_eal_check_dma_mask happens during initialization.
Then it is safe to use the unsafe function for walking memsegs, but with
device hotplug and dynamic memory allocation, there exists a potential race
condition when the primary process is allocating more memory and
concurrently a device is hotplugged and a secondary process does the device
initialization. By now, this is just a problem with the NFP, and the
potential race condition window really unlikely, but I will work on this
asap.
BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
Burakov, Anatoly
2018-10-30 14:14:51 UTC
Permalink
Post by Alejandro Lucero
On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero
Some found on some our servers:
If not add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then reboot to make it effective.
18.11 rc1: Success to setup testpmd and secondary process.

If add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then reboot to make it effective.
18.11 rc1: Fail to setup testpmd and secondary process.
18.11 rc1+ dma_mask_fix patch: success to setup testpmd, but fail to setup secondary process.

Maybe ”intel_iommu=on iommu=pt” enable or not result in our test gap.
Most of our team servers should enable the IOMMU for VT-d and vfio test.
It makes sense because the problem is when the IOVA mode is set inside drivers/bus/pci/linux/pci.c and if there is not IOMMU, not call to rte_eal_check_dma_mask at all.
Best regards,
Xueqin

*Sent:* Tuesday, October 30, 2018 6:38 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask

On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin
Hi Lucero,

No, we have reproduced multi-process issues(include symmetric_mp, simple_mp, hotplug_mp, multi-process unit test… )on most of our servers.
It is also strange that 1~2 servers don’t have the issue.

Yes, you are right. I could execute it but it was due to how this problem triggers.
I think I can fix this and at the same time solving properly the initial issue without any limitation like that potential race condition I mentioned.
I can give you a patch to try in a couple of hours.
Hi Lin,
Can you try the patch attached?
Thanks
Hi Alejandro,

Attachments are not supported on the mailing list :)
--
Thanks,
Anatoly
Alejandro Lucero
2018-10-30 14:45:04 UTC
Permalink
Post by Burakov, Anatoly
Post by Alejandro Lucero
On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero
Some found on some our servers:
If not add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then reboot to make it effective.
18.11 rc1: Success to setup testpmd and secondary process.

If add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then reboot to make it effective.
18.11 rc1: Fail to setup testpmd and secondary process.
18.11 rc1+ dma_mask_fix patch: success to setup testpmd, but fail to setup secondary process.

Maybe ”intel_iommu=on iommu=pt” enable or not result in our test gap.
Most of our team servers should enable the IOMMU for VT-d and vfio test.
It makes sense because the problem is when the IOVA mode is set inside drivers/bus/pci/linux/pci.c and if there is not IOMMU, not call to rte_eal_check_dma_mask at all.
Best regards,
Xueqin

*Sent:* Tuesday, October 30, 2018 6:38 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask

On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin
Hi Lucero,

No, we have reproduced multi-process issues(include symmetric_mp, simple_mp, hotplug_mp, multi-process unit test… )on most of our servers.
It is also strange that 1~2 servers don’t have the issue.

Yes, you are right. I could execute it but it was due to how this problem triggers.
I think I can fix this and at the same time solving properly the initial issue without any limitation like that potential race condition I mentioned.
I can give you a patch to try in a couple of hours.
Hi Lin,
Can you try the patch attached?
Thanks
Hi Alejandro,
Attachments are not supported on the mailing list :)
Apologies. I should have sent it just to Lin.
Post by Burakov, Anatoly
--
Thanks,
Anatoly
Lin, Xueqin
2018-10-30 14:45:20 UTC
Permalink
Hi Lucero,

The patch fixes both the testpmd and the multi-process setup issues in my environment.
Hope you can upload the fix patch to the community patchwork page.
Thanks a lot.

Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Tuesday, October 30, 2018 10:05 PM
To: Lin, Xueqin <***@intel.com>
Cc: Yao, Lei A <***@intel.com>; Thomas Monjalon <***@monjalon.net>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>; Zhang, Qi Z <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero <***@netronome.com<mailto:***@netronome.com>> wrote:

On Tue, Oct 30, 2018 at 12:22 PM Lin, Xueqin <***@intel.com<mailto:***@intel.com>> wrote:
Some found on some our servers:
If not add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then reboot to make it effective.
18.11 rc1: Success to setup testpmd and secondary process.

If add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then reboot to make it effective.
18.11 rc1: Fail to setup testpmd and secondary process.
18.11 rc1+ dma_mask_fix patch: success to setup testpmd, but fail to setup secondary process.

Maybe ”intel_iommu=on iommu=pt” enable or not result in our test gap.
Most of our team servers should enable the IOMMU for VT-d and vfio test.


It makes sense because the problem is when the IOVA mode is set inside drivers/bus/pci/linux/pci.c and if there is not IOMMU, not call to rte_eal_check_dma_mask at all.


Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com<mailto:***@netronome.com>]
Sent: Tuesday, October 30, 2018 6:38 PM
To: Lin, Xueqin <***@intel.com<mailto:***@intel.com>>
Cc: Yao, Lei A <***@intel.com<mailto:***@intel.com>>; Thomas Monjalon <***@monjalon.net<mailto:***@monjalon.net>>; dev <***@dpdk.org<mailto:***@dpdk.org>>; Xu, Qian Q <***@intel.com<mailto:***@intel.com>>; Burakov, Anatoly <***@intel.com<mailto:***@intel.com>>; Yigit, Ferruh <***@intel.com<mailto:***@intel.com>>; Zhang, Qi Z <***@intel.com<mailto:***@intel.com>>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin <***@intel.com<mailto:***@intel.com>> wrote:
Hi Lucero,

No, we have reproduced multi-process issues(include symmetric_mp, simple_mp, hotplug_mp, multi-process unit test… )on most of our servers.
It is also strange that 1~2 servers don’t have the issue.


Yes, you are right. I could execute it but it was due to how this problem triggers.
I think I can fix this and at the same time solving properly the initial issue without any limitation like that potential race condition I mentioned.
I can give you a patch to try in a couple of hours.


Hi Lin,

Can you try the patch attached?

Thanks

Thanks

Bind two NNT ports or FVL ports

./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1

EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]

Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0 0x00000000005566b5 in rte_fbarray_find_next_used ()
#1 0x000000000054da9c in rte_eal_check_dma_mask ()
#2 0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3 0x0000000000573988 in rte_pci_get_iommu_class ()
#4 0x000000000054f743 in rte_bus_get_iommu_class ()
#5 0x000000000053c123 in rte_eal_init ()
#6 0x000000000046be2b in main ()

Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com<mailto:***@netronome.com>]
Sent: Tuesday, October 30, 2018 5:41 PM
To: Lin, Xueqin <***@intel.com<mailto:***@intel.com>>
Cc: Yao, Lei A <***@intel.com<mailto:***@intel.com>>; Thomas Monjalon <***@monjalon.net<mailto:***@monjalon.net>>; dev <***@dpdk.org<mailto:***@dpdk.org>>; Xu, Qian Q <***@intel.com<mailto:***@intel.com>>; Burakov, Anatoly <***@intel.com<mailto:***@intel.com>>; Yigit, Ferruh <***@intel.com<mailto:***@intel.com>>; Zhang, Qi Z <***@intel.com<mailto:***@intel.com>>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin <***@intel.com<mailto:***@intel.com>> wrote:
Hi Lucero&Thomas,

Find the patch can’t fix multi-process cases.

Hi,

I think it is not specifically about multiprocess but about hotplug with multiprocess because I can execute the symmetric_mp successfully with a secondary process.

Working on this as a priority.

Thanks.

Steps:

1. Setup primary process successfully

./hotplug_mp --proc-type=auto



2. Fail to setup secondary process

./hotplug_mp --proc-type=auto

EAL: Detected 88 lcore(s)

EAL: Detected 2 NUMA nodes

EAL: Auto-detected process type: SECONDARY

EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23

Segmentation fault (core dumped)


More information as below:

Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.

0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

264 for (idx = first; idx < msk->n_masks; idx++) {

#0 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

#1 0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,

used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001

#2 0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018

#3 0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,

arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589

#4 0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')

at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465

#5 0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)

at /root/dpdk/drivers/bus/pci/linux/pci.c:593

#6 0x00000000005b9738 in pci_devices_iommu_support_va ()

at /root/dpdk/drivers/bus/pci/linux/pci.c:626

#7 0x00000000005b97a7 in rte_pci_get_iommu_class ()

at /root/dpdk/drivers/bus/pci/linux/pci.c:650

#8 0x000000000058f1ce in rte_bus_get_iommu_class ()

at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237

#9 0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)

at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919

#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)

at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28


Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com<mailto:***@netronome.com>]
Sent: Monday, October 29, 2018 9:41 PM
To: Yao, Lei A <***@intel.com<mailto:***@intel.com>>
Cc: Thomas Monjalon <***@monjalon.net<mailto:***@monjalon.net>>; dev <***@dpdk.org<mailto:***@dpdk.org>>; Xu, Qian Q <***@intel.com<mailto:***@intel.com>>; Lin, Xueqin <***@intel.com<mailto:***@intel.com>>; Burakov, Anatoly <***@intel.com<mailto:***@intel.com>>; Yigit, Ferruh <***@intel.com<mailto:***@intel.com>>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <***@intel.com<mailto:***@intel.com>> wrote:


From: Alejandro Lucero [mailto:***@netronome.com<mailto:***@netronome.com>]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <***@monjalon.net<mailto:***@monjalon.net>>
Cc: Yao, Lei A <***@intel.com<mailto:***@intel.com>>; dev <***@dpdk.org<mailto:***@dpdk.org>>; Xu, Qian Q <***@intel.com<mailto:***@intel.com>>; Lin, Xueqin <***@intel.com<mailto:***@intel.com>>; Burakov, Anatoly <***@intel.com<mailto:***@intel.com>>; Yigit, Ferruh <***@intel.com<mailto:***@intel.com>>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the deadlock.
The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?

Hi, Lucero

This patch can fix the issue at my side. Thanks a lot
for you quick action.


Great!

I will send an official patch with the changes.

I have to say that I tested the patchset, but I think it was where legacy_mem was still there and therefore dynamic memory allocation code not used during memory initialization.

There is something that concerns me though. Using rte_memseg_walk_thread_unsafe could be a problem under some situations although those situations being unlikely.

Usually, calling rte_eal_check_dma_mask happens during initialization. Then it is safe to use the unsafe function for walking memsegs, but with device hotplug and dynamic memory allocation, there exists a potential race condition when the primary process is allocating more memory and concurrently a device is hotplugged and a secondary process does the device initialization. By now, this is just a problem with the NFP, and the potential race condition window really unlikely, but I will work on this asap.

BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but if
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.

PS: please do not top post and avoid HTML emails, thanks
Alejandro Lucero
2018-10-30 14:57:22 UTC
Permalink
Post by Lin, Xueqin
Hi Lucero,
The patch could fix both testpmd and multi-process can’t setup issues on my environment.
Hope that you could upload fix patch to patches page in community.
Thanks a lot.
Great.

I need to format the patchset properly and clean things up but I hope I can
send a patchset this week.

Thanks for testing!

By the way, is this testing something you are doing by yourself, or is it
part of Intel DPDK work?
Post by Lin, Xueqin
Best regards,
Xueqin
*Sent:* Tuesday, October 30, 2018 10:05 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero <
If not add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then
reboot to make it effective.
18.11 rc1: Success to setup testpmd and secondary process.
If add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then
reboot to make it effective.
18.11 rc1: Fail to setup testpmd and secondary process.
18.11 rc1+ dma_mask_fix patch: success to setup testpmd, but fail to setup
secondary process.
Maybe ”intel_iommu=on iommu=pt” enable or not result in our test gap.
Most of our team servers should enable the IOMMU for VT-d and vfio test.
It makes sense because the problem is when the IOVA mode is set inside
drivers/bus/pci/linux/pci.c and if there is not IOMMU, not call to
rte_eal_check_dma_mask at all.
Best regards,
Xueqin
*Sent:* Tuesday, October 30, 2018 6:38 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Hi Lucero,
No, we have reproduced multi-process issues(include symmetric_mp,
simple_mp, hotplug_mp, multi-process unit test… )on most of our servers.
It is also strange that 1~2 servers don’t have the issue.
Yes, you are right. I could execute it but it was due to how this problem triggers.
I think I can fix this and at the same time solving properly the initial
issue without any limitation like that potential race condition I
mentioned.
I can give you a patch to try in a couple of hours.
Hi Lin,
Can you try the patch attached?
Thanks
Thanks
Bind two NNT ports or FVL ports
./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]
Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0 0x00000000005566b5 in rte_fbarray_find_next_used ()
#1 0x000000000054da9c in rte_eal_check_dma_mask ()
#2 0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3 0x0000000000573988 in rte_pci_get_iommu_class ()
#4 0x000000000054f743 in rte_bus_get_iommu_class ()
#5 0x000000000053c123 in rte_eal_init ()
#6 0x000000000046be2b in main ()
Best regards,
Xueqin
*Sent:* Tuesday, October 30, 2018 5:41 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Hi Lucero&Thomas,
Find the patch can’t fix multi-process cases.
Hi,
I think it is not specifically about multiprocess but about hotplug with
multiprocess because I can execute the symmetric_mp successfully with a
secondary process.
Working on this as a priority.
Thanks.
1. Setup primary process successfully
./hotplug_mp --proc-type=auto
2. Fail to setup secondary process
./hotplug_mp --proc-type=auto
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
Segmentation fault (core dumped)
Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
264 for (idx = first; idx < msk->n_masks; idx++) {
#0 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
#1 0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
used=true) at
/root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
#2 0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
#3 0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,
arg=0x7fffffffcc38) at
/root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
#4 0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
#5 0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
at /root/dpdk/drivers/bus/pci/linux/pci.c:593
#6 0x00000000005b9738 in pci_devices_iommu_support_va ()
at /root/dpdk/drivers/bus/pci/linux/pci.c:626
#7 0x00000000005b97a7 in rte_pci_get_iommu_class ()
at /root/dpdk/drivers/bus/pci/linux/pci.c:650
#8 0x000000000058f1ce in rte_bus_get_iommu_class ()
at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
#9 0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28
Best regards,
Xueqin
*Sent:* Monday, October 29, 2018 9:41 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
*Sent:* Monday, October 29, 2018 8:56 PM
*Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_dma_mask using the
mask instead of the maskbits. However, this does not solves the
deadlock.
The deadlock is a bigger concern I think.
I think once the call to rte_eal_check_dma_mask uses the maskbits instead
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
Yao, can you try with the attached patch?
Hi, Lucero
This patch can fix the issue at my side. Thanks a lot
for you quick action.
Great!
I will send an official patch with the changes.
I have to say that I tested the patchset, but I think it was where
legacy_mem was still there and therefore dynamic memory allocation code not
used during memory initialization.
There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem under some situations
although those situations being unlikely.
Usually, calling rte_eal_check_dma_mask happens during initialization.
Then it is safe to use the unsafe function for walking memsegs, but with
device hotplug and dynamic memory allocation, there exists a potential race
condition when the primary process is allocating more memory and
concurrently a device is hotplugged and a secondary process does the device
initialization. By now, this is just a problem with the NFP, and the
potential race condition window really unlikely, but I will work on this
asap.
BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
but if
Post by Alejandro Lucero
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is it should be the same behaviour when calling
rte_memseg_walk than before and it is not.
Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
PS: please do not top post and avoid HTML emails, thanks
Lin, Xueqin
2018-10-30 15:09:22 UTC
Permalink
Hi Lucero,

From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Tuesday, October 30, 2018 10:57 PM
To: Lin, Xueqin <***@intel.com>
Cc: Yao, Lei A <***@intel.com>; Thomas Monjalon <***@monjalon.net>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>; Zhang, Qi Z <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 2:45 PM Lin, Xueqin <***@intel.com<mailto:***@intel.com>> wrote:
Hi Lucero,

The patch could fix both testpmd and multi-process can’t setup issues on my environment.
Hope that you could upload fix patch to patches page in community.
Thanks a lot.


Great.

I need to format the patchset properly and clean things up but I hope I can send a patchset this week.

Thanks for testing!

By the way, is this testing something you are doing by yourself or it is part of Intel DPDK work?


We are from the Intel DPDK validation team ☺
It is the 18.11 rc1 cycle, and this issue blocks most of our cases from continuing, including NIC, NIC VF, vhost/virtio, sample…
It is very urgent for us to complete the DPDK QA checks in a very limited time.
Hope you can send the fix patch officially soon, so it can be merged to the master branch after review.
Thanks.
Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com<mailto:***@netronome.com>]
Sent: Tuesday, October 30, 2018 10:05 PM
To: Lin, Xueqin <***@intel.com<mailto:***@intel.com>>
Cc: Yao, Lei A <***@intel.com<mailto:***@intel.com>>; Thomas Monjalon <***@monjalon.net<mailto:***@monjalon.net>>; dev <***@dpdk.org<mailto:***@dpdk.org>>; Xu, Qian Q <***@intel.com<mailto:***@intel.com>>; Burakov, Anatoly <***@intel.com<mailto:***@intel.com>>; Yigit, Ferruh <***@intel.com<mailto:***@intel.com>>; Zhang, Qi Z <***@intel.com<mailto:***@intel.com>>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero <***@netronome.com> wrote:

On Tue, Oct 30, 2018 at 12:22 PM Lin, Xueqin <***@intel.com> wrote:
Some findings on some of our servers:
Without "intel_iommu=on iommu=pt" in the /boot/grub2/grub.cfg file (rebooting to make it effective):
18.11 rc1: testpmd and the secondary process set up successfully.

With "intel_iommu=on iommu=pt" in the /boot/grub2/grub.cfg file (rebooting to make it effective):
18.11 rc1: testpmd and the secondary process fail to set up.
18.11 rc1 + dma_mask_fix patch: testpmd sets up successfully, but the secondary process still fails to set up.

Maybe whether "intel_iommu=on iommu=pt" is enabled or not explains the gap in our test results.
Most of our team's servers enable the IOMMU for VT-d and vfio tests.


It makes sense, because the problem happens where the IOVA mode is set inside drivers/bus/pci/linux/pci.c, and if there is no IOMMU there is no call to rte_eal_check_dma_mask at all.
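Roughly, the control flow being described looks like this. It is a paraphrase of drivers/bus/pci/linux/pci.c at the time; iommu_supports_va_sketch, the iommu_present parameter and the return values are illustrative, and only the "no IOMMU, no DMA-mask check" shape is the point:

#include <stdbool.h>
#include <stdint.h>
#include <rte_memory.h>

static bool
iommu_supports_va_sketch(bool iommu_present, uint8_t mgaw)
{
        if (!iommu_present) {
                /* No VT-d: bail out before the DMA-mask check, so
                 * rte_eal_check_dma_mask() never runs. That is why the
                 * crash only shows up with "intel_iommu=on iommu=pt". */
                return false;
        }

        /* mgaw: guest address width read from the VT-d capability
         * register (48 in the backtraces above). */
        return rte_eal_check_dma_mask(mgaw) == 0;
}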


Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Tuesday, October 30, 2018 6:38 PM
To: Lin, Xueqin <***@intel.com>
Cc: Yao, Lei A <***@intel.com>; Thomas Monjalon <***@monjalon.net>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>; Zhang, Qi Z <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin <***@intel.com> wrote:
Hi Lucero,

No, we have reproduced the multi-process issues (including symmetric_mp, simple_mp, hotplug_mp, the multi-process unit test…) on most of our servers.
It is also strange that 1~2 servers don’t have the issue.


Yes, you are right. I could execute it, but that was due to how this problem triggers.
I think I can fix this and at the same time properly solve the initial issue, without any limitation like the potential race condition I mentioned.
I can give you a patch to try in a couple of hours.


Hi Lin,

Can you try the patch attached?

Thanks

Thanks

Bind two NNT ports or FVL ports

./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1

EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]

Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0 0x00000000005566b5 in rte_fbarray_find_next_used ()
#1 0x000000000054da9c in rte_eal_check_dma_mask ()
#2 0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3 0x0000000000573988 in rte_pci_get_iommu_class ()
#4 0x000000000054f743 in rte_bus_get_iommu_class ()
#5 0x000000000053c123 in rte_eal_init ()
#6 0x000000000046be2b in main ()

Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Tuesday, October 30, 2018 5:41 PM
To: Lin, Xueqin <***@intel.com>
Cc: Yao, Lei A <***@intel.com>; Thomas Monjalon <***@monjalon.net>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>; Zhang, Qi Z <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin <***@intel.com> wrote:
Hi Lucero&Thomas,

We find the patch can't fix the multi-process cases.

Hi,

I think it is not specifically about multiprocess but about hotplug with multiprocess, because I can execute symmetric_mp successfully with a secondary process.

Working on this as a priority.

Thanks.

Steps:

1. Setup primary process successfully

./hotplug_mp --proc-type=auto



2. Fail to setup secondary process

./hotplug_mp --proc-type=auto

EAL: Detected 88 lcore(s)

EAL: Detected 2 NUMA nodes

EAL: Auto-detected process type: SECONDARY

EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23

Segmentation fault (core dumped)


More information as below:

Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.

0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

264 for (idx = first; idx < msk->n_masks; idx++) {

#0 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

#1 0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,

used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001

#2 0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)

at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018

#3 0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,

arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589

#4 0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')

at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465

#5 0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)

at /root/dpdk/drivers/bus/pci/linux/pci.c:593

#6 0x00000000005b9738 in pci_devices_iommu_support_va ()

at /root/dpdk/drivers/bus/pci/linux/pci.c:626

#7 0x00000000005b97a7 in rte_pci_get_iommu_class ()

at /root/dpdk/drivers/bus/pci/linux/pci.c:650

#8 0x000000000058f1ce in rte_bus_get_iommu_class ()

at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237

#9 0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)

at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919

#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)

at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28


Best regards,
Xueqin

From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Monday, October 29, 2018 9:41 PM
To: Yao, Lei A <***@intel.com>
Cc: Thomas Monjalon <***@monjalon.net>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Lin, Xueqin <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <***@intel.com> wrote:


From: Alejandro Lucero [mailto:***@netronome.com]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <***@monjalon.net>
Cc: Yao, Lei A <***@intel.com>; dev <***@dpdk.org>; Xu, Qian Q <***@intel.com>; Lin, Xueqin <***@intel.com>; Burakov, Anatoly <***@intel.com>; Yigit, Ferruh <***@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_check_dma_mask using the
mask instead of the maskbits. However, this does not solve the deadlock.
The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
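To make the first fix concrete, a sketch of the argument mix-up, assuming, as in the patchset, that rte_eal_check_dma_mask() takes the number of address bits as a uint8_t; dma_mask_arg_sketch is an illustrative name:

#include <stdint.h>
#include <rte_memory.h>

static int
dma_mask_arg_sketch(void)
{
        uint8_t maskbits = 40;                          /* device limit, e.g. NFP */
        uint64_t mask = ~((1ULL << maskbits) - 1);      /* what EAL builds internally */

        (void)mask;
        /* Wrong: rte_eal_check_dma_mask(mask) would truncate the 64-bit
         * mask to uint8_t and check a meaningless bit count. */

        /* Right: pass the bit count and let EAL build the mask itself. */
        return rte_eal_check_dma_mask(maskbits);
}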

Yao, can you try with the attached patch?

Hi, Lucero

This patch can fix the issue on my side. Thanks a lot
for your quick action.


Great!

I will send an official patch with the changes.

I have to say that I tested the patchset, but I think it was when legacy_mem was still the default and therefore the dynamic memory allocation code was not used during memory initialization.

There is something that concerns me though. Using rte_memseg_walk_thread_unsafe could be a problem in some situations, although those situations are unlikely.

Usually, calling rte_eal_check_dma_mask happens during initialization. Then it is safe to use the unsafe function for walking memsegs, but with device hotplug and dynamic memory allocation, there exists a potential race condition when the primary process is allocating more memory and, concurrently, a device is hotplugged and a secondary process does the device initialization. For now, this is just a problem with the NFP, and the potential race condition window is really unlikely, but I will work on this asap.

BRs
Lei
Post by Alejandro Lucero
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when called inside rte_eal_check_dma_mask, but if
- if (rte_memseg_walk(check_iova, &mask))
+ if (!rte_memseg_walk(check_iova, &mask))
it works, although the value returned to the invoker changes, of course.
But the point here is that the behaviour when calling rte_memseg_walk
should be the same as before, and it is not.
Anyway, the coding style requires saving the return value in a variable
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.

PS: please do not top post and avoid HTML emails, thanks

Burakov, Anatoly
2018-10-30 10:18:58 UTC
Permalink
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_check_dma_mask using the
mask instead of the maskbits. However, this does not solve the deadlock.
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when called inside rte_eal_check_dma_mask, but
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 12dcedf5c..69b26e464 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -462,7 +462,7 @@ rte_eal_check_dma_mask(uint8_t maskbits)
 	/* create dma mask */
 	mask = ~((1ULL << maskbits) - 1);
-	if (rte_memseg_walk(check_iova, &mask))
+	if (!rte_memseg_walk(check_iova, &mask))
 		/*
 		 * Dma mask precludes hugepage usage.
 		 * This device can not be used and we do not need to keep
it works, although the value returned to the invoker changes, of course.
But the point here is that the behaviour when calling rte_memseg_walk
should be the same as before, and it is not.
Anatoly, maybe you can see something I can not.
memseg walk will return 0 only when each callback returned 0 and there
were no more segments left to call callbacks on. If your code always
returns 0, then the return value of memseg_walk will always be zero.

If your code returns 1 or -1 in some cases, then this error condition
will trigger. If it doesn't, then the condition by which you decide to
return 1 or 0 is incorrect :) I couldn't spot any obvious issues there,
but I'll recheck.
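A simplified stand-in for the patch's check_iova() callback makes that contract concrete (the name check_iova_sketch and the exact range test are illustrative):

#include <stdint.h>
#include <rte_common.h>
#include <rte_memory.h>

static int
check_iova_sketch(const struct rte_memseg_list *msl __rte_unused,
                const struct rte_memseg *ms, void *arg)
{
        const uint64_t *mask = arg;

        if (ms->iova & *mask)
                return 1;       /* out of range: the walk stops and returns 1 */

        return 0;               /* keep walking; the walk returns 0 only if
                                 * every segment passed */
}

With that contract, a non-zero return from rte_memseg_walk() means at least one memseg failed the check, so negating the test only changes which case gets reported as an error.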
--
Thanks,
Anatoly
Alejandro Lucero
2018-10-30 10:23:31 UTC
Permalink
Post by Burakov, Anatoly
Post by Alejandro Lucero
I got a patch that solves a bug when calling rte_eal_check_dma_mask using the
mask instead of the maskbits. However, this does not solve the deadlock.
Interestingly, the problem looks like a compiler one. Calling
rte_memseg_walk does not return when called inside rte_eal_check_dma_mask, but
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 12dcedf5c..69b26e464 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -462,7 +462,7 @@ rte_eal_check_dma_mask(uint8_t maskbits)
 	/* create dma mask */
 	mask = ~((1ULL << maskbits) - 1);
-	if (rte_memseg_walk(check_iova, &mask))
+	if (!rte_memseg_walk(check_iova, &mask))
 		/*
 		 * Dma mask precludes hugepage usage.
 		 * This device can not be used and we do not need to keep
it works, although the value returned to the invoker changes, of course.
But the point here is that the behaviour when calling rte_memseg_walk
should be the same as before, and it is not.
Anatoly, maybe you can see something I can not.
memseg walk will return 0 only when each callback returned 0 and there
were no more segments left to call callbacks on. If your code always
returns 0, then the return value of memseg_walk will always be zero.
If your code returns 1 or -1 in some cases, then this error condition
will trigger. If it doesn't, then the condition by which you decide to
return 1 or 0 is incorrect :) I couldn't spot any obvious issues there,
but I'll recheck.
Thanks for looking at this, but I was wrong. The return code changes
everything, so it does not make sense to compare the two.
--
Post by Burakov, Anatoly
Thanks,
Anatoly