Discussion:
[dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in memory pool allocations, when enabling/disabling this configuration
Asaf Sinai
2018-11-26 09:15:51 UTC
Hi,

We have 2 NUMA nodes in our system, and we are trying to allocate a single DPDK memory pool on each NUMA node.
However, we see no difference when enabling/disabling the "CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES" configuration option.
We expected that disabling it would allocate pools on only one NUMA node (probably NUMA0), but it actually allocates pools on both NUMA nodes, according to the "socket_id" parameter passed to the "rte_mempool_create" API.
We have 192GB of memory (96GB per node), so NUMA1 physical memory starts at address 0x1800000000.
As you can see below, "undDpdkPoolNameSocket_1" was indeed allocated on NUMA1 (its paddr is above 0x1800000000), as we wanted, although "CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES" is disabled:

CONFIG_RTE_LIBRTE_VHOST_NUMA=n
CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n

created poolName=undDpdkPoolNameSocket_0, nbufs=887808, bufferSize=2432, total=2059MB
(memZone: name=MP_undDpdkPoolNameSocket_0, socket_id=0, vaddr=0x1f2c0427d00-0x1f2c05abe00, paddr=0x178e627d00-0x178e7abe00, len=1589504, hugepage_sz=2MB)
created poolName=undDpdkPoolNameSocket_1, nbufs=887808, bufferSize=2432, total=2059MB
(memZone: name=MP_undDpdkPoolNameSocket_1, socket_id=1, vaddr=0x1f57fa7be40-0x1f57fbfff40, paddr=0x2f8247be40-0x2f825fff40, len=1589504, hugepage_sz=2MB)

Does anyone know what the "CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES" configuration option is used for?

Thanks,
Asaf

Asaf Sinai
ND SW Engineer, Radware
Burakov, Anatoly
2018-11-26 11:09:58 UTC
Hi Asaf,

I cannot reproduce this behavior. I just tried running testpmd with DPDK
18.08 as well as the latest master [1], and DPDK could not allocate a
mempool on socket 1.

Did you reconfigure and recompile DPDK after this config change?

[1] The latest master will crash on init in this configuration; fix:
http://patches.dpdk.org/patch/48338/
--
Thanks,
Anatoly
Asaf Sinai
2018-11-26 11:33:32 UTC
Hi Anatoly,

We did not check it with "testpmd", only with our application.
We never enabled this option in the first place (see the attached files), and everything works fine.
Of course, we rebuild DPDK whenever we change the configuration.
Please note that we use DPDK 17.11.3; maybe this is why it works fine?

Thanks,
Asaf

Burakov, Anatoly
2018-11-26 11:43:52 UTC
Post by Asaf Sinai
Please note that we use DPDK 17.11.3, maybe this is why it works fine?
Just tested with DPDK 17.11, and yes, it does work the way you are
describing. This is not intended behavior. I will look into it.
--
Thanks,
Anatoly
Burakov, Anatoly
2018-11-26 12:50:41 UTC
+CC the author of the commit that introduced CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.

Looking at the code, I think this config option needs to be reworked, and
we should clarify what we mean by it. It appears that I've misunderstood
what this option was actually intended to do, and I also think its naming
could be improved, because it's confusing and misleading.

In 17.11, this option does *not* prevent EAL from using NUMA - it merely
disables using libnuma to perform memory allocation. This looks like
intended (if counter-intuitive) behavior - disabling this option will
simply revert DPDK to working as it did before this option was
introduced (i.e. best-effort allocation). This is why your code still
works - because EAL still does allocate memory on socket 1, and *knows*
that it's socket 1 memory. It still supports NUMA.

The commit message for these changes states that the actual purpose of
this option is to enable "balanced" hugepage allocation. Previously, under
cgroups limitations, DPDK would exhaust all hugepages on the master core's
socket before attempting to allocate from other sockets; by the time we
reached the cgroups limit on the number of hugepages, we might not have
reached socket 1 at all, and thus missed out on pages we could have
allocated. Using libnuma solves this issue, because we can now allocate
pages on the sockets we want, instead of hoping we won't run out of
hugepages before we get the memory we need.

From 18.05 onwards, this option works differently (and arguably incorrectly).
More specifically, it disallows allocations on sockets other than 0, and
it also makes it so that EAL does not check which socket the memory
*actually* came from. So, not only allocating memory from socket 1 is
disabled, but allocating from socket 0 may even get you memory from
socket 1!

+CC Thomas

The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because it
makes it seem like this option disables NUMA support, which is not the case.

I would also argue that it is not relevant to the 18.05+ memory subsystem,
and should only work in legacy mode, because it is *impossible* to make
it work right in the new memory subsystem, and here's why:

Without libnuma, we have no way of "asking" the kernel to allocate a
hugepage on a specific socket; instead, any allocation will most likely
happen on the socket from which the allocation was requested. For example,
if the user program's lcore is on socket 1, an allocation on socket 0 will
actually allocate a page on socket 1.

If we don't check the page's NUMA node affinity (which is what currently
happens), we get performance degradation, because we may unintentionally
allocate memory on the wrong NUMA node. If we do check for this, then
allocating memory on socket 1 from an lcore on socket 0 will almost
never succeed, because the kernel will always give us pages on socket 0.

Put simply, there is no sane way to make this option work for the new
memory subsystem. IMO it should be dropped, and libnuma should be made
a hard dependency on Linux.
--
Thanks,
Anatoly
Ilya Maximets
2018-11-26 13:16:55 UTC
Post by Burakov, Anatoly
From 18.05 onwards, this option works differently (and arguably incorrectly). More specifically, it disallows allocations on sockets other than 0, and it also makes it so that EAL does not check which socket the memory *actually* came from. So, not only allocating memory from socket 1 is disabled, but allocating from socket 0 may even get you memory from socket 1!
I'd consider this a bug.
Post by Burakov, Anatoly
Put simply, there is no sane way to make this option work for the new memory subsystem - IMO it should be dropped, and libnuma should be made a hard dependency on Linux.
I agree that the new memory model cannot work without libnuma, i.e. it
would lead to unpredictable memory allocations with no respect for the
requested socket_id values. I also agree that CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES
only makes sense for the legacy memory model.
It looks like we have no choice other than to drop the option and make
the code unconditional, i.e. to have a hard dependency on libnuma.

Best regards, Ilya Maximets.
Ilya Maximets
2018-11-26 13:20:23 UTC
Post by Ilya Maximets
It looks like we have no choice other than to drop the option and make
the code unconditional, i.e. to have a hard dependency on libnuma.
We could probably compile this code, and have the hard dependency, only
for platforms with 'RTE_MAX_NUMA_NODES > 1'.
Burakov, Anatoly
2018-11-26 13:42:45 UTC
Post by Ilya Maximets
It looks like we have no choice other than to drop the option and make
the code unconditional, i.e. to have a hard dependency on libnuma.
We could probably compile this code, and have the hard dependency, only
for platforms with 'RTE_MAX_NUMA_NODES > 1'.
Well, as long as legacy mode stays supported, we have to keep the
option. The "drop" part was referring to supporting it under the new
memory system, not a literal drop from config files.

As for using RTE_MAX_NUMA_NODES, I don't think it's merited.
Distributions cannot deliver different DPDK builds based on the number
of sockets on a particular machine, so it would have to be a hard
dependency for distributions anyway (does any distribution ship DPDK
without libnuma?).

For those compiling from source: are there any supported distributions
which don't package libnuma? I don't see much sense in keeping libnuma
optional. This is of course up to the tech board to decide, but IMO the
"without libnuma it's basically broken" argument is very strong :)
--
Thanks,
Anatoly
Ilya Maximets
2018-11-26 14:10:32 UTC
Post by Burakov, Anatoly
Well, as long as legacy mode stays supported, we have to keep the option. The "drop" part was referring to supporting it under the new memory system, not a literal drop from config files.
The option was introduced because we didn't want to introduce the
new hard dependency. Since we'll have it anyway, I'm not sure if
keeping the option for legacy mode makes any sense.
As for using RTE_MAX_NUMA_NODES, I don't think it's merited. Distributions cannot deliver different DPDK builds based on the number of sockets on a particular machine - so it would have to be a hard dependency for distributions anyway (does any distribution ship DPDK without libnuma?).
ARMv7 builds, at least, commonly do not ship the libnuma package.
Burakov, Anatoly
2018-11-26 14:21:11 UTC
Post by Ilya Maximets
Post by Ilya Maximets
Post by Ilya Maximets
Post by Burakov, Anatoly
Post by Asaf Sinai
Hi Anatoly,
We did not check it with "testpmd", only with our application.
  From the beginning, we did not enable this configuration (look at attached files), and everything works fine.
Of course we rebuild DPDK, when we change configuration.
Please note that we use DPDK 17.11.3, maybe this is why it works fine?
Just tested with DPDK 17.11, and yes, it does work the way you are describing. This is not intended behavior. I will look into it.
+CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
Looking at the code, i think this config option needs to be reworked and we should clarify what we mean by this option. It appears that i've misunderstood what this option actually intended to do, and i also think it's naming could be improved because it's confusing and misleading.
In 17.11, this option does *not* prevent EAL from using NUMA - it merely disables using libnuma to perform memory allocation. This looks like intended (if counter-intuitive) behavior - disabling this option will simply revert DPDK to working as it did before this option was introduced (i.e. best-effort allocation). This is why your code still works - because EAL still does allocate memory on socket 1, and *knows* that it's socket 1 memory. It still supports NUMA.
The commit message for these changes states that the actual purpose of this option is to enable "balanced" hugepage allocation. In case of cgroups limitations, previously, DPDK would've exhausted all hugepages on master core's socket before attempting to allocate from other sockets, but by the time we've reached cgroups limits on numbers of hugepages, we might not have reached socket 1 and thus missed out on the pages we could've allocated, but didn't. Using libnuma solves this issue, because now we can allocate pages on sockets we want, instead of hoping we won't run out of hugepages before we get the memory we need.
In 18.05 onwards, this option works differently (and arguably wrong). More specifically, it disallows allocations on sockets other than 0, and it also makes it so that EAL does not check which socket the memory *actually* came from. So, not only allocating memory from socket 1 is disabled, but allocating from socket 0 may even get you memory from socket 1!
I'd consider this as a bug.
Post by Burakov, Anatoly
+CC Thomas
The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because it makes it seem like this option disables NUMA support, which is not the case.
Without libnuma, we have no way of "asking" the kernel to allocate a hugepage on a specific socket - instead, any allocation will most likely happen on socket from which the allocation came from. For example, if user program's lcore is on socket 1, allocation on socket 0 will actually allocate a page on socket 1.
If we don't check for page's NUMA node affinity (which is what currently happens) - we get performance degradation because we may unintentionally allocate memory on wrong NUMA node. If we do check for this - then allocation of memory on socket 1 from lcore on socket 0 will almost never succeed, because kernel will always give us pages on socket 0.
Put it simply, there is no sane way to make this option work for the new memory subsystem - IMO it should be dropped, and libnuma should be made a hard dependency on Linux.
I agree that the new memory model cannot work without libnuma, i.e. it will
lead to unpredictable memory allocations without any respect to the requested
socket_id's. I also agree that CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES only makes
sense for the legacy memory model.
It looks like we have no other choice than to drop the option and make
the code unconditional, i.e. have a hard dependency on libnuma.
We could probably compile this code, and have the hard dependency, only for
platforms with 'RTE_MAX_NUMA_NODES > 1'.
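The conditional dependency suggested above could look roughly like the preprocessor sketch below. RTE_MAX_NUMA_NODES is the build-time constant named in the suggestion; the fallback value of 1 (so that the sketch compiles on its own, without libnuma) and the helper name are assumptions. numa_available() is a real libnuma call, and a multi-node build of this sketch would need to be linked with -lnuma.

```c
#include <assert.h>

#ifndef RTE_MAX_NUMA_NODES
#define RTE_MAX_NUMA_NODES 1  /* assumption: single-node build for this sketch */
#endif

#if RTE_MAX_NUMA_NODES > 1
#include <numa.h>  /* hard dependency only for multi-node builds */

/* Hypothetical helper: honour the requested socket when libnuma works. */
static int effective_alloc_socket(int requested)
{
    assert(requested >= 0 && requested < RTE_MAX_NUMA_NODES);
    return numa_available() < 0 ? 0 : requested;
}
#else
/* Single-node build: no libnuma, everything lives on socket 0. */
static int effective_alloc_socket(int requested)
{
    (void)requested;
    return 0;
}
#endif
```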
Well, as long as legacy mode stays supported, we have to keep the option. The "drop" part was referring to supporting it under the new memory system, not a literal drop from config files.
The option was introduced because we didn't want to introduce the
new hard dependency. Since we'll have it anyway, I'm not sure if
keeping the option for legacy mode makes any sense.
Oh yes, you're right. Drop it is!
Post by Ilya Maximets
As for using RTE_MAX_NUMA_NODES, I don't think it's merited. Distributions cannot deliver different DPDK versions based on the number of sockets on a particular machine, so it would have to be a hard dependency for distributions anyway (does any distribution ship DPDK without libnuma?).
At least ARMv7 builds commonly do not ship the libnuma package.
Do you mean libnuma builds for ARMv7 are not available? Or do you mean
the libnuma package is not installed by default?

If it's the latter, then I believe it's not installed by default
anywhere, but if using the distribution version of DPDK, libnuma will be
taken care of by the package manager. Presumably, building from source can
be handled with pkg-config/meson.

Or do you mean ARMv7 does not have libnuma for their arch at all, in any
distro?
Post by Ilya Maximets
For those compiling from source: are there any supported distributions that don't package libnuma? I don't see much sense in keeping libnuma optional. This is of course up to the tech board to decide, but IMO the "without libnuma it's basically broken" argument is very strong :)
--
Thanks,
Anatoly
Ilya Maximets
2018-11-26 14:32:01 UTC
Post by Burakov, Anatoly
Do you mean libnuma builds for ARMv7 are not available? Or do you mean the libnuma package is not installed by default?
If it's the latter, then I believe it's not installed by default anywhere, but if using the distribution version of DPDK, libnuma will be taken care of by the package manager. Presumably, building from source can be handled with pkg-config/meson.
Or do you mean ARMv7 does not have libnuma for their arch at all, in any distro?
libnuma builds for ARMv7 are not available in most distros. I didn't check all of them,
but here are the results for Ubuntu:
https://packages.ubuntu.com/search?suite=bionic&arch=armhf&searchon=names&keywords=libnuma

As you can see, Ubuntu 18.04 (bionic) has no libnuma package for the 'armhf' and
'powerpc' platforms.
Burakov, Anatoly
2018-11-26 14:57:29 UTC
Post by Ilya Maximets
libnuma builds for ARMv7 are not available in most distros. I didn't check all of them,
but here are the results for Ubuntu:
https://packages.ubuntu.com/search?suite=bionic&arch=armhf&searchon=names&keywords=libnuma
As you can see, Ubuntu 18.04 (bionic) has no libnuma package for the 'armhf' and
'powerpc' platforms.
That's a difficulty. Do these platforms support NUMA? In other words,
could we replace this flag by simply disabling NUMA support outright?
--
Thanks,
Anatoly
Asaf Sinai
2018-11-26 15:25:04 UTC
+CC Ilia & Sasha.

-----Original Message-----
From: Burakov, Anatoly <***@intel.com>
Sent: Monday, November 26, 2018 04:57 PM
To: Ilya Maximets <***@samsung.com>; Asaf Sinai <***@Radware.com>; ***@dpdk.org; Thomas Monjalon <***@monjalon.net>
Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in memory pool allocations, when enabling/disabling this configuration
That's a difficulty. Do these platforms support NUMA? In other words, could we replace this flag by simply disabling NUMA support outright?
Hemant Agrawal
2018-11-27 10:26:50 UTC
Post by Asaf Sinai
That's a difficulty. Do these platforms support NUMA? In other words, could we replace this flag with just outright disabling NUMA support?
Many platforms don't support NUMA, so they don't really need libnuma.

Mandating libnuma will also break several things:

  - cross builds for ARM on x86, which is among the preferred build methods
for many in the ARM community.

  - many embedded SoCs lack NUMA support and use a smaller
rootfs (e.g. Yocto). It will be a burden to add libnuma there.
Burakov, Anatoly
2018-11-27 10:33:08 UTC
Post by Hemant Agrawal
Post by Asaf Sinai
+CC Ilia & Sasha.
-----Original Message-----
Sent: Monday, November 26, 2018 04:57 PM
Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in memory pool allocations, when enabling/disabling this configuration
Post by Ilya Maximets
Post by Burakov, Anatoly
Post by Ilya Maximets
Post by Ilya Maximets
Post by Ilya Maximets
Post by Burakov, Anatoly
Post by Asaf Sinai
Hi Anatoly,
We did not check it with "testpmd", only with our application.
   From the beginning, we did not enable this configuration (look at attached files), and everything works fine.
Of course we rebuild DPDK, when we change configuration.
Please note that we use DPDK 17.11.3, maybe this is why it works fine?
Just tested with DPDK 17.11, and yes, it does work the way you are describing. This is not intended behavior. I will look into it.
+CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
Looking at the code, i think this config option needs to be reworked and we should clarify what we mean by this option. It appears that i've misunderstood what this option actually intended to do, and i also think it's naming could be improved because it's confusing and misleading.
In 17.11, this option does *not* prevent EAL from using NUMA - it merely disables using libnuma to perform memory allocation. This looks like intended (if counter-intuitive) behavior - disabling this option will simply revert DPDK to working as it did before this option was introduced (i.e. best-effort allocation). This is why your code still works - because EAL still does allocate memory on socket 1, and *knows* that it's socket 1 memory. It still supports NUMA.
The commit message for these changes states that the actual purpose of this option is to enable "balanced" hugepage allocation. In case of cgroups limitations, previously, DPDK would've exhausted all hugepages on master core's socket before attempting to allocate from other sockets, but by the time we've reached cgroups limits on numbers of hugepages, we might not have reached socket 1 and thus missed out on the pages we could've allocated, but didn't. Using libnuma solves this issue, because now we can allocate pages on sockets we want, instead of hoping we won't run out of hugepages before we get the memory we need.
In 18.05 onwards, this option works differently (and arguably wrong). More specifically, it disallows allocations on sockets other than 0, and it also makes it so that EAL does not check which socket the memory *actually* came from. So, not only allocating memory from socket 1 is disabled, but allocating from socket 0 may even get you memory from socket 1!
I'd consider this as a bug.
Post by Burakov, Anatoly
+CC Thomas
The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because it makes it seem like this option disables NUMA support, which is not the case.
Without libnuma, we have no way of "asking" the kernel to allocate a hugepage on a specific socket - instead, any allocation will most likely happen on socket from which the allocation came from. For example, if user program's lcore is on socket 1, allocation on socket 0 will actually allocate a page on socket 1.
If we don't check for page's NUMA node affinity (which is what currently happens) - we get performance degradation because we may unintentionally allocate memory on wrong NUMA node. If we do check for this - then allocation of memory on socket 1 from lcore on socket 0 will almost never succeed, because kernel will always give us pages on socket 0.
Put it simply, there is no sane way to make this option work for the new memory subsystem - IMO it should be dropped, and libnuma should be made a hard dependency on Linux.
I agree that new memory model could not work without libnuma,
i.e. will lead to unpredictable memory allocations with no any
respect to requested socket_id's. I also agree that
CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only sane for a legacy memory model.
It looks like we have no other choice than just drop the option
and make the code unconditional, i.e. have hard dependency on libnuma.
We, probably, could compile this code and have hard dependency
only for platforms with 'RTE_MAX_NUMA_NODES > 1'.
Well, as long as legacy mode stays supported, we have to keep the option. The "drop" part was referring to supporting it under the new memory system, not a literal drop from config files.
The option was introduced because we didn't want to introduce the
new hard dependency. Since we'll have it anyway, I'm not sure if
keeping the option for legacy mode makes any sense.
Oh yes, you're right. Drop it is!
Post by Ilya Maximets
As for using RTE_MAX_NUMA_NODES, i don't think it's merited. Distributions cannot deliver different DPDK versions based on the number of sockets on a particular machine - so it would have to be a hard dependency for distributions anyway (does any distribution ship DPDK without libnuma?).
At least ARMv7 builds commonly do not ship the libnuma package.
Do you mean libnuma builds for ARMv7 are not available? Or do you mean the libnuma package is not installed by default?
If it's the latter, then I believe it's not installed by default anywhere, but if you use the distribution version of DPDK, libnuma will be taken care of by the package manager. Presumably, building from source can be taken care of with pkg-config/meson.
Or do you mean ARMv7 does not have libnuma for their arch at all, in any distro?
libnuma builds for ARMv7 are not available in most of the distros. I
https://packages.ubuntu.com/search?suite=bionic&arch=armhf&searchon=names&keywords=libnuma
You may see that Ubuntu 18.04 (bionic) has no libnuma package for
'armhf' and also 'powerpc' platforms.
That's a difficulty. Do these platforms support NUMA? In other words, could we replace this flag with just outright disabling NUMA support?
Many platforms don't support NUMA, so they don't really need libnuma.
 - Cross builds for ARM on x86 are among the build methods preferred by many in the ARM community.
 - Many embedded SoCs lack NUMA support and use a smaller rootfs (e.g. Yocto). It would be a burden to add libnuma there.
OK, point taken.

So, the alternative would be to have the ability to outright disable
NUMA support (either with a new option, or by reworking this one - I would
prefer a new one, since this one is confusingly named). Meaning: report
all cores as socket 0, report all hardware as socket 0, report all
memory as socket 0, and never care about NUMA nodes anywhere.

Would that work? E.g. by default, make libnuma a hard dependency on x86
Linux (but allow to disable it), but disable it everywhere else?
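The "outright disable NUMA" alternative amounts to collapsing every socket ID to 0 before it is reported. A minimal sketch of that idea; struct numa_view, its numa_enabled switch, report_socket() and report_sockets() are hypothetical stand-ins for whatever option would control this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical run-time view of the NUMA switch under discussion. */
struct numa_view {
    bool numa_enabled;
};

/* Socket ID to report for a core, device or memory segment that the
 * platform detected on detected_socket. With NUMA disabled, everything
 * is reported as socket 0. */
static int report_socket(const struct numa_view *v, int detected_socket)
{
    assert(detected_socket >= 0);
    return v->numa_enabled ? detected_socket : 0;
}

/* Apply the same rule to a whole table, e.g. an lcore-to-socket map. */
static void report_sockets(const struct numa_view *v, const int *detected,
                           int *reported, int n)
{
    for (int i = 0; i < n; i++)
        reported[i] = report_socket(v, detected[i]);
}
```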
Post by Hemant Agrawal
Post by Asaf Sinai
Post by Ilya Maximets
Post by Burakov, Anatoly
Post by Ilya Maximets
For those compiling from source - are there any supported
distributions which don't package libnuma? I don't see much sense
in keeping libnuma optional. This is of course up to the tech
board to decide, but IMO the "without libnuma it's basically
broken" argument is very strong :)
--
Thanks,
Anatoly
Ilya Maximets
2018-11-27 16:49:58 UTC
Permalink
Post by Burakov, Anatoly
Post by Hemant Agrawal
Post by Asaf Sinai
+CC Ilia & Sasha.
-----Original Message-----
Sent: Monday, November 26, 2018 04:57 PM
Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in memory pool allocations, when enabling/disabling this configuration
Post by Ilya Maximets
Post by Burakov, Anatoly
Post by Ilya Maximets
Post by Ilya Maximets
Post by Ilya Maximets
Post by Burakov, Anatoly
Post by Asaf Sinai
Hi Anatoly,
We did not check it with "testpmd", only with our application.
From the beginning, we did not enable this configuration (see the attached files), and everything works fine.
Of course we rebuild DPDK, when we change configuration.
Please note that we use DPDK 17.11.3, maybe this is why it works fine?
Just tested with DPDK 17.11, and yes, it does work the way you are describing. This is not intended behavior. I will look into it.
+CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
Looking at the code, I think this config option needs to be reworked, and we should clarify what we mean by this option. It appears that I've misunderstood what this option was actually intended to do, and I also think its naming could be improved, because it's confusing and misleading.
In 17.11, this option does *not* prevent EAL from using NUMA - it merely disables using libnuma to perform memory allocation. This looks like intended (if counter-intuitive) behavior - disabling this option will simply revert DPDK to working as it did before this option was introduced (i.e. best-effort allocation). This is why your code still works - because EAL still does allocate memory on socket 1, and *knows* that it's socket 1 memory. It still supports NUMA.
The commit message for these changes states that the actual purpose of this option is to enable "balanced" hugepage allocation. In the case of cgroups limitations, DPDK previously would've exhausted all hugepages on the master core's socket before attempting to allocate from other sockets; but by the time we'd reached the cgroups limit on the number of hugepages, we might not have reached socket 1 at all, and thus missed out on pages we could have allocated. Using libnuma solves this issue, because now we can allocate pages on the sockets we want, instead of hoping we won't run out of hugepages before we get the memory we need.
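The cgroups problem described above can be reproduced with a toy model. This sketch assumes two sockets, made-up page counts, and that pre-option EAL faulted pages in on the master core's socket first; alloc_sequential() and alloc_balanced() are hypothetical names, not EAL code.

```c
#include <assert.h>

#define SKETCH_SOCKETS 2

/* Pre-option behaviour: pages land on the master core's socket first and
 * spill to the next socket only when it is empty, until the cgroup limit
 * stops us. */
static void alloc_sequential(const int free_pages[SKETCH_SOCKETS], int master,
                             int cgroup_limit, int got[SKETCH_SOCKETS])
{
    int budget = cgroup_limit;
    for (int s = 0; s < SKETCH_SOCKETS; s++)
        got[s] = 0;
    for (int k = 0; k < SKETCH_SOCKETS && budget > 0; k++) {
        int s = (master + k) % SKETCH_SOCKETS;
        int take = free_pages[s] < budget ? free_pages[s] : budget;
        got[s] = take;
        budget -= take;
    }
}

/* Balanced (libnuma) behaviour: ask each socket for exactly what it needs,
 * so the cgroup budget is spent only on requested pages. */
static void alloc_balanced(const int free_pages[SKETCH_SOCKETS],
                           const int want[SKETCH_SOCKETS],
                           int cgroup_limit, int got[SKETCH_SOCKETS])
{
    int budget = cgroup_limit;
    for (int s = 0; s < SKETCH_SOCKETS; s++) {
        int take = want[s];
        if (take > free_pages[s])
            take = free_pages[s];
        if (take > budget)
            take = budget;
        got[s] = take;
        budget -= take;
    }
}
```

With 1024 free pages per socket, a request of 256 pages per socket and a 512-page cgroup budget, the sequential strategy spends the whole budget on socket 0 and socket 1 gets nothing, while the balanced one satisfies both requests.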
From 18.05 onwards, this option works differently (and arguably wrongly). More specifically, it disallows allocations on sockets other than 0, and it also makes it so that EAL does not check which socket the memory *actually* came from. So, not only is allocating memory from socket 1 disabled, but allocating from socket 0 may even get you memory from socket 1!
I'd consider this a bug.
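The 18.05+ behavior described above (option disabled, no libnuma) fits in a tiny model. It assumes, as stated elsewhere in the thread, that the kernel places the page on the caller's socket; struct alloc_result and alloc_without_libnuma() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct alloc_result {
    bool ok;
    int reported_socket; /* socket EAL records for the memory */
    int actual_socket;   /* socket the kernel really used */
};

/* Toy model of dynamic allocation without the option:
 * - requests for sockets other than 0 are refused;
 * - a socket-0 request is served by the kernel's default local policy,
 *   i.e. on the caller's socket, and nobody verifies where it landed. */
static struct alloc_result alloc_without_libnuma(int requested, int caller_socket)
{
    struct alloc_result r = { false, -1, -1 };
    assert(requested >= 0 && caller_socket >= 0);
    if (requested != 0)
        return r;
    r.ok = true;
    r.reported_socket = 0;
    r.actual_socket = caller_socket; /* unchecked placement */
    return r;
}
```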
Post by Burakov, Anatoly
+CC Thomas
The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because it makes it seem like this option disables NUMA support, which is not the case.
Without libnuma, we have no way of "asking" the kernel to allocate a hugepage on a specific socket - instead, any allocation will most likely happen on the socket the allocation request came from. For example, if the user program's lcore is on socket 1, an allocation on socket 0 will actually allocate a page on socket 1.
If we don't check the page's NUMA node affinity (which is what currently happens), we get performance degradation, because we may unintentionally allocate memory on the wrong NUMA node. If we do check for it, then allocating memory on socket 1 from an lcore on socket 0 will almost never succeed, because the kernel will always give us pages on socket 0.
Put simply, there is no sane way to make this option work for the new memory subsystem - IMO it should be dropped, and libnuma should be made a hard dependency on Linux.
I agree that the new memory model cannot work without libnuma,
i.e. it will lead to unpredictable memory allocations without any
respect for the requested socket_ids. I also agree that
CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only sane for the legacy memory model.
It looks like we have no choice other than to drop the option
and make the code unconditional, i.e. have a hard dependency on libnuma.
We could probably compile this code, and have the hard dependency,
only for platforms with 'RTE_MAX_NUMA_NODES > 1'.
Well, as long as legacy mode stays supported, we have to keep the option. The "drop" part was referring to supporting it under the new memory system, not a literal drop from config files.
The option was introduced because we didn't want to introduce the
new hard dependency. Since we'll have it anyway, I'm not sure if
keeping the option for legacy mode makes any sense.
Oh yes, you're right. Drop it is!
Post by Ilya Maximets
As for using RTE_MAX_NUMA_NODES, I don't think it's merited. Distributions cannot deliver different DPDK versions based on the number of sockets on a particular machine - so it would have to be a hard dependency for distributions anyway (does any distribution ship DPDK without libnuma?).
At least ARMv7 builds commonly do not ship the libnuma package.
Do you mean libnuma builds for ARMv7 are not available? Or do you mean the libnuma package is not installed by default?
If it's the latter, then I believe it's not installed by default anywhere, but if you use the distribution version of DPDK, libnuma will be taken care of by the package manager. Presumably, building from source can be taken care of with pkg-config/meson.
Or do you mean ARMv7 does not have libnuma for their arch at all, in any distro?
libnuma builds for ARMv7 are not available in most of the distros. I
https://packages.ubuntu.com/search?suite=bionic&arch=armhf&searchon=names&keywords=libnuma
You may see that Ubuntu 18.04 (bionic) has no libnuma package for
'armhf' and also 'powerpc' platforms.
That's a difficulty. Do these platforms support NUMA? In other words, could we replace this flag with just outright disabling NUMA support?
Many platforms don't support NUMA, so they don't really need libnuma.
 - Cross builds for ARM on x86 are among the build methods preferred by many in the ARM community.
 - Many embedded SoCs lack NUMA support and use a smaller rootfs (e.g. Yocto). It would be a burden to add libnuma there.
OK, point taken.
So, the alternative would be to have the ability to outright disable NUMA support (either with a new option, or by reworking this one - I would prefer a new one, since this one is confusingly named). Meaning: report all cores as socket 0, report all hardware as socket 0, report all memory as socket 0, and never care about NUMA nodes anywhere.
Would that work? E.g. by default, make libnuma a hard dependency on x86 Linux (but allow to disable it), but disable it everywhere else?
I think you could just rename RTE_EAL_NUMA_AWARE_HUGEPAGES to something
like RTE_EAL_NUMA_SUPPORT and keep all the defaults as they are, i.e.:
* globally disabled
* enabled for linux
* disabled for armv7a, dpaa, dpaa2 and stingray.
Meson could handle everything dynamically.
Post by Burakov, Anatoly
Post by Hemant Agrawal
Post by Asaf Sinai
Post by Ilya Maximets
Post by Burakov, Anatoly
Post by Ilya Maximets
For those compiling from source - are there any supported
distributions which don't package libnuma? I don't see much sense
in keeping libnuma optional. This is of course up to the tech
board to decide, but IMO the "without libnuma it's basically
broken" argument is very strong :)
--
Thanks,
Anatoly
Burakov, Anatoly
2018-12-10 10:09:43 UTC
Permalink
Hi all,
Thanks for the detailed explanations!
- Dividing huge pages between NUMAs was, by default, left to Linux's goodwill.
- Forcing Linux to divide huge pages between NUMAs required enabling the configuration option "CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES".
- The enforcement was done via the "libnuma" library.
- The mentioned configuration option is ignored, so by default all huge pages are allocated on NUMA 0.
- If the "libnuma" library exists in the system, then huge pages will be divided between NUMAs without any special configuration.
- The above is relevant to architectures that support NUMA, e.g. X86 (which we use).
Thanks,
Asaf
Hi Asaf,

Before 18.05, the above description is correct.

Since 18.05, it's not _quite_ like that. There are two memory modes in
18.05 - default and legacy. Legacy mode pretty much behaves like
pre-18.05 code.

Default memory mode without the CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES for
all intents and purposes should be considered unsupported for post-18.05
code, and libnuma should be considered to be a hard dependency for
non-legacy, NUMA-aware code. Without this option, EAL will disallow
allocations on sockets other than 0, but on a NUMA-enabled system, you
won't necessarily get memory from socket 0 - it will *say* it is on
socket 0, but it may not *actually* be the case, because without libnuma
we do not check where it was allocated.

The reason for the above behavior is simple: legacy mem mode preallocates
all memory in advance. This gives us an opportunity to figure out page
socket affinity at initialization, and not worry about it afterwards.
Non-legacy mode doesn't have the luxury of preallocating all memory in
advance, instead we allocate memory on the fly - which means that
whenever an allocation is requested, we need memory not just anywhere
(like in legacy init case), but located on a specific socket - we cannot
"sort it out later" like we do with legacy mem. Without libnuma, we
cannot get this functionality.
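A rough sketch of the "sort it out later" step that legacy mode can afford: every page is already mapped, so pages can be grouped by socket after the fact. struct page_info, sort_out_later() and the sample addresses are hypothetical; discovering each page's socket and physical address is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct page_info {
    uint64_t physaddr;
    int socket; /* discovered after the page was mapped */
};

/* Order pages by socket first, then by physical address. */
static int cmp_page(const void *a, const void *b)
{
    const struct page_info *x = a, *y = b;
    if (x->socket != y->socket)
        return x->socket < y->socket ? -1 : 1;
    if (x->physaddr != y->physaddr)
        return x->physaddr < y->physaddr ? -1 : 1;
    return 0;
}

/* Legacy-style: everything is already mapped, so group pages by socket
 * afterwards and count how many each socket got. */
static void sort_out_later(struct page_info *pages, size_t n,
                           size_t per_socket[], int n_sockets)
{
    qsort(pages, n, sizeof(*pages), cmp_page);
    for (int s = 0; s < n_sockets; s++)
        per_socket[s] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(pages[i].socket >= 0 && pages[i].socket < n_sockets);
        per_socket[pages[i].socket]++;
    }
}
```

Non-legacy mode has no equivalent after-the-fact step: each page is needed on a specific socket at the moment it is allocated.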
Ilya Maximets
2018-11-26 12:23:16 UTC
Permalink
Post by Asaf Sinai
Does anyone know what is "CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES" configuration used for?
This config option was introduced to force DPDK to allocate memory
from the NUMA nodes that were requested by 'socket_id'. That is exactly
what you're observing.

Look at the commit 1b72605d2416 ("mem: balanced allocation of hugepages")
for the original issue fixed by this option.

----
Post by Asaf Sinai
Hi Asaf,
I cannot reproduce this behavior. Just tried running testpmd with DPDK
18.08 as well as latest master [1], and DPDK could not successfully
allocate a mempool on socket 1.
I think that this is a bug, because if the option is enabled, you should be able
to allocate memory from the requested NUMA node if it's available.

If the option is disabled, we just request pages from the kernel, and it could
return them from any NUMA node. With the option enabled, we're trying to
force the kernel to allocate from the nodes we need.
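Steering the kernel toward a node can be sketched with the set_mempolicy(2) system call and MPOL_PREFERRED, one of the mechanisms libnuma wraps. This is a sketch under that assumption, not a description of how EAL does it; make_nodemask(), prefer_node() and reset_policy() are hypothetical helpers, and the calls return -1 where NUMA syscalls are unavailable or filtered.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Policy modes from <linux/mempolicy.h>, repeated to avoid the header. */
#ifndef MPOL_DEFAULT
#define MPOL_DEFAULT 0
#endif
#ifndef MPOL_PREFERRED
#define MPOL_PREFERRED 1
#endif

/* One-word node mask with only `node` set (8 bits per byte assumed). */
static unsigned long make_nodemask(int node)
{
    assert(node >= 0 && node < (int)(8 * sizeof(unsigned long)));
    return 1UL << node;
}

/* Ask the kernel to prefer `node` for this thread's future page faults. */
static int prefer_node(int node)
{
#ifdef SYS_set_mempolicy
    unsigned long mask = make_nodemask(node);
    /* maxnode is one more than the number of mask bits: the kernel
     * decrements it before reading the mask. */
    return (int)syscall(SYS_set_mempolicy, (long)MPOL_PREFERRED, &mask,
                        (unsigned long)(8 * sizeof(mask) + 1));
#else
    (void)node;
    return -1;
#endif
}

/* Restore the default (local allocation) policy. */
static int reset_policy(void)
{
#ifdef SYS_set_mempolicy
    return (int)syscall(SYS_set_mempolicy, (long)MPOL_DEFAULT,
                        (unsigned long *)NULL, 0UL);
#else
    return -1;
#endif
}
```

A caller would set the preference, map and touch the hugepages meant for that node, then restore the default policy. MPOL_PREFERRED falls back to other nodes instead of failing when the preferred node runs out of memory, so the resulting page's node still needs to be checked.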

Best regards, Ilya Maximets.
Ilya Maximets
2018-11-26 12:46:11 UTC
Permalink
Post by Ilya Maximets
This config option was introduced to force DPDK to allocate memory
from the NUMA nodes that were requested by 'socket_id'. That is exactly
what you're observing.
I meant '--socket-mem' (not the 'socket_id'), of course.
Post by Ilya Maximets
Look at the commit 1b72605d2416 ("mem: balanced allocation of hugepages")
for the original issue fixed by this option.
----
Post by Asaf Sinai
Hi Asaf,
I cannot reproduce this behavior. Just tried running testpmd with DPDK
18.08 as well as latest master [1], and DPDK could not successfully
allocate a mempool on socket 1.
I think that this is a bug, because if the option is enabled, you should be able
to allocate memory from the requested NUMA node if it's available.
Anyway, we can't guarantee the failure here, because if the option is disabled,
the kernel decides which NUMA node to allocate memory from, and it
could eventually allocate it from the requested one.

And I guess the meaning of this option changed a bit with the
new memory model.
Post by Ilya Maximets
If the option is disabled, we just request pages from the kernel, and it could
return them from any NUMA node. With the option enabled, we're trying to
force the kernel to allocate from the nodes we need.
Best regards, Ilya Maximets.