[dpdk-dev] [PATCH 0/3] new software event timer adapter

Discussion:

Erik Gabriel Carrillo

2018-11-29 23:35:11 UTC

This patch series introduces a new version of the event timer
adapter software PMD [1]. In the original design, timer event producer
lcores in the primary and secondary processes enqueued event timers
into a ring, and a service core in the primary process dequeued them
and processed them further. To improve performance, this version does
away with the ring and lets the lcores in both primary and secondary
processes insert timers directly into the timer skiplist data
structures; the service core directly accesses the lists as well.
To achieve this, however, modifications to the timer library [2] are
required to enable the timer skiplists to be created and accessed in
shared memory. New APIs are introduced in the timer library to enable
selecting from multiple instances of the timer skiplists. Instances of
the event timer adapter, as well as the original APIs of the timer
library, can then each access distinct timer lists.

Future versions of this series will hopefully improve the names
used for the data structures and APIs in the timer library.

This series depends on the following patch:
https://patches.dpdk.org/patch/48417/

[1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html
[2] https://doc.dpdk.org/guides/prog_guide/timer_lib.html

Erik Gabriel Carrillo (3):
timer: allow timer management in shared memory
timer: add function to stop all timers in a list
eventdev: add new software event timer adapter

lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
lib/librte_timer/Makefile | 1 +
lib/librte_timer/rte_timer.c | 579 ++++++++++++++++++----
lib/librte_timer/rte_timer.h | 200 +++++++-
lib/librte_timer/rte_timer_version.map | 22 +-
5 files changed, 972 insertions(+), 517 deletions(-)

--
2.6.4

Erik Gabriel Carrillo

2018-11-29 23:35:12 UTC

Currently, the timer library uses a per-process table of structures to
manage skiplists of timers, presumably because timers contain arbitrary
function pointers whose values may not resolve properly in other
processes.

However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists. This would let timers be
used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory. The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1]. New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
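
The allocation scheme described above can be sketched in standalone C as
follows. This is a simplified model, not the patch code: MAX_DATA_ELS,
data_arr, subsystem_init(), data_alloc() and data_dealloc() are
hypothetical stand-ins for RTE_MAX_DATA_ELS, rte_timer_data_arr,
rte_timer_subsystem_init(), rte_timer_data_alloc() and
rte_timer_data_dealloc(). The array lives in ordinary static storage
rather than in a shared memzone, and no locking is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_ELS 4          /* hypothetical; the patch uses 64 */
#define FL_ALLOCATED (1 << 0)

/* One timer data instance; the per-lcore lists are omitted here. */
struct timer_data {
	uint8_t internal_flags;
};

static struct timer_data data_arr[MAX_DATA_ELS];

/* Reserve entry 0: it backs the original APIs that take no ID. */
void subsystem_init(void)
{
	data_arr[0].internal_flags |= FL_ALLOCATED;
}

/* Claim the first free entry and report its index through id_ptr. */
int data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MAX_DATA_ELS; i++) {
		if (!(data_arr[i].internal_flags & FL_ALLOCATED)) {
			data_arr[i].internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Release an entry; reject IDs that are out of range or not in use. */
int data_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(data_arr[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;

	data_arr[id].internal_flags &= ~FL_ALLOCATED;
	return 0;
}
```

In this model, the first data_alloc() call after subsystem_init() hands
out index 1, because index 0 is already taken by the default instance.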

New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.
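
A minimal sketch of how the ID-taking variants relate to the original
entry points, under several assumptions: a plain sorted linked list
stands in for the skiplist, there is no locking or timer state machine,
a timer must not already be pending when it is reset, and every name
here (timer_alt_reset(), timer_alt_stop(), timer_alt_first(), the
struct layouts) is hypothetical rather than taken from the library.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_ELS 4
#define MAX_LCORE    4
#define FL_ALLOCATED (1 << 0)

struct timer {
	uint64_t expire;
	unsigned int owner;     /* lcore whose list holds the timer */
	struct timer *next;
};

struct timer_data {
	struct timer *pending[MAX_LCORE];   /* one sorted list per lcore */
	uint8_t internal_flags;
};

/* Entries 0 (default) and 1 are pre-allocated for this sketch. */
static struct timer_data data_arr[MAX_DATA_ELS] = {
	[0] = { .internal_flags = FL_ALLOCATED },
	[1] = { .internal_flags = FL_ALLOCATED },
};

static struct timer_data *data_get(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(data_arr[id].internal_flags & FL_ALLOCATED))
		return NULL;
	return &data_arr[id];
}

/* Insert tim into the list of the given instance and lcore. */
int timer_alt_reset(uint32_t id, struct timer *tim, uint64_t expire,
		    unsigned int lcore)
{
	struct timer_data *data = data_get(id);
	struct timer **pp;

	if (data == NULL)
		return -EINVAL;
	assert(lcore < MAX_LCORE);

	tim->expire = expire;
	tim->owner = lcore;
	for (pp = &data->pending[lcore];
	     *pp != NULL && (*pp)->expire <= expire; pp = &(*pp)->next)
		;
	tim->next = *pp;
	*pp = tim;
	return 0;
}

/* Unlink tim from the given instance; a no-op if it is not there. */
int timer_alt_stop(uint32_t id, struct timer *tim)
{
	struct timer_data *data = data_get(id);
	struct timer **pp;

	if (data == NULL)
		return -EINVAL;
	assert(tim->owner < MAX_LCORE);

	for (pp = &data->pending[tim->owner]; *pp != NULL;
	     pp = &(*pp)->next) {
		if (*pp == tim) {
			*pp = tim->next;
			tim->next = NULL;
			break;
		}
	}
	return 0;
}

/* The original-style calls simply forward to instance 0. */
int timer_reset(struct timer *tim, uint64_t expire, unsigned int lcore)
{
	return timer_alt_reset(0, tim, expire, lcore);
}

int timer_stop(struct timer *tim)
{
	return timer_alt_stop(0, tim);
}

/* Peek at the earliest pending timer of an instance/lcore pair. */
const struct timer *timer_alt_first(uint32_t id, unsigned int lcore)
{
	struct timer_data *data = data_get(id);

	return (data == NULL || lcore >= MAX_LCORE) ?
		NULL : data->pending[lcore];
}
```

Timers started through different instance IDs end up in disjoint lists
even when they target the same lcore, which is the property the event
timer adapter relies on.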

Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.
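
The multi-list processing can be illustrated with a small standalone
model. It assumes plain sorted linked lists in place of skiplists, no
locking, and one-shot timers only; timer_manage_lists(),
split_expired() and record_cb() are hypothetical names, and unlike the
real function this sketch returns the number of timers handled. It
first detaches the expired prefix of each polled list, then repeatedly
hands the timer with the smallest expiry across all detached run lists
to the callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LCORE 4

struct timer {
	uint64_t expire;
	struct timer *next;
};

typedef void (*manage_cb_t)(struct timer *tim);

/* Detach and return the expired prefix of a sorted list. */
static struct timer *split_expired(struct timer **head, uint64_t now)
{
	struct timer *run = *head;
	struct timer **pp = head;

	while (*pp != NULL && (*pp)->expire <= now)
		pp = &(*pp)->next;
	if (pp == head)
		return NULL;        /* nothing has expired */

	*head = *pp;                /* list now starts at first live timer */
	*pp = NULL;                 /* terminate the run list */
	return run;
}

/* Process the lists of several lcores in global expiry order. */
int timer_manage_lists(struct timer *pending[],
		       const unsigned int *poll_lcores, int nb_poll_lcores,
		       uint64_t now, manage_cb_t f)
{
	struct timer *run[MAX_LCORE];
	int nb_run = 0, handled = 0, i;

	if (nb_poll_lcores < 0 || nb_poll_lcores > MAX_LCORE)
		return -EINVAL;

	for (i = 0; i < nb_poll_lcores; i++) {
		struct timer *r;

		assert(poll_lcores[i] < MAX_LCORE);
		r = split_expired(&pending[poll_lcores[i]], now);
		if (r != NULL)
			run[nb_run++] = r;
	}

	for (;;) {
		struct timer *tim;
		int min_idx = -1;

		/* Find the run list whose head expires earliest. */
		for (i = 0; i < nb_run; i++)
			if (run[i] != NULL && (min_idx < 0 ||
			    run[i]->expire < run[min_idx]->expire))
				min_idx = i;
		if (min_idx < 0)
			break;

		tim = run[min_idx];
		run[min_idx] = tim->next;
		tim->next = NULL;
		f(tim);
		handled++;
	}

	return handled;
}

/* Example callback: record the order in which timers fire. */
static uint64_t fired[16];
static int nb_fired;

void record_cb(struct timer *tim)
{
	if (nb_fired < 16)
		fired[nb_fired++] = tim->expire;
}
```

Picking the minimum across run lists on every iteration keeps the
callback order consistent with expiry time even though the timers were
queued on different lcores.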

[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Signed-off-by: Erik Gabriel Carrillo <***@intel.com>
---
lib/librte_timer/Makefile | 1 +
lib/librte_timer/rte_timer.c | 526 +++++++++++++++++++++++++++------
lib/librte_timer/rte_timer.h | 168 ++++++++++-
lib/librte_timer/rte_timer_version.map | 21 +-
4 files changed, 614 insertions(+), 102 deletions(-)

diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
index 4ebd528..8ec63f4 100644
--- a/lib/librte_timer/Makefile
+++ b/lib/librte_timer/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
# library name
LIB = librte_timer.a

+CFLAGS += -DALLOW_EXPERIMENTAL_API
CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
LDLIBS += -lrte_eal

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 30c7b0a..a76be8b 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -5,6 +5,7 @@
#include <string.h>
#include <stdio.h>
#include <stdint.h>
+#include <stdbool.h>
#include <inttypes.h>
#include <assert.h>
#include <sys/queue.h>
@@ -21,23 +22,27 @@
#include <rte_spinlock.h>
#include <rte_random.h>
#include <rte_pause.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>

#include "rte_timer.h"

-LIST_HEAD(rte_timer_list, rte_timer);
-
+/**
+ * Per-lcore info for timers.
+ */
struct priv_timer {
- struct rte_timer pending_head; /**< dummy timer instance to head up list */
+ struct rte_timer pending_head; /**< dummy timer to head up list */
rte_spinlock_t list_lock; /**< lock to protect list access */

/** per-core variable that true if a timer was updated on this
- * core since last reset of the variable */
+ * core since last reset of the variable
+ */
int updated;

/** track the current depth of the skiplist */
- unsigned curr_skiplist_depth;
+ unsigned int curr_skiplist_depth;

- unsigned prev_lcore; /**< used for lcore round robin */
+ unsigned int prev_lcore; /**< used for lcore round robin */

/** running timer on this lcore now */
struct rte_timer *running_tim;
@@ -48,33 +53,140 @@ struct priv_timer {
#endif
} __rte_cache_aligned;

-/** per-lcore private info for timers */
-static struct priv_timer priv_timer[RTE_MAX_LCORE];
+#define FL_ALLOCATED (1 << 0)
+struct rte_timer_data {
+ struct priv_timer priv_timer[RTE_MAX_LCORE];
+ uint8_t internal_flags;
+};
+
+#define RTE_MAX_DATA_ELS 64
+static struct rte_timer_data *rte_timer_data_arr;
+static uint32_t default_data_id; // id set to zero automatically
+static uint32_t rte_timer_subsystem_initialized;

/* when debug is enabled, store some statistics */
#ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do { \
+#define __TIMER_STAT_ADD(data, name, n) do { \
unsigned __lcore_id = rte_lcore_id(); \
if (__lcore_id < RTE_MAX_LCORE) \
- priv_timer[__lcore_id].stats.name += (n); \
+ data->priv_timer[__lcore_id].stats.name += (n); \
} while(0)
#else
-#define __TIMER_STAT_ADD(name, n) do {} while(0)
+#define __TIMER_STAT_ADD(data, name, n) do {} while (0)
#endif

-/* Init the timer library. */
-void
+static inline int
+timer_data_valid(uint32_t id)
+{
+ return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}
+
+/* validate ID and retrieve timer data pointer, or return error value */
+#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do { \
+ if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id)) \
+ return retval; \
+ timer_data = &rte_timer_data_arr[id]; \
+} while (0)
+
+int __rte_experimental
+rte_timer_data_alloc(uint32_t *id_ptr)
+{
+ int i;
+ struct rte_timer_data *data;
+
+ if (!rte_timer_subsystem_initialized)
+ return -ENOMEM;
+
+ for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+ data = &rte_timer_data_arr[i];
+ if (!(data->internal_flags & FL_ALLOCATED)) {
+ data->internal_flags |= FL_ALLOCATED;
+
+ if (id_ptr)
+ *id_ptr = i;
+
+ return 0;
+ }
+ }
+
+ return -ENOSPC;
+}
+
+int __rte_experimental
+rte_timer_data_dealloc(uint32_t id)
+{
+ struct rte_timer_data *timer_data;
+ TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
+
+ timer_data->internal_flags &= ~(FL_ALLOCATED);
+
+ return 0;
+}
+
+/* Init the timer library. Allocate an array of timer data structs in shared
+ * memory, and allocate the zeroth entry for use with original timer
+ * APIs. Since the intersection of the sets of lcore ids in primary and
+ * secondary processes should be empty, the zeroth entry can be shared by
+ * multiple processes.
+ */
+int
rte_timer_subsystem_init(void)
{
- unsigned lcore_id;
+ const struct rte_memzone *mz;
+ struct rte_timer_data *data;
+ int i, lcore_id;
+ static const char *mz_name = "rte_timer_mz";

- /* since priv_timer is static, it's zeroed by default, so only init some
- * fields.
- */
- for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id ++) {
- rte_spinlock_init(&priv_timer[lcore_id].list_lock);
- priv_timer[lcore_id].prev_lcore = lcore_id;
+ if (rte_timer_subsystem_initialized)
+ return -EALREADY;
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+ mz = rte_memzone_lookup(mz_name);
+ if (mz == NULL)
+ return -EEXIST;
+
+ rte_timer_data_arr = mz->addr;
+
+ rte_timer_data_arr[default_data_id].internal_flags |=
+ FL_ALLOCATED;
+
+ rte_timer_subsystem_initialized = 1;
+
+ return 0;
+ }
+
+ mz = rte_memzone_reserve_aligned(mz_name,
+ RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
+ SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
+ if (mz == NULL)
+ return -ENOMEM;
+
+ rte_timer_data_arr = mz->addr;
+
+ for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+ data = &rte_timer_data_arr[i];
+
+ for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+ rte_spinlock_init(
+ &data->priv_timer[lcore_id].list_lock);
+ data->priv_timer[lcore_id].prev_lcore = lcore_id;
+ }
}
+
+ rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
+
+ rte_timer_subsystem_initialized = 1;
+
+ return 0;
+}
+
+void __rte_experimental
+rte_timer_subsystem_finalize(void)
+{
+ if (rte_timer_data_arr)
+ rte_free(rte_timer_data_arr);
+
+ rte_timer_subsystem_initialized = 0;
}

/* Initialize the timer handle tim for use */
@@ -95,7 +207,8 @@ rte_timer_init(struct rte_timer *tim)
*/
static int
timer_set_config_state(struct rte_timer *tim,
- union rte_timer_status *ret_prev_status)
+ union rte_timer_status *ret_prev_status,
+ struct rte_timer_data *data)
{
union rte_timer_status prev_status, status;
int success = 0;
@@ -113,7 +226,7 @@ timer_set_config_state(struct rte_timer *tim,
*/
if (prev_status.state == RTE_TIMER_RUNNING &&
(prev_status.owner != (uint16_t)lcore_id ||
- tim != priv_timer[lcore_id].running_tim))
+ tim != data->priv_timer[lcore_id].running_tim))
return -1;

/* timer is being configured on another core */
@@ -207,13 +320,13 @@ timer_get_skiplist_level(unsigned curr_depth)
*/
static void
timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
- struct rte_timer **prev)
+ struct rte_timer **prev, struct rte_timer_data *data)
{
- unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
- prev[lvl] = &priv_timer[tim_lcore].pending_head;
- while(lvl != 0) {
+ unsigned int lvl = data->priv_timer[tim_lcore].curr_skiplist_depth;
+ prev[lvl] = &data->priv_timer[tim_lcore].pending_head;
+ while (lvl != 0) {
lvl--;
- prev[lvl] = prev[lvl+1];
+ prev[lvl] = prev[lvl + 1];
while (prev[lvl]->sl_next[lvl] &&
prev[lvl]->sl_next[lvl]->expire <= time_val)
prev[lvl] = prev[lvl]->sl_next[lvl];
@@ -226,14 +339,16 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
*/
static void
timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
- struct rte_timer **prev)
+ struct rte_timer **prev,
+ struct rte_timer_data *data)
{
int i;
/* to get a specific entry in the list, look for just lower than the time
* values, and then increment on each level individually if necessary
*/
- timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
- for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
+ timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, data);
+ for (i = data->priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0;
+ i--) {
while (prev[i]->sl_next[i] != NULL &&
prev[i]->sl_next[i] != tim &&
prev[i]->sl_next[i]->expire <= tim->expire)
@@ -247,20 +362,21 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
* timer must not be in a list
*/
static void
-timer_add(struct rte_timer *tim, unsigned int tim_lcore)
+timer_add(struct rte_timer *tim, unsigned int tim_lcore,
+ struct rte_timer_data *data)
{
unsigned lvl;
struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];

/* find where exactly this element goes in the list of elements
* for each depth. */
- timer_get_prev_entries(tim->expire, tim_lcore, prev);
+ timer_get_prev_entries(tim->expire, tim_lcore, prev, data);

/* now assign it a new level and add at that level */
const unsigned tim_level = timer_get_skiplist_level(
- priv_timer[tim_lcore].curr_skiplist_depth);
- if (tim_level == priv_timer[tim_lcore].curr_skiplist_depth)
- priv_timer[tim_lcore].curr_skiplist_depth++;
+ data->priv_timer[tim_lcore].curr_skiplist_depth);
+ if (tim_level == data->priv_timer[tim_lcore].curr_skiplist_depth)
+ data->priv_timer[tim_lcore].curr_skiplist_depth++;

lvl = tim_level;
while (lvl > 0) {
@@ -272,9 +388,10 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
prev[0]->sl_next[0] = tim;

/* save the lowest list entry into the expire field of the dummy hdr
- * NOTE: this is not atomic on 32-bit*/
- priv_timer[tim_lcore].pending_head.expire = priv_timer[tim_lcore].\
- pending_head.sl_next[0]->expire;
+ * NOTE: this is not atomic on 32-bit
+ */
+ data->priv_timer[tim_lcore].pending_head.expire =
+ data->priv_timer[tim_lcore].pending_head.sl_next[0]->expire;
}

/*
@@ -284,7 +401,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
*/
static void
timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
- int local_is_locked)
+ int local_is_locked, struct rte_timer_data *data)
{
unsigned lcore_id = rte_lcore_id();
unsigned prev_owner = prev_status.owner;
@@ -295,30 +412,33 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
* list; if it is on local core, we need to lock if we are not
* called from rte_timer_manage() */
if (prev_owner != lcore_id || !local_is_locked)
- rte_spinlock_lock(&priv_timer[prev_owner].list_lock);
+ rte_spinlock_lock(&data->priv_timer[prev_owner].list_lock);

/* save the lowest list entry into the expire field of the dummy hdr.
* NOTE: this is not atomic on 32-bit */
- if (tim == priv_timer[prev_owner].pending_head.sl_next[0])
- priv_timer[prev_owner].pending_head.expire =
+ if (tim == data->priv_timer[prev_owner].pending_head.sl_next[0])
+ data->priv_timer[prev_owner].pending_head.expire =
((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);

/* adjust pointers from previous entries to point past this */
- timer_get_prev_entries_for_node(tim, prev_owner, prev);
- for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
+ timer_get_prev_entries_for_node(tim, prev_owner, prev, data);
+ i = data->priv_timer[prev_owner].curr_skiplist_depth - 1;
+ for ( ; i >= 0; i--) {
if (prev[i]->sl_next[i] == tim)
prev[i]->sl_next[i] = tim->sl_next[i];
}

/* in case we deleted last entry at a level, adjust down max level */
- for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--)
- if (priv_timer[prev_owner].pending_head.sl_next[i] == NULL)
- priv_timer[prev_owner].curr_skiplist_depth --;
+ for (i = data->priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0;
+ i--)
+ if (data->priv_timer[prev_owner].pending_head.sl_next[i] ==
+ NULL)
+ data->priv_timer[prev_owner].curr_skiplist_depth--;
else
break;

if (prev_owner != lcore_id || !local_is_locked)
- rte_spinlock_unlock(&priv_timer[prev_owner].list_lock);
+ rte_spinlock_unlock(&data->priv_timer[prev_owner].list_lock);
}

/* Reset and start the timer associated with the timer handle (private func) */
@@ -326,7 +446,8 @@ static int
__rte_timer_reset(struct rte_timer *tim, uint64_t expire,
uint64_t period, unsigned tim_lcore,
rte_timer_cb_t fct, void *arg,
- int local_is_locked)
+ int local_is_locked,
+ struct rte_timer_data *data)
{
union rte_timer_status prev_status, status;
int ret;
@@ -337,9 +458,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
if (lcore_id < RTE_MAX_LCORE) {
/* EAL thread with valid lcore_id */
tim_lcore = rte_get_next_lcore(
- priv_timer[lcore_id].prev_lcore,
+ data->priv_timer[lcore_id].prev_lcore,
0, 1);
- priv_timer[lcore_id].prev_lcore = tim_lcore;
+ data->priv_timer[lcore_id].prev_lcore = tim_lcore;
} else
/* non-EAL thread do not run rte_timer_manage(),
* so schedule the timer on the first enabled lcore. */
@@ -348,20 +469,20 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,

/* wait that the timer is in correct status before update,
* and mark it as being configured */
- ret = timer_set_config_state(tim, &prev_status);
+ ret = timer_set_config_state(tim, &prev_status, data);
if (ret < 0)
return -1;

- __TIMER_STAT_ADD(reset, 1);
+ __TIMER_STAT_ADD(data, reset, 1);
if (prev_status.state == RTE_TIMER_RUNNING &&
lcore_id < RTE_MAX_LCORE) {
- priv_timer[lcore_id].updated = 1;
+ data->priv_timer[lcore_id].updated = 1;
}

/* remove it from list */
if (prev_status.state == RTE_TIMER_PENDING) {
- timer_del(tim, prev_status, local_is_locked);
- __TIMER_STAT_ADD(pending, -1);
+ timer_del(tim, prev_status, local_is_locked, data);
+ __TIMER_STAT_ADD(data, pending, -1);
}

tim->period = period;
@@ -374,10 +495,10 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
* we are not called from rte_timer_manage()
*/
if (tim_lcore != lcore_id || !local_is_locked)
- rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
+ rte_spinlock_lock(&data->priv_timer[tim_lcore].list_lock);

- __TIMER_STAT_ADD(pending, 1);
- timer_add(tim, tim_lcore);
+ __TIMER_STAT_ADD(data, pending, 1);
+ timer_add(tim, tim_lcore, data);

/* update state: as we are in CONFIG state, only us can modify
* the state so we don't need to use cmpset() here */
@@ -387,7 +508,7 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
tim->status.u32 = status.u32;

if (tim_lcore != lcore_id || !local_is_locked)
- rte_spinlock_unlock(&priv_timer[tim_lcore].list_lock);
+ rte_spinlock_unlock(&data->priv_timer[tim_lcore].list_lock);

return 0;
}
@@ -395,11 +516,23 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
/* Reset and start the timer associated with the timer handle tim */
int
rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
- enum rte_timer_type type, unsigned tim_lcore,
- rte_timer_cb_t fct, void *arg)
+ enum rte_timer_type type, unsigned int tim_lcore,
+ rte_timer_cb_t fct, void *arg)
+{
+ return rte_timer_alt_reset(default_data_id, tim, ticks, type,
+ tim_lcore, fct, arg);
+}
+
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+ uint64_t ticks, enum rte_timer_type type,
+ unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
{
uint64_t cur_time = rte_get_timer_cycles();
uint64_t period;
+ struct rte_timer_data *timer_data;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);

if (unlikely((tim_lcore != (unsigned)LCORE_ID_ANY) &&
!(rte_lcore_is_enabled(tim_lcore) ||
@@ -412,7 +545,7 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
period = 0;

return __rte_timer_reset(tim, cur_time + ticks, period, tim_lcore,
- fct, arg, 0);
+ fct, arg, 0, timer_data);
}

/* loop until rte_timer_reset() succeed */
@@ -430,26 +563,35 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
int
rte_timer_stop(struct rte_timer *tim)
{
+ return rte_timer_alt_stop(default_data_id, tim);
+}
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
union rte_timer_status prev_status, status;
unsigned lcore_id = rte_lcore_id();
int ret;
+ struct rte_timer_data *timer_data;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);

/* wait that the timer is in correct status before update,
* and mark it as being configured */
- ret = timer_set_config_state(tim, &prev_status);
+ ret = timer_set_config_state(tim, &prev_status, timer_data);
if (ret < 0)
return -1;

- __TIMER_STAT_ADD(stop, 1);
+ __TIMER_STAT_ADD(timer_data, stop, 1);
if (prev_status.state == RTE_TIMER_RUNNING &&
lcore_id < RTE_MAX_LCORE) {
- priv_timer[lcore_id].updated = 1;
+ timer_data->priv_timer[lcore_id].updated = 1;
}

/* remove it from list */
if (prev_status.state == RTE_TIMER_PENDING) {
- timer_del(tim, prev_status, 0);
- __TIMER_STAT_ADD(pending, -1);
+ timer_del(tim, prev_status, 0, timer_data);
+ __TIMER_STAT_ADD(timer_data, pending, -1);
}

/* mark timer as stopped */
@@ -486,13 +628,14 @@ void rte_timer_manage(void)
struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
uint64_t cur_time;
int i, ret;
+ struct rte_timer_data *data = &rte_timer_data_arr[default_data_id];

/* timer manager only runs on EAL thread with valid lcore_id */
assert(lcore_id < RTE_MAX_LCORE);

- __TIMER_STAT_ADD(manage, 1);
+ __TIMER_STAT_ADD(data, manage, 1);
/* optimize for the case where per-cpu list is empty */
- if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
+ if (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
return;
cur_time = rte_get_timer_cycles();

@@ -500,32 +643,34 @@ void rte_timer_manage(void)
/* on 64-bit the value cached in the pending_head.expired will be
* updated atomically, so we can consult that for a quick check here
* outside the lock */
- if (likely(priv_timer[lcore_id].pending_head.expire > cur_time))
+ if (likely(data->priv_timer[lcore_id].pending_head.expire > cur_time))
return;
#endif

/* browse ordered list, add expired timers in 'expired' list */
- rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
+ rte_spinlock_lock(&data->priv_timer[lcore_id].list_lock);

/* if nothing to do just unlock and return */
- if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL ||
- priv_timer[lcore_id].pending_head.sl_next[0]->expire > cur_time) {
- rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
+ if (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL ||
+ data->priv_timer[lcore_id].pending_head.sl_next[0]->expire >
+ cur_time) {
+ rte_spinlock_unlock(&data->priv_timer[lcore_id].list_lock);
return;
}

/* save start of list of expired timers */
- tim = priv_timer[lcore_id].pending_head.sl_next[0];
+ tim = data->priv_timer[lcore_id].pending_head.sl_next[0];

/* break the existing list at current time point */
- timer_get_prev_entries(cur_time, lcore_id, prev);
- for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
- if (prev[i] == &priv_timer[lcore_id].pending_head)
+ timer_get_prev_entries(cur_time, lcore_id, prev, data);
+ for (i = data->priv_timer[lcore_id].curr_skiplist_depth - 1; i >= 0;
+ i--) {
+ if (prev[i] == &data->priv_timer[lcore_id].pending_head)
continue;
- priv_timer[lcore_id].pending_head.sl_next[i] =
+ data->priv_timer[lcore_id].pending_head.sl_next[i] =
prev[i]->sl_next[i];
if (prev[i]->sl_next[i] == NULL)
- priv_timer[lcore_id].curr_skiplist_depth--;
+ data->priv_timer[lcore_id].curr_skiplist_depth--;
prev[i] ->sl_next[i] = NULL;
}

@@ -548,25 +693,25 @@ void rte_timer_manage(void)
}

/* update the next to expire timer value */
- priv_timer[lcore_id].pending_head.expire =
- (priv_timer[lcore_id].pending_head.sl_next[0] == NULL) ? 0 :
- priv_timer[lcore_id].pending_head.sl_next[0]->expire;
+ data->priv_timer[lcore_id].pending_head.expire =
+ (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL) ? 0 :
+ data->priv_timer[lcore_id].pending_head.sl_next[0]->expire;

- rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
+ rte_spinlock_unlock(&data->priv_timer[lcore_id].list_lock);

/* now scan expired list and call callbacks */
for (tim = run_first_tim; tim != NULL; tim = next_tim) {
next_tim = tim->sl_next[0];
- priv_timer[lcore_id].updated = 0;
- priv_timer[lcore_id].running_tim = tim;
+ data->priv_timer[lcore_id].updated = 0;
+ data->priv_timer[lcore_id].running_tim = tim;

/* execute callback function with list unlocked */
tim->f(tim, tim->arg);

- __TIMER_STAT_ADD(pending, -1);
+ __TIMER_STAT_ADD(data, pending, -1);
/* the timer was stopped or reloaded by the callback
* function, we have nothing to do here */
- if (priv_timer[lcore_id].updated == 1)
+ if (data->priv_timer[lcore_id].updated == 1)
continue;

if (tim->period == 0) {
@@ -578,33 +723,217 @@ void rte_timer_manage(void)
}
else {
/* keep it in list and mark timer as pending */
- rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
+ rte_spinlock_lock(
+ &data->priv_timer[lcore_id].list_lock);
status.state = RTE_TIMER_PENDING;
- __TIMER_STAT_ADD(pending, 1);
+ __TIMER_STAT_ADD(data, pending, 1);
status.owner = (int16_t)lcore_id;
rte_wmb();
tim->status.u32 = status.u32;
__rte_timer_reset(tim, tim->expire + tim->period,
- tim->period, lcore_id, tim->f, tim->arg, 1);
- rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
+ tim->period, lcore_id, tim->f, tim->arg, 1,
+ data);
+ rte_spinlock_unlock(
+ &data->priv_timer[lcore_id].list_lock);
+ }
+ }
+ data->priv_timer[lcore_id].running_tim = NULL;
+}
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+ unsigned int *poll_lcores,
+ int nb_poll_lcores,
+ rte_timer_alt_manage_cb_t f)
+{
+ union rte_timer_status status;
+ struct rte_timer *tim, *next_tim, **pprev;
+ struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+ unsigned int this_lcore = rte_lcore_id();
+ struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+ uint64_t cur_time;
+ int i, j, ret;
+ int nb_runlists = 0;
+ struct priv_timer *priv_timer;
+ uint32_t poll_lcore;
+ struct rte_timer_data *data;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+ /* timer manager only runs on EAL thread with valid lcore_id */
+ assert(this_lcore < RTE_MAX_LCORE);
+
+ __TIMER_STAT_ADD(data, manage, 1);
+
+ if (poll_lcores == NULL) {
+ poll_lcores = (unsigned int []){rte_lcore_id()};
+ nb_poll_lcores = 1;
+ }
+
+ for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
+ poll_lcore = poll_lcores[++i]) {
+ priv_timer = &data->priv_timer[poll_lcore];
+
+ /* optimize for the case where per-cpu list is empty */
+ if (priv_timer->pending_head.sl_next[0] == NULL)
+ continue;
+ cur_time = rte_get_timer_cycles();
+
+#ifdef RTE_ARCH_64
+ /* on 64-bit the value cached in the pending_head.expired will
+ * be updated atomically, so we can consult that for a quick
+ * check here outside the lock
+ */
+ if (likely(priv_timer->pending_head.expire > cur_time))
+ continue;
+#endif
+
+ /* browse ordered list, add expired timers in 'expired' list */
+ rte_spinlock_lock(&priv_timer->list_lock);
+
+ /* if nothing to do just unlock and return */
+ if (priv_timer->pending_head.sl_next[0] == NULL ||
+ priv_timer->pending_head.sl_next[0]->expire > cur_time) {
+ rte_spinlock_unlock(&priv_timer->list_lock);
+ continue;
+ }
+
+ /* save start of list of expired timers */
+ tim = priv_timer->pending_head.sl_next[0];
+
+ /* break the existing list at current time point */
+ timer_get_prev_entries(cur_time, poll_lcore, prev, data);
+ for (j = priv_timer->curr_skiplist_depth - 1; j >= 0; j--) {
+ if (prev[j] == &priv_timer->pending_head)
+ continue;
+
+ priv_timer->pending_head.sl_next[j] =
+ prev[j]->sl_next[j];
+
+ if (prev[j]->sl_next[j] == NULL)
+ priv_timer->curr_skiplist_depth--;
+
+ prev[j]->sl_next[j] = NULL;
+ }
+
+ /* transition run-list from PENDING to RUNNING */
+ run_first_tims[nb_runlists] = tim;
+ pprev = &run_first_tims[nb_runlists];
+ nb_runlists++;
+
+ for ( ; tim != NULL; tim = next_tim) {
+ next_tim = tim->sl_next[0];
+
+ ret = timer_set_running_state(tim);
+ if (likely(ret == 0)) {
+ pprev = &tim->sl_next[0];
+ } else {
+ /* another core is trying to re-config this one,
+ * remove it from local expired list
+ */
+ *pprev = next_tim;
+ }
+ }
+
+ /* update the next to expire timer value */
+ priv_timer->pending_head.expire =
+ (priv_timer->pending_head.sl_next[0] == NULL) ? 0 :
+ priv_timer->pending_head.sl_next[0]->expire;
+
+ rte_spinlock_unlock(&priv_timer->list_lock);
+ }
+
+ /* Now process the run lists */
+ while (1) {
+ bool done = true;
+ uint64_t min_expire = UINT64_MAX;
+ int min_idx = 0;
+
+ /* Find the next oldest timer to process */
+ for (i = 0; i < nb_runlists; i++) {
+ tim = run_first_tims[i];
+
+ if (tim != NULL && tim->expire < min_expire) {
+ min_expire = tim->expire;
+ min_idx = i;
+ done = false;
+ }
+ }
+
+ if (done)
+ break;
+
+ tim = run_first_tims[min_idx];
+
+ /* Move down the runlist from which we picked a timer to
+ * execute
+ */
+ run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
+
+ priv_timer->updated = 0;
+ priv_timer->running_tim = tim;
+
+ /* Call the provided callback function */
+ f(tim);
+
+ __TIMER_STAT_ADD(data, pending, -1);
+
+ /* the timer was stopped or reloaded by the callback
+ * function, we have nothing to do here
+ */
+ if (priv_timer->updated == 1)
+ continue;
+
+ if (tim->period == 0) {
+ /* remove from done list and mark timer as stopped */
+ status.state = RTE_TIMER_STOP;
+ status.owner = RTE_TIMER_NO_OWNER;
+ rte_wmb();
+ tim->status.u32 = status.u32;
+ } else {
+ /* keep it in list and mark timer as pending */
+ rte_spinlock_lock(
+ &data->priv_timer[this_lcore].list_lock);
+ status.state = RTE_TIMER_PENDING;
+ __TIMER_STAT_ADD(data, pending, 1);
+ status.owner = (int16_t)this_lcore;
+ rte_wmb();
+ tim->status.u32 = status.u32;
+ __rte_timer_reset(tim, tim->expire + tim->period,
+ tim->period, this_lcore, tim->f, tim->arg, 1,
+ data);
+ rte_spinlock_unlock(
+ &data->priv_timer[this_lcore].list_lock);
}
+
+ priv_timer->running_tim = NULL;
}
- priv_timer[lcore_id].running_tim = NULL;
+
+ return 0;
}

/* dump statistics about timers */
void rte_timer_dump_stats(FILE *f)
{
+ rte_timer_alt_dump_stats(default_data_id, f);
+}
+
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
+{
#ifdef RTE_LIBRTE_TIMER_DEBUG
struct rte_timer_debug_stats sum;
unsigned lcore_id;
+ struct rte_timer_data *data;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);

memset(&sum, 0, sizeof(sum));
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
- sum.reset += priv_timer[lcore_id].stats.reset;
- sum.stop += priv_timer[lcore_id].stats.stop;
- sum.manage += priv_timer[lcore_id].stats.manage;
- sum.pending += priv_timer[lcore_id].stats.pending;
+ sum.reset += data->priv_timer[lcore_id].stats.reset;
+ sum.stop += data->priv_timer[lcore_id].stats.stop;
+ sum.manage += data->priv_timer[lcore_id].stats.manage;
+ sum.pending += data->priv_timer[lcore_id].stats.pending;
}
fprintf(f, "Timer statistics:\n");
fprintf(f, " reset = %"PRIu64"\n", sum.reset);
@@ -614,4 +943,5 @@ void rte_timer_dump_stats(FILE *f)
#else
fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
#endif
+ return 0;
}
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9b95cd2..9daa334 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -39,6 +39,7 @@
#include <stddef.h>
#include <rte_common.h>
#include <rte_config.h>
+#include <rte_spinlock.h>

#ifdef __cplusplus
extern "C" {
@@ -132,12 +133,52 @@ struct rte_timer
#endif

/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Allocate a timer data instance in shared memory to track a set of pending
+ * timer lists.
+ *
+ * @param id_ptr
+ * Pointer to variable into which to write the identifier of the allocated
+ * timer data instance.
+ *
+ * @return
+ * 0: Success
+ * -ENOSPC: maximum number of timer data instances already allocated
+ */
+int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Deallocate a timer data instance.
+ *
+ * @param id
+ * Identifier of the timer data instance to deallocate.
+ *
+ * @return
+ * 0: Success
+ * -EINVAL: invalid timer data instance identifier
+ */
+int __rte_experimental rte_timer_data_dealloc(uint32_t id);
+
+/**
* Initialize the timer library.
*
* Initializes internal variables (list, locks and so on) for the RTE
* timer library.
*/
-void rte_timer_subsystem_init(void);
+int rte_timer_subsystem_init(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Free timer subsystem resources.
+ */
+void __rte_experimental rte_timer_subsystem_finalize(void);

/**
* Initialize a timer handle.
@@ -254,7 +295,6 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
*/
int rte_timer_stop(struct rte_timer *tim);

-
/**
* Loop until rte_timer_stop() succeeds.
*
@@ -302,6 +342,130 @@ void rte_timer_manage(void);
*/
void rte_timer_dump_stats(FILE *f);

+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_reset(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list to which
+ * the timer should be added.
+ *
+ * @see rte_timer_reset()
+ *
+ * @param timer_data_id
+ * An identifier indicating which instance of timer data should be used for
+ * this operation.
+ * @param tim
+ * The timer handle.
+ * @param ticks
+ * The number of cycles (see rte_get_hpet_hz()) before the callback
+ * function is called.
+ * @param type
+ * The type can be either:
+ * - PERIODICAL: The timer is automatically reloaded after execution
+ * (returns to the PENDING state)
+ * - SINGLE: The timer is one-shot, that is, the timer goes to a
+ * STOPPED state after execution.
+ * @param tim_lcore
+ * The ID of the lcore where the timer callback function has to be
+ * executed. If tim_lcore is LCORE_ID_ANY, the timer library will
+ * launch it on a different core for each call (round-robin).
+ * @param fct
+ * The callback function of the timer. This parameter can be NULL if (and
+ * only if) rte_timer_alt_manage() will be used to manage this timer.
+ * @param arg
+ * The user argument of the callback function.
+ * @return
+ * - 0: Success; the timer is scheduled.
+ * - (-1): Timer is in the RUNNING or CONFIG state.
+ * - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+ uint64_t ticks, enum rte_timer_type type,
+ unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_stop(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list from which
+ * this timer should be removed.
+ *
+ * @see rte_timer_stop()
+ *
+ * @param timer_data_id
+ * An identifier indicating which instance of timer data should be used for
+ * this operation.
+ * @param tim
+ * The timer handle.
+ * @return
+ * - 0: Success; the timer is stopped.
+ * - (-1): The timer is in the RUNNING or CONFIG state.
+ * - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
+
+/**
+ * Callback function type for rte_timer_alt_manage().
+ */
+typedef void (*rte_timer_alt_manage_cb_t)(void *);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Manage a set of timer lists and execute the specified callback function for
+ * all expired timers. This function is similar to rte_timer_manage(), except
+ * that it allows a caller to specify the timer_data instance that should
+ * be operated on, as well as a set of lcore IDs identifying which timer lists
+ * should be processed. Callback functions of individual timers are ignored.
+ *
+ * @see rte_timer_manage()
+ *
+ * @param timer_data_id
+ * An identifier indicating which instance of timer data should be used for
+ * this operation.
+ * @param poll_lcores
+ * An array of lcore ids identifying the timer lists that should be processed.
+ * NULL is allowed - if NULL, the timer list corresponding to the lcore
+ * calling this routine is processed (same as rte_timer_manage()).
+ * @param n_poll_lcores
+ * The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
+ * is ignored.
+ * @param f
+ * The callback function which should be called for all expired timers.
+ * @return
+ * - 0: success
+ * - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
+ int n_poll_lcores, rte_timer_alt_manage_cb_t f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_dump_stats(), except that it allows
+ * the caller to specify the rte_timer_data instance that should be used.
+ *
+ * @see rte_timer_dump_stats()
+ *
+ * @param timer_data_id
+ * An identifier indicating which instance of timer data should be used for
+ * this operation.
+ * @param f
+ * A pointer to a file for output
+ * @return
+ * - 0: success
+ * - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
+
#ifdef __cplusplus
}
#endif
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 9b2e4b8..1e6b70d 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -3,13 +3,30 @@ DPDK_2.0 {

rte_timer_dump_stats;
rte_timer_init;
- rte_timer_manage;
rte_timer_pending;
rte_timer_reset;
rte_timer_reset_sync;
rte_timer_stop;
rte_timer_stop_sync;
- rte_timer_subsystem_init;

local: *;
};
+
+DPDK_19.02 {
+ global:
+
+ rte_timer_manage;
+ rte_timer_subsystem_init;
+} DPDK_2.0;
+
+EXPERIMENTAL {
+ global:
+
+ rte_timer_alt_dump_stats;
+ rte_timer_alt_manage;
+ rte_timer_alt_reset;
+ rte_timer_alt_stop;
+ rte_timer_data_alloc;
+ rte_timer_data_dealloc;
+ rte_timer_subsystem_finalize;
+};

--
2.6.4

Erik Gabriel Carrillo

2018-11-29 23:35:13 UTC

Add a function to the timer API that lets a caller traverse a
specified set of timer lists, stopping each timer in each list
and invoking an optional callback function on it.
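
The traversal pattern can be illustrated with a minimal, self-contained
sketch. It does not use the DPDK API: demo_timer, demo_list,
demo_stop_all() and demo_count_cb() are hypothetical stand-ins for the
skiplist structures and for rte_timer_stop_all(), and the per-list
spinlock is left out. The point it tries to show is that the successor is
saved before a timer is stopped, since the callback may recycle the timer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rte_timer: one link and a state flag. */
struct demo_timer {
	struct demo_timer *next;
	int pending;
};

/* Hypothetical stand-in for one lcore's pending list. */
struct demo_list {
	struct demo_timer *head;
};

typedef void (*demo_stop_all_cb_t)(struct demo_timer *tim, void *arg);

/* Stop every timer on the selected lists; return how many were stopped. */
static int
demo_stop_all(struct demo_list *lists, const unsigned int *walk_lcores,
	      int nb_walk_lcores, demo_stop_all_cb_t f, void *f_arg)
{
	struct demo_timer *tim, *next_tim;
	int i, nb_stopped = 0;

	for (i = 0; i < nb_walk_lcores; i++) {
		struct demo_list *list = &lists[walk_lcores[i]];

		for (tim = list->head; tim != NULL; tim = next_tim) {
			/* Save the successor before tim is handed away. */
			next_tim = tim->next;

			/* "Stop" the timer: unlink it and clear its state. */
			list->head = next_tim;
			tim->next = NULL;
			tim->pending = 0;

			if (f != NULL)
				f(tim, f_arg);
			nb_stopped++;
		}
	}

	return nb_stopped;
}

/* Example callback: count the stopped timers it is given. */
static void
demo_count_cb(struct demo_timer *tim, void *arg)
{
	assert(tim->pending == 0);
	(*(int *)arg)++;
}
```

In the adapter, the callback would return each timer object to a mempool
rather than count it.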

Signed-off-by: Erik Gabriel Carrillo <***@intel.com>
---
lib/librte_timer/rte_timer.c | 81 +++++++++++++++++++++++++++-------
lib/librte_timer/rte_timer.h | 32 ++++++++++++++
lib/librte_timer/rte_timer_version.map | 1 +
3 files changed, 97 insertions(+), 17 deletions(-)

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index a76be8b..1eaf755 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -559,39 +559,30 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
rte_pause();
}

-/* Stop the timer associated with the timer handle tim */
-int
-rte_timer_stop(struct rte_timer *tim)
-{
- return rte_timer_alt_stop(default_data_id, tim);
-}
-
-int __rte_experimental
-rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+static int
+__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
+ struct rte_timer_data *data)
{
union rte_timer_status prev_status, status;
unsigned lcore_id = rte_lcore_id();
int ret;
- struct rte_timer_data *timer_data;
-
- TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);

/* wait that the timer is in correct status before update,
* and mark it as being configured */
- ret = timer_set_config_state(tim, &prev_status, timer_data);
+ ret = timer_set_config_state(tim, &prev_status, data);
if (ret < 0)
return -1;

- __TIMER_STAT_ADD(timer_data, stop, 1);
+ __TIMER_STAT_ADD(data, stop, 1);
if (prev_status.state == RTE_TIMER_RUNNING &&
lcore_id < RTE_MAX_LCORE) {
- timer_data->priv_timer[lcore_id].updated = 1;
+ data->priv_timer[lcore_id].updated = 1;
}

/* remove it from list */
if (prev_status.state == RTE_TIMER_PENDING) {
- timer_del(tim, prev_status, 0, timer_data);
- __TIMER_STAT_ADD(timer_data, pending, -1);
+ timer_del(tim, prev_status, local_is_locked, data);
+ __TIMER_STAT_ADD(data, pending, -1);
}

/* mark timer as stopped */
@@ -603,6 +594,23 @@ rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
return 0;
}

+/* Stop the timer associated with the timer handle tim */
+int
+rte_timer_stop(struct rte_timer *tim)
+{
+ return rte_timer_alt_stop(default_data_id, tim);
+}
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
+ struct rte_timer_data *timer_data;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+ return __rte_timer_stop(tim, 0, timer_data);
+}
+
/* loop until rte_timer_stop() succeed */
void
rte_timer_stop_sync(struct rte_timer *tim)
@@ -912,6 +920,45 @@ rte_timer_alt_manage(uint32_t timer_data_id,
return 0;
}

+/* Walk pending lists, stopping timers and calling user-specified function */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+ int nb_walk_lcores,
+ rte_timer_stop_all_cb_t f, void *f_arg)
+{
+ int i;
+ struct priv_timer *priv_timer;
+ uint32_t walk_lcore;
+ struct rte_timer *tim, *next_tim;
+ struct rte_timer_data *timer_data;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+ /* Index walk_lcores[] only after the bounds check has passed */
+ for (i = 0; i < nb_walk_lcores; i++) {
+ walk_lcore = walk_lcores[i];
+ priv_timer = &timer_data->priv_timer[walk_lcore];
+
+ rte_spinlock_lock(&priv_timer->list_lock);
+
+ for (tim = priv_timer->pending_head.sl_next[0];
+ tim != NULL;
+ tim = next_tim) {
+ next_tim = tim->sl_next[0];
+
+ /* Call timer_stop with lock held */
+ __rte_timer_stop(tim, 1, timer_data);
+
+ if (f)
+ f(tim, f_arg);
+ }
+
+ rte_spinlock_unlock(&priv_timer->list_lock);
+ }
+
+ return 0;
+}
+
/* dump statistics about timers */
void rte_timer_dump_stats(FILE *f)
{
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9daa334..27b1ebd 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -446,6 +446,38 @@ rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
int n_poll_lcores, rte_timer_alt_manage_cb_t f);

/**
+ * Callback function type for rte_timer_stop_all().
+ */
+typedef void (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Walk the pending timer lists for the specified lcore IDs, and for each timer
+ * that is encountered, stop it and call the specified callback function to
+ * process it further.
+ *
+ * @param timer_data_id
+ * An identifier indicating which instance of timer data should be used for
+ * this operation.
+ * @param walk_lcores
+ * An array of lcore ids identifying the timer lists that should be processed.
+ * @param nb_walk_lcores
+ * The size of the walk_lcores array.
+ * @param f
+ * The callback function which should be called for each timer. Can be NULL.
+ * @param f_arg
+ * An arbitrary argument that will be passed to f, if it is called.
+ * @return
+ * - 0: success
+ * - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+ int nb_walk_lcores, rte_timer_stop_all_cb_t f, void *f_arg);
+
+/**
* @warning
* @b EXPERIMENTAL: this API may change without prior notice
*
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 1e6b70d..0fab845 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -28,5 +28,6 @@ EXPERIMENTAL {
rte_timer_alt_stop;
rte_timer_data_alloc;
rte_timer_data_dealloc;
+ rte_timer_stop_all;
rte_timer_subsystem_finalize;
};

--
2.6.4

Erik Gabriel Carrillo

2018-11-29 23:35:14 UTC

This commit updates the implementation of the software event timer
adapter. The original version used rings to let producer cores (and
secondary processes) send timers to a service core, which would then arm
or cancel the timers, depending on what the application had requested.
The ring can be a bottleneck, so we replace the original implementation
with one that uses new APIs introduced in the timer library. The new
APIs allow the underlying timer skiplists to be allocated in shared
memory, which allows the producer cores in both primary and secondary
processes to install timers directly into the lists, obviating the need
for a ring. Each producer core also gets a unique timer list to insert
timers into, so no contention occurs there. The adapter's service
function can utilize a new flavor of rte_timer_manage() that can traverse
multiple timer lists, and also accepts a callback function. The callback
function is only called from the primary process, since that's where the
service runs, and the callback is the same for all timers - it is defined
to enqueue a timer expiry event in the event device.

Signed-off-by: Erik Gabriel Carrillo <***@intel.com>
---
lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
1 file changed, 275 insertions(+), 412 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 79070d4..9c528cb 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -7,6 +7,7 @@
#include <inttypes.h>
#include <stdbool.h>
#include <sys/queue.h>
+#include <assert.h>

#include <rte_memzone.h>
#include <rte_memory.h>
@@ -19,6 +20,7 @@
#include <rte_timer.h>
#include <rte_service_component.h>
#include <rte_cycles.h>
+#include <rte_random.h>

#include "rte_eventdev.h"
#include "rte_eventdev_pmd.h"
@@ -34,7 +36,7 @@ static int evtim_buffer_logtype;

static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];

-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;

#define EVTIM_LOG(level, logtype, ...) \
rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +213,7 @@ rte_event_timer_adapter_create_ext(
* implementation.
*/
if (adapter->ops == NULL)
- adapter->ops = &sw_event_adapter_timer_ops;
+ adapter->ops = &swtim_ops;

/* Allow driver to do some setup */
FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -334,7 +336,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
* implementation.
*/
if (adapter->ops == NULL)
- adapter->ops = &sw_event_adapter_timer_ops;
+ adapter->ops = &swtim_ops;

/* Set fast-path function pointers */
adapter->arm_burst = adapter->ops->arm_burst;
@@ -491,6 +493,7 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
}

*nb_events_inv = 0;
+
*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
&events[tail_idx], n);
if (*nb_events_flushed != n && rte_errno == -EINVAL) {
@@ -498,137 +501,123 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
(*nb_events_inv)++;
}

+ if (*nb_events_flushed > 0)
+ EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+ "device", *nb_events_flushed);
+
bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
}

/*
* Software event timer adapter implementation
*/
-
-struct rte_event_timer_adapter_sw_data {
- /* List of messages for outstanding timers */
- TAILQ_HEAD(, msg) msgs_tailq_head;
- /* Lock to guard tailq and armed count */
- rte_spinlock_t msgs_tailq_sl;
+struct swtim {
/* Identifier of service executing timer management logic. */
uint32_t service_id;
/* The cycle count at which the adapter should next tick */
uint64_t next_tick_cycles;
- /* Incremented as the service moves through phases of an iteration */
- volatile int service_phase;
/* The tick resolution used by adapter instance. May have been
* adjusted from what user requested
*/
uint64_t timer_tick_ns;
/* Maximum timeout in nanoseconds allowed by adapter instance. */
uint64_t max_tmo_ns;
- /* Ring containing messages to arm or cancel event timers */
- struct rte_ring *msg_ring;
- /* Mempool containing msg objects */
- struct rte_mempool *msg_pool;
/* Buffered timer expiry events to be enqueued to an event device. */
struct event_buffer buffer;
/* Statistics */
struct rte_event_timer_adapter_stats stats;
- /* The number of threads currently adding to the message ring */
- rte_atomic16_t message_producer_count;
+ /* Mempool of timer objects */
+ struct rte_mempool *tim_pool;
+ /* Back pointer for convenience */
+ struct rte_event_timer_adapter *adapter;
+ /* Identifier of timer data instance */
+ uint32_t timer_data_id;
+ /* Track which cores have actually armed a timer */
+ rte_atomic16_t in_use[RTE_MAX_LCORE];
+ /* Track which cores' timer lists should be polled */
+ unsigned int poll_lcores[RTE_MAX_LCORE];
+ /* The number of lists that should be polled */
+ int n_poll_lcores;
+ /* Lock to atomically access the above two variables */
+ rte_spinlock_t poll_lcores_sl;
};

-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
- enum msg_type type;
- struct rte_event_timer *evtim;
- struct rte_timer tim;
- TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+ return adapter->data->adapter_priv;
+}

static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(void *arg)
{
- int ret;
+ struct rte_timer *tim = arg;
+ struct rte_event_timer *evtim = tim->arg;
+ struct rte_event_timer_adapter *adapter;
+ struct swtim *sw;
uint16_t nb_evs_flushed = 0;
uint16_t nb_evs_invalid = 0;
uint64_t opaque;
- struct rte_event_timer *evtim;
- struct rte_event_timer_adapter *adapter;
- struct rte_event_timer_adapter_sw_data *sw_data;
+ int ret;

- evtim = arg;
opaque = evtim->impl_opaque[1];
adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
- sw_data = adapter->data->adapter_priv;
+ sw = swtim_pmd_priv(adapter);

- ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+ ret = event_buffer_add(&sw->buffer, &evtim->ev);
if (ret < 0) {
/* If event buffer is full, put timer back in list with
* immediate expiry value, so that we process it again on the
* next iteration.
*/
- rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
- sw_event_timer_cb, evtim);
+ rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+ rte_lcore_id(), NULL, evtim);
+
+ sw->stats.evtim_retry_count++;

- sw_data->stats.evtim_retry_count++;
EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
"immediate expiry value");
} else {
- struct msg *m = container_of(tim, struct msg, tim);
- TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
- evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+ rte_mempool_put(sw->tim_pool, tim);
+ sw->stats.evtim_exp_count++;

- /* Free the msg object containing the rte_timer now that
- * we've buffered its event successfully.
- */
- rte_mempool_put(sw_data->msg_pool, m);
-
- /* Bump the count when we successfully add an expiry event to
- * the buffer.
- */
- sw_data->stats.evtim_exp_count++;
+ evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
}

- if (event_buffer_batch_ready(&sw_data->buffer)) {
- event_buffer_flush(&sw_data->buffer,
+ if (event_buffer_batch_ready(&sw->buffer)) {
+ event_buffer_flush(&sw->buffer,
adapter->data->event_dev_id,
adapter->data->event_port_id,
&nb_evs_flushed,
&nb_evs_invalid);

- sw_data->stats.ev_enq_count += nb_evs_flushed;
- sw_data->stats.ev_inv_count += nb_evs_invalid;
+ sw->stats.ev_enq_count += nb_evs_flushed;
+ sw->stats.ev_inv_count += nb_evs_invalid;
}
}

static __rte_always_inline uint64_t
get_timeout_cycles(struct rte_event_timer *evtim,
- struct rte_event_timer_adapter *adapter)
+ const struct rte_event_timer_adapter *adapter)
{
- uint64_t timeout_ns;
- struct rte_event_timer_adapter_sw_data *sw_data;
-
- sw_data = adapter->data->adapter_priv;
- timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+ struct swtim *sw = swtim_pmd_priv(adapter);
+ uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
}

/* This function returns true if one or more (adapter) ticks have occurred since
* the last time it was called.
*/
static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
{
uint64_t cycles_per_adapter_tick, start_cycles;
uint64_t *next_tick_cyclesp;
- struct rte_event_timer_adapter_sw_data *sw_data;
-
- sw_data = adapter->data->adapter_priv;
- next_tick_cyclesp = &sw_data->next_tick_cycles;

- cycles_per_adapter_tick = sw_data->timer_tick_ns *
+ next_tick_cyclesp = &sw->next_tick_cycles;
+ cycles_per_adapter_tick = sw->timer_tick_ns *
(rte_get_timer_hz() / NSECPERSEC);
-
start_cycles = rte_get_timer_cycles();

/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -640,7 +629,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
* boundary.
*/
start_cycles -= start_cycles % cycles_per_adapter_tick;
-
*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;

return true;
@@ -655,15 +643,12 @@ check_timeout(struct rte_event_timer *evtim,
const struct rte_event_timer_adapter *adapter)
{
uint64_t tmo_nsec;
- struct rte_event_timer_adapter_sw_data *sw_data;
-
- sw_data = adapter->data->adapter_priv;
- tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
+ struct swtim *sw = swtim_pmd_priv(adapter);

- if (tmo_nsec > sw_data->max_tmo_ns)
+ tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+ if (tmo_nsec > sw->max_tmo_ns)
return -1;
-
- if (tmo_nsec < sw_data->timer_tick_ns)
+ if (tmo_nsec < sw->timer_tick_ns)
return -2;

return 0;
@@ -691,110 +676,34 @@ check_destination_event_queue(struct rte_event_timer *evtim,
return 0;
}

-#define NB_OBJS 32
static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
{
- int i, num_msgs;
- uint64_t cycles, opaque;
+ struct rte_event_timer_adapter *adapter = arg;
+ struct swtim *sw = swtim_pmd_priv(adapter);
uint16_t nb_evs_flushed = 0;
uint16_t nb_evs_invalid = 0;
- struct rte_event_timer_adapter *adapter;
- struct rte_event_timer_adapter_sw_data *sw_data;
- struct rte_event_timer *evtim = NULL;
- struct rte_timer *tim = NULL;
- struct msg *msg, *msgs[NB_OBJS];
-
- adapter = arg;
- sw_data = adapter->data->adapter_priv;
-
- sw_data->service_phase = 1;
- rte_smp_wmb();
-
- while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
- !rte_ring_empty(sw_data->msg_ring)) {
-
- num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
- (void **)msgs, NB_OBJS, NULL);
-
- for (i = 0; i < num_msgs; i++) {
- int ret = 0;
-
- RTE_SET_USED(ret);
-
- msg = msgs[i];
- evtim = msg->evtim;
-
- switch (msg->type) {
- case MSG_TYPE_ARM:
- EVTIM_SVC_LOG_DBG("dequeued ARM message from "
- "ring");
- tim = &msg->tim;
- rte_timer_init(tim);
- cycles = get_timeout_cycles(evtim,
- adapter);
- ret = rte_timer_reset(tim, cycles, SINGLE,
- rte_lcore_id(),
- sw_event_timer_cb,
- evtim);
- RTE_ASSERT(ret == 0);
-
- evtim->impl_opaque[0] = (uintptr_t)tim;
- evtim->impl_opaque[1] = (uintptr_t)adapter;
-
- TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
- msg,
- msgs);
- break;
- case MSG_TYPE_CANCEL:
- EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
- "from ring");
- opaque = evtim->impl_opaque[0];
- tim = (struct rte_timer *)(uintptr_t)opaque;
- RTE_ASSERT(tim != NULL);
-
- ret = rte_timer_stop(tim);
- RTE_ASSERT(ret == 0);
-
- /* Free the msg object for the original arm
- * request.
- */
- struct msg *m;
- m = container_of(tim, struct msg, tim);
- TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
- msgs);
- rte_mempool_put(sw_data->msg_pool, m);
-
- /* Free the msg object for the current msg */
- rte_mempool_put(sw_data->msg_pool, msg);
-
- evtim->impl_opaque[0] = 0;
- evtim->impl_opaque[1] = 0;
-
- break;
- }
- }
- }
-
- sw_data->service_phase = 2;
- rte_smp_wmb();

- if (adapter_did_tick(adapter)) {
- rte_timer_manage();
+ if (swtim_did_tick(sw)) {
+ /* This lock is seldom acquired on the arm side */
+ rte_spinlock_lock(&sw->poll_lcores_sl);
+ rte_timer_alt_manage(sw->timer_data_id,
+ sw->poll_lcores,
+ sw->n_poll_lcores,
+ swtim_callback);
+ rte_spinlock_unlock(&sw->poll_lcores_sl);

- event_buffer_flush(&sw_data->buffer,
+ event_buffer_flush(&sw->buffer,
adapter->data->event_dev_id,
adapter->data->event_port_id,
- &nb_evs_flushed, &nb_evs_invalid);
+ &nb_evs_flushed,
+ &nb_evs_invalid);

- sw_data->stats.ev_enq_count += nb_evs_flushed;
- sw_data->stats.ev_inv_count += nb_evs_invalid;
- sw_data->stats.adapter_tick_count++;
+ sw->stats.ev_enq_count += nb_evs_flushed;
+ sw->stats.ev_inv_count += nb_evs_invalid;
+ sw->stats.adapter_tick_count++;
}

- sw_data->service_phase = 0;
- rte_smp_wmb();
-
return 0;
}

@@ -828,168 +737,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
return cache_size;
}

-#define SW_MIN_INTERVAL 1E5
-
static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
{
- int ret;
- struct rte_event_timer_adapter_sw_data *sw_data;
- uint64_t nb_timers;
+ int i, ret;
+ struct swtim *sw;
unsigned int flags;
struct rte_service_spec service;
- static bool timer_subsystem_inited; // static initialized to false

- /* Allocate storage for SW implementation data */
- char priv_data_name[RTE_RING_NAMESIZE];
- snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
- adapter->data->id);
- adapter->data->adapter_priv = rte_zmalloc_socket(
- priv_data_name,
- sizeof(struct rte_event_timer_adapter_sw_data),
- RTE_CACHE_LINE_SIZE,
- adapter->data->socket_id);
- if (adapter->data->adapter_priv == NULL) {
+ /* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+ char swtim_name[SWTIM_NAMESIZE];
+ snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+ adapter->data->id);
+ sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+ adapter->data->socket_id);
+ if (sw == NULL) {
EVTIM_LOG_ERR("failed to allocate space for private data");
rte_errno = ENOMEM;
return -1;
}

- if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
- EVTIM_LOG_ERR("failed to create adapter with requested tick "
- "interval");
- rte_errno = EINVAL;
- return -1;
- }
-
- sw_data = adapter->data->adapter_priv;
-
- sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
- sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
+ /* Connect storage to adapter instance */
+ adapter->data->adapter_priv = sw;
+ sw->adapter = adapter;

- TAILQ_INIT(&sw_data->msgs_tailq_head);
- rte_spinlock_init(&sw_data->msgs_tailq_sl);
- rte_atomic16_init(&sw_data->message_producer_count);
-
- /* Rings require power of 2, so round up to next such value */
- nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
-
- char msg_ring_name[RTE_RING_NAMESIZE];
- snprintf(msg_ring_name, RTE_RING_NAMESIZE,
- "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
- flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
- RING_F_SP_ENQ | RING_F_SC_DEQ :
- RING_F_SC_DEQ;
- sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
- adapter->data->socket_id, flags);
- if (sw_data->msg_ring == NULL) {
- EVTIM_LOG_ERR("failed to create message ring");
- rte_errno = ENOMEM;
- goto free_priv_data;
- }
+ sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+ sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;

- char pool_name[RTE_RING_NAMESIZE];
- snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+ /* Create a timer pool */
+ char pool_name[SWTIM_NAMESIZE];
+ snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
adapter->data->id);
-
- /* Both the arming/canceling thread and the service thread will do puts
- * to the mempool, but if the SP_PUT flag is enabled, we can specify
- * single-consumer get for the mempool.
- */
- flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
- MEMPOOL_F_SC_GET : 0;
-
- /* The usable size of a ring is count - 1, so subtract one here to
- * make the counts agree.
- */
+ /* Optimal mempool size is a power of 2 minus one */
+ uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
int pool_size = nb_timers - 1;
int cache_size = compute_msg_mempool_cache_size(
adapter->data->conf.nb_timers, nb_timers);
- sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
- sizeof(struct msg), cache_size,
- 0, NULL, NULL, NULL, NULL,
- adapter->data->socket_id, flags);
- if (sw_data->msg_pool == NULL) {
- EVTIM_LOG_ERR("failed to create message object mempool");
+ flags = 0; /* pool is multi-producer, multi-consumer */
+ sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+ sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+ NULL, NULL, adapter->data->socket_id, flags);
+ if (sw->tim_pool == NULL) {
+ EVTIM_LOG_ERR("failed to create timer object mempool");
rte_errno = ENOMEM;
- goto free_msg_ring;
+ goto free_alloc;
+ }
+
+ /* Initialize the variables that track in-use timer lists */
+ rte_spinlock_init(&sw->poll_lcores_sl);
+ for (i = 0; i < RTE_MAX_LCORE; i++)
+ rte_atomic16_init(&sw->in_use[i]);
+
+ /* Initialize the timer subsystem and allocate timer data instance */
+ ret = rte_timer_subsystem_init();
+ if (ret < 0) {
+ if (ret != -EALREADY) {
+ EVTIM_LOG_ERR("failed to initialize timer subsystem");
+ rte_errno = -ret;
+ goto free_mempool;
+ }
+ }
+
+ ret = rte_timer_data_alloc(&sw->timer_data_id);
+ if (ret < 0) {
+ EVTIM_LOG_ERR("failed to allocate timer data instance");
+ rte_errno = -ret;
+ goto free_mempool;
}

- event_buffer_init(&sw_data->buffer);
+ /* Initialize timer event buffer */
+ event_buffer_init(&sw->buffer);
+
+ sw->adapter = adapter;

/* Register a service component to run adapter logic */
memset(&service, 0, sizeof(service));
snprintf(service.name, RTE_SERVICE_NAME_MAX,
- "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+ "swtim_svc_%"PRIu8, adapter->data->id);
service.socket_id = adapter->data->socket_id;
- service.callback = sw_event_timer_adapter_service_func;
+ service.callback = swtim_service_func;
service.callback_userdata = adapter;
service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
- ret = rte_service_component_register(&service, &sw_data->service_id);
+ ret = rte_service_component_register(&service, &sw->service_id);
if (ret < 0) {
EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
- ": err = %d", service.name, sw_data->service_id,
+ ": err = %d", service.name, sw->service_id,
ret);

rte_errno = ENOSPC;
- goto free_msg_pool;
+ goto free_mempool;
}

EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
- sw_data->service_id);
+ sw->service_id);

- adapter->data->service_id = sw_data->service_id;
+ adapter->data->service_id = sw->service_id;
adapter->data->service_inited = 1;

- if (!timer_subsystem_inited) {
- rte_timer_subsystem_init();
- timer_subsystem_inited = true;
- }
-
return 0;
-
-free_msg_pool:
- rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
- rte_ring_free(sw_data->msg_ring);
-free_priv_data:
- rte_free(sw_data);
+free_mempool:
+ rte_mempool_free(sw->tim_pool);
+free_alloc:
+ rte_free(sw);
return -1;
}

-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
{
- int ret;
- struct msg *m1, *m2;
- struct rte_event_timer_adapter_sw_data *sw_data =
- adapter->data->adapter_priv;
+ struct swtim *sw = arg;

- rte_spinlock_lock(&sw_data->msgs_tailq_sl);
-
- /* Cancel outstanding rte_timers and free msg objects */
- m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
- while (m1 != NULL) {
- EVTIM_LOG_DBG("freeing outstanding timer");
- m2 = TAILQ_NEXT(m1, msgs);
-
- rte_timer_stop_sync(&m1->tim);
- rte_mempool_put(sw_data->msg_pool, m1);
+ rte_mempool_put(sw->tim_pool, (void *)tim);
+}

- m1 = m2;
- }
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+ int ret;
+ struct swtim *sw = swtim_pmd_priv(adapter);

- rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+ /* Free outstanding timers */
+ rte_timer_stop_all(sw->timer_data_id,
+ sw->poll_lcores,
+ sw->n_poll_lcores,
+ swtim_free_tim,
+ sw);

- ret = rte_service_component_unregister(sw_data->service_id);
+ ret = rte_service_component_unregister(sw->service_id);
if (ret < 0) {
EVTIM_LOG_ERR("failed to unregister service component");
return ret;
}

- rte_ring_free(sw_data->msg_ring);
- rte_mempool_free(sw_data->msg_pool);
- rte_free(adapter->data->adapter_priv);
+ rte_mempool_free(sw->tim_pool);
+ rte_free(sw);
+ adapter->data->adapter_priv = NULL;

return 0;
}
@@ -1010,88 +896,79 @@ get_mapped_count_for_service(uint32_t service_id)
}

static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
{
int mapped_count;
- struct rte_event_timer_adapter_sw_data *sw_data;
-
- sw_data = adapter->data->adapter_priv;
+ struct swtim *sw = swtim_pmd_priv(adapter);

/* Mapping the service to more than one service core can introduce
* delays while one thread is waiting to acquire a lock, so only allow
* one core to be mapped to the service.
+ *
+ * Note: the service could be modified such that it spreads cores to
+ * poll over multiple service instances.
*/
- mapped_count = get_mapped_count_for_service(sw_data->service_id);
+ mapped_count = get_mapped_count_for_service(sw->service_id);

- if (mapped_count == 1)
- return rte_service_component_runstate_set(sw_data->service_id,
- 1);
+ if (mapped_count != 1)
+ return mapped_count < 1 ? -ENOENT : -ENOTSUP;

- return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+ return rte_service_component_runstate_set(sw->service_id, 1);
}

static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
{
int ret;
- struct rte_event_timer_adapter_sw_data *sw_data =
- adapter->data->adapter_priv;
+ struct swtim *sw = swtim_pmd_priv(adapter);

- ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+ ret = rte_service_component_runstate_set(sw->service_id, 0);
if (ret < 0)
return ret;

- /* Wait for the service to complete its final iteration before
- * stopping.
- */
- while (sw_data->service_phase != 0)
+ /* Wait for the service to complete its final iteration */
+ while (rte_service_may_be_active(sw->service_id))
rte_pause();

- rte_smp_rmb();
-
return 0;
}

static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
struct rte_event_timer_adapter_info *adapter_info)
{
- struct rte_event_timer_adapter_sw_data *sw_data;
- sw_data = adapter->data->adapter_priv;
-
- adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
- adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+ struct swtim *sw = swtim_pmd_priv(adapter);
+ adapter_info->min_resolution_ns = sw->timer_tick_ns;
+ adapter_info->max_tmo_ns = sw->max_tmo_ns;
}

static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
- struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+ struct rte_event_timer_adapter_stats *stats)
{
- struct rte_event_timer_adapter_sw_data *sw_data;
- sw_data = adapter->data->adapter_priv;
- *stats = sw_data->stats;
+ struct swtim *sw = swtim_pmd_priv(adapter);
+ *stats = sw->stats; /* structure copy */
return 0;
}

static int
-sw_event_timer_adapter_stats_reset(
- const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
{
- struct rte_event_timer_adapter_sw_data *sw_data;
- sw_data = adapter->data->adapter_priv;
- memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+ struct swtim *sw = swtim_pmd_priv(adapter);
+ memset(&sw->stats, 0, sizeof(sw->stats));
return 0;
}

-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
- struct rte_event_timer **evtims,
- uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+ struct rte_event_timer **evtims,
+ uint16_t nb_evtims)
{
- uint16_t i;
- int ret;
- struct rte_event_timer_adapter_sw_data *sw_data;
- struct msg *msgs[nb_evtims];
+ int i, ret;
+ struct swtim *sw = swtim_pmd_priv(adapter);
+ uint32_t lcore_id = rte_lcore_id();
+ struct rte_timer *tim, *tims[nb_evtims];
+ uint64_t cycles;

#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
/* Check that the service is running. */
@@ -1101,101 +978,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
}
#endif

- sw_data = adapter->data->adapter_priv;
+ /* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+ * the highest lcore to insert such timers into
+ */
+ if (lcore_id == LCORE_ID_ANY)
+ lcore_id = RTE_MAX_LCORE - 1;
+
+ /* If this is the first time we're arming an event timer on this lcore,
+ * mark this lcore as "in use"; this will cause the service
+ * function to process the timer list that corresponds to this lcore.
+ */
+ if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id]))) {
+ rte_spinlock_lock(&sw->poll_lcores_sl);
+ EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+ lcore_id);
+ sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+ rte_spinlock_unlock(&sw->poll_lcores_sl);
+ }

- ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+ ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+ nb_evtims);
if (ret < 0) {
rte_errno = ENOSPC;
return 0;
}

- /* Let the service know we're producing messages for it to process */
- rte_atomic16_inc(&sw_data->message_producer_count);
-
- /* If the service is managing timers, wait for it to finish */
- while (sw_data->service_phase == 2)
- rte_pause();
-
- rte_smp_rmb();
-
for (i = 0; i < nb_evtims; i++) {
/* Don't modify the event timer state in these cases */
if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
rte_errno = EALREADY;
break;
} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
- evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+ evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
rte_errno = EINVAL;
break;
}

ret = check_timeout(evtims[i], adapter);
- if (ret == -1) {
+ if (unlikely(ret == -1)) {
evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
rte_errno = EINVAL;
break;
- }
- if (ret == -2) {
+ } else if (unlikely(ret == -2)) {
evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
rte_errno = EINVAL;
break;
}

- if (check_destination_event_queue(evtims[i], adapter) < 0) {
+ if (unlikely(check_destination_event_queue(evtims[i],
+ adapter) < 0)) {
evtims[i]->state = RTE_EVENT_TIMER_ERROR;
rte_errno = EINVAL;
break;
}

- /* Checks passed, set up a message to enqueue */
- msgs[i]->type = MSG_TYPE_ARM;
- msgs[i]->evtim = evtims[i];
+ tim = tims[i];
+ rte_timer_init(tim);

- /* Set the payload pointer if not set. */
- if (evtims[i]->ev.event_ptr == NULL)
- evtims[i]->ev.event_ptr = evtims[i];
+ evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+ evtims[i]->impl_opaque[1] = (uintptr_t)adapter;

- /* msg objects that get enqueued successfully will be freed
- * either by a future cancel operation or by the timer
- * expiration callback.
- */
- if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
- rte_errno = ENOSPC;
+ cycles = get_timeout_cycles(evtims[i], adapter);
+ ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+ SINGLE, lcore_id, NULL, evtims[i]);
+ if (ret < 0) {
+ /* tim was in RUNNING or CONFIG state */
+ evtims[i]->state = RTE_EVENT_TIMER_ERROR;
break;
}

- EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+ rte_smp_wmb();
+ EVTIM_LOG_DBG("armed an event timer");
evtims[i]->state = RTE_EVENT_TIMER_ARMED;
}

- /* Let the service know we're done producing messages */
- rte_atomic16_dec(&sw_data->message_producer_count);
-
if (i < nb_evtims)
- rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
- nb_evtims - i);
+ rte_mempool_put_bulk(sw->tim_pool,
+ (void **)&tims[i], nb_evtims - i);

return i;
}

static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
- struct rte_event_timer **evtims,
- uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+ struct rte_event_timer **evtims,
+ uint16_t nb_evtims)
{
- return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+ return __swtim_arm_burst(adapter, evtims, nb_evtims);
}

static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
- struct rte_event_timer **evtims,
- uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+ struct rte_event_timer **evtims,
+ uint16_t nb_evtims)
{
- uint16_t i;
- int ret;
- struct rte_event_timer_adapter_sw_data *sw_data;
- struct msg *msgs[nb_evtims];
+ int i, ret;
+ struct rte_timer *timp;
+ uint64_t opaque;
+ struct swtim *sw = swtim_pmd_priv(adapter);

#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
/* Check that the service is running. */
@@ -1205,23 +1085,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
}
#endif

- sw_data = adapter->data->adapter_priv;
-
- ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
- if (ret < 0) {
- rte_errno = ENOSPC;
- return 0;
- }
-
- /* Let the service know we're producing messages for it to process */
- rte_atomic16_inc(&sw_data->message_producer_count);
-
- /* If the service could be modifying event timer states, wait */
- while (sw_data->service_phase == 2)
- rte_pause();
-
- rte_smp_rmb();
-
for (i = 0; i < nb_evtims; i++) {
/* Don't modify the event timer state in these cases */
if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1232,54 +1095,54 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
break;
}

- msgs[i]->type = MSG_TYPE_CANCEL;
- msgs[i]->evtim = evtims[i];
+ opaque = evtims[i]->impl_opaque[0];
+ timp = (struct rte_timer *)(uintptr_t)opaque;
+ RTE_ASSERT(timp != NULL);

- if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
- rte_errno = ENOSPC;
+ ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+ if (ret < 0) {
+ /* Timer is running or being configured */
+ rte_errno = EAGAIN;
break;
}

- EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+ rte_mempool_put(sw->tim_pool, (void **)timp);

evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
- }
+ evtims[i]->impl_opaque[0] = 0;
+ evtims[i]->impl_opaque[1] = 0;

- /* Let the service know we're done producing messages */
- rte_atomic16_dec(&sw_data->message_producer_count);
-
- if (i < nb_evtims)
- rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
- nb_evtims - i);
+ rte_smp_wmb();
+ }

return i;
}

static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
- struct rte_event_timer **evtims,
- uint64_t timeout_ticks,
- uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+ struct rte_event_timer **evtims,
+ uint64_t timeout_ticks,
+ uint16_t nb_evtims)
{
int i;

for (i = 0; i < nb_evtims; i++)
evtims[i]->timeout_ticks = timeout_ticks;

- return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+ return __swtim_arm_burst(adapter, evtims, nb_evtims);
}

-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
- .init = sw_event_timer_adapter_init,
- .uninit = sw_event_timer_adapter_uninit,
- .start = sw_event_timer_adapter_start,
- .stop = sw_event_timer_adapter_stop,
- .get_info = sw_event_timer_adapter_get_info,
- .stats_get = sw_event_timer_adapter_stats_get,
- .stats_reset = sw_event_timer_adapter_stats_reset,
- .arm_burst = sw_event_timer_arm_burst,
- .arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
- .cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+ .init = swtim_init,
+ .uninit = swtim_uninit,
+ .start = swtim_start,
+ .stop = swtim_stop,
+ .get_info = swtim_get_info,
+ .stats_get = swtim_stats_get,
+ .stats_reset = swtim_stats_reset,
+ .arm_burst = swtim_arm_burst,
+ .arm_tmo_tick_burst = swtim_arm_tmo_tick_burst,
+ .cancel_burst = swtim_cancel_burst,
};

RTE_INIT(event_timer_adapter_init_log)

--
2.6.4

Pavan Nikhilesh

2018-11-30 07:26:28 UTC

Hi Eric,

I think we may need to address the librte_timer and event_timer patches in
separate series as we are modifying common code for the sake of sw_event_timer
PMD and the series title implies that only the PMD has been modified.

Also, I think we need to profile the rte_timer library with the new patches
(timer_perf_autotest) and report the performance regression, if any, as it
is also used as a standalone library.

Carrillo, Erik G

2018-11-30 19:07:25 UTC

Hi Pavan,

-----Original Message-----
Sent: Friday, November 30, 2018 1:26 AM
Subject: Re: [PATCH 0/3] new software event timer adapter
Hi Eric,
I think we may need to address the librte_timer and event_timer patches in
separate series as we are modifying common code for the sake of
sw_event_timer PMD and the series title implies that only the PMD has been
modified.
Also, I think we need to profile the rte_timer library with the new patches
(timer_perf_autotest) and report the performance regression, if any, as it
is also used as a standalone library.

Makes sense. I'll separate the series and check for a performance regression
in the timer library for the next iteration.

Thanks,
Erik

Erik Gabriel Carrillo

2018-12-07 17:52:58 UTC

This patch series modifies the timer library in such a way that
structures that used to be statically allocated in a process's data
segment are now allocated in shared memory. As these structures contain
lists of timers, new APIs are introduced that allow a caller to specify
the particular structure instance into which a timer should be inserted
or from which a timer should be removed. This enables primary and secondary
processes to modify the same timer list, which enables some
multi-process use cases that were not previously possible; e.g. a
secondary process can start a timer whose expiration is detected in a
primary process running a new flavor of timer_manage().

The original library API is mostly unchanged, though implementations are
updated to call into newly added functions with a default structure instance
ID that provides the original behavior. New functions are introduced to
enable applications to allocate structure instances to house timer
lists, and to reference them with an identifier when starting and
stopping timers, and finally, to manage the timer lists referenced with
an identifier.

My initial performance testing with the "timer_perf_autotest" test shows
no performance regression or improvement, and inspection of the
generated optimized code shows that the extra function call is inlined
into each function that gained one.

Depends on: https://patches.dpdk.org/patch/48417/

Changes in v2:
- split these changes out into their own series
- version the symbols where the existing ABI was updated, and provide an
alternate implementation with behavior equivalent to the original.
Validate ABI compatibility with validate-abi.sh
- refactor changes to simplify patches

Erik Gabriel Carrillo (2):
timer: allow timer management in shared memory
timer: add function to stop all timers in a list

lib/librte_timer/Makefile | 1 +
lib/librte_timer/rte_timer.c | 558 ++++++++++++++++++++++++++++++---
lib/librte_timer/rte_timer.h | 258 ++++++++++++++-
lib/librte_timer/rte_timer_version.map | 23 ++
4 files changed, 795 insertions(+), 45 deletions(-)

--
2.6.4

Erik Gabriel Carrillo

2018-12-07 17:52:59 UTC

Currently, the timer library uses a per-process table of structures to
manage skiplists of timers, presumably because timers contain arbitrary
function pointers whose value may not resolve properly in other
processes.

However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists. This would let timers be
used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory. The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1]. New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
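
As a rough illustration of the instance table described above, here is a
minimal sketch in plain C. It is not the library code: the names
(timer_data, data_arr, subsystem_init, data_alloc, data_dealloc) and the
table size are hypothetical, the table lives in ordinary static storage
rather than a shared memzone, and there is no locking. It only shows the
"zeroth entry is reserved for the original API, other entries are handed
out by ID" bookkeeping.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; the real structures also hold per-lcore timer lists. */
#define MAX_DATA_ELS 4          /* hypothetical size, chosen for the demo */
#define FL_ALLOCATED (1 << 0)

struct timer_data {
	uint8_t internal_flags;
};

static struct timer_data data_arr[MAX_DATA_ELS];

/* Reserve entry 0 for callers of the original, ID-less API. */
static void
subsystem_init(void)
{
	data_arr[0].internal_flags |= FL_ALLOCATED;
}

/* Claim the first free entry and report its index through id_ptr. */
static int
data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MAX_DATA_ELS; i++) {
		if (!(data_arr[i].internal_flags & FL_ALLOCATED)) {
			data_arr[i].internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Release an entry; reject out-of-range or unallocated IDs. */
static int
data_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(data_arr[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;

	data_arr[id].internal_flags &= ~FL_ALLOCATED;

	return 0;
}
```

In this sketch the first data_alloc() call after subsystem_init() hands
out ID 1, because ID 0 is already taken by the default instance.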

New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.

Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.
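
To make "process multiple timer lists per invocation" concrete, here is a
toy sketch in plain C. It is only an assumption-laden model: it uses a
sorted singly linked list per slot instead of the library's skiplists,
takes the current time as a parameter, has no locking, and every name in
it (toy_timer, toy_add, toy_manage, toy_count_cb, N_LISTS) is made up for
the example rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_LISTS 4   /* hypothetical; stands in for the per-lcore lists */

struct toy_timer {
	uint64_t expire;
	struct toy_timer *next;   /* list is kept sorted by ascending expire */
};

typedef void (*toy_cb_t)(struct toy_timer *tim);

static struct toy_timer *pending[N_LISTS];
static unsigned int toy_fired;

/* Sample callback: just count expirations. */
static void
toy_count_cb(struct toy_timer *tim)
{
	(void)tim;
	toy_fired++;
}

/* Insert tim into the given list, keeping it sorted by expiry. */
static void
toy_add(unsigned int list_id, struct toy_timer *tim)
{
	struct toy_timer **pp = &pending[list_id];

	while (*pp != NULL && (*pp)->expire <= tim->expire)
		pp = &(*pp)->next;

	tim->next = *pp;
	*pp = tim;
}

/* Walk each requested list, unlink every timer with expire <= now and
 * hand it to cb. Returns the number of timers that fired.
 */
static unsigned int
toy_manage(const unsigned int *lists, unsigned int n_lists, uint64_t now,
	   toy_cb_t cb)
{
	unsigned int i, fired = 0;

	for (i = 0; i < n_lists; i++) {
		struct toy_timer **head = &pending[lists[i]];

		while (*head != NULL && (*head)->expire <= now) {
			struct toy_timer *tim = *head;

			*head = tim->next;
			tim->next = NULL;
			cb(tim);
			fired++;
		}
	}

	return fired;
}
```

Passing the list indices as an array lets one caller poll only the lists
that are actually in use, instead of scanning every slot on each call.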

[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Signed-off-by: Erik Gabriel Carrillo <***@intel.com>
---
lib/librte_timer/Makefile | 1 +
lib/librte_timer/rte_timer.c | 519 ++++++++++++++++++++++++++++++---
lib/librte_timer/rte_timer.h | 226 +++++++++++++-
lib/librte_timer/rte_timer_version.map | 22 ++
4 files changed, 723 insertions(+), 45 deletions(-)

diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
index 4ebd528..8ec63f4 100644
--- a/lib/librte_timer/Makefile
+++ b/lib/librte_timer/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
# library name
LIB = librte_timer.a

+CFLAGS += -DALLOW_EXPERIMENTAL_API
CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
LDLIBS += -lrte_eal

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 30c7b0a..571fb3f 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -5,6 +5,7 @@
#include <string.h>
#include <stdio.h>
#include <stdint.h>
+#include <stdbool.h>
#include <inttypes.h>
#include <assert.h>
#include <sys/queue.h>
@@ -21,11 +22,15 @@
#include <rte_spinlock.h>
#include <rte_random.h>
#include <rte_pause.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_compat.h>

#include "rte_timer.h"

-LIST_HEAD(rte_timer_list, rte_timer);
-
+/**
+ * Per-lcore info for timers.
+ */
struct priv_timer {
struct rte_timer pending_head; /**< dummy timer instance to head up list */
rte_spinlock_t list_lock; /**< lock to protect list access */
@@ -48,25 +53,84 @@ struct priv_timer {
#endif
} __rte_cache_aligned;

-/** per-lcore private info for timers */
-static struct priv_timer priv_timer[RTE_MAX_LCORE];
+#define FL_ALLOCATED (1 << 0)
+struct rte_timer_data {
+ struct priv_timer priv_timer[RTE_MAX_LCORE];
+ uint8_t internal_flags;
+};
+
+#define RTE_MAX_DATA_ELS 64
+static struct rte_timer_data *rte_timer_data_arr;
+static uint32_t default_data_id; // id set to zero automatically
+static uint32_t rte_timer_subsystem_initialized;
+
+/* For maintaining older interfaces for a period */
+static struct rte_timer_data default_timer_data;

/* when debug is enabled, store some statistics */
#ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do { \
+#define __TIMER_STAT_ADD(priv_timer, name, n) do { \
unsigned __lcore_id = rte_lcore_id(); \
if (__lcore_id < RTE_MAX_LCORE) \
priv_timer[__lcore_id].stats.name += (n); \
} while(0)
#else
-#define __TIMER_STAT_ADD(name, n) do {} while(0)
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {} while (0)
#endif

-/* Init the timer library. */
+static inline int
+timer_data_valid(uint32_t id)
+{
+ return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}
+
+/* validate ID and retrieve timer data pointer, or return error value */
+#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do { \
+ if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id)) \
+ return retval; \
+ timer_data = &rte_timer_data_arr[id]; \
+} while (0)
+
+int __rte_experimental
+rte_timer_data_alloc(uint32_t *id_ptr)
+{
+ int i;
+ struct rte_timer_data *data;
+
+ if (!rte_timer_subsystem_initialized)
+ return -ENOMEM;
+
+ for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+ data = &rte_timer_data_arr[i];
+ if (!(data->internal_flags & FL_ALLOCATED)) {
+ data->internal_flags |= FL_ALLOCATED;
+
+ if (id_ptr)
+ *id_ptr = i;
+
+ return 0;
+ }
+ }
+
+ return -ENOSPC;
+}
+
+int __rte_experimental
+rte_timer_data_dealloc(uint32_t id)
+{
+ struct rte_timer_data *timer_data;
+ TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
+
+ timer_data->internal_flags &= ~(FL_ALLOCATED);
+
+ return 0;
+}
+
void
-rte_timer_subsystem_init(void)
+rte_timer_subsystem_init_v20(void)
{
unsigned lcore_id;
+ struct priv_timer *priv_timer = default_timer_data.priv_timer;

/* since priv_timer is static, it's zeroed by default, so only init some
* fields.
@@ -76,6 +140,76 @@ rte_timer_subsystem_init(void)
priv_timer[lcore_id].prev_lcore = lcore_id;
}
}
+VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
+
+/* Init the timer library. Allocate an array of timer data structs in shared
+ * memory, and allocate the zeroth entry for use with original timer
+ * APIs. Since the intersection of the sets of lcore ids in primary and
+ * secondary processes should be empty, the zeroth entry can be shared by
+ * multiple processes.
+ */
+int
+rte_timer_subsystem_init_v1902(void)
+{
+ const struct rte_memzone *mz;
+ struct rte_timer_data *data;
+ int i, lcore_id;
+ static const char *mz_name = "rte_timer_mz";
+
+ if (rte_timer_subsystem_initialized)
+ return -EALREADY;
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+ mz = rte_memzone_lookup(mz_name);
+ if (mz == NULL)
+ return -EEXIST;
+
+ rte_timer_data_arr = mz->addr;
+
+ rte_timer_data_arr[default_data_id].internal_flags |=
+ FL_ALLOCATED;
+
+ rte_timer_subsystem_initialized = 1;
+
+ return 0;
+ }
+
+ mz = rte_memzone_reserve_aligned(mz_name,
+ RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
+ SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
+ if (mz == NULL)
+ return -ENOMEM;
+
+ rte_timer_data_arr = mz->addr;
+
+ for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+ data = &rte_timer_data_arr[i];
+
+ for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+ rte_spinlock_init(
+ &data->priv_timer[lcore_id].list_lock);
+ data->priv_timer[lcore_id].prev_lcore = lcore_id;
+ }
+ }
+
+ rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
+
+ rte_timer_subsystem_initialized = 1;
+
+ return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_subsystem_init(void),
+ rte_timer_subsystem_init_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_subsystem_init, _v1902, 19.02);
+
+void __rte_experimental
+rte_timer_subsystem_finalize(void)
+{
+ if (rte_timer_data_arr)
+ rte_free(rte_timer_data_arr);
+
+ rte_timer_subsystem_initialized = 0;
+}

/* Initialize the timer handle tim for use */
void
@@ -95,7 +229,8 @@ rte_timer_init(struct rte_timer *tim)
*/
static int
timer_set_config_state(struct rte_timer *tim,
- union rte_timer_status *ret_prev_status)
+ union rte_timer_status *ret_prev_status,
+ struct priv_timer *priv_timer)
{
union rte_timer_status prev_status, status;
int success = 0;
@@ -207,7 +342,7 @@ timer_get_skiplist_level(unsigned curr_depth)
*/
static void
timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
- struct rte_timer **prev)
+ struct rte_timer **prev, struct priv_timer *priv_timer)
{
unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
prev[lvl] = &priv_timer[tim_lcore].pending_head;
@@ -226,13 +361,15 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
*/
static void
timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
- struct rte_timer **prev)
+ struct rte_timer **prev,
+ struct priv_timer *priv_timer)
{
int i;
+
/* to get a specific entry in the list, look for just lower than the time
* values, and then increment on each level individually if necessary
*/
- timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
+ timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, priv_timer);
for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
while (prev[i]->sl_next[i] != NULL &&
prev[i]->sl_next[i] != tim &&
@@ -247,14 +384,15 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
* timer must not be in a list
*/
static void
-timer_add(struct rte_timer *tim, unsigned int tim_lcore)
+timer_add(struct rte_timer *tim, unsigned int tim_lcore,
+ struct priv_timer *priv_timer)
{
unsigned lvl;
struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];

/* find where exactly this element goes in the list of elements
* for each depth. */
- timer_get_prev_entries(tim->expire, tim_lcore, prev);
+ timer_get_prev_entries(tim->expire, tim_lcore, prev, priv_timer);

/* now assign it a new level and add at that level */
const unsigned tim_level = timer_get_skiplist_level(
@@ -284,7 +422,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
*/
static void
timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
- int local_is_locked)
+ int local_is_locked, struct priv_timer *priv_timer)
{
unsigned lcore_id = rte_lcore_id();
unsigned prev_owner = prev_status.owner;
@@ -304,7 +442,7 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);

/* adjust pointers from previous entries to point past this */
- timer_get_prev_entries_for_node(tim, prev_owner, prev);
+ timer_get_prev_entries_for_node(tim, prev_owner, prev, priv_timer);
for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
if (prev[i]->sl_next[i] == tim)
prev[i]->sl_next[i] = tim->sl_next[i];
@@ -326,11 +464,13 @@ static int
__rte_timer_reset(struct rte_timer *tim, uint64_t expire,
uint64_t period, unsigned tim_lcore,
rte_timer_cb_t fct, void *arg,
- int local_is_locked)
+ int local_is_locked,
+ struct rte_timer_data *timer_data)
{
union rte_timer_status prev_status, status;
int ret;
unsigned lcore_id = rte_lcore_id();
+ struct priv_timer *priv_timer = timer_data->priv_timer;

/* round robin for tim_lcore */
if (tim_lcore == (unsigned)LCORE_ID_ANY) {
@@ -348,11 +488,11 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,

/* wait that the timer is in correct status before update,
* and mark it as being configured */
- ret = timer_set_config_state(tim, &prev_status);
+ ret = timer_set_config_state(tim, &prev_status, priv_timer);
if (ret < 0)
return -1;

- __TIMER_STAT_ADD(reset, 1);
+ __TIMER_STAT_ADD(priv_timer, reset, 1);
if (prev_status.state == RTE_TIMER_RUNNING &&
lcore_id < RTE_MAX_LCORE) {
priv_timer[lcore_id].updated = 1;
@@ -360,8 +500,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,

/* remove it from list */
if (prev_status.state == RTE_TIMER_PENDING) {
- timer_del(tim, prev_status, local_is_locked);
- __TIMER_STAT_ADD(pending, -1);
+ timer_del(tim, prev_status, local_is_locked, priv_timer);
+ __TIMER_STAT_ADD(priv_timer, pending, -1);
}

tim->period = period;
@@ -376,8 +516,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
if (tim_lcore != lcore_id || !local_is_locked)
rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);

- __TIMER_STAT_ADD(pending, 1);
- timer_add(tim, tim_lcore);
+ __TIMER_STAT_ADD(priv_timer, pending, 1);
+ timer_add(tim, tim_lcore, priv_timer);

/* update state: as we are in CONFIG state, only us can modify
* the state so we don't need to use cmpset() here */
@@ -394,9 +534,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,

/* Reset and start the timer associated with the timer handle tim */
int
-rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
- enum rte_timer_type type, unsigned tim_lcore,
- rte_timer_cb_t fct, void *arg)
+rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+ enum rte_timer_type type, unsigned int tim_lcore,
+ rte_timer_cb_t fct, void *arg)
{
uint64_t cur_time = rte_get_timer_cycles();
uint64_t period;
@@ -412,7 +552,48 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
period = 0;

return __rte_timer_reset(tim, cur_time + ticks, period, tim_lcore,
- fct, arg, 0);
+ fct, arg, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
+
+int
+rte_timer_reset_v1902(struct rte_timer *tim, uint64_t ticks,
+ enum rte_timer_type type, unsigned int tim_lcore,
+ rte_timer_cb_t fct, void *arg)
+{
+ return rte_timer_alt_reset(default_data_id, tim, ticks, type,
+ tim_lcore, fct, arg);
+}
+MAP_STATIC_SYMBOL(int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
+ enum rte_timer_type type,
+ unsigned int tim_lcore,
+ rte_timer_cb_t fct, void *arg),
+ rte_timer_reset_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_reset, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+ uint64_t ticks, enum rte_timer_type type,
+ unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
+{
+ uint64_t cur_time = rte_get_timer_cycles();
+ uint64_t period;
+ struct rte_timer_data *timer_data;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+ if (unlikely((tim_lcore != (unsigned int)LCORE_ID_ANY) &&
+ !(rte_lcore_is_enabled(tim_lcore) ||
+ rte_lcore_has_role(tim_lcore, ROLE_SERVICE))))
+ return -1;
+
+ if (type == PERIODICAL)
+ period = ticks;
+ else
+ period = 0;
+
+ return __rte_timer_reset(tim, cur_time + ticks, period, tim_lcore,
+ fct, arg, 0, timer_data);
}

/* loop until rte_timer_reset() succeed */
@@ -426,21 +607,22 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
rte_pause();
}

-/* Stop the timer associated with the timer handle tim */
-int
-rte_timer_stop(struct rte_timer *tim)
+static int
+__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
+ struct rte_timer_data *timer_data)
{
union rte_timer_status prev_status, status;
unsigned lcore_id = rte_lcore_id();
int ret;
+ struct priv_timer *priv_timer = timer_data->priv_timer;

/* wait that the timer is in correct status before update,
* and mark it as being configured */
- ret = timer_set_config_state(tim, &prev_status);
+ ret = timer_set_config_state(tim, &prev_status, priv_timer);
if (ret < 0)
return -1;

- __TIMER_STAT_ADD(stop, 1);
+ __TIMER_STAT_ADD(priv_timer, stop, 1);
if (prev_status.state == RTE_TIMER_RUNNING &&
lcore_id < RTE_MAX_LCORE) {
priv_timer[lcore_id].updated = 1;
@@ -448,8 +630,8 @@ rte_timer_stop(struct rte_timer *tim)

/* remove it from list */
if (prev_status.state == RTE_TIMER_PENDING) {
- timer_del(tim, prev_status, 0);
- __TIMER_STAT_ADD(pending, -1);
+ timer_del(tim, prev_status, local_is_locked, priv_timer);
+ __TIMER_STAT_ADD(priv_timer, pending, -1);
}

/* mark timer as stopped */
@@ -461,6 +643,33 @@ rte_timer_stop(struct rte_timer *tim)
return 0;
}

+/* Stop the timer associated with the timer handle tim */
+int
+rte_timer_stop_v20(struct rte_timer *tim)
+{
+ return __rte_timer_stop(tim, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
+
+int
+rte_timer_stop_v1902(struct rte_timer *tim)
+{
+ return rte_timer_alt_stop(default_data_id, tim);
+}
+MAP_STATIC_SYMBOL(int rte_timer_stop(struct rte_timer *tim),
+ rte_timer_stop_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_stop, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
+ struct rte_timer_data *timer_data;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+ return __rte_timer_stop(tim, 0, timer_data);
+}
+
/* loop until rte_timer_stop() succeed */
void
rte_timer_stop_sync(struct rte_timer *tim)
@@ -477,7 +686,8 @@ rte_timer_pending(struct rte_timer *tim)
}

/* must be called periodically, run all timer that expired */
-void rte_timer_manage(void)
+static void
+__rte_timer_manage(struct rte_timer_data *timer_data)
{
union rte_timer_status status;
struct rte_timer *tim, *next_tim;
@@ -486,11 +696,12 @@ void rte_timer_manage(void)
struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
uint64_t cur_time;
int i, ret;
+ struct priv_timer *priv_timer = timer_data->priv_timer;

/* timer manager only runs on EAL thread with valid lcore_id */
assert(lcore_id < RTE_MAX_LCORE);

- __TIMER_STAT_ADD(manage, 1);
+ __TIMER_STAT_ADD(priv_timer, manage, 1);
/* optimize for the case where per-cpu list is empty */
if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
return;
@@ -518,7 +729,7 @@ void rte_timer_manage(void)
tim = priv_timer[lcore_id].pending_head.sl_next[0];

/* break the existing list at current time point */
- timer_get_prev_entries(cur_time, lcore_id, prev);
+ timer_get_prev_entries(cur_time, lcore_id, prev, priv_timer);
for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
if (prev[i] == &priv_timer[lcore_id].pending_head)
continue;
@@ -563,7 +774,7 @@ void rte_timer_manage(void)
/* execute callback function with list unlocked */
tim->f(tim, tim->arg);

- __TIMER_STAT_ADD(pending, -1);
+ __TIMER_STAT_ADD(priv_timer, pending, -1);
/* the timer was stopped or reloaded by the callback
* function, we have nothing to do here */
if (priv_timer[lcore_id].updated == 1)
@@ -580,24 +791,222 @@ void rte_timer_manage(void)
/* keep it in list and mark timer as pending */
rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
status.state = RTE_TIMER_PENDING;
- __TIMER_STAT_ADD(pending, 1);
+ __TIMER_STAT_ADD(priv_timer, pending, 1);
status.owner = (int16_t)lcore_id;
rte_wmb();
tim->status.u32 = status.u32;
__rte_timer_reset(tim, tim->expire + tim->period,
- tim->period, lcore_id, tim->f, tim->arg, 1);
+ tim->period, lcore_id, tim->f, tim->arg, 1,
+ timer_data);
rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
}
}
priv_timer[lcore_id].running_tim = NULL;
}

+void
+rte_timer_manage_v20(void)
+{
+ __rte_timer_manage(&default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
+
+int
+rte_timer_manage_v1902(void)
+{
+ struct rte_timer_data *timer_data;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(default_data_id, timer_data, -EINVAL);
+
+ __rte_timer_manage(timer_data);
+
+ return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+ unsigned int *poll_lcores,
+ int nb_poll_lcores,
+ rte_timer_alt_manage_cb_t f)
+{
+ union rte_timer_status status;
+ struct rte_timer *tim, *next_tim, **pprev;
+ struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+ unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
+ unsigned int this_lcore = rte_lcore_id();
+ struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+ uint64_t cur_time;
+ int i, j, ret;
+ int nb_runlists = 0;
+ struct rte_timer_data *data;
+ struct priv_timer *privp;
+ uint32_t poll_lcore;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+ /* timer manager only runs on EAL thread with valid lcore_id */
+ assert(this_lcore < RTE_MAX_LCORE);
+
+ __TIMER_STAT_ADD(data->priv_timer, manage, 1);
+
+ if (poll_lcores == NULL) {
+ poll_lcores = (unsigned int []){rte_lcore_id()};
+ nb_poll_lcores = 1;
+ }
+
+ for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
+ poll_lcore = poll_lcores[++i]) {
+ privp = &data->priv_timer[poll_lcore];
+
+ /* optimize for the case where per-cpu list is empty */
+ if (privp->pending_head.sl_next[0] == NULL)
+ continue;
+ cur_time = rte_get_timer_cycles();
+
+#ifdef RTE_ARCH_64
+ /* on 64-bit the value cached in the pending_head.expired will
+ * be updated atomically, so we can consult that for a quick
+ * check here outside the lock
+ */
+ if (likely(privp->pending_head.expire > cur_time))
+ continue;
+#endif
+
+ /* browse ordered list, add expired timers in 'expired' list */
+ rte_spinlock_lock(&privp->list_lock);
+
+ /* if nothing to do just unlock and return */
+ if (privp->pending_head.sl_next[0] == NULL ||
+ privp->pending_head.sl_next[0]->expire > cur_time) {
+ rte_spinlock_unlock(&privp->list_lock);
+ continue;
+ }
+
+ /* save start of list of expired timers */
+ tim = privp->pending_head.sl_next[0];
+
+ /* break the existing list at current time point */
+ timer_get_prev_entries(cur_time, poll_lcore, prev,
+ data->priv_timer);
+ for (j = privp->curr_skiplist_depth - 1; j >= 0; j--) {
+ if (prev[j] == &privp->pending_head)
+ continue;
+ privp->pending_head.sl_next[j] =
+ prev[j]->sl_next[j];
+ if (prev[j]->sl_next[j] == NULL)
+ privp->curr_skiplist_depth--;
+
+ prev[j]->sl_next[j] = NULL;
+ }
+
+ /* transition run-list from PENDING to RUNNING */
+ run_first_tims[nb_runlists] = tim;
+ runlist_lcore_ids[nb_runlists] = poll_lcore;
+ pprev = &run_first_tims[nb_runlists];
+ nb_runlists++;
+
+ for ( ; tim != NULL; tim = next_tim) {
+ next_tim = tim->sl_next[0];
+
+ ret = timer_set_running_state(tim);
+ if (likely(ret == 0)) {
+ pprev = &tim->sl_next[0];
+ } else {
+ /* another core is trying to re-config this one,
+ * remove it from local expired list
+ */
+ *pprev = next_tim;
+ }
+ }
+
+ /* update the next to expire timer value */
+ privp->pending_head.expire =
+ (privp->pending_head.sl_next[0] == NULL) ? 0 :
+ privp->pending_head.sl_next[0]->expire;
+
+ rte_spinlock_unlock(&privp->list_lock);
+ }
+
+ /* Now process the run lists */
+ while (1) {
+ bool done = true;
+ uint64_t min_expire = UINT64_MAX;
+ int min_idx = 0;
+
+ /* Find the next oldest timer to process */
+ for (i = 0; i < nb_runlists; i++) {
+ tim = run_first_tims[i];
+
+ if (tim != NULL && tim->expire < min_expire) {
+ min_expire = tim->expire;
+ min_idx = i;
+ done = false;
+ }
+ }
+
+ if (done)
+ break;
+
+ tim = run_first_tims[min_idx];
+ privp = &data->priv_timer[runlist_lcore_ids[min_idx]];
+
+ /* Move down the runlist from which we picked a timer to
+ * execute
+ */
+ run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
+
+ privp->updated = 0;
+ privp->running_tim = tim;
+
+ /* Call the provided callback function */
+ f(tim);
+
+ __TIMER_STAT_ADD(privp, pending, -1);
+
+ /* the timer was stopped or reloaded by the callback
+ * function, we have nothing to do here
+ */
+ if (privp->updated == 1)
+ continue;
+
+ if (tim->period == 0) {
+ /* remove from done list and mark timer as stopped */
+ status.state = RTE_TIMER_STOP;
+ status.owner = RTE_TIMER_NO_OWNER;
+ rte_wmb();
+ tim->status.u32 = status.u32;
+ } else {
+ /* keep it in list and mark timer as pending */
+ rte_spinlock_lock(
+ &data->priv_timer[this_lcore].list_lock);
+ status.state = RTE_TIMER_PENDING;
+ __TIMER_STAT_ADD(data->priv_timer, pending, 1);
+ status.owner = (int16_t)this_lcore;
+ rte_wmb();
+ tim->status.u32 = status.u32;
+ __rte_timer_reset(tim, tim->expire + tim->period,
+ tim->period, this_lcore, tim->f, tim->arg, 1,
+ data);
+ rte_spinlock_unlock(
+ &data->priv_timer[this_lcore].list_lock);
+ }
+
+ privp->running_tim = NULL;
+ }
+
+ return 0;
+}
+
/* dump statistics about timers */
-void rte_timer_dump_stats(FILE *f)
+static void
+__rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
{
#ifdef RTE_LIBRTE_TIMER_DEBUG
struct rte_timer_debug_stats sum;
unsigned lcore_id;
+ struct priv_timer *priv_timer = timer_data->priv_timer;

memset(&sum, 0, sizeof(sum));
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -615,3 +1024,31 @@ void rte_timer_dump_stats(FILE *f)
fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
#endif
}
+
+void
+rte_timer_dump_stats_v20(FILE *f)
+{
+ __rte_timer_dump_stats(&default_timer_data, f);
+}
+VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
+
+int
+rte_timer_dump_stats_v1902(FILE *f)
+{
+ return rte_timer_alt_dump_stats(default_data_id, f);
+}
+MAP_STATIC_SYMBOL(int rte_timer_dump_stats(FILE *f),
+ rte_timer_dump_stats_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_dump_stats, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
+{
+ struct rte_timer_data *timer_data;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+ __rte_timer_dump_stats(timer_data, f);
+
+ return 0;
+}
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9b95cd2..82f5fba 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -39,6 +39,7 @@
#include <stddef.h>
#include <rte_common.h>
#include <rte_config.h>
+#include <rte_spinlock.h>

#ifdef __cplusplus
extern "C" {
@@ -132,12 +133,68 @@ struct rte_timer
#endif

/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Allocate a timer data instance in shared memory to track a set of pending
+ * timer lists.
+ *
+ * @param id_ptr
+ * Pointer to variable into which to write the identifier of the allocated
+ * timer data instance.
+ *
+ * @return
+ * - 0: Success
+ * - -ENOSPC: maximum number of timer data instances already allocated
+ */
+int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Deallocate a timer data instance.
+ *
+ * @param id
+ * Identifier of the timer data instance to deallocate.
+ *
+ * @return
+ * - 0: Success
+ * - -EINVAL: invalid timer data instance identifier
+ */
+int __rte_experimental rte_timer_data_dealloc(uint32_t id);
+
+/**
* Initialize the timer library.
*
* Initializes internal variables (list, locks and so on) for the RTE
* timer library.
*/
-void rte_timer_subsystem_init(void);
+void rte_timer_subsystem_init_v20(void);
+
+/**
+ * Initialize the timer library.
+ *
+ * Initializes internal variables (list, locks and so on) for the RTE
+ * timer library.
+ *
+ * @return
+ * - 0: Success
+ * - -EEXIST: Returned in secondary process when primary process has not
+ * yet initialized the timer subsystem
+ * - -ENOMEM: Unable to allocate memory needed to initialize timer
+ * subsystem
+ */
+int rte_timer_subsystem_init_v1902(void);
+int rte_timer_subsystem_init(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Free timer subsystem resources.
+ */
+void __rte_experimental rte_timer_subsystem_finalize(void);

/**
* Initialize a timer handle.
@@ -193,6 +250,12 @@ void rte_timer_init(struct rte_timer *tim);
* - 0: Success; the timer is scheduled.
* - (-1): Timer is in the RUNNING or CONFIG state.
*/
+int rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+ enum rte_timer_type type, unsigned int tim_lcore,
+ rte_timer_cb_t fct, void *arg);
+int rte_timer_reset_v1902(struct rte_timer *tim, uint64_t ticks,
+ enum rte_timer_type type, unsigned int tim_lcore,
+ rte_timer_cb_t fct, void *arg);
int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
enum rte_timer_type type, unsigned tim_lcore,
rte_timer_cb_t fct, void *arg);
@@ -252,9 +315,10 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
* - 0: Success; the timer is stopped.
* - (-1): The timer is in the RUNNING or CONFIG state.
*/
+int rte_timer_stop_v20(struct rte_timer *tim);
+int rte_timer_stop_v1902(struct rte_timer *tim);
int rte_timer_stop(struct rte_timer *tim);

-
/**
* Loop until rte_timer_stop() succeeds.
*
@@ -292,7 +356,25 @@ int rte_timer_pending(struct rte_timer *tim);
* function. However, the more often the function is called, the more
* CPU resources it will use.
*/
-void rte_timer_manage(void);
+void rte_timer_manage_v20(void);
+
+/**
+ * Manage the timer list and execute callback functions.
+ *
+ * This function must be called periodically from EAL lcores
+ * main_loop(). It browses the list of pending timers and runs all
+ * timers that are expired.
+ *
+ * The precision of the timer depends on the call frequency of this
+ * function. However, the more often the function is called, the more
+ * CPU resources it will use.
+ *
+ * @return
+ * - 0: Success
+ * - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_manage_v1902(void);
+int rte_timer_manage(void);

/**
* Dump statistics about timers.
@@ -300,7 +382,143 @@ void rte_timer_manage(void);
* @param f
* A pointer to a file for output
*/
-void rte_timer_dump_stats(FILE *f);
+void rte_timer_dump_stats_v20(FILE *f);
+
+/**
+ * Dump statistics about timers.
+ *
+ * @param f
+ * A pointer to a file for output
+ * @return
+ * - 0: Success
+ * - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_dump_stats_v1902(FILE *f);
+int rte_timer_dump_stats(FILE *f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_reset(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list to which
+ * the timer should be added.
+ *
+ * @see rte_timer_reset()
+ *
+ * @param timer_data_id
+ * An identifier indicating which instance of timer data should be used for
+ * this operation.
+ * @param tim
+ * The timer handle.
+ * @param ticks
+ * The number of cycles (see rte_get_hpet_hz()) before the callback
+ * function is called.
+ * @param type
+ * The type can be either:
+ * - PERIODICAL: The timer is automatically reloaded after execution
+ * (returns to the PENDING state)
+ * - SINGLE: The timer is one-shot, that is, the timer goes to a
+ * STOPPED state after execution.
+ * @param tim_lcore
+ * The ID of the lcore where the timer callback function has to be
+ * executed. If tim_lcore is LCORE_ID_ANY, the timer library will
+ * launch it on a different core for each call (round-robin).
+ * @param fct
+ * The callback function of the timer. This parameter can be NULL if (and
+ * only if) rte_timer_alt_manage() will be used to manage this timer.
+ * @param arg
+ * The user argument of the callback function.
+ * @return
+ * - 0: Success; the timer is scheduled.
+ * - (-1): Timer is in the RUNNING or CONFIG state.
+ * - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+ uint64_t ticks, enum rte_timer_type type,
+ unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_stop(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list from which
+ * this timer should be removed.
+ *
+ * @see rte_timer_stop()
+ *
+ * @param timer_data_id
+ * An identifier indicating which instance of timer data should be used for
+ * this operation.
+ * @param tim
+ * The timer handle.
+ * @return
+ * - 0: Success; the timer is stopped.
+ * - (-1): The timer is in the RUNNING or CONFIG state.
+ * - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
+
+/**
+ * Callback function type for rte_timer_alt_manage().
+ */
+typedef void (*rte_timer_alt_manage_cb_t)(void *);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Manage a set of timer lists and execute the specified callback function for
+ * all expired timers. This function is similar to rte_timer_manage(), except
+ * that it allows a caller to specify the timer_data instance that should
+ * be operated on, as well as a set of lcore IDs identifying which timer lists
+ * should be processed. Callback functions of individual timers are ignored.
+ *
+ * @see rte_timer_manage()
+ *
+ * @param timer_data_id
+ * An identifier indicating which instance of timer data should be used for
+ * this operation.
+ * @param poll_lcores
+ * An array of lcore ids identifying the timer lists that should be processed.
+ * NULL is allowed - if NULL, the timer list corresponding to the lcore
+ * calling this routine is processed (same as rte_timer_manage()).
+ * @param n_poll_lcores
+ * The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
+ * is ignored.
+ * @param f
+ * The callback function which should be called for all expired timers.
+ * @return
+ * - 0: success
+ * - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
+ int n_poll_lcores, rte_timer_alt_manage_cb_t f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_dump_stats(), except that it allows
+ * the caller to specify the rte_timer_data instance that should be used.
+ *
+ * @see rte_timer_dump_stats()
+ *
+ * @param timer_data_id
+ * An identifier indicating which instance of timer data should be used for
+ * this operation.
+ * @param f
+ * A pointer to a file for output
+ * @return
+ * - 0: success
+ * - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);

#ifdef __cplusplus
}
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 9b2e4b8..b3f4b6c 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -13,3 +13,25 @@ DPDK_2.0 {

local: *;
};
+
+DPDK_19.02 {
+ global:
+
+ rte_timer_dump_stats;
+ rte_timer_manage;
+ rte_timer_reset;
+ rte_timer_stop;
+ rte_timer_subsystem_init;
+} DPDK_2.0;
+
+EXPERIMENTAL {
+ global:
+
+ rte_timer_alt_dump_stats;
+ rte_timer_alt_manage;
+ rte_timer_alt_reset;
+ rte_timer_alt_stop;
+ rte_timer_data_alloc;
+ rte_timer_data_dealloc;
+ rte_timer_subsystem_finalize;
+};

--
2.6.4

Stephen Hemminger

2018-12-07 18:10:11 UTC

On Fri, 7 Dec 2018 11:52:59 -0600

Post by Erik Gabriel Carrillo
Currently, the timer library uses a per-process table of structures to
manage skiplists of timers presumably because timers contain arbitrary
function pointers whose value may not resolve properly in other
processes.
However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists. This would let timers be
used in more multi-process scenarios.
The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory. The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1]. New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.
Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.
[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Makes sense but it looks to me like an ABI breakage. Experimental isn't going to
work for this.

Post by Erik Gabriel Carrillo
+static uint32_t default_data_id; // id set to zero automatically

C++ style comments are not allowed per DPDK coding style.
Best to just drop the comment, it is stating the obvious.

Post by Erik Gabriel Carrillo
-/* Init the timer library. */
+static inline int
+timer_data_valid(uint32_t id)
+{
+ return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}

Don't need inline on static functions.
...

Post by Erik Gabriel Carrillo
+MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+ unsigned int *poll_lcores,
+ int nb_poll_lcores,
+ rte_timer_alt_manage_cb_t f)
+{
+ union rte_timer_status status;
+ struct rte_timer *tim, *next_tim, **pprev;
+ struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+ unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
+ unsigned int this_lcore = rte_lcore_id();
+ struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+ uint64_t cur_time;
+ int i, j, ret;
+ int nb_runlists = 0;
+ struct rte_timer_data *data;
+ struct priv_timer *privp;
+ uint32_t poll_lcore;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+ /* timer manager only runs on EAL thread with valid lcore_id */
+ assert(this_lcore < RTE_MAX_LCORE);
+
+ __TIMER_STAT_ADD(data->priv_timer, manage, 1);
+
+ if (poll_lcores == NULL) {
+ poll_lcores = (unsigned int []){rte_lcore_id()};

This isn't going to be safe. It assigns poll_lcores to an array
allocated on the stack.

Post by Erik Gabriel Carrillo
+
+ for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
+ poll_lcore = poll_lcores[++i]) {
+ privp = &data->priv_timer[poll_lcore];
+
+ /* optimize for the case where per-cpu list is empty */
+ if (privp->pending_head.sl_next[0] == NULL)
+ continue;
+ cur_time = rte_get_timer_cycles();
+
+#ifdef RTE_ARCH_64
+ /* on 64-bit the value cached in the pending_head.expired will
+ * be updated atomically, so we can consult that for a quick
+ * check here outside the lock
+ */
+ if (likely(privp->pending_head.expire > cur_time))
+ continue;
+#endif

This code needs to be optimized so that an application can call it at a very
high rate without performance impact.
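
The comment does not say which part of the routine is considered too slow. As a reference point, the fast path the quoted hunk already contains has roughly the following shape: a cached next-expiry value is read without the lock, and the lock is only taken when something is due. This is a minimal sketch under assumptions, not the DPDK implementation; `struct timer_list`, `poll_list` and the "0 means empty" convention are hypothetical, and C11 atomics stand in for the DPDK spinlock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-list state; not the DPDK struct priv_timer. */
struct timer_list {
	_Atomic uint64_t next_expire;	/* assumption: 0 means list is empty */
	atomic_flag lock;		/* guards the real list (not shown) */
	unsigned int slow_path_entries;	/* how often the lock was taken */
};

/* Returns true only when the locked slow path had to run. */
static bool poll_list(struct timer_list *tl, uint64_t now)
{
	uint64_t next = atomic_load_explicit(&tl->next_expire,
					     memory_order_relaxed);

	/* Fast path: nothing pending, or nothing due yet; no lock taken. */
	if (next == 0 || next > now)
		return false;

	while (atomic_flag_test_and_set_explicit(&tl->lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
	tl->slow_path_entries++;
	/* ... expired timers would be unlinked and run here ... */
	atomic_store_explicit(&tl->next_expire, 0, memory_order_relaxed);
	atomic_flag_clear_explicit(&tl->lock, memory_order_release);

	return true;
}
```

With this shape, a caller that polls far more often than timers expire pays one relaxed load and one compare per list on most calls.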

Carrillo, Erik G

2018-12-07 19:21:55 UTC

Hi Stephen,

-----Original Message-----
Sent: Friday, December 7, 2018 12:10 PM
Subject: Re: [dpdk-dev] [PATCH v2 1/2] timer: allow timer management in
shared memory
On Fri, 7 Dec 2018 11:52:59 -0600

Post by Erik Gabriel Carrillo
Currently, the timer library uses a per-process table of structures to
manage skiplists of timers presumably because timers contain arbitrary
function pointers whose value may not resolve properly in other
processes.
However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists. This would let timers
be used in more multi-process scenarios.
The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory. The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1]. New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.
Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists
per invocation.
[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Makes sense but it looks to me like an ABI breakage. Experimental isn't going
to work for this.

For APIs that existed prior to this patch, I've duplicated them in a "19.02" node in
the map file; I only marked new APIs as experimental. I versioned each API in
order to maintain the prior interface as well. I tested ABI compatibility
with devtools/validate-abi.sh; it reported no errors detected. So I believe this
won't break the ABI, but if I need to change something I certainly will.

Post by Erik Gabriel Carrillo
+static uint32_t default_data_id; // id set to zero automatically

C++ style comments are not allowed per DPDK coding style.
Best to just drop the comment, it is stating the obvious.

Sure - will do.

Post by Erik Gabriel Carrillo
-/* Init the timer library. */
+static inline int
+timer_data_valid(uint32_t id)
+{
+ return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}

Don't need inline on static functions.
...

Post by Erik Gabriel Carrillo
+MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+ unsigned int *poll_lcores,
+ int nb_poll_lcores,
+ rte_timer_alt_manage_cb_t f)
+{
+ union rte_timer_status status;
+ struct rte_timer *tim, *next_tim, **pprev;
+ struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+ unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
+ unsigned int this_lcore = rte_lcore_id();
+ struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+ uint64_t cur_time;
+ int i, j, ret;
+ int nb_runlists = 0;
+ struct rte_timer_data *data;
+ struct priv_timer *privp;
+ uint32_t poll_lcore;
+
+ TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+ /* timer manager only runs on EAL thread with valid lcore_id */
+ assert(this_lcore < RTE_MAX_LCORE);
+
+ __TIMER_STAT_ADD(data->priv_timer, manage, 1);
+
+ if (poll_lcores == NULL) {
+ poll_lcores = (unsigned int []){rte_lcore_id()};

This isn't going to be safe. It assigns poll_lcores to an array allocated on the
stack.

poll_lcores is allowed to be NULL when rte_timer_alt_manage() is called for
convenience; if it is NULL, then we create an array on the stack
containing one item and point poll_lcores at it. poll_lcores only needs to be
valid for the invocation of the function, so pointing to an array on the stack
seems fine. Did I miss the point?
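
For reference on this exchange: C11 (6.5.2.5) gives a compound literal that appears inside a function automatic storage duration associated with the enclosing block, and in the quoted hunk that block is the body of the `if`, not the whole function. One way to sidestep the question is a local array declared at function scope. The quoted loop header also reads `poll_lcores[++i]` before testing the bound, so the sketch below tests the bound first. It is a minimal illustration under those assumptions; `current_lcore` and `visit_poll_lcores` are hypothetical names, not DPDK functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for rte_lcore_id(); not the DPDK function. */
static unsigned int current_lcore(void)
{
	return 3;
}

/*
 * Visit every lcore id in poll_lcores and return how many were visited.
 * When poll_lcores is NULL, fall back to the calling lcore only.
 * The last id visited is written to *last_seen.
 */
static int visit_poll_lcores(const unsigned int *poll_lcores,
			     int nb_poll_lcores, unsigned int *last_seen)
{
	/* Function-scope storage: stays valid until the function returns. */
	unsigned int default_lcore[1];
	int i, visited = 0;

	if (poll_lcores == NULL) {
		default_lcore[0] = current_lcore();
		poll_lcores = default_lcore;
		nb_poll_lcores = 1;
	}

	/* The bound is tested before each element is read. */
	for (i = 0; i < nb_poll_lcores; i++) {
		*last_seen = poll_lcores[i];
		visited++;
	}

	return visited;
}
```

Calling `visit_poll_lcores(NULL, 0, &last)` visits exactly one entry, the value returned by the stand-in `current_lcore()`.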

Post by Erik Gabriel Carrillo
+ /* on 64-bit the value cached in the pending_head.expired will
+ * be updated atomically, so we can consult that for a quick
+ * check here outside the lock
+ */
+ if (likely(privp->pending_head.expire > cur_time))
+ continue;
+#endif

This code needs to be optimized so that an application can call it at a very
high rate without performance impact.

Erik Gabriel Carrillo

2018-12-07 17:53:00 UTC

--
2.6.4

Erik Gabriel Carrillo

2018-12-07 20:34:44 UTC

This patch introduces a new version of the event timer adapter software
PMD [1]. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a service
core in the primary process dequeued them and processed them further. To
improve performance, this version does away with the ring and lets lcores in
both primary and secondary processes insert timers directly into timer
skiplist data structures; the service core directly accesses the lists as
well, when looking for timers that have expired. (This behavior requires
the patch to the timer library that is referenced below.)
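
As a rough illustration of how a single service core might drain several per-lcore lists, the sketch below repeatedly picks the earliest head among a set of already-sorted run lists, in the spirit of the "find the next oldest timer" loop in the timer library patch this depends on. It is a simplified sketch under assumptions, not the adapter's code: `struct tnode`, `run_in_expiry_order`, `struct expiry_log` and `record_expiry` are hypothetical, and locking and timer state transitions are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal timer node; not the DPDK struct rte_timer. */
struct tnode {
	uint64_t expire;
	struct tnode *next;
};

/*
 * Repeatedly pick the earliest head among nb_lists sorted run lists and
 * hand it to cb, so timers are processed in global expiry order.
 * Returns the number of timers processed; heads[] is consumed, so every
 * entry is NULL on return.
 */
static int run_in_expiry_order(struct tnode **heads, int nb_lists,
			       void (*cb)(struct tnode *, void *), void *arg)
{
	int processed = 0;

	for (;;) {
		struct tnode *t;
		int i, min_idx = -1;

		for (i = 0; i < nb_lists; i++) {
			if (heads[i] == NULL)
				continue;
			if (min_idx < 0 ||
			    heads[i]->expire < heads[min_idx]->expire)
				min_idx = i;
		}
		if (min_idx < 0)
			break;	/* every run list is empty */

		t = heads[min_idx];
		heads[min_idx] = t->next;	/* advance the chosen list */
		cb(t, arg);
		processed++;
	}

	return processed;
}

/* Example callback: append each expiry time to a caller-provided log. */
struct expiry_log {
	uint64_t seen[8];
	int n;
};

static void record_expiry(struct tnode *t, void *arg)
{
	struct expiry_log *log = arg;

	if (log->n < 8)
		log->seen[log->n++] = t->expire;
}
```

A linear scan over the list heads keeps the sketch short; with at most one run list per polled lcore, the number of heads is bounded by the number of lcores.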

Depends on: https://patches.dpdk.org/project/dpdk/list/?series=2699

[1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html

Changes in v2:
- split this change out into its own patch series

Erik Gabriel Carrillo (1):
eventdev: add new software event timer adapter

lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
1 file changed, 275 insertions(+), 412 deletions(-)

--
2.6.4

Erik Gabriel Carrillo

2018-12-07 20:34:45 UTC

This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued them and processed them
further. To improve performance, this version does away with the ring
and lets lcores in both primary and secondary processes insert timers
directly into timer skiplist data structures; the service core directly
accesses the lists as well, when looking for timers that have expired.

Signed-off-by: Erik Gabriel Carrillo <***@intel.com>
---
lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
1 file changed, 275 insertions(+), 412 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 79070d4..9c528cb 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -7,6 +7,7 @@
#include <inttypes.h>
#include <stdbool.h>
#include <sys/queue.h>
+#include <assert.h>

#include <rte_memzone.h>
#include <rte_memory.h>
@@ -19,6 +20,7 @@
#include <rte_timer.h>
#include <rte_service_component.h>
#include <rte_cycles.h>
+#include <rte_random.h>

#include "rte_eventdev.h"
#include "rte_eventdev_pmd.h"
@@ -34,7 +36,7 @@ static int evtim_buffer_logtype;

static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];

-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;

#define EVTIM_LOG(level, logtype, ...) \
rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +213,7 @@ rte_event_timer_adapter_create_ext(
* implementation.
*/
if (adapter->ops == NULL)
- adapter->ops = &sw_event_adapter_timer_ops;
+ adapter->ops = &swtim_ops;

/* Allow driver to do some setup */
FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -334,7 +336,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
* implementation.
*/
if (adapter->ops == NULL)
- adapter->ops = &sw_event_adapter_timer_ops;
+ adapter->ops = &swtim_ops;

/* Set fast-path function pointers */
adapter->arm_burst = adapter->ops->arm_burst;
@@ -491,6 +493,7 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
}

*nb_events_inv = 0;
+
*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
&events[tail_idx], n);
if (*nb_events_flushed != n && rte_errno == -EINVAL) {
@@ -498,137 +501,123 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
(*nb_events_inv)++;
}

+ if (*nb_events_flushed > 0)
+ EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+ "device", *nb_events_flushed);
+
bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
}

/*
* Software event timer adapter implementation
*/
-
-struct rte_event_timer_adapter_sw_data {
- /* List of messages for outstanding timers */
- TAILQ_HEAD(, msg) msgs_tailq_head;
- /* Lock to guard tailq and armed count */
- rte_spinlock_t msgs_tailq_sl;
+struct swtim {
/* Identifier of service executing timer management logic. */
uint32_t service_id;
/* The cycle count at which the adapter should next tick */
uint64_t next_tick_cycles;
- /* Incremented as the service moves through phases of an iteration */
- volatile int service_phase;
/* The tick resolution used by adapter instance. May have been
* adjusted from what user requested
*/
uint64_t timer_tick_ns;
/* Maximum timeout in nanoseconds allowed by adapter instance. */
uint64_t max_tmo_ns;
- /* Ring containing messages to arm or cancel event timers */
- struct rte_ring *msg_ring;
- /* Mempool containing msg objects */
- struct rte_mempool *msg_pool;
/* Buffered timer expiry events to be enqueued to an event device. */
struct event_buffer buffer;
/* Statistics */
struct rte_event_timer_adapter_stats stats;
- /* The number of threads currently adding to the message ring */
- rte_atomic16_t message_producer_count;
+ /* Mempool of timer objects */
+ struct rte_mempool *tim_pool;
+ /* Back pointer for convenience */
+ struct rte_event_timer_adapter *adapter;
+ /* Identifier of timer data instance */
+ uint32_t timer_data_id;
+ /* Track which cores have actually armed a timer */
+ rte_atomic16_t in_use[RTE_MAX_LCORE];
+ /* Track which cores' timer lists should be polled */
+ unsigned int poll_lcores[RTE_MAX_LCORE];
+ /* The number of lists that should be polled */
+ int n_poll_lcores;
+ /* Lock to atomically access the above two variables */
+ rte_spinlock_t poll_lcores_sl;
};

-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
- enum msg_type type;
- struct rte_event_timer *evtim;
- struct rte_timer tim;
- TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+ return adapter->data->adapter_priv;
+}

static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(void *arg)
{
- int ret;
+ struct rte_timer *tim = arg;
+ struct rte_event_timer *evtim = tim->arg;
+ struct rte_event_timer_adapter *adapter;
+ struct swtim *sw;
uint16_t nb_evs_flushed = 0;
uint16_t nb_evs_invalid = 0;
uint64_t opaque;
- struct rte_event_timer *evtim;
- struct rte_event_timer_adapter *adapter;
- struct rte_event_timer_adapter_sw_data *sw_data;
+ int ret;

- evtim = arg;
opaque = evtim->impl_opaque[1];
adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
- sw_data = adapter->data->adapter_priv;
+ sw = swtim_pmd_priv(adapter);

- ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+ ret = event_buffer_add(&sw->buffer, &evtim->ev);
if (ret < 0) {
/* If event buffer is full, put timer back in list with
* immediate expiry value, so that we process it again on the
* next iteration.
*/
- rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
- sw_event_timer_cb, evtim);
+ rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+ rte_lcore_id(), NULL, evtim);
+
+ sw->stats.evtim_retry_count++;

- sw_data->stats.evtim_retry_count++;
EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
"immediate expiry value");
} else {
- struct msg *m = container_of(tim, struct msg, tim);
- TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
- evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+ rte_mempool_put(sw->tim_pool, tim);
+ sw->stats.evtim_exp_count++;

- /* Free the msg object containing the rte_timer now that
- * we've buffered its event successfully.
- */
- rte_mempool_put(sw_data->msg_pool, m);
-
- /* Bump the count when we successfully add an expiry event to
- * the buffer.
- */
- sw_data->stats.evtim_exp_count++;
+ evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
}

- if (event_buffer_batch_ready(&sw_data->buffer)) {
- event_buffer_flush(&sw_data->buffer,
+ if (event_buffer_batch_ready(&sw->buffer)) {
+ event_buffer_flush(&sw->buffer,
adapter->data->event_dev_id,
adapter->data->event_port_id,
&nb_evs_flushed,
&nb_evs_invalid);

- sw_data->stats.ev_enq_count += nb_evs_flushed;
- sw_data->stats.ev_inv_count += nb_evs_invalid;
+ sw->stats.ev_enq_count += nb_evs_flushed;
+ sw->stats.ev_inv_count += nb_evs_invalid;
}
}

static __rte_always_inline uint64_t
get_timeout_cycles(struct rte_event_timer *evtim,
- struct rte_event_timer_adapter *adapter)
+ const struct rte_event_timer_adapter *adapter)
{
- uint64_t timeout_ns;
- struct rte_event_timer_adapter_sw_data *sw_data;
-
- sw_data = adapter->data->adapter_priv;
- timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+ struct swtim *sw = swtim_pmd_priv(adapter);
+ uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
}

/* This function returns true if one or more (adapter) ticks have occurred since
* the last time it was called.
*/
static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
{
uint64_t cycles_per_adapter_tick, start_cycles;
uint64_t *next_tick_cyclesp;
- struct rte_event_timer_adapter_sw_data *sw_data;
-
- sw_data = adapter->data->adapter_priv;
- next_tick_cyclesp = &sw_data->next_tick_cycles;

- cycles_per_adapter_tick = sw_data->timer_tick_ns *
+ next_tick_cyclesp = &sw->next_tick_cycles;
+ cycles_per_adapter_tick = sw->timer_tick_ns *
(rte_get_timer_hz() / NSECPERSEC);
-
start_cycles = rte_get_timer_cycles();

/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -640,7 +629,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
* boundary.
*/
start_cycles -= start_cycles % cycles_per_adapter_tick;
-
*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;

return true;
@@ -655,15 +643,12 @@ check_timeout(struct rte_event_timer *evtim,
const struct rte_event_timer_adapter *adapter)
{
uint64_t tmo_nsec;
- struct rte_event_timer_adapter_sw_data *sw_data;
-
- sw_data = adapter->data->adapter_priv;
- tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
+ struct swtim *sw = swtim_pmd_priv(adapter);

- if (tmo_nsec > sw_data->max_tmo_ns)
+ tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+ if (tmo_nsec > sw->max_tmo_ns)
return -1;
-
- if (tmo_nsec < sw_data->timer_tick_ns)
+ if (tmo_nsec < sw->timer_tick_ns)
return -2;

return 0;
@@ -691,110 +676,34 @@ check_destination_event_queue(struct rte_event_timer *evtim,
return 0;
}

-#define NB_OBJS 32
static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
{
- int i, num_msgs;
- uint64_t cycles, opaque;
+ struct rte_event_timer_adapter *adapter = arg;
+ struct swtim *sw = swtim_pmd_priv(adapter);
uint16_t nb_evs_flushed = 0;
uint16_t nb_evs_invalid = 0;
- struct rte_event_timer_adapter *adapter;
- struct rte_event_timer_adapter_sw_data *sw_data;
- struct rte_event_timer *evtim = NULL;
- struct rte_timer *tim = NULL;
- struct msg *msg, *msgs[NB_OBJS];
-
- adapter = arg;
- sw_data = adapter->data->adapter_priv;
-
- sw_data->service_phase = 1;
- rte_smp_wmb();
-
- while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
- !rte_ring_empty(sw_data->msg_ring)) {
-
- num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
- (void **)msgs, NB_OBJS, NULL);
-
- for (i = 0; i < num_msgs; i++) {
- int ret = 0;
-
- RTE_SET_USED(ret);
-
- msg = msgs[i];
- evtim = msg->evtim;
-
- switch (msg->type) {
- case MSG_TYPE_ARM:
- EVTIM_SVC_LOG_DBG("dequeued ARM message from "
- "ring");
- tim = &msg->tim;
- rte_timer_init(tim);
- cycles = get_timeout_cycles(evtim,
- adapter);
- ret = rte_timer_reset(tim, cycles, SINGLE,
- rte_lcore_id(),
- sw_event_timer_cb,
- evtim);
- RTE_ASSERT(ret == 0);
-
- evtim->impl_opaque[0] = (uintptr_t)tim;
- evtim->impl_opaque[1] = (uintptr_t)adapter;
-
- TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
- msg,
- msgs);
- break;
- case MSG_TYPE_CANCEL:
- EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
- "from ring");
- opaque = evtim->impl_opaque[0];
- tim = (struct rte_timer *)(uintptr_t)opaque;
- RTE_ASSERT(tim != NULL);
-
- ret = rte_timer_stop(tim);
- RTE_ASSERT(ret == 0);
-
- /* Free the msg object for the original arm
- * request.
- */
- struct msg *m;
- m = container_of(tim, struct msg, tim);
- TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
- msgs);
- rte_mempool_put(sw_data->msg_pool, m);
-
- /* Free the msg object for the current msg */
- rte_mempool_put(sw_data->msg_pool, msg);
-
- evtim->impl_opaque[0] = 0;
- evtim->impl_opaque[1] = 0;
-
- break;
- }
- }
- }
-
- sw_data->service_phase = 2;
- rte_smp_wmb();

- if (adapter_did_tick(adapter)) {
- rte_timer_manage();
+ if (swtim_did_tick(sw)) {
+ /* This lock is seldom acquired on the arm side */
+ rte_spinlock_lock(&sw->poll_lcores_sl);
+ rte_timer_alt_manage(sw->timer_data_id,
+ sw->poll_lcores,
+ sw->n_poll_lcores,
+ swtim_callback);
+ rte_spinlock_unlock(&sw->poll_lcores_sl);

- event_buffer_flush(&sw_data->buffer,
+ event_buffer_flush(&sw->buffer,
adapter->data->event_dev_id,
adapter->data->event_port_id,
- &nb_evs_flushed, &nb_evs_invalid);
+ &nb_evs_flushed,
+ &nb_evs_invalid);

- sw_data->stats.ev_enq_count += nb_evs_flushed;
- sw_data->stats.ev_inv_count += nb_evs_invalid;
- sw_data->stats.adapter_tick_count++;
+ sw->stats.ev_enq_count += nb_evs_flushed;
+ sw->stats.ev_inv_count += nb_evs_invalid;
+ sw->stats.adapter_tick_count++;
}

- sw_data->service_phase = 0;
- rte_smp_wmb();
-
return 0;
}

@@ -828,168 +737,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
return cache_size;
}

-#define SW_MIN_INTERVAL 1E5
-
static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
{
- int ret;
- struct rte_event_timer_adapter_sw_data *sw_data;
- uint64_t nb_timers;
+ int i, ret;
+ struct swtim *sw;
unsigned int flags;
struct rte_service_spec service;
- static bool timer_subsystem_inited; // static initialized to false

- /* Allocate storage for SW implementation data */
- char priv_data_name[RTE_RING_NAMESIZE];
- snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
- adapter->data->id);
- adapter->data->adapter_priv = rte_zmalloc_socket(
- priv_data_name,
- sizeof(struct rte_event_timer_adapter_sw_data),
- RTE_CACHE_LINE_SIZE,
- adapter->data->socket_id);
- if (adapter->data->adapter_priv == NULL) {
+ /* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+ char swtim_name[SWTIM_NAMESIZE];
+ snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+ adapter->data->id);
+ sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+ adapter->data->socket_id);
+ if (sw == NULL) {
EVTIM_LOG_ERR("failed to allocate space for private data");
rte_errno = ENOMEM;
return -1;
}

- if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
- EVTIM_LOG_ERR("failed to create adapter with requested tick "
- "interval");
- rte_errno = EINVAL;
- return -1;
- }
-
- sw_data = adapter->data->adapter_priv;
-
- sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
- sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
+ /* Connect storage to adapter instance */
+ adapter->data->adapter_priv = sw;
+ sw->adapter = adapter;

- TAILQ_INIT(&sw_data->msgs_tailq_head);
- rte_spinlock_init(&sw_data->msgs_tailq_sl);
- rte_atomic16_init(&sw_data->message_producer_count);
-
- /* Rings require power of 2, so round up to next such value */
- nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
-
- char msg_ring_name[RTE_RING_NAMESIZE];
- snprintf(msg_ring_name, RTE_RING_NAMESIZE,
- "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
- flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
- RING_F_SP_ENQ | RING_F_SC_DEQ :
- RING_F_SC_DEQ;
- sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
- adapter->data->socket_id, flags);
- if (sw_data->msg_ring == NULL) {
- EVTIM_LOG_ERR("failed to create message ring");
- rte_errno = ENOMEM;
- goto free_priv_data;
- }
+ sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+ sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;

- char pool_name[RTE_RING_NAMESIZE];
- snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+ /* Create a timer pool */
+ char pool_name[SWTIM_NAMESIZE];
+ snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
adapter->data->id);
-
- /* Both the arming/canceling thread and the service thread will do puts
- * to the mempool, but if the SP_PUT flag is enabled, we can specify
- * single-consumer get for the mempool.
- */
- flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
- MEMPOOL_F_SC_GET : 0;
-
- /* The usable size of a ring is count - 1, so subtract one here to
- * make the counts agree.
- */
+ /* Optimal mempool size is a power of 2 minus one */
+ uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
int pool_size = nb_timers - 1;
int cache_size = compute_msg_mempool_cache_size(
adapter->data->conf.nb_timers, nb_timers);
- sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
- sizeof(struct msg), cache_size,
- 0, NULL, NULL, NULL, NULL,
- adapter->data->socket_id, flags);
- if (sw_data->msg_pool == NULL) {
- EVTIM_LOG_ERR("failed to create message object mempool");
+ flags = 0; /* pool is multi-producer, multi-consumer */
+ sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+ sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+ NULL, NULL, adapter->data->socket_id, flags);
+ if (sw->tim_pool == NULL) {
+ EVTIM_LOG_ERR("failed to create timer object mempool");
rte_errno = ENOMEM;
- goto free_msg_ring;
+ goto free_alloc;
+ }
+
+ /* Initialize the variables that track in-use timer lists */
+ rte_spinlock_init(&sw->poll_lcores_sl);
+ for (i = 0; i < RTE_MAX_LCORE; i++)
+ rte_atomic16_init(&sw->in_use[i]);
+
+ /* Initialize the timer subsystem and allocate timer data instance */
+ ret = rte_timer_subsystem_init();
+ if (ret < 0) {
+ if (ret != -EALREADY) {
+ EVTIM_LOG_ERR("failed to initialize timer subsystem");
+ rte_errno = ret;
+ goto free_mempool;
+ }
+ }
+
+ ret = rte_timer_data_alloc(&sw->timer_data_id);
+ if (ret < 0) {
+ EVTIM_LOG_ERR("failed to allocate timer data instance");
+ rte_errno = ret;
+ goto free_mempool;
}

- event_buffer_init(&sw_data->buffer);
+ /* Initialize timer event buffer */
+ event_buffer_init(&sw->buffer);
+
+ sw->adapter = adapter;

/* Register a service component to run adapter logic */
memset(&service, 0, sizeof(service));
snprintf(service.name, RTE_SERVICE_NAME_MAX,
- "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+ "swtim_svc_%"PRIu8, adapter->data->id);
service.socket_id = adapter->data->socket_id;
- service.callback = sw_event_timer_adapter_service_func;
+ service.callback = swtim_service_func;
service.callback_userdata = adapter;
service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
- ret = rte_service_component_register(&service, &sw_data->service_id);
+ ret = rte_service_component_register(&service, &sw->service_id);
if (ret < 0) {
EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
- ": err = %d", service.name, sw_data->service_id,
+ ": err = %d", service.name, sw->service_id,
ret);

rte_errno = ENOSPC;
- goto free_msg_pool;
+ goto free_mempool;
}

EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
- sw_data->service_id);
+ sw->service_id);

- adapter->data->service_id = sw_data->service_id;
+ adapter->data->service_id = sw->service_id;
adapter->data->service_inited = 1;

- if (!timer_subsystem_inited) {
- rte_timer_subsystem_init();
- timer_subsystem_inited = true;
- }
-
return 0;
-
-free_msg_pool:
- rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
- rte_ring_free(sw_data->msg_ring);
-free_priv_data:
- rte_free(sw_data);
+free_mempool:
+ rte_mempool_free(sw->tim_pool);
+free_alloc:
+ rte_free(sw);
return -1;
}

-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
{
- int ret;
- struct msg *m1, *m2;
- struct rte_event_timer_adapter_sw_data *sw_data =
- adapter->data->adapter_priv;
+ struct swtim *sw = arg;

- rte_spinlock_lock(&sw_data->msgs_tailq_sl);
-
- /* Cancel outstanding rte_timers and free msg objects */
- m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
- while (m1 != NULL) {
- EVTIM_LOG_DBG("freeing outstanding timer");
- m2 = TAILQ_NEXT(m1, msgs);
-
- rte_timer_stop_sync(&m1->tim);
- rte_mempool_put(sw_data->msg_pool, m1);
+ rte_mempool_put(sw->tim_pool, (void *)tim);
+}

- m1 = m2;
- }
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+ int ret;
+ struct swtim *sw = swtim_pmd_priv(adapter);

- rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+ /* Free outstanding timers */
+ rte_timer_stop_all(sw->timer_data_id,
+ sw->poll_lcores,
+ sw->n_poll_lcores,
+ swtim_free_tim,
+ sw);

- ret = rte_service_component_unregister(sw_data->service_id);
+ ret = rte_service_component_unregister(sw->service_id);
if (ret < 0) {
EVTIM_LOG_ERR("failed to unregister service component");
return ret;
}

- rte_ring_free(sw_data->msg_ring);
- rte_mempool_free(sw_data->msg_pool);
- rte_free(adapter->data->adapter_priv);
+ rte_mempool_free(sw->tim_pool);
+ rte_free(sw);
+ adapter->data->adapter_priv = NULL;

return 0;
}
@@ -1010,88 +896,79 @@ get_mapped_count_for_service(uint32_t service_id)
}

static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
{
int mapped_count;
- struct rte_event_timer_adapter_sw_data *sw_data;
-
- sw_data = adapter->data->adapter_priv;
+ struct swtim *sw = swtim_pmd_priv(adapter);

/* Mapping the service to more than one service core can introduce
* delays while one thread is waiting to acquire a lock, so only allow
* one core to be mapped to the service.
+ *
+ * Note: the service could be modified such that it spreads cores to
+ * poll over multiple service instances.
*/
- mapped_count = get_mapped_count_for_service(sw_data->service_id);
+ mapped_count = get_mapped_count_for_service(sw->service_id);

- if (mapped_count == 1)
- return rte_service_component_runstate_set(sw_data->service_id,
- 1);
+ if (mapped_count != 1)
+ return mapped_count < 1 ? -ENOENT : -ENOTSUP;

- return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+ return rte_service_component_runstate_set(sw->service_id, 1);
}

static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
{
int ret;
- struct rte_event_timer_adapter_sw_data *sw_data =
- adapter->data->adapter_priv;
+ struct swtim *sw = swtim_pmd_priv(adapter);

- ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+ ret = rte_service_component_runstate_set(sw->service_id, 0);
if (ret < 0)
return ret;

- /* Wait for the service to complete its final iteration before
- * stopping.
- */
- while (sw_data->service_phase != 0)
+ /* Wait for the service to complete its final iteration */
+ while (rte_service_may_be_active(sw->service_id))
rte_pause();

- rte_smp_rmb();
-
return 0;
}

static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
struct rte_event_timer_adapter_info *adapter_info)
{
- struct rte_event_timer_adapter_sw_data *sw_data;
- sw_data = adapter->data->adapter_priv;
-
- adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
- adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+ struct swtim *sw = swtim_pmd_priv(adapter);
+ adapter_info->min_resolution_ns = sw->timer_tick_ns;
+ adapter_info->max_tmo_ns = sw->max_tmo_ns;
}

static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
- struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+ struct rte_event_timer_adapter_stats *stats)
{
- struct rte_event_timer_adapter_sw_data *sw_data;
- sw_data = adapter->data->adapter_priv;
- *stats = sw_data->stats;
+ struct swtim *sw = swtim_pmd_priv(adapter);
+ *stats = sw->stats; /* structure copy */
return 0;
}

static int
-sw_event_timer_adapter_stats_reset(
- const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
{
- struct rte_event_timer_adapter_sw_data *sw_data;
- sw_data = adapter->data->adapter_priv;
- memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+ struct swtim *sw = swtim_pmd_priv(adapter);
+ memset(&sw->stats, 0, sizeof(sw->stats));
return 0;
}

-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
- struct rte_event_timer **evtims,
- uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+ struct rte_event_timer **evtims,
+ uint16_t nb_evtims)
{
- uint16_t i;
- int ret;
- struct rte_event_timer_adapter_sw_data *sw_data;
- struct msg *msgs[nb_evtims];
+ int i, ret;
+ struct swtim *sw = swtim_pmd_priv(adapter);
+ uint32_t lcore_id = rte_lcore_id();
+ struct rte_timer *tim, *tims[nb_evtims];
+ uint64_t cycles;

#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
/* Check that the service is running. */
@@ -1101,101 +978,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
}
#endif

- sw_data = adapter->data->adapter_priv;
+ /* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+ * the highest lcore to insert such timers into
+ */
+ if (lcore_id == LCORE_ID_ANY)
+ lcore_id = RTE_MAX_LCORE - 1;
+
+ /* If this is the first time we're arming an event timer on this lcore,
+ * mark this lcore as "in use"; this will cause the service
+ * function to process the timer list that corresponds to this lcore.
+ */
+ if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id]))) {
+ rte_spinlock_lock(&sw->poll_lcores_sl);
+ EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+ lcore_id);
+ sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+ rte_spinlock_unlock(&sw->poll_lcores_sl);
+ }

- ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+ ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+ nb_evtims);
if (ret < 0) {
rte_errno = ENOSPC;
return 0;
}

- /* Let the service know we're producing messages for it to process */
- rte_atomic16_inc(&sw_data->message_producer_count);
-
- /* If the service is managing timers, wait for it to finish */
- while (sw_data->service_phase == 2)
- rte_pause();
-
- rte_smp_rmb();
-
for (i = 0; i < nb_evtims; i++) {
/* Don't modify the event timer state in these cases */
if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
rte_errno = EALREADY;
break;
} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
- evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+ evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
rte_errno = EINVAL;
break;
}

ret = check_timeout(evtims[i], adapter);
- if (ret == -1) {
+ if (unlikely(ret == -1)) {
evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
rte_errno = EINVAL;
break;
- }
- if (ret == -2) {
+ } else if (unlikely(ret == -2)) {
evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
rte_errno = EINVAL;
break;
}

- if (check_destination_event_queue(evtims[i], adapter) < 0) {
+ if (unlikely(check_destination_event_queue(evtims[i],
+ adapter) < 0)) {
evtims[i]->state = RTE_EVENT_TIMER_ERROR;
rte_errno = EINVAL;
break;
}

- /* Checks passed, set up a message to enqueue */
- msgs[i]->type = MSG_TYPE_ARM;
- msgs[i]->evtim = evtims[i];
+ tim = tims[i];
+ rte_timer_init(tim);

- /* Set the payload pointer if not set. */
- if (evtims[i]->ev.event_ptr == NULL)
- evtims[i]->ev.event_ptr = evtims[i];
+ evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+ evtims[i]->impl_opaque[1] = (uintptr_t)adapter;

- /* msg objects that get enqueued successfully will be freed
- * either by a future cancel operation or by the timer
- * expiration callback.
- */
- if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
- rte_errno = ENOSPC;
+ cycles = get_timeout_cycles(evtims[i], adapter);
+ ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+ SINGLE, lcore_id, NULL, evtims[i]);
+ if (ret < 0) {
+ /* tim was in RUNNING or CONFIG state */
+ evtims[i]->state = RTE_EVENT_TIMER_ERROR;
break;
}

- EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+ rte_smp_wmb();
+ EVTIM_LOG_DBG("armed an event timer");
evtims[i]->state = RTE_EVENT_TIMER_ARMED;
}

- /* Let the service know we're done producing messages */
- rte_atomic16_dec(&sw_data->message_producer_count);
-
if (i < nb_evtims)
- rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
- nb_evtims - i);
+ rte_mempool_put_bulk(sw->tim_pool,
+ (void **)&tims[i], nb_evtims - i);

return i;
}

static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
- struct rte_event_timer **evtims,
- uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+ struct rte_event_timer **evtims,
+ uint16_t nb_evtims)
{
- return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+ return __swtim_arm_burst(adapter, evtims, nb_evtims);
}

static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
- struct rte_event_timer **evtims,
- uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+ struct rte_event_timer **evtims,
+ uint16_t nb_evtims)
{
- uint16_t i;
- int ret;
- struct rte_event_timer_adapter_sw_data *sw_data;
- struct msg *msgs[nb_evtims];
+ int i, ret;
+ struct rte_timer *timp;
+ uint64_t opaque;
+ struct swtim *sw = swtim_pmd_priv(adapter);

#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
/* Check that the service is running. */
@@ -1205,23 +1085,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
}
#endif

- sw_data = adapter->data->adapter_priv;
-
- ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
- if (ret < 0) {
- rte_errno = ENOSPC;
- return 0;
- }
-
- /* Let the service know we're producing messages for it to process */
- rte_atomic16_inc(&sw_data->message_producer_count);
-
- /* If the service could be modifying event timer states, wait */
- while (sw_data->service_phase == 2)
- rte_pause();
-
- rte_smp_rmb();
-
for (i = 0; i < nb_evtims; i++) {
/* Don't modify the event timer state in these cases */
if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1232,54 +1095,54 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
break;
}

- msgs[i]->type = MSG_TYPE_CANCEL;
- msgs[i]->evtim = evtims[i];
+ opaque = evtims[i]->impl_opaque[0];
+ timp = (struct rte_timer *)(uintptr_t)opaque;
+ RTE_ASSERT(timp != NULL);

- if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
- rte_errno = ENOSPC;
+ ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+ if (ret < 0) {
+ /* Timer is running or being configured */
+ rte_errno = EAGAIN;
break;
}

- EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+ rte_mempool_put(sw->tim_pool, (void **)timp);

evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
- }
+ evtims[i]->impl_opaque[0] = 0;
+ evtims[i]->impl_opaque[1] = 0;

- /* Let the service know we're done producing messages */
- rte_atomic16_dec(&sw_data->message_producer_count);
-
- if (i < nb_evtims)
- rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
- nb_evtims - i);
+ rte_smp_wmb();
+ }

return i;
}

static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
- struct rte_event_timer **evtims,
- uint64_t timeout_ticks,
- uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+ struct rte_event_timer **evtims,
+ uint64_t timeout_ticks,
+ uint16_t nb_evtims)
{
int i;

for (i = 0; i < nb_evtims; i++)
evtims[i]->timeout_ticks = timeout_ticks;

- return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+ return __swtim_arm_burst(adapter, evtims, nb_evtims);
}

-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
- .init = sw_event_timer_adapter_init,
- .uninit = sw_event_timer_adapter_uninit,
- .start = sw_event_timer_adapter_start,
- .stop = sw_event_timer_adapter_stop,
- .get_info = sw_event_timer_adapter_get_info,
- .stats_get = sw_event_timer_adapter_stats_get,
- .stats_reset = sw_event_timer_adapter_stats_reset,
- .arm_burst = sw_event_timer_arm_burst,
- .arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
- .cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+ .init = swtim_init,
+ .uninit = swtim_uninit,
+ .start = swtim_start,
+ .stop = swtim_stop,
+ .get_info = swtim_get_info,
+ .stats_get = swtim_stats_get,
+ .stats_reset = swtim_stats_reset,
+ .arm_burst = swtim_arm_burst,
+ .arm_tmo_tick_burst = swtim_arm_tmo_tick_burst,
+ .cancel_burst = swtim_cancel_burst,
};

RTE_INIT(event_timer_adapter_init_log)

--
2.6.4
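
For readers following the arm-side change, the lazy registration of per-lcore timer lists can be sketched in isolation. This is only an illustration using C11 atomics in place of the DPDK primitives used in the patch; struct poll_set, MAX_LCORE, and the poll_set_* helpers are hypothetical names and are not part of the timer or eventdev libraries.

```c
#include <stdatomic.h>

#define MAX_LCORE 128 /* hypothetical stand-in for RTE_MAX_LCORE */

struct poll_set {
	atomic_int in_use[MAX_LCORE];        /* 1 once an lcore has armed a timer */
	unsigned int poll_lcores[MAX_LCORE]; /* lcores whose lists get polled */
	int n_poll_lcores;                   /* number of valid entries above */
	atomic_flag lock;                    /* guards poll_lcores/n_poll_lcores */
};

static void poll_set_init(struct poll_set *ps)
{
	for (int i = 0; i < MAX_LCORE; i++)
		atomic_init(&ps->in_use[i], 0);
	ps->n_poll_lcores = 0;
	atomic_flag_clear(&ps->lock);
}

static void poll_set_lock(struct poll_set *ps)
{
	while (atomic_flag_test_and_set_explicit(&ps->lock,
						 memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void poll_set_unlock(struct poll_set *ps)
{
	atomic_flag_clear_explicit(&ps->lock, memory_order_release);
}

/* Arm side. The caller must ensure lcore_id < MAX_LCORE.
 * Returns 1 if this call registered the lcore, 0 if it was already known.
 */
static int poll_set_note_arm(struct poll_set *ps, unsigned int lcore_id)
{
	/* Fast path: the flag was already set, so no lock is taken. */
	if (atomic_exchange(&ps->in_use[lcore_id], 1) != 0)
		return 0;

	/* Slow path, taken once per lcore: publish it to the poller. */
	poll_set_lock(ps);
	ps->poll_lcores[ps->n_poll_lcores++] = lcore_id;
	poll_set_unlock(ps);
	return 1;
}
```

Under these assumptions the lock is taken at most once per lcore on the arm side, which matches the "seldom acquired" comment in swtim_service_func(); every later arm from the same lcore costs one atomic exchange.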

Carrillo, Erik G

2018-12-10 17:17:11 UTC

Hi Mattias,

-----Original Message-----
Sent: Sunday, December 9, 2018 1:17 PM
Subject: Re: [dpdk-dev] [PATCH v2 1/1] eventdev: add new software event
timer adapter

Post by Erik Gabriel Carrillo
This patch introduces a new version of the event timer adapter
software PMD. In the original design, timer event producer lcores in
the primary and secondary processes enqueued event timers into a ring,
and a service core in the primary process dequeued them and processed
them further. To improve performance, this version does away with the
ring and lets lcores in both primary and secondary processes insert
timers directly into timer skiplist data structures; the service core
directly accesses the lists as well, when looking for timers that have
expired.
---
lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
1 file changed, 275 insertions(+), 412 deletions(-)
diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c
b/lib/librte_eventdev/rte_event_timer_adapter.c
index 79070d4..9c528cb 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -7,6 +7,7 @@
#include <inttypes.h>
#include <stdbool.h>
#include <sys/queue.h>
+#include <assert.h>

You have no assert() calls, from what I can see. Include <rte_debug.h> for
RTE_ASSERT().

Indeed - looks like I can remove that.

<...snipped...>

Post by Erik Gabriel Carrillo
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
{
- int ret;
- struct msg *m1, *m2;
- struct rte_event_timer_adapter_sw_data *sw_data =
- adapter->data->adapter_priv;
+ struct swtim *sw = arg;
- rte_spinlock_lock(&sw_data->msgs_tailq_sl);
-
- /* Cancel outstanding rte_timers and free msg objects */
- m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
- while (m1 != NULL) {
- EVTIM_LOG_DBG("freeing outstanding timer");
- m2 = TAILQ_NEXT(m1, msgs);
-
- rte_timer_stop_sync(&m1->tim);
- rte_mempool_put(sw_data->msg_pool, m1);
+ rte_mempool_put(sw->tim_pool, (void *)tim);
+}

No cast required.

Will update.

<...snipped...>

Post by Erik Gabriel Carrillo
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+ struct rte_event_timer **evtims,
+ uint16_t nb_evtims)
{
- uint16_t i;
- int ret;
- struct rte_event_timer_adapter_sw_data *sw_data;
- struct msg *msgs[nb_evtims];
+ int i, ret;
+ struct swtim *sw = swtim_pmd_priv(adapter);
+ uint32_t lcore_id = rte_lcore_id();
+ struct rte_timer *tim, *tims[nb_evtims];
+ uint64_t cycles;
#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
@@ -1101,101 +978,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
}
#endif
- sw_data = adapter->data->adapter_priv;
+ /* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+ * the highest lcore to insert such timers into
+ */
+ if (lcore_id == LCORE_ID_ANY)
+ lcore_id = RTE_MAX_LCORE - 1;
+
+ /* If this is the first time we're arming an event timer on this lcore,
+ * mark this lcore as "in use"; this will cause the service
+ * function to process the timer list that corresponds to this lcore.
+ */
+ if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id]))) {

I suspect we have a performance critical false sharing issue above.
Many/all flags are going to be arranged on the same cache line.

Good catch - thanks for spotting this. I'll update the
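
One possible way to address the false sharing concern is to give each per-lcore flag its own cache line. The following is a minimal sketch, assuming a 64-byte cache line and a C11 compiler; CACHE_LINE_SIZE, MAX_LCORE, struct padded_flag, and the helper names are hypothetical, and this is not necessarily the change made in later versions of the patch.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumption; DPDK exposes RTE_CACHE_LINE_SIZE */
#define MAX_LCORE 128      /* hypothetical stand-in for RTE_MAX_LCORE */

/* One flag per cache line, so a write by one lcore does not invalidate
 * the line that holds another lcore's flag.
 */
struct padded_flag {
	alignas(CACHE_LINE_SIZE) atomic_int v;
};

struct in_use_flags {
	struct padded_flag f[MAX_LCORE];
};

static void in_use_flags_init(struct in_use_flags *flags)
{
	for (size_t i = 0; i < MAX_LCORE; i++)
		atomic_init(&flags->f[i].v, 0);
}

/* Caller must ensure lcore_id < MAX_LCORE.
 * Returns 1 only for the first call on a given lcore, 0 afterwards.
 */
static int mark_in_use(struct in_use_flags *flags, unsigned int lcore_id)
{
	return atomic_exchange(&flags->f[lcore_id].v, 1) == 0;
}
```

The trade-off in this sketch is memory: the array grows to MAX_LCORE * CACHE_LINE_SIZE bytes per adapter instance, in exchange for each arming lcore touching only its own line on the fast path.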
