Discussion:
[dpdk-dev] [RFC PATCH 0/3] Add rte_eth_read_clock API
(too old to reply)
Tom Barbette
2018-11-28 09:52:09 UTC
Permalink
Some NICs allows to timestamp packets, but do not support the full
PTP synchronization process. Hence, the value set in the mbuf
timestamp field is only the raw value of an internal clock.

To make sense of this value, one at least needs to be able to query
the current hardware clock value. As with the TSC, from there
a frequency can be derieved by querying multiple time the current value of the
internal clock with some known delay between the queries.

This RFC patch series adds support for MLX5.

An example is provided in the rxtx_callback application.
It has been updated to display, on top of the software latency
in cycles, the total latency since the packet was received in hardware.
The API is used to compute a delta in the TX callback. The raw amount of
ticks is converted to cycles using the technique describe above.

Aside from offloading timestamping, which relieve the
software from a few operations, this allows to get much more precision
when studying the source of the latency in a system.
Eg. in our 100G, CX5 setup the rxtx callback application shows
SW latency is around 74 cycles (TSC is 3.2Ghz), but the latency
including NIC processing, PCIe, and queuing is around 196 cycles.

The RFC lacks the documentation update for the sample application. I wanted
further validation before doing so. I'm not sure if it's fine to
update an example to show a "double feature". At the same time a full app for
this would be overkill, and the information nicely complements the one
in rxtx_callback.

Tom Barbette (3):
rte_ethdev: Add API function to read dev clock
mlx5: Implement support for read_clock
rxtx_callbacks: Add support for HW timestamp

doc/guides/nics/features.rst | 1 +
drivers/net/mlx5/mlx5.c | 1 +
drivers/net/mlx5/mlx5.h | 1 +
drivers/net/mlx5/mlx5_ethdev.c | 31 +++++++++
drivers/net/mlx5/mlx5_glue.c | 8 +++
drivers/net/mlx5/mlx5_glue.h | 2 +
examples/rxtx_callbacks/Makefile | 2 +
examples/rxtx_callbacks/main.c | 87 ++++++++++++++++++++++--
examples/rxtx_callbacks/meson.build | 1 +
lib/librte_ethdev/rte_ethdev.c | 13 ++++
lib/librte_ethdev/rte_ethdev.h | 23 +++++++
lib/librte_ethdev/rte_ethdev_core.h | 6 ++
lib/librte_ethdev/rte_ethdev_version.map | 1 +
lib/librte_mbuf/rte_mbuf.h | 2 +
14 files changed, 175 insertions(+), 4 deletions(-)

--
2.17.1
Tom Barbette
2018-11-28 09:52:10 UTC
Permalink
Add rte_eth_read_clock to read the current clock of a devide.

The main use is to get the current clock as written by the driver in
the timestamp field of the pkt mbuf when timestamp offloading is
enabled.

This function was missing to allow users to convert that RX timestamp field
to real time without the complexity of the rte_timesync* facility. One can
derivate the clock frequency by calling twice read_clock and then keep a
common time base.

Signed-off-by: Tom Barbette <***@kth.se>
---
doc/guides/nics/features.rst | 1 +
lib/librte_ethdev/rte_ethdev.c | 13 +++++++++++++
lib/librte_ethdev/rte_ethdev.h | 23 +++++++++++++++++++++++
lib/librte_ethdev/rte_ethdev_core.h | 6 ++++++
lib/librte_ethdev/rte_ethdev_version.map | 1 +
lib/librte_mbuf/rte_mbuf.h | 2 ++
6 files changed, 46 insertions(+)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index d3f904839..45484a30f 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -602,6 +602,7 @@ Supports Timestamp.
* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_TIMESTAMP``.
* **[provides] mbuf**: ``mbuf.timestamp``.
* **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa: DEV_RX_OFFLOAD_TIMESTAMP``.
+* **[implements] eth_dev_ops**: ``read_clock``.

.. _nic_features_macsec_offload:

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 5f858174b..48e8218b2 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -4124,6 +4124,19 @@ rte_eth_timesync_write_time(uint16_t port_id, const struct timespec *timestamp)
timestamp));
}

+int
+rte_eth_read_clock(uint16_t port_id, uint64_t *timestamp)
+{
+ struct rte_eth_dev *dev;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+ dev = &rte_eth_devices[port_id];
+
+ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->read_clock, -ENOTSUP);
+ return eth_err(port_id, (*dev->dev_ops->read_clock)(dev,
+ timestamp));
+}
+
int
rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
{
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 1960f3a2d..6e3a48308 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -3642,6 +3642,29 @@ int rte_eth_timesync_read_time(uint16_t port_id, struct timespec *time);
*/
int rte_eth_timesync_write_time(uint16_t port_id, const struct timespec *time);

+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Read the current clock counter of an Ethernet device
+ *
+ * This returns the current raw clock value of an Ethernet device.
+ * The value returned here is from the same clock than the one
+ * filling timestamp field of RX packets. Therefore it can be used
+ * to compute a precise conversion of the device clock to the real time.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ * @param time
+ * Pointer to the uint64_t that holds the raw clock value.
+ *
+ * @return
+ * - 0: Success.
+ * - -ENODEV: The port ID is invalid.
+ * - -ENOTSUP: The function is not supported by the Ethernet driver.
+ */
+int __rte_experimental rte_eth_read_clock(uint16_t port_id, uint64_t *time);
+
/**
* Config l2 tunnel ether type of an Ethernet device for filtering specific
* tunnel packets by ether type.
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 8f03f83f6..86806b3eb 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -322,6 +322,10 @@ typedef int (*eth_timesync_write_time)(struct rte_eth_dev *dev,
const struct timespec *timestamp);
/**< @internal Function used to get time from the device clock */

+typedef int (*eth_read_clock)(struct rte_eth_dev *dev,
+ uint64_t *timestamp);
+/**< @internal Function used to get the current value of the device clock. */
+
typedef int (*eth_get_reg_t)(struct rte_eth_dev *dev,
struct rte_dev_reg_info *info);
/**< @internal Retrieve registers */
@@ -496,6 +500,8 @@ struct eth_dev_ops {
eth_timesync_read_time timesync_read_time; /** Get the device clock time. */
eth_timesync_write_time timesync_write_time; /** Set the device clock time. */

+ eth_read_clock read_clock;
+
eth_xstats_get_by_id_t xstats_get_by_id;
/**< Get extended device statistic values by ID. */
eth_xstats_get_names_by_id_t xstats_get_names_by_id;
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index 92ac3de25..12d6c3c1d 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -249,6 +249,7 @@ EXPERIMENTAL {
rte_eth_switch_domain_free;
rte_flow_conv;
rte_flow_expand_rss;
+ rte_eth_read_clock;
rte_mtr_capabilities_get;
rte_mtr_create;
rte_mtr_destroy;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 3dbc6695e..4ceefa913 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -605,6 +605,8 @@ struct rte_mbuf {

/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
* are not normalized but are always the same for a given port.
+ * Some devices allow to query rte_eth_read_clock that will return the
+ * current device timestamp.
*/
uint64_t timestamp;

--
2.17.1
z***@semihalf.com
2018-12-05 16:33:20 UTC
Permalink
From: Zyta Szpak <***@semihalf.com>

I'd ask few questions: how is this new rte_ethdev_read_clock different from existing rte_eth_timesync_read_time ?
Is it that this new function does not convert the clock raw value into timespec? Would it be too much overhead to use the timespec type of value? Or do they intent to read different clocks?
Tom Barbette
2018-12-08 12:07:42 UTC
Permalink
Hi Zyta,

This was my initial proposal, but after discussion with Shahaf Shuler we thought this was too far from the original intent of the function.
rte_eth_timesync_read_time is clearly identified as part of a set of function to use PTP synchronization.
This clock is not "sync" in any way. More importantly, the returned value is not a timeval. So it's true this patch actually does a cast from timeval in ib_verbs, but it's already a "bad usage" in some sense because the value sitting in tv_nsec is not an amount of nsec but only an amount of ticks. I would be afraid some people seeing the timeval type of rte_eth_timesync_read_time would use it blindly.

But as far as I'm concerned, I can go with one or the other in our use cases...

Thanks,
Tom


________________________________________
De : ***@semihalf.com <***@semihalf.com>
Envoyé : mercredi 5 décembre 2018 17:33
À : Tom Barbette; ***@dpdk.org
Cc : Zyta Szpak
Objet : Re: [RFC PATCH 0/3] Add rte_eth_read_clock API

From: Zyta Szpak <***@semihalf.com>

I'd ask few questions: how is this new rte_ethdev_read_clock different from existing rte_eth_timesync_read_time ?
Is it that this new function does not convert the clock raw value into timespec? Would it be too much overhead to use the timespec type of value? Or do they intent to read different clocks?
Shahaf Shuler
2018-12-09 06:00:09 UTC
Permalink
Hi Tom,

See few nits below.

Wednesday, November 28, 2018 11:52 AM, Tom Barbette:
> Subject: [RFC PATCH 1/3] rte_ethdev: Add API function to read dev clock
>
> Add rte_eth_read_clock to read the current clock of a devide.

To be explicit, the **raw** clock of the device.

>
> The main use is to get the current clock as written by the driver in the
> timestamp field of the pkt mbuf when timestamp offloading is enabled.

I think the main use of it is to get the device clock conversion co-eff to be able to translate the mbuf raw timestamp to a local synced time value.

>
> This function was missing to allow users to convert that RX timestamp field to
> real time without the complexity of the rte_timesync* facility. One can
> derivate the clock frequency by calling twice read_clock and then keep a
> common time base.
>
> Signed-off-by: Tom Barbette <***@kth.se>
> ---
> doc/guides/nics/features.rst | 1 +
> lib/librte_ethdev/rte_ethdev.c | 13 +++++++++++++
> lib/librte_ethdev/rte_ethdev.h | 23 +++++++++++++++++++++++
> lib/librte_ethdev/rte_ethdev_core.h | 6 ++++++
> lib/librte_ethdev/rte_ethdev_version.map | 1 +
> lib/librte_mbuf/rte_mbuf.h | 2 ++
> 6 files changed, 46 insertions(+)
>
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index
> d3f904839..45484a30f 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -602,6 +602,7 @@ Supports Timestamp.
> * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_TIMESTAMP``.
> * **[provides] mbuf**: ``mbuf.timestamp``.
> * **[provides] rte_eth_dev_info**:
> ``rx_offload_capa,rx_queue_offload_capa:
> DEV_RX_OFFLOAD_TIMESTAMP``.
> +* **[implements] eth_dev_ops**: ``read_clock``.
>
> .. _nic_features_macsec_offload:
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 5f858174b..48e8218b2 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -4124,6 +4124,19 @@ rte_eth_timesync_write_time(uint16_t port_id,
> const struct timespec *timestamp)
> timestamp));
> }
>
> +int
> +rte_eth_read_clock(uint16_t port_id, uint64_t *timestamp) {
> + struct rte_eth_dev *dev;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> + dev = &rte_eth_devices[port_id];
> +
> + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->read_clock, -
> ENOTSUP);
> + return eth_err(port_id, (*dev->dev_ops->read_clock)(dev,
> + timestamp));
> +}
> +
> int
> rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
> { diff --git a/lib/librte_ethdev/rte_ethdev.h
> b/lib/librte_ethdev/rte_ethdev.h index 1960f3a2d..6e3a48308 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -3642,6 +3642,29 @@ int rte_eth_timesync_read_time(uint16_t
> port_id, struct timespec *time);
> */
> int rte_eth_timesync_write_time(uint16_t port_id, const struct timespec
> *time);
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Read the current clock counter of an Ethernet device
> + *
> + * This returns the current raw clock value of an Ethernet device.
> + * The value returned here is from the same clock than the one
> + * filling timestamp field of RX packets. Therefore it can be used
> + * to compute a precise conversion of the device clock to the real time.

I would add in the documentation of the function the heuristics to get the conversion co-eff.

> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param time
> + * Pointer to the uint64_t that holds the raw clock value.
> + *
> + * @return
> + * - 0: Success.
> + * - -ENODEV: The port ID is invalid.
> + * - -ENOTSUP: The function is not supported by the Ethernet driver.
> + */
> +int __rte_experimental rte_eth_read_clock(uint16_t port_id, uint64_t
> +*time);
> +
> /**
> * Config l2 tunnel ether type of an Ethernet device for filtering specific
> * tunnel packets by ether type.
> diff --git a/lib/librte_ethdev/rte_ethdev_core.h
> b/lib/librte_ethdev/rte_ethdev_core.h
> index 8f03f83f6..86806b3eb 100644
> --- a/lib/librte_ethdev/rte_ethdev_core.h
> +++ b/lib/librte_ethdev/rte_ethdev_core.h
> @@ -322,6 +322,10 @@ typedef int (*eth_timesync_write_time)(struct
> rte_eth_dev *dev,
> const struct timespec *timestamp); /**<
> @internal Function used to get time from the device clock */
>
> +typedef int (*eth_read_clock)(struct rte_eth_dev *dev,
> + uint64_t *timestamp);
> +/**< @internal Function used to get the current value of the device
> +clock. */
> +
> typedef int (*eth_get_reg_t)(struct rte_eth_dev *dev,
> struct rte_dev_reg_info *info);
> /**< @internal Retrieve registers */
> @@ -496,6 +500,8 @@ struct eth_dev_ops {
> eth_timesync_read_time timesync_read_time; /** Get the device
> clock time. */
> eth_timesync_write_time timesync_write_time; /** Set the device
> clock time. */
>
> + eth_read_clock read_clock;
> +
> eth_xstats_get_by_id_t xstats_get_by_id;
> /**< Get extended device statistic values by ID. */
> eth_xstats_get_names_by_id_t xstats_get_names_by_id; diff --git
> a/lib/librte_ethdev/rte_ethdev_version.map
> b/lib/librte_ethdev/rte_ethdev_version.map
> index 92ac3de25..12d6c3c1d 100644
> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -249,6 +249,7 @@ EXPERIMENTAL {
> rte_eth_switch_domain_free;
> rte_flow_conv;
> rte_flow_expand_rss;
> + rte_eth_read_clock;
> rte_mtr_capabilities_get;
> rte_mtr_create;
> rte_mtr_destroy;
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index
> 3dbc6695e..4ceefa913 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -605,6 +605,8 @@ struct rte_mbuf {
>
> /** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
> * are not normalized but are always the same for a given port.
> + * Some devices allow to query rte_eth_read_clock that will return
> the
> + * current device timestamp.
> */
> uint64_t timestamp;
>

Other than that, on the RFC - Acked-by: Shahaf Shluer <***@mellanox.com>.
Let's see next the full implementation.


> --
> 2.17.1
Tom Barbette
2018-11-28 09:52:11 UTC
Permalink
Signed-off-by: Tom Barbette <***@kth.se>
---
drivers/net/mlx5/mlx5.c | 1 +
drivers/net/mlx5/mlx5.h | 1 +
drivers/net/mlx5/mlx5_ethdev.c | 31 +++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_glue.c | 8 ++++++++
drivers/net/mlx5/mlx5_glue.h | 2 ++
5 files changed, 43 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9e5cab169..ed799ab5a 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -372,6 +372,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
.xstats_reset = mlx5_xstats_reset,
.xstats_get_names = mlx5_xstats_get_names,
.dev_infos_get = mlx5_dev_infos_get,
+ .read_clock = mlx5_read_clock,
.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
.vlan_filter_set = mlx5_vlan_filter_set,
.rx_queue_setup = mlx5_rx_queue_setup,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index bc500b2bc..3972debf5 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -258,6 +258,7 @@ int mlx5_set_flags(struct rte_eth_dev *dev, unsigned int keep,
unsigned int flags);
int mlx5_dev_configure(struct rte_eth_dev *dev);
void mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info);
+int mlx5_read_clock(struct rte_eth_dev *dev, uint64_t *time);
const uint32_t *mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev);
int mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete);
int mlx5_force_link_status_change(struct rte_eth_dev *dev, int status);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index d178ed6a1..eb97ab27e 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -557,6 +557,37 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
}
}

+/**
+ * Get device current raw clock counter
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ *
+ * @param[out] time
+ * Current raw clock counter of the device.
+ *
+ * @return
+ * 0 if the clock has correctly been read
+ */
+int
+mlx5_read_clock(struct rte_eth_dev *dev, uint64_t *clock)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct ibv_values_ex values;
+ int err = 0;
+
+ values.comp_mask = IBV_VALUES_MASK_RAW_CLOCK;
+ err = mlx5_glue->query_rt_values_ex(priv->ctx, &values);
+ if (err != 0) {
+ DRV_LOG(WARNING, "Could not query the clock !");
+ return err;
+ }
+
+ *clock = values.raw_clock.tv_nsec;
+ return 0;
+}
+
+
/**
* Get supported packet types.
*
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
index dd10ad6de..e23296519 100644
--- a/drivers/net/mlx5/mlx5_glue.c
+++ b/drivers/net/mlx5/mlx5_glue.c
@@ -86,6 +86,13 @@ mlx5_glue_query_device_ex(struct ibv_context *context,
return ibv_query_device_ex(context, input, attr);
}

+static int
+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
+ struct ibv_values_ex *values)
+{
+ return ibv_query_rt_values_ex(context, values);
+}
+
static int
mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
struct ibv_port_attr *port_attr)
@@ -491,6 +498,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
.close_device = mlx5_glue_close_device,
.query_device = mlx5_glue_query_device,
.query_device_ex = mlx5_glue_query_device_ex,
+ .query_rt_values_ex = mlx5_glue_query_rt_values_ex,
.query_port = mlx5_glue_query_port,
.create_comp_channel = mlx5_glue_create_comp_channel,
.destroy_comp_channel = mlx5_glue_destroy_comp_channel,
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
index 2d92ba8bc..31ebee72a 100644
--- a/drivers/net/mlx5/mlx5_glue.h
+++ b/drivers/net/mlx5/mlx5_glue.h
@@ -70,6 +70,8 @@ struct mlx5_glue {
int (*query_device_ex)(struct ibv_context *context,
const struct ibv_query_device_ex_input *input,
struct ibv_device_attr_ex *attr);
+ int (*query_rt_values_ex)(struct ibv_context *context,
+ struct ibv_values_ex *values);
int (*query_port)(struct ibv_context *context, uint8_t port_num,
struct ibv_port_attr *port_attr);
struct ibv_comp_channel *(*create_comp_channel)
--
2.17.1
Shahaf Shuler
2018-12-09 06:03:57 UTC
Permalink
Hi Tom,

See few comments.

Wednesday, November 28, 2018 11:52 AM, Tom Barbette:
> Subject: [RFC PATCH 2/3] mlx5: Implement support for read_clock
>
> Signed-off-by: Tom Barbette <***@kth.se>
> ---
> drivers/net/mlx5/mlx5.c | 1 +
> drivers/net/mlx5/mlx5.h | 1 +
> drivers/net/mlx5/mlx5_ethdev.c | 31
> +++++++++++++++++++++++++++++++
> drivers/net/mlx5/mlx5_glue.c | 8 ++++++++
> drivers/net/mlx5/mlx5_glue.h | 2 ++
> 5 files changed, 43 insertions(+)
>
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> 9e5cab169..ed799ab5a 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -372,6 +372,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
> .xstats_reset = mlx5_xstats_reset,
> .xstats_get_names = mlx5_xstats_get_names,
> .dev_infos_get = mlx5_dev_infos_get,
> + .read_clock = mlx5_read_clock,
> .dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
> .vlan_filter_set = mlx5_vlan_filter_set,
> .rx_queue_setup = mlx5_rx_queue_setup, diff --git
> a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> bc500b2bc..3972debf5 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -258,6 +258,7 @@ int mlx5_set_flags(struct rte_eth_dev *dev, unsigned
> int keep,
> unsigned int flags);
> int mlx5_dev_configure(struct rte_eth_dev *dev); void
> mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info
> *info);
> +int mlx5_read_clock(struct rte_eth_dev *dev, uint64_t *time);
> const uint32_t *mlx5_dev_supported_ptypes_get(struct rte_eth_dev
> *dev); int mlx5_link_update(struct rte_eth_dev *dev, int
> wait_to_complete); int mlx5_force_link_status_change(struct rte_eth_dev
> *dev, int status); diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> b/drivers/net/mlx5/mlx5_ethdev.c index d178ed6a1..eb97ab27e 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -557,6 +557,37 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev,
> struct rte_eth_dev_info *info)
> }
> }
>
> +/**
> + * Get device current raw clock counter
> + *
> + * @param dev
> + * Pointer to Ethernet device structure.
> + *

Above space is not needed.

> + * @param[out] time
> + * Current raw clock counter of the device.
> + *
> + * @return
> + * 0 if the clock has correctly been read

And if not?

> + */
> +int
> +mlx5_read_clock(struct rte_eth_dev *dev, uint64_t *clock) {
> + struct priv *priv = dev->data->dev_private;
> + struct ibv_values_ex values;
> + int err = 0;
> +
> + values.comp_mask = IBV_VALUES_MASK_RAW_CLOCK;
> + err = mlx5_glue->query_rt_values_ex(priv->ctx, &values);
> + if (err != 0) {
> + DRV_LOG(WARNING, "Could not query the clock !");
> + return err;
> + }
> +

Above space is not needed.

> + *clock = values.raw_clock.tv_nsec;
> + return 0;
> +}
> +
> +
> /**
> * Get supported packet types.
> *
> diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
> index dd10ad6de..e23296519 100644
> --- a/drivers/net/mlx5/mlx5_glue.c
> +++ b/drivers/net/mlx5/mlx5_glue.c
> @@ -86,6 +86,13 @@ mlx5_glue_query_device_ex(struct ibv_context
> *context,
> return ibv_query_device_ex(context, input, attr); }
>
> +static int
> +mlx5_glue_query_rt_values_ex(struct ibv_context *context,
> + struct ibv_values_ex *values)
> +{
> + return ibv_query_rt_values_ex(context, values); }
> +
> static int
> mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
> struct ibv_port_attr *port_attr) @@ -491,6 +498,7 @@
> const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
> .close_device = mlx5_glue_close_device,
> .query_device = mlx5_glue_query_device,
> .query_device_ex = mlx5_glue_query_device_ex,
> + .query_rt_values_ex = mlx5_glue_query_rt_values_ex,
> .query_port = mlx5_glue_query_port,
> .create_comp_channel = mlx5_glue_create_comp_channel,
> .destroy_comp_channel = mlx5_glue_destroy_comp_channel, diff --
> git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h index
> 2d92ba8bc..31ebee72a 100644
> --- a/drivers/net/mlx5/mlx5_glue.h
> +++ b/drivers/net/mlx5/mlx5_glue.h
> @@ -70,6 +70,8 @@ struct mlx5_glue {
> int (*query_device_ex)(struct ibv_context *context,
> const struct ibv_query_device_ex_input *input,
> struct ibv_device_attr_ex *attr);
> + int (*query_rt_values_ex)(struct ibv_context *context,
> + struct ibv_values_ex *values);

You need to bump up the LIB_GLUE_VERSION to 19.02 on the makefile(s) as you change the lib glue ABI.

> int (*query_port)(struct ibv_context *context, uint8_t port_num,
> struct ibv_port_attr *port_attr);
> struct ibv_comp_channel *(*create_comp_channel)
> --
> 2.17.1
Tom Barbette
2018-11-28 09:52:12 UTC
Permalink
Use rxtx callback to demonstrate a way to use rte_eth_read_clock to
convert the hardware timestamps to an amount of cycles.

This allows to get the amount of time the packet spent since its entry
in the device. While the regular latency only shows the latency from
when it entered the software stack.

Signed-off-by: Tom Barbette <***@kth.se>
---
examples/rxtx_callbacks/Makefile | 2 +
examples/rxtx_callbacks/main.c | 87 +++++++++++++++++++++++++++--
examples/rxtx_callbacks/meson.build | 1 +
3 files changed, 86 insertions(+), 4 deletions(-)

diff --git a/examples/rxtx_callbacks/Makefile b/examples/rxtx_callbacks/Makefile
index e9d30d56f..e70c081f2 100644
--- a/examples/rxtx_callbacks/Makefile
+++ b/examples/rxtx_callbacks/Makefile
@@ -50,6 +50,8 @@ include $(RTE_SDK)/mk/rte.vars.mk

CFLAGS += $(WERROR_FLAGS)

+
+CFLAGS += -DALLOW_EXPERIMENTAL_API
# workaround for a gcc bug with noreturn attribute
# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
diff --git a/examples/rxtx_callbacks/main.c b/examples/rxtx_callbacks/main.c
index 2058be627..cd4e86f0f 100644
--- a/examples/rxtx_callbacks/main.c
+++ b/examples/rxtx_callbacks/main.c
@@ -10,6 +10,8 @@
#include <rte_lcore.h>
#include <rte_mbuf.h>

+#include <getopt.h>
+
#define RX_RING_SIZE 1024
#define TX_RING_SIZE 1024

@@ -17,6 +19,9 @@
#define MBUF_CACHE_SIZE 250
#define BURST_SIZE 32

+static const char usage[] =
+ "%s EAL_ARGS -- [-t]\n";
+
static const struct rte_eth_conf port_conf_default = {
.rxmode = {
.max_rx_pkt_len = ETHER_MAX_LEN,
@@ -25,9 +30,14 @@ static const struct rte_eth_conf port_conf_default = {

static struct {
uint64_t total_cycles;
+ uint64_t total_queue_cycles;
uint64_t total_pkts;
} latency_numbers;

+int hw_timestamping;
+
+#define TICKS_PER_CYCLE_SHIFT 16
+uint64_t ticks_per_cycle_mult;

static uint16_t
add_timestamps(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
@@ -43,22 +53,42 @@ add_timestamps(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
}

static uint16_t
-calc_latency(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+calc_latency(uint16_t port, uint16_t qidx __rte_unused,
struct rte_mbuf **pkts, uint16_t nb_pkts, void *_ __rte_unused)
{
uint64_t cycles = 0;
+ uint64_t queue_ticks = 0;
uint64_t now = rte_rdtsc();
+ uint64_t ticks;
unsigned i;

- for (i = 0; i < nb_pkts; i++)
+ if (hw_timestamping)
+ rte_eth_read_clock(port, &ticks);
+
+ for (i = 0; i < nb_pkts; i++) {
cycles += now - pkts[i]->udata64;
+ if (hw_timestamping)
+ queue_ticks += ticks - pkts[i]->timestamp;
+ }
+
latency_numbers.total_cycles += cycles;
+ if (hw_timestamping)
+ latency_numbers.total_queue_cycles += (queue_ticks
+ * ticks_per_cycle_mult) >> TICKS_PER_CYCLE_SHIFT;
+
latency_numbers.total_pkts += nb_pkts;

if (latency_numbers.total_pkts > (100 * 1000 * 1000ULL)) {
printf("Latency = %"PRIu64" cycles\n",
latency_numbers.total_cycles / latency_numbers.total_pkts);
- latency_numbers.total_cycles = latency_numbers.total_pkts = 0;
+ if (hw_timestamping) {
+ printf("Latency from HW = %"PRIu64" cycles\n",
+ latency_numbers.total_queue_cycles
+ / latency_numbers.total_pkts);
+ }
+ latency_numbers.total_cycles = 0;
+ latency_numbers.total_queue_cycles = 0;
+ latency_numbers.total_pkts = 0;
}
return nb_pkts;
}
@@ -77,6 +107,7 @@ port_init(uint16_t port, struct rte_mempool *mbuf_pool)
int retval;
uint16_t q;
struct rte_eth_dev_info dev_info;
+ struct rte_eth_rxconf rxconf;
struct rte_eth_txconf txconf;

if (!rte_eth_dev_is_valid_port(port))
@@ -95,9 +126,20 @@ port_init(uint16_t port, struct rte_mempool *mbuf_pool)
if (retval != 0)
return retval;

+ rxconf = dev_info.default_rxconf;
+
+ if (hw_timestamping) {
+ if (!(dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TIMESTAMP)) {
+ printf("\nERROR: Port %u does not support hardware timestamping\n"
+ , port);
+ return -1;
+ }
+ rxconf.offloads |= DEV_RX_OFFLOAD_TIMESTAMP;
+ }
+
for (q = 0; q < rx_rings; q++) {
retval = rte_eth_rx_queue_setup(port, q, nb_rxd,
- rte_eth_dev_socket_id(port), NULL, mbuf_pool);
+ rte_eth_dev_socket_id(port), &rxconf, mbuf_pool);
if (retval < 0)
return retval;
}
@@ -115,6 +157,25 @@ port_init(uint16_t port, struct rte_mempool *mbuf_pool)
if (retval < 0)
return retval;

+ if (hw_timestamping && ticks_per_cycle_mult == 0) {
+ uint64_t cycles_base = rte_rdtsc();
+ uint64_t ticks_base;
+ rte_eth_read_clock(port, &ticks_base);
+ rte_delay_ms(100);
+ uint64_t cycles = rte_rdtsc();
+ uint64_t ticks;
+ rte_eth_read_clock(port, &ticks);
+ uint64_t c_freq = cycles - cycles_base;
+ uint64_t t_freq = ticks - ticks_base;
+ double freq_mult = (double)c_freq / t_freq;
+ printf("TSC Freq ~= %lu\nHW Freq ~= %lu\nRatio : %f\n",
+ c_freq * 10, t_freq * 10, freq_mult);
+ /* TSC will be faster than internal ticks so freq_mult is > 0
+ * We convert the multiplication to an integer shift & mult
+ */
+ ticks_per_cycle_mult = (1 << TICKS_PER_CYCLE_SHIFT) / freq_mult;
+ }
+
struct ether_addr addr;

rte_eth_macaddr_get(port, &addr);
@@ -177,6 +238,11 @@ main(int argc, char *argv[])
struct rte_mempool *mbuf_pool;
uint16_t nb_ports;
uint16_t portid;
+ struct option lgopts[] = {
+ { NULL, 0, 0, 0 }
+ };
+ int opt, option_index;
+

/* init EAL */
int ret = rte_eal_init(argc, argv);
@@ -186,6 +252,19 @@ main(int argc, char *argv[])
argc -= ret;
argv += ret;

+ while ((opt = getopt_long(argc, argv, "t", lgopts, &option_index))
+ != EOF)
+ switch (opt) {
+ case 't':
+ hw_timestamping = 1;
+ break;
+ default:
+ printf(usage, argv[0]);
+ return -1;
+ }
+ optind = 1; /* reset getopt lib */
+
+
nb_ports = rte_eth_dev_count_avail();
if (nb_ports < 2 || (nb_ports & 1))
rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
diff --git a/examples/rxtx_callbacks/meson.build b/examples/rxtx_callbacks/meson.build
index c34e11e36..2b0a25036 100644
--- a/examples/rxtx_callbacks/meson.build
+++ b/examples/rxtx_callbacks/meson.build
@@ -6,6 +6,7 @@
# To build this example as a standalone application with an already-installed
# DPDK instance, use 'make'

+allow_experimental_apis = true
sources = files(
'main.c'
)
--
2.17.1
Loading...