Unverified Commit 594aade8 authored by Tao Feng's avatar Tao Feng Committed by GitHub

doc: add a few sample query for user metric doc (#774)

* doc: add a few sample query for user metric doc
Signed-off-by: 's avatarfeng-tao <fengtao04@gmail.com>

* doc: fix index
Signed-off-by: 's avatarfeng-tao <fengtao04@gmail.com>
parent de66341d
# How to track Amundsen user metric # How to track Amundsen user metric
After you have deployed Amundsen into production, you want to track how user interacts with Amundsen for various reasons. After you have deployed Amundsen into production, you want to track how user interacts with Amundsen for various reasons.
The easier way is to leverage [Google Analytics](https://analytics.google.com/analytics/web/) for basic user tracking. You could first The easier way is to leverage [Google Analytics](https://analytics.google.com/analytics/web/) for basic user tracking. You could first
get the analytics token for your domain and put it as the [frontend config](https://github.com/amundsen-io/amundsenfrontendlibrary/blob/54de01bdc574665316f0517aefbd55cf7ca37ef0/amundsen_application/static/js/config/config-default.ts#L22) get the analytics token for your domain and put it as the [frontend config](https://github.com/amundsen-io/amundsenfrontendlibrary/blob/54de01bdc574665316f0517aefbd55cf7ca37ef0/amundsen_application/static/js/config/config-default.ts#L22)
Besides implementing Google Analytics, we provide a way called `action_logging` to do fine grained user action tracking. Besides implementing Google Analytics, we provide a way called `action_logging` to do fine grained user action tracking.
The `action_logging` is a decorator to allow you to integrate user info and pipe it to your inhouse event tracking system(e.g Kafka). The `action_logging` is a decorator to allow you to integrate user info and pipe it to your inhouse event tracking system(e.g Kafka).
You need to put the custom method into entry_points following this You need to put the custom method into entry_points following this
[example](https://github.com/amundsen-io/amundsenfrontendlibrary/blob/54de01bdc574665316f0517aefbd55cf7ca37ef0/docs/configuration.md#action-logging). [example](https://github.com/amundsen-io/amundsenfrontendlibrary/blob/54de01bdc574665316f0517aefbd55cf7ca37ef0/docs/configuration.md#action-logging).
And here is the IDL proto we used at Lyft to send the event message: And here is the IDL proto we used at Lyft to send the event message:
...@@ -40,20 +40,145 @@ Once you have the event in your data warehouse, you could start building differe ...@@ -40,20 +40,145 @@ Once you have the event in your data warehouse, you could start building differe
1. WAU 1. WAU
Sample query if the event table named as `default.event_amundsenfrontend_user_action`
```
SELECT date_trunc('week', CAST("ds" AS TIMESTAMP)) AS "__timestamp",
COUNT(DISTINCT user_value) AS "count_distinct_active_users"
FROM
(SELECT *
FROM default.event_amundsenfrontend_user_action
WHERE ds > '2019-09-01') AS "expr_qry"
WHERE "ds" >= '2020-04-21 00:00:00.000000'
AND "ds" <= '2020-10-21 05:31:14.000000'
GROUP BY date_trunc('week', CAST("ds" AS TIMESTAMP))
ORDER BY "count_distinct_active_users" DESC
LIMIT 10000
```
2. DAU 2. DAU
3. Top 50 active user Sample query if the event table named as `default.event_amundsenfrontend_user_action`
```
SELECT date_trunc('day', CAST("ds" AS TIMESTAMP)) AS "__timestamp",
COUNT(DISTINCT user_value) AS "count_distinct_active_users"
FROM
(SELECT *
FROM default.event_amundsenfrontend_user_action
WHERE ds > '2019-09-01') AS "expr_qry"
WHERE "ds" >= '2020-07-21 00:00:00.000000'
AND "ds" <= '2020-10-21 00:00:00.000000'
GROUP BY date_trunc('day', CAST("ds" AS TIMESTAMP))
ORDER BY "count_distinct_active_users" DESC
LIMIT 50000
```
You could also exclude weekends:
```
SELECT date_trunc('day', CAST("ds" AS TIMESTAMP)) AS "__timestamp",
COUNT(DISTINCT user_value) AS "count_distinct_active_users"
FROM
(SELECT *
FROM default.event_amundsenfrontend_user_action
WHERE ds > '2019-09-01') AS "expr_qry"
WHERE "ds" >= '2020-04-21 00:00:00.000000'
AND "ds" <= '2020-10-21 05:33:11.000000'
AND day_of_week(logged_at) NOT IN (6,
7)
GROUP BY date_trunc('day', CAST("ds" AS TIMESTAMP))
ORDER BY "count_distinct_active_users" DESC
LIMIT 50000
```
3. User Penetration per role
Sample query if the event table named as `default.event_amundsenfrontend_user_action` and a table for user:
```
SELECT "title" AS "title",
COUNT(DISTINCT email) * 100 / MAX(role_count) AS "penetration_percent"
FROM
(SELECT e.occurred_at,
u.email,
u.title,
tmp.role_count
FROM default.family_user u
JOIN default.event_amundsenfrontend_user_action e ON u.email = e.user_value
JOIN
(SELECT title,
count(*) role_count
FROM default.family_user
GROUP BY 1) as tmp ON u.title = tmp.title
where ds is not NULL) AS "expr_qry"
WHERE "occurred_at" >= from_iso8601_timestamp('2020-10-14T00:00:00.000000')
AND "occurred_at" <= from_iso8601_timestamp('2020-10-21T00:00:00.000000')
AND "role_count" > 20
GROUP BY "title"
ORDER BY "penetration_percent" DESC
LIMIT 100
```
4. Usage breakdown per role_count
sample query:
```
SELECT "title" AS "title",
count("email") AS "COUNT(email)"
FROM
(SELECT e.occurred_at,
u.email,
u.title,
tmp.role_count
FROM default.family_user u
JOIN default.event_amundsenfrontend_user_action e ON u.email = e.user_value
JOIN
(SELECT title,
count(*) role_count
FROM default.family_user
GROUP BY 1) as tmp ON u.title = tmp.title
where ds is not NULL) AS "expr_qry"
WHERE "occurred_at" >= from_iso8601_timestamp('2020-10-14T00:00:00.000000')
AND "occurred_at" <= from_iso8601_timestamp('2020-10-21T00:00:00.000000')
GROUP BY "title"
ORDER BY "COUNT(email)" DESC
LIMIT 15
```
5. search click through rate
sample query:
```
SELECT date_trunc('day', CAST("occurred_at" AS TIMESTAMP)) AS "__timestamp",
SUM(CASE
WHEN CAST(json_extract_scalar(keyword_args_json, '$.index') AS BIGINT) <= 3 THEN 1
ELSE 0
END) * 100 / COUNT(*) AS "click_through_rate"
FROM
(SELECT *
FROM default.event_amundsenfrontend_user_action
WHERE ds > '2019-09-01') AS "expr_qry"
WHERE "occurred_at" >= from_iso8601_timestamp('2020-09-21T00:00:00.000000')
AND "occurred_at" <= from_iso8601_timestamp('2020-10-21T00:00:00.000000')
AND "command" IN ('_get_table_metadata',
'_get_dashboard_metadata',
'_log_get_user')
AND json_extract_scalar(keyword_args_json, '$.source') IN ('search_results',
'inline_search')
GROUP BY date_trunc('day', CAST("occurred_at" AS TIMESTAMP))
ORDER BY "click_through_rate" DESC
LIMIT 10000
```
6. Top 50 active user
4. Top search term 7. Top search term
5. Top popular tables 8. Top popular tables
6. Search click index 9. Search click index
7. Metadata edits 10. Metadata edits
8. Metadata edit leaders 11. Metadata edit leaders
9. Amundsen user per role (by joining with employee data) 12. Amundsen user per role (by joining with employee data)
10. ... 13. ...
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment