Unverified Commit c70bce21 authored by Tao Feng, committed by GitHub

Add Tutorial(1. index postgres; 2. setup superset preview client) for the doc (#413)

* Add Tutorial for the doc

* Add superset doc

* Add mkdocs change

* Update docs/tutorials/data-preview-with-superset.md
Co-Authored-By: Tamika Tannis <ttannis@lyft.com>

* Update docs/tutorials/data-preview-with-superset.md
Co-Authored-By: Tamika Tannis <ttannis@lyft.com>

* Update docs/tutorials/data-preview-with-superset.md
Co-Authored-By: Tamika Tannis <ttannis@lyft.com>

* update
Co-authored-by: Tamika Tannis <ttannis@lyft.com>
parent 5fbed864
@@ -28,6 +28,11 @@ The following instructions are for setting up a version of Amundsen using Docker
$ python3 example/scripts/sample_data_loader.py
```
5. View UI at [`http://localhost:5000`](http://localhost:5000) and try to search `test`, it should return some result.
![](img/search-page.png)
6. We could also do an exact-match search for a table entity. For example, searching `test_table1` in the table field
returns the records that match.
![](img/search-exact-match.png)
**Atlas Note:** Atlas takes some time to boot properly. So you may not be able to see the results immediately
after `docker-compose up` command.
@@ -37,7 +42,8 @@ Atlas would be ready once you'll have the following output in the docker output
1. You can verify dummy data has been ingested into Neo4j by visiting [`http://localhost:7474/browser/`](http://localhost:7474/browser/) and running `MATCH (n:Table) RETURN n LIMIT 25` in the query box. You should see two tables:
1. `hive.test_schema.test_table1`
2. `dynamo.test_schema.test_table2`
![](img/neo4j-debug.png)
2. You can verify the data has been loaded into the metadataservice by visiting:
1. [`http://localhost:5000/table_detail/gold/hive/test_schema/test_table1`](http://localhost:5000/table_detail/gold/hive/test_schema/test_table1)
2. [`http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2`](http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2)
# How to set up a preview client with Apache Superset
In the previous [tutorial](docs/tutorials/index-postgres.md), we walked through how to index the table metadata
of a postgres database. In this tutorial, we will walk through how to configure data preview for that `films` table
using Apache Superset.

Amundsen provides an integration point with a BI visualization tool for data preview. Apache Superset is not
required; any BI tool that exposes an endpoint for running a query and returning the results can be used.
[Apache Superset](https://superset.apache.org/) is an open-source business intelligence tool
that can be used for data exploration, and it is what we leverage internally at Lyft to support this feature.
1. Please set up Apache Superset following its official installation
[guide](https://superset.apache.org/installation.html#superset-installation-and-initialization):
```bash
# Install superset
pip install apache-superset
# Initialize the database
superset db upgrade
# Create an admin user (you will be prompted to set a username, first and last name before setting a password)
export FLASK_APP=superset
superset fab create-admin
# Load some data to play with
superset load_examples
# Create default roles and permissions
superset init
# To start a development web server on port 8088, use -p to bind to another port
superset run -p 8088 --with-threads --reload --debugger
```
Once set up properly, you can view the Superset UI as follows:
![](../img/tutorials/superset-welcome.png)
2. We need to add the postgres database to Superset as follows:
![](../img/tutorials/superset-add-db.png)
3. We can verify the content of the `films` table using Superset's SQL Lab feature:
![](../img/tutorials/superset-sqllab-verify.png)
4. Next, we need to build a preview client following this [guide](../frontend/docs/examples/superset_preview_client.md)
and the [example client code](https://github.com/lyft/amundsenfrontendlibrary/blob/master/amundsen_application/base/examples/example_superset_preview_client.py);
a condensed sketch of such a client is included after this list. There are a couple of things to keep in mind:
- We could start with an unauthenticated Superset ([example superset config](https://gist.github.com/feng-tao/b89e6faf7236372cef70a44f13615c39)),
but in production we will need to send impersonation info to Superset
so it can verify whether the given user is allowed to view the data.
- When we build the client, we need to configure the database id, not the database name, when sending the request to Superset.
5. Once the preview client is ready, register it through the frontend service's entry point ([example](https://github.com/lyft/amundsenfrontendlibrary/blob/master/docs/configuration.md#python-entry-points)) and restart the frontend.
6. We can now view the preview data for the `films` table in Amundsen.
![](../img/tutorials/amundsen-preview1.png)
As shown in the figure above, the preview button on the table page is now clickable.
Once clicked, it shows the actual data queried
from Apache Superset:
![](../img/tutorials/amundsen-preview2.png)
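For reference, here is a condensed sketch of what such a preview client can look like, adapted from the example client linked in step 4. The endpoint URL, the `database_id` value, the request payload format, and the `params` keys (`schema`, `tableName`) are assumptions based on that example and a local Superset running on port 8088, so verify them against your own Superset and frontend versions.

```python
import uuid
from typing import Dict

import requests
from requests import Response

from amundsen_application.base.base_superset_preview_client import BaseSupersetPreviewClient

# Assumed values for this tutorial setup -- adjust them for your deployment.
SUPERSET_SQL_JSON_URL = 'http://localhost:8088/superset/sql_json/'
POSTGRES_DATABASE_ID = 1  # the Superset *id* of the postgres database, not its name


class SupersetPreviewClient(BaseSupersetPreviewClient):
    def __init__(self) -> None:
        self.headers = {}

    def post_to_sql_json(self, *, params: Dict, headers: Dict) -> Response:
        """Post a preview query for the requested table to Superset's sql_json endpoint."""
        request_data = {
            # sql_json expects a unique client_id per request
            'client_id': str(uuid.uuid4()),
            # Superset identifies the database by id, which is why the client
            # is configured with the database id instead of the database name
            'database_id': POSTGRES_DATABASE_ID,
            'schema': params.get('schema'),
            # Only fetch a handful of rows for the preview
            'sql': 'SELECT * FROM {schema}.{table} LIMIT 50'.format(
                schema=params.get('schema'), table=params.get('tableName')),
        }
        return requests.post(SUPERSET_SQL_JSON_URL, data=request_data, headers=headers)
```

The frontend discovers the client through a Python entry point (step 5); follow the configuration doc linked there for the exact entry point group and name to register for this class.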
# How to index metadata for real-life databases
In the previous [doc](docs/installation.md), we indexed tables from CSV files. In real production cases,
the table metadata is stored in data warehouses (e.g. Hive, Postgres, MySQL, Snowflake, BigQuery, etc.) for which Amundsen
provides extractors.

In this tutorial, we will use a postgres database as an example and walk through how to index its metadata.
The doc won't cover how to set up a postgres database.
1. In this example, we have a postgres table named `films` in a postgres database running on localhost.
![](../img/tutorials/postgres.png)
2. We leverage the [postgres metadata extractor](https://github.com/lyft/amundsendatabuilder/blob/master/databuilder/extractor/postgres_metadata_extractor.py)
to extract the metadata of the postgres database. We can call the metadata extractor
from an ad-hoc Python script, as in this [example](https://github.com/lyft/amundsendatabuilder/pull/248/commits/f5064e58a19a5bfa380b333cfc657ebb34702a2c)
and the sketch after this list, or from an Airflow DAG.
3. Once we run the script, we can search for the `films` table using Amundsen search.
![](../img/tutorials/search-postgres.png)
4. We can also find and view the `films` table on the table detail page.
![](../img/tutorials/table-postgres.png)
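For reference, below is a minimal sketch of such an ad-hoc script, modeled on the databuilder sample scripts. The connection string, Neo4j credentials, file paths, where clause, and publish tag are placeholders, and the config keys should be checked against the databuilder version you have installed.

```python
from pyhocon import ConfigFactory

from databuilder.extractor.postgres_metadata_extractor import PostgresMetadataExtractor
from databuilder.extractor.sql_alchemy_extractor import SQLAlchemyExtractor
from databuilder.job.job import DefaultJob
from databuilder.loader.file_system_neo4j_csv_loader import FsNeo4jCSVLoader
from databuilder.publisher import neo4j_csv_publisher
from databuilder.publisher.neo4j_csv_publisher import Neo4jCsvPublisher
from databuilder.task.task import DefaultTask

# Placeholders -- change these to match your environment.
POSTGRES_CONN_STRING = 'postgresql://user:password@localhost:5432/postgres'
NEO4J_ENDPOINT = 'bolt://localhost:7687'
NODE_DIR = '/tmp/amundsen/nodes'
RELATION_DIR = '/tmp/amundsen/relationships'

job_config = ConfigFactory.from_dict({
    # Restrict extraction to the schema(s) you care about
    'extractor.postgres_metadata.{}'.format(PostgresMetadataExtractor.WHERE_CLAUSE_SUFFIX_KEY):
        "where table_schema = 'public'",
    'extractor.postgres_metadata.{}'.format(PostgresMetadataExtractor.USE_CATALOG_AS_CLUSTER_NAME): True,
    'extractor.postgres_metadata.extractor.sqlalchemy.{}'.format(SQLAlchemyExtractor.CONN_STRING):
        POSTGRES_CONN_STRING,
    # Intermediate CSV files written by the loader and read by the publisher
    'loader.filesystem_csv_neo4j.{}'.format(FsNeo4jCSVLoader.NODE_DIR_PATH): NODE_DIR,
    'loader.filesystem_csv_neo4j.{}'.format(FsNeo4jCSVLoader.RELATION_DIR_PATH): RELATION_DIR,
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.NODE_FILES_DIR): NODE_DIR,
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.RELATION_FILES_DIR): RELATION_DIR,
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.NEO4J_END_POINT_KEY): NEO4J_ENDPOINT,
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.NEO4J_USER): 'neo4j',
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.NEO4J_PASSWORD): 'test',
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.JOB_PUBLISH_TAG): 'postgres_films_tutorial',
})

if __name__ == '__main__':
    # Extract postgres metadata, load it as CSV files, then publish to Neo4j.
    job = DefaultJob(conf=job_config,
                     task=DefaultTask(extractor=PostgresMetadataExtractor(),
                                      loader=FsNeo4jCSVLoader()),
                     publisher=Neo4jCsvPublisher())
    job.launch()
```

Publishing to Neo4j populates the table detail page; to make the new table show up in Amundsen search (step 3), you also need a companion job that publishes a search index to Elasticsearch, as shown in the databuilder sample scripts.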
This tutorial uses postgres as an example, but you can apply the same approach to your other data warehouses. If Amundsen
doesn't provide an extractor for your warehouse, you can build one based on the API and contribute it back to us!
@@ -62,6 +62,9 @@ nav:
- 'Overview': developer_guide.md
- 'User Guide':
- 'Quick Start': 'installation.md'
- 'Tutorials':
- 'How to index the postgres database metadata': 'tutorials/index-postgres.md'
- 'How to set up a preview client with Apache Superset': 'tutorials/data-preview-with-superset.md'
- 'Deployment':
- 'Authentication': 'authentication/oidc.md'
- 'AWS ECS Installation': 'installation-aws-ecs/aws-ecs-deployment.md'