Unverified Commit c70bce21 authored by Tao Feng, committed by GitHub

Add Tutorial(1. index postgres; 2. setup superset preview client) for the doc (#413)

* Add Tutorial for the doc

* Add superset doc

* Add mkdocs change

* Update docs/tutorials/data-preview-with-superset.md
Co-Authored-By: Tamika Tannis <ttannis@lyft.com>

* Update docs/tutorials/data-preview-with-superset.md
Co-Authored-By: Tamika Tannis <ttannis@lyft.com>

* Update docs/tutorials/data-preview-with-superset.md
Co-Authored-By: Tamika Tannis <ttannis@lyft.com>

* update
Co-authored-by: Tamika Tannis <ttannis@lyft.com>
parent 5fbed864
@@ -28,6 +28,11 @@ The following instructions are for setting up a version of Amundsen using Docker
$ python3 example/scripts/sample_data_loader.py
```
5. View UI at [`http://localhost:5000`](http://localhost:5000) and try to search `test`, it should return some result.
![](img/search-page.png)
6. We could also do an exact-match search for a table entity. For example, searching `test_table1` in the table field
returns the records that match.
![](img/search-exact-match.png)
**Atlas Note:** Atlas takes some time to boot properly. So you may not be able to see the results immediately
after `docker-compose up` command.
@@ -37,7 +42,8 @@ Atlas would be ready once you'll have the following output in the docker output
1. You can verify dummy data has been ingested into Neo4j by visiting [`http://localhost:7474/browser/`](http://localhost:7474/browser/) and running `MATCH (n:Table) RETURN n LIMIT 25` in the query box. You should see two tables:
1. `hive.test_schema.test_table1`
2. `dynamo.test_schema.test_table2`
![](img/neo4j-debug.png)
2. You can verify the data has been loaded into the metadataservice by visiting:
1. [`http://localhost:5000/table_detail/gold/hive/test_schema/test_table1`](http://localhost:5000/table_detail/gold/hive/test_schema/test_table1)
2. [`http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2`](http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2)
# How to set up a preview client with Apache Superset
In the previous [tutorial](docs/tutorials/index-postgres.md), we walked through how to index the table metadata
of a postgres database. In this tutorial, we will walk through how to configure data preview for that `films` table
using Apache Superset.

Amundsen provides an integration point with a BI visualization tool for data preview. Apache Superset is not
required; any BI tool that exposes an endpoint for running a query and returning the results can be used.
[Apache Superset](https://superset.apache.org/) is an open-source business intelligence tool
that can be used for data exploration, and it is what we leverage internally at Lyft to support this feature.
1. Please set up Apache Superset following its official installation
[guide](https://superset.apache.org/installation.html#superset-installation-and-initialization):
```bash
# Install superset
pip install apache-superset
# Initialize the database
superset db upgrade
# Create an admin user (you will be prompted to set a username, first and last name before setting a password)
export FLASK_APP=superset
superset fab create-admin
# Load some data to play with
superset load_examples
# Create default roles and permissions
superset init
# To start a development web server on port 8088, use -p to bind to another port
superset run -p 8088 --with-threads --reload --debugger
```
Once set up properly, you can view the Superset UI as follows:
![](../img/tutorials/superset-welcome.png)
2. We need to add the postgres database to Superset as follows:
![](../img/tutorials/superset-add-db.png)
3. We can verify the content of the `films` table using Superset's SQL Lab feature:
![](../img/tutorials/superset-sqllab-verify.png)
4. Next, we need to build a preview client following this [guide](../frontend/docs/examples/superset_preview_client.md)
and the [example client code](https://github.com/lyft/amundsenfrontendlibrary/blob/master/amundsen_application/base/examples/example_superset_preview_client.py);
a condensed sketch of such a client is included after this list. There are a couple of things to keep in mind:
- We could start with an unauthenticated Superset ([example superset config](https://gist.github.com/feng-tao/b89e6faf7236372cef70a44f13615c39)),
but in production we will need to send impersonation info to Superset
so it can verify whether the given user is allowed to view the data.
- When we build the client, we need to configure the database id, not the database name, when sending the request to Superset.
5. Once the preview client is ready, register it through the frontend service's entry point ([example](https://github.com/lyft/amundsenfrontendlibrary/blob/master/docs/configuration.md#python-entry-points)) and restart the frontend.
6. We can now view the preview data for the `films` table in Amundsen.
![](../img/tutorials/amundsen-preview1.png)
As shown in the figure above, the preview button on the table page is now clickable.
Once clicked, it shows the actual data queried
from Apache Superset:
![](../img/tutorials/amundsen-preview2.png)
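For reference, here is a condensed sketch of what such a preview client can look like, adapted from the example client linked in step 4. The endpoint URL, the `database_id` value, the request payload format, and the `params` keys (`schema`, `tableName`) are assumptions based on that example and a local Superset running on port 8088, so verify them against your own Superset and frontend versions.

```python
import uuid
from typing import Dict

import requests
from requests import Response

from amundsen_application.base.base_superset_preview_client import BaseSupersetPreviewClient

# Assumed values for this tutorial setup -- adjust them for your deployment.
SUPERSET_SQL_JSON_URL = 'http://localhost:8088/superset/sql_json/'
POSTGRES_DATABASE_ID = 1  # the Superset *id* of the postgres database, not its name


class SupersetPreviewClient(BaseSupersetPreviewClient):
    def __init__(self) -> None:
        self.headers = {}

    def post_to_sql_json(self, *, params: Dict, headers: Dict) -> Response:
        """Post a preview query for the requested table to Superset's sql_json endpoint."""
        request_data = {
            # sql_json expects a unique client_id per request
            'client_id': str(uuid.uuid4()),
            # Superset identifies the database by id, which is why the client
            # is configured with the database id instead of the database name
            'database_id': POSTGRES_DATABASE_ID,
            'schema': params.get('schema'),
            # Only fetch a handful of rows for the preview
            'sql': 'SELECT * FROM {schema}.{table} LIMIT 50'.format(
                schema=params.get('schema'), table=params.get('tableName')),
        }
        return requests.post(SUPERSET_SQL_JSON_URL, data=request_data, headers=headers)
```

The frontend discovers the client through a Python entry point (step 5); follow the configuration doc linked there for the exact entry point group and name to register for this class.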
# How to index metadata for real-life databases
In the previous [doc](docs/installation.md), we indexed tables from CSV files. In real production cases,
the table metadata is stored in data warehouses (e.g. Hive, Postgres, MySQL, Snowflake, BigQuery, etc.) for which Amundsen
provides extractors.

In this tutorial, we will use a postgres database as an example and walk through how to index its metadata.
The doc won't cover how to set up a postgres database.
1. In this example, we have a postgres table named `films` in a postgres database running on localhost.
![](../img/tutorials/postgres.png)
2. We leverage the [postgres metadata extractor](https://github.com/lyft/amundsendatabuilder/blob/master/databuilder/extractor/postgres_metadata_extractor.py)
to extract the metadata of the postgres database. We can call the metadata extractor
from an ad-hoc Python script, as in this [example](https://github.com/lyft/amundsendatabuilder/pull/248/commits/f5064e58a19a5bfa380b333cfc657ebb34702a2c)
and the sketch after this list, or from an Airflow DAG.
3. Once we run the script, we can search for the `films` table using Amundsen search.
![](../img/tutorials/search-postgres.png)
4. We can also find and view the `films` table on the table detail page.
![](../img/tutorials/table-postgres.png)
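For reference, below is a minimal sketch of such an ad-hoc script, modeled on the databuilder sample scripts. The connection string, Neo4j credentials, file paths, where clause, and publish tag are placeholders, and the config keys should be checked against the databuilder version you have installed.

```python
from pyhocon import ConfigFactory

from databuilder.extractor.postgres_metadata_extractor import PostgresMetadataExtractor
from databuilder.extractor.sql_alchemy_extractor import SQLAlchemyExtractor
from databuilder.job.job import DefaultJob
from databuilder.loader.file_system_neo4j_csv_loader import FsNeo4jCSVLoader
from databuilder.publisher import neo4j_csv_publisher
from databuilder.publisher.neo4j_csv_publisher import Neo4jCsvPublisher
from databuilder.task.task import DefaultTask

# Placeholders -- change these to match your environment.
POSTGRES_CONN_STRING = 'postgresql://user:password@localhost:5432/postgres'
NEO4J_ENDPOINT = 'bolt://localhost:7687'
NODE_DIR = '/tmp/amundsen/nodes'
RELATION_DIR = '/tmp/amundsen/relationships'

job_config = ConfigFactory.from_dict({
    # Restrict extraction to the schema(s) you care about
    'extractor.postgres_metadata.{}'.format(PostgresMetadataExtractor.WHERE_CLAUSE_SUFFIX_KEY):
        "where table_schema = 'public'",
    'extractor.postgres_metadata.{}'.format(PostgresMetadataExtractor.USE_CATALOG_AS_CLUSTER_NAME): True,
    'extractor.postgres_metadata.extractor.sqlalchemy.{}'.format(SQLAlchemyExtractor.CONN_STRING):
        POSTGRES_CONN_STRING,
    # Intermediate CSV files written by the loader and read by the publisher
    'loader.filesystem_csv_neo4j.{}'.format(FsNeo4jCSVLoader.NODE_DIR_PATH): NODE_DIR,
    'loader.filesystem_csv_neo4j.{}'.format(FsNeo4jCSVLoader.RELATION_DIR_PATH): RELATION_DIR,
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.NODE_FILES_DIR): NODE_DIR,
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.RELATION_FILES_DIR): RELATION_DIR,
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.NEO4J_END_POINT_KEY): NEO4J_ENDPOINT,
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.NEO4J_USER): 'neo4j',
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.NEO4J_PASSWORD): 'test',
    'publisher.neo4j.{}'.format(neo4j_csv_publisher.JOB_PUBLISH_TAG): 'postgres_films_tutorial',
})

if __name__ == '__main__':
    # Extract postgres metadata, load it as CSV files, then publish to Neo4j.
    job = DefaultJob(conf=job_config,
                     task=DefaultTask(extractor=PostgresMetadataExtractor(),
                                      loader=FsNeo4jCSVLoader()),
                     publisher=Neo4jCsvPublisher())
    job.launch()
```

Publishing to Neo4j populates the table detail page; to make the new table show up in Amundsen search (step 3), you also need a companion job that publishes a search index to Elasticsearch, as shown in the databuilder sample scripts.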
This tutorial uses postgres as an example, but you can apply the same approach to your other data warehouses. If Amundsen
doesn't provide an extractor for your warehouse, you can build one based on the API and contribute it back to us!
@@ -62,6 +62,9 @@ nav:
- 'Overview': developer_guide.md
- 'User Guide':
- 'Quick Start': 'installation.md'
- 'Tutorials':
- 'How to index the postgres database metadata': 'tutorials/index-postgres.md'
- 'How to set up a preview client with Apache Superset': 'tutorials/data-preview-with-superset.md'
- 'Deployment':
- 'Authentication': 'authentication/oidc.md'
- 'AWS ECS Installation': 'installation-aws-ecs/aws-ecs-deployment.md'