Unverified Commit 0c77114a authored by Tamika Tannis's avatar Tamika Tannis Committed by GitHub

Configure submodules to child repos (#4)

* Add submodules

* Update doc with recent changes

* Add a PR template
parent b9e3a6e3
### Summary of Changes
_Include a summary of changes then remove this line_
### Documentation
_What documentation did you add or modify and why? Add any relevant links then remove this line_
### Checklist
Make sure you have checked **all** steps below to ensure a timely review.
- [ ] PR title addresses the issue accurately and concisely.
- [ ] PR includes a summary of changes.
- [ ] I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
[submodule "amundsendatabuilder"]
path = amundsendatabuilder
url = https://github.com/lyft/amundsendatabuilder
[submodule "amundsenfrontendlibrary"]
path = amundsenfrontendlibrary
url = https://github.com/lyft/amundsenfrontendlibrary
[submodule "amundsenmetadatalibrary"]
path = amundsenmetadatalibrary
url = https://github.com/lyft/amundsenmetadatalibrary
[submodule "amundsensearchlibrary"]
path = amundsensearchlibrary
url = https://github.com/lyft/amundsensearchlibrary
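For reference, entries like these are produced by `git submodule add`; the sketch below builds a throwaway parent/child pair to show the resulting `.gitmodules` (repo names are placeholders, not the real Amundsen repos):

```bash
set -e
# Create a child repo to act as the submodule
tmp=$(mktemp -d)
cd "$tmp"
git init -q child
cd child
git config user.email "you@example.com"
git config user.name "You"
git commit -q --allow-empty -m "init"
cd ..
# Create the parent repo and register the child as a submodule
git init -q parent
cd parent
git config user.email "you@example.com"
git config user.name "You"
# Newer git versions require explicitly allowing file-protocol submodules
git -c protocol.file.allow=always submodule add "$tmp/child" child
cat .gitmodules
```

In a fresh clone of the umbrella repo itself, `git submodule update --init --recursive` checks out the commits pinned by the `Subproject commit` lines further down.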
...@@ -77,6 +77,8 @@ Please visit [Roadmap](docs/roadmap.md) if you are interested in Amundsen upcomi
- [Amundsen: A Data Discovery Platform from Lyft](https://www.slideshare.net/taofung/data-council-sf-amundsen-presentation) (Data Council 19 SF)
- [Software Engineering Daily podcast on Amundsen](https://softwareengineeringdaily.com/2019/04/16/lyft-data-discovery-with-tao-feng-and-mark-grover/) (April 2019)
- [Disrupting Data Discovery](https://www.slideshare.net/markgrover/disrupting-data-discovery) (Strata London 2019)
- [Disrupting Data Discovery (video)](https://www.youtube.com/watch?v=m1B-ptm0Rrw) (Strata SF 2019)
- [ING Data Analytics Platform (Amundsen is mentioned)](https://static.sched.com/hosted_files/kccnceu19/65/ING%20Data%20Analytics%20Platform.pdf) (Kubecon Barcelona 2019)
# License
[Apache 2.0 License.](/LICENSE)
Subproject commit f73c8128671b37020503558e6cd00ac02fd26306
Subproject commit 525d4323854f8f74f4c5198cc4efdf0283ebb13b
Subproject commit 2b33102d3f9511537656f60f987e3e79caef0c72
Subproject commit 46513a881e7b49f862b2a8b67131135d9026aed2
...@@ -4,8 +4,7 @@ services:
    image: neo4j:3.3.0
    container_name: neo4j_amundsen
    environment:
      - NEO4J_AUTH=neo4j/test
    ulimits:
      nofile:
        soft: 40000
...@@ -46,7 +45,6 @@ services:
      - amundsennet
    environment:
      - PROXY_HOST=bolt://neo4j_amundsen
  amundsenfrontend:
    image: amundsendev/amundsen-frontend:1.0.5
    container_name: amundsenfrontend
...
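The credential change above collapses the two proxy variables into Neo4j's single `NEO4J_AUTH` variable, which takes `username/password` in one string. A minimal sketch of the resulting service entry, assuming the image and container name from the snippet above:

```yaml
services:
  neo4j:
    image: neo4j:3.3.0
    container_name: neo4j_amundsen
    environment:
      # user and password in a single variable, separated by '/'
      - NEO4J_AUTH=neo4j/test
```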
# OIDC (Keycloak) Authentication
Setting up end-to-end authentication using OIDC is fairly simple and can be done using a Flask wrapper, i.e., [flaskoidc](https://github.com/verdan/flaskoidc).
`flaskoidc` leverages Flask's `before_request` functionality to authenticate each request before passing it to
the views. It also accepts headers on each request, if available, in order to validate the bearer token of incoming requests.
## Installation
Please refer to the [flaskoidc documentation](https://github.com/verdan/flaskoidc/blob/master/README.md)
for installation and configuration.
Note: You need to install and configure `flaskoidc` for each microservice of Amundsen,
i.e., for frontendlibrary, metadatalibrary, and searchlibrary, in order to secure each of them.
## Amundsen Configuration
...@@ -19,7 +19,7 @@ Once you have `flaskoidc` installed and configured for each microservice, please
APP_WRAPPER: flaskoidc
APP_WRAPPER_CLASS: FlaskOIDC
```
- amundsenmetadatalibrary:
```bash
FLASK_APP_MODULE_NAME: flaskoidc
...@@ -31,16 +31,16 @@ Once you have `flaskoidc` installed and configured for each microservice, please
FLASK_APP_MODULE_NAME: flaskoidc
FLASK_APP_CLASS_NAME: FlaskOIDC
```
By default `flaskoidc` whitelists the healthcheck URLs so that they are not authenticated. In the case of metadatalibrary and searchlibrary
we may want to whitelist the healthcheck APIs explicitly using the following environment variable:
```bash
FLASK_OIDC_WHITELISTED_ENDPOINTS: 'api.healthcheck'
```
## Setting Up Request Headers
To communicate securely between the microservices, you need to pass the bearer token from the frontend in each request
to metadatalibrary and searchlibrary. This should be done using the `REQUEST_HEADERS_METHOD` config variable in frontendlibrary.
- Define a function to add the bearer token in each request in your config.py:
...@@ -58,15 +58,42 @@ def get_access_headers(app):
        return {'Authorization': 'Bearer {}'.format(access_token)}
    except Exception:
        return None
```
- Set the method as the request header method in your config.py:
```python
REQUEST_HEADERS_METHOD = get_access_headers
```
This function will be called with the current `app` instance to add the headers to each request when calling any endpoint of
metadatalibrary and searchlibrary, as done [here](https://github.com/lyft/amundsenfrontendlibrary/blob/master/amundsen_application/api/utils/request_utils.py).
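The headers function above can be exercised in isolation. In this sketch, `FakeApp` and the token value are made-up stand-ins (in a real deployment `flaskoidc` supplies the OIDC access token); the try/except mirrors the snippet in the diff, returning `None` when no token is available:

```python
def get_access_headers(app):
    """Return auth headers for downstream requests, or None on failure."""
    try:
        access_token = app.access_token  # stand-in for a real OIDC token lookup
        return {'Authorization': 'Bearer {}'.format(access_token)}
    except Exception:
        return None

class FakeApp:
    """Hypothetical app object; only exists for this illustration."""
    access_token = 'abc123'

print(get_access_headers(FakeApp()))   # a headers dict with the bearer token
print(get_access_headers(object()))    # None: no token attribute available
```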
## Setting Up Auth User Method
In order to get the current authenticated user (which is being used in Amundsen for many operations), we need to set
`AUTH_USER_METHOD` config variable in frontendlibrary.
This function should return the email address, user id, and any other required information.
- Define a function to fetch the user information in your config.py:
```python
def get_auth_user(app):
"""
    Retrieves the user information from the oidc token, and then builds
    a 'UserInfo' class from the token information dictionary.
    We need a class rather than a plain dictionary in order to use the
    information in the rest of the Amundsen application.
:param app: The instance of the current app.
:return: A class UserInfo
"""
from flask import g
user_info = type('UserInfo', (object,), g.oidc_id_token)
# noinspection PyUnresolvedReferences
user_info.user_id = user_info.preferred_username
return user_info
```
- Set the method as the auth user method in your config.py:
```python
AUTH_USER_METHOD = get_auth_user
```
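The three-argument `type(...)` call in `get_auth_user` builds a class whose attributes come from the token dictionary's keys; a stand-alone illustration with made-up token contents:

```python
# Hypothetical token payload; a real OIDC id token carries the user's claims
token = {'email': 'jane@example.com', 'preferred_username': 'jane'}

# type(name, bases, namespace) creates a class: each dict key becomes
# a class attribute, which is what the rest of Amundsen expects
UserInfo = type('UserInfo', (object,), token)
UserInfo.user_id = UserInfo.preferred_username

print(UserInfo.user_id)
print(UserInfo.email)
```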
Once done, you'll have end-to-end authentication in Amundsen without any proxy or code changes.
# Installation
## Bootstrap a default version of Amundsen using Docker
The following instructions are for setting up a version of Amundsen using Docker.
1. Install `docker` and `docker-compose`.
2. Clone [amundsenfrontendlibrary](https://github.com/lyft/amundsenfrontendlibrary) or download the [docker-amundsen.yml](https://github.com/lyft/amundsenfrontendlibrary/blob/master/docker-amundsen.yml) file directly.
3. Enter the directory where the `docker-amundsen.yml` file is and then run:
```bash
$ docker-compose -f docker-amundsen.yml up
```
4. Ingest dummy data into Neo4j by doing the following:
   * Clone [amundsendatabuilder](https://github.com/lyft/amundsendatabuilder).
   * Run the following commands in the `amundsendatabuilder` directory:
```bash
$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt
$ python3 setup.py install
$ python3 example/scripts/sample_data_loader.py
```
5. View the UI at [`http://localhost:5000`](http://localhost:5000) and search for `test`; it should return some results.
### Verify setup
1. You can verify dummy data has been ingested into Neo4j by visiting [`http://localhost:7474/browser/`](http://localhost:7474/browser/) and running `MATCH (n:Table) RETURN n LIMIT 25` in the query box. You should see two tables:
1. `hive.test_schema.test_table1`
2. `dynamo.test_schema.test_table2`
2. You can verify the data has been loaded into the metadataservice by visiting:
1. [`http://localhost:5000/table_detail/gold/hive/test_schema/test_table1`](http://localhost:5000/table_detail/gold/hive/test_schema/test_table1)
2. [`http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2`](http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2)
### Troubleshooting
1. If the Docker host's virtual memory limit is too low for Elasticsearch, `es_amundsen` will fail during `docker-compose up`.
   1. docker-compose error: `es_amundsen | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]`
   2. Increase `vm.max_map_count` ([detailed instructions here](https://www.elastic.co/guide/en/elasticsearch/reference/7.1/docker.html#docker-cli-run-prod-mode)):
      1. Edit `/etc/sysctl.conf`.
      2. Add the entry `vm.max_map_count=262144`. Save and exit.
      3. Reload settings: `$ sysctl -p`
      4. Restart `docker-compose`.
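On Linux, the same limit can also be raised for the current boot without editing `/etc/sysctl.conf`; this is the one-shot form from the Elasticsearch Docker guide (requires root, does not persist across reboots):

```bash
sudo sysctl -w vm.max_map_count=262144
```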