Unverified commit 0c77114a authored by Tamika Tannis, committed by GitHub

Configure submodules to child repos (#4)

* Add submodules

* Update doc with recent changes

* Add a PR template
parent b9e3a6e3
### Summary of Changes
_Include a summary of changes then remove this line_
### Documentation
_What documentation did you add or modify and why? Add any relevant links then remove this line_
### Checklist
Make sure you have checked **all** steps below to ensure a timely review.
- [ ] PR title addresses the issue accurately and concisely.
- [ ] PR includes a summary of changes.
- [ ] I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
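For reference, squashing is typically done with an interactive rebase; a generic sketch, not specific to this repo:
```bash
# Combine the last three commits into one; in the editor that opens,
# keep "pick" on the first commit and change the others to "squash".
git rebase -i HEAD~3

# Update the already-pushed PR branch with the rewritten history.
git push --force-with-lease
```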
[submodule "amundsendatabuilder"]
path = amundsendatabuilder
url = https://github.com/lyft/amundsendatabuilder
[submodule "amundsenfrontendlibrary"]
path = amundsenfrontendlibrary
url = https://github.com/lyft/amundsenfrontendlibrary
[submodule "amundsenmetadatalibrary"]
path = amundsenmetadatalibrary
url = https://github.com/lyft/amundsenmetadatalibrary
[submodule "amundsensearchlibrary"]
path = amundsensearchlibrary
url = https://github.com/lyft/amundsensearchlibrary
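With the `.gitmodules` above in place, the child repos can be pulled down alongside the parent repo. A minimal sketch, assuming the parent repo lives at `https://github.com/lyft/amundsen` (that URL is an assumption, not part of this change):
```bash
# Clone the parent repo and every submodule listed in .gitmodules in one step.
# (The parent repo URL below is assumed for illustration.)
git clone --recursive https://github.com/lyft/amundsen.git
cd amundsen

# If the repo was already cloned without --recursive, fetch the pinned
# submodule commits after the fact.
git submodule update --init
```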
@@ -77,6 +77,8 @@ Please visit [Roadmap](docs/roadmap.md) if you are interested in Amundsen upcomi
- [Amundsen: A Data Discovery Platform from Lyft](https://www.slideshare.net/taofung/data-council-sf-amundsen-presentation) (Data council 19 SF)
- [Software Engineering Daily podcast on Amundsen](https://softwareengineeringdaily.com/2019/04/16/lyft-data-discovery-with-tao-feng-and-mark-grover/) (April 2019)
- [Disrupting Data Discovery](https://www.slideshare.net/markgrover/disrupting-data-discovery) (Strata London 2019)
- [Disrupting Data Discovery (video)](https://www.youtube.com/watch?v=m1B-ptm0Rrw) (Strata SF 2019)
- [ING Data Analytics Platform (Amundsen is mentioned)](https://static.sched.com/hosted_files/kccnceu19/65/ING%20Data%20Analytics%20Platform.pdf) (Kubecon Barcelona 2019)
# License
[Apache 2.0 License.](/LICENSE)
Subproject commit f73c8128671b37020503558e6cd00ac02fd26306
Subproject commit 525d4323854f8f74f4c5198cc4efdf0283ebb13b
Subproject commit 2b33102d3f9511537656f60f987e3e79caef0c72
Subproject commit 46513a881e7b49f862b2a8b67131135d9026aed2
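The `Subproject commit` entries above are the exact child-repo commits the parent repo now pins. They can be inspected or bumped with standard git commands, for example:
```bash
# Show each submodule's pinned commit and path.
git submodule status

# Move one submodule to the latest upstream commit and record the new pin.
git submodule update --remote amundsendatabuilder
git add amundsendatabuilder
git commit -m "Bump amundsendatabuilder submodule"
```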
@@ -4,8 +4,7 @@ services:
    image: neo4j:3.3.0
    container_name: neo4j_amundsen
    environment:
-      - CREDENTIALS_PROXY_USER=neo4j
-      - CREDENTIALS_PROXY_PASSWORD=test
      - NEO4J_AUTH=neo4j/test
    ulimits:
      nofile:
        soft: 40000
@@ -46,7 +45,6 @@ services:
      - amundsennet
    environment:
      - PROXY_HOST=bolt://neo4j_amundsen
-      # - CREDENTIALS_PROXY_PASSWORD=neo4j_NOTE_FOR_NOW_IT_SEEMS_NEO4JCONFIG_DISREGARDS_CREDENTIALS_WE_SHOULD_FILE_A_BUG
  amundsenfrontend:
    image: amundsendev/amundsen-frontend:1.0.5
    container_name: amundsenfrontend
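For context, `NEO4J_AUTH=neo4j/test` is the official Neo4j image's way of setting the initial username/password, replacing the two `CREDENTIALS_PROXY_*` variables above. A quick sanity check once the stack is up, as a sketch that assumes port 7474 is published to localhost as in the default compose file:
```bash
# Neo4j 3.x serves a basic-auth-protected REST root on the browser port.
# HTTP 200 means the container is up and the neo4j/test credentials are accepted.
curl -sf -u neo4j:test http://localhost:7474/db/data/ > /dev/null \
  && echo "Neo4j is reachable with the configured credentials" \
  || echo "Neo4j is unreachable or rejected the credentials"
```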
@@ -68,5 +68,32 @@ REQUEST_HEADERS_METHOD = get_access_headers
This function will be called with the current `app` instance to add the headers to each request made to any endpoint of
metadatalibrary and searchlibrary ([see request_utils.py](https://github.com/lyft/amundsenfrontendlibrary/blob/master/amundsen_application/api/utils/request_utils.py)).
## Setting Up Auth User Method
To get the currently authenticated user (which Amundsen relies on for many operations), we need to set the
`AUTH_USER_METHOD` config variable in frontendlibrary.
The function it points to should return the user's email address, user id, and any other required information.
- Define a function to fetch the user information in your config.py:
```python
def get_auth_user(app):
    """
    Retrieves the user information from the oidc token, and then builds
    a 'UserInfo' class from the token information dictionary.
    We need to convert it to a class in order to use the information
    in the rest of the Amundsen application.
    :param app: The instance of the current app.
    :return: A class UserInfo
    """
    from flask import g

    # g.oidc_id_token is populated for the current request by the OIDC integration.
    user_info = type('UserInfo', (object,), g.oidc_id_token)
    # noinspection PyUnresolvedReferences
    user_info.user_id = user_info.preferred_username
    return user_info
```
- Set the method as the auth user method in your config.py:
```python
AUTH_USER_METHOD = get_auth_user
```
Once done, you'll have the end-to-end authentication in Amundsen without any proxy or code changes.
# Installation
## Bootstrap a default version of Amundsen using Docker
-The following instructions are for setting up a version of Amundsen using Docker. At the moment, we only support a bootstrap for connecting the Amundsen application to an example metadata service.
-1. Install `docker`, `docker-compose`, and `docker-machine`.
-2. Install `virtualbox` and `virtualenv`.
-3. Start a managed docker virtual host using the following command:
-```bash
-# in our examples our machine is named 'default'
-$ docker-machine create -d virtualbox default
-```
-4. Check your docker daemon locally using:
-```bash
-$ docker-machine ls
-```
-You should see the `default` machine listed, running on virtualbox with no errors listed.
-5. Set up the docker environment using
-```bash
-$ eval $(docker-machine env default)
-```
-TODO (ttannis): Once submodules configured, they _should_ be able to `cd amundsenfrontendlibrary`, etc. Will go through setup again and verify it works.
-6. Setup your local environment.
-* Clone [amundsenfrontendlibrary](https://github.com/lyft/amundsenfrontendlibrary), [amundsenmetadatalibrary](https://github.com/lyft/amundsenmetadatalibrary), and [amundsensearchlibrary](https://github.com/lyft/amundsensearchlibrary).
-* In your local versions of each library, update the `LOCAL_HOST` in the `LocalConfig` with the IP used for the `default` docker machine. You can see the IP in the `URL` outputted from running `docker-machine ls`.
-7. Start all of the services using:
-```bash
-# in ~/<your-path-to-cloned-repo>/amundsen
-$ docker-compose -f docker-amundsen.yml up
-```
-8. Ingest dummy data into Neo4j by doing the following:
-* Clone [amundsendatabuilder](https://github.com/lyft/amundsendatabuilder).
-* Update the `NEO4J_ENDPOINT` and `Elasticsearch host` in [sample_data_loader.py](https://github.com/lyft/amundsendatabuilder/blob/master/example/scripts/sample_data_loader.py) and replace `localhost` with the IP used for the `default` docker machine. You can see the IP in the `URL` outputted from running `docker-machine ls`.
-* Run the following commands:
-```bash
-# in ~/<your-path-to-cloned-repo>/amundsendatabuilder
-$ virtualenv -p python3 venv3
-$ source venv3/bin/activate
-$ pip3 install -r requirements.txt
-$ python setup.py install
-$ python example/scripts/sample_data_loader.py
-```
-9. Verify dummy data has been ingested by viewing in Neo4j by visiting `http://YOUR-DOCKER-HOST-IP:7474/browser/` and run `MATCH (n:Table) RETURN n LIMIT 25` in the query box. You should see two tables -- `hive.test_schema.test_table1` and `dynamo.test_schema.test_table2`.
-10. View UI at `http://YOUR-DOCKER-HOST-IP:5000/table_detail/gold/hive/test_schema/test_table1` or `/table_detail/gold/dynamo/test_schema/test_table2`
-11. View UI at `http://YOUR-DOCKER-HOST-IP:5000` and try to search `test`, it should return some result.
The following instructions are for setting up a version of Amundsen using Docker.
1. Install `docker` and `docker-compose`.
2. Clone [amundsenfrontendlibrary](https://github.com/lyft/amundsenfrontendlibrary) or download the [docker-amundsen.yml](https://github.com/lyft/amundsenfrontendlibrary/blob/master/docker-amundsen.yml) file directly.
3. Enter the directory where the `docker-amundsen.yml` file is and then:
```bash
$ docker-compose -f docker-amundsen.yml up
```
4. Ingest dummy data into Neo4j by doing the following:
* Clone [amundsendatabuilder](https://github.com/lyft/amundsendatabuilder).
* Run the following commands in the `amundsendatabuilder` directory:
```bash
$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt
$ python3 setup.py install
$ python3 example/scripts/sample_data_loader.py
```
5. View UI at [`http://localhost:5000`](http://localhost:5000) and try to search `test`, it should return some result.
### Verify setup
1. You can verify dummy data has been ingested into Neo4j by visiting [`http://localhost:7474/browser/`](http://localhost:7474/browser/) and running `MATCH (n:Table) RETURN n LIMIT 25` in the query box. You should see two tables:
1. `hive.test_schema.test_table1`
2. `dynamo.test_schema.test_table2`
2. You can verify the data has been loaded into the metadataservice by visiting the pages below (also scriptable, as sketched after this list):
1. [`http://localhost:5000/table_detail/gold/hive/test_schema/test_table1`](http://localhost:5000/table_detail/gold/hive/test_schema/test_table1)
2. [`http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2`](http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2)
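The same checks can be scripted from a shell; a sketch using the URLs above, assuming the default local ports from `docker-amundsen.yml`:
```bash
# Expect HTTP 200 from the two frontend pages that render the sample tables.
for path in table_detail/gold/hive/test_schema/test_table1 \
            table_detail/gold/dynamo/test_schema/test_table2; do
  curl -s -o /dev/null -w "%{http_code} /$path\n" "http://localhost:5000/$path"
done

# Run the same Cypher query against Neo4j's transactional HTTP endpoint (neo4j/test auth).
curl -s -u neo4j:test -H "Content-Type: application/json" \
  -d '{"statements": [{"statement": "MATCH (n:Table) RETURN n LIMIT 25"}]}' \
  http://localhost:7474/db/data/transaction/commit
```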
### Troubleshooting
1. If the Docker host doesn't allow enough virtual memory areas for Elasticsearch, `es_amundsen` will fail during `docker-compose`.
   1. docker-compose error: `es_amundsen | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]`
   2. Increase `vm.max_map_count` ([detailed instructions here](https://www.elastic.co/guide/en/elasticsearch/reference/7.1/docker.html#docker-cli-run-prod-mode)); the same steps are condensed into a shell sketch after this list:
      1. Edit `/etc/sysctl.conf`
      2. Add the entry `vm.max_map_count=262144`, then save and exit.
      3. Reload settings: `$ sysctl -p`
      4. Restart `docker-compose`
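A condensed shell version of the steps above; a sketch for a Linux Docker host that needs root:
```bash
# Raise the kernel limit Elasticsearch needs, persist it across reboots, and reload.
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Bring the stack back up.
docker-compose -f docker-amundsen.yml up
```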