Unverified commit 0c77114a, authored by Tamika Tannis and committed by GitHub

Configure submodules to child repos (#4)

* Add submodules

* Update doc with recent changes

* Add a PR template
parent b9e3a6e3
### Summary of Changes
_Include a summary of changes then remove this line_
### Documentation
_What documentation did you add or modify and why? Add any relevant links then remove this line_
### Checklist
Make sure you have checked **all** steps below to ensure a timely review.
- [ ] PR title addresses the issue accurately and concisely.
- [ ] PR includes a summary of changes.
- [ ] I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
[submodule "amundsendatabuilder"]
path = amundsendatabuilder
url = https://github.com/lyft/amundsendatabuilder
[submodule "amundsenfrontendlibrary"]
path = amundsenfrontendlibrary
url = https://github.com/lyft/amundsenfrontendlibrary
[submodule "amundsenmetadatalibrary"]
path = amundsenmetadatalibrary
url = https://github.com/lyft/amundsenmetadatalibrary
[submodule "amundsensearchlibrary"]
path = amundsensearchlibrary
url = https://github.com/lyft/amundsensearchlibrary
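Since `.gitmodules` uses INI-style syntax, the path-to-URL mapping above can be sanity-checked with Python's `configparser`. A quick sketch (the two submodules embedded below are just a subset of the file above, for illustration):

```python
import configparser

# A trimmed copy of the .gitmodules content shown above.
GITMODULES = """
[submodule "amundsendatabuilder"]
    path = amundsendatabuilder
    url = https://github.com/lyft/amundsendatabuilder
[submodule "amundsensearchlibrary"]
    path = amundsensearchlibrary
    url = https://github.com/lyft/amundsensearchlibrary
"""

parser = configparser.ConfigParser()
parser.read_string(GITMODULES)

# Map each submodule's checkout path to its upstream URL.
submodules = {
    parser[section]["path"]: parser[section]["url"]
    for section in parser.sections()
}
print(submodules["amundsendatabuilder"])
# -> https://github.com/lyft/amundsendatabuilder
```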
@@ -77,6 +77,8 @@ Please visit [Roadmap](docs/roadmap.md) if you are interested in Amundsen upcomi
- [Amundsen: A Data Discovery Platform from Lyft](https://www.slideshare.net/taofung/data-council-sf-amundsen-presentation) (Data council 19 SF)
- [Software Engineering Daily podcast on Amundsen](https://softwareengineeringdaily.com/2019/04/16/lyft-data-discovery-with-tao-feng-and-mark-grover/) (April 2019)
- [Disrupting Data Discovery](https://www.slideshare.net/markgrover/disrupting-data-discovery) (Strata London 2019)
- [Disrupting Data Discovery (video)](https://www.youtube.com/watch?v=m1B-ptm0Rrw) (Strata SF 2019)
- [ING Data Analytics Platform (Amundsen is mentioned)](https://static.sched.com/hosted_files/kccnceu19/65/ING%20Data%20Analytics%20Platform.pdf) (Kubecon Barcelona 2019)
# License
[Apache 2.0 License.](/LICENSE)
Subproject commit f73c8128671b37020503558e6cd00ac02fd26306
Subproject commit 525d4323854f8f74f4c5198cc4efdf0283ebb13b
Subproject commit 2b33102d3f9511537656f60f987e3e79caef0c72
Subproject commit 46513a881e7b49f862b2a8b67131135d9026aed2
@@ -4,8 +4,7 @@ services:
image: neo4j:3.3.0
container_name: neo4j_amundsen
environment:
- NEO4J_AUTH=neo4j/test
ulimits:
nofile:
soft: 40000
@@ -46,7 +45,6 @@ services:
- amundsennet
environment:
- PROXY_HOST=bolt://neo4j_amundsen
amundsenfrontend:
image: amundsendev/amundsen-frontend:1.0.5
container_name: amundsenfrontend
@@ -68,5 +68,32 @@ REQUEST_HEADERS_METHOD = get_access_headers
This function will be called with the current `app` instance to add the headers to each request made to any endpoint of
metadatalibrary and searchlibrary [here](https://github.com/lyft/amundsenfrontendlibrary/blob/master/amundsen_application/api/utils/request_utils.py).
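For reference, a `REQUEST_HEADERS_METHOD` implementation typically builds the forwarded auth headers from the user's token. A minimal sketch; where the token lives (here, a made-up `OIDC_ACCESS_TOKEN` config key) depends on your deployment and is an assumption, not the exact Amundsen API:

```python
def get_access_headers(app):
    """Build the auth headers forwarded to metadatalibrary/searchlibrary.

    ``app`` is the current Flask app. Reading the token from
    ``app.config`` is purely illustrative; in a real deployment it
    would come from the session or an OIDC extension.
    """
    token = app.config.get("OIDC_ACCESS_TOKEN")
    if not token:
        return {}
    return {"Authorization": "Bearer {}".format(token)}
```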
## Setting Up Auth User Method
To get the currently authenticated user (which Amundsen uses for many operations), we need to set the
`AUTH_USER_METHOD` config variable in frontendlibrary.
This function should return the user's email address, user id, and any other required information.
- Define a function to fetch the user information in your config.py:
```python
def get_auth_user(app):
    """
    Retrieves the user information from the OIDC token and builds a
    'UserInfo' class from the token dictionary. We convert it to a
    class so the information can be used like an object in the rest
    of the Amundsen application.
    :param app: The instance of the current app.
    :return: A UserInfo class
    """
    from flask import g
    user_info = type('UserInfo', (object,), g.oidc_id_token)
    # noinspection PyUnresolvedReferences
    user_info.user_id = user_info.preferred_username
    return user_info
```
- Set the method as the auth user method in your config.py:
```python
AUTH_USER_METHOD = get_auth_user
```
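The `type('UserInfo', (object,), ...)` call in `get_auth_user` builds a class on the fly whose attributes come from the token dictionary. A standalone sketch of the same trick, using a made-up token payload (the claim names mirror common OIDC claims, not a real token):

```python
# A stand-in for flask.g.oidc_id_token; keys mimic common OIDC claims.
token = {
    "email": "jdoe@example.com",
    "preferred_username": "jdoe",
}

# type(name, bases, namespace) creates a new class; the dict entries
# become class attributes, so token claims are reachable as UserInfo.<claim>.
UserInfo = type("UserInfo", (object,), token)
UserInfo.user_id = UserInfo.preferred_username

print(UserInfo.user_id)  # -> jdoe
print(UserInfo.email)    # -> jdoe@example.com
```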
Once done, you'll have the end-to-end authentication in Amundsen without any proxy or code changes.
# Installation
## Bootstrap a default version of Amundsen using Docker
The following instructions are for setting up a version of Amundsen using Docker.
1. Install `docker` and `docker-compose`.
2. Clone [amundsenfrontendlibrary](https://github.com/lyft/amundsenfrontendlibrary) or download the [docker-amundsen.yml](https://github.com/lyft/amundsenfrontendlibrary/blob/master/docker-amundsen.yml) file directly.
3. Enter the directory where the `docker-amundsen.yml` file is and then run:
```bash
$ docker-compose -f docker-amundsen.yml up
```
4. Ingest dummy data into Neo4j by doing the following:
* Clone [amundsendatabuilder](https://github.com/lyft/amundsendatabuilder).
* Run the following commands in the `amundsendatabuilder` directory:
```bash
$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt
$ python3 setup.py install
$ python3 example/scripts/sample_data_loader.py
```
5. View the UI at [`http://localhost:5000`](http://localhost:5000) and try searching for `test`; it should return some results.
### Verify setup
1. You can verify dummy data has been ingested into Neo4j by visiting [`http://localhost:7474/browser/`](http://localhost:7474/browser/) and running `MATCH (n:Table) RETURN n LIMIT 25` in the query box. You should see two tables:
1. `hive.test_schema.test_table1`
2. `dynamo.test_schema.test_table2`
2. You can verify the data has been loaded into the metadataservice by visiting:
1. [`http://localhost:5000/table_detail/gold/hive/test_schema/test_table1`](http://localhost:5000/table_detail/gold/hive/test_schema/test_table1)
2. [`http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2`](http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2)
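The detail-page paths above all follow one pattern, `/table_detail/<cluster>/<database>/<schema>/<table>`. A tiny helper makes that explicit (illustrative only, not part of Amundsen's API):

```python
def table_detail_url(host, cluster, database, schema, table):
    """Build the frontend detail-page URL for a table."""
    return "{}/table_detail/{}/{}/{}/{}".format(
        host, cluster, database, schema, table)

print(table_detail_url(
    "http://localhost:5000", "gold", "hive", "test_schema", "test_table1"))
# -> http://localhost:5000/table_detail/gold/hive/test_schema/test_table1
```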
### Troubleshooting
1. If the docker host doesn't have enough virtual memory for Elasticsearch, `es_amundsen` will fail during `docker-compose`.
1. docker-compose error: `es_amundsen | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]`
2. Increase `vm.max_map_count` ([detailed instructions here](https://www.elastic.co/guide/en/elasticsearch/reference/7.1/docker.html#docker-cli-run-prod-mode)):
1. Edit `/etc/sysctl.conf`
2. Make entry `vm.max_map_count=262144`. Save and exit.
3. Reload settings `$ sysctl -p`
4. Restart `docker-compose`
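The `vm.max_map_count` requirement can also be checked before restarting the stack. Since reading the live value is Linux-specific, this sketch just validates a `sysctl.conf`-style snippet against the 262144 threshold from the error above (the helper name is made up):

```python
def max_map_count_ok(sysctl_text, required=262144):
    """Return True if the text sets vm.max_map_count to at least `required`."""
    for line in sysctl_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("vm.max_map_count"):
            _, _, value = line.partition("=")
            return int(value.strip()) >= required
    return False  # no entry found

print(max_map_count_ok("vm.max_map_count=262144"))  # -> True
print(max_map_count_ok("vm.max_map_count=65530"))   # -> False
```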