Use EMR Serverless with Boto3
- `generate-parquet.ipynb` - Download the source data and transform it into `parquet` format.
- `credentials_example.cfg` - Credentials required for running EMR Serverless.
- `emr-serverless-IaC-functional.ipynb`
  - Set up an `Application` in EMR Studio.
  - Generate the required `role` and `policy`, and attach the policy to the role.
  - Submit a `job` to the `Application` and track its status.
- `read_outputs.ipynb`
  - Read the outputs in `S3` with `awswrangler`.
  - Visualize the data with `matplotlib`.
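The notebook `emr-serverless-IaC-functional.ipynb` sets up an application, creates a role with a policy, then submits and tracks a job. The sketch below shows one way to do those steps with the Boto3 `emr-serverless` and `iam` clients. It is not the notebook's own code. The helper names (`create_spark_application`, `create_execution_role`, `submit_and_wait`, `TERMINAL_STATES`) are hypothetical, and the clients are passed in as arguments. The trust policy lets the `emr-serverless.amazonaws.com` service assume the role. The permissions policy passed to the role is assumed to be whatever S3 access your job needs. Creating roles and passing them to a job run typically requires IAM permissions such as `iam:CreateRole`, `iam:CreatePolicy`, `iam:AttachRolePolicy` and `iam:PassRole`.

```python
import json
import time

# Trust policy letting the EMR Serverless service assume the job runtime role.
EMR_SERVERLESS_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "emr-serverless.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Job run states that no longer change once reached (hypothetical helper constant).
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def create_spark_application(emr, name, release_label):
    """Create an EMR Serverless Spark application and return its ID."""
    response = emr.create_application(name=name, releaseLabel=release_label, type="SPARK")
    return response["applicationId"]


def create_execution_role(iam, role_name, policy_name, policy_document):
    """Create a role and a managed policy, attach the policy, return the role ARN."""
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(EMR_SERVERLESS_TRUST_POLICY),
    )
    policy = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy["Policy"]["Arn"])
    return role["Role"]["Arn"]


def submit_and_wait(emr, application_id, role_arn, entry_point, args=(), poll_seconds=30):
    """Start a Spark job run, poll its state, and return the final state."""
    response = emr.start_job_run(
        applicationId=application_id,
        executionRoleArn=role_arn,
        jobDriver={
            "sparkSubmit": {
                "entryPoint": entry_point,
                "entryPointArguments": list(args),
            }
        },
    )
    job_run_id = response["jobRunId"]
    while True:
        job_run = emr.get_job_run(applicationId=application_id, jobRunId=job_run_id)["jobRun"]
        if job_run["state"] in TERMINAL_STATES:
            return job_run["state"]
        time.sleep(poll_seconds)
```

In a notebook, `emr` and `iam` would be `boto3.client("emr-serverless")` and `boto3.client("iam")`, and `entry_point` would be the S3 URI of a PySpark script. The release label and the role and policy names are placeholders you choose. Passing the clients in, rather than creating them inside the helpers, keeps the helpers easy to exercise without AWS access.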
- Create an IAM user with programmatic access and attach the following policy:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
              "emr-serverless:*",
              "iam:GetAccountAuthorizationDetails"
            ],
            "Resource": "*"
          },
          {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
              "s3:PutObject",
              "s3:GetObject",
              "s3:ListBucket",
              "s3:DeleteObject"
            ],
            "Resource": [
              "arn:aws:s3:::tpe-mrt-data",
              "arn:aws:s3:::tpe-mrt-data/*"
            ]
          }
        ]
      }

- Update `access_key`, `secret_access_key` and `user_account_id` in `credentials_example.cfg`.
- Rename `credentials_example.cfg` to `credentials.cfg`.
- Run `emr-serverless-IaC-functional.ipynb`.
- Read the output with `read_outputs.ipynb`.
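The setup steps above store the user's keys and account ID in `credentials.cfg`. The sketch below shows one way a notebook could read that file with the standard-library `configparser`. It is not the repository's code. The key names come from `credentials_example.cfg`. The `[AWS]` section header and the helpers `load_credentials` and `role_arn` are hypothetical, so adjust the section name to whatever the file actually uses.

```python
import configparser


def load_credentials(path="credentials.cfg", section="AWS"):
    """Read the keys listed in credentials_example.cfg from one section.

    Returns (session_kwargs, account_id). The section name "AWS" is an
    assumption; pass whatever header your credentials file actually uses.
    """
    parser = configparser.ConfigParser()
    # ConfigParser.read() silently skips missing files and returns the list it read.
    if not parser.read(path):
        raise FileNotFoundError(f"{path} not found; rename credentials_example.cfg first")
    cfg = parser[section]
    # Keyword names match boto3.Session(aws_access_key_id=..., aws_secret_access_key=...).
    session_kwargs = {
        "aws_access_key_id": cfg["access_key"],
        "aws_secret_access_key": cfg["secret_access_key"],
    }
    return session_kwargs, cfg["user_account_id"]


def role_arn(account_id, role_name):
    """Build an IAM role ARN from the account ID stored in credentials.cfg."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"
```

The returned keyword arguments can be unpacked into `boto3.Session(**session_kwargs, region_name=...)`, and clients such as `session.client("emr-serverless")` created from that session. The account ID is useful for building ARNs like the execution role's.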

