govindhtech · 3 months ago
EMR Studio Features Requirements and Limits AWS
Amazon EMR Studio features, specs, and limitations:
Amazon EMR Studio is an IDE for data preparation and visualisation, collaboration between teams, and application debugging. Before using EMR Studio, consider tool usage, cluster requirements, known issues, feature constraints, service limits, and regional availability.
Features of Amazon EMR Studio
Administrators can use AWS Service Catalog to associate cluster templates with an EMR Studio, so that users can launch Amazon EMR clusters on Amazon EC2 for their Workspaces. Administrators can grant or deny Studio users access to cluster templates.
Permissions to access notebook files in Amazon S3 or secrets in AWS Secrets Manager must be defined in the Amazon EMR service role, because session policies cannot grant them.
You can create multiple EMR Studios to control access to EMR clusters in different VPCs.
Use the AWS CLI to set up Amazon EMR on EKS clusters. You can then attach these clusters to Workspaces through a managed endpoint in the Studio interface to run notebook jobs.
Trusted identity propagation with Amazon EMR has extra considerations that also apply to EMR Studio. To connect to EMR clusters that use trusted identity propagation, EMR Studio must itself use IAM Identity Center and trusted identity propagation.
To secure Amazon EMR off-console applications, their hosting domains are registered in the Public Suffix List (PSL). Examples are emrappui-prod.us-east-1.amazonaws.com, emrnotebooks-prod.us-east-1.amazonaws.com, and emrstudio-prod.us-east-1.amazonaws.com. If you set sensitive cookies in the default domain name, a __Host- prefix helps defend against CSRF and adds security.
EMR Studio Workspaces and Persistent UI endpoints use FIPS 140-certified cryptographic modules for encryption-in-transit, making the service suitable for regulated workloads.
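As a rough illustration of the __Host- cookie prefix mentioned above, here is a minimal Python sketch. Browsers only accept a __Host- cookie when it is marked Secure, has Path=/, and carries no Domain attribute. The function names and the cookie name "session" are hypothetical and are not part of any AWS API.

```python
from http.cookies import SimpleCookie


def build_host_cookie(name: str, value: str) -> str:
    """Return a Set-Cookie header value for a __Host- prefixed cookie.

    Hypothetical helper for illustration only.
    """
    cookie = SimpleCookie()
    key = f"__Host-{name}"
    cookie[key] = value
    cookie[key]["secure"] = True       # required by the __Host- prefix
    cookie[key]["path"] = "/"          # required by the __Host- prefix
    cookie[key]["httponly"] = True     # extra hardening, not required by the prefix
    cookie[key]["samesite"] = "Strict" # extra hardening, not required by the prefix
    # No Domain attribute is set: the prefix forbids it.
    return cookie[key].OutputString()


def satisfies_host_prefix(set_cookie: str) -> bool:
    """Check the three browser rules for a __Host- cookie."""
    parts = [p.strip() for p in set_cookie.split(";")]
    name = parts[0].split("=", 1)[0]
    attrs = {}
    for part in parts[1:]:
        attr, _, attr_value = part.partition("=")
        attrs[attr.strip().lower()] = attr_value.strip()
    return (
        name.startswith("__Host-")
        and "secure" in attrs
        and attrs.get("path") == "/"
        and "domain" not in attrs
    )


header = build_host_cookie("session", "abc123")
print(header)
print(satisfies_host_prefix(header))
```

A cookie that adds a Domain attribute, or drops Secure, fails the check and would be rejected by the browser.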
Amazon EMR Studio requirements and compatibility
EMR Studio works with Amazon EMR versions 5.32.0 and later in the 5.x series and 6.2.0 and later in the 6.x series.
To connect to EMR clusters that use IAM Identity Center with trusted identity propagation, the Studio must use it as well.
Before setting up a Studio, disable browser proxy management tools like FoxyProxy or SwitchyOmega. An active proxy can cause network errors during Studio creation.
Amazon EMR Studio restrictions and issues
EMR Studio does not support the Python magic commands %alias, %alias_magic, %automagic, %macro, %%js, and %%javascript. Changing proxy_user with %configure, or KERNEL_USERNAME with %env or %set_env, is also not supported.
Amazon EMR on EKS clusters do not support SparkMagic commands in EMR Studio.
When you write multi-line Scala statements in notebook cells, every line except the last must end with a period.
Kernels on Amazon EMR on EKS clusters may time out and fail to start. If this happens, shut down the kernel, then close and reopen the notebook file. The Restart kernel operation may not work as expected on EMR on EKS clusters and requires refreshing the Workspace to take effect.
If a Workspace is not attached to a cluster, an error appears when you open a notebook and choose a kernel. You can ignore this error, but you must attach the Workspace to a cluster and choose a kernel before you can run code.
When Amazon EMR 6.2.0 is used with a security configuration, the Workspace interface may appear blank. If you need EMRFS S3 authorisation or data encryption, choose a different supported version.
When you debug Amazon EMR on EC2 jobs, the links to the on-cluster Spark UI may not work or may not appear. Run %%info in a new cell to regenerate these links.
On Amazon EMR versions 5.32.0, 5.33.0, 6.2.0, and 6.3.0, Jupyter Enterprise Gateway does not clean up idle kernels on the primary node. This can drain resources and crash long-running clusters. The sources provide a script that configures idle kernel cleanup for these versions.
If the auto-termination policy is enabled on Amazon EMR versions 5.32.0, 5.33.0, 6.2.0, or 6.3.0, a cluster with an active Python3 kernel may be marked as idle and terminated because it does not submit a Spark job. Use Amazon EMR 6.4.0 or later if you need auto-termination with a Python3 kernel.
Displaying a Spark DataFrame using %%display may truncate wide tables. Create a scrollable view by right-clicking the output and selecting Create New View for Output.
If you interrupt a running cell in a Spark-based kernel (PySpark, Spark, SparkR), the Spark job keeps running. To stop the job, use the on-cluster Spark UI.
Using EMR Studio Workspaces as the root user of an AWS account causes a 403: Forbidden error because Jupyter Enterprise Gateway settings disallow root user access. Use another authentication method instead of the root user for normal activities.
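The unsupported magic commands listed earlier in this section can be caught before a notebook is run. The following is a minimal Python sketch, assuming you have the cell source as a string; the function name find_unsupported_magics is hypothetical and is not an EMR Studio feature. For simplicity it scans every line, although cell magics such as %%js are only valid on the first line of a cell.

```python
import re

# Magic commands the text above lists as unsupported in EMR Studio.
UNSUPPORTED_MAGICS = {
    "%alias", "%alias_magic", "%automagic", "%macro", "%%js", "%%javascript",
}

# A line magic (%name) or cell magic (%%name) at the start of a line.
_MAGIC = re.compile(r"^\s*(%{1,2}[A-Za-z_]\w*)")


def find_unsupported_magics(cell_source: str) -> list[str]:
    """Return unsupported magics found in a cell, in order of first use."""
    found = []
    for line in cell_source.splitlines():
        match = _MAGIC.match(line)
        if match and match.group(1) in UNSUPPORTED_MAGICS:
            if match.group(1) not in found:
                found.append(match.group(1))
    return found


cell = "%alias_magic t timeit\n%alias ll ls -l\nprint('hello')"
print(find_unsupported_magics(cell))
```

Supported magics such as %%info pass through unflagged, so the check can run over every cell of a notebook without false alarms for them.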
EMR Studio does not support the following Amazon EMR features:
Connecting to and running jobs on clusters secured with Kerberos authentication.
Clusters with multiple primary nodes.
Clusters that use AWS Graviton2-based EC2 instances, for EMR 6.x releases below 6.9.0 and 5.x releases below 5.36.1.
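The version thresholds in this section are easy to get wrong when comparing release strings by eye. Here is a minimal Python sketch that encodes two of them: the releases affected by the idle-kernel and auto-termination issues, and the Graviton2 cut-offs. The function names are hypothetical, and the sketch assumes release labels of the form "6.9.0" or "emr-6.9.0". The text gives no Graviton2 guidance for other major versions, so the check refuses to guess for those.

```python
def parse_release(release: str) -> tuple[int, int, int]:
    """Turn '6.9.0' or 'emr-6.9.0' into a comparable tuple (6, 9, 0)."""
    major, minor, patch = release.removeprefix("emr-").split(".")
    return int(major), int(minor), int(patch)


# Releases the text names for the idle-kernel and auto-termination issues.
IDLE_KERNEL_AFFECTED = {(5, 32, 0), (5, 33, 0), (6, 2, 0), (6, 3, 0)}


def has_idle_kernel_issue(release: str) -> bool:
    """True if the release is one of the four affected versions."""
    return parse_release(release) in IDLE_KERNEL_AFFECTED


def studio_supports_graviton2(release: str) -> bool:
    """Apply the cut-offs from the text: 5.36.1 for 5.x, 6.9.0 for 6.x."""
    version = parse_release(release)
    if version[0] == 5:
        return version >= (5, 36, 1)
    if version[0] == 6:
        return version >= (6, 9, 0)
    # Assumption: the text only covers 5.x and 6.x, so do not guess.
    raise ValueError(f"no Graviton2 guidance for release {release}")


for label in ("emr-5.36.0", "emr-6.3.0", "emr-6.9.0"):
    print(label, has_idle_kernel_issue(label), studio_supports_graviton2(label))
```

Comparing integer tuples avoids the string-comparison trap where "6.10.0" sorts before "6.9.0".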
A Studio that uses trusted identity propagation does not support these features:
Creating EMR clusters without a template.
Using EMR Serverless applications.
Launching Amazon EMR on EKS clusters.
Using a runtime role.
Enabling SQL Explorer or Workspace collaboration.
Amazon EMR Studio service limits
The sources list the following EMR Studio service limits:
EMR Studios: maximum of 100 per AWS account.
Subnets: maximum of five per EMR Studio.
IAM Identity Center groups: maximum of five per EMR Studio.
IAM Identity Center users: maximum of 100 per EMR Studio.
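The limits above can be checked before a Studio is requested. The following is a minimal Python sketch that uses only the four numbers listed in this post; the StudioPlan class and limit_violations function are hypothetical names for illustration and do not call any AWS API.

```python
from dataclasses import dataclass

# Limits as listed in the text above.
MAX_STUDIOS_PER_ACCOUNT = 100
MAX_SUBNETS_PER_STUDIO = 5
MAX_GROUPS_PER_STUDIO = 5
MAX_USERS_PER_STUDIO = 100


@dataclass
class StudioPlan:
    """Hypothetical description of a Studio you intend to create."""
    subnets: int
    identity_center_groups: int
    identity_center_users: int


def limit_violations(plan: StudioPlan, existing_studios: int = 0) -> list[str]:
    """Return a message for each listed limit the plan would exceed."""
    problems = []
    if existing_studios + 1 > MAX_STUDIOS_PER_ACCOUNT:
        problems.append("account already has the maximum number of Studios")
    if plan.subnets > MAX_SUBNETS_PER_STUDIO:
        problems.append("too many subnets for one Studio")
    if plan.identity_center_groups > MAX_GROUPS_PER_STUDIO:
        problems.append("too many IAM Identity Center groups for one Studio")
    if plan.identity_center_users > MAX_USERS_PER_STUDIO:
        problems.append("too many IAM Identity Center users for one Studio")
    return problems


plan = StudioPlan(subnets=6, identity_center_groups=2, identity_center_users=40)
print(limit_violations(plan, existing_studios=12))
```

An empty list means the plan fits within all four limits as stated here.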