Security Features¶
Jupyter Enterprise Gateway does not currently perform user authentication; instead, it assumes that all users issuing requests have already been authenticated. Recommended applications for this are Apache Knox or perhaps even JupyterHub (e.g., if nb2kg-enabled notebook servers were spawned targeting an Enterprise Gateway cluster).
This section introduces some of the security features inherent in Enterprise Gateway (with more to come).
KERNEL_USERNAME
In order to convey the name of the authenticated user, KERNEL_USERNAME should be sent in the kernel creation request via the env: entry. This occurs automatically within NB2KG, since it propagates all environment variables prefixed with KERNEL_. If the request does not include a KERNEL_USERNAME entry, one will be added to the kernel’s launch environment with the value of the gateway user.
This value is then used within the authorization and impersonation functionality.
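As a minimal sketch of the request described above, the following Python builds the JSON body of a kernel creation request with KERNEL_USERNAME in its env entry. The helper name build_kernel_request, the kernel name, and the user name are illustrative assumptions and not part of Enterprise Gateway or NB2KG. The body would typically be sent as a POST to the gateway's /api/kernels endpoint, which this sketch does not do.

```python
import json


def build_kernel_request(kernel_name, username, extra_env=None):
    """Return the JSON body for a kernel creation request.

    Hypothetical helper: only the `env` entry and the KERNEL_USERNAME
    variable are taken from the documentation text.
    """
    env = {"KERNEL_USERNAME": username}
    # Forward only KERNEL_-prefixed variables, mirroring what NB2KG is
    # described as doing with its own environment.
    for key, value in (extra_env or {}).items():
        if key.startswith("KERNEL_"):
            env[key] = value
    return json.dumps({"name": kernel_name, "env": env}, sort_keys=True)


body = build_kernel_request(
    "python3", "alice", {"KERNEL_FOO": "1", "HOME": "/home/alice"}
)
print(body)
```

Variables without the KERNEL_ prefix (HOME in this example) are dropped, so only the values intended for the kernel's launch environment reach the gateway.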
Authorization¶
By default, all users are authorized to start kernels. This behavior can be adjusted when situations arise where more control is required. Basic authorization can be expressed in two ways.
Authorized Users¶
The command-line or configuration file option EnterpriseGatewayApp.authorized_users can be specified to contain a list of user names indicating which users are permitted to launch kernels within the current gateway server.
On each kernel launch, the authorized users list is searched for the value of KERNEL_USERNAME (case-sensitive). If the user is found in the list, the kernel’s launch sequence continues; otherwise, HTTP Error 403 (Forbidden) is raised and the request fails.
Warning: Since the authorized_users option must be exhaustive, it should be used only in situations where a small and limited set of users is allowed access, and left empty otherwise.
Unauthorized Users¶
The command-line or configuration file option EnterpriseGatewayApp.unauthorized_users can be specified to contain a list of user names indicating which users are NOT permitted to launch kernels within the current gateway server.
The unauthorized_users list is always checked prior to the authorized_users list. If the value of KERNEL_USERNAME appears in the unauthorized_users list, the request fails immediately with the same 403 (Forbidden) HTTP Error.
From a system security standpoint, privileged users (e.g., root and any users allowed sudo privileges) should be added to this option.
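The order of the two checks can be sketched as follows. This is an illustration of the documented behavior under stated assumptions, not Enterprise Gateway's actual implementation: the function name check_authorization, its return shape, and the message strings are hypothetical.

```python
def check_authorization(username, authorized_users, unauthorized_users):
    """Return (http_status, reason); 200 means the launch may continue."""
    # The unauthorized list is always consulted first.
    if username in unauthorized_users:
        return 403, "listed in unauthorized_users"
    # An empty authorized list means every user not denied above is allowed.
    # Membership is case-sensitive, as with plain string comparison.
    if authorized_users and username not in authorized_users:
        return 403, "not listed in authorized_users"
    return 200, "ok"


print(check_authorization("root", {"root", "alice"}, {"root"}))
print(check_authorization("Alice", {"alice"}, set()))
print(check_authorization("bob", set(), {"root"}))
```

In the first call, root is rejected even though it also appears in the authorized set, because the unauthorized list takes precedence. The second call fails because the comparison is case-sensitive.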
Authorization Failures¶
The messages logged for the two authorization failures described above differ slightly. This allows the administrator to discern which authorization list generated the failure.
Failures stemming from inclusion in the unauthorized_users list will include text similar to the following:
User 'bob' is not authorized to start kernel 'Spark - Python (YARN Client Mode)'. Ensure
KERNEL_USERNAME is set to an appropriate value and retry the request.
Failures stemming from exclusion from a non-empty authorized_users list will include text similar to the following:
User 'bob' is not in the set of users authorized to start kernel 'Spark - Python (YARN Client Mode)'. Ensure
KERNEL_USERNAME is set to an appropriate value and retry the request.
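For reference, the two authorization options might be set in the Enterprise Gateway configuration file roughly as shown here. This is a configuration fragment rather than a standalone script: it assumes the `c` object that Jupyter provides when it loads the configuration file, and the user names are placeholders.

```python
# jupyter_enterprise_gateway_config.py (fragment)
# `c` is the configuration object supplied by Jupyter when this file is loaded.

# Placeholder user names; only these users could start kernels.
c.EnterpriseGatewayApp.authorized_users = {"alice", "bob"}

# Checked before authorized_users; privileged accounts belong here.
c.EnterpriseGatewayApp.unauthorized_users = {"root"}
```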
User Impersonation¶
The Enterprise Gateway server leverages other technologies to implement user impersonation when launching kernels. This option is configured via two pieces of information: EG_IMPERSONATION_ENABLED and KERNEL_USERNAME.
EG_IMPERSONATION_ENABLED indicates the intention that user impersonation should be performed and can also be conveyed via the command-line boolean option EnterpriseGatewayApp.impersonation_enabled (default = False).
KERNEL_USERNAME is also conveyed within the environment of the kernel launch sequence, where its value is used to indicate the user that should be impersonated.
Impersonation in YARN Cluster Mode¶
In a cluster managed by the YARN resource manager, impersonation is implemented by leveraging Kerberos, and thus requires this security option as a prerequisite for user impersonation. When user impersonation is enabled, kernels are launched with the --proxy-user ${KERNEL_USERNAME} option, which tells YARN to launch the kernel in a container used by the provided user name.
Note that, when using Kerberos in a YARN-managed cluster, the gateway user (elyra by default) needs to be set up as a proxyuser superuser in the Hadoop configuration. Please refer to the Hadoop documentation regarding the proper configuration steps.
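The way --proxy-user depends on the two settings can be sketched as below. This is an illustrative assumption about the logic, not the contents of any shipped launch script: the function name spark_submit_args is hypothetical, and treating the string "true" (case-insensitive) as the enabled value is a guess at how the boolean is parsed.

```python
import os


def spark_submit_args(base_args, environ=None):
    """Append --proxy-user only when impersonation is enabled.

    Hypothetical helper; `environ` defaults to the process environment.
    """
    environ = os.environ if environ is None else environ
    # Assumption: any casing of "true" enables impersonation.
    enabled = environ.get("EG_IMPERSONATION_ENABLED", "False").lower() == "true"
    username = environ.get("KERNEL_USERNAME")
    args = list(base_args)
    if enabled and username:
        args += ["--proxy-user", username]
    return args


print(spark_submit_args(
    ["spark-submit", "--master", "yarn"],
    {"EG_IMPERSONATION_ENABLED": "True", "KERNEL_USERNAME": "alice"},
))
```

With impersonation disabled, or with no KERNEL_USERNAME available, the argument list is returned unchanged and the kernel runs as the gateway user.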
SPNEGO Authentication to YARN APIs¶
When Kerberos is enabled in a YARN-managed cluster, the administration UIs can be configured to require authentication/authorization via SPNEGO. When running Enterprise Gateway in an environment configured this way, an extra configuration option must be set to enable proper authorization when communicating with YARN via the YARN APIs.
YARN_ENDPOINT_SECURITY_ENABLED indicates the requirement to use SPNEGO authentication/authorization when connecting with the YARN APIs and can also be conveyed via the command-line boolean option EnterpriseGatewayApp.yarn_endpoint_security_enabled (default = False).
Impersonation in Standalone or YARN Client Mode¶
Impersonation performed in standalone or YARN client modes tends to take the form of using sudo to perform the kernel launch as the target user. This can also be configured within the run.sh script and requires the following:
- The gateway user (i.e., the user under which Enterprise Gateway is running) must be enabled to perform sudo operations on each potential host. This enablement must also prevent password prompts, since Enterprise Gateway runs in the background. Refer to your operating system documentation for details.
- Each user identified by KERNEL_USERNAME must be associated with an actual operating system user on each host.
- Once the gateway user is configured for sudo privileges, it is strongly recommended that that user be included in the set of unauthorized_users. Otherwise, kernels not configured for impersonation, or those requests that do not include KERNEL_USERNAME, will run as the now highly privileged gateway user!
WARNING: Should impersonation be disabled after granting the gateway user elevated privileges, it is strongly recommended those privileges be revoked (on all hosts) prior to starting kernels since those kernels will run as the gateway user regardless of the value of KERNEL_USERNAME.
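The sudo-based approach can be sketched as a small command wrapper. This is a hypothetical illustration, not the contents of run.sh: the function name wrap_with_sudo and the example launch command are assumptions. The -n flag makes sudo fail rather than prompt for a password, which matches the requirement that the gateway run unattended.

```python
def wrap_with_sudo(command, username, impersonation_enabled):
    """Return `command` prefixed so that it runs as `username` via sudo.

    Hypothetical helper. When impersonation is disabled the command is
    returned unchanged and would therefore run as the gateway user.
    """
    if not impersonation_enabled:
        return list(command)
    if not username:
        raise ValueError("KERNEL_USERNAME is required for impersonation")
    # -n: never prompt for a password; -H: use the target user's HOME;
    # -u: run as the named user; --: end of sudo's own options.
    return ["sudo", "-n", "-H", "-u", username, "--"] + list(command)


print(wrap_with_sudo(["python", "-m", "ipykernel_launcher"], "alice", True))
```

The unchanged return value in the disabled case is exactly the situation the warning above describes: the kernel inherits whatever privileges the gateway user holds.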
SSH Tunneling¶
Jupyter Enterprise Gateway can now perform SSH tunneling on the five ZeroMQ kernel sockets as well as the communication socket created within the launcher and used to perform remote and cross-user signalling functionality. SSH tunneling is NOT enabled by default. Tunneling can be enabled or disabled via the environment variable EG_ENABLE_TUNNELING (the default is EG_ENABLE_TUNNELING=False).
Note that there is no command-line or configuration file support for this variable.
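Because the setting is read only from the environment, it has to be present in the environment of the gateway process itself. The sketch below prepares such an environment for a subprocess launch. The helper name gateway_env is hypothetical, and the value "True" is assumed by analogy with the documented default of False.

```python
import os


def gateway_env(enable_tunneling, base_env=None):
    """Return a copy of the environment with EG_ENABLE_TUNNELING set.

    Hypothetical helper; the original environment is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: "True"/"False" are the accepted string values.
    env["EG_ENABLE_TUNNELING"] = "True" if enable_tunneling else "False"
    return env


env = gateway_env(True, {"PATH": "/usr/bin"})
print(env["EG_ENABLE_TUNNELING"])
# The result could then be passed to a launcher, for example:
# subprocess.Popen(["jupyter", "enterprisegateway"], env=gateway_env(True))
```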
Note that SSH by default validates host keys before connecting to remote hosts and the connection will fail for invalid or unknown hosts. Enterprise Gateway honors this requirement, and invalid or unknown hosts will cause tunneling to fail. Please perform necessary steps to validate all hosts before enabling SSH tunneling, such as:
- SSH to each node in the cluster and accept the host key properly
- Configure SSH to disable StrictHostKeyChecking
Securing Enterprise Gateway Server¶
Using SSL for encrypted communication¶
Enterprise Gateway supports Secure Sockets Layer (SSL) communication with its clients. With SSL enabled, all communication between the server and its clients is encrypted.
You can start Enterprise Gateway in secure mode by setting the certfile and keyfile options with the command:
jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --certfile=mycert.pem --keyfile=mykey.key
As the server starts up, the log should reflect the following:
[EnterpriseGatewayApp] Jupyter Enterprise Gateway at https://localhost:8888
Note: The Enterprise Gateway server is started with HTTPS instead of HTTP, meaning server-side SSL is enabled.
TIP: A self-signed certificate can be generated with openssl. For example, the following command will create a certificate valid for 365 days, with the key written to mykey.key and the certificate written to mycert.pem:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem
With SSL enabled on the Enterprise Gateway server, you now need to configure SSL on the client side, which is the NB2KG server extension.
Before starting the Jupyter notebook server, export the following environment variables, which NB2KG will read at runtime:
export KG_CLIENT_CERT=${PATH_TO_PEM_FILE}
export KG_CLIENT_KEY=${PATH_TO_KEY_FILE}
export KG_CLIENT_CA=${PATH_TO_SELFSIGNED_CA}
Note: If using a self-signed certificate, you can set KG_CLIENT_CA to the same value as KG_CLIENT_CERT.
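To illustrate what the three variables represent, the following sketch reads them and builds a standard-library SSL context from them. It is not how NB2KG itself consumes these variables. The function names client_ssl_paths and client_ssl_context are hypothetical, and the file paths in the example are placeholders.

```python
import os
import ssl

CLIENT_VARS = ("KG_CLIENT_CERT", "KG_CLIENT_KEY", "KG_CLIENT_CA")


def client_ssl_paths(environ=None):
    """Collect the three client SSL paths, failing early if any is unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in CLIENT_VARS if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in CLIENT_VARS}


def client_ssl_context(paths):
    """Build an SSL context that trusts the CA and presents the client cert.

    Requires the referenced files to exist, so it is not called below.
    """
    context = ssl.create_default_context(cafile=paths["KG_CLIENT_CA"])
    context.load_cert_chain(
        certfile=paths["KG_CLIENT_CERT"], keyfile=paths["KG_CLIENT_KEY"]
    )
    return context


paths = client_ssl_paths({
    "KG_CLIENT_CERT": "mycert.pem",
    "KG_CLIENT_KEY": "mykey.key",
    "KG_CLIENT_CA": "mycert.pem",  # self-signed: CA equals the certificate
})
print(paths)
```

Checking for missing variables up front gives a clearer error than a failed TLS handshake later, which is why the lookup is separated from the context construction.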
Using Enterprise Gateway configuration file¶
You can also utilize the Enterprise Gateway configuration file to set static configurations for the server.
If you do not already have a configuration file, generate an Enterprise Gateway configuration file by running the following command:
jupyter enterprisegateway --generate-config
By default, the configuration file will be generated at ~/.jupyter/jupyter_enterprise_gateway_config.py.
By default, all the configuration fields in jupyter_enterprise_gateway_config.py are commented out. To enable SSL from the configuration file, modify the corresponding parameters to the appropriate values:
c.KernelGatewayApp.certfile = '/absolute/path/to/your/certificate/fullchain.pem'
c.KernelGatewayApp.keyfile = '/absolute/path/to/your/certificate/privatekey.key'
Using the configuration file achieves the same result as starting the server with --certfile and --keyfile, but provides better readability and debuggability.
After configuring the above, the communication between NB2KG and Enterprise Gateway is SSL enabled.