
Janos Matyas

Fri, Jan 26, 2018


Authentication and authorization of Pipeline users with OAuth2 and Vault

Security series:
Authentication and authorization of Pipeline users with OAuth2 and Vault
Dynamic credentials with Vault using Kubernetes Service Accounts
Dynamic SSH with Vault and Pipeline

Pipeline is quickly moving towards its "as a Service" milestone, at which point the Pipeline PaaS will be available to early adopters and the wider public as a hosted service as well (current deployments are all self-hosted). The hosted version, like many PaaS offerings, will be a multi-tenant service. Resource and performance isolation between tenants will be handled by the underlying platform/core building block, Kubernetes (a topic that deserves a post of its own). In this blog post we talk about authentication (authn) and authorization (authz) in Pipeline, and briefly touch on SSO and on securing the internal communication of the Kubernetes cluster itself using the same mechanisms.

Authentication using OAuth2 tokens

For authentication we use OAuth2, delegating user authentication to the service that hosts the user's account. There are plenty of OAuth2 identity providers out there: GitHub, Google, Facebook, Azure Active Directory, Twitter and Salesforce, to mention only the biggest ones. At the moment Pipeline supports GitHub, mainly because our CI/CD component is triggered by GitHub events, but we use the very flexible QOR package, which supports all the major providers through a plugin mechanism, so adding support for the providers above (besides old-school username/password authentication) is just a matter of a configuration change. The main benefit of this approach is that we don't have to store any user credentials, and our users can use their existing accounts at these sites to access our service. The OAuth2 flow is shown in the first picture. When a user hits Pipeline, they first have to log in with GitHub so that a user record can be created in the RDBMS - the REST endpoint for that is: https://$HOST/auth/login.

Overview of the Pipeline's Auth flow

After we receive the user information from GitHub, we create a user profile in our database with a generated user ID - we already use GORM, and as it happens QOR Auth uses GORM as well. After a successful login, a session cookie is stored in the user's browser session. With this cookie the user can navigate through the Pipeline API, but that is probably not the desired use case: most interactions with the Pipeline API happen through automation and code, where an access (Bearer) token is more appropriate. Tokens can be created at the https://$HOST/api/v1/token REST endpoint, and the returned JSON contains the token to be used for subsequent access:

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJodHRwczovL3BpcGVsaW5lLmJhbnphaWNsb3VkLmNvbSIsImV4cCI6MzAzMzYwMzI2MDA3MzQyMDAwMCwianRpIjoiMTMxOGQ0YTktMzI1ZS00YTNhLWFmNTctZjc0ZTRlODc5MTY1IiwiaWF0IjoxNTE2ODAxNjMwMDM2NzA5MDAwLCJpc3MiOiJodHRwczovL2JhbnphaWNsb3VkLmF1dGgwLmNvbS8iLCJzdWIiOiIxIiwic2NvcGUiOiJhcGk6aW52b2tlIn0.pMQrGyhc8H4Mn7CnmhfNAUv9pxecgymOWjIIi5MwCHA"
}

Now, if you store this token in a shell variable, for example, you can use cURL to call the protected parts of the API:

$ export TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJodHRwczovL3BpcGVsaW5lLmJhbnphaWNsb3VkLmNvbSIsImV4cCI6MzAzMzYwMzI2MDA3MzQyMDAwMCwianRpIjoiMTMxOGQ0YTktMzI1ZS00YTNhLWFmNTctZjc0ZTRlODc5MTY1IiwiaWF0IjoxNTE2ODAxNjMwMDM2NzA5MDAwLCJpc3MiOiJodHRwczovL2JhbnphaWNsb3VkLmF1dGgwLmNvbS8iLCJzdWIiOiIxIiwic2NvcGUiOiJhcGk6aW52b2tlIn0.pMQrGyhc8H4Mn7CnmhfNAUv9pxecgymOWjIIi5MwCHA"

$ curl -v -H "Authorization: Bearer $TOKEN" http://localhost:9090/api/v1/status
{"No running clusters found.":200}

JWT for Bearer token

Some of you have probably already noticed the format of the token: it is a JWT, which is a really good candidate for a Bearer token. Note that JWT is based on the RFC 7519 standard. The main benefit of JWT is that it is self-contained, which allows for stateless authentication. The server's protected routes check for a valid JWT in the Authorization header, and if it's present the user is allowed to access protected resources based on the scope field of the token.

JWT is stateless, which becomes a problem if you want to let users revoke issued tokens immediately (i.e. without waiting for the token to expire). To be able to revoke JWTs you have to maintain a blacklist of revoked tokens or a whitelist of valid ones. The question, then, is: how do you do that while staying fast enough (e.g. not querying an RDBMS on every request) and without creating a security risk should somebody find the tokens stored in plaintext in a key-value store?

Vault them all

You can install Vault on Kubernetes using our Helm chart

For the purpose above we chose HashiCorp's Vault. There was, however, another major contributor to the decision to standardize on Vault: its nice integration with the Kubernetes Authentication API. After Vault is started, the Kubernetes auth backend has to be enabled and configured; from then on Vault can lease tokens for its own API based on Kubernetes ServiceAccount JWTs. This enables other applications running in the same Kubernetes cluster to call Vault, and allows us to use tightly scoped tokens with various TTLs.

Pipeline Vault connection

Now that Pipeline has obtained its token from Vault, it can start storing and looking up the JWT user tokens inside it. When a user sends in a JWT in the Authorization HTTP header, Pipeline validates its integrity (i.e. that it hasn't been changed or corrupted since it was issued), then extracts the user's ID and the token's ID from it - every JWT we issue carries a unique UUID. With these two IDs we get a per-token unique path inside Vault's key-value store (we are currently using Vault's general key/value secret backend mounted at secret/):

vault write secret/32412/971aa3a5-36d4-43f5-992f-68051092ccff jwt="eyJhbGciOi..."

Path and key names are not obfuscated or encrypted in Vault, but since only the IDs are stored in the path, nothing especially secret ends up in plain text. This unique path has two benefits:

  • First, we can easily list all the tokens issued to a user with the following command (of course we use the Vault Go client in code; these examples are just for demonstration):
  vault list secret/32412
  • Second, we are not storing all of a user's tokens inside the value part of a single key, because that way concurrent API calls touching the same key could lose or generate extra data (a race condition):

"Writing to a key in the kv backend will replace the old value; sub-fields are not merged." (https://www.vaultproject.io/docs/secrets/kv/index.html)

User JWT lookup in Vault

The number of Vault lookups performed this way could later be reduced by putting an in-memory cache with expiring LRU keys inside the Pipeline process, in front of the Vault lookups (with a short TTL, of course, so that revocations still take effect as quickly as possible).

Learn from the code

The code implementing the authentication and authorization mechanism described above can be found in our open-source GitHub repository.

We hope this first post on security was useful - follow-up posts will cover the other components we run inside the Pipeline PaaS, and how Kubernetes itself uses OAuth2 tokens internally.

If you are interested in our technology and open source projects, follow us on GitHub, LinkedIn or Twitter:
