Per [rfd223], the web console is built as a static JS bundle served by Nexus. This means that it makes API requests directly from the browser. This RFD is about how we authenticate those requests. [owasp-session] is an excellent resource and is well worth reading for background. The determinations below largely represent things that have already been implemented in Nexus at the time of this revision.
In order to authenticate requests from the Console running in the user’s browser, Nexus must accept session cookies as an authentication method. The value of the cookie is a simple random string (see [owasp-session]) that points to at most one row in a sessions table. A session row contains a user ID foreign key. If a request comes in with a session cookie that points to a session in the DB that is not expired, the user is authenticated and identified as themselves.
Details in [security] section below.
SameSite=Lax, expiration time, don’t explicitly set a domain.
Mitigate CSRF with SameSite=Lax + no mutations in GET requests (which we’re already doing). Due to browser support and subdomain weakness, combine with session-scoped CSRF token. Put token in HTML response (or non-HttpOnly cookie) and have API requests send it back in a custom header. Custom header with arbitrary placeholder value (no token) may actually be sufficient due to browser restrictions on custom headers from other sites.
If the session is expired or nonexistent, Nexus’s response depends on whether the request is an API request or a console route request. For API requests, we respond with a 401 and the client decides what to do with that (in most cases we will likely redirect to a login page). But for console pages, we will want to return a redirect directly to the customer’s auth provider as indicated in the login flow diagram below.
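A minimal sketch of that branching, under stated assumptions: the `/api/` path prefix, the identity provider URL, the 302 status, and the `Response` type are all hypothetical stand-ins, not names taken from Nexus.

```python
from dataclasses import dataclass, field

# Hypothetical values: neither the path prefix nor the login URL comes from this RFD.
API_PREFIX = "/api/"
IDP_LOGIN_URL = "https://idp.example.com/authorize"


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)


def unauthenticated_response(path: str) -> Response:
    """Pick the response for a request whose session is missing or expired."""
    if path.startswith(API_PREFIX):
        # API request: report the failure and let the client decide what to do.
        return Response(status=401)
    # Console page request: send the browser straight to the auth provider.
    return Response(status=302, headers={"Location": IDP_LOGIN_URL})
```

In practice the way Nexus tells API requests apart from console routes may differ; the point is only that the same authentication failure produces two different responses.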
As recommended in [owasp-session], there are two different TTLs that can cause a session to expire: an idle timeout and an absolute timeout. Precise numbers for these do not need to be determined here. They are configurable in Nexus.
Idle timeout is meant to be short, on the order of 30 minutes or an hour. Idle time is measured since the last successful use of the session. If the user does not do anything to trigger an authenticated request for the length of the idle timeout, the session expires. Upon successful use of a session cookie for authentication, the time of last use for that session is updated to now in the database.
Absolute timeout is a bound on the total lifetime of a session, so it is measured from the time of session creation rather than time of last use. If a request comes in and time created was longer ago than the absolute timeout, the session is considered expired. It limits how long a session can be extended for. It is meant to be longer than the idle timeout.
For now we are hard-deleting sessions when they are found to be expired. For sessions that expire without a request actually coming in to trigger a deletion, we will also run a regular job to delete old sessions. Details TBD.
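The two timeouts and the delete-on-expiry behavior can be sketched as follows. This is an illustration only: an in-memory dict stands in for the sessions table, the field names are assumptions, and the timeout values are placeholders for whatever is configured in Nexus.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder values; the real numbers are left to Nexus configuration.
IDLE_TIMEOUT = timedelta(minutes=30)
ABSOLUTE_TIMEOUT = timedelta(hours=8)


@dataclass
class Session:
    token: str
    user_id: str
    time_created: datetime
    time_last_used: datetime


def authenticate(sessions: dict, token: str, now: datetime) -> Optional[str]:
    """Return the user ID for a live session, or None if it is missing or expired."""
    session = sessions.get(token)
    if session is None:
        return None
    idle_expired = now - session.time_last_used > IDLE_TIMEOUT
    absolute_expired = now - session.time_created > ABSOLUTE_TIMEOUT
    if idle_expired or absolute_expired:
        # Hard-delete expired sessions when a request finds them.
        del sessions[token]
        return None
    # Successful use resets the idle clock but never the creation time.
    session.time_last_used = now
    return session.user_id


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
sessions = {"abc": Session("abc", "user-1", time_created=t0, time_last_used=t0)}
first = authenticate(sessions, "abc", t0 + timedelta(minutes=20))   # within idle timeout
second = authenticate(sessions, "abc", t0 + timedelta(minutes=45))  # 25 minutes idle
third = authenticate(sessions, "abc", t0 + timedelta(minutes=80))   # 35 minutes idle
```

Here `first` and `second` succeed because each use pushes the idle deadline forward, while `third` fails and removes the row. A session that is used continuously still expires once `ABSOLUTE_TIMEOUT` has passed since `time_created`.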
This diagram illustrates what happens when a user logs in. I’m using an OAuth 2.0 auth code flow as an example, but it’s going to look similar regardless of the protocol. The mechanism by which we redirect to the original target page (the last step in the flow) is discussed below in [login-redirect].
Now that the user is logged in, they can make the same request to / again, but this time it will go through.
If a user is logged in to the console and has a session cookie set in the browser, that cookie will be sent along with any requests to Nexus. CSRF attacks trick the user into sending a request to Nexus from some other site, usually by setting a URL of ours as the action on a form embedded in a page.
SameSite=Lax: Cookies are not sent on normal cross-site subrequests (for example to load images or frames into a third party site), but are sent when a user is navigating to the origin site (i.e., when following a link).
SameSite=Strict: Cookies will only be sent in a first-party context and not be sent along with requests initiated by third party websites.
The SameSite cookie attribute neutralizes CSRF more or less completely by telling the browser to only send the cookie along with requests that originate from our site. We will want to use the Lax value so that when a user clicks a link to the console from somewhere else, their session cookie is sent along with that request and we do not redirect them to login.
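For concreteness, here is a sketch of the resulting Set-Cookie value, built with Python's standard `http.cookies` module. The cookie name `session` is a placeholder, and the `HttpOnly` and `Secure` attributes are assumptions added for illustration; only SameSite=Lax, an expiration time, and the absence of a domain are stated in this RFD.

```python
from http.cookies import SimpleCookie


def session_cookie_header(token: str, max_age_seconds: int) -> str:
    """Build the value of a Set-Cookie header for the session cookie."""
    cookie = SimpleCookie()
    cookie["session"] = token  # cookie name is a placeholder
    morsel = cookie["session"]
    morsel["samesite"] = "Lax"  # sent on top-level navigation, not on cross-site subrequests
    morsel["httponly"] = True   # assumption: keep the session token out of reach of JS
    morsel["secure"] = True     # assumption: the console is only served over HTTPS
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    # No "domain" attribute is set, so the browser scopes the cookie to the exact host.
    return morsel.OutputString()


header = session_cookie_header("abc123", 3600)
```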
However, there are two problems with SameSite: browser support and subdomains.
Browser support is high but not quite at 100%: a browser has to be roughly 2-3 years out of date to lack it. Some developers are still not comfortable relying on it. Virtually all of our users will be using a browser that supports it, but it’s always possible that one might not be. So we should use the SameSite attribute alongside some other mitigation. This StackOverflow answer makes the clever point that browsers supporting TLS 1.3 all support SameSite cookies, so one way to guarantee SameSite is supported is to disable TLS 1.2 in Nexus. Browser support for TLS 1.3 is pretty good, so this is actually a live possibility.
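To show how small that change would be, here is the equivalent setting expressed with Python's standard `ssl` module. This is purely illustrative: Nexus is not written in Python, and its TLS stack would have its own way of setting a minimum protocol version.

```python
import ssl

# Server-side TLS context that refuses to negotiate anything older than TLS 1.3.
context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
context.minimum_version = ssl.TLSVersion.TLSv1_3
```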
The other problem is that the "site" in question refers only to the registrable domain (the public suffix plus one label, e.g. example.com), not to the full hostname. This means that SameSite offers no protection against CSRF attacks from sibling subdomains. Because we will not control the domain Nexus gets served through, the customer may well host other sites we cannot trust at subdomains of the same registrable domain as the rack. For this reason we should use SameSite=Lax in combination with tokens as described in the next section.
The traditional approach to mitigating CSRF is to embed a special token in the page to send along with form posts and refuse any request missing such a token, as POSTs from third party sites would not have it. The traditional way of doing this with server-rendered web apps is to put a CSRF token inside each <form> as a hidden input. That way every instance of every form gets its own token. But given that the console is rendered client-side, it’s hard to do that: we would have to ask the API for a token every time we render a form.
An easier approach for a single-page app is a single CSRF token for the entire session. It can be stored in a column on the session table and sent down in the HTML on initial pageload. When the console makes an API request, it can stick the token in a special header. This is slightly less secure than per-page or per-form tokens because the token has a longer lifetime, but considering that a custom header may be sufficient even without a token (see next section) it seems like a good middle ground.
If the CSRF token is a random token that’s generated alongside the session token and sent along with every request, why do you need it in addition to the session token? The key is that it doesn’t live in a cookie and therefore it is not automatically sent by the browser with every request to Nexus. That automatic sending-along is the root of the CSRF vulnerability. Malicious sites cannot send the CSRF token because they can’t put custom headers on requests to our domain from JS without passing a CORS preflight check. They also cannot read the token at all, because it only shows up in the response to a GET request, and the browser’s same-origin policy prevents third-party sites from reading that response unless our CORS configuration allows it.
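A minimal sketch of the session-scoped token, assuming the token is stored on the session row and echoed back in a custom header. The function names are hypothetical; the constant-time comparison is a standard precaution rather than something this RFD specifies.

```python
import hmac
import secrets
from typing import Optional


def new_csrf_token() -> str:
    """Generate an unguessable token to store alongside the session."""
    return secrets.token_urlsafe(32)


def csrf_token_ok(expected: str, supplied: Optional[str]) -> bool:
    """Compare the token from the session row with the one in the request header."""
    if not supplied:
        # A missing or empty header always fails the check.
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(expected.encode(), supplied.encode())
```

The server would call `new_csrf_token()` once when the session is created, embed the result in the initial HTML, and run `csrf_token_ok` on every cookie-authenticated API request.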
The CSRF token can also be sent down to the client in a non-HttpOnly cookie (so it can be accessed from JS), which sounds like it shouldn’t work, but it does as long as the server doesn’t take it back as a cookie: it still has to be in a header or form post when you send a request back. Because it’s a cookie it will still be sent along with all requests, but the server must ignore it there. And because it’s scoped to your site, other sites cannot access it from JS.
We may be able to avoid CSRF tokens altogether by using custom request headers.
This seems good enough to me and is trivial to implement. All you have to do is look for the header on all API requests from the console, which can be identified by their use of session cookie authentication. Note also:
CORS configuration should also be robust to make this solution work effectively (as custom headers for requests coming from other domains trigger a pre-flight CORS check).
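The server-side check could look roughly like this. The header name `x-console-request` and the cookie name `session` are made up for the example; the RFD does not name either.

```python
# Hypothetical names; this RFD does not specify a header or cookie name.
REQUIRED_HEADER = "x-console-request"
SESSION_COOKIE = "session"


def passes_custom_header_check(headers: dict, cookies: dict) -> bool:
    """Require the custom header only on cookie-authenticated requests."""
    if SESSION_COOKIE not in cookies:
        # Not authenticated via session cookie, so this check does not apply.
        return True
    # HTTP header names are case-insensitive, so normalize before comparing.
    names = {name.lower() for name in headers}
    return REQUIRED_HEADER in names
```

The header's value is never inspected. Its presence alone shows the request was not produced by a plain cross-site form post, since a form cannot attach custom headers.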
The basic idea is you have to persist the target URL somewhere while you do the login rigmarole, and then retrieve the target URL at the end in order to go there. [login-redirect-auth0] covers the options in detail. The short version is that you store it either in the browser (in a cookie or in web storage such as sessionStorage) or in the state param on the OAuth request. Cookie/web storage is easier and more protocol agnostic (if we’re supporting OAuth from one place and SAML from another, for example) but it might not always work, for example if the user’s browser is overzealous in blocking cookies. The state param is more work to implement (for one thing, it probably needs to be implemented for each auth protocol) but is guaranteed to work. This is a pretty small detail and nothing is blocked by uncertainty about it, so we can figure this out at implementation time.
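As a sketch of the state param option, the target path can be packed into the state value next to the usual anti-forgery nonce and unpacked on the callback. The encoding, the field names, and the relative-path check are all assumptions for illustration; the path check in particular is an extra precaution not discussed in this RFD.

```python
import base64
import hmac
import json
import secrets
from typing import Optional


def encode_state(nonce: str, return_to: str) -> str:
    """Pack the nonce and the original target path into an OAuth state value."""
    payload = json.dumps({"nonce": nonce, "return_to": return_to})
    return base64.urlsafe_b64encode(payload.encode()).decode()


def decode_state(state: str, expected_nonce: str) -> Optional[str]:
    """Return the target path if the state is well-formed and the nonce matches."""
    try:
        payload = json.loads(base64.urlsafe_b64decode(state.encode()))
        nonce, return_to = payload["nonce"], payload["return_to"]
    except (ValueError, KeyError, TypeError):
        return None
    if not isinstance(nonce, str) or not isinstance(return_to, str):
        return None
    if not hmac.compare_digest(nonce.encode(), expected_nonce.encode()):
        return None
    # Extra precaution (an assumption, not from the RFD): only accept local
    # paths so the login flow cannot be used to bounce users to another site.
    if not return_to.startswith("/") or return_to.startswith("//"):
        return None
    return return_to


nonce = secrets.token_urlsafe(16)
state = encode_state(nonce, "/projects/foo")
target = decode_state(state, nonce)
```

Here `target` comes back as `/projects/foo`. The nonce would have to be remembered between the redirect and the callback, which is the same persistence question the cookie and web storage options raise, just for a smaller value.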
[rfd61] Oxide Computer Co. RFD 61: Control Plane Architecture and Design. https://rfd.shared.oxide.computer/rfd/0061. 2021.
[rfd223] Oxide Computer Co. RFD 223: Web Console Architecture. https://rfd.shared.oxide.computer/rfd/0223. 2021.
[owasp-session] OWASP Session Management Cheatsheet. https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html.
[owasp-xss] OWASP Cross Site Scripting Prevention Cheat Sheet. https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html.
[owasp-csrf] OWASP Cross-Site Request Forgery Prevention Cheat Sheet. https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html.
[login-redirect-auth0] Redirect Users After Login. https://auth0.com/docs/users/redirect-users-after-login.