
API Errors

Well-formatted errors are an important part of educating the user of the API and deserve as much care as the API design itself.

Just like an HTML error page shows a useful error message to a visitor, an API should provide a useful error message in a known consumable format.

The API should always return sensible HTTP status codes. API errors typically break down into two types: 4xx series status codes for client issues and 5xx series status codes for server issues. All 4xx series errors generated by Milmove handlers should provide a consumable JSON error representation. 5xx errors served by the Milmove code should also abide by this requirement. However, some 5xx errors are generated by the router and upstream mechanisms, and those may not abide by this requirement.

The JSON error body should provide a few things for the developer - a useful error title, possibly a further detailed description, and an instance identifier that makes it possible to refer to this particular occurrence of the problem.

The instance identifier enables the Milmove team to look up Cloudwatch logs that were written when the problem occurred.

Default Error

The default JSON output representation for the error messages looks like:

{
    "title": "Conflict Error",
    "detail": "Estimated weight must be set before this item can be approved.",
    "instance": "7673868d-231e-490d-9c4f-19288e7e668d"
}
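As a minimal sketch, the default body could be produced in Go as shown here. The ClientError struct and the encodeError helper are hypothetical names chosen for illustration; only the three JSON field names come from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClientError is a hypothetical struct mirroring the default error body.
// Only the JSON field names are taken from the documented format.
type ClientError struct {
	Title    string `json:"title"`
	Detail   string `json:"detail"`
	Instance string `json:"instance"`
}

// encodeError marshals the error body to a JSON string.
func encodeError(e ClientError) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encodeError(ClientError{
		Title:    "Conflict Error",
		Detail:   "Estimated weight must be set before this item can be approved.",
		Instance: "7673868d-231e-490d-9c4f-19288e7e668d",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```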

Custom Error

In addition to this default format, specific errors can choose to provide a custom representation that builds on this. A swagger example of how to define these errors is provided in this article on backend errors.

Validation errors for PUT, PATCH and POST requests need a field breakdown, and will return an UnprocessableEntity response. The top level detail can summarize or generalize the validation failures and provide the detailed errors in an additional invalidFields object, like so:

{
    "title": "Validation Error",
    "detail": "MoveTaskOrderID can not be blank.",
    "instance": "1fd81778-4c47-4998-ba03-ea94bc0ac21c",
    "invalidFields": {
        "move_task_order_id": [
            "MoveTaskOrderID can not be blank."
        ]
    }
}
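From the consumer's side, a validation error body like the one above could be decoded and flattened into per-field messages. This is only a sketch: the validationErrorBody type and the fieldMessages helper are hypothetical and are not part of any Milmove client library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// validationErrorBody is a hypothetical client-side type for the 422 body.
type validationErrorBody struct {
	Title         string              `json:"title"`
	Detail        string              `json:"detail"`
	Instance      string              `json:"instance"`
	InvalidFields map[string][]string `json:"invalidFields"`
}

// fieldMessages decodes the body and returns "field: message" strings,
// sorted by field name so the result is deterministic.
func fieldMessages(raw []byte) ([]string, error) {
	var body validationErrorBody
	if err := json.Unmarshal(raw, &body); err != nil {
		return nil, err
	}
	fields := make([]string, 0, len(body.InvalidFields))
	for f := range body.InvalidFields {
		fields = append(fields, f)
	}
	sort.Strings(fields)

	var out []string
	for _, f := range fields {
		for _, msg := range body.InvalidFields[f] {
			out = append(out, f+": "+msg)
		}
	}
	return out, nil
}

func main() {
	raw := []byte(`{
		"title": "Validation Error",
		"detail": "MoveTaskOrderID can not be blank.",
		"instance": "1fd81778-4c47-4998-ba03-ea94bc0ac21c",
		"invalidFields": {
			"move_task_order_id": ["MoveTaskOrderID can not be blank."]
		}
	}`)
	msgs, err := fieldMessages(raw)
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Println(m)
	}
}
```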

HTTP status codes

HTTP defines a set of meaningful status codes that can be returned from an API. Milmove uses these to help API consumers handle each response appropriately.

The types of responses are grouped into the following top level categories which we would generate within Milmove handler and service code:

  • 2xx Success - This category indicates the action requested by the client was received, understood, and accepted.
  • 4xx Client Error - This category indicates that the error seems to have been caused by the client.
  • 5xx Server Error - This category indicates the server failed to fulfill a request.

Importance of the Client vs. Server Error Distinction - When we have multiple 500 server errors, this results in a PagerDuty alarm that on-call folks have to resolve. It signals that something went wrong on our side that shouldn't have. Keep this in mind as you select which type of error response your situation calls for.

Below is the subset of HTTP responses we generate in the Prime API handlers and service code.

  • 200 OK - Response to a successful GET, PUT, PATCH or DELETE. Can also be used for a POST that doesn't result in a creation.
  • 201 Created - Response to a POST that results in a creation.
  • 404 Not Found - When a non-existent resource is requested.
  • 422 Unprocessable Entity - Used for validation errors.
  • 409 Conflict - When we cannot process the request due to the current state of the server.
  • 412 Precondition Failed - When the optimistic locking precondition is violated because the provided eTag does not match the record.
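The 412 case can be illustrated with a small optimistic-locking check. This is a sketch under stated assumptions: the record type, the etagFor derivation (an encoding of the last-update timestamp), and checkPrecondition are all hypothetical and may differ from how Milmove actually computes and compares eTags.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
	"time"
)

// record is a hypothetical stored resource.
type record struct {
	ID        string
	UpdatedAt time.Time
}

// etagFor derives an eTag from the last-update timestamp.
// Assumption for illustration only: the real derivation may differ.
func etagFor(r record) string {
	stamp := r.UpdatedAt.UTC().Format(time.RFC3339Nano)
	return base64.StdEncoding.EncodeToString([]byte(stamp))
}

// checkPrecondition returns 412 when the eTag supplied by the client
// no longer matches the stored record, and 200 otherwise.
func checkPrecondition(r record, providedETag string) int {
	if providedETag != etagFor(r) {
		return http.StatusPreconditionFailed
	}
	return http.StatusOK
}

func main() {
	r := record{ID: "shipment-1", UpdatedAt: time.Date(2020, 1, 2, 3, 4, 5, 0, time.UTC)}
	tag := etagFor(r) // the client receives this eTag along with the record

	fmt.Println(checkPrecondition(r, tag)) // eTag is still current

	// Another request updates the record, so the client's eTag is now stale.
	r.UpdatedAt = r.UpdatedAt.Add(time.Minute)
	fmt.Println(checkPrecondition(r, tag))
}
```

Run as written, the first check prints 200 and the second prints 412, because the record changed after the client read its eTag.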

One special note about 400 Bad Request, which is not in the list above. 400 Bad Request implies that the request is malformed, for example when the body does not parse. Because swagger parses the request first, swagger itself returns a Bad Request error if it is unable to parse the request. The handler is only called once the request has been parsed correctly. Therefore, we should never need to return Bad Request ourselves.

Adapted from Best Practices for a Pragmatic Restful API

Handling Errors in the Code

In the handler functions and service objects, we create an error type based on events in the system. These events are handed back up to the top level handler where they are then converted into an appropriate error response.

An error type is one that we define, create and pass in the service level code such as InvalidInputError. It has no dependence on swagger generated code or a specific endpoint.

An error response is a swagger generated type that we use to send errors back to the user, with a payload we construct. It is specific to the endpoint.

mtoshipmentops.NewCreateMTOShipmentNotFound().WithPayload(payload)

Error types and Error responses

Error types in the code are defined in pkg/apperror/errors.go.

Error responses are defined in the YAML files swagger/prime.yaml and swagger/support.yaml.

Here are the main error types and the responses we send in that case:

  • NotFoundError -> 404 Not Found

    This error is generated when we can't find a resource or record. It is returned to the user as 404 Not Found with a message indicating the type of object and preferably the ID of the missing record, if known.

  • InvalidInputError -> 422 Unprocessable Entity

    This error is generated when there are validation errors in the input. Swagger may also return this error as it performs pre-parsing of the request before calling the handler.

    When we generate this error, we should populate the ValidationErrors field in the struct with the error field and detail so that it can be returned in the invalidFields of the JSON response.

  • ConflictError -> 409 Conflict

    This error is generated when we cannot process the request due to the current state of the server. For example, if the PrimeEstimatedWeight is not set on a shipment, we cannot create certain service items on that shipment. If we get a request to create such a service item, we return a conflict error.

  • PreconditionFailedError -> 412 Precondition Failed

    This error is generated on a PUT or PATCH when the eTag does not match the record. It is returned to the user with a message indicating that the eTag is a mismatch.

  • QueryError -> 500 Internal Server Error

    If there is nothing wrong with the request the client sent us, and we still get an error in the query to the DB, we may then generate a QueryError. At the top level, we send a 500 Internal Server Error.

    The error generated at the query level may contain more detail than we want to expose to the user of the API such as database table and column names. It's a good idea to log these errors but we don't want to return them to the user.
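The mapping in the list above can be summarized as a single type switch. The sketch below uses stand-in error types declared locally so that it runs on its own; the real types live in the application code and carry more fields, and statusFor is a hypothetical helper, not an existing function.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Stand-in error types for illustration; the real ones carry more fields.
type (
	NotFoundError           struct{ msg string }
	InvalidInputError       struct{ msg string }
	ConflictError           struct{ msg string }
	PreconditionFailedError struct{ msg string }
	QueryError              struct{ msg string }
)

func (e NotFoundError) Error() string           { return e.msg }
func (e InvalidInputError) Error() string       { return e.msg }
func (e ConflictError) Error() string           { return e.msg }
func (e PreconditionFailedError) Error() string { return e.msg }
func (e QueryError) Error() string              { return e.msg }

// statusFor maps an error type to the HTTP status described in the list.
// Anything unrecognized, including QueryError, becomes a 500.
func statusFor(err error) int {
	switch err.(type) {
	case NotFoundError:
		return http.StatusNotFound
	case InvalidInputError:
		return http.StatusUnprocessableEntity
	case ConflictError:
		return http.StatusConflict
	case PreconditionFailedError:
		return http.StatusPreconditionFailed
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	errs := []error{
		NotFoundError{"shipment not found"},
		InvalidInputError{"invalid input"},
		ConflictError{"estimated weight not set"},
		PreconditionFailedError{"eTag mismatch"},
		QueryError{"query failed"},
		errors.New("something unexpected"),
	}
	for _, err := range errs {
		fmt.Printf("%T -> %d\n", err, statusFor(err))
	}
}
```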

Note: There might be new errors that we generate as the code evolves, but it's a good idea to understand these definitions and see if your error fits one of them before defining a new one. This improves consistency and makes the errors easier to understand and process by the user. If you do define a new error, please update this guide.

Security Guidelines

In the next section we will cover converting the errors into responses. For each of the error responses, do not put information in the error message that is only useful for debugging our side of the system. The target audience for the error responses is the client, not the Milmove developer. We can tell clients what they did wrong, but we must not reveal information about our own systems.

Secure logging of errors into Cloudwatch is what we use for developer information. This is accomplished by using logger.Error() function calls.

Internal Server Error: Always use an Internal Server Error response if something broke on our side. You can set the message to nil and it will have a boilerplate message. The client does not need to know anything further if an Internal Server Error occurred.

Trace ID: Make sure each error has a trace ID. We use the trace ID to identify the set of events that resulted in an error message. The client can report the trace ID to us, and a developer can then use it to debug the issue that occurred. The trace ID reveals nothing to the client; it is only meaningful to us.

DB data: Sometimes an error is triggered in the DB. The message often starts with pq. Do not pass on this error string in the response. You may log it securely into Cloudwatch but there is no reason to pass on the contents of a DB error to the client.

File paths: Equally there is no reason to reveal a file path to the client. Never provide this information in the error message.
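A minimal sketch of these guidelines follows: the full error, including the underlying DB message, goes to the log, while the client receives only a boilerplate message and the trace ID. Everything here is a stand-in for illustration: the standard library log package replaces the real logger, and QueryError, errorPayload, defaultDetail, internalServerError, and demo are hypothetical names.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log"
)

// QueryError is a stand-in that wraps the underlying DB error.
type QueryError struct{ inner error }

func (e QueryError) Error() string { return "query failed" }
func (e QueryError) Unwrap() error { return e.inner }

// errorPayload is a hypothetical response body.
type errorPayload struct {
	Title, Detail, Instance string
}

// defaultDetail is an assumed boilerplate message.
const defaultDetail = "An internal server error has occurred"

// internalServerError logs the error and its cause, then returns a
// payload that contains nothing but boilerplate and the trace ID.
func internalServerError(logger *log.Logger, err error, traceID string) errorPayload {
	logger.Printf("trace=%s error=%v", traceID, err)
	if inner := errors.Unwrap(err); inner != nil {
		logger.Printf("trace=%s cause=%v", traceID, inner)
	}
	return errorPayload{Title: "Internal Server Error", Detail: defaultDetail, Instance: traceID}
}

// demo runs one DB failure through internalServerError and returns the
// payload sent to the client together with what was written to the log.
func demo() (errorPayload, string) {
	var logs bytes.Buffer
	logger := log.New(&logs, "", 0)
	dbErr := errors.New(`pq: relation "mto_shipments" does not exist`)
	payload := internalServerError(logger, QueryError{inner: dbErr}, "trace-123")
	return payload, logs.String()
}

func main() {
	payload, logged := demo()
	fmt.Printf("sent to client: %+v\n", payload)
	fmt.Printf("written to log:\n%s", logged)
}
```

The pq message appears only in the log output; the payload carries the trace ID so that a client report can be matched to those log lines.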

Constructing Errors and Responses

To create an instance of an error type, you'll call the New function for that error, and pass in parameters pertaining to that error.

For a validation error, details of the validation issue should be in verrs, but there is an additional message field as well.

invalidInputError := services.NewInvalidInputError(
    shipment.ID, nil, verrs,
    "DestinationID was not valid",
)
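To make the shape of such a constructor concrete, here is a self-contained sketch. It is not the real implementation: the ID is a plain string, the validation errors are a plain map rather than the validation library's type, and the fallback message is invented for illustration.

```go
package main

import "fmt"

// InvalidInputError is a stand-in for the real error type.
// ValidationErrors is exported so a handler can copy it into invalidFields.
type InvalidInputError struct {
	id               string
	err              error
	ValidationErrors map[string][]string
	message          string
}

// NewInvalidInputError mirrors the argument order used in the call above:
// record ID, wrapped error, validation errors, message.
func NewInvalidInputError(id string, err error, verrs map[string][]string, message string) InvalidInputError {
	return InvalidInputError{id: id, err: err, ValidationErrors: verrs, message: message}
}

// Error returns the custom message, or a generic fallback when it is empty.
func (e InvalidInputError) Error() string {
	if e.message != "" {
		return e.message
	}
	return fmt.Sprintf("invalid input for ID %s", e.id)
}

// Unwrap exposes the wrapped error, if any, for logging.
func (e InvalidInputError) Unwrap() error { return e.err }

func main() {
	verrs := map[string][]string{
		"destination_id": {"DestinationID was not valid"},
	}
	invalidInputError := NewInvalidInputError("shipment-id", nil, verrs, "DestinationID was not valid")

	fmt.Println(invalidInputError.Error())
	fmt.Println(invalidInputError.ValidationErrors["destination_id"][0])
}
```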

At the topmost handler function, usually called Handle, you'll check the type of the error and construct the appropriate response.

Note that the message in the call to payloads.InternalServerError is often nil. This is because we have a default message and title that will be populated.

You can override the default message in the call if you have a more useful message, but if not, the default message is sufficient.

Here is some example code of how we take the error type and create the appropriate response, according to the mapping described above.

if err != nil {
    logger.Error("primeapi.CreateMTOShipmentHandler", zap.Error(err))

    switch e := err.(type) {

    // NotFoundError -> Not Found response
    case services.NotFoundError:
        return mtoshipmentops.NewCreateMTOShipmentNotFound().
            WithPayload(payloads.ClientError(handlers.NotFoundMessage, err.Error(), h.GetTraceID()))

    // InvalidInputError -> Unprocessable Entity response
    case services.InvalidInputError:
        return mtoshipmentops.NewCreateMTOShipmentUnprocessableEntity().
            WithPayload(payloads.ValidationError(handlers.ValidationErrMessage, h.GetTraceID(), e.ValidationErrors))

    // QueryError -> Internal Server Error response
    case services.QueryError:
        if e.Unwrap() != nil {
            // If you can unwrap, log the internal error (usually a pq error) for better debugging.
            // Note that we do not expose this detail in the payload.
            logger.Error("primeapi.CreateMTOShipmentHandler error", zap.Error(e.Unwrap()))
        }
        return mtoshipmentops.NewCreateMTOShipmentInternalServerError().
            WithPayload(payloads.InternalServerError(nil, h.GetTraceID()))

    // Unknown -> Internal Server Error response
    default:
        return mtoshipmentops.NewCreateMTOShipmentInternalServerError().
            WithPayload(payloads.InternalServerError(nil, h.GetTraceID()))
    }
}