Large Object Support

Overview

Swift has a limit on the size of a single uploaded object; by default this is 5GB. However, the download size of a single object is virtually unlimited thanks to segmentation. Segments of the larger object are uploaded and a special manifest file is created that, when downloaded, sends all the segments concatenated as a single object. This also makes much faster uploads possible, since the segments can be uploaded in parallel.

Dynamic Large Objects

Middleware that will provide Dynamic Large Object (DLO) support.

Using swift

The quickest way to try out this feature is to use the swift tool included with the python-swiftclient library. You can use the -S option to specify the segment size to use when splitting a large file. For example:

swift upload test_container -S 1073741824 large_file

This would split large_file into 1 GiB segments and begin uploading those segments in parallel. Once all the segments have been uploaded, swift will then create the manifest file so the segments can be downloaded as one.

So now, the following swift command would download the entire large object:

swift download test_container large_file

The swift command uses a strict convention for its segmented object support. In the above example it will upload all the segments into a second container named test_container_segments. These segments will have names like large_file/1290206778.25/21474836480/00000000, large_file/1290206778.25/21474836480/00000001, etc.

The main benefit for using a separate container is that the main container listings will not be polluted with all the segment names. The reason for using the segment name format of <name>/<timestamp>/<size>/<segment> is so that an upload of a new file with the same name won’t overwrite the contents of the first until the last moment when the manifest file is updated.
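The naming convention can be sketched in Python as follows. This is only an illustration of the <name>/<timestamp>/<size>/<segment> layout; the helper name segment_names is hypothetical, and it assumes that <size> is the total size of the file, as the example names above suggest. It is not the swift tool's actual code.

```python
def segment_names(name, timestamp, total_size, segment_size):
    """Return illustrative segment object names for one upload.

    Layout: <name>/<timestamp>/<size>/<segment>, with the segment index
    zero-padded to 8 digits so that names sort in upload order.
    """
    # Integer ceiling division: number of segments needed (at least one).
    count = max(1, -(-total_size // segment_size))
    return [
        "%s/%s/%d/%08d" % (name, timestamp, total_size, index)
        for index in range(count)
    ]


names = segment_names("large_file", "1290206778.25", 21474836480, 1073741824)
print(len(names))   # 20
print(names[0])     # large_file/1290206778.25/21474836480/00000000
print(names[-1])    # large_file/1290206778.25/21474836480/00000019
```

Because the timestamp is part of the name, a second upload of a file with the same name produces a disjoint set of segment names.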

swift will manage these segment files for you, deleting old segments on deletes and overwrites, etc. You can override this behavior with the --leave-segments option if desired; this is useful if you want to have multiple versions of the same large object available.

Direct API

You can also work with the segments and manifests directly with HTTP requests instead of having swift do that for you. You can just upload the segments like you would any other object and the manifest is just a zero-byte (not enforced) file with an extra X-Object-Manifest header.

All the object segments need to be in the same container, have a common object name prefix, and sort in the order in which they should be concatenated. Object names are sorted lexicographically as UTF-8 byte strings. They don’t have to be in the same container as the manifest file will be, which is useful to keep container listings clean as explained above with swift.
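The sort order matters when choosing segment names. The following minimal Python sketch sorts names as UTF-8 byte strings, as described above, and shows why zero-padded segment numbers are the safer choice. The function name concatenation_order and the object names are illustrative only.

```python
def concatenation_order(names):
    """Sort object names lexicographically as UTF-8 byte strings."""
    return sorted(names, key=lambda name: name.encode("utf-8"))


unpadded = ["myobject/1", "myobject/2", "myobject/10"]
padded = ["myobject/00000001", "myobject/00000002", "myobject/00000010"]

print(concatenation_order(unpadded))
# ['myobject/1', 'myobject/10', 'myobject/2']  (segment 10 lands before 2)
print(concatenation_order(padded))
# ['myobject/00000001', 'myobject/00000002', 'myobject/00000010']
```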

The manifest file is simply a zero-byte (not enforced) file with the extra X-Object-Manifest: <container>/<prefix> header, where <container> is the container the object segments are in and <prefix> is the common prefix for all the segments.

It is best to upload all the segments first and then create or update the manifest. In this way, the full object won’t be available for downloading until the upload is complete. Also, you can upload a new set of segments to a second location and then update the manifest to point to this new location. During the upload of the new segments, the original manifest will still be available to download the first set of segments.

Note

When updating a manifest object using a POST request, an X-Object-Manifest header must be included for the object to continue to behave as a manifest object.

The manifest file should have no content. However, this is not enforced. If the manifest path itself matches the container/prefix specified in X-Object-Manifest, and the manifest has some content in it, the manifest is also treated as a segment and its content becomes part of the concatenated GET response. The order of concatenation follows the usual DLO logic: segments are concatenated in the order returned when segment names are sorted.

Here’s an example using curl with tiny 1-byte segments:

# First, upload the segments
curl -X PUT -H 'X-Auth-Token: <token>' \
    http://<storage_url>/container/myobject/00000001 --data-binary '1'
curl -X PUT -H 'X-Auth-Token: <token>' \
    http://<storage_url>/container/myobject/00000002 --data-binary '2'
curl -X PUT -H 'X-Auth-Token: <token>' \
    http://<storage_url>/container/myobject/00000003 --data-binary '3'

# Next, create the manifest file
curl -X PUT -H 'X-Auth-Token: <token>' \
    -H 'X-Object-Manifest: container/myobject/' \
    http://<storage_url>/container/myobject --data-binary ''

# And now we can download the segments as a single object
curl -H 'X-Auth-Token: <token>' \
    http://<storage_url>/container/myobject
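The same sequence of requests can be sketched in Python using only the standard library. The storage URL and token below are placeholders, and dlo_requests is a hypothetical helper; it only builds the requests so that they can be inspected, and each one would still have to be sent (for example with urllib.request.urlopen) against a real cluster.

```python
import urllib.request

STORAGE_URL = "http://storage.example.com/v1/AUTH_test"  # placeholder
TOKEN = "<token>"                                        # placeholder


def dlo_requests(container, name, chunks):
    """Return unsent PUT requests: one per segment, then the manifest."""
    requests = []
    for index, chunk in enumerate(chunks, start=1):
        url = "%s/%s/%s/%08d" % (STORAGE_URL, container, name, index)
        requests.append(urllib.request.Request(
            url, data=chunk, method="PUT",
            headers={"X-Auth-Token": TOKEN}))
    # The manifest goes last, so the object only appears once it is complete.
    manifest_url = "%s/%s/%s" % (STORAGE_URL, container, name)
    requests.append(urllib.request.Request(
        manifest_url, data=b"", method="PUT",
        headers={"X-Auth-Token": TOKEN,
                 "X-Object-Manifest": "%s/%s/" % (container, name)}))
    return requests


for req in dlo_requests("container", "myobject", [b"1", b"2", b"3"]):
    print(req.get_method(), req.full_url)
```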

Static Large Objects

Middleware that will provide Static Large Object (SLO) support.

This feature is very similar to Dynamic Large Object (DLO) support in that it allows the user to upload many objects concurrently and afterwards download them as a single object. It is different in that it does not rely on eventually consistent container listings to do so. Instead, a user defined manifest of the object segments is used.

Uploading the Manifest

After the user has uploaded the objects to be concatenated, a manifest is uploaded. The request must be a PUT with the query parameter:

?multipart-manifest=put

The body of this request will be an ordered list of segment descriptions in JSON format. The data to be supplied for each segment is:

Key         Description
path        the path to the segment object (not including account), in the form /container/object_name
etag        the ETag given back when the segment object was PUT, or null
size_bytes  the size of the complete segment object in bytes, or null
range       (optional) the (inclusive) range within the object to use as a segment. If omitted, the entire object is used.

The format of the list will be:

[{"path": "/cont/object",
  "etag": "etagoftheobjectsegment",
  "size_bytes": 10485760,
  "range": "1048576-2097151"}, ...]

The number of object segments is limited to a configurable amount, default 1000. Each segment must be at least 1 byte. On upload, the middleware will issue a HEAD request for every segment passed in to verify:

  1. the segment exists (i.e. the HEAD was successful);
  2. the segment meets minimum size requirements;
  3. if the user provided a non-null etag, the etag matches;
  4. if the user provided a non-null size_bytes, the size_bytes matches; and
  5. if the user provided a range, it is a singular, syntactically correct range that is satisfiable given the size of the object.

Note that the etag and size_bytes keys are still required; this acts as a guard against user errors such as typos. If any of the objects fail to verify (not found, size/etag mismatch, below minimum size, invalid range) then the user will receive a 4xx error response. If everything does match, the user will receive a 2xx response and the SLO object is ready for downloading.
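The checks listed above can be sketched as a small Python function. This is an illustration under stated assumptions, not the middleware's code: verify_segment and the shape of head_result are hypothetical, and the range check (item 5) is left out for brevity.

```python
def verify_segment(entry, head_result, min_size=1):
    """Return a list of problems for one manifest entry (empty if it verifies).

    `entry` is one dict from the manifest. `head_result` is None when the
    HEAD failed, otherwise a dict with the 'etag' and 'size' of the object.
    """
    if head_result is None:
        return ["not found"]
    problems = []
    if head_result["size"] < min_size:
        problems.append("below minimum size")
    # The keys are required, but a null (None) value skips the comparison.
    if entry["etag"] is not None and entry["etag"] != head_result["etag"]:
        problems.append("etag mismatch")
    if (entry["size_bytes"] is not None
            and int(entry["size_bytes"]) != head_result["size"]):
        problems.append("size mismatch")
    return problems


head = {"etag": "abc", "size": 10}
print(verify_segment({"path": "/c/o", "etag": None, "size_bytes": None}, head))
# []
print(verify_segment({"path": "/c/o", "etag": "xyz", "size_bytes": 11}, head))
# ['etag mismatch', 'size mismatch']
```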

Behind the scenes, on success, a json manifest generated from the user input is sent to object servers with an extra “X-Static-Large-Object: True” header and a modified Content-Type. The items in this manifest will include the etag and size_bytes for each segment, regardless of whether the client specified them for verification. The parameter: swift_bytes=$total_size will be appended to the existing Content-Type, where total_size is the sum of all the included segments’ size_bytes. This extra parameter will be hidden from the user.

Manifest files can reference objects in separate containers, which will improve concurrent upload speed. Objects can be referenced by multiple manifests. The segments of a SLO manifest can even be other SLO manifests. Treat them as any other object, i.e., in the parent SLO's manifest, use the ETag and Content-Length returned by the PUT of the sub-SLO.
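A manifest body in the format described above could be assembled as in the following Python sketch, here with segments spread over two containers. The helper build_slo_manifest and the container and object names are hypothetical; the sketch assumes the segments are ordinary objects that have already been uploaded, so that each ETag is the MD5 sum of the segment's contents.

```python
import hashlib
import json


def build_slo_manifest(segments):
    """Build the JSON body for a PUT with ?multipart-manifest=put.

    `segments` is an ordered list of (container, object_name, data)
    tuples describing objects assumed to be uploaded already.
    """
    entries = []
    for container, object_name, data in segments:
        entries.append({
            "path": "/%s/%s" % (container, object_name),
            # For an ordinary object the ETag is the MD5 of its contents.
            "etag": hashlib.md5(data).hexdigest(),
            "size_bytes": len(data),
        })
    return json.dumps(entries)


body = build_slo_manifest([
    ("segments_a", "big/00000001", b"1"),
    ("segments_b", "big/00000002", b"2"),
])
print(body)
```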

Range Specification

Users can include an optional ‘range’ field in segment descriptions to specify which bytes from the underlying object should be used for the segment data. Only one range may be specified per segment.

Note

The ‘etag’ and ‘size_bytes’ fields still describe the backing object as a whole.

If a user uploads this manifest:

[{"path": "/con/obj_seg_1", "etag": null, "size_bytes": 2097152,
  "range": "0-1048576"},
 {"path": "/con/obj_seg_2", "etag": null, "size_bytes": 2097152,
  "range": "512-1550000"},
 {"path": "/con/obj_seg_1", "etag": null, "size_bytes": 2097152,
  "range": "-2048"}]

The resulting object will consist of bytes 0 through 1048576 (inclusive) of /con/obj_seg_1, that is, its first 1048577 bytes, followed by bytes 512 through 1550000 (inclusive) of /con/obj_seg_2, and finally bytes 2095104 through 2097151 (i.e., the last 2048 bytes) of /con/obj_seg_1. Byte offsets are zero-based, as in HTTP Range headers.
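To make the arithmetic concrete, here is a minimal Python sketch that resolves each range string into inclusive, zero-based byte offsets, in the same way HTTP byte ranges are interpreted. The helper resolve_range is hypothetical and performs no validation (it assumes a well-formed, satisfiable range).

```python
def resolve_range(range_spec, object_size):
    """Resolve 'M-N', 'M-' or '-N' into inclusive (first, last) offsets."""
    first, _, last = range_spec.partition("-")
    if first == "":
        # Suffix form '-N': the last N bytes of the object.
        length = min(int(last), object_size)
        return object_size - length, object_size - 1
    start = int(first)
    # 'M-' runs to the end; 'M-N' is clamped to the end of the object.
    end = int(last) if last else object_size - 1
    return start, min(end, object_size - 1)


manifest_ranges = [
    ("0-1048576", 2097152),
    ("512-1550000", 2097152),
    ("-2048", 2097152),
]
resolved = [resolve_range(spec, size) for spec, size in manifest_ranges]
total = sum(last - first + 1 for first, last in resolved)
print(resolved)  # [(0, 1048576), (512, 1550000), (2095104, 2097151)]
print(total)     # 2600114
```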

Note

The minimum sized range is 1 byte. This is the same as the minimum segment size.

Retrieving a Large Object

A GET request to the manifest object will return the concatenation of the objects from the manifest, much like DLO. If any of the segments from the manifest are not found, or their ETag or Content-Length has changed since upload, the connection will drop. In this case a 409 Conflict will be logged in the proxy logs and the user will receive incomplete results. Note that this will be enforced regardless of whether the user performed per-segment validation during upload.

The headers from this GET or HEAD request will return the metadata attached to the manifest object itself with some exceptions:

Content-Length: the total size of the SLO (the sum of the sizes of
                the segments in the manifest)
X-Static-Large-Object: True
Etag: the etag of the SLO (generated the same way as DLO)

A GET request with the query parameter:

?multipart-manifest=get

will return a transformed version of the original manifest, containing additional fields and different key names.

A GET request with the query parameters:

?multipart-manifest=get&format=raw

will return the contents of the original manifest as it was sent by the client. Both calls are intended solely for debugging.

When the manifest object is uploaded you are more or less guaranteed that every segment in the manifest exists and matches the specifications. However, there is nothing that prevents the user from breaking the SLO download by deleting/replacing a segment referenced in the manifest. It is left to the user to use caution in handling the segments.

Deleting a Large Object

A DELETE request will just delete the manifest object itself.

A DELETE with a query parameter:

?multipart-manifest=delete

will delete all the segments referenced in the manifest and then the manifest itself. The failure response will be similar to the bulk delete middleware.

Modifying a Large Object

PUTs and POSTs will work as expected. For example, a PUT will just overwrite the manifest object.

Container Listings

In a container listing the size listed for SLO manifest objects will be the total_size of the concatenated segments in the manifest. The overall X-Container-Bytes-Used for the container (and subsequently for the account) will not reflect total_size of the manifest but the actual size of the JSON data stored. The reason for this somewhat confusing discrepancy is that we want the container listing to reflect the size of the manifest object when it is downloaded. We do not, however, want to count the bytes used twice (for both the manifest and the segments it refers to) in the container and account metadata, which can be used for stats purposes.
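A small worked example of the two numbers, as a Python sketch. The segment sizes are made up, the segments are assumed to live in the same container as the manifest, and the JSON built here merely stands in for whatever manifest data is actually stored.

```python
import json

segment_sizes = [10485760, 10485760, 5242880]   # hypothetical segments
manifest_json = json.dumps([
    {"path": "/cont/seg/%08d" % index, "etag": None, "size_bytes": size}
    for index, size in enumerate(segment_sizes)
])

# Size shown for the manifest object in the container listing.
listed_size = sum(segment_sizes)

# Contribution to X-Container-Bytes-Used: the segments are counted once,
# and the manifest only adds the size of its stored JSON.
bytes_used = sum(segment_sizes) + len(manifest_json.encode("utf-8"))

print(listed_size)               # 26214400
print(bytes_used - listed_size)  # size of the manifest JSON itself
```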

class swift.common.middleware.slo.StaticLargeObject(app, conf, max_manifest_segments=1000, max_manifest_size=2097152)

Bases: object

StaticLargeObject Middleware

See above for a full description.

The proxy logs created for any subrequests made will have swift.source set to “SLO”.

Parameters:
  • app – The next WSGI filter or app in the paste.deploy chain.
  • conf – The configuration dict for the middleware.

get_segments_to_delete_iter(req)

A generator function to be used to delete all the segments and sub-segments referenced in a manifest.

Params req: a swob.Request with an SLO manifest in path

Raises:
  • HTTPPreconditionFailed – on invalid UTF8 in request path
  • HTTPBadRequest – on too many buffered sub segments and on invalid SLO manifest path

get_slo_segments(obj_name, req)

Performs a swob.Request and returns the SLO manifest’s segments.

Raises:
  • HTTPServerError – on unable to load obj_name or on unable to load the SLO manifest data.
  • HTTPBadRequest – on not an SLO manifest
  • HTTPNotFound – on SLO manifest not found

Returns: SLO manifest’s segments

handle_multipart_delete(req)

Will delete all the segments in the SLO manifest and then, if successful, will delete the manifest file.

Params req: a swob.Request with an obj in path
Returns: swob.Response whose app_iter is set to Bulk.handle_delete_iter

handle_multipart_get_or_head(req, start_response)

Handles the GET or HEAD of a SLO manifest.

The response body (only on GET, of course) will consist of the concatenation of the segments.

Params req: a swob.Request with a path referencing an object
Raises: HttpException on errors

handle_multipart_put(req, start_response)

Will handle the PUT of a SLO manifest. Heads every object in the manifest to check that it is valid and, if so, saves a manifest generated from the user input. Uses WSGIContext to call self and start_response and returns a WSGI iterator.

Params req: a swob.Request with an obj in path
Raises: HttpException on errors

swift.common.middleware.slo.parse_and_validate_input(req_body, req_path)

Given a request body, parses it and returns a list of dictionaries.

The output structure is nearly the same as the input structure, but it is not an exact copy. Given a valid input dictionary d_in, its corresponding output dictionary d_out will be as follows:

  • d_out['etag'] == d_in['etag']
  • d_out['path'] == d_in['path']
  • d_in['size_bytes'] can be a string ("12") or an integer (12), but d_out['size_bytes'] is an integer.
  • (optional) d_in['range'] is a string of the form "M-N", "M-", or "-N", where M and N are non-negative integers. d_out['range'] is the corresponding swob.Range object. If d_in does not have a key 'range', neither will d_out.

Raises: HTTPException on parse errors or semantic errors (e.g. bogus JSON structure, syntactically invalid ranges)
Returns: a list of dictionaries on success

Direct API

SLO support centers around the user-generated manifest file. After the user has uploaded the segments into their account, a manifest file needs to be built and uploaded. All object segments must be at least 1 byte in size. Please see the Static Large Objects section for further details.

Additional Notes

  • With a GET or HEAD of a manifest file, the X-Object-Manifest: <container>/<prefix> header will be returned with the concatenated object so you can tell where it’s getting its segments from.
  • When updating a manifest object using a POST request, an X-Object-Manifest header must be included for the object to continue to behave as a manifest object.
  • The response’s Content-Length for a GET or HEAD on the manifest file will be the sum of all the segments in the <container>/<prefix> listing, dynamically. So, uploading additional segments after the manifest is created will cause the concatenated object to be that much larger; there’s no need to recreate the manifest file.
  • The response’s Content-Type for a GET or HEAD on the manifest will be the same as the Content-Type set during the PUT request that created the manifest. You can easily change the Content-Type by reissuing the PUT.
  • The response’s ETag for a GET or HEAD on the manifest file will be the MD5 sum of the concatenated string of ETags for each of the segments in the manifest (for DLO, from the listing <container>/<prefix>). Usually in Swift the ETag is the MD5 sum of the contents of the object, and that holds true for each segment independently. But it’s not meaningful to generate such an ETag for the manifest itself so this method was chosen to at least offer change detection.
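The ETag rule in the last note can be sketched in Python as follows. The helper name large_object_etag is hypothetical, and the sketch assumes the segment ETags are plain hexadecimal MD5 strings taken in concatenation order; it illustrates the rule as stated above rather than Swift's own implementation.

```python
import hashlib


def large_object_etag(segment_etags):
    """MD5 of the concatenated segment ETag strings, in segment order."""
    joined = "".join(segment_etags)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


# Each segment's own ETag is the MD5 of that segment's contents.
segment_etags = [hashlib.md5(data).hexdigest() for data in (b"1", b"2", b"3")]
etag = large_object_etag(segment_etags)

print(etag)
# Differs from the MD5 of the concatenated contents:
print(etag == hashlib.md5(b"123").hexdigest())  # False
```

Replacing, adding, or reordering any segment changes the result, which is what makes it usable for change detection.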

Note

If you are using the container sync feature you will need to ensure both your manifest file and your segment files are synced if they happen to be in different containers.

History

Dynamic large object support has gone through various iterations before settling on this implementation.

The primary factor driving the limitation of object size in Swift is maintaining balance among the partitions of the ring. To maintain an even dispersion of disk usage throughout the cluster the obvious storage pattern was to simply split larger objects into smaller segments, which could then be glued together during a read.

Before the introduction of large object support some applications were already splitting their uploads into segments and re-assembling them on the client side after retrieving the individual pieces. This design allowed the client to support backup and archiving of large data sets, but was also frequently employed to improve performance or reduce errors due to network interruption. The major disadvantage of this method is that knowledge of the original partitioning scheme is required to properly reassemble the object, which is not practical for some use cases, such as CDN origination.

In order to eliminate any barrier to entry for clients wanting to store objects larger than 5GB, initially we also prototyped fully transparent support for large object uploads. A fully transparent implementation would support a larger max size by automatically splitting objects into segments during upload within the proxy without any changes to the client API. All segments were completely hidden from the client API.

This solution introduced a number of challenging failure conditions into the cluster, wouldn’t provide the client with any option to do parallel uploads, and had no basis for a resume feature. The transparent implementation was deemed just too complex for the benefit.

The current “user manifest” design was chosen in order to provide a transparent download of large objects to the client and still provide the uploading client a clean API to support segmented uploads.

To meet as many use cases as possible, Swift supports two types of large object manifests. Dynamic and static large object manifests both support the same idea of allowing the user to upload many segments to be later downloaded as a single file.

Dynamic large objects rely on a container listing to provide the manifest. This has the advantage of allowing the user to add or remove segments from the manifest at any time. It has the disadvantage of relying on eventually consistent container listings. All three copies of the container dbs must be updated for a complete list to be guaranteed. Also, all segments must be in a single container, which can limit concurrent upload speed.

Static large objects rely on a user-provided manifest file. A user can upload objects into multiple containers and then reference those objects (segments) in a self-generated manifest file. Future GETs to that file will download the concatenation of the specified segments. This has the advantage of being able to immediately download the complete object once the manifest has been successfully PUT. Being able to upload segments into separate containers also improves concurrent upload speed. It has the disadvantage that the manifest is finalized once PUT. Any change to it means it has to be replaced.

Between these two methods the user has great flexibility in how they choose to upload large objects to Swift and retrieve them. Swift does not, however, stop users from harming themselves. In both cases the segments are deletable by the user at any time. If a segment is deleted by mistake, a dynamic large object, having no way of knowing it was ever there, will happily ignore the deleted file and the user will get an incomplete file. A static large object will, on failing to retrieve an object specified in the manifest, drop the connection and the user will receive partial results.