Ubuntu Manpage: S3 - Amazon S3 Web Service Interface

NAME

       S3 - Amazon S3 Web Service Interface

SYNOPSIS

       package require Tcl  8.5

       package require sha1  1.0

       package require md5  2.0

       package require base64  2.3

       package require xsxp  1.0

       S3::Configure  ?-reset  boolean?  ?-retries  integer? ?-accesskeyid idstring? ?-secretaccesskey idstring?
       ?-service-access-point           FQDN?           ?-use-tls           boolean?           ?-default-compare
       always|never|exists|missing|newer|date|checksum|different?   ?-default-separator   string?  ?-default-acl
       private|public-read|public-read-write|authenticated-read|keep|calc? ?-default-bucket bucketname?

       S3::SuggestBucket ?name?

       S3::REST dict

       S3::ListAllMyBuckets      ?-blocking      boolean?       ?-parse-xml       xmlstring?       ?-result-type
       REST|xml|pxml|dict|names|owner?

       S3::PutBucket   ?-bucket   bucketname?   ?-blocking  boolean?  ?-acl  {}|private|public-read|public-read-
       write|authenticated-read?

       S3::DeleteBucket ?-bucket bucketname? ?-blocking boolean?

       S3::GetBucket ?-bucket  bucketname?  ?-blocking  boolean?  ?-parse-xml  xmlstring?  ?-max-count  integer?
       ?-prefix prefixstring? ?-delimiter delimiterstring? ?-result-type REST|xml|pxml|names|dict?

       S3::Put  ?-bucket  bucketname?  -resource  resourcename  ?-blocking  boolean?  ?-file filename? ?-content
       contentstring? ?-acl  private|public-read|public-read-write|authenticated-read|calc|keep?  ?-content-type
       contenttypestring? ?-x-amz-meta-* metadatatext? ?-compare comparemode?

       S3::Get  ?-bucket  bucketname?  -resource  resourcename ?-blocking boolean? ?-compare comparemode? ?-file
       filename? ?-content contentvarname? ?-timestamp aws|now? ?-headers headervarname?

       S3::Head ?-bucket bucketname? -resource resourcename ?-blocking boolean?  ?-dict  dictvarname?  ?-headers
       headersvarname? ?-status statusvarname?

       S3::GetAcl ?-blocking boolean? ?-bucket bucketname? -resource resourcename ?-result-type REST|xml|pxml?

       S3::PutAcl ?-blocking boolean? ?-bucket bucketname? -resource resourcename ?-acl new-acl?

       S3::Delete ?-bucket bucketname? -resource resourcename ?-blocking boolean? ?-status statusvar?

       S3::Push ?-bucket bucketname? -directory directoryname ?-prefix prefixstring? ?-compare comparemode? ?-x-
       amz-meta-*   metastring?  ?-acl  aclcode?  ?-delete  boolean?  ?-error  throw|break|continue?  ?-progress
       scriptprefix?

       S3::Pull  ?-bucket  bucketname?  -directory  directoryname  ?-prefix  prefixstring?  ?-blocking  boolean?
       ?-compare  comparemode?  ?-delete  boolean? ?-timestamp aws|now? ?-error throw|break|continue? ?-progress
       scriptprefix?

       S3::Toss ?-bucket bucketname? -prefix  prefixstring  ?-blocking  boolean?  ?-error  throw|break|continue?
       ?-progress scriptprefix?

_________________________________________________________________

DESCRIPTION

       This package provides access to Amazon's Simple Storage Service (S3) web service.

       As a quick summary, Amazon Simple Storage Service provides a for-fee web service allowing the storage of
       arbitrary data as "resources" within "buckets" online.  See http://www.amazonaws.com/ for details on that
       system.   Access  to  the  service  is via HTTP (SOAP or REST).  Much of this documentation will not make
       sense if you're not familiar with the terms and functionality of the Amazon S3 service.

       This package provides services for reading and writing the data items via the REST  interface.   It  also
       provides  some  higher-level  operations.   Other packages in the same distribution provide for even more
       functionality.

       Copyright 2006 Darren New. All Rights Reserved.  NO WARRANTIES OF ANY TYPE ARE PROVIDED.  COPYING OR  USE
       INDEMNIFIES  THE  AUTHOR IN ALL WAYS.  This software is licensed under essentially the same terms as Tcl.
       See LICENSE.txt for the terms.

ERROR REPORTING

       The error reporting from this package makes use of $errorCode to provide more details  on  what  happened
       than  simply  throwing  an error.  Any error caught by the S3 package (and we try to catch them all) will
       return with an $errorCode being a list having at least three elements. In all cases,  the  first  element
       will  be "S3". The second element will take on one of six values, with that element defining the value of
       the third and subsequent elements. S3::REST does not throw an error, but rather returns a dictionary with
       the keys "error", "errorInfo", and "errorCode" set. This allows for reliable background use. The possible
       second elements are these:

       usage  The usage of the package is incorrect. For example, a command has been invoked which requires  the
              library  to  be  configured  before  the library has been configured, or an invalid combination of
              options has been specified. The third element of $errorCode supplies the  name  of  the  parameter
              that  was  wrong.  The  fourth  usually  provides the arguments that were actually supplied to the
              throwing proc, unless the usage error isn't confined to a single proc.

       local  Something happened on the local system which threw an error. For example, a request to  upload  or
              download a file was made and the file permissions denied that sort of access. The third element of
              $errorCode is the original $errorCode.

       socket Something happened with the socket. It closed prematurely, or some other failure to communicate
              with Amazon was detected. The third element of $errorCode is the original $errorCode, or in some
              cases another message, such as the one reported by fcopy.

       remote The  Amazon  web service returned an error code outside the 2xx range in the HTTP header. In other
              words, everything went as documented, except this particular case was documented not to work.  The
              third element is the dictionary returned from ::S3::REST.  Note that S3::REST itself never  throws
              this  error,  but  just  returns  the  dictionary.  Most  of  the  higher-level commands throw for
              convenience, unless an argument indicates they should not. If  something  is  documented  as  "not
              throwing  an  S3  remote  error", it means a status return is set rather than throwing an error if
              Amazon returns a non-2XX HTTP result code.

       notyet The user obeyed the documentation, but the author has not yet gotten around to  implementing  this
              feature.  (Right  now,  only TLS support and sophisticated permissions fall into this category, as
              well as the S3::Acl command.)

       xml    The service has returned invalid XML, or XML  whose  schema  is  unexpected.  For  the  high-level
              commands that accept service XML as input for parsing, this may also be thrown.
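
       The errorCode layout described above can be inspected with ordinary list commands. The following is a
       minimal sketch, assuming only the six second-element values listed in this section; describeS3Error is
       a hypothetical helper name and is not part of the S3 package.

```tcl
# Sketch: turn an S3-style errorCode list into a short description.
# describeS3Error is a hypothetical helper, not part of the S3 package.
proc describeS3Error {code} {
    if {[llength $code] < 3 || [lindex $code 0] ne "S3"} {
        return "not an S3 error"
    }
    set kind   [lindex $code 1]
    set detail [lindex $code 2]
    switch -- $kind {
        usage   { return "usage error in parameter: $detail" }
        local   { return "local error (original errorCode: $detail)" }
        socket  { return "socket error (original errorCode: $detail)" }
        remote  {
            # For "remote" the third element is the S3::REST result dictionary.
            if {[dict exists $detail httpstatus]} {
                return "remote error, HTTP [dict get $detail httpstatus]"
            }
            return "remote error"
        }
        notyet  { return "feature not yet implemented" }
        xml     { return "invalid or unexpected XML" }
        default { return "unknown S3 error kind: $kind" }
    }
}

# Typical use around any high-level command:
#   if {[catch {S3::Get -bucket b -resource r -file out.dat} msg]} {
#       puts stderr "[describeS3Error $::errorCode]: $msg"
#   }
```

       Dispatching on the second element keeps the caller independent of the exact error message text.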

COMMANDS

       This package provides several separate levels of complexity.

       •      The  lowest  level  simply  takes  arguments  to be sent to the service, sends them, retrieves the
              result, and provides it to the caller.  Note: This layer allows both synchronous and  event-driven
              processing.  It  depends  on  the  MD5  and  SHA1  and  base64  packages from Tcllib (available at
              http://tcllib.sourceforge.net/).  Note that S3::Configure is required for S3::REST to work due  to
              the authentication portion, so we put that in the "lowest level."

       •      The  next  layer  parses  the  results of calls, allowing for functionality such as uploading only
              changed files, synchronizing directories, and so on.  This layer depends on the TclXML package  as
              well  as  the  included  xsxp  package.  These  packages  are  package  required  when these more-
              sophisticated routines are called, so nothing breaks if they are not correctly installed.

       •      Also included is a separate program that uses the library.  It provides code to parse  $argv0  and
              $argv from the command line, allowing invocation as a tclkit, etc.  (Not yet implemented.)

       •      Another  separate  program  provides  a  GUI  interface  allowing  drag-and-drop  and  other  such
              functionality. (Not yet implemented.)

       •      Also built on this package is the OddJob program. It is  a  separate  program  designed  to  allow
              distribution of computational work units over Amazon's Elastic Compute Cloud web service.

       The  goal  is to have at least the bottom-most layers implemented in pure Tcl using only that which comes
       from widely-available sources, such as Tcllib.

LOW LEVEL COMMANDS

These commands do not require any packages not listed above. They talk directly to the service, or they
are utility or configuration routines. Note that the "xsxp" package was written to support this package,
so it should be available wherever you got this package.

S3::Configure ?-reset boolean? ?-retries integer? ?-accesskeyid idstring? ?-secretaccesskey idstring?
?-service-access-point FQDN? ?-use-tls boolean? ?-default-compare
always|never|exists|missing|newer|date|checksum|different? ?-default-separator string? ?-default-acl
private|public-read|public-read-write|authenticated-read|keep|calc? ?-default-bucket bucketname?
There is one command for configuration, and that is S3::Configure. If called with no arguments,
it returns a dictionary of key/value pairs listing all current settings. If called with one
argument, it returns the value of that single argument. If called with two or more arguments, it
must be called with pairs of arguments, and it applies the changes in order. There is only one
set of configuration information per interpreter.

The following options are accepted:

-reset boolean
By default, false. If true, any previous changes and any changes on the same call before
the reset option will be returned to default values.

-retries integer
Default value is 3. If Amazon returns a 500 error, a retry after an exponential backoff
delay will be tried this many times before finally throwing the 500 error. This applies to
each call to S3::REST from the higher-level commands, but not to S3::REST itself. That is,
S3::REST will always return httpstatus 500 if that's what it receives. Functions like
S3::Put will retry the PUT call, and will also retry the GET and HEAD calls used to do
content comparison. Changing this to 0 will prevent retries and their associated delays.
In addition, socket errors (i.e., errors whose errorCode starts with "S3 socket") will be
similarly retried after backoffs.

-accesskeyid idstring

-secretaccesskey idstring
Each defaults to an empty string. These must be set before any calls are made. This is
your S3 ID. Once you sign up for an account, go to http://www.amazonaws.com/, sign in, go
to the "Your Web Services Account" button, pick "AWS Access Identifiers", and your access
key ID and secret access keys will be available. All S3::REST calls are authenticated.
Blame Amazon for the poor choice of names.

-service-access-point FQDN
Defaults to "s3.amazonaws.com". This is the fully-qualified domain name of the server to
contact for S3::REST calls. You should probably never need to touch this, unless someone
else implements a compatible service, or you wish to test something by pointing the library
at your own service.

-slop-seconds integer
When comparing dates between Amazon and the local machine, two dates within this many
seconds of each other are considered the same. Useful for clock drift correction,
processing overhead time, and so on.

-use-tls boolean
Defaults to false. This is not yet implemented. If true, S3::REST will negotiate a TLS
connection to Amazon. If false, unencrypted connections are used.

-bucket-prefix string
Defaults to "TclS3". This string is used by S3::SuggestBucket if that command is
passed an empty string as an argument. It is used to distinguish different applications
using the Amazon service. Your application should always set this to keep from interfering
with the buckets of other users of Amazon S3 or with other buckets of the same user.

-default-separator string
Defaults to "/". This is currently unused. It might make sense to use this for S3::Push and
S3::Pull, but allowing resources to have slashes in their names that aren't marking
directories would be problematic. Hence, this currently does nothing.

-default-acl private|public-read|public-read-write|authenticated-read|keep|calc
Defaults to an empty string. If no -acl argument is provided to S3::Put or S3::Push, this
string is used (given as the x-amz-acl header if not keep or calc). If this is also empty,
no x-amz-acl header is generated. This is not used by S3::REST.

-default-bucket bucketname
If no bucket is given to S3::GetBucket, S3::PutBucket, S3::Get, S3::Put, S3::Head, S3::Acl,
S3::Delete, S3::Push, S3::Pull, or S3::Toss, and if this configuration variable is not an
empty string (and not simply "/"), then this value will be used for the bucket. This is
useful if one program does a large amount of resource manipulation within a single bucket.
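
A typical configuration sequence might look like the sketch below. It assumes the package is loaded with "package require S3" and that credentials are kept in two environment variables; the variable names and the "myapp" prefix are assumptions of this example, not anything the package itself reads.

```tcl
package require S3

# Credentials are read from environment variables here; the variable
# names are an assumption of this example, not something the package uses.
S3::Configure \
    -accesskeyid     $::env(AWS_ACCESS_KEY_ID) \
    -secretaccesskey $::env(AWS_SECRET_ACCESS_KEY) \
    -retries         5

# SuggestBucket uses the access key ID, so call it after the keys are set.
S3::Configure -default-bucket [S3::SuggestBucket myapp]

# One argument: query a single setting.
puts "retries: [S3::Configure -retries]"

# No arguments: a dictionary of all current settings. Only the option
# names are printed, since the values include the secret access key.
puts "options: [dict keys [S3::Configure]]"
```

Setting -default-bucket once lets later high-level calls omit -bucket.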

S3::SuggestBucket ?name?
The S3::SuggestBucket command accepts an optional string as a prefix and returns a valid bucket name
containing the name argument and the Access Key ID. This makes the name unique to the owner and to
the application (assuming the application picks a good name argument). If no name is provided,
the name from S3::Configure -bucket-prefix is used. If that too is empty (which is not the
default), an error is thrown.

S3::REST dict
The S3::REST command takes as an argument a dictionary and returns a dictionary. The return
dictionary has the same keys as the input dictionary, and includes additional keys as the result.
The presence or absence of keys in the input dictionary can control the behavior of the routine.
It never throws an error directly, but includes keys "error", "errorInfo", and "errorCode" if
necessary. Some keys are required, some optional. The routine can run either in blocking or
non-blocking mode, based on the presence of resultvar in the input dictionary. This requires the
-accesskeyid and -secretaccesskey to be configured via S3::Configure before being called.

The possible input keys are these:

verb GET|PUT|DELETE|HEAD
This required item indicates the verb to be used.

resource string
This required item indicates the resource to be accessed. A leading / is added if not
there already. It will be URL-encoded for you if necessary. Do not supply a resource name
that is already URL-encoded.

?rtype torrent|acl?
This indicates a torrent or acl resource is being manipulated. Do not include this in the
resource key, or the "?" separator will get URL-encoded.

?parameters dict?
This optional dictionary provides parameters added to the URL for the transaction. The keys
must be in the correct case (which is confusing in the Amazon documentation) and the values
must be valid. This can be an empty dictionary or omitted entirely if no parameters are
desired. No other error checking on parameters is performed.

?headers dict?
This optional dictionary provides headers to be added to the HTTP request. The keys must be
in lower case for the authentication to work. The values must not contain embedded newlines
or carriage returns. This is primarily useful for adding x-amz-* headers. Since
authentication is calculated by S3::REST, do not add that header here. Since content-type
gets its own key, also do not add that header here.

?inbody contentstring?
This optional item, if provided, gives the content that will be sent. It is sent with a
transfer encoding of binary, and only the low bytes are used, so use [encoding convertto
utf-8] if the string is a utf-8 string. This is written all in one blast, so if you are
using non-blocking mode and the inbody is especially large, you may wind up blocking on the
write socket.

?infile filename?
This optional item, if provided, and if inbody is not provided, names the file from which
the body of the HTTP message will be constructed. The file is opened for reading and sent
progressively by [fcopy], so it should not block in non-blocking mode even if the file is
very large. The file is transferred in binary mode, so the bytes on your disk will match the
bytes in your resource. Due to HTTP restrictions, it must be possible to use [file size] on
this file to determine the size at the start of the transaction.

?S3chan channel?
This optional item, if provided, indicates the already-open socket over which the
transaction should be conducted. If not provided, a connection is made to the service
access point specified via S3::Configure, which is normally s3.amazonaws.com. If this is
provided, the channel is not closed at the end of the transaction.

?outchan channel?
This optional item, if provided, indicates the already-open channel to which the body
returned from S3 should be written. That is, to retrieve a large resource, open a file, set
the translation mode, and pass the channel as the value of the key outchan. Output will be
written to the channel in pieces so memory does not fill up unnecessarily. The channel is
not closed at the end of the transaction.

?resultvar varname?
This optional item, if provided, indicates that S3::REST should run in non-blocking mode.
The varname should be fully qualified with respect to namespaces and cannot be local to a
proc. If provided, the result of the S3::REST call is assigned to this variable once
everything has completed; use trace or vwait to know when this has happened. If this key
is not provided, the result is simply returned from the call to S3::REST and no calls to
the eventloop are invoked from within this call.

?throwsocket throw|return?
This optional item controls how socket errors are reported. If it is set to return, or if
the call is not blocking, a socket error (i.e., an error whose error code starts with "S3
socket") is returned in the result dictionary as error, errorInfo, and errorCode. If a
foreground call is made (i.e., resultvar is not provided) and this option is not provided or
is set to throw, then error is invoked instead.

Once the call to S3::REST completes, a new dict is returned, either in the resultvar or as the result of
execution. This dict is a copy of the original dict with the results added as new keys. The possible new
keys are these:

error errorstring

errorInfo errorstring

errorCode errorstring
If an error is caught, these three keys will be set in the result. Note that S3::REST does
not consider a non-2XX HTTP return code as an error. The errorCode value will be formatted
according to the ERROR REPORTING description. If these are present, other keys described
here might not be.

httpstatus threedigits
The three-digit code from the HTTP transaction. 2XX for good, 5XX for server error, etc.

httpmessage text
The textual result after the status code, such as "OK" or "Forbidden".

outbody contentstring
If outchan was not specified, this key will hold a reference to the (unencoded) contents of
the body returned. If Amazon returned an error (that is, the httpstatus is not a 2XX value),
the error message will be in outbody or written to outchan as appropriate.

outheaders dict
This contains a dictionary of headers returned by Amazon. The keys are always lower case,
and both keys and values are trimmed of extraneous whitespace. It's mainly useful for
finding the x-amz-meta-* headers, if any, although things like last-modified and
content-type are also useful.
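
The input and result dictionaries described above can be built and inspected with the ordinary dict command. The sketch below is illustrative only: makeGetRequest and restOutcome are hypothetical helper names, and combining the bucket and key into a single resource path is an assumption of this example.

```tcl
# Hypothetical helpers around S3::REST; only the dictionary keys
# (verb, resource, httpstatus, error, outbody) come from the text above.

# Build a request dictionary for a GET. Combining bucket and key into
# one resource path is an assumption of this sketch.
proc makeGetRequest {bucket key} {
    return [dict create verb GET resource "/$bucket/$key"]
}

# Reduce a result dictionary to a two-element list whose first element
# is one of: error, ok, failed.
proc restOutcome {result} {
    if {[dict exists $result error]} {
        return [list error [dict get $result error]]
    }
    set status [dict get $result httpstatus]
    if {[string match 2?? $status]} {
        return [list ok $status]
    }
    # A non-2XX status is not an error to S3::REST; the caller decides.
    return [list failed $status]
}

# Blocking use (requires configured credentials and network access):
#   set result [S3::REST [makeGetRequest mybucket notes.txt]]
#   lassign [restOutcome $result] outcome detail
#   if {$outcome eq "ok"} { puts [dict get $result outbody] }
```

Checking for the error key first matters because the other result keys might be absent when an error was caught.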

HIGH LEVEL COMMANDS

The routines in this section all make use of one or more calls to S3::REST to do their work, then parse
and manage the data in a convenient way. All these commands throw errors as described in ERROR REPORTING
unless otherwise noted.

In all these commands, all arguments are presented as name/value pairs, in any order. All the argument
names start with a hyphen.

There are a few options that are common to many of the commands, and those common options are documented
here.

-blocking boolean
If provided and specified as false, then any calls to S3::REST will be non-blocking, and internally
these routines will call [vwait] to get the results. In other words, these routines will return
the same value, but they'll have event loops running while waiting for Amazon.

-parse-xml xmlstring
If provided, the routine skips actually communicating with Amazon, and instead behaves as if the
XML string provided was returned as the body of the call. Since several of these routines allow
the return of data in various formats, this argument can be used to parse existing XML to extract
the bits of information that are needed. It's also helpful for testing.

-bucket bucketname
Almost every high-level command needs to know what bucket the resources are in. This option
specifies that. (Only the command to list available buckets does not require this parameter.)
This does not need to be URL-encoded, even if it contains special or non-ASCII characters. It may
contain leading or trailing spaces; commands normalize the bucket. If this is not
supplied, the value is taken from S3::Configure -default-bucket if that string isn't empty. Note
that spaces and slashes are always trimmed from both ends and the rest must leave a valid bucket.

-resource resourcename
This specifies the resource of interest within the bucket. It may or may not start with a slash -
both cases are handled. This does not need to be URL-encoded, even if it contains special or non-
ASCII characters.

-compare always|never|exists|missing|newer|date|checksum|different
When commands copy resources to files or files to resources, the caller may specify that the copy
should be skipped if the contents are the same. This argument specifies the conditions under which
the files should be copied. If it is not passed, the result of S3::Configure -default-compare is
used, which in turn defaults to "always." The meanings of the various values are these:

always Always copy the data. This is the default.

never Never copy the data. This is essentially a no-op, except in S3::Push and S3::Pull where the
-delete flag might make a difference.

exists Copy the data only if the destination already exists.

missing
Copy the data only if the destination does not already exist.

newer Copy the data if the destination is missing, or if the date on the source is newer than the
date on the destination by at least S3::Configure -slop-seconds seconds. If the source is
Amazon, the date is taken from the Last-Modified header. If the source is local, it is
taken as the mtime of the file. If the source data is specified in a string rather than a
file, it is taken as right now, via [clock seconds].

date Like newer, except copy if the date is newer or older.

checksum
Calculate the MD5 checksum on the local file or string, ask Amazon for the eTag of the
resource, and copy the data if they're different. Copy the data also if the destination is
missing. Note that this can be slow with large local files unless the C version of the MD5
support is available.

different
Copy the data if the destination does not exist. If the destination exists and an actual
file name was specified (rather than a content string), and the date on the file differs
from the date on the resource, copy the data. If the data is provided as a content string,
the "date" is treated as "right now", so it will likely always differ unless slop-seconds
is large. If the dates are the same, the MD5 checksums are compared, and the data is
copied if the checksums differ.

Note that "newer" and "date" don't care about the contents, and "checksum" doesn't care about the dates,
but "different" checks both.
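
The existence-based and date-based modes can be read as a small decision table. The sketch below is one interpretation of the descriptions above, not the library's code: shouldCopy is a hypothetical helper, times are seconds since the epoch, and two dates no more than slop seconds apart are treated as the same. The checksum and different modes are left out because they need MD5 values.

```tcl
# Illustrative decision table for the existence-based and date-based
# -compare modes. shouldCopy is a hypothetical helper, not library code.
proc shouldCopy {mode destExists srcTime dstTime {slop 0}} {
    switch -- $mode {
        always  { return 1 }
        never   { return 0 }
        exists  { return [expr {$destExists ? 1 : 0}] }
        missing { return [expr {$destExists ? 0 : 1}] }
        newer   {
            if {!$destExists} { return 1 }
            # Copy only when the source is newer by more than the slop.
            return [expr {($srcTime - $dstTime) > $slop}]
        }
        date    {
            if {!$destExists} { return 1 }
            # Copy when the dates differ in either direction.
            return [expr {abs($srcTime - $dstTime) > $slop}]
        }
        default { error "unsupported mode in this sketch: $mode" }
    }
}
```

For example, with a slop of 60 seconds, a source that is 100 seconds newer than the destination is copied under newer, while one that is 10 seconds newer is not.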

S3::ListAllMyBuckets ?-blocking boolean? ?-parse-xml xmlstring? ?-result-type
REST|xml|pxml|dict|names|owner?
This routine performs a GET on the Amazon S3 service, which is defined to return a list of buckets
owned by the account identified by the authorization header. (Blame Amazon for the dumb names.)

-blocking boolean
See above for standard definition.

-parse-xml xmlstring
See above for standard definition.

-result-type REST
The dictionary returned by S3::REST is the return value of S3::ListAllMyBuckets. In this
case, a non-2XX httpstatus will not throw an error. You may not combine this with -parse-
xml.

-result-type xml
The raw XML of the body is returned as the result (with no encoding applied).

-result-type pxml
The XML of the body as parsed by xsxp::parse is returned.

-result-type dict
A dictionary of interesting portions of the XML is returned. The dictionary contains the
following keys:

Owner/ID
The Amazon AWS ID (in hex) of the owner of the bucket.

Owner/DisplayName
The Amazon AWS ID's Display Name.

Bucket/Name
A list of names, one for each bucket.

Bucket/CreationDate
A list of dates, one for each bucket, in the same order as Bucket/Name, in ISO
format (as returned by Amazon).

-result-type names
A list of bucket names is returned with all other information stripped out. This is the
default result type for this command.

-result-type owner
A list containing two elements is returned. The first element is the owner's ID, and the
second is the owner's display name.

S3::PutBucket ?-bucket bucketname? ?-blocking boolean? ?-acl {}|private|public-read|public-read-
write|authenticated-read?
This command creates a bucket if it does not already exist. Bucket names are globally unique, so
you may get a "Forbidden" error from Amazon even if you cannot see the bucket in
S3::ListAllMyBuckets. See S3::SuggestBucket for ways to minimize this risk. The x-amz-acl header
comes from the -acl option, or from S3::Configure -default-acl if not specified.

S3::DeleteBucket ?-bucket bucketname? ?-blocking boolean?
This command deletes a bucket if it is empty and you have such permission. Note that Amazon's
list of buckets is a global resource, requiring far-flung synchronization. If you delete a bucket,
it may be quite a few minutes (or hours) before you can recreate it, yielding "Conflict" errors
until then.

S3::GetBucket ?-bucket bucketname? ?-blocking boolean? ?-parse-xml xmlstring? ?-max-count integer?
?-prefix prefixstring? ?-delimiter delimiterstring? ?-result-type REST|xml|pxml|names|dict?
This lists the contents of a bucket. That is, it returns a directory listing of resources within a
bucket, rather than transferring any user data.

-bucket bucketname
The standard bucket argument.

-blocking boolean
The standard blocking argument.

-parse-xml xmlstring
The standard parse-xml argument.

-max-count integer
If supplied, this is the maximum number of records to be returned. If not supplied, the code
will iterate until all records have been found. Not compatible with -parse-xml. Note that
if this is supplied, only one call to S3::REST will be made. Otherwise, enough calls will
be made to exhaust the listing, buffering results in memory, so take care if you may have
huge buckets.

-prefix prefixstring
If present, restricts listing to resources with a particular prefix. One leading / is
stripped if present.

-delimiter delimiterstring
If present, specifies a delimiter for the listing. The presence of this will summarize
multiple resources into one entry, as if S3 supported directories. See the Amazon
documentation for details.

-result-type REST|xml|pxml|names|dict
This indicates the format of the return result of the command.

REST If -max-count is specified, the dictionary returned from S3::REST is returned. If
-max-count is not specified, a list of all the dictionaries returned from the one or
more calls to S3::REST is returned.

xml If -max-count is specified, the body returned from S3::REST is returned. If -max-
count is not specified, a list of all the bodies returned from the one or more calls
to S3::REST is returned.

pxml If -max-count is specified, the body returned from S3::REST is passed through
xsxp::parse and then returned. If -max-count is not specified, each body returned
from the one or more calls to S3::REST is passed through xsxp::parse, and a list of
the parsed results is returned.

names Returns a list of all names found in either the Contents/Key fields or the
CommonPrefixes/Prefix fields. If no -delimiter is specified and no -max-count is
specified, this returns a list of all resources with the specified -prefix.

dict Returns a dictionary. (Returns only one dictionary even if -max-count wasn't
specified.) The keys of the dictionary are as follows:

Name The name of the bucket (from the final call to S3::REST).

Prefix From the final call to S3::REST.

Marker From the final call to S3::REST.

MaxKeys
From the final call to S3::REST.

IsTruncated
From the final call to S3::REST, so always false if -max-count is not
specified.

NextMarker
Always provided if IsTruncated is true, and calculated if Amazon does not
provide it. May be empty if IsTruncated is false.

Key A list of names of resources in the bucket matching the -prefix and
-delimiter restrictions.

LastModified
A list of times of resources in the bucket, in the same order as Key, in the
format returned by Amazon. (I.e., it is not parsed into a seconds-from-
epoch.)

ETag A list of entity tags (a.k.a. MD5 checksums) in the same order as Key.

Size A list of sizes in bytes of the resources, in the same order as Key.

Owner/ID
A list of owners of the resources in the bucket, in the same order as Key.

Owner/DisplayName
A list of owners of the resources in the bucket, in the same order as Key.
These are the display names.

CommonPrefixes/Prefix
A list of prefixes common to multiple entities. This is present only if
-delimiter was supplied.
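
Because the dict result type returns parallel lists (Key, Size, LastModified, and so on, all in the same order as Key), they can be walked together with a multi-variable foreach. This is a minimal sketch; listSizes is a hypothetical helper name and the bucket and prefix in the usage comment are placeholders.

```tcl
# Pair up the parallel Key and Size lists from a "-result-type dict"
# listing. listSizes is a hypothetical helper for illustration.
proc listSizes {listing} {
    set pairs {}
    foreach key [dict get $listing Key] size [dict get $listing Size] {
        lappend pairs [list $key $size]
    }
    return $pairs
}

# With configured credentials and network access:
#   set listing [S3::GetBucket -bucket mybucket -prefix logs/ -result-type dict]
#   foreach pair [listSizes $listing] {
#       lassign $pair key size
#       puts [format "%10d  %s" $size $key]
#   }
```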

S3::Put ?-bucket bucketname? -resource resourcename ?-blocking boolean? ?-file filename? ?-content
contentstring? ?-acl private|public-read|public-read-write|authenticated-read|calc|keep? ?-content-type
contenttypestring? ?-x-amz-meta-* metadatatext? ?-compare comparemode?
This command sends data to a resource on Amazon's servers for storage, using the HTTP PUT command.
It returns 0 if the -compare mode prevented the transfer, 1 if the transfer worked, or throws an
error if the transfer was attempted but failed. Server 5XX errors and S3 socket errors are
retried according to S3::Configure -retries settings before throwing an error; other errors throw
immediately.

-bucket
This specifies the bucket into which the resource will be written. Leading and/or trailing
slashes are removed for you, as are spaces.

-resource
This is the full name of the resource within the bucket. A single leading slash is removed,
but not a trailing slash. Spaces are not trimmed.

-blocking
The standard blocking flag.

-file If this is specified, the filename must exist, must be readable, and must not be a special
or directory file. [file size] must apply to it and must not change for the lifetime of the
call. The default content-type is calculated based on the name and/or contents of the
file. Specifying this is an error if -content is also specified, but at least one of -file
or -content must be specified. (The file is allowed to not exist or not be readable if
-compare never is specified.)

-content
If this is specified, the contentstring is sent as the body of the resource. The content-type
defaults to "application/octet-stream". Only the low byte of each character is sent, so
non-ASCII text should be converted with the appropriate encoding (such as [encoding convertto
utf-8]) before it is passed to this routine. Specifying this is an error if -file is also
specified, but at least one of -file or -content must be specified.

-acl This defaults to S3::Configure -default-acl if not specified. It sets the x-amz-acl header
on the PUT operation. If the value provided is calc, the x-amz-acl header is calculated
based on the I/O permissions of the file to be uploaded; it is an error to specify calc and
-content. If the value provided is keep, the acl of the resource is read before the PUT
(or the default is used if the resource does not exist), then set back to what it was after
the PUT (if it existed). An error will occur if the resource is successfully written but
the kept ACL cannot then be applied. This should never happen. Note: calc is not
currently fully implemented.

-x-amz-meta-*
If any header starts with "-x-amz-meta-", its contents are added to the PUT command to be
stored as metadata with the resource. Again, no encoding is performed, and the metadata
should not contain characters like newlines, carriage returns, and so on. It is best to
stick with simple ASCII strings, or to fix the library in several places.

-content-type
This overrides the content-type calculated by -file or sets the content-type for -content.

-compare
This is the standard compare mode argument. S3::Put returns 1 if the data was copied, or 0
if the comparison mode indicated that the transfer should be skipped.
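
The following is a minimal sketch of the two ways to supply a body to S3::Put. It assumes
S3::Configure has already been given -accesskeyid and -secretaccesskey. The helper names
upload_note and upload_file, the metadata name, and whatever bucket and resource names are
passed in are hypothetical and only serve as illustration.

```tcl
package require S3

# Hypothetical helper: store a Tcl string as a UTF-8 text resource.
# Assumes S3::Configure has already been called with valid credentials.
proc upload_note {bucket resource text} {
    # Only the low byte of each character is sent, so encode first.
    set body [encoding convertto utf-8 $text]
    return [S3::Put -bucket $bucket -resource $resource \
        -content $body -content-type text/plain \
        -x-amz-meta-origin manpage-example]
}

# Hypothetical helper: upload from a file instead. The content-type is then
# calculated from the file, and -compare decides whether a transfer happens.
proc upload_file {bucket resource path} {
    if {[S3::Put -bucket $bucket -resource $resource -file $path \
            -compare newer]} {
        return copied
    }
    return skipped
}
```

Both helpers let errors from S3::Put propagate to the caller, since a failed transfer is
reported by throwing rather than by the return value.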

S3::Get ?-bucket bucketname? -resource resourcename ?-blocking boolean? ?-compare comparemode? ?-file
filename? ?-content contentvarname? ?-timestamp aws|now? ?-headers headervarname?
This command retrieves data from a resource on Amazon's S3 servers, using the HTTP GET command. It
returns 0 if the -compare mode prevented the transfer, 1 if the transfer worked, or throws an
error if the transfer was attempted but failed. Server 5XX errors and S3 socket errors are
retried according to S3::Configure settings before throwing an error; other errors throw
immediately. Note that this is always authenticated as the user configured via S3::Configure
-accesskeyid. Use the http package for unauthenticated GETs.

-bucket
This specifies the bucket from which the resource will be read. Leading and/or trailing
slashes are removed for you, as are spaces.

-resource
This is the full name of the resource within the bucket. A single leading slash is removed,
but not a trailing slash. Spaces are not trimmed.

-blocking
The standard blocking flag.

-file If this is specified, the body of the resource will be read into this file, incrementally
without pulling it entirely into memory first. The parent directory must already exist. If
the file already exists, it must be writable. If an error is thrown part-way through the
process and the file already existed, it may be clobbered. If an error is thrown part-way
through the process and the file did not already exist, any partial bits will be deleted.
Specifying this is an error if -content is also specified, but at least one of -file or
-content must be specified.

-timestamp
This is only valid in conjunction with -file. It may be specified as now or aws. The
default is now. If now, the file's modification date is left up to the system. If aws, the
file's mtime is set to match the Last-Modified header on the resource, synchronizing the
two appropriately for -compare date or -compare newer.

-content
If this is specified, the contentvarname is a variable in the caller's scope (not
necessarily global) that receives the value of the body of the resource. No encoding is
done, so if the resource (for example) represents a UTF-8 byte sequence, use [encoding
convertfrom utf-8] to get a valid Tcl string. If this is specified, -compare is
ignored unless it is never, in which case no assignment to contentvarname is performed.
Specifying this is an error if -file is also specified, but at least one of -file or
-content must be specified.

-compare
This is the standard compare mode argument. S3::Get returns 1 if the data was copied, or 0
if the comparison mode indicated that the transfer should be skipped.

-headers
If this is specified, the headers resulting from the fetch are stored in the provided
variable, as a dictionary. This will include content-type and x-amz-meta-* headers, as well
as the usual HTTP headers, the x-amz-id debugging headers, and so on. If no file is fetched
(due to -compare or other errors), no assignment to this variable is performed.
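
As a sketch of the -content and -file forms of S3::Get, the helpers below fetch a resource
into memory or onto disk. The helper names fetch_text and fetch_file are hypothetical, and
credentials are assumed to have been configured beforehand.

```tcl
package require S3

# Hypothetical helper: fetch a resource into memory and decode it as UTF-8.
proc fetch_text {bucket resource} {
    # "body" is a local variable here; S3::Get assigns it in this scope.
    S3::Get -bucket $bucket -resource $resource -content body
    # No decoding is done by S3::Get, so convert the bytes explicitly.
    return [encoding convertfrom utf-8 $body]
}

# Hypothetical helper: write a resource to a file and set the file's mtime
# from the Last-Modified header, so later -compare newer calls line up.
proc fetch_file {bucket resource path} {
    return [S3::Get -bucket $bucket -resource $resource \
        -file $path -timestamp aws -compare newer]
}
```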

S3::Head ?-bucket bucketname? -resource resourcename ?-blocking boolean? ?-dict dictvarname? ?-headers
headersvarname? ?-status statusvarname?
This command requests HEAD from the resource. It returns whether a 2XX code was returned as a
result of the request, never throwing an S3 remote error. That is, if this returns 1, the
resource exists and is accessible. If this returns 0, something went wrong, and the -status result
can be consulted for details.

-bucket
This specifies the bucket from which the resource will be read. Leading and/or trailing
slashes are removed for you, as are spaces.

-resource
This is the full name of the resource within the bucket. A single leading slash is removed,
but not a trailing slash. Spaces are not trimmed.

-blocking
The standard blocking flag.

-dict If specified, the resulting dictionary from the S3::REST call is assigned to the indicated
(not necessarily global) variable in the caller's scope.

-headers
If specified, the dictionary of headers from the result is assigned to the indicated (not
necessarily global) variable in the caller's scope.

-status
If specified, the indicated (not necessarily global) variable in the caller's scope is
assigned a 2-element list. The first element is the 3-digit HTTP status code, while the
second element is the HTTP message (such as "OK" or "Forbidden").
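
A short sketch of using S3::Head as an existence check follows. The helper name
describe_resource is hypothetical, and configured credentials are assumed.

```tcl
package require S3

# Hypothetical helper: report whether a resource is reachable.
proc describe_resource {bucket resource} {
    if {[S3::Head -bucket $bucket -resource $resource -status st]} {
        return "$resource exists"
    }
    # st holds a two-element list: HTTP status code and HTTP message.
    lassign $st code message
    return "$resource unavailable: $code $message"
}
```

Because S3::Head never throws an S3 remote error, the status variable is the place to look
for the reason behind a 0 result.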

S3::GetAcl ?-blocking boolean? ?-bucket bucketname? -resource resourcename ?-result-type REST|xml|pxml?
This command gets the ACL of the indicated resource or throws an error if it is unavailable.

-blocking boolean
See above for standard definition.

-bucket
This specifies the bucket from which the resource will be read. Leading and/or trailing
slashes are removed for you, as are spaces.

-resource
This is the full name of the resource within the bucket. A single leading slash is removed,
but not a trailing slash. Spaces are not trimmed.

-parse-xml xml
The XML from a previous S3::GetAcl can be passed in to be parsed into dictionary form. In this
case, -result-type must be pxml or dict.

-result-type REST
The dictionary returned by S3::REST is the return value of S3::GetAcl. In this case, a
non-2XX httpstatus will not throw an error.

-result-type xml
The raw XML of the body is returned as the result (with no encoding applied).

-result-type pxml
The XML of the body as parsed by xsxp::parse is returned.

-result-type dict
This fetches the ACL, parses it, and returns a dictionary of two elements.

The first element has the key "owner" whose value is the canonical ID of the owner of the
resource.

The second element has the key "acl" whose value is a dictionary. Each key in the
dictionary is one of Amazon's permissions, namely "READ", "WRITE", "READ_ACP", "WRITE_ACP",
or "FULL_CONTROL". Each value of each key is a list of canonical IDs or group URLs that
have that permission. Elements are not in the list in any particular order, and not all
keys are necessarily present. Display names are not returned, as they are not especially
useful; use pxml to obtain them if necessary.
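
To illustrate the shape of the dict result, the sketch below works on a hand-written sample
value rather than a live call. The owner and grantee IDs are made-up placeholders, and the
helper name grantees is hypothetical.

```tcl
# Sample value shaped like the result of S3::GetAcl -result-type dict.
# The IDs are placeholders, not real canonical IDs.
set acl_result {
    owner 1111aaaa
    acl {
        FULL_CONTROL {1111aaaa}
        READ {http://acs.amazonaws.com/groups/global/AllUsers 2222bbbb}
    }
}

# Return the grantees holding a permission, or an empty list when the
# permission key is absent (not all keys are necessarily present).
proc grantees {aclResult permission} {
    set acl [dict get $aclResult acl]
    if {[dict exists $acl $permission]} {
        return [dict get $acl $permission]
    }
    return {}
}

puts "owner: [dict get $acl_result owner]"
puts "READ:  [grantees $acl_result READ]"
puts "WRITE: [grantees $acl_result WRITE]"
```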

S3::PutAcl ?-blocking boolean? ?-bucket bucketname? -resource resourcename ?-acl new-acl?
This sets the ACL on the indicated resource. It returns the XML written to the ACL, or throws an
error if anything went wrong.

-blocking boolean
See above for standard definition.

-bucket
This specifies the bucket from which the resource will be read. Leading and/or trailing
slashes are removed for you, as are spaces.

-resource
This is the full name of the resource within the bucket. A single leading slash is removed,
but not a trailing slash. Spaces are not trimmed.

-owner If this is provided, it is assumed to match the owner of the resource. Otherwise, a GET
may need to be issued against the resource to find the owner. If you already have the owner
(such as from a call to S3::GetAcl), you can pass the value of the "owner" key as the value
of this option, and it will be used in the construction of the XML.

-acl If this option is specified, it provides the ACL the caller wishes to write to the
resource. If this is not supplied or is empty, the value is taken from S3::Configure
-default-acl. The ACL is written with a PUT to the ?acl resource.

If the value passed to this option starts with "<", it is taken to be a body to be PUT to
the ACL resource.

If the value matches one of the standard Amazon x-amz-acl headers (i.e., a canned access
policy), that header is translated to XML and then applied. The canned access policies are
private, public-read, public-read-write, and authenticated-read (in lower case).

Otherwise, the value is assumed to be a dictionary formatted as the "acl" sub-entry within
the dict returned by S3::GetAcl -result-type dict. The proper XML is generated and applied
to the resource. Note that a value containing "//" is assumed to be a group, a value
containing "@" is assumed to be an AmazonCustomerByEmail, and otherwise the value is
assumed to be a canonical Amazon ID.

Note that you cannot change the owner, so calling GetAcl on a resource owned by one user
and applying it via PutAcl on a resource owned by another user may not do exactly what you
expect.
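
The sketch below builds an "acl" dictionary in the form described above and classifies each
grantee by the stated rule. It mirrors the documented rule only and is not the library's
internal code; the helper name grantee_kind and all grantee values are hypothetical.

```tcl
# Classify a grantee string: "//" means group, "@" means e-mail grantee,
# anything else is treated as a canonical ID.
proc grantee_kind {value} {
    if {[string first "//" $value] >= 0} {
        return group
    } elseif {[string first "@" $value] >= 0} {
        return email
    }
    return canonical-id
}

# An "acl" dictionary: permission name -> list of grantees (placeholders).
set acl [dict create \
    FULL_CONTROL [list 1111aaaa] \
    READ [list http://acs.amazonaws.com/groups/global/AllUsers someone@example.com]]

dict for {permission who} $acl {
    foreach grantee $who {
        puts "$permission: $grantee is a [grantee_kind $grantee]"
    }
}
```

A dictionary built this way could then be passed as the value of S3::PutAcl -acl.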

S3::Delete ?-bucket bucketname? -resource resourcename ?-blocking boolean? ?-status statusvar?
This command deletes the specified resource from the specified bucket. It returns 1 if the
resource was deleted successfully, 0 otherwise. It returns 0 rather than throwing an S3 remote
error.

-bucket
This specifies the bucket from which the resource will be deleted. Leading and/or trailing
slashes are removed for you, as are spaces.

-resource
This is the full name of the resource within the bucket. A single leading slash is removed,
but not a trailing slash. Spaces are not trimmed.

-blocking
The standard blocking flag.

-status
If specified, the indicated (not necessarily global) variable in the caller's scope is set
to a two-element list. The first element is the 3-digit HTTP status code. The second
element is the HTTP message (such as "OK" or "Forbidden"). Note that Amazon's DELETE result
is 204 on success, that being the code indicating no content in the returned body.

S3::Push ?-bucket bucketname? -directory directoryname ?-prefix prefixstring? ?-compare comparemode?
?-x-amz-meta-* metastring? ?-acl aclcode? ?-delete boolean? ?-error throw|break|continue? ?-progress
scriptprefix?
This synchronises a local directory with a remote bucket by pushing the differences using S3::Put.
Note that if something has changed in the bucket but not locally, those changes could be lost.
Thus, this is not a general two-way synchronization primitive. (See S3::Sync for that.) Note too
that resource names are case sensitive, so changing the case of a file on a Windows machine may
lead to otherwise-unnecessary transfers. Note that only regular files are considered, so devices,
pipes, symlinks, and directories are not copied.

-bucket
This names the bucket into which data will be pushed.

-directory
This names the local directory from which files will be taken. It must exist, be readable
via [glob] and so on. If only some of the files therein are readable, S3::Push will PUT
those files that are readable and return in its results the list of files that could not be
opened.

-prefix
This names the prefix that will be added to all resources. That is, it is the remote
equivalent of -directory. If it is not specified, the root of the bucket will be treated
as the remote directory. An example may clarify.
S3::Push -bucket test -directory /tmp/xyz -prefix hello/world

In this example, /tmp/xyz/pdq.html will be stored as
http://s3.amazonaws.com/test/hello/world/pdq.html in Amazon's servers. Also,
/tmp/xyz/abc/def/Hello will be stored as
http://s3.amazonaws.com/test/hello/world/abc/def/Hello in Amazon's servers. Without the
-prefix option, /tmp/xyz/pdq.html would be stored as http://s3.amazonaws.com/test/pdq.html.
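
The mapping in this example can be sketched in plain Tcl. The helper resource_name is
hypothetical and only reproduces the naming rule illustrated above; it is not the library's
own code and does no file system access.

```tcl
# Compute the resource name for a local file, given the -directory and
# -prefix values. The suffix is the path relative to the directory.
proc resource_name {directory prefix path} {
    set dir [string trimright $directory /]
    set suffix [string range $path [expr {[string length $dir] + 1}] end]
    set prefix [string trim $prefix /]
    if {$prefix eq ""} {
        return $suffix
    }
    return $prefix/$suffix
}

puts [resource_name /tmp/xyz hello/world /tmp/xyz/pdq.html]      ;# hello/world/pdq.html
puts [resource_name /tmp/xyz hello/world /tmp/xyz/abc/def/Hello] ;# hello/world/abc/def/Hello
puts [resource_name /tmp/xyz "" /tmp/xyz/pdq.html]               ;# pdq.html
```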

-blocking
This is the standard blocking option.

-compare
If present, this is passed to each invocation of S3::Put. Naturally, S3::Configure
-default-compare is used if this is not specified.

-x-amz-meta-*
If present, this is passed to each invocation of S3::Put. All copied files will have the
same metadata.

-acl If present, this is passed to each invocation of S3::Put.

-delete
This defaults to false. If true, resources in the destination that are not in the source
directory are deleted with S3::Delete. Since only regular files are considered, the
existence of a symlink, pipe, device, or directory in the local source will not prevent the
deletion of a remote resource with a corresponding name.

-error This controls the behavior of S3::Push in the event that S3::Put throws an error. Note that
errors encountered on the local file system or in reading the list of resources in the
remote bucket always throw errors. This option allows control over "partial" errors, when
some files were copied and some were not. S3::Delete is always finished up, with errors
simply recorded in the return result.

throw The error is rethrown with the same errorCode.

break Processing stops without throwing an error, the error is recorded in the return
value, and the command returns with a normal return. The calls to S3::Delete are
not started.

continue
This is the default. Processing continues without throwing, recording the error in
the return result, and resuming with the next file in the local directory to be
copied.

-progress
If this is specified and the indicated script prefix is not empty, the script prefix will be
invoked several times in the caller's context at various points in the processing. This
allows progress reporting without backgrounding. Each invocation appends additional
arguments, the first of which indicates what part of the process is being reported on. The
invocations occur in the following order.

args The second additional argument is a dictionary representing the normalized arguments
to the S3::Push call.

local The second additional argument is a list of suffixes of the files to be considered.

remote The second additional argument is a list of suffixes existing in the remote bucket.

start Invoked for each file in the local list, with the common suffix as the second
additional argument.

copy Invoked when S3::Put returns for that file. The second additional argument is the
common suffix. The third is "copied" (if S3::Put sent the resource), "skipped" (if
S3::Put decided not to based on -compare), or the errorCode that S3::Put threw due to
unexpected errors (in which case the third argument is a list that starts with "S3").

delete Invoked zero or more times after all files have been transferred. The second
additional argument is the suffix of the resource being deleted. The third is either an
empty string (if the delete worked) or the errorCode from S3::Delete if it failed.

finished
Invoked last, with the return value as the second additional argument.
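
A progress handler along these lines might look like the sketch below. The procedure names
format_progress and report_progress and the label argument are hypothetical. The calls at the
end only simulate the notifications described above, so the block runs without contacting S3.

```tcl
# Hypothetical formatter: turn one progress notification into a message.
# "phase" is the first additional argument; "args" holds the remaining ones.
# Phases not listed (args, local, remote) produce an empty message.
proc format_progress {label phase args} {
    set suffix [lindex $args 0]
    switch -- $phase {
        start {
            return "${label}: sending $suffix"
        }
        copy {
            return "${label}: $suffix [lindex $args 1]"
        }
        delete {
            if {[lindex $args 1] eq ""} {
                return "${label}: $suffix deleted"
            }
            return "${label}: $suffix delete failed"
        }
        finished {
            return "${label}: finished"
        }
        default {
            return ""
        }
    }
}

# Hypothetical handler to pass as the script prefix.
proc report_progress {label phase args} {
    set message [format_progress $label $phase {*}$args]
    if {$message ne ""} {
        puts $message
    }
}

# Simulated notifications, in the documented order.
report_progress push start pdq.html
report_progress push copy pdq.html copied
report_progress push delete old.html ""
```

With the package loaded, such a handler could be attached by passing
-progress [list report_progress push] to S3::Push.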

The return result from this command is a dictionary. The keys are the suffixes (i.e., the common
portion of the path after the -directory and -prefix), while the values are either "copied",
"skipped" (if -compare indicated not to copy the file), or the errorCode thrown by S3::Put, as
appropriate. If -delete was true, there may also be entries for suffixes with the value "deleted"
or "notdeleted", indicating whether the attempted S3::Delete worked or not, respectively. There is
one additional pair in the return result, whose key is the empty string and whose value is a
nested dictionary. The keys of this nested dictionary include "filescopied" (the number of files
successfully copied), "bytescopied" (the number of data bytes in the files copied, excluding
headers, metadata, etc), "compareskipped" (the number of files not copied due to -compare mode),
"errorskipped" (the number of files not copied due to thrown errors), "filesdeleted" (the number
of resources deleted due to not having corresponding files locally, or 0 if -delete is false), and
"filesnotdeleted" (the number of resources whose deletion was attempted but failed).

Note that this is currently implemented somewhat inefficiently. It fetches the bucket listing
(including timestamps and eTags), then calls S3::Put, which uses HEAD to find the timestamps and
eTags again. Correcting this with no API change is planned for a future upgrade.
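
The return dictionary described above can be taken apart as in the following sketch. It
uses a hand-written sample value in the documented shape; the suffixes and counts are made
up for illustration.

```tcl
# Sample value shaped like the result of S3::Push.
set result [dict create \
    pdq.html copied \
    abc/def/Hello skipped \
    old.html deleted \
    "" [dict create filescopied 1 bytescopied 2048 compareskipped 1 \
            errorskipped 0 filesdeleted 1 filesnotdeleted 0]]

# The empty-string key holds the totals; every other key is a suffix.
set totals [dict get $result ""]
set per_file [dict remove $result ""]

puts "files copied: [dict get $totals filescopied]"
puts "bytes copied: [dict get $totals bytescopied]"
dict for {suffix outcome} $per_file {
    puts "${suffix}: $outcome"
}
```

The same approach applies to the results of S3::Pull and S3::Toss, which also keep their
totals under the empty-string key.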

S3::Pull ?-bucket bucketname? -directory directoryname ?-prefix prefixstring? ?-blocking boolean?
?-compare comparemode? ?-delete boolean? ?-timestamp aws|now? ?-error throw|break|continue? ?-progress
scriptprefix?
This synchronises a remote bucket with a local directory by pulling the differences using S3::Get.
If something has been changed locally but not in the bucket, those differences may be lost. This is
not a general two-way synchronization mechanism. (See S3::Sync for that.) This creates
directories if needed; new directories are created with default permissions. Note that resource
names are case sensitive, so changing the case of a file on a Windows machine may lead to
otherwise-unnecessary transfers. Also, try not to store data in resources that end with a slash,
or which are prefixes of resources that otherwise would start with a slash; i.e., don't use this
if you store data in resources whose names have to be directories locally.

Note that this is currently implemented somewhat inefficiently. It fetches the bucket listing
(including timestamps and eTags), then calls S3::Get, which uses HEAD to find the timestamps and
eTags again. Correcting this with no API change is planned for a future upgrade.

-bucket
This names the bucket from which data will be pulled.

-directory
This names the local directory into which files will be written. It must exist, be readable
via [glob], writable for file creation, and so on. If only some of the files therein are
writable, S3::Pull will GET those files that are writable and return in its results the
list of files that could not be opened.

-prefix
The prefix of resources that will be considered for retrieval. See S3::Push for more
details, examples, etc. (Of course, S3::Pull reads rather than writes, but the prefix is
treated similarly.)

-blocking
This is the standard blocking option.

-compare
This is passed to each invocation of S3::Get if provided. Naturally, S3::Configure
-default-compare is used if this is not provided.

-timestamp
This is passed to each invocation of S3::Get if provided.

-delete
If this is specified and true, files that exist in the -directory that are not in the
-prefix will be deleted after all resources have been copied. In addition, empty
directories (other than the top-level -directory) will be deleted, as Amazon S3 has no
concept of an empty directory.

-error See S3::Push for a description of this option.

-progress
See S3::Push for a description of this option. It differs slightly in that local
directories may be included with a trailing slash to indicate they are directories.

The return value from this command is a dictionary. It is identical in form and meaning to the
description of the return result of S3::Push. It differs only in that directories may be included,
with a trailing slash in their name, if they are empty and get deleted.

S3::Toss ?-bucket bucketname? -prefix prefixstring ?-blocking boolean? ?-error throw|break|continue?
?-progress scriptprefix?
This deletes some or all resources within a bucket. It would be considered a "recursive delete"
had Amazon implemented actual directories.

-bucket
The bucket from which resources will be deleted.

-blocking
The standard blocking option.

-prefix
The prefix for resources to be deleted. Any resource that starts with this string will be
deleted. This is required. To delete everything in the bucket, pass an empty string for
the prefix.

-error If this is "throw", S3::Toss rethrows any errors it encounters. If this is "break",
S3::Toss returns with a normal return after the first error, recording that error in the
return result. If this is "continue", which is the default, S3::Toss continues on and lists
all errors in the return result.

-progress
If this is specified and not an empty string, the script prefix will be invoked several
times in the context of the caller with additional arguments appended. Initially, it will
be invoked with the first additional argument being args and the second being the processed
list of arguments to S3::Toss. Then it is invoked with remote as the first additional
argument and the list of suffixes in the bucket to be deleted as the second additional
argument. Then it is invoked with the first additional argument being delete and the second
additional argument being the suffix deleted and the third additional argument being
"deleted" or "notdeleted" depending on whether S3::Delete threw an error. Finally, the
script prefix is invoked with a first additional argument of "finished" and a second
additional argument of the return value.

The return value is a dictionary. The keys are the suffixes of files that S3::Toss attempted to
delete, and the values are either the string "deleted" or "notdeleted". There is also one
additional pair, whose key is the empty string and whose value is an embedded dictionary. The keys
of this embedded dictionary include "filesdeleted" and "filesnotdeleted", each of which has
integer values.
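
As a sketch, a wrapper around S3::Toss could collect the resources that were not deleted.
The helper name purge_prefix is hypothetical, and configured credentials are assumed.

```tcl
package require S3

# Hypothetical helper: delete everything under a prefix and return the
# suffixes whose deletion failed.
proc purge_prefix {bucket prefix} {
    set result [S3::Toss -bucket $bucket -prefix $prefix -error continue]
    set failed {}
    dict for {suffix outcome} $result {
        # Skip the empty-string key, which holds the nested totals.
        if {$suffix ne "" && $outcome eq "notdeleted"} {
            lappend failed $suffix
        }
    }
    return $failed
}
```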

LIMITATIONS

• The pure-Tcl MD5 checking is slow. If you are processing files in the megabyte range, consider
ensuring binary support is available.

• The commands S3::Pull and S3::Push fetch a directory listing which includes timestamps and MD5
hashes, then invoke S3::Get and S3::Put. If a complex -compare mode is specified, S3::Get and
S3::Put will invoke a HEAD operation for each file to fetch timestamps and MD5 hashes of each
resource again. It is expected that a future release of this package will solve this without any
API changes.

• The commands S3::Pull and S3::Push fetch a directory listing without using -max-count. The entire
directory is pulled into memory at once. For very large buckets, this could be a performance
problem. The author, at this time, does not plan to change this behavior. Welcome to Open Source.

• S3::Sync is neither designed nor implemented yet. The intention would be to keep changes
synchronised, so changes could be made to both the bucket and the local directory and be merged by
S3::Sync.

• Nor is -acl calc fully implemented. This is primarily due to Windows not providing a
convenient method for distinguishing between local files that are "public-read" or
"public-read-write". Assistance figuring out TWAPI for this would be appreciated. The U**X
semantics are difficult to map directly as well. See the source for details. Note that there are
no tests for calc, since it isn't done yet.

• The HTTP processing is implemented within the library, rather than using a "real" HTTP package.
Hence, multi-line headers are not (yet) handled correctly. Do not include carriage returns or
linefeeds in x-amz-meta-* headers, content-type values, and so on. The author does not at this
time expect to improve this.

• Internally, S3::Push and S3::Pull and S3::Toss are all very similar and should be refactored.

• The idea of using -compare never -delete true to delete files that have been deleted from one
place but not the other yet not copying changed files is untested.

USAGE SUGGESTIONS

       To fetch a "directory" out of a bucket, make changes, and store it back:
              file mkdir ./tempfiles
              S3::Pull -bucket sample -prefix of/interest -directory ./tempfiles \
              -timestamp aws
              do_my_process ./tempfiles other arguments
              S3::Push -bucket sample -prefix of/interest -directory ./tempfiles \
              -compare newer -delete true

       To delete files locally that were deleted off of S3 but not otherwise update files:
              S3::Pull -bucket sample -prefix of/interest -directory ./myfiles \
              -compare never -delete true

FUTURE DEVELOPMENTS

       The author intends to work on several additional  projects  related  to  this  package,  in  addition  to
       finishing the unfinished features.

       First,  a  command-line program allowing browsing of buckets and transfer of files from shell scripts and
       command prompts would be useful.

       Second, a GUI-based program allowing visual manipulation of bucket and resource trees not unlike  Windows
       Explorer would be useful.

       Third,  a  command-line (and perhaps a GUI-based) program called "OddJob" that will use S3 to synchronize
       computation amongst multiple servers running OddJob. An S3 bucket will be set up with a number of scripts
       to run, and the OddJob program can be invoked on multiple machines to run scripts on  all  the  machines,
       each  moving  on to the next unstarted task as it finishes each.  This is still being designed, and it is
       intended primarily to be run on Amazon's Elastic Compute Cloud.

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain bugs and other problems. Please
       report such in the category amazon-s3 of the Tcllib SF Trackers
       [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may
       have for either package and/or documentation.

KEYWORDS

       amazon, cloud, s3

COPYRIGHT

       2006,2008 Darren New. All Rights Reserved. See LICENSE.TXT for terms.

amazon-s3                                             1.0.0                                             S3(3tcl)
