Synopsis:
    A trivial key/value protocol using crypto digests as keys.

Description:
    The blobio system implements a client/server protocol for associating
    crypto hash digests as keys pointing to immutable, binary (typically
    large) objects.  The blobs are referenced with a uri (uniform resource
    identifier) called a "uniform digest", shortened to "udig", with a
    syntax like

        algorithm:digest

    where "algorithm" is the hash algorithm, like "sha" or "btc20", and
    "digest" is the ascii hash value of the blob.  The colon character
    separates the algorithm and the digest.  The algorithm matches the
    perl5 regex

        ^[a-z][a-z0-9]{0,7}

    and the digest matches both perl5 regexes

        [[:graph:]]{32,128}
        [[:ascii:]]{32,128}

    Currently, just two digest algorithms are supported:

        sha:c59f2c4f35b56a4bdf2d2c9f4a9984f2049cf2d4
        btc20:b0e889a067b5d4fff15c3d6e646554939a793783

    where "sha" is the famous but broken SHA1 and "btc20" is the
    composition RIPEMD160(SHA256(SHA256(blob))), inspired by the bitcoin
    wallet, replacing base58 encoding with hex.  Eventually a SHA3 digest
    can be added without breaking code.  The "bc160" digest is deprecated
    and experimental.

    An example udig is

        sha:cd50d19784897085a8d0e3e413f8612b097c03f1

    which describes the string "hello, world\n":

        echo 'hello, world' | shasum
        cd50d19784897085a8d0e3e413f8612b097c03f1  -

    The maximum size of the algorithm tag is 8 ascii chars.  The maximum
    size of a digest is 128 ascii graphical chars and the minimum size is
    32 ascii chars, making the min/max udig sizes

        min udig = 1 + 1 (colon) + 32  = 34 ascii chars
        max udig = 8 + 1 (colon) + 128 = 137 ascii chars

    A validation sketch follows the verb summary below.

    The network protocol implements 7 verbs and 1 proposed verb ("cat"):

        get <udig>      # read a blob with a particular digest
        put <udig>      # write a blob with a digest
        take <udig>     # get a blob that the source may forget
        give <udig>     # put a blob that the source may forget
        eat <udig>      # untrusted digest of a blob by server
        wrap            # freeze current unwrapped blob traffic logs
                        # into a single blob and return the udig of
                        # the set of all blobs "wrapped" since the
                        # previous "roll".  the wrapped <udig>,
                        # returned to the client, will be in the next
                        # wrap, which allows perpetual, acyclic
                        # chaining, in the spirit of a merkle tree.
        roll <udig>     # forget all wrapped blob traffic logs in the
                        # udig set described by <udig>.  subsequent
                        # wrap/rolls will never see these blob sets
                        # again.
                        #
                        # however, the wrapped blobs themselves will
                        # still exist and probably be gettable.  the
                        # <udig> of the rolled udig set will be in the
                        # next wrap set as a "roll" blob request
                        # record.

        # PROPOSED
        cat <udig>      # sum the concatenation of blobs in
                        # <udig-list> ... proof of storage

    After the initial request by the client, all server replies are just
    "ok" or "no", where "no" is always the last reply.  See below for the
    exact syntax.
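    As a concrete illustration of the udig syntax and size limits above,
    here is a minimal validation sketch in Go (the project already uses
    golang).  The helper isUDig is hypothetical, not part of the blobio
    distribution:

        package main

        import (
            "fmt"
            "regexp"
        )

        // A hypothetical udig validator: an algorithm tag matching
        // ^[a-z][a-z0-9]{0,7}, one colon, then a digest of 32 to 128
        // graphical ascii chars, per the regexes documented above.
        var udigRE = regexp.MustCompile(
            `^[a-z][a-z0-9]{0,7}:[[:graph:]]{32,128}$`,
        )

        func isUDig(s string) bool {
            // min udig = 1 + 1 + 32 = 34, max = 8 + 1 + 128 = 137
            if len(s) < 34 || len(s) > 137 {
                return false
            }
            return udigRE.MatchString(s)
        }

        func main() {
            fmt.Println(isUDig(
                "sha:cd50d19784897085a8d0e3e413f8612b097c03f1")) // true
            fmt.Println(isUDig("SHA:not-a-digest"))              // false
        }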
    On the blob server - named "bio4d" - each correct client request
    writes a single traffic record describing the request to the file

        $BLOBIO_ROOT/spool/bio4d.brr

    Those traffic records are named "Blob Request Records", abbreviated
    as BRR.  The BRR record format is modeled after the Call Detail
    Record of the telecommunications industry:

        https://en.wikipedia.org/wiki/Call_detail_record

    The format of a single BRR record in spool/bio4d.brr is a line of
    seven tab separated ascii fields, i.e. the typical unixy ascii
    record:

        start_time      # RFC3339
                        # NSEC_re=([.][0-9]{0,10})?
                        # TZ_re=(([+-]HH:MM)|Z)?
                        # /YYYY-MM-DDThh:mm:ss$NSEC_re$TZ_re/

        transport       # <tag>~<transport detail>
                        # TAG_re=[a-z][a-z0-9_]{0,7}
                        # DETAIL_re=[[:graph:]]{1,160}
                        # /${TAG_re}[~]$DETAIL_re/

        verb            # get|put|give|take|wrap|roll|eat|cat

        udig            # algorithm:digest
                        # ALGO_re=[a-z][a-z0-9]{0,7}
                        # DIGEST_re=[[:graph:]]{32,128}
                        # /$ALGO_re:$DIGEST_re/

        chat_history    # full ok|no handshakes between server&client
                        # for the request
                        #
                        # /(ok|no)(,(ok|no)){0,2}/
                        #
                        # and "no" is always the final reply.
                        #
                        # Note:
                        #   the allowed history also depends upon the
                        #   verb, which is tricky to express with a
                        #   regular expression.  perhaps use a
                        #   variation of the $1 syntax in perl.

        blob_size       # unsigned 63 bit long: 0 <= size <= 2^63 - 1
                        # /\d{1,19}/

        wall_duration   # <ui31 sec>(.<ui31 nsec>)?
                        # /\d{1,10}([.]\d{1,10})?/

    BRR records are always syntactically correct as described above.  In
    other words, a client request that is syntactically incorrect will
    never generate a blob request record, which implies that debugging
    an errant client may be out of band with respect to the BRR records.

    The minimum size of a blob request record is 35 ascii chars.  A
    terminating newline or null char is NOT included in the tally of the
    minimum size:

        #field                  comment
        #-----                  -------
        (20) + 1 +              # start_time<tab>
                                # YYYY-MM-DDThh:mm:ssZ
        (1+1+1) + 1 +           # transport<tab>
                                # [a-z]~[[:graph:]]
        (3) + 1 +               # verb
                                # (get|put|eat|cat)
        (0) + 1 +               # udig: empty string for
                                # (wrap|cat|eat) when chat=no
        (2) + 1 +               # chat_history
                                # (no)
        (1) + 1                 # wall_duration seconds
                                # 0

        = 35 ascii chars

    The maximum size of a blob request record is 419 ascii chars.  No
    newline or null char is included in the tally of the maximum size:

        field                   comment
        -----                   -------
        (19+1+9+1+5) + 1 +      # start_time
                                # TIME_re=YYYY-MM-DDThh:mm:ss
                                # /${TIME_re}[.]\d{9}[+-]HH:MM/
        (8+1+160) + 1 +         # transport
                                # /[a-z][a-z0-9]{7}~[[:graph:]]{160}/
        (4 + 1 + 19) + 1 +      # verb=give
        (8+1+128) + 1 +         # udig
                                # /[a-z][a-z0-9]{7}:[[:graph:]]{128}/
        (8) + 1 +               # chat_history
                                # /ok,ok,(ok|no)/
        (19) + 1 +              # blob_size
                                # /[1-9][0-9]{18}/
        (10+1+10)               # wall_duration seconds
                                # /[1-9][0-9]{9}[.][0-9]{10}/

        = 419 bytes

    Since the length of a BRR record is <= 419 bytes, a record can be
    broadcast in a single UDP packet, which pragmatically "insures" an
    atomic payload up to 508 bytes.

    Consider the BRR format to have been cast in sapphire.
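    A sketch of how a BRR consumer might check the seven fields, using
    the regexes from the table above.  The helper isBRR and the sample
    transport value "fs~/tmp" are illustrative assumptions, not the
    bio4d implementation:

        package main

        import (
            "fmt"
            "regexp"
            "strings"
        )

        // One compiled pattern per BRR field, in field order, taken
        // from the regexes documented above.  chat_history is checked
        // for shape only; the verb dependent rules are not expressed.
        var brrFieldRE = []*regexp.Regexp{
            // start_time
            regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}` +
                `([.]\d{0,10})?(([+-]\d{2}:\d{2})|Z)?$`),
            // transport
            regexp.MustCompile(`^[a-z][a-z0-9_]{0,7}~[[:graph:]]{1,160}$`),
            // verb
            regexp.MustCompile(`^(get|put|give|take|wrap|roll|eat|cat)$`),
            // udig: may be empty, e.g. wrap when chat history is "no"
            regexp.MustCompile(`^([a-z][a-z0-9]{0,7}:[[:graph:]]{32,128})?$`),
            // chat_history
            regexp.MustCompile(`^(ok|no)(,(ok|no)){0,2}$`),
            // blob_size
            regexp.MustCompile(`^\d{1,19}$`),
            // wall_duration
            regexp.MustCompile(`^\d{1,10}([.]\d{1,10})?$`),
        }

        // isBRR reports whether a line parses as the seven tab
        // separated fields of a blob request record.
        func isBRR(line string) bool {
            fields := strings.Split(line, "\t")
            if len(fields) != len(brrFieldRE) {
                return false
            }
            for i, f := range fields {
                if !brrFieldRE[i].MatchString(f) {
                    return false
                }
            }
            return true
        }

        func main() {
            brr := "2024-01-01T00:00:00Z\tfs~/tmp\tget\t" +
                "sha:cd50d19784897085a8d0e3e413f8612b097c03f1\t" +
                "ok\t13\t0.000123456"
            fmt.Println(isBRR(brr)) // true
        }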
Protocol Flow:

    >get udig\n             # request for blob by udig
    <ok\n[bytes][close]     # server sends bytes of blob
    <no\n[close]            # server will not honor request

    >put udig\n             # request to put blob matching a udig
    <ok\n                   # server ready for blob bytes
    >[bytes]                # send bytes to server
    <ok\n[close]            # server accepted bytes for blob
    <no\n[close]            # server rejected bytes for blob
    <no\n                   # server will not honor request

    >take udig\n            # request for blob by udig
    <ok\n[bytes]            # server sends blob bytes
    >ok\n                   # client has the blob
    <no\n[close]            # server may not forget the blob
    <ok\n[close]            # server may forget the blob
    >no\n[close]            # client rejects the taken blob
    <no\n[close]            # server can not forget the blob

    >give udig\n            # request to put blob matching a udig
    <ok\n                   # server ready for blob bytes
    >[bytes]                # send digested bytes to server
    <ok\n                   # server accepts the bytes
    >ok\n[close]            # client probably forgets the blob
    >no\n[close]            # client might remember the blob
    <no\n[close]            # server rejects the blob
    <no\n[close]            # server rejects the give request

    >eat udig\n             # client requests server to verify blob
    <ok\n[close]            # blob exists and has been verified
    <no\n[close]            # server was unable to digest the blob

    >wrap\n                 # request udig set of traffic logs
    <ok\nudig\n[close]      # udig of set of traffic logs
    <no\n[close]            # no logs available

    >roll udig\n            # udig set of traffic logs to forget
    <ok\n[close]            # logs forgotten by server, typically a
                            # wrap set
    <no\n[close]            # not all logs in the set were forgotten,
                            # or some other error occurred

    Problematically, the proposed <verb>.ui63 byte-count request (see
    the next section) leaks info about the blob.  Also, probing for
    support of the <byte-count> extension could be "solved" by requiring
    all bio4 servers to contain the "empty" blob ("epsilon knowledge").
    To illustrate probing for byte count support, consider the empty
    blob for the sha digest:

        sha:da39a3ee5e6b4b0d3255bfef95601890afd80709

    The chat probe looks like

        > get sha:da39a3ee5e6b4b0d3255bfef95601890afd80709
        < ok            # can never remove the empty blob
        > take sha:da39a3ee5e6b4b0d3255bfef95601890afd80709
        < no
        > get.0 sha:da39a3ee5e6b4b0d3255bfef95601890afd80709
        < no

    which proves that the server does not support <byte-count> for the
    get request, while

        > get.0 sha:da39a3ee5e6b4b0d3255bfef95601890afd80709
        < ok

    implies <byte-count> is supported for get/put/give/take requests.

    As blobio evolves, forward facing, untrusted applications will
    rarely speak the bio4 protocol.  Instead, a security oriented
    gateway protocol, such as https, rsync, or a REST api, will sit in
    front of the bio4 services.  In other words, all blob gateways are
    caches and a bio4 service is the service of last resort.
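    A minimal client sketch in Go for the "get" flow above.  The
    address localhost:1797 and the absence of timeouts and retries are
    assumptions for illustration, not part of the protocol:

        package main

        import (
            "bufio"
            "fmt"
            "io"
            "net"
            "os"
        )

        // get sends "get <udig>\n" and reads the one line reply; on
        // "ok" the blob bytes follow until the server closes the
        // connection, per the flow documented above.
        func get(service, udig string, out io.Writer) error {
            conn, err := net.Dial("tcp", service)
            if err != nil {
                return err
            }
            defer conn.Close()

            if _, err := fmt.Fprintf(conn, "get %s\n", udig); err != nil {
                return err
            }
            in := bufio.NewReader(conn)
            reply, err := in.ReadString('\n')
            if err != nil {
                return err
            }
            if reply != "ok\n" {
                return fmt.Errorf("server replied %q", reply)
            }
            // server streams the blob, then closes
            _, err = io.Copy(out, in)
            return err
        }

        func main() {
            err := get("localhost:1797",
                "sha:cd50d19784897085a8d0e3e413f8612b097c03f1",
                os.Stdout)
            if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
            }
        }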
Proposed Byte Count Added to Protocol Flow (not implemented):

    >get.5432 udig\n        # request for blob of size 5432 bytes by udig
    <ok\n[5432 bytes][close]# server sends bytes of blob
    <no\n[close]            # server can not honor request

    >put.5432 udig\n        # request to put blob of size 5432 and udig
    <ok\n                   # server ready for blob bytes
    >[5432 bytes]           # send bytes to server
    <ok\n[close]            # server accepted bytes for blob
    <no\n[close]            # server rejected bytes for blob
    <no\n                   # server can not honor request

    >take.5432 udig\n       # request for blob of 5432 bytes by udig
    <ok\n[bytes]            # server sends 5432 blob bytes
    >ok\n                   # client has the blob
    <no\n[close]            # server may not forget the blob
    <ok\n[close]            # server may forget the blob
    >no\n[close]            # client rejects the taken blob
    <no\n[close]            # server can not forget the blob

    >give.5432 udig\n       # request to put blob of 5432 bytes by udig
    <ok\n                   # server ready for blob bytes
    >[bytes]                # send digested bytes to server
    <ok\n                   # server accepts the bytes
    >ok\n[close]            # client probably forgets the blob
    >no\n[close]            # client might remember the blob
    <no\n[close]            # server rejects the blob
    <no\n[close]            # server rejects the give request

    >eat.5432 udig\n        # client requests server to verify blob
    <ok.5432\n[close]       # blob exists and has been verified
    <no\n[close]            # server was unable to digest the blob

    Problematically, the <verb>.ui63 request leaks info about the blob.

Protocol Questions:

    - think about a "cat" command that allows the server to "prove" the
      existence of a set of blobs on a remote server.  the obvious
      technique would be for the client "cat" to send the udig of a list
      of blobs to the server, for which the server replies with the
      digest of the concatenation of the blobs in the sent udig list.
      a sketch of such a digest follows this section.
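    A sketch of the digest the proposed "cat" verb might compute on the
    server side: the sha of the concatenation of the listed blobs, which
    a client holding the same blobs can recompute to verify the answer.
    The helper catDigest and the use of blob file paths as input are
    hypothetical; sha is used since it is a supported algorithm:

        package main

        import (
            "crypto/sha1"
            "fmt"
            "io"
            "os"
        )

        // catDigest returns the sha udig of the concatenation of the
        // named blob files, in order.  A remote client holding the
        // same blobs can recompute this value, giving a weak proof of
        // storage.
        func catDigest(paths []string) (string, error) {
            h := sha1.New()
            for _, p := range paths {
                f, err := os.Open(p)
                if err != nil {
                    return "", err
                }
                _, err = io.Copy(h, f)
                f.Close()
                if err != nil {
                    return "", err
                }
            }
            return fmt.Sprintf("sha:%x", h.Sum(nil)), nil
        }

        func main() {
            udig, err := catDigest(os.Args[1:])
            if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
            }
            fmt.Println(udig)
        }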
Billing Model Questions:

    - what to measure per billing cycle:

        total count of blobs stored
        total bytes stored
        total bytes transferred on the public interface
        avg transfer bytes/duration > X for blob size > Y
        total number of eat requests/proof of retrievability

      should a small number of plans exist or should a formula determine
      cycle charges?  what about overages?  a mulligan is nice for
      customer satisfaction, but rolling to the next higher plan is
      draconian.

    - duration of storage: per blob or per plan?

Note:

    - consider reordering brr fields to make validation stateless.  in
      particular, chat_history MUST be scanned after knowing the verb,
      since a udig can be missing, as in {wrap,roll}==no.

    - the brr field "chat_history" only makes sense with the network
      protocol.  what is the chat history for the service "fs"?  this
      is a design flaw in the general brr semantics!  perhaps we should
      loosen the syntax or meaning of chat history, or maybe call it
      flow_history?  not sure.  when blob trust levels are added, chat
      history could be used to indicate whether a blob was verified.

    - is blob size a signed or unsigned i64?

    - should the "get" verb include a final reply from the client?

        >get
        <ok ... bytes
        >ok

    - what would eat.ui63 mean?  seems like it would be useful within a
      trusted network.

    - investigate fsverity in the linux kernel:
      https://www.kernel.org/doc/html/latest/filesystems/fsverity.html

    - what about leap-seconds (:23:59:60) in start_time?

    - a "wrap" -> "no" has no <udig> in the brr record, implying the
      stated minimum record size is too big.  should the empty udig be
      in the brr record?

    - where is the syntax for BLOBIO_SERVICE defined?

    - move the pgsql data type public.udig to schema blobio.udig!

    - follow the project for atomic writes in linux:
      https://lwn.net/Articles/789600/

    - move the log/*.[fx]dr files to spool/ and, after wrapping, to
      data/

    - should the variable BLOBIO_ROOT be renamed BLOBIO_INSTALL or
      BLOBIO_DIST?  currently BLOBIO_ROOT means the actual install
      directory and not the parent, which would be the correct root.

    - add an option to bio4d to limit the size of an accepted blob.

    - investigate the linux splice() system call.

    - think about separate bio4d processes sharing the same data/ file
      system but otherwise separate.  or, explore union file systems.

    - make the postgresql udig data type an extension.

    - prove that every brr log contains a successful "wrap" other than
      the first brr log.  critical to prove this.

    - could two quick wraps in a row generate the same wrap set?

    - should BRR log files always have unique digests, modulo other
      servers?  perhaps the first put request in the wrapped brr is a
      blob uniquely summarizing the state of the server and the current
      wrap request.

    - contemplate running the server such that the empty blob always
      exists (see the sketch after these notes).

    - think about adding fadvise to file system blob reads.  seems
      natural since every open of a blob file will read the entire
      blob.

    - create an is_empty() function in the pgsql udig datatype.

    - add support for unix domain sockets, std{in,out}, inetd, and
      libevent.

    - upon server start up on mac os x, test for a case sensitive file
      system.  see
      http://www.mail-archive.com/[email protected]/msg66830.html
      another idea under macosx would be to convert the blob file path
      to lower case.

    - rewrite the makefile for a client-only install, without the
      golang requirement.

    - should the postgres datatype include a function
      is_zero_size(udig)?

    - the postgres udig data type ought to be in the blobio schema.

    - merge fdr2sql & cron-fdr2pg into flowd.

    - enable/disable specific verbs (get/put/give/take) per server or
      per subnet.

    - named pipes to flowd as another tail source.

    - signal handling needs to be pushed to the main listen loop or
      cleaned up with sigaction().
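    Several notes above touch on the empty blob, e.g. running the server
    such that the empty blob always exists, and an is_empty() function
    for the pgsql udig type.  A Go sketch of the same test, assuming
    the btc20 composition RIPEMD160(SHA256(SHA256(blob))) described
    earlier and the deprecated golang.org/x/crypto/ripemd160 package
    (any RIPEMD-160 implementation would do); the helpers are
    hypothetical:

        package main

        import (
            "crypto/sha1"
            "crypto/sha256"
            "fmt"

            "golang.org/x/crypto/ripemd160"
        )

        // emptyUDig computes the udig of the zero length blob for the
        // two supported algorithms.  For "sha" this is the well known
        // sha:da39a3ee5e6b4b0d3255bfef95601890afd80709.
        func emptyUDig(algorithm string) string {
            switch algorithm {
            case "sha":
                return fmt.Sprintf("sha:%x", sha1.Sum(nil))
            case "btc20":
                // RIPEMD160(SHA256(SHA256(blob))), hex encoded
                inner := sha256.Sum256(nil)
                outer := sha256.Sum256(inner[:])
                h := ripemd160.New()
                h.Write(outer[:])
                return fmt.Sprintf("btc20:%x", h.Sum(nil))
            }
            return ""
        }

        // isEmpty reports whether a udig names the empty blob.
        func isEmpty(udig, algorithm string) bool {
            return udig == emptyUDig(algorithm)
        }

        func main() {
            fmt.Println(emptyUDig("sha"))
            fmt.Println(isEmpty(
                "sha:da39a3ee5e6b4b0d3255bfef95601890afd80709",
                "sha")) // true
        }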