Blog

  • cdb

    % cdb(1) | Constant Database

    NAME

    CDB – An interface to the Constant Database Library

    SYNOPSIS

    cdb -h

    cdb -[cdkstVG] file.cdb

    cdb -q file.cdb key [record#]

    cdb -g -m minimum -M maximum -R records -S seed

    cdb -H

    DESCRIPTION

    Author:     Richard James Howe
    License:    Unlicense
    Repository: <https://github.com/howerj/cdb>
    Email:      howe.r.j.89@gmail.com
    

    A clone of the CDB database, a simple, read-only (once created) database.
    The database library is designed so it can be embedded into a microcontroller
    if needed. This program can be used for creating and querying CDB databases,
    which consist of key-value pairs of binary data.

    This program also includes several options that help in testing out the
    database: one for hashing input keys and printing the hash produced by the
    default hash function, and another for generating a database with
    (pseudo-)random keys and values of a given length.

    This library can create 16, 32 and 64 bit versions of the CDB file format,
    removing one of the major limitations of the 32-bit version.

    The 64-bit version of the database uses a different hash than djb2.

    OPTIONS

    -h : print out this help message and exit successfully

    -b : set the word size of the CDB database to use (default is 32, can be 16 or 64)

    -v : increase verbosity level

    -t file.cdb : run internal tests, exit with zero on a pass

    -c file.cdb : run in create mode

    -d file.cdb : dump the database

    -k file.cdb : dump the keys in the database

    -s file.cdb : print statistics about the database

    -T temp.cdb : name of temporary file to use

    -V file.cdb : validate database

    -q file.cdb key record-number : query the database for a key, with an optional record

    -o number : specify offset into file where database begins

    -H : hash keys and output their hash

    -g : spit out an example database to standard out

    -m number : set minimum length of generated record

    -M number : set maximum length of generated record

    -R number : set number of generated records

    -S number : set seed for record generation

    EXAMPLES

    Creating a database, called ‘example.cdb’:

    $ ./cdb -c example.cdb
    +0,1:->X
    +1,0:Y->
    +1,1:a->b
    +1,1:a->b
    +1,2:a->ba
    +5,5:hello->world
    

    Note that zero length keys and values are valid, and that duplicate keys are
    allowed, even keys with the same value. A key with the specified value is
    created for each duplicate, just like a non-duplicate key.

    Looking up values in the created database:

    $ ./cdb -q example.cdb ""
    $ ./cdb -q example.cdb Y
    $ ./cdb -q example.cdb a
    $ ./cdb -q example.cdb a 0
    $ ./cdb -q example.cdb a 1
    $ ./cdb -q example.cdb a 2
    $ ./cdb -q example.cdb hello
    

    Dumping a database:

    $ ./cdb -d example.cdb
    

    A database dump can be read straight back in to create another database:

    $ ./cdb -d example.cdb | ./cdb -c should_have_just_used_copy.cdb
    

    This is not useful in itself, but assuming your data (both keys and
    values) is ASCII text with no newlines or NUL characters then you could
    filter out, modify or add in values with the standard Unix command line
    tools.

    RETURN VALUE

    cdb returns zero on success (key found), and a non-zero value on failure. Two is
    returned if a key is not found; any other non-zero value indicates a more serious
    failure.

    LIMITATIONS

    Three different versions of the library can be built: a 16, a 32 and a 64 bit
    version, with the 32 bit version being the default. For each version there is a
    limit on the maximum file size in the format used of 2^N bytes, where N is the
    word size in bits. Keys and values have the same limit (although they can never
    reach that size as some of the space is taken up by the file format overhead).
    Any other arbitrary limitation is a bug in the implementation.

    The minimum size of a CDB file is 256 * 2 * (N/8) bytes (the empty initial hash
    table), which is 2048 bytes for the default 32-bit format.

    It should be noted that if you build an N bit version of this library (where
    N is 16, 32 or 64) you are limited to creating databases of size N or less,
    e.g. if cdb_word_t is set to uint32_t, and therefore the 32-bit version of
    this library is being built, then you can create 32-bit and 16-bit versions
    of the CDB database format, but you cannot make 64-bit versions. You can,
    naturally, set cdb_word_t to uint64_t on a 32-bit system, which enables the
    library to create all three mutually incompatible versions of the database
    format.

    INPUT/DUMP FORMAT

    The input and dump format follow the same pattern: some ASCII text specifying
    the beginning of a record, then some binary data with separators, and a
    newline terminating the record. The format is:

    +key-length,value-length:KEY->VALUE
    +key-length,value-length:KEY->VALUE
    ...
    +key-length,value-length:KEY->VALUE
    

    Despite the presence of textual data, the input key and value can contain
    binary data, including the ASCII NUL character.

    An example, encoding the key-value pairs “abc” to “def” and “G” to “hello”:

    +3,3:abc->def
    +1,5:G->hello
    

    The following awk script can be used to pre-process a series of key-value
    pairs in the format “key value”, with one record per line and optional comment
    lines:

    #!/bin/sh
    LC_ALL='C' awk '
      /^[^#]/ {
        print "+" length($1) "," length($2) ":" $1 "->" $2
      }
      END {
        print ""
      }
    ' | cdb -c "$@"
    

    Which was available in the original cdb program as ‘cdbmake-12’.

    FILE FORMAT

    The file format is incredibly simple. It is designed so that only the header
    and the hash table pointers need to be stored in memory during generation of
    the table; the keys and values can be streamed onto the disk. The file consists
    of a header of 256 2-word values forming an initial hash table that points to
    the hash tables at the end of the file, then the key-value records, and then
    up to 256 hash tables pointing to the key-value pairs.

    A word consists of a 4-byte/32-bit value (although this may be changed via
    compile time options, creating an incompatible format). All word values are
    stored in little-endian format.

    The initial hash table contains an array of 256 2-word values.
    The words are: the position of a hash table in the file and the number of
    buckets in that hash table, stored in that order. To look up a key, the key is
    first hashed and the lowest eight bits of the hash are used to index into the
    initial table; if there are entries in that bucket the search then proceeds to
    the corresponding second level hash table at the end of the file.

    The hash tables at the end of the file contain an array of two word records,
    each containing the full hash and the file position of a key-value pair. To
    search for a key in this table the hash of the key is taken and the lowest
    eight bits are discarded by shifting right eight places; the hash is then
    taken modulo the number of elements in the hash table, and the resulting value
    is used as the initial index into the hash table. Searching continues until
    the key is found, an empty record is found, or the number of records in the
    table has been searched through with no match. A key is compared by looking at
    the hash table records: if the hash of the key matches the stored hash in a
    record then a possible match is found; the file position is then used to look
    up the key-value pair and the key itself is compared.
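
    To make the index arithmetic concrete, here is a small sketch in C for the
    32-bit format (the function is illustrative and not part of the library API):

    #include <stdint.h>
    
    /* Given a key's hash and the number of buckets in the selected (non-empty)
       secondary hash table, compute where a lookup starts. */
    static void lookup_indices(uint32_t hash, uint32_t buckets,
    		uint32_t *initial_index, uint32_t *start_slot) {
    	*initial_index = hash & 0xFF;           /* lowest eight bits select one of the 256 initial buckets */
    	*start_slot    = (hash >> 8) % buckets; /* probing then proceeds as (start + i) % buckets */
    }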

    The number of buckets in the hash table is chosen as twice the number of
    populated entries in the hash table.

    A key-value pair is stored as two words containing the key length and the value
    length in that order, then the key, and finally the value.

    The hashing algorithm used is similar to djb2 (except for the 64-bit
    version, which uses a 64-bit variant of the SDBM hash), but with a minor
    modification: an exclusive-or replaces an addition.

    The algorithm calculates hashes the size of a word; the initial hash value is
    the special number ‘5381’. For each byte of input the current hash value is
    multiplied by 33 and the result is exclusive-ored with the byte being hashed.
    This repeats until all bytes to be hashed are processed. All
    arithmetic operations are unsigned and performed modulo 2 raised to the power
    of 32.

    The pseudo code for this is:

    set HASH to 5381
    for each OCTET in INPUT:
    	set HASH to: ((HASH * 33) % pow(2, 32)) xor OCTET
    return HASH
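
    A minimal C sketch of this hash for the default 32-bit build follows (the
    function name is illustrative, not the library's internal symbol):

    #include <stdint.h>
    #include <stddef.h>
    
    /* djb2 variant used by CDB: an exclusive-or replaces the usual addition */
    static uint32_t djb2_xor_hash(const uint8_t *data, size_t length) {
    	uint32_t hash = 5381u; /* the special initial value */
    	for (size_t i = 0; i < length; i++)
    		hash = (hash * 33u) ^ data[i]; /* unsigned arithmetic is naturally modulo 2^32 */
    	return hash;
    }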
    

    Note that there is nothing in the file format that disallows duplicate keys in
    the database, in fact the API allows duplicate keys to be retrieved. Both key
    and data values can also be zero bytes long. There are also no special
    alignment requirements on the data.

    The best documentation on the file format is a small pure python script that
    implements a set of functions for manipulating a CDB database, a description is
    available here http://www.unixuser.org/~euske/doc/cdbinternals/ and the
    script itself is available at the bottom of that page
    http://www.unixuser.org/~euske/doc/cdbinternals/pycdb.py.

    A visualization of the overall file structure:

             Constant Database Sections
    .-------------------------------------------.
    |   256 Bucket Initial Hash Table (2KiB)    |
    .-------------------------------------------.
    |            Key Value Pairs                |
    .-------------------------------------------.
    |       0-256 Secondary Hash Tables         |
    .-------------------------------------------.
    

    The initial hash table at the start of the file:

        256 Bucket Initial Hash Table (2KiB)
    .-------------------------------------------.
    | { P, L } | { P, L } | { P, L } |   ...    |
    .----------+----------+----------+----------.
    |   ...    | { P, L } | { P, L } | { P, L } |
    .-------------------------------------------.
    P = Position of secondary hash table
    L = Number of buckets in secondary hash table
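
    In C terms each entry of this table could be described as follows (a sketch
    for the 32-bit format; the struct is purely illustrative and not part of the
    library API):

    #include <stdint.h>
    
    typedef struct {
    	uint32_t position; /* P: file position of the secondary hash table */
    	uint32_t buckets;  /* L: number of buckets in that secondary hash table */
    } initial_entry_t;     /* 256 of these, stored little-endian, form the 2KiB header */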
    

    The key-value pairs:

    .-------------------------------------------.
    | { KL, VL } | KEY ...      | VALUE ...     |
    .-------------------------------------------.
    KL    = Key Length
    VL    = Value Length
    KEY   = Variable length binary data key
    VALUE = Variable length binary value
    

    Of the variable number of hash tables (which each are of a variable length) at
    the end of the file:

     0-256 Variable Length Secondary Hash Tables
    .---------------------.
    | { H, P } | { H, P } |
    .----------+----------+---------------------.
    | { H, P } |   ...    |   ...    | { H, P } |
    .----------+----------+----------+----------.
    | { H, P } |   ...    | { H, P } |
    .--------------------------------.
    H = Hash
    P = Position of Key-Value Pair
    

    And that is all for the file format description.

    While the key-value pairs can be streamed to disk and the second level hash
    tables written after those pairs, anything that creates a database will have
    to seek back to the beginning of the file to rewrite the header. This could
    have been avoided by storing the 256 initial hash table entries at the end of
    the file, allowing a database to be constructed in a Unix filter, but alas,
    this is not possible. Also of note, by passing in a custom hash algorithm to
    the C API you have much more control over where each of the key-value pairs
    gets stored, specifically which bucket it will end up in, by controlling
    the lowest 8-bits (for example you could set the lowest 8-bits to the first
    byte of the key in a custom hash).

    Note that there is nothing stopping you from storing the key-value pairs in
    some kind of order; you could do this by adding the keys in lexicographic
    order for a database sorted by key. Retrieving keys using the C function
    “cdb_foreach” would then allow you to retrieve keys in order. The hash table
    itself would remain unaware of this order. Dumping the key-value pairs would
    maintain this order as well. There is no guarantee other tools will preserve
    this order however (they may dump key-value pairs backwards, or by going
    through the hash table).

    CDB C API OVERVIEW

    There are a few goals that the API has:

    • Simplicity, there should be few functions and data structures.
    • The API is easy to use.
    • There should be minimal dependencies on the C standard library. The
      library itself should be small and not be a huge, non-portable, “optimized”,
      mess.
    • The user should decide when, where and how allocations are performed. The
      working set that is allocated should be small.
    • The database driver should catch corrupt files if possible.

    Some of these goals are in conflict: being able to control allocations and
    having minimal dependencies allows the library to be used in an embedded
    system, however it means that in order to do very basic things the user has
    to provide a series of callbacks. The callbacks are simple to implement on a
    hosted system, and examples are provided in main.c and host.c in the
    project repository, but it does mean the library is not ready to use as-is.

    There are two sets of operations that most users will want to perform; creating
    a database and reading keys. After the callbacks have been provided, to create
    a database requires opening up a new database in create mode:

    /* error handling omitted for brevity */
    cdb_t *cdb = NULL;
    cdb_options_t ops = { /* Your file callbacks/options go here */ };
    cdb_open(&cdb, &ops, 1, "example.cdb");
    cdb_buffer_t key   = { .length = 5, .buffer = "hello", };
    cdb_buffer_t value = { .length = 5, .buffer = "world", };
    cdb_add(cdb, &key, &value);
    cdb_close(cdb);
    

    If you are dealing with mostly NUL terminated ASCII/UTF-8 strings it is worth
    creating a function to deal with them:

    int cdb_add_string(cdb_t *cdb, const char *key, const char *value) {
    	assert(cdb);
    	assert(key);
    	assert(value);
    	const cdb_buffer_t k = { .length = strlen(key),   .buffer = (char*)key,   };
    	const cdb_buffer_t v = { .length = strlen(value), .buffer = (char*)value, };
    	return cdb_add(cdb, &k, &v);
    }
    

    Note that you cannot query for a key from a database opened up in create
    mode and you cannot add a key-value pair to a database opened up in read
    mode. The operations are mutually exclusive.

    To search for a key within the database, you open up a database connection in
    read mode (create = 0):

    /* error handling omitted for brevity */
    cdb_t *cdb = NULL;
    cdb_options_t ops = { /* Your file callbacks/options go here */ };
    cdb_open(&cdb, &ops, 0, "example.cdb");
    cdb_buffer_t key = { .length = 5, .buffer = "hello" };
    cdb_file_pos_t value = { 0, 0, };
    cdb_get(cdb, &key, &value);
    /* use cdb_seek, then cdb_read, to use returned value */
    cdb_close(cdb);
    

    Upon retrieval of a key the database does not allocate a value for you, instead
    it provides an object consisting of a file position and a length of the value.
    This can be read from wherever the database is stored with the function
    ‘cdb_read’. Before issuing a read, ‘cdb_seek’ must be called as the file
    handle may be pointing to a different area in the database.
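
    As a sketch, fetching a value on a hosted system might look like the
    following; error handling is abbreviated and the found check relies on the
    position being non-zero, as described under “cdb_get” below:

    #include <stdlib.h>
    
    cdb_buffer_t key = { .length = 5, .buffer = "hello", };
    cdb_file_pos_t value = { 0, 0, };
    if (cdb_get(cdb, &key, &value) >= 0 && value.position != 0) { /* found */
    	char *buf = malloc(value.length);
    	if (buf && cdb_seek(cdb, value.position) == 0) /* seek first... */
    		(void)cdb_read(cdb, buf, value.length); /* ...then read the stored value */
    	/* ... use buf ... */
    	free(buf);
    }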

    If a read or a seek is issued that goes outside of the bounds of the database
    then all subsequent database operations on that handle will fail, not just
    reads or seeks. The only valid things to do on a handle that has returned a
    negative number are to call ‘cdb_status’ and then ‘cdb_close’, and never
    use the handle again. ‘cdb_status’ must not be used on a closed handle.

    As there are potentially duplicate keys, the function ‘cdb_count’ can be
    used to query for duplicates. It sets its count parameter to the number of
    records found for that key (it sets count to zero, and returns zero, if no
    keys are found; it returns one if one or more keys were found).

    The function ‘cdb_status’ can be used to query what error has occurred, if
    any. On an error a negative value is returned, the meaning of this value is
    deliberately not included in the header as the errors recorded and the
    meaning of their values may change. Use the source for the library to determine
    what error occurred.

    The function ‘cdb_version’ returns the version number in an out parameter
    and information about the compile time options selected when the library was built.
    A Semantic Version Number is used, which takes the form “MAJOR.MINOR.PATCH”.
    The PATCH number is stored in the Least Significant Byte, the MINOR number the
    next byte up, and the MAJOR in the third byte. The fourth byte contains the
    compile time options.
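
    For example, unpacking the version number might be done as follows (a
    sketch; the shift amounts follow the byte layout described above):

    unsigned long v = 0;
    if (cdb_version(&v) >= 0) { /* negative indicates failure */
    	const unsigned patch   = (v >>  0) & 0xFF; /* least significant byte */
    	const unsigned minor   = (v >>  8) & 0xFF;
    	const unsigned major   = (v >> 16) & 0xFF;
    	const unsigned options = (v >> 24) & 0xFF; /* compile time options */
    	/* ... */
    }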

    There are several things that could be done to speed up the database but this
    would complicate the implementation and the API.

    C API FUNCTIONS

    The C API contains 13 functions and some callbacks, more than is
    desired, but they all have their uses. Ideally a library would
    contain far fewer functions and require less of a cognitive burden
    on the user to get right, however making a generic enough C library
    and using C in general requires more complexity than is usual, but
    not more than is necessary.

    There is regularity in these functions: they all return negative
    on failure (the only exception being the allocator callback, which
    returns a pointer), and most of the functions accept a “cdb_t” structure
    as well, which is an opaque pointer (opaque pointers are not
    an unalloyed good; they imply that an allocator must be used, which
    can be a problem in embedded systems).

    int cdb_open(cdb_t **cdb, const cdb_options_t *ops, int create, const char *file);
    int cdb_close(cdb_t *cdb);
    int cdb_read(cdb_t *cdb, void *buf, cdb_word_t length);
    int cdb_add(cdb_t *cdb, const cdb_buffer_t *key, const cdb_buffer_t *value);
    int cdb_seek(cdb_t *cdb, cdb_word_t position);
    int cdb_foreach(cdb_t *cdb, cdb_callback cb, void *param);
    int cdb_read_word_pair(cdb_t *cdb, cdb_word_t *w1, cdb_word_t *w2);
    int cdb_get(cdb_t *cdb, const cdb_buffer_t *key, cdb_file_pos_t *value);
    int cdb_lookup(cdb_t *cdb, const cdb_buffer_t *key, cdb_file_pos_t *value, long record);
    int cdb_count(cdb_t *cdb, const cdb_buffer_t *key, long *count);
    int cdb_status(cdb_t *cdb);
    int cdb_version(unsigned long *version);
    int cdb_tests(const cdb_options_t *ops, const char *test_file);
    
    typedef int (*cdb_callback)(cdb_t *cdb, const cdb_file_pos_t *key, const cdb_file_pos_t *value, void *param);
    
    • cdb_open

    The most complex function, and the one with the most parameters, “cdb_open”
    is used to open a connection to a database. A pointer to a handle is
    passed as the first parameter; using the supplied allocation callback
    (passed in via the “ops” parameter) the function will allocate enough space
    for a “cdb_t” structure, and this out-parameter is the database handle. It
    will be set to NULL on failure, which will also be indicated by a negative
    return value from the “cdb_open” function. Once “cdb_close” is called on
    this handle the handle should not be used again, and “cdb_close” should
    only be called on the returned handle once.

    A single database can be opened by as many readers as you like, however
    reading a database and writing to a database are mutually exclusive operations.

    When writing to a database there should not be any readers active on
    that database. This is a fundamental limitation of the database design.

    Writing to a CDB file that is being read by another CDB instance can
    cause corruption of data and general nasty things! Do not do it!

    As such, a database can only be opened up in read only, or write only
    mode.

    The “file” parameter is passed to the “open” callback, which is present
    in the “ops” parameter.

    	void *(*open)(const char *name, int mode);
    

    The callback should return an opaque pointer on success and NULL on failure.
    It is used to open up a handle to the database via whatever method the
    library user would like (for example, a simple file present in your file
    system, or a section of flash in an embedded computer). The open callback
    is used by “cdb_open” and should not be called directly.

    The “mode” parameter to the “open” callback will be set to “CDB_RW_MODE” if
    “create” is non-zero, and will be set to “CDB_RO_MODE” if it is zero.

    CDB_RW_MODE is an enumeration that has the value “1”, whilst
    CDB_RO_MODE has the value “0”.

    “cdb_open” does quite a lot: when opening a CDB file for reading the
    file is partially verified, and when opening for writing a blank first level
    hash table is written to disk. If either of these fails, then opening
    the database will fail.

    The function also needs the seek callback to be present,
    along with the callback for reading. The write callback only needs to be
    present when the database is opened up in write mode.

    • cdb_close

    This closes the CDB database handle; the handle may be NULL, in which
    case nothing will be done. The same handle should not be passed in twice
    to “cdb_close” as this can cause double-free errors. This function
    will release any memory and handles (by calling the “close” callback)
    associated with the handle.

    When writing a database this function has one more task to do, and
    that is finalizing the database: it writes out the hash tables at
    the end of the file. If “cdb_close” is not called after the
    last entry has been added then the database will be in an invalid
    state and will not work.

    This function may return negative on error, for example if the
    finalization fails.

    After calling “cdb_close” the handle must not be used again.

    • cdb_read

    To be used on a database opened up in read-mode only. This can
    be used to read values, and sometimes keys, from the database. This
    function does not call “cdb_seek”, the caller must call “cdb_seek”
    before calling this function to move the file pointer to the
    desired location before reading. The file pointer will be updated
    to point to after the location that has been read (or more accurately,
    the read callback must do this). This function does not return the
    number of bytes read, instead it returns zero for no error and
    negative if an error condition occurs (a partial read is treated as
    an error).

    • cdb_add

    To be used on a database opened up in write, or creation, mode only.

    This function adds a key-value pair to the database, which can be
    looked up only after finalizing the database (by calling “cdb_close”)
    and reopening the database in read-only mode, which should be done
    after the final “cdb_add” call.

    It is unfortunate that both the key and value must reside within
    memory, but doing anything else would complicate the API too much.

    Once the key and value have been added they can be freed or discarded,
    however.

    Adding key-value pairs consumes disk space and some extra memory
    which is needed to store the second level hash table, however the
    keys and values are not kept around in memory by the CDB library.

    Note that this function will add duplicate keys without complaining,
    and can add zero length keys and values, likewise without complaining.

    It is entirely up to the caller to prevent duplicates from being
    added. This is one improvement that could be added to the library (as
    you cannot check or query a partially written database at the
    moment).

    • cdb_seek

    This function changes the position that the next read or write
    will occur from. You should not seek before the start or past the end of the
    database; doing so will result in an error. Seeking is always relative to the
    start of the file, with the optional offset specified in the CDB options
    structure being added to the requested position. Seeks relative to the
    current position or to the end of the file cannot be done.

    This function must be called before each call to “cdb_read” or
    “cdb_read_word_pair”, otherwise you may read garbage.

    Calling “cdb_seek” multiple times on the same location has no
    effect (the “fseek” C standard library function may discard buffers
    if called multiple times on the same location even though the file
    position has not changed).

    • cdb_foreach

    The “cdb_foreach” function calls a callback for each key-value pair within
    the CDB database. The callback is passed an optional “param”. If
    the callback returns a non-zero number then the for-each
    loop is terminated early (a positive return value is passed back to the
    caller, a negative return results in -1 being returned). If the callback
    returns zero then the next pair, if any, is processed with the callback
    being called again.

    The callback is passed a structure which contains the location
    within the CDB database that contains the key and value. The keys
    and values are not presented in any specific order and the order
    should not be expected to stay the same between calls.

    To read either a key or a value you must call “cdb_seek” before
    calling “cdb_read” yourself.

    Passing in a NULL callback is allowed and is not a no-operation; it can be
    used to effectively check the integrity of the database.
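
    As a sketch, a callback that prints each key in hexadecimal might look like
    this on a hosted system (the callback name and buffer size are illustrative):

    #include <stdio.h>
    #include <stdint.h>
    
    static int print_key_cb(cdb_t *cdb, const cdb_file_pos_t *key,
    		const cdb_file_pos_t *value, void *param) {
    	(void)value;
    	(void)param;
    	uint8_t buf[256];
    	if (key->length > sizeof (buf))
    		return -1; /* a negative return stops the for-each loop */
    	if (cdb_seek(cdb, key->position) < 0) /* always seek before reading */
    		return -1;
    	if (cdb_read(cdb, buf, key->length) < 0)
    		return -1;
    	for (cdb_word_t i = 0; i < key->length; i++)
    		printf("%02X", (unsigned)buf[i]);
    	printf("\n");
    	return 0; /* zero continues with the next key-value pair */
    }

    It would be invoked with something like “cdb_foreach(cdb, print_key_cb, NULL)”.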

    • cdb_read_word_pair

    To be used on a database opened up in read-mode only. This function
    is a helper function that strictly does not need to exist, it is
    used for reading two “cdb_word_t” values from the database. This
    can be useful for the library user for more detailed analysis of
    the database than would normally be possible, many values within
    the database are stored as two “cdb_word_t” values. Looking inside this
    read-only database is not discouraged and the file format is well
    documented.

    This function does not call “cdb_seek”, that must be called
    before hand to seek to the desired file location. The file position
    will be updated to point after the two read values.

    • cdb_get

    This function populates the “value” structure if the “key” is found
    within the CDB database. The members of “value” will be set to zero
    if a key is not found, if it is found the position will be non-zero,
    although the length may be zero.

    Note that this function does not actually retrieve the key and put it
    into a buffer, there is a very good reason for that. It would be easy
    enough to make such a function given the functions present in this
    API, however in order to make such a function it would have to do
    the following; allocate enough space to store the value, read the
    value off of disk and then return the result. This has massive performance
    implications. Imagine if a large value is stored in the database, say
    a 1GiB value, this would mean at least 1GiB of memory would need to
    be allocated, it would also mean all of the file buffers would have
    been flushed and refilled, and all of that data would need to be copied
    from disk to memory. This might be desired, it might also be very
    wasteful, especially if only a fraction of the value is actually
    needed (say the first few hundred bytes). Whether this is wasteful
    depends entirely on your workload and use-cases for the database.

    It is better to give the user the tools to do what they need than to insist
    it be done in one limiting, although “easy”, way.

    This does mean that to actually retrieve the value the user must
    perform their own “cdb_seek” and “cdb_read” operations. This
    means that the entire value does not need to be read into memory
    by the consumer, and it can potentially be processed block by block by
    the “read” callback if needed.

    • cdb_lookup

    “cdb_lookup” is similar to “cdb_get” except it accepts an
    optional record number. Everything that applies to the get-function
    applies to the lookup-function, the only difference is the record
    number argument (internally “cdb_get” is implemented with
    “cdb_lookup”).

    If there are two or more keys that are identical then the question
    of how to select a specific key arises. This is done with an
    arbitrary number that will most likely be, but is not guaranteed to
    be, the order in which the key was added to the database, with the
    first value being zero and the index being incremented from there
    on out.

    If the key is found but the index is out of bounds it is treated
    as if the key does not exist. Use “cdb_count” to calculate the
    maximum number of records per key if needed; it is far more expensive
    to repeatedly call “cdb_lookup” on a key until it returns “key
    not found” to determine the number of duplicate keys than it is
    to call “cdb_count”.
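
    A sketch of walking every duplicate of a key with “cdb_count” followed by
    “cdb_lookup” (error handling abbreviated):

    long count = 0;
    if (cdb_count(cdb, &key, &count) >= 0) {
    	for (long i = 0; i < count; i++) { /* record numbers start at zero */
    		cdb_file_pos_t value = { 0, 0, };
    		if (cdb_lookup(cdb, &key, &value, i) < 0)
    			break; /* the handle is now in an error state */
    		/* cdb_seek(cdb, value.position) then cdb_read(...) fetches this duplicate */
    	}
    }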

    The index argument perhaps should be a “cdb_word_t”, but there
    is always debate around these topics (personally, if I were to
    design a C-like programming language, integers would default
    to 64-bits and all pointers would fit within that, and other types
    for indexing and the like would also be 64-bit; that’s not a
    criticism of C, the madness around integer types was born out
    of necessity).

    • cdb_count

    The “cdb_count” function counts the number of entries that
    have the same key value. This function requires potentially multiple
    seeks and reads to compute, so the returned value should be cached if
    you plan on using it again as the value is expensive to calculate.

    If the key is not found, a value indicating that will be returned
    and the count argument will be zeroed. If found, the count will
    be put in the count argument.

    • cdb_status

    This function returns the status of the CDB library handle. All
    errors are sticky in this library, if an error occurs when handling
    a CDB database then there is no way to clear that error short of
    reopening the database with a new handle. The only valid operation
    to do after getting an error from any of the functions that operate
    on a “cdb_t” handle is to call “cdb_status” to query the error
    value that is stored internally.

    “cdb_status” should return a zero on no error and a negative value
    on failure. It should not return a positive non-zero value.

    • cdb_version

    “cdb_version” returns the version number of the library. It stores
    the value in an unsigned long. This may return an error value and a
    zero value if the version has not been set correctly at compile time.

    The value is stored in “MAJOR.MINOR.PATCH” format, with “PATCH” stored
    in the Least Significant Byte. This is a semantic version number. If
    the “MAJOR” number has changed then there are potentially breaking
    changes in the API or ABI of this library that have been introduced,
    no matter how trivial.

    • cdb_tests

    “cdb_tests” runs the library's built-in test suite using the supplied
    callbacks and a test file name, returning a negative value on failure;
    if the tests have been compiled out it simply returns success. See the
    TEST SUITE section for more detail.

    And the callback for “cdb_foreach”:

    • “cdb_callback”

    This callback is called for each key-value pair within the CDB database
    when used with “cdb_foreach”. If a negative value is returned from
    this callback then the foreach loop will end early and an error value
    will be returned. If the value returned is greater than zero then
    the foreach loop will also terminate early, but without an error. If zero
    is returned the foreach loop will continue on to the next key-value pair,
    if available.

    Each time this callback is called by “cdb_foreach” it will be
    passed in a key-value pair in the form of two length/file-location
    structures. You will need to seek to those locations and read the
    keys and values yourself. There is no guarantee the file position
    is in the correct location (i.e. pointing to the location of the
    key), so call “cdb_seek” before calling “cdb_read”.

    There is no guarantee that the key-value pairs will be presented
    in the same order each time the function is called, so the order
    should not be counted on. There is no attempt to preserve order.

    See “cdb_foreach” for more information.

    C API STRUCTURES

    The C API has two simple structures and one complex one, the latter being
    more of a container for callbacks (or, some might say, a way of doing
    object oriented programming in C). The complex structure, “cdb_options_t”,
    is an unfortunate necessity.

    The other two structures, “cdb_buffer_t” and “cdb_file_pos_t”, are
    simple enough and need very little explanation, although they will be
    explained below.

    Let us look at the “cdb_options_t” structure:

    typedef struct {
    	void *(*allocator)(void *arena, void *ptr, size_t oldsz, size_t newsz);
    	cdb_word_t (*hash)(const uint8_t *data, size_t length);
    	int (*compare)(const void *a, const void *b, size_t length);
    	cdb_word_t (*read)(void *file, void *buf, size_t length);
    	cdb_word_t (*write)(void *file, void *buf, size_t length);
    	int (*seek)(void *file, uint64_t offset);
    	void *(*open)(const char *name, int mode);
    	int (*close)(void *file);
    	int (*flush)(void *file);
    
    	void *arena;
    	cdb_word_t offset;
    	unsigned size;
    } cdb_options_t;
    

    Each member of the structure will need an explanation.

    STRUCTURE CALLBACKS

    • allocator

    This function is based off of the allocator callback mechanism
    present in Lua, see https://www.lua.org/manual/5.1/manual.html#lua_setallocf
    for more information on that allocator. This function can handle
    freeing memory, allocating memory, and reallocating memory, all
    in one function. This allows the user of this library to specify
    where objects are allocated and how.

    The arguments to the callback mean:

    1. arena

    This may be NULL, it is an optional argument that can be used
    to store memory allocation statistics or as part of an arena
    allocator.

    2. ptr

    This should be NULL if allocating new memory, or be a pointer
    to some previously allocated memory if freeing memory or
    reallocating it.

    3. oldsz

    The old size of the allocation if known; if unknown, use zero. This is
    used to prevent unnecessary allocations.

    4. newsz

    The new size of the desired allocation; this should be non-zero
    if reallocating or allocating memory. To free memory set this
    to zero, along with providing a pointer to free. If this is zero
    and the “ptr” is NULL then nothing will happen.

    5. The return value

    This will be NULL if allocating or reallocating memory and that
    operation failed. It will be non-NULL on success, containing usable
    memory. If freeing memory this should return NULL.

    An example allocator using the built in allocation routines is:

    void *allocator_cb(void *arena, void *ptr, size_t oldsz, size_t newsz) {
    	UNUSED(arena);
    	if (newsz == 0) { /* a new size of zero means free the pointer */
    		free(ptr);
    		return NULL;
    	}
    	if (newsz > oldsz) /* grow (or make a fresh allocation when ptr is NULL) */
    		return realloc(ptr, newsz);
    	return ptr; /* shrinking is skipped; the existing allocation is reused */
    }
    

    This callback is both simple and flexible, and more importantly
    puts the control of allocating back to the user (I know I have
    repeated this many times throughout this document, but it is
    worth repeating!).

    The remaining members are summarized briefly here and described in the
    sections that follow:

    compare: /* key comparison function: NULL defaults to memcmp */
    write:   /* write callback, only needed for database creation */
    flush:   /* (optional) called at end of successful creation */
    
    arena:   /* used for 'arena' argument for the allocator, can be NULL if allocator allows it */
    offset:  /* starting offset for CDB file if not at beginning of file */
    size:    /* either 0 (same as 32), 16, 32 or 64, but cannot be bigger than 'sizeof(cdb_word_t)*8' */
    
    • hash (optional)

    The “hash” callback can be set to NULL; if that is the case then
    the default hash, based off of djb2 and present in the original
    CDB library, will be used. If you do provide your own hash function
    you will effectively make this database incompatible with the standard
    CDB format, but there are valid reasons for you to do this: you might
    need a stronger hash that is more resistant to denial of service attacks,
    or perhaps you want similar keys to collide more to group them together.

    The hash function returns “cdb_word_t” so the number of bits this
    function returns is dependent on how big that type is (determined at
    compile time).
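
    As an illustration, a custom hash that forces the lowest eight bits to the
    first byte of the key (so keys sharing a first byte land in the same initial
    bucket) might look like the sketch below; as noted, doing this breaks
    compatibility with the standard format:

    #include <stdint.h>
    #include <stddef.h>
    
    static cdb_word_t group_by_first_byte_hash(const uint8_t *data, size_t length) {
    	cdb_word_t hash = 5381;
    	for (size_t i = 0; i < length; i++)
    		hash = (hash * 33) ^ data[i]; /* same djb2-style mixing as the default */
    	hash &= ~(cdb_word_t)0xFF;            /* clear the initial bucket selector... */
    	return hash | (length ? data[0] : 0); /* ...and replace it with the first key byte */
    }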

    • compare (optional)

    This function compares keys for a match; it should behave like
    memcmp, returning the same values as memcmp does on a match and a mismatch. You
    may want to change this function if you want to compare keys partially,
    however you will also need to change the hash function to ensure keys are
    sorted into the right 256 buckets for your comparison (for example, with
    the default hash function two keys with the same prefix could be stored in
    two separate buckets).

    FILE CALLBACKS

    The following callbacks act in a similar way to the file functions present
    in stdio.h. The only function missing is an ftell equivalent.

    • read

    This function is used to read data out of the database, wherever that
    data is stored. Unlike fread, a status code is returned instead of
    the length of the data read, with negative indicating failure. A partial
    read should result in a failure. The only thing lacking from this callback
    is a way to signal that non-blocking Input and Output should be performed,
    but that would complicate the internals. The “read” callback should always
    be present.

    The first parameter, “file”, is a handle to an object returned by the
    “open” callback.

    The callback should return 0 indicating no error if “length” bytes have
    been read into “buf”.

    Reading should continue from the previous file pointer position; that
    is, if you open a file handle and read X bytes, the next time you read Y
    bytes they should be read from the end of the X bytes and not the
    beginning of the file (hence why read does not take a file position).

    If implementing read callbacks in an embedded system you might have to
    implement that behavior yourself.

    • write (conditionally optional, needed for database creation only)

    Similar to the “read” callback, but instead writes data into wherever
    the database is stored.

    • seek

    This callback sets the file position that subsequent reads and writes
    occur from.

    • open

    This callback should open the resource specified by the “name” string
    (which will usually be a file name). There are two modes: a read/write
    mode (used to create the database) and a read-only mode. This callback,
    much like the “close” callback, will only be called once internally
    by the CDB library.

    • close

    This callback should close the file handle returned by “open”, freeing
    any resources associated with that handle.

    • flush (optional)

    An optional callback used for flushing writes to mass-storage. If NULL
    then the function will not be called.

    STRUCTURE VARIABLES

    • arena (optional, can be NULL, depends on your allocator)

    This value is passed into the allocator as the “arena” argument whenever
    the allocator is called. It can be NULL, which will usually be the case
    if you are just using “malloc”, “realloc” and “free” to implement the
    allocator, but if you are implementing your own arena based allocator you
    might want to set it to point to your arena (hence the name).

    • offset

    This offset can be used for CDB databases embedded within a file. If
    the CDB database does not begin at the start of the file (or flash, or
    wherever) then you can set this offset to skip over that number of
    bytes in the file.

    • size

    The size variable, which can be left at zero, is used to select
    the word size of the database; this has an interaction with “cdb_word_t”
    (the selected size cannot be bigger than ‘sizeof(cdb_word_t)*8’).

    Missing, perhaps, is an unsigned field that could contain option flags
    in each bit position.
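
    Putting the callbacks and variables together, a hosted set of callbacks and
    a filled-out “cdb_options_t” might look like the following sketch. It uses
    stdio and follows the zero-on-success, negative-on-failure convention
    described above; the names and error values are assumptions made for
    illustration, not the library's own code:

    #include <stdio.h>
    #include <stdint.h>
    
    static void *file_open_cb(const char *name, int mode) {
    	return fopen(name, mode == CDB_RW_MODE ? "wb+" : "rb");
    }
    
    static cdb_word_t file_read_cb(void *file, void *buf, size_t length) {
    	return fread(buf, 1, length, file) == length ? 0 : -1; /* a partial read is an error */
    }
    
    static cdb_word_t file_write_cb(void *file, void *buf, size_t length) {
    	return fwrite(buf, 1, length, file) == length ? 0 : -1;
    }
    
    static int file_seek_cb(void *file, uint64_t offset) {
    	return fseek(file, (long)offset, SEEK_SET) == 0 ? 0 : -1; /* always relative to the file start */
    }
    
    static int file_close_cb(void *file) {
    	return fclose(file) == 0 ? 0 : -1;
    }
    
    static int file_flush_cb(void *file) {
    	return fflush(file) == 0 ? 0 : -1;
    }
    
    static const cdb_options_t ops = {
    	.allocator = allocator_cb,  /* the example allocator shown earlier */
    	.hash      = NULL,          /* NULL selects the default djb2-style hash */
    	.compare   = NULL,          /* NULL selects memcmp */
    	.read      = file_read_cb,
    	.write     = file_write_cb, /* only needed when creating a database */
    	.seek      = file_seek_cb,
    	.open      = file_open_cb,
    	.close     = file_close_cb,
    	.flush     = file_flush_cb, /* optional */
    	.arena     = NULL,
    	.offset    = 0,             /* the database starts at the beginning of the file */
    	.size      = 0,             /* zero selects the default 32-bit format */
    };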

    BUFFER STRUCTURE

    typedef struct {
    	cdb_word_t length; /* length of data */
    	char *buffer;      /* pointer to arbitrary data */
    } cdb_buffer_t; /* used to represent a key or value in memory */
    

    FILE POSITION STRUCTURE

    typedef struct {
    	cdb_word_t position; /* position in file, for use with cdb_read/cdb_seek */
    	cdb_word_t length;   /* length of data on disk, for use with cdb_read */
    } cdb_file_pos_t; /* used to represent a value on disk that can be accessed via 'cdb_options_t' */
    

    EMBEDDED SUITABILITY

    There are many libraries written in C, for better or worse, as it is the
    lingua franca for software development at the moment. Few of those libraries
    are directly suitable for use in Embedded systems and are much less
    flexible than they could be in general. Embedded systems pose some interesting
    constraints (eschewing allocation via “malloc”, lack of a file-system, and
    more). By designing the library for an embedded system we can make a library
    more useful not only for those systems but for hosted systems as well (e.g.
    by providing callbacks for the FILE functions we can redirect them to
    wherever we like: the CDB file could be stored remotely and accessed via TCP,
    or it could be stored locally using a normal file, or it could be stored in
    memory).

    There are two sets of functions that should be abstracted out in nearly
    every library, memory allocation (or even better, the caller can pass in
    fixed length structures if possible) and Input/Output functions (including
    logging!). This library does both.

    There is one area in which the library is lacking, the I/O functions do not
    yield if there is nothing to read yet, or a write operation is taking too
    long. This does impose constraints on the caller and how the library is used
    (all calls to the library could block for an arbitrary length of time). The
    callbacks could return a status indicating the caller should yield, but
    yielding and restoring state to enable partially completed I/O to finish
    would greatly complicate the library (this would be trivial to implement if
    C had portable coroutines built into the language).

    More libraries should be written with this information in mind.

    TEST SUITE

    A special note should be made about how the test suite is handled, as it
    is important.

    It is difficult to make a good API that is easy to use, consistent, and
    difficult to misuse. Bad APIs abound in common and critical software
    (names will not be named) and can make an already difficult to use language
    like C even more difficult to use.

    One mistake that is often seen is API functionality that is conditional
    upon a macro. This complicates the build system along with every piece of
    software that is dependent on those optional calls. The most common functions
    to be optionally compiled in are test suite related functions, if they are
    present at all. For good reason these test suites might need to be removed
    from builds (as they might take up large amounts of space for code even if
    they are not needed, which is at a premium in embedded systems with limited
    flash memory).

    The header often contains code like this:

    #ifdef LIBRARY_UNIT_TESTS
    int library_unit_tests(void);
    #endif
    

    And the code like this, in C like pseudo-code:

    #ifdef LIBRARY_UNIT_TESTS
    int test_function_1(void) {
    	/* might call malloc directly, making this unsuitable
    	to be included in an embedded system */
    	return result;
    }
    
    int library_unit_tests(void) {
    	/* tests go here */
    	if (test_function_1() != OK)
    		return FAIL;
    	return PASS;
    }
    #endif
    

    In order to call this code you need to be aware of the “LIBRARY_UNIT_TESTS”
    macro each time the function “library_unit_tests” is called, and worse,
    you need to know whether or not your library was compiled with that macro
    enabled, otherwise you may get link-time errors. Another common mistake is
    not passing in the functions for I/O and allocation to the unit test
    framework, making it unsuitable for embedded use (but that is a common
    criticism of many C libraries and not just of unit tests).

    Compare this to this library’s way of handling unit tests:

    In the header:

    int cdb_tests(const cdb_options_t *ops, const char *test_file);
    

    And the relevant bits of code/pseudo-code:

    static uint64_t xorshift128(uint64_t s[2]) {
    	assert(s);
    	/* XORSHIFT-128 algorithm */
    	return NEXT_PRNG;
    }
    
    
    int cdb_tests(const cdb_options_t *ops, const char *test_file) {
    	assert(ops);
    	assert(test_file);
    	BUILD_BUG_ON(sizeof (cdb_word_t) < 2);
    
    	if (CDB_TESTS_ON == 0)
    		return CDB_OK_E;
    
    	/* LOTS OF TEST CODE NOT SHOWN, some of which
    	uses "xorshift128". */
    
    	return STATUS;
    }
    

    There is no “ifdef” surrounding any of the code (using “ifdef” anywhere to
    conditionally execute code is usually a mistake; it is only used within the
    project to set default macro values if a macro has not been previously
    defined, which is an acceptable usage).

    Two things are important here. The first is that all of the Input and Output
    and memory related functions are passed in via the “ops” structure,
    as mentioned. This means that the test code is easy to port and run on
    a microcontroller which might not have a file system (for testing and
    development purposes you might want to run the tests on a microcontroller
    but not keep them in the final product).

    The main difference is the lack of “ifdef” guards; instead, if the macro
    “CDB_TESTS_ON” is false the function “cdb_tests” returns “CDB_OK_E”
    (there is some debate over whether the return code should be this, or
    something to indicate the tests are not present, but that is a separate
    issue; the important bit is that the return value depends on whether the
    tests are present).

    This “if” statement is a far superior way of handling optional code in
    general. The caller does not have to worry if the function is present or
    not, as the function will always be present in the library. Not only that,
    but if the tests are not run because the compile time macro “CDB_TESTS_ON”
    is false then the compiler will optimize out those tests even on the lowest
    optimization settings (on any decent compiler).

    This also has the advantage that the code that is not run still goes
    through the compilation step, meaning the code is less likely to be wrong
    when refactoring. Not only that, but because “xorshift128”, which
    “cdb_tests” depends on, is declared to be static, if “CDB_TESTS_ON” is
    false it too will be eliminated from the compiled object file so long as no
    other function calls it. In actual fact, the code has changed since
    this was written: “cdb_prng”, which is equivalent to “xorshift128”, is now
    exposed in the header as it is useful in main.c.

    BUILD REQUIREMENTS

    If you are building the program from the repository at
    https://github.com/howerj/cdb you will need GNU Make and a C
    compiler. The library is written in pure C99 and should be fairly
    simple to port to another platform. Other Make implementations may
    work, however they have not been tested. git is also used as part of
    the build system.

    First clone the repository and change directory to the newly cloned
    repository:

    git clone https://github.com/howerj/cdb cdb
    cd cdb
    

    Type ‘make’ to build the cdb executable and library.

    Type ‘make test’ to build and run the cdb internal tests. The script called
    ‘t’, written in sh, does more testing, and tests that the user interface
    is working correctly. ‘make dist’ is used to create a compressed tar file for
    distribution. ‘make install’ can be used to install the binaries, however the
    default installation directory (which can be set with the ‘DESTDIR’ makefile
    variable) is a directory called ‘install’ within the repository –
    it will not actually install anything system-wide. Changing ‘DESTDIR’ to
    ‘/usr’ should install everything properly. pandoc is required to build the
    manual page for installation, which is generated from this markdown file.

    Look at the source file cdb.c to see what compile time options can be
    passed to the compiler to enable and disable features (if code size is a
    concern then the ability to create databases can be removed, for example).

    RENAME

    CDB databases are meant to be read-only; in order to add entries to
    a database, that database should be dumped and the new values added in
    along with the old ones. That is, to add a new value to the database the
    entire database has to be rebuilt. This is not a problem for some
    workloads, where the database can simply be rebuilt every X hours.

    If this does present a problem, then you should not use this database.

    However, when a database does have to be rebuilt how do you make sure
    that users of it point to the new database and not the old one?

    If you access the database via the command line applications then
    the “rename” function, which is atomic on POSIX systems, will do
    what is needed. That is, it provides a mechanism to swap out the old
    database with a new one without affecting any of the current readers.

    A rename can be done in C like so:

    rename("new.cdb", "current.cdb"); /* Atomic rename */
    

    If a reader opens “current.cdb” before the rename then it will continue
    to read the old database until it closes the handle and reopens
    “current.cdb” after the rename. The file’s data persists even if there is
    no file name that points to it, so long as there are active users of that
    file (i.e. if a file handle to that file is still open). This means that
    there could be processes that use old data, but never inconsistent data.
    If a reader opens up the data after the rename, it will get the new data.

    This also means that the writer should never write to a file that is
    currently in use by other readers or writers, it should write to a new
    file that will be renamed to the file in use, and it also means that a
    large amount of disk storage space will be in use until all users of
    the old databases switch to the new databases allowing the disk space
    to be reclaimed by the operating system.
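
    A sketch of a writer following this rule (the file names are illustrative):

    #include <stdio.h>
    
    /* 1. Build the replacement database under a temporary name, for example by
     *    calling cdb_open(&cdb, &ops, 1, "current.cdb.tmp"), adding the new
     *    key-value pairs with cdb_add, and then calling cdb_close. */
    
    /* 2. Atomically swap the new database in; readers that already hold the old
     *    file keep reading consistent (if stale) data until they reopen it. */
    if (rename("current.cdb.tmp", "current.cdb") < 0)
    	perror("rename");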

    POSSIBLE DIRECTIONS

    There are many additions that could be made to the project, however the
    code is quite compact and neat, and anything else that is needed could be
    built on top of this library. Some ideas for improvement include: adding a header
    along with a CRC, adding (unsafe) functions for rewriting key-values,
    adding (de)compression (with the shrink library) and decryption,
    integrating the project in an embedded system in conjunction with littlefs
    as an example, allowing the user to supply their own comparison and hash
    functions, adding types and schemas to the database, and more. The project
    could also be used as the primary database library for the pickle
    interpreter, or for serving static content in the eweb web-server.

    All of these would add complexity, and more code – making it more useful
    to some and less to others. As such, apart from bugs, the library and test
    driver programs should be considered complete.

    The lack of a header might be solved in creative ways:

    • The integrity of most of the file can be checked by making sure all pointers are
      within bounds, that key-value pairs are stored one after another and that
      each key is in the right bucket for that hash. The only things not checked
      would be the values (they would still have to be of the right length).
    • If a file successfully passes a verification it can be identified as a valid
      CDB file of that size, this means we would not need to store header
      information about the file type and structure. This has been verified
      experimentally (the empty and randomly generated databases of a different
      size do not pass verification when the incorrect size is specified with
      the “-b” option).
    • We could place the header within the key-value section of the database, or
      even at the end of the file.

    Things that should and could be done, but have not:

    • Fuzzing with American Fuzzy Lop to iron out the most egregious
      bugs, security relevant or otherwise. This has been used on the pickle
      library to great effect and it finds bugs that would not be caught by unit
      testing alone. The library is currently undergoing fuzzing; nothing
      bad has been found so far.
    • The current library implements a system for looking up data
      stored on disk; a system could be created that does much more.
      Amongst the things that could be done are:

      • Using the CDB file format only as a serialization format
        for an in memory database which would allow key deletion/replacing.
        This Key-Value store would essentially just be an in memory hash
        table with a fancy name, backed by this library. The project could
        be done as part of this library or as a separate project.
      • Implementing the memcached protocol to allow remote querying
        of data.
      • Alternatively make a custom protocol that accepts commands over
        UDP. There are a few implementation strategies for doing this.
    • Alternatively, just a simple Key-Value store that uses this database
      as a back-end without anything else fancy.
    • Changing the library interface so it is a header only C library.
    • Making a set of callbacks to allow an in memory CDB database, useful
      for embedding the database within binaries.
    • Designing a suite of benchmarks for similar databases and implementations
      of CDB, much like https://docs.huihoo.com/qdbm/benchmark.pdf.

    Porting this to Rust and making a crate for it would be nice,
    although implementations already exist.
    Just making bindings for this library would be a good initial step, along
    with bindings for other languages.

    For more things that are possible to do:

    • The API supplies a for-each loop mechanism where the user supplies a
      callback, an iterator based solution would be more flexible (but slightly
      more error prone to use).
    • The user can specify their own hash algorithm, using one with perhaps
      better characteristics for their purposes (and breaking compatibility
      with the original format). One interesting possibility is using a hashing
      algorithm that maximizes collisions of similar keys, so similar keys are
      grouped together which may be useful when iterating over the database.
      Unfortunately the initial 256 wide bucket system interferes with this,
      which could be remedied by returning zero for the lowest eight bits, degrading
      performance. It is not really viable to do this with this system, but
      hashing algorithms that maximize collisions, such as SOUNDEX, are
      interesting and deserve a mention. This could be paired with a user
      supplied comparison function for comparing the keys themselves.
    • The callbacks for the file access words (“open”, “read”, …) deserve
      their own structure so it can be reused, as the allocator can, although
      it may require some changes to how those functions work (such as different
      return values, passing in a handle to arbitrary user supplied data, and
      more).
    • Options for making the file checking more lax, as information could
      be stored between the different key/value pairs making the file format
      semi-compatible between implementations. This could be information usually
      stored in the header, or information about the key/values themselves (such
      as type information). Some implementations, including this one, are
      more strict in what they accept.
    • Some of the functions in main.c could be moved into cdb.c so
      users do not have to reimplement them.
    • A poor-performance Bloom-filter-like algorithm can be made
      using the first-level hash table. A function that returns whether an
      item may be in the set or is definitely not can be made by checking
      whether there are any items in the first-level bucket (of the 256) that
      the key hashes to. That table is small enough to fit in memory, as are
      the second-level hash tables, which could be used to improve performance
      even more (see the sketch after this list).
    • If the user presorts the keys when adding the data then the keys can
      be retrieved in order using the “foreach” API call. The user could sort
      on the data instead if they like.
    • The way version information is communicated within the API is perhaps
      not the best way of doing it. A simple macro would suffice.
    • The file format really could use a redesign. One improvement apart
      from adding a header would be to move the 256 bucket initial hash table
      to the end of the file so the entire file format could be streamed to
      disk.
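
    To make the Bloom-filter-like idea above concrete, here is a minimal
    Python 3 sketch for the original 32-bit format. It only reads the
    256-entry first-level table (whose layout, along with the hash function,
    is shown in the Python implementation later in this document); it is an
    illustration of the idea, not part of the library:

    from struct import unpack

    def calc_hash(key):
        # djb2-xor hash used by the original 32-bit CDB format (key: bytes)
        h = 5381
        for b in key:
            h = (((h << 5) + h) ^ b) & 0xffffffff
        return h

    def may_contain(fp, key, pos_header=0):
        # A key *may* be present only if the first-level bucket it hashes
        # to is non-empty; an empty bucket means "definitely not present".
        fp.seek(pos_header + (calc_hash(key) % 256) * (4 + 4))
        _pos_bucket, ncells = unpack('<LL', fp.read(4 + 4))
        return ncells != 0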

    BUGS

    For any bugs, email the author. It comes with a ‘works on my machine
    guarantee’. The code has been written with the intention of being portable,
    and should work on 32-bit and 64-bit machines. It is tested more frequently
    on a 64-bit Linux machine, and less frequently on Windows. Please give a
    detailed bug report (including but not limited to what machine/OS you are
    running on, compiler, compiler version, a failing example test case, your
    blood type and star sign, etcetera).

    PYTHON IMPLEMENTATION

    A Python implementation is available from
    https://www.unixuser.org/~euske/doc/cdbinternals/index.html. It is
    probably the most succinct description of the format, and is
    understandable by someone not versed in Python.

    #!/usr/bin/env python
    
    # Python implementation of cdb
    
    # calc hash value with a given key
    def calc_hash(s):
      return reduce(lambda h,c: (((h << 5) + h) ^ ord(c)) & 0xffffffffL, s, 5381)
    
    # cdbget(fp, basepos, key)
    def cdbget(fp, pos_header, k):
      from struct import unpack
    
      r = []
      h = calc_hash(k)
    
      fp.seek(pos_header + (h % 256)*(4+4))
      (pos_bucket, ncells) = unpack('<LL', fp.read(4+4))
      if ncells == 0: raise KeyError
    
      start = (h >> 8) % ncells
      for i in range(ncells):
        fp.seek(pos_bucket + ((start+i) % ncells)*(4+4))
        (h1, p1) = unpack('<LL', fp.read(4+4))
        if p1 == 0: raise KeyError
        if h1 == h:
          fp.seek(p1)
          (klen, vlen) = unpack('<LL', fp.read(4+4))
          k1 = fp.read(klen)
          v1 = fp.read(vlen)
          if k1 == k:
            r.append(v1)
            break
      else:
        raise KeyError
    
      return r
    
    
    # cdbmake(filename, hash)
    def cdbmake(f, a):
      from struct import pack
    
      # write cdb
      def write_cdb(fp):
        pos_header = fp.tell()
    
        # skip header
        p = pos_header+(4+4)*256  # sizeof((h,p))*256
        fp.seek(p)
    
        bucket = [ [] for i in range(256) ]
        # write data & make hash
        for (k,v) in a.iteritems():
          fp.write(pack('<LL',len(k), len(v)))
          fp.write(k)
          fp.write(v)
          h = calc_hash(k)
          bucket[h % 256].append((h,p))
          # sizeof(keylen)+sizeof(datalen)+sizeof(key)+sizeof(data)
          p += 4+4+len(k)+len(v)
    
        pos_hash = p
        # write hashes
        for b1 in bucket:
          if b1:
            ncells = len(b1)*2
            cell = [ (0,0) for i in range(ncells) ]
            for (h,p) in b1:
              i = (h >> 8) % ncells
              while cell[i][1]:  # is cell[i] already occupied?
                i = (i+1) % ncells
              cell[i] = (h,p)
            for (h,p) in cell:
              fp.write(pack('<LL', h, p))
    
        # write header
        fp.seek(pos_header)
        for b1 in bucket:
          fp.write(pack('<LL', pos_hash, len(b1)*2))
          pos_hash += (len(b1)*2)*(4+4)
        return
    
      # main
      fp=file(f, "wb")
      write_cdb(fp)
      fp.close()
      return
    
    
    # cdbmake by python-cdb
    def cdbmake_true(f, a):
      import cdb
      c = cdb.cdbmake(f, f+".tmp")
      for (k,v) in a.iteritems():
        c.add(k,v)
      c.finish()
      return
    
    
    # test suite
    def test(n):
      import os
      from random import randint
      a = {}
      def randstr():
        return "".join([ chr(randint(32,126)) for i in xrange(randint(1,1000)) ])
      for i in xrange(n):
        a[randstr()] = randstr()
      #a = {"a":"1", "bcd":"234", "def":"567"}
      #a = {"a":"1"}
      cdbmake("my.cdb", a)
      cdbmake_true("true.cdb", a)
      # check the correctness
      os.system("cmp my.cdb true.cdb")
    
      fp = file("my.cdb")
      # check if all values are correctly obtained
      for (k,v) in a.iteritems():
        (v1,) = cdbget(fp, 0, k)
        assert v1 == v, "diff: "+repr(k)
      # check if nonexistent keys get error
      for i in xrange(n*2):
        k = randstr()
        try:
          v = a[k]
        except KeyError:
          try:
            cdbget(fp, 0, k)
            assert 0, "found: "+k
          except KeyError:
            pass
      fp.close()
      return
    
    if __name__ == "__main__":
      test(1000)
    

    This tests the python version implemented here against another python
    implementation. It only implements the original 32-bit version.

    COPYRIGHT

    The libraries, documentation, and the test driver program are licensed under
    the Unlicense. Do what thou wilt.

    Visit original content creator repository
    https://github.com/howerj/cdb

  • karbakar

    General description

    Karbakar is a new economic system in which value is determined by users'
    judgment of the quality and quantity of services, not by money.
    Karbakar is a platform for creating a decentralized economy among producers.
    People introduce themselves by creating a personal page.
    They then either create their own business or, by accepting the employment
    requests of other businesses, become employed by them.

    Goods are exchanged as follows: each business specifies the amount of
    product it wants to offer on the platform, along with the businesses it
    wants to receive those products. The quality and quantity of the services a
    business provides become the measure by which other businesses decide
    whether to provide services to it in return.

    The different parts of the software

    Items marked with * have not been built yet.

    The welcome page, i.e. the login and sign-up page

    Sign-up and login are done by entering a mobile phone number and a
    verification code sent via SMS.
    The token is valid for 1 month, and on later visits the user is taken
    straight to the main page unless they choose to log out.
    If the user has registered before, they are logged in; if not, a new
    account is created for the entered mobile number and they are logged in.
    This way, nobody can tell whether a given number is a member of the site.

    Therefore (a minimal sketch of this decision logic follows below):
    If the phone number is in the system and the code is correct, the user is logged in.
    If the phone number is in the system and the code is wrong, it says the code is incorrect.
    If the phone number is not in the system and the code is correct, the user is registered and logged in.
    If the phone number is not in the system and the code is wrong, it says the code is incorrect.
    Acceptance of the terms of service
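
    A minimal sketch of the sign-in decision above (Python, purely illustrative;
    the helper names account_exists, code_is_valid and create_account are
    hypothetical, and the real application is a JavaScript web app):

    def sign_in(phone, code, account_exists, code_is_valid, create_account):
        # The same flow handles both login and registration, so a caller
        # cannot tell whether a phone number already belongs to a member.
        if not code_is_valid(phone, code):
            return "error: the code is incorrect"
        if not account_exists(phone):
            create_account(phone)  # silently register on first valid login
        return "logged in"  # issue a token valid for one month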

    Personal page

    Each user's profile includes the following:

    • Code

      A number starting from 1000, used for searching for people

    • Membership date

      The remaining details are set to defaults and can be changed by the user:

    • List of businesses the user is a member of
    • Name, 30 characters
    • Avatar
    • Header image
    • Description, 150 characters
    • Description, 300 characters
    • Email
    • Instagram
    • Phone number
    • Personal page link
    • List of businesses, including the main and secondary businesses

    Business page

    A user who is already a member of 3 businesses cannot create a new business
    or submit a membership request.
    The business representative can only resign by replacing themselves with
    another member of the business; even if nobody else is a member, they
    cannot leave the business until someone else joins and takes their place.
    In businesses with more than 5 members, the representative rotates every month.*
    The business page includes the following:

    • A 20-character ID made of contiguous English letters

      The remaining details are added to the business once created:

    • List of employees
    • Business representative
    • Brand name, 30 characters
    • Trade category
    • Avatar
    • Header image
    • Creation date
    • Description, 150 characters
    • Description, 300 characters
    • Email
    • Website link
    • Phone number
    • Physical address
    • List of businesses that receive its products
    • Complete list of distributed products
    • Summary list of distributed products by amount and unit of measurement,
      per category, over the past year and beyond
    • List of products offered this month, by quantity, unit of measurement and
      product name, for businesses or for members of businesses
    • Percentage of last month's commitments fulfilled
    • List of suppliers that have placed this business in their receiver list
    • Products received, by time, trade, quantity and unit of measurement

      Example:
      172 services from the auto repair trade
      11 kilograms from the fruit and vegetable trade
      55 kilograms from the protein (meat) trade
      1235 items from the bakery trade

    Logic:

    • A business is created by choosing a trade and a name of 20 contiguous
      English characters; it consists of a single page with its own link.
    • When editing any section of an existing business, once its contents change
      a “Save changes” option is shown; clicking it updates the database.
    • The trade opens as a cascading list of trades; if the business's activity
      does not match any item in the existing list, a new trade can be created.
    • The person who creates a production/service unit is its representative.
    • A production/service unit's information is visible to everyone and
      editable by its representative.
    • For units with more than five members, the representative is chosen at
      random every month from among the members who volunteer for the role.*
    • The representative can add or remove members of the unit; for units with
      more than five members this must be confirmed by five randomly chosen
      members, and if it is not confirmed within 24 hours the change is cancelled.

    Main page

    Header

    The header of this page contains the user's avatar; clicking it takes you
    to your own profile page.
    Next to it is the notifications icon, which shows the following reports:

    • Employment request: employment requests are created by business
      representatives and sent to users, who can accept or reject them in the
      notifications section
    • Employment request accepted/rejected: the result of a new user's
      employment request is announced to all members of the business
    • Request to join a business's receiver list: a business applies by going
      to another business's profile and pressing the button to be added to its
      receiver list; this report is shown to all members of the business, and
      the representative can accept or reject it
    • Receiver-list membership report: whenever members are added to or removed
      from the receiver list, this report is sent to all members
    • Supplier-list membership report: whenever a business is added to the
      supplier list, a report is sent to all members

    In the middle of the top of the page is the search box; entering a person's
    name or code, or a company's name or ID, displays the matching list.
    At the end is the hamburger menu, which opens the menu section containing
    the following features:

    • List of businesses, with the main business shown at the top
    • Button to create a new business
    • Resign from a business
    • List of all businesses
    • Log out of the site

    Tabs

    For a user who is not logged in
    The main page has four tabs, each with a floating action button at the bottom:
    1. Receive
    2. Provide
    3. Invoice
    4. Alliance

    Receive tab

    Each business specifies the amount of goods it wants to give to others, for
    consumption by members or by the business itself, over a one-month period.
    Receivers can then coordinate and arrange to collect the shared goods up to
    the specified cap.
    If part of the goods has not been delivered by the end of the one-month
    deadline, a note is written on the supplier's profile stating what
    percentage of the products it shared the previous month were not delivered.

    This page shows the businesses that have committed to providing their
    products and services to the user's business.
    The list is sorted by geographic distance, and if the user is a business
    representative, products intended for business consumption are shown above
    those intended for member consumption.

    Declare-need button

    After selecting the businesses they represent, the user can request
    membership in the receiver lists of the businesses of a trade by choosing
    the trade, registering the geographic area of those trades, and writing an
    optional 100-character note.
    If the business has already sent a request for that trade, it should say:
    “You have already sent a request for this trade. Do you want to edit the
    description of the previous request?”
    At the top of the page, the requests previously posted for that business
    can be seen.

    Provide tab

    The Provide tab shows the businesses that need the services of the user's
    trade, sorted by distance.
    The user can tap any business to open its page.

    Each business has a list of receivers and a list of suppliers, and the size
    of each list is visible to everyone.
    There are two ways to end up in a receiver list:
    1. When an invoice from you is confirmed, the receiver of the services or
       goods appears in the receiver list on your business page.
    2. A supplier clicks the “Add to receiver list” button on the profile of a
       business it likes; if that business's representative confirms, it is
       placed in the receiver list.
    When you open either list, the rows are ordered by the date of the last
    delivery, and the list of products each receiver has received is shown.

    Change-production-capacity button

    This page, visible to the business representative, contains the list of
    products offered monthly by their businesses.
    The representative can add a product to the list of offered products by
    choosing the product, its amount, and whether it is consumed by the
    business or usable by members of businesses.
    At the end of the month, if a month has passed since a row of this list was
    created, it is shown on the business page under “products offered during
    the last month”, and if it was not delivered, the unfulfilled percentage is
    shown in red.

    Alliance tab

    From the bottom up, this tab is divided into the following sections:

    • Alliances that need products from your businesses
    • Alliances you have joined that still have outstanding offers and needs
    • Your alliances whose needs and offers are complete and whose members must
      confirm one another
    • Your active alliances

    Logic:

    Creating an alliance starts with a business registering its need for a
    specific amount of products and services from one or more trades over a
    specific period; this declared need is shown to the representatives of the
    supplying trades in order of geographic distance.
    The trade whose products you need must already exist.
    If the product you need is not in that trade's product list, it is created
    as “not confirmed by an invoice”.
    All members must confirm all the others.
    Businesses are confirmed once the alliance is complete.
    Participants can leave at any stage before the alliance becomes active.
    Rejected businesses are removed from the alliance.
    Records of formed alliances are never deleted and are shown on the business page.
    While the alliance is active, products a business delivers to the other
    businesses are recorded in the alliance's offered products.
    Each business can create only 5 alliances per month.

    Create-alliance button

    Any business can create an alliance in this section.
    To create an alliance, you first select your business.
    You choose a name for the alliance.
    You enter a description for the alliance.
    You set the duration of the alliance.
    You specify the products you want to offer, along with their amounts, and
    build the offer basket.
    You find and select the trade you want to receive products from.
    You specify the basket of products you want.
    You register the alliance.

    Invoice tab

    The business representative can create an invoice containing the trade,
    products, unit of measurement and amount. Until confirmed, these invoices
    are shown to the customer in a list; the cap on received invoices is 100,
    and the user can delete received invoices.
    The list of sent but unconfirmed invoices is visible to the business
    representative; this list is also capped at 100, and the representative can
    delete any invoice they no longer want so that new invoices can be sent.
    All invoices are stored in the invoice collection and contain the trade,
    products, unit of measurement, amount, and a confirmation boolean.
    The representative of each production/service unit can send any member of
    the cooperative a receipt for delivered products and services; once the
    receiving member confirms it, the contents of the receipt are added to the
    unit's list of distributed products.
    The receipt contains the product or service category along with its unit of
    measurement, written generically, for example: 5 items of clothing,
    6 kilograms of red meat, 2 split air-conditioner installations,
    140 square meters of house painting.
    These receipts are added to the unit's history.

    Product list

    A trade is first recorded in a business when that business is created.
    Then, when an invoice is sent, it is added to the invoice collection along
    with its products and units of measurement; if receipt is confirmed, the
    trade name, its products and their units of measurement are added to the
    site's trade-and-products collection.
    On a separate page of the site, the product list shows the list of trades,
    with product names displayed under each trade.
    Products that do not appear in any invoice for a year are removed from the
    trade list.

    *********** Future updates

    Karbakar

    Members of the first business, which is Karbakar itself, can ban other
    users and businesses for 3 days, 7 days or permanently (banned people and
    businesses are not shown in search, and banned people are removed from the
    businesses they belong to; if they are the last member of a business they
    are not removed, but that business is banned as well).

    Business verification

    On every business page there is an icon called the verification checkmark,
    shown as either active or inactive.
    Selecting the checkmark opens a page with 2 sections:
    1. The list of businesses this business has verified
    2. The list of businesses that have verified this business

    ⦁ Each business can verify 6 other businesses and can revoke verifications
      it has given
    ⦁ An unlimited number of businesses can verify a single business

    Conditional verification: if 3 businesses with an active checkmark verify a
    business, that business's checkmark becomes active.

    ⦁ If the number of verifiers drops below the minimum, both sections of that
      business's verification lists, and both sections of the verification
      lists of the businesses that verified it, are cleared

    ⦁ For businesses without the verification checkmark, the receipt and
      delivery of their goods is not recorded, but the history of previously
      recorded activity remains

    ⦁ Businesses that are inactive (no exchanges for 3 months) have their
      checkmark deactivated and their 2 lists cleared

    The Karbakar business's checkmark has been active from day one, and nobody
    can verify it.

    Visit original content creator repository
    https://github.com/ElyasMehraein/karbakar

  • j360

    This project shows how to export 4K resolution 360 Videos and Photos from inside of Three.js scenes.

    The process is described in this blog post: https://medium.com/p/788226f2c75f

    Examples

    example 4k video from demo scene on YouTube


    example 4k test video on YouTube


    How this works

    Basically, you take a cube camera, save its output to an equirectangular photo, and then stitch those photos together to make a video. Add some metadata and voila! You can then post the results to Facebook and YouTube.

    I made some modifications to the CCapture.js library, adding a CC360Encoder class that calls into a cubemap-to-equirectangular image capture library from the same author. I made modifications to that library as well, preparing the cube camera data for the encoder with the preBlob class. Finally, I was running into memory issues very quickly, so I re-implemented the broken batching in CCapture.js for .jpg sequences.

    The app will capture a batch every N seconds, according to the autoSaveTime parameter. Save and unarchive these .tar files, then use FFMPEG to stitch the images together. See the post on Medium for more about metadata.

    Try Online

    demo scene

    simple tests

    Example files

    Clone the repository and serve its files using a webserver of your choice.

    index.html contains simple test shapes. moving the camera during capture has no effect.

    demo.html is hacked into a three.js demo scene. moving the camera during capture will change the final shot.

    Use it yourself

    Include the modified CCapture.js and CubeMapToEquirectangular.js libraries. You’ll need tar.js and download.js as well. Which controls to include are up to you.

    Instantiate a capturer. Batches will download automatically every N seconds according to the autoSaveTime property.

    // Create a capturer that exports Equirectangular 360 JPG images in a TAR file
    var capturer360 = new CCapture({
        format: 'threesixty',
        display: true,
        autoSaveTime: 3,
    });
    

    Add a managed CubemapToEquirectangular camera when you setup your scene.

    Here we use “4K” but you can also use “2K” or “1K” as resolutions.

    equiManaged = new CubemapToEquirectangular(renderer, true,"4K");

    Call the capture method at the end of your render loop, and give it your canvas.

    capturer360.capture(canvas);

    These functions will start and stop the recording.

    function startCapture360(event) {
        capturer360.start();
    }
    
    function stopCapture360(event) {
        capturer360.stop();
    }
    

    Unarchive, Convert, and Add Metadata

    Unarchive the .tar files to a single folder and then convert the whole folder of images into a movie with one FFMPEG command

    ffmpeg -i %07d.jpg video.mp4

    The “%07d” tells FFMPEG that each filename consists of a zero-padded 7-digit number before the “.jpg” extension.

    In tests of a 30 second capture, I’ve seen a 1.66GB folder of 4K 360 images compress into a single 3.12mb 4K 360 video. A lot depends on how much movement there is in the scene, but the reductions are dramatic.

    Then use the Spatial Media Metadata Injector to add spatial metadata and upload.

    Contact

    Get in touch with me on LinkedIn for custom 360 content or more versatile deployments of this software.

    https://www.linkedin.com/in/jamespollack

    Visit original content creator repository https://github.com/imgntn/j360
  • spacex

    This project was bootstrapped with Create React App.

    Available Scripts

    In the project directory, you can run:

    npm start

    Runs the app in the development mode.
    Open http://localhost:3000 to view it in the browser.

    The page will reload if you make edits.
    You will also see any lint errors in the console.

    npm test

    Launches the test runner in the interactive watch mode.
    See the section about running tests for more information.

    npm run build

    Builds the app for production to the build folder.
    It correctly bundles React in production mode and optimizes the build for the best performance.

    The build is minified and the filenames include the hashes.
    Your app is ready to be deployed!

    See the section about deployment for more information.

    npm run eject

    Note: this is a one-way operation. Once you eject, you can’t go back!

    If you aren’t satisfied with the build tool and configuration choices, you can eject at any time. This command will remove the single build dependency from your project.

    Instead, it will copy all the configuration files and the transitive dependencies (webpack, Babel, ESLint, etc) right into your project so you have full control over them. All of the commands except eject will still work, but they will point to the copied scripts so you can tweak them. At this point you’re on your own.

    You don’t have to ever use eject. The curated feature set is suitable for small and middle deployments, and you shouldn’t feel obligated to use this feature. However we understand that this tool wouldn’t be useful if you couldn’t customize it when you are ready for it.

    Learn More

    You can learn more in the Create React App documentation.

    To learn React, check out the React documentation.

    Visit original content creator repository
    https://github.com/ZeeshanRaza25/spacex

  • gatsby-source-prismic-graphql

    gatsby-source-prismic-graphql

    A Gatsby plugin for fetching source data from the Prismic headless CMS using Prismic’s beta GraphQL API. This plugin provides full support for Prismic’s preview feature out of the box.

    For more context, be sure to check out Prismic’s getting started guide: Using Prismic With Gatsby. This README, however, serves as the most up-to-date source of information on gatsby-source-prismic-graphql’s latest developments and breaking changes.

    Please be sure your Prismic repository has the GraphQL API enabled. It is enabled by default on all new Prismic repositories. If you have an older repository or are unable to access https://[your_repo].prismic.io/graphql, please reach out to Prismic support to request the GraphQL API.

    Differences From gatsby-source-prismic

    gatsby-source-prismic-graphql (this plugin) fetches data using Prismic’s beta GraphQL API and provides full support for Prismic’s Preview feature out of the box. It also provides an easy-to-configure interface for page generation.

    gatsby-source-prismic is a different plugin that fetches data using Prismic’s REST and Javascript APIs. Previews must be coded up separately.

    Getting Started

    Install the plugin

    npm install --save gatsby-source-prismic-graphql

    or

    yarn add gatsby-source-prismic-graphql

    Add plugin to gatsby-config.js and configure

    {
      resolve: 'gatsby-source-prismic-graphql',
      options: {
        repositoryName: 'gatsby-source-prismic-test-site', // required
        defaultLang: 'en-us', // optional, but recommended
        accessToken: '...', // optional
        prismicRef: '...', // optional, default: master; useful for A/B experiments
        path: '/preview', // optional, default: /preview
        previews: true, // optional, default: true
        pages: [{ // optional
          type: 'Article', // TypeName from prismic
          match: '/article/:uid', // pages will be generated under this pattern
          previewPath: '/article', // optional path for unpublished documents
          component: require.resolve('./src/templates/article.js'),
          sortBy: 'date_ASC', // optional, default: meta_lastPublicationDate_ASC; useful for pagination
        }],
        extraPageFields: 'article_type', // optional, extends pages query to pass extra fields
        sharpKeys: [
          /image|photo|picture/, // (default)
          'profilepic',
        ],
      }
    }

    Edit your gatsby-browser.js

    const { registerLinkResolver } = require('gatsby-source-prismic-graphql');
    const { linkResolver } = require('./src/utils/linkResolver');
    
    registerLinkResolver(linkResolver);

    Usage

    Automatic Page Generation

    You can generate pages automatically by providing a mapping configuration under the pages option in gatsby-config.js.

    Let’s assume we have the following page configuration set:

    {
      pages: [{
        type: 'Article',
        match: '/blogpost/:uid',
        previewPath: '/blogpost',
        component: require.resolve('./src/templates/article.js'),
      }],
    }

    If you have two blog posts with UIDs of foo and bar, the following URL slugs will be generated:

    • /blogpost/foo
    • /blogpost/bar

    If you create a new unpublished blogpost, baz, it will be accessible for preview (assuming you’ve established a preview session with Prismic) under:

    • /blogpost?uid=baz

    More on Prismic Previews below.

    Conditionally generating pages

    If the default page generation doesn’t cover your use-case, you can provide an optional filter option to your individual page configurations.

    For example, if you had a single Prismic Article type and wanted pages with music in their UIDs to be generated at a different URL:

    {
      pages: [{
        type: 'Article',
        match: '/musicblog/:uid',
        filter: data => data.node._meta.uid.includes('music'),
        previewPath: '/blogposts',
        component: require.resolve('./src/templates/article.js'),
      }, {
        type: 'Article',
        match: '/blog/:uid',
        filter: data => !data.node._meta.uid.includes('music'),
        previewPath: '/blogposts',
        component: require.resolve('./src/templates/article.js'),
      }],
    }

    Given 3 articles with UIDs of why-i-like-music, why-i-like-sports and why-i-like-food, the following URL slugs will be generated:

    • /musicblog/why-i-like-music
    • /blog/why-i-like-sports
    • /blog/why-i-like-food

    Generating pages from page fields

    Sometimes the meta provided by default doesn’t contain enough context to be able to filter pages effectively. By passing extraPageFields to the plugin options, we can extend what we can filter on.

    {
      extraPageFields: 'music_genre',
      pages: [{
        type: 'Article',
        match: '/techno/:uid',
        filter: data => data.node.music_genre === 'techno',
        previewPath: '/blogposts',
        component: require.resolve('./src/templates/article.js'),
      }, {
        type: 'Article',
        match: '/acoustic/:uid',
        filter: data => data.node.music_genre === 'acoustic',
        previewPath: '/blogposts',
        component: require.resolve('./src/templates/article.js'),
      }]
    }

    Given 2 articles with the music_genre field set, we’ll get the following slugs:

    • /techno/darude
    • /acoustic/mik-parsons

    Support for Multiple Languages

    Prismic allows you to create your content in multiple languages. This library supports that too. When setting up your configuration options in gatsby-config.js, there are three optional properties you should be aware of: options.defaultLang, options.langs, and options.pages[i].langs. In the following example, all are in use:

    {
      resolve: 'gatsby-source-prismic-graphql',
      options: {
        repositoryName: 'gatsby-source-prismic-test-site',
        defaultLang: 'en-us',
        langs: ['en-us', 'es-es', 'is'],
        path: '/preview',
        previews: true,
        pages: [{
          type: 'Article',
          match: '/:lang?/:uid',
          previewPath: '/article',
          component: require.resolve('./src/templates/article.js'),
          sortBy: 'date_ASC',
          langs: ['en-us', 'es-es', 'is'],
        }, {
          type: "Noticias",
          match: '/noticias/:uid',
          previewPath: '/noticias',
          component: require.resolve('./src/templates/noticias.js'),
          sortBy: 'date_ASC',
          langs: ['es-es'],
        }],
      }
    }

    In the example above, pages are generated for two document types from Prismic: Articles and Noticias. The latter consists of news stories in Spanish. There are three languages in use in this blog: US English, Traditional Spanish, and Icelandic.

    For Articles, we are instructing the plugin to generate pages for articles of all three languages. But, because there is a question mark (?) after the :lang portion of the match property (/:lang?/:uid), we only include the locale tag in the URL slug for languages that are not the defaultLang specified above (i.e., ‘en-us’). So for the following languages, these are the slugs generated:

    • US English: /epic-destinations
    • Spanish: /es-es/destinos-increibles
    • Icelandic: /is/reykjadalur

    If we had not specified a defaultLang, the slug for US English would have been /en-us/epic-destinations. And, in fact, including the langs: ['en-us', 'es-es', 'is'] declaration for this particular document type (Articles) is unnecessary because we already specified that as the default language set right after defaultLang in the plugin options.

    For Noticias, however, we only want to generate pages for Spanish documents of that type (langs is [es-es]). We decide that in this context, no locale tag is needed in the URL slug; “noticias” is already enough indication that the contents are in Spanish. So we omit the :lang match entirely and specify only match: '/noticias/:uid'.

    This is an example of how these three properties can be used together to offer maximum flexibility. To see this in action, check out the languages example app.

    (Optional) Short language codes

    To use short language codes (e.g. /fr/articles) instead of the default (e.g. /fr-fr/articles), you can set options.shortenUrlLangs to true.

    Keep in mind that if you use this option & have multiple variants of a language (e.g. en-us and en-au) that would be shortened to the same value, you should add UIDs to your URLs to differentiate them.

    Page Queries: Fetch Data From Prismic

    It is very easy to fetch data from Prismic in your pages:

    import React from 'react';
    import { graphql } from 'gatsby';
    import { RichText } from 'prismic-reactjs';
    
    export const query = graphql`
      {
        prismic {
          page(uid: "homepage", lang: "en-us") {
            title
            description
          }
        }
      }
    `;
    
    export default function Page({ data }) {
      return (
        <>
          <h1>{RichText.render(data.prismic.page.title)}</h1>
          <h2>{RichText.render(data.prismic.page.description)}</h2>
        </>
      );
    }

    Prismic Previews

    Previews are enabled by default, however they must be configured in your prismic instance/repository. For instructions on configuring previews in Prismic, refer to Prismic’s guide: How to set up a preview.

    When testing previews, be sure you are starting from a valid Prismic preview URL/path. The most reliable way to test previews is by using the preview button from your draft in Prismic. If you wish to test the Preview locally, catch the URL that opens immediately after clicking the preview link:

    https://[your-domain.tld]/preview?token=https%3A%2F%[your-prismic-repo].prismic.io%2Fpreviews%2FXRag6xAAACA...ABwjduaa%3FwebsitePreviewId%3DXRA...djaa&documentId=XRBH...jduAa

    Then replace the protocol and domain at the beginning of the URL with your localhost:PORT instance, or wherever you’re wanting to preview from.

    This URL will be parsed and replaced by the web app and browser with the proper URL as specified in your page configuration.

    StaticQuery and useStaticQuery

    You can use StaticQuery as usual, but if you would like to preview them, you must use the withPreview function.

    See the example

    import { StaticQuery, graphql } from 'gatsby';
    import { withPreview } from 'gatsby-source-prismic-graphql';
    
    const articlesQuery = graphql`
      query {
        prismic {
          ...
        }
      }
    `;
    
    export const Articles = () => (
      <StaticQuery
        query={articlesQuery}
        render={withPreview(data => { ... }, articlesQuery)}
      />
    );

    useStaticQuery is not yet supported.

    Fragments

    Fragments are supported for both page queries and static queries.

    See the example

    Within page components:

    import { graphql } from 'gatsby';
    
    const fragmentX = graphql` fragment X on Y { ... } `;
    
    export const query = graphql`
      query {
        ...X
      }
    `;
    
    const MyPage = (data) => { ... };
    MyPage.fragments = [fragmentX];
    
    export default MyPage;

    With StaticQuery:

    import { StaticQuery, graphql } from 'gatsby';
    import { withPreview } from 'gatsby-source-prismic-graphql';
    
    const fragmentX = graphql` fragment X on Y { ... } `;
    
    export const query = graphql`
      query {
        ...X
      }
    `;
    
    export default () => (
      <StaticQuery
        query={query}
        render={withPreview(data => { ... }, query, [fragmentX])}
      />
    );

    Dynamic Queries and Fetching

    You can use this plugin to dynamically fetch data for your component using prismic.load. Refer to the pagination example to see it in action.

    import React from 'react';
    import { graphql } from 'gatsby';
    
    export const query = graphql`
      query Example($limit: Int) {
        prismic {
          allArticles(first: $limit) {
            edges {
              node {
                title
              }
            }
          }
        }
      }
    `;
    
    export default function Example({ data, prismic }) {
      const handleClick = () =>
        prismic.load({
          variables: { limit: 20 },
          query, // (optional)
          fragments: [], // (optional)
        });
    
      return (
        // ... data
        <button onClick={handleClick}>load more</button>
      );
    }

    Pagination

    Pagination can be accomplished statically (i.e., during initial page generation) or dynamically (i.e., with JS in the browser). Examples of both can be found in the pagination example.

    Prismic pagination is cursor-based. See Prismic’s Paginate your results article to learn about cursor-based pagination.

    By default, pagination will be sorted by last publication date. If you would like to change that, specify a sortBy value in your page configuration in gatsby-config.js.

    Dynamically-Generated Pagination

    When coupled with prismic.load, as demonstrated in the index page of the pagination example, other pages can be fetched dynamically using page and cursor calculations.

    GraphQL documents from Prismic have a cursor: a base64-encoded string that represents their order, or page number, in the set of all documents queried. We provide two helpers for converting between cursor strings and page numbers:

    • getCursorFromDocumentIndex(index: number)
    • getDocumentIndexFromCursor(cursor: string)

    Statically-Generated Pagination

    Basic Pagination

    For basic linking between the pages, metadata for the previous and next pages are provided to you automatically via pageContext in the paginationPreviousMeta and paginationNextMeta properties. These can be used in conjunction with your linkResolver to generate links between pages without any additional GraphQL query. For an example of this, take a look at the <Pagination /> component in the pagination example’s article.js.

    Enhanced Pagination

    If you would like to gather other information about previous and next pages (say a title or image), simply modify your page query to retrieve those documents. This also is demonstrated in the same pagination example with the <EnhancedPagination /> component and the page’s GraphQL query.

    Working with gatsby-image

    The latest versions of this plugin support gatsby-image: for every field whose name matches the sharpKeys option (which defaults to /image|photo|picture/), a companion field with the Sharp suffix is added to the corresponding GraphQL type.

    Note: When querying, make sure to also query the source field. For example:

    query {
      prismic {
        Article(id: "123") {
          title
          articlePhoto
          articlePhotoSharp {
            childImageSharp {
              fluid(maxWidth: 400, maxHeight: 250) {
                ...GatsbyImageSharpFluid
              }
            }
          }
        }
      }
    }

    You can also get access to specific crop sizes from Prismic by passing the crop argument:

    query {
      prismic {
        Author(id: "123") {
          name
          profile_picture
          profile_pictureSharp(crop: "face") {
            childImageSharp {
              fluid(maxWidth: 500, maxHeight: 500) {
                ...GatsbyImageSharpFluid
              }
            }
          }
        }
      }
    }

    NOTE Images are not transformed in preview mode, so be sure to fall back to the default image when the sharp image is null.

    import Img from 'gatsby-image';
    import get from 'lodash/get';
    
    // ...
    
    const sharpImage = get(data, 'prismic.Author.profile_pictureSharp.childImageSharp.fluid');
    return sharpImage ? (
      <Img fluid={sharpImage} />
    ) : (
      <img src={get(data, 'prismic.Author.profile_picture.url')} />
    );

    Later, we may add an Image component that does this for you and leverages the new Prismic Image API as a fallback for preview modes.

    Prismic.io Content A/B Experiments Integration

    You can use this plugin in combination with Prismic’s built-in experiments functionality, and a hosting service like Netlify, to run content A/B tests.

    Experiments in Prismic are basically branches of the core content, split into ‘refs’ similar to git branches. So if you want to get content from a certain experiment variation, you can pass the corresponding ref through to Prismic in your request, and it will return content based on that ref’s variation.

    A/B experiments are tricky to implement in a static website though; A/B testing needs a way to dynamically serve up the different variations to different website visitors. This is at odds with the idea of a static, non-dynamic website.

    Fortunately, static hosting providers like Netlify allow you to run A/B tests at a routing level. This makes it possible for us to build multiple versions of our project using different source data, and then within Netlify
    split traffic to our different static variations.

    Therefore, we can use A/B experiments from Prismic in the following way:

    1. Setup an experiment in Prismic.

    2. Create a new git branch of your project which will be used to get content. You will need to create a separate git branch for each variation.

    3. In that git branch, edit/add the optional ‘prismicRef’ parameter (documented above). The value of this should be the ref of the variation this git branch is for.

    4. Push the newly created branch to your git repo.

    5. Now go to your static hosting provider (we’ll use Netlify in this example), and setup split testing based on your git branches/Prismic variations.

    6. Now your static website will show different experimental variations of the content to different users! At this point the process is manual and non-ideal, but hopefully we’ll be able to automate it more in the future.

    How This Plugin Works

    1. The plugin creates a new page at /preview (by default, you can change this), that will be your preview URL you setup in the Prismic admin interface.

      It will automatically set cookies based on the query parameters and attempt to find the correct page to redirect to with your linkResolver.

    2. It uses a different babel-plugin-remove-graphql-queries on the client.

      The modified plugin emits your GraphQL queries as a string so they can be read and re-used on the client side by the plugin.

    3. Once redirected to a page with the content, everything will load normally.

      In the background, the plugin takes your original Gatsby GraphQL query, extracts the Prismic subquery and uses it to make a GraphQL request to Prismic with a preview reference.

      Once data is received, it will update the data prop with merged data from Prismic preview and re-render the component.

    Development

    git clone git@github.com:birkir/gatsby-source-prismic-graphql.git
    cd gatsby-source-prismic-graphql
    yarn install
    yarn setup
    yarn start
    
    # select example to work with
    cd examples/default
    yarn start

    Issues and Troubleshooting

    Please raise an issue on GitHub if you have any problems.

    My page GraphQL query does not hot-reload for previews

    This is a Gatsby limitation. You can bypass this limitation by adding the following:

    export const query = graphql` ... `;
    const MyPage = () => { ... };
    
    MyPage.query = query; // <-- set the query manually to allow hot-reload.

    Visit original content creator repository
    https://github.com/birkir/gatsby-source-prismic-graphql

  • gogh-figure

    gogh-figure

    Fast, Lightweight Style Transfer using Deep Learning: A re-implementation of “A Learned Representation For Artistic Style” (which proposed using Conditional Instance Normalization), “Instance Normalization: The Missing Ingredient for Fast Stylization”, and the fast neural-style transfer method proposed in “Perceptual Losses for Real-Time Style Transfer and Super-Resolution” using Lasagne and Theano.

    Results

    Conditional Instance Normalization

    This repository contains a re-implementation of the paper A Learned Representation For Artistic Style and its Google Magenta TensorFlow implementation. The major differences are as follows:

    1. The batch size has been changed (from 16 to 4); this was found to reduce training time without affecting the quality of the images generated.
    2. Training is done with the COCO dataset, as opposed to ImageNet
    3. The style loss weights have been divided by the number of layers used to calculate the loss (though the values of the weights themselves have been increased so that the actual weights effectively remain the same)

    The following are the results when this technique was applied to style images described in the paper (to generate pastiches of a set of 32 paintings by various artists, and of 10 paintings by Monet, respectively):

    Misc. 32

    Monet 10

    Real-Time Style Transfer

    This repository also contains a re-implementation of the paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution and the author’s torch implementation (fast-neural-style), with the following differences:

    1. The implementation uses the conv2_2 layer of the VGG-Net for the content loss as in the paper, as opposed to the conv3_3 layer as in the author’s implementation.
    2. The following architectural differences are present in the transformation network, as recommended in the paper A Learned Representation For Artistic Style:
      a. Zero-padding is replaced with mirror-padding
      b. Deconvolutions are replaced by a nearest-neighbouring upsampling layer followed by a convolution layer
      These changes obviate the need for a total variation loss, in addition to providing other advantages.
    3. The implementation of the total variation loss is in accordance with this one, different from the author’s implementation. Total variation loss is no longer required, however (see point 2).
    4. The implementation uses Instance Normalization (proposed in the paper “Instance Normalization: The Missing Ingredient for Fast Stylization”) by default: although Instance Normalization has been used in the repo containing the author’s implementation of the paper, it was proposed after the paper itself was released.
    5. The style loss weights have been divided by the number of layers used to calculate the loss (though the values of the weights themselves have been increased so that the actual weights effectively remain the same)

    Joel Moniz; still Joel Moniz, but stylized

    References

    Papers

    This repository re-implements 3 research papers:

    1. A Learned Representation For Artistic Style
    2. Instance Normalization: The Missing Ingredient for Fast Stylization
    3. Perceptual Losses for Real-Time Style Transfer and Super-Resolution

    Visit original content creator repository https://github.com/joelmoniz/gogh-figure
  • files-s3-backup

    Files-S3-Backup

    A tool to back up selected directories and files on a server and send the resulting TAR file to AWS S3.

    Prerequisites

    • An IAM User or an AWS IAM Role attached to the EC2 instance (only for executions from EC2 instances) with the following IAM policy attached:

      Policy Name : AmazonS3FullAccess
      
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": "s3:*",
                  "Resource": "*"
              }
          ]
      }
      
    • Pip tool for Python packages management. Installation:

      $ curl -O https://bootstrap.pypa.io/get-pip.py
      $ sudo python get-pip.py
      
    • AWS CLI to configure the profile to use (access key and/or region). Installation and configuration:

      $ sudo pip install awscli
      
      $ aws configure [--profile <profile-name>]
      AWS Access Key ID [None]: <access_key_id>         # Leave blank in EC2 instances with associated IAM Role
      AWS Secret Access Key [None]: <secret_access_key> # Leave blank in EC2 instances with associated IAM Role
      Default region name [None]: <region>              # eu-west-1, eu-central-1, us-east-1, ...
      Default output format [None]:
      

    Configuration

    1. Clone the project in the path you want:

      $ git clone https://github.com/rubenmromero/files-s3-backup.git
      
    2. Create a copy of fs3backup.conf.dist template as conf/fs3backup.conf and set the backup properties with the appropriate values:

      # From the project root folder
      $ cp conf/fs3backup.conf.dist conf/fs3backup.conf
      $ vi conf/fs3backup.conf
      
    3. If you want to schedule periodic execution of the tool, copy the files-s3-backup template to the /etc/cron.d directory and replace the existing <tags> with the appropriate values:

      # From the project root folder
      $ sudo cp cron.d/files-s3-backup /etc/cron.d
      $ sudo vi cron.d/files-s3-backup
      

    Execution Method

    Once the backup properties are set in the conf/fs3backup.conf file, simply run the bin/fs3backup.sh script as follows:

    # ./bin/fs3backup.sh
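
    The tool itself is a shell script driven by conf/fs3backup.conf. Purely as
    an illustration of what one run does (archive the selected paths into a TAR
    file and upload it to S3), here is a minimal Python sketch using tarfile
    and boto3; the bucket name and paths below are placeholders, not the tool's
    actual configuration values:

    import tarfile
    from datetime import datetime

    import boto3  # picks up the AWS CLI profile/role credentials described above

    def backup_to_s3(paths, bucket, prefix="backups/"):
        # Create a timestamped TAR archive of the selected files and directories
        archive = datetime.now().strftime("backup_%Y%m%d_%H%M%S.tar")
        with tarfile.open(archive, "w") as tar:
            for path in paths:
                tar.add(path)
        # Upload the archive to the S3 bucket
        boto3.client("s3").upload_file(archive, bucket, prefix + archive)

    backup_to_s3(["/etc", "/var/www"], "my-backup-bucket")  # placeholder values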
    

    Visit original content creator repository
    https://github.com/rubenmromero/files-s3-backup

  • cdiff_fbi

    cdiff_fbi

    Typing of Clostridioides difficile isolates using NGS data (reads and contigs) based on tandem repeat loci (TR6, TR10), and toxin genes (cdtA, cdtB, tcdA, tcdB, tcdC).

    Quick start

    # Type
    bash cdifftyping.sh -h
    # Process
    bash postcdifftyping.sh -h
    # Summarize
    python3 qc_cdiff_summary.py -h

    Installation

    Source

    # Clone this repo
    git clone https://github.com/ssi-dk/cdiff_fbi.git
    # Create an environment with the required tools with conda
    conda create --name cdiff_pipeline picard gatk4 biopython ruamel.yaml kraken bwa samtools
    # Activate the environment
    conda activate cdiff_pipeline
    # Install a custom tool
    git clone https://github.com/ssi-dk/serum_readfilter
    cd serum_readfilter
    pip install .

    Usage

    Example

    # Download data into the test folder
    mkdir -p test
    wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR142/ERR142064/ERR142064_2.fastq.gz -P test
    wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR142/ERR142064/ERR142064_1.fastq.gz -P test
    touch test/ERR142064.fasta  # Create an empty file as a fake assembly for testing purposes

    Pipeline

    # Build the db
    bash cdifftyping.sh -db db -update yes  # WARNING: Not yet implemented for serumdb or trstdb
    # Type
    bash cdifftyping.sh -i ERR142064 -R1 test/ERR142064_1.fastq.gz -R2 test/ERR142064_2.fastq.gz -c test/ERR142064.fasta -qc pass -o test -db db -update no
    # Process
    bash postcdifftyping.sh -i ERR142064 -d test -stbit "STNA;NA:NA"
    # Summarize
    python3 qc_cdiff_summary.py -i test -o test

    Output

    .csv

    Name;cdtA/B;tcdA;tcdB;tcdClength;117del;A117T;TRST;TR6;TR10;ST;STalleles;WGS;tcdA:tcdB:tcdC:cdtA:cdtB
    ERR142064;+/+;+;+;0;+;-;Unknown;Unknown;Unknown;STNA;NA:NA;test;8119/8133:6914/7101:700/700:1389/1389:2628/2628
    

    .json

    {"Name": "ERR142064", "cdtA": "+", "cdtB": "+", "tcdA": "+", "tcdB": "+", "tcdClength": "0", "117del": "+", "A117T": "-", "TRST": "Unknown", "TR6": "Unknown", "TR10": "Unknown", "ST": "STNA;NA:NA", "WGS": "test", "cov_info": {"tcdA": "8119/8133", "tcdB": "6914/7101", "tcdC": "700/700", "cdtA": "1389/1389", "cdtB": "2628/2628"}}
    

    Updating the db

    # Build the db
    bash cdifftyping.sh -db db -update yes  # WARNING: Not yet implemented for serumdb or trstdb

    Visit original content creator repository
    https://github.com/ssi-dk/cdiff_fbi

  • SudokuSolver

    Sudoku Solver now delivered via AI! Well, almost...

    I’ve revisited this project while attending Data Science Retreat – Batch 36 in Berlin.
    During the workshop, I was struck by how much we relied on ChatGPT to explain things
    or build examples. I then thought back to when I was originally inspired by a former
    colleague’s post on Why Coding Skills Matters, and I set out to see how polyglot I could be.

    It took me about a month of working almost every day. In the end I got to 15 languages. But this required a lot of googling, Stack Overflow, distractions, red herrings, etc. It was during Covid, though, so it was a nice distraction to get my fingers on the keyboard again. I was proud of myself…

    Now that we have tools like ChatGPT, this changes everything.

    So now, I needed to accept my Ego Death and peek into the future of Software Development. Will we still need Software Developers? I’ve set up a simple framework for chatting with GPT, using the same prompt and just changing the language that the code should be written in. Then I looked at whether these programs actually compile, run, and solve the puzzles in the same number of iterations. Many times they ran successfully on the first try!

    I will summarize my findings below:

    The Original Project Goal

    This project aims to be a teaching tool for understanding how different programming languages
    solve the same problem, in this case Sudoku puzzles. Why create a Sudoku solver? It is small
    enough to be fun while still touching enough functionality: parsing command-line arguments,
    reading a data file, using two-dimensional arrays, and performing several loops as well as
    recursion. The aim is not specifically to write the quickest solver, as this version uses only
    a brute-force approach. Puzzle #6 is extremely hard and requires over 622 million iterations
    to solve!
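
    For reference, the brute-force approach boils down to a recursive backtracking search. The
    Python sketch below is not one of the repository's implementations, but it shows the shape
    shared by all of them: find an empty cell, try each digit that is still possible, recurse,
    and count the recursive calls.

    def find_empty(grid):
        """Return (row, col) of the first empty cell, or None if the grid is full."""
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    return r, c
        return None

    def is_possible(grid, r, c, v):
        """Check the row, column and 3x3 box constraints for value v at (r, c)."""
        if any(grid[r][i] == v for i in range(9)):
            return False
        if any(grid[i][c] == v for i in range(9)):
            return False
        br, bc = 3 * (r // 3), 3 * (c // 3)
        return all(grid[br + i][bc + j] != v for i in range(3) for j in range(3))

    iterations = 0

    def solve(grid):
        """Brute-force backtracking; `iterations` counts the recursive calls."""
        global iterations
        iterations += 1
        cell = find_empty(grid)
        if cell is None:
            return True          # no empty cells left, puzzle solved
        r, c = cell
        for v in range(1, 10):
            if is_possible(grid, r, c, v):
                grid[r][c] = v
                if solve(grid):
                    return True
                grid[r][c] = 0   # undo and try the next value
        return False

    Exactly how iterations are counted varies slightly between the language implementations, so
    this sketch will not necessarily reproduce the iteration numbers shown in the sample output.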

    Goals

    • Learn the nuances of different programming languages
    • Compare their performance by solving the same problem
    • Configure and use Microsoft Visual Studio Code Editor as the IDE for all languages

    Programming Languages to Explore

    Language         Manually     ChatGPT
    BASH             Completed
    BASIC            N/A          N/A
    C                Completed    Solved First Time
    C#               Completed
    C++              Completed
    Clojure
    ERLANG?
    F#               Incomplete   Solved First Time
    Fortran          Completed    Compiled, but errors in logic
    Go               Completed    Solved First Time
    Java             Completed    Solved First Time
    JavaScript       Completed    Did not compile
    Julia            Completed
    Kotlin
    Lisp (racket)    Completed
    OCaml            N/A          Solved First Time*
    Pascal           N/A          Non-compilable result
    PHP              N/A          Solved First Time
    Powershell
    Python           Completed    Solved First Time
    R                Completed
    Rust             Completed    Complete / required some debugging
    Ruby             Completed
    Scala
    Tcl              Completed    Not working
    TypeScript       Completed    Issues with node env

    Performance Results (6 Matrices)

    Tests run on a MacBook Pro with a 2.7 GHz Quad-Core Intel Core i7.

    Go          Seconds 9.000
    Rust        Seconds 9.78
    TypeScript  Seconds 19.436
    JavaScript  Seconds 22.500
    C++         Seconds 23.064
    C           Seconds 22.944
    Java        Seconds 42.465
    C_Sharp     Seconds 57.142
    Julia       Seconds 61.362
    Fortran     Seconds 613.763
    Ruby        Seconds 1035.392
    Tcl         Seconds 4398.989
    Python      Seconds 5114.008
    R           Seconds 90903.00 (2.751 hours)
    Lisp        Seconds 345484.826 (96 hours)
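
    One way to reproduce this kind of comparison is a small timing harness; the Python sketch
    below is illustrative and is not how the numbers above were generated. The commands and paths
    are assumptions, apart from Python/Sudoku.py and the Matrices folder, which appear elsewhere
    in this README.

    import subprocess
    import time

    # Hypothetical commands; adjust to match how each solver is actually built and run.
    COMMANDS = {
        "C": ["./C/Sudoku"],
        "Python": ["python3", "Python/Sudoku.py"],
    }
    MATRICES = ["Matrices/1.matrix", "Matrices/2.matrix", "Matrices/3.matrix"]

    for name, cmd in COMMANDS.items():
        start = time.perf_counter()
        subprocess.run(cmd + MATRICES, check=True, stdout=subprocess.DEVNULL)
        print(f"{name:10s} {time.perf_counter() - start:.3f} seconds")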
    

    Findings

    • It’s really good to have several ways of solving the problem, to ensure the answers from one method correlate with the others.
    • Rust is fast but had a bit of a learning curve (for me at least)
    • Go is just as fast as Rust and took me only a few hours to code (my first time using Go)
    • I was surprised how fast JavaScript in Node.js performed
    • Visual Studio Code development speed was increased by using the Tabnine extension
    • JavaScript development did not have the best visual feedback in Visual Studio Code
    • C# was very easy to pick up
    • Fortran was pretty slow (maybe there is some optimization possible)
    • Tcl (my all-time favorite language) is still faster than Python (which seems to have won the popularity contest)
    • WOW, R was really slow, and Lisp was waaay slower

    Program Goals

    1. Read an unsolved Sudoku matrix from a file
    2. Find a solution to the matrix using two methods
      1. Brute Force Solution
      2. Human Solution (TODO)
    3. Count the number of recursive iterations needed to solve the matrix
    4. Print the solution and the time taken to calculate the result
    5. Print memory usage (TODO)
    6. Solutions should be compiled with optimization
    7. Create an Object-Oriented implementation where possible and compare the effect on performance (TODO)

    Development Goals

    1. Each solution should be complete with build files, etc.
    2. Each solution should also include a test suite
    3. Each solution should be able to solve several supplied matrices
    4. The project is managed in GitHub
    5. Develop launch.json and tasks.json files for handling all languages

    Implementation Details

    Input matrices

    Each matrix should be read from a simple text file (space-separated values) with a .matrix file extension, in the following format.

    # Comment
    9 2 0 0 0 0 5 8 4
    0 0 0 5 0 0 0 0 3
    0 8 3 0 9 2 0 0 0
    2 6 0 8 5 4 0 0 1
    0 0 5 3 6 1 0 9 0
    1 0 0 0 0 9 0 0 0
    8 5 0 2 0 3 0 1 0
    4 1 2 9 8 0 0 3 0
    3 9 0 0 0 6 8 0 0
    

    The program should take as its input one or more .matrix files.
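
    As a concrete example of the expected parsing, here is a small Python sketch (not the
    repository's own reader) that loads one or more .matrix files, skipping # comment lines and
    treating 0 as an empty cell.

    import sys

    def read_matrix(path):
        """Parse a .matrix file into a 9x9 list of lists of ints (0 = empty cell)."""
        grid = []
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):   # skip blanks and comments
                    continue
                grid.append([int(x) for x in line.split()])
        if len(grid) != 9 or any(len(row) != 9 for row in grid):
            raise ValueError(f"{path}: expected a 9x9 grid")
        return grid

    if __name__ == "__main__":
        for path in sys.argv[1:]:      # one or more .matrix files on the command line
            print(path, read_matrix(path))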

    Language Notes

    Racket

    Install packages via

    raco pkg install global
    

    Julia

    To add packages, like ArgParse, run the following in the Julia interpreter (REPL).

    julia
    using Pkg
    Pkg.add("ArgParse")
    

    Python

    Timing the Solver

    time python3 Python/Sudoku.py Matrices/*.matrix

    Rust

    Ruby

    Run Ruby with the --jit flag to enable Just-In-Time compilation.

    Tcl

    I needed to return to stack level 0 after solve() completed, otherwise the program continued running through the rest of the stack.

    C

    To compile in VS Code, press Alt-B, then choose the clang compiler.

    JAVA

    JavaScript

    Using Node.js

    Note: JavaScript only has one-dimensional arrays, so you need to create an array of arrays! Also be wary
    of variable scope, as the i and j values of the calling function were used if they were not explicitly
    declared via let in the subroutine.

    FORTRAN 90

    Using gfortran.

    BASIC

    Object Oriented Thoughts

    • Puzzle Class
      • ReadFile (Initialize?)
      • Print
      • Solve(Method: Human| Brute)
      • IsPossible
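
    A minimal Python skeleton of that idea might look as follows; it is only a sketch of the
    proposed interface, with the solving logic left as placeholders.

    class Puzzle:
        """Sketch of the proposed Puzzle class; method bodies are placeholders."""

        def __init__(self, path):
            self.grid = self._read_file(path)   # ReadFile folded into initialisation
            self.iterations = 0

        def _read_file(self, path):
            rows = []
            with open(path) as fh:
                for line in fh:
                    line = line.strip()
                    if line and not line.startswith("#"):
                        rows.append([int(x) for x in line.split()])
            return rows

        def print(self):
            for row in self.grid:
                print(" ".join(str(v) for v in row))

        def is_possible(self, r, c, v):
            raise NotImplementedError   # row, column and box checks

        def solve(self, method="brute"):
            raise NotImplementedError   # the "human" strategy is still a TODO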

    Sample Output

    ~/iCloud/Programming/SudokuSolver: C/RunMe.sh
    ../Matrices/1.matrix
    
    Puzzle:
    9 2 0 0 0 0 5 8 4
    0 0 0 5 0 0 0 0 3
    0 8 3 0 9 2 0 0 0
    2 6 0 8 5 4 0 0 1
    0 0 5 3 6 1 0 9 0
    1 0 0 0 0 9 0 0 0
    8 5 0 2 0 3 0 1 0
    4 1 2 9 8 0 0 3 0
    3 9 0 0 0 6 8 0 0
    
    Puzzle:
    9 2 1 6 3 7 5 8 4
    6 7 4 5 1 8 9 2 3
    5 8 3 4 9 2 1 6 7
    2 6 9 8 5 4 3 7 1
    7 4 5 3 6 1 2 9 8
    1 3 8 7 2 9 6 4 5
    8 5 6 2 7 3 4 1 9
    4 1 2 9 8 5 7 3 6
    3 9 7 1 4 6 8 5 2
    
    Solved in Iterations=656
    ../Matrices/2.matrix
    
    Puzzle:
    0 0 3 0 0 5 0 0 4
    5 0 0 9 8 1 0 0 0
    0 0 0 0 0 0 0 2 0
    2 0 0 7 0 0 9 0 0
    0 8 0 0 9 0 0 3 0
    0 0 9 0 0 2 0 0 1
    0 3 0 0 0 0 0 0 0
    0 0 0 1 4 9 0 0 5
    9 0 0 3 0 0 8 0 0
    
    Puzzle:
    8 6 3 2 7 5 1 9 4
    5 4 2 9 8 1 6 7 3
    1 9 7 4 3 6 5 2 8
    2 5 4 7 1 3 9 8 6
    6 8 1 5 9 4 2 3 7
    3 7 9 8 6 2 4 5 1
    4 3 5 6 2 8 7 1 9
    7 2 8 1 4 9 3 6 5
    9 1 6 3 5 7 8 4 2
    
    Solved in Iterations=439269
    ../Matrices/3.matrix
    
    Puzzle:
    0 9 0 6 0 0 0 7 0
    0 5 2 0 7 0 0 4 8
    0 8 0 0 0 1 0 2 0
    0 0 5 0 0 0 0 0 3
    0 1 0 0 0 0 0 6 0
    3 0 0 0 0 0 4 0 0
    0 4 0 2 0 0 0 1 0
    2 6 0 0 4 0 5 3 0
    0 3 0 0 0 6 0 9 0
    
    Puzzle:
    4 9 3 6 8 2 1 7 5
    1 5 2 3 7 9 6 4 8
    7 8 6 4 5 1 3 2 9
    6 7 5 1 2 4 9 8 3
    8 1 4 5 9 3 7 6 2
    3 2 9 8 6 7 4 5 1
    9 4 7 2 3 5 8 1 6
    2 6 1 9 4 8 5 3 7
    5 3 8 7 1 6 2 9 4
    
    Solved in Iterations=98847
    Seconds to process 0.020988 Seconds
    ./Sudoku ../Matrices/*.matrix  0.02s user 0.00s system 93% cpu 0.024 total
    

    Visit original content creator repository
    https://github.com/Cars-10/SudokuSolver