Security at source data level
Starting at the data level, protection begins by hiding your data characteristics. We never need to know what a data column represents. Further, as a requirement for more robust training, we request that each data column be scaled as (X_i - a_i)/b_i, and that the a_i and b_i values be kept entirely within your organization's security perimeter. We do not require these values, and not knowing them ensures that our team cannot reconstruct the data. The value of a_i need not be the mean (µ), nor b_i the standard deviation (σ); this also lets the scaled X_i be computed in a single pass over the data, with no preliminary pass to compute statistics.
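As an illustration, here is a minimal Python sketch of the single-pass scaling. The column values and the particular choice of a and b below are ours and purely hypothetical; any values work, provided they stay inside your security perimeter.

```python
# A minimal sketch of single-pass scaling with arbitrary (a, b).
def scale_column(values, a, b):
    """Scale a column as (x - a) / b in a single pass over the data."""
    return [(x - a) / b for x in values]

column = [12.0, 15.5, 9.8, 20.1]
a, b = column[0], 10.0                # deliberately NOT the mean and std dev:
scaled = scale_column(column, a, b)   # no pre-pass over the data is needed
# Keep (a, b) in-house; without them the scaled values cannot be inverted.
```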
When providing the data, you may also randomize the column order; the resulting weight values will be returned in the same sequence as the columns you provided.
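A small sketch of the idea, with hypothetical column labels; only you hold the permutation needed to map the returned weights back to the original columns.

```python
# Randomize column order; record the permutation privately.
import random

columns = ["col_a", "col_b", "col_c"]     # hypothetical labels
perm = list(range(len(columns)))
random.shuffle(perm)

shuffled = [columns[i] for i in perm]     # the order actually sent to us
# weights[k] returned with the model corresponds to shuffled[k];
# invert the permutation to recover the original column order.
inverse = [perm.index(k) for k in range(len(perm))]
```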
Security at data record level
We understand that labelled data is not generated at one central location in an organization. The data typically gets labelled by different units at different locations, and may need to travel to a central interchange point where it is aggregated and then handed over to us.
From a security perspective, the data needs to be encrypted while it flows from the remote units to this central interchange point. To avoid the burden of decrypting and then re-encrypting the data at your central interchange point, we offer a feature whereby each data record may be encrypted using a different key, of a different length, with a different initial value (nonce), and may carry a different authentication tag. The individual keys and other associated data components are written out to a separate set of binary files, and those files are further encrypted using the ephemeral key.
This helps your organization maintain security at the record level.
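A minimal sketch of per-record encryption with independent keys, nonces, and tags, assuming the Python "cryptography" package; the record contents and the handling of the key table below are illustrative, not our actual file format.

```python
# Per-record AES-GCM encryption: fresh key, length, and nonce per record.
import os
import secrets
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

records = [b"labels from unit A", b"labels from unit B"]

encrypted_records = []
key_table = []   # per-record key material, destined for a separate binary file

for record in records:
    key = AESGCM.generate_key(bit_length=secrets.choice([128, 192, 256]))
    nonce = os.urandom(12)                         # fresh 96-bit nonce per record
    ct = AESGCM(key).encrypt(nonce, record, None)  # GCM tag is appended to ct
    encrypted_records.append(ct)
    key_table.append((key, nonce))

# The serialized key_table would itself be encrypted under the ephemeral
# key negotiated at key-exchange time before it leaves your perimeter.
```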
Security at encryption level
We accept data only in encrypted form. The encryption must be AES with a 128-, 192-, or 256-bit key, following the FIPS 197 specification. The data encryption should use our preferred mode of operation, Galois/Counter Mode (GCM; see NIST SP 800-38D), though we can accept the CBC-CS1 or CBC-CS2 modes as well.
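A short round trip in the accepted AES-GCM mode, again assuming the Python "cryptography" package; it shows that any tampering is caught by the authentication tag, which is one reason GCM is our preferred mode.

```python
# AES-256-GCM encrypt/decrypt round trip with tag verification.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.exceptions import InvalidTag

key = AESGCM.generate_key(bit_length=256)   # AES-256 per FIPS 197
nonce = os.urandom(12)
ciphertext = AESGCM(key).encrypt(nonce, b"source data", None)

assert AESGCM(key).decrypt(nonce, ciphertext, None) == b"source data"

tampered = bytes([ciphertext[0] ^ 1]) + ciphertext[1:]
try:
    AESGCM(key).decrypt(nonce, tampered, None)
except InvalidTag:
    print("tampering detected by the GCM tag")
```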
As a matter of design, we never decrypt the source file and save it to disk. We decrypt the data just in time for processing and keep it only in volatile memory. This approach ensures that unencrypted data never resides on any storage device.
We have taken this approach to its logical extreme: memory buffers are allocated strictly by partitioning free physical memory, not virtual memory, so that under no circumstances can a page containing unencrypted data be swapped out. The disks therefore never hold a copy of the unencrypted data, not even in swap space.
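Our in-house allocator is not public, but the non-swappable-memory idea can be sketched on Linux with mlock(2) via ctypes; the buffer size and usage below are illustrative only.

```python
# Pin a buffer in physical memory so its pages can never be swapped out.
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mlock.argtypes = (ctypes.c_void_p, ctypes.c_size_t)
libc.munlock.argtypes = (ctypes.c_void_p, ctypes.c_size_t)

def locked_buffer(size):
    """Allocate a buffer and pin its pages in physical memory."""
    buf = ctypes.create_string_buffer(size)
    if libc.mlock(ctypes.addressof(buf), size) != 0:
        raise OSError(ctypes.get_errno(), "mlock failed (check RLIMIT_MEMLOCK)")
    return buf

plain = locked_buffer(4096)   # decrypt ciphertext directly into this buffer
# ... process the plaintext in place ...
ctypes.memset(plain, 0, 4096)                 # zero the buffer before release
libc.munlock(ctypes.addressof(plain), 4096)
```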
All security implementations have been done in-house and have undergone CAVP (Cryptographic Algorithm Validation Program) validation from NIST.
Security at key exchange level
Key exchange remains the most vulnerable step in managing key security. We manage this by using a two-level key-generation approach.
The key used for encrypting the source data is not transmitted to us directly. It is absorbed into the model file specification. The model file is then encrypted using an ephemeral key generated following the key-exchange protocol specified in NIST SP 800-56A Revision 3. The underlying technique is the Diffie-Hellman key exchange protocol, which lets both sides derive the same key while each private value stays inside its own security perimeter.
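A minimal sketch of the two-level idea, assuming the Python "cryptography" package; the model-file layout shown here is an illustrative assumption, not our actual model file specification.

```python
# Envelope sketch: the data key travels only inside the sealed model file.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

data_key = AESGCM.generate_key(bit_length=256)       # key that encrypted your source data
ephemeral_key = AESGCM.generate_key(bit_length=256)  # stand-in for the DH-derived key (next sketch)

model_spec = b"<model bytes>" + data_key   # data key absorbed into the model file
nonce = os.urandom(12)
sealed = AESGCM(ephemeral_key).encrypt(nonce, model_spec, None)
# Only the sealed model file travels; the data key never appears on its own.
```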
For each transaction, we use an 82- to 87-digit prime number drawn from a pool of primes that have been pre-generated and pre-validated, with the corresponding primitive root values verified. These are assigned in a virtual interaction at the commencement of the data model generation, and are then incorporated into the DH key-exchange program. They need not be preserved once the ephemeral keys have been generated.
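A textbook-scale illustration of the exchange, with a deliberately tiny prime (p = 23, g = 5) standing in for the 82- to 87-digit primes drawn from the pre-validated pool; parameters this small must never be used in practice.

```python
# Toy Diffie-Hellman exchange: private values never cross the perimeter.
import secrets

p, g = 23, 5                      # toy prime and verified primitive root

a = secrets.randbelow(p - 2) + 1  # your private value, never transmitted
b = secrets.randbelow(p - 2) + 1  # our private value, never transmitted

A = pow(g, a, p)                  # public values; these may cross the
B = pow(g, b, p)                  # security perimeter safely

assert pow(B, a, p) == pow(A, b, p)   # both sides derive the same secret
shared = pow(B, a, p)             # fed to a KDF to produce the ephemeral key
```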
This implementation is undergoing security validation by NIST; once the certification is received, it will be published on this website.
Security at cloud storage level
The data processing takes place on a GCP instance. This instance is configured subject to your budget restrictions and desired throughput. Please keep in mind that this cost is a pass-through cost, dictated entirely by your preferences.
The encrypted data and the model are then placed in a storage bucket configured so that only the corresponding VM instances can read the data. The security strength comes from the fact that the data cannot be read, copied, or transferred out by any entity other than those VM instances. Once processed, the data is deleted from the bucket, so there are no residual data leaks.
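A minimal sketch of the bucket restriction using the google-cloud-storage client; the bucket name and service account below are hypothetical, and our actual configuration includes further controls.

```python
# Grant read-only bucket access to the VM's service account only.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-model-bucket")

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",  # read-only access
    "members": {"serviceAccount:vm-runner@example-project.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)
# With no other principals granted access, only the VM's service account
# can read the encrypted data and model.
```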