Using the power of AVX-512 assembly language and cloud server clusters for hyper-efficient training

Native AVX-512 assembly language software offers extreme reliability and efficiency

The code has been written natively in AVX-512 assembly language, so that the hardware's capabilities are used as fully as possible. For example, all internal activation functions and other numerical processes are executed in IEEE-754 64-bit double precision, which keeps drift from the expected results to a minimum without compromising speed.
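The effect of working precision on drift can be illustrated with a small sketch. It is not the AVX-512 implementation described above; it is a plain Python emulation, and the step size, the iteration count, and the helper names `to_float32` and `accumulate` are assumptions chosen only for illustration. It adds 0.1 repeatedly, once in 64-bit arithmetic and once with every partial sum rounded to 32 bits, and reports how far each total lands from the exact value.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE-754 double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, count: int, single_precision: bool) -> float:
    """Add `step` to a running total `count` times.

    With single_precision=True the step and every partial sum are rounded
    to 32 bits, emulating arithmetic carried out in float32.
    """
    if single_precision:
        step = to_float32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single_precision:
            total = to_float32(total)
    return total


COUNT = 100_000          # hypothetical number of accumulation steps
EXPECTED = COUNT / 10    # mathematically exact value of COUNT * 0.1

drift_64 = abs(accumulate(0.1, COUNT, single_precision=False) - EXPECTED)
drift_32 = abs(accumulate(0.1, COUNT, single_precision=True) - EXPECTED)

print(f"64-bit drift: {drift_64:.3e}")
print(f"32-bit drift: {drift_32:.3e}")
```

The 32-bit total drifts by many orders of magnitude more than the 64-bit one, which is the kind of accumulated error that double precision is meant to suppress over long training runs.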

Our approach has been to avoid third-party libraries, since we cannot guarantee their security or the correctness of their results. Every function we use has been written in-house in native AVX-512 assembly, has been tested at extreme input values, and can be readily ported to new instructions as they become available.

Secure AWS Environment with no outbound transmission capacity.

We allocate the necessary data storage space and receive the encrypted files directly into it, which prevents unauthorized access while the data is in transit. Our data storage resides within Amazon's United States cloud infrastructure. The data container has very limited outflow capacity, which prevents data from being siphoned off. The cloud infrastructure also provides the requisite physical security for the data as an integral component.

Our model specification software requires that every source data record be encrypted with AES-256, AES-192, or AES-128 in a NIST-approved mode of operation such as Galois/Counter Mode (GCM), CBC-CS1, or CBC-CS2. Our implementation decrypts these records only when they are ready to be processed in RAM, which keeps the attack surface to a minimum.

————

Our pricing model considers the cloud machine cost as a pass-through component, so you are free to leverage your existing cloud contracts to reduce total cost.

We ingest the data in a normalized form, where each value X is replaced by (X - a)/b. The values of (a, b) never cross your security perimeter, which makes it impossible for us to reconstruct the data. This is a critical security characteristic.

Other than the C runtime library that interfaces with the operating system, our code does not use any third-party library. Every line of code, including all security algorithm implementations, all mathematical functions, and all thread management, memory management, and scheduling, has been developed in-house to guarantee reliability. All the security implementations are currently under certification by NIST.

Security integration at every level

Protection starts at the data level, by hiding the characteristics of your data. We never need to know what a data column represents. Further, because it makes training more robust, we request that each data column be scaled as (X_i - a_i)/b_i, and that the values a_i and b_i be kept entirely within your organization’s security perimeter. We do not require these values, and not knowing them ensures that our team cannot reconstruct the data. The value a_i need not be the mean µ, and b_i need not be the standard deviation σ; because no statistics have to be computed beforehand, scaling X_i becomes a single-pass operation.
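The scaling step can be sketched in a few lines. This is a minimal illustration and not the supplied interface software; the function name `scale_rows` and the sample offsets and scales are hypothetical. The constants a_i and b_i are chosen by the client and stay on the client side, and since they are fixed in advance, each row is scaled as it streams past without a prior pass to compute statistics.

```python
from typing import Iterable, Iterator, List, Sequence


def scale_rows(
    rows: Iterable[Sequence[float]],
    offsets: Sequence[float],   # the a_i values, kept inside the client perimeter
    scales: Sequence[float],    # the b_i values, kept inside the client perimeter
) -> Iterator[List[float]]:
    """Yield each row with column i replaced by (x_i - a_i) / b_i, in one pass."""
    if len(offsets) != len(scales):
        raise ValueError("offsets and scales must have the same length")
    if any(b == 0 for b in scales):
        raise ValueError("every scale b_i must be non-zero")
    for row in rows:
        if len(row) != len(offsets):
            raise ValueError("row width does not match the number of columns")
        yield [(x - a) / b for x, a, b in zip(row, offsets, scales)]


# Hypothetical two-column data set with client-chosen constants.
raw_rows = [[10.0, 200.0], [20.0, 400.0]]
scaled = list(scale_rows(raw_rows, offsets=[10.0, 100.0], scales=[10.0, 100.0]))
print(scaled)
```

Only the scaled rows would leave the client environment; without `offsets` and `scales`, the original values cannot be recovered from them.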

When providing the data, you can randomize the order of the columns; the resulting weight values will follow the same sequence as the input columns.
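A short sketch of the column randomization, under stated assumptions: the helper names `make_permutation`, `permute`, and `restore` are hypothetical, and the seeded generator is used only to keep the example reproducible (a cryptographically strong source would be preferable in practice). The client shuffles the column order before export, keeps the permutation private, and later uses it to map the returned per-column values back to the original column order.

```python
import random
from typing import List, Sequence


def make_permutation(n_columns: int, seed: int) -> List[int]:
    """Return a shuffled list of column indices; the client keeps this secret."""
    order = list(range(n_columns))
    random.Random(seed).shuffle(order)
    return order


def permute(values: Sequence[float], order: Sequence[int]) -> List[float]:
    """Reorder one row so that position k holds original column order[k]."""
    return [values[i] for i in order]


def restore(permuted: Sequence[float], order: Sequence[int]) -> List[float]:
    """Undo `permute`: put each value back at its original column index."""
    original = [0.0] * len(order)
    for position, source_index in enumerate(order):
        original[source_index] = permuted[position]
    return original


order = make_permutation(5, seed=7)
row = [1.5, 2.5, 3.5, 4.5, 5.5]
shuffled_row = permute(row, order)          # what is exported for training
recovered = restore(shuffled_row, order)    # same mapping applied to returned values
print(order, shuffled_row, recovered)
```

Because the returned values follow the shuffled column sequence, applying `restore` with the private permutation is enough to line them up with the original columns.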

FIPS 197 128-bit or 256-bit secure encryption provided by us, or provided by you, to secure your data.

By design, at no stage do we write the data back to any device in unencrypted form, not even momentarily. We check the source data for accuracy against the model specifications; during that process we decrypt the data in memory, and once it has been validated we re-encrypt it using our own randomly generated 128-bit keys. The training step uses our own key, and its output, the weight values, is again written out in encrypted form with our own 128-bit keys.

Thus, the data exists in unencrypted form only in volatile memory. To guard against side-channel attacks, we ensure that this program is the only one running in the provided instance.

A suite of supporting software is offered as Java source code to help convert your specific data layouts and data types into a machine-ingestible form. It may be compiled at your end for whatever target platform your environment requires.

Our interface file structures will be made available should you decide to build your own interface software.

Rapid scaling and operationalization of terabyte-scale models

This service helps organizations migrate from neural networks implemented in data science labs to real, industrial-scale decision-making systems. Models designed in the lab usually work well with the standard libraries available in languages such as Python or MATLAB, among many others, as long as the test data sets are small. Once these models are ready, Machine Learning Systems will be able to run the training step on real, extremely large data sets. Besides serving as part of a periodic model update, our service can also help fine-tune models at the design stage if necessary.

We offer the service as a package. Once its models have stabilized, a client organization is expected to retrain them periodically but infrequently, and therefore needs the service only periodically, in step with its internal schedule for model updates.

Saving computing hardware costs, testing licenses and cloud costs

As a management advantage, the service approach takes away the vexing issues of capital allocation for infrequently used hardware and its associated maintenance costs.

The investment in software licenses, license management, hardware compatibility, update cycles, and version upgrades is likewise reduced.

Data security maintained through best-in-class hardware and software tools

We accept data encrypted under FIPS 197 (AES) with 128-bit or 256-bit keys. Interfaces for this kind of encryption are widely available, and it is quite likely to be offered as part of the export feature of the platform used at source. We would also be glad to provide a free utility that performs this encryption.

By design, at no stage do we write the data back to any device in unencrypted form, not even momentarily. We check the source data for accuracy against the model specifications; during that process we decrypt the data in memory, and once it has been validated we re-encrypt it using our own randomly generated 128-bit keys. Thus, the data exists in unencrypted form only in volatile memory. To guard against side-channel attacks, we ensure that this program is the only one running in the provided instance.

Technical Specifications

  • Input data with widely varying value ranges is handled, and we accept the data in either normalized or non-normalized form.

    At the data management level, protection starts by hiding your metadata. We never need to know whether a data column contains PII or PHI. When providing the data, you can randomize the order of the columns; the resulting weight values will follow the same sequence as the input columns.

    The normalization step can be performed at the client source or during validation pre-processing at our end. Our normalization methods do not interfere with any de-identification or other data-protection measures already applied to the data, which retain their masking attributes.
