Data Pre-processing

Source training data may be generated in different layouts by different systems within your business. It needs to be pre-processed into a common layout for smooth ingestion. The required format is explained in layouts, and it must closely follow the model specification, which is the next step. Supporting Java source code that uses no special libraries is also offered free of charge, as-is and without any warranty, to help kickstart the data pre-processing step.

Once the data has been pre-processed, the resulting file needs to be encrypted. We require that the data be encrypted with AES using a 128-, 192- or 256-bit key, with GCM as the preferred mode of operation (we also support the CBC-CS1 and CBC-CS2 modes). The key is transferred to us through the model preparation step.
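As an illustration of the encryption step, the following is a minimal sketch that uses only the standard Java cryptography classes. The class name GcmFileCrypto, the 12-byte random nonce, the 128-bit tag and the "nonce, then ciphertext and tag" framing are assumptions made for this example; the framing actually expected at ingestion is not described here and should be confirmed before use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, not part of the offered support code.
final class GcmFileCrypto {
    private static final int NONCE_BYTES = 12; // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128;   // full-length authentication tag

    // Encrypts the data and returns nonce || ciphertext || tag (assumed framing).
    // The key must be 16, 24 or 32 bytes long (AES-128, AES-192 or AES-256).
    public static byte[] encrypt(byte[] key, byte[] plaintext) {
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            new SecureRandom().nextBytes(nonce); // a nonce must never repeat under one key
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, nonce));
            byte[] sealed = cipher.doFinal(plaintext); // ciphertext with the tag appended
            return ByteBuffer.allocate(nonce.length + sealed.length)
                    .put(nonce).put(sealed).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-GCM encryption failed", e);
        }
    }

    // Reverses encrypt(); fails if the key is wrong or the data was modified.
    public static byte[] decrypt(byte[] key, byte[] framed) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, framed, 0, NONCE_BYTES));
            return cipher.doFinal(framed, NONCE_BYTES, framed.length - NONCE_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-GCM decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[32]; // 256-bit key; 16 or 24 bytes work the same way
        new SecureRandom().nextBytes(key);
        byte[] data = "id,feature,label\n1,0.5,1\n".getBytes(StandardCharsets.UTF_8);
        byte[] framed = encrypt(key, data);
        System.out.println(Arrays.equals(data, decrypt(key, framed))); // prints true
    }
}
```

In practice the pre-processed file would be read into the byte array (or streamed through the cipher for large files), and the same key would then be supplied through the model preparation step.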

Model preparation

Model specification is a process that we carry out together with you. We share the layout of the specification file, and we read in a configuration file to generate the model specification file. At a high level, the model specification contains three components:

  • the global parameters, such as the data precision, the number of epochs to run, or the key used to encrypt the source file

  • the layer-wise specifications, such as the node count, the transfer function, or the loss function. Our model handling also allows data from a prior layer to be integrated using a specified integrating function.

  • the weight sets and related parameters, such as pre-trained weights or weights with fixed values, including weights set to 0.0 to represent a non-connection between nodes.

Ultimately, the model is a binary file that is transferred to us along with the source data file, again in encrypted form using AES with a 128-, 192- or 256-bit key in GCM mode. The key and key length chosen for the model file are independent of those used to encrypt the data.

Output

The resulting output is a binary file that contains the weights. The weights are represented in IEEE 754 standard floating-point format, in 32-bit or 64-bit precision depending on the model specification.

The weights are stored in the file layer by layer, starting with the weights of the input layer and proceeding towards the weights of the output layer.

Within each layer, the weights are saved from node index 0 to node index N_j-1, where N_j is the count of nodes in that layer, followed by the bias weight for that layer.

Expressed mathematically, the weights are w_{i,j}, where i is the index of the node within the layer and j is the index of the layer, running from the input layer to the output layer.

N = count of nodes in layer
W = 0…N-1
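
The ordering described above can be illustrated with a small reader. This is a sketch under stated assumptions, not a definitive parser: the class name WeightFileReader is hypothetical, the file is assumed to be already decoded and to carry no header, each layer is read literally as N values followed by one bias value, and the byte order is passed in as a parameter because it is not stated here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper that splits a weights file into per-layer arrays.
final class WeightFileReader {

    // Returns layers[j][i] = w_{i,j}; the last element of layers[j] is the bias weight.
    // nodeCounts[j] is the node count N of layer j, taken from the model specification.
    public static double[][] read(byte[] raw, int[] nodeCounts, int precisionBits,
                                  ByteOrder order) {
        if (precisionBits != 32 && precisionBits != 64) {
            throw new IllegalArgumentException("precision must be 32 or 64 bits");
        }
        int bytesPerValue = precisionBits / 8;
        long expected = 0;
        for (int n : nodeCounts) {
            expected += (long) (n + 1) * bytesPerValue; // N weights plus one bias
        }
        if (raw.length != expected) {
            throw new IllegalArgumentException(
                    "expected " + expected + " bytes but got " + raw.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(order); // byte order is an assumption
        double[][] layers = new double[nodeCounts.length][];
        for (int j = 0; j < nodeCounts.length; j++) {       // input layer first
            layers[j] = new double[nodeCounts[j] + 1];
            for (int i = 0; i <= nodeCounts[j]; i++) {      // nodes 0..N-1, then bias
                layers[j][i] = precisionBits == 32 ? buf.getFloat() : buf.getDouble();
            }
        }
        return layers;
    }

    public static void main(String[] args) {
        // Synthetic 32-bit file: a layer with 2 nodes, then a layer with 1 node.
        ByteBuffer demo = ByteBuffer.allocate(5 * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (float v : new float[] {0.5f, -1.25f, 0.0f, 2.0f, 0.75f}) {
            demo.putFloat(v);
        }
        double[][] layers = read(demo.array(), new int[] {2, 1}, 32, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.deepToString(layers)); // [[0.5, -1.25, 0.0], [2.0, 0.75]]
    }
}
```

The length check is there so that a mismatch between the model specification and the file (wrong precision or wrong node counts) fails early rather than producing shifted values.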