Developing CTC loss for Tensorflow.JS
Tensorflow.JS, as of 2021.12.23, lacks a native implementation of the CTC loss. For practical purposes, I’ve decided to dive into the academic papers and take a shot at implementing it myself.
The implementation itself is available on GitHub; go check it out:
This story documents my approach to the problem, and perhaps you can find some inspiration in it. I also plan to write about common problems you can effectively solve in Tensorflow, so be sure to follow along.
Learn about the problem itself
There are some articles one should read about the usage of the CTC algorithm. They are a good starting point to understand why we need it:
- https://towardsdatascience.com/build-a-handwritten-text-recognition-system-using-tensorflow-2326a3487cd5
- https://distill.pub/2017/ctc/
- https://towardsdatascience.com/intuitively-understanding-connectionist-temporal-classification-3797e43a86c
The papers describing the algorithm are here:
- https://www.cs.toronto.edu/~graves/icml_2006.pdf — Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
- https://www.isca-speech.org/archive_v0/Interspeech_2017/pdfs/1557.PDF — An Efficient Phone N-gram Forward-backward Computation Using Dense Matrix Multiplication
- http://bacchiani.net/resume/papers/ASRU2017.pdf — Improving the efficiency of forward-backward algorithm using batched computation in Tensorflow
Lectures:
- https://axon.cs.byu.edu/~martinez/classes/778/Papers/CTC.pdf — Logan Mitchell: Sequence to sequence learning
- https://www.youtube.com/watch?v=c86gfVGcvh4 — Carnegie Mellon University Deep Learning, S18 Lecture 14: Connectionist Temporal Classification (CTC)
- https://www.youtube.com/watch?v=GxtMbmv169o — Carnegie Mellon University Deep Learning, F18 Recitation 8: Connectionist Temporal Classification (CTC)
Existing solutions
The only thing that comes close to a native JS implementation is marsiancba’s solution, posted in this issue: https://github.com/tensorflow/tfjs/issues/1759
However, I couldn’t wrap my head around some of that implementation’s peculiarities (namely, the calculation of the beta variables and the input matching). That said, its usage of the Tensorflow operators is very advanced and worth checking out.
The Tensorflow Python implementation is also worth studying: ctc_ops.py
My personal favorite is the Stanford CTC implementation, since it is understandable for folks who are not well versed in Python: https://github.com/amaas/stanford-ctc/blob/master/ctc/ctc_fast.pyx The code is very easy to read, and you can relate much of it to the original paper.
Tensorflow architecture
If you want to develop your own custom loss calculator, first read this guide and work through its examples: https://www.tensorflow.org/js/guide/custom_ops_kernels_gradients For me, it helped a lot to examine the logLoss calculation, since it’s pretty simple. There is further reading material available as well: https://towardsdatascience.com/creating-custom-loss-functions-using-tensorflow-2-96c123d5ce6c
So the key here is to develop a custom gradient: a custom operation that returns a Tensor with the calculated loss, plus a custom gradient function that can be called to calculate the gradients. The infrastructure is set up so that parameters and intermediate results can be saved during the loss calculation and reused during the gradient calculation.
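To make the pattern concrete, here is a minimal sketch using tf.customGrad with a toy squared-sum op. This just illustrates the save/gradFunc mechanism; it is not the actual CTC code:

import * as tf from '@tensorflow/tfjs';

// A toy op with a hand-written gradient. save() stores tensors during the
// forward pass; gradFunc receives them back when gradients are needed.
const squaredSum = tf.customGrad((x, save) => {
  save([x]);
  return {
    value: x.square().sum(),
    // dy is the upstream gradient, saved[0] is the x stored above;
    // d/dx sum(x^2) = 2x
    gradFunc: (dy, saved) => [saved[0].mul(2).mul(dy)],
  };
});

const x = tf.tensor1d([1, 2, 3]);
tf.grad(squaredSum)(x).print(); // [2, 4, 6]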
Develop a naive implementation
Throughout the implementation you should have two goals:
- Have everything calculated in tensors
- Keep everything you can in the GPU’s memory (in our case WebGL)
From my experience, this won’t happen on your first try. Tensors are immutable, accessing values is a pain, and sometimes you just throw in the towel and do things the old-fashioned way: get an array representation, do the trick as you are used to, then convert the result back into a tensor and move along. This is fine for getting things working; just don’t forget that there is a reason to keep pushing towards doing things the ‘Tensorflow way’. More on that later.
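The escape hatch looks something like this (illustrative only, and note that arraySync pulls the data off the accelerated backend):

import * as tf from '@tensorflow/tfjs';

// Pull the values out, compute in plain JavaScript, wrap the result back up.
const probs = tf.tensor2d([[0.1, 0.9], [0.8, 0.2]]);
const values = probs.arraySync();
const logValues = values.map((row) => row.map((v) => Math.log(v)));
const result = tf.tensor2d(logValues); // tf.log(probs) would do this in-backend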
CTC is a dynamic programming algorithm, which means a lot of slicing, concatenation, and conditional logic. It will take time to get there.
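To give a feel for it, here is an illustrative fragment of the kind of step the forward pass keeps doing (the shapes and values are made up; this is not the repository code):

import * as tf from '@tensorflow/tfjs';

// Shift the alpha matrix right by one label position, pad with zeros,
// and apply the update only where a mask allows it.
const alpha = tf.tensor2d([[0.6, 0.3, 0.1]]);
const shifted = tf.concat([tf.zeros([1, 1]), alpha.slice([0, 0], [1, 2])], 1);
const mask = tf.tensor2d([[1, 1, 0]], [1, 3], 'bool');
const next = tf.where(mask, alpha.add(shifted), alpha);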
Test
CTC is a tricky algorithm with its own quirks. The main problem is that it requires lots of data for the input, generates lots of outputs, and it is hard to assemble a list of input/expected-output pairs without doing tedious calculations. My approach is to start from the obvious cases (the first one is sketched as code after the list):
- matching inputs and labels should return zero loss and zero gradients
- random noise inputs should produce “something” rather than an error
- it should handle both single elements and batched elements
- it should run correctly with labels of different lengths
- running model.fit() for 10 epochs should show a decreasing loss
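The first case, for example, translates into a quick sanity check like this (ctcLoss here is a stand-in name; check the repository for the actual API):

// When the predictions match the one-hot labels exactly,
// the returned loss should be (near) zero.
const labels = tf.tensor3d([[[1, 0, 0], [0, 1, 0]]]); // [batch, time, class]
const predictions = labels.clone();
const loss = ctcLoss(labels, predictions); // hypothetical signature
loss.print(); // expect ~0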
I’ve found it unavoidable to calculate some results by hand. Here is one resolving “CA” when we should have found “CAT”:

Check the Excel sheet in the GitHub repository for the details.
Refactor cycle
I’ve mentioned that it is ok to start with JavaScript arrays and such, but let me explain why we need to move to tensors. You see, Tensorflow is organized around the concept of kernel functions: the same operation can be executed by different implementations depending on what infrastructure is available, without the programmer having to rework everything for each platform.
Think about it this way: you have two big, multidimensional tensors that you need to multiply element-wise. You would use something like this:
const result = tensorA.mul(tensorB);
If you needed to do this in pure JavaScript, with arrays, you would implement the loops yourself, do the math, and return a new array with the results. Depending on what’s available to Tensorflow, the execution can be really different (switching between backends is sketched after this list):
- tfjs only — just like you would do it in pure JavaScript, but somebody has programmed it for you. Sweet. Since modern JS engines are lightning fast, you’ll be surprised how fast this implementation can run
- tfjs-wasm — the core is implemented in WebAssembly, so there’s a significant performance improvement. Not all functions are supported, though. It is by far the fastest option if you don’t have a CUDA-based backend available
- tfjs-node — kernel functions run natively on the processor, so you have the full power of your CPU, including any special instruction sets it might have. However, you usually have to recompile Tensorflow from scratch to make use of those.
- tfjs-webgl — kernel functions take advantage of the parallel processing capabilities of your GPU
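The nice part is that switching between these backends doesn’t change your tensor code. Assuming the wasm backend package is installed, something like this is all it takes:

import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-wasm'; // registers the 'wasm' backend

await tf.setBackend('wasm');
console.log(tf.getBackend()); // 'wasm'
const tensorA = tf.tensor2d([[1, 2], [3, 4]]);
const tensorB = tf.tensor2d([[5, 6], [7, 8]]);
const result = tensorA.mul(tensorB); // same call, now on the wasm kernels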
You might think that each of these engines brings at least a two-fold drop in execution time, but that’s not the case. The speed relies so much on the backend implementation that it is essential to keep as many things as possible “tensory”. That’s not trivial: some of the operators, albeit very useful, are pretty hard to grasp. To give you a small glimpse, here’s the implementation’s per-batch-item calculation performance chart for the different implementation and backend styles:

That’s where the learning process and the refactor cycle kick in.
Summary
So, to sum up: give yourself time to get accustomed to Tensorflow. It’s not trivial at first, but you’ll love it. Just get started.
If you are stuck, you can always reach out to the developer community at the Tensorflow Forum.