
Creating Live Text on Android

by admin
Structure of the demo application

The activity contains three composable views that display the camera preview, the detected text content, and an action button.
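As a rough sketch, the screen might be put together like this. The composable names (LiveTextScreen, CameraPreview, DetectedTextOverlay, ShareButton) are placeholders I'm using for illustration, not names from the demo's published code:

import androidx.compose.foundation.layout.Box
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.runtime.Composable
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import com.huawei.hms.mlsdk.text.MLText // HMS ML Kit; check the exact package for your SDK version

@Composable
fun LiveTextScreen(
    mlText: MLText?,              // latest recognition result, null until text is found
    onShareClicked: () -> Unit    // share everything detected in the current frame
) {
    Box(modifier = Modifier.fillMaxSize()) {
        // Camera preview, e.g. an AndroidView wrapping CameraX's PreviewView
        CameraPreview(modifier = Modifier.fillMaxSize())

        // Overlay that draws the recognized text blocks on top of the preview
        mlText?.let {
            DetectedTextOverlay(mlText = it, modifier = Modifier.fillMaxSize())
        }

        // Action button, shown only once some text has been detected
        if (mlText != null) {
            ShareButton(
                onClick = onShareClicked,
                modifier = Modifier.align(Alignment.BottomCenter)
            )
        }
    }
}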

The CameraX library is used to handle the preview and image frame analysis. The ImageAnalysis.Analyzer interface makes each frame available to the processor so that recognition can run on it. The analyzer is configured in a non-blocking mode (using the STRATEGY_KEEP_ONLY_LATEST flag), so it always works on the latest frame and checks whether it contains text.
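A minimal sketch of that CameraX setup could look like the following. The bindCamera function and the way the callback is wired are my assumptions; the CameraX calls themselves (ProcessCameraProvider, ImageAnalysis.Builder, STRATEGY_KEEP_ONLY_LATEST) are the library's standard API, and TextAnalyzer is the analyzer class shown later in the article:

import android.content.Context
import androidx.camera.core.CameraSelector
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.Preview
import androidx.camera.lifecycle.ProcessCameraProvider
import androidx.camera.view.PreviewView
import androidx.core.content.ContextCompat
import androidx.lifecycle.LifecycleOwner
import com.huawei.hms.mlsdk.text.MLText // HMS ML Kit; check the exact package for your SDK version

fun bindCamera(
    context: Context,
    lifecycleOwner: LifecycleOwner,
    previewView: PreviewView,
    onTextDetected: (MLText) -> Unit
) {
    val providerFuture = ProcessCameraProvider.getInstance(context)
    providerFuture.addListener({
        val cameraProvider = providerFuture.get()

        // Preview use case that feeds frames to the on-screen PreviewView
        val preview = Preview.Builder().build().also {
            it.setSurfaceProvider(previewView.surfaceProvider)
        }

        // Analysis use case: keep only the latest frame so recognition
        // never blocks the camera pipeline
        val analysis = ImageAnalysis.Builder()
            .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
            .build()
            .also {
                it.setAnalyzer(
                    ContextCompat.getMainExecutor(context),
                    TextAnalyzer(onTextDetected)
                )
            }

        cameraProvider.unbindAll()
        cameraProvider.bindToLifecycle(
            lifecycleOwner,
            CameraSelector.DEFAULT_BACK_CAMERA,
            preview,
            analysis
        )
    }, ContextCompat.getMainExecutor(context))
}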

This demo application uses the Huawei OCR library. You can also use other OCR libraries, such as Firebase ML Kit. The OCR service can run in the cloud or on the device. The cloud version recognizes text with higher accuracy and supports more languages, while the on-device version is a small ML model that ships with your application and can process frames in real time, at the cost of limited language support. The following recognition steps are very similar in most OCR libraries, so you can replace them with the one you like best.
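For comparison, an analyzer built on Google's standalone ML Kit on-device text recognition might look roughly like this. The class name MlKitTextAnalyzer is a placeholder of mine; the recognizer, InputImage and Task calls are from the ML Kit API:

import android.annotation.SuppressLint
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.Text
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

class MlKitTextAnalyzer(private val onTextDetected: (Text) -> Unit) : ImageAnalysis.Analyzer {

    // On-device recognizer for Latin-script text
    private val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    @SuppressLint("UnsafeOptInUsageError")
    override fun analyze(imageProxy: ImageProxy) {
        val mediaImage = imageProxy.image
        if (mediaImage == null) {
            imageProxy.close()
            return
        }
        recognizer.process(
            InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)
        )
            .addOnSuccessListener { text -> onTextDetected(text) }
            // Close the frame whether recognition succeeded or failed,
            // so CameraX can deliver the next one
            .addOnCompleteListener { imageProxy.close() }
    }
}

The demo itself uses the Huawei library; its analyzer is shown below.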

TextAnalyzer:

class TextAnalyzer(private val onTextDetected: (MLText) -> Unit) : ImageAnalysis.Analyzer {

    // Recognize English text and track it across the video stream
    private val setting = MLLocalTextSetting.Factory()
        .setOCRMode(MLLocalTextSetting.OCR_TRACKING_MODE)
        .setLanguage("en")
        .create()

    private val analyzer = MLAnalyzerFactory.getInstance().getLocalTextAnalyzer(setting)

    @SuppressLint("UnsafeOptInUsageError")
    override fun analyze(imageProxy: ImageProxy) {
        imageProxy.image?.let { image ->
            analyzer.asyncAnalyseFrame(
                MLFrame.fromMediaImage(image, imageProxy.imageInfo.rotationDegrees)
            )
                .addOnSuccessListener { mlText ->
                    mlText?.let { onTextDetected.invoke(it) }
                    // Close the frame so CameraX can deliver the next one
                    imageProxy.close()
                }
                .addOnFailureListener {
                    imageProxy.close()
                }
        }
    }
}
  1. Create an instance of the text analyzer MLTextAnalyzer to recognize text in the camera frames. You can use MLLocalTextSetting to specify the supported languages and the OCR detection mode. There are two modes: one for detecting text in a single image (OCR_DETECT_MODE) and one for detecting text in a video stream (OCR_TRACKING_MODE). In tracking mode, the result of scanning the preceding frame is used as a basis for quickly determining the position of the text in the current frame.

  2. Create an MLFrame object by means of MLFrame.fromMediaImage. When you create an MLFrame, you need to pass the rotation angle; if you use CameraX, the ImageAnalysis.Analyzer class calculates it for you (imageProxy.imageInfo.rotationDegrees).

  3. Pass the MLFrame object to the asyncAnalyseFrame method for text recognition. The recognition result is returned as an array of MLText.Block, as shown in the sketch below.
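As a small sketch of consuming that result: DetectedBlock is a helper type I'm introducing only for illustration, and the accessor names (blocks, stringValue, border) come from the Huawei ML Kit SDK, so check them against the SDK version you use.

// Helper type introduced just for this sketch
data class DetectedBlock(val text: String, val box: android.graphics.Rect)

// Flatten the MLText result into plain text plus bounding boxes
fun MLText.toDetectedBlocks(): List<DetectedBlock> =
    blocks.map { block ->
        DetectedBlock(
            text = block.stringValue,   // characters recognized in this block
            box = block.border          // bounding box of the block in image coordinates
        )
    }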

Text display and action

As soon as text is recognized, the library returns the contents of the detected text and the vertices of its bounding box. A "Share" button is displayed, and the detected text is drawn over the image. When you tap a piece of text, it is copied to the clipboard. You can also tap the "Share" button to share all the text detected in the image frame.
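A minimal sketch of those two actions, using the standard Android clipboard and share-intent APIs (the function names are placeholders of mine):

import android.content.ClipData
import android.content.ClipboardManager
import android.content.Context
import android.content.Intent

// Copy a single tapped text block to the clipboard
fun copyToClipboard(context: Context, text: String) {
    val clipboard = context.getSystemService(Context.CLIPBOARD_SERVICE) as ClipboardManager
    clipboard.setPrimaryClip(ClipData.newPlainText("detected_text", text))
}

// Share everything detected in the current frame
fun shareDetectedText(context: Context, blocks: List<String>) {
    val intent = Intent(Intent.ACTION_SEND).apply {
        type = "text/plain"
        putExtra(Intent.EXTRA_TEXT, blocks.joinToString(separator = "\n"))
    }
    context.startActivity(Intent.createChooser(intent, "Share detected text"))
}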

When displaying the detected text, I initially thought of using the Text composable placed over the preview, but that noticeably reduces performance, especially when there are many text blocks. I switched to drawing the text on a canvas, which looks better, but handling taps requires more code: you have to check whether the tap point falls inside any text block's area. The end result looks like this:

(Image: text recognition in the demo application)
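A rough sketch of that canvas-based overlay follows. DetectedBlock is the helper type from the earlier sketch, DetectedTextOverlay is a placeholder name, and it assumes the block coordinates have already been mapped from image space to the overlay's coordinate space:

import androidx.compose.foundation.Canvas
import androidx.compose.foundation.gestures.detectTapGestures
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.graphics.drawscope.drawIntoCanvas
import androidx.compose.ui.graphics.nativeCanvas
import androidx.compose.ui.input.pointer.pointerInput

@Composable
fun DetectedTextOverlay(
    blocks: List<DetectedBlock>,              // text plus bounding box, see the earlier sketch
    onBlockTapped: (DetectedBlock) -> Unit,   // e.g. copy the tapped block to the clipboard
    modifier: Modifier = Modifier
) {
    val paint = android.graphics.Paint().apply {
        color = android.graphics.Color.WHITE
        textSize = 36f
    }
    Canvas(
        modifier = modifier.pointerInput(blocks) {
            detectTapGestures { tap ->
                // Hit-test the tap point against each block's bounding box
                blocks.firstOrNull { it.box.contains(tap.x.toInt(), tap.y.toInt()) }
                    ?.let(onBlockTapped)
            }
        }
    ) {
        drawIntoCanvas { canvas ->
            blocks.forEach { block ->
                // Draw each detected string at the top-left corner of its box
                canvas.nativeCanvas.drawText(
                    block.text,
                    block.box.left.toFloat(),
                    block.box.top.toFloat(),
                    paint
                )
            }
        }
    }
}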

Conclusion

After completing this simple demonstration, I think the hardest part is still the text recognition. The demo application only recognizes English characters. If there is more than one language in the camera stream, or the characters are difficult to recognize, the performance of the demo application drops dramatically. Using only on-device OCR improves privacy protection, but it increases complexity, especially if you want to work with multiple languages at the same time. I think this may be why Live Text on iOS 15 only recognizes seven languages in its initial release.

The second challenge is the user experience, especially when users try to select detected text. Drawing text on the canvas makes it appear more accurate, but highlighting is harder to implement. As I mentioned, Live Text is integrated into the iOS system, so it needs to be fast and lightweight. My current implementation uses about 130 MB of memory at startup and 18 MB of installed space, which includes an OCR model of about 500 KB.

Another aspect I haven’t touched on in this demonstration is on-device machine learning, from detecting text content to understanding its context. As more and more actions happen, the local model could be improved based on feedback. You are free to fork the repo and create your own Live Text.


Material prepared as part of the course "Android Developer. Professional". If you are interested in learning more about the training format and the program, and in getting acquainted with the course instructor, we invite you to the online open house. You can register here.
