Google Cloud Vision API
Image Matching Use Case Using Web Entities
Cloud Vision API was created by Google to give application developers the tools they need to build the next generation of applications that can see and understand the content of images. With it, customers can detect a broad set of entities within an image, from everyday objects to faces and product logos.
In this blog post, we’ll review a specific use case of Google Cloud Vision API using web entities so you can have a better understanding of how it could be applied at your organization.
Let’s say you took an award-winning photograph that is hosted on your licensed image-sharing website, meaning people can pay you for a copy of it. Now let’s say you want to know when and where the image is being used, and get a list of the websites hosting it all over the web. This is where Google Cloud Vision API comes to your rescue.
What’s the Catch?
It’s not cheap, but it’s also not too expensive for the problem it solves: Cloud Vision API is priced per feature per 1,000 images, so check Google’s pricing page for the current terms.
It’s important to know that Cloud Vision API is backed by a data science model, which means responses are not instant. Remember, you are matching your image against the trillions of images Google has indexed over the years. Even with a good connection, you can expect a response time somewhere between 3 and 5 seconds.
If you are using Cloud Vision API with a service account JSON file, it is highly likely that you will come across this page in the Google docs.
I’m not sure about you, but for me this was a nightmare. Setting an environment variable across an infrastructure hosted entirely on a cloud other than Google Cloud is no joke. That’s when I found this implementation: https://github.com/VinodhThiagarajan1309/vision/blob/master/src/main/java/ch/rasc/vision/controller/VisionService.java#L58
I was not able to get this working until I searched GitHub for how to use the service-account JSON file without setting it as an environment variable. Once you load the credentials from the service file directly, you’re all set to hit the API.
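In Python, the same idea can be sketched as follows. The `load_service_account_info` helper below is a hypothetical name of my own; the commented `from_service_account_info` / `ImageAnnotatorClient` calls are the client-library entry points that let you skip the `GOOGLE_APPLICATION_CREDENTIALS` environment variable entirely.

```python
import json

def load_service_account_info(path):
    """Read a service-account key file and sanity-check the fields
    the client library needs (hypothetical helper, not a Google API)."""
    with open(path) as f:
        info = json.load(f)
    required = {"type", "project_id", "private_key", "client_email"}
    missing = required - info.keys()
    if missing:
        raise ValueError(f"key file missing fields: {sorted(missing)}")
    return info

# With google-cloud-vision installed, the dict can be passed straight
# to the client, no environment variable involved:
#   from google.oauth2 import service_account
#   from google.cloud import vision
#   creds = service_account.Credentials.from_service_account_info(info)
#   client = vision.ImageAnnotatorClient(credentials=creds)
```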
What if the number of images you want to run through Google Vision API ranges in the millions? (Better get your credit card ready.) In that case, you need a mechanism to programmatically feed the image data in and persist the outcome, which means splitting the workflow into two steps. The first step persists the image data, for which I would suggest AWS S3 populated by a Spark program. The second step calls the Google Vision service, for which I would suggest a REST wrapper invoked from a Spark flatMap.
In short, image URLs are fed from the Apache Spark driver program, and for each image URL the wrapper service is hit.
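The driver-side wiring can be sketched like this. Everything here is illustrative: `annotate` and `call_wrapper` are hypothetical names standing in for the REST wrapper described above, and the commented lines show how the same function would plug into a PySpark `flatMap`.

```python
def annotate(url, call_wrapper):
    """Call the REST wrapper for one image URL and pair the URL with
    each entity it returns (sketch; call_wrapper is a stand-in for an
    HTTP POST to the wrapper service)."""
    entities = call_wrapper(url)
    return [(url, entity) for entity in entities]

# With pyspark available, the same function drops into flatMap:
#   rdd = spark.sparkContext.parallelize(image_urls, numSlices=2)
#   results = rdd.flatMap(lambda u: annotate(u, call_wrapper)).collect()
```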
Let’s Talk About Throttling
In this use case, full of enthusiasm, I first chose 20 executors and ran the distributed job. Much to my dismay, the REST API threw back a bunch of 500s. (It could have thrown better messages, but this was just a proof of concept!) So what happened under the hood? I was making roughly 100 calls per second, and as I soon realized, Google Vision API is not yet designed to handle such a load.
What I found is that the average SLA is ~3 seconds, meaning I needed to throttle the calls to keep the circuit cool. So I brought my executor count down to 2, which made the job run well and stay stable.
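Reducing executors is one way to throttle; the same effect can be achieved explicitly inside each task. Below is a minimal client-side throttle sketch (my own code, not part of any Google library) that caps the call rate by sleeping between requests; the `rate` value is an assumption you would tune against the 500s you observe.

```python
import time

class Throttle:
    """Allow at most `rate` calls per second by sleeping between
    calls. A minimal sketch, not production-grade rate limiting."""

    def __init__(self, rate):
        self.min_interval = 1.0 / rate
        self.last = 0.0

    def wait(self):
        now = time.monotonic()
        delay = self.min_interval - (now - self.last)
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()

# Usage inside each executor/task (call_wrapper is hypothetical):
#   throttle = Throttle(rate=1)  # ~1 call/sec stays well under the limit
#   for url in partition:
#       throttle.wait()
#       call_wrapper(url)
```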
15 Is the Magic Number
I didn’t see it in any docs, but I eventually found that you can combine up to 15 image requests into one request and get the responses back in the order you submitted them. Surprisingly, the response time was still ~3 seconds. So the distributed job was wired to hit the service in batches of 15.
Google Cloud Vision API is still in its early stages, and we can expect faster responses soon. For now, though, authentication and throttling are the pain points in its implementation. What are some of your use cases, or how do you plan to use Google Vision API in the future? Let us know in the comments.