Azure Search is getting some killer abilities. I have to admit these are near and dear to my own heart with our work here at SSWUG and on client projects.
Why?
Because visual elements are so important to many systems. From images to video, full text search now supports searching the content inside those items. This means that, essentially, Azure Search will transcribe the contents of those rich objects and then use that information in search fulfillment.
This works really well in the full text search world because it already knows about synonyms and “like”-type queries. Apply that to the auto-transcribed contents of the spoken and presented elements and you can see how effective the search can be. It doesn’t really matter if it’s a perfect transcription; internally, it just has to be good enough to produce search tokens, good enough to search and match against.
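To make that a little more concrete, here’s a minimal sketch of what a fuzzy query against the transcribed content might look like, using the azure-search-documents Python SDK. The endpoint, key, index name, and the transcript/title/videoUrl fields are placeholders I’m assuming for illustration – your actual schema will depend on how the enrichment pipeline is configured.

```python
# A minimal sketch of querying auto-transcribed content in an Azure Search index.
# The endpoint, key, index name, and field names below are placeholders -- the
# real schema depends on how the transcription/enrichment pipeline is set up.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="videos",
    credential=AzureKeyCredential("<your-query-key>"),
)

# Full Lucene syntax (query_type="full") enables fuzzy matching with ~, which is
# handy when the underlying transcription isn't perfect; a synonym map on the
# transcript field would widen the net further.
results = client.search(
    search_text="indexing~ OR transcription~",
    query_type="full",
    search_fields=["transcript"],
    select=["title", "videoUrl"],
    top=10,
)

for doc in results:
    print(doc["title"], "->", doc["videoUrl"])
```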
The overall process is nearly the same as associating captions with the rich elements; the service puts out the “captions” and then you associate them with the video. This gives you a chance to review and correct the captions if there are glaring errors.
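Since the review-and-correct step is really just editing text and pushing it back, here’s a hedged sketch of that piece as well (again with the Python SDK, and with the “id” key and “transcript” field names assumed rather than taken from the real schema):

```python
# Sketch of pushing a reviewed/corrected transcript back into the index.
# The "id" key and "transcript" field are assumptions about the index schema.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="videos",
    credential=AzureKeyCredential("<your-admin-key>"),  # writes need an admin key
)

corrected = {
    "id": "video-1234",
    "transcript": "...the caption text after a human review pass...",
}

# merge_or_upload leaves the document's other fields alone and only
# replaces the transcript with the corrected version.
result = client.merge_or_upload_documents(documents=[corrected])
print(result[0].succeeded)
```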
In work we’ve done with auto-transcription before, it gets very close on content. Of course, it depends on the materials, accents, industry-specific language, etc. But the rhythm of the presentation and the general gist of it come through really well. I think it’s a great addition.
One of the challenges we’ve seen in the transcription process is identifying who is saying a given thing. Of course, this isn’t critical for searching content – you’re really just looking for a word or phrase, not necessarily WHO said it. But you can imagine that, going forward (feature request), it would be cool to be able to say
“show me the videos where Steve is talking about full-text indexing for the Mars project”
It’s not too far-fetched to imagine how we can get there as the engines get better at recognizing different speakers.