What is automatic image captioning?
- An AI system writes a natural-language description of the contents of an image.
Why is image captioning important?
- Image search using sentences.
- Helping blind and visually impaired people perceive the world.
- A step towards more general AI that connects vision and language.
How to achieve automatic image captioning?
- The system consists of three parts that mimic humans:
  - Vision part: views different parts of the image, like the eyes.
  - Language part: composes a sentence.
  - Link part: connects vision and language.
- For the “vision part”, we use a convolutional neural network (CNN) to extract image features. For the “language part”, we use a causal convolutional network to represent high-level word concepts. To link the two, an attention model focuses on the important image regions while the high-level concepts are processed, and a gated recurrent unit (GRU) fuses the image and language features.
- Figure: input image, attention map, and generated caption.
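
The pipeline described above (causal convolution over words, attention over image regions, GRU fusion) can be sketched in plain Python. This is a toy illustration under several assumptions, not the actual model: random vectors stand in for CNN region features and word embeddings, the causal convolution uses one scalar weight per time offset, attention scores are plain dot products, and the GRU input is assumed to be the attended image context concatenated with the word concept. All names (`causal_conv`, `attend`, `gru_step`, `params`) are hypothetical, and predicting the next word from the GRU state is left out.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    return [dot(row, v) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def causal_conv(words, taps):
    """Output t mixes words[t], words[t-1], ... and never a future word."""
    out = []
    for t in range(len(words)):
        vec = [0.0] * len(words[0])
        for j, w in enumerate(taps):
            if t - j >= 0:
                vec = [v + w * x for v, x in zip(vec, words[t - j])]
        out.append(vec)
    return out

def attend(regions, query):
    """Weight each image region by its (assumed dot-product) match to the query."""
    weights = softmax([dot(r, query) for r in regions])
    context = [sum(w * r[i] for w, r in zip(weights, regions))
               for i in range(len(query))]
    return context, weights

def gru_step(x, h, p):
    """One standard GRU update: update gate z, reset gate r, candidate state."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
D, H = 4, 3                        # toy feature size and GRU hidden size
regions = rand_matrix(5, D, rng)   # stand-in for CNN features of 5 image regions
words = rand_matrix(3, D, rng)     # stand-in for embeddings of 3 words so far
params = {k: rand_matrix(H, 2 * D if k[0] == "W" else H, rng)
          for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}

concepts = causal_conv(words, taps=[0.6, 0.3, 0.1])
h = [0.0] * H
for concept in concepts:
    context, weights = attend(regions, concept)   # where to look for this concept
    h = gru_step(context + concept, h, params)    # list concatenation fuses both
```

After the loop, `weights` holds the attention distribution over the five regions for the last word (the kind of values an attention map visualizes), and `h` is the fused image-language state that a real model would feed to a word classifier.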