Increased productivity of teaching video production, cutting days of manual work down to about five minutes
Simplified teaching video production and lowered the learning curve, making it easy for anyone to create videos
Used the open-source MediaPipe framework to further process images into visually friendly patterns for visually impaired children
In collaboration with Bethel China, Google volunteers developed 'VisAid Learn,' a video production platform for teaching visually impaired children using Google Cloud AI technology.
Bethel China is a non-profit organization dedicated to providing early education and specialized training for visually impaired children. They help children overcome challenges such as fear of movement, speech delays, and motor skill development delays through targeted training in actions and language, fostering self-care and learning abilities.
Bethel China benefits in several ways from its partnership with Google Serve, a project that recruits volunteers within Google every year to help external non-governmental organization (NGO) partners solve various problems.
Google volunteers observed firsthand the significant visual challenges these children face. The children had difficulty perceiving the volunteers standing before them, sensing only a faint shadow. However, these physical obstacles did not dampen the children's curiosity to explore the world. Bethel China prepared special teaching materials and tools (such as cloth books) for these children. These teaching aids, which may appear simple to sighted people, have inspired the children and are of great help in cultivating their living and learning abilities.
Through investigation and research, Google volunteers found that the existing learning materials were not suited to the visual cognitive abilities of visually impaired children and could not help them obtain information effectively. With their limited vision, the children can only distinguish strongly contrasting colors or shapes. The complex scenes and color combinations in existing learning materials are not compatible with the children's visual cognition, making it impossible for them to learn effectively.
To provide them with a better learning experience, Google started developing teaching materials specifically for visually impaired children to help them overcome visual cognitive barriers. Google recruited more than 40 volunteers and used video editing software to create dozens of teaching videos that are friendly to these children, teaching them basic concepts such as animals, tools, fruits, and shapes. These videos opened a new window for learning and were well received by the children, strengthening Google volunteers' determination to further help them with continuous learning and exploration.
At the beginning of the project, Google volunteers manually created teaching videos using conventional editing software. New volunteers had to learn video editing tools from scratch. This approach was labor-intensive, inflexible, and resulted in low output. To address this, Google began using Google Cloud AI technology to generate teaching videos automatically.
To start, volunteers used Gemini 1.5 Flash to generate video scripts. When given topic prompts related to the teaching content, Gemini automatically generated the script text. The model, known for its low latency and cost-effectiveness, leveraged its multimodal capabilities to generate script text suitable for teaching visually impaired children.
Next, the generated script text was fed into Imagen, the image generator on Vertex AI, which used its text-to-image capabilities to create images containing the teaching content. Generating images with AI was convenient and avoided copyright issues. The raw images were then processed using Google's open-source MediaPipe framework, which separated objects from the background. Turning the background completely black or white and outlining objects in red or white made the images more visually friendly for the children. The team also used MediaPipe for on-device development and deployment, processing large volumes of images at a low cost.
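The image treatment described above can be illustrated with a minimal sketch. It is not the VisAid Learn implementation: it assumes the segmentation step (for example, a MediaPipe segmenter) has already produced a foreground mask, and the helper name `high_contrast`, the color constants, and the list-of-tuples image format are hypothetical choices made to keep the example self-contained.

```python
# Hypothetical helper, not part of MediaPipe. The mask is assumed to come
# from an earlier segmentation step; here it is simply a grid of booleans.

BLACK = (0, 0, 0)
RED = (255, 0, 0)


def high_contrast(image, mask, background=BLACK, outline=RED):
    """Flatten the background to one color and outline the object.

    image: list of rows, each a list of (r, g, b) tuples
    mask:  list of rows, each a list of bools (True = object pixel)
    Returns a new image; the inputs are not modified.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    def is_object(y, x):
        # Anything outside the frame counts as background.
        return 0 <= y < height and 0 <= x < width and mask[y][x]

    neighbors = ((-1, 0), (1, 0), (0, -1), (0, 1))
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            if not mask[y][x]:
                # Background pixel: replace with the flat background color.
                row.append(background)
            elif not all(is_object(y + dy, x + dx) for dy, dx in neighbors):
                # Object pixel touching the background: part of the outline.
                row.append(outline)
            else:
                # Interior object pixel: keep the original color.
                row.append(image[y][x])
        result.append(row)
    return result
```

Passing `background=(255, 255, 255)` or `outline=(255, 255, 255)` would give the white variants mentioned in the text. A production version would more likely operate on image arrays than on nested lists.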
Finally, Text-to-Speech was used to generate a voiceover for each video, and the prepared images and voiceovers were combined to create the final videos.
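The final assembly step can be sketched as building a timeline in which each image stays on screen for as long as its voiceover lasts. The source does not describe how the platform combines images and audio, so everything here is an assumption for illustration: the `Scene` type, the `build_timeline` function, the file names, and the short pause added after each clip are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """One image paired with its voiceover clip (hypothetical structure)."""
    image_path: str
    audio_path: str
    audio_seconds: float


def build_timeline(scenes, pause_seconds=0.5):
    """Return (start, end, image_path, audio_path) tuples in playback order.

    Each image is shown for the length of its voiceover plus a short pause,
    and the next scene starts where the previous one ends.
    """
    timeline = []
    cursor = 0.0
    for scene in scenes:
        end = cursor + scene.audio_seconds + pause_seconds
        timeline.append((cursor, end, scene.image_path, scene.audio_path))
        cursor = end
    return timeline


# Example with made-up file names and durations.
scenes = [
    Scene("apple.png", "apple.wav", 2.0),
    Scene("banana.png", "banana.wav", 3.0),
]
timeline = build_timeline(scenes)
```

A timeline like this could then be handed to whatever video encoder the platform uses. Which encoder that is remains unspecified in the source.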
To make these features easier for other people to use, the volunteers integrated them into the 'VisAid Learn' platform. Through the collaboration of multiple Google AI agents, the platform efficiently generates teaching videos for visually impaired children. The platform is accessible through a web interface, so users do not need to learn video editing software. By providing simple text prompts or images, they can produce complete teaching videos, which significantly improves efficiency and flexibility. The platform not only enhances the productivity of volunteers but can also be used by Bethel China staff or parents.
With the help of AI technology, Google and Bethel China have efficiently produced many teaching videos with rich themes, illustrated content, and voice narration, significantly improving the learning outcomes of visually impaired children. Seeing the children immersed in these learning videos, happily and actively exploring the unknown world, is the greatest encouragement and reward for the volunteers' hard work. While circumstances may not have provided these children with clear vision, AI technology has helped them see a colorful world.
Based on the feedback from Bethel China, 'VisAid Learn' has consistently produced excellent results. Google volunteers continue to work with Bethel China to help more visually impaired children. They will collaborate with 15 schools for the blind and special education schools, using the platform to assist over 2,000 visually impaired children and opening new learning paths for them. Additionally, Bethel China plans to share VisAid Learn with institutions in India, the Philippines, and Malaysia to help visually impaired children globally.
Google's mission is to organize the world's information and make it universally accessible and useful, and VisAid Learn is one example of this mission in action. The team will continue to work hard using AI technology to help more visually impaired children see a better world.
Bethel is a non-profit organization running two projects in China: the "Love is Blind Project" and "Project 555." Bethel is the first and only organization dedicated to serving blind and visually impaired orphans in China.
Industry: Education
Location: China
Products: Gemini 1.5 Flash, Vertex AI Imagen, Text-to-Speech, MediaPipe