A Simple Way to Better Computer Vision | MIT News

Before a machine learning model can perform a task, such as detecting cancer in medical images, the model must be trained. Training image classification models typically involves showing the model millions of sample images collected into a huge dataset.

However, using real image data can raise practical and ethical concerns: the images might infringe copyright, violate people’s privacy, or be biased against a particular racial or ethnic group. To avoid these pitfalls, researchers can use image generation programs to create synthetic data for model training. But these techniques are limited, because expert knowledge is often required to hand-design an image generation program that produces effective training data.

Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere took a different approach. Instead of designing custom image generation programs for a specific training task, they collected a dataset of 21,000 publicly available programs from the internet. Then they used this large collection of basic image generation programs to train a computer vision model.

These programs, each consisting of just a few lines of code, produce a diverse variety of images displaying simple colors and textures. The researchers did not curate or modify the programs.

The models they trained with this large dataset of programs classified images more accurately than other synthetically trained models. And while their models were still less accurate than models trained with real data, the researchers showed that increasing the number of image programs in the dataset also increased model performance, revealing a path to greater accuracy.

“It turns out that using a lot of programs that aren’t curated is better than using a small number of programs that people need to manipulate. Data is important, but we’ve shown that you can get pretty far without real data,” says Manel Baradad, an Electrical Engineering and Computer Science (EECS) PhD student working at the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper describing this technique.

Co-authors include Tongzhou Wang, an EECS graduate student in CSAIL; Rogerio Feris, senior scientist and manager at the MIT-IBM Watson AI Lab; Antonio Torralba, the Delta Electronics Professor of Electrical Engineering and Computer Science and a member of CSAIL; and senior author Phillip Isola, an associate professor in EECS and CSAIL; along with others at JPMorgan Chase Bank and Xyla, Inc. The research will be presented at the Conference on Neural Information Processing Systems.

Rethinking pretraining

Machine learning models are typically pretrained, meaning they are first trained on one dataset to learn parameters that can then be used to tackle a different task. A model for classifying X-rays might be pretrained on a huge dataset of synthetically generated images before being trained for its actual task on a much smaller dataset of real X-rays.
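This two-stage workflow can be sketched in miniature. The following is an illustrative toy, not the authors' code: the function names, the stand-in "update" arithmetic, and the placeholder data are all hypothetical, standing in for the gradient-based training a real deep learning pipeline would perform.

```python
# Toy sketch of pretraining followed by fine-tuning.
# All names and the "update" arithmetic are hypothetical stand-ins;
# a real pipeline would use a deep learning framework and image data.

def pretrain(model, synthetic_images):
    """First stage: learn general-purpose parameters from synthetic data."""
    for image in synthetic_images:
        model["weights"] += len(image) % 3 - 1  # stand-in for a gradient step
    return model

def finetune(model, labeled_xrays):
    """Second stage: adapt the pretrained parameters to the real task."""
    for image, label in labeled_xrays:
        model["weights"] += label  # stand-in for a supervised update
    return model

model = {"weights": 0}
model = pretrain(model, ["synthetic_0", "synthetic_1", "synthetic_2"])
model = finetune(model, [("xray_0", 1), ("xray_1", 0)])
```

The point of the structure is that the expensive first stage never touches the scarce real data; only the short second stage does.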

These researchers previously showed that they could use a handful of image generation programs to create synthetic data for model pretraining, but the programs had to be carefully designed so that the synthetic images matched certain properties of real images. This made the technique difficult to scale up.

In the new work, they instead used a vast data set of uncurated image generation programs.

They started by compiling a collection of 21,000 image generation programs from the internet. All the programs are written in a simple programming language and consist of just a few lines of code, so they generate images quickly.

“These programs were created by developers around the world to produce images that have some of the properties that we are interested in. They produce images that look like abstract art,” explains Baradad.

These simple programs run so quickly that the researchers didn’t have to produce images in advance to train the model. They found they could generate images and train the model simultaneously, which streamlines the process.
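To give a flavor of what such a short generative program might look like, here is a hypothetical sketch in Python (not one of the actual 21,000 programs, whose language and code the article does not specify): a few lines of trigonometric arithmetic are enough to produce an abstract, texture-like image cheaply enough to generate on demand during training.

```python
import math

def generate_image(width=64, height=64, freq=0.3):
    """Produce an abstract RGB pattern from simple trigonometric functions."""
    pixels = []
    for y in range(height):
        row = []
        for x in range(width):
            r = int(127 * (1 + math.sin(freq * x)))        # horizontal stripes
            g = int(127 * (1 + math.cos(freq * y)))        # vertical stripes
            b = int(127 * (1 + math.sin(freq * (x + y))))  # diagonal stripes
            row.append((r, g, b))
        pixels.append(row)
    return pixels

image = generate_image()  # generated on demand -- no stored dataset needed
```

Because each image is a cheap function call rather than a file on disk, training can consume a fresh stream of images with no storage or curation step in between.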

They used their vast dataset of image generation programs to pretrain computer vision models for both supervised and unsupervised image classification tasks. In supervised learning, the image data is labeled, while in unsupervised learning, the model learns to categorize images without labels.

Improving accuracy

When they compared their pre-trained models to state-of-the-art computer vision models that had been pre-trained with synthetic data, their models were more accurate, meaning they placed images in the correct categories more often. While accuracy levels were still lower than models trained on real data, their technique reduced the performance gap between models trained on real data and those trained on synthetic data by 38 percent.

“Importantly, we show that performance scales logarithmically with the number of programs collected. We don’t saturate performance, so if we collect more programs, the model would perform even better. There is an opportunity to extend our approach,” says Baradad.

The researchers also pretrained models with each image generation program individually, to uncover factors that contribute to model accuracy. They found that a model performs better when a program generates a more diverse set of images. They also found that colorful images with scenes that fill the entire canvas tend to improve model performance the most.

Now that they have demonstrated the success of this pre-training approach, the researchers want to extend their technique to other types of data, such as multimodal data containing text and images. They also want to continue exploring ways to improve image classification performance.

“There is still a gap to close with models trained on real data. This gives our research a direction that we hope others will follow,” he says.

