VisionADL

VisionADL: Vision-Based Dataset to support activities of daily living for visually impaired individuals

Hochul Hwang, Matthew Hersey
DARoS Lab @ UMass Amherst
Dataset 2023


Abstract: Generative models operate at a fixed resolution, even though natural images come in a variety of sizes. As high-resolution details are downsampled away and low-resolution images are discarded altogether, precious supervision is lost. We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions. Taking advantage of this data is challenging: high-resolution processing is costly, and current architectures can only process fixed-resolution data. We introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions. First, conditioning the generator on a target scale allows us to generate higher-resolution images than previously possible, without adding layers to the model. Second, by conditioning on continuous coordinates, we can sample patches that still obey a consistent global layout, allowing for scalable training. Controlled FFHQ experiments show that our method takes advantage of multiresolution training data better than discrete multi-scale approaches, achieving better FID scores and cleaner high-frequency details. We also train on other natural image domains, including churches, mountains, and birds, and demonstrate arbitrary-scale synthesis with both coherent global layouts and realistic local details, going beyond 2K resolution in our experiments.

Summary

The typical preprocessing pipeline for unconditional image synthesis resizes all images to the same size, which discards available pixels. We propose a training procedure that leverages these additional pixels from higher-resolution images for image synthesis.

We treat an image as a continuous 2D surface, where real images and synthesized samples correspond to discretizations of this surface. To deal with images of varied sizes, we sample patches of a fixed size at continuous resolutions and locations.
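The patch sampling described above can be illustrated with a minimal sketch. It assumes the image surface is parameterized by normalized coordinates in [0, 1] and that a patch is described by the coordinates of its pixel centers; the function name `sample_patch_grid` and the parameters `min_extent` and `max_extent` are hypothetical and are not taken from the original work.

```python
import random

def sample_patch_grid(patch_size, min_extent=0.25, max_extent=1.0, rng=random):
    """Sample a fixed-size patch at a random continuous scale and location.

    Assumption: the image is a continuous surface over the unit square,
    and a patch is represented by the (x, y) coordinates of its pixel centers.
    """
    # Fraction of the full image side covered by the patch (the continuous scale).
    extent = rng.uniform(min_extent, max_extent)
    # Top-left corner, chosen so the patch stays inside the unit square.
    x0 = rng.uniform(0.0, 1.0 - extent)
    y0 = rng.uniform(0.0, 1.0 - extent)
    # Spacing between neighboring pixel centers on the continuous surface.
    step = extent / patch_size
    grid = [
        [(x0 + (j + 0.5) * step, y0 + (i + 0.5) * step) for j in range(patch_size)]
        for i in range(patch_size)
    ]
    return grid, extent, (x0, y0)
```

A smaller `extent` corresponds to a more zoomed-in, higher-resolution view, while the number of pixels in the patch stays the same, so the processing cost per patch does not grow with the image resolution.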

We can modify our approach to synthesize on a cylindrical image plane, which naturally creates 360-degree panoramas.
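One way to picture a cylindrical image plane is sketched below. This is only an illustration under stated assumptions, not the procedure used in the original work: the horizontal coordinate is treated as an angle and embedded as a (cos, sin) pair, so the left and right edges of the image map to the same point. The function name `cylindrical_embedding` is hypothetical.

```python
import math

def cylindrical_embedding(x, y):
    """Map normalized image coordinates (x, y) in [0, 1] onto a cylinder.

    Assumption: x wraps around the cylinder once, y runs along its axis.
    """
    theta = 2.0 * math.pi * x  # horizontal position as an angle
    # x = 0 and x = 1 give the same (cos, sin) pair, so the seam closes.
    return (math.cos(theta), math.sin(theta), y)
```

Because the embedding is periodic in `x`, a model conditioned on these coordinates sees no boundary where the panorama wraps around.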

Supplementary Video

Click here to view our supplementary video!

Reference

Hwang, H., Xia, T., Keita, I., Suzuki, K., Biswas, J., Lee, S. I., & Kim, D. System Configuration and Navigation of a Guide Dog Robot: Toward Animal Guide Dog-Level Guiding Work. ICRA 2023.

 @article{hwang2022system,
   title={System Configuration and Navigation of a Guide Dog Robot: Toward Animal Guide Dog-Level Guiding Work},
   author={Hwang, Hochul and Xia, Tim and Keita, Ibrahima and Suzuki, Ken and Biswas, Joydeep and Lee, Sunghoon I and Kim, Donghyun},
   journal={arXiv preprint arXiv:2210.13368},
   year={2022}
 }

Acknowledgements: The website template was adopted from Lucy Chai.