File format

The features are distributed as zipped text files. Each file, except the last one, contains the features of 1 million images. The images have been randomly ordered, so any subset can be considered statistically representative of the whole dataset.
Each line describes one image and contains the following fields, separated by the tab character:

  • image_id
  • image_hash
  • 4096 float values
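
As a minimal sketch of how one such line could be parsed in Python: the function name parse_feature_line is hypothetical and not part of any tooling shipped with the dataset. It assumes the fields appear in the order listed above, and that the float values are separated from each other by whitespace (tab or space), which the description does not state explicitly.

```python
from typing import List, Tuple

FEATURE_DIM = 4096  # number of float values per line, as stated above


def parse_feature_line(line: str) -> Tuple[str, str, List[float]]:
    """Split one line into (image_id, image_hash, features).

    Hypothetical helper: the first two fields are split on tabs, and the
    remaining text is split on any whitespace (an assumption).
    """
    parts = line.rstrip("\r\n").split("\t", 2)
    if len(parts) != 3:
        raise ValueError("expected image_id, image_hash and feature values")
    image_id, image_hash, rest = parts
    features = [float(value) for value in rest.split()]
    if len(features) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} values, got {len(features)}")
    return image_id, image_hash, features
```

The length check is there so that a truncated or malformed line fails loudly instead of silently producing a shorter vector.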

Please note that the features are the activations of the fc6 layer of the Convolutional Neural Network, taken before the ReLU and without any further processing (e.g. L2 normalization).
You should avoid unzipping the archives to disk. We suggest decompressing the text files on the fly while reading them.
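
The two notes above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: "zipped" is taken to mean the ZIP archive format (for gzip files, the standard library's gzip.open(path, "rt") would play the same role), the text is assumed to be UTF-8/ASCII, and the names iter_lines_from_zip and relu_l2_normalize are hypothetical. The second function shows one common way to apply the ReLU and L2 normalization that the distributed features deliberately omit.

```python
import io
import math
import zipfile
from typing import Iterator, List


def iter_lines_from_zip(archive_path: str) -> Iterator[str]:
    """Yield text lines from each file in a ZIP archive without extracting it."""
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            with archive.open(member) as raw:
                # Decompress and decode lazily, one line at a time.
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    yield line.rstrip("\r\n")


def relu_l2_normalize(features: List[float]) -> List[float]:
    """Apply ReLU, then scale to unit L2 norm.

    An all-zero vector is returned unchanged to avoid dividing by zero.
    """
    rectified = [max(0.0, value) for value in features]
    norm = math.sqrt(sum(value * value for value in rectified))
    if norm == 0.0:
        return rectified
    return [value / norm for value in rectified]
```

Because iter_lines_from_zip is a generator, only the line currently being processed is held in memory, which matters for files holding a million 4096-dimensional vectors.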

Links

The files are available only through SFTP. To obtain access, send a request to fabrizio.falchi{at}cnr.it

License

The feature set is licensed under Creative Commons 0, meaning it is in the public domain and free for any use. (Use of the original YFCC100M metadata and media is subject to the Creative Commons licenses chosen by the uploaders.) However, we do appreciate credit as indicated below.

Cite

A paper that proposes the use of these features and ground-truth results as a benchmark for similarity search was presented at the 9th International Conference on Similarity Search and Applications (SISAP):

YFCC100M-HNfc6: A Large-Scale Deep Features Benchmark for Similarity Search
Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro and Fausto Rabitti
International Conference on Similarity Search and Applications. Springer International Publishing, 2016.
Bibtex

A paper that describes the dataset and presents indexing performance results was presented at the Multimedia Commons 2016 Workshop in Amsterdam, held during ACM Multimedia 2016:

YFCC100M HybridNet fc6 Deep Features for Content-Based Image Retrieval
Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro, and Fausto Rabitti
Proceedings of the 2016 ACM Workshop on Multimedia COMMONS. ACM, 2016.
Bibtex