MIRFLICKR-1M s16a extension

The MIRFLICKR-1M-s16a dataset extends the official MIRFLICKR-1M collection. Additional user-generated annotation data has been downloaded, and deep feature representations have been extracted for all 1 million images. All data is made publicly available for research purposes.

The Dataset

The MIRFLICKR-1M collection provides 1 million images downloaded from the social photography site Flickr through its public API. The images are provided under Creative Commons Attribution licenses, which allow image use as long as the photographer is credited for the original creation.

The initial dataset provides the image data as well as raw user tags and EXIF information where available. To leverage user-generated annotations for reliable ground-truth generation, we downloaded additional metadata for all images where it was available.

User annotations

Metadata type        Available for # images
title                864,081
description          607,663
tags                 858,918
EXIF data            688,294
geo information      282,091
album allocation     760,702
user comments        851,174
notes                102,252
group allocation     740,263

We publish the data for research use:


The data is provided as individual JSON files structured similarly to the original dataset in chunks of 10,000 files. Each file is named according to the corresponding photo JPG to allow for easy mapping between photos and metadata.
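The mapping between a photo and its metadata file can be sketched as follows. This is a minimal illustration, not part of the published tooling: the helper names metadata_path_for and load_metadata are hypothetical, and the ".json" extension and UTF-8 encoding are assumptions that should be checked against the downloaded files.

```python
import json
import os
from pathlib import PurePosixPath


def metadata_path_for(photo_path, extension=".json"):
    """Return the relative path that shares the photo's chunk directory and base name.

    The ".json" default is an assumption; pass another extension if the files differ.
    """
    return str(PurePosixPath(photo_path).with_suffix(extension))


def load_metadata(metadata_root, photo_path):
    """Load the metadata of one photo from a local copy of the metadata files.

    metadata_root is a hypothetical local directory holding the unpacked chunks.
    UTF-8 encoding is assumed.
    """
    path = os.path.join(metadata_root, metadata_path_for(photo_path))
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, metadata_path_for("10/103169.jpg") yields "10/103169.json" under these assumptions. The keys inside each JSON file are not described here, so the sketch returns the parsed object unchanged.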

Deep convolutional features

It has been shown that features extracted from the activations of a deep convolutional neural network, trained to separate individual visual concept categories on a large dataset, can be reused and adapted to novel classification tasks. It has further been shown that these novel tasks may differ from the original training scenario and that deep feature encodings significantly outperform previously presented "shallow" encodings.

To obtain compact visual feature representations for all images in the MIRFLICKR-1M collection, we extracted deep features using a convolutional model pre-trained on the ILSVRC-2012 dataset. The model is provided as part of the Caffe CNN implementation. The features we provide are taken from the penultimate (fc-7) layer and the last (fc-8) layer, stored as 4,096-dimensional and 1,000-dimensional feature vectors, respectively.
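One common way to use such compact vectors is similarity search. The sketch below ranks images by cosine similarity of their feature vectors; it is only an illustration on synthetic data, not a method prescribed by the dataset. The function names and the random matrix standing in for fc-7 vectors are assumptions.

```python
import numpy as np


def cosine_similarities(query, features):
    """Cosine similarity between one query vector and every row of a feature matrix."""
    query = np.asarray(query, dtype=np.float64).ravel()
    features = np.asarray(features, dtype=np.float64)
    norms = np.linalg.norm(features, axis=1) * np.linalg.norm(query)
    # Guard against all-zero vectors to avoid division by zero.
    norms = np.where(norms == 0.0, 1.0, norms)
    return features.dot(query) / norms


def nearest_neighbours(query, features, k=5):
    """Return the row indices and similarities of the k most similar feature vectors."""
    sims = cosine_similarities(query, features)
    order = np.argsort(-sims)[:k]
    return order, sims[order]


# Synthetic stand-in for 100 fc-7 vectors (4,096 dimensions each).
rng = np.random.RandomState(0)
features = rng.rand(100, 4096)
indices, scores = nearest_neighbours(features[42], features, k=3)
```

Here the query is itself a row of the matrix, so the first returned index is the query row. With real features, the matrix would be stacked from the vectors loaded for each image.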

We provide the data in chunks of 10,000 individual binary files aligned with the individual photos (e.g. 10/103169.jpg -> 10/103169.dat):


We provide a small Python script that extracts both feature vectors from a binary file:

import numpy as np
import struct

def load_caffe_fc7fc8(fn):
    """Load the fc-7 and fc-8 feature vectors stored in one binary feature file."""
    with open(fn, 'rb') as f:
        # Unpack the dimension of the fc-8 layer (8-byte integer); the expected size is 1000.
        fc8_dims = struct.unpack('=q', f.read(8))[0]
        assert fc8_dims == 1000
        # Unpack 1000 float32 values.
        fc8_data = list(struct.unpack('={0}f'.format(fc8_dims), f.read(fc8_dims * 4)))
        # Unpack the dimension of the fc-7 layer (8-byte integer); the expected size is 4096.
        fc7_dims = struct.unpack('=q', f.read(8))[0]
        assert fc7_dims == 4096
        # Unpack 4096 float32 values.
        fc7_data = list(struct.unpack('={0}f'.format(fc7_dims), f.read(fc7_dims * 4)))
        # Return both feature vectors as a tuple of numpy arrays of shape (1, dims): (fc7, fc8).
        return (np.asarray(fc7_data).reshape(1, fc7_dims), np.asarray(fc8_data).reshape(1, fc8_dims))
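The file layout implied by the script above (an 8-byte integer count followed by that many float32 values, first for fc-8 and then for fc-7, in native byte order) can be checked with a round trip on a synthetic file. The following is a sketch under that assumption. The names write_feature_file and load_feature_file are hypothetical, and this variant returns float32 arrays via numpy.frombuffer rather than the float64 arrays produced by the script above.

```python
import os
import struct
import tempfile

import numpy as np


def write_feature_file(fn, fc8, fc7):
    """Write a synthetic file in the assumed layout: fc-8 block first, then fc-7."""
    with open(fn, 'wb') as f:
        for vec in (fc8, fc7):
            vec = np.asarray(vec, dtype='=f4').ravel()
            # 8-byte element count, then the raw float32 values.
            f.write(struct.pack('=q', vec.size))
            f.write(vec.tobytes())


def load_feature_file(fn):
    """Read (fc7, fc8) as float32 arrays of shape (1, dims)."""
    with open(fn, 'rb') as f:
        fc8_dims = struct.unpack('=q', f.read(8))[0]
        fc8 = np.frombuffer(f.read(fc8_dims * 4), dtype='=f4')
        fc7_dims = struct.unpack('=q', f.read(8))[0]
        fc7 = np.frombuffer(f.read(fc7_dims * 4), dtype='=f4')
    return fc7.reshape(1, fc7_dims), fc8.reshape(1, fc8_dims)


# Round trip on random data standing in for real features.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, '103169.dat')
rng = np.random.RandomState(0)
write_feature_file(path, rng.rand(1000), rng.rand(4096))
fc7, fc8 = load_feature_file(path)
print(fc7.shape, fc8.shape)
```

Under this layout each file occupies 8 + 1000*4 + 8 + 4096*4 = 20,400 bytes, which offers a quick sanity check on downloaded files if the assumption holds.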