This is the official implementation of the paper "Localizing Moments in Long Video Via Multimodal Guidance". This repository provides the scores predicted by the Guidance Model on the MAD dataset.
07/14/2023: "Localizing Moments in Long Video Via Multimodal Guidance" was accepted at ICCV 2023.
If you find this implementation useful in your research, please use the following BibTeX entry for citation:
@article{Barrios2023LocalizingMI,
title={Localizing Moments in Long Video Via Multimodal Guidance},
author={Wayner Barrios and Mattia Soldan and Fabian Caba Heilbron and Alberto M. Ceballos-Arroyo and Bernard Ghanem},
journal={ArXiv},
year={2023},
volume={abs/2302.13372}
}
The Guidance training code is located in the Guidance directory.
The provided predictions correspond to the scores generated by the Guidance model using sliding windows of 64 frames and 128 frames in length. The predictions are stored in a pickle object with the following structure:
In [1]: import pickle
In [2]: with open("guidance_scores_MAD_test_128.pkl", 'rb') as f:
   ...:     scores = pickle.load(f)
In [3]: len(scores)
Out[3]: 72044
In [4]: scores[0].keys()
Out[4]: dict_keys(['qid', 'vid', 'windows', 'score'])
{ 'qid': '0',
'score': array([1.48404761e-05, 1.40372722e-05, 1.46572347e-05, 1.28814381e-05,
1.34291167e-05, 1.32850864e-05, 1.61252574e-05, 6.24697859e-05,
4.70118430e-05, 1.63803907e-05, 2.77301951e-05, 2.59740209e-05,
9.86061990e-01, 4.11081433e-01, 1.71889886e-02, 1.37453452e-01,
1.75393507e-05, 1.92647931e-05, 5.38236709e-05, 6.90551009e-04,
7.63237834e-01, 9.73204970e-02, 1.73201097e-05, 2.48163269e-05,
5.99260893e-05, 1.84824003e-05, 2.14560350e-05, 1.04043145e-04,
5.24206553e-05, 1.88337926e-05, 1.62523775e-05, 1.23760619e-05,
1.15747998e-05, 1.85713252e-05, 3.93810224e-05, 4.38277610e-04,
4.63226315e-05, 2.76185543e-04, 6.71112502e-05, 2.05889755e-05,
5.27229131e-05, 4.56629896e-05, 2.62997986e-04, 1.23860036e-05,
1.19574897e-05, 1.27713274e-05, 1.34036281e-05, 1.49246125e-05,
1.66437039e-05, 1.32685755e-05, 1.36442995e-05, 1.39407657e-05,
9.44265649e-02, 5.19266985e-02, 3.09179362e-04, 1.66565824e-05,
1.52278981e-05, 1.34415832e-05, 1.16731699e-05, 1.19617898e-05,
1.34421471e-05, 1.35606424e-05, 1.40685788e-05, 1.44712585e-05,
1.49164434e-05, 1.32006107e-05, 1.23232739e-05, 1.22480678e-05,
1.36934423e-05, 8.42598165e-05, 1.90059054e-05, 1.52820303e-05,
1.25335091e-05, 1.30556955e-05, 1.18760063e-05, 1.14885261e-05,
1.17362497e-05, 1.12321404e-05, 1.24243248e-04, 1.45946506e-05,
4.47804232e-05, 1.39249141e-05, 1.34848015e-05, 3.25621368e-05,
1.44184843e-01, 2.68866897e-05, 1.92906227e-05, 1.76019021e-05,
1.58657276e-05, 1.28230713e-05, 1.28012252e-05, 1.29981381e-05,
1.67807830e-05, 1.70492331e-05, 1.40562279e-05, 1.61650114e-05,
1.47591518e-05, 1.63778402e-02, 1.42061428e-04, 6.93475548e-03,
6.02264590e-05, 8.72147648e-05, 9.83794928e-01, 9.91553962e-01,
9.63991106e-01, 8.97689939e-01, 1.28758256e-04, 2.88744595e-05,
1.70378244e-05, 2.29878224e-05, 2.43768354e-05, 1.59022475e-05,
1.30911794e-05, 1.81753130e-05, 2.05728411e-05, 1.25869919e-05,
1.25580364e-05, 1.16062802e-05, 1.37536981e-05, 1.34730390e-05,
1.40373795e-05, 1.33059066e-05, 1.30285189e-05, 1.37811385e-05,
2.23064744e-05, 1.44057722e-05, 1.42116378e-05, 1.93661017e-05,
1.58555758e-05, 1.43071402e-05, 1.38224150e-05, 1.28803194e-05,
1.20950817e-05, 1.41009232e-05, 1.45958602e-05, 1.23285527e-05,
1.38767664e-05, 1.59005958e-05, 1.49218240e-05, 1.21883040e-05,
1.24096860e-05, 1.63976423e-04, 3.71323113e-05, 1.49581110e-05,
1.28865731e-05, 8.20189889e-05, 1.94104978e-05, 1.45575204e-05,
1.19119395e-05, 1.17359577e-05, 1.33997301e-05, 1.31552797e-05,
1.29547625e-05, 1.46081702e-05, 1.37864763e-05, 2.89076870e-05,
2.40834688e-05, 2.44160365e-05, 3.74382762e-05, 4.72434871e-02,
1.53820711e-05, 1.25494762e-05, 1.16858791e-05, 1.33582507e-05,
6.86281201e-05, 1.72452001e-05, 1.32617952e-05, 1.24350836e-05,
1.32563446e-05, 1.50281312e-05, 2.07685662e-05, 3.12883203e-05,
5.31642836e-05, 7.05183193e-05, 1.51949525e-05, 1.41901855e-05,
1.51822069e-05, 3.32951342e-04, 8.94680124e-05, 1.65749607e-05,
2.18829446e-05, 2.16037024e-05, 1.89978218e-05, 4.97834710e-03,
2.03153506e-01, 1.54585496e-03, 1.23195614e-05, 1.28703259e-05,
1.51874347e-05, 1.30843009e-05, 1.32952518e-05, 1.83968314e-05,
3.42841486e-05, 9.24622072e-05, 1.33280428e-05, 1.38418063e-05,
1.52235261e-05, 1.41796754e-05, 1.46450093e-05, 2.20195379e-05,
1.83107302e-04, 1.82420099e-05, 1.50840988e-05, 1.33859876e-05,
1.51073200e-05, 1.47391929e-05, 1.49910848e-05, 1.53916826e-05,
1.31657725e-05, 1.38312898e-05, 1.90024621e-05, 1.58155744e-05,
1.31786610e-05, 1.57141967e-05, 1.65828824e-05, 1.46924167e-05,
1.38433634e-05, 5.21887268e-05, 2.85502132e-02, 2.30753481e-01,
7.06195598e-04, 1.50714346e-04, 1.27303065e-03, 1.33986650e-02,
7.64285505e-04, 2.07327234e-04, 6.83149046e-05, 3.26294066e-05,
3.00217052e-05, 3.59058060e-04, 1.75943842e-05, 4.50351909e-05,
6.54372343e-05, 7.06970895e-05, 3.67312983e-04, 1.05719395e-01,
4.43235294e-05, 2.82063011e-05, 7.51458792e-05, 1.61291231e-04,
4.26617444e-05, 8.98458238e-05, 5.37320266e-05, 7.81280905e-05,
4.74652685e-02, 6.73964678e-04, 7.80265400e-05, 2.98924297e-05,
4.71418061e-05, 9.99735785e-05, 5.41929447e-04, 8.76590490e-01,
7.32870936e-01, 9.47873652e-01, 9.83479261e-01, 9.41197515e-01,
3.02340268e-05, 5.52863061e-01, 4.90591303e-02, 5.52392844e-03,
1.66527767e-04, 6.01128559e-05, 2.75078182e-05, 5.36037696e-05,
2.72706511e-05, 5.20218709e-05, 1.74067172e-04, 9.59624112e-01,
9.92105484e-01, 6.41801059e-01, 7.50956178e-01, 1.66324535e-05,
1.36247700e-05, 1.38954510e-05, 1.32978639e-05, 2.76602568e-05,
8.64359558e-01, 2.82314628e-01, 6.86250278e-04, 1.61339794e-05,
1.76240802e-01, 6.14342950e-02, 1.79430062e-05, 1.85770459e-05,
2.49132900e-05, 4.90641105e-05, 1.38329369e-05, 1.35371911e-05,
1.19879533e-05, 1.28572465e-05, 1.49452917e-05, 1.34064794e-05,
1.20641280e-05, 1.38642654e-05, 1.28597740e-05, 1.21135636e-05,
1.19547185e-05, 1.27106450e-05, 1.24800990e-05, 1.45651029e-05,
1.51306494e-05, 1.31757206e-05, 1.44625528e-05, 2.93072371e-05,
1.55961770e-05, 1.38226005e-05, 2.85501122e-01, 9.54893649e-01,
4.26807284e-01, 7.88133383e-01, 1.15605462e-05, 1.27675758e-05,
1.74503912e-05, 1.22338257e-04, 4.07951375e-05, 6.67655331e-05,
2.63322181e-05, 6.43799603e-01, 9.40359533e-01, 8.85976017e-01,
4.58170444e-01, 1.68637175e-03, 5.94505800e-05, 9.05500948e-01,
3.18567127e-01, 4.67336411e-03, 2.84927974e-05, 3.81192891e-03,
4.18508105e-04, 6.88799983e-03, 9.18629944e-01, 8.45510900e-01,
1.88187569e-01, 1.15205767e-02, 6.14926934e-01, 9.16110933e-01,
3.21912378e-01, 9.68408361e-02, 2.36877706e-03, 3.30457231e-04,
9.32341874e-01, 6.69624686e-01, 3.61131132e-02, 4.71764088e-01,
3.23702669e-04, 5.40765934e-04, 2.96235172e-04, 1.00755557e-01,
2.59187482e-02, 9.91479377e-04, 5.00017107e-02, 9.33302939e-03,
8.73835742e-01, 9.06303883e-01, 1.98892485e-02, 2.06603622e-03,
2.67300452e-03, 1.63171062e-05, 4.14947972e-05, 2.11949199e-02,
5.66720143e-02, 6.37245998e-02, 3.02139521e-01, 4.86139301e-03,
6.51149167e-05, 8.24632589e-05, 2.42551632e-05, 2.16892213e-01,
9.93161321e-01, 9.07774687e-01, 9.85157251e-01, 7.91489899e-01,
6.24064269e-05, 2.82448274e-03, 6.10993884e-05, 4.63459146e-05,
6.72110255e-05, 2.53440558e-05, 2.50527592e-05, 4.85404918e-04,
7.80891351e-05, 4.56315975e-05, 1.90765320e-04, 8.94685328e-01,
9.85134244e-01, 9.36044097e-01, 1.42211165e-05, 1.49489415e-05,
1.69001578e-05, 1.66201044e-05, 2.41175085e-01, 5.41068694e-05,
1.77346919e-05, 3.90491296e-05, 2.48894852e-04, 1.45345357e-05,
1.64555768e-05, 1.53538731e-05, 1.38164451e-05, 1.68559291e-05,
3.19991705e-05, 2.60154466e-05, 1.41664159e-05, 1.22337908e-04,
4.30386774e-02, 3.52067378e-04, 2.77736799e-05, 1.43605203e-05,
1.33721569e-05, 1.43800498e-05, 1.23751524e-05, 2.31819286e-05,
9.83208010e-05, 2.08199883e-04, 3.14763274e-05, 3.47468827e-04,
1.10434856e-04, 3.18150487e-05, 1.72609471e-05, 2.70375167e-05,
1.67231119e-05, 1.80254483e-05, 2.09855771e-05, 1.66565824e-05,
1.64901703e-05, 3.01825115e-04, 7.29017615e-01, 1.12410297e-03,
6.18876831e-04, 2.08720026e-04, 2.29539564e-05, 1.47635437e-05,
4.10786743e-05, 2.57481169e-02, 8.77772836e-05, 4.92439649e-05,
9.44633852e-04, 2.61720526e-03, 8.41950595e-01, 8.63339067e-01,
5.76047751e-05, 8.71496499e-01, 9.07008648e-01, 8.54207218e-01,
3.62060557e-04, 7.98364286e-04, 7.50755966e-02, 3.81207588e-04,
6.62766863e-03, 1.50808028e-03, 5.67528963e-01, 7.69607246e-01,
4.62092081e-04, 1.82087897e-04, 9.24605787e-01, 9.67480242e-01,
3.22210602e-03, 3.38318609e-02, 7.42516349e-05, 2.80661490e-02,
7.69108906e-03, 8.99414954e-05, 5.23393810e-01, 7.17914104e-01,
5.11704478e-04, 2.06177612e-03, 7.79069304e-01, 1.28432157e-05,
1.51723981e-01, 7.02154310e-03, 9.71324384e-01, 8.30839634e-01,
6.24295863e-05, 1.97836489e-05, 1.80826428e-05, 1.67380622e-05,
1.57646009e-05, 5.96713580e-05, 7.05929342e-05, 2.16401986e-05,
1.69063496e-05, 1.36657072e-05, 1.44965925e-05, 2.01106413e-05,
1.66287136e-05, 1.51022632e-05, 1.20727018e-05, 1.36815515e-05,
1.57434170e-05, 3.38077080e-03, 1.93943546e-04, 1.50704973e-05,
1.36058252e-05, 1.23554828e-05, 1.20090635e-05, 1.20484674e-05,
1.15330831e-05, 1.24158278e-05, 1.21374187e-05, 1.20495934e-05,
1.25650204e-05, 1.16137307e-05, 1.18198168e-05, 1.15763123e-05,
1.24373146e-05, 1.25643137e-05, 1.48531772e-05, 1.28844113e-05,
1.19790957e-05, 1.42352001e-05, 2.61451223e-05, 1.16819347e-05],
dtype=float32),
'vid': '3001_21_JUMP_STREET',
'windows': array([[ 0, 128],
[ 64, 192],
[ 128, 256],
...,
[32576, 32704],
[32640, 32768],
[32704, 32832]])}
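In the entry shown above, the 512 values in 'score' line up one-to-one with the 512 rows of 'windows' (start and end frame of each sliding window, with a stride of 64 frames). The following is a minimal sketch of how such an entry could be inspected to find the windows the Guidance model considers most likely to contain the moment. The helper name `top_windows` and the toy entry are hypothetical and only mimic the structure of the pickle file; they are not part of this repository.

```python
import numpy as np


def top_windows(entry, k=5):
    """Return the k highest-scoring windows of one prediction entry.

    `entry` is assumed to be a dict with the keys shown above
    ('qid', 'vid', 'windows', 'score'). Hypothetical helper, not repo code.
    """
    scores = np.asarray(entry["score"])
    windows = np.asarray(entry["windows"])
    # One score per window is expected; trim defensively if lengths differ.
    n = min(len(scores), len(windows))
    # Indices of the k largest scores, highest first.
    order = np.argsort(scores[:n])[::-1][:k]
    return [(int(windows[i][0]), int(windows[i][1]), float(scores[i])) for i in order]


# Toy entry with the same layout as scores[0] (values are made up).
toy_entry = {
    "qid": "0",
    "vid": "toy_video",
    "windows": np.array([[0, 128], [64, 192], [128, 256], [192, 320]]),
    "score": np.array([0.01, 0.90, 0.40, 0.02], dtype=np.float32),
}

best = top_windows(toy_entry, k=2)
for start, end, score in best:
    print(f"frames [{start}, {end}): guidance score {score:.2f}")
```

With the real file, the same call would be `top_windows(scores[0])` after loading the pickle as in the session above.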
The predictions are available in the HuggingFace repo.
The following command runs Guidance training in the query-dependent setup:
$ bash guidance/scripts/train.sh --num_workers 20
To download the audio features for the MAD dataset, use the following link: Google Drive File. Please cite this work even if you use only the audio features.
For score fusion, please refer to the examples located in the following directory.
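As a rough illustration of what score fusion can look like, the sketch below rescales the confidence of each proposal from a grounding model by the guidance score of the nearest sliding window. This is an assumption made for illustration only: the function `fuse_scores`, the multiplicative rule, the nearest-window matching, and the toy inputs are all hypothetical, and the examples in the repository remain the reference for the actual procedure.

```python
import numpy as np


def fuse_scores(proposals, confidences, windows, guidance):
    """Multiply each proposal's confidence by the guidance score of the
    window whose center is closest to the proposal's midpoint.

    Hypothetical sketch; all inputs are in frame units.
    """
    proposals = np.asarray(proposals, dtype=float)   # shape (P, 2)
    windows = np.asarray(windows, dtype=float)       # shape (W, 2)
    mid = proposals.mean(axis=1)                     # proposal midpoints, (P,)
    centers = windows.mean(axis=1)                   # window centers, (W,)
    # For every proposal, index of the window with the nearest center.
    nearest = np.abs(mid[:, None] - centers[None, :]).argmin(axis=1)
    return np.asarray(confidences, dtype=float) * np.asarray(guidance, dtype=float)[nearest]


# Toy inputs (made up): three guidance windows and three grounding proposals.
windows = [[0, 128], [64, 192], [128, 256]]
guidance = [0.1, 0.9, 0.2]
proposals = [[10, 50], [100, 150], [200, 250]]
confidences = [0.8, 0.5, 0.6]

fused = fuse_scores(proposals, confidences, windows, guidance)
ranking = np.argsort(fused)[::-1]  # proposal indices, best first
print(fused, ranking)
```

In this toy case the second proposal moves to the top of the ranking because it falls in the window with the highest guidance score, even though its original confidence was the lowest.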
email: [email protected] or [email protected]