I mostly spend my time designing and developing machine learning
models to aid biomedical applications. My research interests fall under
the umbrella of multimodal representation learning.
However, I am still exploring to find my niche. I would
be happy to talk more about anything related:
Topics of interest
The intersection of video and language understanding.
Are models capable of continually learning new tasks on videos?
Can we align embeddings from different
off-the-shelf foundation models without additional training?