Arinjoy Basak

PhD Student, Dept. of Computer Science

Graduate Research Assistant, NDSSL, Biocomplexity Institute

Member of the Honor Society of Phi Kappa Phi

Virginia Tech

Contact Me

About Me

I am a second-year PhD student in the Department of Computer Science at Virginia Tech. I am broadly interested in Data Mining, Big Data Analytics, Machine Learning, and Artificial Intelligence. I am currently working with Dr. Anil Kumar S. Vullikanti at the Network Dynamics and Simulation Sciences Laboratory. My current research broadly covers problems related to networks in Biology and Epidemiology, with a particular focus on the detection of dense subgraphs.

I am also a member of the Phi Kappa Phi Honor Society, Virginia Tech chapter, having been inducted in December 2017. Phi Kappa Phi is a national Honor Society that recognizes and promotes academic excellence in all fields of higher education and engages the community of scholars in service to others. The society derives its name from the Greek letters forming its motto, Philosophía Krateítō Phōtōn, "Let the love of learning rule humanity." Phi Kappa Phi is the only national society that recognizes excellence across all disciplines. Here is a list of the society's most notable members. You can find the link to my Phi Kappa Phi merit page here.

Prior to joining CS@VT, I worked on Dimensionality Reduction algorithms for Data Mining, Educational Big Data Analytics, and Cryptography. More details about my research work can be found here.

In my spare time, I also like to pursue photography and music, as well as catch up on my ever-increasing reading list. No need to go far: you can see what I'm listening to right now by clicking here.


Bachelor of Engineering, Computer Science and Technology

IIEST, Shibpur (formerly BESU, Shibpur) (August 2012 - May 2016)

Primary and Secondary School

St. Xavier's Collegiate School, Kolkata (2000 - 2012)


Intro to Python (CS 1064)

Graduate Teaching Assistant and Guest Lecturer, Spring 2017

Intro to Python (CS 1064)

Graduate Teaching Assistant, Fall 2016


Theory of Algorithms (CS 5114)

Dr. Sharath Raghvendra, Dept. of Computer Science, Virginia Tech, Fall 2017

Numerical Analysis 1 (CS 5465)

Dr. Jeffrey Borggaard, Dept. of Mathematics, Virginia Tech, Fall 2017

Convex Optimization (AOE 5734)

Dr. Mazen Farhood, Dept. of Aerospace and Ocean Engineering, Virginia Tech, Spring 2017

Statistical Inference (STAT 5114)

Dr. Scotland Leman, Dept. of Statistics, Virginia Tech, Spring 2017

Probability and Distribution (STAT 5104)

Dr. Leanna House, Dept. of Statistics, Virginia Tech, Fall 2016

Data Analytics 1 (CS 5525)

Dr. Chandan Reddy, Dept. of Computer Science, Virginia Tech, Fall 2016

Data Analytics 2 (CS 5526)

Dr. Xinwei Deng, Dept. of Statistics, Virginia Tech, Spring 2018

Research Experiences

PhD Student - CS@VT - Virginia Tech (February 2017 - present)

Working with Dr. Anil Kumar S. Vullikanti at the Network Dynamics and Simulation Sciences Laboratory. My work here so far includes (1) detection of dense subgraphs in RNA-seq data, presented at ICSB 2017, and (2) a formulation of anomaly detection in graphs with uncertainty, accepted for presentation at AAAI 2018.

PhD Student - CS@VT - Virginia Tech (August 2016 - February 2017)

Worked with Dr. Francisco Servant on Software Analytics at SEALAB. The project studied change integration lengths and practices in Continuous Integration environments, concentrating on projects that use Travis-CI. In particular, we performed an empirical study of how performing intermediate integrations balances the benefits of continuous integration against the cost of continuously performing it, and developed several measures to quantify the benefits obtained. We submitted a paper on this work to the MSR 2017 Mining Challenge.

Bachelor's Degree Project Work - IIEST Shibpur, Howrah, India (August 2015 - May 2016)

I worked with Dr. Asit Kr. Das on the development of Graph Based Feature Selection Algorithms for my final year project. This resulted in a conference paper that was accepted at the IEEE IEMCON 2016 Conference in Vancouver, Canada, where it received the Best Paper in Data Mining Award.

Ekalavya 2015 Summer Intern - IIT Bombay, India (May 2015 - July 2015)

My main work focused on developing Data Analytics components for IITBombayX Insights. In particular, I developed a Data Analytics module for Insights that identifies the regions of lecture videos that students found difficult to understand. The work was supported by the MHRD (Ministry of Human Resource Development) National Mission on Education through ICT (Information and Communication Technology) undertaken by the institute, and all R&D contributions made by the students were released as open source.

During the first part of the internship, I also worked on the software specification of IITBombayX, the Blended MOOCs system, focusing on the creation of use cases for its MIS systems.

Summer Research Internship - Indian Statistical Institute, Kolkata, India (May 2014 - July 2014)

I worked under Prof. Sushmita Ruj, in the group of Dr. Bimal Kr. Roy (Director of ISI and Head of the Cryptology group), on Unattended Wireless Sensor Networks. I worked on developing a non-cryptographic technique for achieving data survivability and confidentiality in wireless sensor networks. We wrote a paper on this work, which was accepted at the IEEE International Conference on Advanced Information Networking and Applications (AINA 2015), held in Gwangju, Korea, from 24 to 27 March 2015, and included in the IEEE CPS proceedings.

Project Work - Indian Institute of Engineering Science and Technology, Shibpur, Howrah, India (August 2013 - May 2014)

I worked on Feature Selection algorithms in Data Mining with Dr. Asit Kr. Das and Dr. Saptarshi Ghosh, Department of Computer Science and Technology, IIEST Shibpur. My work had two parts: a study of Rough Set theory together with an implementation of the Quick Reduct extraction algorithm based on it, and the development of an algorithm for dynamically extracting the most relevant features from a dataset using graph-based methods.
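For readers unfamiliar with rough-set reducts, here is a minimal sketch of the QuickReduct greedy algorithm as it is usually stated in the rough set literature; it is not the code from this project. It assumes a fully discrete decision table stored as a list of tuples, with condition attributes and the decision attribute addressed by column index. The function names are illustrative.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence

Row = Sequence[Hashable]


def partition(rows: List[Row], attrs: List[int]) -> List[List[int]]:
    """Equivalence classes of the indiscernibility relation on `attrs`."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows: List[Row], attrs: List[int], decision: int) -> float:
    """Degree of dependency gamma_attrs(D) = |POS_attrs(D)| / |U|."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        # A block is in the positive region if all its rows share one decision value.
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows: List[Row], conditions: List[int], decision: int) -> List[int]:
    """Greedily add the attribute that most increases gamma until it matches gamma_C(D)."""
    target = dependency(rows, conditions, decision)
    reduct: List[int] = []
    while dependency(rows, reduct, decision) < target:
        remaining = [a for a in conditions if a not in reduct]
        best = max(remaining, key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


# Toy table: columns 0-2 are condition attributes, column 3 is the decision.
table = [
    (0, 0, 1, "no"),
    (0, 1, 1, "yes"),
    (1, 0, 0, "no"),
    (1, 1, 0, "yes"),
]
print(quick_reduct(table, [0, 1, 2], 3))  # [1]
```

Here column 1 alone determines the decision, so the search stops after one step. QuickReduct is a greedy heuristic and is not guaranteed to return a minimal reduct.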

My Current Curriculum Vitae

Recent Publications and Reports:

Graph Scan Statistics with Uncertainty

Jose Cadena, Arinjoy Basak, Anil Vullikanti, Xinwei Deng

The 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), New Orleans, Louisiana, USA, February 2018

Scan statistics is one of the most popular approaches for anomaly detection in spatial and network data. In practice, there are numerous sources of uncertainty in the observed data. However, most prior works have overlooked such uncertainty, which can affect the accuracy and inferences of such methods. In this paper, we develop the first systematic approach to incorporating uncertainty in scan statistics. We study two formulations for robust scan statistics, one based on the sample average approximation and the other using a max-min objective. We show that uncertainty significantly increases the computational complexity of these problems. Rigorous algorithms and efficient heuristics for both formulations are developed with justification of theoretical bounds. We evaluate our proposed methods on synthetic and real datasets, and we observe that our methods give significant improvement in the detection power as well as optimization objective, relative to a baseline.

Check out the paper here
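To make the two formulations concrete, here is a toy, brute-force sketch; it is not the algorithms from the paper. It assumes uncertainty is represented by a few sampled realizations of the node values and uses the elevated-mean scan statistic, sum of node values divided by sqrt(|S|), as an illustrative score over connected node sets. The "saa" mode averages the score over the samples; the "maxmin" mode takes the worst case over them. Exhaustive enumeration is exponential in the number of nodes, so this only works on tiny graphs.

```python
import itertools
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Graph = Dict[int, List[int]]


def is_connected(nodes: Sequence[int], adj: Graph) -> bool:
    """Depth-first search restricted to `nodes`."""
    members = set(nodes)
    if not members:
        return False
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in members and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == members


def connected_subsets(adj: Graph) -> Iterator[Tuple[int, ...]]:
    """Enumerate every connected node subset (exponential; toy graphs only)."""
    nodes = sorted(adj)
    for k in range(1, len(nodes) + 1):
        for combo in itertools.combinations(nodes, k):
            if is_connected(combo, adj):
                yield combo


def ems_score(subset: Sequence[int], x: Dict[int, float]) -> float:
    """Elevated-mean scan statistic: sum of node values divided by sqrt(|S|)."""
    return sum(x[v] for v in subset) / math.sqrt(len(subset))


def robust_scan(adj: Graph, samples: List[Dict[int, float]], mode: str = "saa"):
    """Connected subset maximizing the average ("saa") or worst-case ("maxmin") score."""
    best, best_val = None, -math.inf
    for subset in connected_subsets(adj):
        scores = [ems_score(subset, x) for x in samples]
        val = sum(scores) / len(scores) if mode == "saa" else min(scores)
        if val > best_val:
            best, best_val = subset, val
    return best, best_val


# Path graph 0-1-2-3 with two sampled realizations of noisy node values.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
samples = [
    {0: 0.0, 1: 2.0, 2: 2.0, 3: 0.0},
    {0: 0.1, 1: 1.5, 2: 2.5, 3: -0.1},
]
print(robust_scan(adj, samples, "saa")[0])     # (1, 2)
print(robust_scan(adj, samples, "maxmin")[0])  # (1, 2)
```

On this toy input both objectives pick the same anomalous pair; with noisier samples the worst-case objective tends to favor subsets whose score is stable across all realizations.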

Finding Coordinated Expression Motifs in RNA-Seq Data

Arinjoy Basak, Clark Cuccinel, Alexandra Cummings, Jose Cadena, Andrew Warren, Rebecca Wattam, Allan Dickerman, Anil Vullikanti

The 18th International Conference on Systems Biology (ICSB 2017), Virginia Tech, Blacksburg, Virginia, USA, August 2017

Advances in high-throughput sequencing technologies have led to a high volume of public RNA-seq data, enabling assembly of large data sets to search for novel biological patterns not visible to individual studies, although methods for doing so remain a significant challenge. The use of clusters and bi-clusters is a popular unsupervised machine learning approach for discovering co-expressed, and hence functionally related, gene sets. Different notions of clustering have been used, including graph-theoretical methods based on density and hierarchical clustering. Expression data can be viewed as a signed dataset, with up or down regulation captured by a positive or negative quantity, respectively. However, most of these prior approaches tend to ignore the signs and work on unsigned data. This is partly because the analysis of signed data tends to be much more challenging. We develop a novel approach for finding coordinated motifs of expression by formalizing them as quasicliques in signed networks. This is computationally much harder than the problem in unsigned networks, and we use a convex optimization approach, combined with pruning, to find the top k quasicliques, in terms of their objective values. We incorporate functional similarity measures on nodes in quasicliques, e.g., the fraction of genes within each cluster that have high scores of semantic similarity as annotated on the Gene Ontology. Clusters with low known functional similarity can be indicators of new biological patterns in such data, and might help guide further experiments. We also study a new approach that involves finding quasicliques with given constraints on the level of functional similarity within the nodes. We evaluate these methods and present findings from analysis of a large compilation of RNA-seq expression data from humans.

Check out the poster here

A Graph Based Feature Selection Algorithm Utilizing Attribute Intercorrelation

Arinjoy Basak, Asit Kr. Das

The 7th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEEE - IEMCON 2016), University of British Columbia, Vancouver, Canada, October 2016

Today, enterprises generate large volumes of data on a regular basis. Complex data mining and analysis techniques are used to feasibly analyse such high-dimensional data. Feature selection aids in this by providing a reduced representation of the data while maintaining its integrity. We propose a graph-based feature selection algorithm that uses feature intercorrelation to construct a weighted attribute graph, from which attributes are iteratively removed to construct the reduct based on a scoring scheme. Disconnectivity of the graph serves as the point of termination for our algorithm. The performance of our algorithm on real-valued and discretized datasets is evaluated statistically by generating the Receiver Operating Characteristic (ROC) curve for each reduced dataset, and by measuring classification accuracy on the datasets reduced by our method.

Check out the paper here
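The abstract leaves the edge rule and the scoring scheme unspecified, so the sketch below fills them with hypothetical choices: an edge joins two attributes when their absolute Pearson correlation is at least `tau`, an attribute's score is its weighted degree (total correlation with the attributes still in the graph), and the highest-scoring, most redundant attribute is removed until the graph is no longer connected. The surviving attributes are returned as the reduct. This only illustrates the general idea; it is not the published algorithm.

```python
import math
from typing import Dict, List, Sequence

WeightedGraph = Dict[str, Dict[str, float]]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(va * vb)


def build_graph(columns: Dict[str, Sequence[float]], tau: float) -> WeightedGraph:
    """Hypothetical rule: keep an edge when |correlation| >= tau, weighted by |correlation|."""
    names = list(columns)
    graph: WeightedGraph = {u: {} for u in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            w = abs(pearson(columns[u], columns[v]))
            if w >= tau:
                graph[u][v] = graph[v][u] = w
    return graph


def is_connected(graph: WeightedGraph) -> bool:
    if len(graph) <= 1:
        return True
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for v in graph[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(graph)


def select_features(columns: Dict[str, Sequence[float]], tau: float = 0.5) -> List[str]:
    graph = build_graph(columns, tau)
    while len(graph) > 1 and is_connected(graph):
        # Hypothetical score: weighted degree; the most redundant attribute goes first.
        worst = max(graph, key=lambda u: sum(graph[u].values()))
        for v in graph.pop(worst):
            del graph[v][worst]  # drop the back-edges to the removed attribute
    return sorted(graph)


# "h" is the sum of two uncorrelated attributes, so it is redundant given them.
cols = {
    "l1": [1, -1, 1, -1],
    "l2": [1, 1, -1, -1],
    "h": [2, 0, 0, -2],
}
print(select_features(cols))  # ['l1', 'l2']
```

Removing the hub `h` disconnects the two uncorrelated attributes, which triggers the termination condition and leaves them as the reduct.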

[Report Excerpt] Data Analytics for IITBombayX (Based on Open edX Insights) - Detection of difficulty regions in lecture videos of students

Arinjoy Basak

Ekalavya 2015, Department of Computer Science, IIT Bombay

This part of the project report describes the development of models for detecting difficulty regions in lecture videos based on students' behaviour, as recorded through the log events in IITBombayX. Such inferences can be drawn for any video in a particular course from the students' activity on that lecture video, and reported to the course instructors or coordinators, who can then take appropriate steps to address the problem. We give a basic outline of our approach and then describe how we designed the data model and the functionality, at both the processing and visualization levels, for a final implementation of the analytics module in a Big Data setting. After final implementation and testing, the module was integrated with the Open edX Insights analytics system for IITBombayX and made available to the course instructors participating in the Blended MOOCs model. This entire part of the work was completed in 3 weeks of the internship.

Check out the excerpt of my work here

Check out the FULL report here
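As a rough illustration of how such log events might be turned into difficulty signals, the sketch below bins backward seeks (by their target time) and pauses into fixed-length windows of a video, then flags windows whose count exceeds the mean plus `k` standard deviations. The simplified event dictionaries (with names loosely modeled on Open edX video events), the 30-second window, and the thresholding rule are all hypothetical; they are not the data model or the criteria used in the actual module, and real tracking logs carry many more fields.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def difficulty_windows(
    events: Iterable[Dict], bin_size: int = 30, k: float = 1.0
) -> List[Tuple[int, int]]:
    """Flag video windows (start_s, end_s) with unusually many rewinds and pauses."""
    counts: Counter = Counter()
    for e in events:
        if e["type"] == "seek_video" and e["new_time"] < e["old_time"]:
            # A backward seek suggests the student is rewatching the target segment.
            counts[int(e["new_time"] // bin_size)] += 1
        elif e["type"] == "pause_video":
            counts[int(e["time"] // bin_size)] += 1
    if not counts:
        return []
    per_bin = [counts[b] for b in range(max(counts) + 1)]  # missing bins count as 0
    threshold = mean(per_bin) + k * pstdev(per_bin)
    return [(b * bin_size, (b + 1) * bin_size) for b, c in enumerate(per_bin) if c > threshold]


events = [
    {"type": "seek_video", "old_time": 95.0, "new_time": 62.0},
    {"type": "seek_video", "old_time": 90.0, "new_time": 65.0},
    {"type": "seek_video", "old_time": 120.0, "new_time": 70.0},
    {"type": "seek_video", "old_time": 99.0, "new_time": 88.0},
    {"type": "seek_video", "old_time": 10.0, "new_time": 40.0},  # forward skip, ignored
    {"type": "pause_video", "time": 10.0},
    {"type": "pause_video", "time": 100.0},
]
print(difficulty_windows(events))  # [(60, 90)]
```

Counting only backward seeks keeps forward skips, which usually signal boredom rather than difficulty, out of the signal; a production module would also aggregate per student and per course.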

Achieving Data Survivability and Confidentiality in Unattended Wireless Sensor Networks

Arpan Sen, Shrestha Ghosh, Arinjoy Basak, Harsh Parshuram Puria, Sushmita Ruj

The 29th IEEE International Conference on Advanced Information Networking and Applications (AINA-2015), Gwangju, Korea, March 2015

In Unattended Wireless Sensor Networks (UWSNs), the nodes are deployed in a hostile environment to sense critical data. Due to the unattended nature of the network, the sink is not always present. Hence, the nodes in the network are required to function in a distributed way in order to ensure Data Survivability and Data Confidentiality. In this work we address these two issues. We propose algorithms to ensure Data Survivability by encryption and data replication. We propose a simple key management scheme which ensures confidentiality by sharing the key among various nodes in the network, so that the adversary cannot read the data by compromising a single node. We compare our scheme with existing ones, both mathematically and by simulation. Analysis shows that our scheme performs better in terms of overheads and efficiency.

Check out the paper here
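The abstract does not spell out the key-sharing scheme, so the sketch below uses a generic stand-in, n-of-n XOR secret splitting, to show the property described: a key is split into n shares held by different nodes, all n shares are needed to rebuild it, and any smaller set of shares is uniformly random and reveals nothing about the key. The function names are hypothetical, and this is not the scheme from the paper.

```python
import secrets
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, n: int) -> List[bytes]:
    """Split `key` into n shares; all n are required to recover it."""
    if n < 2:
        raise ValueError("need at least two shares")
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    # Last share = key XOR (all random shares), so XOR-ing every share gives back the key.
    shares.append(reduce(xor_bytes, shares, key))
    return shares


def recover_key(shares: List[bytes]) -> bytes:
    return reduce(xor_bytes, shares)


key = secrets.token_bytes(16)
shares = split_key(key, 4)  # e.g. one share per neighbouring sensor node
print(recover_key(shares) == key)  # True
```

An n-of-n split tolerates no lost shares; a threshold scheme such as Shamir's secret sharing would trade some of that strictness for robustness when nodes fail.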

My Google Scholar profile

Here's How to Contact Me

  • Location: Steger Hall, Biocomplexity Institute, Virginia Tech, 1015 Life Science Circle.

  • Email: email id