Monday, July 21, 2014

Caroline Casey - Week 5 - Complex Systems Group: ANOVA, z-scores, and more community detection - University of Pennsylvania

This week was very busy and exciting! It was very nice to see Dr. Peretz on Tuesday and Saturday; I enjoyed showing her the lab and explaining more about my work. I completed a lot of work this week and continued working on my paper.

On Monday, I met with Dr. Bassett in the morning and we discussed her concerns about the interference values I had computed. Last week, I re-did the t-test to find the interference values by computing a correlation and combining the two sequences; however, this week, we decided to perform a repeated measures analysis of variance (ANOVA) instead of a t-test, because Dr. Bassett did not believe that combining the 2 sequences and performing a t-test was the best method. Although re-doing the work takes a lot of time, it showed me how important it is to test every method in research: the right way to find something is not necessarily the first or even second way you try, and it all takes time. So, on Monday, I spent the day reading about what an ANOVA is, how to arrange my data to make it compatible with the ANOVA, and how to implement the ANOVA in MATLAB. We also had a lab meeting on Monday where we listened to a lecture from a graduate student in the lab, Shi.
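As a rough illustration of what this looks like, here is a minimal sketch of a repeated measures ANOVA in MATLAB using fitrm and ranova. The variable names and the simple two-level within-subject design are placeholders, not my exact analysis.

```matlab
% Minimal sketch (hypothetical variable names): one row per subject, with
% interference measured under two within-subject conditions.
tbl = table(interfSeq1, interfSeq2, 'VariableNames', {'Seq1','Seq2'});

% Within-subject design: a single two-level factor.
within = table(categorical({'Seq1';'Seq2'}), 'VariableNames', {'Sequence'});

% Fit the repeated measures model and run the repeated measures ANOVA.
rm = fitrm(tbl, 'Seq1-Seq2 ~ 1', 'WithinDesign', within);
ranovatbl = ranova(rm);
disp(ranovatbl)
```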

Tuesday, I started re-doing my work using the repeated measures ANOVA. I managed to complete the ANOVA and all the remaining computational steps, and I created the networks for the 4 scenarios as well. The four scenarios are the 4 different ways I computed edges from the Pearson correlation: I used the 3 scanning-session adjacency matrices and the 2 interference value vectors for the 2 sessions in 4 different combinations. Two of the combinations (interference values from session 2 with scan 1, and interference values from session 3 with scan 2) create predictive edges, and two of the combinations (interference values from session 2 with scan 2, and interference values from session 3 with scan 3) create retrospective edges. Creating these combinations allowed me to look at the networks and analyze whether something in the retrospective or predictive networks indicated a trend or pointed to an aspect of the brain that plays an important role in interference. I also met with Dr. Peretz in the morning and for lunch; it was a very exciting day!
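To show how the four pairings fit together, here is a minimal sketch with hypothetical variable names (scans, interf, nNode) of correlating each edge's connectivity values across subjects with an interference vector for each scenario; the actual thresholding by p-values and permutation tests comes afterwards.

```matlab
% Minimal sketch (hypothetical names). scans{k} holds the connectivity for
% scanning session k as an nSubj x nNode x nNode array; interf{s} is the
% nSubj x 1 vector of interference values from behavioural session s.
pairs = [1 2;   % predictive:    scan 1 with interference from session 2
         2 3;   % predictive:    scan 2 with interference from session 3
         2 2;   % retrospective: scan 2 with interference from session 2
         3 3];  % retrospective: scan 3 with interference from session 3

nNode = 112;
R = cell(4,1);  P = cell(4,1);
for c = 1:4
    scan  = scans{pairs(c,1)};      % nSubj x nNode x nNode
    behav = interf{pairs(c,2)};     % nSubj x 1
    R{c} = zeros(nNode);  P{c} = ones(nNode);
    for i = 1:nNode-1
        for j = i+1:nNode
            % Pearson correlation across subjects for edge (i,j)
            [r, p] = corr(squeeze(scan(:,i,j)), behav);
            R{c}(i,j) = r;  R{c}(j,i) = r;
            P{c}(i,j) = p;  P{c}(j,i) = p;
        end
    end
end
```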

I finished the brain surface plots and the degree analysis on Wednesday. Throughout Wednesday, I also re-did my PowerPoint so that it contained the data from the ANOVA instead of the data from the t-test. In the afternoon, I met with Dr. Bassett to go over the next steps in the analysis of the data. Dr. Bassett suggested my next step be finding the z-score of all of the nodes in each of the scenarios in order to find hubs. Hubs are defined as nodes with large degrees; depending on the distribution of the scores, they can be identified as nodes with a z-score > 1, > 2, or > 3. So, I spent my afternoon reading about what z-scores are and how to calculate them. I then computed z-scores for all of the nodes in each of the 4 networks. In the networks I created, hubs were identified as nodes having a z-score > 1. I created an Excel sheet listing, for each of the 4 scenarios, its hubs, their anatomical names, and their z-scores. I then looked at the lists to see whether any hubs were shared by both scenarios of the predictive group or by both scenarios of the retrospective group, and I found a couple of regions that were hubs in both scenarios of each group.
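For one of the four networks, the hub calculation boils down to something like the sketch below (A and regionNames are placeholders for my adjacency matrix and the list of anatomical labels).

```matlab
% Minimal sketch (hypothetical variable names). A is one of the four binary
% 112 x 112 adjacency matrices; regionNames is a cell array of anatomical
% labels for the 112 nodes.
deg = sum(A, 2);                 % degree of each node
z   = zscore(deg);               % (deg - mean(deg)) / std(deg)

hubIdx = find(z > 1);            % hubs at the z-score > 1 threshold
hubs = table(hubIdx, regionNames(hubIdx), z(hubIdx), ...
    'VariableNames', {'Node', 'Region', 'DegreeZscore'});
disp(hubs)
```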

On Thursday, I emailed Dr. Bassett my PowerPoint to see what her comments were. Dr. Bassett emailed me back with a list of changes needed on the PowerPoint. She also told me to go further in my community detection analysis. In a community, the nodes are more highly connected to each other than to nodes in other communities. The first time I did community detection on the 4 scenarios, the community structure was very weak for all of the scenarios. She gave me a list of analyses to perform as well as some literature to read. I spent Thursday editing my PowerPoint and then deciphering what analyses we needed to perform, how they would help in analyzing the community structure, and what algorithms were necessary to perform the tasks.

Friday I began going more in depth with my community detection analysis. I previously used the GenLouvain community detection algorithm, where I input my adjacency matrices (with 0's representing no edge and 1's representing a significant edge found through the p-values and permutation tests). The outputs are S, which represents the partition of the 112 nodes into communities, and Q, the quality of the partition.

My first step on Friday was to write a loop that created random matrices with the same number of nodes and edges for each of the 4 scenarios, in order to determine whether the Q values from the random networks were always lower than the Q values from the actual data. If the Q values were always lower in the random networks, then the community structure in the actual data would be stronger than expected by chance. I found that the random Q values were not always lower than those from the actual data, except in scenario 1 and scenario 4, where all of the Q values in the random networks were lower than the actual Q values.

After completing that, I created module allegiance matrices over 100 optimizations for the 4 scenarios. Each entry of a module allegiance matrix is the number of times nodes i and j are assigned to the same community, divided by the total number of optimizations. I had to rearrange the S and Q data I originally had, write a loop to perform 100 optimizations, and then use the module allegiance algorithm I found to create the matrices for the scenarios. Finally, I computed consensus partitions for the 4 scenarios. Consensus partitions use the multiple optimizations to find a single (consensus) clustering that is a better fit, which provides more accurate community assignments for the nodes. I used the 100 optimizations I had created for the module allegiance matrices and transposed those matrices so that they would be compatible with the consensus partition algorithm. I then ran the algorithm and visualized the community structure of the consensus partitions. A rough sketch of these steps is below.
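Here is a minimal sketch of the Friday pipeline for one scenario, assuming the genlouvain function from the GenLouvain toolbox and a standard Newman-Girvan modularity matrix built from the binary adjacency matrix A; the variable names, the null-model construction, and the final consensus step are placeholders rather than my exact code.

```matlab
% Minimal sketch (hypothetical names). A is one scenario's binary 112 x 112
% adjacency matrix; genlouvain is from the GenLouvain toolbox.
nNode = size(A, 1);
k = sum(A, 2);  twom = sum(k);
B = A - (k * k') / twom;                 % Newman-Girvan modularity matrix

nOpt = 100;
S = zeros(nNode, nOpt);  Q = zeros(nOpt, 1);
for t = 1:nOpt
    [S(:, t), q] = genlouvain(B);
    Q(t) = q / twom;                     % normalize the quality value
end

% Null comparison: random graphs with the same number of nodes and edges.
nEdge = nnz(triu(A, 1));
upper = find(triu(true(nNode), 1));      % indices of possible edges
Qrand = zeros(nOpt, 1);
for t = 1:nOpt
    Ar = zeros(nNode);
    Ar(upper(randperm(numel(upper), nEdge))) = 1;   % same number of edges
    Ar = Ar + Ar';
    kr = sum(Ar, 2);  twomr = sum(kr);
    [~, qr] = genlouvain(Ar - (kr * kr') / twomr);
    Qrand(t) = qr / twomr;
end
allLower = all(Qrand < min(Q));          % are the random Q values always lower?

% Module allegiance: fraction of optimizations in which nodes i and j
% land in the same community.
allegiance = zeros(nNode);
for t = 1:nOpt
    allegiance = allegiance + (S(:, t) == S(:, t)');
end
allegiance = allegiance / nOpt;

% One simple consensus step: cluster the allegiance matrix itself, using its
% mean off-diagonal value as the null term (other consensus procedures exist).
Pnull = mean(allegiance(upper));
Sconsensus = genlouvain(allegiance - Pnull);
```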
Consensus partition image of the first scenario (edges: interference session 2 with scan 1). The image depicts how strongly nodes are assigned to communities with other nodes.


I really enjoy figuring out how to perform certain tasks and computations by myself; I am learning so much along the way, including programming and a lot of statistics! I also love the fact that I have my own project and that I am responsible for making sure everything is accurate, writing a paper, and creating a PowerPoint. I am very excited for the next 5 weeks!
