For the embedding, we can use various manifold learning algorithms, or a random method, to embed the microbes in 2D. For the grouping, we can either cluster the microbes with one of several clustering methods or use predefined group information. Now, let's try different methods to perform the embedding and grouping operations in MEGMA.
In this section, we change the embedding method on the loaded megma object. The megma object supports a refit operation to update itself, so you don't need to initialize a new one. The examples below call fit on a copy of the loaded object.
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import seaborn as sns
import pandas as pd
import numpy as np
import aggmap
from aggmap import loadmap, AggMap
megma = loadmap('./megma/megma.all')
Let's try several manifold methods to embed the microbes in megma.
manifold_methods = ['mds', 'isomap', 'umap', 'tsne', 'lle', 'se']
# using mds
megma_new = megma.copy()
megma_new.fit(emb_method = 'mds', verbose=0)
megma_new.plot_scatter(htmlpath='./images', radius = 5)
# using isomap
megma_new = megma.copy()
megma_new.fit(emb_method = 'isomap', verbose=0)
megma_new.plot_scatter(htmlpath='./images', radius = 5)
# using tsne
megma_new = megma.copy()
megma_new.fit(emb_method = 'tsne', verbose=0)
megma_new.plot_scatter(htmlpath='./images', radius = 5)
The random embedding method simply assigns random 2D coordinates to the microbes:
# using random
megma_new = megma.copy()
megma_new.fit(emb_method = 'random', random_state=123, verbose=0)
megma_new.plot_scatter(htmlpath='./images', radius = 5)
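To make the idea of a random embedding concrete, here is a minimal sketch in plain Python. It is not aggmap's internal implementation; the function name `random_embedding` and the feature names are hypothetical, and drawing coordinates uniformly from [0, 1) is an assumption made for illustration.

```python
import random

def random_embedding(feature_names, random_state=123):
    """Assign an independent uniform (x, y) point to every feature."""
    rng = random.Random(random_state)  # seeded generator gives a reproducible layout
    return {name: (rng.random(), rng.random()) for name in feature_names}

# Hypothetical feature names, for illustration only
features = ["microbe_a", "microbe_b", "microbe_c"]
coords = random_embedding(features, random_state=123)
same = random_embedding(features, random_state=123)  # same seed, same layout
other = random_embedding(features, random_state=7)   # different seed, different layout
```

Because the coordinates ignore the distances between microbes, the layout changes only with the seed, never with the data.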
The microbes can be grouped based on their phenotype or genotype distances.
For the phenotype-based grouping (or metagenomic grouping), we calculate the correlation distances between the microbes' metagenomic abundances and then apply agglomerative hierarchical clustering to group the microbes. MEGMA uses this method by default with c = 5 clusters; users can specify a different number of clusters.
For the genotype-based grouping, we can build a phylogenetic tree and group the microbes by truncating it at a taxonomic level. Truncating at the kingdom or phylum level, for example, yields different numbers of clusters.
The cluster number c is the number of channels in the feature map, e.g., c = 10 means that the 2D-microbiomeprint has 10 channels.
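The link between clusters and channels can be sketched in plain Python. This is an illustration under stated assumptions, not aggmap's code: the helper `split_into_channels`, the 2x2 grid, the cluster labels, and the abundance values are all hypothetical. The sketch assumes that each microbe is drawn only in the layer that belongs to its cluster.

```python
def split_into_channels(positions, cluster_of, values, shape):
    """Build one 2D layer per cluster; each feature is drawn only in its cluster's layer."""
    rows, cols = shape
    cluster_ids = sorted(set(cluster_of.values()))
    channels = {c: [[0.0] * cols for _ in range(rows)] for c in cluster_ids}
    for feature, (row, col) in positions.items():
        channels[cluster_of[feature]][row][col] = values[feature]
    return channels

# Hypothetical 2x2 layout with four microbes in two clusters
positions = {"m1": (0, 0), "m2": (0, 1), "m3": (1, 0), "m4": (1, 1)}
cluster_of = {"m1": 0, "m2": 1, "m3": 1, "m4": 0}
values = {"m1": 0.5, "m2": 0.1, "m3": 0.3, "m4": 0.2}

channels = split_into_channels(positions, cluster_of, values, shape=(2, 2))
n_channels = len(channels)  # equals the number of clusters
```

With two clusters the sketch produces two layers, so raising the cluster number adds channels without changing the grid size.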
# c = 10
megma_new = megma.copy()
megma_new.fit(cluster_channels = 10, verbose = 0)
fig = megma_new.plot_tree(leaf_font_size=0)
megma_new.plot_scatter(htmlpath='./images', radius = 5)
# c = 20
megma_new = megma.copy()
megma_new.fit(cluster_channels = 20, verbose = 0)
fig = megma_new.plot_tree(leaf_font_size=0)
megma_new.plot_scatter(htmlpath='./images', radius = 5)
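The phenotype-based grouping behind these trees can be sketched with the standard library only. This is a toy illustration, not aggmap's implementation: the abundance profiles are made up, the function names are hypothetical, and complete linkage is an assumption, since the text does not say which linkage is used.

```python
from math import sqrt

def correlation_distance(u, v):
    """1 - Pearson correlation between two abundance profiles (undefined for constant profiles)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)

def agglomerative(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    names = list(profiles)
    dist = {(a, b): correlation_distance(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # complete linkage (assumed): distance between the farthest members
                d = max(dist[a, b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical abundances of four microbes across five samples
profiles = {
    "m1": [1, 2, 3, 4, 5],
    "m2": [2, 4, 6, 8, 11],   # rises with m1
    "m3": [5, 4, 3, 2, 1],    # falls as m1 rises
    "m4": [10, 8, 7, 4, 2],   # falls with m3
}
clusters = agglomerative(profiles, n_clusters=2)
```

In this toy data, microbes whose abundances rise and fall together end up in the same cluster, and `n_clusters` plays the role of the cluster number c.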
Now, let's try to group the microbes by taxonomic level. The microbes we used have no taxonomic profiles, so we first need to map a taxonomic profile to each microbe based on its mOTU ID.
The feature_group_list parameter accepts custom grouping information for the microbes. We can generate the feature group list from a taxonomic level such as kingdom, phylum, class, or order.
## get the taxonomic profiles of all mOTUs
url = 'https://raw.githubusercontent.com/shenwanxiang/bidd-aggmap/master/docs/source/_example_MEGMA/dataset/'
dfm = pd.read_csv(url + 'mOTUs_new_taxonomic_profile.txt',sep='\t')
dfm = dfm.set_index('#mOTU')[['consensus_taxonomy']]
dfm.head(5)
## get the mOTU ID (the text inside the square brackets) for each microbe in megma
dfs = pd.DataFrame(megma.alist, columns = ['IDs'])
dfs['mOTU'] = dfs.IDs.apply(lambda x:x.split('[')[1]).apply(lambda x:x.split(']')[0])
dfs = dfs.set_index('mOTU')
dfs.head(5)
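The ID extraction above splits each name on the square brackets, which raises an error for a name that has no bracketed ID. A slightly more defensive variant can be sketched with a regular expression. The function name `extract_motu_id` and the example names are hypothetical; the only thing taken from the code above is that the ID sits inside square brackets.

```python
import re

# Text between the first "[" and the next "]"
MOTU_PATTERN = re.compile(r"\[([^\[\]]+)\]")

def extract_motu_id(feature_name):
    """Return the bracketed ID, or None when the name has no [...] part."""
    match = MOTU_PATTERN.search(feature_name)
    return match.group(1) if match else None

# Made-up feature names, for illustration only
names = [
    "Example species A [ref_mOTU_v2_0001]",
    "unknown Example genus [meta_mOTU_v2_0002]",
    "name without id",
]
ids = [extract_motu_id(name) for name in names]
```

Returning None instead of failing makes it easy to spot names that cannot be matched before the join.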
#join the taxonomic profile
dfs = dfs.join(dfm)
dfs.head(5)
# split the consensus taxonomy into one column per taxonomic level
dft = dfs['consensus_taxonomy'].apply(lambda x: dict([i.split('__') for i in x.split('|')])).apply(pd.Series)
level_dict = {'k':'kingdom', 'p':'phylum', 'c':'class' ,'o':'order' ,'f':'family' ,'g': 'genus','s': 'species'}
dft = dft.rename(columns=level_dict)
dft.head(5)
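To see what the parsing step does to a single entry, here is a small standalone sketch. The function name `parse_taxonomy` is hypothetical and the example string is only illustrative; the `|` and `__` separators and the level prefixes are taken from the code above. Unlike the one-liner, the sketch returns an empty dict for a missing value, which can appear when a microbe has no match in the join.

```python
LEVELS = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
          "f": "family", "g": "genus", "s": "species"}

def parse_taxonomy(consensus):
    """Turn 'k__X|p__Y|...' into {'kingdom': 'X', 'phylum': 'Y', ...}."""
    if not isinstance(consensus, str):  # e.g. NaN left by an unmatched join
        return {}
    parsed = {}
    for part in consensus.split("|"):
        prefix, sep, name = part.partition("__")
        if sep and prefix in LEVELS:  # skip fragments without a known prefix
            parsed[LEVELS[prefix]] = name
    return parsed

# Illustrative taxonomy string, truncated at the class level
example = "k__Bacteria|p__Firmicutes|c__Clostridia"
parsed_example = parse_taxonomy(example)
parsed_missing = parse_taxonomy(float("nan"))
```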
# grouping by kingdom level
feature_group_list = dft['kingdom'].tolist()
megma_new = megma.copy()
megma_new.fit(feature_group_list = feature_group_list, verbose = 0)
megma_new.plot_scatter(htmlpath='./images', radius = 5)
# grouping by phylum level
feature_group_list = dft['phylum'].tolist()
megma_new = megma.copy()
megma_new.fit(feature_group_list = feature_group_list, verbose = 0)
megma_new.plot_scatter(htmlpath='./images', radius = 5)
# grouping by class level
feature_group_list = dft['class'].tolist()
megma_new = megma.copy()
megma_new.fit(feature_group_list = feature_group_list, verbose = 0)
megma_new.plot_scatter(htmlpath='./images', radius = 5)
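Before refitting, it can help to check how many groups a taxonomic level yields, since that number becomes the number of clusters. The sketch below works on plain dicts with made-up taxa. The helper `build_group_list` and the `unassigned` placeholder are hypothetical, and whether aggmap requires missing labels to be filled is an assumption, not something stated here.

```python
from collections import Counter

def build_group_list(taxa, level, missing="unassigned"):
    """One group label per feature, in feature order; gaps get a placeholder label."""
    return [t.get(level) or missing for t in taxa]

# Made-up taxonomic annotations, one dict per microbe (same order as the features)
taxa = [
    {"kingdom": "Bacteria", "phylum": "Firmicutes"},
    {"kingdom": "Bacteria", "phylum": "Bacteroidetes"},
    {"kingdom": "Bacteria", "phylum": "Firmicutes"},
    {"kingdom": "Archaea"},  # no phylum annotation
]

kingdom_groups = build_group_list(taxa, "kingdom")
phylum_groups = build_group_list(taxa, "phylum")

# Number of distinct groups per level
groups_per_level = {
    "kingdom": len(set(kingdom_groups)),
    "phylum": len(set(phylum_groups)),
}
phylum_sizes = Counter(phylum_groups)
```

A deeper taxonomic level usually splits the microbes into more groups, which is why grouping by phylum or class gives more clusters than grouping by kingdom.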