Mutational Signature Report
This section describes how to generate a Mutation Signature Report from the sample data [*].
[*] The sample data is provided in the example directory of the paplot directory.
1. Input data format
To generate a Mutation Signature Report with paplot, the input data must be in JSON format.
{
"signature":[
[ # signature 1
[0.0018,0.0003,0.0002,0.0005,0.0014,0.0008,0.0002,0.0007,0.0012,0.0003,0.0002,0.0004,0.0271,0.0107,0.0016,0.0145], # C -> A
[0.0023,0.0007,0.0001,0.002,0.0027,0.0005,0.0004,0.0032,0.0007,0.0004,0.0001,0.0013,0.1546,0.0306,0.0055,0.1931], # C -> G
[0.0043,0.0016,0.0027,0.0019,0.0096,0.0026,0.0046,0.0053,0.0045,0.0021,0.0034,0.0028,0.2612,0.0517,0.0284,0.1335], # C -> T
[0.0012,0.0007,0.0004,0.0003,0.0003,0.0003,0,0,0.0003,0.0001,0.0003,0,0.0005,0.0001,0.0001,0.0002], # T -> A
[0.0008,0.0003,0.0008,0.0007,0.0002,0.0004,0.0009,0.0005,0.0004,0.0003,0.0006,0.0003,0.0003,0.0004,0.0002,0.0004], # T -> C
[0.0001,0.0001,0.0001,0.0001,0,0.0001,0.0001,0,0.0001,0.0001,0.0009,0.0002,0.0001,0,0.0001,0.0005] # T -> G
],
[ # signature 2
[0.0266,0.0222,0.0026,0.02,0.0205,0.0145,0.0012,0.0155,0.0155,0.0094,0.0009,0.011,0.0224,0.0177,0.0019,0.0307],
[0.0127,0.0079,0.0035,0.0145,0.0058,0.0048,0.0015,0.0115,0.0034,0.0032,0,0.0071,0.0047,0.0145,0.0006,0.0246],
[0.0232,0.0099,0.042,0.0184,0.014,0.0108,0.0219,0.02,0.0137,0.0102,0.0264,0.0128,0.0048,0.0186,0.0153,0.0165],
[0.0096,0.0084,0.0094,0.0175,0.0075,0.0076,0.0046,0.0123,0.0044,0.0035,0.0028,0.008,0.0176,0.0047,0.0031,0.0139],
[0.0245,0.0087,0.0144,0.0235,0.0098,0.0096,0.0051,0.0102,0.0105,0.0053,0.0042,0.0108,0.0114,0.0081,0.0038,0.0098],
[0.0046,0.0006,0.0036,0.0035,0.0025,0.0009,0.0028,0.0082,0.0023,0.0005,0.004,0.0048,0.0041,0.0012,0.0056,0.0104]
]
],
"id":["PD3851a","PD3890a","PD3904a"],
"mutation":[[0,0,0.0594],[0,1,0.7677],[0,2,0.1727],[1,0,0.1474],[1,1,0.4064],[1,2,0.4461]],
"mutation_count":[4001,7174,5804]
}
Elements of the input data for the Mutation Signature Report
signature: Probability masses for each mutation pattern.
Enter a probability value for each mutation signature, substitution pattern (e.g., C > A), and context (e.g., TpCpA > TpApA).
The number of bases must be three or five.
Every substitution pattern must have the same number of contexts: 16 when the number of bases is three, 256 when it is five.
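The constraints on the signature element can be checked before running paplot. The following is a minimal sketch, not part of paplot: the function name bases_from_signature is hypothetical, and it only tests the rule stated above (all rows share one length, and that length is 16 or 256).

```python
def bases_from_signature(signatures):
    """Return the number of bases (3 or 5) implied by the row lengths.

    `signatures` is the value of the "signature" key: a list of signatures,
    each a list of rows (one row per substitution pattern).
    Hypothetical helper, not a paplot function.
    """
    lengths = {len(row) for signature in signatures for row in signature}
    if not lengths:
        raise ValueError("no substitution-pattern rows found")
    if len(lengths) != 1:
        raise ValueError(f"rows have differing lengths: {sorted(lengths)}")
    (count,) = lengths
    if count == 16:
        return 3
    if count == 256:
        return 5
    raise ValueError(f"expected 16 or 256 contexts per row, got {count}")


# Placeholder data (all zeros), used only to exercise the shape check.
placeholder = [[[0.0] * 16 for _ in range(6)] for _ in range(2)]
print(bases_from_signature(placeholder))
```

The check raises ValueError rather than returning a flag, so a malformed file stops a batch run early.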
Because the number of bases is three in the example data above, the probability values for the 16 contexts must be listed in the following order:
ANA,ANC,ANG,ANT,CNA,CNC,CNG,CNT,GNA,GNC,GNG,GNT,TNA,TNC,TNG,TNT
When the number of bases is five, the 256 context values must be listed in the following order:
AANAA,AANAC,AANAG,AANAT,AANCA,AANCC,AANCG,AANCT,AANGA,AANGC,AANGG,AANGT,AANTA,AANTC,AANTG,AANTT,
ACNAA,ACNAC,ACNAG,ACNAT,ACNCA,ACNCC,ACNCG,ACNCT,ACNGA,ACNGC,ACNGG,ACNGT,ACNTA,ACNTC,ACNTG,ACNTT,
AGNAA,AGNAC,AGNAG,AGNAT,AGNCA,AGNCC,AGNCG,AGNCT,AGNGA,AGNGC,AGNGG,AGNGT,AGNTA,AGNTC,AGNTG,AGNTT,
ATNAA,ATNAC,ATNAG,ATNAT,ATNCA,ATNCC,ATNCG,ATNCT,ATNGA,ATNGC,ATNGG,ATNGT,ATNTA,ATNTC,ATNTG,ATNTT,
CANAA,CANAC,CANAG,CANAT,CANCA,CANCC,CANCG,CANCT,CANGA,CANGC,CANGG,CANGT,CANTA,CANTC,CANTG,CANTT,
CCNAA,CCNAC,CCNAG,CCNAT,CCNCA,CCNCC,CCNCG,CCNCT,CCNGA,CCNGC,CCNGG,CCNGT,CCNTA,CCNTC,CCNTG,CCNTT,
CGNAA,CGNAC,CGNAG,CGNAT,CGNCA,CGNCC,CGNCG,CGNCT,CGNGA,CGNGC,CGNGG,CGNGT,CGNTA,CGNTC,CGNTG,CGNTT,
CTNAA,CTNAC,CTNAG,CTNAT,CTNCA,CTNCC,CTNCG,CTNCT,CTNGA,CTNGC,CTNGG,CTNGT,CTNTA,CTNTC,CTNTG,CTNTT,
GANAA,GANAC,GANAG,GANAT,GANCA,GANCC,GANCG,GANCT,GANGA,GANGC,GANGG,GANGT,GANTA,GANTC,GANTG,GANTT,
GCNAA,GCNAC,GCNAG,GCNAT,GCNCA,GCNCC,GCNCG,GCNCT,GCNGA,GCNGC,GCNGG,GCNGT,GCNTA,GCNTC,GCNTG,GCNTT,
GGNAA,GGNAC,GGNAG,GGNAT,GGNCA,GGNCC,GGNCG,GGNCT,GGNGA,GGNGC,GGNGG,GGNGT,GGNTA,GGNTC,GGNTG,GGNTT,
GTNAA,GTNAC,GTNAG,GTNAT,GTNCA,GTNCC,GTNCG,GTNCT,GTNGA,GTNGC,GTNGG,GTNGT,GTNTA,GTNTC,GTNTG,GTNTT,
TANAA,TANAC,TANAG,TANAT,TANCA,TANCC,TANCG,TANCT,TANGA,TANGC,TANGG,TANGT,TANTA,TANTC,TANTG,TANTT,
TCNAA,TCNAC,TCNAG,TCNAT,TCNCA,TCNCC,TCNCG,TCNCT,TCNGA,TCNGC,TCNGG,TCNGT,TCNTA,TCNTC,TCNTG,TCNTT,
TGNAA,TGNAC,TGNAG,TGNAT,TGNCA,TGNCC,TGNCG,TGNCT,TGNGA,TGNGC,TGNGG,TGNGT,TGNTA,TGNTC,TGNTG,TGNTT,
TTNAA,TTNAC,TTNAG,TTNAT,TTNCA,TTNCC,TTNCG,TTNCT,TTNGA,TTNGC,TTNGG,TTNGT,TTNTA,TTNTC,TTNTG,TTNTT
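Both orderings above follow the same rule: the flanking bases are enumerated over A, C, G, T with the rightmost base changing fastest, and N marks the mutated position. A short sketch that reproduces the lists (the function name context_labels is hypothetical and not part of paplot):

```python
from itertools import product

BASES = "ACGT"


def context_labels(num_bases):
    """Return context labels in the order listed above (num_bases is 3 or 5)."""
    if num_bases not in (3, 5):
        raise ValueError("num_bases must be 3 or 5")
    flank = (num_bases - 1) // 2  # bases on each side of the mutated position
    labels = []
    # product() varies the last element fastest, matching the listed order.
    for combo in product(BASES, repeat=2 * flank):
        left = "".join(combo[:flank])
        right = "".join(combo[flank:])
        labels.append(left + "N" + right)
    return labels


print(",".join(context_labels(3)))
```

Pairing these labels with a row of probabilities, for example with zip(context_labels(3), row), makes it easier to spot values entered in the wrong position.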
Elements for signature contribution graph
This graph is optional.
The signature contribution graph shows the number of mutations associated with each mutation signature. It is generated when id, mutation, and mutation_count are set in the input JSON file.
id: List of samples. Each sample is assigned an index in the listed order (in this example, PD3851a = 0, PD3890a = 1, PD3904a = 2).
mutation_count: The number of mutations for each sample (4001 for PD3851a, 7174 for PD3890a, etc.).
mutation: Contribution ratio of each mutation signature to each sample, given as [sample index, signature index, value].
Signature indices are assigned in the order in which the signatures are listed under the signature key.
In the above example, signature1 = 0, signature2 = 1, signature3 = 2.
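To see how the three keys relate, the sketch below resolves each [sample index, signature index, value] triple against id and mutation_count. Multiplying the ratio by the sample's mutation count is an assumption about how the two values combine into a "number of mutations"; paplot's own calculation is not shown in this document and may differ. The function name estimated_counts is hypothetical.

```python
from collections import defaultdict


def estimated_counts(data):
    """Map sample id -> {signature index: ratio * mutation_count}.

    Assumption: the mutations attributed to a signature are the sample's
    mutation count scaled by the contribution ratio.
    """
    result = defaultdict(dict)
    for sample_index, signature_index, ratio in data["mutation"]:
        sample_id = data["id"][sample_index]
        total = data["mutation_count"][sample_index]
        result[sample_id][signature_index] = ratio * total
    return dict(result)


# Values copied from the example input data above.
example = {
    "id": ["PD3851a", "PD3890a", "PD3904a"],
    "mutation": [[0, 0, 0.0594], [0, 1, 0.7677], [0, 2, 0.1727],
                 [1, 0, 0.1474], [1, 1, 0.4064], [1, 2, 0.4461]],
    "mutation_count": [4001, 7174, 5804],
}

for sample_id, by_signature in estimated_counts(example).items():
    print(sample_id, {index: round(count) for index, count in by_signature.items()})
```

Samples with no entry under mutation simply do not appear in the result.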
Note
The key names used in the input JSON file can be changed in the [result_format_signature] section of the configuration file.
[result_format_signature]
# the keys in input json file
key_signature = signature
key_id = id
key_mutation = mutation
key_mutation_count = mutation_count
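A script that prepares input files can read the same section to stay in sync with renamed keys. The sketch below assumes the configuration file is INI-style and parses it with Python's standard configparser module; how paplot itself reads the file is not described here. The function name read_key_names and the DEFAULT_KEYS fallback table are hypothetical.

```python
import configparser

# Section content copied from the listing above.
CONFIG_TEXT = """
[result_format_signature]
# the keys in input json file
key_signature = signature
key_id = id
key_mutation = mutation
key_mutation_count = mutation_count
"""

# Hypothetical fallbacks, used when an option is missing from the section.
DEFAULT_KEYS = {
    "key_signature": "signature",
    "key_id": "id",
    "key_mutation": "mutation",
    "key_mutation_count": "mutation_count",
}


def read_key_names(config_text):
    """Return the JSON key names configured in [result_format_signature]."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = "result_format_signature"
    return {
        option: parser.get(section, option, fallback=default)
        for option, default in DEFAULT_KEYS.items()
    }


print(read_key_names(CONFIG_TEXT))
```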
Note
One way to validate the JSON file format
paplot uses the json Python package. If the input file can be loaded successfully with the load() function of that package, the file is valid JSON.
For example, when the file name is "data2.json":
$ python
>>> import json
>>> json.load(open("data2.json"))
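The same check can be wrapped in a small function that reports where parsing failed. This is a sketch using only the standard json module; the function name check_json is hypothetical. Note that "#" annotations such as "# signature 1" in the listings on this page are not part of the JSON format, so json.load() rejects a file that contains them.

```python
import json


def check_json(path):
    """Return (True, None) if the file parses as JSON, else (False, reason)."""
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except json.JSONDecodeError as err:
        # lineno/colno point at the first character the parser could not accept.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None
```

Calling check_json("data2.json") before running paplot gives a line and column to inspect when the file is malformed.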
2. Minimal dataset
For the input data format, see 1. Input data format.
Input data file (the number of mutation signatures is two)
{
"signature":[
# signature 1
[
[0.0021,0.0006,0.0002,0.0007,0.0017,0.001,0.0003,0.0009,0.0014,0.0006,0.0003,0.0006,0.027,0.0108,0.0016,0.0147],
[0.0025,0.0009,0.0002,0.0022,0.0029,0.0007,0.0005,0.0034,0.0009,0.0006,0.0002,0.0014,0.1504,0.0301,0.0053,0.1884],
[0.0046,0.0018,0.0031,0.0021,0.0097,0.0029,0.0049,0.0055,0.0047,0.0024,0.0037,0.003,0.2557,0.0513,0.0286,0.1312],
[0.0014,0.0009,0.0007,0.0006,0.0004,0.0005,0.0003,0.0003,0.0004,0.0003,0.0005,0.0002,0.0008,0.0003,0.0003,0.0005],
[0.001,0.0004,0.0011,0.001,0.0003,0.0007,0.0012,0.0008,0.0006,0.0004,0.0007,0.0005,0.0005,0.0007,0.0004,0.0007],
[0.0003,0.0003,0.0003,0.0003,0.0001,0.0003,0.0003,0.0003,0.0002,0.0002,0.0011,0.0004,0.0003,0.0002,0.0003,0.0009]
],
# signature 2
[
[0.022,0.0183,0.0028,0.0171,0.0192,0.0148,0.0026,0.0157,0.0143,0.0108,0.0018,0.0116,0.0181,0.016,0.0021,0.0246],
[0.0133,0.0088,0.0037,0.0136,0.0095,0.008,0.003,0.0131,0.0065,0.0063,0.0016,0.0095,0.0044,0.0135,0.0016,0.0171],
[0.0195,0.0098,0.0283,0.0159,0.0138,0.0112,0.0156,0.0183,0.0128,0.0108,0.0186,0.0127,0,0.0146,0.0095,0.0115],
[0.0095,0.0085,0.0102,0.0155,0.0077,0.0102,0.0096,0.0135,0.0054,0.0052,0.0058,0.0089,0.0145,0.0076,0.0058,0.016],
[0.0192,0.0089,0.0135,0.0198,0.0089,0.0113,0.0092,0.0117,0.0092,0.0063,0.0064,0.01,0.0107,0.0096,0.0061,0.0123],
[0.0059,0.0028,0.0068,0.0063,0.0039,0.0044,0.0076,0.0101,0.004,0.0028,0.007,0.0064,0.006,0.0046,0.008,0.0132]
]
]
}
Configuration file
[signature]
tooltip_format_signature_title = {sig}
tooltip_format_signature_partial = {route}: {#sum_item_value:6.2}
signature_y_max = -1
alt_color_CtoA = #1BBDEB
alt_color_CtoG = #211D1E
alt_color_CtoT = #E62623
alt_color_TtoA = #CFCFCF
alt_color_TtoC = #ACD577
alt_color_TtoG = #EDC7C4
[result_format_signature]
format = json
background = False
key_signature = signature
Execute paplot.
paplot signature signature_minimal/data.json ./tmp signature_minimal \
--config_file ./signature_minimal/paplot.cfg
The report is then generated in the tmp directory.
The file name (graph_signature2.html) is determined by the number of mutation signatures, which is read automatically from the input data.
./tmp
┗ signature_minimal
┗ graph_signature2.html
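The output name can be predicted from the input file. The sketch below infers the pattern graph_signature<N>.html from the file names shown in this document, where N is the number of entries under the signature key; it is not taken from paplot's source, and the function name expected_report_name is hypothetical.

```python
def expected_report_name(data, key_signature="signature"):
    """Guess the report file name from the number of signatures in `data`.

    The naming pattern is inferred from the examples in this document.
    """
    return f"graph_signature{len(data[key_signature])}.html"


# Two empty placeholder signatures stand in for real probability rows.
print(expected_report_name({"signature": [[], []]}))
```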
3. Mutation signature with multiple numbers of signatures
For the input data format, see 1. Input data format.
To generate Mutation Signature Reports for several numbers of signatures, prepare one input data file per signature number, together with a configuration file.
In this example dataset, the following files are prepared:
example/signature_multi_class/
# Input data files
┣ data2.json # signature num = 2
┣ data3.json # signature num = 3
┣ data4.json # signature num = 4
┣ data5.json # signature num = 5
┣ data6.json # signature num = 6
# Configuration file
┗ paplot.cfg
Execute paplot for each mutation signature number.
paplot signature signature_multi_class/data2.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
paplot signature signature_multi_class/data3.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
paplot signature signature_multi_class/data4.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
paplot signature signature_multi_class/data5.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
paplot signature signature_multi_class/data6.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
Or, execute the following batch command:
paplot signature "signature_multi_class/data*.json" ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
The reports are then generated in the tmp directory.
The file names (graph_signature2.html, graph_signature3.html, etc.) are determined by the number of mutation signatures, which is read automatically from each input file.
./tmp
┗ signature_multi_class
┣ graph_signature2.html
┣ graph_signature3.html
┣ graph_signature4.html
┣ graph_signature5.html
┗ graph_signature6.html
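When the per-file form is preferred over the wildcard, for example to log or retry each run separately, the command lines can be built from the matching files. The sketch below only assembles the argument lists, using the same paplot arguments as the commands above; the function name build_commands is hypothetical.

```python
import glob


def build_commands(pattern, output_dir, project, config_file):
    """Return one paplot argument list per input file matching `pattern`."""
    commands = []
    for path in sorted(glob.glob(pattern)):
        commands.append([
            "paplot", "signature", path, output_dir, project,
            "--config_file", config_file,
        ])
    return commands


for command in build_commands("signature_multi_class/data*.json", "./tmp",
                              "signature_multi_class",
                              "./signature_multi_class/paplot.cfg"):
    print(" ".join(command))
```

Each list can be passed to subprocess.run(command, check=True) on a machine where paplot is installed.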
4. Signature contribution graph
This section adds a signature contribution graph to the report.
For the input data format, see 1. Input data format.
To generate reports for several numbers of signatures, see 3. Mutation signature with multiple numbers of signatures.
Execute paplot.
paplot signature "signature_stack/data*.json" ./tmp signature_stack \
--config_file ./signature_stack/paplot.cfg
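Because the contribution graph is generated only when id, mutation, and mutation_count are all present, an input file can be checked for these keys before running the command. A minimal sketch, assuming the default key names from the [result_format_signature] section; the function name missing_contribution_keys is hypothetical.

```python
# Default key names; adjust if they are renamed in the configuration file.
CONTRIBUTION_KEYS = ("id", "mutation", "mutation_count")


def missing_contribution_keys(data, keys=CONTRIBUTION_KEYS):
    """Return the contribution-graph keys that are absent from `data`."""
    return [key for key in keys if key not in data]


# A file with only the signature key lacks all three optional keys.
print(missing_contribution_keys({"signature": []}))
```

An empty result means all three keys are present; it does not check that their contents are consistent with each other.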