Bulk RNA-Seq quantification in kallisto

Introduction

We previously showed that pooled single cell CPM was strongly correlated with bulk log TPM (but not log CPM/RPKM).

To understand why eQTL mapping with pooled CPM appears much more difficult, we estimate log TPM for the iPSC bulk RNA-Seq using kallisto.

File manifest

The data were deposited in GEO under accession GSE89895.

Get the mapping from samples to files.

| BioSample    | Experiment |   LoadDate | MBases | MBytes | ReleaseDate | Run        | SRA_Sample | Sample_Name | Assay_Type | AvgSpotLen | BioProject  | Center_Name | Consent | DATASTORE_filetype | DATASTORE_provider | InsertSize | Instrument          | LibraryLayout | LibrarySelection | LibrarySource  | Organism     | Platform | SRA_Study | cell_type                     | source_name                   | title   |
| SAMN06020640 | SRX3452559 | 2017-12-08 |   4777 |   3322 |  2017-12-15 | SRR6355950 | SRS1802805 | GSM2392685  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18486 |
| SAMN06020639 | SRX3452560 | 2017-12-26 |   1728 |   1179 |  2017-12-26 | SRR6355951 | SRS1802806 | GSM2392686  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18489 |
| SAMN06020691 | SRX3452561 | 2017-12-08 |   3516 |   2343 |  2017-12-15 | SRR6355952 | SRS1802807 | GSM2392687  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18498 |
| SAMN06020690 | SRX3452562 | 2017-12-08 |   4605 |   3062 |  2017-12-15 | SRR6355953 | SRS1802809 | GSM2392688  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18501 |
| SAMN06020689 | SRX3452563 | 2017-12-08 |   3900 |   2597 |  2017-12-15 | SRR6355954 | SRS1802808 | GSM2392689  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18502 |
| SAMN06020688 | SRX3452564 | 2017-12-08 |    685 |    382 |  2017-12-15 | SRR6355955 | SRS1802810 | GSM2392690  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18505 |
| SAMN06020687 | SRX3452565 | 2017-12-26 |   3402 |   2337 |  2017-12-26 | SRR6355956 | SRS1802811 | GSM2392691  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18507 |
| SAMN06020686 | SRX3452566 | 2017-12-26 |   4110 |   2807 |  2017-12-26 | SRR6355957 | SRS1802812 | GSM2392692  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18508 |
| SAMN06020685 | SRX3452567 | 2017-12-08 |    832 |    468 |  2017-12-15 | SRR6355958 | SRS1802813 | GSM2392693  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18510 |
| SAMN06020684 | SRX3452568 | 2017-12-08 |   1495 |    839 |  2017-12-15 | SRR6355959 | SRS1802814 | GSM2392694  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18511 |
| SAMN06020683 | SRX3452569 | 2017-12-08 |   2891 |   1942 |  2017-12-15 | SRR6355960 | SRS1802815 | GSM2392695  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18517 |
| SAMN06020682 | SRX3452570 | 2017-12-08 |   5092 |   3400 |  2017-12-15 | SRR6355961 | SRS1802816 | GSM2392696  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18519 |
| SAMN06020681 | SRX3452571 | 2017-12-08 |   7683 |   5327 |  2017-12-15 | SRR6355962 | SRS1802817 | GSM2392697  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18520 |
| SAMN06020680 | SRX3452572 | 2017-12-08 |   7461 |   5221 |  2017-12-15 | SRR6355963 | SRS1802818 | GSM2392698  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18522 |
| SAMN06020679 | SRX3452573 | 2017-12-08 |   2906 |   1893 |  2017-12-15 | SRR6355964 | SRS1802819 | GSM2392699  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18852 |
| SAMN06020678 | SRX3452574 | 2017-12-08 |   3202 |   2142 |  2017-12-15 | SRR6355965 | SRS1802820 | GSM2392700  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18853 |
| SAMN06020677 | SRX3452575 | 2017-12-26 |   4154 |   2899 |  2017-12-26 | SRR6355966 | SRS1802821 | GSM2392701  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18855 |
| SAMN06020676 | SRX3452576 | 2017-12-26 |   4122 |   2906 |  2017-12-26 | SRR6355967 | SRS1802822 | GSM2392702  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18856 |
| SAMN06020675 | SRX3452577 | 2017-12-08 |   4716 |   3053 |  2017-12-15 | SRR6355968 | SRS1802824 | GSM2392703  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18858 |
| SAMN06020674 | SRX3452578 | 2017-12-08 |   6313 |   4378 |  2017-12-15 | SRR6355969 | SRS1802823 | GSM2392704  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18859 |
| SAMN06020673 | SRX3452579 | 2017-12-26 |   3910 |   2521 |  2017-12-26 | SRR6355970 | SRS1802827 | GSM2392705  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18861 |
| SAMN06020672 | SRX3452580 | 2017-12-08 |   3787 |   2525 |  2017-12-15 | SRR6355971 | SRS1802825 | GSM2392706  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18862 |
| SAMN06020671 | SRX3452581 | 2017-12-08 |   1020 |    575 |  2017-12-15 | SRR6355972 | SRS1802826 | GSM2392707  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18870 |
| SAMN06020670 | SRX3452582 | 2017-12-08 |   3750 |   2522 |  2017-12-15 | SRR6355973 | SRS1802828 | GSM2392708  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18907 |
| SAMN06020669 | SRX3452583 | 2017-12-08 |   1279 |    713 |  2017-12-15 | SRR6355974 | SRS1802833 | GSM2392709  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18912 |
| SAMN06020668 | SRX3452584 | 2017-12-08 |   5057 |   3320 |  2017-12-15 | SRR6355975 | SRS1802832 | GSM2392710  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA18913 |
| SAMN06020667 | SRX3452585 | 2017-12-08 |   9035 |   5880 |  2017-12-15 | SRR6355976 | SRS1802829 | GSM2392711  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19093 |
| SAMN06020666 | SRX3452586 | 2017-12-26 |   2122 |   1442 |  2017-12-26 | SRR6355977 | SRS1802830 | GSM2392712  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19098 |
| SAMN06020665 | SRX3452587 | 2017-12-08 |   1609 |    920 |  2017-12-15 | SRR6355978 | SRS1802831 | GSM2392713  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19099 |
| SAMN06020664 | SRX3452588 | 2017-12-26 |   2777 |   1804 |  2017-12-26 | SRR6355979 | SRS1802834 | GSM2392714  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19101 |
| SAMN06020663 | SRX3452589 | 2017-12-08 |   3984 |   2621 |  2017-12-15 | SRR6355980 | SRS1802836 | GSM2392715  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19102 |
| SAMN06020662 | SRX3452590 | 2017-12-08 |    719 |    408 |  2017-12-15 | SRR6355981 | SRS1802835 | GSM2392716  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19108 |
| SAMN06020661 | SRX3452591 | 2017-12-08 |    562 |    301 |  2017-12-15 | SRR6355982 | SRS1802837 | GSM2392717  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19114 |
| SAMN06020660 | SRX3452592 | 2017-12-08 |   4131 |   2945 |  2017-12-15 | SRR6355983 | SRS1802838 | GSM2392718  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19116 |
| SAMN06020659 | SRX3452593 | 2017-12-08 |   5158 |   3443 |  2017-12-15 | SRR6355984 | SRS1802839 | GSM2392719  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19119 |
| SAMN06020658 | SRX3452594 | 2017-12-08 |   4047 |   2897 |  2017-12-15 | SRR6355985 | SRS1802840 | GSM2392720  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19127 |
| SAMN06020657 | SRX3452595 | 2017-12-08 |    758 |    429 |  2017-12-15 | SRR6355986 | SRS1802841 | GSM2392721  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19128 |
| SAMN06020656 | SRX3452596 | 2017-12-08 |   2909 |   1962 |  2017-12-15 | SRR6355987 | SRS1802842 | GSM2392722  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19130 |
| SAMN06020655 | SRX3452597 | 2017-12-08 |    763 |    438 |  2017-12-15 | SRR6355988 | SRS1802843 | GSM2392723  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19138 |
| SAMN06020654 | SRX3452598 | 2017-12-08 |   3254 |   2342 |  2017-12-15 | SRR6355989 | SRS1802846 | GSM2392724  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19140 |
| SAMN06020653 | SRX3452599 | 2017-12-08 |   2939 |   1986 |  2017-12-15 | SRR6355990 | SRS1802844 | GSM2392725  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19143 |
| SAMN06020652 | SRX3452600 | 2017-12-08 |   3215 |   2301 |  2017-12-15 | SRR6355991 | SRS1802845 | GSM2392726  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19144 |
| SAMN06020651 | SRX3452601 | 2017-12-08 |   3187 |   2130 |  2017-12-15 | SRR6355992 | SRS1802851 | GSM2392727  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19152 |
| SAMN06020650 | SRX3452602 | 2017-12-08 |   4138 |   2778 |  2017-12-15 | SRR6355993 | SRS1802848 | GSM2392728  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19153 |
| SAMN06020649 | SRX3452603 | 2017-12-08 |   5082 |   3380 |  2017-12-15 | SRR6355994 | SRS1802849 | GSM2392729  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19159 |
| SAMN06020648 | SRX3452604 | 2017-12-08 |   4789 |   3437 |  2017-12-15 | SRR6355995 | SRS1802847 | GSM2392730  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19160 |
| SAMN06020647 | SRX3452605 | 2017-12-08 |   2984 |   1999 |  2017-12-15 | SRR6355996 | SRS1802850 | GSM2392731  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19190 |
| SAMN06020646 | SRX3452606 | 2017-12-08 |   5594 |   3980 |  2017-12-15 | SRR6355997 | SRS1802854 | GSM2392732  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19192 |
| SAMN06020645 | SRX3452607 | 2017-12-08 |   3053 |   2033 |  2017-12-15 | SRR6355998 | SRS1802852 | GSM2392733  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19193 |
| SAMN06020644 | SRX3452608 | 2017-12-08 |   3290 |   2200 |  2017-12-15 | SRR6355999 | SRS1802853 | GSM2392734  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19204 |
| SAMN06020643 | SRX3452609 | 2017-12-08 |   3832 |   2548 |  2017-12-15 | SRR6356000 | SRS1802859 | GSM2392735  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19206 |
| SAMN06020642 | SRX3452610 | 2017-12-08 |   4932 |   3598 |  2017-12-15 | SRR6356001 | SRS1802856 | GSM2392736  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19207 |
| SAMN06020641 | SRX3452611 | 2017-12-08 |   1720 |    932 |  2017-12-15 | SRR6356002 | SRS1802855 | GSM2392737  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19209 |
| SAMN06020696 | SRX3452612 | 2017-12-08 |   5306 |   3879 |  2017-12-15 | SRR6356003 | SRS1802857 | GSM2392738  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19210 |
| SAMN06020695 | SRX3452613 | 2017-12-08 |   1128 |    811 |  2017-12-15 | SRR6356004 | SRS1802858 | GSM2392739  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19225 |
| SAMN06020694 | SRX3452614 | 2017-12-08 |   2963 |   1986 |  2017-12-15 | SRR6356005 | SRS1802860 | GSM2392740  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19238 |
| SAMN06020693 | SRX3452615 | 2017-12-26 |   4332 |   3036 |  2017-12-26 | SRR6356006 | SRS1802862 | GSM2392741  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19239 |
| SAMN06020692 | SRX3452616 | 2017-12-26 |   4555 |   3142 |  2017-12-26 | SRR6356007 | SRS1802861 | GSM2392742  | RNA-Seq    |         50 | PRJNA420980 | GEO         | public  | sra                | ncbi               |          0 | Illumina HiSeq 2500 | SINGLE        | cDNA             | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | SRP126289 | Induced pluripotent stem cell | Induced pluripotent stem cell | NA19257 |
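
The run accession and individual ID used downstream can be pulled out of a run table like the one above by column name. The following is a minimal sketch, assuming a tab-separated export with a header row; the helper name extract_run_title and the toy input are hypothetical, and only the column names Run and title are taken from the table.

```shell
#!/bin/bash
# Hypothetical helper: print "Run<TAB>title" for each row of a tab-separated
# run table, looking the columns up by header name rather than by position.
extract_run_title() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        { print $(col["Run"]), $(col["title"]) }
    ' "$1"
}

# Toy input: two rows from the table above, restricted to four columns.
run_table=$(mktemp)
printf 'BioSample\tRun\tInstrument\ttitle\n' >"$run_table"
printf 'SAMN06020640\tSRR6355950\tIllumina HiSeq 2500\tNA18486\n' >>"$run_table"
printf 'SAMN06020639\tSRR6355951\tIllumina HiSeq 2500\tNA18489\n' >>"$run_table"
extract_run_title "$run_table"
```

Looking columns up by name keeps the extraction correct if the export changes its column order, and splitting on tabs keeps values that contain spaces (such as the instrument name) in a single field.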

Build the index

We need to use exactly the same sequences that were used in the scRNA-Seq quantification.

sbatch --partition=broadwl
#!/bin/bash
curl -OL --ftp-pasv "ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/cds/Homo_sapiens.GRCh37.75.cds.all.fa.gz"
Submitted batch job 44782550

srun --partition=broadwl --pty bash
srun: job 44825594 queued and waiting for resources
srun: job 44825594 has been allocated resources
zcat Homo_sapiens.GRCh37.75.cds.all.fa.gz | head
>ENST00000415118 cds:known chromosome:GRCh37:14:22907539:22907546:1 gene:ENSG00000223997 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene
GAAATAGT
>ENST00000434970 cds:known chromosome:GRCh37:14:22907999:22908007:1 gene:ENSG00000237235 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene
CCTTCCTAC
>ENST00000448914 cds:known chromosome:GRCh37:14:22918105:22918117:1 gene:ENSG00000228985 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene
ACTGGGGGATACG
>ENST00000604642 cds:known chromosome:GRCh37:15:20209093:20209115:-1 gene:ENSG00000270961 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
GTGGATATAGTGTCTACGATTAC
>ENST00000603326 cds:known chromosome:GRCh37:15:20210050:20210068:-1 gene:ENSG00000271317 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
NNTGACTATGGTGCTAACTAC
zcat Homo_sapiens.GRCh37.75.cds.all.fa.gz | awk '{split($3, a, ":"); c=a[3]} c ~ /^([0-9][0-9]?|[XY]|MT)$/ && $5 == "gene_biotype:protein_coding" {k[$4] = 1} END {for (i in k) {n += 1} print n}'
20327
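
The index step uses a file named ensembl-75-protein-coding.fa.gz, but these notes do not show how it was produced. The following is a minimal sketch of one way to produce such a file, assuming it contains exactly the records that pass the same header test as the gene count. The function name filter_protein_coding and the toy records are hypothetical.

```shell
#!/bin/bash
# Hypothetical filter: keep FASTA records whose header passes the same test
# as the gene count (chromosome name of one or two digits, X, Y, or MT, and
# gene_biotype protein_coding). Sequence lines inherit the decision made at
# the most recent header line.
filter_protein_coding() {
    awk '
        /^>/ {
            split($3, a, ":")
            keep = (a[3] ~ /^([0-9][0-9]?|[XY]|MT)$/ && $5 == "gene_biotype:protein_coding")
        }
        keep
    '
}

# Toy input with made-up IDs: one TR_D_gene record and one protein-coding record.
toy_fasta=$(mktemp)
cat >"$toy_fasta" <<'EOF'
>T1 cds:known chromosome:GRCh37:14:1:8:1 gene:G1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene
GAAATAGT
>T2 cds:known chromosome:GRCh37:1:100:108:1 gene:G2 gene_biotype:protein_coding transcript_biotype:protein_coding
ATGGCCTAA
EOF
filter_protein_coding <"$toy_fasta"
```

On the real data this would be used as a pipe stage, for example zcat Homo_sapiens.GRCh37.75.cds.all.fa.gz | filter_protein_coding | gzip >ensembl-75-protein-coding.fa.gz. Whether this matches how the file was actually built is an assumption.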

Build the index.

sbatch --partition=broadwl --mem=8G
#!/bin/bash
source activate scqtl
kallisto index -i transcripts.idx /scratch/midway2/aksarkar/singlecell/run-kallisto/ensembl-75-protein-coding.fa.gz
Submitted batch job 44822034

Run kallisto

Download the data and run quantification.

sbatch --partition=broadwl --mem=5G --job-name kallisto -a 2-58
#!/bin/bash
set -e
source activate scqtl
readarray tasks <metadata.txt
task=(${tasks[$SLURM_ARRAY_TASK_ID]})
test -f ${task[1]}.fastq.gz || fastq-dump --gzip ${task[1]}
kallisto quant --plaintext --single -l 200 -s 50 -i transcripts.idx -o ${task[-1]} ${task[1]}.fastq.gz
gzip ${task[-1]}/abundance.tsv
Submitted batch job 44829590
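
The array job relies on bash word splitting to pick fields out of one line of metadata.txt. The following toy example shows how that indexing behaves. The layout of metadata.txt is not shown in these notes, so the file below is a hypothetical stand-in that assumes the accession is the second whitespace-separated field and the individual ID is the last.

```shell
#!/bin/bash
# Toy stand-in for metadata.txt: a header line plus one sample line.
meta=$(mktemp)
printf 'BioSample Run Instrument title\n' >"$meta"
printf 'SAMN06020640 SRR6355950 Illumina HiSeq 2500 NA18486\n' >>"$meta"

# readarray stores one line per element; element 0 is the header.
readarray tasks <"$meta"

# SLURM sets this variable in a real array job; it is set by hand here.
SLURM_ARRAY_TASK_ID=1

# Unquoted expansion splits the line on whitespace, so a value containing
# spaces ("Illumina HiSeq 2500") becomes three separate words.
task=(${tasks[$SLURM_ARRAY_TASK_ID]})
echo "accession=${task[1]} outdir=${task[-1]} nfields=${#task[@]}"
```

This prints accession=SRR6355950 outdir=NA18486 nfields=6. Fields that contain spaces shift the positions of later fields, which is why indexing the last word with -1 is more robust than a fixed positive index. Negative array subscripts require a reasonably recent bash.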

Move the output to permanent storage.

test -e .rsync-filter || cat >.rsync-filter <<EOF
+ */
+ abundance.tsv.gz
- *
EOF
rsync -FFau /scratch/midway2/aksarkar/singlecell/run-kallisto/ /project2/mstephens/aksarkar/projects/singlecell-qtl/data/kallisto/

Process the output.

zcat Homo_sapiens.GRCh37.75.cds.all.fa.gz | awk '/^>/ {sub(">", "", $1); split($3, a, ":"); split($4, b, ":"); c=a[3]} c ~ /^([0-9][0-9]?|[XY]|MT)$/ && $5 == "gene_biotype:protein_coding" {print $1, b[2]}' >mapping.txt
sbatch --partition=broadwl
#!/bin/bash
set -e
function process () {
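    # Sum transcript TPM (last column) per gene; the individual ID is the name of the directory holding the file
    zcat "$1" | awk -v ind="$(basename "$(dirname "$1")")" 'BEGIN {while ((getline <"mapping.txt") > 0) {m[$1] = $2}} NR > 1 {tpm[m[$1]] += $NF} END {for (gene in tpm) {print ind, gene, tpm[gene]}}'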
}
export -f process
find -name "abundance.tsv.gz" | parallel --halt now,fail=1 -j1 process | gzip >bulk-ipsc-tpm.txt.gz
cp bulk-ipsc-tpm.txt.gz /project2/mstephens/aksarkar/projects/singlecell-qtl/data/kallisto/
Submitted batch job 44835137
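
The aggregation step sums transcript-level TPM into gene-level TPM for each individual. The following is a minimal sketch of the same idea on toy data. It reads the mapping with awk's two-file idiom instead of getline and skips transcripts that are absent from the mapping, so it is a variant of the job above rather than a copy. The function name sum_tpm_by_gene and all IDs and values in the toy files are made up.

```shell
#!/bin/bash
# Hypothetical helper.
#   $1: mapping file with "transcript gene" per line
#   $2: abundance table with a header row and TPM in the last column
#   $3: individual ID to prefix each output row
sum_tpm_by_gene() {
    awk -v ind="$3" '
        NR == FNR { m[$1] = $2; next }                # first file: mapping
        FNR > 1 && ($1 in m) { tpm[m[$1]] += $NF }    # second file: skip header
        END { for (gene in tpm) print ind, gene, tpm[gene] }
    ' "$1" "$2" | sort
}

# Toy mapping: two transcripts of gene G1 and one transcript of gene G2.
toy_map=$(mktemp)
printf 'T1 G1\nT2 G1\nT3 G2\n' >"$toy_map"

# Toy abundance table using the column layout of kallisto's abundance.tsv.
toy_abundance=$(mktemp)
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' >"$toy_abundance"
printf 'T1\t100\t50\t10\t1.5\n' >>"$toy_abundance"
printf 'T2\t100\t50\t5\t2.5\n' >>"$toy_abundance"
printf 'T3\t200\t150\t1\t4\n' >>"$toy_abundance"

sum_tpm_by_gene "$toy_map" "$toy_abundance" NA18486
```

The output is sorted because awk does not guarantee any order when iterating over an array with for (gene in tpm).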

Author: Abhishek Sarkar

Created: 2018-05-22 Tue 13:57
