Changes and Clarifications
==========================
20060618 Belatedly, added in Oxford LD-based genetic map positions. File is
now rather large, as it includes the Affy 100K SNPs. End of
chromosome 1 fixed (extrapolation for last two markers very poor).
20051010 Added D6S226, D6S224, D6S225, D6S1848, D6S220, at request of Jo Steele.
D6S224 seems to be a cryptic duplicate (nested primers) of D6S1717,
but given separate entry.
20050607 Blasted GATA156F11 and D19S1179.
20050517 Finished including Build 35.1 physical positions, and reinterpolated
linkage map. A few markers moved compared to previous
positions: large changes reflected the absence of information
about D2S2738, D9S1844, D9S19, UT792 (primers map to 3p,
linkage map to 9), AAT251, where the physical position is
inferred solely based on Marshfield map position. There were
126 local flips of order, usually between adjacent markers.
D17S793 duplication due to existence of 165xd4 and 165xd4A - latter
is different marker. D2S1240 moved back to chromosome 2, and
D14S780 moved back to chromosome 14, after visits to 21 and 22
respectively, based on Build 34.3 sequence data.
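For illustration, a "local flip of order" between two builds can be counted
roughly as below. This is a minimal sketch, not the script used to obtain the
figure of 126: it assumes one chromosome at a time, each marker held as a
hypothetical (name, old_bp, new_bp) tuple, and counts only neighbouring pairs
whose order reverses. The marker names and positions are invented.

```python
def adjacent_flips(markers):
    """Return neighbouring pairs (in the old order) that swap order in the new build.

    markers: list of (name, old_bp, new_bp) tuples for a single chromosome.
    """
    by_old = sorted(markers, key=lambda m: m[1])
    flips = []
    for left, right in zip(by_old, by_old[1:]):
        if left[2] > right[2]:  # new positions reverse the old order
            flips.append((left[0], right[0]))
    return flips


# Hypothetical markers and positions, for illustration only.
example = [
    ("M1", 1_000_000, 1_050_000),
    ("M2", 2_000_000, 2_600_000),
    ("M3", 2_500_000, 2_400_000),
    ("M4", 4_000_000, 4_100_000),
]
flips = adjacent_flips(example)
print(flips)
```

Here only M2 and M3 change places; markers that move without crossing a
neighbour are not counted.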
20050421 Started collating Build 35.1 physical positions: no great surprises,
save chromosome 17 where positions for D17S126-D17S928 all shifted
2-3 Mbp, X (and Y). Ordering not greatly affected.
20050420 Nievergelt et al placed D5S1454 (ATA4F06) on chromosome 4 based
on BLAST search. This marker is not placed on the most recent builds,
and our linkage data confirms it at its Marshfield position on
chromosome 5 (between D5S433 and D5S2501). So inserted at
107100000 bp on chromosome 5, 116.7 Rutgers cM from pter.
20050210 Harry Beeby points out some duplications still left after merging in
of Rutgers map:
Common Name  Alias1    Alias2
-----------  ------    ------
ATA10H11     D3S2409   D3S2384
ATA28C05     ATA28C05  DXS6795
ATA31E12     ATA31E12  DXS7126
D12S1091     D12S1091  D12S1075
D12S1341     D12S1341  D12S1330
D19S178      .         .
D19S393      D19S543   .
D2S1337      D2S406    .
D2S2735      .         .
DXS6792      D5S2796   .         (on chr 5 at 65234477 bp)
GATA109      D1S1585   D1S1728
GATA112F02   D6S1270   D6S1045
GATA119B03   D7S2200   D7S3047
GATA186D06   DXS9907   .
GATA31E08    D3S2390   .         (chromos different 3, X; dropped 3)
GATA43F06    D2S1370   D3S2442   (SEE BELOW)
GATA62F03    D9S2169   D4S2624
GATA66B04    D19S714   D19S588
GATA8B01     D20S201   .
GGAT3F08     DXS9900   DXS6814   (pseudoautosomal map)
I have fixed these except where there are four _good_ aliases,
where I left the duplicates adjacent on the map.
20050204 How many mistakes are left to be made? Added extrapolation
of genetic map outside included markers (20050124), which was
not good on the acrocentric chromosomes. Reverted to original
assumption that marker coverage is complete.
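The difference between the two behaviours can be sketched as follows. This is
only an illustration under stated assumptions: plain linear interpolation
between flanking markers (the actual fitting procedure is not described here),
"coverage complete" read as holding the end-marker value outside the covered
range, and invented anchor positions. The function name and its parameters
are hypothetical.

```python
from bisect import bisect_right


def interpolate_cm(bp, anchors, extrapolate=False):
    """Linearly interpolate a genetic position (cM) for a physical position (bp).

    anchors: list of (bp, cM) pairs with strictly increasing bp, at least two.
    extrapolate: if False, positions outside the anchors take the end value;
                 if True, the slope of the end segment is extended outwards.
    """
    xs = [a[0] for a in anchors]
    ys = [a[1] for a in anchors]
    if not extrapolate:
        if bp <= xs[0]:
            return ys[0]
        if bp >= xs[-1]:
            return ys[-1]
    # Segment index; the first or last segment is reused outside the range.
    i = min(max(bisect_right(xs, bp) - 1, 0), len(xs) - 2)
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + slope * (bp - xs[i])


# Hypothetical anchors: (bp, cM).
anchors = [(1_000_000, 0.0), (2_000_000, 2.0), (4_000_000, 3.0)]
inside = interpolate_cm(3_000_000, anchors)
clamped = interpolate_cm(5_000_000, anchors)
extrapolated = interpolate_cm(5_000_000, anchors, extrapolate=True)
print(inside, clamped, extrapolated)
```

Extending the end-segment slope is sensitive to the spacing of the last two
markers, which is one way extrapolation can go wrong at chromosome ends.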
20050124 Moved GATA164A09 to correct location at 118 Mbp on X. Replaced
some pseudoautosomal markers physical positions with
their Build 35 positions (so as to coordinate with recently
BLASTed new markers).
20050124 Thomas Wienker has suggested that sex-averaged distances for X are
hard to interpret outside the pseudoautosomal regions. The I.cM values
for X are now female cM. A separate file pseudoautosomal_map.dat
contains a map of pseudoautosomal markers. Added a few Xpter markers,
notably TELA, DXYS201, DXYS218.
20041118 SCA10 is at 44511678 bp on Build 35.1, while D22S532 is at 44443698 bp
and D22S1160 at 44749701 bp. Merged in Rutgers map, which includes a number of
SNPs (TSC mapping set). This entails addition of 3 fields for
the sex-averaged and sex-specific linkage map positions from
that pooled data analysis (Kong et al 2004). The interpolated
map position is now in Rutgers cM.
There are a few inconsistent positions (1% discrepancy):
name1     name2       name3     chr   Our.pos  Rutgers.pos
D16S3027  AFMa154wc9  D16S2622   16   3864420      4051157
D18S1356  ATA33B11    .          18  42793019     45581487
D9S1844   AFMb321zf1  .           9  42528253     43656049
D3S4557   .           .           3    473473       456628
D6S2441   ATA50H07    .           6  27737689     26007311
And PI microsatellite has been placed on chr 14 with the gene.
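A check of this kind can be sketched as below, using the positions from the
table of inconsistent markers above. The exact rule is an assumption: the
discrepancy is taken here as the absolute difference relative to the larger of
the two positions, with a 1% threshold. The function name is hypothetical.

```python
# (name1, chr, Our.pos, Rutgers.pos), copied from the table above.
rows = [
    ("D16S3027", 16, 3864420, 4051157),
    ("D18S1356", 18, 42793019, 45581487),
    ("D9S1844", 9, 42528253, 43656049),
    ("D3S4557", 3, 473473, 456628),
    ("D6S2441", 6, 27737689, 26007311),
]


def relative_discrepancy(our_bp, other_bp):
    """Absolute difference as a fraction of the larger position (assumed rule)."""
    return abs(our_bp - other_bp) / max(our_bp, other_bp)


THRESHOLD = 0.01  # assumed reading of "1% discrepancy"
flagged = [
    name
    for name, chrom, ours, theirs in rows
    if relative_discrepancy(ours, theirs) > THRESHOLD
]
for name, chrom, ours, theirs in rows:
    print(name, chrom, f"{100 * relative_discrepancy(ours, theirs):.1f}%")
```

Under this definition all five listed markers exceed the threshold.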
20041111 A few duplications persist (Scott Gordon has pointed these out).
Some of these may represent known segmental duplications (duplicons):
D17S793 is given at 15.379 Mbp and 18.653 Mbp on Ensembl and NCBI,
and is not included on the Kong et al map. On 17p12, the
Smith-Magenis syndrome/dup(17)(p11.2-p11.2) duplication is
18.524-18.677 Mbp and 20.345-20.492 Mbp.
D7S804 was resolved using the Build 35.1 positioning, and noting that
D7S805 is an alias.
There are remaining cases where I have left a duplication via an alias
in the "name3" column. In these cases, I believe the correct position
is that given where that name appears as "name1":
D22S427, D11S1883, GATA43F06 (see below), D4S393, D4S409.
20041101 Compared map to that of Kong et al 2004 (14759 markers),
which lists many marker duplications and unmappable markers.
Added in additional chromosome 11p markers from Sequana
Therapeutics fine map of that region.
Notable changes (added aliases were not previously present in master map):
D6S1270 duplicates D6S1045 (have same pos in master map anyway!)
D9S2152 added as alias for D9S1116
D10S1419 added as alias for D10S2470
D13S784 added as alias for D13S1807
D13S785 alias for D13S1811
D13S789 alias for D13S1812
D10S1218, despite being present on the original Marshfield and DeCODE
maps, is identified as "unlinked to any chromosome".
There are 73 markers present on the master map that Kong et al label
as "physical position does not match linkage position":
D1S1193 D1S396 D1S519 D1S2829 D1S3469 D1S179 D2S2982 D2S323
D2S319 D2S262 D2S1251 D2S2241 D2S1776 D2S1245 D3S1211 D3S3719
D3S4544 D4S1609 D2S2738 D4S1619 D4S1517 D4S2292 D4S2290 D4S1523
D5S593 AC016604-5 D5S2034 D6S1689 D6S941 D6S262 D6S495 D6S1693
D7S628 D7S460 D7S678 D7S1507 D7S2448 D7S1503 D8S1825 D9S280
D9S779 D10S197 D10S589 D10S1141 D11S992 D11S1337 D11S1390
D11S1284 D12S94 VWF D12S58 D12S1074 D12S63 D14S582
D14S543 D17S663 D17S1810 D17S799 D17S968 D18S1140 D18S975
D18S474 D18S68 D18S1374 HRC.A D21S210 D21S1913 D21S1408
D21S1245 DXS6807 DXS1036 DXS997 DXS1049.
The mean absolute difference between our interpolated linkage
map position and that of Kong et al is 1.1 cM, so these markers have
been left in the map file. The Kong et al positions are very
close to the published Marshfield positions, rather than DeCODE
positions.
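The summary statistic can be computed along these lines. This is a sketch
only: it assumes positions are held in dictionaries keyed by marker name, and
the marker names and cM values are invented, not taken from either map.

```python
def mean_absolute_difference(ours, theirs):
    """Mean |our cM - their cM| over markers present in both maps."""
    shared = sorted(set(ours) & set(theirs))
    if not shared:
        raise ValueError("no markers in common")
    return sum(abs(ours[m] - theirs[m]) for m in shared) / len(shared)


# Hypothetical positions in cM.
ours = {"M1": 10.0, "M2": 25.5, "M3": 40.0}
kong = {"M1": 11.0, "M2": 24.5, "M3": 41.5, "M4": 60.0}
mad = mean_absolute_difference(ours, kong)
print(round(mad, 2))
```

Markers present in only one map (M4 here) are ignored.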
20040804 Added 39 markers from Manuel Ferreira. These include DXS6792 on chr 5
(genotype data confirm this to be autosomal). A number turn
out to be "cryptic" aliases, ie the product of one completely contains
that of the other marker.
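Given the mapped coordinates of two products, the nesting test is a simple
interval check. A minimal sketch, assuming each product is described by a
hypothetical (chromosome, start_bp, end_bp) tuple; the coordinates below are
invented.

```python
def is_nested(product_a, product_b):
    """True if one product interval completely contains the other.

    Each product is (chrom, start_bp, end_bp) with start_bp <= end_bp.
    """
    chrom_a, start_a, end_a = product_a
    chrom_b, start_b, end_b = product_b
    if chrom_a != chrom_b:
        return False
    a_contains_b = start_a <= start_b and end_b <= end_a
    b_contains_a = start_b <= start_a and end_a <= end_b
    return a_contains_b or b_contains_a


# Hypothetical product coordinates.
outer = ("6", 1000, 1300)
inner = ("6", 1050, 1250)
overlapping = ("6", 1200, 1400)
print(is_nested(outer, inner), is_nested(outer, overlapping))
```

Products that merely overlap are not treated as aliases by this test.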
20040623 Have compared this map to that of Nievergelt et al 2004. They
have BLASTed the markers for which we interpolated physical
positions based on the Marshfield map. They could not find 429
of these. They note more markers on chromosomes different to
the original assignment eg UT2548 (D11S1916) seems to be on chr
2 at 151549427; UT5086 (D19S724) on chr 1 at 11174025 and so on.
However, I am distrustful of some of these, given that both
DeCODE and Marshfield, for example, agree that UT924 is
D14S539, rather than D10S519 (as the physical maps have it).
This gives us physical positions of an additional 1000 markers.
name1     name2       chr  chr2   build343
D19S724   UT5086        1    19   11174025
PLA2      SGC35515      1    12  184196629
D20S159   UT1307        2    20  136667226
D11S1916  UT2548        2    11  151549427
GC        .             2     4  219331392
D2S2394   AFMa101xg5    3     2     485608
D10S1161  UT5819        3    10  153767481
D9S765    UT1531        4     9   56824033
D5S1454   ATA4F06       4     5   60766934
UT2361    .             5     9   26276045
D17S1288  ATA1H07       5    17   80540347
D15S536   UT6886        5    15  110235845
D12S814   UT5140        5    12  149902513
D20S1142  GATA124A11    6    20    6399627
D7S2249   D2S1249       7     2   31648936
D21S2051  GATA116E08    8    21   36216937
D1S460    AFM123yc5     8     1  127193369
D8S1019   UT885         8     3  139929949
D14S539   UT924        10    14   29105585
FB7F11    WI-14125     10    18   67309958
D11S1914  UT1607       11    12   28923393
D18S966   ATA37G10     11     8   43770130
D15S529   UT935        15    12   51424156
D14S779   UT1888       16    14    3403730
UT556     .            16    21   35082119
UT1598    .            16    17   51240316
UT18      .            17     2     120515
PI        .            18    14    5534609
D8S1013   UT5182       18     8   59779184
D7S1525   UT7368       19     7   51729360
D14S132   UT563        20    14   25872150
D21S1249  UT1025       21    16    6991420
D2S1240   UT5146       21     2   16907604
D8S2318   GATA115F05   21     8   26770586
D14S780   UT6047       22    14   14519674
D2S1280   UT5116III    22     2   14611866
UT597     .            22    14   14613571
D11S1905  UT832        23    11   10553732
D12S832   UT6574       23    12   45684466
D9S757    UT764        23     9  133552054
D2S1276   UT7691       23     2  138195102
20040618 Thomas Wienker pointed out that there are 100-odd markers
from the Marshfield and DeCODE maps that had not been included
(many on chr 20): these have been added. Several aliases were
also fixed, notably IFNAR for D21S2039, Mfd92 for F8VWF, 1GF1 for
IGF1. A few physical map positions have been updated (D1S443,
GATA145F08, D1S2677, GATA153G01, D1S2879, D7S804, D17S793, D21S2039).
20040511 AAC023 is at 67951716 bp. AGAT128 is at 62464943 bp.
GTTT002 is at 155405903 bp. CATA002 is at 23707416 bp.
GATA66B04 is an alias for GATA27C12 (D19S588): removed entry
and added alias (Scott G).
20040429 D6S502 Build 34.3 position corrected and name entered as alias for
D6S500/GATA7B06 (Harry).
20040428 GGAAT1B07, Y-27H39, ATA10F11, GATA62F03 given Build 34.3 positions (Scott).
SRaP added as alias for TPO. GATA2A12 added at 882323 bp on X.
20040421 TAT024 is on chr 3 at 52269515, TAGA049 on chr 4 at 174665378,
and alias of D6S502 as GATA7G07 removed. Build 34.3 positions of
CTAT014, TTTTA002 and GATA143C02 added.
20040416 D16S2616 represented twice: Marshfield position at 11.46 cM used.
GATA119B03 has two positions in database: Build 34.3 and
HSC_TCAG (The Center for Applied Genomics alternate build,
usually only used now for chr 7): chose the former as Ensembl no
longer uses TCAG.
GATA129B03 represented twice: Marshfield position at 35.51 cM used.
GATA158H04 represented twice, also as g10693: latter removed.
Mfd238 is in fact D19S254, as given by the public databases
(BLASTing primer gives 62358762 bp as location), and not
D19S559 as given in Marshfield spreadsheet (SET13).
GATA29B01 is D19S589 as per public database rather than D19S254 (BLAST).
GATA23B01 is D19S586 as per public database rather than D19S589 (BLAST).
UT7544 is D19S559 as per public database rather than D19S246 (BLAST).
Mfd232 is D19S246 as per public database rather than D19S245 (BLAST).
And so on for chr 19, SET 11-13.
According to Marshfield, GATA7G07 *is* D6S502 on chromosome 6,
but an Ensembl search will instead give D8S1179 on chromosome 8.
TAT024 is on chr 4 in the Marshfield Set 51 spreadsheet and
on chr 3 in the Marshfield Set 52 spreadsheet.
TAGA049 is on chr 15 in the Marshfield Set 51 spreadsheet and
on chr 4 in the Marshfield Set 52 spreadsheet.
20040408 D3S2395 is localised to chr 12 at 8048872 bp, the same location
as D12S397. GATA43F06 is a chromosome 2 marker according to
Marshfield, placed at 227.6 Build 32 Mbp and 231.3 Marshfield cM.
According to most other sources, it is on chromosome 3 at 161 Mbp.
BLASTing the Marshfield primers maps them to chromosome 2.
20040407 Several compound names eg MFD424-TTTA003 renamed to TTTA003
with Mfd424 as alternate. D2S441 is an alias for D2S1779
(added). Several hundred additional aliases added (including
those used by Marshfield sets), with reordering of fields in
many records to have D name as canonical name.
20040406 Added 11p13 markers from ESE2/3 list. Note that D11S1392 and
D11S2008 are identical (nested). Then refitted all
I.cM from the current I.bp, using a smaller alpha. This, as Andy B
pointed out, is a nicer fit at the telomeres.
20040405 Removed further duplicate records, thanks to Scott.
Detected ~10 likely aliases that are not known to databases.
D1S2132, D14S539 interpolated position changed manually based on
estimate using knn algorithm excluding the Marshfield position.
Removed D20S159 (UT1307) Build 34.3 position since this places it on
chromosome 2 in the sequence map, whereas RH and Marshfield linkage
information place it on 20q13.
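For reference, a nearest-neighbour estimate of the kind mentioned for D1S2132
and D14S539 might look like the sketch below. The details are assumptions, not
a record of what was run: it predicts a physical position from a linkage
position by averaging the k reference markers closest in cM on the same
chromosome, with the doubtful source simply left out of the reference set.
All names and values are hypothetical.

```python
def knn_estimate(query_cm, reference, k=3):
    """Estimate bp for a marker as the mean bp of its k nearest neighbours in cM.

    reference: list of (cM, bp) pairs for trusted markers on one chromosome.
    """
    if k < 1 or len(reference) < k:
        raise ValueError("need at least k reference markers, with k >= 1")
    nearest = sorted(reference, key=lambda r: abs(r[0] - query_cm))[:k]
    return sum(bp for _, bp in nearest) / k


# Hypothetical reference markers: (cM, bp).
reference = [
    (10.0, 1_000_000),
    (12.0, 1_400_000),
    (15.0, 2_000_000),
    (30.0, 6_000_000),
]
estimate = knn_estimate(13.0, reference, k=2)
print(estimate)
```

A distance-weighted mean would be a natural refinement where neighbours are
unevenly spaced.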
20040401 Folded in newer Marshfield standard mapping set markers