USCMS T2 Transfers
This twiki is intended to aggregate all necessary information for the current effort of improving inter-T2 PhEDEx transfers in the context of USCMS.
The networks between the 8 sites are known to have high capacity and availability. However, there appear to be limitations in the CMS transfer tools, or in how they are configured, that need to be identified and tested. Addressing them could improve overall performance and deliver data faster between sites.
The general picture on transfers above 20 Gbps, and some of these configuration problems, are described in Samir's talk at the T2 meeting of 07/29.
So far, the showstopper has been the uplink bandwidth at most sites. Since July 2014 this has started to change.
Ideally, even 10 Gbps sites could participate, since the current settings may not be optimal for fast transfers. We could tune them until transfers saturate the 10 Gbps link, and everyone would gain experience in improving transfer rates in Debug.
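As a rough guide to what "saturating the link" implies for the number of concurrent transfers, here is a minimal sketch. The per-transfer rate (25 MB/s) is an assumed figure for illustration, not a measurement from any site, and the helper name `transfers_needed` is hypothetical. Protocol overhead and storage limits are ignored.

```python
import math


def transfers_needed(link_gbps: float, per_transfer_mbytes_s: float) -> int:
    """Concurrent transfers needed to fill a link, ignoring protocol overhead."""
    if per_transfer_mbytes_s <= 0:
        raise ValueError("per-transfer rate must be positive")
    # Convert Gbps to MB/s using decimal units (1 Gbps = 125 MB/s).
    link_mbytes_s = link_gbps * 1000 / 8
    return math.ceil(link_mbytes_s / per_transfer_mbytes_s)


if __name__ == "__main__":
    # 25 MB/s per transfer is an assumption; replace it with a rate
    # observed on your own links.
    for link in (10, 100):
        print(f"{link} Gbps link: {transfers_needed(link, 25)} concurrent transfers")
```

If the number this gives is far above the active-transfer limits configured in the download agent, the configuration rather than the network is the likely bottleneck.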
Plan for the exercise
As discussed in the meeting, we would like to use Caltech as the source site: it managed 25/29 Gbps with its setup, which makes it a good source for sites optimizing their configurations. Once everyone else has optimized their download configurations and we have observed the rates achieved to each site, we could start rotating the source site and see the maximum rate we get from each one. It is important to have multiple sink sites: even if individual sites have limitations, the others will add up to the total rate.
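The reasoning about multiple sinks can be sketched as simple arithmetic: the aggregate rate is bounded by the smaller of the source capacity and the sum of what the sinks can absorb. The sketch below uses the nominal link capacities from the participation table as upper bounds; real achieved rates will be lower, and the function name is hypothetical.

```python
def aggregate_rate_gbps(source_gbps, sink_limits_gbps):
    """Upper bound on total rate out of one source feeding several sinks."""
    return min(source_gbps, sum(sink_limits_gbps.values()))


if __name__ == "__main__":
    # Nominal uplinks only; these are ceilings, not measured rates.
    sinks = {"T2_US_MIT": 10, "T2_US_UCSD": 10, "T2_US_Florida": 100}
    print(aggregate_rate_gbps(100, sinks))
    # With only the two 10G sinks, the sinks are the limit, not the source.
    two_sinks = {"T2_US_MIT": 10, "T2_US_UCSD": 10}
    print(aggregate_rate_gbps(100, two_sinks))
```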
There are 3 major steps in this exercise; 2 of them will require coordination among sites:
- Tune the PhEDEx download configurations, so that LoadTest settings correspond better to reality.
- Observe how transfers behave at the FTS level, and note whether the Optimizer algorithm is a limiting factor or actually helps to reach the optimal number of active transfers for the bandwidth available at a given moment.
- With the logical limitations removed, sites can focus on setting their upload rates as they want, optimize their GridFTP setups, and start observing the best rates they can get out of the storage.
When we are done with these, the transfer test framework through PhEDEx will be more responsive and we will be able to run more advanced tests at higher rates, for example coordinating pushing data from many sites to one.
Participation of sites
In order to contact only the interested sites, please fill out the table below :
| Site | Connectivity | Participating | Notes |
| T2_BR_SPRACE | 10G | N/A | |
| T2_US_Caltech | 100G | | |
| T2_US_Florida | 100G | N/A | |
| T2_US_MIT | 10G | N/A | |
| T2_US_Nebraska | 10G | N/A | Upgrading to 100G soon |
| T2_US_Purdue | 100G | N/A | |
| T2_US_UCSD | 10G | N/A | |
| T2_US_Wisconsin | 10G | N/A | |
| T2_US_Vanderbilt | 10G | N/A | |
FTS Notes
Currently we have 3 official FTS servers :
- cmsfts3.fnal.gov
- fts3.cern.ch
- lcgfts3.gridpp.rl.ac.uk
There is an official recommendation, which is the most logical distribution of which server to use. For this exercise, however, people are encouraged to try other deployments, which may behave differently. For example, 208 parallel transfers were observed on CERN's FTS, but not more than 50 at FNAL (yet).
In the long run, US sites should use FNAL. But it might be worth understanding whether other FTS servers have a different optimizer behavior, and why.
PhEDEx Documentation
We will mostly be exercising the Download agent, so the most useful documentation for us is this. However, there is also this if you would like to read more.
PhEDEx configurations
One of the limitations is how much the download site's PhEDEx agent submits to FTS. Caltech was asked in the meeting how they control that. At Caltech, we have 2 agents: one for general transfers and another exclusively for US transfers. The -ignore and -accept flags do the separation. Note also that one can throttle the number of active transfers for each site as needed, and set a default for the sites not specified. The relevant part of Config.Debug is:
### AGENT LABEL=download-debug-fts PROGRAM=Toolkit/Transfer/FileDownload DEFAULT=on
-db ${PHEDEX_DBPARAM}
-nodes ${PHEDEX_NODE}
-delete ${PHEDEX_CONF}/FileDownloadDelete
-validate ${PHEDEX_CONF}/FileDownloadVerify
-ignore '%T2_US%'
-verbose
-backend FTS
-batch-files 50
-link-pending-files 200
-max-active-files 700
-link-active-files 'T1_CH_CERN_Buffer=50'
-link-active-files 'T1_DE_KIT_Buffer=10'
-link-active-files 'T1_DE_KIT_Disk=10'
-link-active-files 'T1_ES_PIC_Buffer=100'
-link-active-files 'T2_RU_RRC_KI=2'
-link-active-files 'T1_FR_CCIN2P3_Buffer=100'
-link-active-files 'T1_FR_CCIN2P3_Disk=100'
-link-active-files 'T1_IT_CNAF_Buffer=150'
-link-active-files 'T1_TW_ASGC_Buffer=100'
-link-active-files 'T1_UK_RAL_Buffer=50'
-link-active-files 'T1_US_FNAL_Buffer=100'
-link-active-files 'T2_DE_RWTH=10'
-link-active-files 'T2_IT_Pisa=20'
-default-link-active-files 100
-protocols srmv2
-mapfile ${PHEDEX_FTS_MAP}
### AGENT LABEL=download-debug-t2fts PROGRAM=Toolkit/Transfer/FileDownload DEFAULT=on
-db ${PHEDEX_DBPARAM}
-nodes ${PHEDEX_NODE}
-delete ${PHEDEX_CONF}/FileDownloadDelete
-validate ${PHEDEX_CONF}/FileDownloadVerify
-accept '%T2_US%'
-verbose
-backend FTS
-batch-files 20
-link-pending-files 300
-max-active-files 300
-protocols srmv2
-mapfile ${PHEDEX_FTS_MAP}
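To check how a given source node would be split between the two agents above, here is a minimal sketch in Python. It is not PhEDEx code. It assumes that the -accept and -ignore patterns follow SQL LIKE semantics ('%' matches any run of characters, '_' matches one character) and that an agent handles a source when it matches an accept pattern (or none is given) and matches no ignore pattern. All function and variable names are hypothetical, and only two of the per-link limits are copied in for brevity.

```python
import re


def like_match(pattern: str, name: str) -> bool:
    """Match name against an SQL LIKE-style pattern (assumed semantics)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, name) is not None


# Accept/ignore patterns copied from the two agent definitions above.
AGENTS = {
    "download-debug-fts": {"accept": [], "ignore": ["%T2_US%"]},
    "download-debug-t2fts": {"accept": ["%T2_US%"], "ignore": []},
}

# Subset of the -link-active-files settings of download-debug-fts.
LINK_ACTIVE = {"T1_CH_CERN_Buffer": 50, "T2_RU_RRC_KI": 2}
DEFAULT_LINK_ACTIVE = 100  # -default-link-active-files


def agents_for(source: str) -> list:
    """Labels of the agents that would handle transfers from source."""
    handled = []
    for label, rules in AGENTS.items():
        accepted = not rules["accept"] or any(
            like_match(p, source) for p in rules["accept"]
        )
        ignored = any(like_match(p, source) for p in rules["ignore"])
        if accepted and not ignored:
            handled.append(label)
    return handled


def link_cap(source: str) -> int:
    """Active-file limit for a link in download-debug-fts, with the default."""
    return LINK_ACTIVE.get(source, DEFAULT_LINK_ACTIVE)


if __name__ == "__main__":
    for src in ("T2_US_Nebraska", "T1_US_FNAL_Buffer", "T1_CH_CERN_Buffer"):
        print(src, agents_for(src), link_cap(src))
```

Each source should land in exactly one agent. A source that matches both or neither would indicate overlapping or missing patterns in the configuration.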
-- Main.samir - 2014-07-29