[climateprediction.net] Linux work *perhaps* coming up in quantity

Post by crashtech » Fri Jan 13, 2023 1:25 pm

Some of us are exasperated, but I think the problem was simply exacerbated. :P

Post by StefanR5R » Tue Jan 17, 2023 7:55 am

On January 16 Glenn Carver wrote:
Update from CPDN meeting 16/1/23

Upload http connections will be increased over the next 2 days back to the previous maximum of 300 simultaneous connections as most of the previous uploaded data has now been moved off.

The 'grace period' for return of tasks has been changed from 'nothing' to 30 days. CPDN hope this and the increase in httpd connections will be enough for those remaining tasks to upload in time.
(source)

I for one no longer have trouble connecting to the upload server. But the old problem of uploads frequently getting stuck at a random point during the transfer still exists. However, I have dialed in the number of concurrently running tasks and some other parameters such that the backlog of stuck transfers isn't growing in the long term.

- - - - Update - - - -
On January 17 David Wallom wrote:
Hello Everyone,

We increased the number of concurrent uploads allowed to 150 from 50 and the server ended up indeed running out of space. This is with 5 parallel transfers and deletions of successful WU from jasmin-upload to the analysis space. We have temp restricted back to 100 and are seeing free space increasing, 1.5TB out of 24TB. Of the OpenIFS@Home batches, each has up to 800GB of successful workunits we are transferring off and there are 44 batches.

Thanks for your contributions

David

100 connections is certainly better than 50, but not as good as 300… %-)
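
To put the quoted numbers in perspective, here is a quick back-of-the-envelope check in Python. It uses only the figures from the post above (44 batches, up to 800 GB each, a 24 TB upload disk with 1.5 TB free). The decimal unit conversion (1 TB = 1000 GB) and the variable names are assumptions of this sketch, and since 800 GB is an upper bound per batch, the total is a worst case, not a measurement.

```python
# Back-of-the-envelope check of the storage figures quoted above.
# Assumptions: decimal units (1 TB = 1000 GB); "up to 800 GB" is a worst case per batch.
GB_PER_TB = 1000

batches = 44             # OpenIFS@Home batches mentioned in the post
max_gb_per_batch = 800   # upper bound on successful workunits per batch
disk_tb = 24.0           # size of the upload server disk
free_tb = 1.5            # free space reported after throttling to 100 connections

worst_case_total_tb = batches * max_gb_per_batch / GB_PER_TB
used_fraction = (disk_tb - free_tb) / disk_tb

print(f"Worst-case results across all batches: {worst_case_total_tb:.1f} TB")
print(f"Upload disk: {disk_tb:.1f} TB, currently {used_fraction:.0%} full")
print(f"Worst case exceeds the disk by {worst_case_total_tb - disk_tb:.1f} TB")
```

If every batch really approached 800 GB, the results could not all sit on the upload server at once, so how fast data is moved off matters at least as much as the connection limit.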


On January 17 Glenn Carver wrote:
Upcoming work...

There are two other OpenIFS projects with work about ready to go. One will be the OpenIFS 'BL: baroclinic lifecycle' app. This looks at idealized storms in a changing climate. The model runs are much shorter than those of the OpenIFS PS app. The other project uses the standard OpenIFS model for some atmospheric perturbation studies. Neither of these will involve as many batches as the current PS app.

Release of these workunits is pending testing of some code changes I'm making following feedback & study of the issues arising from the OpenIFS PS app batches.

In short, there'll be no shortage of OpenIFS work for some time.

And just a reminder: please do not over-provision memory for these OpenIFS tasks. If you have a low-memory machine (virtual or real) with 8 GB or less, only allow 1 task at a time, or better, use an app_config.xml file to control how many OpenIFS tasks are started simultaneously. The BOINC client does not understand the memory needs of these tasks well enough and can start too many at once, crashing the tasks.
(source)

Post by crashtech » Tue Jan 17, 2023 11:19 am

I have made an app_config.xml like so:

<app_config>
  <app>
    <name>oifs_43r3_ps</name>
    <max_concurrent>5</max_concurrent>
  </app>
</app_config>

I'm hoping this will keep me from accidentally running too many of these at once, though as yet I can't tell whether it's effective, since CPDN gave me one task and then asked for an hour to think about sending more.

Edit: It seems to work for me, so other CPU projects can run in the same client instance with less danger of overconsuming resources.
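
For picking the max_concurrent value, a rough Python sketch might look like the following. It applies the "8 GB or less, one task at a time" rule from the reminder quoted earlier and otherwise divides usable RAM by a per-task figure. Note that per_task_gb and reserve_gb are hypothetical inputs: the thread gives no per-task memory number, so measure it on your own host. The function names are made up for this example; only the app name and the XML layout come from the post above.

```python
import math
from xml.sax.saxutils import escape


def suggest_max_concurrent(total_ram_gb, per_task_gb, reserve_gb=2.0):
    """Rough cap on simultaneous tasks.

    per_task_gb and reserve_gb are guesses supplied by the user,
    not values published by the project.
    """
    if per_task_gb <= 0:
        raise ValueError("per_task_gb must be positive")
    # Rule of thumb from the quoted reminder: 8 GB or less -> one task at a time.
    if total_ram_gb <= 8:
        return 1
    usable_gb = total_ram_gb - reserve_gb
    return max(1, math.floor(usable_gb / per_task_gb))


def render_app_config(app_name, max_concurrent):
    """Return app_config.xml text with the same structure as the post above."""
    return (
        "<app_config>\n"
        "  <app>\n"
        f"    <name>{escape(app_name)}</name>\n"
        f"    <max_concurrent>{int(max_concurrent)}</max_concurrent>\n"
        "  </app>\n"
        "</app_config>\n"
    )


if __name__ == "__main__":
    # Hypothetical host: 32 GB RAM, guessing 5 GB per task (an assumption, not a measured value).
    limit = suggest_max_concurrent(total_ram_gb=32, per_task_gb=5)
    print(render_app_config("oifs_43r3_ps", limit))
```

The generated text would go into app_config.xml in the project's folder inside the BOINC data directory, and the client needs to re-read its config files (or be restarted) before the limit takes effect.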

Post by StefanR5R » Mon Jan 23, 2023 3:11 pm

On January 19, the upload server filled its disk again. This time the reason is that the tape drive system, to which data were being streamed off the upload server, went down. To make matters worse, when the tape drive system is back up, it won't, for some unstated reason, have enough space for the rest of the currently planned science data. They are therefore looking into getting extra file space and/or reducing the data somehow. (source)

Post by crashtech » Mon Jan 23, 2023 4:19 pm

I don't have anything nice to say other than to hope they get it fixed.
