[Slackbuilds-users] sbog ping: invalid download urls in .info files

Fri Mar 2 18:49:55 UTC 2018

On 3/2/18, rundstutzen at gmx.de <rundstutzen at gmx.de> wrote:
> i didn't even know that sbosrcarch does exist (even though i have seen
> slackware.uk before). so the functionality i programmed does already
> exist.

Not exactly: it's the archive creation/maintenance script that does all
the "fake HEAD" requests. It's looking at the headers to decide whether
the file has changed. If it thinks the file's different, it downloads
a new copy for the archive. There's no "check links, but don't download
files" mode, so it doesn't do what "sbog ping" does.
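
For illustration only, here's roughly what "looking at the headers to
decide whether the file has changed" can look like. This is a Python
sketch, not the actual perl code from sbosrcarch: the choice of headers
to compare and the names normalize() and looks_changed() are assumptions
made up for this example.

```python
# Hypothetical sketch: decide whether a remote file changed by comparing
# the headers saved from an earlier check with the headers seen now.
# The list of compared headers is an assumption, not sbosrcarch's list.
COMPARED_HEADERS = ("content-length", "last-modified", "etag")


def normalize(headers):
    """Lower-case header names and strip whitespace from the values."""
    return {name.lower(): value.strip() for name, value in headers.items()}


def looks_changed(old_headers, new_headers):
    """Return True if any compared header present on both sides differs."""
    old, new = normalize(old_headers), normalize(new_headers)
    for name in COMPARED_HEADERS:
        if name in old and name in new and old[name] != new[name]:
            return True
    return False
```

A header that's missing on either side is ignored here, so a server that
stops sending ETag doesn't by itself trigger a fresh download.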

> i can't find the sources of your ping script, though.

Here: http://urchlay.naptime.net/repos/sbostuff/tree/

...or: git clone git://urchlay.naptime.net/sbostuff.git

It's in perl, and not what I'd call beautiful code, so you might want
to put on your goggles before looking at it :)

> i hate workarounds with passion. its one of the banes of the software
> industry. what i am trying to do is actually what HEAD requests were
> made for.
> ...
> but alas - these servers are not supported by "sbog
> ping". if a server is not able to properly handle a HEAD request (which
> is not hard) then this is not the fault of sbog.

Right. In a perfect world, all web servers would comply with the spec,
and would support HEAD requests. Your approach is valid, but I took
a different approach, since I wanted sbosrcarch to be as complete an
archive as possible. The only things it doesn't have are files hidden
behind a click-through license (like Oracle's jdk download).

You might at least look through your logs and make a list of the servers
that don't do HEAD requests, and add some code that logs "this server
doesn't support HEAD requests", so the user knows it's the server's fault.
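
A possible shape for that logging, sketched in Python. The logger name,
the function name classify_head_result() and the set of status codes
are assumptions for this example. 405 (Method Not Allowed) and 501 (Not
Implemented) are the explicit "I won't do HEAD" answers; servers that
misbehave in other ways (say, a 403 or 404 for HEAD only) would need
their own handling.

```python
import logging
from urllib.parse import urlsplit

log = logging.getLogger("ping")  # hypothetical logger name

# Status codes assumed to mean "the HEAD method itself was rejected".
HEAD_REJECTED = {405, 501}


def classify_head_result(url, status):
    """Turn the status code of a HEAD response into a short verdict."""
    host = urlsplit(url).hostname
    if status in HEAD_REJECTED:
        # Tell the user it's the server's fault, not a dead link.
        log.warning("%s: this server doesn't support HEAD requests (HTTP %d)",
                    host, status)
        return "head-unsupported"
    if 200 <= status < 400:
        return "ok"
    return "broken"
```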

> if i read the output of "curl -v --head -X GET $url" correctly: the
> server will get a GET request, so the server (or proxy) will start
> sending the body (before the request is closed/cancelled), causing
> traffic overhead. correct me if i'm wrong. i don't want to do that.

You're right. How much overhead depends on how the server/proxy is
configured, but it usually seems to be 64K bytes per request. I decided
that wasn't a problem, partly because the guy who hosts the archive said
so (I wrote the script; he runs it and lets people download the files
it collects).
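
To make the "fake HEAD" idea concrete, here's a minimal Python sketch of
it: send a normal GET, read only the status line and headers, then close
the connection without reading the body. It's not the sbosrcarch code,
and the name fake_head() and its timeout default are made up for the
example. Whatever the server pushed into the socket before the close is
the overhead discussed above.

```python
import http.client
from urllib.parse import urlsplit


def fake_head(url, timeout=30):
    """Send a GET, return (status, headers), and never read the body."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        # getresponse() parses the status line and headers only; the
        # body stays unread on the socket.
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders())
    finally:
        # Closing here drops the connection instead of fetching the body.
        conn.close()
```

Redirects aren't followed in this sketch; a 3xx status comes back as-is
with its Location header.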

> there is another reason i am don't want to use GET requests: sbog uses a
> client pool, sending request concurrently to servers. this speeds
> things up *a lot*. i don't want to send several GET request to a
> server simultaneously.

Make the client pool smart enough to serialize requests to the same
server, and do them in parallel only when they're for different servers?
That seems like it'd be worth doing even with regular HEAD requests like
you use now.
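
One simple way to get that behavior, sketched in Python: group the URLs
by host, give each host's list to one worker, and let that worker go
through its list in order. Different hosts run in parallel, the same
host never sees two requests at once. The names group_by_host() and
check_all() and the max_workers default are assumptions for the example;
check is whatever function does the actual request.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def group_by_host(urls):
    """Map each hostname to the list of URLs that live on it."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).hostname].append(url)
    return groups


def check_all(urls, check, max_workers=8):
    """Run check(url) for every URL; returns a dict of url -> result."""
    def run_host(host_urls):
        # One worker handles one host, so its requests are sequential.
        return [(url, check(url)) for url in host_urls]

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Hosts are spread across the pool and run in parallel.
        for pairs in pool.map(run_host, group_by_host(urls).values()):
            results.update(pairs)
    return results
```

The downside is that one host with a lot of URLs ties up a worker for
the whole run, so the total time is bounded by the busiest host.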

sbosrcarch doesn't do anything in parallel; it's one request after
another. Normally it's run non-interactively (via a cron job), so it
doesn't matter if it takes a long time to finish.