Server shut down at 01:23EDT June 22, 2021

Started by Phil Harvey, June 22, 2021, 08:19:40 AM


Phil Harvey

DreamHost shut down the ExifTool web server this morning due to high load.

I don't know yet what the source of the problem was, but I have taken some steps to reduce the server load to avoid this happening again.  The changes include disabling the forum Search function and removing old versions of the ExifTool packages from the server.

I'll continue to investigate the issue, but hopefully the steps I have taken so far will help to avoid this problem in the future.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

It happened again today.  The server was offline for about 8 hours before I fixed it. :(

- Phil

Edit: To help alleviate this situation, I've blocked the following robots from indexing this site:  Neevabot DotBot PetalBot AspiegelBot MJ12Bot AhrefsBot MauiBot and SEMrushBot. (Neevabot was the most active at the time of the shutdown, at a rate of about 1 request per second.)
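
For reference, here is a minimal sketch of how robots.txt rules for the bots listed above could be generated. The function name make_robots is hypothetical, and the bot names are copied as written in this post; this is not necessarily the exact robots.txt used on exiftool.org.

```shell
#!/bin/sh
# Sketch: print one "disallow everything" group per crawler named in the post.
# make_robots is a hypothetical helper; the output goes to stdout.
make_robots() {
    for bot in Neevabot DotBot PetalBot AspiegelBot MJ12Bot AhrefsBot MauiBot SEMrushBot; do
        printf 'User-agent: %s\nDisallow: /\n\n' "$bot"
    done
}

make_robots
```

Redirecting the output (for example, make_robots >> robots.txt) would append the rules to an existing file. Note that robots.txt is only a request, so it only helps with crawlers that choose to honor it.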

StarGeek

I was just about to email you when my cable went completely out, both internet and TV, for 12 hours.
Down the street, Cox Cable had about 4 trucks working on it and were blocking half the lanes at an intersection in both directions.

Searched on Neevabot and didn't find much, but there was this Reddit post which complained about Neevabot slamming their servers.  Includes graphs comparing Neevabot's requests to Google's requests.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Phil Harvey

Hmmm.  I disallowed Neevabot in robots.txt more than 6 hours ago, but we continue to be slammed by this bot. :(

I've sent them an email asking them to stop indexing this site.

- Phil

Phil Harvey

Ha!  I've blocked Neevabot using the .htaccess file.  It continues to hammer the site, but is now getting "denied" responses.  The downside to this is that it won't be able to read my robots.txt file if it ever decides to obey it.  I wonder how long their bot will continue to plague the site before it gets the message.

- Phil

(DreamHost support has been less than helpful, with their only advice being to upgrade to a more expensive plan. :( )
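
The post above doesn't say how the .htaccess block was written, so the following is only a sketch of one common approach: matching the crawler's User-Agent string. It assumes Apache 2.2-style access directives (the same family as the "deny from" lines used elsewhere in this thread); the function name ua_block and the variable name bad_bot are hypothetical.

```shell
#!/bin/sh
# Sketch: print an .htaccess fragment that refuses requests whose User-Agent
# contains the given name (case-insensitive match).
# ua_block and bad_bot are hypothetical names chosen for this example.
ua_block() {
    printf '# block crawler by User-Agent\n'
    printf 'SetEnvIfNoCase User-Agent "%s" bad_bot\n' "$1"
    printf 'Order Allow,Deny\n'
    printf 'Allow from all\n'
    printf 'Deny from env=bad_bot\n'
}

ua_block Neevabot
```

Appending the output to the site's .htaccess would make matching requests receive a "denied" (403) response. As noted above, a rule like this also denies the bot access to robots.txt.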

Phil Harvey

Update: Neevabot responded to my email saying they have disabled crawling of exiftool.org, but their robot continues to hammer the site.

- Phil

Edit: It finally stopped at 3:38 PM EDT, about 30 minutes after they said it had been disabled.

Phil Harvey

Just for my reference, here is a cronjob I am now running to automatically take care of this problem so we don't suffer long downtimes like we did before.

The script executes every 10 minutes.  If it detects that the server has been disabled, it blocks the most active IP from the last 2000 accesses to the web site, then waits another 10 minutes and re-enables the web site.

So if this happens again, the web site should be back up within 10 to 20 minutes.

#!/bin/sh
# check to see if dreamhost webserver is up, and fix it if not
# (run via cron job in my dreamhost account)
mv -f exiftool.org_DISABLED* exiftool.org_checkup >/dev/null 2>&1
if [ $? -eq 0 ]; then
    echo checkup_one `date` >> notes/checkup.log
    # update .htaccess to deny worst offender from last 2000 accesses
    ip=`tail -2000 logs/exiftool.org/https/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}'`
    echo "" >> exiftool.org/.htaccess
    echo "# added automatically" `date` >> exiftool.org/.htaccess
    echo "deny from $ip" >> exiftool.org/.htaccess
    echo "denied $ip" >> notes/checkup.log
else
    mv -f exiftool.org_checkup exiftool.org >/dev/null 2>&1
    if [ $? -eq 0 ]; then
        echo checkup_two `date` >> notes/checkup.log
    fi
fi


- Phil

Edit: It happened again (read here), and the above script didn't work because DreamHost used a different format for the disabled directory name.  So here is the new script that should handle this:

#!/bin/sh
# check to see if dreamhost webserver is up, and fix it if not
# (run via cron job in my dreamhost account)
up=`ls exiftool.org/rss.xml 2>/dev/null`
# (use "=" rather than "==" since "==" is not portable in /bin/sh)
if [ "$up" = "" ]; then
    date=`date +%Y%m%d.%H%M%S`
    # second pass: a checkup directory was prepared on the previous run, so restore it
    checked=`ls exiftool.org_checkup/rss.xml 2>/dev/null`
    if [ "$checked" != "" ]; then
        mv exiftool.org exiftool.org.$date >/dev/null 2>&1
        if [ $? -eq 0 ]; then
            echo "$date Saved exiftool.org.$date" >> notes/checkup.log
        fi
        mv exiftool.org_checkup exiftool.org >/dev/null 2>&1
        if [ $? -eq 0 ]; then
            echo "$date Restored exiftool.org" >> notes/checkup.log
        fi
        exit 1
    fi
    echo "$date exiftool.org is down"
    # first pass: find the disabled directory (whatever its name) by looking for rss.xml
    disabled=`ls */rss.xml 2>/dev/null`
    count=`echo "$disabled" | wc -l`
    if [ "$disabled" != "" ] && [ $count -eq 1 ]; then
        dir=`echo "$disabled" | sed 's/rss.xml//'`
        mv exiftool.org_checkup exiftool.org_checkup.$date >/dev/null 2>&1
        mv "$dir" exiftool.org_checkup
        echo "$date Prepared $dir" >> notes/checkup.log

        # update .htaccess to deny worst offender from last 2000 accesses
        ip=`tail -2000 logs/exiftool.org/https/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}'`
        echo "" >> exiftool.org_checkup/.htaccess
        echo "# added automatically $date" >> exiftool.org_checkup/.htaccess
        echo "deny from $ip" >> exiftool.org_checkup/.htaccess
        echo "$date Denied $ip" >> notes/checkup.log
    fi
fi
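
The "worst offender" pipeline used in both scripts can be tried on its own. Here is a small sketch that wraps it in a function and runs it on a made-up log; the function name top_ip, the sample IP addresses (from the documentation ranges), and the log lines are all invented for illustration, and it assumes the client IP is the first field of each log line, as the scripts do.

```shell
#!/bin/sh
# Print the most frequent first field (client IP) among the last 2000 lines
# of the log file given as $1. top_ip is a hypothetical helper name.
top_ip() {
    tail -n 2000 "$1" | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}'
}

# made-up sample log: 192.0.2.10 appears three times, the others once each
log=`mktemp`
cat > "$log" <<'EOF'
192.0.2.10 - - [22/Jun/2021:01:00:00 -0400] "GET /forum/ HTTP/1.1" 200 512
198.51.100.7 - - [22/Jun/2021:01:00:01 -0400] "GET /index.html HTTP/1.1" 200 1024
192.0.2.10 - - [22/Jun/2021:01:00:02 -0400] "GET /forum/ HTTP/1.1" 200 512
203.0.113.5 - - [22/Jun/2021:01:00:03 -0400] "GET /rss.xml HTTP/1.1" 200 256
192.0.2.10 - - [22/Jun/2021:01:00:04 -0400] "GET /forum/ HTTP/1.1" 200 512
EOF

top_ip "$log"    # prints 192.0.2.10
rm -f "$log"
```

One design consequence worth keeping in mind: the pipeline always picks some IP, even when traffic is spread evenly, so the denied address is only the busiest of the last 2000 requests and not necessarily the cause of the load.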