Troubleshooting a Slow Website / Server

Hopefully this page loaded for you extra-fast and looks correct.  This post will review how I fixed a slow website using traceroute and other methods.  If the site did not load quickly, it means I’ve either given up, or just failed – both are possible.

I’ve been trying to self-host my sites to see if I can better optimize them within a reasonable budget.  I have these trial credits with vultr.com (get your own $100 trial credits following this link), so I set up a cyberpanel / ubuntu 20.04 server and I’ve been reasonably happy with it.  A few days ago, though, the admin panel on one of my sites seemed a bit slow to load; it would hang for a few seconds before appearing.  Most people probably wouldn’t notice it, but since I’ve been so focused on page speed, I noticed.

Since I’m relatively new to managing servers, I may have missed a few steps.  My configuration is also a bit complicated since I have cloudflare as well as openlitespeed doing some cache work.

Step 0 – make sure the server isn’t overloaded.  I checked the server load numbers and they were below 0.5 for the past 1, 5, and 15 minutes.  I checked the top command in console to make sure nothing looked unusual – I’m no expert here, but everything looked pretty typical.
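For anyone who would rather script this check than eyeball it, here is a minimal sketch in Python.  The function name and the rule of thumb (load below the number of CPU cores) are my own assumptions, not something from cyberpanel or vultr.  It only works on Unix-like systems such as Ubuntu.

```python
import os


def load_is_healthy(load_averages, cpu_count, threshold=1.0):
    """Return True if every load average is below threshold * cpu_count.

    The threshold of 1.0 per core is an assumed rule of thumb, not a hard limit.
    """
    limit = threshold * cpu_count
    return all(load < limit for load in load_averages)


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5, and 15 minute load averages.
    loads = os.getloadavg()
    cpus = os.cpu_count() or 1
    print("1, 5, 15 minute load:", loads)
    print("CPU cores:", cpus)
    print("Looks healthy:", load_is_healthy(loads, cpus))
```

With numbers like mine (all below 0.5), this reports the server as healthy, which matches what I saw in top.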

Step 1 – I first disabled cloudflare for the site to make sure that wasn’t part of the problem.  I then had to confirm I was really reaching the server directly and no longer going through cloudflare, which isn’t always easy to tell.  I go to gtmetrix.com and have it process my site.  I then select the waterfall tab and click on the plus next to the resource I’m interested in – usually just the home page of the site.  That shows the response headers for the website, and one of the headers should be “server”.  If the site is going through cloudflare, the server will be cloudflare.  With cyberpanel, when not going through a CDN, the server is listed as “litespeed”.
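The same header check can probably be done without gtmetrix.  Below is a rough sketch in Python using only the standard library.  The function names are made up for illustration, and the matching is based only on the two header values I described above (“cloudflare” and “litespeed”); other setups may report something different.

```python
import urllib.request


def describe_server_header(server_value):
    """Classify the value of the "Server" response header."""
    value = (server_value or "").strip().lower()
    if "cloudflare" in value:
        return "cloudflare"
    if "litespeed" in value:
        return "origin (litespeed)"
    return "unknown"


def fetch_server_header(url, timeout=10):
    """Send a HEAD request and return the "Server" header, or None if absent."""
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "header-check/0.1"},  # arbitrary example value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Server")


# Example usage (replace the placeholder domain with your own):
# print(describe_server_header(fetch_server_header("https://yourdomainname.com")))
```

If this prints “cloudflare” after you thought you disabled it, the change may not have taken effect yet.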

So, I disabled cloudflare, and I was still noticing the site would sometimes load slowly.  Intermittent problems are tough to pinpoint.

Step 2 – I disabled the litespeed cache (this site runs on WordPress).  I wanted to make sure there wasn’t anything unusual with my cache instructions, so I went into the litespeed cache plugin, where under toolbox / debug there is an option to disable litespeed.  I did that.

Still no change.

Step 3 – I started to wonder if a plugin was causing the delays, and I messed around a little with Jetpack and some other plugins, but nothing made any difference.  This step should have been last, and it likely would have involved many, many more steps if I hadn’t figured out my problem at Step 4.  Troubleshooting specific processes would involve looking at the top command in console, or htop, which gives some additional information.  If I ever have to go through that troubleshooting process, I’ll make another post.

Step 4 (which should have been Step 1, right after checking the server load) – I checked the network between me and the server.  I learned to do this years ago by going to the DOS prompt (type command in the Windows search box) and, from the command prompt, typing: tracert yourdomainname.com

It checks each “hop” between where I am and where my server is, and gives each hop 3 tries, showing how long it takes to get a response on each try.  Each line is a new hop, farther from me and closer to the server.  I’m no expert at reading traceroutes, but I generally know that if there are *’s in only 1 or 2 of the 3 tries on a hop, that is often a problem.  Normal would be no *’s, although sometimes a hop can show all 3 *’s without anything being wrong, unless it happens for all remaining hops – in that case there is typically a complete failure at some hop along the route.
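To make that rule of thumb concrete, here is a small Python sketch that reads tracert-style lines and counts the *’s on each hop.  It assumes the usual Windows tracert layout (hop number, then three tries that are either a time in ms or a *), and the sample lines are made up for illustration; they are not from my actual traceroute.

```python
def count_timeouts(line):
    """Return (hop_number, number_of_stars) for a tracert hop line, else None."""
    tokens = line.split()
    if not tokens or not tokens[0].isdigit():
        return None  # header or blank line, not a hop
    hop = int(tokens[0])
    probes = []
    i = 1
    while i < len(tokens) and len(probes) < 3:
        if tokens[i] == "*":
            probes.append(None)  # this try got no response
            i += 1
        elif i + 1 < len(tokens) and tokens[i + 1] == "ms":
            probes.append(tokens[i])  # e.g. "12" or "<1"
            i += 2
        else:
            break
    if len(probes) != 3:
        return None
    return hop, sum(1 for p in probes if p is None)


def classify_hop(stars):
    """Apply the rough rule of thumb described in the post."""
    if stars == 0:
        return "normal"
    if stars < 3:
        return "partial loss, worth a closer look"
    return "no reply, only a concern if all later hops are silent too"


# Hypothetical sample output, for illustration only.
SAMPLE = """
  1     1 ms    <1 ms     1 ms  192.168.1.1
  2     *        *        *     Request timed out.
  3    12 ms     *       14 ms  10.0.0.1
"""

if __name__ == "__main__":
    for line in SAMPLE.splitlines():
        result = count_timeouts(line)
        if result is not None:
            hop, stars = result
            print(hop, stars, classify_hop(stars))
```

This is only as good as the rule of thumb behind it, so I’d treat the labels as hints for where to look rather than a diagnosis.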

Anyway, with vultr.com there were a few hops that were all *’s, but the last hop showed only 1 asterisk the first time I did a traceroute.  Also, one of the numbers seemed much higher than I would have expected.  I repeated the traceroute a few times and concluded there could be a problem with the network path between me and the vultr server, likely close to the server.  It’s possible there was a configuration error on the server, but I knew I couldn’t troubleshoot the server itself.  I considered deploying another server in a different datacenter to see if the problem persisted, but I figured I’d contact vultr support first to see if they had any insight into the issue.  I really didn’t expect any support, as these are meant to be unmanaged instances.  If it truly was a network issue, I’d expect them to have detected it and be working on it.  And if it was a server configuration issue, I was going to struggle – I’d probably have to re-install things and see if the error persisted.

Step 5 – I contacted vultr.com support.  I sent them a copy of my tracert and they responded in about 20 minutes on a Saturday – better than I expected.  They said that they use mtr as a more detailed way to get traceroute information, and they asked me to run it from my home computer to the server (there is an open source program called winmtr) and from the server to my home using mtr in ubuntu.  I did both – and I like mtr.  It shows more information than tracert alone, checking each hop every second and giving me stats on dropped packets for each hop.  It turned out that the last hop to my server was getting about 20-25% packet loss, while every other hop was below 5%, most being between 0-2%.  I sent them this additional information.
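The packet loss number is just the share of tries that never got a response.  Here is a short Python sketch of that arithmetic, flagging any hop above a threshold.  The function names, the 5% threshold, and the sample counts are my own illustration; they are not mtr’s actual output format or my real data.

```python
def loss_percent(sent, received):
    """Return the percentage of packets that were sent but never answered."""
    if sent <= 0:
        raise ValueError("sent must be a positive number of packets")
    return 100.0 * (sent - received) / sent


def flag_lossy_hops(hops, threshold=5.0):
    """Return (name, loss %) for each hop whose loss exceeds the threshold.

    hops is a list of (name, packets_sent, packets_received) tuples.
    """
    flagged = []
    for name, sent, received in hops:
        pct = loss_percent(sent, received)
        if pct > threshold:
            flagged.append((name, pct))
    return flagged


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    example_hops = [
        ("hop 1", 100, 100),
        ("hop 2", 100, 98),
        ("last hop", 100, 77),
    ]
    for name, pct in flag_lossy_hops(example_hops):
        print(f"{name}: {pct:.1f}% loss")
```

One thing worth keeping in mind: loss that shows up on a middle hop but not on the hops after it is often just that router deprioritizing replies, so loss on the final hop (like I had) is the more meaningful signal.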

So Vultr support said it looked like a combination of issues on their end and my end (although to me it looked like almost all issues were on their end).  They said they could migrate me to a new server and see if that helps, so I said go ahead.  It didn’t make much difference – and I’m curious what they actually did.  I asked if I could get moved to another datacenter and they said that I’d have to deploy a new server to do that.  I said I’d go ahead and do that on my own and they said they would forward the issue to the engineers.

I created a snapshot of my server and, based on other reviews I’d seen in the meantime, I decided to go with their Intel based high frequency option, as the performance looked to be better (and it is – I’ll maybe do a review on that soon).  I also knew that if I redeployed on a server with an Intel chip (instead of AMD, as my initial server was), I’d have to be on a new machine rather than just a new partition on the same possibly problematic server.

Migrating everything over took some work because of the change in IP address and because I was new to the process; I covered the issues I encountered in another post.  But overall, it was pretty easy to do.

I rechecked the traceroute and there are no dropped packets on the new server, even in the same datacenter.  So the issue is possibly related just to the one machine rather than the datacenter.  Also, the Intel high frequency winds up being a noticeably faster configuration.  The same $6 option does have less bandwidth (1 TB instead of 2 TB), but there is still way more bandwidth than what my needs are at this time.  I’ll post the performance comparisons in a separate post when I have some time.
