SRV39 Downtime Details and Updates

Update (May 16, 3:05pm): We are now restoring accounts. Test restores of accounts were all successful!

To request a restoration of your site, please send a ticket to our support email address. Please also list all the sites under your account so that we can verify that they are working as well.

Please note that initially only files and MySQL data will be restored. To restore email, simply re-create the email account using the control panel and it should work fine.

Update (May 16, 3:35am): The new server is set up and we are able to access the recovered drive data. From what we can see, the recovery looks 100% complete. We are now copying the data to the new server. Next we will try to start the MySQL service, and then we should be able to start restoring the accounts (hopefully later today).

Update (May 15, 11:15pm): Sorry for the delay in processing. The drive was shipped late by the recovery agency. However, it has now been received at the datacenter and we have just got the confirmation. We will set up a new server and then run data / MySQL verification. Once that is done we will update here.

Update (May 10, 3:55am): Good news. We have received confirmation of a successful recovery from FlashBackData. The drive containing the recovered data should be back at our datacenter by Thursday night. Once we have received and verified the data (both files and MySQL data) we will start restoring the accounts, hopefully sometime by Friday.

The delay and inconvenience are highly regretted.

Update (May 4, 10:50am): We have received the initial evaluation of the disks and the team has started the recovery process. This should take another 3 to 4 working days.

Evaluation complete, your media has the following problems: Sector Damage, Partition Corruption, File System Damage, Media Corruption, Firmware Corruption, Logical Alignment Failure.

Update (May 3, 11:25am): We have not yet received an update from the recovery agency. We expect to get one sometime tomorrow.

Sorry for the delay. We are trying to have the data recovered as soon as possible, but the recovery company has its own workload as well, which is slowing the progress.

The inconvenience is highly regretted.

Update (May 2, 10:55am): Due to the Sat/Sun weekend and the 1st of May holiday, the recovery company has not yet started work on the drives. We will update this post as soon as they provide an update. Regarding the ETA: if the data is recovered quickly, we expect the sites to be live by the end of this week. If it is delayed, this may slip to next Monday / Tuesday.

Update (May 1, 11:40pm): All the accounts for which a backup is available have been restored from backup. We are waiting on the recovered data to restore the remaining accounts.

Update (May 1, 2:40pm): The transfer is complete and we are now restoring the accounts. This should finish by tonight.

Update (May 1, 11:15am): The transfer process is still in progress. Currently at around 80%.

Update (May 1, 2:50am): 25% of the accounts for which a backup is available have now been transferred. Once the transfer is complete we will start restoring the accounts.

Update (April 30, 10:45pm): The new server is now online and we are transferring the accounts to it now.

Update (April 30, 8:40pm): The new server is still going through hardware diagnostics. 3 drives have been fully tested and it is now going through the 4th and final drive. Once that is finished, we will start moving the available backups and restoring them. We expect this to be completed by tomorrow morning.

Regarding the recovery, the drives were sent to FlashBackData, but due to the Sat/Sun weekend and the 1st of May holiday the recovery may be delayed. If the data is recovered quickly, we expect the sites to be live by the end of next week. If it is delayed, this may slip to next Monday / Tuesday.

However, we expect the data to be recovered quickly because it was a RAID10 array, so redundant data is available. This increases the chances of recovery and should also make the process faster.

Thanks again for your patience.

Update (April 30, 8:50am): The new server has been set up and is ready to be used. However, we are now thoroughly testing the hardware to make sure everything is okay. Then we will start transferring the data and restoring accounts.

Update (April 30, 1:05am): The new server is now being prepared. Once fully prepared and configured we will start restoring from the available backups (about 470 accounts). Later when the drive data is recovered the remaining accounts will be restored.

Update (April 29, 9:00pm): We have now received the go-ahead to send the drives, and they will be sent to the company by tonight.

Regarding the ETA, please note that the recovery process may take between 5 and 7 days.

We will be setting up a new server by tonight and restoring the accounts for which a backup is available. This means about 50% of the sites will be online by tomorrow morning. For those clients whose backup is not available, we can create a new blank account, and once the data is recovered we will restore their data.

Update (April 29, 11:30am): The datacenter is yet to release the drives to the recovery company due to some of their internal regulations. We expect them to be released by tonight.

Update (April 29, 5:50am): Just an update regarding backups. We do maintain ‘courtesy’ backups of all the servers, but after R1Soft crashed the server and MySQL became corrupt with InnoDB issues (back in February), all subsequent backups were failing with errors. Since then we have been trying different backup methods to get a working backup, as without one we were not even able to move the accounts to another server.

The good news is that we attempted a backup of this server about two weeks ago which was partially successful, and on looking in detail we found that a backup is available for around 50% of the accounts. The remaining accounts were not backed up in that attempt, and their owners may have to wait for the recovery service to finish its job.

If you would like us to restore this backup, please open a ticket using this link.

Provided your backup is available, we should be able to restore it for you asap.

Please note again that the backups we maintain are courtesy backups, and no backup service is advertised in our package features list. This is also stated in our terms of service.

We also maintain a redundant RAID10 array of disks, which recovers from 99% of these crashes, but in this freak incident not one but two drives failed, resulting in the issue we are all facing.

Update (April 29, 5:15am): We are still waiting on the datacenter to release the drives to the recovery company. We will update this post when the drives have reached the recovery company's facility.

Update (April 29, 12:15am): We have got the final quote from the data recovery agency and are now in the process of having the disks released to them. We are waiting on the datacenter's approval.

Update (April 28, 8:25pm): Just an update: we are now in contact with 3 data recovery companies. None of them gives a 100% recovery guarantee, but they are confident that they will be able to repair the damage. They also require at least 5 days for the recovery process. We will update this post as soon as a deal is finalised. We will be asking them to reduce the recovery time so that services are back online asap.

Update (April 28, 2:25pm): We are getting a lot of queries about when the server will be online again (ETA). Please note that at the moment no company has answered yet (by call or email), as it is night-time in the US. We are waiting for their morning to proceed further. Once a deal is made with the recovery company, they will collect the drives from the datacenter in the US, take them to their own lab / offices and begin the recovery. At that point we may get a tentative ETA. Then, once the drives are received back, we hope everything is fine and the server boots up.

The inconvenience is highly regretted.

Update (April 28, 12:50am): We have contacted some other RAID recovery service providers as well. However, as it is after office hours in the US, we have to wait.

Update (April 28, 10:44am): On the advice of the datacenter we have contacted the FlashBackData (http://www.flashbackdata.com/) recovery company for a quote. We are waiting on their response.

Complete Incident Details:-

On the 27th of April at around 11pm Pak time, the server (SRV39) crashed, and our monitoring team immediately contacted the datacenter to start bringing the server back online as soon as possible. At first it looked as if the issue was with the RAM sticks, which were immediately replaced, and the server was booted again. Even after changing the RAM sticks the issue remained.

The datacenter staff then checked the server further and found that the RAID card was reporting degraded arrays. The server was using a RAID10 array (a total of 4 drives holding all the data), but on examination not ONE but TWO drives (drives 0 and 3) were not being detected by the RAID array. Each time they replaced either of the drives and issued a rebuild, the full array either no longer saw the other RAID1 array as degraded and would not rebuild, or simply showed as offline.

The team then tried the force commands (force rebuild and force online), which can result in some data loss but might have brought the server online. However, that method failed as well.

We and the datacenter staff have now been working on the issue for almost 10 hours, and I believe they have made every possible attempt to get this system back up for a rebuild, or at least into a usable state.

So our next option is a professional data recovery company, which will most likely be able to fix the issue. We are currently searching for one so that the data can be recovered.

Due to the nature of the issue we cannot yet provide a confirmed time for when the server will be online (ETA). We appreciate your patience in this matter.

We are very sorry for the inconvenience this issue has caused. We understand how important it is for you to have your sites working 24×7, but unfortunately it is very hard to recover from a hardware issue.

We are in contact with our datacenter and are asking them why, despite having redundant drives in RAID10 (4 drives rather than just one), we still ran into this issue. We are unable to recover from backups due to the complete RAID array failure.

How can I get my site back online as soon as possible:

1) You can provide us with a local backup of your account data and we can immediately restore it on another server for you. Please use this form to send the request for immediate processing: ‘Open backup restore ticket’

2) We can always create a new blank account for you (without the data) on another server. You should then be able to start receiving emails. To request one: ‘I need a brand new account on another server’

3) You can always contact us at our support email addresses and helplines if you have a specific issue that needs our attention.

4) For any other specific query about your account, please open a ticket here: ‘I have a specific query regarding my account’

5) For any other general concerns, you can always leave us a message in the comments box below.

Inodes limits and how to reduce inodes count

An inode is a data structure that stores the metadata of a file or folder. Your inode count therefore indicates the number of files and folders you have under your hosting account. Shared hosting providers generally maintain inode limits to prevent disk abuse, because a large number of files on the system causes I/O issues that slow down all the sites on the server.

Limits

The implemented inodes limits are:

Soft limit (150,000): When your inode count reaches 150K, your account will be removed from automatic weekly backups.

Hard limit (250,000): When the inode count reaches 250K, no further files can be added to the account.
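
As a rough illustration of how the two limits relate, here is a minimal Python sketch that classifies an inode count. The function and constant names are made up for this example and are not part of any control panel or tool we provide; only the two numbers come from the limits above.

```python
# Limits quoted in this article; the names below are illustrative only.
SOFT_LIMIT = 150_000  # at this count the account leaves automatic weekly backups
HARD_LIMIT = 250_000  # at this count no further files can be added


def inode_status(count: int) -> str:
    """Classify an inode count against the soft and hard limits."""
    if count >= HARD_LIMIT:
        return "hard limit reached: no further files can be added"
    if count >= SOFT_LIMIT:
        return "soft limit reached: removed from automatic weekly backups"
    return "within limits"
```

For example, `inode_status(180_000)` reports the soft limit, since 180,000 is between the two thresholds.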

If your hosting account usage is normal, you will not need to check the inode limit. However, if your account creates large numbers of files automatically (thousands of new files within a day), you may hit the inode limit and need to clean up or properly configure your account.

The good news is that you can very easily get your account under the limit by following these guidelines:

  1. Delete the unnecessary files and folders under your account
  2. Delete junk or spam emails
  3. Delete cache files created by your CMS

You can also contact us and we can tell you the exact location of the folders where most of the inodes are being used.
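
If you have shell access and Python available on your account (an assumption, as not every hosting plan includes them), a small script along these lines can give you a rough idea of which folders hold the most files. It is only a sketch: it counts every file and folder it finds, so the total may differ slightly from the inode count shown in your control panel (for example when hard links are present). The function names are invented for this example.

```python
import os


def count_inodes(root: str) -> int:
    """Count files and folders below root (root itself is not counted)."""
    total = 0
    # os.walk does not follow symbolic links to folders by default.
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def busiest_folders(root: str, top: int = 10) -> list:
    """Return (folder name, count) pairs for the fullest top-level folders."""
    counts = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                counts.append((entry.name, count_inodes(entry.path)))
    # Largest count first.
    counts.sort(key=lambda item: item[1], reverse=True)
    return counts[:top]
```

Calling `busiest_folders(os.path.expanduser("~"))` lists the ten fullest folders in your home directory, which is usually a good place to start cleaning up.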

 

Activating and Configuring WordPress Fastest Cache

If you are running a WordPress site, we recommend enabling the WordPress Fastest Cache plugin, as it creates static HTML pages from your PHP pages, resulting in much faster page response times.

The plugin can be installed from the WordPress admin dashboard by visiting Plugins -> Add New -> Search Plugins and searching for ‘WordPress Fastest Cache’.

Once it is installed and activated, simply enable ‘Cache System’ and ‘Browser Cache’ to turn on the functionality. You can enable more options as well, depending on your requirements.

 

Enabling Cloudflare for your Domain Name

Cloudflare is a free service that protects and accelerates your site by optimising content delivery and routing traffic through its global network. It also blocks hacking attempts against your site, so your visitors get faster page load times, better performance and enhanced security.

To activate Cloudflare, simply log in to your cPanel at example.com:2082 (replace example.com with your own domain name) and, under the ‘Software’ section, click on the ‘Cloudflare’ icon. Then simply follow the on-screen steps to enable the free Cloudflare service.
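
Once the service is enabled and DNS changes have taken effect, one way to check that traffic is really passing through Cloudflare is to look at the response headers of your site. The sketch below is a minimal example and rests on an assumption: that responses served through Cloudflare carry a `CF-RAY` header or `Server: cloudflare`. The function names are invented for this example, and `example.com` must be replaced with your own domain.

```python
import urllib.error
import urllib.request


def looks_like_cloudflare(headers) -> bool:
    """Return True if the headers carry the markers Cloudflare usually adds."""
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    return "cf-ray" in lowered or lowered.get("server", "") == "cloudflare"


def check_domain(domain: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request to the domain and inspect the response headers."""
    request = urllib.request.Request(f"http://{domain}/", method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return looks_like_cloudflare(response.headers)
    except urllib.error.HTTPError as err:
        # Error responses (403, 503, ...) still come with headers.
        return looks_like_cloudflare(err.headers)
```

Running `check_domain("example.com")` returns `True` when the markers are present. A `False` result shortly after activation may only mean that the DNS change has not reached you yet.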

 

How to run traceroute (tracert)

Traceroute is a command line utility that can help us diagnose your connection issues. If you are unable to open your website in a browser, and you are sure the problem is not your own internet connection, the result of the tracert command should help us find the source of the problem.

To run tracert, you will need to open the ‘Command Prompt’.

Depending on your operating system, here is how to open it:

  • For Windows Vista / Windows 7: Go to the Start menu, type cmd in the search field and then press Enter.

  • For Windows NT, 2000, and XP: From the Start menu, select Run. Type cmd then press Enter.

From the command prompt type:

tracert your-domain-name.com <Press Enter>

Please replace your-domain-name.com with the domain name on which you are facing the issue.

The command may take some time to complete. When you see that it has stopped printing new lines, take a screenshot of the command prompt window and send it to us as an email attachment.

Note: You can also right-click in the command prompt window and choose ‘Select All’ to select and copy text on the screen, and simply paste it in the email.
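
If you happen to have Python installed, the same result can be captured to a text file instead of a screenshot. This is only a sketch under a few assumptions: the function names and the `trace.txt` file name are invented for this example, and on Linux or macOS (which this article does not otherwise cover) the equivalent command is assumed to be `traceroute` and may need to be installed first.

```python
import platform
import subprocess


def trace_command(domain: str, system: str = "") -> list:
    """Build the trace command: tracert on Windows, traceroute elsewhere."""
    system = system or platform.system()
    tool = "tracert" if system == "Windows" else "traceroute"
    return [tool, domain]


def save_trace(domain: str, outfile: str = "trace.txt") -> str:
    """Run the trace and write everything it prints to a text file."""
    result = subprocess.run(trace_command(domain), capture_output=True, text=True)
    with open(outfile, "w", encoding="utf-8") as handle:
        handle.write(result.stdout + result.stderr)
    return outfile
```

Calling `save_trace("your-domain-name.com")` writes the output to `trace.txt` in the current folder, which you can then attach to your email.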