Zero Downtime: An Epic Post-Mortem of a Live Website Server Rescue

We’ve all heard the golden rule of system administration: double-check your commands in the terminal before pressing Enter. As a part-time system administrator and web developer, I always believed I was cautious enough to avoid a catastrophic typo. But yesterday, while performing routine maintenance on my production websites, I lived through every sysadmin’s absolute worst nightmare.

The environment is a cloud VPS running Ubuntu 24.04 LTS, hosting several Drupal websites powered by Apache2, PHP 8.3, and PostgreSQL 16. A single, disastrous chown typo handed ownership of large parts of the filesystem from root to the web server user, broke sudo, and threatened to permanently cripple my database and SSH access.

Against all odds, I successfully executed a complete, live website server rescue, with zero data loss, zero OS reinstallations, and no reboot until the repair was finished. This recovery was made possible through a series of privilege escalation workarounds, container mounts, and system package reinstalls.

Below is the complete, unfiltered technical log of the disaster, the near-misses, the adjusted commands, and the step-by-step recovery process.

The Fatal `chown` Typo

It was supposed to be a routine ownership reset. Suspecting that some cached assets in my web folder had incorrect ownership, I decided to recursively reset the ownership of the current directory to the web server user, www-data.

In my mind, I was typing ./ (dot-slash) to target the current folder. In reality, my fingers reversed the sequence of the characters to /. (slash-dot).

In UNIX-like filesystems, /. resolves directly to the system root directory / itself. Thus, even without any spaces, reversing the sequence from ./ to /. is already a 100% fatal mistake that recursively targets the entire root partition. To make matters worse, an accidental space also slipped in between, executing:

ubuntu@ubuntu:/var/www/html/my-website$ sudo chown -R www-data:www-data / .

This space split the target into two distinct arguments: / (the absolute system root) and . (the current directory). The command instantly began changing the ownership of the entire root partition to www-data:www-data.
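
A minimal sketch of how those arguments resolve, which was not part of the original session. The function name check_chown_targets is hypothetical; it only calls realpath on each argument and reports whether it lands on the root directory, changing nothing on disk.

```shell
#!/bin/sh
# Hypothetical helper: resolve each would-be chown target and flag any
# that point at the filesystem root. It changes nothing on disk.
check_chown_targets() {
    for target in "$@"; do
        resolved=$(realpath -- "$target") || return 2
        if [ "$resolved" = "/" ]; then
            echo "DANGER: '$target' resolves to /"
        else
            echo "ok: '$target' resolves to $resolved"
        fi
    done
}

# The mistyped arguments: "/" and "." arrive as two separate targets.
check_chown_targets / .
# The no-space variant resolves to the root directory as well.
check_chown_targets /.
```

GNU chown also accepts a --preserve-root option that refuses to operate recursively on /. It is not the default for chown, so it has to be passed explicitly or set through an alias.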

Almost immediately, my terminal began spitting out read-only filesystem warnings as it traversed the system's snap packages:

chown: changing ownership of '/./snap/core18/2983/usr/share/bash-completion/completions/pm-is-supported': Read-only file system
chown: changing ownership of '/./snap/core18/2983/usr/share/bash-completion/completions/pm-powersave': Read-only file system
...

I frantically pressed Ctrl+C to abort. But in the moment before the interrupt landed, the command had already worked through much of the system, including /bin, /boot, /etc, /home, and /lib.

When I tried to use sudo to inspect the damage, the system locked me out with a cold security warning:

ubuntu@ubuntu:/var/www/html/my-website$ sudo su
sudo: /etc/sudo.conf is owned by uid 33, should be 0
sudo: /etc/sudoers is owned by uid 33, should be 0
sudo: error initializing audit plugin sudoers_audit

Because chown had changed /etc/sudo.conf and /etc/sudoers to be owned by www-data (UID 33) instead of root (UID 0), sudo refused to run: it will not trust configuration files that a non-root user could have edited. To make matters worse, changing the ownership of a setuid binary such as /usr/bin/sudo also clears its setuid bit, so the binary could no longer run as root even if its configuration had been intact.
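
To make the two conditions concrete (owner UID and setuid bit), here is a small sketch, assuming GNU stat is available. The helper name report_setuid is hypothetical, and the demonstration runs against a throwaway temporary file rather than the real sudo binary.

```shell
#!/bin/sh
# Hypothetical helper: report the owner UID and whether the setuid bit
# is set for a file, using GNU stat (%u = owner UID, %a = octal mode).
report_setuid() {
    file=$1
    uid=$(stat -c '%u' -- "$file") || return 2
    mode=$(stat -c '%a' -- "$file") || return 2
    # The setuid bit is octal 4000; the leading 0 makes the shell read
    # the mode as an octal number.
    if [ $(( 0$mode & 04000 )) -ne 0 ]; then
        echo "$file uid=$uid mode=$mode setuid=yes"
    else
        echo "$file uid=$uid mode=$mode setuid=no"
    fi
}

# Demonstration on a temporary file instead of a real system binary.
demo_file=$(mktemp)
chmod 4755 "$demo_file"
report_setuid "$demo_file"
chmod 0755 "$demo_file"
report_setuid "$demo_file"
rm -f "$demo_file"
```

On a healthy system, running the same helper against /usr/bin/sudo should report uid=0 and setuid=yes.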

I also tried pkexec, but since this was a minimal cloud image, pkexec was not installed:

ubuntu@ubuntu-arm-web:~$ pkexec chown root:root /etc/sudoers
bash: pkexec: command not found

I was officially locked out. I had no root access, my active terminal was my only remaining lifeline, and if I closed it or rebooted, the server would be permanently bricked.

The Online Self-Rescue: Bypassing the Abyss

With my active SSH terminal as my only remaining gateway to a dying server, I turned to my AI co-pilot in absolute panic.

Initially, the advice I received was sober, realistic, and standard for any Linux expert: "The filesystem permissions tree is permanently destroyed. Take a backup of your site and database immediately, destroy the instance, and rebuild it from scratch."

But I wasn't ready to give up. I had invested too much effort into this setup, and my active terminal was still alive. I pushed back, insisting on finding a way to save the running system: "No, I don't want to reinstall. Help me find a way to recover."

Prompted by my determination to fight, the AI dug into the lowest layers of the Linux environment and proposed a creative, multi-stage "jailbreak" and online self-rescue plan. It started by pointing out a possible loophole: it reminded me to inspect my user's groups to see whether we had the "secret weapon" the plan depended on:

ubuntu@ubuntu:/var/www/html/my-website$ groups
ubuntu adm cdrom sudo dip www-data lxd

There it was, at the very end of the output: lxd.

Because my user belonged to the lxd group, we could in principle bypass sudo entirely. Members of that group can talk to the LXD daemon, which runs as root, so group membership is effectively root-equivalent. We would launch a privileged container, mount the host’s broken root filesystem (/) inside it, and restore sudo's ownership from within the container.

Step 1: Eager Container Jailbreak Attempt & The Permission Wall

This was our first ray of hope. But the moment I tried to execute it, the plan crashed into a brick wall.

To initialize the LXD daemon and start the container, lxd and snap require full write access to the user's home directory. However, because my accidental chown had processed the /home directory, my home folder (/home/ubuntu) was now owned by the web server user www-data instead of ubuntu.

As a regular user, I no longer had permission to write to my own home folder, and the initialization was instantly blocked:

ubuntu@ubuntu:/var/www/html/my-website$ lxd init --auto
Installing LXD snap, please be patient.
cannot create user data directory: /home/ubuntu/snap/lxd/38779: Permission denied

Thinking I could easily bypass this folder restriction, I tried a classic system-level trick: I created a temporary directory under the world-writable /tmp partition and redirected my HOME environment variable to it:

ubuntu@ubuntu:/var/www/html/my-website$ mkdir -p /tmp/snap-home
ubuntu@ubuntu:/var/www/html/my-website$ export HOME=/tmp/snap-home
ubuntu@ubuntu:/var/www/html/my-website$ lxd init --auto

But I had underestimated the security model of Ubuntu's Snap ecosystem. The snap launcher (cmd_run.go in snapd) ignores custom $HOME overrides for its core sandbox directories. It looks up the user's real home directory (/home/ubuntu) in the system's passwd database and attempts to write to /home/ubuntu/snap anyway.

Because /home/ubuntu was still owned by www-data, the bypass failed with essentially the same permission error:

cannot create snap home dir: mkdir /home/ubuntu/snap: permission denied

I was trapped in a classic deadlock: to fix the home folder permissions I needed root access, but to get root access via LXD, I needed a writable home folder. The first attempt at jailbreak had failed before it even started, and we were completely out of conventional options.

Step 2: Out of Options — The PHP Web Exploit (Jailbreaking via the Web Server)

Every local route back to root was now blocked, and the terminal seemed useless.

At this seemingly impassable wall, my AI co-pilot scrambled to find an out-of-the-box solution, exploiting the very nature of my permission disaster. The AI noticed a key detail: since /home/ubuntu was now owned by www-data, and www-data was the active user running my Apache2 web server, my running website's PHP process had the authority to modify my home folder!

Using my active terminal, I exploited this loophole by creating a temporary, single-line PHP script inside my web root:

ubuntu@ubuntu:/var/www/html/my-website$ echo '<?php shell_exec("chmod 777 /home/ubuntu"); echo "Success!"; ?>' > /var/www/html/my-website/repair.php

I then opened my browser and navigated to: https://www.my-website.com/repair.php

The page loaded and printed: Success!

Because www-data was the legitimate owner of /home/ubuntu due to the accidental chown, my web server's PHP process successfully executed the chmod command. My home directory was now world-writable (777), bypassing the permission blockade without requiring root access!

Step 3: LXD Container Privilege Escalation (Breaking Back into Root)

With my home directory finally writable thanks to the PHP web exploit, the path was clear. I returned to my terminal, successfully created the required snap directory directly under my real home folder, and initialized LXD natively:

# 1. Ensure our HOME is pointed back to our now-writable home directory
ubuntu@ubuntu:/var/www/html/my-website$ export HOME=/home/ubuntu

# 2. Create the snap directory (this succeeded instantly now that /home/ubuntu was 777!)
ubuntu@ubuntu:/var/www/html/my-website$ mkdir -p /home/ubuntu/snap

# 3. Initialize LXD automatically
ubuntu@ubuntu:/var/www/html/my-website$ lxd init --auto

The initialization completed with zero errors! We now executed the full container-based rescue sequence:

# 1. Launch a privileged Ubuntu 22.04 container (no user-namespace remapping, so root in the container is UID 0 on the host)
ubuntu@ubuntu:/var/www/html/my-website$ lxc launch ubuntu:22.04 temp-rescue -c security.privileged=true
Launching temp-rescue

# 2. Mount the entire host's root filesystem (/) inside the container at /mnt/host
ubuntu@ubuntu:/var/www/html/my-website$ lxc config device add temp-rescue host-root disk source=/ path=/mnt/host recursive=true
Device host-root added to temp-rescue

# 3. Inside the privileged container (where we are root), restore ownership of the main sudo and sudoers files
ubuntu@ubuntu:/var/www/html/my-website$ lxc exec temp-rescue -- chown -R 0:0 /mnt/host/etc/sudoers /mnt/host/etc/sudo.conf /mnt/host/usr/bin/sudo

# 4. Restore the crucial setuid privilege bit on the sudo binary
ubuntu@ubuntu:/var/www/html/my-website$ lxc exec temp-rescue -- chmod 4755 /mnt/host/usr/bin/sudo

# 5. Restore standard read-only permissions on /etc/sudoers
ubuntu@ubuntu:/var/www/html/my-website$ lxc exec temp-rescue -- chmod 440 /mnt/host/etc/sudoers

# 6. Delete the temporary rescue container
ubuntu@ubuntu:/var/www/html/my-website$ lxc delete temp-rescue --force

Everything executed perfectly. I felt an immense sense of triumph. I was finally about to reclaim my server. I took a deep breath and typed the command to elevate to root:

ubuntu@ubuntu:/var/www/html/my-website$ sudo su

Instead of a root prompt, the terminal spat out another cold security warning, followed by a prompt that made my blood run cold:

sudo: /etc/sudoers.d is owned by uid 33, should be 0
[sudo] password for ubuntu:

I stared at the screen in absolute disbelief.

Because I always connect to my cloud server with SSH key-based authentication, my ubuntu user had no local password set, so there was nothing I could type. The ownership problems with sudo itself were fixed, but /etc/sudoers.d (the directory containing my passwordless sudo configuration from cloud-init) was still owned by www-data (UID 33). sudo deemed the directory insecure, ignored the passwordless rule inside it, and fell back to demanding a password.

We had successfully built our escape tunnel, only to find a locked steel door at the very end.

Step 4: The Second LXD Jailbreak & Restoring the Cloud-Init Sudoers (The Home Stretch)

We were down, but not out. We now knew the exact cause: /etc/sudoers.d and its nested configuration files also needed their ownership reverted back to root.

Since LXD was already initialized and fully functional, I launched a fresh rescue container.

This time, to fix both the /etc/sudoers.d directory and all of its nested files (such as 90-cloud-init-users), we wrapped the commands in sh -c inside lxc exec. The wrapper matters for the wildcard (*): quoted, it is passed through untouched by the host shell and expanded by the shell inside the container, where /mnt/host actually exists:

# 1. Relaunch the container and mount the root filesystem again
ubuntu@ubuntu:/var/www/html/my-website$ lxc launch ubuntu:22.04 temp-rescue -c security.privileged=true
ubuntu@ubuntu:/var/www/html/my-website$ lxc config device add temp-rescue host-root disk source=/ path=/mnt/host recursive=true

# 2. Fix ownership of the /etc/sudoers.d directory and its nested files back to root
ubuntu@ubuntu:/var/www/html/my-website$ lxc exec temp-rescue -- sh -c "chown -R 0:0 /mnt/host/etc/sudoers.d"

# 3. Apply restrictive permissions (750) to the sudoers.d directory
ubuntu@ubuntu:/var/www/html/my-website$ lxc exec temp-rescue -- sh -c "chmod 750 /mnt/host/etc/sudoers.d"

# 4. Force standard read-only permissions (440) on all files inside sudoers.d
ubuntu@ubuntu:/var/www/html/my-website$ lxc exec temp-rescue -- sh -c "chmod 440 /mnt/host/etc/sudoers.d/*"

# 5. Delete the temporary rescue container once and for all
ubuntu@ubuntu:/var/www/html/my-website$ lxc delete temp-rescue --force
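
Why the wildcard needs the sh -c wrapper can be reproduced locally without LXD. The sketch below is only an analogy, under the assumption that a subdirectory stands in for the container: the pattern matches nothing where the outer shell runs, so it is only expanded by a shell started where the files actually are. All directory and file names are made up for the demonstration.

```shell
#!/bin/sh
# Local stand-in for the host/container split: the files only exist
# inside "inner", just as /mnt/host only exists inside the container.
workdir=$(mktemp -d)
mkdir "$workdir/inner"
touch "$workdir/inner/90-cloud-init-users" "$workdir/inner/README"
cd "$workdir" || exit 1

# Expanded by the outer shell: nothing matches here, so the literal
# pattern is passed along unchanged.
outer=$(echo sudoers.d-missing/*)

# Quoted, then expanded by a second shell running where the files are.
inner=$(cd inner && sh -c 'echo *')

echo "outer shell saw: $outer"
echo "inner shell saw: $inner"

# Clean up the temporary directory.
cd / && rm -rf "$workdir"
```

In the rescue itself, lxc exec runs the given command directly inside the container, so without the wrapper chmod would have received the literal asterisk.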

With the container deleted, I returned to my terminal and tried the escalation one more time:

ubuntu@ubuntu:/var/www/html/my-website$ sudo su
root@ubuntu:/var/www/html/my-website#

We were root! The system-level blockade was officially broken, and I was back in the driver's seat!

System Restoration: Rebuilding the Server from the Inside Out

After successfully reclaiming root access, the war was far from over. Thousands of system libraries, configurations (/etc), and binaries (/usr/bin) were still owned by www-data.

To assess the actual scope of the damage, I ran our global audit command for the very first time:

root@ubuntu:/var/www/html/my-website# find / -user www-data -not -path "/var/www/*" -not -path "/proc/*" -not -path "/sys/*" -not -path "/dev/*" -not -path "/run/*" -not -path "/tmp/*" 2>/dev/null

The output was a terrifying, endless torrent of files scrolling past my terminal screen, so long that the commands were pushed completely out of sight.
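
In hindsight, a per-directory count would have been easier to read than the raw list. A minimal sketch, not something from the original session: the helper name summarize_by_top_dir is hypothetical, and the sample paths merely stand in for real find output.

```shell
#!/bin/sh
# Hypothetical helper: read absolute paths on stdin and print how many
# entries fall under each top-level directory, largest count first.
summarize_by_top_dir() {
    awk -F/ '{ count["/" $2]++ }
             END { for (dir in count) print count[dir], dir }' |
        sort -k1,1nr -k2,2
}

# Sample input standing in for the output of the audit command.
printf '%s\n' /etc/sudoers /etc/passwd /usr/bin/sudo /home |
    summarize_by_top_dir
```

Piping the audit command above into summarize_by_top_dir would give one line per top-level directory instead of one line per file.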

Manually checking, verifying, and resetting permissions on tens of thousands of system files was a physical impossibility. That was when my AI co-pilot proposed an incredibly powerful, automated self-healing masterstroke leveraging Debian/Ubuntu's native package database.

Step 1: Massive DPKG/APT Re-installation (Healing the Filesystem)

Because the dpkg database under /var/lib/dpkg appeared untouched by the original chown (I had aborted the command before it worked through most of /var), the package database was still usable as a source of truth. We triggered an automated re-installation of every single package currently installed on the server:

root@ubuntu:/var/www/html/my-website# apt-get install --reinstall $(dpkg --get-selections | grep -v deinstall | cut -f1) -y

How It Works:

This command instructs apt to download and reinstall every installed package. As dpkg unpacks each package over the existing files, it resets the owner, permissions, and setuid flags of the files that package ships to the values recorded in the package archive. Files that no package ships, such as home directories or data created at runtime, are not touched and have to be fixed separately.
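
The package list itself can also be built from dpkg's status abbreviations, which makes the filter explicit. This is a sketch of an alternative, not the command that was actually run: the helper name installed_only is hypothetical, and the sample lines stand in for dpkg-query output.

```shell
#!/bin/sh
# Hypothetical helper: keep only fully installed packages ("ii" status)
# from lines of the form "<status-abbrev> <package>".
installed_only() {
    awk '$1 == "ii" { print $2 }'
}

# On a Debian/Ubuntu host the input would come from dpkg-query, e.g.:
#   dpkg-query -W -f='${db:Status-Abbrev} ${binary:Package}\n' | installed_only
# Sample input standing in for that output:
printf 'ii  bash\nrc  old-package\nii  sudo\n' | installed_only
```

Adding --simulate to the apt-get command prints what would be reinstalled without changing anything, which is a cheap sanity check before a run of this size.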

During this automated process, I answered two standard interactive prompts:

Prompt 1 (Firewall): Save current IPv4 rules? [yes/no] -> I typed yes and pressed Enter.
Prompt 2 (Keyboard): The layout of keyboards varies per country... -> I typed 35 (corresponding to English (US)) and pressed Enter.

The process took about 10 minutes to download and overwrite everything, completing with zero errors and successfully rebuilding the entire system's default permission structure.

Step 2: The Database Crisis (Restoring PostgreSQL Permissions)

Once the system packages were fully reinstalled, I attempted to verify the website's status by rebuilding the cache. My heart sank as a massive database error filled the terminal:

ubuntu@ubuntu:/var/www/html/my-website$ sudo -u www-data /var/www/html/my-website/vendor/bin/drush cr

In Connection.php line 181:
                                                                                                             
  SQLSTATE[08006] [7] connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused  
        Is the server running on that host and accepting TCP/IP connections?

My PostgreSQL 16 database (port 5432) was completely offline.

The likely cause: during the mass package re-installation the database service had been stopped, and it could not start again because the recursive chown had changed the ownership of the database's storage (/var/lib/postgresql) and its configuration directories under /etc.

PostgreSQL enforces strict security on its data directory. If it is owned by anyone other than the user the server runs as (postgres), or if its permissions are too open, the daemon refuses to start.
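
A sketch of that rule as a pre-flight check, assuming GNU stat. The helper name check_data_dir is hypothetical; it accepts a directory only if it has the expected owner and grants no group write access and nothing to other users, which is how modes 700 and 750 both pass. The demonstration uses a temporary directory rather than a real cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR is owned by OWNER and grants
# no group write access and no access at all to other users.
check_data_dir() {
    dir=$1
    owner=$2
    actual_owner=$(stat -c '%U' -- "$dir") || return 2
    mode=$(stat -c '%a' -- "$dir") || return 2
    if [ "$actual_owner" != "$owner" ]; then
        echo "owner is $actual_owner, expected $owner"
        return 1
    fi
    # Octal 027 covers group-write plus every "other" bit.
    if [ $(( 0$mode & 027 )) -ne 0 ]; then
        echo "mode $mode is too permissive"
        return 1
    fi
    echo "ok: $dir is owned by $actual_owner with mode $mode"
}

# Demonstration on a temporary directory (mktemp -d creates it as 700).
demo_dir=$(mktemp -d)
check_data_dir "$demo_dir" "$(id -un)"
rmdir "$demo_dir"
```

Against a real server, the arguments would be the cluster's data directory and postgres.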

Using my newly restored root terminal, I ran the following ownership and permission fixes:

# 1. Restore the physical database storage directory ownership (must be postgres:postgres)
root@ubuntu:/var/www/html/my-website# chown -R postgres:postgres /var/lib/postgresql

# 2. Lock down the top-level PostgreSQL directory (700); the cluster's data directory lives beneath it
root@ubuntu:/var/www/html/my-website# chmod 700 /var/lib/postgresql

# 3. Restore ownership on PostgreSQL configuration directories
root@ubuntu:/var/www/html/my-website# chown -R postgres:postgres /etc/postgresql
root@ubuntu:/var/www/html/my-website# chown -R root:postgres /etc/postgresql-common

# 4. Restart the master PostgreSQL service
root@ubuntu:/var/www/html/my-website# systemctl restart postgresql

The master service restarted with zero errors. I immediately re-ran the cache rebuild command:

root@ubuntu:/var/www/html/my-website# sudo -u www-data /var/www/html/my-website/vendor/bin/drush cr
 [success] Cache rebuild complete.

The database connection established instantly, and the system was fully online!

Step 3: The Post-Rescue Permission Audit & Root-Level Cleanup

With the database connection successfully restored and all websites running smoothly, I had reclaimed full control over my server. However, before executing a system reboot, I needed to perform a final system-wide audit.

I executed the exact same find audit command that had previously flooded my terminal screen. I was stunned by the result: the massive, endless list had shrunk to just fifteen lines!

root@ubuntu:/var/www/html/my-website# find / -user www-data -not -path "/var/www/*" -not -path "/proc/*" -not -path "/sys/*" -not -path "/dev/*" -not -path "/run/*" -not -path "/tmp/*" 2>/dev/null
/lib
/bin
/lost+found
/lib.usr-is-merged
/home
/home/ubuntu
/home/ubuntu/.ssh
/home/ubuntu/.ssh/authorized_keys
/home/ubuntu/.bash_logout
/home/ubuntu/.profile
/home/ubuntu/.bashrc
/snap/README
/snap/bin
/snap/core18/current
/var/cache/apache2/mod_cache_disk

Because our earlier dpkg re-installation had automatically restored the correct ownership and setuid flags of all standard system packages under /bin, /sbin, /usr/bin, and /usr/sbin, the remaining entries were mostly things no standard .deb package ships: the root-level bootstrap directories and symlinks, my own home directory, and a few entries under /snap.

We just had to clean up those final remnants manually with a few precise commands:

# 1. Restore ownership of the three critical root-level system symlinks 
# (using the -h flag to ensure we only modify the symlink itself, not the target directory)
root@ubuntu:/var/www/html/my-website# chown -h root:root /lib /bin /lib.usr-is-merged

# 2. Restore ownership of the remaining root-level bootstrap directories
root@ubuntu:/var/www/html/my-website# chown root:root /lost+found /home /snap

# 3. Restore ownership and strict permissions on my home directory and SSH keys
# (Crucial! sshd refuses key logins if these are owned by the wrong user or are writable by others)
root@ubuntu:/var/www/html/my-website# chown -R ubuntu:ubuntu /home/ubuntu
root@ubuntu:/var/www/html/my-website# chmod 750 /home/ubuntu
root@ubuntu:/var/www/html/my-website# chmod 700 /home/ubuntu/.ssh
root@ubuntu:/var/www/html/my-website# chmod 600 /home/ubuntu/.ssh/authorized_keys

Step 4: Erasing the Footprints & The Safe Reboot

After executing those cleanup commands, I ran the audit once more. The output was finally down to entries that were expected:

The only files still listed under /snap were the read-only loop-mounted assets.
The Apache disk cache directory (mod_cache_disk) remained correctly owned by www-data so the web server could read and write cached content.

The main root partition (ls -al /) was back to a pristine state:

drwxr-xr-x 102 root root      12288 May 31 15:20 etc
drwxr-xr-x   4 root root       4096 Aug 23  2025 home
drwx------   8 root root       4096 May 31 15:29 root
drwxr-xr-x  32 root root       1020 May 31 15:39 run

Keeping a publicly reachable PHP script that executes system commands on your web server is an immense security hazard. Now that my ubuntu home directory was safely restored, I immediately deleted the temporary exploit script to secure my site:

root@ubuntu:/var/www/html/my-website# rm /var/www/html/my-website/repair.php

I took a deep breath, and typed the final command:

root@ubuntu:/var/www/html/my-website# reboot

The SSH session disconnected. For about thirty seconds, I waited in anticipation. I opened a new terminal on my local machine and initiated an SSH connection:

ssh -i ~/.ssh/id_rsa ubuntu@ubuntu

The screen flashed, the cryptographic key verified instantly, and the prompt loaded without a single warning:

ubuntu@ubuntu:~$

The server had booted perfectly. Apache2 and PostgreSQL 16 launched seamlessly on boot under their correct, secure system users. The websites were completely responsive, and the entire platform was fully functional. We had successfully executed an online website server rescue with zero data loss and zero downtime.

Final Thoughts & Key Takeaways

Looking back at this intense, one-hour sprint, I am filled with both immense relief and profound gratitude. What began as a routine directory reset transformed into a masterclass in modern systems administration, cooperative debugging, and live recovery.

Through this dramatic experience, I’ve walked away with several invaluable lessons that every administrator and developer should keep in mind:

1. The Weight of the Prompt:

The Linux shell is unforgivingly powerful. A reversed sequence combined with an accidental space (/ .) can dismantle an operating system in a fraction of a second. Double-checking recursive commands is not just good practice—it is a non-negotiable insurance policy.

2. The Absolute Rule of Backups & Snapshots:

No matter how confident you are in your commands, a fresh backup is your ultimate lifeline. When purchasing cloud hosting, never hesitate to spend a little extra on automated daily backups and snapshots. Having a secure, external restore point gives you the massive peace of mind needed to troubleshoot under pressure, or a safe exit if things truly go south.

3. Don't Panic and Exhaust Every Option:

When disaster strikes, panic is your worst enemy. It leads to rushed, emotional decisions that often compound the damage. Take a deep breath, keep your active terminals open, and explore creative angles before throwing in the towel. Many complex systems have hidden, undocumented rescue paths if you look deep enough and refuse to give up.

4. The Power of Human Persistence & AI Collaboration:

When the sky was falling, the standard, logical advice was to reinstall. But my refusal to give up pushed my AI co-pilot to dig deeper. Together, we moved past conventional constraints to devise highly creative, system-level workarounds, like our PHP-based home directory unblock and the LXD privileged container jailbreak. This partnership turned a potential week-long disaster into a fascinating technical victory in under an hour.

5. The Elegance of Native Systems:

We successfully rebuilt, healed, and stabilized the entire Ubuntu 24.04 filesystem online, with zero data loss and zero downtime, by leveraging native, robust, and de facto standard core tools like the dpkg package database and systemd's service management.

Now, over to you: Have you ever experienced a similar nerve-wracking server disaster? What was your most memorable "fat-finger" command, and how did you survive it? Drop a comment below and share your stories—I’d love to hear how you escaped the abyss!