Systems Administration

1. Introduction / Purpose#

This documentation provides comprehensive guidance for managing cloud infrastructure, Linux servers, and Windows Server environments.

It is intended for IT staff and systems administrators.

2. Environment Overview#

Systems and IT infrastructure generally operate under three primary environments: Development, Testing (Staging), and Production. Each environment serves a specific role in the system lifecycle and must be managed not only for functionality, but also for cost efficiency, security, and operational stability.

In addition to environment separation, systems administration must account for resource utilization, cost optimization, and continuous monitoring of both cloud and physical IT assets.

2.1 Environment Classification#

Development Environment#

Used for development, experimentation, and initial configuration testing.
Systems in this environment are expected to change frequently.
Cost optimization is critical due to non-production usage.

Administrative Considerations:

Use smaller instance sizes (e.g., AWS t-series, Azure B-series).
Implement auto-shutdown schedules for unused resources.
Limit access to developers and system administrators only.
Minimal monitoring, focused on availability rather than performance.
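
An auto-shutdown schedule usually comes down to a small time-window check that a scheduler runs before stopping resources. The sketch below shows one way to express that check in shell. The function name should_shutdown, the 08:00 to 19:00 weekday window, and the argument convention are illustrative assumptions, not part of any cloud provider's tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether dev resources should be switched off.
# Arguments: hour (0-23) and day of week (1=Mon ... 7=Sun), as printed by
# `date +%H` and `date +%u`. The working window is an assumed default.
should_shutdown() {
  local hour dow start end
  hour=${1#0}          # strip one leading zero so "08" is not read as octal
  hour=${hour:-0}
  dow=$2
  start=${WORK_START:-8}
  end=${WORK_END:-19}
  # Weekends: always shut down.
  if [ "$dow" -ge 6 ]; then
    echo "yes"
    return 0
  fi
  # Weekdays: shut down outside the working window [start, end).
  if [ "$hour" -lt "$start" ] || [ "$hour" -ge "$end" ]; then
    echo "yes"
  else
    echo "no"
  fi
}

# Example: should_shutdown "$(date +%H)" "$(date +%u)"
```

A cron job or scheduled pipeline could call this check hourly and stop tagged development resources only when it prints "yes".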

Testing / Staging Environment#

Serves as a pre-production validation environment.
Closely mirrors production configurations.

Administrative Considerations:

Use snapshots and backups for rollback testing.
Enable monitoring and logging to simulate production behavior.
Keep configuration parity with production, but size resources to the test workload to avoid over-provisioning.
Temporary resources should be removed after testing cycles.

Production Environment#

Live environment serving end users and business operations.
Requires high availability, security, and performance.

Administrative Considerations:

Strict access control (IAM roles, RBAC, least privilege).
Continuous monitoring and alerting enabled.
High-availability and redundancy implemented.
Regular patching, backups, and disaster recovery plans enforced.

2.2 Cost Optimization Responsibilities#

Cost optimization is a core responsibility of a systems administrator, particularly in cloud-based environments. Resources must be provisioned efficiently while maintaining system reliability.

Key Cost Optimization Practices:

Right-sizing virtual machines based on actual usage metrics.
Removing unused or orphaned resources (disks, snapshots, IPs).
Using reserved or savings plans for long-running workloads.
Implementing auto-scaling and auto-shutdown policies.
Monitoring cloud billing dashboards and usage reports.
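
Right-sizing from usage metrics can be sketched as a small summary over CPU samples exported from a monitoring tool. The helper name rightsize_hint and the 20% and 80% cut-offs below are assumptions chosen for illustration, not vendor recommendations.

```shell
# Hypothetical helper: summarize CPU samples (one percentage per line on
# stdin) and print a right-sizing hint. LOW_PCT and HIGH_PCT are assumed
# example thresholds that can be overridden from the environment.
rightsize_hint() {
  awk -v low="${LOW_PCT:-20}" -v high="${HIGH_PCT:-80}" '
    BEGIN { peak = 0 }
    NF { sum += $1; n++; if ($1 + 0 > peak) peak = $1 + 0 }
    END {
      if (n == 0) { print "no-data"; exit }
      avg = sum / n
      if (peak < low)      hint = "downsize"   # never busy, even at peak
      else if (avg > high) hint = "upsize"     # busy on average
      else                 hint = "keep"
      printf "avg=%.1f peak=%.1f hint=%s\n", avg, peak, hint
    }'
}
```

The hint is only a starting point for review; memory, disk, and network metrics should be checked before resizing anything.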

Example Tools:

AWS Cost Explorer
Azure Cost Management
CloudWatch / Azure Monitor metrics

2.3 IT Equipment and Infrastructure Monitoring#

Monitoring extends beyond cloud resources and includes servers, network devices, and endpoint equipment.

Monitored Assets Include:

Cloud servers (CPU, memory, disk, network usage)
On-prem or virtualized servers
Network devices (routers, switches, firewalls)
Storage systems
End-user IT equipment (where applicable)

Monitoring Objectives:

Detect failures and performance degradation early.
Prevent downtime through proactive alerts.
Ensure hardware and systems operate within expected thresholds.
Maintain inventory visibility and lifecycle tracking.

Common Monitoring Metrics:

CPU, RAM, and disk utilization
Network latency and packet loss
Service uptime and response times
Hardware health indicators

2.4 Administrative Oversight#

A systems administrator is responsible for maintaining operational awareness across all environments.

Administrative Tasks Include:

Asset inventory management
Patch and update scheduling
Access and permission audits
Incident response and root cause analysis
Documentation of system changes and configurations

2.5 Summary#

Effective systems administration requires balancing environment separation, cost efficiency, and infrastructure monitoring. Proper planning and continuous oversight ensure systems remain secure, performant, and financially sustainable across all environments.

Note: Proper segregation of environments ensures that changes are tested safely before affecting users, reduces downtime, and improves system reliability.

3. Server Administration Overview#

This section provides an overview of Linux and Windows Server administration, including commonly used commands and administrative tasks required for daily operations, monitoring, and maintenance.

3.1 Linux Server Overview#

Linux servers are commonly used for web services, application servers, databases, and infrastructure tooling due to their stability, performance, and flexibility.

Common Linux Server Roles#

Web servers (Nginx, Apache)
Application servers
Database servers
Bastion hosts
CI/CD runners

Core Administrative Responsibilities#

User and permission management
Service management
System monitoring
Patch management
Backup and recovery
Security hardening

3.1.1 Common Linux Administrative Commands#

System Information & Monitoring#

uptime | awk '{print $1,$2,$3}'
top -b -n 1 | head -n 20
free -m | grep Mem
df -h | grep -E '^/dev/'
du -sh /var/log/* | sort -h

Process & Service Management#

ps aux | grep nginx | grep -v grep
systemctl status nginx | grep Active
systemctl restart nginx
journalctl -u nginx | tail -n 50

User & Permission Management#

grep sysadmin /etc/passwd
getent group sudo | grep sysadmin
ls -ld /var/www | awk '{print $1,$3,$4}'

Package Management#

apt list --upgradable | grep security
yum check-update | grep -v '^Loaded'

Networking#

ip a | grep inet
ss -tuln | grep LISTEN
ping -c 4 google.com | grep 'packet loss'
traceroute google.com | tail -n 5

Log Analysis#

grep sshd /var/log/auth.log | tail -n 20
journalctl | grep error | head -n 20

3.1.2 Linux Server Best Practices#

Disable root SSH login
Use SSH key-based authentication
Enable firewall (UFW or firewalld)
Schedule automated backups
Monitor logs regularly

3.2 Windows Server Overview#

Windows Server is commonly used for enterprise environments requiring centralized identity management, file services, and Microsoft-based workloads.

Common Windows Server Roles#

Active Directory Domain Controller
File and Print Server
DNS and DHCP Server
Application Server

Core Administrative Responsibilities#

User and group management
Group Policy administration
Server role management
Event log monitoring
Patch management

3.2.1 Common PowerShell Administrative Commands#

System Information#

Get-ComputerInfo | Select-Object OsName, OsVersion, CsName
Get-Process | Sort-Object CPU -Descending | Select-Object -First 10
Get-Service | Where-Object {$_.Status -eq "Running"}

User & Group Management (Active Directory)#

Get-ADUser -Filter * | Select-Object Name, Enabled
Get-ADGroupMember "Domain Admins" | Select-Object Name
Get-ADUser sysadmin | Format-List *

Service Management#

Get-Service | Where-Object {$_.Name -like "*update*"}
Get-Service wuauserv | Select-Object Status, StartType

Disk & Resource Monitoring#

Get-Volume | Where-Object {$_.DriveType -eq 'Fixed'}
Get-PSDrive | Sort-Object Used -Descending
Get-Counter '\Processor(_Total)\% Processor Time' | Select-Object -ExpandProperty CounterSamples

Event Logs#

Get-WinEvent -LogName System | Where-Object {$_.LevelDisplayName -eq "Error"} | Select-Object -First 20
Get-EventLog Security | Where-Object {$_.EventID -eq 4625} | Select-Object TimeGenerated, Message

3.2.2 Windows Server Best Practices#

Enforce least privilege via AD and GPO
Regularly review event logs
Enable Windows Defender and firewall
Apply scheduled updates and patches
Maintain regular system backups

3.3 Operational Considerations#

Both Linux and Windows servers must be monitored and maintained consistently to ensure performance, security, and cost efficiency.

Key Focus Areas:

Resource utilization tracking
Alerting and incident response
Automation using scripts
Documentation of configuration changes

4. AWS Administration#

AWS (Amazon Web Services) provides scalable cloud infrastructure for servers, storage, networking, and applications. As a Systems Administrator, responsibilities include instance management, storage, security, monitoring, cost optimization, and automation.

4.1 AWS Core Services for Sysadmins#

Service	Purpose
EC2	Virtual servers (instances) for workloads
S3	Object storage for backups, logs, assets
IAM	User, group, and role management
CloudWatch	Monitoring and alerting
VPC	Network configuration and security
RDS / DynamoDB	Managed database services

4.2 EC2 Management#

Listing running instances and filtering by state:

aws ec2 describe-instances \
  --query "Reservations[*].Instances[*].[InstanceId,State.Name,Tags]" \
  --output table | grep running

Start / Stop instances:

aws ec2 start-instances --instance-ids i-0123456789abcdef0
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

Get instance CPU utilization using CloudWatch:

aws cloudwatch get-metric-statistics \
  --metric-name CPUUtilization \
  --start-time "$(date -u +"%Y-%m-%dT%H:%M:%SZ" -d "1 hour ago")" \
  --end-time "$(date -u +"%Y-%m-%dT%H:%M:%SZ")" \
  --period 300 \
  --namespace AWS/EC2 \
  --statistics Average \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 | jq '.Datapoints[] | {Timestamp, Average}'

4.3 S3 Management#

Listing buckets and filtering:

aws s3 ls | grep backup

Copy files to S3 with progress and filtering:

aws s3 cp /var/log/ s3://company-backup/logs/ --recursive --exclude "*" --include "*.log" | tee s3-upload.log

Check bucket storage usage:

aws s3 ls s3://company-backup/logs/ --recursive | awk '{sum+=$3} END {print sum/1024/1024 " MB"}'
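
The same total can be wrapped in a reusable helper that also counts objects. This is a minimal sketch: the function name s3_listing_summary is hypothetical, and it assumes the usual four-column layout of `aws s3 ls --recursive` output (date, time, size in bytes, key).

```shell
# Hypothetical helper: total the size column of `aws s3 ls --recursive`
# output read from stdin and report the object count alongside it.
s3_listing_summary() {
  awk '
    NF >= 4 { bytes += $3; objects++ }   # skip blank or malformed lines
    END { printf "%d objects, %.2f MB\n", objects, bytes / 1024 / 1024 }'
}

# Example (requires AWS credentials):
#   aws s3 ls s3://company-backup/logs/ --recursive | s3_listing_summary
```

Because the helper only reads stdin, it can be checked against a saved listing before it is pointed at a live bucket.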

4.4 IAM (Identity and Access Management)#

List all users and their attached policies:

aws iam list-users | jq -r '.Users[].UserName' | while read -r user; do
  echo "User: $user"
  aws iam list-attached-user-policies --user-name "$user" | jq -r '.AttachedPolicies[].PolicyName'
done

Check last login and inactivity:

aws iam list-users | jq -r '.Users[] | select(.PasswordLastUsed != null) | [.UserName, .PasswordLastUsed] | @tsv'
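
To turn the last-used timestamps into an inactivity report, the dates need to be compared against a cut-off. The sketch below is one way to do that. The function names days_between and flag_inactive, the 90-day default, and the reliance on GNU date are all assumptions.

```shell
# Hypothetical helper: whole days between two ISO 8601 timestamps.
# Relies on GNU date (`-d`), which is an assumption about the host.
days_between() {
  local from to
  from=$(date -u -d "$1" +%s) || return 1
  to=$(date -u -d "$2" +%s) || return 1
  echo $(( (to - from) / 86400 ))
}

# Print users idle for more than MAX_IDLE_DAYS (assumed default: 90).
# Reads "username<TAB>last-used timestamp" lines on stdin; the first
# argument is the reference time ("now").
flag_inactive() {
  local now user last
  now=$1
  while IFS="$(printf '\t')" read -r user last; do
    [ -n "$last" ] || continue            # never-used passwords are skipped
    if [ "$(days_between "$last" "$now")" -gt "${MAX_IDLE_DAYS:-90}" ]; then
      echo "$user"
    fi
  done
}

# Example (requires AWS credentials and jq):
#   aws iam list-users \
#     | jq -r '.Users[] | [.UserName, (.PasswordLastUsed // "")] | @tsv' \
#     | flag_inactive "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```

Note that PasswordLastUsed only covers console sign-ins; access key usage would need a separate check.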

4.5 Monitoring & Alerts (CloudWatch)#

Get top 5 instances by CPU utilization:
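
One possible way to build this ranking, sketched under assumptions: collect an average per instance with the same get-metric-statistics call used in section 4.2, then sort the results. The function names cpu_averages and top_n_by_value are hypothetical, the one-hour window is an arbitrary choice, and the collection step needs the AWS CLI and GNU date.

```shell
# Hypothetical helper: print "instance-id average-cpu" for each running
# instance over the last hour. Requires AWS credentials.
cpu_averages() {
  local start end id avg
  start=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
    --query "Reservations[].Instances[].InstanceId" --output text |
  tr '\t' '\n' |
  while read -r id; do
    [ -n "$id" ] || continue
    avg=$(aws cloudwatch get-metric-statistics \
      --namespace AWS/EC2 --metric-name CPUUtilization \
      --dimensions "Name=InstanceId,Value=$id" \
      --start-time "$start" --end-time "$end" \
      --period 3600 --statistics Average \
      --query "Datapoints[0].Average" --output text)
    # The CLI prints "None" when no datapoint exists for the window.
    if [ "$avg" != "None" ]; then
      echo "$id $avg"
    fi
  done
}

# Hypothetical helper: keep the N rows with the highest second column.
top_n_by_value() {
  LC_ALL=C sort -k2,2nr | head -n "${1:-5}"
}

# Example: cpu_averages | top_n_by_value 5
```

Splitting collection from ranking keeps the sorting step usable on saved metric exports as well.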

Set CloudWatch alarm example:

aws cloudwatch put-metric-alarm \
  --alarm-name HighCPUUtilization \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:NotifyMe

4.6 Cost Optimization Practices#

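List running single-core instances as right-sizing candidates (core count alone does not show idleness; confirm with CloudWatch CPU metrics):

aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" \
  | jq -r '.Reservations[].Instances[] | select(.CpuOptions.CoreCount==1) | .InstanceId'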

List unattached EBS volumes:

aws ec2 describe-volumes --filters Name=status,Values=available | jq -r '.Volumes[].VolumeId'

Estimate total S3 storage usage across all buckets:

aws s3 ls | awk '{print $3}' | xargs -I {} aws s3 ls s3://{} --recursive | awk '{sum+=$3} END {print sum/1024/1024 " MB"}'

4.7 Operational Notes#

Always tag resources properly (Environment, Owner, Project) for monitoring and cost tracking.
Use automation scripts for repetitive tasks like backups, snapshots, and scaling.
Regularly review CloudWatch metrics, billing dashboards, and IAM audit logs.
Combine aws-cli with jq, grep, awk for filtering and reporting.

5. Microsoft Azure Administration#

Azure provides cloud infrastructure for compute, storage, networking, and identity services. As a Systems Administrator, responsibilities include VM and resource management, identity & access control, monitoring, automation, and cost optimization.

5.1 Core Azure Services for Sysadmins#

Service	Purpose
Azure Virtual Machines (VMs)	Virtual servers for workloads
Azure Storage	Blob storage, file shares, and backups
Azure Active Directory (AAD)	Identity and access management
Azure Monitor	Metrics, logs, and alerting
Resource Groups	Logical organization of resources
Virtual Networks (VNet)	Network isolation and configuration

5.2 Azure CLI Basics#

The Azure CLI (az) is a powerful tool to manage resources, filter output, and automate tasks. You can combine commands with pipes for real operational tasks.

5.2.1 Virtual Machine Management#

List all running VMs in a subscription:

az vm list --show-details --query "[?powerState=='VM running'].[name,resourceGroup,location]" -o table | grep -i "production"

Start / Stop a VM:

az vm start --name prod-web-01 --resource-group ProdRG
az vm deallocate --name dev-db-01 --resource-group DevRG

Check CPU and memory metrics for a VM:

az monitor metrics list \
  --resource "/subscriptions/<sub-id>/resourceGroups/ProdRG/providers/Microsoft.Compute/virtualMachines/prod-web-01" \
  --metric "Percentage CPU" "Available Memory Bytes" \
  --interval PT5M | jq '.value[] | {name:.name.value, average:.timeseries[0].data[-1].average}'

5.2.2 Storage Management#

List all storage accounts and filter by name:

az storage account list -o table | grep backup

Check blob storage usage:

az storage blob list --container-name logs --account-name backupstorage | jq -r '.[].properties.contentLength' | awk '{sum+=$1} END {print sum/1024/1024 " MB"}'

Upload files to a blob container with logging:

az storage blob upload-batch -d logs --account-name backupstorage -s /var/log/ --pattern "*.log" | tee azure-upload.log

5.2.3 Azure Active Directory & Identity Management#

List all users and filter disabled accounts:

az ad user list --filter "accountEnabled eq false" --query "[].[displayName,userPrincipalName]" -o table

Check group memberships for a user:

az ad user get-member-groups --id sysadmin@company.com | jq '.[]'

Assign roles to a user (least privilege):

az role assignment create --assignee sysadmin@company.com --role "Reader" --scope /subscriptions/<sub-id>/resourceGroups/DevRG

5.2.4 Monitoring & Alerts#

View metrics for a resource:

az monitor metrics list \
  --resource "/subscriptions/<sub-id>/resourceGroups/ProdRG/providers/Microsoft.Compute/virtualMachines/prod-web-01" \
  --metric "Percentage CPU" --interval PT5M | jq '.value[0].timeseries[0].data[-1]'

List alerts triggered in the last 24 hours:

az monitor alert list --query "[?properties.status=='Fired'].[name,properties.condition]" -o table

Set an alert for high CPU usage:

az monitor metrics alert create \
  --name HighCPUAlert \
  --resource-group ProdRG \
  --scopes "/subscriptions/<sub-id>/resourceGroups/ProdRG/providers/Microsoft.Compute/virtualMachines/prod-web-01" \
  --condition "avg Percentage CPU > 80" \
  --description "Alert for CPU utilization above 80%" \
  --action "/subscriptions/<sub-id>/resourceGroups/ProdRG/providers/microsoft.insights/actionGroups/NotifyOps"

5.2.5 Cost Optimization Practices#

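List running VMs of a given size (here Standard_B1s) as a starting point for a right-sizing review; VM size alone does not show utilization:

az vm list -d --query "[?powerState=='VM running'].[name, powerState, hardwareProfile.vmSize]" -o table | grep -i "Standard_B1s"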

List unattached disks (candidates for removal):

az disk list --query "[?managedBy==null].[name, resourceGroup, diskSizeGb]" -o table

Review blob storage usage (in MB) per storage account:

az storage account list --query "[].[name, resourceGroup]" -o tsv | while read -r name group; do
  az storage blob list --account-name "$name" --container-name logs |
    jq '[.[] | .properties.contentLength] | (add // 0) / 1024 / 1024'
done

5.3 Operational Notes#

Tag resources (Environment, Owner, Project) for billing and monitoring clarity.
Automate repetitive tasks (start/stop VMs, cleanup unused resources, backups).
Regularly check metrics, logs, and alerts via Azure Monitor.
Use Azure CLI with jq, grep, and awk for reporting and automation.

6. Linux Server Administration#

Linux servers are widely used in enterprise environments for web servers, databases, application servers, and infrastructure services. Effective Linux administration involves user management, service control, package management, monitoring, security hardening, backups, and automation.

6.1 Linux Server Roles#

Role	Purpose
Web Server	Host websites/applications (Nginx, Apache)
Database Server	MySQL, PostgreSQL, MongoDB, etc.
Bastion Host	Secure access point for network administration
File Server / Storage	Centralized data repository
CI/CD Runner / Build Server	Automated build & deployment pipelines

6.2 Core Administrative Tasks#

User and Group Management
Service Management
Package Updates and Maintenance
Filesystem and Storage Management
Network Configuration
System Monitoring and Logging
Security Hardening
Backup and Recovery
Automation with Scripts and Cron Jobs

6.3 Linux Command Examples#

6.3.1 System Information & Monitoring#

uptime | awk '{print "Uptime: "$3,$4,$5}'
top -b -n 1 | head -n 20
free -m | grep Mem
df -h | grep '^/dev/'
du -sh /var/log/* | sort -h

6.3.2 Process & Service Management#

ps aux | grep nginx | grep -v grep
systemctl status nginx | grep Active
systemctl restart nginx
journalctl -u nginx | tail -n 50

6.3.3 User & Permission Management#

grep sysadmin /etc/passwd
getent group sudo | grep sysadmin
ls -ld /var/www | awk '{print $1,$3,$4}'
chmod 750 /var/www
chown www-data:www-data /var/www

6.3.4 Package Management#

# Ubuntu/Debian
apt list --upgradable | grep security
apt update && apt upgrade -y
# RHEL/CentOS
yum check-update | grep -v '^Loaded'
yum update -y

6.3.5 Networking#

ip a | grep inet
ss -tuln | grep LISTEN
ping -c 4 google.com | grep 'packet loss'
traceroute google.com | tail -n 5

6.3.6 Log Analysis#

grep sshd /var/log/auth.log | tail -n 20
journalctl | grep error | head -n 20

6.4 Monitoring & Automation#

Monitoring Examples#

CPU/Memory/Disk:

top -b -n1 | head -n5
free -m | awk 'NR==2{printf "Memory Usage: %.2f%%\n", $3*100/$2 }'
df -h | awk '$5+0 > 80 {print $0}'

Log monitoring with tail and grep:

tail -f /var/log/syslog | grep error

Alerts via scripts:

df -h | awk '$5+0 > 90 {print "Disk Full: "$6}' | mail -s "Disk Alert" admin@company.com
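
A pipeline like the one above may send a message even when no filesystem is over the limit, depending on how the local mail command treats an empty body. A minimal sketch that separates the check from the notification follows. The function name disks_over is hypothetical, and it assumes `df -P` output (POSIX layout, usage percentage in the fifth column).

```shell
# Hypothetical helper: list mount points whose usage exceeds a limit.
# Reads `df -P` output on stdin: column 5 is the usage percentage and
# column 6 is the mount point. The limit defaults to 90.
disks_over() {
  awk -v limit="${1:-90}" '
    NR > 1 && $5 + 0 > limit { printf "%s %s\n", $6, $5 }'
}

# Send mail only when something is over the limit (assumes a working `mail`):
#   report=$(df -P | disks_over 90)
#   if [ -n "$report" ]; then
#     echo "$report" | mail -s "Disk Alert" admin@company.com
#   fi
```

Keeping the parsing in its own function makes the threshold logic testable with sample df output, without touching a real mail system.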

Automation with Cron Jobs#

# Daily backup at 2 AM
0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1
# Weekly system update
0 3 * * 0 apt update && apt upgrade -y | tee /var/log/weekly-upgrade.log

6.5 Security Best Practices#

Disable root SSH login: PermitRootLogin no in /etc/ssh/sshd_config
Use SSH key authentication
Enable and configure firewall (UFW or iptables)
Install and maintain antivirus or intrusion detection (e.g., ClamAV, fail2ban)
Regularly audit users, groups, and permissions

6.6 Backup & Recovery#

Use rsync or tar for file backups:

rsync -avz /var/www/ /backup/www/ | tee /var/log/backup.log
tar -czvf /backup/etc_backup_$(date +%F).tar.gz /etc

Automate snapshots for cloud-hosted Linux VMs (AWS/Azure CLI).

6.7 Operational Notes#

Regularly check system metrics and logs.
Automate repetitive tasks wherever possible.
Maintain documentation for changes and configurations.
Combine pipes, awk, grep, jq for monitoring, reporting, and automation.
Follow security and backup best practices consistently.

7. Windows Server Administration#

Windows Server is widely used in enterprise environments for identity management, file services, application hosting, and network infrastructure. Effective Windows administration involves user/group management, service control, monitoring, patching, security, and automation.

7.1 Windows Server Roles#

Role	Purpose
Active Directory Domain Controller (AD DC)	Centralized authentication and authorization
File and Print Server	Centralized file sharing and printing
DNS / DHCP Server	Name resolution and IP address management
Application Server / IIS	Host web applications
Backup & Storage Server	Centralized backup and storage services

7.2 Core Administrative Tasks#

User and group management (Active Directory)
Service monitoring and management
Patch management and updates
Disk and storage management
Event log monitoring
Security and access control
Automation with PowerShell scripts
Backup and disaster recovery

7.3 PowerShell Command Examples (With Pipelines)#

7.3.1 System Information & Monitoring#

Get-ComputerInfo | Select-Object CsName, OsName, OsVersion
Get-Process | Sort-Object CPU -Descending | Select-Object -First 10
Get-Service | Where-Object {$_.Status -eq "Running"} | Sort-Object DisplayName

7.3.2 User & Group Management (Active Directory)#

# List all users
Get-ADUser -Filter * | Select-Object Name, Enabled | Sort-Object Name
# List disabled accounts (LastLogonDate must be requested explicitly)
Get-ADUser -Filter {Enabled -eq $false} -Properties LastLogonDate | Select-Object Name, LastLogonDate
# Check group memberships for a user
Get-ADUser sysadmin | Get-ADPrincipalGroupMembership | Select-Object Name
# Add user to a group
Add-ADGroupMember -Identity "Domain Admins" -Members sysadmin

7.3.3 Service Management#

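Get-Service | Where-Object {$_.Name -like "*update*"} | Select-Object Name, Status
# -PassThru makes Restart-Service emit the service object so Tee-Object has something to log
Restart-Service -Name wuauserv -PassThru | Tee-Object -FilePath C:\Logs\service_restart.log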

7.3.4 Disk & Resource Monitoring#

Get-Volume | Where-Object {$_.DriveType -eq 'Fixed'} | Select-Object DriveLetter, SizeRemaining
Get-PSDrive | Sort-Object Used -Descending
Get-Counter '\Processor(_Total)\% Processor Time' | Select-Object -ExpandProperty CounterSamples

7.3.5 Event Log Analysis#

# Recent system errors
Get-WinEvent -LogName System |
    Where-Object {$_.LevelDisplayName -eq "Error"} |
    Select-Object TimeCreated, Id, Message |
    Sort-Object TimeCreated -Descending |
    Select-Object -First 20
# Failed login attempts
Get-EventLog Security |
    Where-Object {$_.EventID -eq 4625} |
    Select-Object TimeGenerated, Message |
    Sort-Object TimeGenerated -Descending

7.4 Automation & Scheduled Tasks#

Example: Daily Backup via PowerShell

# Backup C:\Data to D:\Backup
$source = "C:\Data"
$destination = "D:\Backup"
# -PassThru emits the copied items so Tee-Object can write them to the log
Copy-Item -Path $source -Destination $destination -Recurse -PassThru | Tee-Object -FilePath C:\Logs\backup.log

Example: Scheduled Task for System Updates

$action = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-File C:\Scripts\update.ps1"
# Weekly trigger, matching the task name
$trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 3am
Register-ScheduledTask -TaskName "WeeklyUpdate" -Action $action -Trigger $trigger -User "SYSTEM" -RunLevel Highest

7.5 Security Best Practices#

Enforce least privilege and proper AD group memberships.
Enable Windows Firewall and configure inbound/outbound rules.
Use account lockout policies and MFA for sensitive accounts.
Regularly audit event logs and AD changes.
Ensure regular patching and updates.

7.6 Backup & Recovery#

Use Windows Server Backup or PowerShell-based backup scripts:

wbadmin start backup -backupTarget:D: -include:C: -allCritical -quiet

Automate snapshots for cloud-hosted Windows VMs (AWS EC2, Azure VM).
Maintain offsite or cloud backups for disaster recovery.

7.7 Operational Notes#

Combine PowerShell pipelines with filters (Where-Object, Select-Object) for monitoring and reporting.
Regularly check CPU, memory, disk, and event logs.
Document server configurations, role assignments, and scripts.
Automate repetitive tasks where possible (updates, backups, monitoring).

8. Automation and Scripting#

Automation is a core responsibility for Systems Administrators. It reduces manual errors, improves operational efficiency, and ensures consistency across servers and cloud environments. Common automation areas include:

Server provisioning and configuration
Backup and disaster recovery
Patch management and updates
Monitoring and alerts
Resource cleanup and cost optimization
Application deployment

Automation can be implemented using shell scripts, PowerShell scripts, CLI pipelines, cron jobs, scheduled tasks, and orchestration tools.

8.1 Linux Automation (Bash)#

8.1.1 Backup Script Example#

#!/bin/bash
# Daily backup of /var/www to /backup
SOURCE="/var/www"
DEST="/backup/www-$(date +%F)"
mkdir -p "$DEST"
rsync -avz "$SOURCE" "$DEST" | tee /var/log/backup.log
# Remove backups older than 7 days (-mindepth 1 keeps /backup itself out of the match)
find /backup -mindepth 1 -maxdepth 1 -type d -name 'www-*' -mtime +7 -exec rm -rf {} \;
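
The retention step is the riskiest part of a backup script, because a loosely scoped find combined with rm -rf can remove more than intended. A minimal sketch of a narrower pruning helper follows. The function name prune_backups and its argument order are hypothetical, and it assumes GNU find (-mindepth and -maxdepth).

```shell
# Hypothetical helper: delete dated backup directories older than N days.
# Only direct children of the backup root whose names start with the given
# prefix are considered, so the root itself is never a candidate.
prune_backups() {
  local root prefix days
  root=$1
  prefix=$2
  days=$3
  [ -d "$root" ] || return 1
  find "$root" -mindepth 1 -maxdepth 1 -type d -name "${prefix}*" \
    -mtime +"$days" -exec rm -rf {} +
}

# Example: prune_backups /backup www- 7
```

Restricting the match by both depth and name prefix means unrelated directories under the backup root are left alone even if they are old.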

8.1.2 Monitoring Script Example#

#!/bin/bash
# Check disk usage and alert if > 90%
df -h | awk '$5+0 > 90 {print "Disk Full: "$6}' | mail -s "Disk Alert" admin@company.com

8.1.3 Cron Jobs#

# Daily backup at 2 AM
0 2 * * * /usr/local/bin/backup.sh >> /var/log/cron_backup.log 2>&1
# Weekly system update at 3 AM Sunday
0 3 * * 0 apt update && apt upgrade -y | tee /var/log/weekly_update.log

8.2 Windows Automation (PowerShell)#

8.2.1 Backup Script Example#

# Backup C:\Data to D:\Backup
$source = "C:\Data"
$destination = "D:\Backup\Data-$(Get-Date -Format yyyy-MM-dd)"
# -PassThru emits the copied items so Tee-Object can write them to the log
Copy-Item -Path $source -Destination $destination -Recurse -PassThru | Tee-Object -FilePath C:\Logs\backup.log

8.2.2 Monitoring Script Example#

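# CPU utilization alert
$cpu = Get-Counter '\Processor(_Total)\% Processor Time' | Select-Object -ExpandProperty CounterSamples
if ($cpu.CookedValue -gt 80) {
    # Send-MailMessage requires -From and an SMTP server; both values below are placeholders
    Send-MailMessage -To admin@company.com -From monitor@company.com -SmtpServer smtp.company.com -Subject "High CPU Alert" -Body "CPU usage is $($cpu.CookedValue)%"
}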

8.2.3 Scheduled Tasks#

$action = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-File C:\Scripts\backup.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 2am
Register-ScheduledTask -TaskName "DailyBackup" -Action $action -Trigger $trigger -User "SYSTEM" -RunLevel Highest

8.3 AWS Automation (CLI + Scripts)#

8.3.1 EC2 Start/Stop Script#

#!/bin/bash
# Start all stopped instances tagged Environment=Dev
aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=Dev" "Name=instance-state-name,Values=stopped" \
  --query "Reservations[*].Instances[*].InstanceId" --output text | \
  xargs -n1 aws ec2 start-instances --instance-ids

8.3.2 Snapshot Automation#

# Daily snapshot of volumes
aws ec2 describe-volumes --filters Name=tag:Backup,Values=True | \
  jq -r '.Volumes[].VolumeId' | while read -r vol; do
    aws ec2 create-snapshot --volume-id "$vol" --description "Daily Backup $(date +%F)"
  done

8.4 Azure Automation (CLI + Scripts)#

8.4.1 Start/Stop VMs#

# Stop all dev VMs to save costs
# A list projection keeps the tsv column order fixed: name first, then resource group
az vm list -d --query "[?tags.Environment=='Dev'].[name, resourceGroup]" -o tsv | \
while read -r vm rg; do
  az vm deallocate --name "$vm" --resource-group "$rg"
done

8.4.2 Storage Cleanup#

# Delete blobs older than 30 days
cutoff=$(date -d '30 days ago' --iso-8601)
az storage blob list --account-name mystorage --container-name logs -o json | \
  jq -r --arg cutoff "$cutoff" '.[] | select(.properties.lastModified < $cutoff) | .name' | \
  xargs -I {} az storage blob delete --account-name mystorage --container-name logs --name {}

8.5 Best Practices for Automation#

Test scripts in development/staging first.
Use logging and notifications for errors and successes.
Tag resources to allow automated filtering.
Schedule automation with cron (Linux) or Scheduled Tasks (Windows).
Use version control for scripts (Git) and document changes.
Combine CLI tools with filtering utilities (grep, awk, jq) for reporting and control.
Implement rollback mechanisms when automating destructive tasks (deletions, terminations, snapshots).

9. Troubleshooting & Best Practices#

Effective systems administration relies on quick diagnostics, structured troubleshooting, and proactive best practices. This section covers common issues, troubleshooting commands, and preventive measures across Linux, Windows, AWS, and Azure.

9.1 General Troubleshooting Principles#

Identify the problem clearly – logs, error messages, or monitoring alerts.
Isolate the issue – determine if it’s server, network, application, or configuration related.
Reproduce if possible – in development or staging environments.
Use logs and metrics – filter and analyze using CLI pipelines or scripts.
Escalate when necessary – involve team members or cloud support if the issue is outside your scope.
Document the resolution – update runbooks, knowledge base, or playbooks.
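
As an illustration of filtering logs with a CLI pipeline, the sketch below groups matching log lines by the program that wrote them, which often narrows an incident to one service. The function name errors_by_program is hypothetical, and it assumes the traditional syslog line layout ("Mon DD HH:MM:SS host program[pid]: message").

```shell
# Hypothetical helper: count lines containing a keyword (default "error"),
# grouped by the program field of traditional syslog lines read from stdin.
errors_by_program() {
  grep -i -- "${1:-error}" |
  awk '{
         prog = $5
         sub(/(\[[0-9]+\])?:$/, "", prog)   # drop the "[pid]:" suffix
         count[prog]++
       }
       END { for (p in count) printf "%d %s\n", count[p], p }' |
  LC_ALL=C sort -k1,1nr -k2,2
}

# Example: errors_by_program error < /var/log/syslog
```

The output lists the noisiest programs first, which gives a quick pointer for the isolation step.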

9.2 Linux Troubleshooting#

9.2.1 Common Issues & Commands#

High CPU/Memory Usage

top -b -n1 | head -n10
ps aux --sort=-%cpu | head -n10
free -m | awk 'NR==2{printf "Memory Usage: %.2f%%\n",$3*100/$2 }'

Disk Full / Low Space

df -h | grep '^/dev/' | sort -k5,5 -nr | head -n5
du -sh /var/log/* | sort -h | tail -n10

Service Failure

systemctl status nginx | grep Active
journalctl -u nginx | tail -n50

Network Issues

ip a | grep inet
ss -tuln | grep LISTEN
ping -c4 8.8.8.8 | grep 'packet loss'
traceroute google.com | tail -n5

Log Analysis

grep error /var/log/syslog | tail -n50
journalctl | grep fail | head -n20

9.3 Windows Troubleshooting#

9.3.1 Common Issues & PowerShell Commands#

High CPU / Memory

Get-Process | Sort-Object CPU -Descending | Select-Object -First 10
Get-Counter '\Processor(_Total)\% Processor Time' | Select-Object -ExpandProperty CounterSamples

Service Failures

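Get-Service | Where-Object {$_.Status -ne "Running"} | Select-Object Name, Status
# -PassThru makes Restart-Service emit the service object so Tee-Object has something to log
Restart-Service wuauserv -PassThru | Tee-Object C:\Logs\service_restart.log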

Event Log Analysis

Get-WinEvent -LogName System | Where-Object {$_.LevelDisplayName -eq "Error"} | Select-Object TimeCreated, Id, Message | Sort-Object TimeCreated -Descending | Select-Object -First 20

Network Issues

Test-Connection google.com -Count 4 | Where-Object {$_.StatusCode -ne 0}
Get-NetAdapter | Select-Object Name, Status, LinkSpeed

9.4 Cloud Troubleshooting (AWS / Azure)#

9.4.1 AWS#

Check instance health & metrics

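aws ec2 describe-instance-status --instance-ids i-0123456789abcdef0 | jq .
# get-metric-statistics requires an explicit time window
aws cloudwatch get-metric-statistics \
  --metric-name CPUUtilization --namespace AWS/EC2 \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 --statistics Average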

Check S3 access issues

# CLI errors are written to stderr, so merge it into stdout before filtering
aws s3 ls s3://company-backup 2>&1 | grep -i "denied"

IAM / Access Problems

aws iam get-user | jq .
aws sts get-caller-identity

9.4.2 Azure#

VM status and metrics

az vm list -d --query "[?powerState!='VM running'].[name,resourceGroup,powerState]" -o table
az monitor metrics list --resource <resource-id> --metric "Percentage CPU" --interval PT5M | jq .

Storage issues

az storage blob list --account-name mystorage --container-name logs -o json | jq

Access / Role Issues

az role assignment list --assignee sysadmin@company.com -o table

9.5 Automation & Scripting Troubleshooting#

Always log script output with tee (Linux) or Tee-Object (PowerShell).
Validate command exit codes in Bash:

if ! rsync -avz /src /dst; then
  echo "Backup failed" | mail -s "Backup Error" admin@company.com
fi

Use dry-run / test modes for destructive actions (deletions, terminations, snapshots).
Combine CLI + pipes + filters to isolate errors quickly.
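
Logging and exit-code checks interact in one subtle way: when a command is piped into tee, the pipeline's exit status is normally tee's, so a failed command can look successful. The sketch below is one way to combine logging, exit-code preservation, and a dry-run switch in a single wrapper. The function name run, the DRY_RUN and LOG_FILE variables, and the default log path are assumptions.

```shell
# Hypothetical wrapper: run a command, append its output to a log file, and
# return the command's own exit status. With DRY_RUN=1 the command is only
# printed. Redirecting to the log (instead of piping to tee) keeps the exit
# status tied to the command itself.
LOG_FILE=${LOG_FILE:-/tmp/admin-task.log}

run() {
  local status
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "[dry-run] $*"
    return 0
  fi
  "$@" >>"$LOG_FILE" 2>&1
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "FAILED ($status): $*" >>"$LOG_FILE"
  fi
  return "$status"
}

# Example:
#   DRY_RUN=1 run rm -rf /backup/www-2024-01-01   # prints the command only
#   run rsync -avz /src /dst || echo "Backup failed"
```

Running a destructive script once with DRY_RUN=1 and reviewing the printed commands is a cheap check before the first real run.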

9.6 Best Practices#

Regular Monitoring
- Linux: top, df, journalctl, grep pipelines
- Windows: PowerShell counters, Event Logs
- Cloud: CloudWatch / Azure Monitor / CLI filters
Access Control & Security
- Enforce least privilege
- Use SSH keys / MFA
- Audit user activities and role changes
Backup & Recovery
- Daily incremental backups, weekly full backups
- Test recovery procedures regularly
Automation & Consistency
- Use scripts and scheduled tasks for repetitive operations
- Version-control automation scripts (Git)
Documentation
- Maintain updated runbooks for common issues
- Document environment changes, deployments, and automation scripts
Cost & Resource Optimization
- Stop unused VMs
- Remove orphaned storage volumes
- Tag resources for tracking

10. Appendices#

The appendices provide ready-to-use scripts and command examples to simplify repetitive administrative tasks. These can be adapted to your environment.

10.1 Database Backup Script (MySQL / MariaDB)#

The appendices provide ready-to-use scripts and step-by-step instructions for automating common administrative tasks, including database backups and SSL certificate renewal.
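
A database backup script along these lines might look like the following sketch. Everything specific in it is an assumption: the function names, the /backup/mysql directory, the 7-day retention, and the use of a ~/.my.cnf credentials file so that no password appears on the command line.

```shell
#!/usr/bin/env bash
# Hypothetical MySQL/MariaDB backup sketch. Paths and retention are assumed
# values; adjust them before use.
BACKUP_DIR=${BACKUP_DIR:-/backup/mysql}
RETENTION_DAYS=${RETENTION_DAYS:-7}

# Build the dump path for a database name and a date (YYYY-MM-DD).
backup_path() {
  echo "${BACKUP_DIR}/$1-$2.sql.gz"
}

# Dump one database, compress it, and prune dumps past the retention period.
backup_database() {
  local db target tmp
  db=$1
  target=$(backup_path "$db" "$(date +%F)")
  tmp=${target%.gz}
  mkdir -p "$BACKUP_DIR" || return 1
  # --single-transaction takes a consistent InnoDB snapshot without locking
  # tables. Dumping to a file first keeps mysqldump's exit status visible.
  if ! mysqldump --single-transaction --routines "$db" > "$tmp"; then
    rm -f "$tmp"
    echo "Backup of $db failed" >&2
    return 1
  fi
  gzip -f "$tmp" || return 1
  find "$BACKUP_DIR" -maxdepth 1 -type f -name "${db}-*.sql.gz" \
    -mtime +"$RETENTION_DAYS" -delete
}

# Example: backup_database shop
```

A restore test against a scratch database is the only reliable confirmation that dumps produced this way are usable.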
