Troubleshooting & Debugging - Tools

 This list started with the tools mentioned in the Google Automation: Troubleshooting & Debugging Coursera course.

Useful Tools

  • WEEK 2 Videos - review for tools!

  • To get more information about a problem:
    • tcpdump and Wireshark show ongoing network connections and help us analyze the traffic going over our cables.
    • ps, top, and free show the number and types of resources used in the system.
    • strace shows the system calls made by a program; ltrace shows the library calls made by the software.
    • Debuggers (often programming-language specific) let us:
      • follow the code line by line,
      • inspect changes in variable assignments,
      • interrupt the program when a specific condition is met,
      • and more.
    • If we can modify the code, we can change it so that it provides more logging information.
  • To check for I/O problems:
    • iotop is similar to top and shows which processes are using the most input and output.
    • iostat and vmstat show statistics on input/output operations and virtual memory operations.
    • ionice lets a process, such as a backup system, reduce its priority for disk access so that other services, like web services, can use the disk too.
    • nice and renice adjust a process's CPU scheduling priority (nice when starting a command, renice for a process that is already running).
  • To check for network issues:
    • iftop is similar to top and shows the current traffic on the network interfaces.
    • rsync, which is often used for backing up data, includes a --bwlimit option to cap the bandwidth it uses. If that option isn't available, we can use a program like trickle to limit the bandwidth being used.
  • To check for compression issues:
    • If the selected compression algorithm is too aggressive, compressing the backups can use all of the server's processing power. The nice command can help by lowering the priority of the compression process.
  • Go to coworkers, forums, etc. for more help.
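  • bisect (for example, git bisect) - uses binary search to narrow down where a problem was introduced.
  • logrotate - rotates log files so that a single log file doesn't get too big.
  • ab -n <times> <URL> (ApacheBench) - sends <times> requests to the URL and reports the average time per request.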
  • time reports three values: real, user, and sys:
    • real (wall-clock) is the amount of actual time that it took to execute the command.
    • user is the time spent doing operations in user space.
    • sys is the time spent doing system-level operations.
    • The values of user and sys won't necessarily add up to the value of real, because the computer might be busy with other processes or the command might be waiting (for example, on I/O).
  • Running operations concurrently in Python, via modules:
    • threading
    • asyncio
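
To tie the last two items together, here is a minimal sketch (not from the course) that runs the same simulated I/O work sequentially and with the threading module, and measures it the way time does. The names fake_io_task, run_sequential, run_threaded, and measure are hypothetical helpers made up for this example, and time.sleep stands in for a real wait such as a network request. time.perf_counter approximates real (wall-clock) time, and time.process_time approximates user + sys for the current process.

```python
import threading
import time


def fake_io_task(results, index, delay=0.2):
    """Hypothetical stand-in for an I/O-bound job: wait, then store a value."""
    time.sleep(delay)  # sleeping uses wall-clock time but almost no CPU time
    results[index] = index * index


def run_sequential(n):
    """Run n tasks one after another."""
    results = [None] * n
    for i in range(n):
        fake_io_task(results, i)
    return results


def run_threaded(n):
    """Run n tasks at the same time, one thread per task."""
    results = [None] * n
    threads = [
        threading.Thread(target=fake_io_task, args=(results, i))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return results


def measure(func, n):
    """Return (result, wall_seconds, cpu_seconds) for func(n)."""
    wall_start = time.perf_counter()  # comparable to "real"
    cpu_start = time.process_time()   # comparable to "user + sys"
    result = func(n)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


if __name__ == "__main__":
    for func in (run_sequential, run_threaded):
        result, wall, cpu = measure(func, 5)
        print(f"{func.__name__}: real={wall:.2f}s cpu={cpu:.2f}s result={result}")
```

Because the tasks spend their time waiting, the CPU time should stay far below the wall-clock time in both runs, and the threaded run should finish in roughly the time of one task instead of five. For CPU-heavy work the picture would be different, so treat this only as an illustration for I/O-style waits.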
