Troubleshooting & Debugging - Tools
Tools first mentioned in the Google Automation: Troubleshooting & Debugging course on Coursera.
Useful Tools
- Week 2 videos: review them for the tools!
- To get more information:
  - tcpdump and Wireshark show ongoing network connections and help us analyze the traffic going over the network.
  - ps, top, and free show the number and types of resources being used on the system.
  - strace shows the system calls a program makes; ltrace shows the library calls it makes.
  - Debuggers (often specific to a programming language) let us:
    - follow the code line by line,
    - inspect changes in variable assignments,
    - interrupt the program when a specific condition is met,
    - and more.
  - If we can modify the code, we can change it so that it logs more information.
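When we can modify the code, the extra information usually goes through a logging framework. Below is a minimal sketch that assumes Python's standard logging module. The function apply_discount and the "checkout" logger name are hypothetical and exist only for illustration. The DEBUG messages stay silent until the logging level is lowered, so the extra detail can be turned on only while troubleshooting.

```python
import logging

# Hypothetical module-level logger; the name "checkout" is only for illustration.
logger = logging.getLogger("checkout")


def apply_discount(price, percent):
    """Hypothetical function instrumented with extra debug logging."""
    logger.debug("apply_discount called with price=%r percent=%r", price, percent)
    if not 0 <= percent <= 100:
        # Record the bad input before failing so it shows up in the logs.
        logger.warning("discount percent out of range: %r", percent)
        raise ValueError("percent must be between 0 and 100")
    result = price * (100 - percent) / 100
    logger.debug("apply_discount returning %r", result)
    return result


if __name__ == "__main__":
    # Lower the threshold to DEBUG while troubleshooting; use INFO or WARNING normally.
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    apply_discount(80, 25)
```

Because the messages use %-style arguments instead of pre-formatted strings, a disabled DEBUG call costs very little.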
- To check for I/O problems:
  - iotop - similar to top, but shows which processes are doing the most input and output.
  - iostat and vmstat - show statistics on input/output operations and on virtual memory operations.
  - ionice - lowers a process's priority for disk access; for example, our backup system can use it so other services, like web services, can use the disk too.
  - nice and renice - nice starts a command with an adjusted CPU scheduling priority; renice changes the priority of a process that is already running.
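nice also has a programmatic counterpart. Below is a minimal sketch that assumes a Unix system and Python's os.nice. The name lower_own_priority is a hypothetical helper. A backup script could call it at startup so that its CPU-heavy steps yield to other services. It only lowers CPU priority and does not touch the disk priority that ionice controls.

```python
import os


def lower_own_priority(increment=10):
    """Raise this process's niceness, i.e. lower its CPU scheduling priority.

    Roughly what `nice -n <increment>` does for a new command, applied here to
    the current process. Unix only; an unprivileged user can only increase
    niceness (lower priority), not decrease it again.
    """
    if not hasattr(os, "nice"):
        raise OSError("os.nice is not available on this platform")
    # os.nice adds `increment` to the current niceness and returns the new value.
    return os.nice(increment)


if __name__ == "__main__":
    print("niceness before:", os.nice(0))  # an increment of 0 just reads the value
    print("niceness after:", lower_own_priority(10))
    # ... start the CPU-heavy work here (e.g. compressing a backup) ...
```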
- To check for network issues:
  - iftop - similar to top, but shows the current traffic on the network interfaces.
  - rsync, which is often used for backing up data, has a --bwlimit option for exactly this purpose: capping how much bandwidth a transfer such as a backup uses. If that option isn't available, a program like trickle can limit the bandwidth instead.
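rsync's --bwlimit and trickle both cap the average transfer rate. Below is a minimal Python sketch of that idea. It assumes a simple policy of sleeping until the transfer is back under budget. throttled_copy is a hypothetical helper and does not reproduce how rsync or trickle are actually implemented.

```python
import io
import time


def throttled_copy(src, dst, max_bytes_per_sec, chunk_size=64 * 1024):
    """Copy file-like src to dst, keeping the average rate at or below max_bytes_per_sec."""
    if max_bytes_per_sec <= 0:
        raise ValueError("max_bytes_per_sec must be positive")
    start = time.monotonic()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        # How long the transfer *should* have taken so far at the target rate.
        target_elapsed = copied / max_bytes_per_sec
        actual_elapsed = time.monotonic() - start
        if target_elapsed > actual_elapsed:
            # Ahead of budget: pause so the average rate drops back to the limit.
            time.sleep(target_elapsed - actual_elapsed)
    return copied


if __name__ == "__main__":
    data = b"x" * 125_000
    t0 = time.monotonic()
    n = throttled_copy(io.BytesIO(data), io.BytesIO(), max_bytes_per_sec=250_000)
    print(f"copied {n} bytes in {time.monotonic() - t0:.2f}s")  # roughly 0.5 seconds
```

Individual chunks still go out at full speed. Only the average over time is capped, so a smaller chunk_size gives a smoother rate.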
- To check for compression issues:
  - If the compression algorithm selected for the backups is too aggressive, compressing them can use all of the server's processing power. Running the compression under nice lowers its CPU priority so that other services still get CPU time.
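The CPU cost of a compression level is easy to measure. Below is a minimal Python sketch that uses zlib as a stand-in for whatever compressor the backup actually uses. compare_levels is a hypothetical helper. Higher levels (up to 9) typically spend more CPU time to produce a smaller result. If that cost starves other services, the two options are a cheaper level or running the job under nice.

```python
import time
import zlib


def compare_levels(data, levels=(1, 6, 9)):
    """Compress data at several zlib levels; return {level: (compressed_size, cpu_seconds)}."""
    results = {}
    for level in levels:
        start = time.process_time()  # CPU time, the resource aggressive compression eats
        compressed = zlib.compress(data, level)
        elapsed = time.process_time() - start
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    sample = b"".join(b"log line %d: status=ok\n" % i for i in range(200_000))
    for level, (size, secs) in compare_levels(sample).items():
        print(f"level {level}: {size:>9} bytes, {secs:.3f}s CPU")
```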
- Ask coworkers, search forums, etc. for more help.
- bisect - uses binary search to find the change that introduced a problem (e.g. git bisect over a commit history).
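Bisecting works because each check halves the range of suspects, so about log2(n) checks are enough to find the culprit among n changes (about 10 checks for 1,000 commits). Below is a minimal Python sketch of the search that git bisect performs. It assumes an ordered list of versions and a hypothetical is_bad check, such as running the failing test against that version.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(versions: Sequence[T], is_bad: Callable[[T], bool]) -> int:
    """Return the index of the first bad version using binary search.

    Assumes versions are ordered so that every good version comes before every
    bad one (like commits before and after a bug was introduced). Returns
    len(versions) if no version is bad.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the first bad version is at mid or earlier
        else:
            lo = mid + 1  # everything up to mid is good
    return lo


if __name__ == "__main__":
    commits = [f"commit-{i:03d}" for i in range(1000)]
    # Hypothetical check: pretend the bug arrived in commit-613.
    idx = first_bad(commits, lambda c: int(c.split("-")[1]) >= 613)
    print(commits[idx])  # commit-613
```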
- logrotate - rotates log files so that no single log file grows too big.
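logrotate rotates log files from outside the program. Python's standard library offers a similar mechanism inside the program through logging.handlers.RotatingFileHandler. Below is a minimal sketch that assumes an illustrative 200-byte size limit and three backups. demo_rotation and the app.log file name are hypothetical. Whenever app.log is about to exceed the limit, it is renamed to app.log.1, older copies shift to .2 and .3, and the oldest copy is deleted.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def demo_rotation(directory, max_bytes=200, backup_count=3, lines=100):
    """Write `lines` log records through a size-based rotating handler; return the files left."""
    path = os.path.join(directory, "app.log")
    logger = logging.getLogger("rotation_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep demo output out of the root logger
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    logger.addHandler(handler)
    try:
        for i in range(lines):
            logger.info("request %d handled", i)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_rotation(tmp))  # ['app.log', 'app.log.1', 'app.log.2', 'app.log.3']
```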
- ab -n <times> <url> - sends <times> requests to the URL and reports the average time per request (ab is the Apache HTTP server benchmarking tool).
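ab applies this averaging to HTTP requests, but the same idea works for any operation we can call repeatedly. Below is a minimal Python sketch that assumes the operation is a plain callable. average_time is a hypothetical helper. It times the calls one after another, which matches ab's default of one request at a time.

```python
import statistics
import time


def average_time(func, n):
    """Call func() n times in sequence and return (mean, max) duration in seconds."""
    if n <= 0:
        raise ValueError("n must be positive")
    durations = []
    for _ in range(n):
        start = time.perf_counter()  # high-resolution wall-clock timer
        func()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), max(durations)


if __name__ == "__main__":
    mean, worst = average_time(lambda: sum(range(100_000)), n=50)
    print(f"mean {mean * 1000:.2f} ms, worst {worst * 1000:.2f} ms")
```

The function returns the worst case alongside the mean because an average can hide a few very slow calls.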
- time runs a command and reports three values: real, user, and sys.
  - real (wall-clock) is the actual elapsed time the command took to run.
  - user is the CPU time spent executing in user space.
  - sys is the CPU time spent on system-level (kernel) operations.
  - user and sys won't necessarily add up to real. The process may spend time waiting (for I/O, for a sleep, or for the CPU while other processes run), and a multi-threaded program running on several cores can accumulate more user + sys time than real time.
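The gap between real and user + sys is easy to reproduce. Below is a minimal Python sketch that assumes a Unix-like system, where os.times() reports per-process user and system CPU time. measure is a hypothetical helper. A sleeping call accumulates plenty of real time but almost no CPU time, while a busy loop shows mostly user time.

```python
import os
import time


def measure(func):
    """Run func() and return real, user and sys seconds, like the shell's `time`."""
    cpu_start = os.times()
    wall_start = time.perf_counter()
    func()
    wall_end = time.perf_counter()
    cpu_end = os.times()
    return {
        "real": wall_end - wall_start,
        "user": cpu_end.user - cpu_start.user,
        "sys": cpu_end.system - cpu_start.system,
    }


if __name__ == "__main__":
    print("sleep:", measure(lambda: time.sleep(0.5)))  # real ~0.5, user and sys ~0
    print("busy: ", measure(lambda: sum(i * i for i in range(3_000_000))))  # mostly user
```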
- Parallel (concurrent) operations in Python, via the modules:
  - threading
  - asyncio
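Both modules help most when the work is I/O-bound, such as waiting on disks or the network, because waiting tasks can overlap. Below is a minimal Python sketch that simulates I/O with sleeps. blocking_io and async_io are hypothetical stand-ins for real calls. With either approach, five 0.2-second waits should finish in roughly 0.2 seconds total instead of 1 second.

```python
import asyncio
import threading
import time


def blocking_io(seconds):
    """Hypothetical stand-in for a blocking call such as a network request."""
    time.sleep(seconds)
    return seconds


def run_with_threads(delays):
    """Run blocking_io for each delay in its own thread; results keep input order."""
    results = [None] * len(delays)

    def worker(index, seconds):
        results[index] = blocking_io(seconds)

    threads = [threading.Thread(target=worker, args=(i, d)) for i, d in enumerate(delays)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return results


async def async_io(seconds):
    """Hypothetical stand-in for a non-blocking call (await instead of block)."""
    await asyncio.sleep(seconds)
    return seconds


async def gather_all(delays):
    # gather runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(async_io(d) for d in delays))


def run_with_asyncio(delays):
    return asyncio.run(gather_all(delays))


if __name__ == "__main__":
    delays = [0.2] * 5
    for runner in (run_with_threads, run_with_asyncio):
        start = time.perf_counter()
        runner(delays)
        print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s")  # ~0.2s, not 1.0s
```

CPU-bound work is different. In the standard CPython build, the global interpreter lock prevents threads from executing Python code in parallel, so this kind of speedup should not be expected there.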