How to Design Log Content
2023-03-09
This article shares practical experience and suggestions on designing log content, for readers involved in software development.
Target Audience
- Programmers: those who need to engage in software development
- Software engineers: those who need to have overall control over the developed system
1. Basic Discussion on Logs
Before introducing how to design log content, there are a few questions that need some simple discussion.
The entire content of this article is written based on these discussions.
1.1 Why Output Logs?
Answer: To locate "issues".
Some people believe that when the system encounters runtime errors during operation, the programming language or virtual machine level will throw an error and mark the specific line number and function location. There is no need to additionally output logs. Others think that before the system goes online, it has already been thoroughly tested, and all processes have been verified, so there is no need to output relevant content in the logs. Some also believe that after the system goes online, if problems occur, they are still responsible for fixing them themselves, and others will not even look at the logs, so whether or not to output logs does not matter.
These thoughts are all relatively one-sided.
An "issue" is not just a system error. It can also be a process that does not behave as designed, or evidence that is needed when a user refuses to acknowledge their own actions. Secondly, although a system should be fully tested before going online, testing itself can contain errors or omissions; requiring testing to be 100% free of oversights is no more realistic than requiring development to be 100% free of bugs. Finally, building a system involves more than development: testing, IT operations (ops), and later business operations are all involved. Good logs not only make your own development easier but also help these other roles, improving the efficiency of the whole team. Conversely, when other roles can attach logs to their feedback, development itself benefits greatly.
Therefore, regardless of how perfect a system is or whether it has been thoroughly tested, outputting logs is essential work.
1.2 Who Should View Logs?
Answer: "All" individuals related to this system need to view all or part of the logs.
Some people believe that logs are only for their own debugging during development and for investigating production bugs later. Since they will be the ones modifying the code anyway, nobody else needs to see the logs.
This thought is also incorrect.
Just as code should be written so that people other than its author can understand it, so should logs. In fact, while code is usually read only by developers, logs are read by many more roles. As mentioned in the previous question, testing, IT operations (ops), and even business operations may need to view all or part of the logs and use them in their work.
Position | Purpose of Logs |
---|---|
Developer | Debugging during development, locating faults |
Tester | Basis for submitting bugs; understanding key points of program execution in order to design more targeted test cases |
IT operations (ops) | Basis for feedback; identifying and ruling out basic issues (e.g., configuration errors causing the system to fail to start) |
Business operations | Viewing specific user operations; helping resolve user disputes, etc. |
Therefore, our goal should be logs that as many people as possible can understand and use.
1.3 What Are the Harms of Not Outputting Logs or Having Arbitrary Log Content?
Answer: Various conceivable or inconceivable problems will follow.
In conjunction with the above two questions, the answer to this becomes obvious.
Position | Harm of Not Logging |
---|---|
Developer | Difficult to locate issues during development -> Developer overtime |
Tester | (1) Vague bug descriptions submitted by testers -> Developers need to reproduce bugs -> Without logs, issues are hard to locate -> Overtime. (2) Testing process entirely black-box -> Testers cannot design comprehensive test cases -> System failures after going online -> Without logs, issues are hard to locate -> Developer and tester overtime |
IT operations (ops) | Unable to independently troubleshoot any issues -> Everything relies on developers -> Without logs, issues are hard to locate -> Developer and ops overtime |
Business operations | Users encounter issues while using the system -> Users do not remember or know what actions they took -> Everything relies on developers -> Without logs, issues are hard to locate -> Developer overtime |
Through the above "three log questions," readers should be able to understand the views on logs presented in this article. Next, let's take a closer look at how to output logs.
2. Log Format
Generally speaking, one log entry is one line of text. Although logs do not strictly require a format, a standardized, uniform output format makes them easier to read and easier for log systems to collect and parse.
2.1 Plain Text Logs
Plain text logs are the most common and simplest form of log output. In most programming languages, you can simply print them.
The advantage is that you control the format completely; the disadvantage is that the format is harder for log systems to collect and parse.
A typical example of plain text logs is as follows:
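A minimal sketch of how a line of this shape could be assembled. The helper name format_plain and its parameters are assumptions for illustration; the tag values are the ones this section discusses.

```python
def format_plain(level_short, time_text, hostname, trace_id_short,
                 user_id, username, diff_ms, cost_ms, message):
    """Build one plain-text log line: bracketed tags first, then the content."""
    tags = [
        level_short,
        time_text,
        hostname,
        f"TRACE-{trace_id_short}",
        user_id,
        f"@{username}",
        f"+{diff_ms}ms",   # time since the previous log line
        f"{cost_ms}ms",    # time since the start of the request
    ]
    return " ".join(f"[{tag}]" for tag in tags) + " " + message


line = format_plain("I", "2021-12-13 18:54:44", "web01", "FD8CDE88",
                    "ANONYMITY", "ANONYMITY", 0, 0,
                    "[REQUEST USER] System Administrator(u-admin)")
print(line)
```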
This log mainly consists of two parts:
1. The automatically generated tag section ([I] [2021-12-13 18:54:44] [web01] [TRACE-FD8CDE88] [ANONYMITY] [@ANONYMITY] [+0ms] [0ms])
2. The content section specified in code ([REQUEST USER] System Administrator(u-admin))
The main purpose of this log is easy reading. Therefore, the log level ([I]), timestamp ([12-13 18:54:44]), and trace ID (TRACE-FD8CDE88) have been shortened.
2.2 JSON Format Logs
Logs are usually output in JSON format to make collection by log systems easier.
The advantage is, naturally, convenient collection; the disadvantage is that JSON is less intuitive to read directly.
Taking the log from section 2.1 as an example, the content in JSON format is as follows:
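A minimal sketch of such an entry, written as one JSON object per line. The field names follow the tag table in section 3; the message field name, the exact set of fields, and the values are assumptions for illustration.

```python
import json

record = {
    "level": "INFO",
    "timestampHumanized": "2021-12-13 18:54:44",
    "hostname": "web01",
    "traceId": "FD8CDE88-78AB-47E4-886A-85681700520B",
    "userId": "ANONYMITY",
    "username": "ANONYMITY",
    "diffTime": 0,
    "costTime": 0,
    "message": "[REQUEST USER] System Administrator(u-admin)",
}

# json.dumps emits no line breaks by default, so one entry stays on one line
json_line = json.dumps(record, ensure_ascii=False)
print(json_line)
```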
2.3 Plain Text Logs vs. JSON Format Logs
Plain text logs and JSON format logs each have their own advantages and disadvantages. You can choose based on actual circumstances.
For example, plain text is sufficient for a small, monolithic application without complex processing and without a third-party log system. For distributed systems, microservices, or systems where a third-party log system is already set up, consider JSON format output.
If conditions allow, you can modify the log module of the business system so that it outputs different formats of logs according to configuration. This enables the output of plain text logs locally during development without connecting to a third-party log system, while outputting JSON format logs in production environments.
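A minimal sketch of such a switch, assuming the format is chosen through an environment variable. The variable name LOG_FORMAT, the render function, and the selection of tags shown in plain text are all assumptions for illustration.

```python
import json
import os


def render(record, log_format):
    """Render one log record as plain text or as a single-line JSON object."""
    if log_format == "json":
        return json.dumps(record, ensure_ascii=False)
    # Plain text: a few short tags in brackets, then the message
    tags = " ".join(
        f"[{record[key]}]"
        for key in ("levelShort", "timestampHumanized", "traceIdShort")
    )
    return f"{tags} {record['message']}"


# Hypothetical setting: "plain" for local development, "json" in production
LOG_FORMAT = os.environ.get("LOG_FORMAT", "plain")

record = {
    "levelShort": "I",
    "timestampHumanized": "2021-12-13 18:54:44",
    "traceIdShort": "FD8CDE88",
    "message": "Service started",
}
print(render(record, LOG_FORMAT))
```

The calling code stays the same in both environments; only the configuration changes.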
3. Log Tags
In the previous examples, apart from the content specified in code, every log entry carries the same set of fields, or "tags." These tags identify the source and other basic information of each log line.
It can be said that logs without tags are almost worthless. For example, in Web applications, multiple users and requests inevitably access the system simultaneously. Imagine two concurrent requests producing the following logs:
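A minimal sketch of such interleaved output. The trace IDs and message wording are assumptions for illustration; the two interfaces and record counts are the ones discussed here.

```python
# (trace_id, message) pairs in the order two concurrent requests emitted them
events = [
    ("TRACE-A1B2C3D4", "GET /users"),
    ("TRACE-E5F6A7B8", "GET /books"),
    ("TRACE-E5F6A7B8", "Returned 10 records"),
    ("TRACE-A1B2C3D4", "Returned 3 records"),
]

for trace_id, message in events:
    print(f"[I] [{trace_id}] {message}")


def lines_for(trace_id):
    """Collect the messages that belong to one request."""
    return [message for tid, message in events if tid == trace_id]


print(lines_for("TRACE-A1B2C3D4"))  # ['GET /users', 'Returned 3 records']
```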
Through the traceId, it is clear that the GET /users interface returned 3 records, while the GET /books interface returned 10 records.
Without the traceId tag, the logs would appear as follows:
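A minimal sketch of the same kind of output with the trace tag left out (the message wording is again an assumption):

```python
messages = [
    "GET /users",
    "GET /books",
    "Returned 10 records",
    "Returned 3 records",
]

# Only the level tag remains; nothing ties a line to its request
untagged = [f"[I] {message}" for message in messages]
print("\n".join(untagged))
```

Read top to bottom, nothing in these four lines says which interface returned 10 records.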
Although the content is the same as before, without the traceId the logs of different requests are mixed together. Each line has to be examined individually, and there is no way to tell which lines belong to the same request. Such logs are clearly not very useful.
Below are some commonly used log tags. Depending on the actual situation, different combinations can be chosen to add tags to each log line, making the output logs searchable, filterable, and trackable.
Importance | Field Name | Description |
---|---|---|
| | appName | Application name |
| | moduleName | Module name |
| | upTime | System uptime since startup |
Important | level | Log level, e.g.: DEBUG, INFO, WARNING, ERROR |
| | levelShort | Shortened log level, e.g.: D, I, W, E |
Important | timestamp | Timestamp (seconds), e.g.: 1503936000 |
| | timestampMs | Timestamp (milliseconds), e.g.: 1503936000000 |
Important | timestampHumanized | Human-readable timestamp, e.g.: 2017-08-29 00:00:00 |
| | timestampHumanizedShort | Shortened human-readable timestamp, e.g.: 08-29 00:00:00 |
| | hostname | Hostname |
Important | traceId / requestId | Trace/request ID, can be a UUID4, e.g.: FD8CDE88-78AB-47E4-886A-85681700520B |
| | traceIdShort / requestIdShort | Shortened trace/request ID, can be the first segment of the UUID, e.g.: FD8CDE88 |
Important | userId | User ID; for users who are not logged in, a fixed value such as "ANONYMITY" can be used |
Important | username | Username; for users who are not logged in, a fixed value such as "ANONYMITY" can be used |
| | clientId | Client ID; in browsers it can be implemented through cookies |
| | clientIP | Client IP address |
| | diffTime | Time interval between this log and the previous one (milliseconds), used to find code ranges that take a long time to execute |
Important | costTime | Time interval from the start of the request to this log (milliseconds), used to find code ranges that take a long time to execute |
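As a minimal sketch, the two timing tags could be computed with a small per-request helper. The class name RequestClock and its interface are assumptions for illustration; a monotonic clock is used so that system clock adjustments do not distort the intervals.

```python
import time


class RequestClock:
    """Tracks diffTime and costTime (both in milliseconds) for one request."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._start = now()       # start of the request
        self._last = self._start  # time of the previous log line

    def tick(self):
        """Return (diffTime, costTime) for the log line being written now."""
        current = self._now()
        diff_ms = round((current - self._last) * 1000)
        cost_ms = round((current - self._start) * 1000)
        self._last = current
        return diff_ms, cost_ms


clock = RequestClock()
diff_ms, cost_ms = clock.tick()
print(f"[+{diff_ms}ms] [{cost_ms}ms] Request received")
```

A large diffTime on a line points at the code between that line and the previous one.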
3.1 Log Levels
The log tags mentioned above are relatively straightforward, but "log levels" deserve a separate discussion.
Log levels are very easy to get wrong. Some systems assign levels arbitrarily or fill them in casually, for example marking every log as DEBUG, or marking anything that looks like an "error" as ERROR. This creates ambiguity when the logs are later viewed and analyzed, and it interferes with third-party log systems.
To set the level of each log reasonably, we first need to decide how many levels there are and what severity each one corresponds to.
# | Log Level |
---|---|
1 | Fatal FATAL |
2 | Critical CRITICAL |
3 | Error ERROR |
4 | Warning WARNING |
5 | Info INFO |
6 | Debug DEBUG |
7 | Trace TRACE |
Generally, the log levels we encounter fall into the above seven types, although the total number of levels and the way they are classified vary between systems.
Too many levels increase the burden on developers. Moreover, what belongs at each level is not strictly defined. Here we take as an example a Web system whose log module has five levels (CRITICAL, ERROR, WARNING, INFO, DEBUG).
Critical CRITICAL
Completely unforeseeable issues, errors that prevent further processing and may even be thrown at the runtime level, causing the system to crash and restart, such as:
- Missing system-related components, packages, configuration files
- Retrieving data that should exist but actually does not
- Failure executing SQL statements
When such issues arise, if the system can still return an HTTP response, it generally returns a 5xx server error.
Issues at this level are necessarily system bugs, so any issue belonging to this category needs immediate resolution.
Error ERROR
Issues that prevent further processing but do not cause the system to crash, errors that can be caught within the processing logic, such as:
- Third-party API call errors
- Database connection failure
- Incorrect parameters for interface calls
- Expired user authentication tokens leading to authentication failure
- Business disallowed operations, such as duplicate data entry
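When such issues arise, the system can generally still return an HTTP response: typically a 4xx client error when the cause lies in the request (incorrect parameters, expired tokens, disallowed operations) and a 5xx server error when the cause lies on the server side (third-party API or database failures).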
Issues at this level generally originate from external systems and are not bugs of the current system. Therefore, they may not necessarily need "fixing" but rather "optimization," such as:
- Adding retry mechanisms when third-party API calls fail
- Optimizing UI to avoid users entering invalid content or performing disallowed operations, etc.
Warning WARNING
Some "issues" that are not really issues, such as:
- Accessing non-existent routes or data (i.e., 404, such as GET /favicon.ico)
- Various username, password, captcha errors
Such issues usually do not need fixing; it is enough to check regularly whether their number surges. They often come from crawlers, health checks, and vulnerability scans, and their number generally remains stable, rarely rising or falling sharply.
Info INFO
Purely informational records, such as:
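A few hypothetical INFO lines, as a minimal sketch. The info helper, the message wording, and the values are all assumptions for illustration.

```python
def info(message):
    """Format and print one INFO-level line (other tags omitted for brevity)."""
    line = f"[I] {message}"
    print(line)
    return line


info("User signed in: username='alice'")
info("Order created: order_id='o-20211213-0001', item_count=3")
info("Report exported: format='csv', rows=1200")
```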
This level can be used in test environments to track the testing process and can also serve as user data analysis after the system goes online. Of course, such logs can also be completely omitted.
Debug DEBUG
Information aimed at developers and testers: detailed output of variable contents that helps debug the program. Section 4 covers the specifics.
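Python's standard logging module provides the same five level names, so the scheme above can be sketched with it directly. The logger name, the messages, and the INFO threshold below are assumptions for illustration.

```python
import io
import logging

# Collect output in memory so the result is easy to inspect
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))

logger = logging.getLogger("level_demo")
logger.setLevel(logging.INFO)  # lines below INFO (i.e. DEBUG) are dropped
logger.addHandler(handler)
logger.propagate = False       # keep the demo output out of the root logger

logger.critical("Config file missing: path='conf/app.yaml'")
logger.error("Third-party API call failed: status=503")
logger.warning("Route not found: GET /favicon.ico")
logger.info("User signed in: username='alice'")
logger.debug("cart_items=['book', 'pen']")  # filtered out by the threshold

print(stream.getvalue(), end="")
```

Raising or lowering the threshold per environment (for example DEBUG in testing, INFO or WARNING in production) only works if each line was given a meaningful level in the first place.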
4. Log Content
Log content refers to the output specified in code. This part is relatively free-form, but there are some techniques that usually help.
For ease of understanding, the following example code is all in Python, using print() as an example. Log content itself is not tied to any specific language or framework.
4.1 Output Variables
Outputting the content of key variables is the most common practice when outputting log content. However, in actual projects, many people do not output logs sufficiently.
A more complete example of variable output is as follows:
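A minimal sketch of fuller variable output, where each value is printed with its name, its type, and its repr() form. The describe helper and the sample variables are assumptions for illustration.

```python
def describe(name, value):
    """Render a variable as name=(type)repr, e.g. user_id=(int)42."""
    return f"{name}=({type(value).__name__}){value!r}"


user_id = 42
username = " alice "
is_admin = False
roles = ["editor", "viewer"]

print(describe("user_id", user_id))    # user_id=(int)42
print(describe("username", username))  # username=(str)' alice '
print(describe("is_admin", is_admin))  # is_admin=(bool)False
print(describe("roles", roles))        # roles=(list)['editor', 'viewer']
```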
Output Variable Names
Variable names should always be output together with the values.
Many people habitually write something like print(my_var) to output logs. With only a few logs this may not be a big problem, but once loops, multiple conditional branches, random numbers, and the like are involved, values without variable names as hints easily lead to misunderstandings or mistakes. For example:
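A minimal sketch of such output (the variable names and values are assumptions for illustration):

```python
attempt, delay, limit = 3, 3, 5

# Bare values: three numbers with no hint of which variable each came from.
# Compare with print(f"attempt={attempt}"), which names the value it prints.
bare_lines = [str(attempt), str(delay), str(limit)]
for line in bare_lines:
    print(line)
```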
In this case, the only way to know which value belongs to which variable is to compare the output with the code line by line.
Output Variable Values Should Indicate Type
When outputting variable values, the most common confusion is between values such as the number 1 and the string "1", or the boolean true and the string "true"; this is especially common in JavaScript and Python. Ignoring this issue sometimes makes the log content look very strange. For example:
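A minimal sketch of such a case, assuming the value arrived as a string (for example from a query parameter):

```python
is_ok = "false"  # a non-empty string, not the boolean False

log_lines = [f"is_ok={is_ok}"]  # renders as: is_ok=false
if is_ok:  # any non-empty string is truthy in Python
    log_lines.append("Entered the is_ok branch")

for line in log_lines:
    print(line)
```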
In this case, the log clearly shows the is_ok variable as false, yet the branch is still entered, which is very odd!
In Python, there are multiple ways to indicate the type of a variable's value:
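A minimal sketch of four common options, using only built-ins and the standard json module (the variable is the same illustrative string as before):

```python
import json

is_ok = "false"

variants = [
    f"is_ok={repr(is_ok)}",                    # is_ok='false'
    f"is_ok={is_ok!r}",                        # is_ok='false' (!r calls repr)
    f"is_ok={json.dumps(is_ok)}",              # is_ok="false"
    f"is_ok=({type(is_ok).__name__}){is_ok}",  # is_ok=(str)false
]
for line in variants:
    print(line)
```

With any of these, the quotes or the type name make it obvious that the value is a string rather than a boolean.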
Output Strings Should Indicate Leading and Trailing Spaces
Another common source of confusion when outputting variable values is leading and trailing spaces and invisible characters in strings. Ignoring this issue can also make the logs look very strange, such as:
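A minimal sketch of such a case, assuming the value carries a trailing space:

```python
my_var = "main "  # note the trailing space

log_lines = [f"my_var={my_var}"]  # the trailing space is invisible in the log
if my_var == "main":
    log_lines.append("Entered the main branch")
else:
    log_lines.append("Entered the other branch")

for line in log_lines:
    print(line)
```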
In this case, the log clearly shows the my_var variable as "main", yet the other branch is entered, which is very odd!
In Python, there are multiple ways to indicate leading and trailing spaces in strings:
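A minimal sketch of two options: repr() (which also escapes control characters such as tabs) and explicit delimiters around the value. The sample values are assumptions for illustration.

```python
my_var = "main "   # trailing space
tabbed = "main\t"  # trailing tab

shown = [
    f"my_var={my_var!r}",  # my_var='main '
    f"my_var=[{my_var}]",  # my_var=[main ]
    f"tabbed={tabbed!r}",  # the tab is rendered as a backslash-t escape
]
for line in shown:
    print(line)
```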
Output Dates and Times Should Retain Time Zones
During development, dates are another major source of problems. Since the time zones configured on local machines and on servers may differ, it is very easy to output ambiguous dates in logs, such as:
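A minimal sketch of such output. A fixed value is used instead of the current time so the result is reproducible; the variable name is an assumption.

```python
from datetime import datetime

# A "naive" datetime: it carries no time zone information at all
created_at = datetime(2021, 12, 13, 18, 54, 44)

ambiguous = f"created_at={created_at}"
print(ambiguous)  # created_at=2021-12-13 18:54:44
```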
Although the output time is "correct," it is impossible to tell whether it is Beijing time or UTC, which hinders troubleshooting.
In Python, outputting dates that comply with the ISO 8601 standard is very simple, such as:
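A minimal sketch using only the standard datetime module. A fixed value with an explicit UTC+8 offset stands in for the current time; for real logs, datetime.now(timezone.utc) gives a time-zone-aware current time.

```python
from datetime import datetime, timedelta, timezone

beijing = timezone(timedelta(hours=8))
created_at = datetime(2021, 12, 13, 18, 54, 44, tzinfo=beijing)

local_text = created_at.isoformat()
utc_text = created_at.astimezone(timezone.utc).isoformat()

print(f"created_at={local_text}")  # created_at=2021-12-13T18:54:44+08:00
print(f"created_at={utc_text}")    # created_at=2021-12-13T10:54:44+00:00
```

Both lines describe the same instant, and the offset at the end removes any doubt about which time zone is meant.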
4.2 Output Logs Before Calling Key Functions or Entering Branches
After understanding how to output variables, the next step is to consider which variables to output.
Generally, the values that key function calls and branch decisions are based on should be logged, so that the program flow can be reconstructed and issues located. The many intermediate variables produced along the way do not need to be logged; during development they can be inspected by single-stepping in a debugger. For example:
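A minimal sketch of this practice. The functions handle_order and send_coupon, the threshold of 200, and the messages are all assumptions for illustration.

```python
def send_coupon(user_id):
    """Stand-in for a key business function."""
    return f"coupon sent to {user_id}"


def handle_order(user_id, order_total):
    # Log the values the branch decision is based on, before branching
    print(f"[handle_order] user_id={user_id!r}, order_total={order_total!r}")
    if order_total >= 200:
        # Log right before the key call, so a failure inside it can be located
        print(f"[handle_order] Sending coupon: user_id={user_id!r}")
        return send_coupon(user_id)
    return None


handle_order("u-1001", 250)
```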
4.3 Control the Length of Each Log Line
Since outputting logs consumes system resources, and an excessively long single log line may itself point to a design problem, the maximum length of each log line should be controlled. Avoid blindly dumping everything for the sake of "complete logs," such as:
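A minimal sketch of such a dump, assuming a query result of 1000 rows (the data is made up for illustration):

```python
rows = [{"id": i, "name": f"user-{i}"} for i in range(1000)]

# The whole list ends up in one log line
line = f"rows={rows!r}"
print(line)
print(f"(the line above is {len(line)} characters long)")
```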
Excessively long logs not only consume system resources but are also difficult to read.
Therefore, for long content, output only the key parts or the beginning, such as:
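A minimal sketch of one way to do this: keep the beginning and state the total size. The shorten helper, its default limit, and the sample data are assumptions for illustration.

```python
def shorten(text, max_length=80):
    """Keep the first max_length characters and note the original length."""
    if len(text) <= max_length:
        return text
    return f"{text[:max_length]}...(total {len(text)} chars)"


rows = [{"id": i, "name": f"user-{i}"} for i in range(1000)]

# Log the key facts (the count) plus only the beginning of the content
print(f"rows(count={len(rows)})={shorten(repr(rows))}")
```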
4.4 Handling Content Containing Line Breaks
Some log content contains line breaks that cannot be controlled, such as the call stack that needs to be output when an error occurs. In JSON format this is not a problem, because the entire content sits in the message field. In plain text format, however, it fragments the log, such as:
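A minimal sketch of the problem, using the standard traceback module. The tagged helper and its fixed tag values are placeholders for illustration.

```python
import traceback


def tagged(message):
    """Prefix a message with the tag section (tag values are placeholders)."""
    return f"[E] [TRACE-FD8CDE88] {message}"


try:
    1 / 0
except ZeroDivisionError:
    stack = traceback.format_exc().rstrip("\n")
    # Only the first line of the multi-line stack gets the tags
    fragmented = tagged(stack)
    print(fragmented)
```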
In this form, line breaks easily interfere with reading the log directly, with collection by third-party log systems, and even with filtering the log file using the grep command.
In that case, content containing line breaks can be split on the line break and output line by line, so that every line carries its "tags," such as:
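A minimal sketch of the line-by-line approach. As before, the tagged helper and its fixed tag values are placeholders for illustration.

```python
import traceback


def tagged(message):
    """Prefix a message with the tag section (tag values are placeholders)."""
    return f"[E] [TRACE-FD8CDE88] {message}"


try:
    1 / 0
except ZeroDivisionError:
    stack = traceback.format_exc().rstrip("\n")
    # Split on line breaks and tag every line separately
    tagged_lines = [tagged(part) for part in stack.splitlines()]
    for line in tagged_lines:
        print(line)
```

Filtering the log by trace ID now returns the complete call stack instead of only its first line.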
Epilogue
Good log output not only helps you and your colleagues at work but is also an important part of system observability.
The methods and designs mentioned in this article cannot fit every system. They are offered as a reference, in the hope that they prove useful.