How to Design Log Content
2023-03-09
This article mainly provides some experience and suggestions for readers who need to engage in software development.
Target Audience
- Programmers who need to engage in software development
- Software engineers who need to have overall control of the developed system
1. Basic Discussion on Logs
Before introducing how to design log content, there are a few questions that need some simple discussion.
The entire content of this article is based on these discussions.
1.1 Why Output Logs?
Answer: To locate "issues".
Some people believe that when the system encounters runtime errors during operation, the programming language or virtual machine level will throw an error and mark the specific line number and function location, so there is no need to output additional logs. Others think that the system has been thoroughly tested before going online, and all processes have been inspected, so there is no need to output related content in the logs. There are also those who believe that after the system goes online, if problems arise, they themselves are responsible for fixing them, and others will not look at the logs anyway, so it doesn't matter whether or not the logs are output.
These thoughts are all relatively one-sided.
When we mention issues, we're not just talking about system errors but also processes that do not conform to design, or even evidence sought by users unwilling to admit their actions. Secondly, although the system should be fully tested before going online, testing itself may also have errors or omissions. If we were to require testing to be 100% error-free, it would be more direct to require development to be 100% bug-free. Finally, developing a system involves more than just development; testing, operations, and even subsequent operations are involved. Good logs not only bring convenience to your own development but also help other positions, improving the overall efficiency of the team. Conversely, if other positions can provide feedback with logs attached, it brings great convenience to developers themselves.
Therefore, regardless of how perfect a system is or whether it has been thoroughly tested, outputting logs is an essential task.
1.2 Who Should View the Logs?
Answer: "All" people related to this system need to view all or part of the logs.
Some people believe that logs are only for their own development debugging and subsequent investigation of online bugs. "Anyway, the code will eventually be modified by myself," so others don't need to see it or have any reason to see it.
This idea is also incorrect.
Just like "writing code should be understandable not only to oneself but also to others." If we say that code is only for the development position, then more positions need to see the logs. As mentioned in the previous question, testing, operations, and even operations may need to view all or part of the logs and use them for their work.
Position | Purpose of Logs |
---|---|
Development | Debugging development, locating faults |
Testing | Basis for submitting bugs Understanding key points of program execution, designing more targeted test cases |
Operations | Basis for feedback on issues Determining and excluding basic issues (such as configuration errors causing the system to fail to start) |
Business Operations | Viewing specific user actions Helping resolve user disputes |
Therefore, the more people who can understand and utilize the logs, the better we should aim for.
1.3 What Are the Harms of Not Outputting Logs or Having Arbitrary Log Content?
Answer: All sorts of expected and unexpected problems will follow.
In conjunction with the above two questions, the answer to this question becomes obvious.
Position | Harm of Not Logging |
---|---|
Development | Difficult to locate issues during development-> Overtime for development |
Testing | Vague descriptions of submitted bugs-> Developers need to reproduce bugs-> Without logs, issues are hard to locate-> OvertimeTesting process entirely black-boxed -> Unable to design comprehensive test cases-> System crashes after going online-> Without logs, issues are hard to locate-> Overtime for both development and testing |
Operations | Unable to independently troubleshoot any issues-> Everything needs developer assistance-> Without logs, issues are hard to locate-> Overtime for both development and operations |
Business Operations | Users encounter issues while using the system-> Users do not remember or know what operations they performed-> Everything needs developer assistance-> Without logs, issues are hard to locate-> Developer overtime |
Through the above "three log questions," readers should understand the views presented in this article regarding logs. Next, let's take a closer look at how to output logs.
2. Log Format
Generally speaking, one line of log equals one line of text. Although logs themselves do not strictly require formatting, standardized and unified log output formats can not only facilitate reading but also make it easier for various log systems to collect and extract information.
2.1 Plain Text Logs
Plain text logs are the most common and simplest form of log output. In most programming languages, you can simply print
to output logs.
An advantage is that you can completely plan the format yourself, but the disadvantage is that it is not conducive to log systems collecting and extracting data.
A typical plain text log segment looks like this:
Text Only | |
---|---|
1 2 3 4 5 6 7 |
|
This log is mainly divided into two parts:
1. Automatically generated label section ([I] [2021-12-13 18:54:44] [web01] [TRACE-FD8CDE88] [ANONYMITY] [@ANONYMITY] [+0ms] [0ms]
)
2. Specific encoded output content section ([REQUEST USER] System Administrator (u-admin)
)
The main purpose of this log is to facilitate reading. Therefore, the log level ([I]
), timestamp ([12-13 18:54:44]
), and trace ID (TRACE-FD8CDE88
) have been shortened.
2.2 JSON Format Logs
Outputting logs in JSON format is generally done to facilitate collection by log systems.
The advantage is that it's convenient for log systems to collect, but the disadvantage is that it's less intuitive when reading directly.
Using the log from Section 2.1 as an example, the content in JSON format is as follows:
Text Only | |
---|---|
1 2 3 4 5 6 7 |
|
2.3 Plain Text Logs vs. JSON Format Logs
Both plain text logs and JSON format logs have their own advantages and disadvantages. The choice can be made based on the actual situation.
For example, if the scale of the developed system is small, the system is a monolithic application without complex processing and third-party log systems are not introduced, plain text logs are sufficient. For distributed systems, microservices scenarios, and environments where third-party log systems are already set up, consider using JSON format output.
If conditions permit, the log module of the business system can be modified to allow different log formats based on configurations. This allows plain text logs to be output locally during development without connecting to third-party log systems, while JSON format logs are output in production environments.
3. Log Labels
In the previous examples, it can be seen that each log entry has identical fields called "labels" in addition to the output content specified during coding. These labels can distinguish the source of each line of logs.
It can be said that logs without labels are almost worthless. For example, in Web applications, multiple users and requests access the system simultaneously. Consider the following two concurrent request scenarios with the output logs:
Text Only | |
---|---|
1 2 3 4 |
|
Through the traceId
, it is clear that the /users
interface returned 3 pieces of data, and the /books
interface returned 10 pieces of data.
Without the traceId
label, the logs would look like this:
Text Only | |
---|---|
1 2 3 4 |
|
Although the same content was output, due to the lack of traceId
, the logs are completely mixed together. Even when checking the logs, it is only possible to view each line of logs in isolation, unable to distinguish logs from the same request context. Thus, these logs are clearly not very useful.
Below are some commonly used log labels. Based on actual circumstances, different combinations can be chosen to add labels to each line of logs, making the output logs searchable, filterable, and trackable.
Importance | Field Name | Description |
---|---|---|
appName |
Application name | |
moduleName |
Module name | |
upTime |
System uptime since startup | |
Important | level |
Log level, such as: DEBUG , INFO , WARNING , ERROR |
levelShort |
Shortened log level, such as: D , I , W , E |
|
Important | timestamp |
Timestamp (seconds), such as: 1503936000 |
timestampMs |
Timestamp (milliseconds), such as: 1503936000000 |
|
Important | timestampHumanized |
Human-readable timestamp, such as: 2017-08-29 00:00:00 |
timestampHumanizedShort |
Simplified human-readable timestamp, such as: 08-29 00:00:00 |
|
hostname |
Hostname | |
Important | traceId / requestId |
Trace/request ID, can be UUID4, such as: FD8CDE88-78AB-47E4-886A-85681700520B |
traceIdShort / requestIdShort |
Shortened trace/request ID, can be the first segment of UUID, such as FD8CDE88 |
|
Important | userId |
User ID, unlogged users can be fixed as "ANONYMITY" |
Important | username |
Username, unlogged users can be fixed as "ANONYMITY" |
clientId |
Client ID, can be implemented via Cookies in browsers | |
clientIP |
Client IP address | |
diffTime |
Time interval between this log and the previous one (milliseconds), used to simply judge time-consuming code ranges | |
Important | costTime |
Time interval from the start of the request to this log (milliseconds), used to simply judge time-consuming code ranges |
3.1 Log Levels
The various log labels mentioned above are relatively easy to understand, but the "log level" requires special mention here.
Log levels are a part that can easily cause confusion. Some systems divide log levels arbitrarily, and sometimes they are filled out randomly, such as all logs being marked as DEBUG, or any "error" being marked as ERROR. This can lead to confusion when viewing and analyzing logs later, and can also interfere with third-party log systems.
To reasonably set the level of each log, first, we need to determine how many levels there are and the severity corresponding to each level.
# | Log Level |
---|---|
1 | Fatal FATAL |
2 | Critical CRITICAL |
3 | Error ERROR |
4 | Warning WARNING |
5 | Info INFO |
6 | Debug DEBUG |
7 | Trace TRACE |
Generally, all log levels we encounter are roughly the above seven types, but in specific different systems, the total number of levels and classification methods vary.
Too many levels can increase the burden on developers, and the content corresponding to each level does not have strict regulations. Here, we assume a Web system whose log module includes five levels (CRITICAL
, ERROR
, WARNING
, INFO
, DEBUG
) as an example to illustrate.
Critical CRITICAL
Completely unforeseen problems that cause processing to be unable to proceed, and may even be thrown to the runtime level causing the system to crash and restart, such as:
- Missing system-related components, packages, configuration files
- Expected data missing upon retrieval
- SQL statement execution failure
When such problems occur, if the system can still return an HTTP response, it usually returns a 5xx
server error.
Problems at this level are necessarily system bugs, and any issue belonging to this category needs immediate resolution.
Error ERROR
Problems that prevent processing from proceeding but do not cause the system to crash, and errors can be caught within the handling logic, such as:
- Calling third-party API errors
- Database connection failures
- Incorrect parameters for API calls
- User authentication token expiration leading to authentication failure
- Business-disallowed operations, such as duplicate data entries
When such problems occur, the system can usually still return an HTTP response, returning the same 5xx
server error.
Problems at this level generally originate from external systems and are not bugs in this system. Therefore, they may not be "fixed" but rather "optimized," such as:
- Adding retry mechanisms when calling third-party APIs fails
- Optimizing UI to avoid users entering illegal content or performing disallowed operations, etc.
Warning WARNING
Issues that are not really problems, such as:
- Accessing non-existent routes or data (i.e., 404, like
GET /favicon.ico
) - Various username, password, captcha errors
These issues often do not need to be fixed, just regular checks to see if the quantity surges. The source of these issues is often crawler systems, various health checks, vulnerability scans, but generally, the quantity is stable, rarely experiencing drastic fluctuations.
Info INFO
Simply recording prompt information, such as:
Text Only | |
---|---|
1 2 3 |
|
This level can be used in the test environment to track the testing process and can also be used after the system goes online for user data analysis. Of course, it can also be completely omitted.
Debug DEBUG
Information facing developers and testers, detailed variable contents are output in this information to help debug programs, the specific content can be referred to below.
4. Log Content
Log content refers to the output content specified during coding. This part of the content is relatively free, but there are generally some techniques that can be used.
For ease of understanding, the following example codes are all in Python, using print()
for output. The log content itself is not directly related to the specific language/framework.
4.1 Output Variables
Outputting the content of key variables is the most common practice when outputting log content. However, in actual projects, many people do not output logs very thoroughly.
A relatively complete example of variable output is as follows:
Python | |
---|---|
1 2 3 4 5 6 7 8 9 |
|
Output:
Text Only | |
---|---|
1 2 3 4 |
|
Output Variable Names
Variable names should always be output along with their values.
Many people habitually use something like print(my_var)
to output logs. When the number of logs is small, this might not be a big problem. However, when dealing with loops, multiple conditional branches, random numbers, etc., without variable names as prompts, it is easy to produce unreadable or incorrect results. For example:
Python | |
---|---|
1 2 3 4 5 6 7 8 9 |
|
Output:
Text Only | |
---|---|
1 2 3 |
|
In this case, without comparing each line of code one by one, it is impossible to know the value of each variable.
Output Variable Values Should Indicate Type
When outputting variable values, the easiest confusion arises from distinguishing between numeric 1
and string "1"
, boolean true
and string "true"
. This issue is most common in JavaScript and Python. Ignoring this problem sometimes makes the log content look very strange. For example:
Python | |
---|---|
1 2 3 4 5 |
|
Output:
Text Only | |
---|---|
1 2 |
|
In this case, the log shows that the
is_ok
variable is false, but it still enters the branch, which is very bizarre!
Taking Python as an example, there are several ways to indicate the type of variable values:
Python | |
---|---|
1 2 3 4 5 6 7 8 9 |
|
Output:
Text Only | |
---|---|
1 2 3 4 |
|
Output Strings Should Indicate Leading and Trailing Spaces
Another thing that can easily confuse people when outputting variable values is the leading and trailing spaces and invisible characters in strings. Ignoring this problem can also make the logs look very strange, such as:
Python | |
---|---|
1 2 3 4 5 6 7 |
|
Output:
Text Only | |
---|---|
1 2 |
|
In this case, the log shows that the
my_var
variable outputs"main"
, but it still enters another branch, which is very bizarre!
Taking Python as an example, there are several ways to indicate the leading and trailing spaces in strings:
Python | |
---|---|
1 2 3 4 5 |
|
Output:
Text Only | |
---|---|
1 2 |
|
Output Date and Time Should Include Timezone
During development, dates are also a major source of problems. Due to the different timezones configured on the local machine and the server, it is very easy to output ambiguous dates in logs, such as:
Python | |
---|---|
1 2 3 4 |
|
Output:
Text Only | |
---|---|
1 |
|
Although the output time is "correct," it is difficult to determine whether the time is Beijing time or UTC time, which is not conducive to troubleshooting.
For Python, outputting dates that comply with the ISO 8601 standard is very simple, such as:
Python | |
---|---|
1 2 3 |
|
Output:
Text Only | |
---|---|
1 |
|
4.2 Output Logs Before Calling Key Functions or Entering Branches
After understanding how to output variables, the next step is to consider which variables to output.
Generally, the judgment basis for key functions and branches should be output to determine the program flow and locate problems. On the other hand, a large number of intermediate variables in the program processing process do not need to be output and can be viewed using Debug single-step debugging during development, such as:
Python | |
---|---|
1 2 3 4 5 6 7 8 9 |
|
Output:
Text Only | |
---|---|
1 2 |
|
4.3 Control the Length of Each Line of Logs
Since outputting logs also consumes system resources, and overly long single-line logs themselves may have design issues, the maximum length of each line of logs should be controlled to avoid blindly outputting logs for "log completeness," such as:
Python | |
---|---|
1 2 3 |
|
Output:
Text Only | |
---|---|
1 |
|
Overly long logs not only consume system resources but are also difficult to read.
Therefore, for long content, only key parts or the beginning part should be output to avoid meaningless overly long logs, such as:
Python | |
---|---|
1 2 |
|
Output:
Text Only | |
---|---|
1 |
|
4.4 Handling Content with Line Breaks
Some log content inherently contains line breaks that we cannot control, such as the function call stack list that needs to be output when reporting errors. In JSON format, the entire content is in the message
field, which poses no problem, but if output in plain text, it will split the log content, such as:
Python | |
---|---|
1 2 3 4 5 6 7 8 9 |
|
Output:
Text Only | |
---|---|
1 2 3 4 |
|
At this point, whether reading directly, collected by a third-party log system, or simply filtering log files with the
grep
command, it is very easy to be interfered by line breaks.
At this point, the log content containing line breaks can be output line by line according to the line break symbol, ensuring that each line has "tags," such as:
Python | |
---|---|
1 2 3 4 5 6 7 8 9 |
|
Output:
Text Only | |
---|---|
1 2 3 4 |
|
Epilogue
Good log output not only helps oneself and colleagues in their work but is also an important part of system observability.
The methods and designs mentioned in this article may not apply to all systems, but they are provided here as reference recommendations, hoping to be helpful to everyone.