Forum Post: Future changes of db log files

Status
Not open for further replies.
G

George Potemkin

Guest
If I understand correctly it was announced that Progress is planning to use two database log files: one for login/logout messages and another for the rest. I understand the reasons behind this idea but I'm not sure if it will make me happy. When I see an error in db log I got form a customer the first thing I'm trying to create is the list of /all/ messages issued by the same session - including the login/logout messages. I'm sure that having two log files for an incident a customer will send to TS the smallest (and, of course, the most important) file - the one without logins/logouts. Would it be enough for the most cases? I'm not sure. What are the problems you have while parsing the current structure of db logs? What are the changes in db logs would you wish to get in the future Progress releases? Future Progress releases seem to be going to use more than 4 billions messages. ;- ) At least _DbParams-Msg-Num introduced in V11.5 is INT64. So we should get ready for such "verbose" releases (with the hundreds GBs promsgs file!). ;-) For information: the version 11.5.1 has only 17,919 messages (the promsgs' size is 1.6MB) and each new release adds approximately 300-400 messages. The most annoying (IMHO) message is # 14658 (# 5512 in the past): Previous message sent on behalf of user number , server pid pid , broker pid pid . Instead of this message I would prefer to have a new column in the log that should report the number of user the message is talking /about/. The current log structure has the column with the number of user who /writes/ the message to the log file. It's a server for the remote clients. But the "subject-object" relations also exist between the self-service processes. For example: WDOG 249: (5028) SYSTEM ERROR: Releasing regular latch. latchId: 4 WDOG 249: (5028) SYSTEM ERROR: Releasing regular latch. latchId: 1 WDOG 43: (2522) User 249 died holding 2 shared memory locks. In first two messages the watchdog "stole the identity" of dead session (instead of its own user number it used the user number of dead session). It's not a big issue when you read by the eyes a small fragment of db log but it might result in the missed messages when you need to parse the millions lines in the log files. Transaction backout messages do not have any information about the user who owns transaction (and about TRID as well): Begin transaction backout. (2252) Transaction backout completed. (2253) The session can be already dead at this moment but its number is still reserved in connection table. User num died holding num shared memory locks. (2522) User num died with num buffers locked. (2523) vs User num died holding num shared memory locks. (5026) User num died with num buffers locked. (5027) The only difference between these messages is /who/ writes them: watchdog or "dead" user itself who was in fact "a bit alive": ABL 10: (5027) User 10 died with 1 buffers locked. I'm not suggesting to change all existent messages (like Progress did for the messages with RECID in 11.4 to support table partitioning) but new column could be used at least instead of 14658's and by the future messages. It would be nice if Progress will use the numbers for all messages without exceptions. Not like: SRV 1: (-----) Received RECONNECT from WTB Promsgs file is not the only source of Progress messages, some messages are hard coded into the executables - namely the ones introduced in the patches. Later (in new releases) Progress is moving these messages to the promsgs file but sometimes it forgets to change the executables. And the difference between the real message text in db log and its "official" version in promsgs file (and in Help) may give the wrong results while scanning log files by message text. Progress uses the message types like "F" for "fatal" messages: F ABL 10: (5027) User 10 died with 1 buffers locked. F Usr 97: (1124) SYSTEM ERROR: Wrong dbkey in block. Found 57689, should be 9787072 in area 19. F RFUTIL : (4524) SYSTEM ERROR: File /backup/xxxx too small 0, blocksize 16384 extend failed. But some messages that are not fatal for /database/ are also marked as "F": F ABL 2113: (14675) SYSTEM ERROR: Attempt to define too many indexes for area 6 database DBI16zaSWw. (40) F RFUTIL : (835) ** Cannot disable after-imaging, it is not enabled. F RFUTIL 46: (836) ** Cannot do AIMAGE NEW, after-imaging is not enabled. The "W" type is rare: W ABL 1228: (5408) WARNING: -l exceeded. Automatically increasing from 4482 to 4492. (5408) The most messages are reported as type "I" and I did not see other message types. In other words the message types so far are not consistently used by Progress and we can't rely on the message types. For example, Progress could add, "C"(-onnect) type instead of moving the connection messages to a separate files. It would be easy to filter out the "unwanted" messages provided they are correspondingly tagged. I would like to have the less frequently re-used user number - similar to PID in the OS: it's unique at any given moment, it can be re-used later but in fact even on very busy systems it will be re-used once per hour or so. The current limit (32K) is enough for user numbers. The user number can be still related to a slot in the user control table (a.k.a. the connection table) e.g. as modulo by table's size. But if a session crashed and the same end-user almost immediately started new session then it's very likely new session will get the number that was just released by previous session. We need to be careful not to mess the messages from these two sessions especially for remote sessions where we don't know their PIDs. It would be great if db logs will report the PIDs for remote sessions. Now we have only the following information: SRV 1: (742) Login usernum 947, userid root client type WTA , on xxxxx batch using TCP/IP IPV4 address 192.xxx.xxx.xx. SRV 1: (708) Userid is now XXXXX. SRV 1: (14658) Previous message sent on behalf of user 948, server pid 28017, broker pid 24771. (5512) This information (userid on OS level, client's host, client's type ) does not allow us reliably identify all connections of the same session in the logs of all databases the session was connected to. Even the connection time is useless, for example, in case of the pre-started appserver's sessions. And I'd wish that all Progress logs have /exactly/ the same format. So we can use the same tool to parse all of them. Do you have something to add to the list? Best regards, George

Continue reading...
 
Status
Not open for further replies.
Top