Suddenly slow database and sometimes losing connection

tommyd

New Member
We have a client with a virtual (MS Hyper-V) Windows server with our Progress software and our Progress database. The users are working directly on the server with RDP.

Everything went well until their hardware provider moved the virtual Windows server from one physical hard drive to another. They say that is the only thing they did.

Now the users say that our software is running slowly and that they sometimes lose the connection to the database (this connection loss is visible in the logs).

What can be the cause of this problem? At the moment we can't think of a solution.
 

Cringer

ProgressTalk.com Moderator
Staff member
What's the Progress version?

What is the difference in the IOPS of the two disks that were switched?
 

tommyd

New Member
What's the Progress version?

What is the difference in the IOPS of the two disks that were switched?
Thanks for your answer.

The version is 11.6.2

I don't know the answer to your last question. They said they moved it to a faster SSD. When I check Task Manager on the virtual server, the disk appears to be behaving normally at first glance.
 

TomBascom

Curmudgeon
It is not at all strange; this kind of thing happens all the time.

Assuming that they are telling the truth* and that the only change was to the disk, they have probably moved you to a slow SAN disk. It could be that they imagine that because the SAN has flash drives it will be fast. That's wrong. Flash drives in a SAN are not fast. They are at the wrong end of a cable, and Einstein showed, way back in 1905, that that is the wrong place to be putting fast storage.

Or, if this is a cloud deployment, they may have moved you to a disk with fewer provisioned IO ops.

Your best defense against these sorts of problems is to regularly track key performance indicators so that when things change you can demonstrate the before and after impacts.

ProTop has several "self defense" metrics that we created specifically for this purpose:
  • ioResponse - random read response time; the vast majority of OpenEdge IO is random reads
  • syncIO - synchronous write performance; this is what you experience when you do updates
  • BogoMIPs - single-core CPU throughput; like it or not, OpenEdge is aggressively single-threaded and this metric establishes how well the CPU cores are performing for you
These metrics all report what the OpenEdge application sees from end to end - not the narrowly focused and often misleading point of view of internal SAN instrumentation, virtual consoles, and the like. The big benefit comes when you have a history of these metrics and can correlate changes in the environment with user complaints. Being able to match "starting Monday, users are complaining that the system is slow" against syncIO dropping from 15 Mbps to 1 Mbps and ioResponse going from 0.5 ms to 4 ms, with spikes into the hundreds of ms, gives you the ammunition to tell the admins that their "innocuous" change was a failure. It also gives them some very straightforward targets instead of the usual wishy-washy "it's slow" stuff.

It bears repeating that these metrics are from the point of view of the OpenEdge application. The sysadmins can argue all day long that "Progress is doing it wrong" or whatever but, at the end of the day, this is what OpenEdge is seeing and this is what your users are experiencing.
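If you want a rough feel for what ioResponse- and syncIO-style measurements are getting at, below is a minimal Python sketch run on the database host. This is not ProTop and not how ProTop implements its metrics; it just illustrates the two kinds of measurement (random-read latency and fsync'd write throughput), and the paths are placeholders you would point at the volume that holds the database.

    import os, random, time

    def random_read_ms(path, samples=200, block=8192):
        # Average milliseconds to read one random block from an existing large file.
        # The OS file cache can flatter this number; use a file much larger than
        # RAM (a copy of a big database extent, say) for a more honest picture.
        size = os.path.getsize(path)
        fd = os.open(path, os.O_RDONLY)
        try:
            start = time.perf_counter()
            for _ in range(samples):
                os.lseek(fd, random.randrange(max(size - block, 1)), os.SEEK_SET)
                os.read(fd, block)
            return (time.perf_counter() - start) * 1000.0 / samples
        finally:
            os.close(fd)

    def sync_write_mbps(path, writes=100, block=64 * 1024):
        # MB/s achieved when every write is flushed to disk (fsync) - the kind of
        # behaviour you feel when you do updates.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        data = os.urandom(block)
        try:
            start = time.perf_counter()
            for _ in range(writes):
                os.write(fd, data)
                os.fsync(fd)
            return (writes * block) / (1024 * 1024) / (time.perf_counter() - start)
        finally:
            os.close(fd)
            os.remove(path)

    if __name__ == "__main__":
        # Placeholder paths - point them at the volume that holds the database.
        print("random read:", round(random_read_ms(r"D:\db\bigfile.dat"), 2), "ms")
        print("sync write :", round(sync_write_mbps(r"D:\db\_iotest.tmp"), 2), "MB/s")

Run something like this before and after a change such as this disk move and you have a simple before/after number of your own, even without a monitoring tool in place.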

* This leap of faith is often unwarranted, but without data to point out what they have neglected to mention, you generally have no choice but to take their word for it.
 

Cringer

ProgressTalk.com Moderator
Staff member
That sort of configuration is quite common, but it has its risks: for example, if client sessions die within transactions, in a shared memory configuration that can bring down the database.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Do you know anything about the storage? If they moved the VM to another "hard drive" but not to another physical host then presumably there is a separate storage tier, e.g. a SAN. I'd ask the provider to move the VM back to the old storage, or something equivalent, to see if the performance problem is still present.

Talk to your users and try to get a problem definition with more detail; "our software is working slow" won't cut it. Is it slow to do database I/O? Slow to execute non-I/O business logic? Slow to access the file system? Depending on the answers, you may find that you don't have a "slow database" at all. Document your findings and gather whatever quantitative information you can from logs or other sources; e.g. "report foo.p completed in 07:12 on 07/03/2022 but it completed in 25:37 on 07/10/2022; foo.p does the following work: ...".
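Since the connection losses are already visible in the logs, even a crude tally helps turn "sometimes" into numbers you can put in front of the provider. Here is a minimal Python sketch; it assumes the usual [yyyy/mm/dd@hh:mm:ss...] timestamp prefix on each .lg line, and the keywords in the pattern are only a guess that you should adjust to the exact messages you actually see:

    import re
    import sys
    from collections import Counter

    # OpenEdge .lg lines normally start with a timestamp such as
    # [2022/07/10@14:21:35.123+0200]; we only need the date part.
    TS = re.compile(r"^\[(\d{4}/\d{2}/\d{2})@")
    PATTERN = re.compile(r"disconnect|logout", re.IGNORECASE)  # assumed keywords

    def count_per_day(lg_path):
        per_day = Counter()
        with open(lg_path, errors="replace") as lg:
            for line in lg:
                if PATTERN.search(line):
                    m = TS.match(line)
                    if m:
                        per_day[m.group(1)] += 1
        return per_day

    if __name__ == "__main__":
        # Usage: python count_disconnects.py path\to\dbname.lg
        for day, n in sorted(count_per_day(sys.argv[1]).items()):
            print(day, n)

A per-day (or per-hour) count before and after the disk move is exactly the kind of quantitative evidence that is hard to argue with.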

Just because a particular configuration has been used for a long time, that doesn't mean it is a good practice. "We've always done that" has been used by lots of people for lots of years to justify not changing things; or, in some cases, to not be seen as being responsible for a change in case it doesn't work out. Change can feel scary but, when changes are managed properly and risks are mitigated appropriately, there is no need for alarm. And when the status quo is acknowledged to be bad, the worst thing to do is to not attempt to make things better. "Normal" is whatever users are accustomed to, and they can accept lots of unnecessarily bad things as normal.

So I suggest you revisit your database and client configurations, including the decision to run local ABL clients client/server rather than self-service.
 

TomBascom

Curmudgeon
The users are working directly on the server with RDP

This seems unlikely. Most RDP licenses only allow 2 users to connect. In theory you can buy more but I've yet to stumble across a customer who has done so. Much more likely is that the users are RDP'ing to a terminal server or something that launches a client/server connection.

Especially since you mention that they frequently lose their connection - if that were to happen "on the server" with a shared memory connection you would be regaling us with complaints about the db crashing.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
This seems unlikely. Most RDP licenses only allow 2 users to connect.
While unlikely, it is possible; I have seen it. Windows servers have two CALs by default but you can buy more. I have had clients try to economize by deploying a single server to run the front end and back end of an application, reasoning that the extra CALs are less expensive than a second server.

But for now we can only speculate. Seeing the actual configuration of the clients and database broker(s) (command lines, parameters, .pf contents) would shed more light on the situation.
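For what it's worth, the .pf (or client command line) usually makes the connection type obvious at a glance. Purely for illustration, with invented database, host, and service names:

    # Self-service (shared memory) connection; no -H/-S, the client must run on the database server:
    -db D:\db\appdb

    # Remote client/server connection through a login broker:
    -db appdb -H dbserverhost -S appdb-svc

If the clients connect with -H and -S they are client/server even if they happen to be running on the same box via RDP.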
 

tommyd

New Member
Thanks for the answers.

I asked the hardware company to recheck their settings.

What are the most important database settings that are most often the cause of a slow database (apart from this problem)?
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
If I am reading between the lines correctly, you work for an application vendor and you are not the person at that vendor who typically configures the OpenEdge database clients and brokers.

What are the most important database settings that are most often the cause of a slow database (apart from this problem)?

Configuring OpenEdge application clients and databases for optimal performance is not a simple process of "set all of the go-fast parameters". There is no one set of configuration parameters that is appropriate for all situations. Performance tuning is contextual. Doing it well means assessing the application solution holistically: taking into account database size and activity, structure, application topology, available server hardware resources and the other demands for those resources, how the application code is written, OS version and configuration, OpenEdge release capabilities, and more. And to be most effective, performance tuning can be a time-consuming, iterative process: evaluating baseline performance, making configuration changes, re-evaluating and assessing the effects of the changes, and repeating until the desired result is achieved.

To deploy applications that run well, you (i.e. your employer) need to acquire this performance-tuning knowledge. But that is a long-term process. To address short-term needs, you can buy this knowledge (hire an experienced, knowledgeable OpenEdge DBA) or rent it (engage with a consultant or DBA service provider). Of course, hiring can be challenging if you lack the internal resources to vet candidates properly and assess the work of the new hire.

The best I can offer in this forum is that if you post your database structure and the configuration settings of your database clients and brokers, we can offer some high-level feedback and advice. It would also be helpful to know the hardware/OS specifications of your database server machine. If you're not sure how to find all that information, please say so; we might be able to help you with that too.
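If it helps as a starting point, these are the usual places to pull that information from on the database server (the path and database name below are hypothetical):

    prostrct list D:\db\appdb          (writes the current structure description to appdb.st)
    proutil D:\db\appdb -C describe    (block size, enabled features, and so on)

The broker's startup parameters are also echoed into the database log (appdb.lg) each time the database is started, and promon can show many of them at runtime.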
 
I agree with Tom/Rob. To add to this: it is very rare for the performance of a database to degrade this dramatically over 1-2 days if no changes were made to the configuration. So why ask for any DB parameters?

Second point, has the new configuration been tested?

"until their hardware provider moved the virtual Windows server from one physical harddrive to another harddrive. They say that that is the only thing they did." Here it is suggested this was not done in a controlled manner. This points to poor IT procedures, relying on the myth of "faster SSD's". Such changes should have been tested and approved. It also points to lack of DR/backup processes. As soon as the change has been made it should have been immediately reverted. It seems like there was no such plan.

Basically - where was the Change Control process, who was responsible for it, and who signed it off?
 

tommyd

New Member
Thanks for the answers. And sorry for my late reply.

There was indeed a problem with the "new" SSD. We discovered it with the Windows "Resource Monitor". After a lot of discussion the hardware provider installed a new, faster SSD and now everything is OK.
 

dimitri.p

Member
As soon as the change was found to cause problems it should have been reverted immediately. It seems there was no such plan.

Basically - where was the Change Control process, who was responsible for it, and who signed it off?

Not everyone is in a position to slam the brakes on a "change" before it happens.

Also, you have to appreciate the vagueness surrounding hardware changes.
Why was the drive changed in the first place, what was it changed from, and, most importantly, if there was indeed "something" wrong with the drive, who the *&(ck validated it before switching live users onto it?

The OP got lucky; the customer/users only suffered a little.
The vendor, probably not so much.

We'll add it as one more story to the "New hardware, slower performance" chronicles.


Dimitri
 