Server's slow disk speed.

Cecil

19+ years progress programming and still learning.
INFO:
OS: CentOS Linux 6.5, 64-bit

Development Server Kernel:

Kernel and CPU Linux 2.6.32-431.23.3.el6.x86_64 on x86_64
Processor information Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz, 8 cores
Memory: 8GB

Production Server Kernel:

Kernel and CPU Linux 2.6.32.46-xenU on x86_64
Processor information Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2 cores
Memory: 8GB


I was doing some performance testing of my development server, which is hosted on VMware ESXi; our production server is hosted "in the cloud" on Xen.

Our third-party VPS production server is roughly 3-4x slower, judging by the user experience, than my in-house development server, so I ran some hard disk speed tests comparing the two servers.

Development Server:
Code:
proenv>hdparm -Tt /dev/sda

/dev/sda:
Timing cached reads:   20386 MB in  1.99 seconds = 10221.18 MB/sec
Timing buffered disk reads: 262 MB in  3.02 seconds =  86.88 MB/sec

Production server at a 3rd party data centre:
Code:
proenv>sudo hdparm -Tt /dev/xvda1

/dev/xvda1:
Timing cached reads:   14404 MB in  1.98 seconds = 7256.50 MB/sec
Timing buffered disk reads:   6 MB in  3.22 seconds =   1.86 MB/sec

To me the VPS is considerably slower. Is the VPS acceptable for running an OpenEdge database, based on these test results? Or is this not a fair or accurate test to perform on a VPS, since Xen might have some sort of performance throttling enabled?

I know there are other things to consider, like CPU and memory, and we haven't even touched on database tuning. I just wanted to start with the basics before moving on to the database configuration.
 

TheMadDBA

Active Member
0) This tests sequential reads... which you will pretty much never do in Progress outside of maintenance activity like index rebuilds, dbanalys, backups (on newer versions), certain binary dumps and loads, etc. So it's probably not the test you want to run... BUT those production numbers are pathetic. You need to focus on the buffered disk reads (not the cached reads), and 1.86 MB/sec is slower than slow.

1) Yes, the performance of "cloud" disks is going to be way slower than your own disks. It's usually a combination of throttling, a cheap SAN, and shared controllers/disks/etc. You can pay more to get guaranteed IOPS (IO operations per second) and throughput (MB/sec), but a lot of cloud providers are still going to be painfully slow, and not much is going to beat a well-tuned, well-configured local set of disks.

I wouldn't install much of anything on that production hardware unless you had the smallest of databases with very few users. Any kind of maintenance is going to be a nightmare, and if the sequential IO is that bad, I wouldn't expect the random IO (which is most of what your Progress app will do) to be much better.

Try to find out what the random IO rates are like from the file system (not the raw device) and compare them between the two systems, making sure the tool uses your DB block size and not larger chunks. "bonnie" is probably your best bet.
 

Cecil

19+ years progress programming and still learning.
Thanks for that, MadDBA. I'm just doing some DB housecleaning, removing some old records, and the deletes are painfully slow. So my first conclusion is that the cloud is not all it's cracked up to be. Now I've got to figure out how, or where, to host an optimised OpenEdge DB in the cloud.

The main reason for hosting the production environment in the cloud/data centre is the real risk of earthquakes here in New Zealand and the need to maintain business continuity.
 

Cecil

19+ years progress programming and still learning.
So I ran an fio command to get some sort of server performance stats, but I had to stop the process because of its impact on the user experience. I'm not even sure whether these results are any good or not.

Code:
proenv>sudo ./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.0.9
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 4096MB)
^Cbs: 1 (f=1): [m] [26.9% done] [326K/126K /s] [81 /31  iops] [eta 04h:22m:11s]
fio: terminating on signal 2
Jobs: 1 (f=1): [m] [26.9% done] [0K/0K /s] [0 /0  iops] [eta 04h:22m:12s]
test: (groupid=0, jobs=1): err= 0: pid=17646: Tue Oct 28 12:32:18 2014
  read : io=848640KB, bw=149935 B/s, iops=36 , runt=5795864msec
  write: io=280832KB, bw=49616 B/s, iops=12 , runt=5795864msec
  cpu          : usr=0.02%, sys=0.09%, ctx=194264, majf=0, minf=23
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=212160/w=70208/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=848640KB, aggrb=146KB/s, minb=146KB/s, maxb=146KB/s, mint=5795864msec, maxt=5795864msec
  WRITE: io=280832KB, aggrb=48KB/s, minb=48KB/s, maxb=48KB/s, mint=5795864msec, maxt=5795864msec

Disk stats (read/write):
  xvda1: ios=214746/85890, merge=141/22243, ticks=220325230/126495080, in_queue=346826250, util=100.00%
 

Cecil

19+ years progress programming and still learning.
More VPS benchmarking results:

Code:
Benchmark Run: Tue Oct 28 2014 14:07:08 - 14:37:58
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       24989966.7 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     2692.8 MWIPS (9.9 s, 7 samples)
Execl Throughput                               1402.1 lps   (29.4 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        241317.6 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           62107.1 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        683264.3 KBps  (30.0 s, 2 samples)
Pipe Throughput                              341879.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  70643.0 lps   (10.0 s, 7 samples)
Process Creation                               2894.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2801.6 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    537.2 lpm   (60.1 s, 2 samples)
System Call Overhead                         289752.2 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   24989966.7   2141.4
Double-Precision Whetstone                       55.0       2692.8    489.6
Execl Throughput                                 43.0       1402.1    326.1
File Copy 1024 bufsize 2000 maxblocks          3960.0     241317.6    609.4
File Copy 256 bufsize 500 maxblocks            1655.0      62107.1    375.3
File Copy 4096 bufsize 8000 maxblocks          5800.0     683264.3   1178.0
Pipe Throughput                               12440.0     341879.6    274.8
Pipe-based Context Switching                   4000.0      70643.0    176.6
Process Creation                                126.0       2894.9    229.8
Shell Scripts (1 concurrent)                     42.4       2801.6    660.8
Shell Scripts (8 concurrent)                      6.0        537.2    895.3
System Call Overhead                          15000.0     289752.2    193.2
                                                                   ========
System Benchmarks Index Score                                         470.4

------------------------------------------------------------------------
Benchmark Run: Tue Oct 28 2014 14:37:58 - 15:08:41
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       49309540.2 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     5369.0 MWIPS (9.9 s, 7 samples)
Execl Throughput                               2554.1 lps   (29.1 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        396467.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          100357.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1129472.8 KBps  (30.0 s, 2 samples)
Pipe Throughput                              677243.3 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 137728.1 lps   (10.0 s, 7 samples)
Process Creation                               5110.2 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   3965.2 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    547.8 lpm   (60.1 s, 2 samples)
System Call Overhead                         554828.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   49309540.2   4225.3
Double-Precision Whetstone                       55.0       5369.0    976.2
Execl Throughput                                 43.0       2554.1    594.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     396467.5   1001.2
File Copy 256 bufsize 500 maxblocks            1655.0     100357.3    606.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    1129472.8   1947.4
Pipe Throughput                               12440.0     677243.3    544.4
Pipe-based Context Switching                   4000.0     137728.1    344.3
Process Creation                                126.0       5110.2    405.6
Shell Scripts (1 concurrent)                     42.4       3965.2    935.2
Shell Scripts (8 concurrent)                      6.0        547.8    913.0
System Call Overhead                          15000.0     554828.5    369.9
                                                                   ========
System Benchmarks Index Score                                         800.8
 

TheMadDBA

Active Member
No, the cloud isn't all that it is cracked up to be... especially at the lower price points. Co-location (putting your own hardware in a datacenter) is another option. Or just let the cloud be your disaster-recovery site and expect things to be horrible when that happens.

Or you are going to have to pay more for faster storage. I know Amazon has options to get more IOPS for more money per month, but I have not been impressed once you get past the cache.
 

Cecil

19+ years progress programming and still learning.
I raised a support question with my current VPS provider about whether I could somehow increase the IOPS, and the solution they came up with was a dedicated server at about 4-5x the cost. Doh! I was also told that a table with 65,000 records was HUGE and would take a very long time to process. He also recommended adding an index to the table, which it already has.

As a quick alternative solution, I've started adding ETag headers to the HTTP responses for the AJAX JSON requests. When the header record changes in the DB, I generate a new random ETag value.
When the client's browser makes a conditional GET request, WebSpeed compares the client's ETag against the server's: it either sends a 304 Not Modified response or sends the JSON result along with the updated ETag header.

The overall CLIENT-->SERVER-->CLIENT round trip has now dropped from 2.2 seconds to 14 milliseconds, greatly improving the UX. :)
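The WebSpeed side of the check is only a few lines. A rough sketch (the table and field names are invented for illustration, and I'm assuming the standard get-cgi / output-http-header helpers):

Code:
{src/web/method/cgidefs.i}

DEFINE VARIABLE cClientEtag AS CHARACTER NO-UNDO.
DEFINE VARIABLE cServerEtag AS CHARACTER NO-UNDO.

/* the ETag the browser sent with its conditional GET */
cClientEtag = get-cgi("HTTP_IF_NONE_MATCH":U).

/* the current ETag stored against the header record (hypothetical table/field) */
FIND FIRST ReportHeader NO-LOCK NO-ERROR.
IF AVAILABLE ReportHeader THEN
    cServerEtag = ReportHeader.ETagValue.

IF cServerEtag > "" AND cClientEtag = cServerEtag THEN DO:
    /* nothing changed: tell the browser to reuse its cached copy */
    output-http-header("Status":U, "304 Not Modified":U).
    output-http-header("":U, "":U).    /* terminate the header block */
END.
ELSE DO:
    output-http-header("ETag":U, cServerEtag).
    output-content-type("application/json":U).
    /* ... build and {&OUT} the JSON result here ... */
END.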
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
A table with 65,000 records is not huge. It's small. Even a table scan shouldn't take more than a few seconds.

If your index components match your query predicates then your queries should be very efficient. But how long they run will depend on other factors like your IOPS.
 

Cecil

19+ years progress programming and still learning.
A table with 65,000 records is not huge. It's small. Even a table scan shouldn't take more than a few seconds.

If your index components match your query predicates then your queries should be very efficient. But how long they run will depend on other factors like your IOPS.

Yeah, I totally agree: 65,000 records is not huge. I've double-checked my XREF and listings to make sure the indexes are correct. I think the general issue is just that my VPS is not performing as well as I'd like.
 

TheMadDBA

Active Member
Sounds like you have a bad vendor if they think 65,000 records is anywhere approaching "huge", and based on the disk stats you have provided, the disks are indeed really slow.

All that being said... those 65k records should be in memory (either -B/-B2/OS cache) and you probably aren't reading them from disk.

Confused about what you are actually doing that takes 2.2 seconds though... is it getting all records from that table, or just some?

How long does the DB portion take compared to the network portion?

Also... XREF lies quite often... or rather, it sometimes doesn't provide enough information. The _TableStat and _IndexStat (or _UserTableStat/_UserIndexStat) VSTs are your friends and will tell you exactly how many records a query is reading.
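Something like this (a sketch; it assumes -tablerangesize covers your schema and the default -basetable of 1) will show record reads per table for the whole database:

Code:
/* record reads per table since the DB started (or the stats were zeroed) */
FOR EACH DICTDB._TableStat NO-LOCK:
    FIND DICTDB._File NO-LOCK
         WHERE _File._File-Number = _TableStat._TableStat-id NO-ERROR.
    IF AVAILABLE _File AND _TableStat._TableStat-read > 0 THEN
        DISPLAY _File._File-Name          FORMAT "x(32)" LABEL "Table"
                _TableStat._TableStat-read               LABEL "Reads".
END.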
 

Cecil

19+ years progress programming and still learning.
Sounds like you have a bad vendor if they think 65,000 records is anywhere approaching "huge", and based on the disk stats you have provided, the disks are indeed really slow.

All that being said... those 65k records should be in memory (either -B/-B2/OS cache) and you probably aren't reading them from disk.
I am consistently getting a 99% buffer hit rate every time I run the query.

Confused about what you are actually doing that takes 2.2 seconds though... is it getting all records from that table, or just some?
The queries aggregate one year of records (65,000), breaking them down by the relevant sort orders and accumulating the results: BY Calendar Month, BY Batch, BY Weekday, BY Hour, BY Gender/Generation, all with the appropriate indexes (one of the breakdowns is sketched below). I'm pretty confident there is nothing wrong with the code, as it's blisteringly fast on my development server, though that's not really a fair comparison.
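Roughly, the month breakdown looks like this (simplified sketch; the field names are made up):

Code:
/* simplified sketch of the per-month aggregation; field names are illustrative */
FOR EACH BatchOutcomes NO-LOCK
    WHERE BatchOutcomes.OutcomeDate >= 10/01/2013
      AND BatchOutcomes.OutcomeDate <  10/01/2014
    BREAK BY MONTH(BatchOutcomes.OutcomeDate):

    ACCUMULATE BatchOutcomes.Amount
        (TOTAL BY MONTH(BatchOutcomes.OutcomeDate)).

    IF LAST-OF(MONTH(BatchOutcomes.OutcomeDate)) THEN
        DISPLAY MONTH(BatchOutcomes.OutcomeDate)           LABEL "Month"
                (ACCUM TOTAL BY MONTH(BatchOutcomes.OutcomeDate)
                       BatchOutcomes.Amount)               LABEL "Total".
END.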

How long does the DB portion take compared to the network portion?
The DB portion takes the longest part of the end-to-end process, leaving only milliseconds for the network transfer, giving an average total time of 2.2 seconds.

Also... XREF lies quite often... or rather, it sometimes doesn't provide enough information. The _TableStat and _IndexStat (or _UserTableStat/_UserIndexStat) VSTs are your friends and will tell you exactly how many records a query is reading.
Is there a KB article I can use to get some code to test with?

I've just done a quick timing and noticed a performance improvement; this must be due to the current load on the server and the time of day (Sunday evening). I'll see if I can move the VM to a machine with fewer guests. Apparently they will only allow 10-20 guests per CPU, which sounds pretty good to me.

Server-side count & timings:
Record count: 64,575
Process time (ms): 1,284
Record rate (records/ms): 50.2920560748

Client-side timings (using Firebug):
[attached screenshot: SpeedTest.PNG]
 


TheMadDBA

Active Member
Those numbers don't sound very impressive at all. 65k records from cache should be pretty quick on any modern system. But disk performance doesn't seem to be the problem for this query.

This is a decent tool to look at User VST activity: http://www.oehive.org/project/DBActMon

You need to make sure you set -tablerangesize and -indexrangesize appropriately. Also, the profiler is a good tool to find exactly where your code is spending its time.
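Turning the profiler on around the suspect code is only a few lines. A minimal sketch (the output path and procedure name are placeholders):

Code:
/* wrap the suspect code with the ABL profiler */
ASSIGN
    PROFILER:ENABLED     = TRUE
    PROFILER:DESCRIPTION = "WebSpeed query timing"
    PROFILER:FILE-NAME   = "/tmp/query.prof"    /* arbitrary output file */
    PROFILER:PROFILING   = TRUE.

RUN runTheSlowQuery.    /* hypothetical procedure under test */

ASSIGN
    PROFILER:PROFILING = FALSE
    PROFILER:ENABLED   = FALSE.
PROFILER:WRITE-DATA().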
 

Cecil

19+ years progress programming and still learning.
Those numbers don't sound very impressive at all. 65k records from cache should be pretty quick on any modern system. But disk performance doesn't seem to be the problem for this query.

This is a decent tool to look at User VST activity: http://www.oehive.org/project/DBActMon

You need to make sure you set -tablerangesize and -indexrangesize appropriately. Also, the profiler is a good tool to find exactly where your code is spending its time.

Hey, thanks for that. I've just used DBActMon, but I need some clarification: are the metric values displayed the number of records read? The reason I ask is that within one WebSpeed GET request I read the WebSession table 17,922 times??? I need to investigate that code further.

[attached screenshot: ScreenShot121.png]


Also, the table I'm expecting to see, BatchOutcomes, never shows up.
 


Rob Fitzpatrick

ProgressTalk.com Sponsor
The number displayed in "Session Total" is from _UserTableStat._UserTableStat-read, which is the number of records read by that client session since (a) it logged in or (b) the stats were last zeroed out.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Also, the table I'm expecting to see, BatchOutcomes, never shows up.
Are you sure you have set -tablerangesize high enough for your schema? If you have any tables whose _file-number is greater than (tablerangesize + basetable - 1) then you won't have any CRUD stats for them in _TableStat or _UserTableStat.

Note that your highest application table number isn't necessarily equal to your application table count; it could be higher.

Same concept applies to indexes.
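A quick way to check is to look up your highest application table number (sketch):

Code:
/* highest application table number; -tablerangesize must be at least
   (highest number - basetable + 1) to capture stats for every table */
FOR EACH DICTDB._File NO-LOCK
    WHERE _File._File-Number > 0
      AND _File._File-Number < 32768
    BY _File._File-Number DESCENDING:
    DISPLAY _File._File-Name _File._File-Number.
    LEAVE.
END.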
 

Cecil

19+ years progress programming and still learning.
Are you sure you have set -tablerangesize high enough for your schema? If you have any tables whose _file-number is greater than (tablerangesize + basetable - 1) then you won't have any CRUD stats for them in _TableStat or _UserTableStat.

Note that your highest application table number isn't necessarily equal to your application table count; it could be higher.

Same concept applies to indexes.
Whoops, I missed that bit about -tablerangesize and -indexrangesize. Thanks.
 

Cecil

19+ years progress programming and still learning.
Something interesting has turned up when using the DBActMon code/utility. I've noticed that the record count is double what I was expecting. It's almost as if the procedure that runs the query is called twice.
64,575 x 2 = 129,150

So I zero the values by pressing the spacebar, execute the web query in my browser, and press the spacebar again to refresh the values. Most likely the problem is situated between the back of the chair and the keyboard.


[attached screenshot: ScreenShot123.png]
 


TheMadDBA

Active Member
It could also be sorting that causes the extra reads... you can compile with an XREF and look for SORT-ACCESS.

Also you can look at the index view to make sure it is using the index you think it should be... if you see half the number of reads in the index view compared to the table view, that means ROWID/RECID lookups are happening: either because of a sort (Progress sorts the records and then fetches them by ROWID) or because you are upgrading the lock or finding a buffer.

EDIT: Also, also... you can steal the VST code from the tool and integrate it into your WebSpeed call for more fine-tuned stats gathering.
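For the SORT-ACCESS check, something like this works (sketch; the program name is made up):

Code:
/* compile with XREF, then scan the listing for SORT-ACCESS entries */
DEFINE VARIABLE cLine AS CHARACTER NO-UNDO.

COMPILE web/report.p XREF "report.xref".    /* hypothetical program */

INPUT FROM VALUE("report.xref").
REPEAT:    /* ends at end-of-file */
    IMPORT UNFORMATTED cLine.
    IF INDEX(cLine, "SORT-ACCESS") > 0 THEN
        DISPLAY cLine FORMAT "x(76)".
END.
INPUT CLOSE.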
 

TheMadDBA

Active Member
The number displayed in "Session Total" is from _UserTableStat._UserTableStat-read, which is the number of records read by that client session since (a) it logged in or (b) the stats were last zeroed out.

That is correct... the "Last Sample" column shows reads since you last hit the spacebar or since 60 seconds have elapsed.
 