moledro
Guest
I'm working on a data migration project involving a Progress OpenEdge program on a separate Windows Server. My current approach handles large datasets by segmenting them into batches and processing them through temporary tables. Each entity (call them 'units') is staged in these tables; for each unit, the program creates and works through a dedicated temporary table, which is emptied once processing completes before moving on to the next unit. Processing a unit involves executing SQL queries over an ODBC connection to an RDS database, then constructing JSON strings for POST requests to WebAPI endpoints, which in turn insert the data into a remote MS SQL database.
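To make the flow concrete, here is a rough sketch of the per-unit loop, written in Python rather than the actual ABL; the DSN, table, column names, and WebAPI URL are placeholders, not the real ones:

```python
import json

import pyodbc
import requests

# Placeholder connection string and endpoint, for illustration only.
RDS_DSN = "DSN=rds_source;UID=migration;PWD=secret"
WEBAPI_URL = "https://example.internal/api/units"

def migrate_batch(unit_ids):
    """Process one batch of units: query the source over ODBC, build JSON,
    and POST each unit to the WebAPI, which inserts into the remote MS SQL DB."""
    with pyodbc.connect(RDS_DSN) as conn:
        cursor = conn.cursor()
        for unit_id in unit_ids:
            # Stage the unit's rows (the ABL program does this in a dedicated
            # temp-table that is emptied after each unit).
            cursor.execute(
                "SELECT name, qty, price FROM source_units WHERE id = ?", unit_id
            )
            cols = [c[0] for c in cursor.description]
            rows = [dict(zip(cols, r)) for r in cursor.fetchall()]

            # Build the JSON payload and hand it to the WebAPI for insertion.
            payload = json.dumps({"unitId": unit_id, "rows": rows})
            resp = requests.post(
                WEBAPI_URL,
                data=payload,
                headers={"Content-Type": "application/json"},
                timeout=30,
            )
            resp.raise_for_status()
```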
This multistep process runs substantially faster when the WebAPI and the MS SQL database are local: almost five times faster than when they are remote. The system's complexity is compounded by the fact that subsequent steps need the IDs of newly inserted rows immediately, which prevents a straightforward switch to direct SQL inserts. Processing one unit may also trigger actions for many associated 'sub-units' (hundreds or thousands), each requiring individual handling in the workflow.
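The dependency looks roughly like this (again a Python sketch; the endpoints and response shape are made up). The generated parent ID has to come back from the WebAPI before any sub-unit can be sent, and each sub-unit is currently its own POST, so every sub-unit adds another sequential round trip:

```python
import requests

WEBAPI = "https://example.internal/api"   # placeholder endpoint

def insert_unit_with_subunits(unit, subunits):
    """Illustrates the ordering constraint: the parent's generated ID must come
    back from the WebAPI before any sub-unit can be inserted, and each sub-unit
    is a separate POST, i.e. one network round trip per row."""
    resp = requests.post(f"{WEBAPI}/units", json=unit, timeout=30)
    resp.raise_for_status()
    unit_id = resp.json()["id"]      # identity value assigned by MS SQL (assumed shape)

    for sub in subunits:             # hundreds or thousands per unit
        sub["unitId"] = unit_id      # child rows reference the new parent ID
        r = requests.post(f"{WEBAPI}/subunits", json=sub, timeout=30)
        r.raise_for_status()
    # With a remote WebAPI and database, total time grows roughly with
    # (1 + number of sub-units) * round-trip latency, which is where most of
    # the ~5x slowdown versus a local setup comes from.
```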
Given these intricacies and the pressing need to improve efficiency, I'm exploring architectural improvements or data-handling strategies that could streamline operations. Could reducing network latency, optimizing data transfer, or restructuring how data is batched and processed offer improvements? I also run data-cleaning functions within this process, which adds further complexity. I would greatly appreciate any insights or suggestions on managing and optimizing such a distributed system effectively.
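For concreteness, this is the kind of restructuring I'm wondering about: bundling a unit and all of its sub-units into a single request so the WebAPI could insert them together and return every generated ID in one response. The /units/bulk endpoint and response shape below are invented purely for illustration:

```python
import requests

WEBAPI = "https://example.internal/api"   # placeholder endpoint

def insert_unit_batched(unit, subunits):
    """One possible restructuring: ship the unit and its sub-units in a single
    request so the WebAPI can insert them in one transaction and return all the
    generated IDs at once, one round trip instead of one per sub-unit."""
    payload = {"unit": unit, "subUnits": subunits}
    resp = requests.post(f"{WEBAPI}/units/bulk", json=payload, timeout=120)
    resp.raise_for_status()
    # Assumed response shape, e.g. {"unitId": 42, "subUnitIds": [101, 102, ...]}
    return resp.json()
```

Whether something like this is feasible depends on how much the WebAPI and the downstream steps can be changed, which is part of what I'm trying to work out.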