
Performance Measures in Collabtive 2.1

Postby ChristianF » 13.03.2015, 16:26

Hello,

I've been looking at the update today, and I've noticed what you've done with the logger. Not sure how much that'll change though, as I'll have to run it through my profiler first. However, what I did notice was the following line from init.php:
Code: Select all
$template->compileAllConfig('.config', true);

That line is commented out in the 2.1 release, whereas it was active in the 2.0 release. Commenting it out in the 2.0 codebase shaved about 58% off my execution time, since Smarty no longer recompiled the config file on every page load. That reduced my load time from a bit over 4 seconds to around 2.

The remaining runtime seems to be mostly tied up in HTMLPurifier and the getProjectTasks() function, as evidenced by the following:
Code: Select all
Function                           #calls   Time       Own      avg     Location
getArrayVal()                      532      2,259.53   339.02   4.25    initfunctions.php:135
HTMLPurifier_Config::getAll()      1539     347.32     235.33   0.23    HTMLPurifier.standalone.php:1991
HTMLPurifier_PropertyList::get()   49761    131.16     74.76    0.00    HTMLPurifier.standalone.php:7948
HTMLPurifier_Config::get()         25137    180.48     48.09    0.01    HTMLPurifier.standalone.php:1884
HTMLPurifier_PropertyList::has()   49761    56.40      40.05    0.00    HTMLPurifier.standalone.php:7975
task::getTask()                    229      2,329.16   34.52    10.17   class.task.php:219

Where "Time" is the total time spent inside the function, including children, and "Own" only the time spent in the function itself. Rest should be pretty obvious. ;)
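
To make the relationship between those columns concrete, here is a small sketch that recomputes "avg" and derives the time spent in child calls for two rows of the listing. It assumes that "avg" is simply Time divided by #calls, which matches the figures shown; the helper name summarizeRow() is made up for illustration.

```php
<?php
// Hypothetical helper: derives the per-call average and the time spent in
// child calls from one profiler row.
// Assumption: avg = Time / #calls, and children = Time - Own.
function summarizeRow(int $calls, float $time, float $own): array
{
    return [
        'avg'      => round($time / $calls, 2),
        'children' => round($time - $own, 2),
    ];
}

// Figures copied from the profiler listing in this post.
$getTask     = summarizeRow(229, 2329.16, 34.52);
$getArrayVal = summarizeRow(532, 2259.53, 339.02);

printf("task::getTask(): avg %.2f, children %.2f\n", $getTask['avg'], $getTask['children']);
printf("getArrayVal():   avg %.2f, children %.2f\n", $getArrayVal['avg'], $getArrayVal['children']);
```

Read this way, task::getTask() does very little work itself; nearly all of its inclusive time is spent in the functions it calls.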

A lot of that time (about 150 ms) could be shaved off by consolidating queries by means of joins or subqueries, or simply by deleting unnecessary ones, instead of running several hundred, or even thousands, of queries against the DB. The query summary shows the scale:
Code: Select all
   All   (1842)   152.73 ms
   Select   (1841)   152.44 ms
   Update   (1)   0.29 ms

Note that many of those queries look like this:
Code: Select all
4   SELECT COUNT(*) FROM tasks WHERE project = 1 AND status = 1   0.07   1   OK
5   SELECT COUNT(*) FROM tasks WHERE project = 1 AND status = 0   0.07   1   OK      
6   SELECT customer FROM customers_assigned WHERE project = 1   0.10   N/A   OK      
7   SELECT * FROM company WHERE ID = 0   0.09   N/A   OK   

This repeats for each task list in the project, accounting for 78 of the queries listed above. Worse, though, are the queries associated with each task inside the task lists, as every task repeats the following set of queries:
Code: Select all
88   SELECT * FROM roles WHERE ID = 2   0.08   1   OK
89   SELECT * FROM tasks WHERE ID = 980   0.10   1   OK
90   SELECT name FROM projekte WHERE ID = 11   0.07   1   OK
91   SELECT name FROM tasklist WHERE ID = 179   0.08   1   OK
92   SELECT user FROM tasks_assigned WHERE task = 980   0.08   1   OK
93   SELECT * FROM user WHERE ID = 24   0.17   1   OK
94   SELECT role FROM roles_assigned WHERE user = 24   0.08   1   OK

If you have 200 tasks, that's 1400 queries there alone. Per pageview!
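
As a rough illustration of how that per-task set could collapse into a single statement, the sketch below joins the task, project, task list and assignee tables in one query. It is not the actual Collabtive code: an in-memory SQLite database stands in for MySQL, the schema is cut down to a few columns, and the tasks.liste column and the function name fetchTasksWithDetails() are assumptions made for the example.

```php
<?php
// Sketch only: in-memory SQLite stands in for the real MySQL database, and
// the cut-down schema (including the tasks.liste column) is an assumption.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE projekte (ID INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE tasklist (ID INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE user (ID INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE tasks (ID INTEGER PRIMARY KEY, title TEXT, project INTEGER, liste INTEGER, status INTEGER)');
$db->exec('CREATE TABLE tasks_assigned (user INTEGER, task INTEGER)');
$db->exec("INSERT INTO projekte VALUES (11, 'Website')");
$db->exec("INSERT INTO tasklist VALUES (179, 'Launch')");
$db->exec("INSERT INTO user VALUES (24, 'christian')");
$db->exec("INSERT INTO tasks VALUES (980, 'Write copy', 11, 179, 1), (981, 'Review copy', 11, 179, 1)");
$db->exec('INSERT INTO tasks_assigned VALUES (24, 980)');

// Hypothetical function: one query for all open tasks of a project, instead
// of several lookups per task. Unassigned tasks survive via the LEFT JOINs.
function fetchTasksWithDetails(PDO $db, int $project, int $status): array
{
    $sql = <<<SQL
SELECT t.ID AS task_id, t.title, p.name AS project_name,
       l.name AS list_name, u.ID AS user_id, u.name AS user_name
FROM tasks t
JOIN projekte p ON p.ID = t.project
JOIN tasklist l ON l.ID = t.liste
LEFT JOIN tasks_assigned a ON a.task = t.ID
LEFT JOIN user u ON u.ID = a.user
WHERE t.project = ? AND t.status = ?
ORDER BY t.ID
SQL;
    $stmt = $db->prepare($sql);
    $stmt->execute([$project, $status]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$rows = fetchTasksWithDetails($db, 11, 1);
foreach ($rows as $row) {
    printf("%d %s [%s / %s] %s\n", $row['task_id'], $row['title'],
        $row['project_name'], $row['list_name'], $row['user_name'] ?? 'unassigned');
}
```

One caveat with this shape: a task with several assignees comes back as several rows, so the caller would have to group rows by task ID.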

That said, the query count above would have been even higher had I not already started improving the query performance. So far I've only changed the project::getMyProjects() function, so that it looks like this:
Code: Select all
    function getMyProjects($user, $status = 1)
    {
        global $conn;

        $myprojekte = array();
        $user = (int) $user;
        $status = (int) $status;

        // 20150312: Use only _one_ query to fetch all information about projects. - christian@nax.no
        //          Also made SQL sort the projects, instead of having PHP do so afterwards.
        $sel = $conn->prepare("SELECT * FROM projekte WHERE ID IN(SELECT projekt FROM projekte_assigned WHERE user = ?) AND status=? ORDER BY `end` ASC");
        $selStmt = $sel->execute(array ($user, $status));

        while ($project = $sel->fetch()) {
            if (empty($project)) {
                continue;
            }

            if ($project["end"]) {
                $daysleft = $this->getDaysLeft($project["end"]);
                $project["daysleft"] = $daysleft;
                $endstring = date(CL_DATEFORMAT, $project["end"]);
                $project["endstring"] = $endstring;
            } else {
                $project["daysleft"] = "";
            }

            $startstring = date(CL_DATEFORMAT, $project["start"]);
            $project["startstring"] = $startstring;

            $project["name"] = stripslashes($project["name"]);
            $project["desc"] = stripslashes($project["desc"]);
            $project["done"] = $this->getProgress($project["ID"]);

            $companyObj = new company();
            $project["customer"] = $companyObj->getProjectCompany($project["ID"]);

            // 20150312: Use default array appending method, instead of merging arrays. - christian@nax.no
            $myprojekte[] = $project;
        }

        if (empty($myprojekte)) {
            return false;
        }

        return $myprojekte;
    }
ChristianF
 
Posts: 17
Joined: 13.03.2015, 10:18
Location: Norway

Re: Performance Measures in Collabtive 2.1

Postby Philipp » 13.03.2015, 18:23

ChristianF wrote:Hello,

I've been looking at the update today, and I've noticed what you've done with the logger. Not sure how much that'll change though, as I'll have to run it through my profiler first.


From the profiling we did, the change to the logger removes a lot of object instantiations and thus provides a pretty noticeable speedup.

I don't think there is a lot that can be done about the runtime of HTMLPurifier.
Philipp
Site Admin
 
Posts: 1118
Joined: 14.12.2007, 03:06
Location: Saarbrücken, Germany

Re: Performance Measures in Collabtive 2.1

Postby ChristianF » 16.03.2015, 14:51

I did some runs too, and noticed a pretty significant speedup: the same page went down to about 950 ms. Not great, but at least it's usable. So good job on that. :)
Though, personally I'd probably go for DI or at least a singleton pattern, instead of using the global keyword.
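
To show what I mean by DI here, a minimal sketch follows. The interface and class names (ActivityLog, MemoryLog, TaskService) are hypothetical and not taken from the Collabtive codebase; the point is only that the dependency is handed to the constructor instead of being pulled in with the global keyword.

```php
<?php
// Hypothetical classes for illustration; not Collabtive's actual code.
interface ActivityLog
{
    public function add(string $message): void;
}

// Simple in-memory implementation, handy for tests and for this example.
final class MemoryLog implements ActivityLog
{
    /** @var string[] */
    public $entries = [];

    public function add(string $message): void
    {
        $this->entries[] = $message;
    }
}

final class TaskService
{
    /** @var ActivityLog */
    private $log;

    // The log is injected once, rather than fetched from global scope
    // inside every method.
    public function __construct(ActivityLog $log)
    {
        $this->log = $log;
    }

    public function close(int $taskId): void
    {
        // ... the task itself would be updated here ...
        $this->log->add("closed task $taskId");
    }
}

$log = new MemoryLog();
$service = new TaskService($log);
$service->close(980);
print_r($log->entries);
```

Besides avoiding hidden globals, this makes it possible to swap in a different log implementation without touching the class that uses it.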

I agree on the runtime of HTMLPurifier: not much can be done there, at least not without a significant amount of effort put into rewriting it. That said, most of the remaining time seems to be spent fetching data from the database and processing it.
Code: Select all
Execution Time (918.618 ms)
PHP   469.07 ms   51 %
Database   428.83 ms   47 %
IO   0.58 ms   ~ 0 %
Network   20.13 ms   2 %

I would estimate that at least 350 ms of the DB response time could be shaved off simply by moving queries out of loops, so that we fetch all of the necessary (related) data in one query instead of running one (or more!) query per row, per table.
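
A sketch of what moving a query out of a loop could look like: collect the IDs first, then fetch every profile and its role in one statement with an IN list. As before, in-memory SQLite stands in for the real database, the schema is reduced to the columns needed, and fetchProfiles() is a made-up name rather than an existing Collabtive function.

```php
<?php
// Sketch only: in-memory SQLite and a cut-down schema are assumptions.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE user (ID INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE roles (ID INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE roles_assigned (user INTEGER, role INTEGER)');
$db->exec("INSERT INTO user VALUES (24, 'christian'), (25, 'philipp')");
$db->exec("INSERT INTO roles VALUES (2, 'Admin')");
$db->exec('INSERT INTO roles_assigned VALUES (24, 2), (25, 2)');

// Hypothetical function: loads profile and role for many users in one query
// and returns the rows keyed by user ID.
function fetchProfiles(PDO $db, array $userIds): array
{
    // Deduplicate, so each user is requested only once.
    $userIds = array_values(array_unique(array_map('intval', $userIds)));
    if (count($userIds) === 0) {
        return [];
    }

    // One "?" placeholder per ID keeps the statement fully parameterised.
    $placeholders = implode(',', array_fill(0, count($userIds), '?'));
    $stmt = $db->prepare(
        "SELECT u.ID AS user_id, u.name AS user_name, r.name AS role_name
         FROM user u
         LEFT JOIN roles_assigned ra ON ra.user = u.ID
         LEFT JOIN roles r ON r.ID = ra.role
         WHERE u.ID IN ($placeholders)"
    );
    $stmt->execute($userIds);

    $profiles = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $profiles[(int) $row['user_id']] = $row;
    }
    return $profiles;
}

// IDs as they might be collected from a list of tasks, duplicates included.
$profiles = fetchProfiles($db, [24, 25, 24]);
foreach ($profiles as $id => $profile) {
    printf("%d: %s (%s)\n", $id, $profile['user_name'], $profile['role_name']);
}
```

The loop over tasks would then look profiles up in the returned array instead of issuing a query per task.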

If we look at the per-function statistics, these are the top six functions, even when I sort by total inclusive time:
Code: Select all
Function                 #calls   Time     Own      avg      Location
{main}                   1        917.38   100.08   917.38
task::getTaskDetails()   293      158.06   54.87    0.54     class.task.php:693
task::getTask()          293      647.69   51.93    2.21     class.task.php:223
roles::getRole()         370      126.68   48.62    0.34     class.roles.php:294
roles::getUserRole()     370      227.07   41.27    0.61     class.roles.php:234
user::getProfile()       370      345.86   38.80    0.93     class.user.php:223

The only exception is tasklist::getProjectTasklists(), whose 2 calls put it in second place when ordered by total inclusive time. Its own runtime is only 6.33 ms, so nearly all of its runtime is spent in its child functions.

I'm going to look at this during the week, to see if I can refactor some of the queries. I've forked the project on GitHub, so it should be pretty easy to audit and merge the changes I make. :)
ChristianF
 
Posts: 17
Joined: 13.03.2015, 10:18
Location: Norway

Re: Performance Measures in Collabtive 2.1

Postby Eva » 18.03.2015, 12:31

If you could suggest concrete improvements, that would be great. :)
Please keep in mind that the loose coupling / high cohesion of the code should be kept during refactoring, as far as possible.
Eva
 
Posts: 1471
Joined: 01.01.2008, 23:31
Location: Saarbrücken, Germany

Re: Performance Measures in Collabtive 2.1

Postby ChristianF » 27.03.2015, 11:58

Working on it, though it's slow going, as I'm not working full time as a developer.
I'll be submitting the suggestions via git when I get that far. :-)

There are currently two theoretical solutions that could be implemented:
1. Implement a data storage (DS) object, with DI for persistence and access control, that holds the information retrieved from the DB. Once something has been fetched from the DB, it doesn't have to be fetched again; it's simply retrieved from the DS object. Basically a cache, in other words.
2. Rewrite some of the internal logic and condense the queries using JOINs. That means no more queries inside loops, or at least far fewer of them, and fewer function calls per page load.

There are benefits and drawbacks to each approach, and considering your wish to keep the current architectural style, the optimal solution might be a mix of the two: using the DS object to pre-fetch some necessary data, such as company data, user data and so forth, and condensing loops to fetch all associated tasks at once, referencing the DS object when associated data from other "sections" is required.
This will be a bit more memory intensive than the current solution, but should drastically reduce the number of DB queries.
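
To give an idea of the DS object, here is a minimal sketch under a few assumptions: the class name DataStore and its methods are invented for this post, the loader is injected as a plain callable, and the cache only lives for the duration of one request. A real version would also need invalidation whenever a row is written.

```php
<?php
// Hypothetical per-request data store; not part of Collabtive.
final class DataStore
{
    /** @var callable */
    private $loader;

    /** @var array rows already fetched, keyed by "table:id" */
    private $cache = [];

    // The loader (e.g. a function running the actual SELECT) is injected.
    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    // Returns the cached row, calling the loader only on the first request.
    public function get(string $table, int $id)
    {
        $key = "$table:$id";
        if (!array_key_exists($key, $this->cache)) {
            $this->cache[$key] = ($this->loader)($table, $id);
        }
        return $this->cache[$key];
    }

    // Lets a bulk query pre-fill the store, so later get() calls hit the cache.
    public function prime(string $table, int $id, array $row): void
    {
        $this->cache["$table:$id"] = $row;
    }
}

// Stand-in loader that counts how often the "database" is actually hit.
$queries = 0;
$store = new DataStore(function (string $table, int $id) use (&$queries) {
    $queries++;
    return ['ID' => $id, 'source' => $table];
});

// 200 tasks that all reference user 24 trigger a single load.
for ($i = 0; $i < 200; $i++) {
    $user = $store->get('user', 24);
}

// Rows from a JOINed bulk query can be primed without any extra load.
$store->prime('company', 3, ['ID' => 3, 'source' => 'bulk query']);
$company = $store->get('company', 3);

printf("Loader calls: %d\n", $queries);
```

The prime() method is where the two solutions meet: the condensed queries fill the store up front, and the rest of the code keeps asking for single rows as before.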
ChristianF
 
Posts: 17
Joined: 13.03.2015, 10:18
Location: Norway

