
Switch list implementation to more effective one #7898

Draft · ZyX-I wants to merge 22 commits into master from list-reimplement
Conversation

@ZyX-I (Contributor) commented Jan 23, 2018

Pretty sure it will end up being a dynamically allocated array, but some things are still undecided. I have a nice collection of data (in Russian, though) for an article about list implementations in different languages (VimL, Python, Lua(JIT), Perl, MoarVM (Perl 6), Ruby, PHP). Most of them (except VimL) use reallocated arrays, but the details differ a lot (especially PHP, which shares structures and code with its hash table; Lua(JIT) also has its own kind of hash/array hybrid).

TODO:

  • Replace list implementation with something better.
  • Use benchmarks to confirm that new list implementation is actually faster.
  • Compare memory footprints on actual configuration with new and old code.
  • Update unit tests.
  • Create new functional and/or unit tests operating on larger lists: specifically, lists which are at least twice and at least four times larger than the initial array size.
  • Create new functional and/or unit tests exercising GC on containers with deeper nesting: some GC errors I encountered occurred only after the second reallocation of the GC stack vector.

@marvim marvim added the WIP label Jan 23, 2018
@ZyX-I ZyX-I force-pushed the list-reimplement branch 2 times, most recently from 567ccfa to 6d48c33 on January 27, 2018 at 00:18
@ZyX-I (Contributor, Author) commented Jan 27, 2018

Some stats out of @justinmk's log:

---
allalloc: 8920
allappend: 1262
allendfind: 280
allfind: 1105
allfindcachemiss: 1091
allfreelist: 8907
allknown_alloc_int: 0
allmov: 0
allremov: 0
allrevers: 59
alls10init: 0
allsinit: 9904
allsort: 60
allstartfind: 317
allstatic_alloc: 9904
allunknown_alloc: 6
append: {avg: 0.0668503019387647, len: 18878, max: 41, med: 0.0, min: 0}
drop: {avg: 0.07193558639686408, len: 18878, max: 59, med: 0.0, min: 0}
dropsiz: {avg: 1.0, len: 1358, max: 1, med: 1.0, min: 1}
find: {avg: 0.05853374298124801, len: 18878, max: 13, med: 0.0, min: 0}
findcachediff: {avg: 0.8816568047337278, len: 169, max: 1, med: 1, min: -1}
findcachemiss: {avg: 0.05779213899777519, len: 18878, max: 6, med: 0.0, min: 0}
findenddiff: {avg: 4.664253393665159, len: 1105, max: 8, med: 8, min: 0}
findstartdiff: {avg: 0.7809954751131222, len: 1105, max: 4, med: 1, min: 0}
first: {avg: 1.0343256700921708, len: 18878, max: 91, med: 0.0, min: 0}
freecont: {avg: 8907.0, len: 1, max: 8907, med: 8907, min: 8907}
freecontsiz: {avg: 2.484562703491636, len: 8907, max: 53, med: 2, min: 0}
freelistsiz: {avg: 0.0, len: 8907, max: 0, med: 0, min: 0}
insert: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
iter: {avg: 0.24811950418476533, len: 18878, max: 66, med: 0.0, min: 0}
iterconst: {avg: 0.00031783027863121094, len: 18878, max: 1, med: 0.0, min: 0}
known_alloc_len: {avg: 2.1035372144436257, len: 2714, max: 100, med: 2.0, min: 0}
last: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
len: {avg: 1.3253522618921496, len: 18878, max: 92, med: 0.0, min: 0}
mov: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
remov: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
revers: {avg: 0.0031253310732069074, len: 18878, max: 1, med: 0.0, min: 0}
sort: {avg: 0.003178302786312109, len: 18878, max: 1, med: 0.0, min: 0}
startinsert: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
unknown_alloc_len: {avg: 2.856670341786108, len: 5442, max: 53, med: 1.0, min: 0}
unsure_alloc_len: {avg: 2.503957783641161, len: 758, max: 5, med: 3.0, min: 0}

Full output: https://gist.github.com/e2745696b5c5c60e4d694fbb51c9659e.

@ZyX-I (Contributor, Author) commented Jan 27, 2018

Updated after the latest commit:

---
allalloc: 8920
allappend: 1262
allendfind: 280
allfind: 1105
allfindcachemiss: 1091
allfreelist: 8907
allknown_alloc_int: 0
allmov: 0
allremov: 0
allrevers: 59
alls10init: 0
allsinit: 9904
allsort: 60
allstartfind: 317
allstatic_alloc: 9904
allunknown_alloc: 6
append: {avg: 0.0668503019387647, len: 18878, max: 41, med: 0.0, min: 0}
drop: {avg: 0.07193558639686408, len: 18878, max: 59, med: 0.0, min: 0}
dropsiz: {avg: 1.0, len: 1358, max: 1, med: 1.0, min: 1}
find: {avg: 0.05853374298124801, len: 18878, max: 13, med: 0.0, min: 0}
findcachediff: {avg: 0.8816568047337278, len: 169, max: 1, med: 1, min: -1}
findcachemiss: {avg: 0.05779213899777519, len: 18878, max: 6, med: 0.0, min: 0}
findenddiff: {avg: 4.664253393665159, len: 1105, max: 8, med: 8, min: 0}
findstartdiff: {avg: 0.7809954751131222, len: 1105, max: 4, med: 1, min: 0}
first: {avg: 1.0343256700921708, len: 18878, max: 91, med: 0.0, min: 0}
freecont: {avg: 8907.0, len: 1, max: 8907, med: 8907, min: 8907}
freecontsiz: {avg: 2.484562703491636, len: 8907, max: 53, med: 2, min: 0}
freelistsiz: {avg: 0.0, len: 8907, max: 0, med: 0, min: 0}
insert: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
iter: {avg: 0.24811950418476533, len: 18878, max: 66, med: 0.0, min: 0}
iterconst: {avg: 0.00031783027863121094, len: 18878, max: 1, med: 0.0, min: 0}
known_alloc_len: {avg: 2.1035372144436257, len: 2714, max: 100, med: 2.0, min: 0}
last: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
len: {avg: 1.3253522618921496, len: 18878, max: 92, med: 0.0, min: 0}
lengthdiff: {avg: 2.6475207538703165, len: 8914, max: 100, med: 2.0, min: 0}
maxlength: {avg: 1.2853883701188455, len: 18848, max: 100, med: 0.0, min: 0}
minlength: {avg: 0.003477675566524568, len: 8914, max: 5, med: 0.0, min: 0}
mov: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
remov: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
revers: {avg: 0.0031253310732069074, len: 18878, max: 1, med: 0.0, min: 0}
sort: {avg: 0.003178302786312109, len: 18878, max: 1, med: 0.0, min: 0}
startinsert: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
unknown_alloc_len: {avg: 2.856670341786108, len: 5442, max: 53, med: 1.0, min: 0}
unsure_alloc_len: {avg: 2.503957783641161, len: 758, max: 5, med: 3.0, min: 0}

Stats from my “typical editing session” (though it yields a 69 MiB log after something like five minutes):

---
allalloc: 31952
allappend: 31387
allendfind: 10840
allfind: 22234
allfindcachemiss: 16109
allfreelist: 25144
allknown_alloc_int: 361
allmov: 722
allremov: 0
allrevers: 0
alls10init: 0
allsinit: 28976
allsort: 1
allstartfind: 14108
allstatic_alloc: 28976
allunknown_alloc: 13
append: {avg: 0.5151405734543485, len: 60929, max: 870, med: 0, min: 0}
drop: {avg: 0.6805133844310591, len: 60929, max: 869, med: 0, min: 0}
dropsiz: {avg: 1.0278320430263126, len: 41463, max: 98, med: 1, min: 1}
find: {avg: 0.3649165422048614, len: 60929, max: 1739, med: 0, min: 0}
findcachediff: {avg: 0.484904738641915, len: 10235, max: 97, med: 0, min: -29}
findcachemiss: {avg: 0.2643896994862873, len: 60929, max: 1337, med: 0, min: 0}
findenddiff: {avg: 4.397994063146532, len: 22234, max: 117, med: 1.0, min: 0}
findstartdiff: {avg: 1.8097958082216425, len: 22234, max: 97, med: 0.0, min: 0}
first: {avg: 4.671896797912324, len: 60929, max: 3702, med: 0, min: 0}
freecont: {avg: 25144.0, len: 1, max: 25144, med: 25144, min: 25144}
freecontsiz: {avg: 5.380806554247534, len: 25144, max: 1542, med: 1.0, min: 0}
freelistsiz: {avg: 0.0, len: 25144, max: 0, med: 0.0, min: 0}
insert: {avg: 0.032644553496692874, len: 60929, max: 795, med: 0, min: 0}
iter: {avg: 0.15700241264422524, len: 60929, max: 80, med: 0, min: 0}
iterconst: {avg: 0.0033317467872441696, len: 60929, max: 1, med: 0, min: 0}
known_alloc_len: {avg: 15.274523007856342, len: 8910, max: 1542, med: 2.0, min: 0}
last: {avg: 0.0, len: 60929, max: 0, med: 0, min: 0}
len: {avg: 3.2614846788885425, len: 60929, max: 7400, med: 1, min: 0}
lengthdiff: {avg: 5.580937401698647, len: 31790, max: 1542, med: 1.0, min: 0}
maxlength: {avg: 3.271500787815126, len: 60928, max: 1542, med: 0.0, min: 0}
minlength: {avg: 0.35460836741113555, len: 31790, max: 49, med: 0.0, min: 0}
mov: {avg: 0.011849858031479263, len: 60929, max: 160, med: 0, min: 0}
remov: {avg: 0.0, len: 60929, max: 0, med: 0, min: 0}
revers: {avg: 0.0, len: 60929, max: 0, med: 0, min: 0}
sort: {avg: 1.6412545749971278e-05, len: 60929, max: 1, med: 0, min: 0}
startinsert: {avg: 0.017413711040719527, len: 60929, max: 635, med: 0, min: 0}
unknown_alloc_len: {avg: 0.9374942209893666, len: 10815, max: 58, med: 0, min: 0}
unsure_alloc_len: {avg: 1.9331852358168147, len: 11704, max: 93, med: 2.0, min: 0}

It does not do anything useful with the parsed log yet though.

[ci skip]
@ZyX-I (Contributor, Author) commented Jan 27, 2018

It appears that the typical list is small and immutable, which is hardly a surprise. What is a surprise is that at least half of the lists are empty. As for operations: :unlet is literally never used for deleting list elements; [] is hardly ever used for long lists and is normally used to get the first or last element anyway; remove() is almost never used with a range; @justinmk has something which uses reverse() and sort() relatively often (59 and 60 hits), while I normally have no plugins calling reverse() at all and only one sort() call.

@ZyX-I (Contributor, Author) commented Jan 27, 2018

@ZyX-I (Contributor, Author) commented Jan 27, 2018

I am thinking of implementing the list on top of kvec_withinit_t(typval_T, 4). That should increase memory usage somewhat, but 90% of lists will then need exactly one allocation.
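
Roughly the shape that gives, as a sketch only (the struct layout, field names and the stand-in typval_T below are simplified assumptions for illustration, not the actual kvec.h expansion; allocation-failure handling is omitted):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int v_type; long v_number; } typval_T;  // stand-in, not the real typval_T

typedef struct {
  size_t size;             // number of elements currently stored
  size_t capacity;         // number of elements that fit without reallocating
  typval_T *items;         // points at init_array until the list outgrows it
  typval_T init_array[4];  // preallocated slots: one malloc() covers ~90% of lists
} list_T;

static void list_init(list_T *const l)
{
  l->size = 0;
  l->capacity = 4;
  l->items = l->init_array;  // no second allocation for small lists
}

static void list_append(list_T *const l, const typval_T tv)
{
  if (l->size == l->capacity) {
    l->capacity *= 2;  // capacity stays a power of two
    if (l->items == l->init_array) {
      // First time the list outgrows the inline slots: move to the heap.
      typval_T *const heap = malloc(l->capacity * sizeof(*heap));
      memcpy(heap, l->init_array, l->size * sizeof(*heap));
      l->items = heap;
    } else {
      l->items = realloc(l->items, l->capacity * sizeof(*l->items));
    }
  }
  l->items[l->size++] = tv;
}
```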

@justinmk (Member)

What are the benefits?

@ZyX-I (Contributor, Author) commented Jan 28, 2018

Of what exactly? Using a vector instead of a linked list should be faster (well, except for empty lists), and avoiding the second allocation is going to speed things up a bit further for most lists. The vector growth pattern is merely “always double the vector size”, yet a number of languages manage to work fine with that, like Lua or Perl (though the first is a special case: it has two storages that grow separately, with some logic deciding whether an item with a numeric key goes into the hash or the actual array part; and the second grows a bit less than doubling, actually old_len + min_len - 2; and apparently whoever wrote kvec.h was OK with that).
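
For reference, the two growth rules mentioned above as plain functions (a sketch; the Perl formula is the one quoted from memory above, taken at face value here):

```c
#include <stddef.h>

// "Double always": the rule kvec-style code uses.
static size_t new_cap_doubling(size_t old_cap)
{
  return old_cap ? old_cap * 2 : 1;
}

// The "old_len + min_len - 2" rule described above: slightly less than
// doubling when min_len is about old_len, guarded so tiny lists still get
// at least min_len slots.
static size_t new_cap_perlish(size_t old_len, size_t min_len)
{
  return old_len >= 2 ? old_len + min_len - 2 : min_len;
}
```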

@ZyX-I (Contributor, Author) commented Jan 28, 2018

By the way, it appears that with vectors the behaviour of :for is going to change: previously, sort() and reverse() on the list being iterated over made the loop continue with the same item, no matter where it ended up after the operation; now it is easier to code if it continues with the same index.
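
A rough sketch of why the index-based behaviour falls out naturally from an array-backed list (invented names, not the actual :for machinery; typval_T and list_T are trimmed placeholders):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int v_type; long v_number; } typval_T;   // placeholder
typedef struct { size_t size; typval_T *items; } list_T;  // trimmed placeholder

typedef struct {
  list_T *fi_list;  // list being iterated over by :for
  size_t fi_index;  // next index to yield
} for_iter_T;

// Because the cursor is just an index, sort()/reverse() inside the loop body
// makes :for continue at the same position, not with the same item.
static bool for_iter_next(for_iter_T *const it, typval_T *const out)
{
  if (it->fi_index >= it->fi_list->size) {
    return false;  // iteration finished
  }
  *out = it->fi_list->items[it->fi_index++];
  return true;
}
```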

@justinmk (Member)

> That should increase memory usage somewhat, but 90% of lists will then need exactly one allocation.

Now I see, here you were talking about the "double vector size always" pattern.

@ZyX-I (Contributor, Author) commented Jan 28, 2018

@justinmk I was talking about always allocating a small array as part of the list structure, specifically the second argument to kvec_withinit_t (the 4). Without that, memory usage will actually be reduced: a vector has a size, a capacity and a pointer, while the current list implementation has pointers to the first, last and cached elements plus a size and the cached element's index, not to mention two pointers per list element and possible losses to memory fragmentation and/or alignment, since lists forced many small allocations, one per element. With this array, it will allocate those four elements no matter what, and they are not going to be used for large lists anyway.
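
Simplified, the two shapes being compared look roughly like this (field names only approximate the real structures and omit refcounting, locks, watchers and GC links):

```c
#include <stddef.h>

typedef struct { int v_type; long v_number; } typval_T;  // placeholder

// Old design: doubly-linked list, one heap allocation per element.
typedef struct listitem_S listitem_T;
struct listitem_S {
  listitem_T *li_next;
  listitem_T *li_prev;
  typval_T li_tv;
};
typedef struct {
  listitem_T *lv_first;     // head
  listitem_T *lv_last;      // tail
  listitem_T *lv_idx_item;  // cached item for indexed access
  int lv_idx;               // cached index
  int lv_len;               // length
} linked_list_T;

// New design: one block holding length, capacity, a data pointer and four
// inline slots that go unused only for the (rarer) large lists.
typedef struct {
  size_t size;
  size_t capacity;
  typval_T *items;         // == init_array until the list grows past 4 items
  typval_T init_array[4];
} vector_list_T;
```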

@ZyX-I (Contributor, Author) commented Jan 28, 2018

Of what I have checked so far, only Ruby pays a similar price for its arrays, but it pays with three elements per array (why not four?), and those are of a pointer-sized integer type, not typval_T like we have.

@ZyX-I (Contributor, Author) commented Jan 28, 2018

Probably not four so that it does not really pay any price at all: that array is in a union with a long, a union(long, pointer) and a pointer.
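
Loosely, the Ruby trick amounts to something like this (a paraphrase of the idea, not Ruby's actual RArray definition):

```c
typedef void *VALUE;  // pointer-sized element, unlike Neovim's typval_T

typedef struct {
  unsigned long flags;  // embedded-vs-heap flag (and embedded length) live here
  union {
    struct {
      long len;                                // the "long"
      union { long capa; VALUE shared; } aux;  // the "union(long, pointer)"
      VALUE *ptr;                              // the "pointer"
    } heap;
    VALUE embedded[3];  // exactly as wide as the heap branch, so zero extra cost
  } as;
} small_array_T;
```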

@ZyX-I ZyX-I force-pushed the list-reimplement branch 2 times, most recently from ee9f76e to 8eaac17 on February 5, 2018 at 08:50
@ZyX-I ZyX-I force-pushed the list-reimplement branch 2 times, most recently from 373cd71 to 5d15b11 on February 11, 2018 at 17:57
@ZyX-I (Contributor, Author) commented Feb 12, 2018

Current status: the eval/* functional tests are working; legacy/055* is crashing (specifically on “list and dictionary types a:000 function argument”, and only on this now); AFAIR all functional tests up to legacy/055* were working as well. Benchmarking was not done (deferred until there are zero crashes on functional tests), and unit tests were not refactored. If I am not mistaken, the new tv_clear implementation is rather stable now (and it is far easier to debug in any case).

@ZyX-I (Contributor, Author) commented Feb 13, 2018

All functional tests pass locally now, and nvim works normally with my actual config, only crashing at exit with -DEXITFREE (nothing new, it was doing that before this PR, but I have not debugged the issue yet). Time for some benchmarks now (not today though).

@ZyX-I (Contributor, Author) commented Feb 15, 2018

Looks like preallocating four elements per list was a very bad idea:

before (pps nvim -o vsize,size,rss,size,sz):

                                     VSZ  SIZE   RSS  SIZE    SZ
 2678 bin/nvim                     82212 33364 17888 33364 20553

after:

pps nvim -o vsize,size,rss,size,sz
                                     VSZ  SIZE   RSS  SIZE    SZ
 6558 bin/nvim                    4259928 4211272 1065400 4211272 1064982

(These are the memory footprints of nvim running with my config on this branch and on master, where by “master” I mean the commit I branched off.)

@ZyX-I (Contributor, Author) commented Feb 15, 2018

This is strange: removing the four preallocated elements actually doubled the vsize instead of reducing it:

                                     VSZ  SIZE   RSS  SIZE    SZ
22016 bin/nvim                    8454228 8405572 2113344 8405572 2113557

@ZyX-I (Contributor, Author) commented Mar 4, 2018

Gcc complains about a “missing initializer” for the { .attr = … } style initializer. I thought it was OK, and my gcc does not complain about that.

This was actually an attempt to fix local clang warnings from glibc-2.25
macros, specifically:

1. `-Wconversion` on fpclassify() macros.
2. `-Wgnu-statement-expression` from `assert()` macros.

Though after some attempts it appeared that

1. is not actually fixable without making clang ignore warnings from math.h
   (it should already be doing that, but does not for some reason) or making
   it pretend to be gcc-4.4 or higher rather than gcc-4.2 (in that case glibc
   uses __builtin_fpclassify, which is actually defined);
2. is also not fixable without using `-ansi` (or defining `__STRICT_ANSI__`
   or undefining `__GNUC__`), though this should be fixed by a newer libc: it
   [uses `__extension__`][1] to mask warnings.

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=assert/assert.h;h=5468c16c2c4e354e86454386c39a049daa130523;hb=23158b08a0908f381459f273a984c6fd328363cb#l105
@dgel commented Mar 13, 2018

Hi! I'm sorry for just dropping into this pull request, since I'm not a Neovim dev and don't really follow the existing code. I just had a few questions, since similar comparisons are often done in C++ for std::list vs std::vector. Do I understand correctly that the old implementation was a doubly-linked list of basic types with some sort of caching mechanism, and the new one is a dynamic array implementation?

  • You mention that half the lists are empty, but also that you always allocate 4 elements. Do you mean that you allocate 4 elements upon inserting the first element?
  • If most vectors are small, does it make sense to use some form of small_vec implementation to avoid allocating for very small vectors?
  • Inserting at the end being faster for lists seems odd, but maybe that's because you're comparing small lengths only.
  • In C++, std::vector doesn't shrink unless requested. Perhaps this doesn't make sense for Vimscript, but it's worth considering deallocating less often than you allocate (double when capacity is full, shrink the vector when it drops below half or a quarter of capacity?).
  • Why can extending a list with itself reallocate twice?

@ZyX-I (Contributor, Author) commented Mar 13, 2018

  1. The old implementation is a doubly-linked list; accessing a certain index leaves that item in a cache. The new variant is a dynamically realloc()ated array whose capacity is always a power of two.

  2. I allocate 4 elements upon list allocation. This does not increase memory usage too much and makes a single malloc() call enough for 90% of lists.

  3. I do not know C++ and cannot say what “some form of small_vec implementation” is. Google knows about Rust’s, but I do not see how I could use the stack: the list type is there for the VimL implementation, which needs to keep pointers around for an undetermined time; for internal uses there are normally other equivalents.

  4. These were my thoughts as well. Inserting at the end ought to be faster as long as there is free space: you do not need to allocate, and you need to write to fewer places. With a doubly-linked list the places to write are

    1. the list length saved in the structure;
    2. the pointer to the last element;
    3. a pointer in the element itself (two, unless using calloc());
    4. the value in the element itself.

    With vectors that would be

    1. the list length;
    2. the value of the element.

    And that’s all (unless a reallocation happens). I do not know how it ended up slower.

  5. Removes are relatively rare, and the “capacity is a power of two” rule already means not shrinking until no more than half of the capacity is filled. A quarter may be used as the threshold instead, but I do not consider optimizing that a priority.

  6. I do not remember why I thought that was happening. There should actually be another explanation: e.g. with 255/256 elements it needs to copy 256 elements when a reallocation happens, while with 257 elements it needs to copy 512 elements, because 512 elements were already allocated but 257 * 2 = 514 does not fit there (there is a rough sketch of this arithmetic right after this list).
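
A rough sketch of the arithmetic behind points 5 and 6 (function names and the shrink threshold are assumptions for illustration, not code from this branch):

```c
#include <stddef.h>

// Smallest power of two >= n (for n > 0).
static size_t pow2_at_least(size_t n)
{
  size_t cap = 1;
  while (cap < n) {
    cap *= 2;
  }
  return cap;
}

// Point 6: len = 256 means capacity 256; extending with itself needs 512
// slots, so one realloc copies 256 elements.  len = 257 means capacity 512
// already; extending needs 514 slots, which does not fit in 512, so the
// realloc to 1024 copies the whole 512-slot block.
static size_t capacity_after_extend_with_self(size_t len)
{
  size_t cap = pow2_at_least(len);  // capacity before extending
  size_t needed = len * 2;          // length after extending with itself
  return needed > cap ? pow2_at_least(needed) : cap;
}

// Point 5: with power-of-two capacities, shrinking only once the length drops
// below a quarter of the capacity avoids realloc ping-pong at the boundary.
static size_t capacity_after_remove(size_t cap, size_t new_len)
{
  while (cap > 4 && new_len < cap / 4) {
    cap /= 2;
  }
  return cap;
}
```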

@justinmk justinmk added the “refactor” (changes that are not features or bugfixes) label Mar 13, 2018
@dgel commented Mar 14, 2018

Hi, thanks for replying =) Possibly I'm misunderstanding and you're doing it already, but would it make sense to not allocate at all until an element is added to the vector?

By a small_vector I meant a vector that has an inline buffer (in a union with the vector's pointer to the buffer, size and capacity), where for small sizes it uses the inline buffer instead of allocating. This may be total overkill / counterproductive, however.

@ZyX-I (Contributor, Author) commented Mar 14, 2018

@dgel How do you imagine not allocating? I need an editable object, so that two variables holding the same empty vector both see the change when a value is put into it.

About small_vector: maybe show some code? I kind of have something like this now: allocating a list allocates space for a structure which, among other things, has four list items. There are no unions, though: a vector with preallocated items is supposed to be compatible with a vector without preallocated items (i.e. any function which operates on a pointer to a vector without preallocated items should also work on a pointer to a vector with preallocated items), as long as the changes made to the vector do not trigger some kind of reallocation. So instead of code that checks the size against the number of preallocated items and selects where to get the values from, a pointer to the items array is used, which incidentally points to the memory just after the memory where the pointer itself is stored. Less space efficient, but easier to work with: I actually use this feature to have vectors with different numbers of preallocated items (Vim has two such lists with different numbers of preallocated items: one for group captures when pattern matching and one for user function arguments), and it also keeps it easy to switch back to a vector without any preallocated items at all.
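
A minimal sketch of that layout-compatibility trick (invented names; the real code goes through the kvec macros and carries refcount/GC fields and other bookkeeping alongside):

```c
#include <stddef.h>

typedef struct { int v_type; long v_number; } typval_T;  // placeholder

// Plain vector: functions that read or mutate a list take a pointer to this.
typedef struct {
  size_t size;
  size_t capacity;
  typval_T *items;
} tv_vec_T;

// Vector with N preallocated slots stored directly after the header.  Its
// leading field is a plain tv_vec_T, so &v->vec can be passed to any function
// expecting a plain vector; the only difference is where items points at first.
#define TV_VEC_WITHINIT_T(N) \
  struct { \
    tv_vec_T vec; \
    typval_T init_array[N]; \
  }

typedef TV_VEC_WITHINIT_T(4) tv_vec4_T;    // typical VimL list
typedef TV_VEC_WITHINIT_T(10) tv_vec10_T;  // a different preallocation size (number chosen arbitrarily here)

static void tv_vec4_init(tv_vec4_T *const v)
{
  v->vec.size = 0;
  v->vec.capacity = 4;
  // Points at the memory right after the header, as described above.  Per the
  // constraint in the comment, anything that may reallocate must know about
  // the inline buffer (compare items with init_array) so it never realloc()s
  // or free()s it; read-only code and in-capacity appends need no such check.
  v->vec.items = v->init_array;
}
```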

@dgel commented Mar 15, 2018

Ahh yeah, it looks like I was misunderstanding. I thought a struct { Data *buffer; size_t cap; size_t len; } was being passed around directly, but it seems you're passing a pointer to something like struct { Data *buffer; size_t cap; size_t len; Data initial[4]; }? Or perhaps you always keep the data inline and grow the entire struct instead of holding a pointer to a separate buffer. Either way there's no real advantage to making it a union other than to try to save 16 bytes per vector.

Thanks for talking about this! It's nice to learn how such lists are implemented in different places. Perhaps I can find an English-language equivalent of that article about different list implementations.

@ZyX-I (Contributor, Author) commented Mar 15, 2018

I can’t ever grow the entire struct. First, that makes the vector not compatible with a vector without initial elements. Second, I do need a never-changing pointer to hold on to, and right now nearly the same struct you wrote is part of another struct which additionally has data for GC and some other technical thingies. I considered the variant of having Data buffer[] at the end and growing that, but it neither allows using the existing kvec implementation nor has the advantage of fewer by-pointer memory accesses. By the way, I have seen exactly zero languages where cap and size are stored in the same memory block as the buffer, aside from some optimizations in Ruby (it does use a union, but in Ruby Data is actually Data *, while in Neovim it is a complex struct with the actual data most of the time; also, once the union is overfilled the data array is no longer in the same memory block, because it is then accessed via a pointer).
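
For completeness, the rejected variant looks roughly like this (a sketch of the idea only, with placeholder names):

```c
#include <stddef.h>

typedef struct { int v_type; long v_number; } typval_T;  // placeholder

// Flexible array member at the end; growing means realloc()ing the whole
// block, which may move it.  That breaks the "never-changing pointer"
// requirement described above and cannot reuse the existing kvec macros.
typedef struct {
  size_t size;
  size_t capacity;
  typval_T items[];  // data stored inline, right after the header
} fam_list_T;
```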

(About the article: I have the data collected, but I do not quite like how I structured it yet, so I have not published it so far. So you would not have it even in Russian yet.)

@ZyX-I ZyX-I force-pushed the list-reimplement branch 3 times, most recently from 5cc8a9f to 05b433c on March 18, 2018 at 19:33
ZyX-I added 3 commits May 1, 2018 21:48
With this, `make -jN` somewhat works (if you are lucky; tests still share common filenames). Better still, there are no implicit dependencies now.
This actually makes sense only on developer machines and adds some overhead (copying all test files for each test run), though not much.
@dundargoc dundargoc added the “needs:response” (waiting for reply from the author) label and removed the WIP label Feb 7, 2023
@marvim marvim requested a review from bfredl February 7, 2023 21:35
@dundargoc dundargoc marked this pull request as draft February 7, 2023 21:37
@marvim marvim removed the request for review from bfredl February 7, 2023 21:37
@dundargoc dundargoc removed the “needs:response” (waiting for reply from the author) label Feb 7, 2023
@dundargoc dundargoc changed the title from “[WIP] Switch list implementation to more effective one” to “Switch list implementation to more effective one” Apr 2, 2023
Labels: performance (issues reporting performance problems), refactor (changes that are not features or bugfixes)
5 participants