Linode swapper/0 Page allocation failure issue

I kept getting emails from my Linode about Apache falling over and memory contention issues, and have been battling it for a while. I thought I had it licked by reducing the number of Apache processes available (yes, I'm using prefork because I'm too lazy to fix it) but it started to recur.

The general solution is to increase vm.min_free_kbytes to at least 4096; some people go a lot higher. Basically put this on its own line in /etc/sysctl.conf:

vm.min_free_kbytes = 4096

Add it at the end of the file (check first that the setting isn't already there somewhere else) and then run "sysctl -p" to apply it. Rebooting works too.
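If you'd rather not eyeball the file for an existing entry, here's a rough sketch of that check in Python. The function name find_setting is just something I made up for illustration; the only things it relies on are the key = value layout of sysctl.conf and /proc/sys/vm/min_free_kbytes, which is where the kernel exposes the live value.

```python
from pathlib import Path


def find_setting(conf_text, key="vm.min_free_kbytes"):
    """Return (line_number, value) for every uncommented line that sets `key`."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        stripped = line.strip()
        # sysctl.conf treats lines starting with '#' or ';' as comments
        if not stripped or stripped.startswith(("#", ";")):
            continue
        name, sep, value = stripped.partition("=")
        if sep and name.strip() == key:
            hits.append((number, value.strip()))
    return hits


if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf")
    if conf.exists():
        for number, value in find_setting(conf.read_text()):
            print(f"{conf}:{number} sets vm.min_free_kbytes = {value}")

    # The value the running kernel is actually using, in kB
    live = Path("/proc/sys/vm/min_free_kbytes")
    if live.exists():
        print("running value:", live.read_text().strip(), "kB")
```

If it prints more than one line for /etc/sysctl.conf, tidy up the duplicates before adding another entry.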

The best fix is probably to upgrade to the latest kernel. I was running Linux linode-prime 3.4.2-linode44 #1 SMP Tue Jun 12 15:04:46 EDT 2012 i686 GNU/Linux, and as of 30/01/2013 there's a much newer Linode kernel available, 3.6.5-linode47.
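For checking a handful of boxes, something like this sketch will tell you whether a kernel is older than the one that has been fine for me. The helper names are hypothetical, and the 3.6.5-linode47 default is only the version I happened to upgrade to, not necessarily the first one carrying the patch.

```python
import platform
import re


def kernel_version(release):
    """Turn a release string like '3.4.2-linode44' into a tuple like (3, 4, 2)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '3.6') is treated as 0
    return tuple(int(part or 0) for part in match.groups())


def is_older(running, known_good="3.6.5-linode47"):
    """True if `running` sorts before `known_good` (numeric part only)."""
    return kernel_version(running) < kernel_version(known_good)


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints
    print(release, "is older than 3.6.5:", is_older(release))
```

It only compares the numeric part, so two Linode builds of the same upstream version look identical to it.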

There's a short thread about the issue on the Linode forums. For a more extended discussion there's a much longer thread here.

Basically it looks like a memory allocation bug in the IPv4 stack, which is patched in the later version. People noticed that switching backup processes over to IPv6 reduced the error rate, because the huge number of connections those kinds of jobs open was exacerbating the problem.

So far the new kernel's been fine, and I didn't even have to make the change to the system settings. 🙂

Below's an example of the error message it'd spit out into syslog:

swapper/0: page allocation failure: order:4, mode:0x20
Pid: 0, comm: swapper/0 Not tainted 3.4.2-linode44 #1
Call Trace:
[<c019b538>] ? warn_alloc_failed+0x98/0x100
[<c019bfa8>] ? __alloc_pages_nodemask+0x4d8/0x6e0
[<c01c3b41>] ? T.874+0x31/0xe0
[<c01c3e31>] ? T.871+0x91/0x250
[<c01c423e>] ? cache_alloc_refill+0x24e/0x290
[<c01c433e>] ? __kmalloc+0xbe/0xd0
[<c056232e>] ? pskb_expand_head+0x12e/0x240
[<c01c32d2>] ? kmem_cache_free+0x42/0x60
[<c05628cd>] ? __pskb_pull_tail+0x4d/0x2a0
[<c05675cf>] ? netif_skb_features+0xaf/0xc0
[<c056b9dd>] ? dev_hard_start_xmit+0x1ed/0x410
[<c05fb7f0>] ? ip_finish_output2+0x280/0x280
[<c057fd7a>] ? sch_direct_xmit+0xba/0x180
[<c057ea98>] ? eth_header+0x28/0xc0
[<c056bcff>] ? dev_queue_xmit+0xff/0x340
[<c05fb95a>] ? ip_finish_output+0x16a/0x2f0
[<c05fbb74>] ? ip_output+0x94/0xb0
[<c05fad58>] ? ip_local_out+0x18/0x20
[<c060eb65>] ? tcp_transmit_skb+0x395/0x660
[<c06114cd>] ? tcp_write_xmit+0x1dd/0x500
[<c0611854>] ? __tcp_push_pending_frames+0x24/0x90
[<c060d64b>] ? tcp_rcv_established+0xfb/0x610
[<c061413b>] ? tcp_v4_do_rcv+0xbb/0x190
[<c06148cd>] ? tcp_v4_rcv+0x6bd/0x7a0
[<c05f6697>] ? ip_local_deliver_finish+0x97/0x220
[<c05f6600>] ? ip_rcv+0x330/0x330
[<c05f6067>] ? ip_rcv_finish+0xd7/0x340
[<c0568dc3>] ? __netif_receive_skb+0x2c3/0x350
[<c056a79f>] ? netif_receive_skb+0x1f/0x70
[<c0506b51>] ? handle_incoming_queue+0x1a1/0x270
[<c0506e44>] ? xennet_poll+0x224/0x570
[<c06f0000>] ? schedule_timeout+0x110/0x1c0
[<c056afda>] ? net_rx_action+0xea/0x1a0
[<c0131c4c>] ? __do_softirq+0x7c/0x110
[<c0131bd0>] ? irq_enter+0x70/0x70
<IRQ> [<c0131a26>] ? irq_exit+0x66/0x90
[<c049414d>] ? xen_evtchn_do_upcall+0x1d/0x30
[<c06f3147>] ? xen_do_upcall+0x7/0xc
[<c01013a7>] ? hypercall_page+0x3a7/0x1000
[<c010603f>] ? xen_safe_halt+0xf/0x20
[<c010fcac>] ? default_idle+0x1c/0x40
[<c010feea>] ? cpu_idle+0x4a/0x80
[<c089986e>] ? start_kernel+0x2e3/0x2e8
[<c0899406>] ? kernel_init+0x127/0x127
[<c089c813>] ? xen_start_kernel+0x520/0x528
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
CPU 2: hi: 0, btch: 1 usd: 0
CPU 3: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 37
CPU 1: hi: 186, btch: 31 usd: 104
CPU 2: hi: 186, btch: 31 usd: 31
CPU 3: hi: 186, btch: 31 usd: 166
active_anon:29120 inactive_anon:12087 isolated_anon:0
active_file:31397 inactive_file:31188 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
free:8485 slab_reclaimable:10186 slab_unreclaimable:1743
mapped:5257 shmem:2455 pagetables:295 bounce:0
DMA free:2080kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:192kB active_file:4000kB inactive_file:832kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15808kB mlocked:0kB dirty:0kB writeback:0kB mapped:120kB shmem:0kB slab_reclaimable:12kB slab_unreclaimable:4kB kernel_stack:8kB pagetables:4kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 500 500 500
Normal free:31860kB min:2816kB low:3520kB high:4224kB active_anon:116480kB inactive_anon:48156kB active_file:121588kB inactive_file:123920kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:512064kB mlocked:0kB dirty:0kB writeback:0kB mapped:20908kB shmem:9820kB slab_reclaimable:40732kB slab_unreclaimable:6968kB kernel_stack:760kB pagetables:1176kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 2*8kB 1*16kB 2*32kB 19*64kB 6*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2080kB
Normal: 385*4kB 160*8kB 1631*16kB 92*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 31860kB
66740 total pagecache pages
1688 pages in swap cache
Swap cache stats: add 5957780, delete 5956092, find 24396952/24872330
Free swap = 246740kB
Total swap = 262140kB
133104 pages RAM
0 pages HighMem
5882 pages reserved
43471 pages shared
81160 pages non-shared
SLAB: Unable to allocate memory on node 0 (gfp=0x20)
cache: size-65536, object size: 65536, order: 4
node 0: slabs: 3/3, objs: 3/3, free: 0
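The interesting bit of that dump is the "Normal:" free-list line. The failed request was order:4, meaning 16 contiguous pages (64kB), and although the Normal zone had about 31MB free, none of it was in a block that big. Here's a small sketch that pulls that out of the line. The function names are made up for illustration, the 4kB page size is taken from the first column of the dump, and it deliberately ignores watermarks and lowmem_reserve, so it's a fragmentation check rather than a model of the real allocator.

```python
import re

PAGE_KB = 4  # page size, taken from the smallest block size in the dump


def free_blocks_by_order(buddy_line):
    """Map allocation order -> number of free blocks for one zone's free-list line."""
    counts = {}
    # Matches pairs such as '385*4kB'; the trailing '= 31860kB' total has no '*'
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", buddy_line):
        pages = int(size_kb) // PAGE_KB
        order = pages.bit_length() - 1  # 1 page -> order 0, 16 pages -> order 4
        counts[order] = int(count)
    return counts


def has_block_of_order(buddy_line, order):
    """True if the zone has at least one free block of `order` or larger."""
    blocks = free_blocks_by_order(buddy_line)
    return any(n > 0 for o, n in blocks.items() if o >= order)


normal = ("Normal: 385*4kB 160*8kB 1631*16kB 92*32kB 0*64kB 0*128kB 0*256kB "
          "0*512kB 0*1024kB 0*2048kB 0*4096kB = 31860kB")

print(has_block_of_order(normal, 3))  # 32kB blocks are available
print(has_block_of_order(normal, 4))  # nothing of 64kB or larger
```

With the line from my log, the first check comes back True and the second False, which matches the "order: 4" slab failure at the end of the dump.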


