BUG: MAX_STACK_TRACE_ENTRIES too low!

syzbot

unread,

Mar 2, 2019, 10:06:05 AM3/2/19

to [email protected], [email protected], [email protected], [email protected]

Hello,

syzbot found the following crash on:

HEAD commit: c63e9e91a254 Add linux-next specific files for 20190301
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=16a559b2c00000
kernel config: https://syzkaller.appspot.com/x/.config?x=f5875f9dc6e009b2
dashboard link: https://syzkaller.appspot.com/bug?extid=78923eea7cf44364f4fb
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

BUG: MAX_STACK_TRACE_ENTRIES too low!
turning off the locking correctness validator.
CPU: 0 PID: 19385 Comm: syz-executor.0 Not tainted 5.0.0-rc8-next-20190301
#1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
save_trace kernel/locking/lockdep.c:467 [inline]
save_trace.cold+0x14/0x19 kernel/locking/lockdep.c:437
mark_lock+0x2fb/0x1380 kernel/locking/lockdep.c:3401
__lock_acquire+0x548/0x3fb0 kernel/locking/lockdep.c:3648
lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4202
flush_workqueue+0x126/0x14c0 kernel/workqueue.c:2774
drain_workqueue+0x1b4/0x470 kernel/workqueue.c:2939
destroy_workqueue+0x21/0x700 kernel/workqueue.c:4315
xfs_destroy_mount_workqueues+0x11d/0x1c0 fs/xfs/xfs_super.c:906
xfs_fs_fill_super+0x8e9/0x1670 fs/xfs/xfs_super.c:1786
mount_bdev+0x307/0x3c0 fs/super.c:1346
xfs_fs_mount+0x35/0x40 fs/xfs/xfs_super.c:1834
legacy_get_tree+0xf2/0x200 fs/fs_context.c:584
vfs_get_tree+0x123/0x450 fs/super.c:1481
do_new_mount fs/namespace.c:2622 [inline]
do_mount+0x1436/0x2c40 fs/namespace.c:2942
ksys_mount+0xdb/0x150 fs/namespace.c:3151
__do_sys_mount fs/namespace.c:3165 [inline]
__se_sys_mount fs/namespace.c:3162 [inline]
__x64_sys_mount+0xbe/0x150 fs/namespace.c:3162
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45a89a
Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 2d 8e fb ff c3 66 2e 0f
1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff
ff 0f 83 0a 8e fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:00007f6d038b0a88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007f6d038b0b30 RCX: 000000000045a89a
RDX: 00007f6d038b0ad0 RSI: 0000000020000140 RDI: 00007f6d038b0af0
RBP: 0000000020000140 R08: 00007f6d038b0b30 R09: 00007f6d038b0ad0
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000003
R13: 0000000000000000 R14: 00000000004dbde0 R15: 00000000ffffffff

---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.

syzbot

unread,

Mar 2, 2019, 2:45:06 PM3/2/19

to [email protected], [email protected], [email protected], [email protected]

syzbot has found a reproducer for the following crash on:

HEAD commit: c63e9e91a254 Add linux-next specific files for 20190301
git tree: linux-next

console output: https://syzkaller.appspot.com/x/log.txt?x=10856157200000

kernel config: https://syzkaller.appspot.com/x/.config?x=f5875f9dc6e009b2
dashboard link: https://syzkaller.appspot.com/bug?extid=78923eea7cf44364f4fb
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

syz repro: https://syzkaller.appspot.com/x/repro.syz?x=149ae1eb200000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15ee41eb200000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

BUG: MAX_STACK_TRACE_ENTRIES too low!
turning off the locking correctness validator.

CPU: 0 PID: 1176 Comm: kworker/u5:0 Not tainted 5.0.0-rc8-next-20190301 #1

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011

Workqueue: hci1 hci_power_on

Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
save_trace kernel/locking/lockdep.c:467 [inline]
save_trace.cold+0x14/0x19 kernel/locking/lockdep.c:437
mark_lock+0x2fb/0x1380 kernel/locking/lockdep.c:3401

mark_irqflags kernel/locking/lockdep.c:3304 [inline]
__lock_acquire+0xf2c/0x3fb0 kernel/locking/lockdep.c:3644
lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4202
process_one_work+0x8df/0x1790 kernel/workqueue.c:2244
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x357/0x430 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

Christoph Hellwig

unread,

Mar 8, 2019, 11:11:12 AM3/8/19

to syzbot, [email protected], [email protected], [email protected], [email protected], Peter Zijlstra, Ingo Molnar, Will Deacon

Probably more a lockdep than XFS thing..

---end quoted text---

syzbot

unread,

Mar 11, 2019, 4:26:02 PM3/11/19

syzbot has bisected this bug to:

commit 669de8bda87b92ab9a2fc663b3f5743c2ad1ae9f
Author: Bart Van Assche <[email protected]>
Date: Thu Feb 14 23:00:54 2019 +0000

kernel/workqueue: Use dynamic lockdep keys for workqueues

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17dec757200000
start commit: 669de8bd kernel/workqueue: Use dynamic lockdep keys for wo..
git tree: upstream
final crash: https://syzkaller.appspot.com/x/report.txt?x=143ec757200000
console output: https://syzkaller.appspot.com/x/log.txt?x=103ec757200000
kernel config: https://syzkaller.appspot.com/x/.config?x=bd349f14600e1305
dashboard link: https://syzkaller.appspot.com/bug?extid=78923eea7cf44364f4fb
userspace arch: amd64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=149c858d200000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=111aee33200000

Reported-by: [email protected]
Fixes: 669de8bd ("kernel/workqueue: Use dynamic lockdep keys for
workqueues")

Bart Van Assche

unread,

Mar 11, 2019, 6:19:38 PM3/11/19

I think this issue has been fixed by a commit that went upstream yesterday. Hence:

#syz fix: workqueue, lockdep: Fix an alloc_workqueue() error path

Linus Torvalds

unread,

Mar 11, 2019, 8:55:30 PM3/11/19

to Bart Van Assche, syzbot, Andrew Morton, Darrick J. Wong, [email protected], Gustavo A. R. Silva, Christoph Hellwig, Jann Horn, Jason Gunthorpe, [email protected], [email protected], Linux List Kernel Mailing, [email protected], [email protected], Waiman Long, Ingo Molnar, Ingo Molnar, Paul McKenney, Peter Zijlstra, [email protected], [email protected], [email protected], [email protected], [email protected], Thomas Gleixner, Tejun Heo, Will Deacon, Cong Wang

On Mon, Mar 11, 2019 at 8:19 AM Bart Van Assche <[email protected]> wrote:
>
> I think this issue has been fixed by a commit that went upstream yesterday. Hence:
>
> #syz fix: workqueue, lockdep: Fix an alloc_workqueue() error path

Well, syzbot just reported a problem with that fix itself ("WARNING in
lockdep_unregister_key").

Looks like now the lockdep_unregister_key() cleanup might be called
even when the lockdep map was never initialized at all in that error
case. Hmm?

I _think_ you need to split that "err_free_wq" label into "just free
the wq" and "unregister lockdep and free wq".

But I didn't look any more closely, I might be misreading things.

Linus

Bart Van Assche

unread,

Mar 11, 2019, 9:17:23 PM3/11/19

to Linus Torvalds, syzbot, Andrew Morton, Darrick J. Wong, [email protected], Gustavo A. R. Silva, Christoph Hellwig, Jann Horn, Jason Gunthorpe, [email protected], [email protected], Linux List Kernel Mailing, [email protected], [email protected], Waiman Long, Ingo Molnar, Ingo Molnar, Paul McKenney, Peter Zijlstra, [email protected], [email protected], [email protected], [email protected], [email protected], Thomas Gleixner, Tejun Heo, Will Deacon, Cong Wang

Hi Linus,

From the console output of the syzbot bug report at the start of this
e-mail thread:

WARNING: CPU: 1 PID: 7649 at kernel/locking/lockdep.c:747 register_lock_class+0x10de/0x2220

On line 747 in lockdep.c there is the following warning statement:

WARN_ON_ONCE(class->name != lock->name);

My interpretation is that this means that a lockdep key got reregistered
without unregistration between the two registration calls. That's why I
posted my "#syz fix: workqueue, lockdep: Fix an alloc_workqueue() error
path" reply.

The latest syzbot complaint is different. In the console output of the latest
report I found the following:

WARNING: CPU: 1 PID: 6970 at kernel/locking/lockdep.c:4925 lockdep_unregister_key+0x21c/0x4e0

On line 4925 there is the following warning statement:

WARN_ON_ONCE(!found);

In other words, lockdep_unregister_key() got called for a key that was never
registered.

I agree with your conclusion that the err_free_wq label needs to be split. I
will post a patch that realizes this.

Bart.

Reply all

Reply to author

Forward